msmbuilder.decomposition.tICA
(n_components=None, lag_time=1, shrinkage=None, kinetic_mapping=False)
Time-structure Independent Component Analysis (tICA)
Linear dimensionality reduction using an eigendecomposition of the time-lagged correlation matrix and the covariance matrix of the data. Only the vectors that decorrelate slowest are kept, and the data are projected onto them to obtain a lower-dimensional space.
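The decomposition described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not msmbuilder's implementation: the function name `tica_sketch` is hypothetical, only a single trajectory is handled, no shrinkage is applied, and both matrices are symmetrized over the lagged pairs.

```python
import numpy as np

def tica_sketch(X, lag_time=1, n_components=2):
    """Illustrative tICA on one trajectory X of shape (n_samples, n_features)."""
    X = X - X.mean(axis=0)                    # center the features
    X0, Xt = X[:-lag_time], X[lag_time:]      # pairs (x_t, x_{t+lag})
    n_pairs = len(X0)
    # Symmetrized time-lagged correlation matrix and covariance matrix.
    C = (X0.T @ Xt + Xt.T @ X0) / (2.0 * n_pairs)
    S = (X0.T @ X0 + Xt.T @ Xt) / (2.0 * n_pairs)
    # Solve C v = lambda S v by whitening with the Cholesky factor of S.
    L = np.linalg.cholesky(S)
    Linv = np.linalg.inv(L)
    M = Linv @ C @ Linv.T
    evals, W = np.linalg.eigh((M + M.T) / 2.0)
    order = np.argsort(evals)[::-1]           # slowest-decorrelating modes first
    V = Linv.T @ W[:, order]                  # columns are generalized eigenvectors
    return evals[order][:n_components], V[:, :n_components].T

# Toy data (an assumption for illustration): one fast and one slow AR(1) feature.
rng = np.random.default_rng(0)
n = 5000
noise = rng.normal(size=(n, 2))
fast = np.zeros(n)
slow = np.zeros(n)
for t in range(1, n):
    fast[t] = 0.10 * fast[t - 1] + noise[t, 0]
    slow[t] = 0.95 * slow[t - 1] + noise[t, 1]
X = np.column_stack([fast, slow])

evals, comps = tica_sketch(X, lag_time=1, n_components=2)
```

In this toy case the leading component should load mainly on the slow feature, because that feature keeps its autocorrelation longest.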
- n_components
: int, None- Number of components to keep.
- lag_time
: int- Delay time forward or backward in the input data. The time-lagged correlation is computed between data points X[t] and X[t+lag_time].
- shrinkage
: float, default=None- The covariance shrinkage intensity (range 0-1). If shrinkage is not specified (the default) it is estimated using an analytic formula (the Rao-Blackwellized Ledoit-Wolf estimator) introduced in [5].
- kinetic_mapping
: bool, default=False- If True, weigh the projections by the tICA eigenvalues, yielding kinetic distances as described in [6].
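The shrinkage and kinetic_mapping parameters above can be illustrated with plain NumPy. This is a sketch under assumptions, not the library's code: the helper names are hypothetical, the shrinkage target is assumed to be a scaled identity matrix, and the intensity is hand-picked rather than estimated with the analytic formula from [5].

```python
import numpy as np

def shrink_covariance(S, gamma):
    """Blend S with a scaled identity: (1 - gamma) * S + gamma * mu * I."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    p = S.shape[0]
    mu = np.trace(S) / p                  # average variance sets the target scale
    return (1.0 - gamma) * S + gamma * mu * np.eye(p)

def kinetic_map(projected, eigenvalues):
    """Scale each projected coordinate (column) by its tICA eigenvalue."""
    projected = np.asarray(projected, dtype=float)
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    return projected * eigenvalues[np.newaxis, :]

# A nearly singular covariance matrix (two almost collinear features).
S = np.array([[1.0, 0.999],
              [0.999, 1.0]])
S_shrunk = shrink_covariance(S, gamma=0.1)
cond_before = np.linalg.cond(S)
cond_after = np.linalg.cond(S_shrunk)     # smaller: the matrix is better conditioned

# Two frames projected onto two components, then weighted by eigenvalues.
Y = np.array([[1.0, 1.0],
              [2.0, -1.0]])
Y_kinetic = kinetic_map(Y, eigenvalues=[0.9, 0.2])
```

Shrinkage stabilizes the generalized eigenproblem when features are strongly correlated, while the eigenvalue weighting makes slow components count more in Euclidean distances.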
- components_
: array-like, shape (n_components, n_features)- Components with maximum autocorrelation.
- offset_correlation_
: array-like, shape (n_features, n_features)- Symmetric time-lagged correlation matrix, \(C=E[(x_t)^T x_{t+lag}]\).
- eigenvalues_
: array-like, shape (n_features,)- Eigenvalues of the tICA generalized eigenproblem, in decreasing order.
- eigenvectors_
: array-like, shape (n_components, n_features)- Eigenvectors of the tICA generalized eigenproblem. The vectors give a set of “directions” through configuration space along which the system relaxes towards equilibrium. Each eigenvector is associated with a characteristic timescale \(-\frac{\mathrm{lag\_time}}{\ln \lambda_i}\), where \(\lambda_i\) is the corresponding eigenvalue. See [2] for more information.
Total number of sequences fit by the model. Note that the model is “reset” by calling fit() with new sequences, whereas partial_fit() updates the fit with new data and is suitable for online learning.
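The characteristic timescale attached to each eigenvector follows from its eigenvalue through the standard relation t_i = -lag_time / ln(lambda_i). A minimal sketch, assuming eigenvalues strictly between 0 and 1 (the function name `implied_timescales` is hypothetical):

```python
import numpy as np

def implied_timescales(eigenvalues, lag_time):
    """Convert tICA eigenvalues in (0, 1) to timescales, in units of frames."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    return -lag_time / np.log(eigenvalues)

# Eigenvalues closer to 1 correspond to slower relaxation.
ts = implied_timescales([0.9, 0.5], lag_time=10)
```

Eigenvalues at or below zero have no real-valued timescale under this relation, so they would need to be handled separately.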
This method was originally introduced in [4] and has been applied to the analysis of molecular dynamics data in [1], [2], and [3]. In [1] and [2], tICA was used as a dimensionality reduction technique before fitting other kinetic models.
[1] Schwantes, Christian R., and Vijay S. Pande. J. Chem. Theory Comput. 9.4 (2013): 2000-2009.
[2] Perez-Hernandez, Guillermo, et al. J. Chem. Phys. (2013): 015102.
[3] Naritomi, Yusuke, and Sotaro Fuchigami. J. Chem. Phys. 134.6 (2011): 065101.
[4] Molgedey, Lutz, and Heinz Georg Schuster. Phys. Rev. Lett. 72.23 (1994): 3634.
[5] Chen, Yilun, Ami Wiesel, and Alfred O. Hero III. ICASSP (2009).
[6] Noe, F., and C. Clementi. arXiv:1506.06259 [physics.comp-ph] (2015).
__init__
(n_components=None, lag_time=1, shrinkage=None, kinetic_mapping=False)
Methods
__init__([n_components, lag_time, ...]) | |
fit(sequences[, y]) | Fit the model with a collection of sequences. |
fit_transform(sequences[, y]) | Fit the model with X and apply the dimensionality reduction on X. |
get_params([deep]) | Get parameters for this estimator. |
partial_fit(X) | Fit the model with X. |
partial_transform(features) | Apply the dimensionality reduction on X. |
score(sequences[, y]) | Score the model on new data using the generalized matrix Rayleigh quotient. |
set_params(**params) | Set the parameters of this estimator. |
summarize() | Some summary information. |
transform(sequences) | Apply the dimensionality reduction on X. |
Attributes
components_ | |
covariance_ | |
eigenvalues_ | |
eigenvectors_ | |
means_ | |
offset_correlation_ | |
score_ | Training score of the model, computed as the generalized matrix Rayleigh quotient, the sum of the first n_components eigenvalues. |
timescales_ | |
fit
(sequences, y=None)
Fit the model with a collection of sequences.
This method is not online. Any state accumulated from previous calls to fit() or partial_fit() will be cleared. For online learning, use partial_fit.
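Parameters: | sequences (list of array-like, each of shape (n_samples_i, n_features)) – Training data, where n_samples_i is the number of samples in sequence i and n_features is the number of features. |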
---|---|
Returns: | self – Returns the instance itself. |
Return type: | object |
fit_transform
(sequences, y=None)
Fit the model with X and apply the dimensionality reduction on X.
This method is not online. Any state accumulated from previous calls to fit() or partial_fit() will be cleared. For online learning, use partial_fit.
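Parameters: | sequences (list of array-like, each of shape (n_samples_i, n_features)) – Training data, where n_samples_i is the number of samples in sequence i and n_features is the number of features. |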
---|---|
Returns: | sequence_new |
Return type: | list of array-like, each of shape (n_samples_i, n_components) |
get_params
(deep=True)
Get parameters for this estimator.
Parameters: | deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators. |
---|---|
Returns: | params – Parameter names mapped to their values. |
Return type: | mapping of string to any |
partial_fit
(X)
Fit the model with X.
This method is suitable for online learning. The state of the model will be updated with the new data X.
Parameters: | X (array-like, shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features. |
---|---|
Returns: | self – Returns the instance itself. |
Return type: | object |
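Online fitting of this kind can be sketched by accumulating sufficient statistics one sequence at a time. The class below is a hypothetical illustration, not msmbuilder's internal state: it assumes lagged pairs are formed within each sequence only, so that no pair spans the boundary between two sequences, and it keeps uncentered sums that a later step would turn into means, covariance, and time-lagged correlation.

```python
import numpy as np

class RunningLaggedMoments:
    """Accumulate sums for lagged statistics, one sequence per call."""

    def __init__(self, n_features, lag_time=1):
        self.lag_time = lag_time
        self.n_pairs = 0
        self.sum_x = np.zeros(n_features)                        # for the mean
        self.sum_outer_0 = np.zeros((n_features, n_features))    # sum of x_t x_t^T
        self.sum_outer_lag = np.zeros((n_features, n_features))  # sum of x_t x_{t+lag}^T

    def partial_fit(self, X):
        X = np.asarray(X, dtype=float)
        if len(X) <= self.lag_time:
            return self                    # too short to form any lagged pair
        X0, Xt = X[:-self.lag_time], X[self.lag_time:]
        self.n_pairs += len(X0)
        self.sum_x += X0.sum(axis=0)
        self.sum_outer_0 += X0.T @ X0
        self.sum_outer_lag += X0.T @ Xt
        return self

rng = np.random.default_rng(1)
seq_a = rng.normal(size=(50, 3))
seq_b = rng.normal(size=(30, 3))

acc = RunningLaggedMoments(n_features=3, lag_time=2)
acc.partial_fit(seq_a).partial_fit(seq_b)     # state is updated, not reset
```

A non-online fit() in this picture would simply create a fresh accumulator before looping over the sequences.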
partial_transform
(features)
Apply the dimensionality reduction on X.
Parameters: | features (array-like, shape (n_samples, n_features)) – Data to transform, where n_samples is the number of samples and n_features is the number of features. |
---|---|
Returns: | sequence_new – TICA-projected features |
Return type: | array-like, shape (n_samples, n_components) |
Notes
This function acts on a single featurized trajectory.
score
(sequences, y=None)
Score the model on new data using the generalized matrix Rayleigh quotient.
Parameters: | sequences (list of array, each of shape (n_samples_i, n_features)) – Test data. A list of sequences in a feature space, each of which is a 2D array of possibly different length but with the same number of features. |
---|---|
Returns: | gmrq – Generalized matrix Rayleigh quotient. This number indicates how well the top n_timescales+1 eigenvectors of this tICA model perform as slowly decorrelating collective variables for the new data in sequences. |
Return type: | float |
References
[1] McGibbon, R. T., and V. S. Pande. “Variational cross-validation of slow dynamical modes in molecular kinetics.” J. Chem. Phys. 142, 124105 (2015).
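The generalized matrix Rayleigh quotient can be sketched as trace[(V^T S V)^{-1} (V^T C V)], where the columns of V are the retained eigenvectors and C and S are the time-lagged correlation and covariance matrices estimated from the new data. The function below is a minimal illustration with a hypothetical name; how the library estimates C and S from the test sequences is not shown.

```python
import numpy as np

def gmrq(V, C_test, S_test):
    """Generalized matrix Rayleigh quotient for column eigenvectors V."""
    num = V.T @ C_test @ V
    den = V.T @ S_test @ V
    return float(np.trace(np.linalg.solve(den, num)))

# Toy matrices: with S = I the eigenvectors of C are the coordinate axes,
# so the score of the top two is the sum of their eigenvalues, 0.9 + 0.5.
C = np.diag([0.9, 0.5, 0.1])
S = np.eye(3)
V = np.eye(3)[:, :2]
score = gmrq(V, C, S)

# The score depends only on the subspace spanned by V, not on its basis.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
score_mixed = gmrq(V @ A, C, S)
```

On the training data itself the quotient reduces to the sum of the leading eigenvalues, which is why a lower value on held-out data signals overfitting.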
score_
Training score of the model, computed as the generalized matrix Rayleigh quotient, the sum of the first n_components eigenvalues.
set_params
(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it is possible to update each component of a nested object.
Returns: | |
---|---|
Return type: | self |
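The <component>__<parameter> naming convention can be illustrated with a toy container. Everything here (`Step`, `TinyPipeline`, `split_nested_param`) is hypothetical and written only to show how a double-underscore key is routed to a nested object; it is not scikit-learn's or msmbuilder's implementation.

```python
def split_nested_param(key):
    """Split 'component__parameter' into ('component', 'parameter')."""
    name, sep, sub_key = key.partition("__")
    return (name, sub_key) if sep else (None, name)

class Step:
    """Stand-in for an estimator with a single parameter."""
    def __init__(self, lag_time=1):
        self.lag_time = lag_time

class TinyPipeline:
    """Stand-in for a nested object holding named steps."""
    def __init__(self, **steps):
        self.steps = steps

    def set_params(self, **params):
        for key, value in params.items():
            name, sub_key = split_nested_param(key)
            if name is None or name not in self.steps:
                raise ValueError("unknown parameter: %s" % key)
            setattr(self.steps[name], sub_key, value)   # forward to the named step
        return self

pipe = TinyPipeline(tica=Step(lag_time=1))
pipe.set_params(tica__lag_time=5)     # updates the nested step in place
```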
summarize
()
Some summary information.
transform
(sequences)
Apply the dimensionality reduction on X.
Parameters: | sequences (list of array-like, each of shape (n_samples_i, n_features)) – Data to transform, where n_samples_i is the number of samples in sequence i and n_features is the number of features. |
---|---|
Returns: | sequence_new |
Return type: | list of array-like, each of shape (n_samples_i, n_components) |
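The shapes involved in transforming a list of sequences can be sketched as follows. The sketch assumes the projection is a centering step followed by a matrix product with the transposed components; the function name and the toy values are hypothetical.

```python
import numpy as np

def transform_sketch(sequences, means, components):
    """Project each sequence independently: (X - means) @ components.T."""
    return [(np.asarray(X, dtype=float) - means) @ components.T
            for X in sequences]

means = np.array([1.0, 2.0, 3.0])                 # shape (n_features,)
components = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 1.0]])          # shape (n_components, n_features)
sequences = [np.ones((4, 3)), np.zeros((2, 3))]   # sequences of different lengths

projected = transform_sketch(sequences, means, components)
shapes = [Y.shape for Y in projected]             # one (n_samples_i, n_components) per sequence
```

Each sequence keeps its own length, and only the feature axis shrinks from n_features to n_components.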