msmbuilder.decomposition.tICA¶

class msmbuilder.decomposition.tICA(n_components=None, lag_time=1, shrinkage=None, kinetic_mapping=False)¶

Time-structure Independent Component Analysis (tICA)

Linear dimensionality reduction using an eigendecomposition of the time-lagged correlation matrix and the covariance matrix of the data. Only the vectors that decorrelate slowest are kept, and the data are projected onto them to obtain a lower-dimensional space.
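
The decomposition described above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the msmbuilder implementation: the function name `tica_sketch` is hypothetical, a single trajectory is used, no shrinkage is applied, and the symmetrized estimator for the time-lagged correlation matrix is an assumption of this sketch.

```python
import numpy as np

def tica_sketch(X, lag_time=1, n_components=2):
    """Illustrative tICA on one trajectory X of shape (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                    # remove the per-feature mean
    X0, Xt = X[:-lag_time], X[lag_time:]      # pairs (x_t, x_{t+lag_time})
    n_pairs = len(X0)

    # Covariance estimated from both ends of each pair, and a symmetrized
    # time-lagged correlation matrix (assumption of this sketch).
    cov = (X0.T @ X0 + Xt.T @ Xt) / (2.0 * n_pairs)
    lagged = (X0.T @ Xt + Xt.T @ X0) / (2.0 * n_pairs)

    # Solve lagged @ v = lambda * cov @ v by whitening with cov^(-1/2).
    s, U = np.linalg.eigh(cov)
    whiten = U / np.sqrt(s)                   # column j scaled by s[j]^(-1/2)
    evals, W = np.linalg.eigh(whiten.T @ lagged @ whiten)

    order = np.argsort(evals)[::-1]           # decreasing eigenvalue
    evals = evals[order]
    evecs = (whiten @ W)[:, order]            # columns are eigenvectors
    kept = evecs[:, :n_components]
    return evals, kept, X @ kept
```

The eigenvectors come back as columns, normalized so that each has unit variance under the estimated covariance. The third return value is the centered trajectory projected onto the slowest `n_components` directions.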

n_components
: int, None
Number of components to keep.

lag_time
: int
Delay time forward or backward in the input data. The time-lagged correlations are computed between data points X[t] and X[t+lag_time].

shrinkage
: float, default=None
The covariance shrinkage intensity (range 0-1). If shrinkage is not specified (the default) it is estimated using an analytic formula (the Rao-Blackwellized Ledoit-Wolf estimator) introduced in [5].

kinetic_mapping
: bool, default=False
If True, weigh the projections by the tICA eigenvalues, yielding kinetic distances as described in [6].
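
To make the `shrinkage` and `kinetic_mapping` parameters more concrete, here is a minimal sketch of the two operations they describe. Both helper names are hypothetical. The shrinkage target (a scaled identity with the average variance on the diagonal) is the usual Ledoit-Wolf-style form and is assumed here rather than taken from the msmbuilder source; the automatic estimation of the intensity is not shown.

```python
import numpy as np

def shrink_covariance(cov, shrinkage):
    """Blend cov with a scaled identity; shrinkage is an intensity in [0, 1]."""
    cov = np.asarray(cov, dtype=float)
    n_features = cov.shape[0]
    mu = np.trace(cov) / n_features           # average variance per feature
    return (1.0 - shrinkage) * cov + shrinkage * mu * np.eye(n_features)

def kinetic_map(projections, eigenvalues):
    """Scale each projected coordinate by its tICA eigenvalue."""
    projections = np.asarray(projections, dtype=float)
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    # Only the leading eigenvalues matching the projected columns are used.
    return projections * eigenvalues[:projections.shape[1]]
```

With `shrinkage=0` the covariance is returned unchanged, and with `shrinkage=1` it is replaced entirely by the scaled identity. After kinetic mapping, slowly decorrelating coordinates (eigenvalues near 1) keep most of their amplitude while fast ones are damped.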

components_
: array-like, shape (n_components, n_features)
Components with maximum autocorrelation.

offset_correlation_
: array-like, shape (n_features, n_features)
Symmetric time-lagged correlation matrix, \(C=E[(x_t)^T x_{t+lag}]\).

eigenvalues_
: array-like, shape (n_features,)
Eigenvalues of the tICA generalized eigenproblem, in decreasing order.

eigenvectors_
: array-like, shape (n_components, n_features)
Eigenvectors of the tICA generalized eigenproblem. The vectors give a set of “directions” through configuration space along which the system relaxes towards equilibrium. Each eigenvector is associated with a characteristic timescale \(-\frac{lag\_time}{\ln \lambda_i}\), where \(\lambda_i\) is the corresponding eigenvalue. See [2] for more information.

means_: The mean of the data along each feature.
n_observations_: Total number of data points fit by the model. Note that the model is “reset” by calling fit() with new sequences, whereas partial_fit() updates the fit with new data, and is suitable for online learning.
n_sequences_: Total number of sequences fit by the model. Note that the model is “reset” by calling fit() with new sequences, whereas partial_fit() updates the fit with new data, and is suitable for online learning.
timescales_: The implied timescales of the tICA model, given by -lag_time / log(eigenvalues_).
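
As a worked example of the timescale relation, the following sketch converts eigenvalues into implied timescales using only the standard library. The function name is hypothetical, and returning NaN for eigenvalues outside (0, 1) is a choice made for this sketch, not documented library behaviour.

```python
import math

def implied_timescales(eigenvalues, lag_time):
    """Return -lag_time / ln(lambda_i) for each eigenvalue lambda_i."""
    timescales = []
    for lam in eigenvalues:
        if 0.0 < lam < 1.0:
            timescales.append(-lag_time / math.log(lam))
        else:
            # The logarithm is undefined (lam <= 0) or the timescale is not
            # finite and positive (lam >= 1), so flag the value instead.
            timescales.append(float("nan"))
    return timescales
```

The result is expressed in the same units as `lag_time`, typically frames. For instance, an eigenvalue of 0.5 at a lag time of 10 frames gives 10 / ln 2, roughly 14.4 frames.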

This method was introduced originally in [4], and has been applied to the analysis of molecular dynamics data in [1], [2], and [3]. In [1] and [2], tICA was used as a dimensionality reduction technique before fitting other kinetic models.

[1]	Schwantes, Christian R., and Vijay S. Pande. J. Chem. Theory Comput. 9.4 (2013): 2000-2009.

[2]	Perez-Hernandez, Guillermo, et al. J. Chem. Phys. (2013): 015102.

[3]	Naritomi, Yusuke, and Sotaro Fuchigami. J. Chem. Phys. 134.6 (2011): 065101.

[4]	Molgedey, Lutz, and Heinz Georg Schuster. Phys. Rev. Lett. 72.23 (1994): 3634.

[5]	Chen, Yilun, Ami Wiesel, and Alfred O. Hero III. ICASSP (2009)

[6]	Noe, F. and Clementi, C. arXiv:1506.06259 [physics.comp-ph] (2015)

__init__(n_components=None, lag_time=1, shrinkage=None, kinetic_mapping=False)¶

Methods

`__init__`([n_components, lag_time, ...])
`fit`(sequences[, y])	Fit the model with a collection of sequences.
`fit_transform`(sequences[, y])	Fit the model with X and apply the dimensionality reduction on X.
`get_params`([deep])	Get parameters for this estimator.
`partial_fit`(X)	Fit the model with X.
`partial_transform`(features)	Apply the dimensionality reduction on X.
`score`(sequences[, y])	Score the model on new data using the generalized matrix Rayleigh quotient
`set_params`(**params)	Set the parameters of this estimator.
`summarize`()	Some summary information.
`transform`(sequences)	Apply the dimensionality reduction on X.

Attributes

`components_`
`covariance_`
`eigenvalues_`
`eigenvectors_`
`means_`
`offset_correlation_`
`score_`	Training score of the model, computed as the generalized matrix Rayleigh quotient.
`timescales_`

fit(sequences, y=None)¶

Fit the model with a collection of sequences.

This method is not online. Any state accumulated from previous calls to fit() or partial_fit() will be cleared. For online learning, use partial_fit.

Parameters:	sequences (list of array-like, each of shape (n_samples_i, n_features)) – Training data, where n_samples_i is the number of samples in sequence i and n_features is the number of features. y (None) – Ignored
Returns:	self – Returns the instance itself.
Return type:	object

fit_transform(sequences, y=None)¶

Fit the model with X and apply the dimensionality reduction on X.

This method is not online. Any state accumulated from previous calls to fit() or partial_fit() will be cleared. For online learning, use partial_fit.

Parameters:	sequences (list of array-like, each of shape (n_samples_i, n_features)) – Training data, where n_samples_i is the number of samples in sequence i and n_features is the number of features. y (None) – Ignored
Returns:	sequence_new
Return type:	list of array-like, each of shape (n_samples_i, n_components)

get_params(deep=True)¶

Get parameters for this estimator.

Parameters:	deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:	params – Parameter names mapped to their values.
Return type:	mapping of string to any

partial_fit(X)¶

Fit the model with X.

This method is suitable for online learning. The state of the model will be updated with the new data X.

Parameters:	X (array-like, shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.
Returns:	self – Returns the instance itself.
Return type:	object
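
The online behaviour of partial_fit() can be pictured as accumulating running sums, one sequence at a time, from which the mean and the time-lagged correlation matrix are derived on demand. The class below is a hypothetical sketch of that idea in NumPy; the class name, attribute names, and the symmetrized estimator are assumptions and do not describe msmbuilder's internals.

```python
import numpy as np

class RunningLaggedStats:
    """Accumulates the sums needed for a mean and a lagged correlation."""

    def __init__(self, n_features, lag_time=1):
        self.lag_time = lag_time
        self.n_pairs = 0
        self.sum_x = np.zeros(n_features)
        self.sum_outer_lagged = np.zeros((n_features, n_features))

    def partial_fit(self, X):
        """Update the running sums with one sequence; returns self."""
        X = np.asarray(X, dtype=float)
        X0, Xt = X[:-self.lag_time], X[self.lag_time:]
        self.n_pairs += len(X0)
        self.sum_x += X0.sum(axis=0) + Xt.sum(axis=0)
        self.sum_outer_lagged += X0.T @ Xt + Xt.T @ X0
        return self

    @property
    def mean(self):
        return self.sum_x / (2 * self.n_pairs)

    @property
    def lagged_correlation(self):
        # Symmetrized E[x_t x_{t+lag}^T] minus the outer product of the mean.
        mu = self.mean
        return self.sum_outer_lagged / (2 * self.n_pairs) - np.outer(mu, mu)
```

Because each call only forms pairs within the sequence it receives, no time-lagged pair ever spans the boundary between two sequences. A non-online fit would correspond to creating a fresh accumulator before processing the sequences.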

partial_transform(features)¶

Apply the dimensionality reduction on X.

Parameters:	features (array-like, shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.
Returns:	sequence_new – TICA-projected features
Return type:	array-like, shape (n_samples, n_components)

Notes

This function acts on a single featurized trajectory.

score(sequences, y=None)¶

Score the model on new data using the generalized matrix Rayleigh quotient

Parameters:	sequences (list of array, each of shape (n_samples_i, n_features)) – Test data. A list of sequences in a feature space, each of which is a 2D array of possibly different lengths, but the same number of features.
Returns:	gmrq – Generalized matrix Rayleigh quotient. This number indicates how well the top `n_timescales+1` eigenvectors of this tICA model perform as slowly decorrelating collective variables for the new data in `sequences`.
Return type:	float

References

[1]	McGibbon, R. T. and V. S. Pande, “Variational cross-validation of slow dynamical modes in molecular kinetics” J. Chem. Phys. 142, 124105 (2015)
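
A minimal sketch of the generalized matrix Rayleigh quotient, assuming the eigenvectors from the training fit are stored as columns of `V` and that a covariance matrix and a time-lagged correlation matrix have already been estimated from the test sequences. The function and argument names are hypothetical.

```python
import numpy as np

def gmrq(eigenvectors, test_covariance, test_lagged_correlation):
    """Compute trace((V^T S V)^{-1} V^T C V) for V of shape (n_features, k)."""
    V = np.asarray(eigenvectors, dtype=float)
    numerator = V.T @ np.asarray(test_lagged_correlation, dtype=float) @ V
    denominator = V.T @ np.asarray(test_covariance, dtype=float) @ V
    # solve(D, N) gives D^{-1} N without forming the inverse explicitly.
    return float(np.trace(np.linalg.solve(denominator, numerator)))
```

When the matrices come from the same data the eigenvectors were fit on, this quantity reduces to the sum of the corresponding eigenvalues. It is also unchanged if the columns of `V` are rescaled.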

score_¶: Training score of the model, computed as the generalized matrix Rayleigh quotient, i.e. the sum of the first n_components eigenvalues.

set_params(**params)¶

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:	self

summarize()¶: Some summary information.

transform(sequences)¶

Apply the dimensionality reduction on X.

Parameters:	sequences (list of array-like, each of shape (n_samples_i, n_features)) – Training data, where n_samples_i is the number of samples in sequence i and n_features is the number of features.
Returns:	sequence_new
Return type:	list of array-like, each of shape (n_samples_i, n_components)

Version 3.6.1
