msmbuilder.msm.
MarkovStateModel
(lag_time=1, n_timescales=None, reversible_type='mle', ergodic_cutoff='on', prior_counts=0, sliding_window=True, verbose=True)¶Reversible Markov State Model
This model fits a first-order Markov model to a dataset of integer-valued
timeseries. The key estimated attribute, transmat_
is a matrix
containing the estimated probability of transitioning between pairs
of states in the duration specified by lag_time
.
Unless otherwise specified, the model is constrained to be reversible (satisfy detailed balance), which is appropriate for equilibrium chemical systems.
Parameters: |
|
---|
References
[1] | Prinz, Jan-Hendrik, et al. “Markov models of molecular kinetics: Generation and validation.” J Chem. Phys. 134.17 (2011): 174105. |
[2] | Pande, V. S., K. A. Beauchamp, and G. R. Bowman. “Everything you wanted to know about Markov State Models but were afraid to ask” Methods 52.1 (2010): 99-105. |
n_states_
¶int – The number of states in the model
mapping_
¶dict – Mapping between “input” labels and internal state indices used by the
counts and transition matrix for this Markov state model. Input states
need not necessarily be integers in (0, ..., n_states_ - 1), for
example. The semantics of mapping_[i] = j
is that state i
from
the “input space” is represented by the index j
in this MSM.
countsmat_
¶array_like, shape = (n_states_, n_states_) – Number of transition counts between states. countsmat_[i, j] is counted during fit(). The indices i and j are the “internal” indices described above. No correction for reversibility is made to this matrix.
transmat_
¶array_like, shape = (n_states_, n_states_) – Maximum likelihood estimate of the reversible transition matrix. The indices i and j are the “internal” indices described above.
populations_
¶array, shape = (n_states_,) – The equilibrium population (stationary eigenvector) of transmat_
__init__
(lag_time=1, n_timescales=None, reversible_type='mle', ergodic_cutoff='on', prior_counts=0, sliding_window=True, verbose=True)¶Methods
__init__ ([lag_time, n_timescales, ...]) |
|
draw_samples (sequences, n_samples[, ...]) |
Sample conformations for a sequences of states. |
eigtransform (sequences[, right, mode]) |
Transform a list of sequences by projecting the sequences onto the first n_timescales dynamical eigenvectors. |
fit (sequences[, y]) |
Estimate model parameters. |
fit_transform (X[, y]) |
Fit to data, then transform it. |
get_params ([deep]) |
Get parameters for this estimator. |
inverse_transform (sequences) |
Transform a list of sequences from internal indexing into |
partial_transform (sequence[, mode]) |
Transform a sequence to internal indexing |
sample ([state, n_steps, random_state]) |
|
sample_discrete ([state, n_steps, random_state]) |
Generate a random sequence of states by propagating the model using discrete time steps given by the model lagtime. |
score (sequences[, y]) |
Score the model on new data using the generalized matrix Rayleigh quotient |
score_ll (sequences) |
log of the likelihood of sequences with respect to the model |
set_params (\*\*params) |
Set the parameters of this estimator. |
summarize () |
Return some diagnostic summary statistics about this Markov model |
transform (sequences[, mode]) |
Transform a list of sequences to internal indexing |
uncertainty_eigenvalues () |
Estimate of the element-wise asymptotic standard deviation in the model eigenvalues. |
uncertainty_timescales () |
Estimate of the element-wise asymptotic standard deviation in the model implied timescales. |
Attributes
eigenvalues_ |
Eigenvalues of the transition matrix. |
left_eigenvectors_ |
Left eigenvectors, \(\Phi\), of the transition matrix. |
right_eigenvectors_ |
Right eigenvectors, \(\Psi\), of the transition matrix. |
score_ |
Training score of the model, computed as the generalized matrix, |
state_labels_ |
|
timescales_ |
Implied relaxation timescales of the model. |
draw_samples
(sequences, n_samples, random_state=None)¶Sample conformations for a sequences of states.
Parameters: |
|
---|---|
Returns: | selected_pairs_by_state – shape=(n_states, n_samples, 2) selected_pairs_by_state[state] gives an array of randomly selected (trj, frame) pairs from the specified state. |
Return type: | np.array, dtype=int, |
See also
utils.map_drawn_samples()
index.()
eigenvalues_
¶Eigenvalues of the transition matrix.
eigtransform
(sequences, right=True, mode='clip')¶Transform a list of sequences by projecting the sequences onto the first n_timescales dynamical eigenvectors.
Parameters: |
|
---|---|
Returns: | transformed – Each element of transformed is an array of shape |
Return type: | list of 2d arrays |
References
[1] | Prinz, Jan-Hendrik, et al. “Markov models of molecular kinetics: Generation and validation.” J. Chem. Phys. 134.17 (2011): 174105. |
fit
(sequences, y=None)¶Estimate model parameters.
Parameters: | sequences (list of array-like) – List of sequences, or a single sequence. Each sequence should be a 1D iterable of state labels. Labels can be integers, strings, or other orderable objects. |
---|---|
Returns: | |
Return type: | self |
Notes
None and NaN are recognized immediately as invalid labels. Therefore, transition counts from or to a sequence item which is NaN or None will not be counted. The mapping_ attribute will not include the NaN or None.
fit_transform
(X, y=None, **fit_params)¶Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters: |
|
---|---|
Returns: | X_new – Transformed array. |
Return type: | numpy array of shape [n_samples, n_features_new] |
get_params
(deep=True)¶Get parameters for this estimator.
Parameters: | deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators. |
---|---|
Returns: | params – Parameter names mapped to their values. |
Return type: | mapping of string to any |
inverse_transform
(sequences)¶Transform a list of sequences from internal indexing into labels
Parameters: | sequences (list) – List of sequences, each of which is one-dimensional array of
integers in 0, ..., n_states_ - 1 . |
---|---|
Returns: | sequences – List of sequences, each of which is one-dimensional array of labels. |
Return type: | list |
left_eigenvectors_
¶Left eigenvectors, \(\Phi\), of the transition matrix.
The left eigenvectors are normalized such that:
lv[:, 0]
is the equilibrium populations and is normalized such that sum(lv[:, 0]) == 1`- The eigenvectors satisfy
sum(lv[:, i] * lv[:, i] / model.populations_) == 1
. In math notation, this is \(<\phi_i, \phi_i>_{\mu^{-1}} = 1\)
Returns: | lv – The columns of lv, lv[:, i] , are the left eigenvectors of
transmat_ . |
---|---|
Return type: | array-like, shape=(n_states, n_timescales+1) |
partial_transform
(sequence, mode='clip')¶Transform a sequence to internal indexing
Recall that sequence can be arbitrary labels, whereas transmat_
and countsmat_
are indexed with integers between 0 and
n_states - 1
. This methods maps a set of sequences from the labels
onto this internal indexing.
Parameters: |
|
---|---|
Returns: | mapped_sequence – If mode is “fill”, return an ndarray in internal indexing. If mode is “clip”, return a list of ndarrays each in internal indexing. |
Return type: | list or ndarray |
right_eigenvectors_
¶Right eigenvectors, \(\Psi\), of the transition matrix.
The right eigenvectors are normalized such that:
Weighted by the stationary distribution, the right eigenvectors are normalized to 1. That is,
sum(rv[:, i] * rv[:, i] * self.populations_) == 1
,or \(<\psi_i, \psi_i>_{\mu} = 1\)
Returns: | rv – The columns of lv, rv[:, i] , are the right eigenvectors of
transmat_ . |
---|---|
Return type: | array-like, shape=(n_states, n_timescales+1) |
sample_discrete
(state=None, n_steps=100, random_state=None)¶Generate a random sequence of states by propagating the model using discrete time steps given by the model lagtime.
Parameters: |
|
---|---|
Returns: | sequence – A randomly sampled label sequence |
Return type: | array of length n_steps |
score
(sequences, y=None)¶Score the model on new data using the generalized matrix Rayleigh quotient
Parameters: | sequences (list of array-like) – List of sequences, or a single sequence. Each sequence should be a 1D iterable of state labels. Labels can be integers, strings, or other orderable objects. |
---|---|
Returns: | gmrq – Generalized matrix Rayleigh quotient. This number indicates how
well the top n_timescales+1 eigenvectors of this MSM perform as
slowly decorrelating collective variables for the new data in
sequences . |
Return type: | float |
References
[1] | McGibbon, R. T. and V. S. Pande, “Variational cross-validation of slow dynamical modes in molecular kinetics” J. Chem. Phys. 142, 124105 (2015) |
score_
¶Training score of the model, computed as the generalized matrix, Rayleigh quotient, the sum of the first n_components eigenvalues
score_ll
(sequences)¶log of the likelihood of sequences with respect to the model
Parameters: | sequences (list of array-like) – List of sequences, or a single sequence. Each sequence should be a 1D iterable of state labels. Labels can be integers, strings, or other orderable objects. |
---|---|
Returns: | loglikelihood – The natural log of the likelihood, computed as \(\sum_{ij} C_{ij} \log(P_{ij})\) where C is a matrix of counts computed from the input sequences. |
Return type: | float |
set_params
(**params)¶Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it’s possible to update each
component of a nested object.
Returns: | |
---|---|
Return type: | self |
summarize
()¶Return some diagnostic summary statistics about this Markov model
timescales_
¶Implied relaxation timescales of the model.
The relaxation of any initial distribution towards equilibrium is given, according to this model, by a sum of terms – each corresponding to the relaxation along a specific direction (eigenvector) in state space – which decay exponentially in time. See equation 19. from [1].
Returns: | timescales – The longest implied relaxation timescales of the model, expressed
in units of time-step between indices in the source data supplied
to fit() . |
---|---|
Return type: | array-like, shape = (n_timescales,) |
References
[1] | Prinz, Jan-Hendrik, et al. “Markov models of molecular kinetics: |
Generation and validation.” J. Chem. Phys. 134.17 (2011): 174105.
transform
(sequences, mode='clip')¶Transform a list of sequences to internal indexing
Recall that sequences can be arbitrary labels, whereas transmat_
and countsmat_
are indexed with integers between 0 and
n_states - 1
. This methods maps a set of sequences from the labels
onto this internal indexing.
Parameters: |
|
---|---|
Returns: | mapped_sequences – List of sequences in internal indexing |
Return type: | list |
uncertainty_eigenvalues
()¶Estimate of the element-wise asymptotic standard deviation in the model eigenvalues.
Returns: | sigma_eigs – The estimated symptotic standard deviation in the eigenvalues. |
---|---|
Return type: | np.array, shape=(n_timescales+1,) |
References
[1] | Hinrichs, Nina Singhal, and Vijay S. Pande. “Calculation of the distribution of eigenvalues and eigenvectors in Markovian state models for molecular dynamics.” J. Chem. Phys. 126.24 (2007): 244101. |
uncertainty_timescales
()¶Estimate of the element-wise asymptotic standard deviation in the model implied timescales.
Returns: | sigma_timescales – The estimated symptotic standard deviation in the implied timescales. |
---|---|
Return type: | np.array, shape=(n_timescales,) |
References
[1] | Hinrichs, Nina Singhal, and Vijay S. Pande. “Calculation of the distribution of eigenvalues and eigenvectors in Markovian state models for molecular dynamics.” J. Chem. Phys. 126.24 (2007): 244101. |