msmbuilder.msm.MarkovStateModel(lag_time=1, n_timescales=None, reversible_type='mle', ergodic_cutoff='on', prior_counts=0, sliding_window=True, verbose=True)¶Reversible Markov State Model
This model fits a first-order Markov model to a dataset of integer-valued
timeseries. The key estimated attribute, transmat_, is a matrix
containing the estimated probability of transitioning between pairs
of states in the duration specified by lag_time.
Unless otherwise specified, the model is constrained to be reversible (satisfy detailed balance), which is appropriate for equilibrium chemical systems.
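To make the role of lag_time concrete, here is a minimal NumPy sketch of sliding-window transition counting followed by row normalization. The helper name count_transitions is hypothetical, and row normalization is the plain (non-reversible) maximum likelihood estimate; it is not the reversible estimator this class uses by default, which is not reproduced here.

```python
import numpy as np

def count_transitions(sequence, n_states, lag_time=1):
    """Count i -> j transitions separated by lag_time steps (sliding window)."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(sequence[:-lag_time], sequence[lag_time:]):
        counts[i, j] += 1
    return counts

# A short sequence that is already in internal indexing 0..n_states-1.
sequence = [0, 0, 1, 1, 0, 2, 2, 0]
counts = count_transitions(sequence, n_states=3, lag_time=1)

# Non-reversible estimate: divide each row by its total outgoing counts.
# Assumes every state has at least one outgoing transition.
transmat = counts / counts.sum(axis=1, keepdims=True)
print(transmat)
```

Each row of the result sums to one, so transmat[i, j] can be read as the estimated probability of being in state j one lag time after being in state i.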
| Parameters: |
|
|---|
References
| [1] | Prinz, Jan-Hendrik, et al. “Markov models of molecular kinetics: Generation and validation.” J. Chem. Phys. 134.17 (2011): 174105. |
| [2] | Pande, V. S., K. A. Beauchamp, and G. R. Bowman. “Everything you wanted to know about Markov State Models but were afraid to ask.” Methods 52.1 (2010): 99-105. |
n_states_¶int – The number of states in the model
mapping_¶dict – Mapping between “input” labels and internal state indices used by the
counts and transition matrix for this Markov state model. Input states
need not necessarily be integers in (0, ..., n_states_ - 1), for
example. The semantics of mapping_[i] = j is that state i from
the “input space” is represented by the index j in this MSM.
countsmat_¶array_like, shape = (n_states_, n_states_) – Number of transition counts between states. countsmat_[i, j] is counted during fit(). The indices i and j are the “internal” indices described above. No correction for reversibility is made to this matrix.
transmat_¶array_like, shape = (n_states_, n_states_) – Maximum likelihood estimate of the reversible transition matrix. The indices i and j are the “internal” indices described above.
populations_¶array, shape = (n_states_,) – The equilibrium population (stationary eigenvector) of transmat_
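As an illustration of what "stationary eigenvector" means for populations_, the sketch below computes the left eigenvector with eigenvalue 1 of a small, made-up transition matrix using NumPy. The matrix values are illustrative assumptions, and this is not necessarily how the library computes the attribute.

```python
import numpy as np

# Hypothetical 3-state row-stochastic transition matrix (illustrative values).
transmat = np.array([[0.9, 0.1, 0.0],
                     [0.1, 0.8, 0.1],
                     [0.0, 0.2, 0.8]])

# Left eigenvectors of transmat are right eigenvectors of its transpose.
eigenvalues, left_vecs = np.linalg.eig(transmat.T)

# The stationary distribution belongs to the largest eigenvalue (equal to 1).
idx = np.argmax(eigenvalues.real)
populations = left_vecs[:, idx].real
populations = populations / populations.sum()  # normalize to sum to 1

print(populations)
```

The resulting vector is unchanged by one step of the dynamics, that is, populations @ transmat equals populations.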
__init__(lag_time=1, n_timescales=None, reversible_type='mle', ergodic_cutoff='on', prior_counts=0, sliding_window=True, verbose=True)¶
Methods
__init__([lag_time, n_timescales, ...]) |
|
draw_samples(sequences, n_samples[, ...]) |
Sample conformations for a sequence of states. |
eigtransform(sequences[, right, mode]) |
Transform a list of sequences by projecting the sequences onto the first n_timescales dynamical eigenvectors. |
fit(sequences[, y]) |
Estimate model parameters. |
fit_transform(X[, y]) |
Fit to data, then transform it. |
get_params([deep]) |
Get parameters for this estimator. |
inverse_transform(sequences) |
Transform a list of sequences from internal indexing into labels |
partial_transform(sequence[, mode]) |
Transform a sequence to internal indexing |
sample([state, n_steps, random_state]) |
|
sample_discrete([state, n_steps, random_state]) |
Generate a random sequence of states by propagating the model using discrete time steps given by the model lag time. |
score(sequences[, y]) |
Score the model on new data using the generalized matrix Rayleigh quotient |
score_ll(sequences) |
log of the likelihood of sequences with respect to the model |
set_params(\*\*params) |
Set the parameters of this estimator. |
summarize() |
Return some diagnostic summary statistics about this Markov model |
transform(sequences[, mode]) |
Transform a list of sequences to internal indexing |
uncertainty_eigenvalues() |
Estimate of the element-wise asymptotic standard deviation in the model eigenvalues. |
uncertainty_timescales() |
Estimate of the element-wise asymptotic standard deviation in the model implied timescales. |
Attributes
eigenvalues_ |
Eigenvalues of the transition matrix. |
left_eigenvectors_ |
Left eigenvectors, \(\Phi\), of the transition matrix. |
right_eigenvectors_ |
Right eigenvectors, \(\Psi\), of the transition matrix. |
score_ |
Training score of the model, computed as the generalized matrix Rayleigh quotient. |
state_labels_ |
|
timescales_ |
Implied relaxation timescales of the model. |
draw_samples(sequences, n_samples, random_state=None)¶Sample conformations for a sequence of states.
| Parameters: |
|
|---|---|
| Returns: | selected_pairs_by_state – selected_pairs_by_state[state] gives an array of randomly selected (trj, frame) pairs from the specified state. |
| Return type: | np.array, dtype=int, shape=(n_states, n_samples, 2) |
See also
utils.map_drawn_samples()
eigenvalues_¶Eigenvalues of the transition matrix.
eigtransform(sequences, right=True, mode='clip')¶Transform a list of sequences by projecting the sequences onto the first n_timescales dynamical eigenvectors.
| Parameters: |
|
|---|---|
| Returns: | transformed – Each element of transformed is an array of shape |
| Return type: | list of 2d arrays |
References
| [1] | Prinz, Jan-Hendrik, et al. “Markov models of molecular kinetics: Generation and validation.” J. Chem. Phys. 134.17 (2011): 174105. |
fit(sequences, y=None)¶Estimate model parameters.
| Parameters: | sequences (list of array-like) – List of sequences, or a single sequence. Each sequence should be a 1D iterable of state labels. Labels can be integers, strings, or other orderable objects. |
|---|---|
| Returns: | |
| Return type: | self |
Notes
None and NaN are recognized immediately as invalid labels. Therefore, transition counts from or to a sequence item which is NaN or None will not be counted. The mapping_ attribute will not include the NaN or None.
fit_transform(X, y=None, **fit_params)¶Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
| Parameters: |
|
|---|---|
| Returns: | X_new – Transformed array. |
| Return type: | numpy array of shape [n_samples, n_features_new] |
get_params(deep=True)¶Get parameters for this estimator.
| Parameters: | deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators. |
|---|---|
| Returns: | params – Parameter names mapped to their values. |
| Return type: | mapping of string to any |
inverse_transform(sequences)¶Transform a list of sequences from internal indexing into labels
| Parameters: | sequences (list) – List of sequences, each of which is a one-dimensional array of
integers in 0, ..., n_states_ - 1. |
|---|---|
| Returns: | sequences – List of sequences, each of which is a one-dimensional array of labels. |
| Return type: | list |
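The relationship between input labels and internal indices can be sketched in plain Python. The assumption here, made only for illustration, is that internal indices follow the sorted order of the labels; the ordering the library actually assigns may differ.

```python
# Input sequence with arbitrary (here: string) state labels.
labels = ["folded", "unfolded", "folded", "intermediate", "unfolded"]

# Assumption: internal indices are assigned in sorted label order.
mapping = {label: index for index, label in enumerate(sorted(set(labels)))}

# Inverting the mapping takes internal indices back to the input labels.
inverse_mapping = {index: label for label, index in mapping.items()}

internal = [mapping[label] for label in labels]
recovered = [inverse_mapping[index] for index in internal]

print(internal)
print(recovered)
```

Mapping forward and then inverting recovers the original labels, which is the round trip that transform and inverse_transform are described as providing.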
left_eigenvectors_¶Left eigenvectors, \(\Phi\), of the transition matrix.
The left eigenvectors are normalized such that:
- lv[:, 0] is the equilibrium populations and is normalized such that sum(lv[:, 0]) == 1
- The eigenvectors satisfy sum(lv[:, i] * lv[:, i] / model.populations_) == 1. In math notation, this is \(\langle\phi_i, \phi_i\rangle_{\mu^{-1}} = 1\)
| Returns: | lv – The columns of lv, lv[:, i], are the left eigenvectors of
transmat_. |
|---|---|
| Return type: | array-like, shape=(n_states, n_timescales+1) |
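The normalization convention for the left eigenvectors can be checked numerically. The following sketch applies it to a small, made-up transition matrix with NumPy; the matrix values are assumptions for illustration, and the library's own eigensolver is not reproduced.

```python
import numpy as np

# Hypothetical reversible 3-state transition matrix (illustrative values).
transmat = np.array([[0.9, 0.1, 0.0],
                     [0.1, 0.8, 0.1],
                     [0.0, 0.2, 0.8]])

# Left eigenvectors are the right eigenvectors of the transpose.
eigenvalues, lv = np.linalg.eig(transmat.T)
order = np.argsort(-eigenvalues.real)  # sort by decreasing eigenvalue
eigenvalues = eigenvalues.real[order]
lv = lv.real[:, order]

# The first left eigenvector, scaled to sum to 1, is the equilibrium population.
populations = lv[:, 0] / lv[:, 0].sum()

# Scale each eigenvector so that <phi_i, phi_i> weighted by 1/populations is 1.
for i in range(lv.shape[1]):
    lv[:, i] /= np.sqrt(np.sum(lv[:, i] ** 2 / populations))
lv[:, 0] = populations  # fix the sign so the first column sums to +1

print(lv)
```

Both conditions are compatible for the first column, because the sum of populations squared divided by populations is just the sum of populations, which is 1.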
partial_transform(sequence, mode='clip')¶Transform a sequence to internal indexing
Recall that sequence can contain arbitrary labels, whereas transmat_
and countsmat_ are indexed with integers between 0 and
n_states - 1. This method maps a sequence from the labels
onto this internal indexing.
| Parameters: |
|
|---|---|
| Returns: | mapped_sequence – If mode is “fill”, return an ndarray in internal indexing. If mode is “clip”, return a list of ndarrays each in internal indexing. |
| Return type: | list or ndarray |
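The difference between the two return shapes can be sketched as follows. The helper map_sequence is hypothetical, and its behaviour rests on an assumption not stated in this section: that "fill" replaces labels without an internal index by NaN, while "clip" drops them and splits the sequence at each gap.

```python
import numpy as np

def map_sequence(sequence, mapping, mode="clip"):
    """Hypothetical stand-in illustrating the 'fill' and 'clip' return shapes."""
    mapped = np.array([mapping.get(label, np.nan) for label in sequence],
                      dtype=float)
    if mode == "fill":
        # One array; unmapped labels are marked as NaN (assumed behaviour).
        return mapped
    if mode == "clip":
        # Several arrays; the sequence is broken wherever a label is unmapped.
        segments, current = [], []
        for value in mapped:
            if np.isnan(value):
                if current:
                    segments.append(np.array(current, dtype=int))
                current = []
            else:
                current.append(int(value))
        if current:
            segments.append(np.array(current, dtype=int))
        return segments
    raise ValueError("mode must be 'clip' or 'fill'")

mapping = {"a": 0, "b": 1}
sequence = ["a", "b", "x", "b", "a", "a"]  # "x" has no internal index

filled = map_sequence(sequence, mapping, mode="fill")
clipped = map_sequence(sequence, mapping, mode="clip")
print(filled)
print(clipped)
```

This is why the return type is either a single ndarray or a list of ndarrays: one unmapped label in the middle of a sequence yields two segments under "clip".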
right_eigenvectors_¶Right eigenvectors, \(\Psi\), of the transition matrix.
The right eigenvectors are normalized such that:
Weighted by the stationary distribution, the right eigenvectors are normalized to 1. That is,
sum(rv[:, i] * rv[:, i] * self.populations_) == 1, or \(\langle\psi_i, \psi_i\rangle_{\mu} = 1\)
| Returns: | rv – The columns of rv, rv[:, i], are the right eigenvectors of
transmat_. |
|---|---|
| Return type: | array-like, shape=(n_states, n_timescales+1) |
sample_discrete(state=None, n_steps=100, random_state=None)¶Generate a random sequence of states by propagating the model using discrete time steps given by the model lag time.
| Parameters: |
|
|---|---|
| Returns: | sequence – A randomly sampled label sequence |
| Return type: | array of length n_steps |
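Propagating a model in discrete steps amounts to drawing each next state from the row of the transition matrix belonging to the current state. A minimal sketch, assuming a made-up transition matrix and a hypothetical helper sample_chain that works on internal indices rather than labels:

```python
import numpy as np

def sample_chain(transmat, start_state, n_steps, seed=None):
    """Draw a state sequence of length n_steps; one step equals one lag time."""
    rng = np.random.default_rng(seed)
    n_states = transmat.shape[0]
    states = [start_state]
    for _ in range(n_steps - 1):
        # The next state is drawn from the row of the current state.
        states.append(rng.choice(n_states, p=transmat[states[-1]]))
    return np.array(states)

# Hypothetical 3-state transition matrix (illustrative values).
transmat = np.array([[0.9, 0.1, 0.0],
                     [0.1, 0.8, 0.1],
                     [0.0, 0.2, 0.8]])

chain = sample_chain(transmat, start_state=0, n_steps=100, seed=0)
print(chain[:20])
```

Because transmat[0, 2] is zero in this example, the sampled chain never jumps directly from state 0 to state 2.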
score(sequences, y=None)¶Score the model on new data using the generalized matrix Rayleigh quotient
| Parameters: | sequences (list of array-like) – List of sequences, or a single sequence. Each sequence should be a 1D iterable of state labels. Labels can be integers, strings, or other orderable objects. |
|---|---|
| Returns: | gmrq – Generalized matrix Rayleigh quotient. This number indicates how
well the top n_timescales+1 eigenvectors of this MSM perform as
slowly decorrelating collective variables for the new data in
sequences. |
| Return type: | float |
References
| [1] | McGibbon, R. T. and V. S. Pande, “Variational cross-validation of slow dynamical modes in molecular kinetics” J. Chem. Phys. 142, 124105 (2015) |
score_¶Training score of the model, computed as the generalized matrix Rayleigh quotient, the sum of the first n_components eigenvalues
score_ll(sequences)¶log of the likelihood of sequences with respect to the model
| Parameters: | sequences (list of array-like) – List of sequences, or a single sequence. Each sequence should be a 1D iterable of state labels. Labels can be integers, strings, or other orderable objects. |
|---|---|
| Returns: | loglikelihood – The natural log of the likelihood, computed as \(\sum_{ij} C_{ij} \log(P_{ij})\) where C is a matrix of counts computed from the input sequences. |
| Return type: | float |
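The log-likelihood formula above can be evaluated directly with NumPy. In this sketch the counts matrix is made up, and entries with zero counts are skipped so that 0 * log(0) is treated as 0, which is an assumed convention rather than something stated in this section.

```python
import numpy as np

def log_likelihood(counts, transmat):
    """Evaluate sum_ij C_ij * log(P_ij), skipping entries with zero counts."""
    mask = counts > 0
    return float(np.sum(counts[mask] * np.log(transmat[mask])))

# Hypothetical transition counts between three states.
counts = np.array([[1.0, 1.0, 1.0],
                   [1.0, 1.0, 0.0],
                   [1.0, 0.0, 1.0]])

# Row-normalized (non-reversible) transition matrix built from the same counts.
transmat = counts / counts.sum(axis=1, keepdims=True)

ll = log_likelihood(counts, transmat)
print(ll)
```

A less negative value means the observed transitions are more probable under the model.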
set_params(**params)¶Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as pipelines). The latter have parameters of the form
<component>__<parameter> so that it’s possible to update each
component of a nested object.
| Returns: | |
|---|---|
| Return type: | self |
summarize()¶Return some diagnostic summary statistics about this Markov model
timescales_¶Implied relaxation timescales of the model.
The relaxation of any initial distribution towards equilibrium is given, according to this model, by a sum of terms, each corresponding to the relaxation along a specific direction (eigenvector) in state space, which decay exponentially in time. See equation 19 of [1].
| Returns: | timescales – The longest implied relaxation timescales of the model, expressed
in units of time-step between indices in the source data supplied
to fit(). |
|---|---|
| Return type: | array-like, shape = (n_timescales,) |
References
| [1] | Prinz, Jan-Hendrik, et al. “Markov models of molecular kinetics: Generation and validation.” J. Chem. Phys. 134.17 (2011): 174105. |
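The exponential decay described for timescales_ corresponds to the standard relation t_i = -lag_time / ln(lambda_i) between an eigenvalue lambda_i of the transition matrix and its implied timescale. A minimal NumPy sketch with a made-up transition matrix follows; it is an illustration of that relation, not the library's code.

```python
import numpy as np

lag_time = 1  # in units of the time step between frames of the input data

# Hypothetical reversible 3-state transition matrix (illustrative values).
transmat = np.array([[0.9, 0.1, 0.0],
                     [0.1, 0.8, 0.1],
                     [0.0, 0.2, 0.8]])

# Eigenvalues sorted in decreasing order; the first one equals 1 (equilibrium).
eigenvalues = np.sort(np.linalg.eigvals(transmat).real)[::-1]

# Each remaining eigenvalue decays as lambda_i = exp(-lag_time / t_i).
timescales = -lag_time / np.log(eigenvalues[1:])
print(timescales)
```

The eigenvalue 1 is excluded because it corresponds to the stationary distribution, which does not decay.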
transform(sequences, mode='clip')¶Transform a list of sequences to internal indexing
Recall that sequences can contain arbitrary labels, whereas transmat_
and countsmat_ are indexed with integers between 0 and
n_states - 1. This method maps a set of sequences from the labels
onto this internal indexing.
| Parameters: |
|
|---|---|
| Returns: | mapped_sequences – List of sequences in internal indexing |
| Return type: | list |
uncertainty_eigenvalues()¶Estimate of the element-wise asymptotic standard deviation in the model eigenvalues.
| Returns: | sigma_eigs – The estimated asymptotic standard deviation in the eigenvalues. |
|---|---|
| Return type: | np.array, shape=(n_timescales+1,) |
References
| [1] | Hinrichs, Nina Singhal, and Vijay S. Pande. “Calculation of the distribution of eigenvalues and eigenvectors in Markovian state models for molecular dynamics.” J. Chem. Phys. 126.24 (2007): 244101. |
uncertainty_timescales()¶Estimate of the element-wise asymptotic standard deviation in the model implied timescales.
| Returns: | sigma_timescales – The estimated asymptotic standard deviation in the implied timescales. |
|---|---|
| Return type: | np.array, shape=(n_timescales,) |
References
| [1] | Hinrichs, Nina Singhal, and Vijay S. Pande. “Calculation of the distribution of eigenvalues and eigenvectors in Markovian state models for molecular dynamics.” J. Chem. Phys. 126.24 (2007): 244101. |