msmbuilder.msm.
ContinuousTimeMSM
(lag_time=1, n_timescales=None, ergodic_cutoff=1, sliding_window=True, verbose=False, guess='log')¶Reversible first order master equation model
This model fits a continuous-time Markov model (master equation) from
discrete-time integer labeled timeseries. The key estimated attribute,
ratemat_
, is a matrix containing the estimated first order rate
constants between the states. See [1] for details.
Parameters: |
|
---|
n_states_
¶int – The number of states
ratemat_
¶np.ndarray, shape=(n_states_, n_state_) – The estimated state-to-state transition rates.
transmat_
¶np.ndarray, shape=(n_states_, n_state_) – The estimated state-to-state transition probabilities over an interval of 1 time unit.
timescales_
¶array of shape=(n_timescales,) – Estimated relaxation timescales of the model.
populations_
¶np.ndarray, shape=(n_states_,) – Estimated stationary probability distribution over the states.
countsmat_
¶array_like, shape = (n_states_, n_states_) – Number of transition counts between states, at a time delay of lag_time
countsmat_[i, j] is counted during fit().
optimizer_state_
¶object – Contains information about the optimization termination.
mapping_
¶dict – Mapping between “input” labels and internal state indices used by the
counts and transition matrix for this Markov state model. Input states
need not necessarily be integers in (0, ..., n_states_ - 1), for
example. The semantics of mapping_[i] = j
is that state i
from
the “input space” is represented by the index j
in this MSM.
theta_
¶array of shape n*(n+1)/2 or shorter – Optimized set of parameters for the model.
information_
¶np.ndarray, shape=(len(theta_), len(theta_)) – Approximate inverse of the hessian of the model log-likelihood
evaluated at theta_
.
eigenvalues_
¶array of shape=(n_timescales+1) – Largest eigenvalues of the rate matrix.
left_eigenvectors_
¶array of shape=(n_timescales+1) – Dominant left eigenvectors of the rate matrix.
right_eigenvectors_
¶array of shape=(n_timescales+1) – Dominant right eigenvectors of the rate matrix,
References
[1] | R. T. McGibbon and V. S. Pande, Efficient maximum likelihood parameterization of continuous-time Markov processes.” J. Chem. Phys. 143, 034109 (2015) http://dx.doi.org/10.1063/1.4926516 |
See also
MarkovStateModel
__init__
(lag_time=1, n_timescales=None, ergodic_cutoff=1, sliding_window=True, verbose=False, guess='log')¶Methods
__init__ ([lag_time, n_timescales, ...]) |
|
draw_samples (sequences, n_samples[, ...]) |
Sample conformations for a sequences of states. |
fit (sequences[, y]) |
|
fit_transform (X[, y]) |
Fit to data, then transform it. |
get_params ([deep]) |
Get parameters for this estimator. |
inverse_transform (sequences) |
Transform a list of sequences from internal indexing into |
partial_transform (sequence[, mode]) |
Transform a sequence to internal indexing |
sample_discrete ([state, n_steps, random_state]) |
Generate a random sequence of states by propagating the model using discrete time steps given by the model lagtime. |
score (sequences[, y]) |
Score the model on new data using the generalized matrix Rayleigh |
set_params (\*\*params) |
Set the parameters of this estimator. |
summarize () |
|
transform (sequences[, mode]) |
Transform a list of sequences to internal indexing |
uncertainty_K () |
Estimate of the element-wise asymptotic standard deviation |
uncertainty_eigenvalues () |
Estimate of the element-wise asymptotic standard deviation |
uncertainty_pi () |
Estimate of the element-wise asymptotic standard deviation in the stationary distribution. |
uncertainty_timescales () |
Estimate of the element-wise asymptotic standard deviation in the model relaxation timescales. |
Attributes
score_ |
Training score of the model, computed as the generalized matrix, |
draw_samples
(sequences, n_samples, random_state=None)¶Sample conformations for a sequences of states.
Parameters: |
|
---|---|
Returns: | selected_pairs_by_state – shape=(n_states, n_samples, 2) selected_pairs_by_state[state] gives an array of randomly selected (trj, frame) pairs from the specified state. |
Return type: | np.array, dtype=int, |
See also
utils.map_drawn_samples()
index.()
fit_transform
(X, y=None, **fit_params)¶Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters: |
|
---|---|
Returns: | X_new – Transformed array. |
Return type: | numpy array of shape [n_samples, n_features_new] |
get_params
(deep=True)¶Get parameters for this estimator.
Parameters: | deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators. |
---|---|
Returns: | params – Parameter names mapped to their values. |
Return type: | mapping of string to any |
inverse_transform
(sequences)¶Transform a list of sequences from internal indexing into labels
Parameters: | sequences (list) – List of sequences, each of which is one-dimensional array of
integers in 0, ..., n_states_ - 1 . |
---|---|
Returns: | sequences – List of sequences, each of which is one-dimensional array of labels. |
Return type: | list |
partial_transform
(sequence, mode='clip')¶Transform a sequence to internal indexing
Recall that sequence can be arbitrary labels, whereas transmat_
and countsmat_
are indexed with integers between 0 and
n_states - 1
. This methods maps a set of sequences from the labels
onto this internal indexing.
Parameters: |
|
---|---|
Returns: | mapped_sequence – If mode is “fill”, return an ndarray in internal indexing. If mode is “clip”, return a list of ndarrays each in internal indexing. |
Return type: | list or ndarray |
sample_discrete
(state=None, n_steps=100, random_state=None)¶Generate a random sequence of states by propagating the model using discrete time steps given by the model lagtime.
Parameters: |
|
---|---|
Returns: | sequence – A randomly sampled label sequence |
Return type: | array of length n_steps |
score
(sequences, y=None)¶Score the model on new data using the generalized matrix Rayleigh quotient
Parameters: | sequences (list of array-like) – List of sequences, or a single sequence. Each sequence should be a 1D iterable of state labels. Labels can be integers, strings, or other orderable objects. |
---|---|
Returns: | gmrq – Generalized matrix Rayleigh quotient. This number indicates how
well the top n_timescales+1 eigenvectors of this model perform
as slowly decorrelating collective variables for the new data in
sequences . |
Return type: | float |
References
[1] | McGibbon, R. T. and V. S. Pande, “Variational cross-validation of slow dynamical modes in molecular kinetics” J. Chem. Phys. 142, 124105 (2015) |
score_
¶Training score of the model, computed as the generalized matrix, Rayleigh quotient, the sum of the first n_components eigenvalues
set_params
(**params)¶Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it’s possible to update each
component of a nested object.
Returns: | |
---|---|
Return type: | self |
transform
(sequences, mode='clip')¶Transform a list of sequences to internal indexing
Recall that sequences can be arbitrary labels, whereas transmat_
and countsmat_
are indexed with integers between 0 and
n_states - 1
. This methods maps a set of sequences from the labels
onto this internal indexing.
Parameters: |
|
---|---|
Returns: | mapped_sequences – List of sequences in internal indexing |
Return type: | list |
uncertainty_K
()¶Estimate of the element-wise asymptotic standard deviation in the rate matrix
uncertainty_eigenvalues
()¶Estimate of the element-wise asymptotic standard deviation in the model eigenvalues
uncertainty_pi
()¶Estimate of the element-wise asymptotic standard deviation in the stationary distribution.
uncertainty_timescales
()¶Estimate of the element-wise asymptotic standard deviation in the model relaxation timescales.