cobsurv.models.CobraSurvival#
- class cobsurv.models.CobraSurvival(epsilon=0.12, alpha=4, machines='default', distance_function='euler')#
This is an implementation of COBRA for survival analysis. It is a meta-learner that combines the predictions of several survival models (machines) to produce a final prediction. The combined survival estimate is:
\[\hat{S}_{COBRA}(t | x; \cdot ) = \prod_{t' \in \mathcal{Y}_{\Gamma}(x; \cdot)} \left( 1 - \frac{\mathcal{D}_{\Gamma}(t' | x; \cdot )}{\mathcal{R}_{\Gamma}(t' | x; \cdot)} \right)^{\mathbb{I}(t' \leq t)}\]
See the User Guide and the research paper [1] for further description.
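As a rough illustration of the product-limit form above, the sketch below computes the estimate for one query point from the training observations retained for it. The function name `product_limit_curve` and the boolean `selected` mask (standing in for the consensus set \(\Gamma\)) are hypothetical and not part of cobsurv. It also assumes that \(\mathcal{D}_{\Gamma}\) counts the events at \(t'\) and \(\mathcal{R}_{\Gamma}\) counts the observations still at risk at \(t'\).

```python
import numpy as np

def product_limit_curve(times, events, selected):
    """Product-limit survival estimate over the selected observations.

    times    : (n,) observed times
    events   : (n,) bool, True if the event occurred
    selected : (n,) bool mask of observations retained for the query point
    Returns (unique_times, survival) with survival[j] = S(unique_times[j]).
    """
    t = np.asarray(times, dtype=float)[selected]
    e = np.asarray(events, dtype=bool)[selected]
    unique_times = np.unique(t)  # sorted distinct times t'
    # D(t'): events at t'; R(t'): observations with time >= t' (assumed meaning)
    deaths = np.array([np.sum((t == u) & e) for u in unique_times])
    at_risk = np.array([np.sum(t >= u) for u in unique_times])
    # The cumulative product plays the role of the indicator I(t' <= t)
    survival = np.cumprod(1.0 - deaths / at_risk)
    return unique_times, survival

times = np.array([2.0, 3.0, 3.0, 5.0, 8.0])
events = np.array([True, True, False, True, False])
selected = np.ones(5, dtype=bool)  # keep every observation
grid, surv = product_limit_curve(times, events, selected)
print(grid)  # [2. 3. 5. 8.]
print(surv)  # approximately [0.8 0.6 0.3 0.3]
```

With every observation selected, this reduces to the ordinary Kaplan-Meier curve of the data.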
- Parameters:
epsilon (float, default is 0.12) – The proximity threshold \(\epsilon\). Two survival predictions \(s_1\) and \(s_2\) are considered close when \(d(s_1,s_2) < \epsilon\).
alpha (integer, default is 4) – The consensus threshold \(\alpha\): the number of machines that are in consensus within \(\epsilon\) proximity.
machines (list of scikit-learn type of machines, default is "default") – The base survival models. The defaults are Random Survival Forest, Cox Ridge, Cox Lasso, Survival Tree, and KNN Survival. Users can provide their own machines, provided each has a predict_survival_function method that returns a numpy array of shape (n_samples, n_times): the survival predictions at the unique times of the training data.
distance_function (string or callable, default is "euler") – The distance function used to compare two survival predictions. The default is the area type of norm, as used in the paper. Users can provide their own distance function, which takes two numpy arrays of shape (n_times,) and returns a float. Accepted values are “trapz”, “euclidean”, “cross_entropy”, “cross_entropy_from_mean”, “shannon_jensen”, “kl”, “max”, “euler”, or a callable.
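The parameters above can be pictured with a small sketch: a custom distance callable with the documented signature (two arrays of shape (n_times,) in, one float out), and a consensus check built from \(\epsilon\) and \(\alpha\). Both `max_abs_distance` and `consensus_mask` are hypothetical names. The voting rule (keep a training observation when at least \(\alpha\) machines place it within \(\epsilon\) of the query) is an assumption drawn from the parameter descriptions, not cobsurv's actual code.

```python
import numpy as np

def max_abs_distance(s1, s2):
    """Hypothetical custom distance: largest pointwise gap between two curves."""
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    return float(np.max(np.abs(s1 - s2)))

def consensus_mask(pred_query, pred_train, distance, epsilon=0.12, alpha=4):
    """Flag training observations that enough machines consider close.

    pred_query : (n_estimators, n_times) curves predicted for one query point
    pred_train : (n_samples, n_estimators, n_times) curves for training points
    Returns a (n_samples,) bool mask.
    """
    n_samples, n_estimators, _ = pred_train.shape
    votes = np.zeros(n_samples, dtype=int)
    for i in range(n_samples):
        for m in range(n_estimators):
            # Machine m votes for observation i when d(s_1, s_2) < epsilon
            if distance(pred_query[m], pred_train[i, m]) < epsilon:
                votes[i] += 1
    return votes >= alpha

# Two machines, three time points, two training observations
pred_query = np.array([[0.9, 0.7, 0.5], [0.8, 0.6, 0.4]])
pred_train = np.array([
    [[0.9, 0.75, 0.5], [0.85, 0.6, 0.4]],  # close under both machines
    [[0.9, 0.7, 0.5], [0.5, 0.3, 0.1]],    # close under the first machine only
])
mask = consensus_mask(pred_query, pred_train, max_abs_distance,
                      epsilon=0.12, alpha=2)
print(mask)  # [ True False]
```

A callable shaped like `max_abs_distance` is what the distance_function parameter describes; the mask only illustrates how a larger \(\epsilon\) or a smaller \(\alpha\) admits more observations.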
- unique_sorted_times_l#
Sorted unique time points at which an event or censoring occurs in dataset \(D_l\).
- Type:
1d array
- unique_sorted_times_k#
Sorted unique time points at which an event or censoring occurs in dataset \(D_k\).
- Type:
1d array
- pred_l#
The survival prediction of machines on dataset \(D_l\) of shape (n_samples,n_estimators,n_times).
- Type:
3d array
- epsilon#
The proximity threshold \(\epsilon\). Two survival predictions \(s_1\) and \(s_2\) are considered close when \(d(s_1,s_2) < \epsilon\).
- Type:
float
- alpha#
The consensus threshold \(\alpha\): the number of machines that are in consensus within \(\epsilon\) proximity.
- Type:
integer
- n_estimators#
The number of machines.
- Type:
int, default is 5
- indexer#
The IndexCalculator object which is used to calculate the index.
- Type:
An IndexCalculator object
- train_time#
The time points of dataset \(D_l\).
- Type:
1d array
- train_event#
Boolean event indicator for dataset \(D_l\): True indicates that an event occurred (the observation is not censored).
- Type:
1d array
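To make the shapes above concrete, the sketch below stacks per-machine predictions of shape (n_samples, n_times) into a 3d array of shape (n_samples, n_estimators, n_times), as described for pred_l. `ExponentialMachine` is a hypothetical stand-in that only satisfies the documented predict_survival_function contract. The field names "event" and "time", and the assumption that machines are fitted through a scikit-learn style fit(X, y), are illustrative and not taken from cobsurv.

```python
import numpy as np

class ExponentialMachine:
    """Hypothetical stand-in machine: S(t) = exp(-rate * t), ignoring covariates."""

    def __init__(self, rate):
        self.rate = rate

    def fit(self, X, y):
        # The second field of the structured array holds the observed times
        time_field = y.dtype.names[1]
        self.times_ = np.unique(y[time_field])  # sorted unique training times
        return self

    def predict_survival_function(self, X):
        curve = np.exp(-self.rate * self.times_)
        return np.tile(curve, (len(X), 1))  # shape (n_samples, n_times)

X = np.zeros((4, 2))
y = np.array(
    [(True, 1.0), (False, 2.0), (True, 2.0), (True, 4.0)],
    dtype=[("event", bool), ("time", float)],
)
machines = [ExponentialMachine(r).fit(X, y) for r in (0.1, 0.2, 0.3)]
# Stack per-machine predictions along a new estimator axis
pred = np.stack([m.predict_survival_function(X) for m in machines], axis=1)
print(pred.shape)  # (4, 3, 3): n_samples, n_estimators, n_times
```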
See also
cobsurv.distance_functions – The distance functions used in COBRA.
Notes
The COBRA algorithm is described in the paper [1]. As \(\epsilon\) increases, more observations meet the \(\epsilon\)-proximity condition, so the predicted survival curve becomes more like the population survival curve.
References
- __init__(epsilon=0.12, alpha=4, machines='default', distance_function='euler')#
Methods
__init__([epsilon, alpha, machines, ...])
fit(X, y[, n_quantiles, l_by_n, experimental]) – Split the dataset into \(D_k\) and \(D_l\) so that censoring and event times are distributed equally across both parts, then train the initial models on \(D_k\).
get_covariate_relevance([X]) – Get the covariate relevance with respect to the given covariates.
get_params([deep]) – Get parameters for this estimator.
plot(X, index) – Plot the survival function for the given covariates.
predict(X) – Predict the survival function for a given set of covariates.
score(X, y[, type]) – Return the integrated Brier score by default, or the concordance index if type is set to “concordance”.
set_params(**params) – Set the parameters of this estimator.
- fit(X, y, n_quantiles=10, l_by_n=0.5, experimental=False)#
Splits the dataset into two parts, \(D_k = (X_k, y_k)\) and \(D_l = (X_l, y_l)\). The split is made so that censoring and event times are distributed equally across both parts. The initial models are then trained on \(D_k\).
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input samples.
y (structured numpy array of shape (n_samples,)) – The target variable with the first field as the event indicator and the second field as the event time
n_quantiles (int, default=10) – The number of quantiles used for splitting the data.
l_by_n (float, default=0.5) – The split ratio.
experimental (bool, default=False) – If True, the models are trained on the whole dataset and no splitting is performed.
- Returns:
self
- Return type:
object
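One way to picture the splitting step described above is stratified sampling. The sketch below bins observations by event status and by time quantile, then sends a fraction l_by_n of each bin to \(D_l\) and the rest to \(D_k\). The function name `stratified_split`, the seed argument, the field names, and the reading of l_by_n as the fraction assigned to \(D_l\) are all assumptions; the actual splitting logic in cobsurv may differ.

```python
import numpy as np

def stratified_split(y, n_quantiles=10, l_by_n=0.5, seed=0):
    """Return (idx_k, idx_l): row indices for D_k and D_l.

    y is a structured array whose first field is the event indicator
    and whose second field is the observed time.
    """
    event_field, time_field = y.dtype.names[:2]
    events = y[event_field].astype(bool)
    times = y[time_field].astype(float)
    # Interior quantile edges cut the time axis into n_quantiles bins
    edges = np.quantile(times, np.linspace(0, 1, n_quantiles + 1)[1:-1])
    time_bin = np.digitize(times, edges)  # values in 0 .. n_quantiles - 1
    rng = np.random.default_rng(seed)
    idx_k, idx_l = [], []
    for status in (False, True):
        for b in range(n_quantiles):
            members = np.flatnonzero((events == status) & (time_bin == b))
            rng.shuffle(members)
            n_l = int(round(l_by_n * len(members)))
            idx_l.extend(members[:n_l])
            idx_k.extend(members[n_l:])
    return (np.sort(np.array(idx_k, dtype=int)),
            np.sort(np.array(idx_l, dtype=int)))

# Synthetic data: about 70% events, exponential times
rng = np.random.default_rng(42)
n = 200
y = np.empty(n, dtype=[("event", bool), ("time", float)])
y["event"] = rng.random(n) < 0.7
y["time"] = rng.exponential(10.0, size=n)
idx_k, idx_l = stratified_split(y, n_quantiles=10, l_by_n=0.5)
print(len(idx_k), len(idx_l))
```

Because each (status, quantile) bin is split separately, both parts receive a similar mix of early and late, censored and uncensored observations.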
- get_covariate_relevance(X=None)#
Get the covariate relevance. This function computes the relevance with respect to the given covariates. If no covariates are given, it computes the relevance with respect to the average over the set \(D_l\).
- Parameters:
X (array-like of shape (n_samples, n_features), default=None) – The input samples.
- get_params(deep=True)#
Get parameters for this estimator.
- Parameters:
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params – Parameter names mapped to their values.
- Return type:
dict
- plot(X, index)#
Plot the survival function for the given covariates.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input samples.
index (integer) – The index of the covariate to be plotted
- predict(X)#
Predict the survival function for a given set of covariates
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input samples.
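The return format of predict is not documented here. Assuming it matches what is required of the machines (one row per sample, one column per sorted unique training time), a predicted curve can be read at an arbitrary time as a right-continuous step function. The helper `survival_at` below is a hypothetical sketch and not part of cobsurv.

```python
import numpy as np

def survival_at(curve, grid, t):
    """Evaluate a right-continuous step survival curve at time(s) t.

    curve : (n_times,) survival probabilities at the sorted times in grid
    grid  : (n_times,) sorted unique time points
    Before the first grid time the survival probability is taken to be 1.
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    idx = np.searchsorted(grid, t, side="right") - 1  # last grid time <= t
    padded = np.concatenate(([1.0], curve))           # slot 0 means "before the grid"
    return padded[idx + 1]

grid = np.array([2.0, 3.0, 5.0, 8.0])
curve = np.array([0.8, 0.6, 0.3, 0.3])
values = survival_at(curve, grid, [1.0, 2.0, 4.9, 10.0])
print(values)  # [1.  0.8 0.6 0.3]
```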
- score(X, y, type='ibs')#
Returns the integrated Brier score by default. If type is set to “concordance”, returns the concordance index instead.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input samples.
y (structured numpy array of shape (n_samples,)) – The target variable with the first field as the event indicator and the second field as the event time
type (string, default is "ibs") – The type of score to be calculated, it can be “ibs” or “concordance”
- Returns:
score – The score of the model
- Return type:
float
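For readers unfamiliar with the “concordance” option, the sketch below shows what a concordance index measures, using Harrell's definition for right-censored data. The function `harrell_c` is a hypothetical illustration. How cobsurv derives a risk score from a predicted survival curve, and which concordance variant it uses, are not stated here.

```python
import numpy as np

def harrell_c(times, events, risk):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when i has an observed event and
    times[i] < times[j]. It is concordant when i has the higher risk
    score; tied risk scores count as 0.5.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

times = np.array([2.0, 4.0, 6.0, 8.0])
events = np.array([True, False, True, True])
risk = np.array([0.9, 0.2, 0.1, 0.6])
c_index = harrell_c(times, events, risk)
print(c_index)  # 0.75
```

A value of 1.0 means the risk ordering matches the observed ordering on every comparable pair, and 0.5 corresponds to random ordering.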
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
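The <component>__<parameter> convention is easiest to see on a scikit-learn Pipeline. The example below uses scikit-learn's own estimators rather than CobraSurvival, since this page does not say which nested parameters CobraSurvival exposes; the step names "scale" and "clf" are arbitrary.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# <component>__<parameter>: update parameters of individual steps
pipe.set_params(clf__C=0.5, scale__with_mean=False)

# get_params(deep=True) lists nested parameters under the same naming scheme
params = pipe.get_params()
print(params["clf__C"], params["scale__with_mean"])  # 0.5 False
```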