cobsurv.models.CobraSurvival#

class cobsurv.models.CobraSurvival(epsilon=0.12, alpha=4, machines='default', distance_function='euler')#

An implementation of COBRA for survival analysis: a meta-learner that combines the predictions of several survival models (machines) to produce a final prediction. The algorithm is described in the paper [1]; the combined survival estimate is:

\[\hat{S}_{COBRA}(t | x; \cdot ) = \prod_{t' \in \mathcal{Y}_{\Gamma}(x; \cdot)} \left( 1 - \frac{\mathcal{D}_{\Gamma}(t' | x; \cdot )}{\mathcal{R}_{\Gamma}(t' | x; \cdot)} \right)^{\mathbb{I}(t' \leq t)}\]

See the User Guide and the research paper [1] for further details.
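The product in the formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not the library's code: it assumes that \(\mathcal{D}_{\Gamma}(t')\) counts the events at time \(t'\) among the retained samples, that \(\mathcal{R}_{\Gamma}(t')\) counts the retained samples still at risk at \(t'\), and the function name product_limit is hypothetical.

```python
import numpy as np

def product_limit(times, events, eval_times):
    """Product-limit survival estimate on a set of retained samples.

    times: observed times; events: True where the event occurred
    (False means censored); eval_times: times t at which to evaluate S(t).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    eval_times = np.asarray(eval_times, dtype=float)

    # Distinct times at which at least one event occurred (assumed set Y).
    event_times = np.unique(times[events])
    # Assumed D(t'): events at t'. Assumed R(t'): samples still at risk at t'.
    d = np.array([np.sum((times == t) & events) for t in event_times])
    r = np.array([np.sum(times >= t) for t in event_times])
    factors = 1.0 - d / r

    # The indicator exponent keeps only the factors with t' <= t.
    return np.array([np.prod(factors[event_times <= t]) for t in eval_times])

times = [2, 3, 3, 5, 8]
events = [True, True, False, True, False]
# S(t) at t = 1, 2, 3, 5, 8 is approximately 1.0, 0.8, 0.6, 0.3, 0.3
print(product_limit(times, events, [1, 2, 3, 5, 8]))
```

Before the first event time the product is empty, so the estimate is 1.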

Parameters:
  • epsilon (float, default is 0.12) – The proximity threshold \(\epsilon\): two survival predictions \(s_1\) and \(s_2\) are considered close when \(d(s_1,s_2) < \epsilon\).

  • alpha (integer, default is 4) – The number of machines \(\alpha\) that must agree within \(\epsilon\) proximity for consensus.

  • machines (list of scikit-learn style machines, default is "default") – The base machines. The defaults are Random Survival Forest, Cox Ridge, Cox Lasso, Survival Tree and KNN Survival. Users can provide their own machines, provided that each has a predict_survival_function method that returns a numpy array of shape (n_samples, n_times): the survival predictions at the unique times of the training data.

  • distance_function (string or callable, default is "euler") – The distance function. The default is the area type of norm used in the paper. Users can provide their own distance function, which takes two numpy arrays of shape (n_times,) and returns a float. Accepted values are “trapz”, “euclidean”, “cross_entropy”, “cross_entropy_from_mean”, “shannon_jensen”, “kl”, “max”, “euler” or a callable.
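A user-supplied distance function only needs to accept two arrays of shape (n_times,) and return a float. The sketch below shows one possible callable, the largest pointwise gap between two curves; the name max_abs_distance and the toy curves are illustrative and are not part of cobsurv.

```python
import numpy as np

def max_abs_distance(s1, s2):
    """Largest pointwise gap between two survival curves on a shared time grid."""
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    return float(np.max(np.abs(s1 - s2)))

# Two toy survival curves evaluated at the same four time points.
s_a = np.array([1.0, 0.9, 0.7, 0.4])
s_b = np.array([1.0, 0.8, 0.65, 0.5])

epsilon = 0.12
d = max_abs_distance(s_a, s_b)
# The two predictions count as close when d < epsilon.
print(d, d < epsilon)

# Per the signature above, a callable would be passed as
# CobraSurvival(distance_function=max_abs_distance).
```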

unique_sorted_times_l#

Unique sorted time points at which an event or censoring occurs in dataset \(D_l\).

Type:

1d array

unique_sorted_times_k#

Unique sorted time points at which an event or censoring occurs in dataset \(D_k\).

Type:

1d array

pred_l#

The survival predictions of the machines on dataset \(D_l\), of shape (n_samples, n_estimators, n_times).

Type:

3d array
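To make the (n_samples, n_estimators, n_times) layout concrete, the sketch below stacks the outputs of three toy machines. ConstantCurveMachine is a hypothetical class written for this example; the fit method and the field names "event" and "time" are assumptions that follow scikit-learn conventions, and only predict_survival_function with its (n_samples, n_times) output is taken from the parameter description above.

```python
import numpy as np

class ConstantCurveMachine:
    """Toy machine: predicts the same exponential survival curve for every row."""

    def __init__(self, rate):
        self.rate = rate

    def fit(self, X, y):
        # Remember the unique observed times; they define the output time grid.
        self.times_ = np.unique(y["time"])
        return self

    def predict_survival_function(self, X):
        curve = np.exp(-self.rate * self.times_)
        # One row per sample: shape (n_samples, n_times).
        return np.tile(curve, (len(X), 1))

X = np.zeros((4, 2))
y = np.array(
    [(True, 1.0), (False, 2.0), (True, 2.0), (True, 3.0)],
    dtype=[("event", bool), ("time", float)],
)

machines = [ConstantCurveMachine(r).fit(X, y) for r in (0.1, 0.5, 1.0)]
# Stack along a new middle axis so that axis 1 indexes the machines.
pred = np.stack([m.predict_survival_function(X) for m in machines], axis=1)
print(pred.shape)  # (4, 3, 3): 4 samples, 3 machines, 3 unique times
```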

epsilon#

The proximity threshold \(\epsilon\): two survival predictions \(s_1\) and \(s_2\) are considered close when \(d(s_1,s_2) < \epsilon\).

Type:

float

alpha#

The number of machines \(\alpha\) that must agree within \(\epsilon\) proximity for consensus.

Type:

integer

n_estimators#

The number of machines.

Type:

int, default is 5

indexer#

The IndexCalculator object which is used to calculate the index.

Type:

An IndexCalculator object

train_time#

The time points of dataset \(D_l\).

Type:

1d array

train_event#

A boolean event indicator for dataset \(D_l\): True represents an event, False represents censoring.

Type:

1d array

See also

cobsurv.distance_functions

The distance functions used in COBRA.

Notes

The COBRA algorithm is described in the paper [1]. Note that as \(\epsilon\) increases, more samples satisfy the proximity condition, so the survival curve becomes more like the population survival curve.
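The effect of \(\epsilon\) can be seen in a small consensus sketch. It is an assumption-laden illustration rather than the library's implementation: it assumes a sample of \(D_l\) is retained when at least \(\alpha\) machines give predictions for it that lie within \(\epsilon\) of their predictions for the query point. The names retained_mask and max_abs are hypothetical, and the toy predictions are random numbers rather than real survival curves.

```python
import numpy as np

def retained_mask(pred_l, pred_x, epsilon, alpha, distance):
    """Flag samples of D_l on which at least `alpha` machines agree with the query.

    pred_l: (n_samples, n_estimators, n_times) machine predictions on D_l.
    pred_x: (n_estimators, n_times) machine predictions for one query point.
    """
    n_samples, n_estimators, _ = pred_l.shape
    votes = np.zeros(n_samples, dtype=int)
    for m in range(n_estimators):
        for i in range(n_samples):
            # Machine m votes for sample i if its two predictions are close.
            if distance(pred_l[i, m], pred_x[m]) < epsilon:
                votes[i] += 1
    return votes >= alpha

def max_abs(s1, s2):
    return float(np.max(np.abs(s1 - s2)))

rng = np.random.default_rng(0)
# Toy predictions: 6 samples, 3 machines, 4 time points, values in [0, 1).
pred_l = rng.uniform(0.0, 1.0, size=(6, 3, 4))
pred_x = rng.uniform(0.0, 1.0, size=(3, 4))

for eps in (0.2, 0.6, 1.01):
    mask = retained_mask(pred_l, pred_x, eps, alpha=2, distance=max_abs)
    print(eps, int(mask.sum()))
```

Once \(\epsilon\) exceeds the largest possible distance, every sample is retained, and the estimate is then computed from the whole of \(D_l\).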

References

__init__(epsilon=0.12, alpha=4, machines='default', distance_function='euler')#

Methods

__init__([epsilon, alpha, machines, ...])

fit(X, y[, n_quantiles, l_by_n, experimental])

Splits the dataset into two parts, \(X_l\), \(y_l\) and \(X_k\), \(y_k\). The split is chosen so that censoring and event times are distributed equally in both parts. The initial machines are then trained on the \(D_k\) dataset.

get_covariate_relevance([X])

Get the relevance of the covariates. This function computes the relevance with respect to the given covariates; if none are given, it computes the relevance with respect to the average over the set \(D_l\).

get_params([deep])

Get parameters for this estimator.

plot(X, index)

Plot the survival function for the given covariates.

predict(X)

Predict the survival function for a given set of covariates

score(X, y[, type])

Returns the integrated Brier score by default; if type is set to “concordance”, returns the concordance index.

set_params(**params)

Set the parameters of this estimator.

fit(X, y, n_quantiles=10, l_by_n=0.5, experimental=False)#

Splits the dataset into two parts, \(X_l\), \(y_l\) and \(X_k\), \(y_k\). The split is chosen so that censoring and event times are distributed equally in both parts. The initial machines are then trained on the \(D_k\) dataset.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – The input samples.

  • y (structured numpy array of shape (n_samples,)) – The target variable with the first field as the event indicator and the second field as the event time

  • n_quantiles (int, default=10) – The number of quantiles used for splitting the data.

  • l_by_n (float, default=0.5) – The split ratio.

  • experimental (bool, default=False) – If True, the machines are trained on the whole dataset and no splitting happens.

Returns:

self

Return type:

object
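The sketch below builds a structured target array in the documented field order and shows one plausible way to split it so that event status and time are balanced across both parts. The actual splitting logic of fit is not described on this page, so stratified_split is a hypothetical helper. The field names "event" and "time" and the seed argument are also assumptions, while n_quantiles and l_by_n mirror the parameters above.

```python
import numpy as np

# Field names are illustrative; the documented requirement is only the order:
# event indicator first, time second.
y = np.array(
    [(True, 2.0), (False, 3.5), (True, 4.0), (True, 6.0),
     (False, 7.5), (True, 9.0), (False, 11.0), (True, 12.0)],
    dtype=[("event", bool), ("time", float)],
)

def stratified_split(y, n_quantiles=10, l_by_n=0.5, seed=0):
    """Return sorted index arrays (idx_l, idx_k) with a similar event/time mix."""
    rng = np.random.default_rng(seed)
    event_field, time_field = y.dtype.names[:2]
    # Assign each sample to a time-quantile bin.
    inner = np.linspace(0.0, 1.0, n_quantiles + 1)[1:-1]
    edges = np.quantile(y[time_field], inner)
    bins = np.searchsorted(edges, y[time_field], side="right")
    idx_l, idx_k = [], []
    # Split every (bin, event status) stratum with the same ratio.
    for b in np.unique(bins):
        for status in (False, True):
            members = np.flatnonzero((bins == b) & (y[event_field] == status))
            members = rng.permutation(members)
            cut = int(round(l_by_n * len(members)))
            idx_l.extend(members[:cut])
            idx_k.extend(members[cut:])
    return np.sort(np.array(idx_l, dtype=int)), np.sort(np.array(idx_k, dtype=int))

idx_l, idx_k = stratified_split(y, n_quantiles=2, l_by_n=0.5)
print(idx_l, idx_k)
```

Every sample lands in exactly one of the two index sets, so y[idx_l] and y[idx_k] together cover the original data.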

get_covariate_relevance(X=None)#

Get the relevance of the covariates. This function computes the relevance with respect to the given covariates; if none are given, it computes the relevance with respect to the average over the set \(D_l\).

Parameters:

X (array-like of shape (n_samples, n_features)) – The input samples.

get_params(deep=True)#

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

plot(X, index)#

Plot the survival function for the given covariates.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – The input samples.

  • index (integer) – The index of the covariate to be plotted

predict(X)#

Predict the survival function for a given set of covariates

Parameters:

X (array-like of shape (n_samples, n_features)) – The input samples.

score(X, y, type='ibs')#

Returns the integrated Brier score by default; if type is set to “concordance”, returns the concordance index.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – The input samples.

  • y (structured numpy array of shape (n_samples,)) – The target variable with the first field as the event indicator and the second field as the event time

  • type (string, default is "ibs") – The type of score to calculate: “ibs” or “concordance”.

Returns:

score – The score of the model

Return type:

float
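For readers unfamiliar with the concordance index, the sketch below computes Harrell's C from a scalar risk score. It is a generic textbook version given for orientation only: this page does not say how score derives a risk score from the predicted survival curves or which concordance variant it uses, and harrell_c_index is a hypothetical name.

```python
import numpy as np

def harrell_c_index(event, time, risk):
    """Fraction of comparable pairs that the risk score orders correctly."""
    event = np.asarray(event, dtype=bool)
    time = np.asarray(time, dtype=float)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored sample cannot be the earlier member of a pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # earlier event has the higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risks count as half
    return concordant / comparable if comparable else float("nan")

event = [True, False, True, True]
time = [2.0, 4.0, 5.0, 7.0]
risk = [0.9, 0.3, 0.6, 0.2]
print(harrell_c_index(event, time, risk))
```

A value of 1 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and 0 means every pair is reversed.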

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance