Probabilistic Supervised Learning for **mlr3**.

**mlr3proba** is a machine learning toolkit for making probabilistic predictions within the **mlr3** ecosystem. It currently supports the following tasks:

- Probabilistic supervised regression - Supervised regression with a predictive distribution as the return type.
- Predictive survival analysis - Survival analysis where individual predictive hazards can be queried. This is equivalent to probabilistic supervised regression with censored observations.
- Unconditional distribution estimation - Estimation where the distribution itself is returned. Sub-cases are density estimation and unconditional survival estimation.

Key features of **mlr3proba** are

- A unified fit/predict model interface to any probabilistic predictive model (frequentist, Bayesian, or other)
- Pipeline/model composition
- Task reduction strategies
- Domain-agnostic evaluation workflows using task-specific algorithmic performance measures

**mlr3proba** makes use of the **distr6** probability distribution interface as its probabilistic predictive return type.

The current **mlr3proba** release focuses on survival analysis, and contains:

- Task frameworks for survival analysis (`TaskSurv`)
- A comprehensive selection of 17 predictive survival learners
- A comprehensive selection of 21 performance measures for predictive survival learners, with respect to prognostic index (continuous rank) prediction, and probabilistic (distribution) prediction
- PipeOps integrated with **mlr3pipelines**, for basic pipeline building, and reduction/composition strategies using linear predictors and baseline hazards.
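As a rough illustration of the fit/predict interface applied to a survival task, the sketch below trains and scores a Cox model. It assumes that the built-in `rats` task and the `surv.coxph` learner are available in the installed version of **mlr3proba**; the split ratio and seed are arbitrary choices made for this example, not recommendations.

```r
library(mlr3)
library(mlr3proba)

# Built-in right-censored survival task (assumed to ship with mlr3proba)
task = tsk("rats")

# Hold out 20% of the rows for testing (arbitrary split for illustration)
set.seed(1)
train_ids = sample(task$nrow, 0.8 * task$nrow)
test_ids = setdiff(seq_len(task$nrow), train_ids)

# Cox proportional hazards learner, listed in the learner table as surv.coxph
learner = lrn("surv.coxph")
learner$train(task, row_ids = train_ids)

# Predict on the held-out rows
prediction = learner$predict(task, row_ids = test_ids)

# Continuous ranking prediction, one value per test observation
head(prediction$crank)

# Score with the default measure for survival predictions
prediction$score()
```

Calling `prediction$score()` without arguments relies on whatever default measure the installed version defines; a specific measure can be passed instead via `msr()`.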

The vision of **mlr3proba** is to provide comprehensive machine learning functionality to the mlr3 ecosystem for continuous probabilistic return types.

The lifecycle of the survival task and features is considered `maturing`, and any major changes are unlikely.

The density and probabilistic supervised regression tasks are currently in the early stages of development. Task frameworks have been drawn up, but may not be stable; learners need to be interfaced, and contributions are very welcome (see issues).

Install the latest release from CRAN:
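Assuming the package is published on CRAN under the name `mlr3proba`, the standard installation call would be:

```r
# Installs the released version from the configured CRAN mirror
install.packages("mlr3proba")
```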

Install the development version from GitHub:
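One common way to do this is with the **remotes** package. The sketch below assumes that **remotes** is installed and that the development sources live in the `mlr-org/mlr3proba` repository; neither is stated in the text above.

```r
# Assumption: the development repository is mlr-org/mlr3proba
# install.packages("remotes")  # uncomment if remotes is not yet installed
remotes::install_github("mlr-org/mlr3proba")
```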

Survival learners are located either in mlr3proba, the mlr3learners repository, or the mlr3learners organisation. See here for instructions on how to install learners from the mlr3learners organisation.

ID | Learner | Package |
---|---|---|
surv.akritas | Akritas Conditional Non-Parametric Estimator | mlr3learners.proba |
surv.blackboost | Gradient Boosting with Regression Trees | mboost |
surv.coxboost | Cox Model with Likelihood Based Boosting | CoxBoost |
surv.coxph | Cox Proportional Hazards | survival |
surv.coxtime | Non-Linear Cox Neural Network | pycox |
surv.cvcoxboost | Cox Model with Cross-Validation Likelihood Based Boosting | CoxBoost |
surv.cvglmnet | Cross-Validated GLM with Elastic Net Regularization | glmnet |
surv.deephit | Discrete Deep Ranking Neural Network | pycox |
surv.deepsurv | Deep Cox Proportional Hazards | pycox |
surv.dnn | Deep Neural Network with Pseudo Values | mlr3learners.proba |
surv.flexible | Flexible Parametric Spline Models | flexsurv |
surv.gamboost | Gradient Boosting for Additive Models | mboost |
surv.gbm | Generalized Boosting Regression Modeling | gbm |
surv.glmboost | Gradient Boosting with Component-wise Linear Models | mboost |
surv.glmnet | GLM with Elastic Net Regularization | glmnet |
surv.kaplan | Kaplan-Meier Estimator | survival |
surv.loghaz | Logistic Hazard Neural Network | pycox |
surv.mboost | Gradient Boosting for Generalized Additive Models | mboost |
surv.nelson | Nelson-Aalen Estimator | survival |
surv.parametric | Fully Parametric Survival Models | survival |
surv.penalized | L1 and L2 Penalized Estimation in GLMs | penalized |
surv.pchazard | Piecewise Constant Hazard Neural Network | pycox |
surv.randomForestSRC | RandomForestSRC Survival Forest | randomForestSRC |
surv.ranger | Ranger Survival Forest | ranger |
surv.rpart | Rpart Survival Tree | rpart |
surv.svm | Regression, Ranking and Hybrid Support Vector Machines | survivalsvm |
surv.xgboost | Cox Model with Gradient Boosting Trees | xgboost |

ID | Measure | Package |
---|---|---|
surv.calib_alpha | van Houwelingen’s Alpha Calibration | mlr3proba |
surv.calib_beta | van Houwelingen’s Beta Calibration | mlr3proba |
surv.chambless_auc | Chambless and Diao’s AUC | survAUC |
surv.graf | Integrated Graf Score | mlr3proba |
surv.hungAUC | Hung and Chiang’s AUC | survAUC |
surv.intlogloss | Integrated Log Loss | mlr3proba |
surv.logloss | Log Loss | mlr3proba |
surv.nagelk_r2 | Nagelkerke’s R2 | survAUC |
surv.oquigley_r2 | O’Quigley, Xu, and Stare’s R2 | survAUC |
surv.song_auc | Song and Zhou’s AUC | survAUC |
surv.song_tnr | Song and Zhou’s TNR | survAUC |
surv.song_tpr | Song and Zhou’s TPR | survAUC |
surv.uno_auc | Uno’s AUC | survAUC |
surv.uno_tnr | Uno’s TNR | survAUC |
surv.uno_tpr | Uno’s TPR | survAUC |
surv.xu_r2 | Xu and O’Quigley’s R2 | survAUC |

Density estimation learners are likewise located either in mlr3proba, the mlr3learners repository, or the mlr3learners organisation. See here for instructions on how to install learners from the mlr3learners organisation.

ID | Learner | Package |
---|---|---|
dens.hist | Univariate Histogram Density Estimator | graphics |
dens.kde | Univariate KDE for Different Kernels | distr6 |
dens.kdeKD | Nonparametric KDE Using Plug-in Method of Polansky and Baker | kerdiest |
dens.kdeKS | Nonparametric Gaussian KDE | ks |
dens.locfit | Nonparametric KDE Using Gaussian kernel | locfit |
dens.logspline | Logspline Method for Density Estimation | logspline |
dens.mixed | KDE Using Li and Racine Bandwidth Specification | np |
dens.nonpar | Nonparametric KDE Using Normal Optimal Smoothing Parameter | sm |
dens.pen | Density Estimation with a Penalized Mixture | pendensity |
dens.plug | Density Estimation with Iterative Plug-in Bandwidth Selection | plugdensity |
dens.spline | Density Estimation Using Smoothing Spline ANOVA | gss |

ID | Measure | Package |
---|---|---|
dens.logloss | Log Loss | mlr3proba |
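A density estimation workflow can be sketched in the same fit/predict style. The example below assumes that the built-in `precip` density task and the `dens.hist` learner are available in the installed version of **mlr3proba**. Since the density task framework is described above as still under development, the exact fields and defaults may differ between versions.

```r
library(mlr3)
library(mlr3proba)

# Built-in univariate density task (assumed to ship with mlr3proba)
task = tsk("precip")

# Histogram density estimator, listed in the table above as dens.hist
learner = lrn("dens.hist")
learner$train(task)

# Evaluate the estimated density at the observed points
prediction = learner$predict(task)
head(prediction$pdf)

# Score with the default measure for density predictions
prediction$score()
```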

- Add `prob` predict type to `TaskRegr`, and associated learners/measures
- Allow `MeasureSurv` to return measures at multiple time-points simultaneously
- Continue to add survival measures and learners

**mlr3proba** is a free and open source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an “issue” about it on the GitHub page!

In case of problems / bugs, it is often helpful if you provide a “minimum working example” that showcases the behaviour (but don’t worry about this if the bug is obvious).

Predecessors to this package are previous instances of survival modelling in **mlr**. The **skpro** package in the Python/scikit-learn ecosystem follows a similar interface for probabilistic supervised learning and is an architectural predecessor. Several packages, such as **rjags** and **stan**, allow probabilistic predictive modelling through a general interface that is specific to Bayesian models. For implementation of a few survival models and measures, a central package is **survival**. There does not appear to be a package that provides an architectural framework for distribution/density estimation; see **this list** for a review of density estimation packages in R.

Several people contributed to the building of `mlr3proba`. Firstly, thanks to Michel Lang for writing `mlr3survival`. Several learners and measures implemented in `mlr3proba`, as well as the prediction, task, and measure surv objects, were written initially in `mlr3survival` before being absorbed into `mlr3proba`. Secondly, thanks to Franz Kiraly for major contributions towards the design of the proba-specific parts of the package, including compositors and predict types, and for mathematical contributions towards the scoring rules implemented in the package. Finally, thanks to Bernd Bischl and the rest of the mlr core team for building `mlr3` and for many conversations about the design of `mlr3proba`.