Commonly referred to as predictive modeling, the use of machine learning and statistical methods to improve healthcare outcomes has recently gained traction in biomedical informatics research. Given the vast opportunities enabled by large Electronic Health Records (EHR) data and powerful resources for conducting predictive modeling, Qian and Masino argue that it is yet crucial to first carefully examine the prediction task and then choose predictive methods accordingly. Specifically, they purport that there are at least three distinct prediction tasks that are often conflated in biomedical research: 1) data imputation, where a model fills in the missing values in a dataset, 2) future forecasting, where a model projects the development of a medical condition for a known patient based on existing observations, and 3) new-patient generalization, where a model transfers the knowledge learned from previously observed patients to newly encountered ones. Importantly, the latter two tasks-future forecasting and new-patient generalizations-tend to be more difficult than data imputation as they require predictions to be made on potentially out-of-sample data (i.e., data following a different predictable pattern from what has been learned by the model). Using hearing loss progression as an example, they investigate three regression models and show that the modeling of latent clusters is a robust method for addressing the more challenging prediction scenarios. Overall, findings suggest that there exist significant differences between various kinds of prediction tasks and that it is important to evaluate the merits of a predictive model relative to the specific purpose of a prediction task.

The article "Latent Patient Cluster Discovery for Robust Future Forecasting and New-Patient Generalization" authored by DBHi's Ting Qian, PhD and Aaron Masino, PhD was published in the September 16, 2016 issue of PLoS One.

Read more.