A Machine Learning Platform to Optimize the Translation of Personalized Network Models to the Clinic
Manuela Salvucci, Arman Rahman, Alexa J. Resler, Girish M. Udupi, Deborah A. McNamara, Elaine W. Kay, Pierre Laurent-Puig, Daniel B. Longley, Patrick G. Johnston, Mark Lawler, Richard Wilson, Manuel Salto-Tellez, Sandra Van Schaeybroeck, Mairin Rafferty, William M. Gallagher, Markus Rehm and Jochen H.M. Prehn.
Abstract
PURPOSE
Dynamic network models predict clinical prognosis and inform therapeutic intervention by elucidating disease-driven aberrations at the systems level. However, the personalization of model predictions requires the profiling of multiple model inputs, which hampers clinical translation.
PATIENTS AND METHODS
We applied APOPTO-CELL, a prognostic model of apoptosis signaling, to showcase the establishment of computational platforms that require a reduced set of inputs. We designed two distinct and complementary pipelines: a probabilistic approach to exploit a consistent subpanel of inputs across the whole cohort (Ensemble) and a machine learning approach to identify a reduced protein set tailored for individual patients (Tree). Development was performed on a virtual cohort of 3,200,000 patients, with inputs estimated from clinically relevant protein profiles. Validation was carried out in an in-house stage III colorectal cancer cohort, with inputs profiled in surgical resections by reverse phase protein array (n = 120) and/or immunohistochemistry (n = 117).
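The idea of a per-patient reduced input set can be illustrated with a toy decision tree in which a protein is quantified only when the patient's path through the tree reaches a node that needs it. This is a minimal sketch and not the authors' Tree pipeline: the tree structure, the thresholds, the protein names (`protein_A` to `protein_C`) and the output labels are all hypothetical.

```python
# Hypothetical toy tree (NOT the published model): internal nodes are dicts,
# leaves are plain string labels. Thresholds and names are illustrative only.
TOY_TREE = {
    "feature": "protein_A", "threshold": 1.0,
    "low": "class_0",
    "high": {
        "feature": "protein_B", "threshold": 0.5,
        "low": "class_1",
        "high": {
            "feature": "protein_C", "threshold": 2.0,
            "low": "class_0",
            "high": "class_1",
        },
    },
}


def classify(tree, measure):
    """Walk the tree, calling measure(name) only for proteins on the path.

    Returns the leaf label and the list of proteins that had to be measured.
    """
    measured = []
    node = tree
    while isinstance(node, dict):
        name = node["feature"]
        value = measure(name)  # the assay is requested lazily, per node
        measured.append(name)
        node = node["low"] if value <= node["threshold"] else node["high"]
    return node, measured


# Two hypothetical patients with made-up protein levels.
patient_1 = {"protein_A": 0.4, "protein_B": 0.9, "protein_C": 3.1}
patient_2 = {"protein_A": 1.5, "protein_B": 0.9, "protein_C": 3.1}

for patient in (patient_1, patient_2):
    label, measured = classify(TOY_TREE, patient.__getitem__)
    print(label, measured)
```

In this sketch the first patient is classified after a single measurement, whereas the second requires all three, which is why the average number of inputs per patient can be a non-integer value.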
RESULTS
In the virtual patient cohort, Ensemble and Tree reproduced APOPTO-CELL predictions with 92% and 99% accuracy, respectively, while reducing the number of inputs to a consistent subset of three proteins (40% reduction) for Ensemble and a personalized subset of 2.7 proteins on average (46% reduction) for Tree. Both pipelines retained prognostic utility in the in-house colorectal cancer cohort. The association between Ensemble accuracy and prognostic value (Spearman ρ = 0.43; P = .02) provided a rationale to optimize the input composition for specific clinical settings. Comparison of profiling by reverse phase protein array (gold standard) and immunohistochemistry (clinical routine) showed that the latter is a suitable technology to quantify model inputs.
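The two reported reduction percentages can be checked against each other with simple arithmetic. The sketch below assumes that both percentages are relative to the same full input panel, whose size is not stated in the abstract and is inferred here from the "three proteins (40% reduction)" figure; the function and variable names are illustrative.

```python
def reduction_pct(n_kept, n_full):
    """Percentage reduction in inputs when n_kept of n_full are retained."""
    return 100.0 * (1.0 - n_kept / n_full)


# Assumption: full panel size inferred from "three proteins (40% reduction)".
n_full = round(3 / (1 - 0.40))

ensemble_reduction = round(reduction_pct(3, n_full))    # consistent subset
tree_reduction = round(reduction_pct(2.7, n_full))      # per-patient average

print(n_full, ensemble_reduction, tree_reduction)
```

Under this assumption the inferred panel has five inputs, and an average of 2.7 retained inputs corresponds to the reported 46% reduction.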
CONCLUSION
This study provides a generalizable framework to optimize the development of network-based prognostic assays and, ultimately, to facilitate their integration into the routine clinical workflow.