Home Page of Eric M Roberts, MD PhD, Biostatistician
Projects
Select Projects (Verge Genomics)
Machine Learning for Prediction of Gene Targets
Led research on supervised machine learning classifiers for predicting gene targets in drug discovery efforts
Authored a Python code stack that manages complex API queries across the company platform to assemble training and annotation data for user-selected disease indications, compares the performance of a suite of classifiers using nested cross-validation, and analyzes feature weights downstream to inform future data generation efforts
Skills: Generalized Linear Models · Support Vector Machine (SVM) · Random Forest · XGBoost · Python (Programming Language)
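As an illustration of the nested cross-validation mentioned above, here is a minimal sketch, not the company code. It assumes a toy one-feature k-nearest-neighbour classifier and synthetic data. All function names and the hyperparameter grid are hypothetical. The inner loop picks the hyperparameter using only the outer training data, so each outer test fold gives an estimate that is not biased by the tuning step.

```python
import random
from statistics import mean

def make_folds(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def split(data, held_out):
    """Return (train, test) lists given the indices held out for testing."""
    held = set(held_out)
    train = [row for i, row in enumerate(data) if i not in held]
    test = [data[i] for i in held_out]
    return train, test

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def accuracy(train, test, k):
    """Fraction of test points classified correctly."""
    return mean(1.0 if knn_predict(train, x, k) == y else 0.0 for x, y in test)

def cv_accuracy(data, k, folds):
    """Mean accuracy of a k-NN classifier over the given folds."""
    return mean(accuracy(*split(data, fold), k) for fold in folds)

def nested_cv(data, grid, outer_folds=5, inner_folds=3, seed=0):
    """Outer loop estimates performance; inner loop tunes k on training data only."""
    rng = random.Random(seed)
    outer_scores = []
    for held_out in make_folds(len(data), outer_folds, rng):
        outer_train, outer_test = split(data, held_out)
        inner = make_folds(len(outer_train), inner_folds, rng)
        # Hyperparameter selection never sees the outer test fold.
        best_k = max(grid, key=lambda k: cv_accuracy(outer_train, k, inner))
        outer_scores.append(accuracy(outer_train, outer_test, best_k))
    return mean(outer_scores)

# Synthetic two-class data (hypothetical): one feature, class means 0 and 2.
data_rng = random.Random(42)
data = [(data_rng.gauss(0.0, 1.0), 0) for _ in range(40)]
data += [(data_rng.gauss(2.0, 1.0), 1) for _ in range(40)]

estimate = nested_cv(data, grid=[1, 3, 5, 7])
print(f"nested CV accuracy estimate: {estimate:.2f}")
```

Comparing a suite of classifiers would repeat the outer loop once per model family on the same outer folds, so that the resulting estimates are directly comparable.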
Statistical Analytic Support for Bench Scientists
Conducted study design, power analysis, and results reporting for in vitro and in vivo studies, in collaboration with bench scientists across the company; outcomes included rodent biomarker and behavioral endpoints, histological changes, cell survival, cellular morphological differentiation, gene expression, puncta and stress granule formation, and immunofluorescence
Skills: Statistical Modeling · R · Biostatistics · Technical Presentations
Co-expression Preservation
Formulated metrics for gauging the replicability of gene expression correlations across cohorts and between human, animal, and in vitro models based on the probability distributions of correlation matrices; metrics and associated R code were incorporated into standard company research protocols
Skills: Probability Theory · R · Statistical Data Analysis · Genomics
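The metrics described above are not specified on this page. As a generic illustration of the underlying question (does a gene-gene correlation replicate across cohorts?), the sketch below uses the textbook Fisher z-transformation test for the equality of two independent correlations. This is a stand-in technique, not the author's metric, and all function names and numbers are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for H0: both cohorts share the same population correlation.

    Requires |r| < 1 and n > 3 in each cohort (atanh is undefined at +/-1).
    """
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / se

def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical example: correlation of one gene pair in two cohorts.
z = fisher_z_difference(r1=0.62, n1=80, r2=0.35, n2=120)
print(f"z = {z:.2f}, p = {two_sided_p(z):.3f}")
```

A pairwise test like this treats each gene pair in isolation; a metric built on the distribution of whole correlation matrices, as described above, would address the dependence among pairs that this sketch ignores.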
Joint Gene Set Analysis
Formulated a novel extension of Gene Set Analysis (GSA; Efron and Tibshirani (2007), Ann. Appl. Stat. 1(1):107-129) to quantify the consistency of set-wise gene dysregulation across experiments and between human, animal, and in vitro models; Python code for this method and for a custom, efficiency-enhanced implementation of the original GSA calculations was incorporated into the company analytic pipeline
Formulated a novel unsupervised machine learning approach for the detection of clusters of co-expressed genes, similar in spirit to Langfelder and Horvath (2008), BMC Bioinformatics 9:559, but based on probability-theoretic reasoning for increased validity of output
R code was incorporated into the company research pipeline and adopted for standard analytic protocols
Skills: Probability Theory · R · Machine Learning · Cluster Analysis · Transcriptomics
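For readers unfamiliar with the original GSA of Efron and Tibshirani cited above, its set-level "maxmean" statistic can be sketched as follows. This shows only the published statistic, not the joint extension or the efficiency-enhanced implementation described on this page, and the gene scores in the example are hypothetical.

```python
def maxmean(scores):
    """Maxmean statistic for one gene set.

    Averages the positive parts and the negative parts of the gene-level
    scores separately (both divided by the full set size) and returns the
    larger of the two averages.
    """
    n = len(scores)
    pos = sum(max(s, 0.0) for s in scores) / n   # mean positive part
    neg = sum(max(-s, 0.0) for s in scores) / n  # mean negative part
    return max(pos, neg)

# Hypothetical gene-level scores (e.g. z-scores) for a four-gene set.
gene_scores = [2.0, 1.0, -0.5, 0.0]
print(maxmean(gene_scores))  # 0.75
```

Because both averages divide by the full set size, a set with a few strongly dysregulated genes in one direction still scores highly, while small scores in the opposite direction do little to cancel it.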