Home Page of Eric M Roberts, MD PhD, Biostatistician

Projects

Select Projects (Verge Genomics)

Led research on supervised machine learning classifiers for predicting gene targets in drug discovery efforts
Authored a Python code stack that managed complex API queries across the company platform to assemble training and annotation data for user-selected disease indications, compared the performance of a suite of classifiers using nested cross-validation, and analyzed feature weights downstream to inform future data-generation efforts
Skills: Generalized Linear Models · Support Vector Machine (SVM) · Random Forest · XGBoost · Python
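Nested cross-validation keeps hyperparameter tuning (the inner loop) separate from performance estimation (the outer loop), so the reported scores are not optimistically biased by the tuning itself. The sketch below is a generic scikit-learn illustration on synthetic data, not the company code described above; the models, hyperparameter grids, fold counts, and ROC AUC metric are all assumptions, and XGBoost is left out to avoid an extra dependency.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a gene-level feature matrix and target labels (assumption).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

# Candidate classifiers with illustrative hyperparameter grids (assumptions).
candidates = {
    "logistic": (
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        {"logisticregression__C": [0.1, 1.0, 10.0]},
    ),
    "svm": (
        make_pipeline(StandardScaler(), SVC()),
        {"svc__C": [0.1, 1.0, 10.0], "svc__kernel": ["linear", "rbf"]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"max_depth": [3, None]},
    ),
}

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)  # tuning
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)  # evaluation

results = {}
for name, (estimator, grid) in candidates.items():
    # The grid search is refit inside every outer training fold, so the
    # outer test fold never influences hyperparameter selection.
    search = GridSearchCV(estimator, grid, cv=inner_cv, scoring="roc_auc")
    scores = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")
    results[name] = (scores.mean(), scores.std())

for name, (mean_auc, sd_auc) in results.items():
    print(f"{name}: ROC AUC {mean_auc:.3f} ± {sd_auc:.3f}")
```

Comparing the outer-loop means (and their spread) across models gives a like-for-like performance comparison, since every model is tuned and evaluated under the same fold structure.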

Conducted study design, power analysis, and results reporting for in vitro and in vivo studies in collaboration with bench scientists across the company; outcomes included rodent biomarker and behavioral endpoints, histological changes, cell survival, cellular morphological differentiation, gene expression, puncta and stress granule formation, and immunofluorescence
Skills: Statistical Modeling · R · Biostatistics · Technical Presentations
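As a generic illustration of the kind of calculation a power analysis involves (not the specific methods used in these studies), the per-group sample size for a two-sided, two-sample comparison of means can be approximated by n = 2(z_{1-α/2} + z_{1-β})² / d², where d is the standardized effect size (difference in means divided by the common SD). The function name and default values below are illustrative assumptions.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sided, two-sample test of means.

    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2.
    An exact t-test calculation gives a slightly larger n for small samples.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = z.inv_cdf(power)  # quantile corresponding to the target power
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    return math.ceil(n)  # round up to whole animals or wells per group


print(n_per_group(0.5))  # 63 (medium effect, alpha = 0.05, 80% power)
print(n_per_group(1.0))  # 16
```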

Formulated metrics, based on the probability distributions of correlation matrices, for gauging the replicability of gene expression correlations across cohorts and between human, animal, and in vitro models; the metrics and associated R code were incorporated into standard company research protocols
Skills: Probability Theory · R · Statistical Data Analysis · Genomics

Formulated a novel extension of Gene Set Analysis (GSA; Efron and Tibshirani, 2007, Ann. Appl. Stat. 1(1):107-129) to quantify the consistency of set-wise gene dysregulation across experiments and between human, animal, and in vitro models; Python code for this extension, together with a custom implementation of efficiency-enhanced versions of the original GSA calculations, was incorporated into the company analytic pipeline
Skills: Python · Statistical Analysis · Genomics
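For readers unfamiliar with GSA, the core of the original method is the maxmean statistic of Efron and Tibshirani (2007): over the per-gene scores z_i in a set, take the mean of the positive parts and the mean of the negative parts (both averaged over all genes in the set), and keep whichever is larger, with its sign. The sketch below shows only that published statistic, not the extension or the efficiency-enhanced implementation described above, and it omits the randomization and restandardization steps GSA uses for inference. The function names, and the choice to drop set members that have no score, are assumptions.

```python
from typing import Dict, Mapping, Sequence


def maxmean(scores: Sequence[float]) -> float:
    """Maxmean gene-set statistic (Efron and Tibshirani, 2007).

    s_plus is the mean of the positive parts of the scores and s_minus the
    mean of the negative parts, both averaged over every gene in the set.
    The statistic is s_plus if it is larger, otherwise -s_minus.
    """
    if not scores:
        raise ValueError("gene set is empty")
    n = len(scores)
    s_plus = sum(max(z, 0.0) for z in scores) / n
    s_minus = sum(max(-z, 0.0) for z in scores) / n
    return s_plus if s_plus > s_minus else -s_minus


def set_scores(
    gene_scores: Mapping[str, float], gene_sets: Mapping[str, Sequence[str]]
) -> Dict[str, float]:
    """Maxmean for each gene set, using only genes that have a score (assumption)."""
    result = {}
    for name, genes in gene_sets.items():
        z = [gene_scores[g] for g in genes if g in gene_scores]
        if z:  # skip sets with no scored genes
            result[name] = maxmean(z)
    return result


scores = {"A": 2.0, "B": -1.0, "C": 0.5, "D": -3.0}
sets = {"up": ["A", "B", "C"], "down": ["B", "D"]}
print(set_scores(scores, sets))  # {'up': 0.8333333333333334, 'down': -2.0}
```

Averaging the positive and negative parts separately is what lets maxmean detect a set in which a subset of genes moves strongly in one direction, rather than letting opposing changes cancel as a plain mean would.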

Formulated a novel unsupervised machine learning approach for detecting clusters of co-expressed genes, similar in spirit to Langfelder and Horvath (2008, BMC Bioinformatics 9:559) but grounded in probability-theoretic reasoning for greater validity of the output; the R code was incorporated into the company research pipeline and adopted in standard analytic protocols
Skills: Probability Theory · R · Machine Learning · Cluster Analysis · Transcriptomics

CV / Resume as of March 24, 2024