CV

Summary

I’m a passionate and experienced computational biologist specializing in biomarker analysis for clinical trial development, standardization and harmonization of biomarker and clinical data, ML/AI algorithm development for predicting clinical endpoints, and interactive app development for analysis and visualization. I’m skilled in bioinformatics, statistical analysis, immuno-oncology, machine learning, and deep learning, and have successfully applied these skills in a cross-functional industry team.

Education

Ph.D. in Bioinformatics and Master's in Statistics, University of Michigan, 2012-2016
- Dissertation title: Demographic and population separation history inference based on whole genome sequences
- Ph.D. thesis committee: Jeffrey Kidd, Jun Li, Hyun Min Kang, Kerby A. Shedden, Patricia Wittkopp, Sebastian K. Zollner
B.S. in Life Science, Peking University, 2008-2012

Experience

Sr Scientist Computational Biology, Dept of Clinical Biomarker and Diagnostic in Precision Medicine, Amgen Inc, Apr 2020 - Present
- Lead a team of 3 scientists (serving as both people manager and individual contributor) and 2 functional service providers to build a biomarker platform, which includes 1) building an automated data ingestion pipeline to standardize and harmonize biomarker and clinical data, 2) building an automated analysis pipeline to find analytes associated with various clinical endpoints and to perform predictive analytics/modeling, and 3) building an interactive web application for analysis and visualization
- Perform biomarker analysis in support of immuno-oncology clinical programs, especially BiTE pipelines
Genomics Data Scientist (Senior Staff), Ancestry.com, Aug 2016 - Apr 2020
- Project lead on ancestry inference algorithm and reference panel development
- Provide scientific support to product, UX and engineering teams for DNA origin story, DNA circle and Thrulines product lines
- Wrote two white papers, co-authored one journal publication and filed 4 non-provisional patents as lead inventor
- Intern manager of Aaron Stern from UC Berkeley in summer 2017
Graduate Student, Research Assistant, University of Michigan, 2012 - 2016
- Developed a computational pipeline to analyze and reconstruct global haplotypes with high accuracy using next-generation sequencing data from pools of fosmid clones as part of 1000 Genomes Project Phase III
- Inferred population split times and migration rates using reconstructed haplotypes from different populations by combining the Pairwise Sequentially Markovian Coalescent (PSMC) model with Approximate Bayesian Computation
- Analyzed over 100 contemporary village/breed dogs and ancient dog whole genome sequences using population genetics methods (PCA, ADMIXTURE, f3/f4, G-PhoCS) to understand the evolutionary and demographic history of dogs since the primary wolf divergence
Computational Biology Intern, Ancestry.com, Summer 2015
- Used the Hadoop framework to process genotype data from 770,000 individuals, calculated genetic differentiation (Fst) among genetic communities, and performed hierarchical clustering based on Fst distance
- Identified haplotype clusters enriched in genetic communities (inferred from the IBD network), applied PCA to identify the ancestry of enriched haplotypes, drew haplotype networks, and explored the biological meaning of the enriched haplotypes

Skills

Languages/Computing
- Python, R, C++, Bash, Perl
- AWS, SLURM/HPC, Hadoop/EMR, Spark
- Scikit-learn, Keras; Snakemake
Statistical analysis: statistical inference, hidden Markov models, machine learning, deep learning, network analysis
NGS analysis: read alignment (BWA, samtools), variant calling (GATK), CNV calling (Delly, Pindel), de novo assembly (velvet, SOAPdenovo), statistical phasing (ShapeIT, Eagle, beagle), RNA-seq analysis (tophat)
Population genetics tools: simulation tools (ms, macs), MEGA, PCAdmix, RFMix, Admixture, AdmixTools, PSMC, MSMC, G-PhoCS, ABCtoolbox

Certificates

Completion of deeplearning.ai Deep Learning Specialization

Publications: Google Scholar

Wang, Yong, Shiya Song, Joshua G. Schraiber, Alisa Sedghifar, Jake K. Byrnes, David A. Turissini, Eurie L. Hong, Catherine A. Ball, and Keith Noto. Ancestry inference using reference labeled clusters of haplotypes. BMC bioinformatics 22, no. 1 (2021): 1-14.
Hateley, Shannon, Angelica Lopez-Izquierdo, Chuanchau J. Jou, Scott Cho, Joshua G. Schraiber, Shiya Song, Colin T. Maguire et al. The history and geographic distribution of a KCNQ1 atrial fibrillation risk allele. Nature communications 12, no. 1 (2021): 1-10.
Yu, He, Shiya Song, Jiazi Liu, Sheng Li, Lu Zhang, Dajun Wang, and Shu-Jin Luo. Effects of the Qinghai-Tibet Railway on the Landscape Genetics of the Endangered Przewalski's Gazelle (Procapra przewalskii). Scientific reports 7, no. 1 (2017): 17983.
Laura R. Botigue, Shiya Song, Amelie Scheu, Shyamalika Gopalan, Amanda L. Pendleton, Matthew Oetjens, Angela Taravella, Timo Seregely, Andrea Zeeb-Lanz, Rose-Marie Arbogast, Dean Bobo, Kevin Daly, Martina Unterlander, Joachim Burger, Jeffrey M. Kidd, Krishna R. Veeramah. (2017). Ancient European dog genomes reveal continuity since the Early Neolithic. Nature Communications, 8. (authors contributed equally)
Eunjung Han, Peter Carbonetto, Ross E. Curtis, Yong Wang, Julie M. Granka, Jake Byrnes, Keith Noto, Amir R. Kermany, Natalie M. Myres, Mathew J. Barber, Kristin Rand, Shiya Song, Theodore Roman, Erin Battat, Kenneth G. Chahine, Catherine A. Ball. (2017). Clustering of 770,000 genomes reveals post-colonial population structure of North America. Nature Communications, 8, 14238.
Shiya Song, Elzbieta Sliwerska, Sarah Emery, Jeffrey M. Kidd. Modeling human population separation history using physically phased genomes. Genetics (2016): genetics-116
1000 Genomes Project Consortium. (2015). A global reference for human genetic variation. Nature, 526(7571), 68-74. (Contributed Supplemental Section 6.3)
Kimberly F. McManus, Joanna L. Kelley, Shiya Song, Krishna Veeramah, August E. Woerner, Laurie S. Stevison, Oliver A. Ryder, Great Ape Genome Diversity Consortium, Jeffrey M. Kidd, Jeff Wall, Carlos D. Bustamante, and Michael Hammer. (2015). Inference of Gorilla Demographic and Selective History from Whole-Genome Sequence Data. Molecular biology and evolution, 32(3), 600-612. (authors contributed equally)

Selected Presentations

Poster presentation, American Society of Human Genetics annual meeting, San Diego, October 16-20, 2018, “High throughput local ancestry inference reveals fine-scale population history”, Shiya Song, Ancestry.com DNA, LLC
Poster presentation, American Society of Human Genetics annual meeting, San Diego, October 16-20, 2018, “Population genetics of North American immigrant communities”, Shiya Song, Ancestry.com DNA, LLC
Oral presentation, Bay area population genetics, Santa Cruz, April 21, 2018, “Detailed characterization of demographic history in the United States”, Shiya Song, Ancestry.com DNA, LLC
Poster presentation, American Society of Human Genetics annual meeting, Orlando, October 17-21, 2017, “Studying global variation of gene flow using geo-referenced genetic data”, Shiya Song, Ancestry.com DNA, LLC
Oral presentation, Society for Molecular Biology and Evolution annual meeting, Vienna, July 12-16, 2015, “Exploring population separation history using physically phased genomes”, Shiya Song, Elzbieta Sliwerska, Sarah Emery, Jeffrey M. Kidd
Oral presentation, Midwest Population Genetics Conference, Chicago, July 19, 2014, “Population Split-time Estimation and X-to-autosome Effective Population Size Differences Inferred from Physically Phased Genomes”, Shiya Song, Elzbieta Sliwerska, Jeffrey M. Kidd