Molecular biology, genetics, and protein engineering have been slowly morphing into large-scale, data-driven sciences that can leverage machine learning and applied statistics. My talk will be a quick tour of several projects at this intersection. I will start by explaining some modeling challenges in finding the genetic underpinnings of disease: genome- and epigenome-wide associations, wherein individual (epi)genetic markers or sets of markers are systematically scanned for association with disease, are one window into disease processes. Naively, these associations can be found with a simple statistical test. However, many kinds of structure and confounding factors lie hidden in the data, such as cell type heterogeneity and population structure, and lead to both spurious and missed associations if not properly addressed. Once we uncover genetic causes, genome editing may one day let us fix the genome in a bespoke manner. I will describe how we developed state-of-the-art machine learning approaches for CRISPR guide design. Finally, I will close with a teaser on some of our new work in machine learning-based protein optimization, wherein we seek to find, for example, the promoter/codon sequence that will give us the desired protein expression.
From (Epi)Genetics to Gene Editing to Protein Optimization
Professor, Department of Electrical Engineering and Computer Science
University of California, Berkeley
We strive to make our events accessible and inclusive. The Innovative Genomics Institute Building is ADA accessible, and has a lactation room that visitors can use. For disability accommodation information and requests and/or access to the lactation room, please contact Kristy Nordahl at firstname.lastname@example.org or 510-664-7110.