We started with GIAB reference genomes, for which there is high-quality ground truth (or the closest approximation currently possible). Using multiple replicates of these genomes, we produced tens of millions of training examples in the form of multi-channel tensors encoding the HTS instrument data, and then trained a TensorFlow-based image classification model to identify the true genome sequence from the experimental data produced by the instruments. Although the resulting deep learning model, DeepVariant, had no specialized knowledge about genomics or HTS, within a year it had won the highest SNP accuracy award at the precisionFDA Truth Challenge, outperforming state-of-the-art methods. Since then, we've further reduced the error rate by more than 50%.
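To make the idea of a multi-channel tensor concrete, here is a minimal sketch of how a handful of aligned reads could be turned into an image-like array with one row per read, one column per reference position, and a few channels per cell. The channel choice (base identity, base quality, strand, mismatch against the reference), the `Read` type, and the `encode_pileup` helper are all assumptions made for illustration; this is not DeepVariant's actual encoding or API.

```python
from typing import List, NamedTuple

# Hypothetical mapping from a base to a scalar intensity (0.0 means "no base").
BASE_TO_VALUE = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}


class Read(NamedTuple):
    """A simplified aligned read (illustrative only, no indels or clipping)."""
    start: int             # 0-based offset of the first base within the window
    bases: str             # read sequence
    qualities: List[int]   # Phred base qualities, one per base
    reverse_strand: bool   # True if the read aligned to the reverse strand


def encode_pileup(reference: str, reads: List[Read], max_quality: int = 40):
    """Return a nested list indexed as tensor[row][column][channel].

    Assumed channels: 0 = base identity, 1 = scaled base quality,
    2 = strand, 3 = mismatch against the reference.
    """
    width = len(reference)
    num_channels = 4
    tensor = []
    for read in reads:
        # Start each row as an empty (all-zero) strip across the window.
        row = [[0.0] * num_channels for _ in range(width)]
        for offset, (base, qual) in enumerate(zip(read.bases, read.qualities)):
            col = read.start + offset
            if not 0 <= col < width:
                continue  # ignore bases that fall outside the window
            row[col][0] = BASE_TO_VALUE.get(base, 0.0)
            row[col][1] = min(qual, max_quality) / max_quality
            row[col][2] = 1.0 if read.reverse_strand else 0.0
            row[col][3] = 1.0 if base != reference[col] else 0.0
        tensor.append(row)
    return tensor


reference = "ACGTAC"
reads = [
    Read(start=0, bases="ACGT", qualities=[40, 40, 30, 20], reverse_strand=False),
    Read(start=2, bases="GTTC", qualities=[35, 35, 10, 40], reverse_strand=True),
]
pileup = encode_pileup(reference, reads)
print(len(pileup), len(pileup[0]), len(pileup[0][0]))  # rows, columns, channels
```

An array shaped like this can be handed to a standard image classifier, which is why a model with no built-in genomics knowledge can still learn from it. A real pipeline would also need to handle insertions, deletions, and mapping quality, which this sketch leaves out.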
DeepVariant is being released as open source software to encourage collaboration and to accelerate the use of this technology to solve real-world problems. To further this goal, we partnered with Google Cloud Platform (GCP) to deploy DeepVariant workflows on GCP, available today, in configurations optimized for low cost and fast turnaround using scalable GCP technologies like the Pipelines API. This paired set of releases provides a smooth ramp for users to explore and evaluate the capabilities of DeepVariant in their current compute environment, while providing a scalable, cloud-based solution to satisfy the needs of even the largest genomics datasets.
DeepVariant is the first of what we hope will be many contributions that leverage Google's computing infrastructure and ML expertise both to better understand the genome and to provide deep learning-based genomics tools to the community. This is all part of a broader goal to apply Google technologies to healthcare and other scientific applications, and to make the results of these efforts broadly accessible.
What's next?
We are excited about applying DeepVariant to enhance analysis of samples from Verily Clinical studies, such as Project Baseline.
Learn how you can run DeepVariant by reading the how-to guide.
Posted by David Glazer, Director of Engineering, Verily