Industry Talk: Tales from the Frontier of Genomics

22 Mar 2019 2:10 PM - 3:00 PM
Presenter/Speaker: Dr Sean Irvine, Real Time Genomics
Location: G.1.15

In this talk, Sean will give some background on past projects he has been involved with, leading up to his current work in DNA sequence analysis at Real Time Genomics.  The main part of the discussion will concern the challenges in processing human DNA sequence data.  In particular, the talk will step through the computational steps involved in taking the output of a genomic sequencing machine through to outputs suitable for clinical interpretation.  These steps include the mapping of sequence data to a reference human genome, the identification of germline and somatic variants in a sample, and clinical interpretation.  Current problems in genomics, including the evaluation and benchmarking of results, dealing with pedigrees, and the detection of copy number and structural variants, will be mentioned.  Lessons learned and success stories from dealing with data from real customers will be presented.  Time permitting, a brief introduction to metagenomic analysis and strain detection will be given.  The talk will concentrate on the computational aspects of dealing with large datasets and accurately determining variants in individual samples, including the ranking of variants using a machine learning approach.

There will be some discussion of the best-practice software engineering techniques used by Real Time Genomics.  In particular, the continuous integration environment, the testing framework, and the Jumble system for measuring code and test quality in Java will be shown.  There will also be brief mention of projects involving entity extraction and text classification, which led to the SureChEMBL system for identifying chemicals in patent documents.  Only a minimal knowledge of genomics will be assumed.

Sean is a senior researcher and developer at Real Time Genomics, where he is involved with the development of algorithms and processing strategies for genomic sequence data.  In association with the US National Institute of Standards and Technology (NIST), Sean has been involved with the development of high-confidence human reference call sets.  He completed his PhD, "Data Compression and Cryptology", in 1997 at the University of Waikato and previously worked as a spy.  His other interests include computational number theory, classical cryptography, and genealogy.  Sean is a regular contributor to and an editor of the On-Line Encyclopedia of Integer Sequences.

