Day One at #ASHG14: Big Data and Crowdsourcing
The American Society of Human Genetics (ASHG) 2014 annual meeting kicked off in full force today at the San Diego Convention Center. The meeting brings together over 7000 people, including basic scientists, physicians, ethicists, and other professionals working in the field of human genetics. Given the large number of interesting concurrent sessions, as well as thousands of posters covering everything from genetic technologies and specific applications of genetics in various disease states to ethics and genetic counseling, it is impossible for a single blogger to cover all the amazing science! What I will try to do is cover a few of the more interesting sessions, as well as some posters and exhibitors, with brief summaries. I would strongly suggest that you follow the #ASHG14 tag on Twitter to get the latest updates on various topics.
The field of human genetics has been revolutionized in the last few years by the falling cost of genome sequencing. The first human genome sequencing was a $3 billion effort involving 200 scientists over 11 years. The latest sequencing system launched by Illumina earlier this year, the HiSeq X Ten, can sequence human genomes for as low as $1000 within a week. One consequence of inexpensive sequencing has been a tsunami of data that needs to be translated into clinically relevant information and outcomes.
It was therefore quite appropriate that one of the first sessions in the morning focused on the big data problem. The Distinguished Speakers’ Symposium, titled ‘Separating Signal from Noise’, featured Ajay Royyuru of IBM, David Glazer of Google, and Muin Khoury of the CDC, reflecting the cross-disciplinary effort required to tackle these problems. Ajay Royyuru showed how IBM’s Watson can go beyond answering questions on Jeopardy and interpret huge chunks of genomic data. Watson’s machine learning ability is being used to address the ‘cognitive overload’ problem faced in oncology. These analytical abilities are currently available only to partners (e.g. through the partnership with the New York Genome Center), but beta and commercial releases are expected by 2015 and 2016, respectively. The next speaker, David Glazer, spoke of how genomics is becoming a data problem with ‘N of million activities’ and how Google’s tools like Dremel (which can work with trillions of rows of data) might solve the problem. Glazer showed examples of code that has been used to analyze large data sets (including data deposited in the 1000 Genomes Project) within hours. While this is good progress, he pointed out that the ‘world of biology is not tidy biology, there’s no black and white answers’, and that further progress can only be made through standardization of data analysis. The Global Alliance for Genomics and Health is trying to move in that direction through standardization and benchmarking of tools. Muin Khoury further elaborated on the ‘squishiness’ of biology, talking about the need for strong epidemiological foundations and for knowledge integration in genomic data analysis, so that results have both clinical validity and utility. He highlighted some failures of big data applications, such as Google’s vast overestimation of the flu outbreak in the 2013 season, and the often-repeated error of confusing correlation with causation.
He also noted that less than 1-2% of current genomic research is applied beyond the bedside.
The session on Crowdsourced Genetics featured the most moving talk of the day, by the husband-and-wife duo of Eric Vallabh Minikel and Sonia Vallabh, an unusual tag-team combination, on their efforts to crowdfund research on prion diseases. More unusually, neither Sonia nor Eric was originally trained as a scientist. Sonia was working as a lawyer, and Eric as a data analyst. In 2010 Sonia’s mother passed away suddenly after suffering from fatal familial insomnia (FFI), a rare neurodegenerative disorder associated with a prion protein. Testing showed that Sonia carried the mutant D178N (cis-129M) allele of the PRNP gene. This led the couple to embark on a journey of educating themselves about the disease via night school, entering graduate school in biomedicine, raising awareness, and starting a crowdsourced effort to fund research into prion disease! Follow their story and the blog at cureffi.com.
Other highlights of the session included a talk by Yaniv Erlich on applying bioinformatics tools to publicly available genomics data. His group was able to extract lifespan information on millions of data points from the pedigree database geni.com. The result was a very cool video showing the births of people through history, echoing the pattern of human migration in the modern age. Such ‘social media’ data can eventually be expanded to look at phenotypes other than age information. The Erlich lab also announced the launch of a genetics crowdsourcing platform, dna.land. The platform allows users to upload their 23andMe data to the website; this will eventually allow researchers to correlate more phenotypes. Currently the site is only in the alpha stage, and users are encouraged to upload data and provide feedback to iron out issues. Atul Butte, meanwhile, spoke of not just analyzing publicly available genomic data to get to a clinical trial stage, but also using marketplace services such as Assay Depot to reduce the cost of drug discovery. He pointed out that the $4-12 billion cost of drug discovery is unsustainable. Butte ended his talk with a vision of how Silicon Valley could spur a new generation of ‘garage biotechnologists’ performing preliminary research on a cure for cancer, on a credit card.
(A brief primer on this session is also available here.)
Both sessions highlighted one common issue: the need to have data open and sharable among scientists. This seems to happen more readily in genomics than in other sciences. Additionally, I noticed that two of the speakers, Atul Butte and Dr. Khoury, have uploaded their talks online.
Tomorrow I will post more thoughts on the various companies exhibiting at the conference, especially those from the San Diego area. In the meantime, continue to follow me at @omespeak, or even better, the #ASHG14 hashtag.