SBS meeting report shows strong growth of PubChem database #SBS10
I’m blogging this week from the 2010 Society for Biomolecular Sciences meeting with my colleague Mary Canady. I’ll be covering the scientific sessions and sharing interesting developments in drug discovery and screening technologies. You can also follow the #SBS10 hashtag on Twitter for updates.
On Tuesday, Steve Bryant from NIH gave a report on the increasing utility of the NIH PubChem database for pharma drug screening programs. PubChem, named after PubMed, the go-to resource for life science research abstracts, is an open repository for structure and activity information about molecules that have drug potential. The database is being developed under the GenBank model, wherein researchers are encouraged to upload the results of screening runs so that this information can be linked to in publications and accessed by others.
As a resource, PubChem has seen strong adoption by researchers in industry and academia. To give a quick snapshot, the number of contributing organizations has grown 5-fold over the past 5 years. 60,000 users submit data daily, and the number of unique substances is now around 70 million. For each of these compounds, information on bioactivity is also being collected. Over 90 million activities are associated with these compounds, and the rate of increase of this bioassay data is on the steep part of the exponential growth curve, meaning the number I just wrote is already wrong. While the bioassay information currently requires direct upload, there are plans to derive additional data from published literature.
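For a rough sense of what "5-fold over 5 years" means per year, here is a back-of-the-envelope sketch. It assumes the growth was steady and compounding, which the talk did not claim, and the function name is my own.

```python
def annual_growth_rate(fold_change: float, years: float) -> float:
    """Constant yearly growth rate that compounds to `fold_change` over `years`.

    Assumes steady compounding growth, which is an illustration only.
    """
    return fold_change ** (1.0 / years) - 1.0


# 5-fold growth in contributing organizations over 5 years
rate = annual_growth_rate(5, 5)
print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption, the count of contributing organizations would have grown by a bit under 40% each year.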
Following Steve’s report, Josh Bittker gave a brief summary of how the Broad Institute is using PubChem. They have a reporting mandate as part of a grant and are developing a pipeline for automatically submitting machine-readable assay results to PubChem. As part of this automatic reporting, the data is automatically embargoed for one year.
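To make the embargo idea concrete, here is a minimal sketch of how a submission record might carry its release date. This is not the Broad's pipeline: the `AssaySubmission` record, the function names, and the choice to start the embargo clock at the assay run date are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AssaySubmission:
    """Hypothetical machine-readable record for one assay result set."""
    assay_name: str
    run_date: date
    release_date: date  # date the data would become public


def embargo_release(run_date: date, years: int = 1) -> date:
    """Return the date `years` after `run_date` (assumed embargo start)."""
    try:
        return run_date.replace(year=run_date.year + years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        return run_date.replace(year=run_date.year + years, day=28)


def prepare_submission(assay_name: str, run_date: date) -> AssaySubmission:
    """Attach a one-year embargo release date to a result set."""
    return AssaySubmission(assay_name, run_date, embargo_release(run_date))


record = prepare_submission("example-assay", date(2010, 4, 13))
print(record.release_date.isoformat())
```

The point is only that the embargo can travel with the data, so nobody has to remember to release it by hand a year later.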
Simone Graeber then gave an update on an EU effort along the lines of PubChem. They go a little further than just a database, developing their own library of 0.5 million compounds. A 17,000-compound subset derived from this library covers much of the “activity space” of the larger set and is more usable by smaller groups that lack the resources to screen the whole library. You may be wondering, as someone in the audience did, why they’re developing their own database instead of submitting to PubChem. Apparently they included some proprietary compounds in their library, and there are legal issues complicating the assay result reporting.