Over the past few years, one of the most surprising community effects has been playing out on chemicalize.org: the building of a huge crowdsourced database of interesting chemical structures. How?
Webpage Viewer and Document Viewer, two major services of the free chemicalize.org website, save all visited URLs along with the chemical structures found on each page. I won’t detail how this is done here, but it involves things like Document to Structure and being able to understand every chemical format out there.
chemicalize.org now has 15,000 unique visitors a month, a huge growth compared to spring 2012. These users contribute to the database every day, keeping it up to date and adding new areas of interest as well. The database today contains 327,000 structures that were converted from 545,000 names and identifiers coming from 367,000 webpages. To understand the value of this database, we put it to the test and submitted it to PubChem.
The process was easy: sign up for a depositor account, then create an SDF with the structures and some data fields (registration ID, substance URL). After a few tries, some fighting with their standardization tools, and curation by a nice fella, it’s now in PubChem’s Substance database. Before any evaluation, we became the 27th biggest depositor among 219. That’s huge!
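For readers who haven’t built an SDF before, here is a rough sketch of what one record with two data fields looks like. It is not the script we used. The field names `REGISTRATION_ID` and `SUBSTANCE_URL`, the example URL, and the toy methane molblock are all placeholders; a real deposition would use the tag names from PubChem’s deposition documentation and molblocks exported from a cheminformatics toolkit.

```python
from typing import Mapping

# Toy V2000 molblock for a single carbon atom (illustrative placeholder only).
METHANE_MOLBLOCK = (
    "methane\n"
    "  hypothetical-sketch\n"
    "\n"
    "  1  0  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n"
    "M  END"
)


def sd_record(molblock: str, fields: Mapping[str, str]) -> str:
    """Build one SD-file record: molblock, data items, then the $$$$ terminator."""
    lines = [molblock.rstrip("\n")]
    for name, value in fields.items():
        lines.append(f"> <{name}>")  # data header line
        lines.append(str(value))     # data value
        lines.append("")             # blank line closes the data item
    lines.append("$$$$")             # record separator
    return "\n".join(lines) + "\n"


# Field names and values below are hypothetical, not PubChem's actual tags.
record = sd_record(
    METHANE_MOLBLOCK,
    {
        "REGISTRATION_ID": "EXAMPLE-0001",
        "SUBSTANCE_URL": "http://example.org/structure/EXAMPLE-0001",
    },
)
```

An SDF is just such records concatenated one after another, so a full file could be produced by joining the output of `sd_record` for every structure.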
What’s immediately clear is that 20-25% of our database is brand new to PubChem. Filtering by Rule of 5, we still have 38,000 structures. Filtering to same parent and connectivity, 42,000. PubChem has 35 million structures (or more) from a huge array of sources, and this tiny website with its community provided a large set of novel structures? Now that’s huge!
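In case the Rule of 5 filter is unfamiliar, it can be sketched like this. The thresholds are Lipinski’s usual ones (molecular weight at most 500, logP at most 5, at most 5 hydrogen bond donors, at most 10 acceptors). How many violations to tolerate is an assumption here, since it varies between tools, so it is left as a parameter. The descriptor values are assumed to be precomputed elsewhere, and the two example entries are made-up numbers, not real compounds from the deposition.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties for one structure (values come from another tool)."""
    mol_weight: float
    logp: float
    h_donors: int
    h_acceptors: int


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four criteria the structure breaks."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """True if the structure has at most `max_violations` violations (assumed cutoff)."""
    return rule_of_five_violations(d) <= max_violations


# Illustrative, made-up descriptor values.
small = Descriptors(mol_weight=180.2, logp=1.2, h_donors=1, h_acceptors=4)
bulky = Descriptors(mol_weight=812.0, logp=6.3, h_donors=7, h_acceptors=12)

kept = [d for d in (small, bulky) if passes_rule_of_five(d)]
```

Here `kept` would contain only `small`, since `bulky` breaks all four criteria.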
Follow these links to start playing with the data:
The exact value of this deposition will take weeks and months to realize, and some are already working on it. A hat tip to Chris Southan, who also gave us the idea. If you have interesting findings, do share!