Benchmarking ChemAxon’s Name-to-Structure batch tool on patent text
SureChem is a provider of chemistry patent search products based on a database of 12 million chemical structures extracted from the full text of EP, US and WO patent documents, as well as from Japanese patent abstracts and MEDLINE abstracts. The database is generated by annotating chemical names in text and then using a series of third-party name-to-structure generation tools to convert the names into valid chemical structures. SureChem began benchmarking ChemAxon’s name-to-structure tool in April 2009 and added the tool to its production pipeline in early 2010. Here we present performance statistics spanning that period, which point to a major improvement in the precision and recall of ChemAxon’s tool and to its increased importance in SureChem’s data generation process.
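For readers unfamiliar with the two metrics, the following is a minimal sketch of how precision and recall might be computed for a name-to-structure tool. It is not SureChem's benchmarking code. It assumes a reference set mapping each chemical name to an expected structure identifier, and it assumes that identifiers are canonical strings (for example, canonical SMILES or InChI) so that exact string comparison is meaningful. The function name `precision_recall`, the definitions used (precision as correct conversions over attempted conversions, recall as correct conversions over all reference names), and the toy data are all illustrative assumptions.

```python
from typing import Dict, Optional, Tuple


def precision_recall(
    reference: Dict[str, str],
    predicted: Dict[str, Optional[str]],
) -> Tuple[float, float]:
    """Score name-to-structure output against a reference set.

    reference: chemical name -> expected structure identifier
    predicted: chemical name -> identifier produced by the tool, or None
               when the tool produced no structure for that name
    """
    attempted = 0  # names for which the tool produced any structure
    correct = 0    # names whose produced structure matches the reference
    for name, expected in reference.items():
        got = predicted.get(name)
        if got is None:
            continue  # no output: counts against recall, not precision
        attempted += 1
        if got == expected:
            correct += 1
    precision = correct / attempted if attempted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy data, not taken from the benchmark.
reference = {
    "ethanol": "CCO",
    "benzene": "c1ccccc1",
    "acetic acid": "CC(=O)O",
    "methane": "C",
}
predicted = {
    "ethanol": "CCO",
    "benzene": "C1CCCCC1",  # wrong structure (cyclohexane)
    "acetic acid": None,    # tool produced nothing
    "methane": "C",
}

precision, recall = precision_recall(reference, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the tool attempts three of the four names and gets two right, so precision is 2/3 and recall is 2/4. A tool that declines to convert doubtful names can raise precision at the cost of recall, which is why the two figures are reported together.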