Development of a Means to Chemically Search ELN and Other Documents Using the ChemAxon Document-to-Database Program

September 2013
This talk will describe the development of a means to chemically search the DuPont Electronic Laboratory Notebook (ELN) database using the new ChemAxon Document-to-Database program. The goal was to explore whether prior work could be better leveraged when it resides in a document repository with no built-in chemical intelligence beyond the names and chemical structures embedded in, e.g., Microsoft Word documents. Issues and decisions encountered along the way will be discussed, including what to extract (chemical names, “live” structures, structure images, document metadata), which tool to use, where to store the extracted structures, what user search interface to provide for the extracted information, how to secure the extracted structures and document metadata, and how to handle problematic structures (e.g., functional group aliases like “NTs” that were not true abbreviated structural groups).