Document to Structure Code Examples

Document to Structure is a toolkit for extracting chemical structures from text, HTML, and PDF documents. It currently recognizes chemical names, SMILES strings, and InChI identifiers. Its API class is chemaxon.naming.DocumentExtractor. Below is a list of real-life use cases and code examples that show the various ways to use it:

  1. Finding structures in text:
    Uses DocumentExtractor's processPlainText() method to process a string.
  2. Finding structures in a live webpage:
    Downloads a live webpage and processes it using DocumentExtractor's processHTML() method.
  3. Finding structures in a PDF document:
    Creates a DocumentExtractor instance that reads the text from the PDF document.
  4. Highlighting recognized structures in a webpage:
    Finds the recognized names in the HTML code and wraps them with a special element for highlighting.
  5. Saving results in an SDF or MRV file:
    Saves the results and related information into a multi-molecule file for use in chemical editors.
  6. Storing results in a JChem structure table:
    Sets up a database connection and stores the hits in a chemical structure database for searching.
  7. Increasing processing speed by multithreading:
    Breaks HTML pages into fragments and processes them using multiple threads.
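
The wrapping step of use case 4 can be sketched independently of the toolkit. This is a minimal sketch, not the DocumentExtractor API: it assumes the character offsets of each recognized name in the HTML source are already known, and the class HighlightSketch, the record Span, the method highlight, and the CSS class chem-hit are all hypothetical names chosen for illustration. It requires Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class HighlightSketch {

    /** Hypothetical hit position in the HTML string: start inclusive, end exclusive. */
    public record Span(int start, int end) {}

    /**
     * Wraps every span of the HTML string in a highlighting element.
     * Assumes the offsets index the raw HTML and that spans do not overlap.
     */
    public static String highlight(String html, List<Span> spans) {
        // Process hits left to right regardless of the order they were reported in.
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt(Span::start));

        StringBuilder out = new StringBuilder(html.length() + spans.size() * 32);
        int cursor = 0; // end of the text already copied to the output
        for (Span s : sorted) {
            if (s.start() < cursor || s.start() > s.end() || s.end() > html.length()) {
                throw new IllegalArgumentException("invalid or overlapping span: " + s);
            }
            out.append(html, cursor, s.start());            // text before the hit
            out.append("<span class=\"chem-hit\">");        // hypothetical highlight element
            out.append(html, s.start(), s.end());           // the recognized name itself
            out.append("</span>");
            cursor = s.end();
        }
        out.append(html, cursor, html.length());            // remainder after the last hit
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>Aspirin and caffeine</p>";
        // Offsets of "Aspirin" (3..10) and "caffeine" (15..23) in the string above.
        List<Span> hits = List.of(new Span(15, 23), new Span(3, 10));
        System.out.println(highlight(html, hits));
    }
}
```

Copying the untouched text between hits with a single moving cursor keeps the original markup intact and avoids re-searching the HTML, which could otherwise match a name inside a tag or attribute.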

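The fragment-and-thread idea behind use case 7 can be outlined with the standard java.util.concurrent classes. This is a hedged sketch under stated assumptions: splitting after each closing </p> tag is a hypothetical boundary rule (chosen so that a name inside a paragraph is not cut in two), and the extractor is a stand-in Function rather than a real DocumentExtractor call. The names ParallelFragmentsSketch, splitIntoFragments, and processInParallel are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelFragmentsSketch {

    /** Hypothetical boundary rule: cut the HTML right after every closing paragraph tag. */
    public static List<String> splitIntoFragments(String html) {
        List<String> fragments = new ArrayList<>();
        String marker = "</p>";
        int from = 0;
        int idx;
        while ((idx = html.indexOf(marker, from)) >= 0) {
            int end = idx + marker.length();
            fragments.add(html.substring(from, end));
            from = end;
        }
        if (from < html.length()) {
            fragments.add(html.substring(from)); // trailing text without a closing tag
        }
        return fragments;
    }

    /** Runs the extractor on each fragment in a fixed-size pool; results keep fragment order. */
    public static <R> List<R> processInParallel(List<String> fragments,
                                                Function<String, R> extractor,
                                                int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (String fragment : fragments) {
                futures.add(pool.submit(() -> extractor.apply(fragment)));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) {
                results.add(f.get()); // waits for this fragment; order matches the input
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        String html = "<p>benzene</p><p>toluene</p><p>phenol</p>";
        // Stand-in extractor (upper-casing); a real program would run the toolkit here.
        List<String> out = processInParallel(splitIntoFragments(html), String::toUpperCase, 3);
        System.out.println(out);
    }
}
```

Collecting the futures in submission order means results line up with fragments even though threads finish in any order. If hit positions are needed relative to the whole page, each fragment's starting offset would have to be added back; that bookkeeping is omitted here.
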