Leveraging the value of document content - Plexus for agile dissemination of text/data mining results

Product: ChemCurator
Product group: Markush IP

Most of the information “produced and consumed” by the industry is packed into documents, either in structured form (e.g. as tables) or as free text. This heterogeneity and the constant growth in the volume of data make it very challenging to search for specific information, whether simple or complex. One way to extract and structure this information that is feasible nowadays is to use a pipelining tool (e.g. KNIME) to combine several applications from the fields of cheminformatics, OCR, OSR, text/data mining and semantics. Such a pipeline can extract “single-entity” terms, complex linguistic patterns, numerics, units, chemistry, etc.
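As a rough illustration of one such extraction step, the sketch below pulls numeric values and their units out of free text with a regular expression. It is not taken from the talk or from any KNIME node: the unit list, the function name `extract_measurements` and the sample sentence are all assumptions made for this example, and a real workflow would rely on curated vocabularies and dedicated text-mining components.

```python
import re

# Hypothetical unit vocabulary, for illustration only.
UNITS = ["mg/mL", "nM", "mM", "mg", "mL", "°C", "%"]

# Try longer units first so that "mg/mL" is not cut short to "mg".
_unit_alternatives = "|".join(
    re.escape(unit) for unit in sorted(UNITS, key=len, reverse=True)
)

# A number (integer or decimal), optional whitespace, then a known unit.
PATTERN = re.compile(
    rf"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>{_unit_alternatives})"
)


def extract_measurements(text):
    """Return a list of (value, unit) pairs found in free text."""
    return [
        (float(match.group("value")), match.group("unit"))
        for match in PATTERN.finditer(text)
    ]


# Made-up sample sentence, not real data.
sample = "The compound showed an IC50 of 12.5 nM and was dissolved at 10 mg/mL at 37 °C."
print(extract_measurements(sample))
```

Numbers that are not followed by a known unit, such as the "50" in "IC50", are skipped. The structured pairs returned here are the kind of result that could then be loaded into a searchable table.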

Ideally, the results of such complex workflows should then be exposed to the users. Because reaching a satisfactory level of user-friendliness usually takes several iterative and interactive steps, search and visualization solutions must be agile and flexible enough to adapt to varied content. This talk presents some use cases showing how to leverage information from internal and external documents using Plexus in combination with Instant JChem, highlighting the possibilities as well as the difficulties encountered.
