Normalize, check, validate and register chemical compounds
Compound Registration is a system built on a set of web services that helps users decide whether a new small molecule is unique compared with those already stored in the database. The decisions are made according to configurable corporate business logic. The system includes a database (the registry) designed to store the relevant structural and accompanying information. Compounds go through normalization, validation and quality-check steps before registration, and those with issues that can't be fixed automatically fall into a staging area. Upon successful registration, a compound is stored in a three-level hierarchy.
Starting the registration process
New compounds can be added either one by one through the web-based registration form or in bulk (a compound library or a set of structures) through the upload page. The system can handle single- and multi-component structures with all the validated custom data fields. Multi-component entities are composed of two or more components that have already been registered independently. Currently alternates, mixtures, formulations and polymers are supported.
The registration system is also available from other applications. The publicly available web service API makes integration (typically into an ELN) straightforward, and the service can also be accessed from the KNIME and Pipeline Pilot workflow tools.
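As a rough illustration of what such an integration might look like from a calling application, the sketch below builds (but does not send) an HTTP request that submits one compound. The base URL, the endpoint path and all payload field names are hypothetical placeholders, not the actual API of the product; the real service documentation defines them.

```python
import json
from urllib.request import Request

# Hypothetical base URL; a real deployment exposes its own address and paths.
BASE_URL = "https://registry.example.com/api"


def build_registration_request(structure: str, source: str, fields: dict) -> Request:
    """Build, without sending, a POST request that submits one compound.

    All payload keys below are illustrative assumptions.
    """
    payload = {
        "structure": structure,     # e.g. a SMILES or MOL string
        "source": source,           # the calling application, e.g. an ELN
        "additionalData": fields,   # custom data fields, validated server-side
    }
    return Request(
        url=f"{BASE_URL}/registrations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_registration_request("CCO", "ELN", {"project": "DEMO"})
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect and test before it is wired to a live service.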
Validated, unique structures
Compound structures go through a series of standardization, validation and quality-checking steps (e.g. valence error checks and automatic fixes for representation issues) before registration. External IDs and other specified data are also validated based on the configured business logic. Compounds that don't pass the validation phase or can't be fixed automatically fall into the Staging Area for manual revision.
After validation, the newly submitted chemical structures are compared with the records in the registry. Match types include exact matches, stereoisomers, tautomeric structures and their combinations. The system puts a special focus on identifying and representing stereochemistry properly. Chemically Significant Text (CST) fills the gap for compounds where the structure is unknown or only partially known.
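The routing described above can be sketched as a small pipeline: normalize the submission, run a list of checks, and send anything with unresolved issues to staging. This is a minimal illustration only; the `Submission` type, the two checks and the `route` function are hypothetical names, and real structure checks (such as valence validation) are replaced by trivial stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Submission:
    structure: str
    external_id: str = ""
    issues: List[str] = field(default_factory=list)


# A check returns None when the submission passes, or an issue description.
Check = Callable[[Submission], Optional[str]]


def normalize(sub: Submission) -> Submission:
    """Stand-in for automatic fixes: here it only trims whitespace."""
    sub.structure = sub.structure.strip()
    sub.external_id = sub.external_id.strip()
    return sub


def has_structure(sub: Submission) -> Optional[str]:
    return None if sub.structure else "structure is empty"


def has_external_id(sub: Submission) -> Optional[str]:
    return None if sub.external_id else "external ID is missing"


def route(sub: Submission, checks: List[Check]) -> str:
    """Return 'registry' if every check passes, otherwise 'staging'."""
    sub = normalize(sub)
    results = (check(sub) for check in checks)
    sub.issues = [msg for msg in results if msg is not None]
    return "staging" if sub.issues else "registry"


CHECKS = [has_structure, has_external_id]
print(route(Submission(" CCO ", "EXT-1"), CHECKS))  # prints "registry"
print(route(Submission("CCO"), CHECKS))             # prints "staging"
```

Collecting every issue, rather than stopping at the first failure, gives the person reviewing the staging area the full picture in one pass.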
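One way to think about the match types is as comparisons of progressively less specific structure keys. The sketch below assumes each record carries four precomputed keys (full structure, stereo-stripped, tautomer-normalized, and both); the key names and values are opaque placeholders, and this is not a description of how the product actually computes matches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StructureKeys:
    """Hypothetical precomputed comparison keys for one structure."""
    exact: str               # full structure, stereo and tautomer form kept
    no_stereo: str           # stereo information removed
    tautomer: str            # tautomer-normalized, stereo kept
    tautomer_no_stereo: str  # tautomer-normalized, stereo removed


def match_type(new: StructureKeys, existing: StructureKeys) -> Optional[str]:
    """Classify the match from most specific to least specific."""
    if new.exact == existing.exact:
        return "exact"
    if new.no_stereo == existing.no_stereo:
        return "stereoisomer"
    if new.tautomer == existing.tautomer:
        return "tautomer"
    if new.tautomer_no_stereo == existing.tautomer_no_stereo:
        return "tautomer of stereoisomer"
    return None  # no match: the structure is unique under these rules


registered = StructureKeys("E1", "S1", "T1", "TS1")
submitted = StructureKeys("E2", "S1", "T2", "TS1")
print(match_type(submitted, registered))  # prints "stereoisomer"
```

Testing the most specific key first ensures that an exact duplicate is never reported as a mere stereoisomer or tautomer.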
Hierarchical compound storage
Registered compounds are stored in a three-level hierarchical structure of parents, versions and lots (or batches). In this arrangement, a version may describe, for example, an isotopic or salt/solvate variation of a parent, while a lot may represent a production batch of a version.
Salts and solvates are managed in a dedicated dictionary. They can easily be assigned to a compound, along with the appropriate multiplicities, during registration, or extracted from the submitted chemical structure.
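The parent, version and lot levels map naturally onto nested records. The following is a minimal sketch of that shape; the class names, field names and identifier formats are illustrative assumptions rather than the registry's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Lot:
    lot_id: str        # one production batch of a version
    note: str = ""


@dataclass
class Version:
    version_id: str
    description: str   # e.g. which salt or isotopic variant this is
    lots: List[Lot] = field(default_factory=list)


@dataclass
class Parent:
    parent_id: str
    structure: str
    versions: List[Version] = field(default_factory=list)

    def lot_count(self) -> int:
        """Total number of lots across all versions of this parent."""
        return sum(len(version.lots) for version in self.versions)


# Hypothetical identifiers, for illustration only.
parent = Parent("P-1", "CCO")
free_form = Version("P-1-V1", "free form", [Lot("P-1-V1-L1"), Lot("P-1-V1-L2")])
salt_form = Version("P-1-V2", "salt form", [Lot("P-1-V2-L1")])
parent.versions.extend([free_form, salt_form])
print(parent.lot_count())  # prints 3
```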
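To illustrate how a salt/solvate dictionary with multiplicities might be used, the sketch below computes the molecular weight of a version from its parent weight and its assigned components. The dictionary layout and function name are assumptions made for this example, and the parent weight of 300.0 is an arbitrary illustrative value.

```python
from typing import Dict

# Hypothetical dictionary of salts and solvates: name -> molecular weight (g/mol).
SALT_DICTIONARY: Dict[str, float] = {
    "hydrochloride": 36.46,  # HCl
    "hydrate": 18.015,       # H2O
}


def version_weight(parent_weight: float, components: Dict[str, int]) -> float:
    """Parent weight plus each salt/solvate weight times its multiplicity."""
    total = parent_weight
    for name, multiplicity in components.items():
        if name not in SALT_DICTIONARY:
            raise KeyError(f"unknown salt or solvate: {name}")
        total += multiplicity * SALT_DICTIONARY[name]
    return total


# A dihydrochloride monohydrate of a parent with an illustrative weight of 300.0
weight = version_weight(300.0, {"hydrochloride": 2, "hydrate": 1})
print(round(weight, 3))  # prints 390.935
```

Rejecting names that are missing from the dictionary mirrors the idea that salts and solvates are controlled vocabulary rather than free text.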
Amendment of registered entities
Authorized users can amend existing compounds in the registry. The system allows users to change the molecular structure or alter any related data on different hierarchy levels. All amendments committed to the registry are fully audited, and a complete data history can be displayed for selected entries upon request.
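A fully audited amendment can be pictured as a change that is never applied without also recording who changed what, and when. The sketch below shows one simple way to do that in memory; the class and field names are hypothetical and say nothing about how the registry stores its history.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass(frozen=True)
class AuditEntry:
    user: str
    field_name: str
    old_value: Any
    new_value: Any
    timestamp: datetime


class AuditedRecord:
    """A record whose every amendment is appended to a history list."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = dict(data)
        self.history: List[AuditEntry] = []

    def get(self, field_name: str) -> Any:
        return self._data.get(field_name)

    def amend(self, user: str, field_name: str, new_value: Any) -> None:
        # Record the change first, then apply it.
        entry = AuditEntry(
            user=user,
            field_name=field_name,
            old_value=self._data.get(field_name),
            new_value=new_value,
            timestamp=datetime.now(timezone.utc),
        )
        self.history.append(entry)
        self._data[field_name] = new_value


record = AuditedRecord({"structure": "CCO", "comment": "initial"})
record.amend("alice", "comment", "re-checked purity")
for entry in record.history:
    print(entry.user, entry.field_name, entry.old_value, "->", entry.new_value)
```

Because entries are immutable and only ever appended, the complete history of a field can be replayed from the first value to the current one.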
Deployment, configuration and integration
The Compound Registration system requires a Tomcat container. The publicly available REST API enables simple integration into an existing IT environment. The registration database currently supports the Oracle and MySQL RDBMSs, using JChem tables and additional relational tables to capture the data. To improve integration with downstream databases and processes, Compound Registration offers a publishing engine that mirrors its content in flexible formats (downstream RDBMS, XML, SDF).
Large parts of the internal business logic are configurable, which makes it easy to adapt the system to different corporate business rules. The system offers a web interface for all the exposed configuration options. It also supports the typical authentication modes and combines role-based access control with permission handling based on project memberships.
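Combining role-based access control with project membership can be read as two conditions that must both hold: one of the user's roles grants the action, and the user belongs to the project concerned. A minimal sketch under that assumption follows; the role names, actions and project names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical role definitions: role name -> actions it grants.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"view"},
    "registrar": {"view", "register"},
    "curator": {"view", "register", "amend"},
}


@dataclass
class User:
    name: str
    roles: Set[str] = field(default_factory=set)
    projects: Set[str] = field(default_factory=set)


def is_allowed(user: User, action: str, project: str) -> bool:
    """Allow an action only if a role grants it AND the user is in the project."""
    role_grants = any(
        action in ROLE_PERMISSIONS.get(role, set()) for role in user.roles
    )
    return role_grants and project in user.projects


alice = User("alice", roles={"curator"}, projects={"oncology"})
print(is_allowed(alice, "amend", "oncology"))    # prints True
print(is_allowed(alice, "amend", "cardiology"))  # prints False
```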