…normalize, check, validate and register chemical compounds
ChemAxon’s Compound Registration is a system built on a set of web services that helps users register molecular structures in a compound database. The registration process determines whether a submitted structure is unique relative to the structures already contained in the database.
The registration’s business rules are configurable, so uniqueness can be defined flexibly in accordance with enterprise-level requirements. Incoming compound submissions are processed by these business rules, and fully audited registration data is stored for every unique compound, which minimizes the amount of manual curation.
Registration as a Quality Service
The registration process is invoked by submitting a compound to the Autoregistration service from an upstream system (e.g. an ELN), or via a browser-based Registration Client, which contains a form-based user interface for individual compounds and a Bulk Loader Client for external SD files.
Registration from an upstream data source may also happen through a Poller Web Service, which actively collects new compounds that are flagged ready to be registered.
Prior to registration, external IDs and any other specified data are validated according to the configured business logic. In addition, compound structures undergo a series of standardization, validation and quality checking steps. Some checkers serve purely as quality checks (e.g. substructures), whereas others are paired with automatic Structure Fixers. Compounds that cannot be fixed automatically are sent to a Staging Area for manual registration.
Hierarchical compound storage
Registered compounds are stored in a three-level hierarchy of Parents, Versions and Lots. In this hierarchy a Version may describe, for example, an isotopic or salt/solvate variation of a Parent, and a Lot may represent a production batch of a Version.
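The three-level hierarchy can be pictured as a small tree. The Python sketch below is only an illustration: the class and field names (`Parent`, `Version`, `Lot`, `structure`, `variation`, `batch_id`) are hypothetical and do not reflect the actual database schema of the product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Lot:
    """A production batch of one Version (hypothetical field name)."""
    batch_id: str


@dataclass
class Version:
    """An isotopic or salt/solvate variation of a Parent."""
    variation: str
    lots: List[Lot] = field(default_factory=list)


@dataclass
class Parent:
    """The parent structure at the top of the hierarchy."""
    structure: str
    versions: List[Version] = field(default_factory=list)

    def lot_count(self) -> int:
        # Count every Lot stored under every Version of this Parent.
        return sum(len(version.lots) for version in self.versions)


# Example: one Parent with a single Version that has two Lots.
parent = Parent(structure="CC(=O)O")  # acetic acid written as SMILES
sodium_salt = Version(variation="sodium salt",
                      lots=[Lot("batch-1"), Lot("batch-2")])
parent.versions.append(sodium_salt)
```

In this toy model a Lot can only be reached through its Version, and a Version only through its Parent, which mirrors the way the three levels depend on each other.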
Various Contexts for Compounds
Register multi-component entities
Besides single-compound registration, the current system supports three types of multi-component structures: alternates, mixtures and formulations. An alternate covers ambiguous analytical results: several candidate structures are drawn, but only one of them reflects the true identity of the compound.
Mixtures arise from production processes (i.e. reactions) in which the final product contains more than one compound in a significant amount, whereas formulations are the result of creating specific compositions of compounds. The constituents of formulations are specified in absolute percentages, and those of mixtures in percentage ranges.
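The difference between absolute percentages and percentage ranges can be illustrated with two simple consistency checks. This is a hedged Python sketch, not the product's validation logic: the function names and both rules (formulation percentages add up to 100, and mixture ranges must allow a total of 100) are assumptions made for illustration.

```python
from typing import List, Tuple


def formulation_is_complete(percentages: List[float], tol: float = 1e-6) -> bool:
    """Assumed rule: absolute percentages are positive and add up to 100."""
    return all(p > 0 for p in percentages) and abs(sum(percentages) - 100.0) <= tol


def mixture_ranges_are_feasible(ranges: List[Tuple[float, float]]) -> bool:
    """Assumed rule: some choice of values inside the ranges totals 100."""
    if any(low < 0 or high > 100 or low > high for low, high in ranges):
        return False
    lowest_total = sum(low for low, _ in ranges)
    highest_total = sum(high for _, high in ranges)
    return lowest_total <= 100.0 <= highest_total
```

For example, a formulation given as 70 % and 30 % passes the first check, and a mixture given as 60 to 80 % and 20 to 40 % passes the second, because 100 lies between the summed lower and upper bounds.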
Representing salts & solvates
Salt or solvate compound isoforms are displayed in a compact image, which is generated on the fly from the parent structure and the salt/solvate data.
Multiplicities different from 1 are depicted with brackets having their multiplicity as the bracket index. The calculated molecular mass and the dot separated molecular formula include the multiplicity of salt and solvate. The salt and solvate dictionary is centrally administered using the dedicated tab on the user interface.
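How multiplicity enters the formula and the mass can be shown with a short calculation. The Python sketch below is an assumption-laden illustration: the helper names, the `Component` tuple layout, and the choice to write the multiplicity as a leading coefficient in the formula are all hypothetical, and the masses are supplied by the caller rather than computed from the structure.

```python
from typing import List, Tuple

# (formula, molecular mass, multiplicity) of one salt or solvate component
Component = Tuple[str, float, int]


def dotted_formula(parent_formula: str, components: List[Component]) -> str:
    """Join parent and salt/solvate formulas with dots, prefixing multiplicities above 1."""
    parts = [parent_formula]
    for formula, _mass, multiplicity in components:
        prefix = "" if multiplicity == 1 else str(multiplicity)
        parts.append(f"{prefix}{formula}")
    return ".".join(parts)


def total_mass(parent_mass: float, components: List[Component]) -> float:
    """Add each component's mass once per unit of multiplicity to the parent mass."""
    return parent_mass + sum(mass * mult for _formula, mass, mult in components)


# Example: a parent with two HCl units and one water of solvation.
components = [("ClH", 36.46, 2), ("H2O", 18.02, 1)]
formula = dotted_formula("C10H15N", components)
mass = total_mass(149.23, components)
```

Here `formula` evaluates to `C10H15N.2ClH.H2O`, and the mass is the parent mass plus twice the HCl mass plus the water mass.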
Correction of failed submissions
Submissions that failed any validation or structure checker steps are collected in the Staging Area, where a chemist or a registrar (depending on the security settings) can manually fix them either one by one or in bulk. Changes made on a submission can be saved, and any previous version can be restored.
A Unique Management of Your Registration Database
The Compound Registration process identifies structural matches between a newly submitted compound and the records in the registration database. The match types include exact structural matches, stereoisomers, tautomeric structures, and their combinations. Due to the hierarchical data structure, isotopic and charge variations are neglected. Special focus lies on proper identification of stereochemistry (unknown, major, etc. as attached data). For compounds with unknown or partially known structure, the Chemically Significant Text (CST) offers an option for match analysis. “Mock registration” enables users to preview the validated molecule along with all possible matching structures in the database.
Amendment of the registry
Authorized users, such as registrars, can amend existing compounds in the registry. An amendment changes either the molecular structure or any related data at one of the hierarchy levels (e.g. CST, salt/solvate info, or any lot-level data). Amendments at a higher level modify the whole sub-tree (e.g. a change at Parent level affects all Versions and Lots below it). For amendments at lower levels, registration IDs are kept whenever possible. All amendments committed to the registry are fully audited, and a complete data history can be displayed for selected entries upon request.
Searching the registration database
Our thin client provides a Search tab, enabling users to query the registration database for specific compound submissions or registered compound groups. If desired, alternative interfaces (e.g. IJC) can be used for the search. Query criteria include full structure and substructure searches as well as descriptor field queries (e.g. PCN, CN, CST, molecular weight, submitting user, and date). For numeric fields the regular comparison operators (“=” and the usual inequality operators) are supported. All returned results carry a direct link to the amendment page, simplifying the process of database curation.
The Compound Registration system requires a Java-enabled web server (e.g. Tomcat). The Web Services utilized by the registration system enable simple integration into your existing IT environment. In fully integrated environments external data connections are realized via middle-tier Web Services for data polling from upstream systems and for populating downstream databases.
The registration database is currently deployed as an Oracle or MySQL RDB, utilizing JChem tables and additional relational tables for data capture. This ensures that, besides our dedicated web client, other JChem-enabled front-ends (e.g. Instant JChem) and custom back-end tools can easily interface with the registration database.
Ease of configurability
Large parts of the internal business logic are configurable to facilitate easy adaptation to different corporate business rules. For instance, the formats of Corporate Registration IDs are freely definable, and may vary at any of the hierarchy levels. Configuration files are stored on the server side in XML format. The registration service currently utilizes an action-driven security model based on user roles, which determine the specific operations a user may perform.
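An action-driven, role-based security model can be reduced to a lookup from roles to permitted actions. The Python sketch below is a minimal illustration under stated assumptions: the role names, the action names and the `is_allowed` helper are hypothetical, and an in-memory dictionary stands in for the server-side XML configuration.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical role-to-action mapping; the real roles and actions
# are defined in the server-side configuration.
ROLE_ACTIONS: Dict[str, FrozenSet[str]] = {
    "chemist": frozenset({"submit", "search", "fix_submission"}),
    "registrar": frozenset({"submit", "search", "fix_submission", "amend"}),
}


def is_allowed(user_roles: Iterable[str], action: str) -> bool:
    """Return True if at least one of the user's roles permits the action."""
    return any(action in ROLE_ACTIONS.get(role, frozenset())
               for role in user_roles)
```

With this mapping, `is_allowed(["registrar"], "amend")` is true while `is_allowed(["chemist"], "amend")` is false, and a role missing from the mapping permits nothing.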