Compound Registration

…normalize, check, validate and register chemical compounds

ChemAxon’s Compound Registration is a system built on a set of web services that helps users register molecular structures into a compound database. The registration process identifies unique compounds by comparing each submission against the structures already contained in the database.
The configurable business rules of the registration allow uniqueness to be defined flexibly, in accordance with enterprise-level requirements. Incoming compound submissions are processed by these business rules, and fully audited registration data is stored for all unique compounds while the amount of manual curation is kept to a minimum.

Product Type: application
Interfaces: GUI (Web), API (Java)

Registration as a Quality Service

Process overview

The registration process is invoked by submitting a compound to the Autoregistration service from an upstream system (e.g. an ELN), or via a browser-based Registration Client that contains a form-based user interface for individual compounds and a Bulk Loader Client for external SD files.

File support

Registration from an upstream data source may also happen through a Poller Web Service, which actively collects new compounds that are flagged ready to be registered.

Validated structures

Prior to registration, external IDs and any other specified data are validated according to the configured business logic. In addition, compound structures undergo a series of standardization, validation and quality checking steps. Some checkers serve as quality checks (e.g. substructures), whereas others pair with automatic Structure Fixers. Compounds that could not be fixed automatically fall into a Staging Area for manual registration.
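
As a rough illustration of the checker and fixer flow described above, the following minimal Python sketch runs a list of checkers over a submission, applies a paired fixer where one exists, and routes anything still failing to staging. The `Submission` and `Checker` classes, the `process` function and the toy string-based checks are hypothetical and are not part of the Compound Registration API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Submission:
    structure: str                      # assumed here to be a plain string such as SMILES
    issues: List[str] = field(default_factory=list)


@dataclass
class Checker:
    name: str
    detect: Callable[[str], bool]                 # returns True if the problem is present
    fix: Optional[Callable[[str], str]] = None    # paired fixer, if the checker has one


def process(submission: Submission, checkers: List[Checker]) -> str:
    """Run all checkers; return 'register' if clean, otherwise 'staging'."""
    for checker in checkers:
        if not checker.detect(submission.structure):
            continue
        if checker.fix is not None:
            submission.structure = checker.fix(submission.structure)
            if not checker.detect(submission.structure):
                continue                          # problem was fixed automatically
        submission.issues.append(checker.name)    # left for manual correction
    return "staging" if submission.issues else "register"


# Toy checkers that operate on plain strings (hypothetical examples only).
checkers = [
    Checker("surrounding whitespace", lambda s: s != s.strip(), lambda s: s.strip()),
    Checker("empty structure", lambda s: s.strip() == ""),
]
```

In this sketch a submission such as `Submission("  CCO ")` is cleaned by the fixer and passes, while an empty structure has no fixer and ends up in staging with the failing checker name recorded.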

Hierarchical Compound storage

Registered compounds are stored in a three-level hierarchy of Parents, Versions and Lots. In this arrangement, a Version may describe, for example, isotopic or salt/solvate variations of a Parent, and a Lot may represent a production batch of a Version.

Salts and solvates are managed in a dedicated dictionary. During registration they can be assigned to the compound with their appropriate multiplicities, and the information will be stored on the Version level.
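
The three-level hierarchy can be pictured with a small data model. This is only a sketch under assumptions: the class names mirror the terms used above, but the field names and the ID values are hypothetical and do not reflect the actual database schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SaltSolvate:
    name: str              # entry from the salt/solvate dictionary
    multiplicity: int      # how many units are attached to the compound


@dataclass
class Lot:
    lot_id: str            # one production batch of a Version


@dataclass
class Version:
    version_id: str
    salts_solvates: List[SaltSolvate] = field(default_factory=list)  # stored on Version level
    lots: List[Lot] = field(default_factory=list)


@dataclass
class Parent:
    parent_id: str
    structure: str
    versions: List[Version] = field(default_factory=list)


def count_lots(parent: Parent) -> int:
    """Total number of Lots under all Versions of a Parent."""
    return sum(len(version.lots) for version in parent.versions)


# Hypothetical example: one Parent with a free form and a hydrochloride Version.
parent = Parent("P-1", "CCN", versions=[
    Version("P-1-V1", lots=[Lot("P-1-V1-L1")]),
    Version("P-1-V2", salts_solvates=[SaltSolvate("HCl", 1)],
            lots=[Lot("P-1-V2-L1"), Lot("P-1-V2-L2")]),
])
```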


Various Contexts for Compounds

Register multi-component entities

Besides single-compound registration, the current system supports three types of multi-component structures: alternates, mixtures and formulations. An alternate covers ambiguous analytical results: several candidate structures are drawn, and only one of them reflects the true identity of the compound.

Multi-component entities: alternates, mixtures, formulations

Mixtures arise from production processes (i.e. reactions) where the final product contains more than one compound in significant amounts, whereas formulations are the result of creating specific compositions of compounds. The constituents of formulations are specified in absolute percentages, and those of mixtures in percentage ranges.
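
To make the difference between absolute percentages and percentage ranges concrete, here is a small Python sketch of how such compositions might be sanity-checked. The validation rules (each value within 0 to 100, totals not exceeding 100) are assumptions made for illustration; the source does not state which rules the product applies, and the function names are hypothetical.

```python
from typing import Dict, Tuple


def valid_formulation(percentages: Dict[str, float], tol: float = 1e-9) -> bool:
    """Formulation: each constituent has one absolute percentage.

    Assumed rule: every value lies in (0, 100] and the total does not exceed 100.
    """
    values = list(percentages.values())
    return all(0 < p <= 100 for p in values) and sum(values) <= 100 + tol


def valid_mixture(ranges: Dict[str, Tuple[float, float]]) -> bool:
    """Mixture: each constituent has a (low, high) percentage range.

    Assumed rule: 0 <= low <= high <= 100 and the lower bounds together
    do not exceed 100.
    """
    bounds = list(ranges.values())
    in_order = all(0 <= low <= high <= 100 for low, high in bounds)
    return in_order and sum(low for low, _ in bounds) <= 100
```

For example, under these assumed rules `valid_formulation({"A": 70.0, "B": 30.0})` holds, while a mixture whose lower bounds already add up to more than 100 percent is rejected.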

Representing salts & solvates

Salt or solvate compound isoforms are displayed in a compact image, which is generated on the fly from the parent structure and the salt/solvate data.

Multiplicities different from 1 are depicted with brackets having their multiplicity as the bracket index. The calculated molecular mass and the dot separated molecular formula include the multiplicity of salt and solvate. The salt and solvate dictionary is centrally administered using the dedicated tab on the user interface.
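
The way multiplicities enter the formula and the mass can be sketched as follows. The exact rendering used by the product is not specified here, so the prefix style (for example `2HCl`) and the function names are assumptions; the masses in the usage note are rounded textbook values.

```python
from typing import List, Tuple

# Each component is (formula, molar mass in g/mol, multiplicity).
Component = Tuple[str, float, int]


def combined_formula(parent_formula: str, components: List[Component]) -> str:
    """Dot-separated formula; a multiplicity of 1 is not written out."""
    parts = [parent_formula]
    for formula, _mass, multiplicity in components:
        prefix = "" if multiplicity == 1 else str(multiplicity)
        parts.append(prefix + formula)
    return ".".join(parts)


def combined_mass(parent_mass: float, components: List[Component]) -> float:
    """Parent mass plus each salt/solvate mass times its multiplicity."""
    return parent_mass + sum(mass * multiplicity
                             for _formula, mass, multiplicity in components)
```

With a parent of formula `C8H11NO2` (153.18 g/mol) and two units of HCl (36.46 g/mol each), `combined_formula` gives `C8H11NO2.2HCl` and `combined_mass` gives about 226.10 g/mol.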

Correction of failed submissions

Submissions that failed any validation or structure checker steps are collected in the Staging Area, where a chemist or a registrar (depending on the security settings) can manually fix them either one by one or in bulk. Changes made on a submission can be saved, and any previous version can be restored.


A Unique Management of Your Registration Database

Ensuring uniqueness

The Compound Registration process identifies structural matches between a newly submitted compound and the records in the registration database. The match types include exact structural matches, stereoisomers, tautomeric structures, and their combinations. Due to the hierarchical data structure, isotopic and charge variations are neglected. Special focus lies on proper identification of stereochemistry (unknown, major, etc. as attached data). For compounds with unknown or partially known structure, the Chemically Significant Text (CST) offers an option for match analysis. “Mock registration” enables users to preview the validated molecule along with all possible matching structures in the database.
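
One way to picture the match types is as comparisons between progressively less specific structure keys. The sketch below assumes four precomputed, opaque keys per structure (with and without stereo information, with and without tautomer normalization). These keys, the `Keys` class and the `match_type` function are hypothetical; how the product actually computes matches is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Keys:
    exact: str                # full structure key
    no_stereo: str            # key with stereo information removed
    tautomer: str             # key of a canonical tautomer, stereo kept
    tautomer_no_stereo: str   # canonical tautomer with stereo removed


def match_type(a: Keys, b: Keys) -> Optional[str]:
    """Return the most specific match type between two structures, or None."""
    if a.exact == b.exact:
        return "exact"
    if a.no_stereo == b.no_stereo:
        return "stereoisomer"
    if a.tautomer == b.tautomer:
        return "tautomer"
    if a.tautomer_no_stereo == b.tautomer_no_stereo:
        return "tautomer and stereoisomer"
    return None
```

The order of the checks matters: the most specific comparison comes first, so a pair is reported as a combined match only when no single match type applies.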

Amendment of the registry

Authorized users, such as registrars, can amend existing compounds in the registry. This means changing either the molecular structure or any related data on different hierarchy levels (e.g. CST, salt/solvate info, or any lot-level data). Higher-level amendments modify the whole sub-tree (e.g. a change on the Parent level affects all Versions and Lots under it). During amendments on lower levels, registration IDs are kept whenever possible. All amendments committed to the registry are fully audited, and a complete data history can be displayed for selected entries upon request.

Searching the registration database

Our thin client provides a Search tab that lets users query the registration database for specific compound submissions or registered compound groups. If desired, alternative interfaces (e.g. IJC) can be used for searching. Query criteria include full-structure and substructure searches as well as descriptor field queries (e.g. PCN, CN, CST, molecular weight, submitting user, and date). For numeric fields, the regular comparison operators (e.g. “=”) are supported. All returned results carry a direct link to the amendment page, simplifying database curation.

Architectural Overview

System architecture

The Compound Registration system requires a Java-enabled web server (e.g. Tomcat). The Web Services utilized by the registration system enable simple integration into your existing IT environment. In fully integrated environments external data connections are realized via middle-tier Web Services for data polling from upstream systems and for populating downstream databases.

The registration database is currently deployed as an Oracle or MySQL relational database, using JChem tables and additional relational tables for data capture. This ensures that, besides our dedicated web client, other JChem-enabled front-ends (e.g. Instant JChem) and custom back-end tools can easily interface with the registration database.

Ease of configurability

Large parts of the internal business logic are configurable to facilitate easy adaptation to different corporate business rules. For instance, the formats of Corporate Registration IDs are freely definable and may vary at each hierarchy level. Configuration files are stored on the server side in XML format. The registration service currently uses an action-driven security model based on user roles, which determine the specific operations a user may perform.
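
To give a feel for level-specific ID formats and role-based actions driven by an XML file, here is a minimal Python sketch. The XML element and attribute names, the ID patterns and the role names are all invented for illustration and do not reproduce the product's actual configuration schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration; not the real Compound Registration schema.
CONFIG = """
<registration>
  <idFormat level="parent" pattern="CPD-{n:06d}"/>
  <idFormat level="version" pattern="{parent}-V{n:02d}"/>
  <idFormat level="lot" pattern="{version}-L{n:03d}"/>
  <role name="registrar" actions="register amend"/>
  <role name="chemist" actions="register"/>
</registration>
"""

root = ET.fromstring(CONFIG)

# One ID pattern per hierarchy level.
patterns = {e.get("level"): e.get("pattern") for e in root.findall("idFormat")}

# Each role maps to the set of actions it may perform.
roles = {e.get("name"): set(e.get("actions").split()) for e in root.findall("role")}


def allowed(role: str, action: str) -> bool:
    """Action-driven check: may a user with this role perform the action?"""
    return action in roles.get(role, set())


parent_id = patterns["parent"].format(n=42)                       # "CPD-000042"
version_id = patterns["version"].format(parent=parent_id, n=1)    # "CPD-000042-V01"
lot_id = patterns["lot"].format(version=version_id, n=7)          # "CPD-000042-V01-L007"
```

Keeping patterns and role definitions in configuration rather than in code is what would let the ID format differ per hierarchy level without changing the service itself.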

