Structure canonicalization and more
Chemical compounds can appear in various forms depending on the source and even on the habits of the chemist who drew them. These differences affect not only the graphical appearance of the molecules but also more fundamental details of their topology. Different resonance structures, tautomers, salts and solvents may appear in the representations, making compound identification even harder. Standardizer is ChemAxon’s solution for transforming chemical structures into customized, canonical representations so that chemical databases behave reliably.
Use standardization actions to get uniform structure representations
Robust searching with consistent representations
Certain patterns in chemical structures can be drawn in more than one form, which, depending on the search conditions, can impair structure-based searching. A typical example is the nitro group, which can be present in its charged or neutral form. Standardizer’s main purpose is to transform chemical structures into representations that obey a defined set of chemical business rules, so that such inconsistencies do not enter a chemical database.
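The nitro group example can be made concrete with a deliberately simplified sketch. It does not use Standardizer or any cheminformatics toolkit: it rewrites SMILES text directly, whereas a real tool matches substructures on the molecular graph. The two pattern strings, the function name `normalize_nitro` and the choice of the neutral form as the target are all assumptions made for illustration.

```python
# Toy illustration only: real standardizers match substructures on the
# molecular graph, not on text. The patterns below are assumptions.
CHARGED_NITRO = "[N+](=O)[O-]"
NEUTRAL_NITRO = "N(=O)=O"


def normalize_nitro(smiles: str) -> str:
    """Rewrite the charge-separated nitro spelling to the neutral one."""
    return smiles.replace(CHARGED_NITRO, NEUTRAL_NITRO)


charged = "c1ccccc1[N+](=O)[O-]"   # nitrobenzene, charge-separated form
neutral = "c1ccccc1N(=O)=O"        # nitrobenzene, neutral (pentavalent) form

# Without normalization the two strings differ; afterwards they match.
assert charged != neutral
assert normalize_nitro(charged) == normalize_nitro(neutral)
```

Which of the two forms a database stores is a business rule; the point of the sketch is only that both inputs end up in the same form, so an exact-match lookup no longer depends on how the structure was drawn.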
Create canonical structures
Making structures uniform can change their molecular graph. These modifications are the most invasive and have the greatest influence on the search process. They may include, among others, the addition or removal of explicit hydrogen atoms and the neutralization of charged fragments and functional groups. Representations of functional groups commonly used in old databases (e.g. aliases) can also be recognized and converted by Standardizer. Besides graph modifications, standardization actions can also remove certain fragments, e.g. water and salt counterions.
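Fragment removal can be sketched in a few lines, assuming structures are given as SMILES, where a dot separates disconnected fragments. The strip list and the function name `strip_salts_and_solvents` are hypothetical and stand in for a real configuration; this is not how Standardizer itself is implemented.

```python
# Hypothetical strip list: water plus a few common counterions, written as
# SMILES fragments. A real configuration would be far more complete.
STRIP_FRAGMENTS = {"O", "[Na+]", "[K+]", "[Cl-]", "Cl"}


def strip_salts_and_solvents(smiles: str) -> str:
    """Drop dot-separated fragments that appear in STRIP_FRAGMENTS."""
    fragments = smiles.split(".")
    kept = [frag for frag in fragments if frag not in STRIP_FRAGMENTS]
    # If everything was stripped, the record itself was a salt or solvent,
    # so return the input unchanged rather than an empty structure.
    return ".".join(kept) if kept else smiles


# Sodium acetate monohydrate: only the acetate fragment is kept.
print(strip_salts_and_solvents("CC(=O)[O-].[Na+].O"))  # prints CC(=O)[O-]
```

The surviving acetate is still charged. Neutralizing it would be a separate action, which is why the order of actions in a standardization configuration matters.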
Unify graphical representations
Besides topological modifications, the graphical representation of compounds also matters in everyday research. Consistently oriented compounds, clean structures and uniform relative arrangements of fragments help chemists browse and recognize compounds. Standardization actions such as 2D cleaning and expanding abbreviated groups make structures easier to read. Unifying the orientation of compounds by template-based cleaning makes chemical libraries easier to survey.
An easy way to fully customized databases
Create custom transformations
Thanks to an advanced transformation engine under the hood, users can define custom standardization actions that perform almost any transformation on compounds, for instance the removal or replacement of atoms, functional groups or patterns in the structures. Whereas predefined standardization actions have limited customization options, this feature gives almost unlimited freedom in defining canonicalization rules.
Identifying duplicates at registration and representing compounds consistently are both essential in corporate databases, so the standardization process is a key component of most registration systems. While all registered compounds are stored in a compact, canonicalized form, the original input compounds can be kept as well, preserving all the input information supplied by the chemist. This way, any later change to the standardization configuration can be applied to the original input structures.
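The registration pattern described here (index by standardized form, keep the originals, re-run standardization when the rules change) can be sketched as follows. The `Registry` class, its methods and the `strip_sodium` rule are hypothetical names invented for this illustration; they do not correspond to any ChemAxon API, and the standardization function is passed in as a plain callable.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class Registry:
    """Toy registry keyed by standardized structure; all names hypothetical."""

    def __init__(self, standardize: Callable[[str], str]) -> None:
        self.standardize = standardize
        self.originals: List[str] = []          # every input, as submitted
        self.by_key: Dict[str, List[int]] = defaultdict(list)

    def register(self, structure: str) -> bool:
        """Store the input; return True if its standardized form is new."""
        key = self.standardize(structure)
        is_new = key not in self.by_key
        self.originals.append(structure)
        self.by_key[key].append(len(self.originals) - 1)
        return is_new

    def restandardize(self, standardize: Callable[[str], str]) -> None:
        """Rebuild the index from the stored originals with a new rule set."""
        self.standardize = standardize
        self.by_key = defaultdict(list)
        for index, structure in enumerate(self.originals):
            self.by_key[standardize(structure)].append(index)


def strip_sodium(smiles: str) -> str:
    """Stand-in rule, not a real standardizer: drop sodium counterions."""
    return ".".join(f for f in smiles.split(".") if f != "[Na+]")


registry = Registry(standardize=lambda s: s)   # initial rules: no changes
registry.register("CC(=O)[O-].[Na+]")
registry.register("CC(=O)[O-]")                # looks new under these rules
registry.restandardize(strip_sodium)           # the rules change later
print(len(registry.by_key))                    # prints 1
```

Because the originals are never overwritten, changing the rule set only requires rebuilding the index; after the rebuild, the two inputs above collapse into a single entry.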
Standardizer is available as a standalone, Java-based application. It is platform independent and can be used through a wizard-like graphical user interface or in batch mode. Like all other ChemAxon applications, Standardizer also has a full-featured Application Programming Interface (API) in Java and in .NET, so it can be integrated into in-house or third-party applications. Workflow management tools such as KNIME and Pipeline Pilot also integrate the Standardizer engine.