Standardizer User's Guide
Version 5.8.2
Introduction
Standardizer brings molecules to a standardized form. The same molecule can have different tautomer and mesomer representations; chemists can draw an aromatic ring in aromatized form or with alternating single and double bonds, and can draw hydrogen atoms explicitly or rely on implicit hydrogens. However, when identifying a molecule or performing a substructure search, we need a common form to work with. For example, if we need the molecules in a database that have a pyrrole ring, we want to find molecules containing either aromatized or non-aromatized pyrrole rings. We cannot predict whether an unknown input database or SDF file stores molecules in aromatized or non-aromatized form.
There is a brute-force approach for getting the expected search result: perform two search operations, one with an aromatized and another with a non-aromatized pyrrole ring as the query, and then merge the results. This doubles the search execution time for the simple pyrrole example, and the time grows exponentially with the number of independently alterable groups. For example, a query structure with two alterable functional groups (a pyrrole part and an enol part) amounts to four cases to be considered in the search.
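The growth of the brute-force approach can be illustrated with a short Python sketch. The group names and form labels are illustrative assumptions, not Standardizer identifiers; the point is only that each independently alterable group multiplies the number of query variants.

```python
from itertools import product

# Hypothetical labels: each alterable group and the forms it can be drawn in.
alterable_groups = {
    "pyrrole ring": ["aromatized", "alternating bonds"],
    "oxo-enol group": ["oxo", "enol"],
}


def query_variants(groups):
    """Enumerate every combination of forms a brute-force search must try."""
    names = list(groups)
    return [dict(zip(names, combo)) for combo in product(*groups.values())]


variants = query_variants(alterable_groups)
for variant in variants:
    print(variant)
print(len(variants), "searches needed")
```

With two groups of two forms each, the sketch enumerates four query variants; a third two-form group would double that again.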
We have chosen a much more efficient way to deal with this problem: first bring the molecules to standardized forms and then perform the search. We define the standard form for each variable group (e.g. the oxo form for oxo-enol tautomers) and for each drawing mode (e.g. the aromatic ring for aromatic versus alternating bonds). This definition is specified in a configuration file. Standardizer performs the necessary transformations on the molecule in the order they are listed in the configuration file. The attentive reader may have noticed that these transformations also require search operations: we have to find the functional groups or aromatizable rings. However, with thoughtful planning these operations need to be performed only once. The molecules can then be stored in standardized form, and any operation that requires standardization (e.g. substructure search, reaction processing) can be performed on the stored molecules.
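The idea of applying an ordered list of transformations once and then searching the stored result can be sketched in Python. This is a toy model and not the Standardizer implementation: molecules are represented as sets of text labels, and the two transformation functions are hypothetical stand-ins for real actions.

```python
# Toy model: a "molecule" is a frozenset of group labels (an assumption for
# illustration only; real molecules are graphs handled by Standardizer).

def aromatize(mol):
    # Stand-in for an aromatization step: rewrite the alternating-bond label.
    return frozenset(
        "pyrrole(aromatic)" if g == "pyrrole(alternating)" else g for g in mol
    )


def to_oxo(mol):
    # Stand-in for an oxo-enol tautomer rule: prefer the oxo form.
    return frozenset("oxo" if g == "enol" else g for g in mol)


# Steps run in the order they are listed, mirroring the configuration file.
PIPELINE = [aromatize, to_oxo]


def standardize(mol):
    for step in PIPELINE:
        mol = step(mol)
    return mol


database = [
    frozenset({"pyrrole(alternating)", "enol"}),
    frozenset({"pyrrole(aromatic)", "oxo"}),
    frozenset({"benzene"}),
]

# Standardize once and store; every later search reuses the stored forms.
stored = [standardize(m) for m in database]

# The query is standardized the same way, so one search covers both drawings.
query = standardize(frozenset({"pyrrole(alternating)"}))
hits = [i for i, m in enumerate(stored) if query <= m]
print(hits)
```

Both pyrrole-containing entries are found with a single search, because the stored molecules and the query share one standard form.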
Standardization and search of standardized molecules in JChem databases can be performed as described here.
Standardizer provides some additional actions to refine the molecules by setting the stereo flag or performing coordinate cleaning. See an example below for the template based cleaning of a bridged skeleton.
| compound to clean | template | cleaned result |
|---|---|---|
| COC(=O)C1C(CC2CCC1N2C)CC(=O)C3=CC=CC=C3 | ![]() | ![]() |
A set of simple examples and working examples are also available.
Application (GUI)
Standardizer GUI is an easy-to-use graphical user interface for the Standardizer tool of ChemAxon. It gives access to all the functionality of Standardizer without using command-line parameters or editing configuration files by hand.
The Standardizer GUI offers a friendly way to bring your molecules to a standardized form, with guidance for each task you may encounter.
Usage
standardize
Alternatively, on Win32, Unix or Mac / Java 2 (assuming that JChem was installed with shortcuts created on the Desktop or in the Start Menu), start the application by double-clicking the appropriate icon.
Input and Output
Most molecular file formats are accepted (Marvin Documents (MRV), MDL molfile, SDfile, RXNfile, RDfile, SMILES, etc.).
Input files can be added by browsing the file system. Selecting several files results in a single concatenated output file. There are no restrictions on file format: input files can differ from each other as well as from the output. This makes it possible to concatenate molecule files and/or export them to another format by leaving the standardization rules empty.
- Working with multiple input files.
Configuration
Standardizer GUI also provides an interface to manipulate configuration files. The embedded editor simplifies the creation of new configurations, or modification of existing ones. Building a configuration from the list of available commands, specifying the order of execution, setting custom parameters is fast and easy (download demo configuration files).
Custom transformations
You can add customized transformations to the standardization list. To reset the default search options, remove the transformation from the Standardizer list and add it again.
Note: the search option "exact" has changed; the equivalent search option is "Full". String for this option: SearchOptions="sep=, t:f"
- Building configuration for Standardizer - detailed information is available for commands
- Building configuration for Standardizer - graphical interface for command settings
- Standardization in progress
- Customizable view for Standardization results
The command-line tool
standardize [<input files>] -c <config file> [<options>]
Prepare the usage of the standardize script or batch file as described in Preparing the Usage of JChem Batch Files and Shell Scripts.
Alternatively, the Standardizer class can be directly invoked:
Win32 / Java 2 (assuming that JChem is installed in c:\jchem):
java -cp "c:\jchem\lib\jchem.jar;%CLASSPATH%" ^
chemaxon.reaction.Standardizer [<input files>] ^
-c <config file> [<options>]
Unix / Java 2 (assuming that JChem is installed in /usr/local/jchem):
java -cp "/usr/local/jchem/lib/jchem.jar:$CLASSPATH" \
chemaxon.reaction.Standardizer [<input files>] \
-c <config file> [<options>]
Options
General options:
-h, --help this help message
-g, --ignore-error continue with next molecule on error
--empty-mol-on-error write an empty molecule, and continue with next molecule on error
--unstandardized-mol-on-error write the original molecule, and continue with next molecule on error
Input options:
-c, --config configuration XML file
or action string,
actions separated by "..",
valid actions are:
- reaction SMARTS
- "aromatize" (Daylight, general)
- "aromatize:b" (ChemAxon, basic)
- "aromatize:l" (loose)
- "dearomatize"
- "addexplicitH" ("hydrogenize")
(converts implicit H-s to explicit)
- "removeexplicitH[:lonely:isotope:
charged:radical:mapped:wedged]"
("dehydrogenize")
(converts explicit H-s to implicit,
except for lonely, isotope, charged,
radical, mapped and wedged H-s;
if some of these are specified in a
':'-separated list, then the given
H types are also converted)
- "clearisotopes"
(converts isotopes to non-isotopic form)
- "neutralize" (neutralize molecule)
- "clean" (partial clean in 2D)
- "clean:full" (full clean in 2D)
- "clean:<template file>"
(template based clean in 2D)
- "clean:3" (clean in 3D)
- "aliastogroup" (converts pseudo
and alias atoms to groups)
- "aliastoatom" (converts pseudo and
alias atoms to normal atoms)
- "keepone" (largest atom count)
- "keepone:mass" (largest mass)
- "removergroupdefinitions"
(remove R-group definitions)
- "removeatomvalues"
(remove atom values)
- "removeattacheddata"
(remove attached data)
- "sgroups:contract" (contract Sgroups)
- "sgroups:expand" (expand Sgroups)
- "sgroups:ungroup" (ungroup Sgroups)
- "creategroup" (creates groups from
abbreviated group definitions)
- "clearstereo" (chirality, double bond)
- "clearstereo:chirality" (chirality)
- "clearstereo:doublebond" (double bond)
- "clearstereo:singleupordownbond"
(single up or down bond)
- "absolutestereo:clear"
(clear absolute stereo flag)
- "absolutestereo:set"
(set absolute stereo flag)
- "converttoenhancedstereo:abs"
(converts to enhanced stereo, unlabeled
stereo atoms go into a new "abs" group)
- "converttoenhancedstereo:and"
(converts to enhanced stereo, unlabeled
stereo atoms go into a new "and" group)
- "converttoenhancedstereo:or"
(converts to enhanced stereo, unlabeled
stereo atoms go into a new "or" group)
- "wedgeclean"
(rearranges stereo wedges according to
the IUPAC recommendations)
- "convertwedgeinterpretation"
(converts each wedge between two stereo
centers into two wedges)
- "convertdoublebonds:wiggly"
(converts double bonds with unspecified
CIS/TRANS stereo information to wiggly
representation)
- "convertdoublebonds:crossed"
(converts double bonds with unspecified
CIS/TRANS stereo information to crossed
representation)
- "removestereocarebox"
(remove stereo search markers from
double bonds)
- "tautomerize"
(take canonical tautomer form)
- "mesomerize"
(take canonical mesomer form)
- "mapreaction"
(add atom maps to reaction)
- "unmap"
(remove atom maps)
Output options:
-f, --format <format> output file format (default: smiles)
-o, --output <filepath> output file (default: standard output)
-e, --export-fields-to-smiles export property fields to SMILES
-v, --verbose verbose output with time results
-l, --log <level> sets the log level
levels: [severe|warning|info|off]
--logfile <filepath> log file (default: standard error)
Examples:
standardize in.sdf -c "keepone..aromatize..[O-][N+]=O>>O=N=O"
standardize in.sdf -c "aromatize..clean:templates.sdf"
standardize in.sdf -c Standardizer.xml -f sdf -o o.sdf
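When action strings are generated by a script, a small helper can keep the ".." and ":" separators consistent. The following Python sketch is only an illustration: the function names are hypothetical, and the helper does nothing more than join text in the way the option list above describes.

```python
# Hypothetical helpers for composing an action string for the -c option.

H_TYPES = {"lonely", "isotope", "charged", "radical", "mapped", "wedged"}


def remove_explicit_h(*also_convert):
    """Build a removeexplicitH action; extra H types are ':'-separated."""
    unknown = set(also_convert) - H_TYPES
    if unknown:
        raise ValueError(f"unknown H types: {sorted(unknown)}")
    return ":".join(("removeexplicitH",) + also_convert)


def action_string(*actions):
    """Join individual actions with the '..' separator."""
    if not actions:
        raise ValueError("at least one action is required")
    return "..".join(actions)


config = action_string("keepone", "aromatize", "[O-][N+]=O>>O=N=O")
print(config)
print(action_string("aromatize:b", remove_explicit_h("lonely", "isotope")))
```

The first call reproduces the action string of the first example above; the result still has to be quoted when it is passed on a shell command line.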
The command line parameter --config is mandatory: the program cannot operate without it. It specifies either the path and filename of a configuration file or a simple action string. A detailed description of the format of the configuration file is given below.
By default, the program exits in case of molecule import/export errors. If the command line parameter -g or --ignore-error is specified, errors do not stop the process: the error is written to the console and the molecule is omitted from the output (the resulting file will contain fewer molecules than the input file).
With the option --empty-mol-on-error the failing structure is replaced by an empty molecule. With the option --unstandardized-mol-on-error the molecule is written in its original form. Both of these settings result in a file containing the same number of structures as the input file.
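The effect of these modes on the number of output structures can be modelled with a few lines of Python. This is a toy simulation, not Standardizer code: `fake_standardize` is a hypothetical stand-in that fails on one input, and the mode names are labels chosen for this sketch.

```python
def process(molecules, standardize_one, on_error="stop"):
    """Toy model of the error-handling modes described above."""
    output = []
    for mol in molecules:
        try:
            output.append(standardize_one(mol))
        except ValueError:
            if on_error == "stop":              # default: stop at first error
                raise
            elif on_error == "ignore":          # -g / --ignore-error
                continue
            elif on_error == "empty":           # --empty-mol-on-error
                output.append("")
            elif on_error == "unstandardized":  # --unstandardized-mol-on-error
                output.append(mol)
            else:
                raise ValueError(f"unknown mode: {on_error}")
    return output


def fake_standardize(mol):
    # Hypothetical stand-in: fails on one input to trigger the error path.
    if mol == "bad":
        raise ValueError("cannot process molecule")
    return f"std({mol})"


inputs = ["CCO", "bad", "c1ccccc1"]
for mode in ("ignore", "empty", "unstandardized"):
    print(mode, process(inputs, fake_standardize, on_error=mode))
```

Only the "ignore" mode shortens the output; the other two keep one output entry per input, which matters when results are later matched to inputs by position.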
Input
Most molecular file formats are accepted (Marvin Documents (MRV), MDL molfile, SDfile, RXNfile, RDfile, SMILES, etc.).
The input is either specified in input file(s), or else in input string(s), usually in SMILES format.
If neither the input file name(s) nor the input string(s) are specified in the command line then the standard input is read.
Output
Standardizer writes output molecules in the format specified by the --format option (the default format is "smiles"). If the --output option is omitted, results are written to the standard output.
If the command line parameter --export-fields-to-smiles is specified, the property fields (SDF fields) of the molecules are exported even if the output format is SMILES, SMARTS, ChemAxon Extended SMILES or ChemAxon Extended SMARTS. For other formats the property fields are always exported, so this option has no effect.
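When Standardizer is driven from a script, the input and output options can be assembled programmatically. The Python sketch below only builds the argument list from the options documented above; the function name and its parameters are hypothetical. Running the command (for example with `subprocess.run(argv)`) additionally assumes that the standardize script is available on the PATH.

```python
import shlex


def standardize_argv(config, inputs=(), fmt=None, output=None,
                     export_fields=False):
    """Assemble an argument list for the standardize script."""
    argv = ["standardize", *inputs, "-c", config]
    if fmt is not None:
        argv += ["-f", fmt]      # output format; the tool defaults to smiles
    if output is not None:
        argv += ["-o", output]   # output file; default is standard output
    if export_fields:
        argv.append("-e")        # export property fields to SMILES output
    return argv


argv = standardize_argv("Standardizer.xml", ["in.sdf"], fmt="sdf", output="o.sdf")
print(shlex.join(argv))
```

Passing the arguments as a list avoids shell quoting problems with action strings that contain brackets or ">>".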
Usage examples
- A UNIX command that reads molecular structures from the mols.sdf file and writes the standardized molecules to the standard output in SMILES format:
  standardize -c Standardizer.xml mols.sdf
- A UNIX command that reads molecules given as SMILES strings from the file nci10000.smiles located in the ./test/pharmacophore directory and writes the results to the file nci10000.sdf, to be created in the same directory:
  standardize -c Standardizer.xml nci10000.smiles -f sdf -o nci10000.sdf
- The same with transformation check and verbose output, then displaying the result in MarvinView:
  standardize -c Standardizer.xml -e -v nci100.smiles -f sdf -o nci100.sdf
  mview nci100.sdf
- Processing an SD file and displaying the standardized molecules using MarvinView:
  standardize -c Standardizer.xml med100.sdf | mview -
  Note that such piping does not work in Windows.
- Standardization with action string:
  standardize -c "aromatize..[O-:2][N+:1]=O>>[O:2]=[N:1]=O" med100.sdf -o med100.smiles
- Standardization with action string, taking input molecules as SMILES strings:
  standardize -c "aromatize..[O-:2][N+:1]=O>>[O:2]=[N:1]=O" \
  "[O-][N+](=O)C1=CC=CC=C1" "[H]C1=C(C=C(C=C1)[N+]([O-])=O)[N+]([O-])=O"
- Processing tasks belonging to no groups or to task group "target":
  standardize -c Standardizer.xml -u target targets.sdf -f sdf -o output.sdf




