Standardizer Examples
Introduction
These examples demonstrate how to use the Standardizer program. They show how to run the standardize command and explain some of its command-line options as well as its configuration.
For a description of reaction mapping, see the Reaction mapping section of the Reactor Manual.
Prerequisites
These examples run the standardize UNIX shell script under UNIX / Linux
or the standardize.bat batch file under Windows.
To run these examples:
- The Java Virtual Machine version 1.4 or higher and JChem must be installed on your system.
- The PATH (all systems) and the JCHEMHOME (under Windows) environment variables must be set as described in the Preparing and Running JChem's Batch Files and Shell Scripts manual.
- A command shell (under UNIX / Linux: your favorite shell; under Windows: a Cygwin shell or a Command Prompt) must be run in the standardizer example directory:
In UNIX / Linux: cd jchem/examples/standardizer
In Windows: cd jchem\examples\standardizer
Examples
Example 1: Set of standardizing operations, configuration 1
This example shows how to set up a configuration containing several standardization steps.
The input molecule is stored in the file input.mol:
Use Standardizer1.xml to standardize the molecule:
standardize -c Standardizer1.xml input.mol
Equivalently, you can give the Standardizer tasks in an action string and specify the input molecule as a SMILES string on the command line (the command is wrapped here only for readability; it is originally a single line):
standardize -c "aromatize..dehydrogenize..[O-:2][N+:1]=O>>[O:2]=[N:1]=O..N=[N:1]#[N:2]>>N=[N+:1]=[N-:2]
..C[N+:1][H:2].[F,Cl,Br,I;-:3]>>C[N:1]..[H:4][N:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[N:3]..[H:4][O:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[O:3]..clean"
"N#N=NCC(\C=C\O)C(CC[NH3+])C([H])([H:1])C1=CC(=C(Cl)C(=C1)N(=O)=O)[N+]([O-])=O.[Cl-]"
Note that the action string is a ".."-separated list of Standardizer tasks: some of them are Standardizer keywords (such as aromatize, dehydrogenize and clean), others are SMARTS reaction strings.
The result is:
[H:1]C(C(CCN)C(CC=O)CN=[N+]=[N-])c1cc(c(Cl)c(c1)N(=O)=O)N(=O)=O
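The ".."-separated structure of an action string can be illustrated with a short Python sketch that splits a string into its tasks. This is not Standardizer's own parser: the rule that a task containing ">>" is a SMARTS reaction, and that every other task is a keyword with an optional ":parameter" (as in "clean:clean_templates.sdf" from Example 4), is an assumption made for illustration, and parse_action_string is a hypothetical name.

```python
from typing import List, Optional, Tuple

# (kind, text, parameter) for one task of an action string
Task = Tuple[str, str, Optional[str]]


def parse_action_string(action: str) -> List[Task]:
    """Split a ".."-separated action string into tasks, keeping their order.

    Assumption: a task containing ">>" is a SMARTS reaction; any other task
    is a keyword, optionally followed by ":parameter".
    """
    tasks: List[Task] = []
    for raw in action.split(".."):
        task = raw.strip()
        if not task:
            continue  # ignore empty segments
        if ">>" in task:
            # reactions contain ":" in their atom maps, so test ">>" first
            tasks.append(("reaction", task, None))
        else:
            name, sep, param = task.partition(":")
            tasks.append(("keyword", name, param if sep else None))
    return tasks


for kind, text, param in parse_action_string(
        "aromatize..[O-:2][N+:1]=O>>[O:2]=[N:1]=O..clean:clean_templates.sdf"):
    print(kind, text, param)
# keyword aromatize None
# reaction [O-:2][N+:1]=O>>[O:2]=[N:1]=O None
# keyword clean clean_templates.sdf
```

Splitting on the double dot keeps single dots intact, so a reaction such as C[N+:1][H:2].[F,Cl,Br,I;-:3]>>C[N:1] stays one task.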
You can also specify the output format and the output file path:
standardize -c Standardizer1.xml -f sdf:-a -o result1.sdf input.mol
In this case the result molecule is saved in result1.sdf in SDF format.
Note that we set sdf:-a as the output format in the -f parameter: the molecule is aromatized during standardization, but the SDF format is supposed to store the dearomatized form, and the -a export option dearomatizes it.
You can also use the action string and/or the SMILES input string as above:
standardize -c "aromatize..dehydrogenize..[O-:2][N+:1]=O>>[O:2]=[N:1]=O..N=[N:1]#[N:2]>>N=[N+:1]=[N-:2]..C[N+:1][H:2].[F,Cl,Br,I;-:3]>>C[N:1]..[H:4][N:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[N:3]..[H:4][O:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[O:3]..clean" -f sdf:-a -o result1.sdf input.mol
standardize -c Standardizer1.xml -f sdf:-a -o result1.sdf "N#N=NCC(\C=C\O)C(CC[NH3+])C([H])([H:1])C1=CC(=C(Cl)C(=C1)N(=O)=O)[N+]([O-])=O.[Cl-]"
The result molecule is shown below:
To see what happened, look at the subsections of the Actions section of the XML configuration file Standardizer1.xml, or at the ".."-separated sections of the action string:
"aromatize..dehydrogenize..[O-:2][N+:1]=O>>[O:2]=[N:1]=O..N=[N:1]#[N:2]>>N=[N+:1]=[N-:2]..C[N+:1][H:2].[F,Cl,Br,I;-:3]>>C[N:1]..[H:4][N:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[N:3]..[H:4][O:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[O:3]..clean"
Each section describes a transformation. The transformations are performed on the input molecule in the order in which they appear in the configuration. They are listed below:
- aromatization
- dehydrogenization
Observe that the mapped H (with atom map 1) has not been removed: this simple action removes only explicit H atoms that are unmapped, non-isotopic, uncharged and non-radical. The next example shows a more sophisticated method for implicitizing hydrogens.
- standardization of nitro groups
- standardization of azides
- standardization of ammonium halides (for a general method of salt removal, see the next example)
- standardization of enamines
- standardization of enols
- clean if needed: the program automatically determines whether the result molecule needs to be cleaned. A clean is needed if some of the transformations have added new atoms, or if the input molecule is given as SMILES while the output format stores atom coordinates (e.g. MOL or SDF). This clean-if-needed option is recommended for general use; the other options are described in the <Clean> section of the Standardizer Configuration.
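The clean-if-needed rule amounts to a simple predicate. The Python sketch below states it explicitly; the function clean_needed and its inputs are hypothetical (Standardizer makes this decision internally), and treating only MOL and SDF as coordinate formats is a simplification based on the two formats named in the rule.

```python
# Formats that store atom coordinates; assumption: only the two named above.
COORDINATE_FORMATS = {"mol", "sdf"}


def clean_needed(atoms_added: bool, input_format: str, output_format: str) -> bool:
    """Decide whether the result molecule needs a clean (hypothetical helper)."""
    # drop export options, e.g. "sdf:-a" -> "sdf"
    out = output_format.split(":", 1)[0].lower()
    from_smiles = input_format.lower() == "smiles"
    return atoms_added or (from_smiles and out in COORDINATE_FORMATS)


print(clean_needed(False, "smiles", "sdf:-a"))  # True: SMILES in, coordinates out
print(clean_needed(False, "mol", "smiles"))     # False: nothing to lay out
```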
After you have created the list of operations, the Standardizer graphical user interface offers an option to save the configuration:

Example 2: Set of standardizing operations, configuration 2
In this example we use the same input molecule (input.mol) with a different Standardizer configuration, Standardizer2.xml. It is very similar to the configuration file Standardizer1.xml from Example 1; the difference is that two transformations have been changed:
- Dehydrogenization is performed using the RemoveExplicitH action tag, which also removes mapped and radical hydrogens. For a description of this type of dehydrogenization, see the <RemoveExplicitH> section in the Standardizer Configuration. Note that this way of dehydrogenization is not available in action strings.
- The ammoniumhalogenide transformation has been replaced by a Removal action tag, which keeps the largest disconnected fragment of the molecule and removes all others. For a description of this action, see the <Removal> section in the Standardizer Configuration.
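Keeping the largest disconnected fragment is a connected-components computation. The Python sketch below models a molecule as hypothetical atom indices plus a bond list and returns the largest fragment found by breadth-first search. Measuring fragment size by atom count, and keeping the first fragment on a tie, are assumptions; the <Removal> section of the Standardizer Configuration describes the actual behavior.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def largest_fragment(n_atoms: int, bonds: List[Tuple[int, int]]) -> Set[int]:
    """Return the atom indices of the largest connected fragment.

    Assumptions: size is measured by atom count, and on a tie the fragment
    found first is kept.
    """
    neighbors: Dict[int, List[int]] = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    seen: Set[int] = set()
    best: Set[int] = set()
    for start in range(n_atoms):
        if start in seen:
            continue
        # breadth-first search collects one fragment
        fragment = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nb in neighbors[atom]:
                if nb not in seen:
                    seen.add(nb)
                    fragment.add(nb)
                    queue.append(nb)
        if len(fragment) > len(best):
            best = fragment
    return best


# heavy atoms of ethylammonium chloride: C0-C1-N2 plus a separate Cl3
print(sorted(largest_fragment(4, [(0, 1), (1, 2)])))  # [0, 1, 2]
```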
Run Standardizer by:
standardize -c Standardizer2.xml input.mol
The result is:
[NH3+]CCC(Cc1cc(c(Cl)c(c1)N(=O)=O)N(=O)=O)C(CC=O)CN=[N+]=[N-]
You can also specify the output format and the output file path:
standardize -c Standardizer2.xml -f sdf:-a -o result2.sdf input.mol
In this case the result molecule is saved to the file result2.sdf (SDF format).
Note that, as in Example 1, we set sdf:-a as the output format in the -f parameter because the molecule is aromatized during standardization, but the SDF format is supposed to store the dearomatized form.
The result molecule is shown below:
Observe that this time the mapped hydrogen has also been removed, the chloride counterion has been removed as a smaller fragment, and the positive charge on the ammonium nitrogen has remained unchanged.
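The difference between the two dehydrogenization actions can be summarized as two predicates over explicit hydrogens. In this Python sketch the HAtom record and both function names are hypothetical. The first predicate follows the criteria given in Example 1; the second only adds what is stated for Example 2 (mapped and radical hydrogens are also removed) and assumes, without confirmation from this page, that isotopic and charged hydrogens are still kept.

```python
from dataclasses import dataclass


@dataclass
class HAtom:
    """Hypothetical description of one explicit hydrogen."""
    map_number: int = 0   # 0 means unmapped
    isotope: int = 0      # 0 means no mass number given
    charge: int = 0
    radical: bool = False


def removed_by_dehydrogenize(h: HAtom) -> bool:
    # Example 1: only unmapped, non-isotope, uncharged, non-radical H atoms go
    return h.map_number == 0 and h.isotope == 0 and h.charge == 0 and not h.radical


def removed_by_remove_explicit_h(h: HAtom) -> bool:
    # Example 2: mapped and radical H atoms are removed too.
    # Assumption: isotopic and charged H atoms are still kept.
    return h.isotope == 0 and h.charge == 0


mapped = HAtom(map_number=1)  # like [H:1] in the input molecule
print(removed_by_dehydrogenize(mapped), removed_by_remove_explicit_h(mapped))
# False True
```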
The use and meaning of the command-line options in the above commands:
| Option | Description | Default |
|---|---|---|
| -c | configuration file or action string | - |
| -f | output format (e.g. 'sdf', 'mol') | 'smiles' |
| -o | output file path | standard output (console) |
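When the standardize command is called from a script, the options in the table can be assembled into an argument list. The helper below is a hypothetical Python sketch that uses only the three options documented above; actually running the list assumes the standardize script is on the PATH.

```python
import shlex
from typing import List, Optional, Sequence


def standardize_args(config: str, inputs: Sequence[str],
                     output_format: Optional[str] = None,
                     output_path: Optional[str] = None) -> List[str]:
    """Build a standardize command line (hypothetical helper)."""
    args = ["standardize", "-c", config]    # configuration file or action string
    if output_format is not None:
        args += ["-f", output_format]        # default: smiles
    if output_path is not None:
        args += ["-o", output_path]          # default: standard output
    args += list(inputs)
    return args


args = standardize_args("Standardizer2.xml", ["input.mol"], "sdf:-a", "result2.sdf")
print(shlex.join(args))
# standardize -c Standardizer2.xml -f sdf:-a -o result2.sdf input.mol
```

The resulting list can be passed to subprocess.run(args, check=True); leaving out -f and -o falls back to the defaults listed in the table.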
Example 3: Conversion of alias and pseudo atoms to abbreviated groups
Using alias or pseudo atoms to denote abbreviated groups in chemical structures was common in earlier chemical database systems. It simplifies drawing and visualizing the structures, but the most frequently used database operations, such as substructure search, cannot interpret this annotation, so the full structure must be used instead of alias and pseudo atoms. To solve this problem, Standardizer can transform alias and pseudo atoms into abbreviated groups.
Let's consider this molecule (alias.mrv):

The methyl group has been given the alias label of a carboxylic group, and the hydroxyl group the alias label of a nitro group. Nevertheless, these substituents are still interpreted in their original form: as a methyl and a hydroxyl group.
The terms alias atom and pseudo atom may be confusing, since we are handling groups. The word 'atom' only refers to the fact that one atom of the structure is replaced by an entity, which is not always a functional group.
The benzoate pseudo atom is displayed in italics.
The SMILES representation of the molecule is CC1=CC(*)=CC(O)=C1.
In Standardizer, the AliasToGroup action replaces all alias and pseudo atoms with the corresponding abbreviated groups from the list of abbreviated groups. An abbreviated group may have only one attachment point. The action results in contracted S-groups.
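The behavior of AliasToGroup can be sketched as a lookup in a table of abbreviated groups. In this Python sketch the atom dictionaries, the ABBREVIATIONS table and its entries (including the two-attachment "linker" label) are hypothetical. The sketch only mirrors the two rules stated above: an alias or pseudo atom is replaced by the corresponding abbreviated group, and only groups with one attachment point qualify. Leaving unknown labels unchanged is an assumption.

```python
from typing import Dict, List

# Hypothetical table: abbreviation label -> number of attachment points.
ABBREVIATIONS: Dict[str, int] = {"COOH": 1, "NO2": 1, "linker": 2}


def alias_to_group(atoms: List[dict]) -> List[dict]:
    """Replace alias/pseudo atoms by contracted abbreviated groups."""
    result = []
    for atom in atoms:
        label = atom.get("alias") or atom.get("pseudo")
        # only known groups with exactly one attachment point qualify
        if label is not None and ABBREVIATIONS.get(label) == 1:
            result.append({"group": label, "contracted": True})
        else:
            result.append(atom)  # assumption: anything else is left unchanged
    return result


atoms = [{"symbol": "C", "alias": "COOH"}, {"symbol": "O", "alias": "NO2"},
         {"symbol": "C"}, {"symbol": "C", "pseudo": "linker"}]
for atom in alias_to_group(atoms):
    print(atom)
```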
The first example shows how to convert these atoms with Standardizer. In the Create Configuration panel of the Standardizer application, add the required operation:

The text field at the bottom shows the description of the operations. The Alias to Group command needs no further settings.
The conversion can also be run from the command line. Run the Standardizer command-line application by:
standardize -c aliastogroup alias.mrv -f mrv -o alias_output.mrv
Here the configuration is given as a simple action string.
The result molecule will contain contracted abbreviated groups:

The atom style changed due to the conversion: all groups are in normal font and the bonds are connected to the chemically relevant atom.
The second example shows how to obtain converted and then expanded or ungrouped abbreviated groups. To get expanded groups in your structures, insert the Expand Group command after the conversion:

Note that Standardizer executes the operations in the given order, so ungrouping or expanding abbreviated groups does not affect alias and pseudo atoms that have not yet been transformed into groups.
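Why the order matters can be shown with two toy operations applied in sequence. In this Python sketch, convert_aliases and expand_groups are hypothetical stand-ins for the Alias to Group and Expand Group operations, and a molecule is reduced to a list of (kind, label) pairs.

```python
from typing import Callable, List, Tuple

Item = Tuple[str, str]  # (kind, label), e.g. ("alias", "COOH")


def convert_aliases(items: List[Item]) -> List[Item]:
    # alias/pseudo atom -> contracted abbreviated group
    return [("group", label) if kind in ("alias", "pseudo") else (kind, label)
            for kind, label in items]


def expand_groups(items: List[Item]) -> List[Item]:
    # contracted group -> expanded group; alias atoms are not groups yet
    return [("expanded", label) if kind == "group" else (kind, label)
            for kind, label in items]


def run(steps: List[Callable[[List[Item]], List[Item]]], items: List[Item]) -> List[Item]:
    for step in steps:  # operations run strictly in the given order
        items = step(items)
    return items


molecule = [("alias", "COOH"), ("alias", "NO2")]
print(run([convert_aliases, expand_groups], molecule))
# [('expanded', 'COOH'), ('expanded', 'NO2')]
print(run([expand_groups, convert_aliases], molecule))
# [('group', 'COOH'), ('group', 'NO2')]
```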
After creating the configuration (i.e. the set of operations you want to apply to your molecules), you can save it in XML format; the saved file can be reloaded later or used on the command line. Here the Standardizer command-line application uses aliastogroup_expand.xml to standardize the molecule:
standardize -c aliastogroup_expand.xml alias.mrv -f mrv -o alias_output.mrv
The result after group conversion and expansion is:

Example 4: Template-based clean
Template-based clean is a special way of cleaning that uses a predefined template file. The template file contains pre-cleaned sample structures for which the usual clean algorithm fails because of their exceptional spatial arrangement. Examples of such structures are bicycles, bridged polycycles, crown ethers and cycloalkanes. Our example template file contains some of these:
Template-based clean works as follows: the templates are searched for in the target molecule in the order in which they are specified in the template file. Only the first matching template is processed: its atom coordinates are copied to the corresponding target atoms, and the remaining atoms are cleaned with partial clean.
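The first-match rule can be sketched as a loop over the templates in file order. The Python function below is illustrative only: find_match and partial_clean are hypothetical callbacks standing in for substructure matching and partial clean, and returning (None, {}) when no template matches is an assumption, since this page does not describe that case.

```python
def template_clean(target_atoms, templates, find_match, partial_clean):
    """Apply the first matching template, then partially clean the rest.

    templates     -- list of (name, coords) in template-file order, where
                     coords maps template atom -> (x, y)
    find_match    -- hypothetical: (name, coords) -> {template atom: target
                     atom} or None
    partial_clean -- hypothetical: (remaining atoms, fixed coords) ->
                     {atom: (x, y)} for the remaining atoms
    """
    for name, tpl_coords in templates:
        mapping = find_match(name, tpl_coords)
        if mapping is None:
            continue  # try the next template in file order
        # copy template coordinates onto the matched target atoms
        fixed = {tgt: tpl_coords[tpl] for tpl, tgt in mapping.items()}
        remaining = [a for a in target_atoms if a not in fixed]
        fixed.update(partial_clean(remaining, fixed))
        return name, fixed  # only the first matching template is used
    return None, {}  # assumption: no template matched


# toy data: the "crown" template does not match, "bicycle" does
templates = [("crown", {0: (0.0, 0.0)}),
             ("bicycle", {0: (1.0, 2.0), 1: (3.0, 4.0)})]
matches = {"bicycle": {0: "a", 1: "b"}}
name, coords = template_clean(
    ["a", "b", "c"], templates,
    find_match=lambda name, tpl: matches.get(name),
    partial_clean=lambda rest, fixed: {atom: (0.0, 0.0) for atom in rest})
print(name, coords)
# bicycle {'a': (1.0, 2.0), 'b': (3.0, 4.0), 'c': (0.0, 0.0)}
```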
The corresponding Standardizer task is the Clean task with the attributes Type="TemplateBased" and TemplateFile="clean_templates.sdf", where clean_templates.sdf is the template file. The template-based clean task can also be specified in a simple action string as "clean:clean_templates.sdf".
Now clean some test molecules with template-based clean, using the above example template file. The input molecules are stored in clean_test.sdf:
Run template based clean in either of the following ways:
standardize -c "aromatize..clean:clean_templates.sdf" clean_test.sdf -f sdf -o clean_test_output.sdf
standardize -c StandardizerTBClean.xml clean_test.sdf -f sdf -o clean_test_output.sdf
standardize -c StandardizerTBClean.txt clean_test.sdf -f sdf -o clean_test_output.sdf
Pre-aromatization (the aromatize task before the clean) is important for recognizing the single-or-aromatic bonds of the templates.
Note that the action string can also be written to a file, either as is or with each task on a separate line, as in StandardizerTBClean.txt. The usual XML configuration file, StandardizerTBClean.xml, can also be used.
The result file clean_test_output.sdf is shown below:
Note that molecule 4 is not cleaned, although it matches the same template as molecule 3. The reason is that it contains some extra bridges between N atoms, and a template is accepted as matching only if no path between template-matching atoms in the target molecule is shorter than the corresponding path in the template.
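The path-length condition can be checked with breadth-first search. In this Python sketch, molecules are hypothetical adjacency dictionaries and mapping sends template atoms to target atoms; a match is rejected as soon as two matched atoms are closer in the target than in the template, which is what an extra bridge causes. This illustrates the stated rule, not Standardizer's implementation.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Hashable, List


def bfs_distances(adj: Dict[Hashable, List[Hashable]], start: Hashable) -> Dict[Hashable, int]:
    """Bond-count distances from start to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def match_accepted(template_adj, target_adj, mapping) -> bool:
    """Reject the match if two matched atoms are closer in the target
    than in the template (e.g. because of an extra bridge)."""
    for a, b in combinations(mapping, 2):
        d_template = bfs_distances(template_adj, a).get(b)
        d_target = bfs_distances(target_adj, mapping[a]).get(mapping[b])
        if d_template is not None and d_target is not None and d_target < d_template:
            return False
    return True


# a four-atom chain template matched onto a chain, then onto a bridged copy
template = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
chain = {10: [11], 11: [10, 12], 12: [11, 13], 13: [12]}
bridged = {10: [11, 13], 11: [10, 12], 12: [11, 13], 13: [12, 10]}
mapping = {0: 10, 1: 11, 2: 12, 3: 13}
print(match_accepted(template, chain, mapping))    # True
print(match_accepted(template, bridged, mapping))  # False
```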








