These examples demonstrate the use of the Standardizer program. They show how to run the standardize command and explain some of its command-line options as well as its configuration.
For a description of reaction mapping, see the Reaction mapping section of the Reactor Manual.
These examples run the standardize UNIX shell script under UNIX / Linux
or the standardize.bat batch file under Windows.
To run these examples, first set the PATH (all systems) and JCHEMHOME (Windows only) environment variables as described in the Preparing and Running JChem's Batch Files and Shell Scripts manual. Then change to the examples directory. Under UNIX/Linux:
cd jchem/examples/standardizer
Under Windows:
cd jchem\examples\standardizer
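Before running the commands, it can help to confirm that the shell actually finds the standardize script. The sketch below is a minimal POSIX shell check; require_cmd is a hypothetical helper, not part of JChem, and it covers only UNIX/Linux (it does not check JCHEMHOME, which is needed only under Windows).

```shell
#!/bin/sh
# require_cmd NAME: succeed if NAME can be found on PATH, otherwise print a hint
# and fail. A hypothetical helper for these examples, not shipped with JChem.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "error: '$1' not found on PATH; set PATH as described in the JChem setup manual" >&2
  return 1
}
```

For example, `require_cmd standardize && cd jchem/examples/standardizer` changes to the example directory only when standardize is reachable.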
These examples demonstrate standardization with two different configurations. The input molecule is stored in input.mol:
[Figure: the input molecule (input.mol)]
Standardize it with the first configuration, Standardizer1.xml:
standardize -c Standardizer1.xml input.mol
Alternatively, instead of writing an XML configuration file and specifying an input molecule file, you can give the Standardizer tasks in an action string and specify the input molecule as a SMILES string on the command line:
standardize -c "aromatize..dehydrogenize..[O-:2][N+:1]=O>>[O:2]=[N:1]=O..N=[N:1]#[N:2]>>N=[N+:1]=[N-:2]..C[N+:1][H:2].[F,Cl,Br,I;-:3]>>C[N:1]..[H:4][N:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[N:3]..[H:4][O:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[O:3]..clean" "N#N=NCC(\C=C\O)C(CC[NH3+])C([H])([H:1])C1=CC(=C(Cl)C(=C1)N(=O)=O)[N+]([O-])=O.[Cl-]"
Note that the action string is a ".."-separated list of Standardizer tasks: some of them are Standardizer keywords, others are SMARTS reaction strings. The result is:
[H:1]C(C(CCN)C(CC=O)CN=[N+]=[N-])c1cc(c(Cl)c(c1)N(=O)=O)N(=O)=O
You can also specify the output format and the output file path:
standardize -c Standardizer1.xml -f sdf:-a -o result1.sdf input.mol
In this case the result molecule is saved in result1.sdf in SDF format.
Note that we set sdf:-a as the output format in the -f parameter because standardization aromatizes our molecule, while the SDF format is supposed to store the dearomatized form.
You can also use the action string and/or the SMILES input molecule string as above:
standardize -c "aromatize..dehydrogenize..[O-:2][N+:1]=O>>[O:2]=[N:1]=O..N=[N:1]#[N:2]>>N=[N+:1]=[N-:2]..C[N+:1][H:2].[F,Cl,Br,I;-:3]>>C[N:1]..[H:4][N:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[N:3]..[H:4][O:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[O:3]..clean" -f sdf:-a -o result1.sdf input.mol
standardize -c Standardizer1.xml -f sdf:-a -o result1.sdf "N#N=NCC(\C=C\O)C(CC[NH3+])C([H])([H:1])C1=CC(=C(Cl)C(=C1)N(=O)=O)[N+]([O-])=O.[Cl-]"
The result molecule is shown below:
[Figure: the result of standardization with Standardizer1.xml]
To see what happened, look at the subsections of the Actions section of the XML configuration file
Standardizer1.xml, or at the ".."-separated sections of the action string:
"aromatize..dehydrogenize..[O-:2][N+:1]=O>>[O:2]=[N:1]=O..N=[N:1]#[N:2]>>N=[N+:1]=[N-:2]..C[N+:1][H:2].[F,Cl,Br,I;-:3]>>C[N:1]..[H:4][N:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[N:3]..[H:4][O:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[O:3]..clean"
Each section describes a transformation. The transformations are performed on the input molecule in the order in which they appear in the configuration. We show these transformations below:
Observe that the mapped H (with atom map 1) has not been removed, because this simple action removes only explicit H atoms that are unmapped, non-isotopic, uncharged and non-radical. In our next example we will see a more sophisticated method for implicitizing hydrogens.
[Figures: transformation steps performed by Standardizer1.xml]
In our next example we will see a general method for removing salt components.
[Figures: further transformation steps performed by Standardizer1.xml]
For a description of the clean action, see the <Clean> section
of the Standardizer Configuration.
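Since the tasks run in the order they are listed, it can be useful to print an action string one task per line to see the order of the steps above. A minimal POSIX shell sketch, assuming that ".." never occurs inside a single task (true for the action strings on this page); split_actions is a hypothetical helper name, not a JChem tool:

```shell
#!/bin/sh
# split_actions STRING: print the tasks of a Standardizer action string,
# numbered in execution order, one per line (hypothetical helper).
split_actions() {
  # Pass the string as an argument rather than with -v, so awk leaves
  # backslashes untouched; a BEGIN-only program reads no input files.
  awk 'BEGIN {
    n = split(ARGV[1], task, "[.][.]")   # ".." separates the tasks
    for (i = 1; i <= n; i++)
      printf "%d\t%s\n", i, task[i]
    exit
  }' "$1"
}

split_actions "aromatize..dehydrogenize..[O-:2][N+:1]=O>>[O:2]=[N:1]=O..N=[N:1]#[N:2]>>N=[N+:1]=[N-:2]..C[N+:1][H:2].[F,Cl,Br,I;-:3]>>C[N:1]..[H:4][N:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[N:3]..[H:4][O:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[O:3]..clean"
```

A single "." inside a task, as in the ammonium halide reaction, is left intact because only two consecutive dots are treated as a separator.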
In the second configuration, Standardizer2.xml, hydrogens are removed with the removeexplicitH action tag, which can also remove mapped and radical hydrogens. For a description of this type of dehydrogenization, see the <RemoveExplicitH> section in the Standardizer Configuration. Note that this way of dehydrogenization is not available in action strings.
Salt components are removed with the Removal action tag, which keeps the largest disconnected fragment of the molecule and removes all others. For a description of this action, see the <Removal> section in the Standardizer Configuration.
Run Standardizer with the second configuration:
standardize -c Standardizer2.xml input.mol
The result is:
[NH3+]CCC(Cc1cc(c(Cl)c(c1)N(=O)=O)N(=O)=O)C(CC=O)CN=[N+]=[N-]
You can also specify the output format and the output file path:
standardize -c Standardizer2.xml -f sdf:-a -o result2.sdf input.mol
In this case the result molecule is saved in result2.sdf in SDF format.
Note that we again set sdf:-a as the output format in the -f parameter because standardization aromatizes our molecule, while the SDF format is supposed to store the dearomatized form.
The result molecule is shown below:
[Figure: the result of standardization with Standardizer2.xml]
Observe that this time the mapped hydrogen has also been removed, while the positive charge on the nitrogen of the ammonium halide has remained unchanged.
The use and meaning of command-line options in the above commands:
| Option | Description | Default |
|---|---|---|
| -c | configuration file or action string | - |
| -f | output format (e.g. 'sdf', 'mol') | 'smiles' |
| -o | output file path | standard output (console) |
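These three options combine naturally in a loop when several files need the same treatment. The sketch below assumes a POSIX shell and the Standardizer2.xml configuration used above, and standardizes every .mol file in the current directory. The _std.sdf naming scheme and the std_output_name helper are hypothetical choices, and the loop runs only when standardize is on PATH.

```shell
#!/bin/sh
# std_output_name FILE.mol: derive the output name FILE_std.sdf
# (a hypothetical naming scheme chosen for this sketch).
std_output_name() {
  printf '%s_std.sdf\n' "${1%.mol}"
}

# Standardize each .mol file with the options -c, -f and -o described above.
if command -v standardize >/dev/null 2>&1; then
  for f in *.mol; do
    [ -e "$f" ] || continue   # no .mol files: the glob stays unexpanded
    standardize -c Standardizer2.xml -f sdf:-a -o "$(std_output_name "$f")" "$f"
  done
fi
```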
Template-based clean is a special way of cleaning with the help of a predefined template file. The template file contains pre-cleaned sample structures for which the usual clean algorithm fails because of their exceptional spatial arrangements. Bicycles, bridged polycycles, crown ethers and cycloalkanes are examples of such structures. Our example template file, clean_templates.sdf, contains some of these:
[Figures: example templates from clean_templates.sdf]
Template-based clean works as follows: the templates are searched for in the target molecule in the order in which they appear in the template file. Only the first matching template is used: its atom coordinates are copied to the corresponding target atoms, and the remaining atoms are cleaned with partial clean.
The corresponding Standardizer task is the Clean task with the attributes Type="TemplateBased" TemplateFile="clean_templates.sdf", where clean_templates.sdf is the template file. The template-based clean task can also be specified in the simple action string as "clean:clean_templates.sdf".
Now let us clean some test molecules with template-based clean, using the example template file above. The input molecules are stored in clean_test.sdf:
[Figure: the test molecules in clean_test.sdf]
Run template-based clean in any of the following ways:
standardize -c "aromatize..clean:clean_templates.sdf" clean_test.sdf -f sdf -o clean_test_output.sdf
standardize -c StandardizerTBClean.xml clean_test.sdf -f sdf -o clean_test_output.sdf
standardize -c StandardizerTBClean.txt clean_test.sdf -f sdf -o clean_test_output.sdf
Pre-aromatization is important in order to recognize the single-or-aromatic bonds of the templates.
Note that the action string can also be stored in a file, either as it is or with each task on a separate line, as in StandardizerTBClean.txt. The usual XML configuration, StandardizerTBClean.xml, can also be used.
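As a sketch of the one-task-per-line form, the following writes the template-based clean action string to a file with a here-document and passes that file to -c. The file name my_tbclean.txt is hypothetical, and its content is only assumed to correspond to StandardizerTBClean.txt; the shipped file may differ.

```shell
#!/bin/sh
# Write the action string "aromatize..clean:clean_templates.sdf" with one task
# per line (my_tbclean.txt is a hypothetical file name).
cat > my_tbclean.txt <<'EOF'
aromatize
clean:clean_templates.sdf
EOF

# Run it only if JChem's standardize script is available on PATH.
if command -v standardize >/dev/null 2>&1; then
  standardize -c my_tbclean.txt clean_test.sdf -f sdf -o clean_test_output.sdf
fi
```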
The result file clean_test_output.sdf is shown below:
[Figure: the cleaned molecules in clean_test_output.sdf]
Note that molecule 4 is not cleaned, although it matches the same template as molecule
3. The reason is that it contains extra bridges between N atoms, and a template is accepted
as matching only if there is no shorter path between template-matching atoms in the target
molecule than the corresponding path in the template.
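The acceptance rule can be illustrated on small graphs. This is a simplified reading, not the actual matcher: real template matching works on substructure matches with arbitrary atom mappings, whereas this sketch assumes that template atom i corresponds to target atom i, and compares all-pairs shortest path lengths computed with the Floyd-Warshall algorithm. The function name and the "a-b" edge-list input format are hypothetical.

```shell
#!/bin/sh
# template_match_ok TEMPLATE_EDGES TARGET_EDGES
# Toy version of the rule: reject the match if two matched atoms are joined by
# a shorter path in the target than in the template. Graphs are space-separated
# "a-b" edge lists with integer atom labels.
# Simplifying assumption: template atom i is matched to target atom i.
# Prints "accept" (exit 0) or "reject" (exit 1).
template_match_ok() {
  awk -v tmpl="$1" -v tgt="$2" '
    # Record the edges of graph g, then compute all-pairs shortest path
    # lengths in dist[g, a, b] with the Floyd-Warshall algorithm.
    function build(g, edges,    e, n, i, k, p, x, y, z, a, b, m, s) {
      n = split(edges, e, " ")
      for (i = 1; i <= n; i++) {
        split(e[i], p, "-")
        for (k = 1; k <= 2; k++)
          if (!((g, p[k]) in seen)) { seen[g, p[k]] = 1; node[g, ++cnt[g]] = p[k] }
        dist[g, p[1], p[2]] = 1; dist[g, p[2], p[1]] = 1
      }
      for (x = 1; x <= cnt[g]; x++)          # x: intermediate atom
        for (y = 1; y <= cnt[g]; y++)
          for (z = 1; z <= cnt[g]; z++) {
            m = node[g, x]; a = node[g, y]; b = node[g, z]
            if (((g, a, m) in dist) && ((g, m, b) in dist)) {
              s = dist[g, a, m] + dist[g, m, b]
              if (!((g, a, b) in dist) || s < dist[g, a, b]) dist[g, a, b] = s
            }
          }
    }
    BEGIN {
      build("T", tmpl); build("G", tgt)
      for (i = 1; i <= cnt["T"]; i++)
        for (j = 1; j <= cnt["T"]; j++) {
          a = node["T", i]; b = node["T", j]
          if (a == b || !(("T", a, b) in dist)) continue
          # A shorter path between matched atoms in the target rejects the match.
          if ((("G", a, b) in dist) && dist["G", a, b] < dist["T", a, b]) {
            print "reject"; exit 1
          }
        }
      print "accept"
    }'
}
```

For instance, a six-membered ring template matched in a target that adds the bridge 1-7-4 is rejected, because atoms 1 and 4 are two bonds apart in the target but three in the template. This mirrors the extra bridges in molecule 4, while a plain substituent such as 1-7-8 leaves the match accepted.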
Task groups are useful if you want to use the same configuration for different purposes. Depending on your target, you may want to select certain tasks to be executed while skipping others. A typical example is when slightly different standardization is required for targets and queries in substructure search. For example, you may want to remove explicit hydrogens from the target while leaving them in the query. In this case you can add the "RemoveExplicitH" task to the "target" group and skip this task when standardizing the query by setting the active group to "query".
Task groups can be specified in the Groups XML attribute or between curly braces
in the simple action string. Consider StandardizerGroups.xml in this example, which defines four groups: "target", "query", "g1" and "g2". The last task, the enol transformation, has no Groups attribute, which means that it belongs to no specific group and is always executed. The corresponding action string is:
{query,target}aromatize..{target}removeexplicith..{g1,g2}[*+:1]-[*-:2]>>[*:1]=[*:2]..{g1}[*+:1]=[*-:2]>>[*:1]#[*:2]..[O:3][C:1]=[C:2]>>[C:2][C:1]=[O:3]
or StandardizerGroups.txt in file form.
Use group_test.mol as the input molecule:
[Figure: the input molecule (group_test.mol)]
Run Standardizer with the active group set to "query" in any of the following ways:
standardize -c "{query,target}aromatize..{target}removeexplicith..{g1,g2}[*+:1]-[*-:2]>>[*:1]=[*:2]..{g1}[*+:1]=[*-:2]>>[*:1]#[*:2]..[O:3][C:1]=[C:2]>>[C:2][C:1]=[O:3]" group_test.mol -u query
standardize -c StandardizerGroups.xml group_test.mol -u query
standardize -c StandardizerGroups.txt group_test.mol -u query
The result is:
[Figure: the result with active group "query"]
Only the first and last tasks (aromatization and the enol transformation) have been executed: the first belongs to the "query" group, and the last has no groups specified and is therefore always executed.
You can activate several groups at the same time by giving them as a comma-separated list. For example, setting both "query" and "g1" executes all but the second task (explicit H removal):
standardize -c "{query,target}aromatize..{target}removeexplicith..{g1,g2}[*+:1]-[*-:2]>>[*:1]=[*:2]..{g1}[*+:1]=[*-:2]>>[*:1]#[*:2]..[O:3][C:1]=[C:2]>>[C:2][C:1]=[O:3]" group_test.mol -u query,g1
standardize -c StandardizerGroups.xml group_test.mol -u query,g1
standardize -c StandardizerGroups.txt group_test.mol -u query,g1
The result is:
[Figure: the result with active groups "query" and "g1"]
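The selection rule used in these examples (a task runs if it has no group prefix, or if at least one of its groups is active) can be modeled with a small POSIX shell function. This is a sketch of the rule as described on this page, not a JChem tool; task_runs is a hypothetical name, and it expects the active groups in the same comma-separated form as the -u option.

```shell
#!/bin/sh
# task_runs TASK ACTIVE
# Succeeds if an action-string TASK, optionally prefixed with {group,...},
# would be executed when ACTIVE (comma-separated, as passed to -u) is set.
task_runs() {
  task=$1 active=$2
  case $task in
    '{'*'}'*) groups=${task#\{}; groups=${groups%%\}*} ;;
    *) return 0 ;;            # no group prefix: the task always runs
  esac
  saved_ifs=$IFS; IFS=,
  for g in $groups; do
    for a in $active; do
      if [ "$g" = "$a" ]; then IFS=$saved_ifs; return 0; fi
    done
  done
  IFS=$saved_ifs
  return 1                    # none of the task's groups is active
}

# The five tasks of the StandardizerGroups action string, checked for -u query,g1.
for t in '{query,target}aromatize' '{target}removeexplicith' \
         '{g1,g2}[*+:1]-[*-:2]>>[*:1]=[*:2]' '{g1}[*+:1]=[*-:2]>>[*:1]#[*:2]' \
         '[O:3][C:1]=[C:2]>>[C:2][C:1]=[O:3]'; do
  if task_runs "$t" query,g1; then echo "run   $t"; else echo "skip  $t"; fi
done
```

With query,g1 the loop reports skip only for the removeexplicith task, which agrees with the result described above.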
Note that you get the same result if you assign only the second task to the "target" group, leave all other tasks ungrouped, and run Standardizer with the active group "query" (the additional "g1" in the command below has no effect, because no task belongs to it):
standardize -c "aromatize..{target}removeexplicith..[*+:1]-[*-:2]>>[*:1]=[*:2]..[*+:1]=[*-:2]>>[*:1]#[*:2]..[O:3][C:1]=[C:2]>>[C:2][C:1]=[O:3]" group_test.mol -u query,g1