Standardizer Examples


Introduction

These examples demonstrate the use of the Standardizer program. They show how to run the standardize command and explain some of its command-line options as well as its configuration.

For a description of reaction mapping, see the "Reaction mapping" section of the Reactor Manual.

 

Prerequisites

These examples run the standardize UNIX shell script under UNIX / Linux or the standardize.bat batch file under Windows.

To run these examples:

  1. The Java Virtual Machine version 1.4 or higher and JChem must be installed on your system.

  2. The PATH environment variable must be set as described in the Preparing and Running JChem's Batch Files and Shell Scripts manual.

  3. A command shell (under UNIX / Linux: your favorite shell; under Windows: a Cygwin shell or a Command Prompt) must be opened in the standardizer example directory:
    In UNIX / Linux:
    cd jchem/examples/standardizer
    
    In Windows:
    cd jchem\examples\standardizer
    
 

Examples

Example 1.: Set of standardizing operations, configuration 1.

This example shows how to set up a configuration containing several standardization steps.

The input molecule is stored in the file input.mol:

input molecule

Use Standardizer1.xml to standardize the molecule:
    standardize -c Standardizer1.xml input.mol
Alternatively, you can give the Standardizer tasks in an action string and specify the input molecule as a SMILES string on the command line (the command is wrapped over several lines only for readability; enter it as a single line):
    standardize -c "aromatize..dehydrogenize..[O-:2][N+:1]=O>>[O:2]=[N:1]=O..N=[N:1]#[N:2]>>N=[N+:1]=[N-:2]
    ..C[N+:1][H:2].[F,Cl,Br,I;-:3]>>C[N:1]..[H:4][N:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[N:3]..[H:4][O:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[O:3]..clean"
    "N#N=NCC(\C=C\O)C(CC[NH3+])C([H])([H:1])C1=CC(=C(Cl)C(=C1)N(=O)=O)[N+]([O-])=O.[Cl-]"
Note that an action string is a ".."-separated list of Standardizer action tasks: some of them are Standardizer keywords (here aromatize, dehydrogenize and clean), others are SMARTS reaction strings.

The result is:

    [H:1]C(C(CCN)C(CC=O)CN=[N+]=[N-])c1cc(c(Cl)c(c1)N(=O)=O)N(=O)=O
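The action string used above can be taken apart by hand. As an illustration only, the following Python sketch (not part of JChem) splits it into its tasks, assuming as a simplification that tasks are separated by ".." and that a task containing ">>" is a SMARTS reaction while anything else is a Standardizer keyword. The name parse_action_string is hypothetical.

```python
# Simplified model of a Standardizer action string (assumption, not JChem code):
# tasks are separated by "..", tasks containing ">>" are SMARTS reactions.
ACTIONS = (
    "aromatize..dehydrogenize..[O-:2][N+:1]=O>>[O:2]=[N:1]=O.."
    "N=[N:1]#[N:2]>>N=[N+:1]=[N-:2]..C[N+:1][H:2].[F,Cl,Br,I;-:3]>>C[N:1].."
    "[H:4][N:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[N:3].."
    "[H:4][O:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[O:3]..clean"
)


def parse_action_string(action_string):
    """Return (kind, task) pairs in execution order."""
    tasks = []
    for raw in action_string.split(".."):
        task = raw.strip()
        if not task:
            continue  # tolerate empty sections, e.g. from a trailing ".."
        # A single "." inside a reaction (e.g. "...[H:2].[F,Cl,Br,I;-:3]...")
        # separates reactant components and is not split.
        kind = "reaction" if ">>" in task else "keyword"
        tasks.append((kind, task))
    return tasks


for step, (kind, task) in enumerate(parse_action_string(ACTIONS), start=1):
    print(f"{step}. {kind:<8} {task}")
```

Running it lists eight tasks in execution order, matching the eight transformations described later in this example.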
  

You can also specify the output format and the output file path:
  standardize -c Standardizer1.xml -f sdf:-a -o result1.sdf input.mol
in which case the result molecule is saved in result1.sdf in SDF format.
Note that we set sdf:-a as the output format in the -f parameter because the molecule is aromatized during standardization, while the SDF format is supposed to store the dearomatized form; the -a option requests dearomatization on export.

You can also use the action string and/or the SMILES input string as above (the commands are wrapped over several lines only for readability; enter each as a single line):

  standardize -c "aromatize..dehydrogenize..[O-:2][N+:1]=O>>[O:2]=[N:1]=O..
  N=[N:1]#[N:2]>>N=[N+:1]=[N-:2]..C[N+:1][H:2].[F,Cl,Br,I;-:3]>>C[N:1]..
  [H:4][N:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[N:3]..[H:4][O:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[O:3]
  ..clean" -f sdf:-a -o result1.sdf input.mol
  standardize -c Standardizer1.xml -f sdf:-a -o result1.sdf 
  "N#N=NCC(\C=C\O)C(CC[NH3+])C([H])([H:1])C1=CC(=C(Cl)C(=C1)N(=O)=O)[N+]([O-])=O.[Cl-]"

The result molecule is shown below:

result molecule

To see what happened, look at the actions in the XML configuration file Standardizer1.xml or at the ".."-separated sections of the action string (the string is wrapped over several lines only for readability; it is a single line):

  "aromatize..dehydrogenize..[O-:2][N+:1]=O>>[O:2]=[N:1]=O..N=[N:1]#[N:2]>>N=[N+:1]=[N-:2]..
  C[N+:1][H:2].[F,Cl,Br,I;-:3]>>C[N:1]..[H:4][N:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[N:3]..
  [H:4][O:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[O:3]..clean"
Each section describes a transformation. The transformations are performed on the input molecule in the order in which they appear in the configuration:
  1. aromatization
  2. dehydrogenization

    Note that the mapped H (atom map 1) has not been removed, because this simple action removes only explicit H atoms that are unmapped, non-isotopic, uncharged and non-radical. The next example shows a more sophisticated method for implicitizing hydrogens.

  3. standardization of nitro groups:
    nitro

  4. standardization of azides:
    azide

  5. standardization of ammonium halogenides (for a general method of salt removal, see the next example):
    ammoniumhalogenide

  6. standardization of enamines:
    enamine

  7. standardization of enols:
    enol

  8. clean if needed: the program automatically determines whether the result molecule has to be cleaned. A clean is needed if some of the transformations have added new atoms, or if the input molecule is given in SMILES while the output format stores atom coordinates (e.g. MOL or SDF). This clean-if-needed option is recommended for general use; other options are described in the Clean section of the Available Standardizer Actions.
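Two of the rules in the list above are simple enough to state as code. The following Python sketch is a simplified model, not JChem's implementation: ExplicitH, removed_by_dehydrogenize and clean_needed are hypothetical names, and the clean decision uses only the two conditions mentioned in step 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExplicitH:
    """Hypothetical record of an explicit hydrogen atom (not a JChem class)."""
    map_number: int = 0   # 0 = unmapped
    mass_number: int = 0  # 0 = no isotope label
    charge: int = 0
    radical: bool = False


def removed_by_dehydrogenize(h):
    """Simple dehydrogenize rule from step 2: only unmapped, non-isotopic,
    uncharged, non-radical explicit H atoms are removed."""
    return (h.map_number == 0 and h.mass_number == 0
            and h.charge == 0 and not h.radical)


def clean_needed(atoms_added, input_is_smiles, output_stores_coordinates):
    """Clean-if-needed rule from step 8: clean when a transformation added
    atoms, or when SMILES input is written to a coordinate format (MOL, SDF)."""
    return atoms_added or (input_is_smiles and output_stores_coordinates)


print(removed_by_dehydrogenize(ExplicitH(map_number=1)))  # False: [H:1] stays
print(removed_by_dehydrogenize(ExplicitH()))              # True
print(clean_needed(False, True, True))                    # True: SMILES -> SDF
```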

The graphical user interface of the Standardizer Editor offers an option to save the configuration after the list of operations has been created.

Example 2.: Set of standardizing operations, configuration 2.

In this example we use the same input molecule (input.mol) with a different Standardizer configuration, Standardizer2.xml. It is very similar to Standardizer1.xml from Example 1; the difference is that two transformations have been changed:

  1. Dehydrogenization is performed with the removeexplicitH action tag, which can also remove mapped and radical hydrogens. For a description of this type of dehydrogenization, see the <RemoveExplicitH> section in the Standardizer Configuration. Note that this kind of dehydrogenization is not available in action strings.

  2. We replaced the ammoniumhalogenide transformation with a Removal action tag, which keeps the largest disconnected fragment of the molecule and removes all others. For a description of this action, see the <Removal> section in the Standardizer Configuration.
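The idea behind keeping the largest fragment can be sketched on a plain molecular graph. This Python sketch is an illustration under assumptions, not the Removal implementation: it measures fragment size by atom count and keeps the first fragment found on ties; the function names and the toy salt are hypothetical.

```python
from collections import defaultdict


def fragments(n_atoms, bonds):
    """Connected components (sorted lists of atom indices) of a molecular graph."""
    adj = defaultdict(list)
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    seen, components = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # iterative depth-first search
            atom = stack.pop()
            component.append(atom)
            for neighbor in adj[atom]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


def keep_largest_fragment(n_atoms, bonds):
    """Keep the largest fragment by atom count (max() keeps the first on ties)."""
    return max(fragments(n_atoms, bonds), key=len)


# Toy salt: an N-C-C chain (atoms 0-2) and a separate chloride ion (atom 3).
elements = ["N", "C", "C", "Cl"]
bonds = [(0, 1), (1, 2)]
kept = keep_largest_fragment(len(elements), bonds)
print([elements[i] for i in kept])  # ['N', 'C', 'C']
```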

Run Standardizer with:

  standardize -c Standardizer2.xml input.mol
The result is:
  [NH3+]CCC(Cc1cc(c(Cl)c(c1)N(=O)=O)N(=O)=O)C(CC=O)CN=[N+]=[N-]

You can also specify the output format and the output file path:
  standardize -c Standardizer2.xml -f sdf:-a -o result2.sdf input.mol
in which case the result molecule is saved to the file result2.sdf (SDF format).

As in Example 1, we set sdf:-a as the output format in the -f parameter because the molecule is aromatized during standardization, while the SDF format is supposed to store the dearomatized form.


The result molecule is shown below:

result molecule

Observe that this time the mapped hydrogen has also been removed, and that the ammonium nitrogen has kept its positive charge, because the chloride counterion was removed as a separate fragment rather than by the ammonium halogenide transformation.

The command-line options used in the above commands:

  Option  Description                            Default
  -c      configuration file or action string    -
  -f      output format (e.g. 'sdf', 'mol')      'smiles'
  -o      output file path                       standard output (console)

Example 3.: Conversion of alias and pseudo atoms to abbreviated groups

Using alias or pseudo atoms to denote abbreviated groups in chemical structures was common in earlier chemical database systems. It simplifies drawing and visualizing the structures, but the most frequently used database operations, such as substructure search, cannot interpret this annotation, so the full structure has to be used instead of alias and pseudo atoms. To overcome this problem, Standardizer can transform alias and pseudo atoms to abbreviated groups.

Let's consider this molecule (alias.mrv):

The methyl group has been given the alias label of a carboxylic group, and the hydroxyl group the alias label of a nitro group. Nevertheless, these substituents are still handled in their original form: as a methyl and a hydroxyl group.
The terms alias atom and pseudo atom may be confusing, since we are handling groups. The word 'atom' only refers to the fact that one atom of the structure is replaced by an entity, which is not always a functional group.
The benzoate pseudo atom is displayed in italics.
In SMILES format, the molecule is: CC1=CC(*)=CC(O)=C1

In Standardizer, the AliasToGroup action replaces all alias and pseudo atoms with the corresponding abbreviated groups provided in the list of abbreviated groups. An abbreviated group may have only one attachment point. The action results in contracted S-groups.

The first example shows how to convert these atoms with Standardizer. In the Create Configuration panel of the Standardizer application, add the required operation:

gui_config

The text field at the bottom shows the description of the operations. The Alias to Group command does not need any further settings.

The conversion can also be run from the command line. Run the Standardizer command-line application with:

  standardize -c aliastogroup alias.mrv -f mrv -o alias_output.mrv

Here the configuration was given by a simple action string.

The result molecule will contain contracted abbreviated groups:

The conversion also changes the display style: all groups are shown in normal font, and the bonds are connected to the chemically relevant atom.

The second example shows how to obtain converted abbreviated groups that are also expanded or ungrouped. To have expanded groups in your structures, insert the Expand Group command after the conversion:

gui_config

Note that Standardizer executes the operations in the given order, so an ungroup or expand operation placed before the conversion does not affect alias and pseudo atoms, because they are not yet abbreviated groups at that point.
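Why the order of operations matters can be shown with a toy model. In this Python sketch (hypothetical names; the molecule is reduced to a list of (kind, label, expanded) entries rather than a real structure), expanding before the conversion leaves the groups contracted, while converting first and then expanding gives expanded groups.

```python
# Toy molecule model (hypothetical): each entry is (kind, label, expanded), where
# kind is "alias" for an alias/pseudo atom or "group" for an abbreviated group.


def alias_to_group(items):
    """Convert alias/pseudo atoms to contracted abbreviated groups."""
    return [("group", label, False) if kind == "alias" else (kind, label, expanded)
            for kind, label, expanded in items]


def expand_groups(items):
    """Expand abbreviated groups; alias atoms are not groups, so they are untouched."""
    return [(kind, label, True) if kind == "group" else (kind, label, expanded)
            for kind, label, expanded in items]


def run_in_order(items, actions):
    """Apply the actions strictly in the given order."""
    for action in actions:
        items = action(items)
    return items


molecule = [("alias", "COOH", False), ("alias", "NO2", False)]
print(run_in_order(molecule, [alias_to_group, expand_groups]))
# [('group', 'COOH', True), ('group', 'NO2', True)]
print(run_in_order(molecule, [expand_groups, alias_to_group]))
# [('group', 'COOH', False), ('group', 'NO2', False)]
```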

After creating the configuration (i.e. the set of operations you want to apply to your molecules), you can save it in XML format to reload it later or to use it on the command line. Here the Standardizer command-line application uses aliastogroup_expand.xml to standardize the molecule:

  standardize -c aliastogroup_expand.xml alias.mrv -f mrv -o alias_output.mrv
        

The result after converting and expanding the groups is:

Example 4.: Template based clean

Template based clean is a special way of cleaning that uses a predefined template file. The template file contains pre-cleaned sample structures for which the usual clean algorithm fails because of their unusual spatial arrangement; examples of such structures are bicycles, bridged polycycles, crown ethers and cycloalkanes. Our example template file contains some of these:

clean_templates.sdf

Template based clean works in the following way: templates are searched for in the target molecule in the order in which they appear in the template file. Only the first matching template is used: its atom coordinates are copied to the corresponding target atoms, and the remaining atoms are cleaned with partial clean.
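The first-match procedure can be outlined in Python. This is only a sketch under stated assumptions: the substructure search and the partial clean are passed in as functions (find_match and partial_clean are hypothetical stand-ins, replaced here by toy versions), and returning (None, None) when no template matches is an assumption of the sketch.

```python
def template_based_clean(target_atoms, templates, find_match, partial_clean):
    """First-match template-based clean (simplified sketch, not JChem code).

    templates: dicts with 'name' and 'coords' (template atom -> (x, y)),
               listed in template-file order
    find_match(template, target_atoms): template atom -> target atom, or None
    partial_clean(atoms, fixed): new coordinates for `atoms`, keeping `fixed`
    """
    for template in templates:                  # search in file order
        mapping = find_match(template, target_atoms)
        if mapping is None:
            continue                            # no match: try the next template
        # Copy template coordinates onto the corresponding target atoms.
        fixed = {tgt: template["coords"][tpl] for tpl, tgt in mapping.items()}
        remaining = [atom for atom in target_atoms if atom not in fixed]
        coords = dict(fixed)
        coords.update(partial_clean(remaining, fixed))  # clean only the rest
        return template["name"], coords         # the first match wins
    return None, None                           # assumption: leave molecule as is


# Toy stand-ins (hypothetical): a real version would run a substructure
# search and a partial 2D clean.
templates = [
    {"name": "crown", "coords": {0: (0.0, 0.0)}},
    {"name": "bicycle", "coords": {0: (0.0, 0.0), 1: (1.5, 0.0)}},
]


def toy_find_match(template, target_atoms):
    return {0: "a", 1: "b"} if template["name"] == "bicycle" else None


def toy_partial_clean(atoms, fixed):
    return {atom: (3.0 + 1.5 * i, 0.0) for i, atom in enumerate(atoms)}


name, coords = template_based_clean(["a", "b", "c"], templates,
                                    toy_find_match, toy_partial_clean)
print(name, coords)
# bicycle {'a': (0.0, 0.0), 'b': (1.5, 0.0), 'c': (3.0, 0.0)}
```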

The corresponding Standardizer task is the Clean task with the attributes Type="TemplateBased" and TemplateFile="clean_templates.sdf", where clean_templates.sdf is the template file. In a simple action string, the template based clean task is written as clean:clean_templates.sdf.

Now clean some test molecules with template based clean using the above example template file. The input molecules are stored in clean_test.sdf:

clean_test.sdf

Run template based clean in either of the following ways:

  standardize -c "aromatize..clean:clean_templates.sdf" clean_test.sdf -f sdf -o clean_test_output.sdf
  standardize -c StandardizerTBClean.xml clean_test.sdf -f sdf -o clean_test_output.sdf
  standardize -c StandardizerTBClean.txt clean_test.sdf -f sdf -o clean_test_output.sdf

Pre-aromatization is needed so that the single-or-aromatic bonds of the templates are recognized.

Note that the action string can also be stored in a file, either as it is or with each task on a separate line, as in StandardizerTBClean.txt. The usual XML configuration, StandardizerTBClean.xml, can also be used.
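Turning a one-task-per-line file back into an action string is a join on "..". The Python sketch below assumes a layout with one task per non-blank line and no comment syntax; it does not reproduce the actual contents of StandardizerTBClean.txt, and action_string_from_text / action_string_from_file are hypothetical helpers.

```python
from pathlib import Path


def action_string_from_text(text):
    """Join one-task-per-line text into a '..'-separated action string.

    Assumption: blank lines are ignored and every other line is one task
    (an already '..'-joined single line passes through unchanged).
    """
    tasks = [line.strip() for line in text.splitlines() if line.strip()]
    return "..".join(tasks)


def action_string_from_file(path):
    """Read a task file (hypothetical layout) and return the action string."""
    return action_string_from_text(Path(path).read_text())


# The two tasks of the first command above, written on separate lines:
print(action_string_from_text("aromatize\nclean:clean_templates.sdf\n"))
# aromatize..clean:clean_templates.sdf
```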

The result file clean_test_output.sdf is shown below:

clean_test_output.sdf

Note that molecule 4 is not cleaned, although it matches the same template as molecule 3. The reason is that it contains extra bridges between N atoms, and a template is accepted as matching only if no path between template-matching atoms in the target molecule is shorter than the corresponding path in the template.
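The acceptance rule can be illustrated on plain graphs with breadth-first search. In this Python sketch (one possible reading of the rule, not JChem code), path length is the number of bonds, the rule is checked for every pair of matched atoms, and a toy six-membered ring with an extra two-bond bridge stands in for the extra bridges of molecule 4. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Bond-count distances from `source` to every reachable atom (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbor in adj[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def match_accepted(template_adj, target_adj, mapping):
    """Reject the match if any pair of matched atoms is joined by a shorter
    path in the target than in the template."""
    for u, v in combinations(mapping, 2):
        d_template = bfs_distances(template_adj, u).get(v)
        d_target = bfs_distances(target_adj, mapping[u]).get(mapping[v])
        if d_template is None or d_target is None:
            continue  # disconnected pair: nothing to compare
        if d_target < d_template:
            return False
    return True


def ring(n):
    """Adjacency lists of an n-membered ring with atoms 0..n-1."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def with_bonds(adj, extra_bonds):
    """Copy of `adj` with additional bonds (new atoms are created as needed)."""
    new = {atom: list(neighbors) for atom, neighbors in adj.items()}
    for a, b in extra_bonds:
        new.setdefault(a, []).append(b)
        new.setdefault(b, []).append(a)
    return new


template = ring(6)
identity = {i: i for i in range(6)}              # template atom i -> target atom i
substituted = with_bonds(ring(6), [(0, 6)])      # ring plus a substituent on atom 0
bridged = with_bonds(ring(6), [(0, 6), (6, 3)])  # extra bridge 0-6-3

print(match_accepted(template, substituted, identity))  # True
print(match_accepted(template, bridged, identity))      # False
```

Because every bond of the template maps onto a bond of the target, a target path can never be longer than the template path; the check therefore rejects exactly those matches where the target offers a shortcut.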
