Standardizer Examples

Introduction

These examples demonstrate the use of the Standardizer program. They show how to run the standardize command and explain some of its command-line options as well as its configuration.

For a description of reaction mapping, see the Reaction mapping section of the Reactor Manual.

 

Prerequisites

These examples run the standardize UNIX shell script under UNIX / Linux or the standardize.bat batch file under Windows.

To run these examples:

  1. The Java Virtual Machine version 1.4 or higher and JChem must be installed on your system.

  2. The PATH (all systems) and the JCHEMHOME (under Windows) environment variables must be set as described in the Preparing and Running JChem's Batch Files and Shell Scripts manual.

  3. A command shell (under UNIX / Linux: your favorite shell; under Windows: a Cygwin shell or a Command Prompt) must be opened in the standardizer example directory.
    In UNIX / Linux:
    cd jchem/examples/standardizer
    
    In Windows:
    cd jchem\examples\standardizer
    
 

Examples

These examples demonstrate standardization with two different configurations. The input molecule is stored in input.mol:

input molecule

Command-line options used in the examples:

Option   Description                                   Default
-c       configuration file or configuration string    -
-f       output format (e.g. 'sdf', 'mol')             'smiles'
-o       output file path                              standard output (console)
 

Template based clean

Template based clean is a special way of cleaning with the help of a predefined template file. The template file contains pre-cleaned sample structures for which the usual clean algorithm fails because of their exceptional spatial arrangements. Examples of such structures are bicyclic compounds, bridged polycycles, crown ethers and cycloalkanes. The example template file contains some of these:

clean_templates.sdf

Template based clean works as follows: the templates are searched for in the target molecule in the order in which they appear in the template file. Only the first matching template is used: its atom coordinates are copied to the corresponding target atoms, and the remaining atoms are cleaned with partial clean.
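The first-match rule above can be sketched as follows. This is a hypothetical Python illustration, not the Standardizer implementation: substructure matching and partial clean are passed in as functions (`find_match` and `partial_clean` are assumed names), and leaving the coordinates unchanged when no template matches is an assumption of the sketch.

```python
from typing import Callable, Dict, Optional, Sequence, Set, Tuple

Coord = Tuple[float, float]
Coords = Dict[int, Coord]   # atom index -> 2D coordinates
Mapping = Dict[int, int]    # template atom index -> target atom index


def template_based_clean(
    target: Coords,
    templates: Sequence[Coords],
    find_match: Callable[[Coords], Optional[Mapping]],
    partial_clean: Callable[[Coords, Set[int]], Coords],
) -> Tuple[Coords, Optional[int]]:
    """Clean `target` with the first matching template (hypothetical sketch).

    Returns the new coordinates and the index of the template used,
    or the unchanged coordinates and None if no template matches.
    """
    for index, template in enumerate(templates):  # template file order
        mapping = find_match(template)
        if mapping is None:
            continue  # no match: try the next template
        coords = dict(target)
        for template_atom, target_atom in mapping.items():
            coords[target_atom] = template[template_atom]  # copy coordinates
        fixed = set(mapping.values())
        # only the atoms outside the matched part are left to partial clean
        return partial_clean(coords, fixed), index
    return dict(target), None
```

Keeping the matcher and the partial clean as parameters isolates the chemistry-heavy parts, so the sketch only shows the control flow: templates are tried in file order, and the search stops at the first match.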

The corresponding Standardizer task is the Clean task with the following attributes: Type="TemplateBased" TemplateFile="clean_templates.sdf" where clean_templates.sdf is the template file. The template based clean task can also be specified in the simple action string as "clean:clean_templates.sdf".

Now clean some test molecules with template based clean using the above example template file. The input molecules are stored in clean_test.sdf:

clean_test.sdf

Run template based clean in any of the following ways:

standardize -c "aromatize..clean:clean_templates.sdf" clean_test.sdf -f sdf -o clean_test_output.sdf
standardize -c StandardizerTBClean.xml clean_test.sdf -f sdf -o clean_test_output.sdf
standardize -c StandardizerTBClean.txt clean_test.sdf -f sdf -o clean_test_output.sdf

The aromatize task comes first because pre-aromatization is needed to recognize the single-or-aromatic bonds of the templates.

Note that the action string can also be stored in a file, either verbatim or with each task on a separate line, as in StandardizerTBClean.txt. The usual XML configuration, StandardizerTBClean.xml, can also be used.
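As a rough illustration of the two action string forms, the hypothetical helper `split_action_string` below treats both ".." and line breaks as task separators and splits "clean:clean_templates.sdf" into a task name and a parameter. The grammar is an assumption based on the examples in this document, not the parser Standardizer uses; for instance, it ignores group prefixes.

```python
from typing import List, Optional, Tuple

Task = Tuple[str, Optional[str]]  # (task, optional parameter)


def split_action_string(text: str) -> List[Task]:
    """Split an action string or its file form into (task, parameter) pairs.

    Assumed grammar: tasks are separated by '..' or by line breaks, and a
    ':' after a plain task name introduces its parameter. Reaction SMARTS
    tasks (containing '>>') are kept whole, because atom maps such as
    [*:1] also contain ':'.
    """
    tasks: List[Task] = []
    for line in text.splitlines():
        for token in line.split(".."):
            token = token.strip()
            if not token:
                continue  # skip empty lines and stray separators
            if ">>" in token or ":" not in token:
                tasks.append((token, None))
            else:
                name, param = token.split(":", 1)
                tasks.append((name, param))
    return tasks


one_line = split_action_string("aromatize..clean:clean_templates.sdf")
file_form = split_action_string("aromatize\nclean:clean_templates.sdf\n")
print(one_line == file_form)
# True
```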

The result file clean_test_output.sdf is shown below:

clean_test_output.sdf

Note that molecule 4 is not cleaned, even though it matches the same template as molecule 3. The reason is that it contains extra bridges between N atoms: a template is accepted as matching only if no path between template-matching atoms in the target molecule is shorter than the corresponding path in the template.
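The path-length condition can be expressed as a check on shortest bond paths. The sketch below is a hypothetical reading of the rule, not ChemAxon's code: molecules are plain adjacency lists, `mapping` is an already found template-to-target atom mapping, and atom pairs that are not connected in one of the graphs are skipped.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List

Graph = Dict[int, List[int]]  # atom index -> indices of bonded atoms


def bond_distances(graph: Graph, start: int) -> Dict[int, int]:
    """Breadth-first search: number of bonds on a shortest path from `start`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in graph[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def match_accepted(template: Graph, target: Graph, mapping: Dict[int, int]) -> bool:
    """Accept a template match only if no two matched target atoms are
    connected by a shorter path than their template counterparts."""
    template_dist = {a: bond_distances(template, a) for a in mapping}
    target_dist = {a: bond_distances(target, mapping[a]) for a in mapping}
    for a, b in combinations(mapping, 2):
        in_template = template_dist[a].get(b)
        in_target = target_dist[a].get(mapping[b])
        if in_template is None or in_target is None:
            continue  # pair not connected in one of the graphs
        if in_target < in_template:
            return False  # e.g. an extra bridge shortcuts the template path
    return True
```

For example, a six-membered ring template matches a target in which ring atoms 0 and 3 are also joined through an extra atom. The bridge gives a 2-bond path where the template has 3 bonds, so `match_accepted` rejects the match, in the same way as molecule 4 is left uncleaned.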

 

Task groups

Task groups are useful if you want to use the same configuration for different purposes. Depending on your target, you may want to execute certain tasks while skipping others. A typical example is substructure search, where targets and queries require slightly different standardization. For example, you may want to remove explicit hydrogens from the target while keeping them in the query. In this case you can add the "RemoveExplicitH" task to the "target" group and skip it when standardizing the query by setting the active group to "query".

Task groups can be specified in the Groups XML attribute or between curly braces in the simple action string. In this example, StandardizerGroups.xml defines four groups: "target", "query", "g1" and "g2". The last task, the enol transformation, has no Groups attribute, which means that it belongs to no specific group and is always executed. The corresponding action string is:

{query,target}aromatize..{target}removeexplicith..{g1,g2}[*+:1]-[*-:2]>>[*:1]=[*:2]..{g1}[*+:1]=[*-:2]>>[*:1]#[*:2]..[O:3][C:1]=[C:2]>>[C:2][C:1]=[O:3]
StandardizerGroups.txt contains the same configuration in file form.
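The group prefixes in the action string above can be read off mechanically. The following Python sketch (the function name and regular expression are hypothetical, not part of JChem) splits the string on ".." and removes an optional {...} prefix from each task. A task without a prefix gets an empty group set, which corresponds to a task that is always executed.

```python
import re
from typing import FrozenSet, List, Tuple

GROUP_PREFIX = re.compile(r"^\{([^}]*)\}")  # assumed syntax: {g1,g2,...}


def parse_grouped_tasks(action_string: str) -> List[Tuple[FrozenSet[str], str]]:
    """Return (groups, task) pairs; an empty group set means 'no groups'."""
    tasks = []
    for token in action_string.split(".."):
        match = GROUP_PREFIX.match(token)
        if match:
            groups = frozenset(g.strip() for g in match.group(1).split(",") if g.strip())
            tasks.append((groups, token[match.end():]))
        else:
            tasks.append((frozenset(), token))
    return tasks


ACTION = ("{query,target}aromatize..{target}removeexplicith.."
          "{g1,g2}[*+:1]-[*-:2]>>[*:1]=[*:2]..{g1}[*+:1]=[*-:2]>>[*:1]#[*:2].."
          "[O:3][C:1]=[C:2]>>[C:2][C:1]=[O:3]")

for groups, task in parse_grouped_tasks(ACTION):
    print(sorted(groups), task)
```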

Use group_test.mol as the input molecule:

group_test.mol

Run Standardizer with the active group set to "query" in any of the following ways:

standardize -c "{query,target}aromatize..{target}removeexplicith..{g1,g2}[*+:1]-[*-:2]>>[*:1]=[*:2]..{g1}[*+:1]=[*-:2]>>[*:1]#[*:2]..[O:3][C:1]=[C:2]>>[C:2][C:1]=[O:3]" group_test.mol -u query
standardize -c StandardizerGroups.xml group_test.mol -u query
standardize -c StandardizerGroups.txt group_test.mol -u query

The result is:

[H]C(c1cc(CC=O)cc(C[NH+]=[N-])c1)[N+]([O-])=O

Only the first and the last tasks (aromatization and enol transformation) have been processed: the first belongs to the "query" group, and the last has no groups specified, so it is always executed.
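The selection rule applied here can be summarized in a few lines: a task runs if it has no groups, or if at least one of its groups is active. The sketch below is a hypothetical restatement of that rule in Python, with the five tasks transcribed by hand from the action string; it does not run any chemistry.

```python
from typing import FrozenSet, Iterable, List, Tuple

Task = Tuple[FrozenSet[str], str]  # (groups of the task, task)

# The five tasks of StandardizerGroups, transcribed from the action string.
TASKS: List[Task] = [
    (frozenset({"query", "target"}), "aromatize"),
    (frozenset({"target"}), "removeexplicith"),
    (frozenset({"g1", "g2"}), "[*+:1]-[*-:2]>>[*:1]=[*:2]"),
    (frozenset({"g1"}), "[*+:1]=[*-:2]>>[*:1]#[*:2]"),
    (frozenset(), "[O:3][C:1]=[C:2]>>[C:2][C:1]=[O:3]"),
]


def select_tasks(tasks: List[Task], active: Iterable[str]) -> List[str]:
    """Keep tasks with no groups, or with at least one active group."""
    active_set = set(active)
    return [task for groups, task in tasks if not groups or groups & active_set]


print(select_tasks(TASKS, ["query"]))
# ['aromatize', '[O:3][C:1]=[C:2]>>[C:2][C:1]=[O:3]']
```

Passing ["query", "g1"] instead returns every task except removeexplicith, which matches the next example with two active groups.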

You can specify several active groups at the same time. For example, setting both "query" and "g1" executes all tasks except the second one (explicit H removal):

standardize -c "{query,target}aromatize..{target}removeexplicith..{g1,g2}[*+:1]-[*-:2]>>[*:1]=[*:2]..{g1}[*+:1]=[*-:2]>>[*:1]#[*:2]..[O:3][C:1]=[C:2]>>[C:2][C:1]=[O:3]" group_test.mol -u query,g1
standardize -c StandardizerGroups.xml group_test.mol -u query,g1
standardize -c StandardizerGroups.txt group_test.mol -u query,g1

The result is:

[H]C(c1cc(CC=O)cc(C[N]#N)c1)N(=O)=O

Note that you get the same result if you assign a group ("target") only to the second task, leave the other tasks without groups, and run Standardizer with the active group "query":

standardize -c "aromatize..{target}removeexplicith..[*+:1]-[*-:2]>>[*:1]=[*:2]..[*+:1]=[*-:2]>>[*:1]#[*:2]..[O:3][C:1]=[C:2]>>[C:2][C:1]=[O:3]" group_test.mol -u query
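A quick way to check this equivalence is to compare the selected task lists. In this hypothetical sketch, GROUPED mirrors the original grouped action string and ONLY_SECOND the variant with a group on the second task only; the same selection rule (no groups, or at least one active group) is assumed for both.

```python
from typing import FrozenSet, Iterable, List, Tuple

Task = Tuple[FrozenSet[str], str]  # (groups of the task, task)

STEPS = [
    "aromatize",
    "removeexplicith",
    "[*+:1]-[*-:2]>>[*:1]=[*:2]",
    "[*+:1]=[*-:2]>>[*:1]#[*:2]",
    "[O:3][C:1]=[C:2]>>[C:2][C:1]=[O:3]",
]

# Groups as in StandardizerGroups (hand-transcribed from the action string).
GROUPED: List[Task] = list(zip(
    [frozenset({"query", "target"}), frozenset({"target"}),
     frozenset({"g1", "g2"}), frozenset({"g1"}), frozenset()],
    STEPS,
))

# Variant: only the second task has a group ("target").
ONLY_SECOND: List[Task] = [
    (frozenset({"target"}) if step == "removeexplicith" else frozenset(), step)
    for step in STEPS
]


def select_tasks(tasks: List[Task], active: Iterable[str]) -> List[str]:
    """Keep tasks with no groups, or with at least one active group."""
    active_set = set(active)
    return [task for groups, task in tasks if not groups or groups & active_set]


print(select_tasks(GROUPED, ["query", "g1"]) == select_tasks(ONLY_SECOND, ["query"]))
# True
```

Since no task in the variant belongs to "g1", adding "g1" to the active groups does not change which of its tasks run.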
 
Copyright © 1999-2008 ChemAxon Ltd.    All rights reserved.