Standardizer User's Guide

Version 5.8.2


Introduction

Standardizer brings molecules to a standardized form. The same molecule can have different tautomeric and mesomeric representations: chemists can draw an aromatic ring in aromatized form or with alternating single and double bonds, and can draw hydrogen atoms explicitly or use only implicit hydrogens. However, identifying a molecule or performing a substructure search requires a common form to work with. For example, if we need the molecules in a database that contain a pyrrole ring, we want to find molecules containing either aromatized or non-aromatized pyrrole rings, because we cannot predict whether an unknown input database or SDF file stores molecules in aromatized or non-aromatized form.

A brute-force approach to getting the expected result performs two searches, one with the aromatized and one with the non-aromatized pyrrole ring as the query, and then merges the results. This doubles the search time even for this simple pyrrole example, and the number of searches grows exponentially with the number of alterable groups in the query. For example, the query structure below has two alterable functional groups (the pyrrole part and the enol part), which amounts to four cases to be considered in the search:

[Figures: Standardizer intro images 1 and 2, showing the example query structure]
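The growth is easy to quantify: if each alterable group can be drawn in two forms, a query with n such groups has 2^n variants, and the brute-force approach runs one search per variant. The sketch below enumerates the variants with `itertools.product`; the function name `query_variants` and the group and form labels are illustrative only, not Standardizer identifiers.

```python
from itertools import product


def query_variants(groups):
    """Enumerate every combination of drawing forms for the alterable groups.

    `groups` maps a (hypothetical) group name to the forms it can be drawn in.
    Each returned dict is one query variant a brute-force search would run.
    """
    names = list(groups)
    return [dict(zip(names, combo)) for combo in product(*(groups[n] for n in names))]


# The example query: a pyrrole ring and an enol group, each with two forms.
example = {
    "pyrrole": ["aromatic", "alternating bonds"],
    "enol": ["enol", "oxo"],
}
print(len(query_variants(example)))  # 4

# With n two-form groups the brute-force approach needs 2**n searches.
for n in range(1, 6):
    print(n, len(query_variants({f"g{i}": ["a", "b"] for i in range(n)})))
```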

We have chosen a much more efficient way to deal with this problem: first bring the molecules to a standardized form, then perform the search. We define a standard form for each variable group (e.g. the oxo form for oxo-enol tautomers) and for each drawing mode (e.g. the aromatic ring for aromatic versus alternating bonds). These definitions are specified in a configuration file, and Standardizer performs the necessary transformations on the molecule in the order they are listed there. The attentive reader may notice that these transformations also require search operations: the functional groups or aromatizable rings have to be found. However, with careful planning these operations need to be performed only once. The molecules can then be stored in standardized form, and any operation that requires standardization (e.g. substructure search, reaction processing) can be performed on the stored standardized molecules.
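The "standardize once, then search" idea can be sketched with a toy model, assuming molecules are plain SMILES strings and a transformation is a text rewrite. This is not how Standardizer works internally (its actions operate on molecule graphs), and substring matching only stands in for substructure search; the names `make_rewrite` and `standardize` are hypothetical. The sketch shows the two points that matter: transformations run in configuration order, and the query must be standardized with the same configuration as the stored molecules.

```python
from typing import Callable, List

# A transformation maps a molecule representation to a (more) standardized one.
Transform = Callable[[str], str]


def make_rewrite(old: str, new: str) -> Transform:
    """Toy transformation: replace one drawing of a fragment with its standard form."""
    return lambda mol: mol.replace(old, new)


def standardize(mol: str, config: List[Transform]) -> str:
    """Apply the transformations in the order they are listed in the configuration."""
    for transform in config:
        mol = transform(mol)
    return mol


# Hypothetical one-rule configuration: rewrite the Kekule pyrrole SMILES
# to the aromatic one.
config = [make_rewrite("C1=CNC=C1", "c1cc[nH]c1")]

database = ["C1=CNC=C1", "c1cc[nH]c1", "CCO", "O=C1CCCC1"]

# Without standardization, one search finds only one drawing of pyrrole.
raw_hits = [m for m in database if "C1=CNC=C1" in m]

# Standardize the database once and store the result ...
stored = [standardize(m, config) for m in database]

# ... then standardize the query with the same configuration and search once.
query = standardize("C1=CNC=C1", config)
hits = [m for m in stored if query in m]

print(raw_hits)  # ['C1=CNC=C1']
print(hits)      # ['c1cc[nH]c1', 'c1cc[nH]c1']
```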

Standardization and search of standardized molecules in JChem databases can be performed as described here.

Standardizer also provides additional actions to refine molecules, such as setting the stereo flag or performing coordinate cleaning. The example below shows template-based cleaning of a bridged skeleton.

Compound to clean: COC(=O)C1C(CC2CCC1N2C)CC(=O)C3=CC=CC=C3
Template:          [image: Standardizer bicycle template]
Cleaned result:    [image: Standardizer bicycle cleaned]

A set of simple examples and working examples is also available.

Application (GUI)

Standardizer GUI is an easy-to-use graphical user interface for the ChemAxon Standardizer tool. It gives access to all Standardizer functionality without using the command line with parameters or editing configuration files by hand.

The Standardizer GUI provides a friendly way to bring your molecules to a standardized form, with guidance for each task you may encounter.

 

Usage

	standardize

Alternatively, on Win32, Unix or Mac / Java 2 (assuming that JChem was installed with shortcuts created on the Desktop or in the Start Menu), start the GUI by double-clicking the appropriate icon.
 

Input and Output

Most molecular file formats are accepted (Marvin Documents (MRV), MDL molfile, SDfile, RXNfile, RDfile, SMILES, etc.).

Input files can be added by browsing the file system. Selecting multiple files results in a single concatenated output file. There are no file format restrictions: input files can differ in format from each other as well as from the output. This allows you to concatenate molecule files and/or convert them to another format by leaving the standardization rules empty.

[Figure: Working with multiple input files]

 

Configuration

Standardizer GUI also provides an interface for editing configuration files. The embedded editor simplifies creating new configurations or modifying existing ones: you can build a configuration from the list of available commands, specify the order of execution and set custom parameters (download demo configuration files).

Custom transformations

You can add customized transformations to the standardization list.
  • Double-click the Scheme tab to edit the transformation (use a reaction arrow).
  • Recommended only for advanced users: Match Options filter the input molecules, so the transformation is executed only if the input molecule meets the matching options.
    To reset the default search options, remove the transformation from the Standardizer list and add it again.
  • Details on matching options: http://www.chemaxon.com/jchem/doc/user/query_searchoptions.html

    Note: the search option "exact" has changed; the equivalent search option is now "Full". The option string for it is SearchOptions="sep=, t:f"
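The option string above packs a separator declaration and the options into one value. The sketch below splits such a string into key/value pairs, assuming (this guide does not define the format) that an optional `sep=<char>` prefix names the separator between options and that each option has the form `key:value`; per the note above, `t:f` then selects the "Full" search type. The function `parse_search_options` is hypothetical and is not part of the JChem API.

```python
def parse_search_options(text: str) -> dict:
    """Split a search option string such as 'sep=, t:f' into {key: value}.

    Assumed format: an optional 'sep=<char>' prefix names the separator
    between options, and each option is 'key:value'. Without the prefix,
    options are assumed to be separated by whitespace.
    """
    text = text.strip()
    if text.startswith("sep="):
        if len(text) < 5:
            raise ValueError("missing separator after 'sep='")
        sep = text[4]                 # the single character right after 'sep='
        parts = text[5:].split(sep)   # the options that follow the declaration
    else:
        parts = text.split()
    options = {}
    for part in parts:
        part = part.strip()
        if part:                      # skip empty fragments, e.g. a trailing separator
            key, _, value = part.partition(":")
            options[key] = value
    return options


# The option string given above for the "Full" search type.
print(parse_search_options("sep=, t:f"))  # {'t': 'f'}
```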

    [Figure: Building a configuration for Standardizer; detailed information is available for each command]
    [Figure: Building a configuration for Standardizer; graphical interface for command settings]
    [Figure: Standardization in progress]
    [Figure: Customizable view of the standardization results]

    The command-line tool

                standardize [<input files>] -c <config file> [<options>]
            

    Before using the standardize script or batch file, set it up as described in Preparing the Usage of JChem Batch Files and Shell Scripts.

    Alternatively, the Standardizer class can be directly invoked:

    Win32 / Java 2 (assuming that JChem is installed in c:\jchem):

                java -cp "c:\jchem\lib\jchem.jar;%CLASSPATH%" ^
                chemaxon.reaction.Standardizer [<input files>] ^
                -c <config file> [<options>]
            

    Unix / Java 2 (assuming that JChem is installed in /usr/local/jchem):

                java -cp "/usr/local/jchem/lib/jchem.jar:$CLASSPATH" \
                chemaxon.reaction.Standardizer [<input files>] \
                -c <config file> [<options>]
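The Win32 and Unix command lines differ only in the classpath separator (`;` versus `:`) and the line-continuation character. When Standardizer is driven from a script, building the argument list programmatically sidesteps both. The sketch below is a hypothetical helper (`standardizer_command` is not part of JChem) that uses `os.pathsep` for the separator; the file names `input.sdf` and `config.xml` are placeholders, and `-g` is the ignore-error option from the listing below.

```python
import os
import shlex
from typing import List, Optional, Sequence


def standardizer_command(jar: str, inputs: Sequence[str], config: str,
                         options: Sequence[str] = (),
                         classpath: Optional[str] = None,
                         pathsep: str = os.pathsep) -> List[str]:
    """Build the argv for running chemaxon.reaction.Standardizer directly.

    Mirrors the command lines above: jchem.jar first, then any existing
    CLASSPATH, joined with ';' on Windows or ':' on Unix (os.pathsep).
    """
    entries = [jar]
    if classpath:  # skip an empty CLASSPATH instead of leaving a dangling separator
        entries.append(classpath)
    return ["java", "-cp", pathsep.join(entries),
            "chemaxon.reaction.Standardizer", *inputs,
            "-c", config, *options]


if __name__ == "__main__":
    # Placeholder file names; adjust the jar path to your JChem installation.
    cmd = standardizer_command("/usr/local/jchem/lib/jchem.jar",
                               ["input.sdf"], "config.xml", ["-g"],
                               classpath=os.environ.get("CLASSPATH"))
    print(shlex.join(cmd))
    # To execute it (requires Java and JChem): subprocess.run(cmd, check=True)
```

Returning a list rather than one command string lets `subprocess.run` pass each argument unchanged, so paths containing spaces need no extra quoting on either platform.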
            

    Options

            General options:
                -h, --help                          this help message
                -g, --ignore-error                  continue with next molecule on error
                    --empty-mol-on-error            write an empty molecule, and continue
                                                    with next molecule on error
                    --unstandardized-mol-on-error   write the original molecule, and continue
                                                    with next molecule on error
    
    
            Input options:
              -c, --config       configuration XML file
                                                  or action string,
                                                  actions separated by "..",
                                                  valid actions are:
                                                  - reaction SMARTS
                                                  - "aromatize" (Daylight, general)
                                                  - "aromatize:b" (ChemAxon, basic)
                                                  - "aromatize:l" (loose)
                                                  - "dearomatize"
                                                  - "addexplicitH" ("hydrogenize")
                                                    (converts implicit H-s to explicit)
                                                  - "removeexplicitH[:lonely:isotope:
                                                    charged:radical:mapped:wedged]"
                                                    ("dehydrogenize")
                                                    (converts explicit H-s to implicit,
                                                    except for lonely, isotope, charged,
                                                    radical, mapped and wedged H-s;
                                                    if some of these are specified in a
                                                    ':'-separated list, then the given
                                                    H types are also converted)
                                                  - "clearisotopes"
                                                    (converts isotopes to non-isotopic form)
                                                  - "neutralize" (neutralize molecule)
                                                  - "clean" (partial clean in 2D)
                                                  - "clean:full" (full clean in 2D)
                                                  - "clean: