Serial Molecule Generator

Version 3.2.6

Introduction

The Serial Molecule Generator creates molecules from the given input molecules by applying sequences of predefined reactions. The reactions are specified in a configuration file, which is an extension of the Reactor configuration file.

The scheme allows different molecule generator algorithms to be added, but it currently contains only one, the serial algorithm. This algorithm takes one-reactant, one-product reactions and creates product molecules by combining the reactions in all possible ways. The maximum number of reactions in a sequence can be specified with the StepCount configuration attribute or the --step-count command line parameter. If this parameter is omitted, reaction sequences are extended as long as new products can be generated. The reaction sequence, together with the index or ID of the starting input molecule, is stored in an SDF tag.
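The enumeration described above can be illustrated with a small Python sketch. It is not the ChemAxon implementation: the reactions are hypothetical string rewrites standing in for real one-reactant, one-product reactions, and the breadth-first order and the rule that an already seen product is not expanded again are assumptions made for the illustration.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

# A toy "reaction": takes one reactant and returns one product,
# or None when the reaction does not apply to the reactant.
Reaction = Callable[[str], Optional[str]]


def generate(start: str, reactions: Dict[str, Reaction],
             step_count: Optional[int] = None) -> List[Tuple[str, List[str]]]:
    """Enumerate reaction sequences breadth-first.

    Returns (product, reaction ID sequence) pairs. A product that was
    already seen is not expanded again (an assumption of this sketch),
    so the search ends once no new products appear, even when no
    step limit is given.
    """
    seen = {start}
    results: List[Tuple[str, List[str]]] = []
    queue = deque([(start, [])])
    while queue:
        molecule, sequence = queue.popleft()
        # Do not extend sequences that already reached the step limit.
        if step_count is not None and len(sequence) >= step_count:
            continue
        for reaction_id, react in reactions.items():
            product = react(molecule)
            if product is None or product in seen:
                continue
            seen.add(product)
            new_sequence = sequence + [reaction_id]
            results.append((product, new_sequence))
            queue.append((product, new_sequence))
    return results


# Hypothetical string-based stand-ins for real reactions.
toy_reactions: Dict[str, Reaction] = {
    "R1": lambda m: m.replace("A", "B", 1) if "A" in m else None,
    "R2": lambda m: m.replace("B", "C", 1) if "B" in m else None,
}

for product, sequence in generate("A", toy_reactions):
    print(product, ";".join(sequence))
```

Passing step_count=1 to generate keeps only the products that are reachable with a single reaction.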

Example

Take the following two reactions:

with the input molecule:

Then the following molecules will be generated:

Notes:

  1. The REACTIONS tag stores the reaction sequence data: the input molecule index or ID is followed by the reaction IDs making up the reaction sequence, with items separated by semicolons.

  2. For a description of reaction mapping, see the Reaction mapping section of the Reactor Manual.

  3. The R-group definition in the first reaction means that neither of the mapped carbon atoms is attached to another carbon. This condition is added in order to avoid multiple processing of the same reaction center: since the reaction keeps the functional group to be matched, without this condition the same reaction center would be processed repeatedly, resulting in an infinite loop of adding the new atom group on the product side (atoms with maps 4-14).

  4. Infinite loops should also be avoided if the newly added atom group contains the same reaction center that is defined on the reactant side:

    This reaction is automatically applied to the newly added atom group, which adds a new reaction center again, and the loop continues indefinitely. To avoid this situation, specify the HitCount limit in the configuration file; it determines the maximum number of reaction center search hits to be processed in one reaction processing step. In this case the maximum number of reaction processing steps should also be specified, using either the StepCount configuration attribute or the --step-count command line parameter.
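The reaction sequence data described in note 1 can be split with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and the sample value "1;R1;R2" is an assumed illustration of the semicolon-separated layout, not output copied from the program.

```python
from typing import List, Tuple


def parse_reaction_sequence(value: str) -> Tuple[str, List[str]]:
    """Split a tag value such as "1;R1;R2" into (molecule ID, reaction IDs).

    The first item is the input molecule index or ID; the remaining
    items are the reaction IDs in the order they were applied.
    """
    items = [item.strip() for item in value.strip().split(";")]
    items = [item for item in items if item]  # ignore empty items
    if not items:
        raise ValueError("empty reaction sequence value")
    return items[0], items[1:]


molecule_id, reaction_ids = parse_reaction_sequence("1;R1;R2")
print(molecule_id, reaction_ids)
```

The molecule ID is kept as a string because it may be either an index or an ID read from an SDF tag.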

Usage

    molgen -c <config file> [<options>] [<input files/strings>] 

Set up the molgen script or batch file as described in Preparing the Usage of JChem Batch Files and Shell Scripts.

Alternatively, the MoleculeGenerator class, which is the common base class of the classes implementing the molecule generator algorithms, can be invoked directly:

Win32 / Java 2 (assuming that JChem is installed in c:\jchem):

    java -cp "c:\jchem\lib\jchem.jar;%CLASSPATH%" ^
        chemaxon.reaction.MoleculeGenerator ^
        -c <config file> [<options>] ^
        [<input files/strings>]

Unix / Java 2 (assuming that JChem is installed in /usr/local/jchem):

    java -cp "/usr/local/jchem/lib/jchem.jar:$CLASSPATH" \
        chemaxon.reaction.MoleculeGenerator \
        -c <config file> [<options>] \
        [<input files/strings>]

Options

Options: 
  -h, --help                    this help message
  -c, --config <filepath>       configuration XML file
  -a, --algorithm               algorithm ID in the Algorithms section in the
                                configuration XML (default: serial)
  -o, --output <filepath>       output file path (default: stdout)
  -i, --id                      SDFile tag that stores the molecule ID
                                (default ID: the molecule index)
  -t, --tag                     SDFile tag that will store the
                                reaction sequence data (default: REACTIONS)
  -s, --step-count              the number of algorithm steps to be run
                                (default: infinity / unlimited)
  -g, --ignore-error            continue with next molecule on error

The command line parameter --config is mandatory. This specifies the path and filename of a configuration file without which the program cannot operate. A detailed description of the format of this configuration file is given below.

The command line parameter --algorithm specifies the molecule generator algorithm. The configuration XML contains a section for each configured algorithm, and this parameter selects one of them by its section name (the string comparison is case insensitive). Currently only the serial algorithm is implemented, and it is also the default, so this parameter is mainly for future use.

The command line parameter --id specifies the SDF tag that stores the input molecule ID. This ID is written to the output SDF as a reference to the input molecule from which the product molecule was generated.

The command line parameter --tag specifies the SDF tag storing the reaction sequence data.

The command line parameter --step-count specifies the maximum number of reaction processing steps to be performed in a reaction sequence. This parameter may also be specified in the StepCount configuration attribute; if it is given in both places, the command line parameter is used.

If the command line parameter --ignore-error is specified, import/export errors do not stop the processing: the error is written to the console and the molecule is skipped. By default, the program exits on molecule import/export errors.

 

Input

Most molecular file formats are accepted (MDL molfile, Compressed molfile, SDfile, Compressed SDfile, SMILES, etc.).

If no input file name or input string is specified in the command line then input is taken from the standard input.

 

Output

MoleculeGenerator writes output molecules in SDF format. If the --output option is omitted, results are written to the standard output.
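Because the output is SDF, the reaction sequence tag can be read back with a short script. The following Python sketch relies only on the general SDF layout (a data header line starting with ">" that carries the tag name in angle brackets, value lines ended by a blank line, and records separated by "$$$$"). The function name and the abbreviated sample record are hypothetical, and the sample is not real program output.

```python
import re
from typing import Dict, Iterator, Optional

# Matches an SDF data header line such as ">  <REACTIONS>".
_TAG_LINE = re.compile(r"^>.*<([^>]+)>")


def iter_sdf_tags(text: str) -> Iterator[Dict[str, str]]:
    """Yield one {tag name: value} dictionary per SDF record."""
    tags: Dict[str, str] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if line.startswith("$$$$"):      # end of record
            yield tags
            tags, current = {}, None
        elif current is not None:
            if line.strip() == "":       # blank line ends the value
                current = None
            elif tags[current]:
                tags[current] += "\n" + line
            else:
                tags[current] = line
        else:
            match = _TAG_LINE.match(line)
            if match:
                current = match.group(1)
                tags[current] = ""


# Abbreviated, hypothetical record: the structure block is elided.
SAMPLE = """
  toy record

M  END
>  <REACTIONS>
1;R1;R2

$$$$
"""

for record in iter_sdf_tags(SAMPLE):
    print(record.get("REACTIONS"))
```

If a different tag name is chosen with --tag, the same name is the key to look up in each record.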

 

Configuration

The configuration XML is an extension of the Reactor configuration XML. However, instead of taking only one of the specified reactions, MoleculeGenerator uses all reactions to make up the reaction sequences.

The MoleculeGenerator-specific part of the configuration is given in the MoleculeGenerator section.

The Params subsection specifies some common molecule generator parameters in attributes that can be overridden by command line parameters:

The Algorithms subsection contains the sections for the molecule generator algorithms. The Algorithm attribute or the --algorithm command line parameter refers to one of these algorithms by its section name. Currently only the Serial algorithm is available. The corresponding Java class (which should be a subclass of chemaxon.reaction.MoleculeGenerator) is specified in the mandatory Class attribute. The serial algorithm has two specific parameters that can be specified in Params subsection attributes:

The HitCount attribute can be used to avoid infinite loops when the reaction produces one or more new reaction centers of the same type (see note 4 of the Example in the Introduction).
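Why a hit limit stops such a loop can be shown with a toy Python analogy. This is only an illustration under stated assumptions: the "reaction center" is a substring, the "reaction" is a string rewrite, and the function name and parameters are hypothetical; it does not reproduce how Reactor searches for reaction centers.

```python
from typing import Optional


def apply_with_hit_limit(molecule: str, center: str, product: str,
                         hit_count: Optional[int] = None) -> str:
    """Rewrite the first occurrence of `center` repeatedly.

    After every rewrite the whole text is searched again, so a product
    that contains `center` would be matched forever. The loop stops
    after `hit_count` processed hits, or when no hit is left.
    """
    hits = 0
    while hit_count is None or hits < hit_count:
        index = molecule.find(center)
        if index < 0:
            break  # no reaction center left
        molecule = molecule[:index] + product + molecule[index + len(center):]
        hits += 1
    return molecule


# The product "aX" contains the center "X" again, so a limit is required.
print(apply_with_hit_limit("X", "X", "aX", hit_count=5))
```

Calling the function with a product that contains the center and without hit_count would never return, which mirrors the infinite loop described above.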

Example

<ReactorConfiguration Version="0.1" schemaLocation="react.xsd">

<Reactions>
    <Reaction ID="R1" Structure="r1.rxn"/>
    <Reaction ID="R2" Structure="r2.rxn"/>
</Reactions>

<MoleculeGenerator>
    <Params Reactions="REACTIONS" Algorithm="Serial" StepCount="2"/>
    <Algorithms>
	<Serial Class="chemaxon.reaction.SerialMoleculeGenerator">
	    <Params Multiple="true" HitCount="5"/>
	</Serial>
    </Algorithms>
</MoleculeGenerator>

</ReactorConfiguration>
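To check which settings a configuration file carries, it can be read with a standard XML parser. The Python sketch below parses a copy of the example above, looks up the algorithm section by a case-insensitive name comparison, and lets a command line step count take precedence over the StepCount attribute, as described in the Options section. The function name and the returned dictionary layout are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

CONFIG = """<ReactorConfiguration Version="0.1" schemaLocation="react.xsd">
<Reactions>
    <Reaction ID="R1" Structure="r1.rxn"/>
    <Reaction ID="R2" Structure="r2.rxn"/>
</Reactions>
<MoleculeGenerator>
    <Params Reactions="REACTIONS" Algorithm="Serial" StepCount="2"/>
    <Algorithms>
        <Serial Class="chemaxon.reaction.SerialMoleculeGenerator">
            <Params Multiple="true" HitCount="5"/>
        </Serial>
    </Algorithms>
</MoleculeGenerator>
</ReactorConfiguration>"""


def summarize(xml_text: str, cli_step_count: Optional[int] = None) -> dict:
    """Collect the reactions and generator settings from a configuration."""
    root = ET.fromstring(xml_text)
    reactions = {r.get("ID"): r.get("Structure")
                 for r in root.findall("Reactions/Reaction")}
    common = root.find("MoleculeGenerator/Params").attrib
    algorithm = common.get("Algorithm", "Serial")
    # Section names are compared case-insensitively.
    algorithms = root.find("MoleculeGenerator/Algorithms")
    section = next(s for s in algorithms if s.tag.lower() == algorithm.lower())
    specific = section.find("Params").attrib
    # A command line value takes precedence over the StepCount attribute.
    config_steps = common.get("StepCount")
    if cli_step_count is not None:
        step_count = cli_step_count
    else:
        step_count = int(config_steps) if config_steps is not None else None
    return {
        "reactions": reactions,
        "algorithm_class": section.get("Class"),
        "step_count": step_count,
        "hit_count": int(specific["HitCount"]) if "HitCount" in specific else None,
    }


print(summarize(CONFIG))
print(summarize(CONFIG, cli_step_count=3)["step_count"])
```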
 

Examples

  1. Generates molecules using the serial algorithm with the configuration MolGen.xml from the input molecules in mols.sdf and writes the result to the standard output:
    molgen -c MolGen.xml mols.sdf
    
  2. The same with SMILES string input:
    molgen -c MolGen.xml "CCC(CC(O)=O)C(O)=O" "CP\C=C\C(C(NC=C)NC=C)N(CP(C)C=C)\C=C\C(CC(O)=O)C(O)=O"
    
  3. Takes at most 3 reactions per reaction sequence and writes the result to out.sdf, with the reaction sequence data stored in the RDATA SDF tag:
    molgen -c MolGen.xml -s 3 -t RDATA mols.sdf -o out.sdf
    
  4. The same, but writes input molecule IDs taken from the ID SDF tag instead of molecule indices, and displays the result in MarvinView:
    molgen -c MolGen.xml -s 3 -t RDATA -i ID  mols.sdf -o out.sdf
    mview out.sdf
    
  5. The same but directly pipes output to MarvinView:
    molgen -c MolGen.xml -s 3 -t RDATA -i ID  mols.sdf | mview -
    

    Note that such piping does not work in Windows.
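When molgen is called from a script, the argument list can be assembled programmatically. The Python sketch below uses only the short options listed in the Options section; the function name and its keyword parameters are hypothetical, and it assumes that a molgen script is available on the PATH.

```python
from typing import List, Optional, Sequence


def molgen_command(config: str, inputs: Sequence[str] = (),
                   output: Optional[str] = None,
                   id_tag: Optional[str] = None,
                   sequence_tag: Optional[str] = None,
                   step_count: Optional[int] = None,
                   ignore_error: bool = False) -> List[str]:
    """Build the argument list: molgen -c <config> [options] [inputs]."""
    command = ["molgen", "-c", config]
    if step_count is not None:
        command += ["-s", str(step_count)]   # --step-count
    if sequence_tag is not None:
        command += ["-t", sequence_tag]      # --tag
    if id_tag is not None:
        command += ["-i", id_tag]            # --id
    if output is not None:
        command += ["-o", output]            # --output
    if ignore_error:
        command.append("-g")                 # --ignore-error
    command += list(inputs)                  # input files or strings
    return command


# Same settings as example 4 above.
print(" ".join(molgen_command("MolGen.xml", ["mols.sdf"], output="out.sdf",
                              id_tag="ID", sequence_tag="RDATA",
                              step_count=3)))
```

The resulting list can be passed to subprocess.run(command, check=True) from the Python standard library to start the program.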

 
Copyright © 1999-2007 ChemAxon Ltd.    All rights reserved.