Synthesizer

Version 3.2.12

Introduction

Synthesizer processes a sequence of reactions taking their reactants from and putting their products into predefined molecule sets. Synthesizer can be run in memory mode, file mode or database mode.

In memory mode, input molecules are taken from input files, all molecules (input molecules and those created by processing the reactions) are stored in memory, and molecules from some or all sets are written to an output file at the end.

In file mode, input molecules are taken from input files, and all molecules (input molecules and those created by processing the reactions) are stored in file-based molecule sets under a newly created subdirectory named after the synthesis. Optionally, molecules from some or all sets are written to an output file at the end.

In database mode, input molecules are first imported into the database. Reactions take their reactants from and put their products into the database, and the synthesis algorithm also writes the synthesis tree (a trace of the processed reactions) into the database, from which the synthesis route of each molecule can be determined. The result molecules can either be exported to molecule files or browsed with a graphical tool called Synthesis Browser, which connects to the database, shows molecules by set, and displays the reaction sequence that produced a given molecule. The Synthesis Browser can be invoked from the Tools menu of JChemManager (jcman), or run as a standalone program through the bin/synthesisbrowser script, which runs the chemaxon.jchem.SynthesisBrowser Java class. If molecule set colors are specified in the configuration XML, the Synthesis Browser colors atoms according to their originating molecule set. A synthesis product can thus be seen as a composition of atoms taken from different input molecules and atoms added by different reactions, each distinguished by its color.

Reactions, together with their reactant and product molecule sets, are stored in the synthesis graph. The edges of the graph are the reactions, and its nodes are arrays of molecule sets corresponding to the reactant and product molecule sets of the reactions. The synthesis graph is defined in the configuration XML.

A synthesis is composed of synthesis steps. In a synthesis step, a synthesis graph edge is selected, reactant molecules are selected from the reactant sets, the reaction is performed on the selected reactants, and the created products are placed into the product sets. This selection mechanism is determined by the synthesis algorithm.

For each synthesis step, the edge to be processed together with the selection mode and processing type of the reactants is predefined in the configuration.

The selection mode corresponds to the --mode command line parameter of the Reactor. It is set in the Mode attribute of the synthesis Step section.

The processing type determines the set of reaction centers to be processed. It is set in the Type attribute of the synthesis Step section.

A set of working examples and a flash animation are also available.

Usage

    synthesize <command> [<options>] 
    synthesize <command> [<options>] [input file(s)]

  Commands:
    c        creates new synthesis in the database
    d        deletes a synthesis from the database
    i        imports molecules into a database molecule set
    e        exports a database molecule set into a molecule file
    r        runs all synthesis steps in database mode
    l        lists the available synthesis names in the database
    m        runs the synthesis in memory mode (no database)
    f        runs the synthesis in file mode (no database)

Options

Options are command specific. Run Synthesizer with the command and the --help (-h) parameter to list the options of that command:

    synthesize <command> -h

Examples:

  1. Lists the command line parameters for memory-mode:
    synthesize m -h
    
  2. Lists the command line parameters for file-mode:
    synthesize f -h
    
  3. Lists the database molecule set import options:
    synthesize i -h
    
  4. Lists the command line parameters for synthesis creation in database mode:
    synthesize c -h
    
  5. Lists the command line parameters for running a synthesis in database mode:
    synthesize r -h
    
  6. Lists the database molecule set export options:
    synthesize e -h
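
The help calls above can also be issued in one pass over all commands. The following is a minimal sketch, assuming the synthesize script is on the PATH; the SYNTHESIZE variable and the show_all_help function are hypothetical names introduced here for illustration, not part of Synthesizer.

```shell
#!/bin/sh
# Launcher name (hypothetical variable); override SYNTHESIZE if the
# synthesize script is not on the PATH.
SYNTHESIZE="${SYNTHESIZE:-synthesize}"

# Print the option help of every Synthesizer command, one after another.
# The command letters are the ones listed in the Usage section.
show_all_help() {
    for cmd in c d i e r l m f; do
        echo "=== synthesize $cmd -h ==="
        "$SYNTHESIZE" "$cmd" -h
    done
}
```

Calling show_all_help after sourcing the snippet prints one headed help section per command.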
    

Command m runs the synthesis in memory mode: molecule import, the synthesis, and molecule export are all run in one command, and the synthesis molecule sets are stored in memory.

Command f runs the synthesis in file mode: molecule import, the synthesis, and molecule export are all run in one command, and the synthesis molecule sets are written into files in a subdirectory determined by the synthesis name. The synthesis name is specified in the mandatory --name (-n) command line parameter.

All other commands refer to the different phases of a synthesis in database mode. Here molecule import, the synthesis, and molecule export are run as separate commands, since the synthesis data is stored in the database. The synthesis is referred to by its synthesis name, which is specified in the mandatory --name (-n) command line parameter.

Before first use, set up the synthesize script or batch file as described in Preparing the Usage of JChem Batch Files and Shell Scripts.

Alternatively, the Synthesizer class can be directly invoked:

Win32 / Java 2 (assuming that JChem is installed in c:\jchem):

    java -cp "c:\jchem\lib\jchem.jar;%CLASSPATH%" ^
        chemaxon.reaction.Synthesizer <command> ^
        [<options>] [input file(s)]

Unix / Java 2 (assuming that JChem is installed in /usr/local/jchem):

    java -cp "/usr/local/jchem/lib/jchem.jar:$CLASSPATH" \
        chemaxon.reaction.Synthesizer <command> \
        [<options>] [input file(s)]
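
The direct class invocation on Unix can be wrapped in a shell function so that it behaves like the synthesize script. This is a sketch under stated assumptions: JCHEM_HOME and JAVA are hypothetical variable names chosen for this example, and the default installation path is the one assumed above.

```shell
#!/bin/sh
# Hypothetical variables for this sketch: the JChem installation directory
# and the Java launcher to use.
JCHEM_HOME="${JCHEM_HOME:-/usr/local/jchem}"
JAVA="${JAVA:-java}"

# Invoke the Synthesizer class directly, passing all arguments through.
# ${CLASSPATH:+:$CLASSPATH} appends the existing classpath only when it is
# set and non-empty, which avoids a dangling ":" at the end of the path.
synthesizer() {
    "$JAVA" -cp "$JCHEM_HOME/lib/jchem.jar${CLASSPATH:+:$CLASSPATH}" \
        chemaxon.reaction.Synthesizer "$@"
}
```

With this function defined, synthesizer m -h corresponds to synthesize m -h.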
 

Input

Most molecular file formats are accepted (MDL molfile, compressed molfile, SDfile, compressed SDfile, SMILES, etc.).

Synthesizer reads molecules into input molecule sets. In file mode, the sets paired with the corresponding molecule files are specified in the --set parameter. In database mode, files are imported into a database molecule set by the i (import) command.

 

Output

In database mode, the output is the complete synthesis tree stored in the database. The synthesis sets and reaction sequences can be viewed in the Synthesis Browser, or the molecules in specified (or all) sets can be exported to files with the e (export) command. In file mode, either all sets (the default) or the result sets specified in the --result-sets parameter are written to the output stream at the end of the synthesis.
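
When the export writes the molecule set ID into an SDF tag (the database-mode example under Examples uses -t SET for this), the exported file can be summarized per set with standard tools. The sketch below is not part of Synthesizer; count_by_set is a hypothetical helper, and it assumes the usual SDF layout in which a data header line such as ">  <SET>" is followed by the value on the next line.

```shell
#!/bin/sh
# Count molecules per set in an exported SDF file.
# Usage: count_by_set FILE [TAG]   (TAG defaults to SET, an assumption
# taken from the export example that uses "-t SET")
count_by_set() {
    file=$1
    tag=${2:-SET}
    awk -v tag="$tag" '
        # A data header line starts with ">" and contains "<TAG>";
        # the value of the field is on the following line.
        /^>/ && index($0, "<" tag ">") {
            if ((getline value) > 0) count[value]++
        }
        END { for (s in count) print s, count[s] }
    ' "$file" | sort
}
```

For instance, count_by_set result.sdf prints one line per set value found in the file, followed by the number of records carrying that value.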

 

Configuration

The configuration is an XML file. It has different subsections for standardization, search options, reaction definitions, set color definitions, the synthesis graph and the synthesis algorithm, and also for additional synthesis parameters.

Reaction configuration

The reaction configuration contains the following sections:

  1. Standardizer configuration: this section specifies the standardization procedure that is performed on both the targets and the queries before performing substructure search to find the reaction centers.
  2. Search attributes: this section specifies the search options to be used when searching functional groups according to the reaction equation.
  3. Expression evaluation context: this section defines the reaction rule evaluation context: the functions, plugins and molecular constants referenced by the rules. This section overrides the default context.
  4. Reaction definitions: this section specifies the reactions with rules as chemical terms composed of the molecular constants, functions, plugin calculations defined in the Evaluator subsection (or in the default context) as well as arithmetical and logical operators.
 

Synthesis specific configuration

The synthesis specific subsections of the XML configuration are described below by examples:
 

Examples

  1. A UNIX command that runs a synthesis in memory mode: uses the synthesis configuration given in config.xml, reads set1.sdf into molecule set SET1, reads set2.sdf into molecule set SET2, reads in.sdf into the input set specified in the configuration XML, runs the synthesis and writes all molecules (both input and created) to result.sdf.
    synthesize m -c config.xml in.sdf -s SET1 set1.sdf -s SET2 set2.sdf -f sdf -o result.sdf
    
  2. A UNIX command that runs a synthesis in file mode: molecule sets are written to files under the newly created subdirectory syn, no other output file is created.
    synthesize f -c config.xml -n syn -m in.sdf -s SET1 set1.sdf -s SET2 set2.sdf 
    
  3. A sequence of UNIX commands that run a synthesis with name syn in database mode: first create the synthesis, then import molecule files into input sets, then run the synthesis, finally export all molecules with molecule set ID written to the SDF tag SET:
    synthesize c -n syn -c config.xml
    synthesize i -n syn -s SET1 set1.sdf
    synthesize i -n syn -s SET2 set2.sdf
    synthesize r -n syn
    synthesize e -n syn -f sdf -t SET -o result.sdf
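
The database-mode sequence of example 3 can be wrapped in a shell function so that a failing phase stops the run. This is only a sketch: it assumes that synthesize returns a non-zero exit status on failure, and the SYNTHESIZE variable, the run_db_synthesis function and the SETNAME=file argument convention are hypothetical names introduced for illustration. It uses only the options shown in example 3.

```shell
#!/bin/sh
# Launcher name (hypothetical variable); override if synthesize is not on the PATH.
SYNTHESIZE="${SYNTHESIZE:-synthesize}"

# Usage: run_db_synthesis NAME CONFIG OUTPUT SETNAME=file [SETNAME=file ...]
run_db_synthesis() {
    name=$1; config=$2; output=$3
    shift 3
    # Phase 1: create the synthesis from the configuration XML.
    "$SYNTHESIZE" c -n "$name" -c "$config" || return 1
    # Phase 2: import each molecule file into its input set.
    for pair in "$@"; do
        # ${pair%%=*} is the set name, ${pair#*=} is the molecule file.
        "$SYNTHESIZE" i -n "$name" -s "${pair%%=*}" "${pair#*=}" || return 1
    done
    # Phase 3: run all synthesis steps.
    "$SYNTHESIZE" r -n "$name" || return 1
    # Phase 4: export all molecules, with the set ID written to the SDF tag SET.
    "$SYNTHESIZE" e -n "$name" -f sdf -t SET -o "$output"
}
```

Example 3 then corresponds to the single call run_db_synthesis syn config.xml result.sdf SET1=set1.sdf SET2=set2.sdf.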
    
 
Copyright © 1999-2007 ChemAxon Ltd.    All rights reserved.