Synthesizer processes a sequence of reactions taking their reactants from and putting their products into predefined molecule sets. Synthesizer can be run in memory mode, file mode or database mode.
In memory mode input molecules are taken from input files, all molecules (input molecules and those created by processing the reactions) are stored in memory, and molecules from some or all sets are written to an output file at the end.
In file mode input molecules are taken from input files and all molecules (input molecules and those created by processing the reactions) are stored in file-based molecule sets under a newly created subdirectory named after the synthesis. Optionally, molecules from some or all sets are written to an output file at the end.
In database mode input molecules are first imported into the
database, reactions take their reactants from and put their products into the database
and the synthesis algorithm also writes the synthesis tree (trace of processed reactions)
into the database from which the synthesis route for each molecule can be determined.
The synthesis result molecules can either be exported to molecule files or can be browsed
by a graphical tool called Synthesis Browser
which connects to the database, shows molecules by sets and also displays the reaction
sequence producing a given molecule.
The Synthesis Browser can be invoked from the
Tools menu of the JChemManager (jcman), or can be run as a
standalone program by running the bin/synthesisbrowser script which runs the
chemaxon.jchem.SynthesisBrowser java class.
If molecule set colors are specified in the configuration XML then
the Synthesis Browser colors atoms according to their
originating molecule set.
In this way a synthesis product can be seen as a composition of atoms taken from different
input molecules and atoms added by different reactions, all of these distinguished by
different colors.
Reactions with corresponding reactant and product molecule sets are stored in the synthesis graph. The synthesis graph is composed of edges being the reactions and nodes being arrays of molecule sets corresponding to reaction reactant molecule sets and product molecule sets. The synthesis graph is defined in the configuration XML.
A synthesis is composed of synthesis steps. In a synthesis step a synthesis graph edge is selected, then reactant molecules are selected from the reactant sets and the reaction is performed for the selected reactant molecules, the created products are placed into the product sets. This selection mechanism is determined by the synthesis algorithm.
For each synthesis step, the edge to be processed together with the selection mode and processing type of the reactants is predefined in the configuration.
The selection mode corresponds to the --mode command line parameter of the Reactor: "comb" for combinatorial mode and "seq" for sequential mode.
The selection mode is set in the Mode attribute of the synthesis Step section.
The processing type determines the set of reaction centers to be processed.
The processing type is set in the Type attribute of the synthesis Step section (the examples in this document use the values "all" and "any").
A set of working examples and a flash animation are also available.
synthesize <command> [<options>]
synthesize <command> [<options>] [input file(s)]
Commands:
c creates new synthesis in the database
d deletes a synthesis from the database
i imports molecules into a database molecule set
e exports a database molecule set into a molecule file
r runs all synthesis steps in database mode
l lists the available synthesis names in the database
m runs the synthesis in memory mode (no database)
f runs the synthesis in file mode (no database)
Options are command specific, run Synthesizer with the command and the
--help parameter to get help on command options:
synthesize <command> -h
Examples:
synthesize m -h
synthesize f -h
synthesize i -h
synthesize c -h
synthesize r -h
synthesize e -h
Command m (memory-mode) runs the synthesis in memory mode:
molecule import, the synthesis and molecule export are all run in one command, and
the synthesis molecule sets are stored in memory.
Command f (file-mode) runs the synthesis in file mode:
molecule import, the synthesis and molecule export are all run in one command, and
the synthesis molecule sets are written into files in a subdirectory determined by the
synthesis name. The synthesis name is specified in the mandatory --name
(-n) command line parameter.
All other commands refer to different phases of synthesis in database mode,
in which case molecule import, the synthesis, molecule export are all run as separate commands,
since synthesis data is stored in the database. The synthesis is referred to by its
synthesis name, which is specified in the --name (-n) mandatory
command line parameter.
Prepare the usage of the synthesize script or batch file
as described in Preparing the Usage of JChem
Batch Files and Shell Scripts.
Alternatively, the Synthesizer class can be directly invoked:
Win32 / Java 2 (assuming that JChem is installed in c:\jchem):
java -cp "c:\jchem\lib\jchem.jar;%CLASSPATH%" ^
chemaxon.reaction.Synthesizer <command> ^
[<options>] [input file(s)]
Unix / Java 2 (assuming that JChem is installed in /usr/local/jchem):
java -cp "/usr/local/jchem/lib/jchem.jar:$CLASSPATH" \
chemaxon.reaction.Synthesizer <command> \
[<options>] [input file(s)]
Most molecular file formats are accepted (MDL molfile, Compressed molfile, SDfile, Compressed SDfile, SMILES, etc.).
Synthesizer reads molecules into input molecule sets. In file mode, the sets paired with the
corresponding molecule files are specified in the --set parameter. In database mode,
files are imported into a database molecule set by the i (import) command.
In database mode, the output is the complete synthesis tree stored in the database.
The synthesis sets and reaction sequences can be seen by the
Synthesis Browser or
molecules in specified (or all) sets can be exported to files by the e (export)
command. In file mode, either all sets (default) or result sets specified in the
--result-sets parameter are written to the output stream at the end of the synthesis.
The configuration is an XML file. It has different subsections for standardization, search options, reaction definitions, set color definitions, the synthesis graph and the synthesis algorithm, and also for additional synthesis parameters.
The reaction configuration contains the following sections:
Evaluator subsection (or in the default context) as well as
arithmetical and logical operators.
Standardization used on reactants before performing the substructure search to find
the reaction centers is defined in the Standardizer subsection in the same
way as in the XML configuration of the Standardizer.
For a detailed description of the standardization process and configuration, refer to
the standardizer manual.
<Standardizer>
<Actions>
<Aromatize ID="aromatize"/>
<Reaction ID="plusminus" Structure="[*+:1][*-:2]>>[*:1]=[*:2]"/>
</Actions>
</Standardizer>
Search section placed at the top level is used to set search options when searching reaction centers in reactants. On the other hand, when a search section is placed below a match section then the search options are applied when checking matching conditions.
The search attribute section contains the substructure search options that differ from the default settings. Each attribute is optional; if omitted, the default value is used. For a detailed description of the search options see the JChem Query Guide.
| Attribute | Range | Default Value |
|---|---|---|
| StereoSearch | true/false | true |
| DoubleBondStereoMatchingMode | none/marked/all | marked |
| SubgraphSearch | true/false | true |
| ExactAtomMatching | true/false | false |
| ExactStereoMatching | true/false | false |
| OrderSensitiveSearch | true/false | false |
Example:
<Search DoubleBondStereoMatchingMode="all" OrderSensitiveSearch="true"/>
Synthesizer uses Reactor to process reactants to products
according to the reaction equation. The reactions used in the synthesis can be specified
in the Reactions section of the XML configuration.
Reaction subsection elements specify the reaction with
reaction ID and with optional reaction rules.
The reaction is specified in the Structure
attribute either as a SMARTS/SMILES string or as a molecule file path. An optional Type
attribute can be added to specify whether the structure is given as a
string (Type="string") or as a file path (Type="path").
If the Type attribute is omitted then the structure type is
automatically decided based on its format which gives the correct result
in most cases.
The reaction rules can be specified in the
Reactivity, Exclude and Selectivity
subsections of the configuration XML. For more information on
reaction rules see the Reactor manual.
Below we show a complete evaluator configuration example and reaction definitions with rules (note that the Evaluator Standardizer is configured from the Standardizer section in this case).
This configuration does not have any chemical meaning; it simply illustrates the flexible reaction customization options offered by the configuration file.
Example:
<Evaluator>
<Matching ID="match">
<Search StereoSearch="true" StereoCareChecking="true"/>
</Matching>
<Functions>
<Function ID="atomprop" Class="chemaxon.jep.function.AtomProperties"/>
</Functions>
<Plugins>
<Plugin ID="charge" Class="chemaxon.marvin.calculations.ChargePlugin"/>
<Plugin ID="pol" Class="chemaxon.marvin.calculations.PolarizationPlugin"/>
<Plugin ID="pka" Class="chemaxon.marvin.calculations.pKaPlugin">
<Param Name="min" Value="-18"/>
<Param Name="max" Value="30"/>
</Plugin>
<Plugin ID="logd" Class="chemaxon.marvin.calculations.logDPlugin"/>
<Plugin ID="logp" Class="chemaxon.marvin.calculations.logPPlugin"/>
<Plugin ID="logpi" Class="chemaxon.marvin.calculations.logPPlugin">
<Param Name="type" Value="increments"/>
</Plugin>
<Plugin ID="mass" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin">
<Param Name="type" Value="mass"/>
</Plugin>
</Plugins>
<Mols>
<Mol ID="mol1" Structure="C[P:1](C)C"/>
<Mol ID="ct" Structure="O\C=C/O"/>
</Mols>
</Evaluator>
<Reactions>
<Reaction ID="r1" Structure="[H:2][c:1]1ccccc1>>[Cl:3][c:1]1ccccc1">
<Reactivity>
<![CDATA[ mass(reactant(0)) > 200 ]]>
</Reactivity>
<Selectivity Tolerance="0.01">
<![CDATA[ -charge(ratom(1)) ]]>
</Selectivity>
</Reaction>
<Reaction ID="r2" Structure="[H][N:4]c1ccccc1[C:2]([H])=[O:1].[H][c:3]1cccc[c:5]1[H]>>[O:1]=[C:2]1c2ccccc2[N:4][c:5]3cccc[c:3]13">
<Reactivity>
<![CDATA[ sum(charge(reactant(1))) > 0 && sum(charge(filter(reactant(1), "charge() > 2"))) > 10 ]]>
</Reactivity>
</Reaction>
<Reaction ID="r3" Structure="[H][N:4]c1ccccc1[C:2]([H])=[O:1].[H][c:3]1cccc[c:5]1[H]>>[O:1]=[C:2]1c2ccccc2[N:4][c:5]3cccc[c:3]13">
<Reactivity>
<![CDATA[
c1 = charge(reactant(1));
c2 = charge(filter(reactant(1), "charge() > 2"));
sum(c1) > 0 && sum(c2) > 10
]]>
</Reactivity>
</Reaction>
</Reactions>
The ID attribute specifies the reaction id by which
the reaction can be referenced.
Alternatively, if the reaction is given in MRV or RDF format the reaction rules can be specified in the MRV/RDF tags. See the reaction file description in the Reactor manual for more information.
For a description of reaction mapping, see the Reaction mapping section of the Reactor manual.
<SetColors>
<DefaultColor Color="black"/>
<SetColor Set="alkyne" Color="red"/>
<SetColor Set="alkyl-halide" Color="blue"/>
<SetColor Set="amine" Color="#C0C0C0"/>
<SetColor Set="carboxil-acid" Color="orange"/>
</SetColors>
Set colors are used to color atoms in the
SynthesisBrowser by their originating sets.
The originating set is determined by the origin code: this is an integer
that identifies the reaction component (reactant/product index) together with the
synthesis step in which the atom has been added to the molecule.
This also determines the originating molecule set of the atom, which defines its color
in the SynthesisBrowser. When exporting molecules
to an SDF file from a synthesis, the origin code sequence (a ';' separated list of atomic
origin codes written in atom index order, e.g.: 4;5;6;6;6;) is stored in the field_0
SDF tag.
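The origin code sequence format described above can be read back with a few lines of code. The following is a minimal sketch, not part of JChem: the function name parse_origin_codes is hypothetical, and pairing the codes with 0-based atom indices is an assumption.

```python
def parse_origin_codes(field_value):
    """Split a ';'-separated origin code sequence into a list of integers."""
    # A trailing ';' (as in "4;5;6;6;6;") yields an empty last piece, which is skipped.
    return [int(piece) for piece in field_value.split(";") if piece.strip()]


codes = parse_origin_codes("4;5;6;6;6;")
# Pair each atom index with its origin code; 0-based indexing is an assumption.
code_by_atom = dict(enumerate(codes))
print(code_by_atom)
```

Mapping an origin code back to a molecule set requires the synthesis data itself, so that step is left out here.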
Set colors are defined in SetColor subsections,
by writing the set ID in the Set attribute and the
color reference in the Color attribute. The set ID
is the set identifier which the set is referenced by in the
synthesis graph definition section.
The color reference can be a color constant or a hexadecimal color representation.
Supported colors:
Java awt.Color constant names (examples: "cyan", "orange", "pink")
| Name | Code |
|---|---|
| black | #000000 |
| white | #FFFFFF |
| red | #FF0000 |
| yellow | #FFFF00 |
| lime | #00FF00 |
| aqua | #00FFFF |
| blue | #0000FF |
| fuchsia | #FF00FF |
| gray | #808080 |
| silver | #C0C0C0 |
| maroon | #800000 |
| olive | #808000 |
| green | #008000 |
| teal | #008080 |
| navy | #000080 |
| purple | #800080 |

| Name | Code |
|---|---|
| brown | #5C3317 |
| darkgreen | #006400 |
The DefaultColor subsection defines the set color applied
to the sets not listed in the SetColor subsections. If no
DefaultColor subsection is given then "black" color is
applied to the sets with no color definition.
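The fallback behaviour of DefaultColor can be illustrated with a short script that reads a SetColors section and resolves the color of a set. This is only a sketch of the lookup logic using Python's standard XML parser; the helper names load_set_colors and color_of are hypothetical and are not part of the Synthesis Browser.

```python
import xml.etree.ElementTree as ET

SET_COLORS_XML = """
<SetColors>
<DefaultColor Color="black"/>
<SetColor Set="alkyne" Color="red"/>
<SetColor Set="alkyl-halide" Color="blue"/>
<SetColor Set="amine" Color="#C0C0C0"/>
<SetColor Set="carboxil-acid" Color="orange"/>
</SetColors>
"""


def load_set_colors(xml_text):
    """Return (colors by set ID, default color) from a SetColors section."""
    root = ET.fromstring(xml_text)
    default_node = root.find("DefaultColor")
    # Without a DefaultColor subsection, "black" is used.
    default = default_node.get("Color") if default_node is not None else "black"
    colors = {node.get("Set"): node.get("Color") for node in root.findall("SetColor")}
    return colors, default


def color_of(set_id, colors, default):
    """Sets without their own SetColor entry get the default color."""
    return colors.get(set_id, default)


colors, default = load_set_colors(SET_COLORS_XML)
print(color_of("amine", colors, default))
print(color_of("product1", colors, default))
```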
<Graph>
<Rule><![CDATA[ mass(product(0)) < 500 ]]></Rule>
<Sets Input="alkyne" Result="RESULT" Trash="TRASH"/>
<Edge From="alkyne,alkyl-halide" To="product1" RID="r1" ID="id1"/>
<Edge From="product1,amine" To="product2" RID="r2"/>
<Edge From="carboxil-acid,product2" To="RESULT" RID="r3">
<Rule><![CDATA[ mass(reactant(0)) > 180 ]]></Rule>
</Edge>
</Graph>
The synthesis graph is defined as the list of graph edges: each edge is described in an
Edge subsection. By the RID attribute
the edge refers to the corresponding reaction ID attribute
in the reaction definition section. The From attribute defines the
reactant molecule sets as a list of set IDs separated by commas, while the
To attribute defines the product molecule sets in the same way.
An additional ID attribute can be specified to identify the edge
for reference from the synthesis algorithm definition section.
The default edge ID is the reaction ID specified in the RID
attribute. The ID attribute is useful when the same reaction
belongs to more than one edge.
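The defaulting of the edge ID to the reaction ID can be sketched as follows. The script parses a Graph section (Rule elements are omitted for brevity) with Python's standard XML parser; the function read_edges and the dictionary layout it returns are hypothetical and only illustrate the rule stated above.

```python
import xml.etree.ElementTree as ET

GRAPH_XML = """
<Graph>
<Sets Input="alkyne" Result="RESULT" Trash="TRASH"/>
<Edge From="alkyne,alkyl-halide" To="product1" RID="r1" ID="id1"/>
<Edge From="product1,amine" To="product2" RID="r2"/>
<Edge From="carboxil-acid,product2" To="RESULT" RID="r3"/>
</Graph>
"""


def read_edges(xml_text):
    """Return the graph edges keyed by edge ID."""
    edges = {}
    for node in ET.fromstring(xml_text).findall("Edge"):
        rid = node.get("RID")
        # The edge ID falls back to the reaction ID when no ID attribute is given.
        edge_id = node.get("ID", rid)
        edges[edge_id] = {
            "reaction": rid,
            "from": node.get("From").split(","),
            "to": node.get("To").split(","),
        }
    return edges


edges = read_edges(GRAPH_XML)
print(list(edges))
```

A Step section would then refer to the first edge as "id1" and to the other two as "r2" and "r3".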
An optional synthesis rule can be specified for each synthesis reaction (graph edge) to filter product lists. The syntax is the same as for reaction definitions. After the reaction is processed and the product list is returned, the rule is evaluated as a boolean expression, and the product list is accepted only if the result is "true".
It is also possible to set a default rule to be applied when no rule
is specified for the synthesis reaction (graph edge). This rule should be defined in the same way
as the graph edge specific rules, in a Rule element as a boolean expression; the
difference is that this node is a direct subnode of the Graph element node instead of
being a subnode of an Edge element node.
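The interplay of edge-specific rules and the default rule can be sketched in a few lines. In this illustration the rules are plain Python callables standing in for Chemical Terms expressions, and molecules are dictionaries with a mass entry; the function accept_products and these data shapes are assumptions made for the example, not the Synthesizer's internals.

```python
def accept_products(reactants, products, edge_rule=None, default_rule=None):
    """Return True if the product list passes the applicable synthesis rule."""
    # An edge-specific rule takes precedence; the graph-level rule is the fallback.
    rule = edge_rule if edge_rule is not None else default_rule
    if rule is None:
        return True  # no rule at all: every product list is accepted
    return bool(rule(reactants, products))


# Stand-ins for "mass(product(0)) < 500" and "mass(reactant(0)) > 180".
def default_rule(reactants, products):
    return products[0]["mass"] < 500


def edge_rule(reactants, products):
    return reactants[0]["mass"] > 180


reactant = {"mass": 200.0}
heavy_product = {"mass": 620.0}
# Only the default rule applies: the heavy product is rejected.
print(accept_products([reactant], [heavy_product], default_rule=default_rule))
# The edge has its own rule, so the default rule is not consulted.
print(accept_products([reactant], [heavy_product], edge_rule, default_rule))
```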
The Sets subsection defines special sets:
Input set: only used in
file mode to specify the set into which molecules from input files
(specified without the --set molecule set ID) are loaded.
If no input file is specified but the graph has an input set, then molecules from the
standard input are loaded into the input set.
Note: if the graph has an input set, make sure that either input file(s) are specified or the program gets its input from the standard input. Otherwise the software will wait for the standard input forever.
Result sets: only used in
file mode to specify a list of sets to be exported when the
synthesis is finished. Set IDs are separated by commas. These result sets are exported if the
--result-sets command line parameter is specified without a following set list.
If a set list is specified, then those sets are written as output; if the parameter is omitted,
then all sets are exported.
In database mode the synthesis data is stored in the database and separate
e (export) commands can be used to export molecule sets to files one-by-one.
Trash sets: used to specify
sets of side-products that are not used in the synthesis any more. These sets are not stored in the
database / memory and cannot be exported.
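The three cases of the --result-sets parameter can be summarized in a small decision function. This is a sketch of the described behaviour only: the function sets_to_export is hypothetical, and representing the command line state as None (parameter omitted), an empty list (parameter given without a set list) or a list of set IDs is an assumption made for the example.

```python
def sets_to_export(all_sets, graph_result_sets, trash_sets, result_sets_option):
    """Choose the molecule sets written to the output at the end of the synthesis.

    result_sets_option is None when --result-sets is omitted, an empty list when
    it is given without a set list, and a list of set IDs otherwise.
    """
    if result_sets_option is None:
        chosen = list(all_sets)            # parameter omitted: all sets
    elif len(result_sets_option) == 0:
        chosen = list(graph_result_sets)   # bare parameter: Result sets of the graph
    else:
        chosen = list(result_sets_option)  # explicit set list
    # Trash sets are never stored, so they cannot be exported.
    return [set_id for set_id in chosen if set_id not in trash_sets]


all_sets = ["alkyne", "product1", "product2", "RESULT", "TRASH"]
print(sets_to_export(all_sets, ["RESULT"], ["TRASH"], None))
print(sets_to_export(all_sets, ["RESULT"], ["TRASH"], []))
print(sets_to_export(all_sets, ["RESULT"], ["TRASH"], ["product1", "product2"]))
```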
The synthesis algorithm is specified in the
Algorithm section in a subsection specific to the algorithm java class.
There are three algorithms available: linear, exhaustive and random.
The linear algorithm processes reaction steps in configuration order; each step is processed once.
An example configuration section is given below:
<Algorithm>
<Linear Class="chemaxon.reaction.synthesis.LinearAlgorithm">
<Step ID="r1" Mode="comb" Type="all"/>
<Step ID="r2" Mode="comb" Type="any"/>
<Step ID="r3" Mode="seq" Type="all"/>
</Linear>
</Algorithm>
The molecule selection mode corresponding to the
--mode Reactor parameter is specified in the
Mode attribute ("comb" for
combinatorial mode, "seq" for
sequential mode, default is "comb").
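The effect of the two selection modes on a run in configuration order can be sketched with toy data. The sketch assumes that combinatorial mode takes every combination of reactants from the reactant sets and that sequential mode pairs the i-th molecules of the reactant sets; the Reactor manual is the authority on the exact semantics. Molecules are plain strings, the "reaction" just joins reactant names, and all function names are hypothetical.

```python
from itertools import product


def select_reactants(reactant_sets, mode="comb"):
    """Return the reactant tuples to process for one synthesis step."""
    if mode == "comb":
        return list(product(*reactant_sets))  # every combination (assumed meaning)
    if mode == "seq":
        return list(zip(*reactant_sets))      # i-th with i-th (assumed meaning)
    raise ValueError("unknown selection mode: " + mode)


def run_linear(sets, edges, steps, react):
    """Process each step once, in the given order; products go to the edge's target set."""
    for edge_id, mode in steps:
        edge = edges[edge_id]
        pools = [sets.get(name, []) for name in edge["from"]]
        for reactants in select_reactants(pools, mode):
            sets.setdefault(edge["to"], []).append(react(reactants))
    return sets


def join_names(reactants):
    """Toy reaction: the product name is the joined reactant names."""
    return "-".join(reactants)


sets = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
edges = {
    "e1": {"from": ["A", "B"], "to": "P"},
    "e2": {"from": ["P", "B"], "to": "R"},
}
result = run_linear(sets, edges, [("e1", "comb"), ("e2", "seq")], join_names)
print(result["P"])
print(result["R"])
```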
The exhaustive algorithm processes reaction steps in all possible orders.
In this algorithm
all reactions are supposed to have a single reactant and typically, but not necessarily, the
synthesis graph consists of only one molecule set being the input molecule set
as well as the reactant / product set of each reaction. In each synthesis step, the next molecule is
taken from the synthesis graph and processed by each reaction to products. These
products are stored in the synthesis graph and will be processed later in the same fashion. In this way
all reaction sequences starting from the input molecules are explored. The sequence length can be set
in the -e, --step-count command line parameter of the synthesize m,
synthesize f and synthesize r commands.
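The exploration of all reaction sequences up to a given length can be sketched as a breadth-first search. In this toy version molecules are strings and each single-reactant "reaction" is a function returning a list of products; the function explore, the step_count argument mirroring --step-count, and the skipping of products that were already seen are assumptions made for the illustration.

```python
from collections import deque


def explore(inputs, reactions, step_count):
    """Return, for every molecule reached, the reaction sequence that produced it."""
    routes = {mol: [] for mol in inputs}
    queue = deque((mol, 0) for mol in inputs)
    while queue:
        mol, depth = queue.popleft()
        if depth >= step_count:
            continue  # maximum sequence length reached for this molecule
        for name, react in reactions.items():
            for prod in react(mol):
                if prod not in routes:  # assumption: each molecule is processed once
                    routes[prod] = routes[mol] + [name]
                    queue.append((prod, depth + 1))
    return routes


# Two toy single-reactant reactions that append a letter to the molecule name.
reactions = {
    "add_C": lambda mol: [mol + "C"],
    "add_O": lambda mol: [mol + "O"],
}
routes = explore(["N"], reactions, step_count=2)
for mol, route in routes.items():
    print(mol, route)
```

The routes dictionary plays the role of the synthesis tree here: it records which reactions led to each molecule.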
Reaction steps can be grouped into synthesis phases and molecule conditions can be specified
for all reactions, for each synthesis phase and for each synthesis step separately.
The number of phases to be processed can be set in the -a, --phase-count command line
parameter of the synthesize m, synthesize f and synthesize r
commands. By default, all phases are processed.
In each step, the corresponding step and phase conditions and the global condition are evaluated for
the current molecule and the step is processed for that molecule only if all of these conditions are
satisfied.
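The way the three condition levels combine can be written as a single check. In this sketch conditions are Python callables standing in for Chemical Terms expressions and a molecule is a dictionary of precomputed properties; the function step_allowed and these data shapes are hypothetical.

```python
def step_allowed(mol, global_cond=None, phase_cond=None, step_cond=None):
    """A step is processed for a molecule only if every given condition holds."""
    conditions = (global_cond, phase_cond, step_cond)
    # Missing conditions are skipped; the remaining ones must all be satisfied.
    return all(cond(mol) for cond in conditions if cond is not None)


# Stand-ins for "ringCount() > 0" and "aromaticRingCount() > 1".
def has_ring(mol):
    return mol["rings"] > 0


def has_two_aromatic_rings(mol):
    return mol["aromatic_rings"] > 1


biphenyl_like = {"rings": 2, "aromatic_rings": 2}
cyclohexane_like = {"rings": 1, "aromatic_rings": 0}
print(step_allowed(biphenyl_like, global_cond=has_ring, step_cond=has_two_aromatic_rings))
print(step_allowed(cyclohexane_like, global_cond=has_ring, step_cond=has_two_aromatic_rings))
```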
The molecule conditions are specified in ChemAxon's Chemical Terms
syntax and available functions can be extended by function objects declared in a separate
MolConditions section. These function objects are user defined Java classes (which must be
found on the user's CLASSPATH) implementing the
MolCondition interface.
Each function object is specified in a separate MolCondition subsection with its class
given in the Class attribute and its (optional) initial parameter string to be set in
setParameters(String params) given in the optional Params attribute.
An example configuration is given below:
<Algorithm>
<Exhaustive Class="chemaxon.reaction.synthesis.ExhaustiveAlgorithm">
<MolConditions>
<MolCondition ID="moreThan1Oxygens" Class="AtomCountCondition" Params="8 > 1"/>
</MolConditions>
<MolCondition>
<![CDATA[ ringCount() > 0 ]]>
</MolCondition>
<Phase>
<Step ID="r11" Type="all">
<MolCondition>
<![CDATA[ aromaticRingCount() > 1 ]]>
</MolCondition>
</Step>
<Step ID="r12" Type="all"/>
</Phase>
<Phase>
<MolCondition>
<![CDATA[ moreThan1Oxygens() && match("PCCO") ]]>
</MolCondition>
<Step ID="r21" Type="all">
<MolCondition>
<![CDATA[ mass() > 250 ]]>
</MolCondition>
</Step>
</Phase>
</Exhaustive>
</Algorithm>
This configuration declares a user defined function object
AtomCountCondition.
With the above parametrization, the function checks whether there is more than one oxygen atom in the
molecule. Other functions such as match(),
mass(), ringCount() and aromaticRingCount() are already declared in the default configuration
evaluator.xml. This configuration defines two phases
with two steps in the first phase and one step in the second phase. The global condition requires that
each molecule should contain a ring. The first step in the first phase is run only for molecules with
more than one aromatic ring, while the second phase step requires more than one oxygen atom, a match with
the structure "PCCO" (SMILES) and a molecule mass greater than 250.
The random algorithm selects a reaction step and corresponding reactants randomly in each step.
The number of steps can be specified in the -e, --step-count
command line parameter of the synthesize m, synthesize f and
synthesize r commands.
An example configuration is given below:
<Algorithm>
<Random Class="chemaxon.reaction.synthesis.RandomAlgorithm">
<Step ID="r1" Type="all"/>
<Step ID="r2" Type="any"/>
<Step ID="r3" Type="all"/>
</Random>
</Algorithm>
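Random step selection can be sketched with toy data as follows. Molecules are strings, the "reaction" joins reactant names, and a seed makes the run repeatable; the function run_random, the seed argument and the choice to skip a step whose reactant set is empty are assumptions made for this illustration, not documented Synthesizer behaviour.

```python
import random


def run_random(sets, edges, step_count, react, seed=None):
    """Pick a random edge and random reactants in each of step_count steps."""
    rng = random.Random(seed)
    edge_ids = sorted(edges)
    performed = []
    for _ in range(step_count):
        edge_id = rng.choice(edge_ids)
        edge = edges[edge_id]
        pools = [sets.get(name, []) for name in edge["from"]]
        if any(len(pool) == 0 for pool in pools):
            continue  # assumption: a step with an empty reactant set does nothing
        reactants = tuple(rng.choice(pool) for pool in pools)
        sets.setdefault(edge["to"], []).append(react(reactants))
        performed.append((edge_id, reactants))
    return performed


def join_names(reactants):
    """Toy reaction: the product name is the joined reactant names."""
    return "-".join(reactants)


sets = {"A": ["a1", "a2"], "B": ["b1"]}
edges = {
    "e1": {"from": ["A", "B"], "to": "P"},
    "e2": {"from": ["P"], "to": "R"},
}
for edge_id, reactants in run_random(sets, edges, step_count=5, react=join_names, seed=1):
    print(edge_id, reactants)
```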
The Class attribute is added for technical reasons to specify the
java class implementing the algorithm. The synthesis steps are defined in
Step subsections, the steps will be performed in the same sequence
as listed in the configuration in the linear case, in all possible orders in the
exhaustive case, and selected randomly in each step
in the random case. Each step has an ID which refers to
the corresponding graph edge ID. The graph edge ID is either specified in the
ID attribute of the graph edge or it is the same
as the corresponding reaction ID specified in the RID attribute
if the former is omitted.
The processing type is specified in the Type attribute:
<Params Unique="false" Cached="true"/>
Synthesis parameters are specified in the
Params section.
Currently two such parameters are available:
Unique: corresponds to the
Unique Reactor parameter: if set to "false",
then repeated product lists are allowed when processing the reactions (the default is "true")
Cached: corresponds to the
Cached Reactor parameter: if set to "true",
all reactions are processed in cache mode (i.e., calculation and search results are cached)
(the default is "false"). Note that setting this parameter to "true" has little effect in
database mode since this cache is memory cache and is not stored in
the database
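The effect of the Unique parameter can be illustrated with toy product lists. In this sketch a product list is a list of strings and two lists count as repeated when they contain the same products in the same order; that equality test and the function collect_products are assumptions, since the Reactor defines what counts as a repeated product list.

```python
def collect_products(product_lists, unique=True):
    """Keep every product list, or only the first occurrence of each when unique."""
    if not unique:
        return [list(products) for products in product_lists]
    seen = set()
    kept = []
    for products in product_lists:
        key = tuple(products)  # assumption: same products in the same order
        if key not in seen:
            seen.add(key)
            kept.append(list(products))
    return kept


raw = [["CCO", "O"], ["CCN"], ["CCO", "O"]]
print(collect_products(raw, unique=True))
print(collect_products(raw, unique=False))
```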
Memory mode: reads set1.sdf into molecule set SET1,
reads set2.sdf into molecule set SET2,
reads in.sdf into the input set specified in the configuration XML,
runs the synthesis and writes all molecules (both input and created) to result.sdf.
synthesize m -c config.xml in.sdf -s SET1 set1.sdf -s SET2 set2.sdf -f sdf -o result.sdf
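File mode: reads the input files, runs the synthesis and stores the molecule sets in files under the subdirectory syn; no other output file is created.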
synthesize f -c config.xml -n syn -m in.sdf -s SET1 set1.sdf -s SET2 set2.sdf
Runs the synthesis syn in database mode:
first create the synthesis, then import molecule files into input sets,
then run the synthesis, finally export all molecules with molecule set ID written to
the SDF tag SET:
synthesize c -n syn -c config.xml
synthesize i -n syn -s SET1 set1.sdf
synthesize i -n syn -s SET2 set2.sdf
synthesize r -n syn
synthesize e -n syn -f sdf -t SET -o result.sdf