These examples demonstrate the use of the Synthesizer program.
They show how to run the synthesize command and explain some of its command line options and its configuration.
Synthesizer uses Reactor to perform a synthesis step. For a description of reaction definitions and mapping, see the Reaction definitions section of the Synthesizer manual and Reaction mapping section of the Reactor manual.
These examples run the synthesize UNIX shell script under UNIX / Linux
or the synthesize.bat batch file under Windows.
To run these examples:
The PATH (all systems) and JCHEMHOME (Windows only) environment variables have to be set as described in the Preparing and Running JChem's Batch Files and Shell Scripts manual.
For database mode, the database connection settings have to be stored in the .jchem file under the .chemaxon (UNIX / Linux) or chemaxon (Windows) directory in your user home directory. To set up and save these settings, use the database connection dialog that appears when you start JChemManager (jcman). Your settings are saved when you exit JChemManager.
Change to the Synthesizer examples directory. In UNIX / Linux:
cd jchem/examples/synthesizer
In Windows:
cd jchem\examples\synthesizer
We will show a simplified version of a combinatorial synthesis (see [1] and [2]). This example synthesis is an application of the linear algorithm. The same synthesis will be performed in memory mode, file mode and database mode.
To run the examples, step into the linear subdirectory:
cd linear
The synthesis configuration is stored in Synthesizer.xml. Each edge of the synthesis graph defined in Synthesizer.xml corresponds to one of the synthesis steps below.
The synthesis consists of the following 3
synthesis steps:
Alkynes react with the scaffold taken as an alkyl-halide according to the following reaction:
![]() |
Products created in the previous step react with amines according to the following reaction:
![]() |
Carboxylic-acids react with the products created in the previous step according to the following reaction:
![]() |
The products created in the last step are put into the RESULT set of the synthesis.
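The chaining of the three steps can be pictured with a small Python sketch. It is only an illustration under stated assumptions: molecules are plain text labels, the `react` callback is a hypothetical stand-in for a real reaction, and the function names are not part of Synthesizer.

```python
from itertools import product


def run_step(left, right, react):
    """Apply a two-reactant toy 'reaction' to every pair of inputs."""
    return [react(a, b) for a, b in product(left, right)]


def linear_synthesis(scaffolds, alkynes, amines, acids, react):
    """Chain three steps; each step consumes the products of the previous one."""
    step1 = run_step(scaffolds, alkynes, react)  # scaffold + alkyne
    step2 = run_step(step1, amines, react)       # step-1 products + amine
    result = run_step(step2, acids, react)       # step-2 products + carboxylic acid
    return result                                # plays the role of the RESULT set


def join_labels(a, b):
    """Hypothetical toy reaction: concatenate two labels."""
    return f"{a}-{b}"


if __name__ == "__main__":
    print(linear_synthesis(["S"], ["A1", "A2"], ["N1"], ["C1", "C2"], join_labels))
```

With one scaffold, two alkynes, one amine and two acids, the toy run yields 1 x 2 x 1 x 2 = 4 labels in the final set.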
For a description of reaction definitions, see the Reaction definitions section of the Reactor Manual.
Reaction conditions are shown in MView after setting Table / Show Fields.
In addition to the reaction rules specified in the reaction definitions, our synthesis steps also have a synthesis condition that determines whether the created products are relevant to the synthesis.
These conditions are specified in the <Rule> subsections of our Synthesizer.xml configuration XML.
We have the same condition for all synthesis steps:
Ncount(product(0)) + Ocount(product(0)) + Scount(product(0)) <= 10 && mass(product(0)) <= 700
which means that a product is accepted only if its total number of nitrogen, oxygen and sulfur atoms
is at most 10 and its molecular mass is at most 700.
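As a rough illustration, the same condition can be written as a Python predicate. This is a sketch, not how Synthesizer evaluates rules: the function name and the idea of passing precomputed atom counts and mass are assumptions made for the example.

```python
MAX_HETERO_ATOMS = 10  # upper limit on N + O + S atoms, taken from the rule
MAX_MASS = 700.0       # upper limit on the molecular mass, taken from the rule


def accept_product(n_count, o_count, s_count, mass):
    """Return True if a product passes the synthesis condition above."""
    hetero_atoms = n_count + o_count + s_count
    return hetero_atoms <= MAX_HETERO_ATOMS and mass <= MAX_MASS


if __name__ == "__main__":
    # Hypothetical values, not taken from the example data.
    print(accept_product(3, 4, 1, 512.6))  # both limits respected
    print(accept_product(5, 5, 1, 400.0))  # 11 hetero atoms, rejected
```

Both limits are inclusive, matching the `<=` operators in the rule.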
For comparison, we will also run the synthesis examples below without this rule, using the configuration XML
Synthesizer_norule.xml. The only difference from the original Synthesizer.xml configuration is that
it has no <Rule> subsections.
All synthesis steps
have the same parameters set in the <Step> sections:
the Mode attribute is set to "comb", while the Type attribute is
set to "all". This means that all of the above reactions are performed in
combinatorial mode with
all reaction centers processed.
We run Synthesizer with the Unique option, which can be set as a global synthesis parameter
in the Params section of the configuration XML. With this option, repeated products
are filtered out by Reactor in each step.
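The effect of filtering repeated products within a step can be sketched in Python as follows. The sketch assumes that each product is represented by a canonical text form, so that equal strings mean equal structures. How Reactor actually compares structures is not described here, and the function name is hypothetical.

```python
def unique_products(products):
    """Drop repeated products while keeping the order of first occurrence."""
    seen = set()
    kept = []
    for structure in products:
        if structure not in seen:  # assumption: equal strings = equal structures
            seen.add(structure)
            kept.append(structure)
    return kept


if __name__ == "__main__":
    print(unique_products(["CCO", "CCN", "CCO"]))
```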
Our scaffold molecule is stored in scaffold.smiles and shown below:
![]() |
An alternative scaffold used in the original synthesis described in [1] and [2] is stored in scaffold.mol (see Note 1).
Our additional input molecules are stored in alkynes.smiles, amines.smiles, carboxylic-acids.smiles and shown below:
![]() |
![]() |
![]() |
In memory mode Synthesizer uses memory-based molecule sets, which means that all molecules are stored in memory.
Although synthesis is fastest with this storage, it has a serious limitation:
only a few thousand molecules can be held in memory at a time
(5000-6000 by default; the limit can be raised with the -Xmx option of the Java VM).
This is not sufficient in most cases.
Run Synthesizer in memory mode with the m synthesis command:
synthesize m -c Synthesizer.xml -s scaffold scaffold.smiles -s alkyne alkynes.smiles -s amine amines.smiles -s carboxylic-acid carboxylic-acids.smiles -t SET -f sdf -o result_mem.sdf
The above command runs Synthesizer in memory mode, uses the configuration stored in
Synthesizer.xml, and fills the input sets scaffold, alkyne, amine and carboxylic-acid
with the molecules stored in scaffold.smiles, alkynes.smiles, amines.smiles and
carboxylic-acids.smiles, respectively.
Finally, it writes all synthesized molecules (together with the inputs and intermediate products)
to result_mem.sdf, with the molecule set ID stored in the SET SDF tag.
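When the same run has to be repeated with different inputs, the command line can be assembled programmatically. The Python sketch below only builds and prints the argument list using the options shown above (-c, -s, -t, -f, -o). The helper name and its parameters are assumptions made for this illustration, and nothing is executed.

```python
import shlex


def memory_mode_command(config, input_sets, tag=None, fmt=None, output=None):
    """Build the argument list for a memory-mode ('m') run."""
    argv = ["synthesize", "m", "-c", config]
    for set_id, path in input_sets.items():
        argv += ["-s", set_id, path]  # one -s option per input set
    if tag is not None:
        argv += ["-t", tag]           # SDF tag that receives the set ID
    if fmt is not None:
        argv += ["-f", fmt]           # output format
    if output is not None:
        argv += ["-o", output]        # output file path
    return argv


if __name__ == "__main__":
    sets = {
        "scaffold": "scaffold.smiles",
        "alkyne": "alkynes.smiles",
        "amine": "amines.smiles",
        "carboxylic-acid": "carboxylic-acids.smiles",
    }
    argv = memory_mode_command("Synthesizer.xml", sets, "SET", "sdf", "result_mem.sdf")
    print(shlex.join(argv))  # dry run: print the command only
```

To run the command instead of printing it, the list could be passed to `subprocess.run(argv, check=True)`, provided the synthesize script is on the PATH.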
Note that the SDF tag field_0 stores the origin code sequence used by the
Synthesis Browser to color the atoms according to their originating molecule sets.
Currently this is available only in database mode.
Some sample molecules from the synthesis result file result_mem.sdf:
![]() |
For comparison, run Synthesizer without the synthesis rules by using the configuration XML Synthesizer_norule.xml:
synthesize m -c Synthesizer_norule.xml -s scaffold scaffold.smiles -s alkyne alkynes.smiles -s amine amines.smiles -s carboxylic-acid carboxylic-acids.smiles -t SET -f sdf -o result_mem_norule.sdf
Some sample molecules from the synthesis result file result_mem_norule.sdf:
![]() |
You can see the complete list of memory-mode options by typing:
synthesize m -h
The use and meaning of command line options used in this example:
| Option | Description | Default |
|---|---|---|
| -c | configuration file | - |
| -s | molecule set ID with input file | - |
| -t | SDF tag storing the molecule set ID | the set ID is not stored |
| -f | specifies the output format (e.g. 'sdf', 'mol') | 'smiles' |
| -o | specifies the output file path | standard output (console) |
In file mode Synthesizer uses file-based molecule sets, which means that each molecule set is stored in a separate file.
The file format is specified with the -f command line option (default: SMILES).
The molecule set files are placed in a newly created subdirectory of the current directory,
named after the synthesis name given in the mandatory -n command line option.
Each file is named after its molecule set ID.
Compared to memory mode, file mode is slower but does not have the serious limitation on the number of molecules. Molecule set files are kept after the synthesis has finished.
Run Synthesizer in file mode with the f synthesis command. The command below uses the synthesis name
syn (-n syn) and SMILES molecule sets (-f smiles, which is the default molecule set format anyway),
and requests no output other than the molecule set files themselves (-m):
synthesize f -n syn -c Synthesizer.xml -s scaffold scaffold.smiles -s alkyne alkynes.smiles -s amine amines.smiles -s carboxylic-acid carboxylic-acids.smiles -f smiles -m
The synthesis sets can be found in the syn subdirectory.
If a subdirectory named syn already exists (e.g. because you have previously run the
synthesis in file mode), a new subdirectory name is generated by appending a random number
to syn.
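The naming behavior described above can be imitated with a short Python sketch. It is an assumption-laden illustration: the range and formatting of the random number that Synthesizer appends are not specified here, and the function name is hypothetical.

```python
import os
import random
import tempfile


def fresh_directory_name(parent, name, rng=random):
    """Return `name`, or `name` plus a random number if `name` is already taken."""
    candidate = name
    while os.path.exists(os.path.join(parent, candidate)):
        # Assumption: a decimal number is appended directly to the name.
        candidate = f"{name}{rng.randint(0, 999999)}"
    return candidate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as parent:
        print(fresh_directory_name(parent, "syn"))  # name is still free
        os.mkdir(os.path.join(parent, "syn"))
        print(fresh_directory_name(parent, "syn"))  # "syn" plus a random number
```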
Altogether, the generated molecules are the same as those generated in memory mode.
For comparison, run Synthesizer without the synthesis rules by using the configuration XML Synthesizer_norule.xml:
synthesize f -n syn_norule -c Synthesizer_norule.xml -s scaffold scaffold.smiles -s alkyne alkynes.smiles -s amine amines.smiles -s carboxylic-acid carboxylic-acids.smiles -f smiles -m
The synthesis sets can be found in the syn_norule subdirectory.
You can see the complete list of file-mode options by typing:
synthesize f -h
The use and meaning of command line options used in this example:
| Option | Description | Default |
|---|---|---|
| -c | configuration file | - |
| -s | molecule set ID with input file | - |
| -n | the synthesis directory name | - |
| -f | specifies the synthesis set file format (e.g. 'sdf', 'mol') | 'smiles' |
| -m | output only the molecule set files | output all molecules in an output file/stream apart from the generated molecule sets |
In database mode Synthesizer stores molecule sets in a database. Note that you need a properly configured database connection to run Synthesizer in database mode. The main advantage of database mode is that you can browse the molecules in the Synthesis Browser, view the structures colored according to origin codes, or view the corresponding synthesis path. Molecules are stored in a regular JChem structure table; synthesis data (origin code, synthesis set ID, synthesis path) are stored in separate custom tables.
In database mode we use separate commands for creating a synthesis (command c),
importing molecules to synthesis sets (command i) and running the synthesis
(command r). The Synthesizer Manual
contains the complete list of available commands.
You can see the complete list of command specific options by:
synthesize <command> -h
For example, type
synthesize i -h
to display import options.
Create the synthesis by command c:
specify the synthesis name in the -n option
and the synthesis configuration file in the -c option:
synthesize c -n syn -c Synthesizer.xml
The synthesis name identifies the synthesis in the Synthesis Browser and also serves as the basis for the synthesis table names.
If you already have a synthesis named syn, either choose a different
synthesis name or delete the existing synthesis with command d before creating it as
shown above:
synthesize d -n syn
synthesize c -n syn -c Synthesizer.xml
Import the input molecules into the synthesis sets by command i:
synthesize i -n syn -s scaffold scaffold.smiles
synthesize i -n syn -s alkyne alkynes.smiles
synthesize i -n syn -s amine amines.smiles
synthesize i -n syn -s carboxylic-acid carboxylic-acids.smiles
Run the synthesis by command r:
synthesize r -n syn
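The create, import and run commands always follow the same order, so they can be generated from a synthesis name and a mapping of input sets. The Python sketch below is a dry run that only prints the command lines. The helper name and its `delete_first` parameter are assumptions made for the example, and only the commands and options shown above are used.

```python
import shlex


def database_workflow(name, config, input_sets, delete_first=False):
    """List the commands for one database-mode synthesis, in execution order."""
    commands = []
    if delete_first:
        commands.append(["synthesize", "d", "-n", name])          # delete old synthesis
    commands.append(["synthesize", "c", "-n", name, "-c", config])  # create
    for set_id, path in input_sets.items():
        commands.append(["synthesize", "i", "-n", name, "-s", set_id, path])  # import
    commands.append(["synthesize", "r", "-n", name])               # run
    return commands


if __name__ == "__main__":
    sets = {
        "scaffold": "scaffold.smiles",
        "alkyne": "alkynes.smiles",
        "amine": "amines.smiles",
        "carboxylic-acid": "carboxylic-acids.smiles",
    }
    for argv in database_workflow("syn", "Synthesizer.xml", sets):
        print(shlex.join(argv))  # dry run: print each command only
```

Each argument list could be handed to `subprocess.run(argv, check=True)` to stop at the first failing command, assuming the synthesize script is on the PATH and the database connection is configured.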
Duplicate structure filtering can be switched on with the -q
option for molecule import (command i) and for the synthesis process
(command r). In this case the JChem structure table will contain
only unique structures, but you will still see duplicates in the
Synthesis Browser, because these duplicates may have been created through different synthesis paths
(in our case, duplicates with the same synthesis path are filtered out by the
Unique synthesis option set in the <Params> section of our configuration XML
Synthesizer.xml).
Note that duplicate structure filtering is time-consuming and makes the synthesis
process much slower.
You can export molecules to a file with command e:
synthesize e -n syn -s RESULT -f sdf -o result_db.sdf
The RESULT set is exported to result_db.sdf
with origin codes stored in the
field_0 SDF tag. The RESULT molecules are shown below:
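To inspect the tags of an exported SDF file outside the Synthesis Browser, the data items can be read with a few lines of Python. The sketch relies only on the general SDF layout (data items of the form `> <TAG>` after the `M  END` line, records closed by `$$$$`). The function name is hypothetical, and the tag values in the sample are placeholders, not real origin codes.

```python
import re

TAG_LINE = re.compile(r"^>.*<([^>]+)>")  # data header line, e.g. "> <field_0>"


def read_sdf_tags(lines):
    """Yield one {tag: value} dictionary per SDF record."""
    tags, current, in_data = {}, None, False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip() == "$$$$":       # end of record
            yield tags
            tags, current, in_data = {}, None, False
        elif line.startswith("M  END"):  # structure block is over
            in_data = True
        elif in_data:
            match = TAG_LINE.match(line)
            if match:
                current = match.group(1)
                tags[current] = ""
            elif current is not None:
                if line == "":
                    current = None       # a blank line closes the data item
                elif tags[current]:
                    tags[current] += "\n" + line
                else:
                    tags[current] = line


# Hypothetical minimal record; the field_0 value is a placeholder.
SAMPLE = """mol1


  0  0  0  0  0  0            999 V2000
M  END
> <field_0>
placeholder-origin-code

> <SET>
RESULT

$$$$
"""

if __name__ == "__main__":
    for record in read_sdf_tags(SAMPLE.splitlines()):
        print(record)
```

For a real file, the same generator can be fed an open file object, for example `read_sdf_tags(open("result_db.sdf"))`.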
![]() |
Now that you have synthesized molecules in the database, you can use the Synthesis Browser to see them:
Start synthesisbrowser. Select the synthesis syn from the
synthesis selection combobox.
The RESULT set is shown below:
![]() |
Select the molecule with ID 22 by clicking on it and press the
"Show in Tree" button:
![]() |
On the left pane you can follow the molecule synthesis sequence leading to the
result molecule with ID 22.
![]() |
For comparison, you can run the synthesis without the synthesis rules by using the configuration XML Synthesizer_norule.xml:
synthesize c -n syn_norule -c Synthesizer_norule.xml
synthesize i -n syn_norule -s scaffold scaffold.smiles
synthesize i -n syn_norule -s alkyne alkynes.smiles
synthesize i -n syn_norule -s amine amines.smiles
synthesize i -n syn_norule -s carboxylic-acid carboxylic-acids.smiles
synthesize r -n syn_norule
Export the RESULT set to result_db_norule.sdf by:
synthesize e -n syn_norule -s RESULT -f sdf -o result_db_norule.sdf
Some sample molecules from the RESULT set that were previously excluded by the synthesis rules are shown below:
![]() |
This example synthesis is an application of the exhaustive algorithm. The example shows a virtual emulation of the aerobic bacterial biodegradation of phenol. Two alternative pathways of the oxygenolytic ring cleavage reactions of catechol are catalyzed by specific dioxygenases. Both pathways may be present in one bacterial species. Refer to [3] for a detailed description of this mechanism.
The same synthesis will be performed in memory mode, file mode and database mode.
To run the examples, step into the exhaustive subdirectory:
cd exhaustive
The synthesis configuration is stored in Synthesizer.xml. Note that the synthesis graph consists of only one set: we want to generate the products along all reaction sequences, so intermediate products must be used as reactants in all reactions in the same way as the input molecules.
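The idea of feeding every new product back into the same set can be sketched as a simple worklist loop in Python. This is an illustration of the principle, not Synthesizer's implementation: molecules are text labels, each toy reaction takes a single reactant, and the reaction functions and product names are hypothetical stand-ins.

```python
from collections import deque


def exhaustive_synthesis(inputs, reactions, max_molecules=1000):
    """Apply every reaction to every molecule until no new product appears."""
    pool = list(inputs)  # the single molecule set
    seen = set(pool)
    queue = deque(pool)
    while queue and len(pool) < max_molecules:  # rough safety cap (assumption)
        molecule = queue.popleft()
        for react in reactions:
            for new_product in react(molecule):
                if new_product not in seen:
                    seen.add(new_product)
                    pool.append(new_product)
                    queue.append(new_product)  # products become reactants too
    return pool


# Hypothetical toy reactions on labels.
def hydroxylate(molecule):
    return ["catechol"] if molecule == "phenol" else []


def cleave_ortho(molecule):
    return ["ortho-cleavage product"] if molecule == "catechol" else []


def cleave_meta(molecule):
    return ["meta-cleavage product"] if molecule == "catechol" else []


if __name__ == "__main__":
    print(exhaustive_synthesis(["phenol"], [hydroxylate, cleave_ortho, cleave_meta]))
```

The loop ends when the queue is empty, that is, when no reaction produces a molecule that is not already in the set.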
Our input molecule (phenol) is shown below:
![]() |
The synthesis consists of the following synthesis steps:
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
In memory mode Synthesizer uses memory-based molecule sets, which means that all molecules are stored in memory.
Although synthesis is fastest with this storage, it has a serious limitation:
only a few thousand molecules can be held in memory at a time
(5000-6000 by default; the limit can be raised with the -Xmx option of the Java VM).
This is not sufficient in most cases.
Run Synthesizer in memory mode with the m synthesis command:
synthesize m -s S1 phenol.smiles -c Synthesizer.xml -o metabolites.smiles
The generated metabolites are shown below:
![]() |
In file mode Synthesizer uses file-based molecule sets, which means that each molecule set is stored in a separate file.
The file format is specified with the -f command line option (default: SMILES).
The molecule set files are placed in a newly created subdirectory of the current directory,
named after the synthesis name given in the mandatory -n command line option.
Each file is named after its molecule set ID.
Compared to memory mode, file mode is slower but does not have the serious limitation on the number of molecules. Molecule set files are kept after the synthesis has finished.
Run Synthesizer in file mode by the f synthesis command:
synthesize f -n biodegradation -s S1 phenol.smiles -c Synthesizer.xml -o metabolites.smiles
The result is the same set of metabolites as in memory mode, but the molecules are also stored,
together with their origin codes, in the molecule set file S1
in the biodegradation subdirectory. If a subdirectory named
biodegradation already exists (e.g. because you have previously run the synthesis
in file mode), a new subdirectory name is generated by appending a random number to
biodegradation.
In database mode Synthesizer stores molecule sets in a database. Note that you need a properly configured database connection to run Synthesizer in database mode. The main advantage of database mode is that you can browse the molecules in the Synthesis Browser, view the structures colored according to origin codes, or view the corresponding synthesis path. Molecules are stored in a regular JChem structure table; synthesis data (origin code, synthesis set ID, synthesis path) are stored in separate custom tables.
In database mode we use separate commands for creating a synthesis (command c),
importing molecules to synthesis sets (command i) and running the synthesis
(command r). The Synthesizer Manual
contains the complete list of available commands.
You can see the complete list of command specific options by:
synthesize <command> -h
For example, type
synthesize i -h
to display import options.
Create the synthesis by command c:
specify the synthesis name in the -n option
and the synthesis configuration file in the -c option:
synthesize c -n biodegradation -c Synthesizer.xml
The synthesis name identifies the synthesis in the Synthesis Browser and also serves as the basis for the synthesis table names.
If you already have a synthesis named biodegradation, either choose a different
synthesis name or delete the existing synthesis with command d before creating it as
shown above:
synthesize d -n biodegradation
synthesize c -n biodegradation -c Synthesizer.xml
Import the input molecule into the set S1 by command i:
synthesize i -n biodegradation -s S1 phenol.smiles
Run the synthesis by command r:
synthesize r -n biodegradation
Export the generated molecules by command e:
synthesize e -n biodegradation -s S1 -o metabolites.smiles
Now that you have synthesized molecules in the database, you can use the Synthesis Browser to see them:
Start synthesisbrowser. Select the synthesis biodegradation from the
synthesis selection combobox.
Use the Compound view to see the generated molecules. Some sample molecules are shown below:
![]() |
Use the Tree view to see the generation tree for a selected molecule. Select a structure by clicking on it and press the "Show in Tree" button:
![]() |
Press the "View synthesis path" button to see the reaction sequence leading to the selected molecule:
![]() |