These examples demonstrate the use of the Fragmenter and the FragmentStatistics programs.
First we show how to run the fragment command and explain some of its command-line options as well as its configuration.
FragmentStatistics is a supplementary tool that takes the Fragmenter cxsmiles output as input and performs duplicate filtering of fragments, with optional categorization by chemical activity data. Fragments are sorted by a scoring function, which is a weighted combination of the atom count and the occurrence rates in each category. The fragstat command and its command-line options are demonstrated on 50 sample molecules with activity data.
These examples run the fragment UNIX shell script under UNIX / Linux or the fragment.bat batch file under Windows.
To run these examples, set the PATH (all systems) and JCHEMHOME (Windows only) environment variables as described in the Preparing and Running JChem's Batch Files and Shell Scripts manual. Then change to the examples directory.
Under UNIX / Linux:
cd jchem/examples/fragmenter
In Windows:
cd jchem\examples\fragmenter
These examples demonstrate molecule fragmentation by cleavage rules defined in the Fragmenter.xml XML configuration file. The cleavage reactions are shown below:
[Figure: cleavage reactions defined in Fragmenter.xml]
The configuration file Fragmenter.xml also contains a standardization section, which is used to standardize the input molecule. In this configuration, standardization only aromatizes the input molecule. Refer to the Standardizer Manual and its configuration section for details.
The Fragmenter has one generic rule: it never cleaves a ring bond. Consequently, each cleaved bond increases the number of fragments by 1. The number of fragments in a complete fragmentation (called a fragment set) is therefore one more than the number of cleavage bonds in that fragmentation.
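The following minimal Python sketch shows why the ring-bond rule yields exactly one extra fragment per cleavage. It uses a hypothetical toy molecule graph and is not Fragmenter code. A non-ring bond is a bridge in the molecular graph, so removing it always splits one piece into two. Cutting a single ring bond, by contrast, leaves the molecule connected. The function `count_fragments` and the toy graph are illustrative assumptions.

```python
from itertools import combinations

def count_fragments(n_atoms, bonds, cut):
    """Count connected pieces after removing the bonds in `cut` (union-find)."""
    parent = list(range(n_atoms))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in bonds:
        if (a, b) in cut:
            continue  # cleaved bond: do not join its atoms
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return len({find(a) for a in range(n_atoms)})

# Hypothetical toy graph: a 3-membered ring (atoms 0-2) carrying a chain (atoms 3-5).
RING_BONDS = [(0, 1), (1, 2), (2, 0)]
CHAIN_BONDS = [(2, 3), (3, 4), (4, 5)]
BONDS = RING_BONDS + CHAIN_BONDS

# Only non-ring bonds are cleavage candidates: k cleaved bonds give k + 1 fragments.
for k in range(len(CHAIN_BONDS) + 1):
    for cut in combinations(CHAIN_BONDS, k):
        assert count_fragments(6, BONDS, set(cut)) == k + 1
```

Cutting one ring bond, for example `count_fragments(6, BONDS, {(0, 1)})`, still returns 1. This is why ring bonds are excluded from cleavage.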
fragment -c Fragmenter.xml input.mol -f sdf:-a -o fragments1.sdf
Note that we set sdf:-a as the output format in the -f parameter. Our fragments are aromatized by the standardization step, whereas the SDF format is supposed to store the dearomatized form.
The resulting fragments are stored in fragments1.sdf.
By default, Fragmenter writes its output in cxsmiles format, which can be processed by FragmentStatistics; in this format, the fragmentation cleavage data is stored in the atom labels. In these examples we use SDF format, where the cleavage data is also stored in SDF tags. You can view these data items in MView by selecting Table / Show Fields (only some of the fragments are shown):
[Figure: fragments from fragments1.sdf in MView, with the cleavage data fields shown]
This result is probably not very satisfying: it contains very small fragments as well as fragments that are not very interesting. There are several ways to improve it, as the following examples show.
You can either modify the Fragmentation subsection of the Params section of the Fragmenter.xml configuration file, or override its options with command-line options. This example applies the latter. We reduce the number of fragments in a complete fragmentation of the molecule (called a fragment set) from 4 to 3 (option -x). We also reduce the number of fragment sets to be generated from 8 to 2 (option -y):
fragment -c Fragmenter.xml input.mol -f sdf:-a -o fragments2.sdf -x 3 -y 2
The resulting fragments are stored in fragments2.sdf.
Some fragments from the result set (still not optimal) are shown below:
[Figure: sample fragments from fragments2.sdf]
You can also include intermediate fragment sets (option -e): those that can be subdivided further by adding more cleavage bonds. In our case, this means including the 2-fragment fragmentations that can be subdivided into 3 fragments by cleaving one more bond, as well as the starting molecule itself. Also increase the number of fragment sets to be generated to a huge value so that all possibilities can be generated. You can do this either by setting -y to a huge value or by removing the MaxSetCount attribute from the Fragmenter.xml configuration:
fragment -c Fragmenter.xml input.mol -f sdf:-a -o fragments3.sdf -x 3 -y 200 -e
The resulting fragments are stored in fragments3.sdf.
Some sample fragments from the huge result set:
[Figure: sample fragments from fragments3.sdf]
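The following Python sketch shows how -x, -y and -e interact. It uses a deliberately simplified, hypothetical model with three assumptions:

- Every combination of candidate cleavage bonds is allowed. Real cleavage rules such as RECAP restrict this.
- A set of k bonds gives k + 1 fragments.
- The order in which sets are cut off by the -y limit is arbitrary here and not necessarily the order Fragmenter uses.

Under this model, the unextendable sets are those with exactly min(x - 1, n) bonds. The -e option adds all smaller sets, including the empty set, which stands for the starting molecule itself. The function name and bond labels are hypothetical.

```python
from itertools import combinations

def cleavage_sets(candidate_bonds, max_fragments, max_sets=None, extendable=False):
    """Enumerate cleavage bond sets in a simplified model (hypothetical, not Fragmenter code).

    max_fragments plays the role of -x, max_sets of -y, extendable of -e.
    """
    largest = min(max_fragments - 1, len(candidate_bonds))  # k bonds -> k + 1 fragments
    # Without -e keep only sets that cannot take one more bond; with -e keep all sizes.
    sizes = range(largest + 1) if extendable else [largest]
    result = []
    for size in sizes:
        for combo in combinations(candidate_bonds, size):
            if max_sets is not None and len(result) >= max_sets:
                return result  # -y limit reached
            result.append(combo)
    return result

bonds = ["b1", "b2", "b3", "b4"]  # hypothetical candidate cleavage bonds
print(len(cleavage_sets(bonds, 3)))                   # unextendable 2-bond sets only
print(len(cleavage_sets(bonds, 3, extendable=True)))  # plus 1-bond sets and the empty set
print(len(cleavage_sets(bonds, 3, max_sets=2)))       # capped by the set limit
```

With four hypothetical candidate bonds and a limit of 3 fragments, the sketch yields 6 unextendable sets, or 11 sets when the extendable ones are included. A set limit of 2 keeps only 2 of them.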
If you are still not satisfied with the results, the next example shows a set of customizable rules, the RECAP rules, that can be switched on together by adding a single configuration section.
The RECAP rules are defined in the <Recap> section of the FragmenterRecap.xml XML configuration.
Run Fragmenter with the RECAP rules by adding a Reviser section to your configuration; the new configuration XML is FragmenterRecap.xml:
fragment -c FragmenterRecap.xml input.mol -f sdf:-a -o fragments4.sdf
The resulting fragments are stored in fragments4.sdf and are shown below:
[Figure: fragments from fragments4.sdf]
Now allow only 2 fragments in a fragment set (-x 2). Include all fragmentations by allowing a practically unlimited number of fragment sets (-y 200):
fragment -c FragmenterRecap.xml input.mol -f sdf:-a -o fragments5.sdf -x 2 -y 200
The resulting fragments are stored in fragments5.sdf.
Now you can see that the RECAP rules with our configuration allow only 3
cleavage bonds:
[Figure: fragments from fragments5.sdf]
For comparison, see the same without the RECAP rules:
fragment -c Fragmenter.xml input.mol -f sdf:-a -o fragments6.sdf -x 2 -y 200
The resulting fragments are stored in fragments6.sdf.
Some sample fragment pairs from the result:
[Figure: sample fragment pairs from fragments6.sdf]
Observe that with the RECAP rules we had only 3 possible cleavage bonds, giving 3 fragment sets with 3*2=6 fragments altogether. Without these rules we have 10 possible cleavage bonds, giving 10 fragment sets with 10*2=20 fragments altogether, including 2 repeated fragments. The resulting 18 different fragments in the latter case include some fragments you might not want to see in a fragmentation. This demonstrates the strength of the RECAP rules.
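The bookkeeping in the comparison above (fragment sets, total fragments, distinct fragments) can be written as a small Python helper. This is an illustrative sketch. The fragment labels are hypothetical placeholders for canonical fragment structures, not the actual fragments from fragments5.sdf or fragments6.sdf.

```python
def pair_statistics(fragment_sets):
    """Return (number of fragment sets, total fragments, distinct fragments)."""
    total = sum(len(fragment_set) for fragment_set in fragment_sets)
    distinct = len({frag for fragment_set in fragment_sets for frag in fragment_set})
    return len(fragment_sets), total, distinct

# Hypothetical 2-fragment sets (-x 2): one fragment set per cleaved bond.
# "A" occurs in two sets, so it is counted only once among the distinct fragments.
print(pair_statistics([("A", "B"), ("A", "C"), ("D", "E")]))  # (3, 6, 5)
```

Ten 2-fragment sets containing 2 repeated fragments would give (10, 20, 18), the counts quoted above for the run without RECAP rules.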
The use and meaning of command-line options in the above commands:
| Option | Description | Default |
|---|---|---|
| -c | configuration file | - |
| -x | max number of fragments in a fragment set | unlimited |
| -y | max number of fragment sets in a molecule | unlimited |
| -e | include fragment sets corresponding to extendable cleavage bond sets | accept only unextendable cleavage bond sets for creating a fragment set |
| -f | output file format | cxsmiles |
| -o | output file path | standard output (console) |
Now it is your turn:
Try changing the rules in the <RECAP> section of the FragmenterRecap.xml configuration XML (refer to The RECAP Parameters section in the Fragmenter Manual).
Try changing the parameters in the <Fragmentation> subsection. Also try deleting the MaxFragmentCount and / or MaxSetCount attributes to see the default (unlimited) behavior.
FragmentStatistics can be used to filter duplicates and to sort fragments created by Fragmenter. It can also categorize and sort fragments by chemical activity, based on activity data given in a specific SDF field of the input molecules.
We use a 50-molecule sample input stored in beta2_adrenoceptor_antagonists.sdf.
Activity values are given in the ACTIVITY SDF field:
[Figure: sample input molecules with their ACTIVITY values]
We apply the FragmenterAll.xml Fragmenter configuration to create all fragments:
fragment -c FragmenterAll.xml beta2_adrenoceptor_antagonists.sdf -s ACTIVITY -o fragments.cxsmiles
Note that fragments must be created in the default cxsmiles format if you want to compute fragment statistics. The SDF field containing the activity data is specified in the -s parameter. This parameter is optional; it is needed only for activity-based fragment sorting.
Fragments are stored in fragments.cxsmiles; some of the 897 generated fragments are shown below:
[Figure: sample fragments from fragments.cxsmiles]
Note that field_1 contains the activity data of the corresponding input molecule.
fragstat fragments.cxsmiles -o sorted.cxsmiles
After duplicate filtering, 494 fragments remain. They are sorted by the default scoring function, which is the product of the atom count and the fragment occurrence:
[Figure: top of the sorted fragment list in sorted.cxsmiles]
Data fields:
field_0: atom count
field_1: fragment occurrence
field_2: score (atom count * fragment occurrence)
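The default ranking can be sketched in Python as follows. This is a hypothetical reimplementation for illustration only, with two assumptions:

- Each generated fragment is given as a (key, atom count) pair. The key stands in for a canonical structure such as a canonical SMILES, so equal keys mark duplicates.
- "Occurrence" counts every generated copy of a fragment. fragstat may define occurrence differently.

```python
from collections import Counter

def default_scores(fragments):
    """Rank fragments by atom count * occurrence (illustrative sketch, not fragstat).

    `fragments` holds one (key, atom_count) tuple per generated fragment.
    """
    occurrence = Counter(key for key, _ in fragments)    # duplicate filtering + counting
    atoms = {key: n_atoms for key, n_atoms in fragments}  # atom count per unique fragment
    rows = [(key, atoms[key], occurrence[key], atoms[key] * occurrence[key])
            for key in occurrence]
    rows.sort(key=lambda row: row[3], reverse=True)      # highest score first
    return rows

# Hypothetical input: SMILES strings used only as fragment labels.
sample = [("c1ccccc1", 6), ("CC", 2), ("c1ccccc1", 6), ("CC", 2), ("CC", 2), ("CO", 2)]
for row in default_scores(sample):
    print(row)  # key, atom count, occurrence, score
```

After the key, the three numeric columns correspond to field_0, field_1 and field_2 above. In the sample, the benzene ring (6 atoms, 2 occurrences, score 12) ranks above CC (2 atoms, 3 occurrences, score 6).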
Next, categorize the fragments by activity with the cutoff value 1 (option -c 1).
This means that molecules with an activity value of at least 1 are considered active, while all others are inactive. By default, only fragments appearing in the active set are listed in the output (you can include all fragments by specifying the -a parameter).
fragstat fragments.cxsmiles -c 1 -o stat.cxsmiles
We have 348 active fragments sorted by the default scoring function:
the product of the atom count and the difference between fragment occurrences in the active
and the inactive sets:
[Figure: top of the sorted active fragment list in stat.cxsmiles]
Data fields:
field_0: atom count
field_1: fragment occurrence in the active set (activity >= 1)
field_2: fragment occurrence in the inactive set (activity < 1)
field_3: score (atom count * (active occurrence - inactive occurrence))
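The following Python sketch extends the previous idea to the cutoff-based score. It is a hypothetical illustration, not fragstat's implementation. Each generated fragment carries the activity value of its parent molecule, as in field_1 of the Fragmenter output. Occurrence is assumed to count generated copies. The `include_all` flag mimics the effect described for the -a parameter.

```python
from collections import Counter

def activity_scores(fragments, cutoff, include_all=False):
    """Rank fragments by atom count * (active occurrence - inactive occurrence).

    Illustrative sketch, not fragstat. `fragments` holds one
    (key, atom_count, activity) tuple per generated fragment.
    """
    active, inactive, atoms = Counter(), Counter(), {}
    for key, n_atoms, activity in fragments:
        atoms[key] = n_atoms
        if activity >= cutoff:  # at least the cutoff: active
            active[key] += 1
        else:                   # below the cutoff: inactive
            inactive[key] += 1
    # By default keep only fragments seen in the active set; include_all keeps every fragment.
    keys = atoms if include_all else active
    rows = [(k, atoms[k], active[k], inactive[k], atoms[k] * (active[k] - inactive[k]))
            for k in keys]
    rows.sort(key=lambda row: row[4], reverse=True)
    return rows

# Hypothetical fragments with parent-molecule activities.
sample = [("c1ccccc1", 6, 2.5), ("c1ccccc1", 6, 0.3), ("CC", 2, 1.0), ("CC", 2, 1.7), ("CO", 2, 0.2)]
print(activity_scores(sample, cutoff=1))
print(activity_scores(sample, cutoff=1, include_all=True))
```

In this sample, "CO" occurs only in an inactive molecule. It is therefore dropped by default and appears last, with a negative score, when all fragments are included.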
Note that a table header with field captions is included if the -d parameter is specified. In this case, however, the output is no longer in cxsmiles format and cannot be opened directly in MView.
The use and meaning of command-line options in the above commands:
| Option | Description | Default |
|---|---|---|
| -c | cutoff values | - |
| -a | output all fragments | output only actives |
| -d | include table header | cxsmiles format, no header |
| -o | output file path | standard output (console) |
Now it is your turn:
Try multiple cutoff values, for example -c "1 6" or -c "1 4.5". Note that multiple cutoff values should be enclosed in quotes.
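The quotes matter because the shell splits unquoted arguments at whitespace. Without quotes, -c 1 6 hands fragstat two separate arguments; -c "1 6" hands it a single argument containing both values. Python's `shlex.split`, which follows POSIX shell quoting rules, illustrates this. The sketch does not show how fragstat itself parses the value; the float conversion at the end is only an illustration.

```python
import shlex

quoted = shlex.split('fragstat fragments.cxsmiles -c "1 6" -o stat.cxsmiles')
unquoted = shlex.split('fragstat fragments.cxsmiles -c 1 6 -o stat.cxsmiles')

print(quoted)    # ['fragstat', 'fragments.cxsmiles', '-c', '1 6', '-o', 'stat.cxsmiles']
print(unquoted)  # ['fragstat', 'fragments.cxsmiles', '-c', '1', '6', '-o', 'stat.cxsmiles']

# The single quoted argument still carries every cutoff value.
cutoffs = [float(v) for v in quoted[quoted.index("-c") + 1].split()]
print(cutoffs)   # [1.0, 6.0]
```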
fragstat -s.