Markush Enumeration
New to IJC 2.3 was the ability to handle databases of Markush structures.
Associated with this is the ability to enumerate a Markush structure, and
to restrict the enumerated structures to those that match the current query.
Markush structures are commonly used to describe combinatorial libraries or
patent claims. It is assumed that the reader has basic knowledge of Markush
structures.
Background
The underlying Marvin and JChem tools provide support for handling Markush structures
in the following ways:
- Marvin allows drawing and display of structures with Markush features,
- Marvin allows enumeration of a Markush structure to generate some or all of
the discrete structures described by the Markush definition. This requires a
Markush Enumeration license.
- JChem allows a table of Markush structures to be created and searches
run against that table. This allows you, for instance, to perform a
substructure search against a database of Markush structures to find
all Markush definitions that include the query structure as a substructure.
This is very useful, for instance, when searching patent databases.
Markush Search and Markush Enumeration licenses are required.
- JChem allows Markush Enumeration to be performed within the context of a
query structure, so that the structures that are enumerated are restricted
to those that match the query structure.
Markush Search and Markush Enumeration licenses are required.
Instant JChem allows you to perform all of these operations.
1. Drawing Markush structures in Marvin Sketch
Please consult the
Marvin Sketch documentation.
2. Markush Enumeration in Marvin
Marvin Sketch provides a Markush Enumeration plugin. This is a feature
of Marvin, not Instant JChem, but can be used in Instant JChem whenever
Marvin Sketch is used. For more details please consult the
Marvin Markush Enumeration plugin documentation.
3. Creating and searching Markush tables
Creating tables is described in the Editing Entities help page.
4. Enumerating Markush structures
Instant JChem has special support for enumerating Markush structures.
To do this you first need to create a JChem table containing the Markush
structures. See section 3 for details.
Opening the Markush Enumeration dialog
Once you have a Markush structure table you can view the contents using the
standard form or grid view. You can also run structure searches (most typically
substructure searches) against this table to find only those Markush structures
of interest. When you are viewing the contents of the Markush table in the grid
or form view you can choose to enumerate any particular structure. Select the structure
you want to enumerate and click on the 'Enumerate a Markush Structure' icon
in the toolbar. The Markush Enumeration dialog will open.
Enumeration modes
The Markush Enumeration dialog operates in 3 different modes:
- Full enumeration
- This performs exhaustive enumeration of the Markush structure.
Markush libraries can potentially be vast in size (bigger than the
number of atoms in the universe!), so the enumeration is limited to a
maximum number of structures that you can specify. By default this is
set to 100 structures.
- Random enumeration
- This performs random enumeration of the Markush structure. This
is most useful for large Markush libraries where it is not practical to
fully enumerate the library. Random enumeration allows you to sample
the library in a random fashion so that you obtain a good representation
of the various structures in the library.
The same warnings about library size described for full enumeration
also apply to random enumeration.
- Markush reduction according to the hit
- This option is only active when you have run a substructure search
on the Markush table and when you have a Markush Search license.
In this mode the enumerated structures are limited to those that contain
the substructure.
Whilst this usually significantly reduces the number of enumerated
structures, the limits on the enumerated library size still apply.
You can see the part of the enumerated structure that corresponds
to the query substructure using the typical hit display options.
Note that multiple enumerated paths matching the substructure may
result in the same enumerated structure, so the results may contain
duplicates.
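The difference between capped full enumeration and random enumeration can be illustrated with a toy model. The sketch below is not Instant JChem or Marvin code. It assumes a Markush structure reduced to lists of alternative R-group labels, and every name in it (`R_GROUPS`, `library_size`, `decode`, `full_enumeration`, `random_enumeration`) is hypothetical. The random mode here samples without repetition, which is a choice made for this sketch and not a statement about how the real tool samples.

```python
import itertools
import math
import random

# Hypothetical toy Markush definition: each R-group has a list of alternatives.
R_GROUPS = {
    "R1": ["H", "CH3", "Cl"],
    "R2": ["OH", "NH2"],
    "R3": ["F", "Br", "I", "CN"],
}


def library_size(r_groups):
    """Number of combinations: the product of the alternatives per R-group."""
    return math.prod(len(options) for options in r_groups.values())


def decode(index, r_groups):
    """Turn a combination index into one concrete choice per R-group."""
    choice = []
    for options in reversed(list(r_groups.values())):
        index, pos = divmod(index, len(options))
        choice.append(options[pos])
    return tuple(reversed(choice))


def full_enumeration(r_groups, max_structures=100):
    """Walk the combinations in order, stopping at max_structures."""
    combos = itertools.product(*r_groups.values())
    return list(itertools.islice(combos, max_structures))


def random_enumeration(r_groups, max_structures=100, seed=None):
    """Sample distinct combinations spread across the whole library."""
    rng = random.Random(seed)
    size = library_size(r_groups)
    wanted = min(max_structures, size)
    picked = set()
    while len(picked) < wanted:
        picked.add(rng.randrange(size))
    return [decode(i, r_groups) for i in sorted(picked)]
```

With a cap of 5, `full_enumeration(R_GROUPS, 5)` returns only the first five combinations in order, so all of them share the first R1 alternative, whereas `random_enumeration(R_GROUPS, 5)` draws five combinations from anywhere in the library. This is why random sampling tends to give a better picture of a library that is too large to enumerate fully.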
Other options
- Library size
- The full enumerated size of the library is displayed. This helps you
decide whether to use full or partial enumeration, and whether to
adjust the limit on the maximum library size. Note: the actual number
of enumerated structures may be less than the calculated full enumerated
library size. This is because the actual enumeration includes a valence filter
that excludes incorrect structures. For instance, this can happen when
using query bond features, e.g. an ANY bond attached to a benzene ring
will give a predicted library size of 3, but when the actual enumeration
is performed only a single structure will be generated, as the double and
triple bond variants would result in valence errors.
- Max Structures
- This limits the number of enumerated structures that are generated.
- Output to file
- This lets you output the enumerated structures to a file rather than
seeing them on screen. This is useful when you are enumerating a large
library.
- Rows and Columns
- This lets you control the grid size when displaying the enumerated
structures.
- Show R-groups
- This lets you show or hide any R-group definitions that may still be
present in the enumerated structure. This applies when using Markush reduction
according to the hit, where the query structure may not match a particular
R-group, so its definition is still present in the enumerated structure
(it is only partially enumerated).
- Colouring
- When performing Markush reduction according to the hit this option
lets you turn on or off the highlighting of the query substructure.
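The gap between the predicted library size and the number of structures actually generated, as described under Library size, can be reproduced with a very small model. The sketch below is only an illustration under stated assumptions: it treats an aromatic ring carbon as already using 3 of its 4 valence units, expands an ANY bond to single, double and triple bond orders, and applies a simple valence check. It is not Marvin's valence filter, and all names (`ANY_BOND_ORDERS`, `predicted_size`, `valence_filtered`) are hypothetical.

```python
# Toy valence model (an assumption for illustration, not Marvin's valence check).
MAX_CARBON_VALENCE = 4
RING_CARBON_USED = 3  # valence units an aromatic ring carbon already uses

# An ANY bond is expanded here to single, double and triple bond orders.
ANY_BOND_ORDERS = {"single": 1, "double": 2, "triple": 3}


def predicted_size(bond_orders):
    """Size counted before any chemistry check: one structure per variant."""
    return len(bond_orders)


def valence_filtered(bond_orders, used=RING_CARBON_USED, limit=MAX_CARBON_VALENCE):
    """Keep only the variants that do not exceed the carbon valence limit."""
    return [name for name, order in bond_orders.items() if used + order <= limit]


print(predicted_size(ANY_BOND_ORDERS))    # 3
print(valence_filtered(ANY_BOND_ORDERS))  # ['single']
```

The predicted size counts every variant, while the filter keeps only the single bond variant, matching the benzene example above.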
Performing enumeration
Once the appropriate options have been set the enumeration can be started
by pressing the 'Enumerate' button. Once running, this button changes to 'Cancel',
allowing the enumeration to be halted at its current position. Results are
displayed as the structures are generated. If enumerating to file, sample
enumerated structures are displayed as the enumeration proceeds.
Notes
Enumerated libraries can be very large. Enumeration can be slow and use
lots of memory. If you want to enumerate large libraries, consider:
- Increasing the amount of memory available to Instant JChem. See the
memory usage documentation
for details.
- Outputting the results to file rather than displaying them.
Copyright © 1998-2008
ChemAxon Ltd.
All rights reserved.