Markush Enumeration

Instant JChem (IJC) 2.3 introduced the ability to handle databases of Markush structures. Associated with this is the ability to enumerate a Markush structure and to restrict the enumerated structures to those that match the current query.

Markush structures are commonly used to describe combinatorial libraries or patent claims. It is assumed that the reader has basic knowledge of Markush structures.

Background

The underlying Marvin and JChem tools provide support for handling Markush structures in the following ways:

  1. Marvin allows drawing and display of structures with Markush features.
  2. Marvin allows enumeration of a Markush structure to generate some or all of the discrete structures described by the Markush definition. This requires a Markush Enumeration license.
  3. JChem allows a table of Markush structures to be created and searches to be run against that table. You can, for instance, perform a substructure search against a database of Markush structures to find all Markush definitions that include the query structure as a substructure. This is very useful when searching patent databases. Markush Search and Markush Enumeration licenses are required.
  4. JChem allows Markush Enumeration to be performed within the context of a query structure, so that the structures that are enumerated are restricted to those that match the query structure. Markush Search and Markush Enumeration licenses are required.
Instant JChem allows you to perform all of these operations.

1. Drawing Markush structures in Marvin Sketch

Please consult the Marvin Sketch documentation.

2. Markush Enumeration in Marvin

Marvin Sketch provides a Markush Enumeration plugin. This is a feature of Marvin, not Instant JChem, but can be used in Instant JChem whenever Marvin Sketch is used. For more details please consult the Marvin Markush Enumeration plugin documentation.

3. Creating and searching Markush tables

Creating tables is described in the Editing Entities help page.

4. Enumerating Markush structures

Instant JChem has special support for enumerating Markush structures. To do this you first need to create a JChem table containing the Markush structures. See section 3 for details.

Opening the Markush Enumeration dialog

Once you have a Markush structure table you can view the contents using the standard form or grid view. You can also run structure searches (most typically substructure searches) against this table to find only those Markush structures of interest. When you are viewing the contents of the Markush table in the grid or form view you can choose to enumerate any particular structure. Select the structure you want to enumerate and click on the 'Enumerate a Markush Structure' icon in the toolbar. The Markush Enumeration dialog will open.


Enumeration modes

The Markush Enumeration dialog operates in 3 different modes:

Full enumeration
This performs exhaustive enumeration of the Markush structure. Markush libraries can potentially be vast in size (bigger than the number of atoms in the universe!), so the enumeration is limited to a maximum number of structures that you can specify. By default this is set to 100 structures.
Random enumeration
This performs random enumeration of the Markush structure. This is most useful for large Markush libraries where it is not practical to fully enumerate the library. Random enumeration allows you to sample the library in a random fashion so that you obtain a good representation of the various structures in the library. The warnings about library size described for full enumeration also apply to random enumeration.
Markush reduction according to the hit
This option is only active when you have run a substructure search on the Markush table and when you have a Markush Search license. In this mode the enumerated structures are limited to those that contain the substructure. Whilst this usually reduces the number of enumerated structures significantly, the limits on the enumerated library size still apply. You can see the part of the enumerated structure that corresponds to the query substructure using the typical hit display options. Note that multiple enumerated paths matching the substructure may result in the same enumerated structure, so the results may contain duplicates.
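
The difference between full and random enumeration can be illustrated outside Instant JChem. The following is a minimal Python sketch, not the Marvin or JChem API: it assumes a toy Markush definition reduced to lists of alternative substituents per R-group, and the names R_GROUPS, full_enumeration and random_enumeration are hypothetical and chosen only for this illustration.

```python
import itertools
import random

# Hypothetical toy Markush definition: each R-group position has a list of
# alternative substituents. Illustrative data only, not an IJC file format.
R_GROUPS = {
    "R1": ["H", "CH3", "Cl"],
    "R2": ["OH", "NH2"],
    "R3": ["F", "Br", "I", "CN"],
}


def full_enumeration(r_groups, max_structures=100):
    """Walk the combinations in a fixed order, stopping at max_structures."""
    names = sorted(r_groups)
    combos = itertools.product(*(r_groups[n] for n in names))
    # islice stops early, so a huge library is never fully materialised.
    return [dict(zip(names, c)) for c in itertools.islice(combos, max_structures)]


def random_enumeration(r_groups, max_structures=100, seed=None):
    """Pick each R-group independently at random; duplicates are possible."""
    rng = random.Random(seed)
    names = sorted(r_groups)
    return [
        {n: rng.choice(r_groups[n]) for n in names}
        for _ in range(max_structures)
    ]


if __name__ == "__main__":
    print(full_enumeration(R_GROUPS, max_structures=5))
    print(random_enumeration(R_GROUPS, max_structures=5, seed=1))
```

With the limit set to 5, the capped full enumeration in this sketch returns only structures with R1 = H, because it stops before the first R-group ever changes. The random sample typically draws from all positions, which is why random enumeration tends to give a better picture of a large library under the same limit.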

Other options

Library size
The full enumerated size of the library is displayed. This helps you decide whether to use full or partial enumeration, and whether to adjust the limit on the maximum library size. Note: the actual number of enumerated structures may be less than the calculated full enumerated library size, because the actual enumeration includes a valence filter that excludes incorrect structures. This can happen, for instance, when using query bond features: an ANY bond attached to a benzene ring gives a predicted library size of 3, but the actual enumeration generates only a single structure, as the double and triple bond variants would result in valence errors.
Max Structures
This limits the number of enumerated structures that are generated.
Output to file
This lets you output the enumerated structures to a file rather than seeing them on screen. This is useful when you are enumerating a large library.
Rows and Columns
This lets you control the grid size when displaying the enumerated structures.
Show R-groups
This lets you show or hide any R-group definitions that may still be present in the enumerated structure. This applies when using Markush reduction according to the hit, where the query structure may not match a particular R-group, so its definition is still present in the enumerated structure (it is only partially enumerated).
Colouring
When performing Markush reduction according to the hit this option lets you turn on or off the highlighting of the query substructure.
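
The gap between the predicted library size and the number of structures actually generated can be shown with the benzene example from the Library size option. This is a minimal Python sketch under stated assumptions, not how Marvin computes valences: it models only a single ring carbon, assumes that carbon uses 3 of its 4 valences on ring bonds, and takes an ANY bond to stand for single, double or triple. All names in it are hypothetical.

```python
from math import prod

# Assumed toy model: a ring carbon in benzene already uses 3 of carbon's
# 4 valences on ring bonds, leaving 1 for the substituent bond.
MAX_VALENCE = 4
USED_BY_RING = 3

# An ANY query bond is taken here to stand for single, double or triple.
ANY_BOND_ORDERS = {"single": 1, "double": 2, "triple": 3}


def predicted_library_size(variation_counts):
    """Predicted size: product of the number of alternatives at each site."""
    return prod(variation_counts)


def valence_ok(bond_order):
    """Keep only bond orders that do not exceed the remaining valence."""
    return USED_BY_RING + bond_order <= MAX_VALENCE


predicted = predicted_library_size([len(ANY_BOND_ORDERS)])
actual = [name for name, order in ANY_BOND_ORDERS.items() if valence_ok(order)]

print(predicted)  # 3
print(actual)     # ['single']
```

The predicted size counts every alternative before any chemistry is checked, so it is an upper bound on what the enumeration will produce.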

Performing enumeration

Once the appropriate options have been set, the enumeration can be started by pressing the 'Enumerate' button. Once running, this button changes to 'Cancel', allowing the enumeration to be halted at its current position. Results are displayed as the structures are generated. If enumerating to file, sample enumerated structures are displayed as the enumeration proceeds.

Notes

Enumerated libraries can be very large. Enumeration can be slow and use lots of memory. If you want to enumerate large libraries then consider:

  1. Increasing the amount of memory available to Instant JChem. See the memory usage documentation for details.
  2. Outputting the results to file rather than displaying them.
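
Why writing to file helps with memory can be sketched in general terms. The Python example below is an assumption-laden illustration, not Instant JChem's implementation: it uses a hypothetical toy library of substituent lists and invented function names, and it writes tab-separated substituent labels rather than real structure records. The point is only that a generator hands over one result at a time, so nothing accumulates in memory.

```python
import itertools
import os
import tempfile


def enumerate_combinations(r_groups):
    """Lazily yield one combination at a time (a generator, not a list)."""
    names = sorted(r_groups)
    for combo in itertools.product(*(r_groups[n] for n in names)):
        yield "\t".join(combo)


def write_enumeration(r_groups, path, max_structures):
    """Stream results to a file; only one record is held in memory at a time."""
    count = 0
    limited = itertools.islice(enumerate_combinations(r_groups), max_structures)
    with open(path, "w", encoding="utf-8") as out:
        for record in limited:
            out.write(record + "\n")
            count += 1
    return count


if __name__ == "__main__":
    demo_groups = {"R1": ["H", "CH3"], "R2": ["OH", "NH2", "F"]}
    demo_path = os.path.join(tempfile.mkdtemp(), "enumerated.txt")
    print(write_enumeration(demo_groups, demo_path, max_structures=4))
```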




Copyright © 1998-2008 ChemAxon Ltd. All rights reserved.