Markush Enumeration Plugin

A Markush structure describes a class of compounds using generic notation. It is used primarily in patent claims and in descriptions of combinatorial libraries. The library of a Markush structure is the complete set of specific molecules that the Markush structure describes.

The Markush enumeration plugin can generate all or part of the library of a generic Markush structure. It can also calculate the total number of specific structures in a Markush library. The plugin is accessible from the Marvin GUI (Tools->Markush Enumeration), through the cxcalc command-line program (detailed command-line usage is documented separately), via the API, and through the Chemical Terms functions in JChem.

Markush features

The Markush enumeration plugin currently supports the following features, which mostly describe Markush structures found in combinatorial libraries:

R-groups
R-groups (also referred to as "substituent variation") are the most widely known Markush generic features. The variable part of the structure is denoted by an R-atom (e.g. R1), and its definitions are given separately. Each definition must mark its connection points to show where the bonds of the R-atom attach. R-atoms can appear in both rings and chains and can have up to two attachment points. The same R-atom can appear multiple times; the occurrences are handled independently, so they can be substituted with different definitions. R-group definitions can be nested to any depth, but not recursively: a definition cannot use the R-atom it defines, not even indirectly through other embedded R-atoms. In Marvin and the plugin, R-groups numbered up to R32767 can be used.

Atom lists
Atom lists are another example of substituent variation. They define a list of atom types allowed at a given position. There is no restriction on the length of the list or on the number of bonds of the list atom.

Bond lists
The plugin supports the following bond lists (generic bond types): single or double, any (single, double or triple), single or aromatic, and double or aromatic.

Link nodes
Link nodes are atoms that may repeat between two of their designated bonds (called outer bonds, denoted by brackets). Any other substituents repeat together with the atom. In the results, the new bonds between the repeating atoms take the bond type of the lower-order outer bond.

Position variation bonds
Position variation bonds are bonds attached to variable atoms at one or both end positions. The set of variable atoms is drawn as a multicenter group. A position variation bond connects one atom from one end position to one atom from the other end position. If an end position is a single atom, the bond is attached to that atom; if it is a multicenter group, the bond is attached to an arbitrary member of the group.

Limitations:

  • Substructure search does not yet handle the case where both end positions are multicenter groups.
  • A multicenter end position is not allowed to contain R-atoms.
  • A multicenter end position is not allowed to contain another position variation bond (i.e., position variation bonds cannot be nested).

If a link node is a member of a multicenter group, two cases arise. When the original multicenter group contains no other atoms from the link fragment, the group also includes the repeated atoms. Otherwise, the position variation bond is part of the link fragment and is repeated together with the link node. Although an R-atom cannot be a member of a multicenter end position, it can be the single-atom end position of a position variation bond, in which case the bond connects to its attachment point.
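
As an illustration of how a position variation bond expands, the following minimal Python sketch lists the atom pairs the bond may connect. It is a toy model, not the plugin's implementation: the function name, the use of integer atom indices, and the example atoms are all assumptions made for this sketch.

```python
from itertools import product

def position_variants(end_a, end_b):
    """List every (atom_a, atom_b) pair that the variable bond may connect.

    Each end position is a list of atom indices: one index for a single-atom
    end position, several for a multicenter group.
    """
    return list(product(end_a, end_b))

ring_atoms = [1, 2, 3]   # hypothetical multicenter group
substituent = [7]        # hypothetical single-atom end position

variants = position_variants(ring_atoms, substituent)
for atom_a, atom_b in variants:
    print(f"bond between atom {atom_a} and atom {atom_b}")
```

The sketch does not detect symmetry-equivalent positions, so two pairs may correspond to the same molecule.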

Functionality of the plugin

The plugin provides the following functions. Examples are given using the Marvin GUI.

Sequential enumeration
Enumerates the members of the Markush library sequentially (substituting the first definition of the first variable, and so on). The results are specific structures. The plugin user interface allows enumerating all library members or a specified number of them.
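
The idea of sequential enumeration can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the plugin's algorithm: structures are plain strings with `{R1}`-style placeholders, each variable occurs once, and the last variable is assumed to vary fastest (the actual ordering used by the plugin is not specified here). The names `enumerate_library`, `scaffold`, and `definitions` are hypothetical.

```python
from itertools import islice, product

def enumerate_library(scaffold, definitions, limit=None):
    """Yield specific structures in a fixed, sequential order.

    scaffold    -- template string with placeholders such as "{R1}" (toy notation)
    definitions -- dict mapping each variable name to its list of definitions
    limit       -- maximum number of structures to yield, or None for all
    """
    names = list(definitions)
    # product() starts with the first definition of every variable and
    # varies the last variable fastest (an assumption of this sketch).
    combos = product(*(definitions[name] for name in names))
    for combo in islice(combos, limit):
        yield scaffold.format(**dict(zip(names, combo)))

scaffold = "c1ccc({R1})cc1{R2}"
definitions = {"R1": ["F", "Cl", "Br"], "R2": ["O", "N"]}

all_members = list(enumerate_library(scaffold, definitions))
first_two = list(enumerate_library(scaffold, definitions, limit=2))
```

Using a generator with `islice` means that asking for a specified number of members does not require building the whole library first.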

Random enumeration
This mode generates a random subset of the Markush library for quick sampling. It is especially helpful for huge libraries, where full enumeration is impossible. In random mode the definition for each variable part is chosen randomly, and the probability of choosing a definition is proportional to the size of the fragment library that the definition generates. This ensures that the random subset is representative of the Markush library space.
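
The effect of size-proportional weighting can be shown with a small Python sketch. It assumes a toy model in which each definition is a `(fragment, nested_variables)` pair; the data, the function names, and the string notation are hypothetical and are not taken from the plugin.

```python
import random

# Toy model: each variable maps to a list of definitions, and a definition
# is a (fragment, nested_variables) pair. All names are hypothetical.
DEFINITIONS = {
    "R1": [("C", []), ("C{R2}", ["R2"])],
    "R2": [("F", []), ("Cl", []), ("Br", [])],
}

def variable_size(name):
    """Number of specific fragments a variable can expand to."""
    return sum(definition_size(d) for d in DEFINITIONS[name])

def definition_size(definition):
    """Number of specific fragments one definition generates."""
    _, nested = definition
    size = 1
    for name in nested:
        size *= variable_size(name)
    return size

def sample(name, rng):
    """Pick a definition with probability proportional to its library size."""
    options = DEFINITIONS[name]
    weights = [definition_size(d) for d in options]
    fragment, nested = rng.choices(options, weights=weights, k=1)[0]
    return fragment.format(**{n: sample(n, rng) for n in nested})

rng = random.Random(42)
samples = [sample("R1", rng) for _ in range(2000)]
```

In this example R1 expands to four fragments (C, CF, CCl, CBr). Weighting the two R1 definitions 1:3 gives each fragment a probability of 1/4, whereas choosing between the definitions with equal probability would return C half of the time.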

Calculate library size
The size of the Markush library can be calculated with arbitrary precision. On the user interface, the exact value is displayed for numbers of up to 20 digits; above that, only the magnitude is shown (for example, 10^28). The calculated number is the size of the whole library and does not take the valence filter into account. (See below.)
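
A minimal Python sketch of the size calculation and the display rule follows. It assumes independent, non-nested variables, so the library size is the product of the definition counts, and it assumes that the magnitude is the power of ten given by the digit count minus one; the plugin may round differently. The function names are hypothetical.

```python
from math import prod

def library_size(definition_counts):
    """Size of a library whose variables are independent and not nested."""
    # Python integers have arbitrary precision, so the product is exact.
    return prod(definition_counts)

def format_size(size, max_digits=20):
    """Show the exact value up to max_digits digits, else only the magnitude."""
    digits = len(str(size))
    if digits <= max_digits:
        return str(size)
    return f"10^{digits - 1}"  # assumed rounding rule for the magnitude

small = library_size([3, 2])       # two variables with 3 and 2 definitions
huge = library_size([12] * 30)     # thirty variables with 12 definitions each

print(format_size(small))
print(format_size(huge))
```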

Selected part enumeration
If part of the Markush structure is selected, only the generic features in the selected part are considered for enumeration or calculation. This allows focusing on a particular area of the Markush structure. Because generic features outside the selection are left unexpanded, the results may themselves be (more specific) Markush structures.

Valence filter
If the Markush structure is formulated incorrectly or too generally, it may describe structures with valence errors. The valence filter removes such structures from the results. The filter is on by default.
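
The role of the filter can be illustrated with a deliberately simplified Python sketch. The valence table, the representation of a structure as `(element, total_bond_order)` pairs, and the function names are all assumptions of this sketch; a real valence check also has to account for charges, radicals, and elements with several allowed valences.

```python
# Simplified maximum valences for neutral atoms (an assumption of this sketch).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}

def has_valence_error(atoms):
    """atoms is a list of (element, total_bond_order) pairs."""
    return any(bonds > MAX_VALENCE[element] for element, bonds in atoms)

def valence_filter(structures, enabled=True):
    """Drop structures with valence errors unless the filter is switched off."""
    if not enabled:
        return list(structures)
    return [s for s in structures if not has_valence_error(s)]

valid = [("C", 4), ("O", 2)]      # every atom within its maximum valence
invalid = [("C", 5), ("O", 2)]    # pentavalent carbon

kept = valence_filter([valid, invalid])
unfiltered = valence_filter([valid, invalid], enabled=False)
```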

Scaffold alignment and coloring
Coloring the scaffold (the part of the structure containing no Markush features) and/or the R-groups in enumerated structures helps to recognize the parts of the molecules visually. Aligning all structures to the original scaffold makes the differences between them easier to see. These options are available in sequential and random enumeration.