Special search types

R-group query structures

Atom lists have proved to be useful tools for creating query structures with variable atoms. Through R-groups, JChem provides similar variability for functional groups or other substructures in queries against molecule or reaction targets (tables).

If you are interested in searching combinatorial Markush library targets (tables) described by R-group notation, see the section "Searching in combinatorial Markush targets/tables" below.

Root structure

An R-group query structure consists of three components: a root structure (often referred to as the scaffold), a set of R-group definitions, and R-group conditions. The root structure contains the portion of the query structure that does not vary among the structures retrieved. R-groups are attached as substituents to the root, and their sites are marked with the symbols R1, R2, R3, etc. Multiple R-groups can be attached to one root, even to a single atom of the structure. One R-group can be attached multiple times to the same root, but this does not mean that all these attachments must be filled by the same definition (see occurrence conditions below for further information).

R-group definitions

R-group definitions are variable lists of ligands connected to specific positions of the root structure by their attachment points.
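
As a rough illustration of how a root and its R-group definitions combine, the sketch below models a query in plain Python. It does not use the JChem API: the class `RGroupQuery`, its `expand` method and the SMILES-like text standing in for structures are all hypothetical, and R-group conditions are not modeled. It only shows that two attachments of the same R-group are filled independently.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class RGroupQuery:
    """Toy model of an R-group query (hypothetical, not a JChem class)."""
    root: str          # scaffold text with one "{}" slot per R-group attachment
    sites: list        # R-group label of each slot, in slot order
    definitions: dict  # R-group label -> list of alternative ligands

    def expand(self):
        """Yield each specific structure obtained by filling every slot."""
        choices = [self.definitions[label] for label in self.sites]
        for combo in product(*choices):
            yield self.root.format(*combo)


# R1 is attached twice to a benzene root; each attachment is filled separately.
query = RGroupQuery(
    root="c1cc({})ccc1{}",
    sites=["R1", "R1"],
    definitions={"R1": ["F", "Cl"]},
)
structures = list(query.expand())
```

Here `structures` holds four entries, including the mixed case `c1cc(F)ccc1Cl`, because the two R1 sites need not take the same definition.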

Conditions

Remarks

The following restrictions apply for R-group queries:

R-group decomposition

For a description of R-group decomposition, see this separate document.

Searching in combinatorial Markush targets/tables

Certain combinatorial libraries can be described by Markush structures, which use generic features to express the variable parts of a structure. The library of a Markush structure is the total set of specific molecules that are described by the Markush structure.

JChem allows searching in combinatorial libraries described as Markush structures, without the need to explicitly enumerate all molecules of the Markush library. The searching can handle the same generic features as the Markush Enumeration Plugin.
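
To give a feeling for why avoiding explicit enumeration matters, the following sketch estimates the size of a library from the number of alternatives at each variable position. The function `library_size` and the example counts are hypothetical and are not part of JChem. The product is only an upper bound, since different combinations can yield the same molecule.

```python
from math import prod


def library_size(alternatives_per_position):
    """Upper bound on the number of enumerated molecules.

    Each entry is the number of alternatives at one independent variable
    position (R-group definitions, atom list members, and so on).
    """
    return prod(alternatives_per_position)


# Three R-groups with 20 definitions each, one atom list of 5 atoms,
# and one bond list with 3 bond types (all counts are made up).
positions = [20, 20, 20, 5, 3]
size = library_size(positions)
```

Even this small example describes up to 120000 combinations, so adding a few more variable positions quickly makes full enumeration impractical.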

Generic Markush features in combinatorial Markush targets

Currently, JChem supports the following generic features that describe Markush structures in combinatorial libraries:

  1. R-groups

    R-groups (also referred to as "substituent variation") are the most widely known Markush generic features. The variable part of the structure is denoted by an R-atom (e.g. R1), and the definitions are given separately. In each definition, the connection points must be defined to show where the bonds of the R-atom are linked. R-atoms can appear in both rings and chains and can have up to two attachment points. The same R-atom can appear multiple times, and the different occurrences are handled as different cases, so they can be substituted with different definitions. R-group nesting in R-group definitions is allowed to any depth, but without recursion: an R-group definition cannot use the R-atom it is defining, not even indirectly through other embedded R-atoms. R-groups up to number R32767 can be used.

    [Figures omitted: example Markush structure; example Markush library member]

  2. Atom lists

    Atom lists are another example of substituent variation. They define lists of atom types at a given position. There is no restriction on the length of the list or on the bond count of atom lists.

    [Figures omitted: example Markush structure; example Markush library member]

  3. Bond lists

    The following bond lists (generic bond types) are supported: single or double, any (single, double or triple), single or aromatic, double or aromatic. The any bond can also implicitly match aromatic bonds when it is part of a potentially aromatic system. See: Markush aromatization.

    [Figures omitted: example Markush structure; example Markush library member]

  4. Link nodes

    Link nodes are atoms that may repeat between two of their designated bonds (called outer bonds, denoted by brackets). All other substituents (if any) repeat together with the atom. In the results, the new bonds between the repeating atoms have the bond type of the lower-order outer bond.

    [Figures omitted: example Markush structure; example Markush library member]

  5. Position variation bonds

    Position variation bonds are bonds attached to variable atoms at one or both end positions. The set of variable atoms is drawn as a multicenter group. A position variation bond connects one atom from one end position to one atom from the other end position. If the end position is a single atom, the bond is attached to this atom; if the end position is a multicenter group, the bond is attached to an arbitrary member of the group.

    Limitations:

    • Substructure search is not yet prepared to handle the case when both end positions are multicenter groups.
    • A multicenter end position is not allowed to contain R-atoms.
    • A multicenter end position is not allowed to contain another position variation bond (i.e., position variation bonds cannot be nested).

    If a link node is a member of a multicenter group, two cases are distinguished. When the original multicenter group contains no other atoms from the link fragment, the group also includes the repeated atoms. Otherwise, the position variation bond is part of the link fragment and is repeated together with the link node (see example structures). If the position variation bond is part of the link fragment, the multicenter group can have atoms only within the link fragment and the link node atom.

    Although an R-atom is not allowed to take part in position variation, it can be the single-atom end position of a position variation bond, in which case its attachment point is connected to the bond.

    [Figures omitted: example Markush structure; example Markush library member]
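
Two of the rules above lend themselves to a small sketch: the ban on recursive R-group nesting and the repetition of a link node together with its substituents. The code below is a toy model in plain Python, not JChem code. The function names, the dictionary describing which R-atoms each definition uses, the SMILES-like strings and the explicit repetition range are all assumptions made for illustration.

```python
def recursive_rgroups(uses):
    """Return the R-group labels whose definitions reach themselves.

    `uses` maps each R-group label to the set of R-group labels that
    appear inside its definitions (direct nesting only).
    """
    def reaches_itself(start):
        stack, seen = list(uses.get(start, ())), set()
        while stack:
            label = stack.pop()
            if label == start:
                return True
            if label not in seen:
                seen.add(label)
                stack.extend(uses.get(label, ()))
        return False

    return sorted(label for label in uses if reaches_itself(label))


def expand_link_node(prefix, repeating_unit, suffix, min_repeat, max_repeat):
    """Return one chain per repetition count of the link node.

    `repeating_unit` is the link node atom with its non-outer substituents,
    so the substituents repeat together with the atom.
    """
    if not 1 <= min_repeat <= max_repeat:
        raise ValueError("expected 1 <= min_repeat <= max_repeat")
    return [prefix + repeating_unit * n + suffix
            for n in range(min_repeat, max_repeat + 1)]


# Nesting to any depth is fine as long as no label reaches itself.
valid = recursive_rgroups({"R1": {"R2"}, "R2": {"R3"}, "R3": set()})
invalid = recursive_rgroups({"R1": {"R2"}, "R2": {"R1"}, "R3": {"R1"}})

# A methyl-substituted carbon repeated one to three times between C and O.
chains = expand_link_node("C", "C(C)", "O", 1, 3)
```

In the second dictionary R1 and R2 define each other and are reported, while R3 merely uses R1 and is not. The chains come out as `CC(C)O`, `CC(C)C(C)O` and `CC(C)C(C)C(C)O`.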

Querying combinatorial Markush targets

All structural search types (PERFECT, SUBSTRUCTURE, SUPERSTRUCTURE, EXACT and EXACT_FRAGMENT) are allowed for Markush targets/tables. Similarity search is not allowed for Markush targets. The following query features are supported in the query when searching Markush targets:

Examples

Table 2. Simple substructure search examples (The bond denoted by dots is an Any bond: single or double or triple.)

  [Structure images omitted; columns: target, substructure query]

Table 3. Simple exact structure search examples (The bond denoted by dots is an Any bond: single or double or triple.)

  [Structure images omitted; columns: target, exact structure query]

Table 4. Exact structure search examples where explicit hydrogens are not considered.

  [Structure images omitted; columns: target, exact structure query]

Markush structure reduction to a hit

When a query matches a Markush structure, there are different ways of displaying the hit. One possibility is to color the matching parts of the original Markush structure, but the highlighting may then be spread across different fragments (R-group definitions) when the query overlaps variable parts. Markush structure reduction is a technique in which the variable parts overlapping the hit are expanded (substituted with the appropriate specific definition). This way the hit highlighting is always visible as a whole and as part of the scaffold. (Note that the structure resulting from Markush structure reduction may still contain generic features.)
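
The idea of reduction can be sketched with a toy model in which a Markush structure is just a mapping from variable positions to their alternatives. This is not how JChem represents structures, and `reduce_to_hit` is a hypothetical name. The sketch fixes only the positions that the hit overlaps and leaves the rest generic.

```python
def reduce_to_hit(markush, hit_choices):
    """Expand only the variable positions that overlap the hit.

    markush:     position label -> list of alternative definitions
    hit_choices: position label -> definition matched by the query,
                 given only for positions the hit overlaps
    """
    reduced = {}
    for position, alternatives in markush.items():
        if position in hit_choices:
            chosen = hit_choices[position]
            if chosen not in alternatives:
                raise ValueError(f"{chosen!r} is not a definition of {position}")
            reduced[position] = [chosen]          # now a specific part
        else:
            reduced[position] = list(alternatives)  # still generic
    return reduced


markush = {"R1": ["F", "Cl", "Br"], "R2": ["C", "CC"]}
reduced = reduce_to_hit(markush, {"R1": "Cl"})
```

After the call, R1 is fixed to `Cl` while R2 keeps both of its definitions, which mirrors the note that a reduced structure may still contain generic features.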

Markush structure reduction examples

Table 5. Structure reduction

  [Structure images omitted; columns: target, hit coloring in original Markush structure, Markush structure reduction to the hit, substructure query]

Markush aromatization

With the introduction of generic notation in target structures, it is possible to formulate ring systems with ambiguous aromaticity status: some enumerations of the ring are aromatic, and others are not. See a simple example below.

Therefore, for Markush targets, standardization and searching cannot be separated entirely in the way described in the section Standardization. Instead, aromaticity is handled in a more complex way that ensures that no match is lost. (However, there may be false positives when the query does not match a full ring. See the examples below.)

Standardization for Markush targets (tables) consists solely of a special aromatization method, Markush aromatization. It divides the rings of a Markush structure with generic features into three categories: aromatic, nonaromatic and ambiguous.

(Rings that are too complex to be decided stay in the ambiguous category. Currently, the default complexity limit is 100 enumerations of the generic features causing ambiguity.)

Searching treats aromatic and nonaromatic rings the same way as for specific structures. Ambiguous rings, however, are processed in two steps:

  1. In the first phase, ambiguous rings are allowed to match both aromatic and nonaromatic query parts. The hits are then checked in a second phase.
  2. In the second phase, a hit expansion is performed. This usually results in rings of lower complexity in place of the rings with ambiguous aromaticity. The expanded structure is then subjected to another Markush aromatization, and aromaticity is checked on the expanded, aromatized structure. If a ring is still ambiguous in the second phase (or too complex to decide), the match is accepted so that no hit is lost.
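
The two phases can be outlined with a toy model in which a ring is represented only by the aromaticity of each of its enumerations. This is a sketch of the decision logic described above, not JChem's implementation. The function names, the boolean representation and the assumption that a ring counts as too complex when it has more than 100 enumerations are all illustrative.

```python
AROMATIC, NONAROMATIC, AMBIGUOUS = "aromatic", "nonaromatic", "ambiguous"
COMPLEXITY_LIMIT = 100  # default limit quoted in the text


def classify(enumeration_is_aromatic, limit=COMPLEXITY_LIMIT):
    """Categorize a ring from the aromaticity of each of its enumerations."""
    flags = list(enumeration_is_aromatic)
    if not flags:
        raise ValueError("a ring needs at least one enumeration")
    if len(flags) > limit:      # assumed meaning of "too complex to decide"
        return AMBIGUOUS
    if all(flags):
        return AROMATIC
    if not any(flags):
        return NONAROMATIC
    return AMBIGUOUS


def category_matches(category, query_is_aromatic):
    """Ambiguous rings match anything; the others must agree with the query."""
    if category == AMBIGUOUS:
        return True
    return (category == AROMATIC) == query_is_aromatic


def two_phase_match(ring_flags, expanded_flags, query_is_aromatic):
    """Phase 1 on the original ring, phase 2 on the ring after hit expansion."""
    category = classify(ring_flags)
    if category != AMBIGUOUS:
        return category_matches(category, query_is_aromatic)
    # Phase 2: re-aromatize the expanded ring; still ambiguous means accept.
    return category_matches(classify(expanded_flags), query_is_aromatic)


# Ring that is aromatic in one enumeration and nonaromatic in the other.
kept = two_phase_match([True, False], [True], query_is_aromatic=True)
dropped = two_phase_match([True, False], [False], query_is_aromatic=True)
undecided = two_phase_match([True, False], [True, False], query_is_aromatic=True)
```

With an aromatic query, the hit is kept when expansion leaves only the aromatic enumeration, dropped when it leaves only the nonaromatic one, and kept when the expanded ring is still ambiguous.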

Examples

Table 6. Aromaticity in Markush targets

  [Structure images omitted; columns: target, substructure query]
