Reaction fingerprint

Version 5.1.2


Introduction, concepts

Comparing chemical reactions with computational tools is as basic a need as comparing chemical structures. Techniques developed for estimating the chemical (or other) similarity of molecules can be adapted to estimating the similarity of chemical reactions.

In order to define reaction similarity some basic concepts are introduced.
Simply speaking, chemical reactions transform one or more reactants to one or more products. Traditionally, reactants are drawn on the left side of the reaction arrow, while products are placed to the right of it. Thus reactants are often referred to as the left side of the reaction (and products as the right side of the reaction).
One possible approach to characterizing the transformation carried out (and thus the reaction itself) is to identify the changing atoms and the changing bonds in the reaction with respect to the reactant and product structures.
An atom is changing if

  1. one or more of its bonds are changed (i.e. the bond is different on the left side than on the right side); or
  2. it is present only on one side of the reaction and it has a non-changing atom neighbour.

A bond is changing if it is present only on one side of the reaction.

    Changing atoms and changing bonds define the reacting centre of the reaction. The reacting centre is specific to a particular type of reaction.
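As an illustration, the definitions above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation described in this document: it assumes a fully atom-mapped reaction in which every atom appears on both sides, represents each side as a hypothetical dictionary from atom-map pairs to bond orders, and treats a bond whose order differs between the two sides as changing. Atoms present on only one side (rule 2) are not handled. The function name `reacting_centre` is made up for this example.

```python
def reacting_centre(reactant_bonds, product_bonds):
    """Return (changing_atoms, changing_bonds) for a fully mapped reaction.

    Each argument is a dict mapping frozenset({map_a, map_b}) -> bond order.
    This data layout is an assumption made for illustration only.
    """
    changing_bonds = set()
    for pair in set(reactant_bonds) | set(product_bonds):
        # A bond that is missing on one side, or whose order differs,
        # is treated as changing (assumption, see the text above).
        if reactant_bonds.get(pair) != product_bonds.get(pair):
            changing_bonds.add(pair)

    # Rule 1 only: an atom is changing if one of its bonds is changing.
    changing_atoms = set()
    for pair in changing_bonds:
        changing_atoms |= pair
    return changing_atoms, changing_bonds


# Substitution on ethyl bromide: C(4)-C(1)-Br(2) + O(3)  ->  C(4)-C(1)-O(3) + Br(2)
reactants = {frozenset({4, 1}): 1, frozenset({1, 2}): 1}
products = {frozenset({4, 1}): 1, frozenset({1, 3}): 1}

atoms, bonds = reacting_centre(reactants, products)
print(sorted(atoms))
```

In this example the C-Br bond is broken and the C-O bond is formed, so atoms 1, 2 and 3 form the reacting centre, while atom 4 and the unchanged C-C bond stay outside it.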

    Atoms and bonds in the reacting centre are colored blue. (The numbers near reacting center atoms are the atom maps that associate the reactant atoms with the product atoms.)

    Reaction similarity

    Structural properties that are present in the reaction offer a natural approach to introduce the concept of reaction similarity. Two reactions can be considered similar if their product side and/or reactant side are similar. With this consideration, reaction similarity is reduced to molecular or structural similarity.
    Nevertheless, another type of reaction similarity can be introduced by focusing on the reacting centre of the reaction. This transformational similarity is less influenced by the particular reactant and product present in a reaction but it is dominated by the reaction mechanism. Both of these types of reaction similarity are found to be useful in comparing and matching reactions.

    The degree of structural similarity is determined by the structural similarity of the reactants (products) present in the two reactions compared. The degree of transformational similarity can be examined at three distinct levels: strict, medium and coarse. Strict scale similarity compares the reactions using the broadest topological environment of the reacting centres. In contrast, coarse similarity restricts the comparison to the bare reacting centres, completely ignoring their neighborhoods. The topological distances used in the present implementation of transformational similarity are 2, 1 and 0 for strict, medium and coarse similarity, respectively. These topological distances are interpreted as bond distances from atoms in the reacting centre.

    Coarse similarity: only the changing atoms and bonds (i.e. the reacting center) are taken into account.

    Medium scale similarity: atoms next to any atom in the reacting center, along with their bonds, are also taken into account. These atoms and bonds introduce further constraints and make the similarity comparison stricter.

    Strict scale similarity: atoms in the 2 bond neighbourhood of the reacting center atoms, as well as the corresponding bonds, are taken into account.
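The three levels differ only in how far the environment extends from the reacting centre, so they can be illustrated with a breadth-first search limited to 0, 1 or 2 bonds. The sketch below is hypothetical: the adjacency-dictionary representation, the `LEVEL_DISTANCE` table and the `environment` function are names chosen for this example, and only atoms (not bonds) are collected.

```python
from collections import deque

# Bond distances taken from the text: coarse = 0, medium = 1, strict = 2.
LEVEL_DISTANCE = {"coarse": 0, "medium": 1, "strict": 2}


def environment(adjacency, centre_atoms, level):
    """Return the atoms within the level's bond distance of the reacting centre.

    adjacency maps an atom id to an iterable of neighbouring atom ids
    (an assumed representation for this sketch).
    """
    max_dist = LEVEL_DISTANCE[level]
    dist = {atom: 0 for atom in centre_atoms}
    queue = deque(centre_atoms)
    while queue:
        atom = queue.popleft()
        if dist[atom] == max_dist:
            continue  # do not expand beyond the allowed distance
        for neighbour in adjacency.get(atom, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return set(dist)


# Linear chain 1-2-3-4-5-6 with a single reacting-centre atom, 3.
chain = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
coarse = environment(chain, {3}, "coarse")
medium = environment(chain, {3}, "medium")
strict = environment(chain, {3}, "strict")
```

Each level yields a superset of the previous one, which is why the stricter levels impose more constraints on a match.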

    Reaction fingerprint

    The success of the various fingerprints used in similarity calculations on chemical structures encourages the introduction of an analogous reaction fingerprint. Such a fingerprint clearly supports the structural similarity assessment of reactions: the topological chemical fingerprints of the reactants (products) can be compared directly. However, transformational similarity should also be addressed, at three different scales of coarseness.
    Considering that transformational similarity was defined by topological properties of the reacting centres, topological fingerprints appear to be a viable choice for transformational similarity too. The reacting centre is a substructure of the reactant (and the product), thus its topological fingerprint can be constructed and used to represent the reacting centre. This concept can also be adapted to the extended neighborhood of the reacting center, simply because the broader neighborhood of the reacting centre is just another, larger substructure.

    Based on the above considerations the structure of the reaction fingerprint is defined as follows:

    1. chemical fingerprint (CFp) of the reactant(s)
    2. CFp of the product(s)
    3. CFp of the reactant side of the reaction center
    4. CFp of the product side of the reaction center
    5. CFp of the reactant side of the reaction center including its 1 bond neighborhood
    6. CFp of the product side of the reaction center including its 1 bond neighborhood
    7. CFp of the reactant side of the reaction center including its 2 bond neighborhood
    8. CFp of the product side of the reaction center including its 2 bond neighborhood

    The total length of the reaction fingerprint (in the present implementation) is 2048 bits. The 8 segments defined above are laid out according to the schema below (segment sizes given in number of bits):

    512 512 128 128 128 128 256 256
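The segment sizes above determine where each segment starts and ends. The following sketch computes those boundaries, assuming the segments are stored consecutively in the listed order starting at bit 0; that bit ordering and the segment labels are assumptions made for this example, not names taken from the product.

```python
from itertools import accumulate

# Hypothetical labels for the 8 segments, in the order listed above.
SEGMENT_NAMES = [
    "reactants", "products",
    "coarse_reactant", "coarse_product",
    "medium_reactant", "medium_product",
    "strict_reactant", "strict_product",
]
SEGMENT_SIZES = [512, 512, 128, 128, 128, 128, 256, 256]

# Running totals give the exclusive end offset of each segment.
ends = list(accumulate(SEGMENT_SIZES))
starts = [0] + ends[:-1]
SEGMENTS = {name: (s, e) for name, s, e in zip(SEGMENT_NAMES, starts, ends)}

for name, (start, end) in SEGMENTS.items():
    print(f"{name:16s} bits {start:4d}..{end - 1:4d}")
```

The sizes add up to 2048 bits: the two structural segments take the first half, and the six reacting-centre segments share the second half.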

    This reaction fingerprint enables both types of reaction similarity calculation, and at the expense of some extra storage space it makes the transformational similarity calculation efficient at all three predefined levels of coarseness.

    Reaction fingerprint metrics

    Two types of reaction similarity calculation have been introduced: structural and transformational. Structural similarity distinguishes the reactant and the product sides, while transformational similarity is defined at three levels of coarseness. With these considerations, five metrics are needed to efficiently estimate the five different categories of reaction similarity. These metrics are as follows:

    1. ReactantTanimoto
    2. ProductTanimoto
    3. CoarseReactionTanimoto
    4. MediumReactionTanimoto
    5. StrictReactionTanimoto

    All of these metrics are based on the Tanimoto metric; consequently, the degree of similarity is expressed on a scale from 0 to 1.
    ReactantTanimoto considers only the first quarter of the reaction fingerprint, which represents the reactants in the reaction, and ignores the rest of the reaction fingerprint. Therefore it estimates the structural similarity of the reactants only.
    ProductTanimoto takes the second quarter of the fingerprint, which is associated with the products.
    StrictReactionTanimoto takes the last two segments of the reaction fingerprint, which represent the reacting centre of both the reactant and the product side of the reaction with the broadest neighborhood, and ignores the first 3/4 of the reaction fingerprint.
    Similarly, MediumReactionTanimoto applies the Tanimoto metric to the 5th and 6th segments, while CoarseReactionTanimoto takes the 3rd and the 4th segments, which encode the reacting centre of the reactant and the product side, respectively.
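To make the five metrics concrete, the sketch below applies the Tanimoto coefficient, |A ∩ B| / |A ∪ B| for two sets of on-bits A and B, to the bit range each metric covers. It is an illustration under assumptions, not the product's API: fingerprints are modelled as Python sets of on-bit positions, the bit ranges follow from the segment sizes if the segments are stored consecutively from bit 0, and returning 0.0 when both sub-fingerprints are empty is a convention chosen for this example.

```python
# Bit ranges (start inclusive, end exclusive) derived from the segment sizes
# 512 512 128 128 128 128 256 256, assuming consecutive storage from bit 0.
METRIC_RANGES = {
    "ReactantTanimoto": (0, 512),
    "ProductTanimoto": (512, 1024),
    "CoarseReactionTanimoto": (1024, 1280),   # 3rd and 4th segments
    "MediumReactionTanimoto": (1280, 1536),   # 5th and 6th segments
    "StrictReactionTanimoto": (1536, 2048),   # 7th and 8th segments
}


def tanimoto(a, b):
    """Tanimoto coefficient of two sets of on-bit positions."""
    union = len(a | b)
    # Convention for this sketch: two empty sets give 0.0.
    return len(a & b) / union if union else 0.0


def reaction_similarity(fp_a, fp_b, metric):
    """Apply Tanimoto to the part of the fingerprints that the metric covers."""
    lo, hi = METRIC_RANGES[metric]
    sub_a = {bit for bit in fp_a if lo <= bit < hi}
    sub_b = {bit for bit in fp_b if lo <= bit < hi}
    return tanimoto(sub_a, sub_b)


# Two made-up fingerprints, given as sets of on-bit positions.
fp_a = {3, 10, 600, 1030, 1300, 1600}
fp_b = {3, 11, 600, 1030, 1301, 1700}

scores = {m: reaction_similarity(fp_a, fp_b, m) for m in METRIC_RANGES}
```

With these made-up fingerprints the product and coarse scores are 1.0, the reactant score is 1/3, and the medium and strict scores are 0.0, showing that two reactions can share the same bare reacting centre while differing in its wider environment.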

     
    Copyright © 1999-2008 ChemAxon Ltd.    All rights reserved.