chemaxon.descriptors
Class ECFPFeatureLookup

java.lang.Object
  extended by chemaxon.descriptors.ECFPFeatureLookup

public class ECFPFeatureLookup
extends java.lang.Object

Class for retrieving the substructural features of ECFP fingerprints. ECFPs are represented either as lists of integer identifiers or as fixed-length bit strings, in which the identifiers and bit positions account for particular substructural features of the input molecule. This class provides a lookup service for both kinds of ECFP representations.

A related class, ECFPFeature, represents the substructural features of ECFP fingerprints. More precisely, each ECFPFeature instance captures a circular atom neighborhood of the input molecule by recording a central atom and a diameter. The ECFP generation process assigns integer identifiers to these substructural features by a hashing procedure. The positions of the 1 bits in the fixed-length bit string representation are derived from these identifiers. This lookup class provides methods to obtain the represented ECFP features for a given identifier or bit position.

Note that there is no one-to-one relationship between the substructural features and the generated identifiers. Therefore, the lookup methods of this class return a list of corresponding ECFPFeature objects for the given identifier or bit position. By design, atom neighborhoods that are equivalent with respect to the considered atom properties are represented by the same identifier and bit position. However, unwanted collisions may also occur, especially for the fixed-length bit string representation. That is, completely different substructural features may be represented by the same bit position due to the applied hashing method (folding). In such cases, all represented features are listed by the lookup methods. (These collisions are an inevitable effect of the limited representation capability of fixed-length fingerprints.)
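The effect of folding can be illustrated with a minimal, self-contained sketch. It does not use the JChem API, and the folding rule shown (non-negative remainder of the identifier divided by the fingerprint length) is an assumption made for illustration only; the actual hashing used by the ECFP generator may differ. The class name, the identifiers, and the length of 1024 are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: folds integer identifiers into a fixed-length bit string. */
class FoldingSketch {

    /** Assumed folding rule: non-negative remainder of the identifier. */
    static int fold(int id, int length) {
        return Math.floorMod(id, length);
    }

    /** Groups identifiers by the bit position they fold to. */
    static Map<Integer, List<Integer>> groupByBit(int[] ids, int length) {
        Map<Integer, List<Integer>> byBit = new TreeMap<>();
        for (int id : ids) {
            byBit.computeIfAbsent(fold(id, length), k -> new ArrayList<>()).add(id);
        }
        return byBit;
    }

    public static void main(String[] args) {
        int length = 1024;                 // assumed fingerprint length
        int[] ids = {5, 1029, 42, -1019};  // made-up identifiers
        for (Map.Entry<Integer, List<Integer>> e : groupByBit(ids, length).entrySet()) {
            // More than one identifier on the same bit is a collision
            String note = e.getValue().size() > 1 ? "  <- collision" : "";
            System.out.println("bit " + e.getKey() + ": " + e.getValue() + note);
        }
    }
}
```

Because the number of possible identifiers is much larger than the number of bit positions, several identifiers necessarily share a position under any such rule, which is why a bit position lookup can return unrelated features.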

Apart from the collisions, it is also possible that two different identifiers represent the same atom neighborhood, reached from different central atoms. In such cases, the fingerprint generation method eliminates the redundancy by keeping only one representation according to a specific rule. For example, in the ECFP fingerprints of CO, only three identifiers (bits) are kept out of the generated four.
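The CO example can be reproduced with a small standalone sketch. It assumes that two neighborhoods count as the same when they cover the same set of atoms and that the first one encountered is kept; the rule actually applied by the fingerprint generator is not specified here and may differ. The class and method names are hypothetical and nothing below calls the JChem API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: removes neighborhoods that cover the same atoms. */
class DuplicateRemovalSketch {

    /** A circular neighborhood: center atom index, diameter, covered atoms. */
    static final class Neighborhood {
        final int center;
        final int diameter;
        final Set<Integer> atoms;

        Neighborhood(int center, int diameter, Integer... atoms) {
            this.center = center;
            this.diameter = diameter;
            this.atoms = new HashSet<>(Arrays.asList(atoms));
        }
    }

    /** Assumed rule: keep the first neighborhood seen for each atom set. */
    static List<Neighborhood> removeDuplicates(List<Neighborhood> all) {
        Set<Set<Integer>> seen = new HashSet<>();
        List<Neighborhood> kept = new ArrayList<>();
        for (Neighborhood n : all) {
            if (seen.add(n.atoms)) {
                kept.add(n);
            }
        }
        return kept;
    }

    /** The four neighborhoods of CO up to diameter 2: atom 0 is C, atom 1 is O. */
    static List<Neighborhood> twoAtomExample() {
        return Arrays.asList(
                new Neighborhood(0, 0, 0),      // C alone
                new Neighborhood(1, 0, 1),      // O alone
                new Neighborhood(0, 2, 0, 1),   // C and its neighbor O
                new Neighborhood(1, 2, 0, 1));  // O and its neighbor C: same atoms
    }

    public static void main(String[] args) {
        for (Neighborhood n : removeDuplicates(twoAtomExample())) {
            System.out.println("center=" + n.center + " diameter=" + n.diameter
                    + " atoms=" + n.atoms);
        }
    }
}
```

In this sketch the two diameter-2 neighborhoods cover both atoms, so only one of them survives, leaving three of the four generated features.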

This class requires ECFP configuration parameters, which determine both the generation process of ECFP features and the standardization actions that are applied to the input molecule. Use exactly the same configuration parameters for fingerprint generation and feature retrieval; otherwise the identifiers and bit positions will not match and the lookup results will be incorrect.

For more information about ECFPs, see the related HTML documentation.

Typical usage

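    // mol is the input Molecule; id is an integer identifier from its ECFP
    ECFPFeatureLookup lookup = new ECFPFeatureLookup();  // default ECFP parameters
    lookup.processMolecule(mol);                         // preprocess before any lookup
    // One identifier may represent several features, so a list is returned
    for (ECFPFeature f : lookup.getFeaturesFromIdentifier(id)) {
        System.out.println(f.getSubstructure().toFormat("SMARTS"));
    }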
 

Since:
JChem 5.5
Author:
Peter Kovacs (pkovacs84)
See Also:
ECFPFeature, ECFP

Constructor Summary
ECFPFeatureLookup()
          Creates a new ECFPFeatureLookup instance with the default ECFP configuration parameters.
ECFPFeatureLookup(ECFPParameters params)
          Creates a new ECFPFeatureLookup instance with the given ECFP configuration parameters.
ECFPFeatureLookup(java.lang.String configString)
          Creates a new ECFPFeatureLookup instance with the ECFP configuration parameters given as an XML configuration string.
 
Method Summary
 int getBitPosition(int id)
          Returns the corresponding bit position for the given integer identifier.
 java.lang.Integer getBitPosition(MolAtom atom, int diameter)
          Returns the corresponding bit position for the given atom neighborhood.
 java.util.List<ECFPFeature> getFeaturesFromBitPosition(int bitPos)
          Returns the substructural features represented by the given bit position.
 java.util.List<ECFPFeature> getFeaturesFromIdentifier(int id)
          Returns the substructural features represented by the given integer identifier.
 java.lang.Integer getIdentifier(MolAtom atom, int diameter)
          Returns the corresponding integer identifier for the given atom neighborhood.
 void processMolecule(Molecule mol)
          Performs the necessary preprocessing for the given molecule.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ECFPFeatureLookup

public ECFPFeatureLookup()
Creates a new ECFPFeatureLookup instance with the default ECFP configuration parameters.


ECFPFeatureLookup

public ECFPFeatureLookup(java.lang.String configString)
Creates a new ECFPFeatureLookup instance with the ECFP configuration parameters given as an XML configuration string.

Parameters:
configString - ECFP configuration string in XML

ECFPFeatureLookup

public ECFPFeatureLookup(ECFPParameters params)
Creates a new ECFPFeatureLookup instance with the given ECFP configuration parameters.

Parameters:
params - ECFP parameters object

Method Detail

processMolecule

public void processMolecule(Molecule mol)
Performs the necessary preprocessing for the given molecule.

Parameters:
mol - the molecule

getFeaturesFromIdentifier

public java.util.List<ECFPFeature> getFeaturesFromIdentifier(int id)
Returns the substructural features represented by the given integer identifier. If no such feature is found, this method returns an empty list.

Parameters:
id - the identifier
Returns:
the list of ECFP features

getFeaturesFromBitPosition

public java.util.List<ECFPFeature> getFeaturesFromBitPosition(int bitPos)
Returns the substructural features represented by the given bit position. If no such feature is found, this method returns an empty list.

Parameters:
bitPos - the position in the fixed-length bit string
Returns:
the list of ECFP features

getIdentifier

public java.lang.Integer getIdentifier(MolAtom atom,
                                       int diameter)
                                throws java.lang.IllegalArgumentException
Returns the corresponding integer identifier for the given atom neighborhood.

Note that the generated identifier is often removed by the fingerprint generation process because the same atom neighborhood is represented by another center atom and diameter. In these cases, this method returns null.

Parameters:
atom - the center atom of the circular neighborhood. It must be a chemical atom that is not removed in the standardization phase.
diameter - the diameter of the circular neighborhood. It must be an even number between zero and the maximum diameter specified by the ECFP configuration parameters.
Returns:
the integer identifier or null if no identifier corresponds to the given neighborhood in the generated fingerprint.
Throws:
java.lang.IllegalArgumentException - if the central atom or the diameter is illegal (e.g., the given atom is an explicit hydrogen, which is removed by the applied standardizer).
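
Since the return type is java.lang.Integer and null is a documented result, callers should check for null before unboxing. The following standalone sketch illustrates this together with the documented diameter precondition. It uses a hypothetical stand-in (a map keyed by "atomIndex:diameter") instead of the JChem API, and the identifiers and the maximum diameter of 4 are made up.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: null-safe handling of a nullable Integer result. */
class NullableIdentifierSketch {

    /** Mirrors the documented precondition: even and within [0, maxDiameter]. */
    static boolean isValidDiameter(int diameter, int maxDiameter) {
        return diameter >= 0 && diameter <= maxDiameter && diameter % 2 == 0;
    }

    /** Stand-in lookup: returns null when the neighborhood has no identifier. */
    static Integer findIdentifier(Map<String, Integer> kept, int atomIndex,
                                  int diameter, int maxDiameter) {
        if (!isValidDiameter(diameter, maxDiameter)) {
            throw new IllegalArgumentException("Illegal diameter: " + diameter);
        }
        return kept.get(atomIndex + ":" + diameter);
    }

    /** Checks for null before the value is used as a number. */
    static String describe(Integer id) {
        return id == null ? "no identifier in the fingerprint" : "identifier " + id;
    }

    public static void main(String[] args) {
        Map<String, Integer> kept = new HashMap<>();  // made-up identifiers
        kept.put("0:0", 111);
        kept.put("1:0", 222);
        kept.put("0:2", 333);

        // Writing "int id = findIdentifier(...)" here would throw a
        // NullPointerException on unboxing, because the result is null.
        Integer id = findIdentifier(kept, 1, 2, 4);
        System.out.println(describe(id));
        System.out.println(describe(findIdentifier(kept, 0, 2, 4)));
    }
}
```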

getBitPosition

public java.lang.Integer getBitPosition(MolAtom atom,
                                        int diameter)
                                 throws java.lang.IllegalArgumentException
Returns the corresponding bit position for the given atom neighborhood.

Note that the identifier generated for the neighborhood is often removed by the fingerprint generation process because the same atom neighborhood is represented by another center atom and diameter. In these cases, this method returns null.

Parameters:
atom - the center atom of the circular neighborhood. It must be a chemical atom that is not removed in the standardization phase.
diameter - the diameter of the circular neighborhood. It must be an even number between zero and the maximum diameter specified by the ECFP configuration parameters.
Returns:
the bit position or null if no identifier corresponds to the given neighborhood in the generated fingerprint.
Throws:
java.lang.IllegalArgumentException - if the central atom or the diameter is illegal (e.g., the given atom is an explicit hydrogen, which is removed by the applied standardizer).

getBitPosition

public int getBitPosition(int id)
Returns the corresponding bit position for the given integer identifier.

Parameters:
id - the identifier
Returns:
the bit position