chemaxon.descriptors
Class ChemicalFingerprint

java.lang.Object
  extended by chemaxon.descriptors.MolecularDescriptor
      extended by chemaxon.descriptors.ChemicalFingerprint
All Implemented Interfaces:
java.lang.Cloneable

public class ChemicalFingerprint
extends MolecularDescriptor

The ChemicalFingerprint class implements topological fingerprints as a type of MolecularDescriptor. Such fingerprints encode the topological connections between the atoms of the chemical graph. Although this encoding loses information, it preserves enough to allow fast comparison of chemical structures through their topological fingerprints, without comparing the structures directly.
This class provides two metrics for dissimilarity calculations: Tanimoto and Euclidean. Several variants of these base metrics are supported, for instance scaled, directed and weighted forms. Euclidean also has a normalized form, which puts an upper bound on the otherwise unbounded Euclidean metric.

Typical usage:


Generating fingerprints
      CFParameters params = new CFParameters( "config.xml" );
      ChemicalFingerprint fp = new ChemicalFingerprint( params );

      // always use an MDSet object, even if it has one component only
      MDSet ds = new MDSet();
      ds.addDescriptor( fp );

      // create an input source reader that takes molecules from a SMILES file
      MDFileReader src = new MDFileReader( "input.smiles" );
      src.setIdTagName( "CGX_ID" ); // just an example

      // process input: get the fingerprints from the input source and do something with them

      while ( src.next( ds ) ) {  // src generates the descriptor!
          do_something( ds );
      }
      src.close();
 

Since:
JChem 2.0
Author:
Miklos Vargyas, Peter Kovacs (pkovacs84)

Field Summary
protected  int brightness
          number of bits set in the fingerprint (sometimes this is called the darkness, but that seems to be less plausible)
protected  int[] fp
          storage for the fingerprint
 
Fields inherited from class chemaxon.descriptors.MolecularDescriptor
params
 
Constructor Summary
ChemicalFingerprint()
          Creates a new, empty instance of ChemicalFingerprint without allocating internal storage.
ChemicalFingerprint(CFParameters params)
          Creates a new instance of ChemicalFingerprint according to the parameters given.
ChemicalFingerprint(ChemicalFingerprint cfp)
          Copy constructor.
ChemicalFingerprint(java.lang.String params)
          Creates a new instance of ChemicalFingerprint according to the parameters given.
 
Method Summary
 void clear()
          Clears the fingerprint: sets all bins to store zero value.
 ChemicalFingerprint clone()
          Creates a new instance with identical internal state.
 void fromData(byte[] dbRepr)
          Builds a fingerprint from an external data format, created by a previous call to toData().
 void fromFloatArray(float[] descr)
          Builds fingerprint from its float array representation.
 void fromString(java.lang.String cfp)
          Builds a fingerprint from its string representation created by toString().
 java.lang.String[] generate(Molecule m)
          Creates the ChemicalFingerprint descriptor for the given Molecule.
 float getAsymmetricEuclidean(ChemicalFingerprint f)
          Calculates the asymmetric Euclidean distance.
 int getBrightness()
          Gets the brightness of the fingerprint.
 int getCommonBitCount(ChemicalFingerprint f)
           
 float[] getDefaultDissimilarityMetricThresholds()
          Gets the default dissimilarity threshold values for all dissimilarity metrics defined.
 int getDefaultMetricIndex()
          Gets the index of the default metric.
 float getDefaultThreshold(int metricIndex)
          Gets a metric dependent default threshold value.
 float getDissimilarity(MolecularDescriptor fp2)
          Calculates the dissimilarity between two chemical fingerprints using the default distance measure.
 float getDissimilarity(MolecularDescriptor fp2, int metricIndex)
          Calculates the dissimilarity between two chemical fingerprints using the specified distance metric.
 java.lang.String[] getDissimilarityMetrics()
          Gets the dissimilarity metric names.
 float getEuclidean(ChemicalFingerprint f)
          Calculates the Euclidean distance.
 float getLowerBound(java.lang.Object fp2)
          Calculates the lower bound estimate of the dissimilarity from the given fingerprint.
 java.lang.String getName()
          Gets the name of the ChemicalFingerprint object.
 java.lang.String getParametersClassName()
          Gets the name of the parameters class corresponding to the descriptor.
 java.lang.String getShortName()
          Gets the short name of the descriptor.
 float getTanimoto(ChemicalFingerprint f)
          Calculates the Tanimoto metric.
 float getTversky(ChemicalFingerprint f)
          Calculates the Tversky dissimilarity index.
 float getWeightedAsymmetricEuclidean(ChemicalFingerprint f)
          Calculates the weighted asymmetric Euclidean distance.
 float getWeightedEuclidean(ChemicalFingerprint f)
          Calculates the weighted Euclidean distance.
 boolean isSubSetOf(ChemicalFingerprint f)
          Checks if this fingerprint is a subset of another fingerprint that is passed as method parameter.
 void setParameters(MDParameters parameters)
          Sets parameters, allocates internal storage if needed and cleans the descriptor.
 void setParameters(java.lang.String parameters)
          Sets the parameters of an already created ChemicalFingerprint object.
 java.lang.String toBinaryString()
          Converts the fingerprint into a 0,1 string.
 byte[] toData()
          Converts a chemical fingerprint object into a byte array.
 java.lang.String toDecimalString()
          Converts the fingerprint into a tab separated string.
 float[] toFloatArray()
          Creates the float array representation of the fingerprint.
 java.lang.String toString()
          Converts the fingerprint into a readable string.
 
Methods inherited from class chemaxon.descriptors.MolecularDescriptor
generate, getAtomSetColors, getAtomSetIndexes, getAtomSetNames, getDissimilarityMetricIndex, getLowerBound, getMetricIndex, getMetricName, getMetricName, getNumberOfMetrics, getNumberOfWeights, getParameters, getThreshold, getThreshold, main, needsConfig, newInstance, newInstance, newInstanceFromXML, setScreeningConfiguration
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

fp

protected int[] fp
storage for the fingerprint


brightness

protected int brightness
number of bits set in the fingerprint (sometimes this is called the darkness, but that seems to be less plausible)

Constructor Detail

ChemicalFingerprint

public ChemicalFingerprint()
Creates a new, empty instance of ChemicalFingerprint without allocating internal storage.


ChemicalFingerprint

public ChemicalFingerprint(CFParameters params)
Creates a new instance of ChemicalFingerprint according to the parameters given.

Parameters:
params - parameters used in fingerprint generation and handling
Since:
JChem 2.2

ChemicalFingerprint

public ChemicalFingerprint(java.lang.String params)
Creates a new instance of ChemicalFingerprint according to the parameters given.

Parameters:
params - parameter settings

ChemicalFingerprint

public ChemicalFingerprint(ChemicalFingerprint cfp)
Copy constructor. An identical copy of the chemical fingerprint passed is created. The old and the new instances share the same CFParameters object.

Parameters:
cfp - fingerprint to be copied
Method Detail

clone

public ChemicalFingerprint clone()
Creates a new instance with identical internal state.

Specified by:
clone in class MolecularDescriptor
Returns:
the newly copied object

getName

public java.lang.String getName()
Gets the name of the ChemicalFingerprint object. The name is not the same as the class name: it is nicer, more readable and meaningful for end-users too.

Overrides:
getName in class MolecularDescriptor
Returns:
the nice, external name for ChemicalFingerprint class objects

getShortName

public java.lang.String getShortName()
Gets the short name of the descriptor.

Overrides:
getShortName in class MolecularDescriptor
Returns:
the short name used in text outputs (tables etc.)

getParametersClassName

public java.lang.String getParametersClassName()
Gets the name of the parameters class corresponding to the descriptor.

Overrides:
getParametersClassName in class MolecularDescriptor
Returns:
the name of the parameters class

getBrightness

public int getBrightness()
Gets the brightness of the fingerprint. Sometimes this is called the darkness. To be precise, this method gets the number of 1 (one) bits in the fingerprint.

Returns:
number of bits set to 1
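
As an illustration of what brightness means, the following minimal sketch counts the 1 bits of a fingerprint held in an int array, as the protected fp field suggests. The class BrightnessSketch and its method are hypothetical names, not part of the JChem API, and the packing of 32 fingerprint bits per int is an assumption.

```java
// Hypothetical helper, not part of the JChem API.
final class BrightnessSketch {
    // Counts the bits set to 1, assuming each int packs 32 fingerprint bits.
    static int brightness(int[] fp) {
        int count = 0;
        for (int word : fp) {
            count += Integer.bitCount(word);
        }
        return count;
    }

    public static void main(String[] args) {
        int[] fp = { 0b1011, 0, 0x80000000 };
        System.out.println(brightness(fp)); // 3 + 0 + 1 bits set, prints 4
    }
}
```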

setParameters

public void setParameters(MDParameters parameters)
Sets parameters, allocates internal storage if needed and cleans the descriptor.

Overrides:
setParameters in class MolecularDescriptor
Parameters:
parameters - fingerprint parameters

setParameters

public void setParameters(java.lang.String parameters)
                   throws MDParametersException
Sets the parameters of an already created ChemicalFingerprint object.

Specified by:
setParameters in class MolecularDescriptor
Parameters:
parameters - parameter settings for the descriptor
Throws:
MDParametersException - any XML error

toData

public byte[] toData()
Converts a chemical fingerprint object into a byte array. This format can be referred to as an "external representation" since it serves as the data format for storing fingerprints in databases.
Use the fromData() method to build the fingerprint from this "external" representation.

Specified by:
toData in class MolecularDescriptor
Returns:
byte array representation of the fingerprint object

fromData

public void fromData(byte[] dbRepr)
Builds a fingerprint from an external data format, created by a previous call to toData().

Specified by:
fromData in class MolecularDescriptor
Parameters:
dbRepr - "external" representation of ChemicalFingerprint

clear

public final void clear()
Clears the fingerprint: sets all bins to store zero value.


toString

public final java.lang.String toString()
Converts the fingerprint into a readable string. This is the default external text format of the fingerprint, which can also be stored in an SDfile.

Specified by:
toString in class MolecularDescriptor
Returns:
string representation of the fingerprint

toDecimalString

public final java.lang.String toDecimalString()
Converts the fingerprint into a tab separated string.

Specified by:
toDecimalString in class MolecularDescriptor
Returns:
string representation of the fingerprint

toBinaryString

public java.lang.String toBinaryString()
Converts the fingerprint into a 0,1 string.

Overrides:
toBinaryString in class MolecularDescriptor
Returns:
binary string representation of the fingerprint
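
A 0,1 string of this kind could be produced along the following lines. This is only a sketch: the class BinaryStringSketch is a hypothetical name, and both the packing of 32 bits per int and the order in which bits are written (lowest bit of each int first) are assumptions that may differ from what this method actually emits.

```java
// Hypothetical helper, not part of the JChem API.
final class BinaryStringSketch {
    // Writes one character per bit, lowest bit of each int first.
    // Both the 32-bits-per-int packing and the bit order are assumptions.
    static String toBinaryString(int[] fp) {
        StringBuilder sb = new StringBuilder(fp.length * 32);
        for (int word : fp) {
            for (int bit = 0; bit < 32; bit++) {
                sb.append(((word >>> bit) & 1) == 1 ? '1' : '0');
            }
        }
        return sb.toString();
    }
}
```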
Since:
JChem 2.3

fromString

public final void fromString(java.lang.String cfp)
                      throws java.text.ParseException
Builds a fingerprint from its string representation created by toString().

Specified by:
fromString in class MolecularDescriptor
Parameters:
cfp - fingerprint string
Throws:
java.text.ParseException

toFloatArray

public final float[] toFloatArray()
Creates the float array representation of the fingerprint. This array contains all values of the fingerprint (including all zeros) in the elements of the array.

Specified by:
toFloatArray in class MolecularDescriptor
Returns:
a float array representation of the fingerprint
Since:
JChem 2.0.1

fromFloatArray

public void fromFloatArray(float[] descr)
                    throws java.lang.RuntimeException
Builds fingerprint from its float array representation. Typically used when a hypothesis is created.

Specified by:
fromFloatArray in class MolecularDescriptor
Parameters:
descr - fingerprint represented in a float array (e.g. generated by toFloatArray())
Throws:
java.lang.RuntimeException
Since:
JChem 2.0.1

generate

public java.lang.String[] generate(Molecule m)
                            throws MDGeneratorException
Creates the ChemicalFingerprint descriptor for the given Molecule. Calls the generator created by the corresponding MDParameters class.

Overrides:
generate in class MolecularDescriptor
Returns:
property names set in the molecule passed during generation
Throws:
MDGeneratorException - when failed to generate descriptor

getDissimilarityMetrics

public java.lang.String[] getDissimilarityMetrics()
Gets the dissimilarity metric names.

Specified by:
getDissimilarityMetrics in class MolecularDescriptor
Returns:
the metrics array

getDefaultDissimilarityMetricThresholds

public float[] getDefaultDissimilarityMetricThresholds()
Gets the default dissimilarity threshold values for all dissimilarity metrics defined.

Specified by:
getDefaultDissimilarityMetricThresholds in class MolecularDescriptor
Returns:
array of dissimilarity threshold values

getDefaultMetricIndex

public int getDefaultMetricIndex()
Gets the index of the default metric. In the case of this class this is Tanimoto.

Overrides:
getDefaultMetricIndex in class MolecularDescriptor
Returns:
metric index of the default metric

getDefaultThreshold

public float getDefaultThreshold(int metricIndex)
Gets a metric-dependent default threshold value. Ideally, this value should be based on statistics, though the actual value is not too critical, since these defaults are only used in user interfaces to simplify the use of applications for beginners.

Overrides:
getDefaultThreshold in class MolecularDescriptor
Parameters:
metricIndex - index of a parameterized metric

getCommonBitCount

public int getCommonBitCount(ChemicalFingerprint f)

getTanimoto

public float getTanimoto(ChemicalFingerprint f)
Calculates the Tanimoto metric.

Parameters:
f - the fingerprint from which the distance is calculated
Returns:
the Tanimoto distance (dissimilarity coefficient)
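
For reference, the Tanimoto dissimilarity of two binary fingerprints is commonly defined as 1 - c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both. The sketch below applies this textbook formula to int arrays. TanimotoSketch is a hypothetical name, and returning 0 for two empty fingerprints is an assumption, not documented behaviour of this class.

```java
// Hypothetical helper, not part of the JChem API.
final class TanimotoSketch {
    // Textbook Tanimoto dissimilarity: 1 - |x AND y| / |x OR y|.
    static float tanimotoDissimilarity(int[] x, int[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("fingerprints must have equal length");
        }
        int common = 0; // bits set in both fingerprints
        int either = 0; // bits set in at least one fingerprint
        for (int i = 0; i < x.length; i++) {
            common += Integer.bitCount(x[i] & y[i]);
            either += Integer.bitCount(x[i] | y[i]);
        }
        if (either == 0) {
            return 0f; // assumption: two empty fingerprints count as identical
        }
        return 1f - (float) common / either;
    }
}
```

With this definition identical fingerprints give 0 and fingerprints with no common bits give 1.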

getTversky

public float getTversky(ChemicalFingerprint f)
Calculates the Tversky dissimilarity index, that is, 1 minus the commonly used Tversky similarity index.

Parameters:
f - the fingerprint from which the distance is calculated
Returns:
the Tversky dissimilarity index as float
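
The commonly used Tversky similarity is c / (alpha * (a - c) + beta * (b - c) + c), with a, b and c defined as for Tanimoto, so the dissimilarity form is 1 minus that value. The sketch below is a hypothetical illustration of this formula. TverskySketch is not part of the JChem API, and passing alpha and beta as method arguments is an assumption, since this page does not say where the class takes its coefficients from.

```java
// Hypothetical helper, not part of the JChem API.
final class TverskySketch {
    // 1 - common / (alpha * onlyX + beta * onlyY + common), with alpha, beta >= 0.
    static float tverskyDissimilarity(int[] x, int[] y, float alpha, float beta) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("fingerprints must have equal length");
        }
        if (alpha < 0f || beta < 0f) {
            throw new IllegalArgumentException("alpha and beta must be non-negative");
        }
        int common = 0, onlyX = 0, onlyY = 0;
        for (int i = 0; i < x.length; i++) {
            common += Integer.bitCount(x[i] & y[i]);
            onlyX += Integer.bitCount(x[i] & ~y[i]);
            onlyY += Integer.bitCount(~x[i] & y[i]);
        }
        if (common == 0) {
            // assumption: two empty fingerprints are identical, otherwise fully dissimilar
            return (onlyX == 0 && onlyY == 0) ? 0f : 1f;
        }
        return 1f - common / (alpha * onlyX + beta * onlyY + common);
    }
}
```

With alpha = beta = 1 the formula reduces to the Tanimoto dissimilarity. With unequal coefficients the result depends on the order of the two fingerprints.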

getEuclidean

public float getEuclidean(ChemicalFingerprint f)
Calculates the Euclidean distance. This is the same as the Euclidean distance for bit strings.

Parameters:
f - the fingerprint from which the distance is calculated
Returns:
the dissimilarity coefficient
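
For bit strings every squared coordinate difference is 0 or 1, so the Euclidean distance is the square root of the number of differing bits. Dividing by the fingerprint length before taking the root gives a value between 0 and 1. The sketch below shows both forms under these textbook definitions. EuclideanSketch is a hypothetical name, and whether this class normalizes in exactly this way is an assumption.

```java
// Hypothetical helper, not part of the JChem API.
final class EuclideanSketch {
    // Number of bit positions in which the two fingerprints differ.
    static int differingBits(int[] x, int[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("fingerprints must have equal length");
        }
        int diff = 0;
        for (int i = 0; i < x.length; i++) {
            diff += Integer.bitCount(x[i] ^ y[i]);
        }
        return diff;
    }

    // Plain Euclidean distance of two bit strings; unbounded as the length grows.
    static float euclidean(int[] x, int[] y) {
        return (float) Math.sqrt(differingBits(x, y));
    }

    // Normalized form in [0, 1]; assumes 32 fingerprint bits per int.
    static float normalizedEuclidean(int[] x, int[] y) {
        int diff = differingBits(x, y);
        int bits = 32 * x.length;
        return bits == 0 ? 0f : (float) Math.sqrt((double) diff / bits);
    }
}
```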

getWeightedEuclidean

public float getWeightedEuclidean(ChemicalFingerprint f)
Calculates the weighted Euclidean distance. This is the same as the weighted Euclidean distance for bit strings.

Parameters:
f - the fingerprint from which the distance is calculated
Returns:
the dissimilarity coefficient

getAsymmetricEuclidean

public float getAsymmetricEuclidean(ChemicalFingerprint f)
Calculates the asymmetric Euclidean distance. This is the same as the asymmetric Euclidean distance for bit strings.

Parameters:
f - the fingerprint from which the distance is calculated
Returns:
the dissimilarity coefficient

getWeightedAsymmetricEuclidean

public float getWeightedAsymmetricEuclidean(ChemicalFingerprint f)
Calculates the weighted asymmetric Euclidean distance. This is the same as the weighted asymmetric Euclidean distance for bit strings.

Parameters:
f - the fingerprint from which the distance is calculated
Returns:
the dissimilarity coefficient

getDissimilarity

public float getDissimilarity(MolecularDescriptor fp2)
Calculates the dissimilarity between two chemical fingerprints using the default distance measure.

Specified by:
getDissimilarity in class MolecularDescriptor
Parameters:
fp2 - the other chemical fingerprint
Returns:
dissimilarity ratio

getDissimilarity

public float getDissimilarity(MolecularDescriptor fp2,
                              int metricIndex)
Calculates the dissimilarity between two chemical fingerprints using the specified distance metric. The index of the required metric can be obtained by calling getMetricIndex( String metricName ).
New metrics implemented by this class have to be added at the end of the existing ones.

Specified by:
getDissimilarity in class MolecularDescriptor
Parameters:
fp2 - the chemical fingerprint from which the distance is measured
metricIndex - index of the metric to be used
Returns:
the dissimilarity ratio
See Also:
MDParameters, PFParameters

getLowerBound

public float getLowerBound(java.lang.Object fp2)
Calculates the lower bound estimate of the dissimilarity from the given fingerprint. In the case of ChemicalFingerprint a good estimate for the minimum distance cannot be obtained efficiently (that is, significantly faster than calculating the proper distance), therefore 0 is returned. This trivial distance bound estimation will lead to calling getDistance.

Parameters:
fp2 - chemical fingerprint from which distance is measured
Returns:
estimate of the minimum distance

isSubSetOf

public boolean isSubSetOf(ChemicalFingerprint f)
Checks if this fingerprint is a subset of another fingerprint that is passed as method parameter. A binary fingerprint is considered to be a subset of another if none of its bits is larger than the corresponding bit of the other, that is, every bit set in this fingerprint is also set in the other.

Parameters:
f - a descriptor which is supposed to be a superset
Returns:
true if this descriptor is a subset of the parameter
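
The subset condition described above can be checked word by word with a bitwise test: x is a subset of y when no bit is set in x but clear in y. The following is a minimal sketch over int arrays. SubsetSketch is a hypothetical name, not part of the JChem API, and the int array layout is an assumption based on the protected fp field.

```java
// Hypothetical helper, not part of the JChem API.
final class SubsetSketch {
    // True if every bit set in x is also set in y.
    static boolean isSubsetOf(int[] x, int[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("fingerprints must have equal length");
        }
        for (int i = 0; i < x.length; i++) {
            if ((x[i] & ~y[i]) != 0) {
                return false; // x has a bit that y lacks
            }
        }
        return true;
    }
}
```

Under this definition every fingerprint is a subset of itself, and the empty fingerprint is a subset of any fingerprint of the same length.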