chemaxon.descriptors
Class ECFP

java.lang.Object
  extended by chemaxon.descriptors.MolecularDescriptor
      extended by chemaxon.descriptors.ECFP
All Implemented Interfaces:
chemaxon.license.Licensable, java.lang.Cloneable

public class ECFP
extends MolecularDescriptor
implements chemaxon.license.Licensable

The ECFP class implements Extended-Connectivity Fingerprints (ECFPs) as a type of MolecularDescriptor. ECFPs are circular topological fingerprints designed for molecular characterization, similarity searching, and structure-activity modeling. They are among the most popular similarity search tools in drug discovery and are used in a wide variety of applications.

The main properties of ECFPs are the following.

For more information, see the detailed HTML documentation.

Since:
JChem 5.4
Author:
Peter Kovacs (pkovacs84)

Field Summary
protected  int brightness
          The number of bits set in the binary vector storage
protected  int[] fp
          Binary vector storage of the fingerprint
protected  int[] ids
          Identifier list storage of the fingerprint
 
Fields inherited from class chemaxon.descriptors.MolecularDescriptor
params
 
Constructor Summary
ECFP()
          Creates a new, empty instance of ECFP without allocating internal storage.
ECFP(ECFP ecfp)
          Copy constructor.
ECFP(ECFPParameters params)
          Creates a new instance of ECFP according to the parameters given.
ECFP(java.lang.String params)
          Creates a new instance of ECFP according to the parameters given.
 
Method Summary
 void clear()
          Clears the fingerprint; all values are set to zero.
 ECFP clone()
          Creates a new instance with identical internal state.
 void dropBinaryVector()
          Drops the binary vector storage.
 void fromData(byte[] data)
          Builds an ECFP fingerprint from an external data format created by toData().
 void fromFeatureSet(java.util.Set<java.lang.Integer> set)
          Deprecated. As of JChem 5.4.1, replaced by fromIdentiferSet().
 void fromFloatArray(float[] descr)
          Builds an ECFP fingerprint from its float array representation.
 void fromIdentiferSet(java.util.Set<java.lang.Integer> set)
          Builds an ECFP fingerprint from a set of Integer identifiers.
 void fromIntArray(int[] array)
          Builds an ECFP fingerprint from an array of int identifiers.
 void fromString(java.lang.String ecfp)
          Builds an ECFP fingerprint from its string representation created by toString().
 java.lang.String[] generate(Molecule m)
          Creates the ECFP fingerprint for the given Molecule.
 float getAsymmetricEuclidean(ECFP f)
          Calculates the asymmetric Euclidean distance.
 int getBrightness()
          Gets the brightness of the fingerprint.
 float[] getDefaultDissimilarityMetricThresholds()
          Gets the default dissimilarity threshold values for all dissimilarity metrics defined.
 int getDefaultMetricIndex()
          Gets the index of the default metric.
 float getDefaultThreshold(int metricIndex)
          Gets a metric-dependent default threshold value.
 float getDissimilarity(MolecularDescriptor other)
          Calculates the dissimilarity ratio between two ECFP objects using the current default metric.
 float getDissimilarity(MolecularDescriptor other, int metricIndex)
          Calculates the dissimilarity between two ECFP objects using the specified metric. Otherwise identical to getDissimilarity(final MolecularDescriptor other).
 java.lang.String[] getDissimilarityMetrics()
          Gets the dissimilarity metric names introduced for this class of MolecularDescriptor.
 float getEuclidean(ECFP f)
          Calculates the Euclidean distance.
 int getFeatureCount()
          Deprecated. As of JChem 5.4.1, replaced by getIdentiferCount().
 int getIdentiferCount()
          Gets the number of integer identifiers generated for the fingerprint.
 java.lang.String getName()
          Gets the name of the ECFP fingerprint object.
 java.lang.String getParametersClassName()
          Gets the name of the parameters class corresponding to the descriptor.
 java.lang.String getShortName()
          Gets the short name of the fingerprint.
 float getTanimoto(ECFP f)
          Calculates the Tanimoto distance.
 float getWeightedAsymmetricEuclidean(ECFP f)
          Calculates the weighted asymmetric Euclidean distance.
 float getWeightedEuclidean(ECFP f)
          Calculates the weighted Euclidean distance.
 boolean isLicensed()
          Returns information about the licensing of the product.
protected  void requireBinaryVector()
          Checks the binary vector storage and generates it from the identifier list if necessary.
 void setLicenseEnvironment(java.lang.String env)
          Sets the license environment.
 void setParameters(MDParameters parameters)
          Sets the parameters of an already created ECFP object.
 void setParameters(java.lang.String parameters)
          Sets the parameters of an already created ECFP object.
 java.lang.String toBinaryString()
          Converts the fingerprint into a fixed-length 0,1 string.
 java.util.BitSet toBitSet()
          Returns a bit vector storing the "folded" binary representation of the fingerprint.
 byte[] toData()
          Converts an ECFP object into a byte array.
 java.lang.String toDecimalString()
          Converts the ECFP fingerprint into a tab separated string.
 java.util.Set<java.lang.Integer> toFeatureSet()
          Deprecated. As of JChem 5.4.1, replaced by toIdentiferSet().
 float[] toFloatArray()
          Creates the float array representation of an ECFP fingerprint object.
 java.util.Set<java.lang.Integer> toIdentiferSet()
          Converts the fingerprint to a set of Integer identifiers.
 int[] toIntArray()
          Converts the fingerprint to an array of int identifiers.
 java.lang.String toString()
          Converts the fingerprint into a readable string.
 
Methods inherited from class chemaxon.descriptors.MolecularDescriptor
generate, getAtomSetColors, getAtomSetIndexes, getAtomSetNames, getDissimilarityMetricIndex, getLowerBound, getMetricIndex, getMetricName, getMetricName, getNumberOfMetrics, getNumberOfWeights, getParameters, getThreshold, getThreshold, main, needsConfig, newInstance, newInstance, newInstanceFromXML, setScreeningConfiguration
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

ids

protected int[] ids
Identifier list storage of the fingerprint


fp

protected int[] fp
Binary vector storage of the fingerprint


brightness

protected int brightness
The number of bits set in the binary vector storage

Constructor Detail

ECFP

public ECFP()
Creates a new, empty instance of ECFP without allocating internal storage.


ECFP

public ECFP(ECFPParameters params)
Creates a new instance of ECFP according to the parameters given.

Parameters:
params - parameter settings

ECFP

public ECFP(java.lang.String params)
Creates a new instance of ECFP according to the parameters given.

Parameters:
params - parameter settings

ECFP

public ECFP(ECFP ecfp)
Copy constructor. An identical copy of the ECFP fingerprint passed is created. The old and the new instances share the same ECFPParameters object.

Parameters:
ecfp - fingerprint to be copied
Method Detail

clone

public ECFP clone()
Creates a new instance with identical internal state.

Specified by:
clone in class MolecularDescriptor
Returns:
the newly copied object

isLicensed

public boolean isLicensed()
Returns information about the licensing of the product.

Specified by:
isLicensed in interface chemaxon.license.Licensable
Returns:
true if the product is correctly licensed

setLicenseEnvironment

public void setLicenseEnvironment(java.lang.String env)
Sets the license environment.

Specified by:
setLicenseEnvironment in interface chemaxon.license.Licensable

getName

public java.lang.String getName()
Gets the name of the ECFP fingerprint object. This is not the class name but a more readable name that is meaningful to end users.

Overrides:
getName in class MolecularDescriptor
Returns:
the nice, external name for ECFP class objects

getShortName

public java.lang.String getShortName()
Gets the short name of the fingerprint.

Overrides:
getShortName in class MolecularDescriptor
Returns:
the short name used in text outputs (tables etc.)

getParametersClassName

public java.lang.String getParametersClassName()
Gets the name of the parameters class corresponding to the descriptor.

Overrides:
getParametersClassName in class MolecularDescriptor
Returns:
the name of the parameters class

setParameters

public void setParameters(MDParameters parameters)
                   throws MDParametersException
Sets the parameters of an already created ECFP object.

Overrides:
setParameters in class MolecularDescriptor
Parameters:
parameters - parameter settings for the fingerprint
Throws:
MDParametersException - any XML error

setParameters

public void setParameters(java.lang.String parameters)
                   throws MDParametersException
Sets the parameters of an already created ECFP object.

Specified by:
setParameters in class MolecularDescriptor
Parameters:
parameters - parameter settings for the fingerprint
Throws:
MDParametersException - any XML error

clear

public void clear()
Clears the fingerprint; all values are set to zero.


toData

public byte[] toData()
Converts an ECFP object into a byte array. This format can be referred to as an "external representation" since it serves as the data format for storing ECFP fingerprints in databases.
Use the fromData() method to build the ECFP object from this "external" representation.

Specified by:
toData in class MolecularDescriptor
Returns:
byte array representation of the fingerprint object

fromData

public void fromData(byte[] data)
Builds an ECFP fingerprint from an external data format created by toData().

Specified by:
fromData in class MolecularDescriptor
Parameters:
data - "external" representation of an ECFP object
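
The toData() and fromData() pair describes a byte array round trip. The sketch below only illustrates that idea on a plain identifier list. The class name DataRoundTripSketch and the layout (4 big-endian bytes per identifier) are assumptions made for the example; the actual JChem storage format is not described here.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class DataRoundTripSketch {

    /** Packs each identifier into 4 bytes (assumed layout, big-endian). */
    static byte[] toData(int[] ids) {
        ByteBuffer buf = ByteBuffer.allocate(ids.length * Integer.BYTES);
        for (int id : ids) {
            buf.putInt(id);
        }
        return buf.array();
    }

    /** Rebuilds the identifier list from the bytes written by toData(). */
    static int[] fromData(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int[] ids = new int[data.length / Integer.BYTES];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = buf.getInt();
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] ids = {7, -3, 1024};
        int[] restored = fromData(toData(ids));
        System.out.println(Arrays.equals(ids, restored));
    }
}
```

A round trip of this kind should return exactly the identifiers that went in, which is the property a database storage format needs.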

toString

public final java.lang.String toString()
Converts the fingerprint into a readable string. This is the default external text format of the fingerprint, which can also be stored into an SDfile.

Specified by:
toString in class MolecularDescriptor
Returns:
string representation of the fingerprint

toDecimalString

public final java.lang.String toDecimalString()
Converts the ECFP fingerprint into a tab separated string.

Specified by:
toDecimalString in class MolecularDescriptor
Returns:
string representation of the fingerprint

toBinaryString

public java.lang.String toBinaryString()
Converts the fingerprint into a fixed-length 0,1 string. This string represents the "folded" binary version of the fingerprint.

Overrides:
toBinaryString in class MolecularDescriptor
Returns:
binary string representation of the fingerprint
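
The "folded" representation can be pictured as mapping each integer identifier onto a position in a fixed-length bit vector. The sketch below is an illustration only: the class name FoldingSketch, the modulo-based mapping, and the 16-bit length are assumptions, not the folding scheme JChem actually uses.

```java
import java.util.BitSet;

class FoldingSketch {

    /** Folds identifiers into a bit vector of the given length (assumed scheme: modulo). */
    static BitSet fold(int[] identifiers, int length) {
        BitSet bits = new BitSet(length);
        for (int id : identifiers) {
            // floorMod keeps the position non-negative for negative identifiers
            bits.set(Math.floorMod(id, length));
        }
        return bits;
    }

    /** Renders the bit vector as a fixed-length string of '0' and '1' characters. */
    static String toBinaryString(BitSet bits, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] ids = {3, 19, -1, 8};
        // 3 and 19 collide on position 3, so only three bits end up set
        System.out.println(toBinaryString(fold(ids, 16), 16));
    }
}
```

Collisions such as 3 and 19 in the example are inherent to any folding: distinct identifiers may share a bit, so the folded form can carry less information than the identifier list.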

fromString

public final void fromString(java.lang.String ecfp)
                      throws java.text.ParseException
Builds an ECFP fingerprint from its string representation created by toString().

Specified by:
fromString in class MolecularDescriptor
Parameters:
ecfp - ECFP fingerprint string
Throws:
java.text.ParseException

toFloatArray

public float[] toFloatArray()
Creates the float array representation of an ECFP fingerprint object.

Specified by:
toFloatArray in class MolecularDescriptor
Returns:
a float array of the fingerprint values

fromFloatArray

public void fromFloatArray(float[] descr)
Builds an ECFP fingerprint from its float array representation. Typically used when a hypothesis is created.

Specified by:
fromFloatArray in class MolecularDescriptor
Parameters:
descr - fingerprint represented in a float array (e.g. generated by toFloatArray())

toIntArray

public int[] toIntArray()
Converts the fingerprint to an array of int identifiers.


fromIntArray

public void fromIntArray(int[] array)
Builds an ECFP fingerprint from an array of int identifiers.


toIdentiferSet

public java.util.Set<java.lang.Integer> toIdentiferSet()
Converts the fingerprint to a set of Integer identifiers.


fromIdentiferSet

public void fromIdentiferSet(java.util.Set<java.lang.Integer> set)
Builds an ECFP fingerprint from a set of Integer identifiers.


toFeatureSet

public java.util.Set<java.lang.Integer> toFeatureSet()
Deprecated. As of JChem 5.4.1, replaced by toIdentiferSet().

Converts the fingerprint to a set of Integer identifiers.


fromFeatureSet

public void fromFeatureSet(java.util.Set<java.lang.Integer> set)
Deprecated. As of JChem 5.4.1, replaced by fromIdentiferSet().

Builds an ECFP fingerprint from a set of Integer identifiers.


toBitSet

public java.util.BitSet toBitSet()
Returns a bit vector storing the "folded" binary representation of the fingerprint.


getIdentiferCount

public int getIdentiferCount()
Gets the number of integer identifiers generated for the fingerprint.

Returns:
the number of identifiers in the fingerprint

getFeatureCount

public int getFeatureCount()
Deprecated. As of JChem 5.4.1, replaced by getIdentiferCount().

Gets the number of integer identifiers generated for the fingerprint.

Returns:
the number of identifiers in the fingerprint

getBrightness

public int getBrightness()
Gets the brightness of the fingerprint, that is, the number of bits set to 1 in the binary vector. This quantity is sometimes also called the darkness.

Returns:
number of bits set to 1
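
Counting the bits set to 1 in an int[] storage can be sketched as follows. The class name BrightnessSketch is hypothetical, and the assumption that the vector is packed 32 bits per int is made only for this example.

```java
class BrightnessSketch {

    /** Counts the bits set to 1 across all 32-bit words of a packed bit vector. */
    static int brightness(int[] words) {
        int count = 0;
        for (int word : words) {
            count += Integer.bitCount(word);
        }
        return count;
    }

    public static void main(String[] args) {
        int[] fp = {0b1011, 0, 0x80000000};
        // 3 bits in the first word, none in the second, 1 in the third
        System.out.println(brightness(fp));
    }
}
```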

requireBinaryVector

protected void requireBinaryVector()
Checks the binary vector storage and generates it from the identifier list if necessary.


dropBinaryVector

public void dropBinaryVector()
Drops the binary vector storage. It will be regenerated when required.
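
The interplay of requireBinaryVector() and dropBinaryVector() follows a familiar lazy-cache pattern: the identifier list is kept, and the derived bit vector is rebuilt only when needed. The sketch below shows that pattern in isolation. The class LazyVectorSketch, its fields, and the modulo folding are assumptions for illustration, not the JChem implementation.

```java
import java.util.BitSet;

class LazyVectorSketch {
    private final int[] ids;   // identifier list, always kept
    private final int length;  // assumed folded length in bits
    private BitSet vector;     // derived storage, may be dropped
    int rebuilds = 0;          // counts regenerations, for illustration only

    LazyVectorSketch(int[] ids, int length) {
        this.ids = ids.clone();
        this.length = length;
    }

    /** Regenerates the binary vector from the identifier list if it is missing. */
    private void requireVector() {
        if (vector == null) {
            vector = new BitSet(length);
            for (int id : ids) {
                vector.set(Math.floorMod(id, length)); // assumed folding scheme
            }
            rebuilds++;
        }
    }

    /** Frees the derived storage; the next access rebuilds it. */
    void dropVector() {
        vector = null;
    }

    /** Number of bits set in the (possibly regenerated) binary vector. */
    int brightness() {
        requireVector();
        return vector.cardinality();
    }

    public static void main(String[] args) {
        LazyVectorSketch s = new LazyVectorSketch(new int[]{1, 2, 18}, 16);
        System.out.println(s.brightness()); // builds the vector on first use
        s.dropVector();
        System.out.println(s.brightness()); // rebuilds it on demand
        System.out.println(s.rebuilds);
    }
}
```

Dropping the vector trades memory for a one-time regeneration cost on the next access, which suits large collections where only the identifier lists are needed most of the time.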


generate

public java.lang.String[] generate(Molecule m)
                            throws MDGeneratorException
Creates the ECFP fingerprint for the given Molecule. Calls the generator created by the corresponding ECFPParameters class.

Overrides:
generate in class MolecularDescriptor
Returns:
property names set in the molecule during generation
Throws:
MDGeneratorException - when failed to generate fingerprint

getDissimilarityMetrics

public java.lang.String[] getDissimilarityMetrics()
Gets the dissimilarity metric names introduced for this class of MolecularDescriptor.

Specified by:
getDissimilarityMetrics in class MolecularDescriptor
Returns:
the metrics array

getDefaultDissimilarityMetricThresholds

public float[] getDefaultDissimilarityMetricThresholds()
Gets the default dissimilarity threshold values for all dissimilarity metrics defined.

Specified by:
getDefaultDissimilarityMetricThresholds in class MolecularDescriptor
Returns:
array of dissimilarity threshold values

getDefaultMetricIndex

public int getDefaultMetricIndex()
Gets the index of the default metric. In the case of ECFP, this is Tanimoto.

Overrides:
getDefaultMetricIndex in class MolecularDescriptor
Returns:
metric index of the default metric

getDefaultThreshold

public float getDefaultThreshold(int metricIndex)
Gets a metric-dependent default threshold value. Ideally, this value should be based on statistics, but the exact value is not critical, since it is only used in user interfaces to simplify applications for beginners.

Overrides:
getDefaultThreshold in class MolecularDescriptor
Parameters:
metricIndex - index of a parameterized metric

getTanimoto

public float getTanimoto(ECFP f)
Calculates the Tanimoto distance.

Parameters:
f - the fingerprint to which the distance is calculated
Returns:
the Tanimoto distance (dissimilarity coefficient)
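
For binary fingerprints, a Tanimoto dissimilarity is commonly defined as 1 minus the ratio of bits set in both vectors to bits set in either vector. The sketch below computes that textbook form on java.util.BitSet. The class name TanimotoSketch, the helper bits(), and the convention for two empty vectors are assumptions; whether JChem evaluates the metric on folded bits or on identifier lists is not stated here.

```java
import java.util.BitSet;

class TanimotoSketch {

    /** Builds a bit vector with the given positions set. */
    static BitSet bits(int... positions) {
        BitSet set = new BitSet();
        for (int p : positions) {
            set.set(p);
        }
        return set;
    }

    /** Returns 1 - |A and B| / |A or B|. */
    static float tanimotoDissimilarity(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        int union = or.cardinality();
        if (union == 0) {
            return 0f; // assumed convention: two empty fingerprints are identical
        }
        return 1f - (float) and.cardinality() / union;
    }

    public static void main(String[] args) {
        BitSet a = bits(1, 2, 3);
        BitSet b = bits(2, 3, 4, 5);
        // 2 common bits out of 5 distinct bits: 1 - 2/5
        System.out.println(tanimotoDissimilarity(a, b));
    }
}
```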

getEuclidean

public float getEuclidean(ECFP f)
Calculates the Euclidean distance. This is the same as the Euclidean distance for bit strings.

Parameters:
f - the fingerprint to which the distance is calculated
Returns:
the dissimilarity coefficient
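
For bit strings, the plain Euclidean distance reduces to the square root of the number of positions where the two vectors differ. The sketch below shows that unnormalized form. The class name EuclideanSketch and the helper bits() are hypothetical, and whether JChem scales or normalizes the value is not stated here.

```java
import java.util.BitSet;

class EuclideanSketch {

    /** Builds a bit vector with the given positions set. */
    static BitSet bits(int... positions) {
        BitSet set = new BitSet();
        for (int p : positions) {
            set.set(p);
        }
        return set;
    }

    /** Square root of the number of differing bit positions (unnormalized). */
    static float euclidean(BitSet a, BitSet b) {
        BitSet diff = (BitSet) a.clone();
        diff.xor(b); // bits set in exactly one of the two vectors
        return (float) Math.sqrt(diff.cardinality());
    }

    public static void main(String[] args) {
        BitSet a = bits(0, 1, 2, 3);
        BitSet b = bits(2, 3, 4, 5);
        // four positions differ (0, 1, 4, 5), so the distance is sqrt(4)
        System.out.println(euclidean(a, b));
    }
}
```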

getWeightedEuclidean

public float getWeightedEuclidean(ECFP f)
Calculates the weighted Euclidean distance. This is the same as the weighted Euclidean distance for bit strings.

Parameters:
f - the fingerprint to which the distance is calculated
Returns:
the dissimilarity coefficient

getAsymmetricEuclidean

public float getAsymmetricEuclidean(ECFP f)
Calculates the asymmetric Euclidean distance. This is the same as the asymmetric Euclidean distance for bit strings.

Parameters:
f - the fingerprint to which the distance is calculated
Returns:
the dissimilarity coefficient

getWeightedAsymmetricEuclidean

public float getWeightedAsymmetricEuclidean(ECFP f)
Calculates the weighted asymmetric Euclidean distance. This is the same as the weighted asymmetric Euclidean distance for bit strings.

Parameters:
f - the fingerprint to which the distance is calculated
Returns:
the dissimilarity coefficient

getDissimilarity

public float getDissimilarity(MolecularDescriptor other)
Calculates the dissimilarity ratio between two ECFP objects using the current default metric. The default metric is set in the corresponding ECFPParameters object by setCurrentParametrizedMetric(int metricIndex). For asymmetric distances, swapping the two fingerprints can change the result considerably.

Specified by:
getDissimilarity in class MolecularDescriptor
Parameters:
other - the fingerprint to which the dissimilarity ratio is measured
Returns:
the dissimilarity ratio
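
Why the order of the two fingerprints matters for an asymmetric metric can be seen in a small sketch. The formula used here (bits found only in the first vector weighted by alpha, bits found only in the second by 1 - alpha) is an illustrative assumption, as are the class name AsymmetrySketch, the helper bits(), and the parameter alpha; it is not the definition JChem uses.

```java
import java.util.BitSet;

class AsymmetrySketch {

    /** Builds a bit vector with the given positions set. */
    static BitSet bits(int... positions) {
        BitSet set = new BitSet();
        for (int p : positions) {
            set.set(p);
        }
        return set;
    }

    /** Illustrative asymmetric distance: sqrt(alpha * |q only| + (1 - alpha) * |t only|). */
    static float asymmetricDistance(BitSet q, BitSet t, float alpha) {
        BitSet onlyQ = (BitSet) q.clone();
        onlyQ.andNot(t); // bits set in q but not in t
        BitSet onlyT = (BitSet) t.clone();
        onlyT.andNot(q); // bits set in t but not in q
        return (float) Math.sqrt(alpha * onlyQ.cardinality()
                + (1f - alpha) * onlyT.cardinality());
    }

    public static void main(String[] args) {
        BitSet q = bits(1, 2, 3);
        BitSet t = bits(3, 4);
        // With alpha = 0.75 the two directions give different values
        System.out.println(asymmetricDistance(q, t, 0.75f));
        System.out.println(asymmetricDistance(t, q, 0.75f));
    }
}
```

With alpha set to 0.5 the two directions coincide, which is the symmetric special case.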

getDissimilarity

public float getDissimilarity(MolecularDescriptor other,
                              int metricIndex)
Calculates the dissimilarity between two ECFP objects using the specified metric. Otherwise identical to getDissimilarity(final MolecularDescriptor other).

Specified by:
getDissimilarity in class MolecularDescriptor
Parameters:
other - the fingerprint to which the dissimilarity ratio is measured
metricIndex - the index of the metric to be used
Returns:
the dissimilarity ratio
See Also:
MDParameters, PFParameters