chemaxon.descriptors
Class MDSimilarity

java.lang.Object
  extended by chemaxon.descriptors.MDSimilarity
All Implemented Interfaces:
chemaxon.license.Licensable

public class MDSimilarity
extends java.lang.Object
implements chemaxon.license.Licensable

Performs similarity comparisons between MDSets (see MDSet), for example sets of chemical fingerprints and/or pharmacophore fingerprints. Comparisons can be performed once all query descriptor sets have been added, the metrics to be used have been selected, and the filtering options have been set. If filtering thresholds are to be applied, they must also be given beforehand.

After a comparison, results can be retrieved by calling getDissimilarityCoeff() or getDissimilarityCoeffs().

Typical usage:

 MDSimilarity similarity = new MDSimilarity();

 // Add queries from MDReader
 similarity.addQueries( queryReader );            
 // Setup metrics and thresholds
 for ( int d = 0; d < descriptorCount; d++ ) {
     for ( int m = 0; m < metricIndices[ d ].length; m++ ) {
          similarity.useMetric( d, metricIndices[ d ][ m ], thresholds[ d ][ m ]);
     }
 }
 // Setup filtering
 if ( andMetrics ) 
     similarity.passWithAllMetrics();
 else
     similarity.passWithOneMetric();
 if ( andDescriptors ) 
     similarity.passWithAllDescriptors();
 else
     similarity.passWithOneDescriptor();
 
 // Setup result writer (table writer in this case)  
 MDSimilarityTableWriter twr = new MDSimilarityTableWriter( outputStream, precision );
 if ( !verboseSet ) {
    twr.setVerbosity( verbose );
    twr.setVerboseFrequency( verboseFreq );
    verboseSet = true;
 }
 twr.setPrintId( generateId );
 if ( idTagName != null ) {
     twr.setPrintNaturalId( true );
     twr.setNaturalIdName( idTagName );
 }
 twr.setPrecision( precision );
 similarity.addResultWriter( twr );

 // Perform comparisons, results are written into the specified result writer
 similarity.compare( targetReader );
 

Since:
JChem 2.0
Author:
Zsuzsanna Szabo

Constructor Summary
MDSimilarity()
          Creates a new instance.
 
Method Summary
 void addQueries(MDReader queryReader)
          Adds new query molecules as their set of descriptors from a chemical descriptor reader.
 void addQueries(MDSet[] queries)
          Adds new query molecules as their set of descriptors from an array.
 void addQuery(MDSet query)
          Adds a new query molecule as its set of descriptors.
 void addResultWriter(MDSimilarityResultWriter rwr)
          Adds a MDSimilarityResultWriter object.
 boolean compare(int mdIndex, int metricIndex, MDSet target)
          Compares a target descriptor against all queries added prior to the call of this method using the given metric of the given descriptor.
 int compare(MDReader targetReader)
          Compares each target descriptor set read by a molecular descriptor reader against all queries added prior to the call of this method, in the same way as compare(MDSet target) does for a single target.
 boolean compare(MDSet target)
          Compares a target descriptor set (for instance from a database) against all queries added prior to the call of this method.
 float getDissimilarityCoeff(int queryIndex, int mdIndex, int metricIndex)
          Retrieves dissimilarity coefficients (one at a time) computed by the last compare() call.
 float[][] getDissimilarityCoeffs(int queryIndex)
          Retrieves the dissimilarity coefficients of one query, for all descriptors and metrics, computed by the last compare() call.
 float[] getDissimilarityCoeffs(int queryIndex, int mdIndex)
          Retrieves the dissimilarity coefficients of one query and one descriptor, for all metrics, computed by the last compare() call.
 int getNrOfQueries()
          Gets the number of queries that have already been added.
 int getNrOfUsedMetrics(int mdIndex)
          Returns the number of metrics used with the given molecular descriptor in similarity calculations.
 MDSet getQuery(int queryIndex)
          Gets a query.
 boolean isComponentWise()
          Checks the component-wise flag.
 boolean isLicensed()
           
 boolean isPassWithAllDescriptors()
          Tells whether filtering of target descriptor sets is set to pass only if each descriptor in the set passes.
 boolean isPassWithAllMetrics()
          Tells whether filtering of target descriptor sets is set to pass only if dissimilarity calculated with each metric used with the descriptor is under the required threshold.
 boolean isPassWithOneDescriptor()
          Tells whether filtering of target descriptor sets is set to pass if at least one descriptor in the set passes.
 boolean isPassWithOneMetric()
          Tells whether filtering of target descriptor sets is set to pass if dissimilarity calculated with at least one metric used with the descriptor is under the required threshold.
 boolean isUsedMetric(int mdIndex, int metricIndex)
          Returns whether the given metric is used with the given molecular descriptor in similarity calculations.
 void passWithAllDescriptors()
          In the following searches the descriptor set of a target molecule passes the comparison with a query descriptor set, if all descriptors of the set have passed the corresponding comparisons.
 void passWithAllMetrics()
          In the following searches a target molecule's molecular descriptor passes the comparison with a corresponding query descriptor, if all dissimilarity coefficients (distances calculated with each metric) between these descriptors are under the previously given threshold.
 void passWithOneDescriptor()
          In the following searches the descriptor set of a target molecule passes the comparison with a query descriptor set, if at least one descriptor of the set has passed the corresponding comparison.
 void passWithOneMetric()
          In the following searches a target molecule's molecular descriptor passes the comparison with a corresponding query descriptor, if at least one dissimilarity coefficient between these descriptors is under the previously given threshold.
 void setComponentWise(boolean componentWise)
          Sets MDSet evaluation mode.
 void setLicenseEnvironment(java.lang.String env)
           
 void setThreshold(float threshold)
          Sets threshold for descriptor set mode.
 float threshold(int mdIndex, int metricIndex)
          Returns the acceptance threshold of the given metric for the given molecular descriptor.
 void useMetric(int mdIndex, int metricIndex)
          Use the specified metric for the specified molecular descriptor with the dissimilarity threshold stored in the corresponding parameters settings.
 void useMetric(int mdIndex, int metricIndex, float threshold)
          Use the specified metric for the specified molecular descriptor along with the given dissimilarity threshold.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MDSimilarity

public MDSimilarity()
Creates a new instance. Allocates internal storage.

Method Detail

setComponentWise

public void setComponentWise(boolean componentWise)
Sets MDSet evaluation mode. The default is composite (descriptor set) mode, in which one dissimilarity value is calculated for each descriptor set (using the selected or default metric for each component and calculating the weighted sum of these dissimilarity values). In component-wise mode each component of a descriptor set yields one dissimilarity value, and these values are kept independent in screening (i.e. they are not summed).

Parameters:
componentWise - indicates component-wise evaluation mode
Since:
JChem 2.2
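
The difference between the two evaluation modes can be illustrated with a small stand-alone sketch. It does not call the JChem API: the class name, the weights and the sample values are hypothetical, and the weighted sum is only modeled on the description above, not taken from the library's implementation.

```java
// Stand-alone illustration; not part of chemaxon.descriptors.
final class EvaluationModeSketch {

    // Composite (descriptor set) mode: one value per descriptor set,
    // modeled here as the weighted sum of the per-component values.
    static float composite(float[] componentDissimilarities, float[] weights) {
        if (componentDissimilarities.length != weights.length) {
            throw new IllegalArgumentException("one weight per component is required");
        }
        float sum = 0.0f;
        for (int i = 0; i < componentDissimilarities.length; i++) {
            sum += weights[i] * componentDissimilarities[i];
        }
        return sum;
    }

    // Component-wise mode: every component keeps its own value, so the
    // values are returned as a copy instead of being summed.
    static float[] componentWise(float[] componentDissimilarities) {
        return componentDissimilarities.clone();
    }

    public static void main(String[] args) {
        float[] dissimilarities = {0.25f, 0.5f}; // hypothetical per-component values
        float[] weights = {0.5f, 0.5f};          // hypothetical weights
        System.out.println("composite: " + composite(dissimilarities, weights));
        System.out.println("component-wise: "
                + java.util.Arrays.toString(componentWise(dissimilarities)));
    }
}
```

In composite mode a single threshold is compared against the one summed value, whereas in component-wise mode each value is screened against its own threshold.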

addResultWriter

public void addResultWriter(MDSimilarityResultWriter rwr)
Adds an MDSimilarityResultWriter object. An MDSimilarity instance can have an arbitrary number and type of such MDSimilarityResultWriters, and all of them are invoked (in the same order as they were added) after each target MDSet has been processed.

Parameters:
rwr - a result writer object
Since:
JChem 2.2

addQuery

public void addQuery(MDSet query)
Adds a new query molecule as its set of descriptors. The number of queries is not limited, but it is expected to be significantly smaller than the number of targets. In typical usage the number of queries does not exceed 10.
Once a query is added, it cannot be withdrawn. All added queries must be composed of the same kinds of descriptors.

Parameters:
query - Query descriptor set, it is not cloned.

addQueries

public void addQueries(MDSet[] queries)
Adds new query molecules as their set of descriptors from an array.

Parameters:
queries - Array of query descriptor sets, it is not cloned.

addQueries

public void addQueries(MDReader queryReader)
                throws MDReaderException
Adds new query molecules as their set of descriptors from a chemical descriptor reader.

Parameters:
queryReader - Molecular descriptor set reader of the queries.
Throws:
MDReaderException
Since:
JChem 2.2

getQuery

public MDSet getQuery(int queryIndex)
Gets a query.

Parameters:
queryIndex - The index of the query (in order of addition) from 0 to getNrOfQueries() - 1 (both inclusive).
Returns:
The set of molecular descriptors of the query

getNrOfQueries

public int getNrOfQueries()
Gets the number of queries that have already been added.

Returns:
Number of query descriptors.

setThreshold

public void setThreshold(float threshold)
Sets threshold for descriptor set mode. (Component-wise mode uses different threshold values for each descriptor component and metric.)

Parameters:
threshold - similarity threshold
Since:
JChem 2.2

useMetric

public void useMetric(int mdIndex,
                      int metricIndex,
                      float threshold)
Use the specified metric for the specified molecular descriptor along with the given dissimilarity threshold.

Parameters:
mdIndex - Index of the molecular descriptor in the set.
metricIndex - Index of the metric.
threshold - Maximum dissimilarity allowed.

useMetric

public void useMetric(int mdIndex,
                      int metricIndex)
Use the specified metric for the specified molecular descriptor with the dissimilarity threshold stored in the corresponding parameters settings.

Parameters:
mdIndex - Index of the molecular descriptor in the set.
metricIndex - Index of the metric.

isUsedMetric

public boolean isUsedMetric(int mdIndex,
                            int metricIndex)
Returns whether the given metric is used with the given molecular descriptor in similarity calculations.

Parameters:
mdIndex - Index of the molecular descriptor in the set.
metricIndex - Index of the metric.
Returns:
Metric in use flag.

getNrOfUsedMetrics

public int getNrOfUsedMetrics(int mdIndex)
Returns the number of metrics used with the given molecular descriptor in similarity calculations.

Parameters:
mdIndex - Index of the molecular descriptor in the set.
Returns:
Number of metrics in use.

threshold

public float threshold(int mdIndex,
                       int metricIndex)
Returns the acceptance threshold of the given metric for the given molecular descriptor.

Parameters:
mdIndex - Index of the molecular descriptor in the set.
metricIndex - Index of the metric.
Returns:
Threshold value, or -1.0F if the metric is not used.
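
The relationship between useMetric(), isUsedMetric(), getNrOfUsedMetrics() and threshold() can be modeled with a small stand-alone sketch. The class below is hypothetical and uses a map-based storage chosen purely for illustration; it only reproduces the documented contract, including the -1.0F value returned for a metric that is not in use.

```java
// Stand-alone model of the per-descriptor metric bookkeeping; not the JChem implementation.
final class MetricRegistrySketch {

    // descriptor index -> (metric index -> threshold); storage layout is an assumption
    private final java.util.Map<Integer, java.util.Map<Integer, Float>> used =
            new java.util.HashMap<>();

    // Registers a metric for a descriptor together with its maximum allowed dissimilarity.
    void useMetric(int mdIndex, int metricIndex, float threshold) {
        used.computeIfAbsent(mdIndex, k -> new java.util.HashMap<>())
            .put(metricIndex, threshold);
    }

    boolean isUsedMetric(int mdIndex, int metricIndex) {
        java.util.Map<Integer, Float> metrics = used.get(mdIndex);
        return metrics != null && metrics.containsKey(metricIndex);
    }

    int getNrOfUsedMetrics(int mdIndex) {
        java.util.Map<Integer, Float> metrics = used.get(mdIndex);
        return metrics == null ? 0 : metrics.size();
    }

    // Returns the stored threshold, or -1.0F when the metric is not in use.
    float threshold(int mdIndex, int metricIndex) {
        java.util.Map<Integer, Float> metrics = used.get(mdIndex);
        if (metrics == null) {
            return -1.0f;
        }
        Float value = metrics.get(metricIndex);
        return value == null ? -1.0f : value;
    }
}
```

Because -1.0F doubles as the "not used" marker, callers that only need a yes/no answer may find isUsedMetric() clearer than comparing the threshold against the sentinel.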

isComponentWise

public boolean isComponentWise()
Checks the component-wise flag.

Returns:
true if screening works in component-wise mode
Since:
JChem 2.2

passWithAllMetrics

public void passWithAllMetrics()
In the following searches a target molecule's molecular descriptor passes the comparison with a corresponding query descriptor, if all dissimilarity coefficients (distances calculated with each metric) between these descriptors are under the previously given threshold. If this flag is not set, then one coefficient under the threshold is enough for passing (default).


isPassWithAllMetrics

public boolean isPassWithAllMetrics()
Tells whether filtering of target descriptor sets is set to pass only if dissimilarity calculated with each metric used with the descriptor is under the required threshold.


passWithOneMetric

public void passWithOneMetric()
In the following searches a target molecule's molecular descriptor passes the comparison with a corresponding query descriptor, if at least one dissimilarity coefficient between these descriptors is under the previously given threshold. This is the default setting.


isPassWithOneMetric

public boolean isPassWithOneMetric()
Tells whether filtering of target descriptor sets is set to pass if dissimilarity calculated with at least one metric used with the descriptor is under the required threshold.


passWithAllDescriptors

public void passWithAllDescriptors()
In the following searches the descriptor set of a target molecule passes the comparison with a query descriptor set, if all descriptors of the set have passed the corresponding comparisons. If this flag is not set, then one passing descriptor from the set is enough for passing (default).


isPassWithAllDescriptors

public boolean isPassWithAllDescriptors()
Tells whether filtering of target descriptor sets is set to pass only if each descriptor in the set passes.


passWithOneDescriptor

public void passWithOneDescriptor()
In the following searches the descriptor set of a target molecule passes the comparison with a query descriptor set, if at least one descriptor of the set has passed the corresponding comparison. This is the default setting.


isPassWithOneDescriptor

public boolean isPassWithOneDescriptor()
Tells whether filtering of target descriptor sets is set to pass if at least one descriptor in the set passes.
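
The four pass settings combine into a two-level filter: first over the metrics of each descriptor, then over the descriptors of the set. The following stand-alone sketch models that logic with plain arrays. It is not the library's code; the class and method names are hypothetical, and treating a coefficient equal to the threshold as passing is an assumption.

```java
// Stand-alone illustration of the two-level filter; not the JChem implementation.
final class PassFilterSketch {

    // Metric level: does one descriptor pass, given its coefficients and thresholds?
    // allMetrics == true models passWithAllMetrics(), false models passWithOneMetric().
    static boolean descriptorPasses(float[] coeffs, float[] thresholds, boolean allMetrics) {
        boolean any = false;
        boolean all = true;
        for (int m = 0; m < coeffs.length; m++) {
            // Assumption: a coefficient equal to the threshold still passes.
            boolean ok = coeffs[m] <= thresholds[m];
            any |= ok;
            all &= ok;
        }
        return allMetrics ? all : any;
    }

    // Descriptor level: does the whole descriptor set pass?
    // allDescriptors == true models passWithAllDescriptors(), false passWithOneDescriptor().
    static boolean setPasses(float[][] coeffs, float[][] thresholds,
                             boolean allMetrics, boolean allDescriptors) {
        boolean any = false;
        boolean all = true;
        for (int d = 0; d < coeffs.length; d++) {
            boolean ok = descriptorPasses(coeffs[d], thresholds[d], allMetrics);
            any |= ok;
            all &= ok;
        }
        return allDescriptors ? all : any;
    }
}
```

With the documented defaults (one metric, one descriptor) a target passes as soon as any single coefficient is within its threshold; switching either level to "all" makes the filter stricter.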


compare

public boolean compare(int mdIndex,
                       int metricIndex,
                       MDSet target)
                throws java.lang.RuntimeException
Compares a target descriptor against all queries added prior to the call of this method using the given metric of the given descriptor. The results of the comparison (the dissimilarity coefficients) are stored internally, but only the results of the last comparison are kept; earlier values are discarded. It is therefore the caller's responsibility to obtain the required values by calling getDissimilarityCoeffs() after compare() has been performed.
The method can be used for filtering purposes, in which case its return value indicates whether the current target descriptor set is filtered out or not. Threshold values are set separately with useMetric().

Parameters:
mdIndex - Index of the molecular descriptor.
metricIndex - Index of the metric.
target - Target descriptor set.
Returns:
Target passed filtering or not.
Throws:
java.lang.RuntimeException

compare

public boolean compare(MDSet target)
                throws java.lang.RuntimeException
Compares a target descriptor set (for instance from a database) against all queries added prior to the call of this method. The results of the comparison (the dissimilarity coefficients) are stored internally, but only the results of the last comparison are kept; earlier values are discarded. It is therefore the caller's responsibility to obtain the required values by calling getDissimilarityCoeffs() after compare() has been performed.
The method can be used for filtering purposes, in which case its return value indicates whether the current target descriptor set is filtered out or not. Threshold values are set separately with useMetric().

Parameters:
target - Target descriptor set.
Returns:
Target passed filtering or not.
Throws:
java.lang.RuntimeException

compare

public int compare(MDReader targetReader)
            throws MDReaderException,
                   java.lang.RuntimeException
Compares each target descriptor set read by a molecular descriptor reader against all queries added prior to the call of this method, in the same way as compare(MDSet target) does for a single target.
Processing the results is the responsibility of the class implementing the MDSimilarityResultWriter interface.
Before the targets are processed, the open() method of the MDSimilarityResultWriter is executed; after each target has been processed, write() is invoked; and when processing has ended, close() is invoked.

Parameters:
targetReader - Reader of target descriptor sets.
Returns:
Number of targets that passed filtering.
Throws:
MDReaderException
java.lang.RuntimeException
Since:
JChem 2.2
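
The open(), write(), close() sequence described above can be modeled with a stand-alone sketch. The nested Writer interface is hypothetical and only borrows the three method names; the real MDSimilarityResultWriter signatures are not shown in this page and may differ. Each target is reduced to a single precomputed dissimilarity value to keep the example short.

```java
// Stand-alone model of the writer lifecycle; not the JChem implementation.
final class WriterLifecycleSketch {

    // Hypothetical interface; parameter lists are assumptions made for this sketch.
    interface Writer {
        void open();
        void write(float dissimilarity, boolean passed);
        void close();
    }

    // Simple writer that records the order of the calls it receives.
    static final class Recorder implements Writer {
        final StringBuilder log = new StringBuilder();
        public void open() { log.append("open;"); }
        public void write(float dissimilarity, boolean passed) {
            log.append(passed ? "pass;" : "fail;");
        }
        public void close() { log.append("close;"); }
    }

    private final java.util.List<Writer> writers = new java.util.ArrayList<>();

    // Writers are invoked in the order in which they were added.
    void addResultWriter(Writer writer) { writers.add(writer); }

    // Returns the number of targets that passed the (single, simplified) threshold.
    int compare(float[] targetDissimilarities, float threshold) {
        for (Writer w : writers) { w.open(); }
        int passedCount = 0;
        for (float d : targetDissimilarities) {
            boolean passed = d <= threshold;
            if (passed) { passedCount++; }
            for (Writer w : writers) { w.write(d, passed); }
        }
        for (Writer w : writers) { w.close(); }
        return passedCount;
    }
}
```

In this model write() is called for every target, including those that fail the filter, which follows the wording "after processing each target"; whether the library does the same is not confirmed here.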

getDissimilarityCoeff

public float getDissimilarityCoeff(int queryIndex,
                                   int mdIndex,
                                   int metricIndex)
Retrieves dissimilarity coefficients (one at a time) computed by the last compare() call.

Parameters:
queryIndex - Index of the query molecule. Query molecules are numbered from 0 to getNrOfQueries() - 1, in the same order as they were added with addQuery().
mdIndex - Index of molecular descriptor component in the set.
metricIndex - Index of the metric.
Returns:
Value of the dissimilarity coefficient.

getDissimilarityCoeffs

public float[] getDissimilarityCoeffs(int queryIndex,
                                      int mdIndex)
Retrieves the dissimilarity coefficients of one query and one descriptor, for all metrics, computed by the last compare() call.

Parameters:
queryIndex - Index of the query molecule. Query molecules are numbered from 0 to getNrOfQueries() - 1, in the same order as they were added with addQuery().
mdIndex - Index of molecular descriptor component in the set.
Returns:
Array of dissimilarity coefficients, one for each metric.

getDissimilarityCoeffs

public float[][] getDissimilarityCoeffs(int queryIndex)
Retrieves the dissimilarity coefficients of one query, for all descriptors and metrics, computed by the last compare() call.

Parameters:
queryIndex - Index of the query molecule. Query molecules are numbered from 0 to getNrOfQueries() - 1, in the same order as they were added with addQuery().
Returns:
Array of dissimilarity coefficients with each descriptor and metric.
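
Since only the results of the last comparison are kept, values that are needed later have to be copied before the next comparison. The stand-alone sketch below models this with plain arrays. It is not the library's code: the class, the storeComparison() and deepCopy() helpers are hypothetical, and the index order [queryIndex][mdIndex][metricIndex] is an assumption derived from the parameter order of the getters.

```java
// Stand-alone model of "last comparison only" result storage; not the JChem implementation.
final class LastResultSketch {

    // last[queryIndex][mdIndex][metricIndex]; the index order is an assumption.
    private float[][][] last = new float[0][][];

    // Stands in for a comparison: replaces whatever was stored before.
    void storeComparison(float[][][] coeffs) {
        last = coeffs;
    }

    // One query, all descriptors and metrics.
    float[][] getDissimilarityCoeffs(int queryIndex) {
        return last[queryIndex];
    }

    // One query and one descriptor, all metrics.
    float[] getDissimilarityCoeffs(int queryIndex, int mdIndex) {
        return last[queryIndex][mdIndex];
    }

    // A single coefficient.
    float getDissimilarityCoeff(int queryIndex, int mdIndex, int metricIndex) {
        return last[queryIndex][mdIndex][metricIndex];
    }

    // Deep copy for callers that need to keep values across comparisons.
    static float[][] deepCopy(float[][] source) {
        float[][] copy = new float[source.length][];
        for (int i = 0; i < source.length; i++) {
            copy[i] = source[i].clone();
        }
        return copy;
    }
}
```

The rows of the two-dimensional array may have different lengths when descriptors use different numbers of metrics, which is why the copy is made row by row.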

isLicensed

public boolean isLicensed()
Specified by:
isLicensed in interface chemaxon.license.Licensable

setLicenseEnvironment

public void setLicenseEnvironment(java.lang.String env)
Specified by:
setLicenseEnvironment in interface chemaxon.license.Licensable