chemaxon.descriptors
Class MDHitEvaluator

java.lang.Object
  extended by chemaxon.descriptors.MDHitEvaluator

public class MDHitEvaluator
extends java.lang.Object

Retrieves statistical information from a test screen on a set of molecules. Statistical information supplied:

Basic input:

There are two possible modes of use. The first is intended for smaller sets of molecules and allows fast retrieval of statistical information in several ways. In this mode all dissimilarity values are calculated in advance and stored to enable fast queries.

If the 'memory-safe' methods are used, dissimilarities are calculated on the fly each time a query function is called; they are not stored in memory.

Typical usage: Not memory-safe mode:

 MDHitEvaluator evaluator = new MDHitEvaluator( similarity );
 evaluator.setSelectivityAsymmetryFactor( 0.3f );
 int functionIndex = evaluator.getEvaluatorFunctionIndex( "SelectivityEffectiveness" );
 evaluator.setCurrentEvaluatorFunction( functionIndex );
 evaluator.calcDissimilarity( testReader, targetReader );
 int nSimilars = evaluator.getNumberOfSimilars();
 // the cast must wrap the whole product: (int) 0.3 * nSimilars would always be 0
 float e1 = evaluator.evaluateByMetric( descrIndex, metrIndex,
                                        (int) ( 0.3 * nSimilars ), (int) ( 0.8 * nSimilars ) );
 float e2 = evaluator.evaluateByMetric( descrIndex, metrIndex,
                                        (int) ( 0.5 * nSimilars ), nSimilars );
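
A note on hit-count bounds derived from a fraction of nSimilars: in Java a cast binds more tightly than multiplication, so the cast has to enclose the whole product. The following self-contained sketch shows the difference; the class and method names are illustrative only and are not part of JChem.

```java
public class HitBounds {
    /** Casts first: (int) fraction is 0 for any fraction in [0, 1), so the product is 0. */
    static int castThenMultiply(double fraction, int nSimilars) {
        return (int) fraction * nSimilars;
    }

    /** Multiplies first, then truncates the product toward zero. */
    static int multiplyThenCast(double fraction, int nSimilars) {
        return (int) (fraction * nSimilars);
    }

    public static void main(String[] args) {
        int nSimilars = 200;
        System.out.println(castThenMultiply(0.5, nSimilars));
        System.out.println(multiplyThenCast(0.5, nSimilars));
    }
}
```

Because the second form truncates, Math.round may be preferable when the nearest integer is wanted instead.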

Memory-safe mode (dissimilarities are recalculated on every call):

 MDHitEvaluator evaluator = new MDHitEvaluator( similarity );
 evaluator.setSelectivityAsymmetryFactor( 0.3f );
 int functionIndex = evaluator.getEvaluatorFunctionIndex( "SelectivityEffectiveness" );
 evaluator.setCurrentEvaluatorFunction( functionIndex );
 float e = evaluator.evaluateByMetric( descrIndex, metrIndex, 50.0F,
                                       testReader, targetReader );

Since:
JChem 2.0
Author:
Zsuzsanna Szabo

Field Summary
 java.lang.String[] evaluatorFunctions
           
 
Constructor Summary
MDHitEvaluator(MDSimilarity similarity)
          Creates a new instance, allocates storage.
 
Method Summary
 void calcDissimilarity(MDReader similarSetReader, MDReader dissimilarSetReader)
          Precalculates dissimilarity values.
 int[] calcMetricDistribution(int descrIndex, int metricIndex, float lowerBound, float upperBound, int nHistograms, float[] metricValues)
          Retrieves the distribution of the given metric from the dissimilarity values calculated by a previous call to calcDissimilarity().
 int[] calcMetricDistribution(int descrIndex, int metricIndex, float lowerBound, float upperBound, int nHistograms, float[] metricValues, MDReader similarSetReader, MDReader dissimilarSetReader)
          Retrieves the distribution of the given metric from the dissimilarity values calculated by a screen using the given two molecular descriptor readers.
 float evaluateByAll(int nSimilarHits)
          Not implemented yet
 float evaluateByAll(int fromNSimilarHits, int toNSimilarHits)
          Not implemented yet
 float evaluateByDescriptor(int descrIndex, int nSimilarHits)
          Not implemented yet
 float evaluateByDescriptor(int descrIndex, int fromNSimilarHits, int toNSimilarHits)
          Not implemented yet
 float evaluateByMetric(int descrIndex, int metricIndex, float minPercentageOfSimilarHits)
          Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the percentage of similar hits relative to the total number of similars must be greater than or equal to the given percentage.
 float evaluateByMetric(int descrIndex, int metricIndex, float minPercentageOfSimilarHits, MDReader similarSetReader, MDReader dissimilarSetReader)
          Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the percentage of similar hits relative to the total number of similars must be greater than or equal to the given percentage.
 float evaluateByMetric(int descrIndex, int metricIndex, int nSimilarHits)
          Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the given number of similars must be found.
 float evaluateByMetric(int descrIndex, int metricIndex, int fromNSimilarHits, int toNSimilarHits)
          Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the number of similar hits must be between the given numbers.
 float evaluateByMetric(int descrIndex, int metricIndex, int fromNSimilarHits, int toNSimilarHits, MDReader similarSetReader, MDReader dissimilarSetReader)
          Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the number of similar hits must be between the given numbers.
 float evaluateByMetric(int descrIndex, int metricIndex, int nSimilarHits, MDReader similarSetReader, MDReader dissimilarSetReader)
          Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the given number of similars must be found.
 int getCurrentEvaluatorFunction()
          Gets the index of the current evaluator function
 int getEvaluatorFunctionIndex(java.lang.String name)
          Gets the index of the evaluator function from its name
 java.lang.String getEvaluatorFunctionName(int index)
          Gets the name of the evaluator function from its index
 java.util.ArrayList[] getInsertedDissimilars()
          Returns lists of dissimilars which have dissimilarity values lower than the similars.
 int getNextDissimilarHit()
          Retrieves ids of target hits found in a previous screen or evaluation one by one.
 int getNextSimilarHit()
          Retrieves ids of known similar hits found in a previous screen or evaluation one by one.
 int getNumberOfDissimilarHits()
          Returns the number of hits from the set of target molecules, found in a previous evaluation or screen.
 int getNumberOfDissimilars()
          Returns the number of target molecules (read by dissimilarReader previously).
 int getNumberOfSimilarHits()
          Returns the number of hits from the known similar molecules, found in a previous evaluation or screen.
 int getNumberOfSimilars()
          Returns the number of known similar molecules (read by similarReader previously).
 float getSelectivityAsymmetryFactor()
          Returns the value of the asymmetry factor (weight) of the evaluator function selectivity effectiveness.
 float getThreshold(int descrIndex, int metricIndex)
          Returns the threshold set by the last screen (given by the user as a parameter) or evaluation (set by the evaluation).
 void resetDissimilarHits()
          Resets target hits found in a previous screen or evaluation for following retrieval one by one.
 void resetSimilarHits()
          Resets known similar hits found in a previous screen or evaluation for following retrieval one by one.
 float[] screen(int descrIndex, int metricIndex, float threshold)
          Screen the similar set and the dissimilar set with the given descriptor, metric and threshold.
 float[] screen(int descrIndex, int metricIndex, float threshold, MDReader similarSetReader, MDReader dissimilarSetReader)
          Screen the similar set and the dissimilar set with the given descriptor, metric and threshold.
 void setCurrentEvaluatorFunction(int index)
          Sets the evaluator function, the value of which is returned in each evaluate call.
 void setSelectivityAsymmetryFactor(float alpha)
          Sets the asymmetry factor (weight) of the evaluator function selectivity effectiveness.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

evaluatorFunctions

public java.lang.String[] evaluatorFunctions
Constructor Detail

MDHitEvaluator

public MDHitEvaluator(MDSimilarity similarity)
Creates a new instance, allocates storage.

Parameters:
similarity - A complete MDSimilarity object with added queries
Method Detail

setCurrentEvaluatorFunction

public void setCurrentEvaluatorFunction(int index)
Sets the evaluator function, the value of which is returned in each evaluate call.

Parameters:
index - Index of evaluator function

getCurrentEvaluatorFunction

public int getCurrentEvaluatorFunction()
Gets the index of the current evaluator function

Returns:
Index of evaluator function

getEvaluatorFunctionIndex

public int getEvaluatorFunctionIndex(java.lang.String name)
                              throws java.lang.IllegalArgumentException
Gets the index of the evaluator function from its name

Parameters:
name - Name of evaluator function
Returns:
Index of evaluator function
Throws:
java.lang.IllegalArgumentException

getEvaluatorFunctionName

public java.lang.String getEvaluatorFunctionName(int index)
Gets the name of the evaluator function from its index

Parameters:
index - Index of evaluator function
Returns:
Name of evaluator function

setSelectivityAsymmetryFactor

public void setSelectivityAsymmetryFactor(float alpha)
                                   throws java.lang.IllegalArgumentException
Sets the asymmetry factor (weight) of the evaluator function selectivity effectiveness.

Parameters:
alpha - Value of the asymmetry factor
Throws:
java.lang.IllegalArgumentException

getSelectivityAsymmetryFactor

public float getSelectivityAsymmetryFactor()
Returns the value of the asymmetry factor (weight) of the evaluator function selectivity effectiveness.

Returns:
Value of the asymmetry factor

calcDissimilarity

public void calcDissimilarity(MDReader similarSetReader,
                              MDReader dissimilarSetReader)
                       throws MDReaderException
Precalculates dissimilarity values. This is worth doing if all the dissimilarity values fit into memory. In that case several evaluations can be performed afterwards without recalculating the dissimilarity values.

Parameters:
similarSetReader - Reader of the test set of known similars
dissimilarSetReader - Reader of the set of target molecules (where similars are sought)
Throws:
MDReaderException

screen

public float[] screen(int descrIndex,
                      int metricIndex,
                      float threshold)
Screen the similar set and the dissimilar set with the given descriptor, metric and threshold. Evaluate all evaluator functions. To be called only if calcDissimilarity() has been called previously.

Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
threshold - Threshold value for selecting hits
Returns:
Array of evaluator function values
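
Conceptually, a screen of this kind counts, for each of the two sets, the molecules whose dissimilarity falls within the threshold. The sketch below is self-contained and does not use JChem; the class name, the arrays of precomputed dissimilarity values, and the choice to treat a value equal to the threshold as a hit are all assumptions made for illustration.

```java
public class ThresholdScreen {
    /** Counts values at or below the threshold (the inclusive comparison is an assumption). */
    static int countHits(float[] dissimilarities, float threshold) {
        int hits = 0;
        for (float d : dissimilarities) {
            if (d <= threshold) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical dissimilarity values for the two sets.
        float[] similars = {0.10f, 0.25f, 0.40f, 0.80f};
        float[] dissimilars = {0.30f, 0.55f, 0.70f, 0.90f, 0.95f};
        float threshold = 0.5f;
        System.out.println("similar hits: " + countHits(similars, threshold));
        System.out.println("dissimilar hits: " + countHits(dissimilars, threshold));
    }
}
```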

screen

public float[] screen(int descrIndex,
                      int metricIndex,
                      float threshold,
                      MDReader similarSetReader,
                      MDReader dissimilarSetReader)
               throws MDReaderException
Screen the similar set and the dissimilar set with the given descriptor, metric and threshold. Evaluate all evaluator functions.

Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
threshold - Threshold value for selecting hits
similarSetReader - Reader of the test set of known similars
dissimilarSetReader - Reader of the set of target molecules (where similars are sought)
Returns:
Array of evaluator function values
Throws:
MDReaderException

evaluateByMetric

public float evaluateByMetric(int descrIndex,
                              int metricIndex,
                              int nSimilarHits)
Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the given number of similars must be found. Threshold for metric is set using this requirement, and can be retrieved by a following call to getThreshold( descrIndex, metricIndex ). To be called only if calcDissimilarity() has been called previously.

Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
nSimilarHits - Number of known similars required as hits
Returns:
Value of current evaluator function
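
One way to picture how a threshold can follow from a required number of similar hits is to sort the dissimilarity values of the known similars and take the nSimilarHits-th smallest. This is only an illustrative reading of the description above, not the JChem implementation; the class, method and array names are hypothetical.

```java
import java.util.Arrays;

public class ThresholdFromHitCount {
    /** Returns the smallest threshold that admits nSimilarHits of the given values. */
    static float thresholdFor(float[] similarDissimilarities, int nSimilarHits) {
        if (nSimilarHits < 1 || nSimilarHits > similarDissimilarities.length) {
            throw new IllegalArgumentException("nSimilarHits out of range: " + nSimilarHits);
        }
        // Sort a copy so that the caller's array is left untouched.
        float[] sorted = similarDissimilarities.clone();
        Arrays.sort(sorted);
        return sorted[nSimilarHits - 1];
    }

    public static void main(String[] args) {
        float[] similars = {0.80f, 0.10f, 0.40f, 0.25f};
        System.out.println(thresholdFor(similars, 3));
    }
}
```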

evaluateByMetric

public float evaluateByMetric(int descrIndex,
                              int metricIndex,
                              int nSimilarHits,
                              MDReader similarSetReader,
                              MDReader dissimilarSetReader)
                       throws MDReaderException
Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the given number of similars must be found. Threshold for metric is set using this requirement, and can be retrieved by a following call to getThreshold( descrIndex, metricIndex ).

Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
nSimilarHits - Number of known similars required as hits
similarSetReader - Reader of the test set of known similars
dissimilarSetReader - Reader of the set of target molecules (where similars are sought)
Returns:
Value of current evaluator function
Throws:
MDReaderException

evaluateByMetric

public float evaluateByMetric(int descrIndex,
                              int metricIndex,
                              int fromNSimilarHits,
                              int toNSimilarHits)
Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the number of similar hits must be between the given numbers. The actual number of similar hits and the threshold for the metric are those for which the value of the evaluator function is maximal among the allowed ones. Their values can be retrieved by subsequent calls to getNumberOfSimilarHits() and getThreshold( descrIndex, metricIndex ). To be called only if calcDissimilarity() has been called previously.

Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
fromNSimilarHits - Minimal number of known similars required as hits
toNSimilarHits - Maximal number of known similars required as hits
Returns:
Value of current evaluator function

evaluateByMetric

public float evaluateByMetric(int descrIndex,
                              int metricIndex,
                              float minPercentageOfSimilarHits)
Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the percentage of similar hits relative to the total number of similars must be greater than or equal to the given percentage. The actual number of similar hits and the threshold for the metric are those for which the value of the evaluator function is maximal among the allowed ones. Their values can be retrieved by subsequent calls to getNumberOfSimilarHits() and getThreshold( descrIndex, metricIndex ). To be called only if calcDissimilarity() has been called previously.

Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
minPercentageOfSimilarHits - Minimal percentage of known similars required as hits compared to total number of similars
Returns:
Value of current evaluator function

evaluateByMetric

public float evaluateByMetric(int descrIndex,
                              int metricIndex,
                              int fromNSimilarHits,
                              int toNSimilarHits,
                              MDReader similarSetReader,
                              MDReader dissimilarSetReader)
                       throws MDReaderException
Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the number of similar hits must be between the given numbers. The actual number of similar hits and the threshold for the metric are those for which the value of the evaluator function is maximal among the allowed ones. Their values can be retrieved by subsequent calls to getNumberOfSimilarHits() and getThreshold( descrIndex, metricIndex ).

Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
fromNSimilarHits - Minimal number of known similars required as hits
toNSimilarHits - Maximal number of known similars required as hits
similarSetReader - Reader of the test set of known similars
dissimilarSetReader - Reader of the set of target molecules (where similars are sought)
Returns:
Value of current evaluator function
Throws:
MDReaderException

evaluateByMetric

public float evaluateByMetric(int descrIndex,
                              int metricIndex,
                              float minPercentageOfSimilarHits,
                              MDReader similarSetReader,
                              MDReader dissimilarSetReader)
                       throws MDReaderException
Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the percentage of similar hits relative to the total number of similars must be greater than or equal to the given percentage. The actual number of similar hits and the threshold for the metric are those for which the value of the evaluator function is maximal among the allowed ones. Their values can be retrieved by subsequent calls to getNumberOfSimilarHits() and getThreshold( descrIndex, metricIndex ).

Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
minPercentageOfSimilarHits - Minimal percentage of known similars required as hits compared to total number of similars
similarSetReader - Reader of the test set of known similars
dissimilarSetReader - Reader of the set of target molecules (where similars are sought)
Returns:
Value of current evaluator function
Throws:
MDReaderException

evaluateByDescriptor

public float evaluateByDescriptor(int descrIndex,
                                  int nSimilarHits)
Not implemented yet


evaluateByDescriptor

public float evaluateByDescriptor(int descrIndex,
                                  int fromNSimilarHits,
                                  int toNSimilarHits)
Not implemented yet


evaluateByAll

public float evaluateByAll(int nSimilarHits)
Not implemented yet


evaluateByAll

public float evaluateByAll(int fromNSimilarHits,
                           int toNSimilarHits)
Not implemented yet


getNumberOfSimilars

public int getNumberOfSimilars()
Returns the number of known similar molecules (read by similarReader previously).

Returns:
Number of known similar structures

getNumberOfDissimilars

public int getNumberOfDissimilars()
Returns the number of target molecules (read by dissimilarReader previously).

Returns:
Number of known target structures, which are not known to be similars

getNumberOfSimilarHits

public int getNumberOfSimilarHits()
Returns the number of hits from the known similar molecules, found in a previous evaluation or screen.

Returns:
Number of known similar hits

getNumberOfDissimilarHits

public int getNumberOfDissimilarHits()
Returns the number of hits from the set of target molecules, found in a previous evaluation or screen.

Returns:
Number of target hits

resetSimilarHits

public void resetSimilarHits()
Resets known similar hits found in a previous screen or evaluation for following retrieval one by one.


resetDissimilarHits

public void resetDissimilarHits()
Resets target hits found in a previous screen or evaluation for following retrieval one by one.


getNextSimilarHit

public int getNextSimilarHit()
Retrieves ids of known similar hits found in a previous screen or evaluation one by one.

Returns:
Id of next similar hit

getNextDissimilarHit

public int getNextDissimilarHit()
Retrieves ids of target hits found in a previous screen or evaluation one by one.

Returns:
Id of next hit from target set of dissimilars

getThreshold

public float getThreshold(int descrIndex,
                          int metricIndex)
Returns the threshold set by the last screen (given by the user as a parameter) or evaluation (set by the evaluation).

Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
Returns:
Threshold value

getInsertedDissimilars

public java.util.ArrayList[] getInsertedDissimilars()
Returns lists of dissimilars which have dissimilarity values lower than the similars. First element contains the list of dissimilar ids that have dissimilarity lower than all similars, second contains the ids of dissimilars with dissimilarity values between the first and second similar (if similars are ordered by their dissimilarity values) etc.

Returns:
Array of lists of dissimilar ids, length is the number of similars.
Since:
JChem 2.2
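
The grouping described above can be sketched in a few lines. The code below is a self-contained illustration rather than the JChem implementation: the class and method names are hypothetical, dissimilar ids are assumed to be array indices, and a dissimilar whose value equals that of a similar is assumed to be placed after that similar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InsertedDissimilars {
    /**
     * Bucket k holds the ids of dissimilars whose value lies below the (k+1)-th
     * smallest similar value and not below the k-th. Dissimilars above every
     * similar are not reported.
     */
    static List<List<Integer>> bucket(float[] similarValues, float[] dissimilarValues) {
        float[] sorted = similarValues.clone();
        Arrays.sort(sorted);
        List<List<Integer>> buckets = new ArrayList<>();
        for (int k = 0; k < sorted.length; k++) {
            buckets.add(new ArrayList<>());
        }
        for (int id = 0; id < dissimilarValues.length; id++) {
            // Skip every similar that is at or below this dissimilar's value.
            int k = 0;
            while (k < sorted.length && sorted[k] <= dissimilarValues[id]) {
                k++;
            }
            if (k < sorted.length) {
                buckets.get(k).add(id);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        float[] similars = {0.2f, 0.5f, 0.9f};
        float[] dissimilars = {0.1f, 0.3f, 0.6f, 0.95f, 0.05f};
        System.out.println(bucket(similars, dissimilars));
    }
}
```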

calcMetricDistribution

public int[] calcMetricDistribution(int descrIndex,
                                    int metricIndex,
                                    float lowerBound,
                                    float upperBound,
                                    int nHistograms,
                                    float[] metricValues)
Retrieves the distribution of the given metric from the dissimilarity values calculated by a previous call to calcDissimilarity(). The distribution is returned as the number of dissimilarities falling into each of the (nHistograms - 2) equal-size intervals between lowerBound and upperBound, plus two extra intervals: one for values lower than the given lower bound and one for values greater than the given upper bound. The i-th interval is defined as: [ metricValues[ i ], metricValues[ i + 1 ] ] .

Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
lowerBound - Lower bound for dissimilarity distribution
upperBound - Upper bound for dissimilarity distribution
nHistograms - Refinement of distribution: number of histograms (including the two extra histograms)
metricValues - Output parameter: must be allocated by the caller with length (nHistograms + 1); on return it contains the endpoints of the dissimilarity value intervals
Returns:
Array of numbers of dissimilarity values falling into the intervals defined by metricValues
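
The binning scheme can be illustrated with a small self-contained sketch. It is not the JChem implementation: the class and method names are hypothetical, the input is a plain array of dissimilarity values, and the two outermost endpoints of metricValues are assumed to be negative and positive infinity, which the description above does not specify.

```java
public class MetricDistribution {
    /** Fills metricValues with interval endpoints and returns the count per interval. */
    static int[] distribution(float[] values, float lowerBound, float upperBound,
                              int nHistograms, float[] metricValues) {
        if (nHistograms < 3 || upperBound <= lowerBound) {
            throw new IllegalArgumentException("need nHistograms >= 3 and upperBound > lowerBound");
        }
        if (metricValues.length != nHistograms + 1) {
            throw new IllegalArgumentException("metricValues must have length nHistograms + 1");
        }
        int inner = nHistograms - 2;                       // equal-size intervals
        float width = (upperBound - lowerBound) / inner;
        metricValues[0] = Float.NEGATIVE_INFINITY;         // assumed outer endpoint
        for (int i = 0; i <= inner; i++) {
            metricValues[i + 1] = lowerBound + i * width;
        }
        metricValues[nHistograms - 1] = upperBound;        // avoid rounding drift
        metricValues[nHistograms] = Float.POSITIVE_INFINITY; // assumed outer endpoint

        int[] counts = new int[nHistograms];
        for (float v : values) {
            int bin;
            if (v < lowerBound) {
                bin = 0;                                   // below the lower bound
            } else if (v > upperBound) {
                bin = nHistograms - 1;                     // above the upper bound
            } else {
                // Values equal to upperBound go into the last inner interval.
                bin = 1 + Math.min(inner - 1, (int) ((v - lowerBound) / width));
            }
            counts[bin]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        float[] values = {-0.5f, 0.0f, 0.1f, 0.3f, 0.6f, 1.0f, 1.5f};
        float[] endpoints = new float[5];
        int[] counts = distribution(values, 0.0f, 1.0f, 4, endpoints);
        System.out.println(java.util.Arrays.toString(counts));
    }
}
```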

calcMetricDistribution

public int[] calcMetricDistribution(int descrIndex,
                                    int metricIndex,
                                    float lowerBound,
                                    float upperBound,
                                    int nHistograms,
                                    float[] metricValues,
                                    MDReader similarSetReader,
                                    MDReader dissimilarSetReader)
                             throws MDReaderException
Retrieves the distribution of the given metric from the dissimilarity values calculated by a screen using the given two molecular descriptor readers. The distribution is returned as the number of dissimilarities falling into each of the (nHistograms - 2) equal-size intervals between lowerBound and upperBound, plus two extra intervals: one for values lower than the given lower bound and one for values greater than the given upper bound. The i-th interval is defined as: [ metricValues[ i ], metricValues[ i + 1 ] ] .

Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
lowerBound - Lower bound for dissimilarity distribution
upperBound - Upper bound for dissimilarity distribution
nHistograms - Refinement of distribution: number of histograms (including the two extra histograms)
metricValues - Output parameter: must be allocated by the caller with length (nHistograms + 1); on return it contains the endpoints of the dissimilarity value intervals
similarSetReader - Reader of the test set of known similars
dissimilarSetReader - Reader of the set of target molecules (where similars are sought)
Returns:
Array of numbers of dissimilarity values falling into the intervals defined by metricValues
Throws:
MDReaderException