chemaxon.descriptors
Class MDGenerator

java.lang.Object
  extended by chemaxon.descriptors.MDGenerator
Direct Known Subclasses:
BCUTGenerator, CFGenerator, ECFPGenerator, PFGenerator, RFGenerator, ShapeGenerator

public abstract class MDGenerator
extends java.lang.Object

Base class for all kinds of MolecularDescriptor generators. Its purpose is twofold: (1) it defines the interface for all generator classes (that is, which methods they have to implement), and (2) it implements functions for gathering statistical data on the generated descriptors, together with functions for retrieving these statistics.

Since:
JChem 2.1
Author:
Miklos Vargyas

Field Summary
protected  boolean createStatistics
          indicates if statistical data has to be gathered during generation
protected  int[] density
           
protected  int[] freqCount
           
protected  int maxNonEmptyId
           
protected  float maxNonEmptyPercent
           
protected  int minNonEmptyId
           
protected  float minNonEmptyPercent
           
protected  int molCount
          number of molecules processed (one of the variables in which statistical data is collected)
protected  float sumNonEmptyPercent
           
 
Constructor Summary
MDGenerator()
          Creates a new generator object.
 
Method Summary
protected  int calcFreqCount(MolecularDescriptor d)
          Calculates the absolute per-cell frequency counts and stores them in freqCount[].
abstract  java.lang.String[] generate(Molecule m, MolecularDescriptor d)
          Generates the molecular descriptor for the given molecule.
 float getAverageNonZeroRatio()
          Gets the average percentage of non-zero cells, taking into account all descriptors generated since the generator was initialized.
 int getBrightestMolId()
          Gets the id of the molecule that had the maximum number of non-zero cells among all descriptors generated since the generator object was initialized.
 int getDarkestMolId()
          Gets the id of the molecule that had the minimum number of non-zero cells among all descriptors generated since the generator object was initialized.
 int[] getDensityCounts()
          Gets the array of bit density counts.
 int[] getFrequencyCounts()
          Gets the absolute frequency count array for all descriptors generated.
 float getMaximumBitRatio()
          Gets the maximum percentage of non-zero cells among the generated descriptors.
 float getMinimumBitRatio()
          Gets the minimum percentage of non-zero cells among the generated descriptors.
 int getMoleculeCount()
          Gets the number of molecules processed (that is, the number of descriptors generated) since the initialization of the object.
 void setCreateStatistics(boolean createStatistics)
          Toggles the create statistics flag.
protected  void updateStatistics(MolecularDescriptor d)
          Updates the statistics gathered on the generated fingerprints.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

createStatistics

protected boolean createStatistics
indicates if statistical data has to be gathered during generation


molCount

protected int molCount
number of molecules processed (one of the variables in which statistical data is collected)


minNonEmptyPercent

protected float minNonEmptyPercent

minNonEmptyId

protected int minNonEmptyId

maxNonEmptyPercent

protected float maxNonEmptyPercent

maxNonEmptyId

protected int maxNonEmptyId

sumNonEmptyPercent

protected float sumNonEmptyPercent

freqCount

protected int[] freqCount

density

protected int[] density
Constructor Detail

MDGenerator

public MDGenerator()
Creates a new generator object.

Method Detail

generate

public abstract java.lang.String[] generate(Molecule m,
                                            MolecularDescriptor d)
                                     throws MDGeneratorException
Generates the molecular descriptor for the given molecule. The supplied MolecularDescriptor is updated in place, so it has to be allocated and initialized by the client of this class before the call.

Parameters:
m - molecule for which the descriptor is created
d - the descriptor to be filled in with the generated values (allocated and initialized by the caller)
Returns:
names of tags (properties) added
Throws:
MDGeneratorException - in the case of any failures to generate the descriptor
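The in-place contract above (the caller allocates the descriptor, generate fills it and returns the names of any tags it added) can be illustrated without the JChem classes. This is a minimal sketch under stated assumptions: DescriptorSketch, GeneratorSketch and ToyGenerator are hypothetical stand-ins for MolecularDescriptor, MDGenerator and a real subclass, a SMILES string replaces Molecule, and the character-hashing scheme is purely illustrative, not any of the documented fingerprint algorithms.

```java
// Hypothetical stand-in for MolecularDescriptor: a fixed-length cell array
// that the caller allocates before calling generate().
final class DescriptorSketch {
    final int[] cells;

    DescriptorSketch(int length) {
        cells = new int[length];
    }
}

// Hypothetical stand-in for the abstract generate() contract of MDGenerator.
abstract class GeneratorSketch {
    // Fills d in place and returns the names of the tags (properties) added.
    abstract String[] generate(String smiles, DescriptorSketch d);
}

// Toy generator (hypothetical): for each input character, sets the cell
// selected by the character code modulo the descriptor length.
final class ToyGenerator extends GeneratorSketch {
    @Override
    String[] generate(String smiles, DescriptorSketch d) {
        if (smiles == null || smiles.isEmpty()) {
            // A real generator would throw MDGeneratorException here.
            throw new IllegalArgumentException("empty input");
        }
        for (int i = 0; i < d.cells.length; i++) {
            d.cells[i] = 0; // reset the caller-allocated descriptor
        }
        for (char c : smiles.toCharArray()) {
            d.cells[Math.floorMod(c, d.cells.length)] = 1;
        }
        return new String[0]; // this toy adds no tags
    }
}
```

For example, after allocating new DescriptorSketch(8) and calling new ToyGenerator().generate("CCO", d), only cells 3 and 7 are set, because 'C' is 67 and 'O' is 79, which are 3 and 7 modulo 8. The same descriptor object can be reused for the next molecule, which is the point of letting the client own the allocation.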

setCreateStatistics

public void setCreateStatistics(boolean createStatistics)
Toggles the create statistics flag.

Parameters:
createStatistics - new value for the create statistics flag
Since:
JChem 2.1

updateStatistics

protected void updateStatistics(MolecularDescriptor d)
Updates the statistics gathered on the generated fingerprints.

Parameters:
d - newly generated MolecularDescriptor
Since:
JChem 2.1

calcFreqCount

protected int calcFreqCount(MolecularDescriptor d)
Calculates the absolute per-cell frequency counts and stores them in freqCount[]. Also returns the number of non-zero cells in the descriptor.

Parameters:
d - descriptor in which non-zero cells should be counted
Returns:
number of non-zero cells
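The bookkeeping performed by updateStatistics and calcFreqCount can be pictured with the following sketch. It is not JChem's code: the class StatsSketch is hypothetical, descriptors are modelled as plain int[] cell arrays, the non-zero percentage is assumed to be 100 * nonZeroCells / length, molecule ids are assumed to be consecutive indexes from zero (as the getBrightestMolId documentation states), and ties are assumed to keep the earlier molecule. Only the field names mirror the documented protected fields.

```java
// Hypothetical sketch of the per-descriptor statistics bookkeeping; field
// names follow MDGenerator's protected fields, update rules are assumptions.
final class StatsSketch {
    int molCount = 0;                                 // descriptors seen so far
    float minNonEmptyPercent = Float.POSITIVE_INFINITY;
    int minNonEmptyId = -1;                           // "darkest" molecule
    float maxNonEmptyPercent = Float.NEGATIVE_INFINITY;
    int maxNonEmptyId = -1;                           // "brightest" molecule
    float sumNonEmptyPercent = 0f;
    final int[] freqCount;                            // per-cell non-zero counts

    StatsSketch(int descriptorLength) {
        freqCount = new int[descriptorLength];
    }

    // Like calcFreqCount: increments freqCount for every non-zero cell and
    // returns the number of non-zero cells.
    int calcFreqCount(int[] cells) {
        int nonZero = 0;
        for (int i = 0; i < cells.length; i++) {
            if (cells[i] != 0) {
                freqCount[i]++;
                nonZero++;
            }
        }
        return nonZero;
    }

    // Like updateStatistics: the id of a molecule is the number of
    // descriptors processed before it.
    void updateStatistics(int[] cells) {
        int id = molCount;
        float percent = 100f * calcFreqCount(cells) / cells.length;
        if (percent < minNonEmptyPercent) { // strict: ties keep the earlier id
            minNonEmptyPercent = percent;
            minNonEmptyId = id;
        }
        if (percent > maxNonEmptyPercent) {
            maxNonEmptyPercent = percent;
            maxNonEmptyId = id;
        }
        sumNonEmptyPercent += percent;
        molCount++;
    }

    // Average non-zero percentage over all descriptors seen (0 if none).
    float averageNonZeroPercent() {
        return molCount == 0 ? 0f : sumNonEmptyPercent / molCount;
    }
}
```

Feeding it three 4-cell descriptors with 1, 3 and 1 non-zero cells gives a darkest id of 0, a brightest id of 1 and an average of about 41.67 percent.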

getMoleculeCount

public int getMoleculeCount()
Gets the number of molecules processed (that is, the number of descriptors generated) since the initialization of the object.

Returns:
number of molecules processed

getAverageNonZeroRatio

public float getAverageNonZeroRatio()
Gets the average percentage of non-zero cells, taking into account all descriptors generated since the generator was initialized.

Returns:
relative number of bits set in descriptors

getMaximumBitRatio

public float getMaximumBitRatio()
Gets the maximum percentage of non-zero cells among the generated descriptors.

Returns:
maximum bits set, relative to descriptor length

getBrightestMolId

public int getBrightestMolId()
Gets the id of the molecule that had the maximum number of non-zero cells among all descriptors generated since the generator object was initialized.

Returns:
unique molecule identifier (a consecutive index from zero)

getMinimumBitRatio

public float getMinimumBitRatio()
Gets the minimum percentage of non-zero cells among the generated descriptors.

Returns:
minimum bits set, relative to descriptor length

getDarkestMolId

public int getDarkestMolId()
Gets the id of the molecule that had the minimum number of non-zero cells among all descriptors generated since the generator object was initialized.

Returns:
unique molecule identifier (a consecutive index from zero)

getDensityCounts

public int[] getDensityCounts()
Gets the array of bit density counts. The array can be indexed from 0 to 10. Element i holds the number of descriptors in which the ratio of non-zero cells is between 10 * i and 10 * i + 10 percent.

Returns:
array of density counts
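The bin arithmetic can be sketched with integer division. DensitySketch is a hypothetical helper, and reading "between" as the half-open range [10 * i, 10 * i + 10) is an assumption; under it, index 10 only counts descriptors whose cells are all non-zero, which would explain why the array has 11 entries.

```java
// Hypothetical sketch of the density histogram behind getDensityCounts().
// Assumption: bin i counts descriptors whose non-zero ratio lies in
// [10*i, 10*i + 10) percent, so bin 10 holds only exactly 100 percent.
final class DensitySketch {
    final int[] density = new int[11];

    // floor(10 * nonZero / length) equals floor(percent / 10); integer
    // arithmetic avoids floating-point rounding at the bin edges.
    static int binIndex(int nonZeroCells, int descriptorLength) {
        return 10 * nonZeroCells / descriptorLength;
    }

    // Records one descriptor in the histogram.
    void add(int nonZeroCells, int descriptorLength) {
        density[binIndex(nonZeroCells, descriptorLength)]++;
    }
}
```

For example, a 1024-cell descriptor with 300 non-zero cells (about 29.3 percent) lands in bin 2, while a fully set descriptor lands in bin 10.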

getFrequencyCounts

public int[] getFrequencyCounts()
Gets the absolute frequency count array for all descriptors generated. Each element of the array stores the number of descriptors in which the corresponding cell had a non-zero value.

Returns:
per-cell frequency count array
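Absolute counts are easier to compare across runs once they are divided by the number of descriptors generated. The helper below is hypothetical (not part of JChem) and assumes its inputs come from getFrequencyCounts() and getMoleculeCount(); it returns, for each cell, the fraction of descriptors in which that cell was non-zero, plus the index of the most frequently set cell.

```java
// Hypothetical helper, not part of the JChem API: post-processes the
// absolute per-cell counts reported by a statistics-enabled generator.
final class FrequencySketch {
    // Fraction of descriptors in which each cell was non-zero.
    static double[] relativeFrequencies(int[] freqCounts, int moleculeCount) {
        double[] rel = new double[freqCounts.length];
        if (moleculeCount == 0) {
            return rel; // no descriptors generated yet: all zeros
        }
        for (int i = 0; i < freqCounts.length; i++) {
            rel[i] = (double) freqCounts[i] / moleculeCount;
        }
        return rel;
    }

    // Index of the cell that was non-zero most often (earliest on ties),
    // or -1 for an empty array.
    static int mostFrequentCell(int[] freqCounts) {
        if (freqCounts.length == 0) {
            return -1;
        }
        int best = 0;
        for (int i = 1; i < freqCounts.length; i++) {
            if (freqCounts[i] > freqCounts[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

With counts {2, 0, 1} over 4 molecules, the relative frequencies are 0.5, 0.0 and 0.25.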