chemaxon.descriptors
Class MDSet

java.lang.Object
  extended by chemaxon.descriptors.MDSet

public class MDSet
extends java.lang.Object

MDSet combines several MolecularDescriptors into one entity. The purpose of this class is to allow dissimilarity calculations to be performed on several MolecularDescriptors simultaneously. Combining descriptors improves on the predictive power of the individual descriptors and is more efficient than comparing them one by one.
MDSet objects can be compared against each other by dissimilarity metrics. The dissimilarity coefficient is obtained as the weighted sum of the dissimilarity coefficients of the pair-wise comparison of components. Weights are stored in an MDSetParameters object, which this class aggregates.
MDSet instances are associated with (and calculated from) molecular structures. This connection between the original Molecule and its MDSet objects is preserved by the unique identifier of the molecule, which is also stored in the MDSet object.
Besides MolecularDescriptor components, an MDSet object can take an arbitrary number of external, user defined float values. Typically, these are calculated by third party software and stored in SDfile tags or database columns. These values are used in dissimilarity calculations but they are never modified.
Remark: the term Set is slightly misleading since the components constituting the MDSet are ordered. Tuple or Record would be more appropriate, though probably quite unusual in a cheminformatics context.
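The weighted sum described above can be illustrated with a minimal, self-contained sketch. It does not use any ChemAxon class: the class name WeightedDissimilaritySketch and the method combine are hypothetical, and the per-component dissimilarity values are assumed to have been computed already. It also shows why the component order matters, since each weight is matched to a component by index.

```java
/** Illustrative stand-in, not the ChemAxon implementation. */
public class WeightedDissimilaritySketch {

    /**
     * Combines per-component dissimilarity values into one coefficient
     * as a weighted sum. Index i of both arrays refers to the same component.
     */
    public static float combine(float[] componentDissimilarities, float[] weights) {
        if (componentDissimilarities.length != weights.length) {
            throw new IllegalArgumentException("one weight per component is required");
        }
        float sum = 0f;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * componentDissimilarities[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] dissimilarities = {0.5f, 0.25f}; // assumed component results
        float[] weights = {0.5f, 1.0f};          // assumed weights
        System.out.println(combine(dissimilarities, weights)); // prints 0.5
    }
}
```

Whether the real weights are normalized, and how user defined float values enter the sum, is not stated here, so the sketch leaves both out.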

Since:
JChem 2.0
Author:
Miklos Vargyas

Field Summary
 float dissim
          dissimilarity measured against another set
 
Constructor Summary
MDSet()
          Creates an empty MDSet object.
MDSet(int nComponents)
          Creates an empty MDSet object capable of storing a given number of MolecularDescriptor components.
MDSet(int nComponents, int nUserData)
          Creates an empty MDSet object capable of storing a given number of MolecularDescriptor components and the given number of user defined (external) data.
MDSet(MDSet c)
          Copy constructor.
 
Method Summary
 void addDescriptor(MolecularDescriptor descriptor)
          Appends the next component to the MDSet object.
 java.lang.Object clone()
          Clones the object.
 void generate(Molecule mol)
          Generates the MDSet from the given molecular structure.
 MolecularDescriptor getDescriptor(int index)
          Gets a specified component of the MDSet.
 float getDissimilarity(MDSet o)
          Calculates the dissimilarity between two MDSet objects.
 int getId()
          Gets the identifier of the MDSet.
 float getLowerBound(java.lang.Object o)
          Gives a lower bound estimation for the value of getDissimilarity( final Object o ).
 java.lang.String getNaturalId()
          Gets the natural identifier of the source Molecule of the MDSet.
 MDSetParameters getParameters()
          Gets the current parameter settings.
 float[] getUserData()
          Deprecated. since 2.3
 float getUserData(int index)
          Deprecated. since 2.3
static MDSet newInstance(java.lang.String[] componentTypes)
          Gets a new MDSet instance constituted of the specified components.
static MDSet newInstance(java.lang.String[] componentTypes, java.io.File[] params)
          Gets a new MDSet instance constituted of the specified components.
static MDSet newInstance(java.lang.String[] componentTypes, java.lang.String[] params)
          Gets a new MDSet instance constituted of the specified components.
 void setDescriptor(int componentIndex, MolecularDescriptor md)
          Sets a given component of the MDSet.
 void setDescriptors(MolecularDescriptor[] descriptors)
          Sets all components of the MDSet.
 void setId(int id)
          Sets the unique internal identifier of the MDSet object.
 void setNaturalId(java.lang.String id)
          Sets the natural identifier of the MDSet object.
 void setParameters(MDSetParameters params)
          Sets the parameters of the MDSet.
 void setSize(int nComponents)
          Sets the number of MolecularDescriptor components in the MDSet.
 void setSize(int nComponents, int nUserData)
          Sets the number of MolecularDescriptor components and the number of user defined (external) data in the MDSet.
 void setUserData(float[] userData)
          Deprecated. since 2.3
 void setUserData(int dataIndex, float userData)
          Deprecated. since 2.3
 int size()
          Gets the number of components constituting the MDSet.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

dissim

public float dissim
dissimilarity measured against another set

Constructor Detail

MDSet

public MDSet()
Creates an empty MDSet object. It can be initialized by calling setSize( int nComponents ) and setParameters( final MDSetParameters params ).


MDSet

public MDSet(MDSet c)
Copy constructor. Creates an identical object, in which components are cloned, but parameters are not cloned.
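The practical consequence of "components are cloned, but parameters are not cloned" can be sketched with hypothetical stand-in classes. Component, Params and DescriptorSet below are illustrative names only and are not the ChemAxon types; the sketch assumes that "not cloned" means the copy keeps a reference to the same parameter object.

```java
/** Hypothetical stand-ins illustrating "components cloned, parameters shared". */
public class CopySemanticsSketch {

    /** Stand-in for a descriptor component. */
    public static class Component implements Cloneable {
        public float value;
        public Component(float value) { this.value = value; }
        @Override public Component clone() {
            try {
                return (Component) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: Cloneable is implemented
            }
        }
    }

    /** Stand-in for the parameter object holding the weights. */
    public static class Params {
        public float[] weights;
        public Params(float... weights) { this.weights = weights; }
    }

    /** Stand-in for the set itself. */
    public static class DescriptorSet {
        public final Component[] components;
        public final Params params;

        public DescriptorSet(Component[] components, Params params) {
            this.components = components;
            this.params = params;
        }

        /** Copy constructor: clone each component, reuse the parameter reference. */
        public DescriptorSet(DescriptorSet other) {
            this.components = new Component[other.components.length];
            for (int i = 0; i < this.components.length; i++) {
                this.components[i] = other.components[i].clone();
            }
            this.params = other.params; // shared, not cloned
        }
    }

    public static void main(String[] args) {
        DescriptorSet a = new DescriptorSet(
                new Component[] { new Component(1f) }, new Params(0.5f));
        DescriptorSet b = new DescriptorSet(a);

        b.components[0].value = 2f;  // changes only the copy's component
        b.params.weights[0] = 0.9f;  // changes the shared parameter object

        System.out.println(a.components[0].value); // prints 1.0
        System.out.println(a.params.weights[0]);   // prints 0.9
    }
}
```

Under this reading, changing a weight through either object affects both, so a copy that needs independent weights would need its own parameter object.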

Parameters:
c - an MDSet object to be copied

MDSet

public MDSet(int nComponents)
Creates an empty MDSet object capable of storing a given number of MolecularDescriptor components. Components should be set by setDescriptor( int componentIndex, MolecularDescriptor md ) or appended by addDescriptor( MolecularDescriptor descriptor ).

Parameters:
nComponents - number of components in the MDSet object

MDSet

public MDSet(int nComponents,
             int nUserData)
Creates an empty MDSet object capable of storing a given number of MolecularDescriptor components and the given number of user defined (external) data. Components should be set by setDescriptor( int componentIndex, MolecularDescriptor md ) or appended by addDescriptor( MolecularDescriptor descriptor ).

Parameters:
nComponents - number of components in the MDSet object
nUserData - number of further floating point values

Method Detail

newInstance

public static MDSet newInstance(java.lang.String[] componentTypes)
Gets a new MDSet instance constituted of the specified components. MDSetParameters are set to default.

Parameters:
componentTypes - type names of the components
Returns:
a new object

newInstance

public static MDSet newInstance(java.lang.String[] componentTypes,
                                java.lang.String[] params)
Gets a new MDSet instance constituted of the specified components. Components are parametrized with the given parameter settings.

Parameters:
componentTypes - type names of the components
params - parameter strings
Returns:
a new object; or null, if the required class could not be instantiated

newInstance

public static MDSet newInstance(java.lang.String[] componentTypes,
                                java.io.File[] params)
Gets a new MDSet instance constituted of the specified components. Components are parametrized from the given parameter files.

Parameters:
componentTypes - type names of the components
params - parameter files
Returns:
a new object; or null, if the required class could not be instantiated

clone

public java.lang.Object clone()
Clones the object.

Overrides:
clone in class java.lang.Object
Returns:
a new, identical MDSet instance

setSize

public void setSize(int nComponents,
                    int nUserData)
Sets the number of MolecularDescriptor components and the number of user defined (external) data in the MDSet.

Parameters:
nComponents - number of components in the MDSet object
nUserData - number of further floating point values

setSize

public void setSize(int nComponents)
Sets the number of MolecularDescriptor components in the MDSet.

Parameters:
nComponents - number of components in the MDSet object

setId

public void setId(int id)
Sets the unique internal identifier of the MDSet object.

Parameters:
id - unique identifier

getId

public int getId()
Gets the identifier of the MDSet.

Returns:
the identifier

setNaturalId

public void setNaturalId(java.lang.String id)
Sets the natural identifier of the MDSet object. This identifier is taken from a Molecule (from an SDfile tag).

Parameters:
id - unique identifier

getNaturalId

public java.lang.String getNaturalId()
Gets the natural identifier of the source Molecule of the MDSet.

Returns:
the identifier

setParameters

public void setParameters(MDSetParameters params)
Sets the parameters of the MDSet. Note that this has no effect on the parameters of the individual MolecularDescriptor components in the MDSet.


getParameters

public MDSetParameters getParameters()
Gets the current parameter settings.

Returns:
the parameters of the MDSet

addDescriptor

public void addDescriptor(MolecularDescriptor descriptor)
Appends the next component to the MDSet object.

Parameters:
descriptor - the next component of the MDSet

setDescriptors

public void setDescriptors(MolecularDescriptor[] descriptors)
Sets all components of the MDSet.

Parameters:
descriptors - MDSet components, they are not cloned

setDescriptor

public void setDescriptor(int componentIndex,
                          MolecularDescriptor md)
Sets a given component of the MDSet.

Parameters:
componentIndex - index of the component to be set
md - the MolecularDescriptor type of the specified component

size

public int size()
Gets the number of components constituting the MDSet.

Returns:
number of components

getDescriptor

public MolecularDescriptor getDescriptor(int index)
Gets a specified component of the MDSet.

Parameters:
index - component index
Returns:
the selected component

generate

public void generate(Molecule mol)
              throws MDGeneratorException
Generates the MDSet from the given molecular structure.

Throws:
MDGeneratorException - when failed to generate one of the components

getDissimilarity

public float getDissimilarity(MDSet o)
Calculates the dissimilarity between two MDSet objects. The dissimilarity value is the weighted sum of the component-wise dissimilarity values.

Parameters:
o - the MDSet object to which this one is compared
Returns:
the dissimilarity coefficient calculated

getLowerBound

public float getLowerBound(java.lang.Object o)
Gives a lower bound estimation for the value of getDissimilarity( final Object o ). This method is implemented because the Clusterable interface requires it.

Parameters:
o - the MDSet object to which this one is compared. Its type is Object in order to implement the Clusterable interface.
Returns:
the lower bound estimation of the dissimilarity coefficient
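How the bound is computed or used by clustering code is not described here. One common use of such a bound, assuming it is cheaper to compute than the exact value, is to skip exact comparisons that cannot improve on the best result found so far. The sketch below shows that idea with hypothetical stand-ins (Point, nearest, exactCalls) and plain Euclidean distance; it is not the ChemAxon implementation.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative stand-in; it does not use or reproduce any ChemAxon class. */
public class LowerBoundPruningSketch {

    /** Counts exact computations so the effect of pruning is visible. */
    public static int exactCalls = 0;

    /** Hypothetical item with an exact dissimilarity and a cheaper lower bound. */
    public static class Point {
        final float x, y;
        public Point(float x, float y) { this.x = x; this.y = y; }

        /** Exact value: Euclidean distance. */
        public float dissimilarity(Point o) {
            exactCalls++;
            float dx = x - o.x, dy = y - o.y;
            return (float) Math.sqrt(dx * dx + dy * dy);
        }

        /** Lower bound: |dx| is mathematically never larger than the distance. */
        public float lowerBound(Point o) {
            return Math.abs(x - o.x);
        }
    }

    /** Returns the index of the candidate closest to the query, or -1 if none. */
    public static int nearest(Point query, List<Point> candidates) {
        int best = -1;
        float bestValue = Float.POSITIVE_INFINITY;
        for (int i = 0; i < candidates.size(); i++) {
            Point c = candidates.get(i);
            if (c.lowerBound(query) >= bestValue) {
                continue; // the exact value cannot beat the current best
            }
            float d = c.dissimilarity(query);
            if (d < bestValue) {
                bestValue = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Point> candidates = Arrays.asList(
                new Point(1f, 0f), new Point(5f, 0f), new Point(0.5f, 3f));
        System.out.println(nearest(new Point(0f, 0f), candidates)); // prints 0
        System.out.println(exactCalls);                             // prints 2
    }
}
```

In this run the second candidate is rejected by its bound alone, so only two of the three exact comparisons are carried out.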

setUserData

public void setUserData(float[] userData)
Deprecated. since 2.3

Sets all user defined float values in the MDSet.

Parameters:
userData - user defined floating point data values

setUserData

public void setUserData(int dataIndex,
                        float userData)
Deprecated. since 2.3

Sets a given user defined float value in the MDSet.

Parameters:
dataIndex - index of the data value to be set
userData - user defined floating point data value

getUserData

public float getUserData(int index)
Deprecated. since 2.3

Gets the value of a user defined data component.

Parameters:
index - data component index
Returns:
value of user defined data component

getUserData

public float[] getUserData()
Deprecated. since 2.3

Gets the value of all user defined data components.

Returns:
array of values of user defined data components