chemaxon.descriptors
Class GenerateMD

java.lang.Object
  extended by chemaxon.descriptors.GenerateMD

public class GenerateMD
extends java.lang.Object

GenerateMD provides a high-level Application Program Interface (API) with comprehensive functionality for generating various Molecular Descriptors. The API supports all kinds of inputs and outputs (molecule files, databases, descriptor files) and can generate multiple descriptors simultaneously.
Example of typical usage:

      // Load the parameter sets for the two descriptor types.
      CFParameters cfpConfig = new CFParameters( "jchem/examples/config/cfp.xml" );
      PFParameters pfpConfig = new PFParameters( "jchem/examples/config/pharma-frag.xml" );
      // Generate a descriptor set of two components from an SDfile.
      GenerateMD generator = new GenerateMD( 2 );
      generator.setInput( "molecules.sdf" );
      generator.setSDfileInput( true );
      generator.setDescriptor( 0, "molecules.cfp", "CF", cfpConfig, "" );
      generator.setDescriptor( 1, "molecules.pfp", "PF", pfpConfig, "" );
      // Initialize only after all setters have been called, then process and close.
      generator.init();
      generator.run();
      generator.close();
 

The above example generates two descriptors (a descriptor set) at the same time for every structure read from the input file molecules.sdf. The first component of the descriptor set is a chemical fingerprint configured from the parameter file jchem/examples/config/cfp.xml, while the second is a pharmacophore fingerprint configured by the jchem/examples/config/pharma-frag.xml configuration file.
The chemical and pharmacophore fingerprints generated are written into the files molecules.cfp and molecules.pfp, respectively.

This class provides no functionality other than transforming molecular structures retrieved from the input source into one or more descriptor files or database tables.
GenerateMD also serves as a command-line tool for generating Molecular Descriptors in batch mode.
Besides supporting all kinds of MolecularDescriptors implemented by ChemAxon, it is capable of generating arbitrary custom MolecularDescriptors (derived from the MolecularDescriptor class) implemented by users or third parties.
GenerateMD accepts various input sources: molecular files in many standard formats, and database tables (JChem structure tables). The MolecularDescriptors generated are stored in files in the case of file input, and in database tables (so-called MD tables) when input molecules are retrieved from a structure table. SDfile output stores the descriptors generated in a custom tag. It is also possible to produce MolecularDescriptor files that contain no structural information, only the descriptors in a readable format. Such files allow faster operation than SDfiles in further processing steps (for example, in virtual screening).

Since:
JChem 2.0
Author:
Miklos Vargyas, Peter Kovacs (pkovacs84), Adrian Kalaszi

Constructor Summary
GenerateMD()
          Creates an empty MolecularDescriptor generator object.
GenerateMD(int descriptorCount)
          Creates an object for generating the given number of different MolecularDescriptors (a molecular descriptor set, MDSet) simultaneously.
 
Method Summary
 void addMDConfig(java.lang.String descrName, java.lang.String configName, java.io.File configFile)
          Adds a new parameter configuration to the descriptor.
 void addMDConfig(java.lang.String descrName, java.lang.String configName, java.lang.String config)
          Adds a new parameter configuration to the descriptor.
 void close()
          Closes the generator along with all output files or the database connection.
 void createMDTable(java.lang.String descrName, java.lang.String className, java.lang.String settings, java.lang.String comment)
          Creates a database table to store the MolecularDescriptors generated.
 void deleteMDConfig(java.lang.String descrName, java.lang.String configName)
          Deletes an extension configuration.
 void deleteMDTable(java.lang.String descrName)
          Deletes a database table that stores molecular descriptors.
 int[] getASSBClusters()
           
 int getCounter()
          Gets the number of molecules processed since init() was called.
 java.lang.String[] getMDNames()
          Gets the names of all descriptor types stored in the database that are associated with the current structure table.
 java.lang.String getStatistics(int di)
          Gets statistical data on descriptors generated.
 void init()
          Initializes the generator object.
static void main(java.lang.String[] args)
          Command-line entry point to the MolecularDescriptor generator.
 void run()
          Processes all structures from the input source.
 void setBinaryOutput(boolean binaryOutput)
          Sets binary output format.
 void setConnectionHandler(ConnectionHandler connectionHandler)
          Sets the database connection when both structures and descriptors are stored in a database.
 void setCreateStat(boolean createStat)
          Toggles create statistics flag.
 void setDecimalOutput(boolean decimalOutput)
          Sets decimal output format.
 void setDescriptor(int index, java.lang.String name, java.lang.String type, MDParameters params, java.lang.String comment)
          Sets type, name, parameters and comment for a given descriptor component.
 void setDescriptor(int index, java.lang.String name, java.lang.String type, java.lang.String settings, java.lang.String comment)
          Sets type, name, parameters and comment for a given descriptor component.
 void setDescriptor(java.lang.String name, java.lang.String type, MDParameters params, java.lang.String comment)
          Sets type, name, parameters and comment for a given descriptor component.
 void setDescriptor(java.lang.String name, java.lang.String type, java.lang.String settings, java.lang.String comment)
          Sets the descriptor to be generated.
 void setDescriptors(java.lang.String[] names, java.lang.String[] types, MDParameters[] params, java.lang.String[] comments)
          Sets type, name, parameters and comment for all components of a molecular descriptor set.
 void setDescriptors(java.lang.String[] names, java.lang.String[] types, java.lang.String[] settings, java.lang.String[] comments)
          Sets all descriptor components to be generated simultaneously.
 void setGenerateId(int from)
          Toggles automatic unique structure/descriptor identifier generation mode and sets the value of the first unique identifier.
 void setIdTagName(java.lang.String idTagName)
          Sets the name of the input SDfile tag which contains unique structure identifiers.
 void setInput(java.io.InputStream input)
          Sets the input to an already opened molecular structure stream.
 void setInput(java.lang.String inputFileName)
          Sets the name of the input molecular structure file.
 void setOutputFileName(java.lang.String outputFileName)
          Sets the name of the output SDfile.
 void setSDfileInput(boolean sdfInput)
          Toggles input file type.
 void setSDfileOutput(boolean sdfOutput)
          Toggles SDfile output format.
 void setSelectStatement(java.lang.String whereClause)
          Sets the optional select statement for fetching molecules from the structure table.
 void setStructureTableName(java.lang.String structureTableName)
          Sets the name of the structure table to take molecular structures from.
 void setTagName(int index, java.lang.String name)
          Sets the SDfile tag name for the given descriptor set component.
 void setTagName(java.lang.String name)
          Sets the SDfile tag name for the only descriptor type generated.
 void setTagNames(java.lang.String[] names)
          Sets the SDfile tag names for all descriptor set components.
 void setUpdateOnInsert(boolean updateOnInsert)
          Sets/clears automatic update on insert mode.
 void setValidateDescriptor(java.lang.String activityTagName, double clusteringRadius, java.lang.String metric)
          Sets parameters for the Activity-seeded Structure-based clustering.
 boolean step()
          Fetches one structure from the input source and generates descriptors as specified before initialization by the setter methods.
 void updateMDTable(java.lang.String descrName)
          Systematically regenerates all descriptors.
 void validateDescriptor()
          Validates a descriptor by the activity-seeded structure-based clustering.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GenerateMD

public GenerateMD()
Creates an empty MolecularDescriptor generator object.


GenerateMD

public GenerateMD(int descriptorCount)
Creates an object for generating the given number of different MolecularDescriptors (a molecular descriptor set, MDSet) simultaneously.

Parameters:
descriptorCount - number of independent descriptor types to be generated
Method Detail

setConnectionHandler

public void setConnectionHandler(ConnectionHandler connectionHandler)
                          throws MDGeneratorException
Sets the database connection when both structures and descriptors are stored in a database.

Parameters:
connectionHandler - valid connection to a database
Throws:
MDGeneratorException - when attempting to call this method after init()

setStructureTableName

public void setStructureTableName(java.lang.String structureTableName)
                           throws MDGeneratorException,
                                  java.sql.SQLException
Sets the name of the structure table to take molecular structures from. Use this when input comes from a database.

Parameters:
structureTableName - name of the database table of input structures
Throws:
MDGeneratorException - when attempting to call this method after init() or when there is no valid database connection; or if descriptor validation option was selected beforehand
java.sql.SQLException - in the case of database management errors

setUpdateOnInsert

public void setUpdateOnInsert(boolean updateOnInsert)
Sets/clears automatic update on insert mode. Auto-update on insert means that the descriptor table is automatically updated when a new structure is inserted into the original structure table.

Parameters:
updateOnInsert - indicates auto-update mode
Since:
JChem 2.3

setSelectStatement

public void setSelectStatement(java.lang.String whereClause)
                        throws MDGeneratorException
Sets the optional select statement for fetching molecules from the structure table.

Parameters:
whereClause - restricting condition, given without the WHERE keyword
Throws:
MDGeneratorException - when attempting to call this method after init() or when there is no valid and alive database connection, or when no structure table name has been set
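
Since the condition is expected without the WHERE keyword, user-supplied text may need normalising before it is passed to setSelectStatement. The following is a minimal sketch of such a helper. The class WhereClauses and its method are hypothetical and not part of the JChem API, and the column name in the usage note is purely illustrative.

```java
/** Hypothetical helper, not part of the JChem API. */
final class WhereClauses {
    private WhereClauses() {
    }

    /**
     * Returns the condition with any leading WHERE keyword removed, so that
     * the result has the form described for the whereClause parameter.
     */
    static String stripWhereKeyword(String condition) {
        String trimmed = condition.trim();
        // Case-insensitive test for a leading "where".
        boolean startsWithWhere = trimmed.regionMatches(true, 0, "where", 0, 5);
        // Only treat it as the keyword if it is a whole word ("whereabouts" is not).
        boolean keywordEndsThere = trimmed.length() == 5
                || (trimmed.length() > 5 && Character.isWhitespace(trimmed.charAt(5)));
        if (startsWithWhere && keywordEndsThere) {
            return trimmed.substring(5).trim();
        }
        return trimmed;
    }
}
```

With this helper, a call such as generator.setSelectStatement( WhereClauses.stripWhereKeyword( "WHERE id < 100" ) ) would pass only the condition id < 100.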

setInput

public void setInput(java.lang.String inputFileName)
              throws MDGeneratorException,
                     java.io.IOException
Sets the name of the input molecular structure file.

Parameters:
inputFileName - name of the input file
Throws:
MDGeneratorException - when attempting to call this method after init() or when there is already a valid and alive database connection
java.io.IOException

setInput

public void setInput(java.io.InputStream input)
              throws MDGeneratorException
Sets the input to an already opened molecular structure stream.

Parameters:
input - an input stream
Throws:
MDGeneratorException - when attempting to call this method after init() or when there is already a valid and alive database connection

setSDfileInput

public void setSDfileInput(boolean sdfInput)
                    throws MDGeneratorException
Toggles input file type.

Parameters:
sdfInput - indicates whether the input file is an SDfile
Throws:
MDGeneratorException - when attempting to call this method after init() or when no input file has been specified

setOutputFileName

public void setOutputFileName(java.lang.String outputFileName)
                       throws MDGeneratorException
Sets the name of the output SDfile. Note that descriptor files required as output should not be specified here as output files; specify them as descriptor names instead.

Parameters:
outputFileName - name of the output SDfile
Throws:
MDGeneratorException - when attempting to call this method after init() or when there is already a valid and alive database connection

setSDfileOutput

public void setSDfileOutput(boolean sdfOutput)
Toggles SDfile output format.

Parameters:
sdfOutput - indicates if output file is an SDfile

setDecimalOutput

public void setDecimalOutput(boolean decimalOutput)
Sets decimal output format. This file format is recognized by JKlustor tools.

Since:
JChem 2.0.1

setBinaryOutput

public void setBinaryOutput(boolean binaryOutput)
Sets binary output format. This file format is recognized by JKlustor tools.

Since:
JChem 2.3

setIdTagName

public void setIdTagName(java.lang.String idTagName)
Sets the name of the input SDfile tag which contains unique structure identifiers. These identifiers are printed in each line of the decimal output format.

Parameters:
idTagName - SDfile structure identifier tag name
Since:
JChem 2.0.1

setValidateDescriptor

public void setValidateDescriptor(java.lang.String activityTagName,
                                  double clusteringRadius,
                                  java.lang.String metric)
                           throws MDGeneratorException
Sets parameters for the Activity-seeded Structure-based clustering.

Parameters:
activityTagName - name of the SDfile tag storing activity data
clusteringRadius - dissimilarity radius of a cluster around a seed
metric - metric used in clustering
Throws:
MDGeneratorException
Since:
JChem 2.3
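
To make the notion of a dissimilarity radius around a seed concrete, the sketch below collects the candidates whose dissimilarity from a seed fingerprint does not exceed a given radius. It is an illustration only: the class SeedClusterSketch is hypothetical, the Tanimoto dissimilarity on bit sets is just one possible metric, and treating the radius as inclusive is an assumption. None of this is taken from the GenerateMD implementation.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical illustration of a dissimilarity radius; not JChem code. */
final class SeedClusterSketch {
    private SeedClusterSketch() {
    }

    /** Tanimoto dissimilarity: 1 - |A and B| / |A or B| (0 for two empty sets). */
    static double tanimotoDissimilarity(BitSet a, BitSet b) {
        BitSet intersection = (BitSet) a.clone();
        intersection.and(b);
        BitSet union = (BitSet) a.clone();
        union.or(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        return 1.0 - (double) intersection.cardinality() / union.cardinality();
    }

    /** Returns the indices of candidates lying within the radius of the seed (inclusive, by assumption). */
    static List<Integer> membersWithinRadius(BitSet seed, List<BitSet> candidates, double radius) {
        List<Integer> members = new ArrayList<>();
        for (int i = 0; i < candidates.size(); i++) {
            if (tanimotoDissimilarity(seed, candidates.get(i)) <= radius) {
                members.add(i);
            }
        }
        return members;
    }
}
```

A smaller radius yields tighter clusters around each seed, while a larger one admits more dissimilar structures.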

setDescriptor

public void setDescriptor(java.lang.String name,
                          java.lang.String type,
                          java.lang.String settings,
                          java.lang.String comment)
                   throws MDGeneratorException
Sets the descriptor to be generated. Use this method when descriptors of only one type are generated (that is, the descriptor set has one component only).

Parameters:
name - user given name of the descriptor
type - type name of the descriptor (e.g. ChemicalFingerprint)
settings - parameter settings of the descriptor
comment - optional comment to be stored in database
Throws:
MDGeneratorException - when attempting to call this method after init() or when applying this method in multiple descriptor case

setDescriptor

public void setDescriptor(int index,
                          java.lang.String name,
                          java.lang.String type,
                          java.lang.String settings,
                          java.lang.String comment)
                   throws MDGeneratorException
Sets type, name, parameters and comment for a given descriptor component. Use this method when more than one descriptor is generated at a time (e.g. CF and PF simultaneously).

Parameters:
index - index of the component
name - user given name of the descriptor set component
type - type of the descriptor set component (e.g. ChemicalFingerprint)
settings - parameter settings for the descriptor
comment - optional comment to be stored in database
Throws:
MDGeneratorException - when attempting to call this method after init() or when another component's settings were specified with an MDParameters object (rather than a String)

setDescriptors

public void setDescriptors(java.lang.String[] names,
                           java.lang.String[] types,
                           java.lang.String[] settings,
                           java.lang.String[] comments)
                    throws MDGeneratorException
Sets all descriptor components to be generated simultaneously.

Parameters:
names - user given names of the descriptor set components
types - type names of the descriptors (e.g. ChemicalFingerprint)
settings - parameter settings for the descriptors
comments - optional comments to be stored in database
Throws:
MDGeneratorException - when attempting to call this method after init()
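
Because the components are passed as four parallel arrays, it can help to check that the arrays line up before calling setDescriptors. The sketch below shows one way to do this. The class DescriptorSetSpec is hypothetical, and the requirement that all four arrays have the same length is an assumption that is not stated in this documentation.

```java
/** Hypothetical pre-check for the parallel arrays passed to setDescriptors; not JChem code. */
final class DescriptorSetSpec {
    private DescriptorSetSpec() {
    }

    /**
     * Returns the number of descriptor set components, or throws if the
     * arrays do not all have the same length.
     */
    static int componentCount(String[] names, String[] types,
                              String[] settings, String[] comments) {
        int count = names.length;
        if (types.length != count || settings.length != count || comments.length != count) {
            throw new IllegalArgumentException(
                    "names, types, settings and comments must have the same length");
        }
        return count;
    }
}
```

The returned count could then be compared with the descriptorCount value given to the GenerateMD constructor.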

setDescriptor

public void setDescriptor(java.lang.String name,
                          java.lang.String type,
                          MDParameters params,
                          java.lang.String comment)
                   throws MDGeneratorException
Sets type, name, parameters and comment for a given descriptor component. Use this method when only one descriptor type is generated.

Parameters:
name - user given name of the descriptor
type - type name of the descriptor (e.g. ChemicalFingerprint)
params - parameter settings for the descriptor
comment - optional comment to be stored in database
Throws:
MDGeneratorException - when attempting to call this method after init() or when applying this method in multiple descriptor case

setDescriptor

public void setDescriptor(int index,
                          java.lang.String name,
                          java.lang.String type,
                          MDParameters params,
                          java.lang.String comment)
                   throws MDGeneratorException
Sets type, name, parameters and comment for a given descriptor component. Use this method when more than one descriptor is generated at a time and the components are not all specified in one call.

Parameters:
index - index of the component to be specified
name - user given name of the descriptor
type - type name of the descriptor (e.g. ChemicalFingerprint)
params - parameter settings of the descriptor
comment - optional comment to be stored in database only
Throws:
MDGeneratorException - when attempting to call this method after init() or when a previously set component was specified with a String parameter setting

setDescriptors

public void setDescriptors(java.lang.String[] names,
                           java.lang.String[] types,
                           MDParameters[] params,
                           java.lang.String[] comments)
                    throws MDGeneratorException
Sets type, name, parameters and comment for all components of a molecular descriptor set.

Parameters:
names - user given names of the descriptor components
types - type names of the descriptor components (e.g. ChemicalFingerprint)
params - parameter settings for the descriptor components
comments - optional comments to be stored in database
Throws:
MDGeneratorException - when attempting to call this method after init()

setTagName

public void setTagName(java.lang.String name)
                throws MDGeneratorException
Sets the SDfile tag name for the only descriptor type generated.

Parameters:
name - SDfile tag name
Throws:
MDGeneratorException - when attempting to call this method after init() or when applying this method in multiple descriptor case

setTagName

public void setTagName(int index,
                       java.lang.String name)
                throws MDGeneratorException
Sets the SDfile tag name for the given descriptor set component.

Parameters:
index - index of the component
name - SDfile tag name
Throws:
MDGeneratorException - when attempting to call this method after init()

setTagNames

public void setTagNames(java.lang.String[] names)
                 throws MDGeneratorException
Sets the SDfile tag names for all descriptor set components.

Parameters:
names - SDfile tag names
Throws:
MDGeneratorException - when attempting to call this method after init()

setGenerateId

public void setGenerateId(int from)
                   throws MDGeneratorException
Toggles automatic unique structure/descriptor identifier generation mode and sets the value of the first unique identifier.

Parameters:
from - the value of the first id to be generated
Throws:
MDGeneratorException - when attempting to call this method after init() or when attempting to generate IDs for database structures

setCreateStat

public void setCreateStat(boolean createStat)
Toggles create statistics flag.

Parameters:
createStat - new value for the create statistics flag

init

public void init()
          throws MDGeneratorException
Initializes the generator object. Call this method only after all features, modes and parameters have been set by the setter methods.

Throws:
MDGeneratorException - when attempting to initialize more than once; input/output (file creation and writing) and database (SQL) exceptions are also re-thrown
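
The configure-then-initialize contract described by the setter methods and by init() can be pictured with a small stand-in class. The sketch below is not the JChem implementation: LifecycleSketch is hypothetical, it models a single setter only, and it uses IllegalStateException in place of MDGeneratorException so that it compiles without the library.

```java
/** Hypothetical stand-in that mimics the documented configure-then-init contract; not JChem code. */
final class LifecycleSketch {
    private boolean initialized;
    private String inputFileName;

    /** Configuration is allowed only before init(). */
    void setInput(String inputFileName) {
        requireNotInitialized("setInput");
        this.inputFileName = inputFileName;
    }

    /** Initialization is allowed only once. */
    void init() {
        requireNotInitialized("init");
        initialized = true;
    }

    boolean isInitialized() {
        return initialized;
    }

    String getInputFileName() {
        return inputFileName;
    }

    private void requireNotInitialized(String method) {
        if (initialized) {
            throw new IllegalStateException(method + "() called after init()");
        }
    }
}
```

In this model, calling setInput or init a second time after a successful init() fails, which mirrors the ordering the documented methods require.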

step

public boolean step()
             throws MDGeneratorException
Fetches one structure from the input source and generates descriptors as specified before initialization by the setter methods.

Returns:
true if a structure was successfully processed
Throws:
MDGeneratorException - when not yet initialized, or on failure to read input or write output
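
Calling step() repeatedly, instead of run(), makes it possible to report progress or stop early. The loop below is a minimal sketch of that pattern. To stay compilable without the library it takes a java.util.function.BooleanSupplier as a stand-in for the step() call; the class StepLoop and its parameters are hypothetical. With the real class, the supplier would need to wrap generator.step() and handle MDGeneratorException.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical driver loop around a step()-like call; not JChem code. */
final class StepLoop {
    private StepLoop() {
    }

    /**
     * Calls step until it returns false and returns how many calls succeeded.
     * If reportEvery is positive, a progress line is printed after every
     * reportEvery successful steps.
     */
    static int processAll(BooleanSupplier step, int reportEvery) {
        int processed = 0;
        while (step.getAsBoolean()) {
            processed++;
            if (reportEvery > 0 && processed % reportEvery == 0) {
                System.out.println(processed + " structures processed");
            }
        }
        return processed;
    }
}
```

The count returned here is kept by the loop itself; with the real generator, getCounter() reports the number of molecules processed since init().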

getCounter

public int getCounter()
               throws MDGeneratorException
Gets the number of molecules processed since init() was called.

Returns:
number of structures processed
Throws:
MDGeneratorException - when not yet initialized

getASSBClusters

public int[] getASSBClusters()

run

public void run()
         throws MDGeneratorException
Processes all structures from the input source. Structures are retrieved from the input one by one, and all descriptor types set earlier (by the setter methods) are generated and stored in the specified output.

Throws:
MDGeneratorException - when not yet initialized, or on failure to read input or write output

getStatistics

public java.lang.String getStatistics(int di)
Gets statistical data on descriptors generated.

Parameters:
di - descriptor component index
Returns:
statistics in a formatted string
Since:
JChem 2.1

validateDescriptor

public void validateDescriptor()
Validates a descriptor by the activity-seeded structure-based clustering.


close

public void close()
           throws MDGeneratorException
Closes the generator along with all output files or the database connection.

Throws:
MDGeneratorException - when not yet initialized, or on failure to close output files

createMDTable

public void createMDTable(java.lang.String descrName,
                          java.lang.String className,
                          java.lang.String settings,
                          java.lang.String comment)
                   throws MDGeneratorException
Creates a database table to store the MolecularDescriptors generated. For advanced usage only: there is no need to call this method directly if descriptors are generated with the methods offered by this class.
The corresponding structure table's name should be set by setStructureTableName( String ) prior to calling this function.

Parameters:
descrName - symbolic name of the descriptor, given by the user
className - name of the class implementing the descriptor
settings - parameter string
comment - optional comment
Throws:
MDGeneratorException - when there is no valid database connection or an SQL error occurred

deleteMDTable

public void deleteMDTable(java.lang.String descrName)
                   throws MDGeneratorException,
                          java.sql.SQLException
Deletes a database table that stores molecular descriptors. All rows, the table itself and all corresponding administrative information are lost irreversibly.

Parameters:
descrName - name of the descriptor (as given by the user when created)
Throws:
MDGeneratorException - when there is no valid database connection
java.sql.SQLException - any database error

updateMDTable

public void updateMDTable(java.lang.String descrName)
                   throws MDGeneratorException,
                          java.sql.SQLException
Systematically regenerates all descriptors. Call this method when new structures have been added to the structure table.

Parameters:
descrName - name of the descriptor (as given by the user when created)
Throws:
MDGeneratorException - when there is no valid database connection
java.sql.SQLException - any database error

addMDConfig

public void addMDConfig(java.lang.String descrName,
                        java.lang.String configName,
                        java.lang.String config)
                 throws MDGeneratorException,
                        java.sql.SQLException
Adds a new parameter configuration to the descriptor. Such extra configurations, often called 'screening configurations', can extend or overwrite the parameter settings stored at creation time. A typical example is adding new dissimilarity metrics, optimized for a new active compound family, to the existing set of metrics.

Parameters:
descrName - name of the descriptor (as given by the user when created)
configName - symbolic name given by the user to help the identification of the extension configuration
config - extra configuration settings
Throws:
MDGeneratorException - when there is no valid database connection or when attempting to redefine an existing configuration
java.sql.SQLException - any database error

addMDConfig

public void addMDConfig(java.lang.String descrName,
                        java.lang.String configName,
                        java.io.File configFile)
                 throws java.sql.SQLException,
                        java.io.IOException,
                        MDGeneratorException
Adds a new parameter configuration to the descriptor. Such extra configurations, often called 'screening configurations', can extend or overwrite the parameter settings stored at creation time. A typical example is adding new dissimilarity metrics, optimized for a new active compound family, to the existing set of metrics.

Parameters:
descrName - name of the descriptor (as given by the user when created)
configName - symbolic name given by the user to help the identification of the extension configuration
configFile - file of extra configuration settings
Throws:
MDGeneratorException - when there is no valid database connection or when attempting to redefine an existing configuration
java.sql.SQLException - any database error
java.io.IOException

deleteMDConfig

public void deleteMDConfig(java.lang.String descrName,
                           java.lang.String configName)
                    throws MDGeneratorException,
                           java.sql.SQLException
Deletes an extension configuration.

Parameters:
descrName - name of the descriptor (as given by the user when created)
configName - symbolic name given by the user to help the identification of the extension configuration
Throws:
MDGeneratorException - when there is no valid database connection
java.sql.SQLException - any database error

getMDNames

public java.lang.String[] getMDNames()
                              throws MDGeneratorException,
                                     java.sql.SQLException
Gets the names of all descriptor types stored in the database that are associated with the current structure table.

Returns:
molecular descriptors' names
Throws:
MDGeneratorException - when there is no valid database connection
java.sql.SQLException - any database error

main

public static void main(java.lang.String[] args)
Command-line entry point to the MolecularDescriptor generator.

Parameters:
args - the command line arguments