also serves as a command-line tool for the batch generation
of Molecular Descriptors.
Besides supporting all kinds of MolecularDescriptors implemented by ChemAxon,
it is capable of generating arbitrary custom MolecularDescriptors (which
are derived from the MolecularDescriptor class) implemented by
users or third parties.
GenerateMD accepts various import sources: molecular files in
many standard formats, and database tables (JChem structure tables).
The MolecularDescriptors generated are stored in files in the case
of file input, and in database tables (so-called MD tables) when
input molecules are retrieved from a structure table. SDfile output stores
the descriptors generated in a custom tag. It is also possible to produce
MolecularDescriptor files that do not include any structural
information, only the descriptors in a readable format. Such files allow faster
operation than SDfiles in further processing steps (for example in
virtual screening).
- Since:
- JChem 2.0
- Author:
- Miklos Vargyas, Peter Kovacs (pkovacs84), Adrian Kalaszi
|
Constructor Summary |
GenerateMD()
Creates an empty MolecularDescriptor generator object. |
GenerateMD(int descriptorCount)
Creates an object for generating the given number of different
MolecularDescriptors (a molecular descriptor set, MDSet) simultaneously. |
|
Method Summary |
void |
addMDConfig(java.lang.String descrName,
java.lang.String configName,
java.io.File configFile)
Adds a new parameter configuration to the descriptor. |
void |
addMDConfig(java.lang.String descrName,
java.lang.String configName,
java.lang.String config)
Adds a new parameter configuration to the descriptor. |
void |
close()
Closes the generator, all output files or database connection. |
void |
createMDTable(java.lang.String descrName,
java.lang.String className,
java.lang.String settings,
java.lang.String comment)
Creates a database table to store the MolecularDescriptors
generated. |
void |
deleteMDConfig(java.lang.String descrName,
java.lang.String configName)
Deletes an extension configuration. |
void |
deleteMDTable(java.lang.String descrName)
Deletes a database table that stores molecular descriptors. |
int[] |
getASSBClusters()
|
int |
getCounter()
Gets the number of molecules processed since init() was
called. |
java.lang.String[] |
getMDNames()
Gets the names of all descriptor types stored in the database that are
associated with the current structure table. |
java.lang.String |
getStatistics(int di)
Gets statistical data on descriptors generated. |
void |
init()
Initializes the generator object. |
static void |
main(java.lang.String[] args)
Command-line entry point to the MolecularDescriptor generator. |
void |
run()
Processes all structures from the input source. |
void |
setBinaryOutput(boolean binaryOutput)
Sets binary output format. |
void |
setConnectionHandler(ConnectionHandler connectionHandler)
Sets the database connection when both structures and
descriptors are stored in a database. |
void |
setCreateStat(boolean createStat)
Toggles create statistics flag. |
void |
setDecimalOutput(boolean decimalOutput)
Sets decimal output format. |
void |
setDescriptor(int index,
java.lang.String name,
java.lang.String type,
MDParameters params,
java.lang.String comment)
Sets type, name, parameters and comment for a given descriptor component. |
void |
setDescriptor(int index,
java.lang.String name,
java.lang.String type,
java.lang.String settings,
java.lang.String comment)
Sets type, name, parameters and comment for a given descriptor component. |
void |
setDescriptor(java.lang.String name,
java.lang.String type,
MDParameters params,
java.lang.String comment)
Sets type, name, parameters and comment for a given descriptor component. |
void |
setDescriptor(java.lang.String name,
java.lang.String type,
java.lang.String settings,
java.lang.String comment)
Sets the descriptor to be generated. |
void |
setDescriptors(java.lang.String[] names,
java.lang.String[] types,
MDParameters[] params,
java.lang.String[] comments)
Sets type, name, parameters and comment for all components of a
molecular descriptor set. |
void |
setDescriptors(java.lang.String[] names,
java.lang.String[] types,
java.lang.String[] settings,
java.lang.String[] comments)
Sets all descriptor components to be generated simultaneously. |
void |
setGenerateId(int from)
Toggles automatic unique structure/descriptor identifier generation mode
and sets the value of the first unique identifier. |
void |
setIdTagName(java.lang.String idTagName)
Sets the name of the input SDfile tag which contains unique structure
identifiers. |
void |
setInput(java.io.InputStream input)
Sets the input to an already opened molecular structure stream. |
void |
setInput(java.lang.String inputFileName)
Sets the name of the input molecular structure file. |
void |
setOutputFileName(java.lang.String outputFileName)
Sets the name of the output SDfile. |
void |
setSDfileInput(boolean sdfInput)
Toggles input file type. |
void |
setSDfileOutput(boolean sdfOutput)
Toggles SDfile output format. |
void |
setSelectStatement(java.lang.String whereClause)
Sets the optional select statement for fetching molecules from the
structure table. |
void |
setStructureTableName(java.lang.String structureTableName)
Sets the name of the structure table to take molecular structures from. |
void |
setTagName(int index,
java.lang.String name)
Sets the SDfile tag name for the given descriptor set component. |
void |
setTagName(java.lang.String name)
Sets the SDfile tag name for the only descriptor type generated. |
void |
setTagNames(java.lang.String[] names)
Sets the SDfile tag names for all descriptor set components. |
void |
setUpdateOnInsert(boolean updateOnInsert)
Sets/clears automatic update on insert mode. |
void |
setValidateDescriptor(java.lang.String activityTagName,
double clusteringRadius,
java.lang.String metric)
Sets parameters for the Activity-seeded Structure-based clustering. |
boolean |
step()
Fetches one structure from the input source and generates descriptors
as specified before initialization by the setter methods. |
void |
updateMDTable(java.lang.String descrName)
Systematically regenerates all descriptors. |
void |
validateDescriptor()
Validates a descriptor by the activity-seeded structure-based clustering. |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
GenerateMD
public GenerateMD()
- Creates an empty MolecularDescriptor generator object.
GenerateMD
public GenerateMD(int descriptorCount)
- Creates an object for generating the given number of different
MolecularDescriptors (a molecular descriptor set, MDSet) simultaneously.
- Parameters:
descriptorCount - number of independent descriptor types to be
generated
setConnectionHandler
public void setConnectionHandler(ConnectionHandler connectionHandler)
throws MDGeneratorException
- Sets the database connection when both structures and
descriptors are stored in a database.
- Parameters:
connectionHandler - valid connection to a database
- Throws:
MDGeneratorException - when attempting to call this method
after init()
setStructureTableName
public void setStructureTableName(java.lang.String structureTableName)
throws MDGeneratorException,
java.sql.SQLException
- Sets the name of the structure table to take molecular structures from.
Use this when input comes from a database.
- Parameters:
structureTableName - name of the database table of input
structures
- Throws:
MDGeneratorException - when attempting to call this method
after init() or when
there is no valid database
connection; or if descriptor validation
option was selected beforehand
java.sql.SQLException - in the case of database management errors
setUpdateOnInsert
public void setUpdateOnInsert(boolean updateOnInsert)
- Sets/clears automatic update on insert mode. Auto-update on insert means
that the descriptor table is automatically updated when a new structure
is inserted into the original structure table.
- Parameters:
updateOnInsert - indicates auto-update mode
- Since:
- JChem 2.3
setSelectStatement
public void setSelectStatement(java.lang.String whereClause)
throws MDGeneratorException
- Sets the optional select statement for fetching molecules from the
structure table.
- Parameters:
whereClause - restriction clause without the WHERE
keyword
- Throws:
MDGeneratorException - when attempting to call this method
after init() or when
there is no valid and alive database
connection, or when no structure table
name has been set
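Because the whereClause parameter is expected without the WHERE keyword, callers that accept a complete clause from users may want to normalize it first. The following is a minimal sketch of such a helper. The class WhereClauseSketch, the method stripWhere and the column name mol_weight are hypothetical and are not part of the JChem API.

```java
public class WhereClauseSketch {
    /**
     * Returns the clause with a leading WHERE keyword removed, if present.
     * Hypothetical helper; the result is what would be passed to
     * setSelectStatement(String).
     */
    public static String stripWhere(String clause) {
        String trimmed = clause.trim();
        // Only treat "WHERE" as a keyword when it is followed by whitespace,
        // so that a column such as "where_col" is left untouched.
        boolean startsWithWhere = trimmed.length() > 5
                && trimmed.regionMatches(true, 0, "WHERE", 0, 5)
                && Character.isWhitespace(trimmed.charAt(5));
        return startsWithWhere ? trimmed.substring(5).trim() : trimmed;
    }

    public static void main(String[] args) {
        // "mol_weight" is an illustrative column name, not a documented one.
        System.out.println(stripWhere("WHERE mol_weight < 500"));
        System.out.println(stripWhere("mol_weight < 500"));
    }
}
```

Both calls in main yield the same bare restriction clause, which is the form the parameter description asks for.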
setInput
public void setInput(java.lang.String inputFileName)
throws MDGeneratorException,
java.io.IOException
- Sets the name of the input molecular structure file.
- Parameters:
inputFileName - name of the input file
- Throws:
MDGeneratorException - when attempting to call this method
after init() or when
there is already a valid and alive
database connection
java.io.IOException
setInput
public void setInput(java.io.InputStream input)
throws MDGeneratorException
- Sets the input to an already opened molecular structure stream.
- Parameters:
input - an input stream
- Throws:
MDGeneratorException - when attempting to call this method
after init() or when
there is already a valid and alive
database connection
setSDfileInput
public void setSDfileInput(boolean sdfInput)
throws MDGeneratorException
- Toggles input file type.
- Parameters:
sdfInput - indicates whether the input file is an SDfile
- Throws:
MDGeneratorException - when attempting to call this method
after init() or when no
input file has been specified
setOutputFileName
public void setOutputFileName(java.lang.String outputFileName)
throws MDGeneratorException
- Sets the name of the output SDfile. Note that if the
required output is one or more descriptor files, they should not
be specified as output files, but as descriptor names.
- Parameters:
outputFileName - name of the output SDfile
- Throws:
MDGeneratorException - when attempting to call this method
after init() or when
there is already a valid and alive
database connection
setSDfileOutput
public void setSDfileOutput(boolean sdfOutput)
- Toggles SDfile output format.
- Parameters:
sdfOutput - indicates if output file is an SDfile
setDecimalOutput
public void setDecimalOutput(boolean decimalOutput)
- Sets decimal output format. This file format is recognized by JKlustor
tools.
- Since:
- JChem 2.0.1
setBinaryOutput
public void setBinaryOutput(boolean binaryOutput)
- Sets binary output format. This file format is recognized by JKlustor
tools.
- Since:
- JChem 2.3
setIdTagName
public void setIdTagName(java.lang.String idTagName)
- Sets the name of the input SDfile tag which contains unique structure
identifiers. These identifiers are printed in each line of the decimal
output format.
- Parameters:
idTagName - SDfile structure identifier tag name
- Since:
- JChem 2.0.1
setValidateDescriptor
public void setValidateDescriptor(java.lang.String activityTagName,
double clusteringRadius,
java.lang.String metric)
throws MDGeneratorException
- Sets parameters for the Activity-seeded Structure-based clustering.
- Parameters:
activityTagName - name of the SDfile tag storing activity data
clusteringRadius - dissimilarity radius of a cluster around a seed
metric - metric used in clustering
- Throws:
MDGeneratorException
- Since:
- JChem 2.3
setDescriptor
public void setDescriptor(java.lang.String name,
java.lang.String type,
java.lang.String settings,
java.lang.String comment)
throws MDGeneratorException
- Sets the descriptor to be generated. Use this method when
descriptors of only one type are generated (that is, the descriptor set
has one component only).
- Parameters:
name - user given name of the descriptor
type - type name of the descriptor (e.g. ChemicalFingerprint)
settings - parameter settings of the descriptor
comment - optional comment to be stored in database
- Throws:
MDGeneratorException - when attempting to call this method
after init() or when
applying this method in multiple
descriptor case
setDescriptor
public void setDescriptor(int index,
java.lang.String name,
java.lang.String type,
java.lang.String settings,
java.lang.String comment)
throws MDGeneratorException
- Sets type, name, parameters and comment for a given descriptor component.
Use this method when more than one descriptor is generated at a time
(e.g. CF and PF simultaneously).
- Parameters:
index - index of the component
name - user given name of the descriptor set component
type - type of the descriptor set component (e.g. ChemicalFingerprint)
settings - parameter settings for the descriptor
comment - optional comment to be stored in database
- Throws:
MDGeneratorException - when attempting to call this method
after init() or when
another component's settings were
specified with an MDParameters
object (rather than a String)
setDescriptors
public void setDescriptors(java.lang.String[] names,
java.lang.String[] types,
java.lang.String[] settings,
java.lang.String[] comments)
throws MDGeneratorException
- Sets all descriptor components to be generated simultaneously.
- Parameters:
names - user given names of the descriptor set components
types - type names of the descriptors (e.g. ChemicalFingerprint)
settings - parameter settings for the descriptors
comments - optional comments to be stored in database
- Throws:
MDGeneratorException - when attempting to call this method
after init()
setDescriptor
public void setDescriptor(java.lang.String name,
java.lang.String type,
MDParameters params,
java.lang.String comment)
throws MDGeneratorException
- Sets type, name, parameters and comment for a given descriptor component.
Use this method when only one descriptor type is generated.
- Parameters:
name - user given name of the descriptor
type - type name of the descriptor (e.g. ChemicalFingerprint)
params - parameter settings for the descriptor
comment - optional comment to be stored in database
- Throws:
MDGeneratorException - when attempting to call this method
after init() or when
applying this method in multiple
descriptor case
setDescriptor
public void setDescriptor(int index,
java.lang.String name,
java.lang.String type,
MDParameters params,
java.lang.String comment)
throws MDGeneratorException
- Sets type, name, parameters and comment for a given descriptor component.
Use this method when more than one descriptor is generated at a time
and they are not all specified in one go.
- Parameters:
index - index of the component to be specified
name - user given name of the descriptor
type - type name of the descriptor (e.g. ChemicalFingerprint)
params - parameter settings of the descriptor
comment - optional comment to be stored in database only
- Throws:
MDGeneratorException - when attempting to call this method
after init() or when
a previously set component was specified
with a String parameter
setting
setDescriptors
public void setDescriptors(java.lang.String[] names,
java.lang.String[] types,
MDParameters[] params,
java.lang.String[] comments)
throws MDGeneratorException
- Sets type, name, parameters and comment for all components of a
molecular descriptor set.
- Parameters:
names - user given names of the descriptor components
types - type names of the descriptor components (e.g. ChemicalFingerprint)
params - parameter settings for the descriptor components
comments - optional comments to be stored in database
- Throws:
MDGeneratorException - when attempting to call this method
after init()
setTagName
public void setTagName(java.lang.String name)
throws MDGeneratorException
- Sets the SDfile tag name for the only descriptor type generated.
- Parameters:
name - SDfile tag name
- Throws:
MDGeneratorException - when attempting to call this method
after init() or when
applying this method in multiple
descriptor case
setTagName
public void setTagName(int index,
java.lang.String name)
throws MDGeneratorException
- Sets the SDfile tag name for the given descriptor set component.
- Parameters:
index - index of the component
name - SDfile tag name
- Throws:
MDGeneratorException - when attempting to call this method
after init()
setTagNames
public void setTagNames(java.lang.String[] names)
throws MDGeneratorException
- Sets the SDfile tag names for all descriptor set components.
- Parameters:
names - SDfile tag names
- Throws:
MDGeneratorException - when attempting to call this method
after init()
setGenerateId
public void setGenerateId(int from)
throws MDGeneratorException
- Toggles automatic unique structure/descriptor identifier generation mode
and sets the value of the first unique identifier.
- Parameters:
from - the value of the first id to be generated
- Throws:
MDGeneratorException - when attempting to call this method
after init() or when
attempting to generate IDs for database
structures
setCreateStat
public void setCreateStat(boolean createStat)
- Toggles create statistics flag.
- Parameters:
createStat - new value for the create statistics flag
init
public void init()
throws MDGeneratorException
- Initializes the generator object. Call this method only after all
features, modes and parameters have been set by the setter methods.
- Throws:
MDGeneratorException - when attempting to initialize once again;
all input/output (file creation and
writing) and all database (SQL)
exceptions are also re-thrown as this type
step
public boolean step()
throws MDGeneratorException
- Fetches one structure from the input source and generates descriptors
as specified before initialization by the setter methods.
- Returns:
- true if a structure was successfully processed
- Throws:
MDGeneratorException - when not yet initialized or failure
to read input or write output
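The call order described for init(), step(), getCounter() and close() can be sketched as a simple loop. The sketch below does not use JChem: Generator is a hypothetical stand-in interface that mirrors only the documented method names, and FakeGenerator is an in-memory fake. It assumes that step() returns false once the input is exhausted, and it uses unchecked exceptions where the real methods declare MDGeneratorException.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class StepLoopSketch {
    /** Hypothetical stand-in mirroring the documented init/step/getCounter/close methods. */
    public interface Generator {
        void init();
        boolean step();
        int getCounter();
        void close();
    }

    /** In-memory fake: consumes one string per step instead of generating descriptors. */
    public static final class FakeGenerator implements Generator {
        private final List<String> structures;
        private Iterator<String> cursor; // null until init() has been called
        private int counter;
        private boolean closed;

        public FakeGenerator(List<String> structures) {
            this.structures = structures;
        }

        public void init() {
            if (cursor != null) {
                throw new IllegalStateException("already initialized");
            }
            cursor = structures.iterator();
        }

        public boolean step() {
            if (cursor == null) {
                throw new IllegalStateException("not yet initialized");
            }
            if (!cursor.hasNext()) {
                return false; // assumed end-of-input signal
            }
            cursor.next();
            counter++;
            return true;
        }

        public int getCounter() {
            if (cursor == null) {
                throw new IllegalStateException("not yet initialized");
            }
            return counter;
        }

        public void close() {
            closed = true;
        }

        public boolean isClosed() {
            return closed;
        }
    }

    /** Initializes, steps until no structure is left, and always closes the generator. */
    public static int processAll(Generator generator) {
        generator.init();
        try {
            while (generator.step()) {
                // one structure is processed per call
            }
            return generator.getCounter();
        } finally {
            generator.close();
        }
    }

    public static void main(String[] args) {
        FakeGenerator fake = new FakeGenerator(Arrays.asList("CCO", "c1ccccc1", "CC(=O)O"));
        System.out.println(processAll(fake));
    }
}
```

Stepping manually in this way is useful when progress reporting or cancellation is needed between structures; when neither is required, run() covers the whole input in a single call.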
getCounter
public int getCounter()
throws MDGeneratorException
- Gets the number of molecules processed since
init() was
called.
- Returns:
- number of structures processed
- Throws:
MDGeneratorException - when not yet initialized
getASSBClusters
public int[] getASSBClusters()
run
public void run()
throws MDGeneratorException
- Processes all structures from the input source. Structures from the input
are retrieved one by one and all descriptor types set earlier (by the
set methods) are generated and stored in the specified output.
- Throws:
MDGeneratorException - not yet initialized or failed
to read input or write output
getStatistics
public java.lang.String getStatistics(int di)
- Gets statistical data on descriptors generated.
- Parameters:
di - descriptor component index
- Returns:
- statistics in a formatted string
- Since:
- JChem 2.1
validateDescriptor
public void validateDescriptor()
- Validates a descriptor by the activity-seeded structure-based clustering.
close
public void close()
throws MDGeneratorException
- Closes the generator, all output files or database connection.
- Throws:
MDGeneratorException - when not yet initialized or failed
to close output files
createMDTable
public void createMDTable(java.lang.String descrName,
java.lang.String className,
java.lang.String settings,
java.lang.String comment)
throws MDGeneratorException
- Creates a database table to store the MolecularDescriptors
generated. This method is for advanced usage only: there is no need to
call it directly if descriptors are generated with the methods offered by
this class.
The corresponding structure table's name should be set by
setStructureTableName( String ) prior to calling
this function.
- Parameters:
descrName - symbolic name of the descriptor, given by the user
className - name of the class implementing the descriptor
settings - parameter string
comment - optional comment
- Throws:
MDGeneratorException - when there is no valid database
connection or an SQL error occurred
deleteMDTable
public void deleteMDTable(java.lang.String descrName)
throws MDGeneratorException,
java.sql.SQLException
- Deletes a database table that stores molecular descriptors. All rows,
the table and all corresponding administrative information are lost
irreversibly.
- Parameters:
descrName - name of the descriptor (as given by the user when
created)
- Throws:
MDGeneratorException - when there is no valid database
java.sql.SQLException - any database error
updateMDTable
public void updateMDTable(java.lang.String descrName)
throws MDGeneratorException,
java.sql.SQLException
- Systematically regenerates all descriptors. Call this method when new
structures are added to the structure table.
- Parameters:
descrName - name of the descriptor (as given by the user when
created)
- Throws:
MDGeneratorException - when there is no valid database
java.sql.SQLException - any database error
addMDConfig
public void addMDConfig(java.lang.String descrName,
java.lang.String configName,
java.lang.String config)
throws MDGeneratorException,
java.sql.SQLException
- Adds a new parameter configuration to the descriptor. Such extra
configurations, often called 'screening configurations', can extend
or overwrite parameter settings stored at the time of creation. A typical
example is adding new dissimilarity metrics optimized for a new active
compound family to the existing set of metrics.
- Parameters:
descrName - name of the descriptor (as given by the user when
created)
configName - symbolic name given by the user to help the
identification of the extension configuration
config - extra configuration settings
- Throws:
MDGeneratorException - when there is no valid database or an
existing configuration is attempted to
be redefined
java.sql.SQLException - any database error
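The rule that an existing configuration cannot be redefined can be modeled in memory as follows. This is only a sketch of the contract, not of how JChem stores configurations in the database: ConfigRegistrySketch and its methods are hypothetical, the configuration strings are placeholders rather than valid settings, and an unchecked exception stands in for MDGeneratorException.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public class ConfigRegistrySketch {
    // Configuration name -> settings text, kept in insertion order.
    private final Map<String, String> configs = new LinkedHashMap<>();

    /** Adds a configuration; rejects a name that is already defined. */
    public void add(String configName, String config) {
        Objects.requireNonNull(configName, "configName");
        Objects.requireNonNull(config, "config");
        // putIfAbsent returns the existing value when the name is taken.
        if (configs.putIfAbsent(configName, config) != null) {
            throw new IllegalStateException(
                    "configuration already defined: " + configName);
        }
    }

    /** Deletes a configuration; returns true if it existed. */
    public boolean delete(String configName) {
        return configs.remove(configName) != null;
    }

    public int size() {
        return configs.size();
    }

    public static void main(String[] args) {
        ConfigRegistrySketch registry = new ConfigRegistrySketch();
        registry.add("screening-a", "placeholder settings 1");
        try {
            registry.add("screening-a", "placeholder settings 2");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        // Replacing a configuration means deleting it first, then adding it again.
        registry.delete("screening-a");
        registry.add("screening-a", "placeholder settings 2");
        System.out.println(registry.size());
    }
}
```

Under this model, changing a configuration is a delete followed by an add, which matches the pairing of addMDConfig with deleteMDConfig in this class.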
addMDConfig
public void addMDConfig(java.lang.String descrName,
java.lang.String configName,
java.io.File configFile)
throws java.sql.SQLException,
java.io.IOException,
MDGeneratorException
- Adds a new parameter configuration to the descriptor. Such extra
configurations, often called 'screening configurations', can extend
or overwrite parameter settings stored at the time of creation. A typical
example is adding new dissimilarity metrics optimized for a new active
compound family to the existing set of metrics.
- Parameters:
descrName - name of the descriptor (as given by the user when
created)
configName - symbolic name given by the user to help the
identification of the extension configuration
configFile - file of extra configuration settings
- Throws:
MDGeneratorException - when there is no valid database or an
existing configuration is attempted to
be redefined
java.sql.SQLException - any database error
java.io.IOException
deleteMDConfig
public void deleteMDConfig(java.lang.String descrName,
java.lang.String configName)
throws MDGeneratorException,
java.sql.SQLException
- Deletes an extension configuration.
- Parameters:
descrName - name of the descriptor (as given by the user when
created)
configName - symbolic name given by the user to help the
identification of the extension configuration
- Throws:
MDGeneratorException - when there is no valid database
java.sql.SQLException - any database error
getMDNames
public java.lang.String[] getMDNames()
throws MDGeneratorException,
java.sql.SQLException
- Gets the names of all descriptor types stored in the database that are
associated with the current structure table.
- Returns:
- molecular descriptors' names
- Throws:
MDGeneratorException - when there is no valid database
java.sql.SQLException - any database error
main
public static void main(java.lang.String[] args)
- Command-line entry point to the MolecularDescriptor generator.
- Parameters:
args - the command line arguments