Chemical Terms Evaluator

Version 5.0.6

 

Introduction

The Chemical Terms Evaluator is designed to evaluate mathematical expressions on molecules. These expressions usually have a chemical meaning formulated in ChemAxon's Chemical Terms Language using built-in chemical and general purpose functions. It is also possible to extend this built-in set of calculations by a user-defined configuration.

Apart from evaluating Chemical Terms with the evaluate command line tool, this evaluation mechanism is used for chemical calculations in ChemAxon products wherever computational or search conditions are involved: pharmacophore feature identification (PMapper; note that PMapper feature definitions use a specific syntax), reaction definitions (Reactor), and database filters and chemical calculations (JChem Cartridge).

The heart of the evaluator mechanism is the JEP Java Expression Parser.

You may want to look at the complete language reference including a description of the expression syntax and some simple examples showing how some well-known chemical rules can be formulated in this language.

The Evaluator uses the molecule context to set the input molecule, so calculations refer to the input molecule by default. The language reference also includes a set of Evaluator examples, and a set of working examples is available.

 

Configuration

The configuration file is an XML file containing some/all of the following optional subsections:

  1. Evaluator parameters: this section specifies general evaluator parameters; currently the cache mode can be set here
  2. Molecule constant definitions: this section defines the molecule structures that can be referenced in the expression strings, e.g. as query molecules in matching conditions
  3. Matching conditions: this section specifies the reference ID of the substructure matching function and its search options when they differ from the default substructure search settings (these override the default matching condition)
  4. Plugin definitions: this section describes the plugins and their parameters that can be referenced from the expression (these override the default plugin definitions)
  5. Function definitions: this section describes the predefined and user-defined functions that can be referenced from the expression (these override the default function definitions)
  6. Standardization: this section specifies the standardization of molecules. Actions whose Groups attribute is set to "target" are processed for input molecules but skipped for the molecule constants (queries) referenced in the configuration XML

Evaluator Parameters

The evaluator parameter section currently sets the cache mode (the Cached attribute in the example below): if it is set to "true", matching condition and plugin calculation results are cached in the molecule object and reused instead of repeating the same structure search or chemical calculation. The default is "false", because a typical Chemical Terms evaluation does not reference the same matching condition or calculation more than once, and caching itself has some overhead.

Example:

<Params Cached="true"/>
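
The effect of cache mode can be pictured with a small Python sketch. This is not ChemAxon's implementation: the Molecule class, the calculate function and the placeholder calculation are hypothetical names, used only to show results being stored on the molecule object and reused.

```python
# Hypothetical sketch of cache mode: results are stored on the molecule
# object and reused when the same calculation is referenced again.

class Molecule:
    def __init__(self, smiles):
        self.smiles = smiles
        self.cache = {}          # calculation name -> stored result


calls = []                       # records every real (uncached) calculation


def expensive_calculation(mol, name):
    """Stand-in for a plugin calculation or a structure search."""
    calls.append(name)
    return len(mol.smiles)       # placeholder value, not real chemistry


def calculate(mol, name, cached):
    """Run a calculation, consulting the per-molecule cache if enabled."""
    if not cached:
        return expensive_calculation(mol, name)
    if name not in mol.cache:
        mol.cache[name] = expensive_calculation(mol, name)
    return mol.cache[name]


mol = Molecule("CCO")
calculate(mol, "logp", cached=True)
calculate(mol, "logp", cached=True)    # served from mol.cache
cached_calls = len(calls)

calls.clear()
calculate(mol, "logp", cached=False)
calculate(mol, "logp", cached=False)   # computed again
uncached_calls = len(calls)

print(cached_calls, uncached_calls)    # prints "1 2"
```

With a single reference to each calculation, the cached path only adds the dictionary lookup and storage, which is the overhead the default setting avoids.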

Molecule Constant Definitions

The molecule constant definition section contains the definitions of the molecular structures used as query structures in search conditions. Each molecular structure definition is given in a separate <Mol> section. The reference name of the molecular structure is given in the ID attribute. The structure itself is specified either inline as a SMILES string or as the path to a file containing the molecular structure in any of the recognized formats (for instance MDL molfile, Compressed molfile, SDfile, Compressed SDfile, SMILES). One of these two forms is required for each molecular structure.

Example:

<Mols>
    <Mol ID="pos" Structure="../PD/Positive.mol"/>
    <Mol ID="amine" Structure="../PD/Amine.mol"/>
    <Mol ID="amide" Structure="../PD/Amide.mol"/>
    <Mol ID="nitro" Structure="../PD/Nitro.mol"/>
    <Mol ID="hydrazide" Structure="../PD/Hydrazide.mol"/>
    <Mol ID="amidine" Structure="../PD/Amidine.mol"/>
    <Mol ID="carboxyl" Structure="[OH:2]C=[O:1]"/>
    <Mol ID="qh" Structure="[!#1!#6:1][H]"/>
</Mols>

The SMILES strings specified in the above XML configuration file describe the molecular structures shown in the following figure.

Figure: Molecule constant examples
 

Matching Conditions

The matching condition declaration enables the Match function to be used in expression strings. This function performs substructure search and optionally checks for atom matching.

Declaration

The declaration gives a reference ID to the function and can specify the search attributes when they differ from the default settings. Each attribute is optional; if an attribute is omitted, its default value is used. For a detailed description of the search options see the JChem Query Guide.

Search attributes that can be set in the Search section:

Attribute                     Range            Default Value
StereoSearch                  true/false       true
DoubleBondStereoMatchingMode  none/marked/all  marked
SubgraphSearch                true/false       true
ExactAtomMatching             true/false       false
ExactStereoMatching           true/false       false
OrderSensitiveSearch          true/false       false

Example:

<Matching ID="match">
    <Search DoubleBondStereoMatchingMode="all" OrderSensitiveSearch="true"/>
</Matching>
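
How omitted attributes fall back to their defaults can be sketched in Python. The function name effective_search_options is hypothetical, and the rejection of unknown attribute names is an assumption of this sketch rather than documented Evaluator behavior. The default values are copied from the attribute table above.

```python
import xml.etree.ElementTree as ET

# Defaults copied from the search attribute table above.
DEFAULT_SEARCH_OPTIONS = {
    "StereoSearch": "true",
    "DoubleBondStereoMatchingMode": "marked",
    "SubgraphSearch": "true",
    "ExactAtomMatching": "false",
    "ExactStereoMatching": "false",
    "OrderSensitiveSearch": "false",
}


def effective_search_options(matching_xml):
    """Return (reference ID, search options) for one <Matching> section."""
    matching = ET.fromstring(matching_xml.strip())
    options = dict(DEFAULT_SEARCH_OPTIONS)
    search = matching.find("Search")
    if search is not None:
        unknown = set(search.attrib) - set(DEFAULT_SEARCH_OPTIONS)
        if unknown:
            # Assumption of this sketch: unknown names are treated as errors.
            raise ValueError(f"unknown search attributes: {sorted(unknown)}")
        options.update(search.attrib)   # explicit values override defaults
    return matching.get("ID"), options


MATCHING_XML = """
<Matching ID="match">
    <Search DoubleBondStereoMatchingMode="all" OrderSensitiveSearch="true"/>
</Matching>
"""

match_id, options = effective_search_options(MATCHING_XML)
for name, value in options.items():
    print(f"{name} = {value}")
```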

A detailed description of the usage of the match function in expression strings is given below. A table of match function descriptions with examples is also available as a short reference.

Plugin Definitions

The plugin declarations enable different structure-based chemical calculations (e.g. pKa, logP, logD) to be referenced in the expression strings.

Declaration

The plugin definition section contains the following data for each plugin reference that is to be used in the expressions:

  1. the plugin name by which the plugin is referenced in the expression,
  2. the plugin JAR, relative to the $JCHEMHOME/plugins directory, from which the plugin class should be loaded (optional: the class is loaded from the usual CLASSPATH if omitted),
  3. the plugin Java class which wraps the plugin calculation into a prescribed frame (see Writing a Custom Plugin for details on how to wrap a calculation into a plugin),
  4. the plugin parameters as parameter name-value pairs (optional: the default plugin parameters are used if omitted).

The set of possible plugin parameters and a short description for each plugin can be seen with the help of the cxcalc program:

cxcalc <plugin> -h

where plugin is the plugin ID in the cxcalc configuration file. The parameter names used by the Evaluator are the long command line parameter names without the leading '--' double dashes. For example, for the pKa calculation, type:

cxcalc pka -h

which prints out the following help text:

Calculator plugin: pka.
pKa calculation.
 
Usage:
  cxcalc [general options] [input files] pka
[pka options] [input files]
 
pka options:
  -h, --help       this help message
  -p, --precision  <floating point precision as number of
                   fractional digits: 0-8 or inf> default: 2
  -t, --type       [pKa|acidic|basic] (default: pKa)
  -m, --mode       [macro|micro] (default: macro)
  -n, --ions       max number of ionizable atoms to be considered (default: 8)
  -i, --min        min basic pKa (default: -10)
  -x, --max        max acidic pKa (default: 20)
  -a, --na         number of acidic pKa values displayed (default: 2)
  -b, --nb         number of basic pKa values displayed (default: 2)

The help, precision, na and nb parameters refer to display options, therefore these are not used by the Evaluator. Thus the parameter set for the pKa calculation in our case is:

type, mode, ions, min, max.
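
The naming rule can be sketched in Python: strip the leading double dashes from the long option names and drop the display-only options. The helper names evaluator_param_names and plugin_element are hypothetical, and the display-only set below applies to the pKa help text shown above only.

```python
import xml.etree.ElementTree as ET

# Long option names from the "cxcalc pka -h" output shown above.
PKA_LONG_OPTIONS = ["--help", "--precision", "--type", "--mode",
                    "--ions", "--min", "--max", "--na", "--nb"]

# Display-only options of the pKa calculation, not used by the Evaluator.
DISPLAY_ONLY = {"help", "precision", "na", "nb"}


def evaluator_param_names(long_options):
    """Strip the leading '--' and drop the display-only options."""
    names = [opt[2:] if opt.startswith("--") else opt for opt in long_options]
    return [name for name in names if name not in DISPLAY_ONLY]


def plugin_element(plugin_id, java_class, params):
    """Build a <Plugin> element with one <Param> child per parameter."""
    plugin = ET.Element("Plugin", {"ID": plugin_id, "Class": java_class})
    for name, value in params.items():
        ET.SubElement(plugin, "Param", {"Name": name, "Value": str(value)})
    return plugin


names = evaluator_param_names(PKA_LONG_OPTIONS)
print(names)

element = plugin_element("pKa1", "chemaxon.marvin.calculations.pKaPlugin",
                         {"min": -3, "max": 10})
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```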

The same plugin can be used with different parameter settings: give the XML configuration more than one <Plugin> section with the same Java class but different plugin names, each with its own parameter section. In the following example, the pKa1 name references a pKa calculation with minimal basic pKa value -3 and maximal acidic pKa value 10, while the pKa2 name references a pKa calculation with minimal basic pKa value -20 and maximal acidic pKa value 30.

Different functions of a calculator plugin can also be referenced by different IDs. In the example below, the "mass" result type of the ElementalAnalyser plugin is referenced by the mass name, while the "exactmass" result type of the same plugin is referenced by the exactmass name.

Example:

<Plugins>
    <Plugin ID="charge"
        Class="chemaxon.marvin.calculations.ChargePlugin"
        JAR="ChargePlugin.jar"/>
    <Plugin ID="ioncharge" Class="chemaxon.marvin.calculations.IonChargePlugin">
        <Param Name="pH" Value="3.6"/>
        <Param Name="max-ions" Value="6"/>
        <Param Name="min-percent" Value="5"/>
        <Param Name="charge-type" Value="accumulated"/>
    </Plugin>
    <Plugin ID="microspecies" Class="chemaxon.marvin.calculations.MajorMicrospeciesPlugin"/>
    <Plugin ID="pka" Class="chemaxon.marvin.calculations.pKaPlugin"/>
    <Plugin ID="pKa1"
        Class="chemaxon.marvin.calculations.pKaPlugin">
        <Param Name="min" Value="-3"/>
        <Param Name="max" Value="10"/>
    </Plugin>
    <Plugin ID="pKa2"
        Class="chemaxon.marvin.calculations.pKaPlugin">
        <Param Name="min" Value="-20"/>
        <Param Name="max" Value="30"/>
    </Plugin>
    <Plugin ID="logp"
        Class="chemaxon.marvin.calculations.logPPlugin">
        <Param Name="type" Value="logPMicro"/>
    </Plugin>
    <Plugin ID="mass"
        Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin">
        <Param Name="type" Value="mass"/>
    </Plugin>
    <Plugin ID="exactmass"
        Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin">
        <Param Name="type" Value="exactmass"/>
    </Plugin>
    <Plugin ID="logp" Class="chemaxon.marvin.calculations.logPPlugin"/>
    <Plugin ID="logd" Class="chemaxon.marvin.calculations.logDPlugin"/>
    <Plugin ID="acc" Class="chemaxon.marvin.calculations.HBDAPlugin">
        <Param Name="type" Value="acc"/>
    </Plugin>
    <Plugin ID="don" Class="chemaxon.marvin.calculations.HBDAPlugin">
        <Param Name="type" Value="don"/>
    </Plugin>
    <Plugin ID="acceptorcount" Class="chemaxon.marvin.calculations.HBDAPlugin">
        <Param Name="type" Value="acceptorcount"/>
    </Plugin>
    <Plugin ID="donorcount" Class="chemaxon.marvin.calculations.HBDAPlugin">
        <Param Name="type" Value="donorcount"/>
    </Plugin>
</Plugins>
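
To see which reference names share a Java class, a plugin section can be read with a few lines of Python. The sketch below is only an illustration of the structure, using a shortened copy of the example; read_plugins is a hypothetical helper, not part of JChem.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened copy of the plugin section above.
PLUGINS_XML = """
<Plugins>
    <Plugin ID="pKa1" Class="chemaxon.marvin.calculations.pKaPlugin">
        <Param Name="min" Value="-3"/>
        <Param Name="max" Value="10"/>
    </Plugin>
    <Plugin ID="pKa2" Class="chemaxon.marvin.calculations.pKaPlugin">
        <Param Name="min" Value="-20"/>
        <Param Name="max" Value="30"/>
    </Plugin>
    <Plugin ID="mass" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin">
        <Param Name="type" Value="mass"/>
    </Plugin>
</Plugins>
"""


def read_plugins(xml_text):
    """Return {plugin ID: (class name, {parameter name: value})}."""
    plugins = {}
    for plugin in ET.fromstring(xml_text.strip()).findall("Plugin"):
        params = {p.get("Name"): p.get("Value") for p in plugin.findall("Param")}
        plugins[plugin.get("ID")] = (plugin.get("Class"), params)
    return plugins


plugins = read_plugins(PLUGINS_XML)

# Group the reference names by the Java class that implements them.
ids_by_class = defaultdict(list)
for plugin_id, (java_class, _params) in plugins.items():
    ids_by_class[java_class].append(plugin_id)

for java_class, ids in ids_by_class.items():
    print(java_class, ids)
```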

Function Definitions

The expression strings can also include references to predefined functions. These functions are implemented by Java classes that implement the org.nfunk.jep.function.PostfixMathCommandI interface. See the JEP API Documentation for details.

Declaration

The function definition section lists the Java classes that implement the user-defined functions accessible from the expressions. Each class is given an ID: this is the name by which the function is referenced from the expression. The Class attribute specifies the Java class that implements the function. A predefined function may have preset parameters, in the same way as in the Plugin declaration section. Currently only the atomic property query function uses this, to preset the name of the atomic property to be queried.

Example:

<Functions>
    <Function ID="array" Class="chemaxon.jep.function.IntArray"/>
    <Function ID="min" Class="chemaxon.jep.function.Min"/>
    <Function ID="max" Class="chemaxon.jep.function.Max"/>
    <Function ID="count" Class="chemaxon.jep.function.Count"/>
    <Function ID="sum" Class="chemaxon.jep.function.Sum"/>
    <Function ID="sortasc" Class="chemaxon.jep.function.SortAsc"/>
    <Function ID="sortdesc" Class="chemaxon.jep.function.SortDesc"/>
    <Function ID="in" Class="chemaxon.jep.function.In"/>
    <Function ID="eval" Class="chemaxon.jep.function.AtomEvaluatorFunction"/>
    <Function ID="filter" Class="chemaxon.jep.function.Filter"/>
    <Function ID="minatom" Class="chemaxon.jep.function.MinAtom"/>
    <Function ID="maxatom" Class="chemaxon.jep.function.MaxAtom"/>
    <Function ID="minvalue" Class="chemaxon.jep.function.MinValue"/>
    <Function ID="maxvalue" Class="chemaxon.jep.function.MaxValue"/>
    <Function ID="atomprop" Class="chemaxon.jep.function.AtomProperties"/>
    <Function ID="hcount" Class="chemaxon.jep.function.AtomProperties">
        <Param Name="property" Value="hcount"/>
    </Function>
    <Function ID="connections" Class="chemaxon.jep.function.AtomProperties">
        <Param Name="property" Value="connections"/>
    </Function>
    <Function ID="valence" Class="chemaxon.jep.function.AtomProperties">
        <Param Name="property" Value="valence"/>
    </Function>
    <Function ID="atno" Class="chemaxon.jep.function.AtomProperties">
        <Param Name="property" Value="atno"/>
    </Function>
    <Function ID="map" Class="chemaxon.jep.function.AtomProperties">
        <Param Name="property" Value="map"/>
    </Function>
    <Function ID="arom" Class="chemaxon.jep.function.AtomProperties">
        <Param Name="property" Value="arom"/>
    </Function>
</Functions>
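
The idea of a preset parameter, where one implementation class is exposed under several names that each fix the property to query, resembles partial function application. The Python sketch below illustrates that idea only. The atom_property function and the toy atom records are hypothetical and say nothing about how chemaxon.jep.function.AtomProperties works internally.

```python
from functools import partial

# Toy atom records for the heavy atoms of ethanol (CH3-CH2-OH).
# The property names follow the <Param> values in the example above.
atoms = [
    {"atno": 6, "hcount": 3},
    {"atno": 6, "hcount": 2},
    {"atno": 8, "hcount": 1},
]


def atom_property(prop, atoms, index):
    """General query: the property name is an explicit argument."""
    return atoms[index][prop]


# Preset variants: same implementation, property name fixed in advance.
hcount = partial(atom_property, "hcount")
atno = partial(atom_property, "atno")

print(hcount(atoms, 0))   # hydrogen count of the first carbon
print(atno(atoms, 2))     # atomic number of the oxygen
```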

Default function, plugin definitions and matching condition

Default plugin and function definitions, as well as the default matching condition, are read from the built-in evaluator.xml file located under the chemaxon/jep directory in the jchem.jar provided by ChemAxon. Plugins, functions and matching conditions defined by the user are read from the $JCHEMHOME/config/evaluator.xml file and from the evaluator.xml file located under the .chemaxon (UNIX / Linux) or chemaxon (Windows) subdirectory of the user's home directory. The user-defined XML configuration elements are added to the default configuration; where both define the same element, the user-defined configuration overrides the built-in setting.
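
The override rule described above can be sketched as a dictionary merge keyed by reference ID. This Python fragment is an illustration under two assumptions that the text does not settle: a user-defined entry replaces the whole built-in entry with the same ID, and when several user layers are passed, later ones take precedence. The function merge_definitions and the sample entries are hypothetical.

```python
def merge_definitions(built_in, *user_layers):
    """Add user-defined entries to the defaults; same ID means override."""
    merged = dict(built_in)            # copy, so the defaults stay untouched
    for layer in user_layers:
        merged.update(layer)           # assumption: whole entry is replaced
    return merged


built_in = {
    "pka": {"Class": "chemaxon.marvin.calculations.pKaPlugin", "Params": {}},
    "logp": {"Class": "chemaxon.marvin.calculations.logPPlugin", "Params": {}},
}

user_defined = {
    # Overrides the built-in "pka" entry.
    "pka": {"Class": "chemaxon.marvin.calculations.pKaPlugin",
            "Params": {"min": "-3", "max": "10"}},
    # New entry, added to the defaults.
    "exactmass": {"Class": "chemaxon.marvin.calculations.ElementalAnalyserPlugin",
                  "Params": {"type": "exactmass"}},
}

merged = merge_definitions(built_in, user_defined)
for plugin_id in sorted(merged):
    print(plugin_id, merged[plugin_id]["Params"])
```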

 

Standardization

The Standardizer subsection defines the standardization that is applied before evaluation to molecule constants (excluding target-only tasks) and to input molecules (including target-only tasks). It is written in the same way as the XML configuration of the Standardizer. For a detailed description of the standardization process and configuration, see the standardizer manual.

Example:

<Standardizer>
    <Actions>
        <Transformation ID="plusminus" Structure="[*+:1][*-:2]>>[*:1]=[*:2]"/>
        <Removal ID="keepOne" Method="keepLargest" Measure="molMass" Groups="target"/>
        <Action ID="aromatize" Act="aromatize"/>
    </Actions>
</Standardizer>
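
Which actions apply to input molecules and which to molecule constants can be sketched in Python. The helper action_ids is hypothetical, and it assumes that Groups holds the single value "target", as in the example above. It only selects action IDs and performs no standardization.

```python
import xml.etree.ElementTree as ET

STANDARDIZER_XML = """
<Standardizer>
    <Actions>
        <Transformation ID="plusminus" Structure="[*+:1][*-:2]>>[*:1]=[*:2]"/>
        <Removal ID="keepOne" Method="keepLargest" Measure="molMass" Groups="target"/>
        <Action ID="aromatize" Act="aromatize"/>
    </Actions>
</Standardizer>
"""


def action_ids(xml_text, for_input_molecule):
    """List the IDs of the actions applied to one kind of molecule."""
    actions = ET.fromstring(xml_text.strip()).find("Actions")
    selected = []
    for action in actions:
        target_only = action.get("Groups") == "target"
        if target_only and not for_input_molecule:
            continue                   # skipped for molecule constants
        selected.append(action.get("ID"))
    return selected


input_actions = action_ids(STANDARDIZER_XML, for_input_molecule=True)
constant_actions = action_ids(STANDARDIZER_XML, for_input_molecule=False)
print(input_actions)
print(constant_actions)
```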
 

Usage

The command line tool evaluate evaluates a single expression and either prints the result in human-readable text format or outputs the input molecule with the result stored in a specified SDF tag:

    evaluate [<options>] [<input files>]

Prepare the usage of the evaluate script or batch file as described in Preparing the Usage of JChem Batch Files and Shell Scripts.

Alternatively, the Evaluator class can be directly invoked:

Win32 / Java 2 (assuming that JChem is installed in c:\jchem):

    java -cp "c:\jchem\lib\jchem.jar;%CLASSPATH%" ^
        chemaxon.jep.Evaluator [<options>] [<input files>]

Unix / Java 2 (assuming that JChem is installed in /usr/local/jchem):

    java -cp "/usr/local/jchem/lib/jchem.jar:$CLASSPATH" \
        chemaxon.jep.Evaluator [<options>] [<input files>]

Options

Options:
  -h, --help                          this help message

Input Options:
  -c, --config <filepath>             configuration XML file
                                      (if omitted then default
                                      configuration is applied)
  -n, --no-input-mol                  expression should be evaluated
                                      without input molecule
  -e, --expr-string <str|filepath>    expression string or file

Output Options:
  -o, --output <filepath>             output file path (default: stdout)
  -g, --ignore-error                  continue with next molecule on error
  -v, --verbose                       verbose output
  -C, --clean <dim>                   clean output molecules (dim: 2 or 3)
  -f, --format <format>               output format if result is molecule
                                      (default: smiles or smarts)
                                      (ignores the output options below)
  -x, --extract <format>              extract mode: write exactly those
                                      molecules in the specified format that
                                      satisfy the input boolean expression
                                      (excludes other output options)
  -p, --precision <precision>         max. number of fractional digits
                                      in the output (default: 2)
  -S, --sdf-output                    SDF output (otherwise text output)
  -t, --tag                           name of the SDFile tag to store the
                                      evaluation result (default: CALC)
  -i, --include-expr                  output expression string

The input molecule file can contain more than one molecule; in this case the expression is evaluated for each input molecule in turn.

The command line parameter --config specifies the filename of the configuration file that configures the available plugins and query molecule constants.

If the command line parameter --no-input-mol is specified then the expression is evaluated without an input molecule.

The command line parameter --expr-string specifies either the expression string itself or the path of a file containing the expression string.

The command line parameter --format specifies the output molecule format when the output is a molecule or a molecule array. The default format is SMILES / SMARTS. If this option is used then all other output options except for --output, --ignore-error and --verbose are ignored.

If the command line parameter --clean is specified then result molecules as well as SDF output are cleaned in the given dimension.

If the command line parameter --extract is specified then the input expression is used as a molecule filter: for each input molecule it is evaluated as a boolean condition and the program filters the molecules that satisfy this condition, that is, for which the expression evaluation result is true. These molecules are written as output in the specified format. If this option is used then all other output options except for --output, --ignore-error and --verbose are ignored.
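
The filtering behavior of extract mode can be pictured with a short Python sketch. It is a conceptual illustration only: the records, the precomputed mass values and the extract function are hypothetical stand-ins, whereas the real tool evaluates a Chemical Terms expression on each input molecule.

```python
# Toy input: names with precomputed molar masses (illustrative values).
molecules = [
    {"name": "ethanol", "mass": 46.07},
    {"name": "aspirin", "mass": 180.16},
    {"name": "cholesterol", "mass": 386.65},
]


def extract(records, condition):
    """Keep exactly those records for which the condition is true."""
    return [record for record in records if condition(record) is True]


# Counterpart of the boolean expression "mass() >= 200".
heavy = extract(molecules, lambda mol: mol["mass"] >= 200)
print([mol["name"] for mol in heavy])   # prints "['cholesterol']"
```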

The command line parameter --precision specifies the maximum number of fractional digits to be displayed in the output.

If the command line parameter --sdf-output is specified then input molecules are written to the output in SDF format with evaluation result set as an SDF tag. The command line parameter --tag specifies this SDF tag.

If the command line parameter --include-expr is specified then the evaluation result is preceded by the expression string itself in the output.

If the command line parameter --ignore-error is specified, then import/export errors do not stop the processing: the error is written to the console and the molecule is skipped. By default, the program exits on molecule import/export errors.

 

Input

The software may take molecules from a text file. Most molecular file formats are accepted (MDL molfile, Compressed molfile, SDfile, Compressed SDfile, SMILES, etc.).

If no input file name is given in the command line the standard input is read.

 

Output

If no output file name is given, results are written to the standard output.

If the --sdf-output command line parameter is specified, the output format is SDF and the evaluation result is written to an SDF tag (default tag: CALC). Otherwise only the evaluation result is written to the output in simple text format.
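
For readers unfamiliar with SDF tags, the Python sketch below shows the general layout of an SDF data item: a header line with the tag name in angle brackets, the value, a blank line, and the record terminator. It illustrates the file format only. The sdf_record function, the placeholder molfile and the sample value are hypothetical, and the exact text written by evaluate may differ.

```python
def sdf_record(molfile_block, tag, value):
    """Append one data item to a molfile block and close the SDF record."""
    lines = molfile_block.rstrip("\n").split("\n")
    lines += [f">  <{tag}>", str(value), "", "$$$$"]
    return "\n".join(lines) + "\n"


# Placeholder single-atom molfile (V2000), used only as a container.
MOLFILE = "\n".join([
    "methane",                                  # header line 1: name
    "  sketch",                                 # header line 2: program info
    "",                                         # header line 3: comment
    "  1  0  0  0  0  0  0  0  0  0999 V2000",  # counts line: 1 atom, 0 bonds
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
])

# CALC is the default tag name; 16.04 is an illustrative result value.
record = sdf_record(MOLFILE, "CALC", 16.04)
print(record)
```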

 

Usage Examples

  1. Calculates the molecule mass for the molecules in the target.sdf file where the mass calculator plugin is defined in the config.xml configuration file:
    evaluate -c config.xml -e "mass()" target.sdf
    
  2. Filters molecules with a molecule mass of at least 200; the mass is computed according to the default configuration:
    evaluate -e "mass() >= 200" -x sdf -o heavy.sdf target.sdf
    
  3. Evaluates the expression in file calc.txt for molecules in target.sdf, uses the default configuration:
    evaluate -e calc.txt target.sdf
    
  4. The same, with SDF output written to the result.sdf file and the results stored in the SDF tag RESULT, each preceded by the expression string:
    evaluate -e calc.txt -S -i -t RESULT -o result.sdf target.sdf
    
  5. The same but the expression string is given in the expr.txt file:
    evaluate -e expr.txt -m query.sdf target.sdf
    
  6. Calculates partial charges for each atom with precision of 3 fractional digits, uses the charge calculation defined in config.xml:
    evaluate -c config.xml -e "charge()" -p 3 target.sdf
    
  7. The same with SDF file output, with the charge values written to the CHARGE SDF tag:
    evaluate -c config.xml -e "charge()" -p 3 -S -t CHARGE -o result.sdf target.sdf
    
  8. Enumerates atoms 1 and 2 of the Markush structure m.mrv, writes the resulting structures in MRV format:
    evaluate -e "enumeration('1,2')" m.mrv -f mrv
    
 
Copyright © 1999-2008 ChemAxon Ltd.    All rights reserved.