Chemical Terms Language Reference
Version 5.9.4
Contents
- Introduction
- Language Elements
- Expression Syntax
- Predefined Functional Groups and Named Molecule Groups
- Initial Scripts
- Input Contexts
- Configuration
- Examples
Further reading
Introduction
This document describes ChemAxon's Chemical Terms Language. This language is used to formulate chemical expressions in general. It is currently used for chemical rules in reaction processing, for search filters, and for both chemical calculations and chemical filtering in JChem Cartridge. The Evaluator command line tool and the Evaluator API are also available for general purpose expression evaluation.
The Chemical Terms Evaluator is designed to evaluate mathematical expressions on molecules using built-in chemical and general purpose functions. It is also possible to extend this built-in set of calculations by a user-defined configuration.
The heart of the evaluator mechanism is the JEP Java Expression Parser, equipped with chemical plugin calculations, chemical substructure search and some additional chemical and general purpose functions. User defined functions can also be added to this function set.
Here are some simple examples showing how some well-known chemical rules can be formulated for a given input molecule read from a molecule context:
The following filters are used in drug discovery and drug development to narrow down the scope of molecules. They estimate the solubility and permeability of orally active compounds from their physical and chemical properties. The examined properties are given as chemical terms.
- Lipinski's rule of five states that the absorption or permeation of a molecule is more likely when the molecular weight is under 500 g/mol, the value of logP is lower than 5, and the molecule has at most 5 H-donor and 10 H-acceptor atoms. The definition of this rule in Chemical Terms is:
(mass() <= 500) && (logP() <= 5) && (donorCount() <= 5) && (acceptorCount() <= 10)
- Lead-likeness:
(mass() <= 450) && (logD("7.4") >= -4) && (logD("7.4") <= 4) && (ringCount() <= 4) && (rotatableBondCount() <= 10) && (donorCount() <= 5) && (acceptorCount() <= 8)
- Bioavailability:
(mass() <= 500) + (logP() <= 5) + (donorCount() <= 5) + (acceptorCount() <= 10) + (rotatableBondCount() <= 10) + (PSA() <= 200) + (fusedAromaticRingCount() <= 5) >= 6
Note that summing up the 7 subresults above counts how many of them are satisfied. Requiring this sum to be at least 6 means that not all of the subconditions have to be satisfied: at most one of them is allowed to fail.
- Ghose filter:
(mass() >= 160) && (mass() <= 480) && (atomCount() >= 20) && (atomCount() <= 70) && (logP() >= -0.4) && (logP() <= 5.6) && (refractivity() >= 40) && (refractivity() <= 130)
- Scaffold hopping:
refmol = "actives.sdf"; dissimilarity("ChemicalFingerprint", refmol) - dissimilarity("PharmacophoreFingerprint", refmol) > 0.6
Note that molecule constants can be defined by a molecule file path or a SMILES string. Multiple expressions are separated by ';' characters. Whitespace characters can be added freely for readability, since they are ignored by the evaluation process.
A set of working examples is also available.
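The counting idiom used in the bioavailability rule can be mimicked in ordinary code. The Python sketch below is only an illustration: the property values are made-up numbers for a hypothetical molecule, not output of any ChemAxon calculator, and the dictionary keys merely echo the function names used in the rules above.

```python
# Hypothetical, hand-picked property values (NOT computed by any ChemAxon plugin).
props = {
    "mass": 512.6,
    "logP": 3.1,
    "donorCount": 2,
    "acceptorCount": 7,
    "rotatableBondCount": 6,
    "PSA": 95.0,
    "fusedAromaticRingCount": 1,
}


def lipinski(p):
    """All four subconditions must hold (logical AND)."""
    return (p["mass"] <= 500 and p["logP"] <= 5
            and p["donorCount"] <= 5 and p["acceptorCount"] <= 10)


def bioavailability(p):
    """Count the satisfied subconditions; pass if at least 6 of the 7 hold."""
    checks = [
        p["mass"] <= 500,
        p["logP"] <= 5,
        p["donorCount"] <= 5,
        p["acceptorCount"] <= 10,
        p["rotatableBondCount"] <= 10,
        p["PSA"] <= 200,
        p["fusedAromaticRingCount"] <= 5,
    ]
    # True counts as 1 and False as 0, mirroring the '+' in the rule.
    return sum(checks) >= 6
```

With these made-up values only the mass condition fails, so the strict AND-based rule rejects the molecule while the counting rule still accepts it.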
Language Elements
The Chemical Terms Evaluator parses and evaluates expressions that are built from the following language elements:
- the usual arithmetic operators: addition (+), subtraction (-), multiplication (*) and division (/),
- the logical operators: AND (&&), OR (||), NOT (!)
- structure based chemical calculations with the help of chemical calculator plugins, including charge, pKa, logP and logD calculations, extensible by others
- chemical and general purpose functions: predefined functions for taking maximum, minimum, getting atomic and molecular properties, performing chemical calculations that are not implemented by chemical calculator plugins, etc. The set of available functions can be extended by user defined functions.
- matching conditions: these search for predefined functional groups, or for certain atoms of the functional groups specified by atom maps, in the target molecule. They either match a target atom against any of the specified query atoms or search for the functional group in the target molecule without a specified target atom. The return value is true if a match is found, false otherwise.
Note: matching condition functions are not available in Marvin, they can be used only if the JChem software package is installed.
- input context functions for accessing input data
A set of short reference tables provides a summary of the available functions / calculations and the use of matching conditions.
Expression Syntax
Expression strings consist of an arbitrary number of initial assignments followed by a last subexpression that provides the evaluation result. An assignment sets a variable to the evaluation result of a subexpression. This variable can later be used to reference this result. The assignment syntax is:
<identifier> = <subexpression>;
Note the ending ';' character. Examples for assignments:
x = 2; y = x + 8; z = f(x,y) + g(x,y);
where f and g are predefined functions.
An expression is an optional sequence of assignments followed by a subexpression providing the evaluation result:
<identifier1> = <subexpression1>; <identifier2> = <subexpression2>; ... <identifierN> = <subexpressionN>; <result subexpression>
where N can also be zero, in which case the expression coincides with the result subexpression.
Here is an example with assignments:
a = f(2,3); b = g(4,5); x = a + b; x*x
Here is the same without assignments:
(f(2,3) + g(4,5))*(f(2,3) + g(4,5))
Assignments increase efficiency if the same evaluation result is used more than once, since inline repetition of a subexpression results in multiple evaluations. Assignments can also be used to increase readability. However, in most cases, when the expression is simple, assignments are not needed. Note that whitespace characters (new-line, tab, space) are skipped when parsing the expression string, so they can be used freely to increase readability.
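The efficiency argument can be made concrete by counting evaluations. The following Python sketch is an analogy only: f and g are hypothetical stand-ins (simple arithmetic with a call counter), not Chemical Terms functions, and the two Python functions mirror the two example expressions above.

```python
# Call counters for the two hypothetical stand-in functions.
calls = {"f": 0, "g": 0}


def f(a, b):
    calls["f"] += 1
    return a + b  # placeholder arithmetic, chosen only for illustration


def g(a, b):
    calls["g"] += 1
    return a * b  # placeholder arithmetic, chosen only for illustration


def with_assignments():
    # Mirrors: a = f(2,3); b = g(4,5); x = a + b; x*x
    a = f(2, 3)
    b = g(4, 5)
    x = a + b
    return x * x


def without_assignments():
    # Mirrors: (f(2,3) + g(4,5))*(f(2,3) + g(4,5))
    return (f(2, 3) + g(4, 5)) * (f(2, 3) + g(4, 5))
```

Both variants return the same value, but the second one calls each stand-in function twice, which matters when a subexpression is an expensive calculation.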
The following examples demonstrate the expression syntax with very simple subexpressions. Examples with chemical meaning are shown later for matching conditions, chemical calculations and chemical and general purpose functions.
Examples:
- A simple expression:
3+2
- Using assignments:
x = 2; y = 3; x + y
- A more complicated one:
x = 2; y = 3; z = 8*(x + y); t = 6*x*y; z + t
- When the same value is used more than once:
x = (3 + 4)*8 + 16; y = 3*x; z = x + 20; 5*(y + 8) + 4*z
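To illustrate the "assignments followed by a result subexpression" structure, here is a minimal Python sketch of an evaluator for purely arithmetic expressions of this shape. It is not the JEP parser and does not reflect how the Evaluator is implemented; the function names evaluate and _eval_node are hypothetical, and only numbers, variables, parentheses and the four arithmetic operators are supported.

```python
import ast
import operator

# Supported binary operators of this toy evaluator.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval_node(node, env):
    """Evaluate a restricted arithmetic syntax tree against the variables in env."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        left = _eval_node(node.left, env)
        right = _eval_node(node.right, env)
        return _BIN_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval_node(node.operand, env)
    raise ValueError("unsupported syntax in this sketch")


def _parse(subexpression):
    # Collapse all whitespace (spaces, tabs, new-lines) before parsing.
    return ast.parse(" ".join(subexpression.split()), mode="eval").body


def evaluate(expression):
    """Run the ';'-separated assignments in order, then return the last subexpression."""
    parts = [p for p in expression.split(";") if p.strip()]
    *assignments, result = parts
    env = {}
    for statement in assignments:
        name, _, subexpression = statement.partition("=")
        env[name.strip()] = _eval_node(_parse(subexpression), env)
    return _eval_node(_parse(result), env)
```

Calling evaluate on the four example strings above reproduces the arithmetic by hand: each assignment is evaluated once and stored, and only the final subexpression determines the returned value.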
Predefined Functional Groups and Named Molecule Groups
It is sometimes easier to refer to molecules by name rather than by explicit SMARTS
strings or molecule file paths. For example, you may want to write nitro
or carboxyl as the query in a match function.
Frequently used queries are predefined in the built-in functional groups file
(chemaxon/marvin/templates/functionalgroups.cxsmi within MarvinBeans-templates.jar).
You can also define your favourite query SMARTS in the marvin/config/marvin/templates/functionalgroups.cxsmi file and in the
$HOME\chemaxon\marvin\templates\functionalgroups.cxsmi (Windows) or
$HOME/.chemaxon/marvin/templates/functionalgroups.cxsmi (UNIX / Linux)
file, where marvin is the Marvin installation directory and $HOME is your user
home directory.
However, there are some limitations when choosing the molecule names.
Molecule names should be composed of letters, digits and the '_' character.
This means that molecule names cannot contain special characters, such as '=', '-', etc., with the exception of '_'.
Molecule name definitions in the functionalgroups.cxsmi file can contain whitespace characters (space, tab), but when
names are referenced from a Chemical Terms expression the whitespace characters should be replaced with a single '_' character
(e.g. secondary amine should be referred to as secondary_amine in Chemical Terms expressions).
Note: as of Marvin 5.4 the mols.smarts configuration file is no longer used by Chemical Terms. It has been replaced by the functionalgroups.cxsmi file.
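The naming rules above can be sketched as a small helper. This Python snippet is a hypothetical illustration, not part of Marvin or JChem: the function name to_reference_name is invented, "letters" are assumed to mean ASCII letters, and a run of several whitespace characters is assumed to collapse into one underscore (the text does not spell out either point).

```python
import re

# Assumption: only ASCII letters, digits and '_' are allowed in referenced names.
_VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def to_reference_name(group_name):
    """Convert a name as written in functionalgroups.cxsmi to its expression form."""
    # Assumption: each run of spaces/tabs becomes a single underscore.
    candidate = re.sub(r"[ \t]+", "_", group_name.strip())
    if not _VALID_NAME.fullmatch(candidate):
        raise ValueError(f"name contains unsupported characters: {group_name!r}")
    return candidate
```

For example, to_reference_name("secondary amine") gives secondary_amine, while a name containing '-' or '=' is rejected with a ValueError.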
Initial Scripts
You can define molecule sets and other constants in the user-defined
initial script $HOME/chemaxon/MARVIN_MAJOR_VERSION/jep.script (Windows) or
$HOME/.chemaxon/MARVIN_MAJOR_VERSION/jep.script (UNIX / Linux), where $HOME is
your user home directory, and MARVIN_MAJOR_VERSION is the major version of Marvin (e.g. "5.1").
This script is run right after the molecule sets are read and the constants defined here can be
used later in your chemical expressions. Any valid Chemical Terms assignment is allowed here,
and the terminating ';' characters may be omitted as long as you write each assignment on a
separate line.
Typically, you will define a molecule set by
- listing its members:
x = {acid_halide, alcohol, "[#6]CC[#8]"}
y = {alkene, amide, imide, imine}
z = {alkene, amide, amine, alcohol, isocyanate}
- or deriving it from other sets with the help of set operators:
all = x + y + z (union of x, y, z)
join = y * z (join of y and z)
C = (x + y) * z (join of the union of x and y with z)
D = z - alcohol (all elements of z except alcohol)
E = (x + y) - z (union of x and y without the elements of z)
where + means set-union, * means set-join and - means exclusion.
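The set operators can be compared with ordinary set algebra. The Python sketch below represents each molecule set as a set of name strings and assumes that "set-join" denotes the intersection of two sets; the text above does not define the term, so treat that reading as an assumption. The variable names mirror the script lines above (all is renamed all_groups to avoid shadowing a Python built-in).

```python
# Molecule sets modelled as plain sets of names (strings), for illustration only.
x = {"acid_halide", "alcohol", "[#6]CC[#8]"}
y = {"alkene", "amide", "imide", "imine"}
z = {"alkene", "amide", "amine", "alcohol", "isocyanate"}

all_groups = x | y | z   # '+' in the script: union of x, y, z
join = y & z             # '*' in the script: ASSUMED to be the intersection
C = (x | y) & z          # join of the union of x and y with z
D = z - {"alcohol"}      # '-' in the script: all elements of z except alcohol
E = (x | y) - z          # union of x and y without the elements of z
```

Under this reading, join contains alkene and amide, and C additionally contains alcohol.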
Predefined molecules and molecule sets are most useful in query definitions of the match function:
- match(amide) will test whether the input molecule matches an amide group,
- match(reactant(0), {amide,amine}) will test whether the first reactant in a reaction context matches an amide or an amine,
- match(2, {metalloid,alcohol}, 1) will check whether atom 2 of the input molecule matches either a metalloid or an alcohol carbon - the last parameter 1 denotes the query atom map which picks the carbon from the alcohol definition;
- match(ratom(2), {metalloid,alcohol}, 1) is the same in a reaction context, checking the target reactant atom which corresponds to the reactant atom with map 2 in the reaction equation.
Input Contexts
When evaluating an expression, the Evaluator substitutes data reference symbols by the corresponding data items. All data items belong to exactly one of the following data groups:
- constants: data having the same value at each evaluation
- numerical or string constants (e.g. 5, 7.4, "acidic", "mols/amine.mol", "NCC(N)C1=CC(=CC=C1)C(O)=O")
- molecule constants declared in the configuration XML (e.g. nitro, hydrazide, carboxyl)
- inputs: data possibly changing for each evaluation, such as
- input molecules and atoms,
- input reactants for reaction processing,
- created products in reaction processing
The type of the input data depends on the expression evaluation environment, which currently is one of the following:
- an expression string evaluated by the command line version of Evaluator refers to the current input molecule read from the input file(s) or the standard input
- an inner atomic expression refers to both the input molecule and the current atom - it is used when a Chemical Terms expression is evaluated on some or all atoms of the input molecule (e.g. atom filtering conditions, atomic evaluators and min-max evaluators)
- a reaction condition can refer to a reactant and a product array as well as to their atoms mapped according to the reaction equation
The evaluation environment provides a specific input context for accessing its input data. The input context consists of a set of accessor functions that can be used in the expression strings to access the input data. The following input contexts correspond to the evaluation environments described above:
- molecule context, used for single molecule input (e.g. command line Evaluator, JChem Cartridge):
mol(): refers to the current input molecule
- atom context, used for single atom input (e.g. inner atomic expressions):
mol(): refers to the current input molecule
atom(): refers to the current input atom index in the input molecule
- search context, used for filtering search hits (e.g. jcsearch and search queries):
mol(), target(): both refer to the search target molecule
query(): refers to the search query molecule
m(int i): refers to the query atom index with atom map i
hit(), h(): both refer to the search hit array
hit(int i), h(int i): both refer to the i-th element of the search hit array, that is, the target atom index matching the query atom with atom index i
hm(int i): refers to the target atom index matching the query atom with atom map i (shorthand for h(m(i)))
- reaction context, used for reaction input initiated by the Reactor:
reactant(int i): refers to the i-th reactant (0-based indexing)
product(int i): refers to the i-th product (0-based indexing)
ratom(int m): refers to the reactant atom corresponding to reactant atom map m according to the reaction equation
patom(int m): refers to the product atom corresponding to product atom map m according to the reaction equation
Note that the default input molecule is the molecule returned by mol(),
provided that this function exists in the context.
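The relationship between the search context accessors m, h and hm can be sketched with plain data structures. The Python class below is a hypothetical model, not ChemAxon code: the class name SearchHitContext, the dictionary from atom maps to query atom indices and the example hit array are all invented for illustration.

```python
class SearchHitContext:
    """Toy model of the search context accessors m(i), h(i) and hm(i)."""

    def __init__(self, query_map_to_index, hit):
        # query_map_to_index: atom map number -> query atom index
        self._map_to_index = dict(query_map_to_index)
        # hit[q] is the target atom index matched by query atom q
        self._hit = list(hit)

    def m(self, i):
        """Query atom index carrying atom map i."""
        return self._map_to_index[i]

    def h(self, i=None):
        """Whole hit array, or the target atom index matched by query atom i."""
        return list(self._hit) if i is None else self._hit[i]

    def hm(self, i):
        """Target atom index matched by the query atom with atom map i."""
        return self.h(self.m(i))


# Hypothetical hit: query atoms 0, 1, 2 matched target atoms 7, 4, 9;
# atom map 1 sits on query atom 2, atom map 2 on query atom 0.
ctx = SearchHitContext({1: 2, 2: 0}, [7, 4, 9])
```

Here ctx.hm(1) first resolves map 1 to query atom 2 and then looks up its matched target atom, which is why hm(i) can be read as h(m(i)).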
Configuration
The built-in configuration XML can be extended by user-defined functions and plugin calculations. The configuration syntax is described in the Evaluator Manual.
Examples
The examples below are divided into sections according to the input context applied, which corresponds to the different applications that can make use of ChemAxon's chemical expressions. These examples use the built-in configuration XML; the referenced functions and plugin calculations are listed in the short reference tables.
Evaluator and JChem Cartridge examples (molecule context)
- Structure based calculations (plugin calculations)
Plugin references provide access to ChemAxon's calculator plugins. These calculations equip our expressions with chemical meaning.
- The physiological microspecies at pH 7.4 of the input molecule:
microspecies("7.4")
- The partial charges on atoms 0, 2 and 3 (0-based) of the input molecule:
charge(0, 2, 3)
- The same with taking the physiological microspecies at pH 7.4:
charge(0, 2, 3, "7.4")
- Checking whether the partial charge on atom 0 in the input molecule is greater than this charge value in the physiological microspecies at pH 7.4:
charge(0) > charge(0, "7.4")
- The significant pKa value (acidic or basic) on atom 9 (0-based) of the input molecule:
pka(9)
- The acidic pKa value on atom 9 (0-based) of the input molecule:
pka("acidic", 9)
Note that if the pKa type "acidic" or "basic" is omitted (as in the previous example), then the more significant value is returned, while specifically the "acidic" (or "basic") pKa value is returned if the type is specified.
- The strongest acidic pKa value of the input molecule:
pka("acidic", "1")
Note the difference in the last two examples: in pKa calculation a number denotes the atom index while a number in quotation marks denotes the strength order: 9 in the previous example refers to atom 9 while "1" in the above example refers to the strongest acidic pKa value ("2" refers to the second strongest value, etc.).
- The logP value of the input molecule:
logp()
- The logD value at pH=7.4 of the input molecule:
logd("7.4")
Note that in logD calculation the pH value should be enclosed in quotation marks.
- Check the difference between logD values at two different pH values:
logd("7.4") - logd("3.8") > 0.5
- The mass of the input molecule:
mass()
- The number of H bond acceptor atoms in the input molecule:
acceptorCount()
- The same with taking the physiological microspecies at pH 7.4:
acceptorCount("7.4")
- Checking the difference of the two above:
acceptorCount("7.4") - acceptorCount() > 1
- Functions
There are different types of functions provided by ChemAxon:
- general purpose functions: simple array utility functions, such as minimum, maximum, sum or number of array elements and an array element sorter function
- atomic functions: functions referring to an input atom, such as the atom property query function, which queries atom properties (e.g. hydrogen count), or the containment function, which checks whether an atom index is contained in an atom index array
- molecular functions: functions that calculate molecular properties, but do not fit into the structure based calculations section (e.g. the isQuery function)
- evaluator functions: functions taking an inner expression string as parameter and evaluating this expression for each atom in an atom context. Examples include a filtering function that takes a boolean expression and returns the atoms satisfying it, and min-max functions which evaluate the inner expression for all atoms in the context and return the minimum or maximum value or the corresponding atom index
- The minimum of the partial charge values on atoms 7, 8 and 9 (0-based) of the input molecule:
min(charge(7), charge(8), charge(9))
- The hydrogen count on atom 2 (0-based) of the input molecule:
hcount(2)
- The valence of atom 2 of the input molecule:
valence(2)
- The atom indices corresponding to positive partial charges in the input molecule:
filter("charge() > 0")
- The number of atoms with positive partial charge in the input molecule:
count(filter("charge() > 0"))
- The positive partial charges in the input molecule:
charge(filter("charge() > 0"))
- The same but sorted in ascending order:
sortAsc(charge(filter("charge() > 0")))
- Indices of atoms having partial charge at least 0.4 in the major microspecies at pH=7.4:
filter("charge('7.4') >= 0.4")
- The partial charge values on these atoms in the input molecule:
charge(filter("charge('7.4') >= 0.4"))
- The minimum acidic pKa value on hetero atoms with a single hydrogen:
min(pka(filter("match('[!#6!#1;H1]')"), "acidic"))
- Checking whether there is a hetero atom with acidic pKa value less than 0.75:
min(pka(filter("match('[!#6!#1;H1]')"), "acidic")) < 0.75
- Indices of atoms with the two strongest basic pKa values:
maxAtom("pka('basic')", 2)
Note that expression strings can be enclosed in either double or single quotes; in nested strings the two kinds can be alternated. However, some UNIX shells interpret single quotes, which makes them hard to use in command line input. File input solves this problem, or else the inner single quotes can be replaced by escaped inner double quotes:
maxAtom("pka(\"basic\")", 2)
- The corresponding pKa values:
maxValue("pka('basic')", 2)
- Testing whether the partial charge on the atom with the strongest basic pKa value exceeds the partial charge on the atom with the second strongest basic pKa value:
x = maxAtom("pka('basic')", 2); charge(x[0]) > charge(x[1])
Note that in the current version the above expression cannot be evaluated if there are fewer than two basic pKa values in the input molecule.
- The basic pKa values for atoms with positive charge, sorted in descending order:
sortDesc(pka("basic", filter("charge() > 0")))
Note that in the current version NaN values (meaning that there is no valid pKa for the given atom) are put at the end of the array after sorting.
- Checking whether there is a sufficiently large difference between the two strongest basic pKa values of the previous example:
x = sortDesc(pka("basic", filter("charge() > 0"))); x[0] - x[1] > 1.5
- The hydrogen count for each atom in the input molecule:
eval("hcount()")
- The number of hydrogens in the input molecule:
sum(eval("hcount()"))
- Dissimilarity between the benzene ring and the input molecule using pharmacophore fingerprint as molecular descriptor with Tanimoto (default) metric:
refmol = "c1ccccc1"; dissimilarity("PF", refmol)
Note: the dissimilarity function is not available in Marvin, it can be used only if the JChem software package is installed.
- The same using Euclidean metric:
refmol = "c1ccccc1"; dissimilarity("PF:Euclidean", refmol)
Note: the dissimilarity function is not available in Marvin, it can be used only if the JChem software package is installed.
- The partial charge on the two atoms out of 1, 6, 8 (0-based atom indices) having the first and second biggest hydrogen counts (molecule context):
x = array(1, 6, 8); y = maxAtom(x, "hcount()", 2); charge(y)
- Checking whether atom 6 (0-based atom index) has the first or second smallest partial charge among atoms 1, 6, 8, 10, 12 (molecule context):
x = array(1, 6, 8, 10, 12); y = minAtom(x, "charge()", 2); in(6, y)
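The filter, count and sortDesc examples above, including the NaN placement and the "fewer than two values" caveat, can be imitated on plain lists. The Python sketch below uses made-up per-atom charge and pKa values and hypothetical helper names; it only mirrors the described behaviour and says nothing about how the Evaluator is implemented. Returning False when fewer than two valid values exist is a choice made for this sketch, whereas the text above states that the real expression cannot be evaluated in that case.

```python
import math

# Hypothetical per-atom values, indexed by 0-based atom index (not real plugin output).
charges = [0.12, -0.30, 0.45, -0.05, 0.08]
basic_pka = [9.8, 4.1, float("nan"), 7.2, 10.6]


def filter_atoms(predicate, atom_count):
    """Indices of the atoms for which the predicate holds, like filter("...")."""
    return [i for i in range(atom_count) if predicate(i)]


def sort_desc_nan_last(values):
    """Descending sort that places NaN entries at the end of the array."""
    valid = sorted((v for v in values if not math.isnan(v)), reverse=True)
    missing = [v for v in values if math.isnan(v)]
    return valid + missing


def top_two_gap_exceeds(sorted_values, threshold):
    """Guarded version of x[0] - x[1] > threshold."""
    valid = [v for v in sorted_values if not math.isnan(v)]
    if len(valid) < 2:
        return False  # sketch-only convention for "cannot be evaluated"
    return valid[0] - valid[1] > threshold


positive = filter_atoms(lambda i: charges[i] > 0, len(charges))
positive_count = len(positive)
pka_sorted = sort_desc_nan_last([basic_pka[i] for i in positive])
```

With these values, atoms 0, 2 and 4 carry positive charge, and the atom without a valid pKa ends up last in the sorted array.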
- Matching conditions
There are three options to reference substructure search from our expressions: the match function returns a true / false answer, while the matchCount and disjointMatchCount functions return the number of search hits.
Note: the match, matchCount and disjointMatchCount functions are not available in Marvin, they can be used only if the JChem software package is installed.
- A simple molecule matching test taking the input molecule as target:
match("C1CCOCC1")
- Atom matching with target atom being atom 2 (0-based) of the input molecule and query atom set being all query atoms:
match(2, "C1CCOCC1")
- Atom matching with target atom being atom 2 (0-based) of the input molecule, and query atom set being both query carbon atoms attached to the oxygen:
match(2, "C1C[C:1]O[C:2]C1", 1, 2)
- The same with referencing the query by molecule file path:
match(2, "mols/query.mol", 1, 2)
- The same with referencing the query by molecule ID nitro as a predefined molecule constant:
match(2, nitro, 1, 2)
- The sum of "C=O" and "CO" groups in the input molecule:
matchCount("C=O") + matchCount("CO")
- A more complex condition checking whether the input molecule contains sulfur and whether there are at least 6 "C=O" and "CO" groups in the input molecule altogether:
match("S") && (matchCount("C=O") + matchCount("CO") >= 6)
Reactor examples (reaction context)
Note: Reactor is part of JChem software package, it is not available in Marvin.
- Structure based calculations (plugin calculations)
Plugin references provide access to ChemAxon's calculator plugins. These calculations equip our expressions with chemical meaning.
- The physiological microspecies at pH 7.4 of the first reactant:
microspecies(reactant(0), "7.4")
- The partial charge on the reactant atom matching map 1 in the reaction equation:
charge(ratom(1))
- The same with taking the physiological microspecies at pH 7.4:
charge(ratom(1), "7.4")
- The partial charge of the atom having atom index 2 in the first reactant:
charge(reactant(0), 2)
Note: Evaluation of this expression will result in an error if there is no atom with index 2 in the first reactant. In reaction context, referring to atoms by atom index (instead of atom map) is recommended only if the atom indices are returned by a Chemical Terms expression (see this example).
- Checking whether the partial charge on the reactant atom matching map 1 is greater than this charge value in the physiological microspecies at pH 7.4:
charge(ratom(1)) > charge(ratom(1), "7.4")
- The significant pKa value (acidic or basic) on the reactant atom matching map 3 in the reaction equation:
pka(ratom(3))
- The acidic pKa on the above atom:
pka(ratom(3), "acidic")
Note that if the pKa type "acidic" or "basic" is omitted (as in the previous example), then the more significant value is returned, while specifically the "acidic" (or "basic") pKa value is returned if the type is specified.
- The strongest acidic pKa value of the first reactant:
pka(reactant(0), "acidic", "1")
- The logP value of the first product:
logp(product(0))
- The logD value at pH=7.4 of the first product:
logd(product(0), "7.4")
Note that in logD calculation the pH value should be enclosed in quotation marks.
- Check the difference between logD values at two different pH values:
logd(product(0), "7.4") - logd(product(0), "3.8") > 0.5
- The mass of the second product:
mass(product(1))
- The number of H bond acceptor atoms in the second reactant:
acceptorCount(reactant(1))
- The same with taking the physiological microspecies at pH 7.4:
acceptorCount(reactant(1), "7.4")
- Checking the difference of the two above:
acceptorCount(reactant(1), "7.4") - acceptorCount(reactant(1)) > 1
- Functions
There are different types of functions provided by ChemAxon:
- general purpose functions: simple array utility functions, such as minimum, maximum, sum or number of array elements and an array element sorter function
- atomic functions: functions referring to an input atom, such as the atom property query function, which queries atom properties (e.g. hydrogen count), or the containment function, which checks whether an atom index is contained in an atom index array
- molecular functions: functions that calculate molecular properties, but do not fit into the structure based calculations section (e.g. the isQuery function)
- evaluator functions: functions taking an inner expression string as parameter and evaluating this expression for each atom in an atom context. Examples include a filtering function that takes a boolean expression and returns the atoms satisfying it, and min-max functions which evaluate the inner expression for all atoms in the context and return the minimum or maximum value or the corresponding atom index
- The minimum of the partial charge values on reactant atoms matching maps 2, 3 and 4:
min(charge(ratom(2)), charge(ratom(3)), charge(ratom(4)))
- The hydrogen count on product atom matching map 2:
hcount(patom(2))
- The valence of reactant atom matching map 2:
valence(ratom(2))
- The atom indices corresponding to positive partial charges in the first reactant:
filter(reactant(0), "charge() > 0")
- The number of atoms with positive partial charge in the first reactant:
count(filter(reactant(0), "charge() > 0"))
- The positive partial charges in the first reactant:
charge(reactant(0), filter(reactant(0), "charge() > 0"))
- The same but sorted in ascending order:
sortAsc(charge(reactant(0), filter(reactant(0), "charge() > 0")))
- Indices of atoms having partial charge at least 0.4 in the major microspecies of the first product at pH=7.4:
filter(product(0), "charge('7.4') >= 0.4")
- The partial charge values on these atoms in the first product:
charge(product(0), filter(product(0), "charge('7.4') >= 0.4"))
- The minimum acidic pKa value on hetero atoms with a single hydrogen in the first reactant:
min(pka(reactant(0), filter(reactant(0), "match('[!#6!#1;H1]')"), "acidic"))
- Checking whether there is a hetero atom with acidic pKa value less than 0.75 in the first reactant:
min(pka(reactant(0), filter(reactant(0), "match('[!#6!#1;H1]')"), "acidic")) < 0.75
- The minimum acidic pKa value on aliphatic atoms in the first reactant:
min(pKa(reactant(0), filter(reactant(0), "aliphaticAtom()"), "acidic"))
- Checking whether the bond between the reactant atom matching map 1 and the reactant atom matching map 2 is a single or double bond:
(bondType(reactant(0), bond(ratom(1), ratom(2))) == 1 || bondType(reactant(0), bond(ratom(1), ratom(2))) == 2)
Note that the bond(ratom(1), ratom(2)) subexpression returns an <atomIndex1>-<atomIndex2> string, so in reaction context the molecule parameter must also be passed to the bondType() function (see this note). In this example the reactant atoms matching maps 1 and 2 are atoms of the first reactant (reactant(0)).
- Indices of atoms with the two strongest basic pKa values in the first product:
maxAtom(product(0), "pka('basic')", 2)
Note that expression strings can be enclosed in either double or single quotes; in nested strings the two kinds can be alternated. However, some UNIX shells interpret single quotes, which makes them hard to use in command line input. File input solves this problem, or else the inner single quotes can be replaced by escaped inner double quotes:
maxAtom(product(0), "pka(\"basic\")", 2)
- The corresponding pKa values:
maxValue(product(0), "pka('basic')", 2)
- Testing whether the partial charge on the atom with the strongest basic pKa value exceeds the partial charge on the atom with the second strongest basic pKa value in the second product:
x = maxAtom(product(1), "pka('basic')", 2); charge(x[0]) > charge(x[1])
Note that in the current version the above expression cannot be evaluated if there are fewer than two basic pKa values in the molecule.
- The basic pKa values for atoms with positive charge, sorted in descending order:
sortDesc(pka("basic", reactant(0), filter(reactant(0), "charge() > 0")))
Note that in the current version NaN values (meaning that there is no valid pKa for the given atom) are put at the end of the array after sorting.
- Checking whether there is a sufficiently large difference between the two strongest basic pKa values of the previous example:
x = sortDesc(pka("basic", reactant(0), filter(reactant(0), "charge() > 0"))); x[0] - x[1] > 1.5
- The hydrogen count for each atom in the first product:
eval(product(0), "hcount()")
- The number of hydrogens in the first product:
sum(eval(product(0), "hcount()"))
- Dissimilarity between the first reactant and product using pharmacophore fingerprint as molecular descriptor with Tanimoto (default) metric:
dissimilarity("PF", reactant(0), product(0))
- The same using Euclidean metric:
dissimilarity("PF:Euclidean", reactant(0), product(0))
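The bond example above notes that bond(...) yields only an <atomIndex1>-<atomIndex2> string, which is why bondType() needs the molecule as well. The Python sketch below illustrates that point with invented data: molecules are modelled as dictionaries from atom index pairs to bond orders, and the helper names bond, bond_type and is_single_or_double are hypothetical analogues, not ChemAxon functions.

```python
def bond(atom_index_1, atom_index_2):
    """Build the '<atomIndex1>-<atomIndex2>' reference string."""
    return f"{atom_index_1}-{atom_index_2}"


def bond_type(molecule_bonds, bond_ref):
    """Look up the bond order of the referenced bond in the given molecule."""
    i, j = (int(part) for part in bond_ref.split("-"))
    return molecule_bonds[frozenset((i, j))]


def is_single_or_double(molecule_bonds, bond_ref):
    return bond_type(molecule_bonds, bond_ref) in (1, 2)


# Two hypothetical reactants; keys are unordered atom index pairs, values are bond orders.
reactant_0 = {frozenset((0, 1)): 1, frozenset((1, 2)): 2}
reactant_1 = {frozenset((0, 1)): 3}

ref = bond(0, 1)
```

The same reference string "0-1" resolves to a single bond in the first toy reactant and to a triple bond in the second, so the string alone cannot identify the bond.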
- Matching conditions
There are three options to reference substructure search from our expressions: the match function returns a true / false answer, while the matchCount and disjointMatchCount functions return the number of search hits.
- A simple molecule matching test taking the first reactant as target:
match(reactant(0), "C1CCOCC1")
- Atom matching with target atom being the product atom matching map 2 and query atom set being all query atoms:
match(patom(2), "C1CCOCC1")
- Atom matching with target atom being the product atom matching map 2, and query atom set being both query carbon atoms attached to the oxygen:
match(patom(2), "C1C[C:1]O[C:2]C1", 1, 2)
- The same with referencing the query by molecule file path:
match(patom(2), "mols/query.mol", 1, 2)
- The sum of "C=O" and "CO" groups in the second product:
matchCount(product(1), "C=O") + matchCount(product(1), "CO")
- A more complex condition checking whether the second product contains sulfur and whether there are at least 6 "C=O" and "CO" groups in the second product altogether:
match(product(1), "S") && (matchCount(product(1), "C=O") + matchCount(product(1), "CO") >= 6)
Search filter examples (search context)
- Structure based calculations (plugin calculations)
Plugin references provide access to ChemAxon's calculator plugins. These calculations equip our expressions with chemical meaning.
- Filter hits by requiring that the partial charge on the target atom matching query map 1 should be positive:
charge(hm(1)) > 0
- The same with taking the physiological microspecies of the target at pH 7.4:
charge(hm(1), "7.4") > 0
- Checking whether the partial charge on the target atom matching query map 1 is greater than this charge value in the physiological microspecies at pH 7.4:
charge(hm(1)) > charge(hm(1), "7.4")
- The basic pKa value on the target atom matching query map 3 should be greater than 8.0:
pka(hm(3), "basic") > 8.0
- The strongest acidic pKa value of the target should be less than 0.5:
pka("acidic", "1") < 0.5
Note that by default the expression refers to the target. Write query() to refer to the query:
pka(query(), "acidic", "1") < 0.5
- The logP value of the query should be greater than that of the target:
logp(query()) > logp()
- The logD value at pH=7.4 of the target should be less than the logD value at pH=3.4:
logd("7.4") < logd("3.4")
Note that in logD calculation the pH value should be enclosed in quotation marks.
- Check the difference between logD values of the target at two different pH values:
logd("7.4") - logd("3.8") > 0.5
- Require a sufficiently large target mass as well as a positive charge value at the target atom matching query map 1:
(mass() > 500) && (charge(hm(1)) > 0)
- Functions
There are different types of functions provided by ChemAxon:
- general purpose functions: simple array utility functions, such as minimum, maximum, sum or number of array elements and an array element sorter function
- atomic functions: functions referring to an input atom, such as the atom property query function, which queries atom properties (e.g. hydrogen count), or the containment function, which checks whether an atom index is contained in an atom index array
- evaluator functions: functions taking an inner expression string as parameter and evaluating this expression for each atom in an atom context. Examples include a filtering function that takes a boolean expression and returns the atoms satisfying it, and min-max functions which evaluate the inner expression for all atoms in the context and return the minimum or maximum value or the corresponding atom index
- The minimum of the partial charge values on target atoms matching maps 2, 3 and 4 should be negative, that is, at least one of these charge values should be negative:
min(charge(hm(2)), charge(hm(3)), charge(hm(4))) < 0
- The hydrogen counts on the target atom matching map 1 and on the target atom matching map 2 should be at least 1:
(hcount(hm(1)) >= 1) && (hcount(hm(2)) >= 1)
- The valence of the target atom matching map 1 or map 2 should be at least 1:
(valence(hm(1)) >= 1) || (valence(hm(2)) >= 1)
- The number of atoms with positive partial charge in the target should be at least that in the query:
count(filter("charge() > 0")) >= count(filter(query(), "charge() > 0"))
- The minimum acidic pKa value on hetero atoms with a single hydrogen in the target should be less than 0.75, that is, there should be at least one hetero atom with a single hydrogen with acidic pKa less than 0.75:
min(pka(filter("match('[!#6!#1;H1]')"), "acidic")) < 0.75
- Testing whether the partial charge on the atom with the strongest basic pKa value exceeds the partial charge on the atom with the second strongest basic pKa value in the target:
x = maxAtom("pka('basic')", 2); charge(x[0]) > charge(x[1])
Note that in the current version the above expression cannot be evaluated if there are fewer than two basic pKa values in the molecule.
- Checking whether there is a sufficiently large difference between the strongest basic pKa values among atoms with positive charge in the target and in the query:
x = max(pka("basic", query(), filter(query(), "charge() > 0"))); y = max(pka("basic", filter("charge() > 0"))); (x - y > 1.5) || (y - x > 1.5)
- Dissimilarity between the target and query using pharmacophore fingerprint as molecular descriptor with Tanimoto (default) metric should be sufficiently small:
dissimilarity("PF", target(), query()) < 0.6
Note that target() can be omitted, as in the above examples:
dissimilarity("PF", query()) < 0.6
- The same using Euclidean metric:
dissimilarity("PF:Euclidean", target(), query()) < 0.6
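For readers unfamiliar with the two metrics named above, the following Python sketch shows the textbook Tanimoto dissimilarity and Euclidean distance on binary fingerprints, represented as sets of "on" bit positions. This is a generic illustration under that assumption: it does not reproduce ChemAxon's pharmacophore fingerprint (which is not modelled here) or any scaling the dissimilarity function may apply, and the bit sets are invented.

```python
import math


def tanimoto_dissimilarity(bits_a, bits_b):
    """1 - |A and B| / |A or B| for two sets of on-bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / len(union)


def euclidean_distance(bits_a, bits_b):
    """Euclidean distance of two binary vectors: sqrt of the number of differing bits."""
    return math.sqrt(len(set(bits_a) ^ set(bits_b)))


# Hypothetical on-bit positions for a target and a query fingerprint.
target_bits = {1, 3, 5, 8}
query_bits = {1, 3, 6, 8}

passes_filter = tanimoto_dissimilarity(target_bits, query_bits) < 0.6
```

The two toy fingerprints share three of five distinct bits, giving a Tanimoto dissimilarity of about 0.4, so a filter with threshold 0.6 would keep this hit.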
