Chemical Terms Language Reference

Version 5.9.4


Introduction

This document describes ChemAxon's Chemical Terms Language, a language for formulating chemical expressions in general. Its current uses include chemical rules for reaction processing, search filters, and both chemical calculations and chemical filtering in JChem Cartridge. The Evaluator command line tool and the Evaluator API are also available for general-purpose expression evaluation.

The Chemical Terms Evaluator is designed to evaluate mathematical expressions on molecules using built-in chemical and general-purpose functions. This built-in set of calculations can also be extended with a user-defined configuration.

The heart of the evaluator mechanism is the JEP Java Expression Parser, equipped with chemical plugin calculations, chemical substructure search and some additional chemical and general-purpose functions. User-defined functions can also be added to this function set.

Here are some simple examples showing how some well-known chemical rules can be formulated for an input molecule taken from the molecule context:

The following filters are used in drug discovery and drug development to narrow down the set of candidate molecules. They estimate the solubility and permeability of orally active compounds from their physical and chemical properties. The examined properties are given as Chemical Terms.
  1. Lipinski's rule of five states that the absorption or permeation of a molecule is more likely when its molecular weight is at most 500 g/mol, its logP is at most 5, and it has at most 5 H-bond donor and at most 10 H-bond acceptor atoms. In Chemical Terms, the rule is expressed as:
    (mass() <= 500) && 
    (logP() <= 5) && 
    (donorCount() <= 5) && 
    (acceptorCount() <= 10)
    
  2. Lead-likeness:
    (mass() <= 450) &&
    (logD("7.4") >= -4) && (logD("7.4") <= 4) &&
    (ringCount() <= 4) &&
    (rotatableBondCount() <= 10) &&
    (donorCount() <= 5) &&
    (acceptorCount() <= 8)
    
  3. Bioavailability:
    (mass() <= 500) +
    (logP() <= 5) +
    (donorCount() <= 5) +
    (acceptorCount() <= 10) +
    (rotatableBondCount() <= 10) +
    (PSA() <= 200) +
    (fusedAromaticRingCount() <= 5) >= 6
    
    Note that summing the seven subresults above counts how many of them are satisfied (each satisfied condition contributes 1). Requiring this sum to be at least 6 means that not all subconditions have to hold: at most one of them may fail.

  4. Ghose filter:
    (mass() >= 160) && (mass() <= 480) &&
    (atomCount() >= 20) && (atomCount() <= 70) &&
    (logP() >= -0.4) && (logP() <= 5.6) &&
    (refractivity() >= 40) && (refractivity() <= 130)
    
  5. Scaffold hopping:
    refmol = "actives.sdf";
    dissimilarity("ChemicalFingerprint", refmol) - 
    dissimilarity("PharmacophoreFingerprint", refmol) > 0.6
    
    Note that molecule constants can be defined by a molecule file path or a SMILES string. Multiple expressions are separated by ';' characters, and whitespace characters can be added freely for readability, since they are ignored during evaluation.
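The difference between combining conditions with && (Lipinski's rule) and summing them (the bioavailability filter) can be illustrated outside the Evaluator. The following Python sketch is only an analogy: Python booleans stand in for the Chemical Terms subresults, the `Properties` class and its field names are hypothetical, and the property values are made up rather than computed by ChemAxon plugins.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Hypothetical container for precomputed molecular properties."""
    mass: float
    logp: float
    donors: int
    acceptors: int
    rotatable_bonds: int
    psa: float
    fused_aromatic_rings: int


def lipinski(p: Properties) -> bool:
    # All four conditions must hold, like && in Chemical Terms.
    return (p.mass <= 500 and p.logp <= 5
            and p.donors <= 5 and p.acceptors <= 10)


def bioavailability(p: Properties) -> bool:
    # Summing booleans counts the satisfied conditions (True counts as 1).
    satisfied = sum([
        p.mass <= 500,
        p.logp <= 5,
        p.donors <= 5,
        p.acceptors <= 10,
        p.rotatable_bonds <= 10,
        p.psa <= 200,
        p.fused_aromatic_rings <= 5,
    ])
    return satisfied >= 6  # at most one condition may fail


heavy = Properties(mass=520, logp=4.2, donors=3, acceptors=7,
                   rotatable_bonds=6, psa=90, fused_aromatic_rings=2)
print(lipinski(heavy), bioavailability(heavy))  # False True
```

Here the made-up molecule with a mass of 520 g/mol fails Lipinski's rule but still passes the bioavailability filter, because only one of the seven conditions fails.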

A set of working examples is also available.

 

Language Elements

The Chemical Terms Evaluator parses and evaluates expressions that are built from the following language elements:

  • the usual arithmetic operators: addition (+), subtraction (-), multiplication (*) and division (/),
  • the logical operators: AND (&&), OR (||), NOT (!)
  • structure-based chemical calculations performed by chemical calculator plugins, including charge, pKa, logP and logD calculations, extensible with others
  • chemical and general-purpose functions: predefined functions for taking the maximum or minimum, getting atomic and molecular properties, performing chemical calculations that are not implemented by chemical calculator plugins, etc. The set of available functions can be extended with user-defined functions.
  • matching conditions: search the target molecule for predefined functional groups, or for certain atoms of these groups specified by atom maps. Either a target atom is matched against any of the specified query atoms, or the functional group is searched in the target molecule without a specified target atom. The return value is true if a match is found, false otherwise.
    Note: matching condition functions are not available in Marvin, they can be used only if JChem software package is installed.
  • input context functions for accessing input data

A set of short reference tables provides a summary of the available functions / calculations and the use of matching conditions.

 

Expression Syntax

Expression strings consist of an arbitrary number of initial assignments followed by a last subexpression that provides the evaluation result. An assignment sets a variable to the evaluation result of a subexpression. This variable can later be used to reference this result. The assignment syntax is:

<identifier> = <subexpression>;

Note the ending ';' character. Examples for assignments:

x = 2;
y = x + 8;
z = f(x,y) + g(x,y);
where f and g are predefined functions.

An expression is an optional sequence of assignments followed by a subexpression providing the evaluation result:

<identifier1> = <subexpression1>;
<identifier2> = <subexpression2>;
...
<identifierN> = <subexpressionN>;
<result subexpression>
where N can also be zero in which case the expression coincides with the result subexpression.

Here is an example with assignments:

a = f(2,3);
b = g(4,5);
x = a + b;
x*x

Here is the same without assignments:

(f(2,3) + g(4,5))*(f(2,3) + g(4,5))

Assignments increase efficiency when the same evaluation result is used more than once, since a subexpression repeated inline is evaluated each time it appears. Assignments can also improve readability. However, when the expression is simple, assignments are usually not needed. Note that whitespace characters (new-line, tab, space) are skipped when parsing the expression string, so they can be used freely to improve readability.
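The efficiency argument can be illustrated with an analogy in Python, where ordinary function calls play the role of subexpressions. In this sketch `f` and `g` are hypothetical stand-ins for the predefined functions of the examples above, and a counter records how often `f` runs; it says nothing about how the Evaluator handles results internally.

```python
call_count = {"f": 0}


def f(a: int, b: int) -> int:
    """Hypothetical stand-in for a costly predefined function."""
    call_count["f"] += 1
    return a + b


def g(a: int, b: int) -> int:
    """Another hypothetical predefined function."""
    return a * b


def without_assignments() -> int:
    # (f(2,3) + g(4,5))*(f(2,3) + g(4,5)): f(2,3) is evaluated twice.
    return (f(2, 3) + g(4, 5)) * (f(2, 3) + g(4, 5))


def with_assignments() -> int:
    # a = f(2,3); b = g(4,5); x = a + b; x*x : f(2,3) is evaluated once.
    a = f(2, 3)
    b = g(4, 5)
    x = a + b
    return x * x
```

Both functions return the same value, but the counter shows one call of `f` instead of two when the result is stored in a variable.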

The following examples demonstrate the expression syntax with very simple subexpressions. Examples with chemical meaning are shown later for matching conditions, chemical calculations and chemical and general purpose functions.

Examples:
  1. A simple expression:
    3+2
    
  2. Using assignments:
    x = 2;
    y = 3;
    x + y
    
  3. A more complicated one:
    x = 2;
    y = 3;
    z = 8*(x + y);
    t = 6*x*y;
    z + t
    
  4. When the same value is used more than once:
    x = (3 + 4)*8 + 16;
    y = 3*x;
    z = x + 20;
    5*(y + 8) + 4*z
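The assignment syntax can be mimicked with a few lines of Python to check the examples above by hand. This is a hypothetical sketch, not the JEP-based Evaluator: it handles plain arithmetic only (no chemical functions, no && or ||), and it relies on Python's `eval`, so it must never be used on untrusted input.

```python
import re


def evaluate(expression: str) -> float:
    """Evaluate '<id> = <subexpr>; ... <result subexpression>' arithmetically."""
    # Whitespace (space, tab, new-line) is insignificant, so remove all of it.
    compact = re.sub(r"\s+", "", expression)
    # Every ';'-terminated part is an assignment; the last part is the result.
    *assignments, result = compact.split(";")
    variables: dict[str, float] = {}
    no_builtins = {"__builtins__": {}}  # arithmetic only; NOT a security sandbox
    for assignment in assignments:
        name, subexpression = assignment.split("=", 1)
        # Variables assigned earlier are visible to later subexpressions.
        variables[name] = eval(subexpression, no_builtins, variables)
    return eval(result, no_builtins, variables)


example_4 = """
x = (3 + 4)*8 + 16;
y = 3*x;
z = x + 20;
5*(y + 8) + 4*z
"""
print(evaluate(example_4))  # 1488
```

With zero assignments the whole string is simply the result subexpression, so `evaluate("3+2")` covers example 1 as well.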
    
 

Predefined Functional Groups and Named Molecule Groups

It is sometimes easier to refer to molecules by name rather than by explicit SMARTS strings or molecule file paths. For example, you may want to write nitro or carboxyl as the query in a match function. Frequently used queries are predefined in the built-in functional groups file (chemaxon/marvin/templates/functionalgroups.cxsmi within MarvinBeans-templates.jar).

You can also define your own query SMARTS in the marvin/config/marvin/templates/functionalgroups.cxsmi file, and in the $HOME\chemaxon\marvin\templates\functionalgroups.cxsmi (Windows) or $HOME/.chemaxon/marvin/templates/functionalgroups.cxsmi (UNIX / Linux) file, where marvin is the Marvin installation directory and $HOME is your user home directory.

However, there are some restrictions on molecule names. Names may contain only letters, digits and the '_' character, so they cannot contain special characters such as '=' or '-'. Molecule name definitions in the functionalgroups.cxsmi file can contain whitespace characters (space, tab), but when such a name is referenced from a Chemical Terms expression, the whitespace characters should be replaced with a single '_' character (e.g. secondary amine should be referred to as secondary_amine in Chemical Terms expressions).
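A small helper can convert a name as written in functionalgroups.cxsmi into the form used in Chemical Terms expressions, and reject names that break the rules above. This Python sketch is hypothetical (it is not part of any ChemAxon product), and it assumes that "letters" means ASCII letters and that a run of consecutive whitespace characters collapses into one '_'.

```python
import re

# Assumption: ASCII letters, digits and '_' only.
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def chemical_terms_name(definition_name: str) -> str:
    """Turn a functionalgroups.cxsmi name into its Chemical Terms reference."""
    # Assumption: each run of spaces/tabs becomes a single '_'.
    name = re.sub(r"\s+", "_", definition_name.strip())
    if not VALID_NAME.fullmatch(name):
        raise ValueError(f"not a valid Chemical Terms name: {name!r}")
    return name


print(chemical_terms_name("secondary amine"))  # secondary_amine
```

A name such as acid-halide raises `ValueError`, because '-' is not allowed even after whitespace has been replaced.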

Note: since Marvin 5.4, the mols.smarts configuration file is no longer used by Chemical Terms; it has been replaced by the functionalgroups.cxsmi file.

 

Initial Scripts

You can define molecule sets and other constants in the user-defined initial script $HOME/chemaxon/MARVIN_MAJOR_VERSION/jep.script (Windows) or $HOME/.chemaxon/MARVIN_MAJOR_VERSION/jep.script (UNIX / Linux), where $HOME is your user home directory, and MARVIN_MAJOR_VERSION is the major version of Marvin (e.g. "5.1"). This script is run right after the molecule sets are read and the constants defined here can be used later in your chemical expressions. Any valid chemical terms assignment is allowed here, and the terminating ';' characters may be omitted as long as you write each assignment in a separate line. Typically, you will define a molecule set by

  1. listing its members:
    x = {acid_halide, alcohol, "[#6]CC[#8]"}
    y = {alkene, amide, imide, imine} 
    z = {alkene, amide, amine, alcohol, isocyanate}
    
  2. or deriving it from other sets with the help of set operators:
    all = x + y + z     (union of x, y, z)
    join = y * z        (join of y and z)
    C = (x + y) * z     (join of the union of x and y with z)
    D = z - alcohol     (all elements of z except alcohol)
    E = (x + y) - z     (union of x and y without the elements of z)
    
    where + means set-union, * means set-join and - means exclusion.
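The set operators behave like ordinary set algebra, which can be checked with Python frozensets on the member names of the example above. This is an analogy only: the members are treated as plain strings rather than queries, and the sketch assumes that set-join (*) means intersection, that is, the members common to both sets.

```python
x = frozenset({"acid_halide", "alcohol", "[#6]CC[#8]"})
y = frozenset({"alkene", "amide", "imide", "imine"})
z = frozenset({"alkene", "amide", "amine", "alcohol", "isocyanate"})

all_groups = x | y | z   # all = x + y + z   (union)
join = y & z             # join = y * z      (assumed: intersection)
C = (x | y) & z          # C = (x + y) * z
D = z - {"alcohol"}      # D = z - alcohol   (exclusion)
E = (x | y) - z          # E = (x + y) - z

print(sorted(join))  # ['alkene', 'amide']
```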

Predefined molecules and molecule sets are most useful in query definitions of the match function:

  • match(amide) will test whether the input molecule matches an amide group, match(reactant(0), {amide,amine}) will test whether the first reactant in a reaction context matches an amide or an amine

  • match(2, {metalloid,alcohol}, 1) will check whether atom 2 of the input molecule matches either a metalloid or an alcohol carbon; the last parameter 1 denotes the query atom map that picks the carbon from the alcohol definition. match(ratom(2), {metalloid,alcohol}, 1) performs the same check in a reaction context, where the target is the reactant atom with map 2 in the reaction equation.
 

Input Contexts

When evaluating an expression, the Evaluator substitutes data reference symbols by the corresponding data items. All data items belong to exactly one of the following data groups:

  1. constants: data having the same value at each evaluation

  2. inputs: data possibly changing for each evaluation, such as
    • input molecules and atoms,
    • input reactants for reaction processing,
    • created products in reaction processing

The type of the input data depends on the expression evaluation environment, which currently is one of the following:

  1. an expression string evaluated by the command line version of Evaluator refers to the current input molecule read from the input file(s) or the standard input

  2. an inner atomic expression refers to both the input molecule and the current atom - it is used when a Chemical Terms expression is evaluated on some or all atoms of the input molecule (e.g. atom filtering conditions, atomic evaluators and min-max evaluators)

  3. a reaction condition can refer to a reactant and a product array as well as to their atoms mapped according to the reaction equation

The evaluation environment provides a specific input context for accessing its input data. The input context consists of a set of accessor functions that can be used in expression strings to access the input data. The following input contexts correspond to the evaluation environments described above:

  1. molecule context, used for single molecule input (e.g. command line Evaluator, JChem Cartridge):
    • mol(): refers to the current input molecule

  2. atom context, used for single atom input (e.g. inner atomic expressions):
    • mol(): refers to the current input molecule
    • atom(): refers to the current input atom index in the input molecule

  3. search context, used for filtering search hits (e.g. jcsearch and search queries):
    • mol(), target(): both refer to the search target molecule
    • query(): refers to the search query molecule
    • m(int i): refers to the query atom index with atom map i
    • hit(), h(): both refer to the search hit array
    • hit(int i), h(int i): both refer to the i-th element of the search hit array, that is, the target atom index matching the query atom with atom index i
    • hm(int i): refers to the target atom index matching the query atom with atom map i (shorthand for h(m(i)))

  4. reaction context, used for reaction input initiated by the Reactor:
    • reactant(int i): refers to the i-th reactant (0-based indexing)
    • product(int i): refers to the i-th product (0-based indexing)
    • ratom(int m): refers to the reactant atom corresponding to reactant atom map m according to the reaction equation
    • patom(int m): refers to the product atom corresponding to product atom map m according to the reaction equation
    Note: in reaction context, atoms can also be referred to by atom index, but in this case the molecule (reactant / product) parameter must always be specified in the parameter list of the function (see this example).

Note that the default input molecule is the molecule returned by mol(), whenever this function exists in the context.
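The relationship between the search context accessors can be modelled with plain lists: h(i) indexes the hit array by query atom index, m(i) translates a query atom map into a query atom index, and hm(i) composes the two. The Python class below is a hypothetical illustration with made-up indices, not ChemAxon's API.

```python
from dataclasses import dataclass


@dataclass
class SearchHit:
    hit: list[int]              # hit[i] = target atom index matched by query atom i
    query_maps: dict[int, int]  # query atom map -> query atom index

    def h(self, i: int) -> int:
        """Target atom index matching the query atom with index i."""
        return self.hit[i]

    def m(self, atom_map: int) -> int:
        """Query atom index carrying the given atom map."""
        return self.query_maps[atom_map]

    def hm(self, atom_map: int) -> int:
        """Shorthand for h(m(i))."""
        return self.h(self.m(atom_map))


# A three-atom query whose atom 2 carries map 1, matched onto target atoms 5, 7, 9.
result = SearchHit(hit=[5, 7, 9], query_maps={1: 2})
print(result.hm(1))  # 9
```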

 

Configuration

The built-in configuration XML can be extended by user-defined functions and plugin calculations. The configuration syntax is described in the Evaluator Manual.

 

Examples

The examples below are divided into sections according to the input context applied, which corresponds to the different applications that can use ChemAxon's chemical expressions. These examples use the built-in configuration XML; the referenced functions and plugin calculations are listed in the short reference tables.

 

Evaluator and JChem Cartridge examples (molecule context)

  • Structure based calculations (plugin calculations)

    Plugin references provide access to ChemAxon's calculator plugins. These calculations equip our expressions with chemical meaning.

    1. The physiological microspecies at pH 7.4 of the input molecule:
      microspecies("7.4")
      
    2. The partial charges on atoms 0, 2 and 3 (0-based) of the input molecule:
      charge(0, 2, 3)
      
    3. The same with taking the physiological microspecies at pH 7.4:
      charge(0, 2, 3, "7.4")
      
    4. Checking whether the partial charge on atom 0 in the input molecule is greater than this charge value in the physiological microspecies at pH 7.4:
      charge(0) > charge(0, "7.4")
      
    5. The significant pKa value (acidic or basic) on atom 9 (0-based) of the input molecule:
      pka(9)
      
    6. The acidic pKa value on atom 9 (0-based) of the input molecule:
      pka("acidic", 9)
      

      Note that if the pKa type "acidic" or "basic" is omitted (as in the previous example), then the more significant value is returned, while specifically the "acidic" (or "basic") pKa value is returned if the type is specified.

    7. The strongest acidic pKa value of the input molecule:
      pka("acidic", "1")
      

      Note the difference between the last two examples: in pKa calculation, a number denotes an atom index, while a number in quotation marks denotes the strength order. 9 in the previous example refers to atom 9, while "1" in the above example refers to the strongest acidic pKa value ("2" refers to the second strongest value, etc.).

    8. The logP value of the input molecule:
      logp()
      
    9. The logD value at pH=7.4 of the input molecule:
      logd("7.4")
      

      Note that in logD calculation the pH value should be enclosed in quotation marks.

    10. Checking the difference between logD values at two different pH values:
      logd("7.4") - logd("3.8") > 0.5
      
    11. The mass of the input molecule:
      mass()
      
    12. The number of H bond acceptor atoms in the input molecule:
      acceptorCount()
      
    13. The same with taking the physiological microspecies at pH 7.4:
      acceptorCount("7.4")
      
    14. Checking the difference of the two above:
      acceptorCount("7.4") - acceptorCount() > 1
      
  • Functions

    ChemAxon provides several types of functions:

    1. general-purpose functions: simple array utility functions, such as minimum, maximum, sum or number of array elements, and an array element sorter function

    2. atomic functions: functions referring to an input atom, such as the atom property query functions, which query atom properties (e.g. hydrogen count), or the containment function, which checks whether an atom index is contained in an atom index array

    3. molecular functions: functions that calculate molecular properties but do not fit into the structure-based calculations section (e.g. the isQuery function)

    4. evaluator functions: functions that take an inner expression string as a parameter and evaluate it for each atom in an atom context. Examples include a filtering function, which takes a boolean expression and returns the atoms satisfying it, and min-max functions, which evaluate the inner expression for all atoms in the context and return the minimum or maximum value or the corresponding atom index

    1. The minimum of the partial charge values on atoms 7, 8 and 9 (0-based) of the input molecule:
      min(charge(7), charge(8), charge(9))
      
    2. The hydrogen count on atom 2 (0-based) of the input molecule:
      hcount(2)
      
    3. The valence of atom 2 of the input molecule:
      valence(2) 
      
    4. The atom indices corresponding to positive partial charges in the input molecule:
      filter("charge() > 0")
      
    5. The number of atoms with positive partial charge in the input molecule:
      count(filter("charge() > 0")) 
      
    6. The positive partial charges in the input molecule:
      charge(filter("charge() > 0"))
      
    7. The same but sorted in ascending order:
      sortAsc(charge(filter("charge() > 0")))
      
    8. Indices of atoms having partial charge at least 0.4 in the major microspecies at pH=7.4:
      filter("charge('7.4') >= 0.4")
      
    9. The partial charge values on these atoms in the input molecule:
      charge(filter("charge('7.4') >= 0.4"))
      
    10. The minimum acidic pKa value on hetero atoms with a single hydrogen:
      min(pka(filter("match('[!#6!#1;H1]')"), "acidic"))
      
    11. Checking whether there is a hetero atom with acidic pKa value less than 0.75:
      min(pka(filter("match('[!#6!#1;H1]')"), "acidic")) < 0.75
      
    12. Indices of atoms with the two strongest basic pKa values:
      maxAtom("pka('basic')", 2)
      

      Note that expression strings can be enclosed in either double or single quotes, and for nested strings the two kinds can be alternated. However, some UNIX shells interpret single quotes, which makes them hard to use in command line input. Reading the expression from a file solves this problem; alternatively, the inner single quotes can be replaced by escaped double quotes:

      maxAtom("pka(\"basic\")", 2)
      
    13. The corresponding pKa values:
      maxValue("pka('basic')", 2)
      
    14. Testing whether the partial charge on the atom with the strongest basic pKa value exceeds the partial charge on the atom with the second strongest basic pKa value:
      x = maxAtom("pka('basic')", 2);
      charge(x[0]) > charge(x[1])
      

      Note that in the current version the above expression cannot be evaluated if there are fewer than two basic pKa values in the input molecule.

    15. The basic pKa values for atoms with positive charge, sorted in descending order:
      sortDesc(pka("basic", filter("charge() > 0")))
      

      Note that in the current version, NaN values (meaning that there is no valid pKa for the given atom) are placed at the end of the array after sorting.

    16. Checking whether there is a sufficiently large difference between the two strongest basic pKa values of the previous example:
      x = sortDesc(pka("basic", filter("charge() > 0")));
      x[0] - x[1] > 1.5
      
    17. The hydrogen count for each atom in the input molecule:
      eval("hcount()")
      
    18. The number of hydrogens in the input molecule:
      sum(eval("hcount()"))
      
    19. Dissimilarity between the benzene ring and the input molecule using pharmacophore fingerprint as molecular descriptor with Tanimoto (default) metric:
      refmol = "c1ccccc1";
      dissimilarity("PF", refmol) 
      

      Note: dissimilarity function is not available in Marvin, it can be used only if JChem software package is installed.

    20. The same using Euclidean metric:
      refmol = "c1ccccc1";
      dissimilarity("PF:Euclidean", refmol) 
      

      Note: dissimilarity function is not available in Marvin, it can be used only if JChem software package is installed.

    21. The partial charges on the two atoms among atoms 1, 6 and 8 (0-based atom indices) that have the largest and second largest hydrogen counts (molecule context):
      x = array(1, 6, 8);
      y = maxAtom(x, "hcount()", 2);
      charge(y)
      
    22. Checking whether atom 6 (0-based atom index) has the first or second smallest partial charge among atoms 1, 6, 8, 10, 12 (molecule context):
      x = array(1, 6, 8, 10, 12);
      y = minAtom(x, "charge()", 2);
      in(6, y)
      
  • Matching conditions

    There are three ways to use substructure search in expressions: the match function returns true or false, while the matchCount and disjointMatchCount functions return the number of search hits.

    Note: match, matchCount and disjointMatchCount functions are not available in Marvin, they can be used only if JChem software package is installed.

    1. A simple molecule matching test taking the input molecule as target:
      match("C1CCOCC1")
      
    2. Atom matching with target atom being atom 2 (0-based) of the input molecule and query atom set being all query atoms:
      match(2, "C1CCOCC1")
      
    3. Atom matching with target atom being atom 2 (0-based) of the input molecule, and query atom set being both query carbon atoms attached to the oxygen:
      match(2, "C1C[C:1]O[C:2]C1", 1, 2)
      
    4. The same with referencing the query by molecule file path:
      match(2, "mols/query.mol", 1, 2)
      
    5. The same with referencing the query by molecule ID nitro as a predefined molecule constant:
      match(2, nitro, 1, 2)
      
    6. The total number of "C=O" and "CO" group matches in the input molecule:
      matchCount("C=O") + matchCount("CO")
      
    7. A more complex condition checking whether the input molecule contains sulfur and whether there are at least 6 "C=O" and "CO" groups in the input molecule altogether:
      match("S") && (matchCount("C=O") + matchCount("CO") >= 6)
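The distinction in the pKa examples between an atom index (a bare number) and a strength order (a number in quotation marks) can be modelled in Python by dispatching on the argument type. This sketch is hypothetical: the per-atom pKa list is invented, and it only encodes the chemical convention that a stronger acid has a lower pKa and a stronger base a higher pKa; it does not reproduce how the pKa plugin itself ranks values.

```python
import math
from typing import Union


def pka_lookup(pka_values: list[float], selector: Union[int, str],
               kind: str = "acidic") -> float:
    """Model of pka(kind, selector) on a list of per-atom pKa values.

    An int selects an atom index; a numeric string selects the strength
    order ("1" = strongest). NaN marks atoms without a pKa value.
    """
    if isinstance(selector, int):
        return pka_values[selector]        # like pka("acidic", 9)
    valid = [v for v in pka_values if not math.isnan(v)]
    # Stronger acids have lower pKa; stronger bases have higher pKa.
    ranked = sorted(valid, reverse=(kind == "basic"))
    return ranked[int(selector) - 1]       # like pka("acidic", "1")


acidic = [float("nan"), 4.2, 9.8, 2.1]
print(pka_lookup(acidic, 2))    # 9.8
print(pka_lookup(acidic, "1"))  # 2.1
```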
      
 
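The evaluator and array functions used in the molecule context examples (filter, count, sortAsc, sortDesc) map naturally onto list operations. The Python sketch below mirrors function examples 4 to 7 and 15 with invented per-atom charge and pKa values; the helper names are hypothetical, and only the NaN-last ordering is taken from the note on sortDesc above.

```python
import math

# Invented per-atom values for a five-atom molecule (list index = atom index).
charges = [0.12, -0.30, 0.45, 0.05, -0.10]
basic_pka = [float("nan"), 3.1, 9.4, float("nan"), 6.8]  # NaN: no valid pKa


def filter_atoms(condition) -> list[int]:
    """filter("...") analogue: indices of atoms satisfying the condition."""
    return [i for i in range(len(charges)) if condition(i)]


def sort_desc(values: list[float]) -> list[float]:
    """sortDesc analogue: descending order with NaN values moved to the end."""
    numbers = sorted((v for v in values if not math.isnan(v)), reverse=True)
    return numbers + [v for v in values if math.isnan(v)]


positive = filter_atoms(lambda i: charges[i] > 0)  # filter("charge() > 0")
n_positive = len(positive)                         # count(filter(...))
ascending = sorted(charges[i] for i in positive)   # sortAsc(charge(filter(...)))
basic_desc = sort_desc([basic_pka[i] for i in positive])
print(positive, n_positive, ascending)  # [0, 2, 3] 3 [0.05, 0.12, 0.45]
```

In this made-up molecule only one positively charged atom has a basic pKa, so every element of `basic_desc` after the first is NaN.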

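maxAtom, minAtom and maxValue select the atoms with the k largest (or smallest) values of an inner expression, optionally restricted to a given atom index array as in function examples 21 and 22. A Python sketch of that selection, with hypothetical helper names and invented per-atom values stored in a dictionary keyed by atom index:

```python
def max_atoms(values: dict[int, float], k: int) -> list[int]:
    """maxAtom analogue: indices of the k atoms with the largest values."""
    return sorted(values, key=values.__getitem__, reverse=True)[:k]


def min_atoms(values: dict[int, float], k: int) -> list[int]:
    """minAtom analogue: indices of the k atoms with the smallest values."""
    return sorted(values, key=values.__getitem__)[:k]


def max_values(values: dict[int, float], k: int) -> list[float]:
    """maxValue analogue: the k largest values themselves."""
    return [values[i] for i in max_atoms(values, k)]


# Example 21: hydrogen counts restricted to atoms 1, 6 and 8 (invented values).
hcounts = {1: 1, 6: 3, 8: 2}
print(max_atoms(hcounts, 2))  # [6, 8]

# Example 22: is atom 6 among the two smallest charges of atoms 1, 6, 8, 10, 12?
atom_charges = {1: 0.10, 6: -0.40, 8: 0.20, 10: -0.10, 12: 0.30}
print(6 in min_atoms(atom_charges, 2))  # True
```

If fewer than k atoms have a value, this sketch simply returns a shorter list, so indexing its second element fails; this loosely corresponds to the limitation noted for function example 14.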
Reactor examples (reaction context)

Note: Reactor is part of JChem software package, it is not available in Marvin.

  • Structure based calculations (plugin calculations)

    Plugin references provide access to ChemAxon's calculator plugins. These calculations equip our expressions with chemical meaning.

    1. The physiological microspecies at pH 7.4 of the first reactant:
      microspecies(reactant(0), "7.4")
      
    2. The partial charge on the reactant atom with map 1 in the reaction equation:
      charge(ratom(1))
      
    3. The same with taking the physiological microspecies at pH 7.4:
      charge(ratom(1), "7.4")
      
    4. The partial charge on the atom with atom index 2 in the first reactant:
      charge(reactant(0), 2)
      

      Note: evaluating this expression results in an error if there is no atom with index 2 in the first reactant. In reaction context, referring to atoms by atom index (instead of atom map) is recommended only if the atom index(es) are returned by a Chemical Terms expression (see this example).

    5. Checking whether the partial charge on the reactant atom with map 1 is greater than this charge value in the physiological microspecies at pH 7.4:
      charge(ratom(1)) > charge(ratom(1), "7.4")
      
    6. The significant pKa value (acidic or basic) on the reactant atom with map 3 in the reaction equation:
      pka(ratom(3))
      
    7. The acidic pKa on the above atom:
      pka(ratom(3), "acidic")
      

      Note that if the pKa type "acidic" or "basic" is omitted (as in the previous example), then the more significant value is returned, while specifically the "acidic" (or "basic") pKa value is returned if the type is specified.

    8. The strongest acidic pKa value of the first reactant:
      pka(reactant(0), "acidic", "1")
      
    9. The logP value of the first product:
      logp(product(0))
      
    10. The logD value at pH=7.4 of the first product:
      logd(product(0), "7.4")
      

      Note that in logD calculation the pH value should be enclosed in quotation marks.

    11. Checking the difference between logD values at two different pH values:
      logd(product(0), "7.4") - logd(product(0), "3.8") > 0.5
      
    12. The mass of the second product:
      mass(product(1))
      
    13. The number of H bond acceptor atoms in the second reactant:
      acceptorCount(reactant(1))
      
    14. The same with taking the physiological microspecies at pH 7.4:
      acceptorCount(reactant(1), "7.4")
      
    15. Checking the difference of the two above:
      acceptorCount(reactant(1), "7.4") - acceptorCount(reactant(1)) > 1
      
  • Functions

    ChemAxon provides several types of functions:

    1. general-purpose functions: simple array utility functions, such as minimum, maximum, sum or number of array elements, and an array element sorter function

    2. atomic functions: functions referring to an input atom, such as the atom property query functions, which query atom properties (e.g. hydrogen count), or the containment function, which checks whether an atom index is contained in an atom index array

    3. molecular functions: functions that calculate molecular properties but do not fit into the structure-based calculations section (e.g. the isQuery function)

    4. evaluator functions: functions that take an inner expression string as a parameter and evaluate it for each atom in an atom context. Examples include a filtering function, which takes a boolean expression and returns the atoms satisfying it, and min-max functions, which evaluate the inner expression for all atoms in the context and return the minimum or maximum value or the corresponding atom index

    1. The minimum of the partial charge values on reactant atoms matching maps 2, 3 and 4:
      min(charge(ratom(2)), charge(ratom(3)), charge(ratom(4)))
      
    2. The hydrogen count on product atom matching map 2:
      hcount(patom(2))
      
    3. The valence of reactant atom matching map 2:
      valence(ratom(2)) 
      
    4. The atom indices corresponding to positive partial charges in the first reactant:
      filter(reactant(0), "charge() > 0")
      
    5. The number of atoms with positive partial charge in the first reactant:
      count(filter(reactant(0), "charge() > 0")) 
      
    6. The positive partial charges in the first reactant:
      charge(reactant(0), filter(reactant(0), "charge() > 0"))
      
    7. The same but sorted in ascending order:
      sortAsc(charge(reactant(0), filter(reactant(0), "charge() > 0")))
      
    8. Indices of atoms having partial charge at least 0.4 in major microspecies of the first product at pH=7.4:
      filter(product(0), "charge('7.4') >= 0.4")
      
    9. The partial charge values on these atoms in the first product:
      charge(product(0), filter(product(0), "charge('7.4') >= 0.4"))
      
    10. The minimum acidic pKa value on hetero atoms with a single hydrogen in the first reactant:
      min(pka(reactant(0), filter(reactant(0), "match('[!#6!#1;H1]')"), "acidic"))
      
    11. Checking whether there is a hetero atom with acidic pKa value less than 0.75 in the first reactant:
      min(pka(reactant(0), filter(reactant(0), "match('[!#6!#1;H1]')"), "acidic")) < 0.75
      
    12. The minimum acidic pKa value on aliphatic atoms in the first reactant:
      min(pka(reactant(0), filter(reactant(0), "aliphaticAtom()"), "acidic"))
      
    13. Checking whether the bond between the reactant atoms with maps 1 and 2 is a single or double bond:
      (bondType(reactant(0), bond(ratom(1), ratom(2))) == 1 || bondType(reactant(0), bond(ratom(1), ratom(2))) == 2) 
      

      Note that the bond(ratom(1), ratom(2)) subexpression returns an <atomIndex1>-<atomIndex2> string, so in reaction context the molecule parameter must also be passed to the bondType() function (see this note). In this example, the reactant atoms with maps 1 and 2 are atoms of the first reactant (reactant(0)).

    14. Indices of atoms with the two strongest basic pKa values in the first product:
      maxAtom(product(0), "pka('basic')", 2)
      

      Note that expression strings can be enclosed in either double or single quotes, and for nested strings the two kinds can be alternated. However, some UNIX shells interpret single quotes, which makes them hard to use in command line input. Reading the expression from a file solves this problem; alternatively, the inner single quotes can be replaced by escaped double quotes:

      maxAtom(product(0), "pka(\"basic\")", 2)
      
    15. The corresponding pKa values:
      maxValue(product(0), "pka('basic')", 2)
      
    16. Testing whether, in the second product, the partial charge on the atom with the strongest basic pKa value exceeds the partial charge on the atom with the second strongest basic pKa value:
      x = maxAtom(product(1), "pka('basic')", 2);
      charge(product(1), x[0]) > charge(product(1), x[1])
      

      Note that in the current version the above expression cannot be evaluated if there are fewer than two basic pKa values in the molecule.

    17. The basic pKa values for atoms with positive charge in the first reactant, sorted in descending order:
      sortDesc(pka("basic", reactant(0), filter(reactant(0), "charge() > 0")))
      

      Note that in the current version, NaN values (meaning that there is no valid pKa for the given atom) are placed at the end of the array after sorting.

    18. Checking whether there is a sufficiently large difference between the two strongest basic pKa values of the previous example:
      x = sortDesc(pka("basic", reactant(0), filter(reactant(0), "charge() > 0")));
      x[0] - x[1] > 1.5
      
    19. The hydrogen count for each atom in the first product:
      eval(product(0), "hcount()")
      
    20. The number of hydrogens in the first product:
      sum(eval(product(0), "hcount()"))
      
    21. Dissimilarity between the first reactant and product using pharmacophore fingerprint as molecular descriptor with Tanimoto (default) metric:
      dissimilarity("PF", reactant(0), product(0)) 
      
    22. The same using Euclidean metric:
      dissimilarity("PF:Euclidean", reactant(0), product(0)) 
      
  • Matching conditions

    There are three ways to use substructure search in expressions: the match function returns true or false, while the matchCount and disjointMatchCount functions return the number of search hits.

    1. A simple molecule matching test taking the first reactant as target:
      match(reactant(0), "C1CCOCC1")
      
    2. Atom matching with the product atom with map 2 as target and the query atom set being all query atoms:
      match(patom(2), "C1CCOCC1")
      
    3. Atom matching with the product atom with map 2 as target and the query atom set being both query carbon atoms attached to the oxygen:
      match(patom(2), "C1C[C:1]O[C:2]C1", 1, 2)
      
    4. The same with referencing the query by molecule file path:
      match(patom(2), "mols/query.mol", 1, 2)
      
    5. The total number of "C=O" and "CO" group matches in the second product:
      matchCount(product(1), "C=O") + matchCount(product(1), "CO")
      
    6. A more complex condition checking whether the second product contains sulfur and whether it contains at least 6 "C=O" and "CO" groups altogether:
      match(product(1), "S") && (matchCount(product(1), "C=O") + matchCount(product(1), "CO") >= 6)
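
The difference between the three matching functions can be illustrated on precomputed search hits. In this Python sketch, each hit is a hypothetical tuple of target atom indices. The substructure search itself is not modeled, and the function names are invented stand-ins. Treating disjointMatchCount as the largest number of pairwise non-overlapping hits is an assumption about its semantics. The brute-force search is only practical for a handful of hits.

```python
from itertools import combinations


def match(hits):
    """True if the query was found at least once (cf. match())."""
    return len(hits) > 0


def match_count(hits):
    """Number of search hits, overlapping ones included (cf. matchCount())."""
    return len(hits)


def disjoint_match_count(hits):
    """Largest number of hits sharing no target atom.

    Assumed reading of disjointMatchCount(); brute force, small inputs only.
    """
    atom_sets = [frozenset(hit) for hit in hits]
    # Try the largest subsets first; the first fully disjoint one is maximal.
    for k in range(len(atom_sets), 0, -1):
        for combo in combinations(atom_sets, k):
            if all(a.isdisjoint(b) for a, b in combinations(combo, 2)):
                return k
    return 0


# Hypothetical "CO" hits in a C0-O1-C2-O3 chain: the middle oxygen is shared.
co_hits = [(0, 1), (2, 1), (2, 3)]
print(match(co_hits), match_count(co_hits), disjoint_match_count(co_hits))
# True 3 2
```

Overlapping hits inflate matchCount but not the disjoint count. This is why the two functions can give different answers for the same query.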
      
 

Search filter examples (search context)

  • Structure based calculations (plugin calculations)

    Plugin references provide access to ChemAxon's calculator plugins. These calculations equip our expressions with chemical meaning.

    1. Filter hits by requiring that the partial charge on the target atom matching query map 1 be positive:
      charge(hm(1)) > 0
      
    2. The same, taking the physiological microspecies of the target at pH 7.4:
      charge(hm(1), "7.4") > 0
      
    3. Checking whether the partial charge on the target atom matching query map 1 is greater than the charge on the same atom in the physiological microspecies at pH 7.4:
      charge(hm(1)) > charge(hm(1), "7.4")
      
    4. The basic pKa value on the target atom matching query map 3 should be greater than 8.0:
      pka(hm(3), "basic") > 8.0
      
    5. The strongest acidic pKa value of the target should be less than 0.5:
      pka("acidic", "1") < 0.5
      

      Note that by default the expression refers to the target. Use query() to refer to the query:

      pka(query(), "acidic", "1") < 0.5
      
    6. The logP value of the query should be greater than that of the target:
      logp(query()) > logp()
      
    7. The logD value at pH=7.4 of the target should be less than the logD value at pH=3.4:
      logd("7.4") < logd("3.4")
      

      Note that in logD calculation the pH value should be enclosed in quotation marks.

    8. Require a sufficiently large difference between the logD values of the target at two different pH values:
      logd("7.4") - logd("3.8") > 0.5
      
    9. Require a sufficiently large target mass as well as a positive charge value at the target atom matching query map 1:
      (mass() > 500) && (charge(hm(1)) > 0)
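
In these search filters, hm(n) refers to the target atom matched by the query atom with map number n. As we read the examples, each search hit is therefore tested separately. The Python sketch below imitates example 9 under that reading, using hypothetical data. Each hit is a dictionary from query map numbers to target atom indices. The partial charges and the mass are invented numbers rather than plugin results, and `hm` and `passes_filter` are illustrative names only.

```python
def hm(hit, map_number):
    """Target atom index matched by the query atom carrying `map_number`."""
    return hit[map_number]


def passes_filter(hit, charges, mass):
    """Mimic `(mass() > 500) && (charge(hm(1)) > 0)` for a single hit."""
    return mass > 500 and charges[hm(hit, 1)] > 0


# Hypothetical partial charges per target atom and target mass.
charges = [-0.31, 0.12, 0.05, -0.42]
target_mass = 512.6
# Two hypothetical hits: query map 1 lands on target atom 1, then on atom 3.
hits = [{1: 1, 2: 3}, {1: 3, 2: 0}]

kept = [hit for hit in hits if passes_filter(hit, charges, target_mass)]
print(kept)  # [{1: 1, 2: 3}]
```
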
      
  • Functions

    ChemAxon provides several types of functions:

    1. general purpose functions: simple array utility functions, such as minimum, maximum, sum, the number of array elements, and an array element sorter function

    2. atomic functions: functions referring to an input atom, such as the atom property query functions, which query atom properties (e.g. hydrogen count), or the containment function, which checks whether an atom index is contained in an atom index array

    3. evaluator functions: functions that take an inner expression string as a parameter and evaluate this expression for each atom in an atom context. Examples include a filtering function, which takes a boolean expression and returns the atoms satisfying it, and the min-max functions, which evaluate the inner expression for all atoms in the context and return the minimum or maximum value or the corresponding atom index

    1. The minimum of the partial charge values on the target atoms matching maps 2, 3 and 4 should be negative, that is, at least one of these charge values should be negative:
      min(charge(hm(2)), charge(hm(3)), charge(hm(4))) < 0
      
    2. The hydrogen counts on the target atoms matching map 1 and map 2 should both be at least 1:
      (hcount(hm(1)) >= 1) && (hcount(hm(2)) >= 1)
      
    3. The valence of target atom matching map 1 or map 2 should be at least 1:
      (valence(hm(1)) >= 1) || (valence(hm(2)) >= 1)
      
    4. The number of atoms with positive partial charge in the target should be at least that in the query:
      count(filter("charge() > 0")) >= count(filter(query(), "charge() > 0"))
      
    5. The minimum acidic pKa value on hetero atoms with a single hydrogen in the target should be less than 0.75, that is, there should be at least one hetero atom with a single hydrogen with acidic pKa less than 0.75:
      min(pka(filter("match('[!#6!#1;H1]')"), "acidic")) < 0.75
      
    6. Testing whether the partial charge on the atom with the strongest basic pKa value exceeds the partial charge on the atom with the second strongest basic pKa value in the target:
      x = maxAtom("pka('basic')", 2);
      charge(x[0]) > charge(x[1])
      

      Note that in the current version the above expression cannot be evaluated if there are fewer than two basic pKa values in the molecule.

    7. Checking whether there is a sufficiently large difference (more than 1.5 in either direction) between the strongest basic pKa value among positively charged atoms in the query and the corresponding value in the target:
      x = max(pka("basic", query(), filter(query(), "charge() > 0")));
      y = max(pka("basic", filter("charge() > 0")));
      (x - y > 1.5) || (y - x > 1.5)
      
    8. The dissimilarity between the target and the query, using the pharmacophore fingerprint as molecular descriptor and the Tanimoto (default) metric, should be sufficiently small:
      dissimilarity("PF", target(), query()) < 0.6 
      

      Note that target() can be omitted, as in the examples above:

      dissimilarity("PF", query()) < 0.6 
      
    9. The same using Euclidean metric:
      dissimilarity("PF:Euclidean", target(), query()) < 0.6 
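
The evaluator functions above reduce to simple operations on per-atom value arrays. This Python sketch mimics filter(), count() and maxAtom() on hypothetical charge and basic pKa arrays, where NaN marks atoms without a valid pKa. The helper names are invented, and skipping NaN values in maxAtom is an assumption. The sketch returns None when fewer than the requested number of valid values exist. This stands in for the case, noted above, in which the expression cannot be evaluated.

```python
import math


def filter_atoms(values, predicate):
    """Indices of atoms whose value satisfies `predicate` (cf. filter())."""
    return [i for i, v in enumerate(values) if predicate(v)]


def max_atoms(values, k):
    """Indices of the k atoms with the largest non-NaN values (cf. maxAtom(expr, k)).

    Returns None if fewer than k valid values exist.
    """
    valid = [i for i, v in enumerate(values) if not math.isnan(v)]
    if len(valid) < k:
        return None
    return sorted(valid, key=lambda i: values[i], reverse=True)[:k]


nan = float("nan")
charges = [0.10, -0.25, 0.32, -0.05]   # hypothetical partial charges
basic_pka = [nan, 9.1, nan, 4.7]        # hypothetical basic pKa values

# count(filter("charge() > 0"))
print(len(filter_atoms(charges, lambda c: c > 0)))  # 2

# x = maxAtom("pka('basic')", 2); charge(x[0]) > charge(x[1])
x = max_atoms(basic_pka, 2)
print(x, charges[x[0]] > charges[x[1]])  # [1, 3] False
```
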
      
