Query features

Query atoms

Organic chemists often look for molecules that cannot be represented by a single structure. Although it is possible to run several structure searches in sequence, it is much more efficient to run a single search with a well-designed query structure. Such a structure often contains query features, possibly including complex conditional expressions for atoms and bonds.

Atom lists, not lists

It is possible to define the type of an atom in a custom atom list. If the type of the corresponding atom in the target molecular structure is a member of the list, it is considered a matching atom (Table 1). Not lists can be used to specify atoms to be excluded (Table 2). Please note that the matching of not list atoms may depend on the input format of the query molecule. See details.

Table 1. Atom lists (rows: query, columns: target)

Table 2. Atom not lists (rows: query, columns: target)

Generic query atoms

Atom lists and not lists are practical when the number of included or excluded atoms is small. Generic query atom types help to avoid long atom lists. JChem currently handles the following generic query atom types.

 A           Any atom except hydrogen. Matches neither explicit nor implicit
                 hydrogens. Please note that in JChem the SMARTS primitive "*"
                 is imported as any atom and does not match plain hydrogens
                 (neither explicit nor implicit). For differences between
                 matching any atoms appearing in different file formats,
                 see here.
 AH          Any atom, including hydrogen
 Q           Hetero (any atom except hydrogen and carbon)
 QH          Hetero atom or hydrogen (any atom except carbon)
 M           Metal (alkali metals, alkaline earth metals, transition metals, actinides, lanthanides, poor (basic) metals, Ge, Sb and Po)
 MH          Metal or hydrogen
 X           Halogen (F, Cl, Br or I)
 XH          Halogen or hydrogen
 Gn          Member of group (column) n in the periodic system (n = 1..18)
                 Attention: G17 is NOT the same as X, as it contains At!

Table 3. Generic query atoms (rows: query, columns: target)
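The generic atom types listed above behave like predefined atom lists. The following is a minimal Python sketch of that idea, not JChem code: the function name `matches_generic` is hypothetical, only a few of the types are modelled, and isotopes and charges on hydrogen are ignored.

```python
# Toy model of a few generic query atom types (not the JChem implementation).
HALOGENS = {"F", "Cl", "Br", "I"}   # X, as listed in the text
GROUP_17 = HALOGENS | {"At"}        # G17 also contains At, per the text


def matches_generic(query_type, target_symbol):
    """Return True if the target element symbol satisfies the generic type."""
    if query_type == "A":    # any atom except hydrogen
        return target_symbol != "H"
    if query_type == "AH":   # any atom, including hydrogen
        return True
    if query_type == "Q":    # hetero: neither hydrogen nor carbon
        return target_symbol not in {"H", "C"}
    if query_type == "QH":   # hetero or hydrogen: anything except carbon
        return target_symbol != "C"
    if query_type == "X":    # halogen
        return target_symbol in HALOGENS
    if query_type == "XH":   # halogen or hydrogen
        return target_symbol in HALOGENS or target_symbol == "H"
    if query_type == "G17":  # periodic group 17
        return target_symbol in GROUP_17
    raise ValueError("generic type not modelled in this sketch: " + query_type)
```

With these definitions, `matches_generic("G17", "At")` holds while `matches_generic("X", "At")` does not, which is the difference the list above warns about.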

Atom properties

The chemical neighborhood of an atom is sometimes as important as its type. Conditions for the chemical environment of an atom can be defined by atom properties (Table 4). Some of these require a value (connections, smallest ring size), others do not (aromatic). The smallest set of smallest rings (SSSR) is used for the evaluation of ring properties.
    a           aromatic (has aromatic bond)
    A           aliphatic (does not have aromatic bond)
    D<n>        degree (number of explicit connections; default for "n" is one)
    H<n>        total hydrogens (total number of hydrogen substituents)
    h<n>        implicit hydrogens (number of implicit hydrogen substituents*)
    R<n>        rings (number of rings the atom is a member of)
    r<n>        smallest ring size (size of the smallest ring the atom is a member of)
    R           ring membership (whether atom is part of a ring or not)
    v<n>        valence (total bond order)
    X<n>        connections (number of substituents including hydrogens)
    s<n>        substitution count (number of non-H substituents)
                s0-s5: exact substitution count; s6: 6 or more substitutions
    s*          substitution as drawn (no extra non-H substituents)
    rb<n>       ring bond count (number of ring bonds next to the atom)
                rb0-rb3: exact ring bond count; rb4: 4 or more ring bonds.
                The same condition can be expressed with the SMARTS "x" property
                (see the SMARTS documentation).
    rb*         ring bond count as drawn (no extra ring bonds)
    u           unsaturated atom (atom has double, triple or aromatic bond)
* Follows either the ISIS or the Daylight behaviour, depending on the source of the Molecule object. For details, see the differences section.

Table 4. Atom properties (rows: query, columns: target)
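The counting properties in the list above differ mainly in how they treat explicit and implicit hydrogens. The Python sketch below illustrates this on a hand-built atom record. It is a toy model, not JChem code: the `Atom` class and the function names are hypothetical, and the assumption that v and X include implicit hydrogens follows the usual SMARTS convention rather than anything stated in this section.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    symbol: str
    implicit_h: int = 0
    # Explicit neighbours as (element symbol, bond order) pairs.
    bonds: list = field(default_factory=list)


def degree(atom):              # D: number of explicit connections
    return len(atom.bonds)


def implicit_hydrogens(atom):  # h: implicit hydrogens only
    return atom.implicit_h


def total_hydrogens(atom):     # H: implicit plus explicitly drawn hydrogens
    return atom.implicit_h + sum(1 for s, _ in atom.bonds if s == "H")


def connections(atom):         # X: substituents including hydrogens
    return len(atom.bonds) + atom.implicit_h


def valence(atom):             # v: total bond order (assumed to count implicit H)
    return sum(order for _, order in atom.bonds) + atom.implicit_h


def substitution_count(atom):  # s: non-hydrogen substituents
    return sum(1 for s, _ in atom.bonds if s != "H")


# Carbonyl carbon of acetaldehyde, drawn once with an implicit and once
# with an explicit hydrogen.
implicit_form = Atom("C", implicit_h=1, bonds=[("C", 1), ("O", 2)])
explicit_form = Atom("C", implicit_h=0, bonds=[("C", 1), ("O", 2), ("H", 1)])
```

In this model only D and h change between the two drawings; H, X, v and s are the same for both.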

Isotopes, charges, radicals

Isotopes, charges and radicals are atom properties that can appear on the target as well as on the query structure, and they are considered during the search. By default, a query atom without isotope, charge or radical information matches target atoms that carry such information. This behavior can be fine-tuned with the options OPTION_ISOTOPE_MATCHING, OPTION_CHARGE_MATCHING and OPTION_RADICAL_MATCHING, each of which takes a DEFAULT, EXACT or IGNORE value. All of these constants are defined in class chemaxon.sss.SearchConstants.

The following tables show some examples.

Table 5. Charge matching (columns: target; one row of query structures per option)

  setOption(OPTION_CHARGE_MATCHING, CHARGE_MATCHING_DEFAULT) (default)
  setOption(OPTION_CHARGE_MATCHING, CHARGE_MATCHING_EXACT)
  setOption(OPTION_CHARGE_MATCHING, CHARGE_MATCHING_IGNORE)

Table 6. Isotope matching (columns: target; one row of query structures per option)

  setOption(OPTION_ISOTOPE_MATCHING, ISOTOPE_MATCHING_DEFAULT) (default)
  setOption(OPTION_ISOTOPE_MATCHING, ISOTOPE_MATCHING_EXACT)
  setOption(OPTION_ISOTOPE_MATCHING, ISOTOPE_MATCHING_IGNORE)

Table 7. Radical matching (columns: target; one row of query structures per option)

  setOption(OPTION_RADICAL_MATCHING, RADICAL_MATCHING_DEFAULT) (default)
  setOption(OPTION_RADICAL_MATCHING, RADICAL_MATCHING_EXACT)
  setOption(OPTION_RADICAL_MATCHING, RADICAL_MATCHING_IGNORE)
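The three matching modes can be pictured with a small Python sketch for the charge case. This is a toy model, not the JChem API: `charge_matches` is a hypothetical function. Only the rule that an uncharged query atom matches a charged target by default comes from the text; the rule for charged query atoms and the EXACT and IGNORE semantics are assumptions inferred from the option names.

```python
DEFAULT, EXACT, IGNORE = "default", "exact", "ignore"


def charge_matches(query_charge, target_charge, mode=DEFAULT):
    """Toy model of the charge matching modes (assumed semantics)."""
    if mode == IGNORE:
        # Assumption: charges play no role at all.
        return True
    if mode == EXACT:
        # Assumption: charges must be identical, including "no charge".
        return query_charge == target_charge
    # DEFAULT: an uncharged query atom matches any target charge (from the
    # text); a charged query atom is assumed to require the same charge.
    return query_charge == 0 or query_charge == target_charge
```

For example, `charge_matches(0, 1)` holds in the default mode but not with `mode=EXACT`. The same pattern would apply to isotopes and radicals.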

Link nodes (link atoms)

Link nodes are atoms which may occur one or more times defining a variable length chain or ring. The link node is denoted by its brackets and the repetition range. All bonds not crossed by the brackets (and connecting parts) are also repeated together with the link node. See examples below.

Table 8.

Query Possible meanings
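A link node with a repetition range stands for a family of concrete structures. The Python sketch below enumerates that family for a linear chain written as SMILES text. It is a string-level illustration only: `expand_link_node` is a hypothetical helper, it assumes the repeated unit (the link atom together with any substituents that are repeated with it) can be written as a self-contained SMILES fragment, and it does not handle rings.

```python
def expand_link_node(prefix, unit, suffix, low, high):
    """List the chains obtained by repeating `unit` between low and high times."""
    return [prefix + unit * n + suffix for n in range(low, high + 1)]


# A link oxygen repeated 1 to 3 times between two carbons.
ethers = expand_link_node("C", "O", "C", 1, 3)

# A link carbon whose fluorine substituent is repeated along with it.
fluorinated = expand_link_node("N", "C(F)", "N", 2, 2)
```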

SMARTS atoms

JChem's search supports all valid SMARTS atom expressions. (See Daylight's SMARTS theory manual.)

SMARTS atoms are depicted the following way in Marvin:

The following additional query features are handled as part of this:

Logical operators between query elements

This query feature allows queries to be combined into complex expressions with logical operators: "not", "or" and the two "and" operators. Table 9 shows the operators in order of precedence ("!" is evaluated first):

Table 9.

Operator    Name
!           not (unary operator)
&           high precedence and (default operator, i.e. can be omitted between two query expressions)
,           or
;           low precedence and

Table 10. Examples

Targets:  NCC(O)=O    [N+]CC([O-])=O    [H]OC(=O)N([H])C    COC

Queries:
[OX2H,OX1-]
[O&X2&H,O&X1&-]
[NX3;H2,H1]
[OX2!-]
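The effect of the precedence order can be illustrated with a small evaluator. The Python sketch below is a toy, not a SMARTS parser: `evaluate` is a hypothetical function, every "&" must be written explicitly, and each primitive is simply looked up as a true/false flag in a dictionary describing one target atom.

```python
def evaluate(expression, atom):
    """Evaluate '!', '&', ',' and ';' in SMARTS precedence order.

    `atom` maps primitive strings (e.g. "O", "X2", "-") to booleans;
    primitives that are missing count as False.
    """
    def primitive(token):
        negate = False
        while token.startswith("!"):   # "!" binds tightest
            negate = not negate
            token = token[1:]
        value = atom.get(token, False)
        return (not value) if negate else value

    # ";" is split first because it binds loosest, then ",", then "&".
    return all(
        any(all(primitive(p) for p in conj.split("&"))
            for conj in disj.split(","))
        for disj in expression.split(";")
    )


hydroxyl_o = {"O": True, "X2": True, "H": True}
carboxylate_o = {"O": True, "X1": True, "-": True}
ether_o = {"O": True, "X2": True}
```

With an atom for which N and H2 are true but X3 is false, "N&X3,H2" evaluates to true, because it reads as (N and X3) or H2, while "N&X3;H2" evaluates to false, because it reads as (N and X3) and H2.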

Recursive SMARTS

One of the most powerful features of SMARTS atoms is recursive SMARTS. It can be used to describe the environment of an atom with the syntax "$( <<SMARTS expression>> )". The first atom of the <<SMARTS expression>> is matched to the atom in question, and the rest to its environment. The primitive evaluates to true if the expression matches.

Table 11.

SMARTS                              Meaning
[OX2$(OaaN)]                        Aliphatic oxygen with two connections, next to an aromatic ring having an aliphatic N in ortho position.
[OX2$(*aaN)]                        Same as above.
[$([OX2]aaN)]                       Same as above.
[NX3;H2,H1;!$(NC=O)]                Primary or secondary amine, not amide.
[$(N~*~*~[O!$(O([C,c])[C,c])])]     Aliphatic N three bonds away from a non-ether aliphatic O.

Table 12.

Query Target
[OX2$(OaaN)]
[$(OCC),$(OCN)]
[$(O([C,c])[C,c])]
[$(N~*~*~[O!$(O([C,c])[C,c])])]

Please note that uppercase atom symbols only match to aliphatic atoms and lowercase only to aromatic.

Further information regarding SMARTS

In JChem explicit and implicit hydrogens in the target are treated the same, and hence the presence or absence of plain hydrogens does not affect the result of the search.

In JChem the SMARTS primitive "*" (any atom) does not match to plain hydrogens. (Neither explicit nor implicit.) However, it matches deuterium and charged H. See below.

Further SMARTS examples can be found on Daylight's page.

Pseudo atoms

Pseudo atoms have user-defined atom types, and they only match another pseudo atom of the same name (case insensitive). Commonly used pseudo atoms include "Resin" and "Pol", referring to the often used solid phases in syntheses (Pol is the default pseudo for resin in MDL ISIS/Draw).

Table 13.

Query Target

It should be noted that there is no chemical intelligence associated with pseudo atoms. This means that if a common abbreviation is used as pseudo atom, it will not match the corresponding molecular group. To achieve this, correct abbreviations (Superatom S-groups) must be used.

Lone pairs

JChem search can handle query and target atoms having lone pairs associated with them. Lone pairs on the query side match explicit and implied lone pairs, but please note that lone pairs are only considered when attached to an atom, i.e. isolated lone pairs will not match anything.

Table 14.

Query Target

Querying bonds

Generic bonds

A bond query can determine whether a bond in the target molecule is one of the four basic types (single, double, triple, aromatic) or one of the generic types that are available for fine-tuning query structures (Table 15). The line style represents the type of a bond.

any
single or double
single or aromatic
double or aromatic

Table 15. Generic bond types (rows: query, columns: target)

Note on aromatic bonds

For the correct use of aromatic bonds and aromatic systems in general, see the Aromatization section under Standardization.

Stereo bonds (tetrahedral chirality and cis/trans configuration)

See section Stereochemistry

Chain/ring bond attributes

In addition to the bond type discussed above, a bond topology query attribute can be assigned to bonds. This expresses that the bond must be part of a ring or must not. See the examples below.

Table 16. Chain/ring bond attributes (rows: query, columns: target)

SMARTS bonds

SMARTS bond expressions are also supported. (See Daylight's SMARTS theory manual.)

SMARTS bonds are depicted the following way in Marvin:

As with SMARTS atoms, the SMARTS logical operators "!" (not), "&" and ";" (high and low precedence and) and "," (or) can be used. "&" is the default operator, hence "and" is assumed if there is no operator between two SMARTS primitives. Furthermore, the following characters have valid meanings:

Table 17.

Bond expression    Meaning
-                  single bond
=                  double bond
#                  triple bond
:                  aromatic bond
@                  any ring bond
/                  directional bond: single "up" (used for cis/trans)
\                  directional bond: single "down" (used for cis/trans)

Table 18.

SMARTS                   Meaning
C-,=,#C                  Two aliphatic carbons connected by single, double or triple bond.
*-!@*                    Two atoms connected by a nonring single bond.
*@-,!@&/*=*@-,!@&/*      Double bond between two single bonds in ring or not in ring but in trans configuration.

Table 19.

Query Target
C-,=,#C
*-!@*
*@-,!@&/*=*@-,!@&/*

Further SMARTS examples can be found on Daylight's page.

Coordinate bonds

Coordination compounds can be registered and searched for in JChem structure databases. Both "atom to atom" and "multicenter" (involving more than two atoms) representations are supported.

Atom to atom coordinate bonds

Matching of "atom to atom" coordinate bonds is similar to matching other bond types. The direction of the coordinate bond arrow is not checked. See examples below. (Q stands for hetero atom, M for any metal atom. The thin dotted bond represents an ANY query bond.)

Table 20.

Query Target

Multicenter coordinate bonds

Multicenter coordinate bonds are handled as if each atom at one end of the coordinate bond had an individual coordinate bond to each atom at the opposite end. This means that the following molecule pairs are equivalent. (The molecule representation used conforms to the IUPAC recommendation: atom to atom coordinate bonds are displayed as an arrow, and multicenter coordinate bonds are denoted by a thick dashed line.)

So individual and multicenter representations can both be used during searching, in all combinations. See examples below. (The thin dotted bonds represent ANY query bond types.)

Table 21.

Query Target

Position variation bonds

A position variation bond (or variable point of attachment) expresses that a bond may be attached to multiple positions (atoms); it is most often used for rings. It is represented by a multicenter atom at one or both ends of the position variation bond. Its representation and drawing in Marvin are described in the Marvin Sketch help. See examples below.

Table 22. Meaning of position variation.

Query Possible meanings

Table 23. Matching of position variation queries.

Query Target

Components

This section describes the features related to different types of components. A component is a set of connected atoms in a molecular drawing. The connection can be:

The handling of these different types of components is described below.

SMARTS component level grouping

In this feature a component is a set of atoms connected by bonds. In SMARTS queries it can be specified whether different components (fragments) of the query should appear in the same or in different components of the target. This is expressed by grouping parentheses around the components in the SMARTS string. Please note that there is no graphical representation of this feature in Marvin.

Table 24.

SMARTS representation    Meaning
C.C                      No restrictions.
(C.C)                    The two carbons must appear in the same component.
(C).(C)                  The two carbons must appear in different components.

Table 25.

Query Target
C.C
(C.C)
(C).(C)
(C).(C).C
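The grouping rules of Table 24 can be checked mechanically once it is known which target component each query fragment was matched in. The Python sketch below shows this check only. It is a toy model, not JChem code: `grouping_satisfied` and its input encoding are hypothetical, and the atom matching itself is assumed to have happened elsewhere.

```python
def grouping_satisfied(fragment_groups, fragment_components):
    """Check component level grouping for one candidate match.

    fragment_groups[i]     -- label of the parenthesised group that query
                              fragment i belongs to, or None if ungrouped
    fragment_components[i] -- id of the target component in which query
                              fragment i was matched
    """
    group_to_component = {}
    for group, component in zip(fragment_groups, fragment_components):
        if group is None:
            continue  # ungrouped fragments are unrestricted
        if group_to_component.setdefault(group, component) != component:
            return False  # one group spread over two target components
    used = list(group_to_component.values())
    # Different groups must land in different target components.
    return len(used) == len(set(used))
```

Under this encoding C.C is `[None, None]`, (C.C) is `[0, 0]`, (C).(C) is `[0, 1]` and (C).(C).C is `[0, 1, None]`.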

Component, Mixture and Formulation brackets

This feature relates to the use of brackets (S-groups) of type COM (component), FOR (formulation) and MIX (mixture). A component here is a set of atoms contained by a component bracket.

Ordered and unordered mixtures

An unordered mixture (MIX type S-group) consists of several unordered components. For these types of mixtures, the order of addition during the preparation is not important. Example:

Ordered mixtures (FOR type S-groups), on the other hand, contain ordered components, which define the order of addition. Example:

Matching of mixture and component brackets

The component grouping of component brackets is considered during the matching, so all atoms drawn inside component brackets in the query can only match atoms that are contained in the same component brackets in the target and separate components can only match separate components.

Component brackets without surrounding mix or for brackets are considered as being in mix (unordered mixture) brackets and molecules not drawn in any component brackets are considered to be in the same component.

Table 26.

Query Target

Matching of formulation brackets

Unordered mixture (mix) queries match both unordered (mix) and ordered (for) mixtures. Ordered mixture (for) queries, however, match only ordered mixtures, and the component numbering must keep its order. Examples:

Table 27.

Query Target
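The ordering rule can be sketched in Python as follows. This is a toy model, not JChem code: `mixture_matches` is a hypothetical function, components are reduced to plain labels that match by equality, and the reading that an ordered query may skip target components as long as the relative order is kept is an assumption.

```python
def mixture_matches(query_kind, query_parts, target_kind, target_parts):
    """Toy model of mix/for bracket matching; kinds are "mix" or "for"."""
    if query_kind == "for":
        if target_kind != "for":
            return False  # ordered queries match only ordered targets
        # The query components must occur in the target in the same order.
        remaining = iter(target_parts)
        return all(part in remaining for part in query_parts)
    # Unordered query: every query component needs its own target
    # component, in any order and for either kind of target.
    unused = list(target_parts)
    for part in query_parts:
        if part not in unused:
            return False
        unused.remove(part)
    return True
```

For instance, an unordered query with components B and A matches an ordered target A, B, while the ordered query B, A does not.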

Other component features

During reaction searching, reaction component grouping is maintained; see the reaction component handling section.

Exact fragment matching ensures that all query components (atoms connected by bonds) match only full components. See its description in the Search types section.

Explicit hydrogens

For the sake of simplicity, organic chemists usually do not draw hydrogen atoms on molecules, but in some models used to represent molecules the hydrogens are shown implicitly or explicitly. Whichever display mode one prefers, all free valences of the atoms are considered to be filled with hydrogens. In query structures, explicit hydrogens are significant: an explicitly drawn query hydrogen requires the target to contain a hydrogen in that position (Table 28).

Table 28. Explicit hydrogens (rows: query, columns: target)

Chemical Terms filtering expressions

Searches can include extra conditions formulated in the Chemical Terms language. Chemical Terms is a chemistry language which allows users to formulate complex chemical questions, expressions and rules. Chemical Terms can contain references to functional groups, other structural elements and physico-chemical properties. The syntax is described in the Chemical Terms Reference. Search specific functions contained in the search context provide access to the query and the target molecules, the search hit array and its elements:

The default input molecule is the target molecule (e.g. mass() is the same as mass(target()); both refer to the molecular mass of the target molecule).

The filtering expression can be set by

the latter specifies a configuration XML with function/plugin definitions to be used in addition to those provided by the built-in evaluator.xml.

The following table shows some examples (pKa values are shown at target atoms).

Table 29. Chemical Terms filters (columns: target; one row of query structures per filter)

  setFilter("pka(hm(1))> 2")
  setFilter("pka('acidic', hm(1))> 2 && mass()> 100")

A set of working examples is also available.
