chemaxon.sss.search
Class Search

java.lang.Object
  extended by chemaxon.sss.search.Search
All Implemented Interfaces:
SearchConstants, StereoConstants, chemaxon.util.search.MolSearcher
Direct Known Subclasses:
MolSearch

public abstract class Search
extends java.lang.Object
implements StereoConstants, SearchConstants, chemaxon.util.search.MolSearcher

Parent class of all structure search classes.

Author:
Szilard Dorant, Szabolcs Csepregi

Field Summary
protected static java.util.logging.Level MRV_OUTPUT_LEVEL
          Mrv format of the query and target is logged at this level.
protected  int preMatchLength
          The number of used items in preMatchTargetAtoms and preMatchQueryAtoms arrays.
protected  int[] preMatchQueryAtoms
          Array to store query atoms of pairs set by addMatch().
protected  int[] preMatchTargetAtoms
          Array to store target atoms of pairs set by addMatch().
protected  MolSearchOptions searchOptions
          Object to store all search parameters.
 
Fields inherited from interface chemaxon.struc.StereoConstants
ANTI, ATOMSTEREO_EITHER, ATOMSTEREO_MASK, ATOMSTEREO_NONE, ATOMSTEREO_SPECIFIC, CHIRALITY_M, CHIRALITY_MASK, CHIRALITY_P, CHIRALITY_r, CHIRALITY_R, CHIRALITY_s, CHIRALITY_S, CHIRALITYSUPPORT_ALL, CHIRALITYSUPPORT_NONE, CHIRALITYSUPPORT_SELECTED, CIS, CTUMASK, CTUNKNOWN, CTUNSPEC, DBS_ALL, DBS_MARKED, DBS_NONE, ENDO, EXO, PARITY_ALLENE, PARITY_EITHER, PARITY_EVEN, PARITY_MASK, PARITY_ODD, PARITY_TETRAHEDRAL, PARITY_UNSPEC, STGRP_ABS, STGRP_AND, STGRP_NONE, STGRP_OR, SYN, TRANS
 
Fields inherited from interface chemaxon.sss.SearchConstants
ABS_STEREO_ALWAYS_ON, ABS_STEREO_CHIRAL_FLAG, ABS_STEREO_TABLE_OPTION, ATTACHED_DATA_MATCH_EXACT, ATTACHED_DATA_MATCH_GENERAL, ATTACHED_DATA_MATCH_IGNORE, CHARGE_MATCHING_DEFAULT, CHARGE_MATCHING_EXACT, CHARGE_MATCHING_IGNORE, DEFAULT_DISSIMILARITY_THRESHOLD, DEFAULT_SEARCHTYPE, DISSIMILARITY_PROPERTY_NAME, DUPLICATE, FULL, FULL_FRAGMENT, HCOUNT_MATCHING_AUTO, HCOUNT_MATCHING_EQUAL, HCOUNT_MATCHING_GREATER_OR_EQUAL, HIT_EXCLUDEDQ, HIT_LP, HIT_MULTICENTER, HIT_NON_R, HIT_ORDERING_NONE, HIT_ORDERING_UNDEF_R_MATCHING_GROUP_FIRST, HIT_R, HIT_R_EMPTY_MATCH, HIT_UNMAPABLE, IMPLICIT_H_MATCHING_DEFAULT, IMPLICIT_H_MATCHING_DISABLED, IMPLICIT_H_MATCHING_ENABLED, IMPLICIT_H_MATCHING_IGNORE, ISOTOPE_MATCHING_DEFAULT, ISOTOPE_MATCHING_EXACT, ISOTOPE_MATCHING_IGNORE, MARKUSH_AROM_FINAL_CHECK, MARKUSH_AROM_NO_FINAL_CHECK, MARKUSH_AROM_OFF, MARKUSH_HIT_INNER, MARKUSH_HIT_ORIGINAL, MARKUSH_MCS, MATCH_COUNT_BETWEEN, MATCH_COUNT_RELATION, NO_ABAS, NO_SCREEN, RADICAL_MATCHING_DEFAULT, RADICAL_MATCHING_EXACT, RADICAL_MATCHING_IGNORE, SEARCH_MODE_NAMES, SEARCH_OPTIONS_LENGTH, SIMILARITY, STEREO_DIASTEREOMER, STEREO_ENANTIOMER, STEREO_EXACT, STEREO_IGNORE, STEREO_MODEL_COMPREHENSIVE, STEREO_MODEL_DEFAULT, STEREO_MODEL_GLOBAL, STEREO_MODEL_LOCAL, STEREO_SPECIFIC, SUBSTRUCTURE, SUPERSTRUCTURE, TAUTOMER_SEARCH_DEFAULT, TAUTOMER_SEARCH_OFF, TAUTOMER_SEARCH_ON, UNDEF_R_MATCHING_ALL, UNDEF_R_MATCHING_GROUP, UNDEF_R_MATCHING_GROUP_H, UNDEF_R_MATCHING_GROUP_H_EMPTY, UNDEF_R_MATCHING_UNDEF_R, VAGUE_BOND_DEFAULT, VAGUE_BOND_LEVEL_HALF, VAGUE_BOND_LEVEL1, VAGUE_BOND_LEVEL2, VAGUE_BOND_LEVEL3, VAGUE_BOND_LEVEL4, VAGUE_BOND_OFF
 
Constructor Summary
Search()
           
 
Method Summary
 void addMatch(int[] queryAtoms, int[] targetAtoms, int length)
          Specifies extra prerequisites of the structure search that queryAtoms[0] must match to targetAtoms[0] only AND queryAtoms[1] must match to targetAtoms[1], etc.
 void addMatch(int queryAtom, int targetAtom)
          Specifies an extra prerequisite of the structure search that queryAtom must match to targetAtom only.
static boolean areMatchingBondTypes(int q, int t)
          Tests if a query bond matches the target.
protected static boolean areMatchingBondTypes(int q, int t, boolean exactMatch)
          Tests if a query bond matches the target.
 void clearMatch()
          Clears the extra prerequisites of the structure search specified using addMatch calls.
 int[][] findAll()
          Looks for all matching patterns in the molecule.
 int[][][] findAllGroups()
          Returns the group hits corresponding to all hits.
protected  SearchHit[] findAllHits()
          Looks for all matching patterns in the molecule.
 int[] findFirst()
          Looks for the first matching pattern in the target molecule.
 int[][] findFirstGroup()
          Returns the group hit corresponding to the first hit.
protected abstract  SearchHit findFirstHit()
          Looks for the first matching pattern in the target molecule.
 int[] findNext()
          Looks for the next matching pattern in the target molecule.
 int[][] findNextGroup()
          Returns the group hit corresponding to the next hit.
protected abstract  SearchHit findNextHit()
          Looks for the next matching pattern in the target molecule.
protected static int getAtomStereo(MolAtom atom, int parity)
          Determines the stereo type of an atom.
 int getMatchCount()
          The number of times the query molecule appears in the target molecule.
 Molecule getMatchingQuery()
          Returns the query which produced the hit vector of the last findNext(), findFirst() or findAll() result.
abstract  Molecule getQuery()
          Retrieves the query structure stored in this search object.
protected  java.lang.String getQueryAsString()
          For internal purposes only.
protected  Molecule getQueryToPrint()
          For internal purposes only.
 MolSearchOptions getSearchOptions()
          Returns the SearchOptions object associated with this Search object.
abstract  Molecule getTarget()
          Retrieves the target molecule
protected  java.lang.String getTargetAsString()
          For internal purposes only.
protected  Molecule getTargetToPrint()
          For internal purposes only.
 boolean isMatchCountBetween(int hitLimitLow, boolean isLowerLimitIncluded, int hitLimitHigh, boolean isHigherLimitIncluded)
          Decides questions like "does the query match the target between 2 and 5 times (inclusively)". This is done efficiently: only as many hits are searched for as are necessary to decide the question.
 boolean isMatchCountInRelation(java.lang.String relation, int hitLimit)
          Decides questions like "does the query match the target at least 3 times", "[] up to 5 times", "[] exactly once".
abstract  boolean isMatching()
          Checks if the query structure matches a substructure in the molecule.
 boolean isVerbose()
          For debugging purposes only.
protected  void logException(java.util.logging.Logger logger, java.lang.String message)
          Writes on the logger output the shown message together with a representation of the query and the target.
abstract  void setQuery(Molecule mol)
          Specifies the query structure.
abstract  void setQuery(Molecule mol, int[] exclude)
          Specifies the query structure to be tested.
 void setSearchOptions(SearchOptions options)
          Copies all search parameters from options to the current search object.
abstract  void setTarget(Molecule mol)
          Specifies the target molecule to be tested.
abstract  void setTarget(Molecule mol, int[] exclude)
          Specifies the target molecule to be tested.
 void setVerbose(boolean verbose)
          For debugging purposes only.
abstract  void stop()
          Tries to stop the running search as fast as possible.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

searchOptions

protected final MolSearchOptions searchOptions
Object to store all search parameters.


preMatchQueryAtoms

protected int[] preMatchQueryAtoms
Array to store query atoms of pairs set by addMatch().


preMatchTargetAtoms

protected int[] preMatchTargetAtoms
Array to store target atoms of pairs set by addMatch().


preMatchLength

protected int preMatchLength
The number of used items in preMatchTargetAtoms and preMatchQueryAtoms arrays.


MRV_OUTPUT_LEVEL

protected static final java.util.logging.Level MRV_OUTPUT_LEVEL
Mrv format of the query and target is logged at this level.

Constructor Detail

Search

public Search()
Method Detail

setTarget

public abstract void setTarget(Molecule mol)
Specifies the target molecule to be tested.

Specified by:
setTarget in interface chemaxon.util.search.MolSearcher
Parameters:
mol - the target molecule.

setTarget

public abstract void setTarget(Molecule mol,
                               int[] exclude)
Specifies the target molecule to be tested.

Parameters:
mol - the target molecule.
exclude - index of atoms to exclude

getTarget

public abstract Molecule getTarget()
Retrieves the target molecule

Returns:
the Molecule to search on

setQuery

public abstract void setQuery(Molecule mol)
Specifies the query structure.

Specified by:
setQuery in interface chemaxon.util.search.MolSearcher
Parameters:
mol - the query structure.

setQuery

public abstract void setQuery(Molecule mol,
                              int[] exclude)
Specifies the query structure to be tested.

Parameters:
mol - the query molecule.
exclude - index of atoms to exclude

getQuery

public abstract Molecule getQuery()
Retrieves the query structure stored in this search object.

Returns:
Molecule to search for

isMatching

public abstract boolean isMatching()
                            throws chemaxon.sss.search.SearchException
Checks if the query structure matches a substructure in the molecule.

Returns:
true if the query was found.
Throws:
chemaxon.sss.search.SearchException

findFirst

public final int[] findFirst()
                      throws chemaxon.sss.search.SearchException
Looks for the first matching pattern in the target molecule. If the search object was previously used, this method re-initializes the search process, and starts returning the hits from the beginning.

Returns:
an array containing the atom indexes of the target atoms that match the query atoms (in the order of the appropriate query atoms) or null if there are no hits.

Special atom indexes: see description in findNext().

Throws:
chemaxon.sss.search.SearchException
See Also:
findNext(), findAll(), isMatching(), getMatchingQuery()

findNext

public final int[] findNext()
                     throws chemaxon.sss.search.SearchException
Looks for the next matching pattern in the target molecule. If the search object was not used previously, it also initializes the search. (So a findFirst() call is not necessary prior to a findNext() call.)

Specified by:
findNext in interface chemaxon.util.search.MolSearcher
Returns:
an array containing the atom indexes of the target atoms that match the query atoms (in the order of the appropriate query atoms) or null if there are no more hits.

Special atom indexes:

  • When an explicit query H atom matches an implicit H atom in the target, a negative number is returned. The absolute value of this number equals the atom index of the heavy atom bearing the implicit hydrogen; if that heavy atom has index 0, Integer.MIN_VALUE is returned instead.
  • The same method is used for explicit LP (lone pair) atoms in the query. The hit contains the negated number of the target heavy atom with the matching lone pair, or Integer.MIN_VALUE for 0 index. SearchConstants.HIT_LP is set for isolated lone pairs (in which case there is no such target heavy atom).
  • Multicenter atoms (e.g. of multicenter coordinate bonds) are not returned; the match array always contains SearchConstants.HIT_MULTICENTER for these atoms.
  • For R-group queries, R-atom matches are not returned; the match array contains SearchConstants.HIT_R for these atoms. If the "hitIncludesRNodes" parameter is NOT set, then only the scaffold atoms are included in the match array. See MolSearchOptions.setHitIncludesRNodes(boolean).
  • Undefined R-atom handling depends on parameter "undefinedRAtom", see SearchOptions.setUndefinedRAtom(int). In case of undefined R-atom matching a group of atoms (see SearchOptions.isUndefinedRAtomMatchingGroup()), only one matching atom is set in the match array, or SearchConstants.HIT_R_EMPTY_MATCH denoting the empty group.
  • Unmappable atoms (e.g. polymer star atoms) are denoted by SearchConstants.HIT_UNMAPABLE in the match array.
  • Excluded atoms (see setTarget(chemaxon.struc.Molecule,int[])) will not appear in the match array at all (their appropriate indexes are left out).
  • When the query contains link nodes, the returned array may contain more indices than the query atoms. In this case the extra atom indices appear at the end and method getMatchingQuery() can be used to get the most specific matching form of the query.
  • All Superatom S-groups are treated as expanded during the search, so atom indices are returned accordingly.
Throws:
chemaxon.sss.search.SearchException
See Also:
findFirst(), findAll(), isMatching(), getMatchingQuery()
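
The negative-index encoding described for findNext() can be illustrated with a small decoder. This is a minimal sketch and not part of the JChem API: the class HitIndexDecoder and its methods are hypothetical names. The special marker values (SearchConstants.HIT_LP, HIT_R and so on) are passed in by the caller because their numeric values are not stated in this page, and the sketch assumes that any negative entry that is not one of those markers encodes a heavy atom index.

```java
final class HitIndexDecoder {

    /** Decodes a negative hit entry into the index of the heavy atom it refers to. */
    static int heavyAtomIndex(int entry) {
        if (entry >= 0) {
            throw new IllegalArgumentException("entry is a plain atom index: " + entry);
        }
        // Index 0 cannot be negated, so Integer.MIN_VALUE stands in for it.
        return entry == Integer.MIN_VALUE ? 0 : -entry;
    }

    /**
     * Describes one entry of a hit array.
     * The sentinels argument is an assumption: the caller supplies the
     * special marker values that must not be decoded as atom indexes.
     */
    static String describe(int entry, int... sentinels) {
        for (int s : sentinels) {
            if (entry == s) {
                return "special marker";
            }
        }
        if (entry >= 0) {
            return "atom " + entry;
        }
        return "implicit H or lone pair on atom " + heavyAtomIndex(entry);
    }

    public static void main(String[] args) {
        int[] hit = {4, -7, Integer.MIN_VALUE};
        for (int entry : hit) {
            System.out.println(entry + " -> " + describe(entry));
        }
    }
}
```

In real code the marker check has to come first, as it does here, so that a marker constant is never mistaken for an encoded heavy atom index.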

findAll

public final int[][] findAll()
                      throws chemaxon.sss.search.SearchException
Looks for all matching patterns in the molecule.

Returns:
an array containing the matches as arrays or null if there are no hits. The match arrays contain the atom indexes of the target atoms that match the query atoms (in the order of the appropriate query atoms).

Special atom indexes: see description in findNext().

Throws:
chemaxon.sss.search.SearchException
See Also:
isMatching(), findFirst(), findNext()
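
The return conventions of findNext() and findAll() (null when there is nothing to return) can be sketched with a toy hit source. This is an illustration under stated assumptions, not the library's implementation: HitSource is a hypothetical stand-in interface for an object exposing findNext(), and CollectHits is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

final class CollectHits {

    /** Drains a hit source; returns null when there are no hits, like findAll(). */
    static int[][] collectAll(HitSource source) {
        List<int[]> hits = new ArrayList<>();
        // findNext() signals the end of the hits with null.
        for (int[] hit = source.findNext(); hit != null; hit = source.findNext()) {
            hits.add(hit);
        }
        return hits.isEmpty() ? null : hits.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        int[][] data = {{0, 1}, {2, 3}};
        int[] position = {0};
        HitSource source = () -> position[0] < data.length ? data[position[0]++] : null;
        System.out.println(collectAll(source).length + " hits collected");
    }
}

/** Hypothetical stand-in for a search object; only findNext() is modelled. */
interface HitSource {
    int[] findNext();
}
```

Callers of findAll() therefore need a null check before reading the length of the result.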

findFirstGroup

public final int[][] findFirstGroup()
                             throws chemaxon.sss.search.SearchException
Returns the group hit corresponding to the first hit. The group hit is a glove array that allows multiple target atoms to match one query atom. This can be the case if the query contains undefined R-atoms and these R-atoms can match a group of atoms (which is the case by default, see SearchOptions.setUndefinedRAtom(int)). If the SearchConstants.UNDEF_R_MATCHING_GROUP_H_EMPTY option is set, then an empty match is denoted by an empty int[] array.

For special matching atom indexes, refer to the description in findNext().

Returns:
the group hit corresponding to the first hit
Throws:
chemaxon.sss.search.SearchException
Since:
JChem 5.3
See Also:
findNextGroup(), findAllGroups(), isMatching(), getMatchingQuery(), findFirst(), findNext(), findAll()
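
The shape of a group hit can be shown with a short formatter. This is a minimal sketch with hypothetical names (GroupHitPrinter, describe); it assumes that row i of the int[][] lists the target atoms matched by query atom i, and it shows an empty row as an empty match, as described for UNDEF_R_MATCHING_GROUP_H_EMPTY.

```java
import java.util.Arrays;

final class GroupHitPrinter {

    /** Renders each query atom with the target atoms it matched. */
    static String describe(int[][] groupHit) {
        StringBuilder sb = new StringBuilder();
        for (int q = 0; q < groupHit.length; q++) {
            if (q > 0) {
                sb.append("; ");
            }
            sb.append("q").append(q).append(" -> ");
            // An empty row denotes an empty match.
            sb.append(groupHit[q].length == 0 ? "(empty)" : Arrays.toString(groupHit[q]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[][] groupHit = {{0}, {1, 2, 3}, {}};
        System.out.println(describe(groupHit));
    }
}
```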

findNextGroup

public final int[][] findNextGroup()
                            throws chemaxon.sss.search.SearchException
Returns the group hit corresponding to the next hit. The group hit is a glove array that allows multiple target atoms to match one query atom. This can be the case if the query contains undefined R-atoms and these R-atoms can match a group of atoms (which is the case by default, see SearchOptions.setUndefinedRAtom(int)). If the SearchConstants.UNDEF_R_MATCHING_GROUP_H_EMPTY option is set, then an empty match is denoted by an empty int[] array.

For special matching atom indexes, refer to the description in findNext().

Returns:
the group hit corresponding to the next hit
Throws:
chemaxon.sss.search.SearchException
Since:
JChem 5.3
See Also:
findFirstGroup(), findAllGroups(), isMatching(), getMatchingQuery(), findFirst(), findNext(), findAll()

findAllGroups

public final int[][][] findAllGroups()
                              throws chemaxon.sss.search.SearchException
Returns the group hits corresponding to all hits. Each group hit is a glove array that allows multiple target atoms to match one query atom. This can be the case if the query contains undefined R-atoms and these R-atoms can match a group of atoms (which is the case by default, see SearchOptions.setUndefinedRAtom(int)). If the SearchConstants.UNDEF_R_MATCHING_GROUP_H_EMPTY option is set, then an empty match is denoted by an empty int[] array.

For special matching atom indexes, refer to the description in findNext().

Returns:
the group hits corresponding to all hits, each element of the array is a group hit glove array
Throws:
chemaxon.sss.search.SearchException
Since:
JChem 5.3
See Also:
findFirstGroup(), findNextGroup(), isMatching(), findFirst(), findNext(), findAll()

findFirstHit

protected abstract SearchHit findFirstHit()
                                   throws chemaxon.sss.search.SearchException
Looks for the first matching pattern in the target molecule. If the search object was previously used, this method re-initializes the search process, and starts returning the hits from the beginning.

Returns:
the search hit or null if there is no hit
Throws:
chemaxon.sss.search.SearchException
Since:
JChem 5.4
See Also:
isMatching(), findNextHit(), findAllHits()

findNextHit

protected abstract SearchHit findNextHit()
                                  throws chemaxon.sss.search.SearchException
Looks for the next matching pattern in the target molecule. If the search object was not used previously, it also initializes the search. (So a findFirstHit() call is not necessary prior to a findNextHit() call.)

Returns:
the search hit or null if there are no more hits
Throws:
chemaxon.sss.search.SearchException
Since:
JChem 5.4
See Also:
isMatching(), findFirstHit(), findAllHits()

findAllHits

protected SearchHit[] findAllHits()
                           throws chemaxon.sss.search.SearchException
Looks for all matching patterns in the molecule.

Returns:
the search hit objects or null if there are no hits
Throws:
chemaxon.sss.search.SearchException
Since:
JChem 5.4
See Also:
isMatching(), findFirstHit(), findNextHit()

areMatchingBondTypes

public static boolean areMatchingBondTypes(int q,
                                           int t)
Tests if a query bond matches the target.

Parameters:
t - type of target bond
q - type of query bond
Returns:
true, if target fits query

areMatchingBondTypes

protected static boolean areMatchingBondTypes(int q,
                                              int t,
                                              boolean exactMatch)
Tests if a query bond matches the target.

Parameters:
t - type of target bond
q - type of query bond
exactMatch - if true, a query bond type matches only the same query bond type
Returns:
true, if target fits query

getAtomStereo

protected static int getAtomStereo(MolAtom atom,
                                   int parity)
Determines the stereo type of an atom.

Parameters:
atom - atom to determine its stereo type
parity - parity of the atom
Returns:
one of the following constants:
  • ATOMSTEREO_NONE : no stereo specific information
  • ATOMSTEREO_EITHER : "either" stereo information, parity independent
  • ATOMSTEREO_SPECIFIC: specific stereo information, parity dependent

addMatch

public void addMatch(int queryAtom,
                     int targetAtom)
Specifies an extra prerequisite of the structure search: queryAtom must match targetAtom only. If this is impossible, the search methods will report no match. Using this method makes the search more efficient than checking the hits afterwards.

Several addMatch() calls represent conditions connected by boolean operator AND.

The effect of all addMatch() calls can be canceled by clearMatch().


addMatch

public void addMatch(int[] queryAtoms,
                     int[] targetAtoms,
                     int length)
Specifies extra prerequisites of the structure search: queryAtoms[0] must match targetAtoms[0] only AND queryAtoms[1] must match targetAtoms[1] only, etc. If this is impossible, the search methods will report no match. Using this method makes the search more efficient than checking the hits afterwards.

Several addMatch() calls represent conditions connected by boolean operator AND.

The effect of all addMatch() calls can be canceled by clearMatch().

Since:
JChem 2.2
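
The condition that addMatch() imposes can be expressed as a check on a finished hit array. The sketch below shows the slower "checking the hits afterwards" alternative that addMatch() is meant to replace; it is not how the search applies the constraint internally. PreMatchCheck and satisfies are hypothetical names, and the sketch assumes that no atoms were excluded, so that hit[i] is the target atom matched by query atom i.

```java
final class PreMatchCheck {

    /**
     * Returns true if the hit maps queryAtoms[i] to targetAtoms[i]
     * for every i below length (all conditions joined by AND).
     */
    static boolean satisfies(int[] hit, int[] queryAtoms, int[] targetAtoms, int length) {
        for (int i = 0; i < length; i++) {
            if (hit[queryAtoms[i]] != targetAtoms[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] hit = {5, 6, 9};
        // Require query atom 0 -> target atom 5 and query atom 2 -> target atom 9.
        System.out.println(satisfies(hit, new int[] {0, 2}, new int[] {5, 9}, 2));
    }
}
```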

clearMatch

public void clearMatch()
Clears the extra prerequisites of the structure search specified using addMatch calls.

Since:
JChem 2.2

isMatchCountInRelation

public boolean isMatchCountInRelation(java.lang.String relation,
                                      int hitLimit)
                               throws chemaxon.sss.search.SearchException
Decides questions like "does the query match the target at least 3 times", "[] up to 5 times", "[] exactly once". This is done efficiently: only as many hits are searched for as are necessary to decide the question.

Example:

isMatchCountInRelation("<", 2) - true if the query can be found in the target less than two times.

Parameters:
relation - The relational operator of the question. This operator is used to compare the number of hits to hitLimit. Available values:
"=" - tests equality to hitLimit.
"<" - returns true if the number of found hits is less than hitLimit.
"<=" - less than or equal to
">" - greater than
">=" - greater than or equal to
hitLimit - The limit for the number of hits.
Returns:
"actual number of hits (of query in target)" <relation> hitLimit
Throws:
chemaxon.sss.search.SearchException - if encountered during the search.
See Also:
getMatchCount()
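
The idea of searching only for as many hits as the question requires can be sketched over a plain iterator of hits. This is an illustration of the early-stopping logic under stated assumptions, not the library's implementation: MatchCountRelation, inRelation and CountingHits are hypothetical names, and the iterator stands in for successive findNext() calls.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class MatchCountRelation {

    /** Decides "hit count <relation> hitLimit" while consuming as few hits as possible. */
    static boolean inRelation(Iterator<?> hits, String relation, int hitLimit) {
        long needed; // number of hits after which the answer can no longer change
        switch (relation) {
            case "<":
            case ">=":
                needed = hitLimit;
                break;
            case "<=":
            case ">":
            case "=":
                needed = (long) hitLimit + 1;
                break;
            default:
                throw new IllegalArgumentException("unknown relation: " + relation);
        }
        long found = 0;
        while (found < needed && hits.hasNext()) {
            hits.next();
            found++;
        }
        switch (relation) {
            case "<":  return found < hitLimit;
            case "<=": return found <= hitLimit;
            case ">":  return found > hitLimit;
            case ">=": return found >= hitLimit;
            default:   return found == hitLimit; // "="
        }
    }

    public static void main(String[] args) {
        CountingHits hits = new CountingHits(100);
        System.out.println(inRelation(hits, "<", 2) + ", hits consumed: " + hits.consumed);
    }
}

/** Toy hit source that records how many hits were requested. */
final class CountingHits implements Iterator<Integer> {
    private final int total;
    int consumed;

    CountingHits(int total) {
        this.total = total;
    }

    @Override
    public boolean hasNext() {
        return consumed < total;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return consumed++;
    }
}
```

With 100 available hits and the question "fewer than 2?", the sketch stops after the second hit, because a third cannot change the answer.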

isMatchCountBetween

public boolean isMatchCountBetween(int hitLimitLow,
                                   boolean isLowerLimitIncluded,
                                   int hitLimitHigh,
                                   boolean isHigherLimitIncluded)
                            throws chemaxon.sss.search.SearchException
Decides questions like "does the query match the target between 2 and 5 times (inclusively)". This is done efficiently: only as many hits are searched for as are necessary to decide the question.

Example:

isMatchCountBetween(2, true, 4, true) - true if the query can be found in the target exactly 2, 3 or 4 times.

isMatchCountBetween(2, false, 4, false) - true if the query can be found in the target exactly 3 times.

Parameters:
hitLimitLow - The lower limit for the number of hits.
isLowerLimitIncluded - If true, equality is allowed with hitLimitLow.
hitLimitHigh - The upper limit for the number of hits. If you pass Integer.MAX_VALUE, it will be treated as infinity. (I.e. only the lower limit is applied.)
isHigherLimitIncluded - If true, equality is allowed with hitLimitHigh.
Returns:
Whether the "actual number of hits (of query in target)" is between hitLimitLow and hitLimitHigh, with each limit included or excluded according to isLowerLimitIncluded and isHigherLimitIncluded.
Throws:
chemaxon.sss.search.SearchException - if encountered during the search.
See Also:
getMatchCount()
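
The meaning of the two inclusion flags and of Integer.MAX_VALUE as "no upper limit" can be written as a predicate on a known hit count. This is a minimal sketch of the documented semantics only; MatchCountBounds and isBetween are hypothetical names, and the real method decides the question without computing the full count.

```java
final class MatchCountBounds {

    /** Tests a hit count against the limits, honouring both inclusion flags. */
    static boolean isBetween(int count, int low, boolean lowIncluded,
                             int high, boolean highIncluded) {
        boolean aboveLow = lowIncluded ? count >= low : count > low;
        // Integer.MAX_VALUE is treated as infinity, so only the lower limit applies.
        boolean belowHigh = high == Integer.MAX_VALUE
                || (highIncluded ? count <= high : count < high);
        return aboveLow && belowHigh;
    }

    public static void main(String[] args) {
        for (int count = 1; count <= 5; count++) {
            System.out.println(count + ": inclusive=" + isBetween(count, 2, true, 4, true)
                    + ", exclusive=" + isBetween(count, 2, false, 4, false));
        }
    }
}
```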

getMatchCount

public int getMatchCount()
                  throws chemaxon.sss.search.SearchException
The number of times the query molecule appears in the target molecule.

If you only need to decide a simple relation regarding this number, consider the method isMatchCountInRelation(String, int), because it is more efficient than this method.

Returns:
the number of times the query occurs in the target.
Throws:
chemaxon.sss.search.SearchException - If encountered during the search.
See Also:
isMatchCountInRelation(java.lang.String, int)

isVerbose

public boolean isVerbose()
For debugging purposes only.


setVerbose

public void setVerbose(boolean verbose)
For debugging purposes only.


stop

public abstract void stop()
Tries to stop the running search as fast as possible. (E.g. used in another thread.)


getMatchingQuery

public Molecule getMatchingQuery()
Returns the query which produced the hit vector of the last findNext(), findFirst() or findAll() result.

Returns:
The query or null if the query is not initialized yet or no searching operations were performed.
Since:
JChem 3.1.2

getSearchOptions

public MolSearchOptions getSearchOptions()
Returns the SearchOptions object associated with this Search object. The object returned is linked with this Search object, so modifications in the returned SearchOptions object will change the behaviour of this Search object also!

Returns:
the current search settings as a SearchOptions object.
Since:
JChem 5.0
See Also:
setSearchOptions(SearchOptions)

setSearchOptions

public void setSearchOptions(SearchOptions options)
Copies all search parameters from options to the current search object.

Parameters:
options - search options to copy.
Since:
JChem 5.0
See Also:
getSearchOptions()

getQueryToPrint

protected Molecule getQueryToPrint()
For internal purposes only.

Returns:
the query to be displayed for debug purposes only.

getTargetToPrint

protected Molecule getTargetToPrint()
For internal purposes only.

Returns:
the target to be displayed for debug purposes only.

getTargetAsString

protected java.lang.String getTargetAsString()
For internal purposes only.

Returns:
the target to be displayed for debug purposes only.

getQueryAsString

protected java.lang.String getQueryAsString()
For internal purposes only.

Returns:
the query to be displayed for debug purposes only.

logException

protected void logException(java.util.logging.Logger logger,
                            java.lang.String message)
Writes on the logger output the shown message together with a representation of the query and the target. This representation is a short one (name or cxsmiles/cxsmarts) on the info level and a longer one (mrv if above formats are not suitable) on the fine level.

Parameters:
logger - logger to use for logging
message - message to write out with the query and target.