chemaxon.sss.search
Class MCES

java.lang.Object
  extended by chemaxon.sss.search.MCES
All Implemented Interfaces:
chemaxon.license.Licensable

public class MCES
extends java.lang.Object
implements chemaxon.license.Licensable

This class finds the largest common substructure of two molecules. More precisely, it finds the maximum common edge subgraph (MCES) of the molecular structures.

There are various options for specifying when the atoms and bonds of the two molecules are matched with each other: the algorithm can either ignore or take into account atom type, bond type, charge, hybridization, and isotopes (atomic mass). By default, only atom and bond types are matched. The algorithm can also search for a disconnected MCES, that is, the common substructure may consist of more than one component (fragment). A minimum required size can be specified for these components, and there is an option to keep only the largest component.

The query and target structures play essentially the same role in an MCES search. The only exception is that the query molecule may contain the special query atoms LIST, NOTLIST, HETERO, and ANY.

This class applies a powerful heuristic algorithm, which typically finds large common substructures in a short time. However, it does not always provide the exact optimal result due to the size and complexity of the input structures. There are different search modes, which control the running time and the accuracy of the algorithm.

Although this is a randomized search, its behavior is deterministic and reproducible (unless setTimeLimit() is used). However, note that the found MCES could depend on the atom numbering of the query and target molecules, that is, different results could be obtained for equivalent molecule representations.

Typical usage:

    MCES mces = new MCES();
    mces.setMolecules(queryMol, targetMol);

    // ... set search options using the set*() methods

    mces.search();

    // ... obtain results using the get*() methods, e.g.
    // System.out.println(MolExporter.exportToFormat(mces.getAsMolecule(), "smiles"));
 

Since:
5.4
Author:
Peter Kovacs (pkovacs84)

Nested Class Summary
static class MCES.SearchMode
          Enum type for search modes of the MCES algorithm.
static class MCES.TerminationCause
          Enum type for causes of MCES search termination.
 
Field Summary
static int DEFAULT_MIN_COMPONENT_SIZE
          Default minimum size of MCES components (number of bonds).
 
Constructor Summary
MCES()
          Creates a new object for MCES search.
 
Method Summary
 void disableAromatization()
          Disables aromatization for the input molecules.
 void enableAromatization()
          Enables general aromatization for the input molecules.
 void enableAromatization(int method)
          Enables the given aromatization method for the input molecules.
 int getAromatizationMethod()
          Gets the aromatization method.
 Molecule getAsMolecule()
          Gets the found MCES as a Molecule object.
 int getAtomCount()
          Gets the number of atoms in the found MCES.
 boolean getAtomMapMatch()
          Gets the matching mode for atom map numbers.
 int[] getAtomMapping()
          Gets the atom mapping of the found MCES.
 int[] getAtomReverseMapping()
          Gets the reverse atom mapping of the found MCES.
 boolean getAtomTypeMatch()
          Gets the matching mode for atom types.
 int getBondCount()
          Gets the number of bonds in the found MCES.
 int[] getBondMapping()
          Gets the bond mapping of the found MCES.
 int[] getBondReverseMapping()
          Gets the reverse bond mapping of the found MCES.
 boolean getBondTypeMatch()
          Gets the matching mode for bond types.
 boolean getChargeMatch()
          Gets the matching mode for atom formal charges.
 int getComponentCount()
          Gets the number of components in the found MCES.
 boolean getHybridizationMatch()
          Gets the matching mode for atom hybridization.
 boolean getIsotopeMatch()
          Gets the matching mode for isotopes.
 boolean getKeepLargestComponent()
          Gets which components of the found MCES should be kept.
 MolAtom[] getMatchedQueryAtoms()
          Gets the query atoms that are part of the found MCES.
 MolBond[] getMatchedQueryBonds()
          Gets the query bonds that are part of the found MCES.
 MolAtom[] getMatchedTargetAtoms()
          Gets the target atoms that are part of the found MCES.
 MolBond[] getMatchedTargetBonds()
          Gets the target bonds that are part of the found MCES.
 int getMinComponentSize()
          Gets the minimum required size of the components of MCES.
 Molecule getQueryMolecule()
          Gets the query molecular structure.
 MCES.SearchMode getSearchMode()
          Gets the current search mode.
 int getStepCountLimit()
          Gets the maximum allowed number of elementary search steps.
 Molecule getTargetMolecule()
          Gets the target molecular structure.
 MCES.TerminationCause getTerminationCause()
          Gets the termination cause of the last search.
 long getTimeLimit()
          Gets the maximum allowed search time.
 MolAtom[] getUnmatchedQueryAtoms()
          Gets the query atoms that are not part of the found MCES.
 MolBond[] getUnmatchedQueryBonds()
          Gets the query bonds that are not part of the found MCES.
 MolAtom[] getUnmatchedTargetAtoms()
          Gets the target atoms that are not part of the found MCES.
 MolBond[] getUnmatchedTargetBonds()
          Gets the target bonds that are not part of the found MCES.
 boolean isLicensed()
          Returns information about the licensing of the product.
static void main(java.lang.String[] args)
          Simple command line interface mainly for testing purposes.
 boolean search()
          Performs MCES search according to the specified search options.
 void setAtomMapMatch(boolean match)
          Sets the matching mode for atom map numbers.
 void setAtomTypeMatch(boolean match)
          Sets the matching mode for atom types.
 void setBondTypeMatch(boolean match)
          Sets the matching mode for bond types.
 void setChargeMatch(boolean match)
          Sets the matching mode for atom formal charges.
 void setHybridizationMatch(boolean match)
          Sets the matching mode for atom hybridization.
 void setIsotopeMatch(boolean match)
          Sets the matching mode for isotopes.
 void setKeepLargestComponent(boolean keepLargest)
          Sets which components of the found MCES should be kept.
 void setLicenseEnvironment(java.lang.String env)
          Sets the license environment.
 void setMinComponentSize(int bondCount)
          Sets the minimum required size of the components of MCES.
 void setMolecules(Molecule query, Molecule target)
          Sets the two molecular structures to be matched.
 void setQueryMolecule(Molecule query)
          Sets the query molecular structure.
 void setSearchMode(MCES.SearchMode mode)
          Sets search mode.
 void setStepCountLimit(int maxStepCount)
          Sets the maximum allowed number of elementary search steps.
 void setTargetMolecule(Molecule target)
          Sets the target molecular structure.
 void setTimeLimit(long maxMilliseconds)
          Sets the maximum allowed search time.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_MIN_COMPONENT_SIZE

public static final int DEFAULT_MIN_COMPONENT_SIZE
Default minimum size of MCES components (number of bonds).

See Also:
Constant Field Values

Constructor Detail

MCES

public MCES()
Creates a new object for MCES search.

Method Detail

isLicensed

public boolean isLicensed()
Returns information about the licensing of the product.

Specified by:
isLicensed in interface chemaxon.license.Licensable
Returns:
true if the product is correctly licensed

setLicenseEnvironment

public void setLicenseEnvironment(java.lang.String env)
Sets the license environment.

Specified by:
setLicenseEnvironment in interface chemaxon.license.Licensable

setMolecules

public void setMolecules(Molecule query,
                         Molecule target)
Sets the two molecular structures to be matched.

Parameters:
query - query molecule
target - target molecule

setQueryMolecule

public void setQueryMolecule(Molecule query)
Sets the query molecular structure.

Parameters:
query - query molecule

setTargetMolecule

public void setTargetMolecule(Molecule target)
Sets the target molecular structure.

Parameters:
target - target molecule

getQueryMolecule

public Molecule getQueryMolecule()
Gets the query molecular structure.

Returns:
query molecule

getTargetMolecule

public Molecule getTargetMolecule()
Gets the target molecular structure.

Returns:
target molecule

enableAromatization

public void enableAromatization()
Enables general aromatization for the input molecules. By default, the query and target molecules are aromatized using the general aromatization method. Other methods can be specified using enableAromatization(int) and aromatization can be disabled using disableAromatization().

See Also:
Molecule.aromatize(int)

enableAromatization

public void enableAromatization(int method)
Enables the given aromatization method for the input molecules. By default, the query and target molecules are aromatized using the general aromatization method. Other methods can be specified using this function and aromatization can be disabled using disableAromatization().

See Also:
Molecule.aromatize(int)

disableAromatization

public void disableAromatization()
Disables aromatization for the input molecules. By default, the query and target molecules are aromatized using the general aromatization method. You can use this function to disable aromatization if it is not necessary or if the required standardization actions have already been performed on the input molecules.


getAromatizationMethod

public int getAromatizationMethod()
Gets the aromatization method.

Returns:
the aromatization method or -1 if aromatization is disabled
See Also:
Molecule.aromatize(int)

setSearchMode

public void setSearchMode(MCES.SearchMode mode)
Sets search mode. For more information, see MCES.SearchMode.

Parameters:
mode - search mode

getSearchMode

public MCES.SearchMode getSearchMode()
Gets the current search mode. For more information, see MCES.SearchMode.

Returns:
search mode

setAtomTypeMatch

public void setAtomTypeMatch(boolean match)
Sets the matching mode for atom types. By default, atom types are considered (checked) in matching.

Parameters:
match - specifies if atom types are considered (true) or ignored (false)

setBondTypeMatch

public void setBondTypeMatch(boolean match)
Sets the matching mode for bond types. By default, bond types are considered (checked) in matching.

Parameters:
match - specifies if bond types are considered (true) or ignored (false)

setChargeMatch

public void setChargeMatch(boolean match)
Sets the matching mode for atom formal charges. By default, charges are ignored (not checked) in matching.

Parameters:
match - specifies if charges are considered (true) or ignored (false)

setHybridizationMatch

public void setHybridizationMatch(boolean match)
Sets the matching mode for atom hybridization. By default, hybridization states are ignored (not checked) in matching.

Parameters:
match - specifies if hybridization states are considered (true) or ignored (false)

setIsotopeMatch

public void setIsotopeMatch(boolean match)
Sets the matching mode for isotopes. By default, isotopes of the same element match each other, that is, mass numbers are ignored (not checked) in matching.

Parameters:
match - specifies if mass numbers are considered (true) or ignored (false)

setAtomMapMatch

public void setAtomMapMatch(boolean match)
Sets the matching mode for atom map numbers. By default, map numbers are ignored (not checked) in matching.

Parameters:
match - specifies if atom map numbers are considered (true) or ignored (false)
Since:
5.5

getAtomTypeMatch

public boolean getAtomTypeMatch()
Gets the matching mode for atom types.

Returns:
true if atom types are considered

getBondTypeMatch

public boolean getBondTypeMatch()
Gets the matching mode for bond types.

Returns:
true if bond types are considered

getChargeMatch

public boolean getChargeMatch()
Gets the matching mode for atom formal charges.

Returns:
true if charges are considered

getHybridizationMatch

public boolean getHybridizationMatch()
Gets the matching mode for atom hybridization.

Returns:
true if hybridization states are considered

getIsotopeMatch

public boolean getIsotopeMatch()
Gets the matching mode for isotopes.

Returns:
true if mass numbers are considered

getAtomMapMatch

public boolean getAtomMapMatch()
Gets the matching mode for atom map numbers.

Returns:
true if atom map numbers are considered
Since:
5.5

setMinComponentSize

public void setMinComponentSize(int bondCount)
Sets the minimum required size of the components of MCES. Components having fewer bonds than this limit are ignored. The default value is DEFAULT_MIN_COMPONENT_SIZE.

Parameters:
bondCount - minimum required bond count in components

getMinComponentSize

public int getMinComponentSize()
Gets the minimum required size of the components of MCES. Components having fewer bonds than this limit are ignored.

Returns:
minimum required bond count in components

setKeepLargestComponent

public void setKeepLargestComponent(boolean keepLargest)
Sets which components of the found MCES should be kept. By default, all components having at least as many bonds as the size limit are kept. If you call this function with true, only the largest component of the found MCES is kept (the one that has the most bonds). This search option results in a connected common substructure.

Parameters:
keepLargest - keep only the largest component (true) or all sufficiently large components (false)
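
The interplay of the two component options can be illustrated with a small, library-independent sketch. It only models the documented outcome on a list of component bond counts; the class and method names are hypothetical, and it is an assumption here that the size limit is applied before the largest component is selected. The real search may apply these constraints during optimization rather than as a post-filter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy illustration only; not the ChemAxon implementation. */
final class ComponentFilterSketch {

    /**
     * Returns the bond counts of the components that would be kept.
     *
     * @param componentBondCounts bond count of each component (hypothetical input)
     * @param minComponentSize    minimum bond count, as in setMinComponentSize
     * @param keepLargest         as in setKeepLargestComponent
     */
    static List<Integer> keptComponents(int[] componentBondCounts,
                                        int minComponentSize,
                                        boolean keepLargest) {
        List<Integer> kept = new ArrayList<>();
        for (int bonds : componentBondCounts) {
            if (bonds >= minComponentSize) {   // smaller components are ignored
                kept.add(bonds);
            }
        }
        if (!keepLargest || kept.isEmpty()) {
            return kept;
        }
        // Keep only the component with the most bonds.
        List<Integer> only = new ArrayList<>();
        only.add(Collections.max(kept));
        return only;
    }

    public static void main(String[] args) {
        int[] sizes = {5, 1, 3};
        System.out.println(keptComponents(sizes, 2, false)); // prints [5, 3]
        System.out.println(keptComponents(sizes, 2, true));  // prints [5]
    }
}
```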

getKeepLargestComponent

public boolean getKeepLargestComponent()
Gets which components of the found MCES should be kept.

Returns:
true if only the largest component is kept

setStepCountLimit

public void setStepCountLimit(int maxStepCount)
Sets the maximum allowed number of elementary search steps. If the search process exceeds this limit, it terminates with the best result obtained so far. This is an optional limit. By default, only the inherent limits of the current search mode are applied. You can use a negative parameter value to disable a previously specified step count limit.

This method complements setTimeLimit(). Unlike a time limit, a step count limit guarantees deterministic, reproducible behavior, which makes it better suited to testing, validation, and batch usage.

Parameters:
maxStepCount - maximum number of elementary search steps
See Also:
getTerminationCause()
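
Why a step count limit stays reproducible while a time limit does not can be seen in a toy model. The sketch below is not the MCES algorithm: it is a hypothetical seeded random search in plain Java, with made-up names (StepLimitSketch, Cause, Result), that stops after a fixed number of steps and returns the best value seen so far. Because the stopping point depends only on the step counter and the seed, repeated runs give identical results. A wall-clock cutoff would stop at a machine-dependent step instead.

```java
import java.util.Random;

/** Toy illustration only; not the ChemAxon implementation. */
final class StepLimitSketch {

    /** Hypothetical causes for this toy; see MCES.TerminationCause for the real ones. */
    enum Cause { FINISHED, STEP_LIMIT }

    static final class Result {
        final int best;
        final Cause cause;

        Result(int best, Cause cause) {
            this.best = best;
            this.cause = cause;
        }
    }

    /**
     * Draws up to naturalSteps pseudo-random scores and keeps the best one.
     * A negative maxStepCount means "no extra limit", mirroring the documented convention.
     */
    static Result search(long seed, int naturalSteps, int maxStepCount) {
        Random rng = new Random(seed);   // fixed seed: randomized yet reproducible
        int best = -1;
        for (int step = 0; step < naturalSteps; step++) {
            if (maxStepCount >= 0 && step >= maxStepCount) {
                return new Result(best, Cause.STEP_LIMIT);   // best result so far
            }
            best = Math.max(best, rng.nextInt(1000));
        }
        return new Result(best, Cause.FINISHED);
    }

    public static void main(String[] args) {
        Result limited = search(42L, 100, 10);
        Result full = search(42L, 100, -1);
        System.out.println(limited.cause + " best=" + limited.best);
        System.out.println(full.cause + " best=" + full.best);
    }
}
```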

getStepCountLimit

public int getStepCountLimit()
Gets the maximum allowed number of elementary search steps. If no such limit is specified, this method returns -1. For more information, see setStepCountLimit().

Returns:
maximum number of elementary search steps or -1 if no such limit is specified

setTimeLimit

public void setTimeLimit(long maxMilliseconds)
Sets the maximum allowed search time. If the search process exceeds this limit, it terminates with the best result obtained so far. This is an optional limit. By default, only the inherent limits of the current search mode are applied. You can use a negative parameter value to disable a previously specified time limit.

This method complements setStepCountLimit(). A time limit does not guarantee deterministic, reproducible behavior, but it gives simple and direct control over the maximum running time. This could be better for interactive use cases.

Parameters:
maxMilliseconds - maximum running time in milliseconds
See Also:
getTerminationCause()

getTimeLimit

public long getTimeLimit()
Gets the maximum allowed search time. If no such limit is specified, this method returns -1. For more information, see setTimeLimit().

Returns:
maximum running time in milliseconds or -1 if no such limit is specified

search

public boolean search()
Performs MCES search according to the specified search options.

Before executing the search process, this function makes copies of the input molecules and performs the specified aromatization method on them (unless aromatization is disabled). Furthermore, it also calculates the hybridization states if they are considered in atom matching.

Returns:
true if a non-empty MCES is found.

getAtomCount

public int getAtomCount()
Gets the number of atoms in the found MCES.

Returns:
number of atoms in MCES

getBondCount

public int getBondCount()
Gets the number of bonds in the found MCES.

Returns:
number of bonds in MCES

getComponentCount

public int getComponentCount()
Gets the number of components in the found MCES.

Returns:
number of components in MCES

getAtomMapping

public int[] getAtomMapping()
Gets the atom mapping of the found MCES. The result is the mapping from the atoms of the query molecule to the atoms of the target molecule. Internal atom indexes are used; the returned array is indexed by query atom indexes.

Returns:
Mapping between the atoms of the query and target structures. The i-th element of the array is the index of the target atom that is matched with the i-th query atom, or -1 if no such atom is found in the target molecule.
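
As a minimal sketch of how such a mapping array might be consumed, the following plain-Java helper walks the array and collects the matched (query index, target index) pairs, skipping the -1 entries. The class name, the method name, and the sample array are hypothetical; in real code the array would come from getAtomMapping().

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper; the sample mapping is made up. */
final class AtomMappingSketch {

    /** Collects {queryIndex, targetIndex} pairs, skipping unmatched (-1) entries. */
    static List<int[]> matchedPairs(int[] mapping) {
        List<int[]> pairs = new ArrayList<>();
        for (int q = 0; q < mapping.length; q++) {
            if (mapping[q] >= 0) {
                pairs.add(new int[] {q, mapping[q]});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical mapping for a 5-atom query: atoms 1 and 4 are unmatched.
        int[] mapping = {2, -1, 0, 4, -1};
        for (int[] p : matchedPairs(mapping)) {
            System.out.println("query atom " + p[0] + " -> target atom " + p[1]);
        }
    }
}
```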

getAtomReverseMapping

public int[] getAtomReverseMapping()
Gets the reverse atom mapping of the found MCES. The result is the mapping from the atoms of the target molecule to the atoms of the query molecule. Internal atom indexes are used; the returned array is indexed by target atom indexes.

Returns:
Mapping between the atoms of the target and query structures. The i-th element of the array is the index of the query atom that is matched with the i-th target atom, or -1 if no such atom is found in the query molecule.
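
The relationship between the forward and the reverse mapping can be sketched in plain Java. The example assumes that the reverse mapping is the exact inverse of the forward one, which follows from the two descriptions but is not stated explicitly. The class name, the method name, and the sample values are hypothetical.

```java
import java.util.Arrays;

/** Illustrative helper; assumes the reverse mapping is the inverse of the forward one. */
final class ReverseMappingSketch {

    /**
     * Inverts a query-to-target mapping.
     *
     * @param mapping         forward mapping, indexed by query atom index, -1 = unmatched
     * @param targetAtomCount number of atoms in the target molecule
     * @return array indexed by target atom index, -1 = unmatched
     */
    static int[] invert(int[] mapping, int targetAtomCount) {
        int[] reverse = new int[targetAtomCount];
        Arrays.fill(reverse, -1);
        for (int q = 0; q < mapping.length; q++) {
            int t = mapping[q];
            if (t >= 0) {
                reverse[t] = q;
            }
        }
        return reverse;
    }

    public static void main(String[] args) {
        int[] forward = {2, -1, 0, 4, -1};   // hypothetical 5-atom query
        int[] reverse = invert(forward, 6);  // hypothetical 6-atom target
        System.out.println(Arrays.toString(reverse)); // prints [2, -1, 0, -1, 3, -1]
    }
}
```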

getBondMapping

public int[] getBondMapping()
Gets the bond mapping of the found MCES. The result is the mapping from the bonds of the query molecule to the bonds of the target molecule. Internal bond indexes are used; the returned array is indexed by query bond indexes.

Returns:
Mapping between the bonds of the query and target structures. The i-th element of the array is the index of the target bond that is matched with the i-th query bond, or -1 if no such bond is found in the target molecule.

getBondReverseMapping

public int[] getBondReverseMapping()
Gets the reverse bond mapping of the found MCES. The result is the mapping from the bonds of the target molecule to the bonds of the query molecule. Internal bond indexes are used; the returned array is indexed by target bond indexes.

Returns:
Mapping between the bonds of the target and query structures. The i-th element of the array is the index of the query bond that is matched with the i-th target bond, or -1 if no such bond is found in the query molecule.

getMatchedQueryAtoms

public MolAtom[] getMatchedQueryAtoms()
Gets the query atoms that are part of the found MCES. The atoms are stored in increasing order of their indexes and for each i, getMatchedQueryAtoms()[i] corresponds to getMatchedTargetAtoms()[i].

Returns:
the matched atoms of the query molecule

getMatchedQueryBonds

public MolBond[] getMatchedQueryBonds()
Gets the query bonds that are part of the found MCES. The bonds are stored in increasing order of their indexes and for each j, getMatchedQueryBonds()[j] corresponds to getMatchedTargetBonds()[j].

Returns:
the matched bonds of the query molecule

getMatchedTargetAtoms

public MolAtom[] getMatchedTargetAtoms()
Gets the target atoms that are part of the found MCES. For each i, getMatchedQueryAtoms()[i] corresponds to getMatchedTargetAtoms()[i].

Returns:
the matched atoms of the target molecule

getMatchedTargetBonds

public MolBond[] getMatchedTargetBonds()
Gets the target bonds that are part of the found MCES. For each j, getMatchedQueryBonds()[j] corresponds to getMatchedTargetBonds()[j].

Returns:
the matched bonds of the target molecule

getUnmatchedQueryAtoms

public MolAtom[] getUnmatchedQueryAtoms()
Gets the query atoms that are not part of the found MCES.

Returns:
the unmatched atoms of the query molecule
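
If only indexes are needed, the unmatched atoms can presumably also be read off a mapping array, since unmatched entries are marked with -1. The sketch below is plain Java with hypothetical names and sample data; it would be applied to the forward mapping for query atoms and to the reverse mapping for target atoms.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Illustrative helper; the sample mapping is made up. */
final class UnmatchedSketch {

    /** Returns the indexes whose mapping entry is -1, in increasing order. */
    static int[] unmatchedIndexes(int[] mapping) {
        return IntStream.range(0, mapping.length)
                .filter(i -> mapping[i] < 0)
                .toArray();
    }

    public static void main(String[] args) {
        int[] forward = {2, -1, 0, 4, -1};   // hypothetical query-to-target mapping
        System.out.println(Arrays.toString(unmatchedIndexes(forward))); // prints [1, 4]
    }
}
```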

getUnmatchedQueryBonds

public MolBond[] getUnmatchedQueryBonds()
Gets the query bonds that are not part of the found MCES.

Returns:
the unmatched bonds of the query molecule

getUnmatchedTargetAtoms

public MolAtom[] getUnmatchedTargetAtoms()
Gets the target atoms that are not part of the found MCES.

Returns:
the unmatched atoms of the target molecule

getUnmatchedTargetBonds

public MolBond[] getUnmatchedTargetBonds()
Gets the target bonds that are not part of the found MCES.

Returns:
the unmatched bonds of the target molecule

getAsMolecule

public Molecule getAsMolecule()
Gets the found MCES as a Molecule object. The returned substructure is derived from the target molecule, since the query could contain special non-specific atoms. As a result, the returned molecule has exactly the same properties (atom types, bond types, charges, isotopes, etc.) as the corresponding substructure of the target molecule, but those properties that are not considered during the search could differ from the corresponding substructure of the query molecule.

This function builds a new Molecule object each time it is called. Store its result rather than calling it repeatedly.

Returns:
the common substructure as Molecule

getTerminationCause

public MCES.TerminationCause getTerminationCause()
Gets the termination cause of the last search. For more information, see MCES.TerminationCause.

Returns:
termination cause

main

public static void main(java.lang.String[] args)
Simple command line interface mainly for testing purposes. This program takes two molecules: the first is referred to as the query, the second as the target. Each is specified either as a structure file or as a string (e.g. SMILES) on the command line. Various flags are also accepted; see USAGE_INFO and EXAMPLES.

Parameters:
args - command line arguments (filenames, flags, options)