LibMCS searches for the Maximum Common Substructures of accompound Library in a hierarchical manner. The concept of maximum common substructure (or MCS) is defined for two graphs (in this case for two chemical graphs) and it refers to the largest subgraph (substructure) common to both structures.
This figure illustrates the Maximum Common Substructure of two compounds. The MCS is highlighted in red.
Note, that in this examples formal atomic charges were taken into account, this is why the two nitrogen atoms do not match each other.
The concept of pairwise MCS is not directly applicable for a set of compounds. Two problems arise: (1) The pairwise comparison of each and every compound in a set of compounds is practically infeasible, since it would involve n · ( n - 1 ) / 2 MCS calculations, where n denotes the number of compunds in the library. In the case of 1000 structures 499,500 pairwise calculations are needed. Even one such comparison is fairly complex and requires significant computation time. If, let's say the computation time of one MCS search is 10ms on a certain hardware, the full evaluation of a 1000 library would take 4995 seconds, that is, nearly 1.5 hours (and 35 hours for 5000 molecules). (2) Unless the set contains structural analogs, that is structures with similar molecular backbone and variations only in their functional groups, the structure common to all elements of the analyzed set degenerates easily to a carbon atom. Thus, outliers require careful handling.
Hierarchical clustering offers a feasible approach to overcome the above mentioned problems with two substantial benefits: (1) the number of MCS calculations needed is proportional to n (practically less than n/2) and (2) outliers form singletons thus they do not deteriorate larger common substructures.
Initial structures are found at the bottom of the hierarchy. The next level
contains the maximum common structures of clusters of initial molecules: all
molecules that share a common structure are placed in a cluster (group). This
MCS based classification of molecules creates disjoint subsets (clusters), that
is, one molecule belongs to one cluster only.
It is not guaranteed, though, that this disjoint classification is optimal. Molecule
a can have and MCS ma,b when compared to molecule
b, while the comparison against molecule c may results in a
different largest common structure ma,c. Since only one
of these two possible comparisons are made in hierarchical MCS clustering, the
algorithm may find ma,b even if ma,c is
larger. Furtunately, such anomalies occur very rarely, and it is neglectable in
practice. The price to be paid for an exact (optimal) solution to be found is
high: all possible pairwise comparisons have to be made, which, as seen above is,
is costly. A significantly faster solution, however approximate with ignorable
error, was preferred. The libmcs application (and the corresponding
LibraryMCS java class) is an approach to reach this goal.
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
Example of two MCS clusters. The four structures in the first row share a common substructure that is different from the MCS of the two molecules in the second row.
The maximum common structures of a compound library can be searched by the
libmcs command. In version 5.0 the command line/batch version is not supported, though
this is temporary only, in release 5.1 proper batch version will also be delivered.
In interactive mode (which is the only available in the current release of JChem) a graphical output is produced. This depicts the hierarchy by means of a dendogram. Initial structures are found at the bottom of this tree structure while higher levels of the tree correspond to higher levels of the hierachy.
A portion of a six level hierarchical clustering schema in shape of a dendogram.
This dendogram makes easier the overview of the clustered compounds by allowing the navigation through levels. Moving the mouse pointer over its nodes pops up a small window that displays the structure corresponding to that node.
The molecular structure associated to the treenode below the mouse pointer is displayed in a small pop-up window. As the mouse moves along the dendogram, the content of this pop-up window is updated.
This structure is an input molecule if the pointer traverses over a leaf node (a tree node at the bottom of the tree - note, that this tree is displayed upside-down, which is not uncommon at all). If the pointer moves to a higher level tree node, then the MCS of all of its child nodes (nodes directly below it) is displayed in the small pop-up window. These nodes, unlike leaves, can be selected by mouse click. There can only be one node selected, which is marked by a rectangle. Children of this node, that is, structure that share the same maximum common substructure are displayed uder the dendogram tree.
Children of the selected tree node (marked wirth the blue rectangle) are input molecules that all share the same common structure, highlighted in red.
Click again at the same node changes the marker to a filled rectangle, and then all descendants (ie. children, grand children etc) of the node are displayed. (Please note that the selection of descandant display is not selected by double clicking rather by two subsequent single clicks.)
Parameters can be set in the Options dialog in the Cluster pull
down menu.
There are a few tunable parameters involved in the detection of maximum
common substructures of a set of compounds. Some of these are important in
everyday use of libmcs while others remain interesting for
experienced users only.
A typical way of usage is to define the maximum allowed levels of hierarchy. For
instance, there are cases when only first level clusters are interesting, that
is, the common substructures of the input molecule set is sought for, but these
common structures are not clustered further. In such case there is no point to
generate higher levels of hierarchy, this shortens the execution time. In order
to stop libmcs
after the first level is created (ie. all input structures are clustered),
1 has to be defined in the parameters dialog.
other options provide control the behaviour of the algorithm and are
mainly intended for advanced users.
The minimum allowed size of any MCS found can be set by Minimal MCS size.
When set, for instance to 5, possible common structures
smaller than 5 atoms are abandoned.
By default, the heap size in some Java runtime environments is limited to 64MB, so you may run out of memory easily. See FAQ on how to increase the heap size.