'Natural' Clustering: An approximate MCES algorithm used on WOMBAT

September 2009
Products: JKlustor, Screen. Product group: Discovery toolkit

We describe a clustering procedure for chemical compounds that can be deployed rapidly on large compound libraries. The procedure creates initial cluster seeds from fingerprints using the Taylor-Butina algorithm (as implemented by Mesa Analytics). After this fast initial seeding, the clusters are refined and rearranged using an approximate solution to the MCES (Maximum Common Edge Substructure) problem. The chemical patterns identified during fingerprint clustering are subjected to filtering rules to ensure that the clusters are homogeneous. The approximate MCES algorithm (RASCAL implementation) is then applied iteratively to identify shared chemotypes until every chemical structure in the library is assigned to an appropriate cluster. We tested this procedure on the WOMBAT database (from SUNSET Molecular), which offers a ‘natural’ clustering solution: the compound series reported within WOMBAT, which were used to identify common patterns and evaluate algorithmic performance.
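The fingerprint seeding step can be illustrated with a minimal Python sketch of Taylor-Butina (sphere-exclusion) clustering. This is not the Mesa Analytics implementation referred to above. It assumes that fingerprints are represented as sets of on-bit indices, that similarity is measured with the Tanimoto coefficient, and that the threshold is chosen by the user; the function names tanimoto and butina_clusters are hypothetical. The MCES refinement and the filtering rules are not shown.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention assumed here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def butina_clusters(fps: List[Set[int]], threshold: float) -> List[List[int]]:
    """Sphere-exclusion clustering; each cluster lists its centroid index first."""
    n = len(fps)
    # Neighbour lists: every other compound at or above the similarity threshold.
    neighbours: List[Set[int]] = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if tanimoto(fps[i], fps[j]) >= threshold:
                neighbours[i].add(j)
                neighbours[j].add(i)

    # Candidate centroids, from most to fewest neighbours (ties broken by index).
    order = sorted(range(n), key=lambda i: (-len(neighbours[i]), i))

    assigned: Set[int] = set()
    clusters: List[List[int]] = []
    for centroid in order:
        if centroid in assigned:
            continue
        # The centroid claims all of its neighbours that are still unassigned.
        members = sorted(neighbours[centroid] - assigned)
        clusters.append([centroid] + members)
        assigned.add(centroid)
        assigned.update(members)
    return clusters
```

Compounds with no neighbour above the threshold end up as singleton clusters. In a procedure like the one described, these seeds and singletons would then be candidates for the MCES-based refinement.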