Clustering and diversity analysis for chemical libraries
ChemAxon’s JKlustor suite performs similarity-based and structure-based clustering of compound libraries and focused sets, in both hierarchical and non-hierarchical fashion. It also carries out diversity calculations and library comparisons based on molecular fingerprints and other descriptors. It is an essential tool in combinatorial chemistry, virtual library design, and other areas where large numbers of compounds need to be analyzed.
A variety of methods for different purposes
JKlustor offers a wide variety of clustering methods and options for refining calculations. Input data can come from structure files, text files, or database tables, and results are saved to text files or database tables.
Similarity based clustering
Hierarchical clustering: Ward’s minimum variance method, accelerated with Murtagh’s reciprocal nearest neighbour algorithm, creates tight, well-separated clusters. It is recommended for smaller data sets, such as focused libraries with fewer than 100,000 structures.
Non-hierarchical clustering: The Sphere Exclusion method is based on fingerprints and/or other numerical data. It can cope with millions of structures and is suitable for diverse subset selection. The k-means method aims to find the centers of natural clusters in the input data in a way that minimizes the variance within each cluster.
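The idea behind sphere exclusion can be illustrated with a short, generic Python sketch. This is not JKlustor's implementation or API: the function names, the representation of a fingerprint as a set of on-bit indices, the Tanimoto metric, and the greedy "first unassigned compound becomes the centre" rule are all assumptions made for illustration.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def sphere_exclusion(fingerprints, threshold):
    """Greedy sphere exclusion (illustrative sketch, hypothetical helper).

    Take the first unassigned item as a cluster centre, assign every
    unassigned item whose similarity to the centre is >= threshold,
    remove those items from the pool, and repeat until nothing is left.
    Returns a list of (centre_index, member_indices) pairs.
    """
    unassigned = list(range(len(fingerprints)))
    clusters = []
    while unassigned:
        centre = unassigned[0]
        members = [
            i for i in unassigned
            if tanimoto(fingerprints[centre], fingerprints[i]) >= threshold
        ]
        clusters.append((centre, members))
        taken = set(members)
        unassigned = [i for i in unassigned if i not in taken]
    return clusters


# Toy fingerprints: hypothetical on-bit sets, not real ChemAxon fingerprints.
fps = [
    {1, 2, 3, 4},
    {1, 2, 3, 5},
    {10, 11, 12},
    {10, 11, 13},
    {20, 21},
]
clusters = sphere_exclusion(fps, threshold=0.5)

# The cluster centres are mutually dissimilar, so they form a diverse subset.
diverse_subset = [centre for centre, _ in clusters]
```

Each compound is compared only with the current centre, never with every other compound, which is one reason this family of methods is commonly used on very large sets. The list of centres doubles as a diverse subset.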
Finally, the Jarvis-Patrick (Jarp) method uses a nearest neighbor approach and performs variable-length clustering of chemical databases containing hundreds of thousands of structures.
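A minimal sketch of a Jarvis-Patrick style clustering is given below. It assumes that "variable-length" means each compound's neighbour list holds all compounds above a similarity threshold, rather than a fixed number of nearest neighbours. The exact criteria and parameters used by Jarp are not stated here, so the function names, the `min_shared` parameter, and the toy data are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def jarvis_patrick(fingerprints, threshold, min_shared):
    """Variable-length Jarvis-Patrick sketch (assumed variant, not Jarp itself).

    Two items are linked when they are in each other's neighbour lists and
    share at least `min_shared` neighbours. Clusters are the connected
    components of the resulting links.
    """
    n = len(fingerprints)
    # Variable-length neighbour lists: everything at or above the threshold.
    neighbours = [
        {j for j in range(n)
         if j != i and tanimoto(fingerprints[i], fingerprints[j]) >= threshold}
        for i in range(n)
    ]

    parent = list(range(n))  # union-find forest used to merge linked items

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            mutual = j in neighbours[i] and i in neighbours[j]
            if mutual and len(neighbours[i] & neighbours[j]) >= min_shared:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy fingerprints: hypothetical on-bit sets, two tight groups and one outlier.
fps = [
    {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5},
    {10, 11, 12, 13}, {10, 11, 12, 14}, {10, 11, 13, 14},
    {20, 21},
]
jp_clusters = jarvis_patrick(fps, threshold=0.5, min_shared=1)
```

The all-pairs loop is quadratic and is kept only for readability. A production tool would need a more efficient neighbour search to handle hundreds of thousands of structures.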
Structure based clustering
Hierarchical clustering: LibraryMCS identifies the largest substructure shared by several molecular structures. It displays the cluster hierarchy as a dendrogram and also provides alternative tree and table views.
Because it is fast and suitable for thorough analysis, MCS profiling can help scientists explore screening results and quickly identify novel scaffolds and new examples of active compound families. The hierarchical SAR table shows clusters together with their associated non-structural data.
R-group decomposition can also be performed, using the MCS as the core structure for each cluster.
Non-hierarchical: JKlustor can cluster structures by their pre-generated Bemis-Murcko frameworks, which provides a convenient and quick way to analyze large databases with millions of compounds.
JKlustor can use ChemAxon’s proprietary chemical and pharmacophore fingerprint technology and/or other user-defined descriptors, such as BCUT, as well as predicted or measured physico-chemical properties such as logP, logD, pKa, and hydrogen bond donors/acceptors.
JKlustor can be configured to generate different types of similarity/dissimilarity comparisons. With JKlustor’s command line tool (“Compr”), individual compounds of a set can be compared with the rest of the compounds in the same library, and two separate libraries can be compared with each other. It is also possible to run a library self-dissimilarity test, which compares every compound of a set with the rest of the compounds within the same library.
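The two kinds of comparison described above can be sketched in a few lines of Python. This does not reproduce the Compr tool or its options: the function names, the use of 1 minus Tanimoto similarity as the dissimilarity, and the choice to report the distance to the nearest neighbour are illustrative assumptions.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_dissimilarity(query, reference, skip_index=None):
    """Smallest dissimilarity (1 - Tanimoto) between `query` and `reference`.

    `skip_index` excludes one reference entry, so a compound is not compared
    with itself. Assumes at least one reference entry remains.
    """
    return min(
        1.0 - tanimoto(query, ref)
        for idx, ref in enumerate(reference)
        if idx != skip_index
    )


def compare_libraries(library_a, library_b):
    """For each compound in library_a, its distance to the closest compound in library_b."""
    return [nearest_dissimilarity(fp, library_b) for fp in library_a]


def self_dissimilarity(library):
    """For each compound, its distance to the closest other compound in the same library."""
    return [
        nearest_dissimilarity(fp, library, skip_index=i)
        for i, fp in enumerate(library)
    ]


# Toy libraries: hypothetical on-bit sets, not real fingerprints.
library_a = [{1, 2, 3, 4}, {1, 2, 3, 5}, {20, 21}]
library_b = [{1, 2, 3, 4}, {10, 11}]

cross = compare_libraries(library_a, library_b)
within = self_dissimilarity(library_a)
```

A value near 0 in `cross` means the compound has a close analogue in the other library. Uniformly high values in `within` indicate a diverse library.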
CreateView is another command line tool included in JKlustor. It composes an SDfile that contains both structures and calculation results. Its inputs are the SDfile given to GenerateMD (a command line program that generates various molecular descriptors) and a table containing the ordinal number of each compound in the SDfile along with other data to be viewed. Such a table can be created, for example, by “Compr” or “Jarp”. The generated SDfiles can be displayed in the MarvinView application or another SDF viewer.
JKlustor tools can be called from the command line or from the API of JChem. JKlustor runs on many operating systems and can integrate with many database engines. Full Java and .NET integration is supported, as are connections to Oracle, MySQL, MS SQL Server, DB2, PostgreSQL, Access, and other databases. The LibMCS component comes with a standalone GUI that allows users to browse and navigate through large data sets. Furthermore, maximum common edge sub-graph (MCES) and maximum common substructure (MCS) clustering methods are also available as ChemAxon components for both the KNIME and Pipeline Pilot workflow management systems.