JKlustor

clustering and diversity analysis for chemical libraries

ChemAxon’s JKlustor Suite performs similarity and structure based clustering of compound libraries and focused sets in both hierarchical and non-hierarchical fashion. It also carries out diversity calculations and library comparisons based on molecular fingerprints and other descriptors. It is an essential tool in combinatorial chemistry, virtual library design, and other areas where a large number of compounds need to be analyzed.

Product Type: component
Interfaces: CLI, API (Java, .NET)
Available in: JChem Base

A variety of methods for different purposes

JKlustor provides users with a wide variety of clustering methods and options to refine calculations. The input data can come from structure files, text files, or database tables, and the results are saved to text files or database tables.

Similarity based clustering

Hierarchical clustering: Ward’s minimum variance method, accelerated with Murtagh’s reciprocal nearest neighbour algorithm, creates tight and well separated clusters. It is recommended for smaller data sets, such as focused libraries with fewer than 100,000 structures.

Similarity based clustering - Ward's clustering
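To illustrate the idea behind Ward's minimum variance criterion, here is a minimal Python sketch. It is not JKlustor's implementation: it uses the naive approach of scanning all cluster pairs at every step rather than Murtagh's reciprocal nearest neighbour speed-up, and the function names `ward_cost` and `ward_cluster` are hypothetical. Compounds are assumed to be represented as numeric descriptor vectors.

```python
from itertools import combinations


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge.

    Each cluster is summarised as (size, centroid).
    """
    (na, ca), (nb, cb) = a, b
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * sq_dist


def ward_cluster(points, n_clusters):
    """Naive Ward agglomeration (illustrative only, cubic in the number of points).

    Returns the clusters as sorted lists of point indices.
    """
    stats = {i: (1, tuple(p)) for i, p in enumerate(points)}
    members = {i: [i] for i in range(len(points))}
    next_id = len(points)
    while len(stats) > n_clusters:
        # Pick the pair whose merge increases the total variance the least.
        i, j = min(
            combinations(stats, 2),
            key=lambda pair: ward_cost(stats[pair[0]], stats[pair[1]]),
        )
        (ni, ci), (nj, cj) = stats.pop(i), stats.pop(j)
        n = ni + nj
        centroid = tuple((ni * x + nj * y) / n for x, y in zip(ci, cj))
        stats[next_id] = (n, centroid)
        members[next_id] = members.pop(i) + members.pop(j)
        next_id += 1
    return sorted(sorted(m) for m in members.values())
```

Recording each merge instead of stopping at a fixed cluster count would yield the full hierarchy. The cost of scanning all pairs repeatedly is one reason hierarchical methods suit smaller data sets.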

Non-hierarchical clustering: The Sphere Exclusion method is based on fingerprints and/or other numerical data. It can easily cope with millions of structures and is suitable for diverse subset selection. The K-means cluster analysis method aims to find the centers of natural clusters in the input data in a way that minimizes the variance within each cluster.
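The general sphere exclusion idea can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not JKlustor's implementation: fingerprints are modelled as sets of "on" bit positions, similarity is the Tanimoto coefficient, and the names `tanimoto` and `sphere_exclusion` are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def sphere_exclusion(fingerprints, threshold):
    """Return a list of (center_index, member_indices) clusters.

    The first unassigned compound becomes a cluster center; every unassigned
    compound whose similarity to it is at least `threshold` joins its sphere
    and is excluded from further consideration.
    """
    unassigned = list(range(len(fingerprints)))
    clusters = []
    while unassigned:
        center = unassigned[0]
        members = [
            i for i in unassigned
            if tanimoto(fingerprints[center], fingerprints[i]) >= threshold
        ]
        clusters.append((center, members))
        taken = set(members)
        unassigned = [i for i in unassigned if i not in taken]
    return clusters
```

The list of centers is itself a diverse subset, since under this scheme no two centers are more similar to each other than the threshold.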

Last but not least, the Jarvis-Patrick (Jarp) method uses a nearest neighbor approach and performs variable-length clustering of chemical databases containing hundreds of thousands of structures.
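As a rough illustration of the nearest neighbor idea, the following Python sketch implements the classic fixed-length formulation of Jarvis-Patrick clustering, not the variable-length variant and not the Jarp tool itself. Two compounds are placed in the same cluster when each appears in the other's list of `k` nearest neighbors and the two lists share at least `k_min` entries. The function and parameter names are hypothetical, and fingerprints are assumed to be sets of "on" bit positions.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def jarvis_patrick(fingerprints, k, k_min):
    """Classic Jarvis-Patrick clustering; returns sorted lists of compound indices."""
    n = len(fingerprints)
    # k nearest neighbours of each compound by similarity (self excluded).
    neighbours = []
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: tanimoto(fingerprints[i], fingerprints[j]),
            reverse=True,
        )
        neighbours.append(set(ranked[:k]))

    # Union-find structure to merge compounds that satisfy the criterion.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            mutual = j in neighbours[i] and i in neighbours[j]
            if mutual and len(neighbours[i] & neighbours[j]) >= k_min:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Raising `k_min` makes the criterion stricter and tends to produce more, smaller clusters and singletons.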

Structure based clustering

Hierarchical clustering: LibraryMCS identifies the largest substructure shared by several molecular structures. It presents clusters hierarchically as dendrograms and also provides alternative tree and table views.

Structure based clustering - LibraryMCS

Being fast and suitable for thorough analysis, MCS profiling can help scientists explore screening results to quickly identify novel scaffolds and new examples of active compound families. A hierarchical SAR table enables viewing of clusters and associated non-structural data.

Furthermore, R-group decomposition can also be performed using the MCS as the core structure for each cluster.

Non-hierarchical: JKlustor can cluster structures by their pre-generated Bemis-Murcko frameworks, which provides a convenient and quick way of analyzing large databases with millions of compounds.

Structure based clustering - Bemis-Murcko frameworks of structures

Structure based clustering - Bemis-Murcko frameworks of structures (clusters view)
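Clustering by pre-generated frameworks amounts to grouping compounds under a shared key, which is why it scales well. The Python sketch below shows that grouping step only. It assumes the framework of each compound has already been computed elsewhere and is available as a string (for example a canonical SMILES); the function name `cluster_by_framework` and the input layout are hypothetical.

```python
from collections import defaultdict


def cluster_by_framework(frameworks):
    """Group compound IDs by their pre-generated framework string.

    `frameworks` maps compound ID -> framework key. Returns a list of
    (framework, compound_ids) pairs, largest cluster first.
    """
    clusters = defaultdict(list)
    for compound_id, framework in frameworks.items():
        clusters[framework].append(compound_id)
    # Sort by descending cluster size, then by framework key for stable output.
    return sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))
```

Because each compound is visited once and no pairwise comparison is needed, the work grows linearly with the size of the database.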

Descriptors

JKlustor can use ChemAxon’s proprietary chemical and pharmacophore fingerprint technology and/or user defined descriptors such as BCUT, as well as predicted or measured physico-chemical properties such as logP, logD, pKa, and hydrogen bond donor/acceptor counts.

Diversity analysis

JKlustor can be configured to generate different types of similarity/dissimilarity comparisons. With JKlustor’s command line tool (“Compr”), any individual compound of a set can be compared with the rest of the compounds in the same library, and two separate libraries can be compared with each other. A library self-dissimilarity test, which compares every compound of a set with the rest of the compounds in the same library, is also possible.
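The two kinds of comparison described above can be sketched as follows. This Python example is an assumption-laden illustration, not the Compr tool: fingerprints are modelled as sets of "on" bit positions, dissimilarity is taken as one minus the Tanimoto coefficient, and all function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def nearest_dissimilarity(query, library):
    """Smallest dissimilarity (1 - Tanimoto) between `query` and any library member.

    `library` must not be empty.
    """
    return min(1.0 - tanimoto(query, fp) for fp in library)


def self_dissimilarity(library):
    """For each compound, its dissimilarity to the nearest of the other compounds.

    Requires at least two compounds in the library.
    """
    return [
        nearest_dissimilarity(fp, library[:i] + library[i + 1:])
        for i, fp in enumerate(library)
    ]


def compare_libraries(lib_a, lib_b):
    """For each compound in lib_a, its dissimilarity to the nearest compound in lib_b."""
    return [nearest_dissimilarity(fp, lib_b) for fp in lib_a]
```

A library whose self-dissimilarity values are mostly high is diverse. Low values from a two-library comparison indicate that the first library is largely covered by the second.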

CreateView

CreateView is another command line tool included in JKlustor. It composes an SDfile that contains both structures and calculation results, using the input SDfile of GenerateMD (a command line program for the generation of various molecular descriptors) and a table containing the ordinal number of each compound in the SDfile together with other data to be viewed. Such a table can be created, for example, by “Compr” or “Jarp”. The generated SDfiles can be displayed by the MarvinView application or another SDF viewer.
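The merge described above, combining SDfile records with tabular results keyed by ordinal number, can be pictured with a short Python sketch. It is not the CreateView implementation. It assumes the standard SDfile layout in which each record ends with a `$$$$` line and data items take the form `> <NAME>` followed by the value and a blank line. The function name `annotate_sdf` and the in-memory table layout are hypothetical.

```python
def annotate_sdf(sdf_text, table):
    """Append data items from `table` to the records of an SDfile.

    `table` maps the 1-based ordinal number of a record to a dict of
    field name -> value. Records without a table entry are copied unchanged.
    """
    out, record, ordinal = [], [], 1
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            # End of record: add the data items before the delimiter.
            for name, value in table.get(ordinal, {}).items():
                record += [f"> <{name}>", str(value), ""]
            out += record + ["$$$$"]
            record, ordinal = [], ordinal + 1
        else:
            record.append(line)
    return "\n".join(out) + "\n"
```

With cluster IDs in the table, for example, every structure in the resulting file carries its cluster assignment as a data field that an SDF viewer can show next to the structure.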

Availability

JKlustor tools can be called from the command line or from the API of JChem. JKlustor runs on many operating systems and can integrate with many database engines. Full Java and .NET integration is supported, as well as connection to databases such as Oracle, MySQL, MS SQL Server, DB2, PostgreSQL, and Access. The LibMCS element comes with a standalone GUI that allows users to browse and navigate through a large set of data. Furthermore, maximum common edge sub-graph (MCES) and maximum common substructure (MCS) clustering methods are also available as ChemAxon components for both the KNIME and Pipeline Pilot workflow management systems.

