We have begun the second phase of our research program, which is intended to create a virtual classroom for teaching cheminformatics. This web-based project integrates concept teaching with real-world software applications. Several vendors, including ChemAxon, are participating by providing software that will be incorporated into the e-learning modules. I will describe the overall goals of the project and demonstrate our first preliminary module. This module demonstrates the use of MarvinEdit for structure input; fingerprinting and ChemTattoo® from Mesa Analytics & Computing, LLC; OEChem from OpenEye Scientific Software, Inc.; and MarvinView for color-coded display of results.
Our research results are based upon work supported by the National Science Foundation Small Business Innovation Research (SBIR) Program under Grant No. 0450457. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
The biotech industry must track chemical structure information throughout the scientific discovery process. Packaged solutions are usually too expensive for smaller companies and provide more functionality than they need. Marvin tools provide enough functionality to build a compact and efficient database system. At NeoGenesis, a biotech company recently acquired by Schering-Plough for its unique technology based on affinity selection, we have developed a complete system for accurately storing and retrieving chemical structures and properties, with fine-grained access levels. We used Marvin in Oracle to ensure uniqueness and to calculate molecular properties. The GUIs developed to interact with the system relied on Marvin components. Marvin tools have saved us at least one year of development time and freed two employees to work on other applications.
We shall describe Kelaroo’s Reagent Management System (KRMS) that encompasses in-house containers and commercially available reagents (catalogue items). Functionality includes container and catalogue item searching, container registration, inventory tracking, reagent request generation and processing, requisition generation and integration with in-house purchasing systems, and catalogue item data uploading. The KRMS utilizes Marvin and the JChem Oracle data cartridge, and is designed to work in concert with Kelaroo’s Enumeration solution.
We present LeadMarker(TM), a suite of tools that provides a customizable solution for acquiring and managing information generated as part of the drug discovery process. LeadMarker(TM) contains a chemical registration system (chemMarker) capable of managing small and large sets of chemical libraries; a sample management system (sampleMarker) capable of registering chemical information from single vials to high-density source plates and tracking reformatting applications during the screening process; a unique module (assayMarker) for acquiring screening data from a variety of detection instrumentation, processing the data to identify active chemicals, and managing the screening campaign for any given biological target; and finally a high-performance Java-based chemical spreadsheet tool (structShare) that can be used for processing and analyzing chemical and biological data.
Laboratory-based drug discovery efforts, on their own, are insufficient to deal with the wealth of potentially druggable targets. Virtual discovery has an increasing role to play in the drug discovery process; however, the demands of setting up the technology are considerable, even for a major pharmaceutical company. The Real World Chemistry concept aims to provide a sophisticated in silico drug discovery framework that is available online.
CTI’s Scientific Systems group was challenged with the task of building cost-effective cheminformatics applications that can be easily deployed to a worldwide enterprise. We have met this challenge with a data warehousing architecture and an open-source-based Java development infrastructure, with targeted use of commercial tools and libraries. ChemAxon’s JChem and Marvin Java libraries provide core functionality for the molecular structure component of the data warehouse, the Java middle tier, and a browser-hosted user interface. This presentation will demonstrate web applications that use the Marvin sketcher for a web-based structure search, integrate JChem structure search results into a large and complex data model, and perform server-side rendering of structure images. In addition, we will describe a mechanism for automated import into JChem’s structure tables from an MDL-based chemical registration system.
The informatics team at the Genomics Institute of the Novartis Research Foundation (GNF) has been developing a comprehensive lead discovery database (LDDB) in order to support an aggressive drug discovery portfolio. Until 2004, the chemical structure searching capability of LDDB was supported by a combination of Marvin and the Daylight cartridge. We are now migrating to an exclusively ChemAxon platform. This presentation shares with the user group our experience in using different vendor products, focusing on the issues related to our cartridge migration process. Examples illustrating how ChemAxon tools are used in LDDB are also presented.
Virtual screening and high-throughput screening are two major components of lead discovery within the pharmaceutical industry. There is an ongoing effort to deploy massive in silico screening (e.g. docking) on a GRID infrastructure using large libraries of compounds. The challenge posed to the medicinal and computational chemist is how to sift through the potentially large number of hits that can be generated by a (virtual) high-throughput screen in order to select interesting compounds for further follow-up. Hence, there is an increasing need for interpretation of large amounts of “noisy” data. Fortunately, the number of success stories showing that virtual screening helps to accelerate the rational drug design process is growing rapidly. Significant challenges nevertheless remain in the application of these approaches. Diverse tools that run on different platforms and use different data formats have to be integrated. The resulting large log files, with millions of entries, have to be compared, sorted, merged and filtered before the results can be further analyzed by the project scientist.
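The sort, merge and filter step mentioned above can be illustrated with a minimal Python sketch. It is not taken from the presentation: the log format (one `compound_id score` pair per line), the convention that lower scores are better, and the names `Hit`, `parse_log` and `merge_and_filter` are all assumptions made for illustration.

```python
import heapq
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    compound_id: str
    score: float  # assumption: lower is better, as with many docking scores


def parse_log(lines: Iterable[str]) -> list[Hit]:
    """Parse 'compound_id score' lines, skipping blank lines and '#' comments."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        compound_id, score = line.split()[:2]
        hits.append(Hit(compound_id, float(score)))
    return hits


def merge_and_filter(logs: Iterable[Iterable[str]], cutoff: float) -> list[Hit]:
    """Merge several logs, keep the best score per compound, and apply a cutoff."""
    # Sort each run on its own, then merge the sorted runs lazily.
    sorted_runs = [sorted(parse_log(log), key=lambda h: h.score) for log in logs]
    best: dict[str, Hit] = {}
    for hit in heapq.merge(*sorted_runs, key=lambda h: h.score):
        # The merged stream is in ascending score order, so the first
        # occurrence of a compound carries its best (lowest) score.
        if hit.score <= cutoff and hit.compound_id not in best:
            best[hit.compound_id] = hit
    # dicts preserve insertion order, so the result is sorted by score.
    return list(best.values())


if __name__ == "__main__":
    log_a = ["# run A", "cmpd-1 -9.2", "cmpd-2 -5.1", ""]
    log_b = ["cmpd-2 -7.4", "cmpd-3 -3.0"]
    for hit in merge_and_filter([log_a, log_b], cutoff=-6.0):
        print(hit.compound_id, hit.score)
```

For logs with millions of entries, each run would more realistically be sorted on disk and streamed; `heapq.merge` accepts iterators, so the merging logic would stay the same.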
SEURAT is a tool used by biologists, chemists and preclinical groups at Celera for exploring SAR. It provides an intuitive and flexible interface for data retrieval (including crystal structures), property calculation, and visual recognition of chemical trends. Viewing biological data and modeling predictions in the context of structure allows chemists and biologists to bring their many years of experience to bear when interpreting experimental results. SEURAT allows scientists to succinctly construct assay result retrieval templates from chemically aware databases, tune the results in “Excel-like” ways, and perform substructure searches on both global and local structure sets. The result is a tight integration between a traditional (table-like) compound-based assay data view and novel aggregate visualizations (heat maps, scaffold matrices and the like) that give scientific users efficient data mining capabilities in a familiar structural context. In this talk we will present some of the capabilities of SEURAT and share some of our insights on the requirements for easing acceptance of such tools in a drug design and development environment.
InforSense Ltd, the leading provider of Open Discovery Workflow(TM) informatics platforms, integrates ChemAxon’s powerful and versatile technology within its ChemScience suite. Industry-standard Java components such as Marvin, JChem Base, Standardizer, Screen, Reactor, and Marvin Plugins (pKa, LogP, LogD) are seamlessly accessed via InforSense integrative analytic workflows and can link to different data sources and other products within the InforSense portfolio without programming. Capability highlights include multiple format conversion, handling and validation; a descriptor and fingerprint generation engine; a smart library design module; and browsing and search capabilities, including sketch and search in deployed applications via the Discovery Portal. Close integration with InforSense bioinformatics and literature mining capabilities produces efficient informatics solutions able to handle complex research issues that were previously very time-consuming. Sophisticated cheminformatics solutions combining different data sources, legacy or proprietary applications, external tools and web services are effortlessly built and deployed company-wide as reusable best-practice solutions. InforSense’s visual programming environment and its Portal provide the framework for a single unified cheminformatics infrastructure. Multidisciplinary teams can now have seamless access to highly sophisticated and tailored solutions through the portal.
Aureus Pharma has built and maintains a unique and comprehensive database that gathers all the data already published on GPCR and ion channel medicinal chemistry and pharmacology. Chemical structures of ligands (> 150,000), as well as precise descriptions of their targets and of all types of in vitro and in vivo pharmacological responses (> 600,000), have been collected from more than 20,000 published references. Dedicated software has been designed to query the database using biological keys as well as chemical features. Substructure and similarity search within the database has been implemented through the integration of JChem within the query tool. In the present work, we illustrate the superiority of hit identification and focused library design when a virtual screening approach uses a highly documented source of knowledge together with high-performance fingerprint calculation and comparison software. In our strategy, biological filters (queries) are used to identify training sets of active compounds. The virtual screening, performed with ChemAxon tools, uses these training sets to build “chemical” filters for screening compound catalogs. This method was applied to NK1-specific ligands. The vast amount of data gathered in the Aureus Pharma AurSCOPE database allows us to enrich and increase the diversity of a query set, yielding higher hit rates. The pharmacophoric fingerprints used appear to give rise to structurally diverse series of hit compounds and a better enrichment ratio. We demonstrate that 2D chemical and pharmacophoric fingerprints lead to considerable improvement in the design of focused libraries.
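As a rough illustration of fingerprint-based screening against a training set, the following Python sketch ranks catalog compounds by their best Tanimoto similarity to any training compound. It does not use ChemAxon's API and is not the authors' method: the representation of a fingerprint as a set of "on" bit positions, the max-similarity rule, and the names `tanimoto` and `screen` are assumptions chosen for illustration.

```python
def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def screen(
    catalog: dict[str, frozenset[int]],
    training_set: list[frozenset[int]],
    threshold: float,
) -> list[tuple[str, float]]:
    """Return (name, similarity) for catalog compounds whose best similarity
    to any training compound reaches the threshold, most similar first."""
    hits = []
    for name, fingerprint in catalog.items():
        best = max((tanimoto(fingerprint, ref) for ref in training_set), default=0.0)
        if best >= threshold:
            hits.append((name, best))
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy fingerprints; real ones would come from a fingerprinting tool.
    training = [frozenset({1, 2, 3, 4})]
    catalog = {
        "A": frozenset({1, 2, 3, 4}),
        "B": frozenset({1, 2, 3, 5}),
        "C": frozenset({7, 8}),
    }
    for name, similarity in screen(catalog, training, threshold=0.5):
        print(name, round(similarity, 2))
```

A larger and more diverse training set gives each catalog compound more reference points to match, which is one way to read the abstract's claim that enriching the query set raises hit rates.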
The Chemical Biology Platform within the Broad Institute screens both the Molecular Libraries Small Molecule Repository (MLSMR) and a unique compound collection synthesized internally through diversity-oriented synthesis, yielding novel starting points for follow-up chemistry in a range of disease areas including cancer, diabetes, and psychiatric illnesses. The supporting informatics needs include sample tracking, analysis of high-throughput assay data, hit evaluation, and analysis of compounds by patterns of activity across multiple assays (e.g., to understand mechanism of action). Data and results have been made public first through ChemBank and subsequently through PubChem and CARS. As our focus has shifted from infrastructure development to supporting the discovery of putative novel therapeutics, our informatics requirements now include capture of the entire discovery life cycle (assay development, HTS, lead optimization, ADME, etc.).
This presentation describes our evolving informatics needs for probe and drug discovery. It includes an overview of informatics initiatives, with a focus on our current Chemical Biology Informatics Platform (CBIP) and our plans to evolve toward a modular system that enables biologists, chemists, and computational scientists to integrate chemical and biological data. We will present success stories that highlight the ChemAxon family of products since our transition from a legacy chemistry cartridge in March 2010, including how we have integrated ChemAxon into our environment over the past year and a half: hit calling for high-throughput screens, cherry picking of compounds for retest, and stereo-structure-activity relationship analysis.