2010 US User Group Meeting Archive

13th September Training Day, 14-15th UGM, 16th Markush Forum. Boston, MA.

Partner Session


Meeting Report

Meeting Report - ChemAxon User Group Meeting, Boston September 14-15, 2010

Lead-in
Presentations by Users
Presentations by Partners
Presentations on the Present and Future of ChemAxon Products
Summary
UGM archive – all presentations

In summary, it was an intensive, fact-filled meeting. The enthusiasm of users and partners for the software and of the ChemAxon developers bodes well for the future of the company. Development continues quickly, both in terms of the existing products as well as new areas. ChemAxon is in upbeat mode (as always) and well they might be. The business is expanding and their standing within the community is certainly at a more confident level than two or three years ago. This is a tribute to the hard work of a dedicated team of cheminformaniacs. I don’t expect them to slow down any time soon.


The 2010 North American ChemAxon User Group Meeting was held September 14-15 at the Omni Parker House hotel in downtown Boston, Mass., one block from City Hall and Boston Common and two blocks from the Boston Harbor and the theater district. The hotel has been a landmark since 1855—it provides a blend of modern amenities and historic charm at America’s oldest continuously-operating hotel. It is no ordinary hotel, being home to the creation of Parker House Rolls and Boston Cream Pie. Need I say that the lunches and snacks were delicious and expertly served? The general meeting was held on the 15th floor rooftop ballroom, which has a grand view of downtown Boston.

The evening before the end-user and developer training sessions on September 13, the attendees gathered at the hotel’s Parker’s Bar. After the training day and preceding the first UGM day the attendees walked to the Durgin-Park in Quincy Market for a traditional dinner that included seafood of various types, Indian pudding, and of course, beer and wine. The traditional Grand Dinner was held at the Harvard Faculty Club, to which we were transported on buses. We were first served champagne and nibbles on a covered terrace outside the club, then enjoyed a buffet dinner in the dining room. After dinner we mingled in several charming rooms of the club; a group gathered around a piano manned by Daniel Bonniot, with sheet music for “Hotel California” and other songs downloaded by John Irwin from UCSF. Following the formal UGM and preceding the Markush Forum the group gathered at the Bell in Hand tavern, close to the hotel.

A total of 80 non-ChemAxon participants registered for the meeting—an increase of 11 or 19% from last year. The participants represented 47 institutions of which 14 are pharmaceutical companies, 15 are partners, and seven are university or research institutions. Notable are participants from Chemical Abstracts and Microsoft. Thirty-seven of the participants were from the US; four from Daiichi Sankyo, Infogram, and Patcore in Japan; two from Roche in Switzerland; two from Founder of China; one from Fairview Research Barcelona SLU in Spain; and one from Evotec in the UK.

The UGM itself was preceded by two workshops. I attended the one designed for the end-user. It increased my understanding of the Marvin editing and viewing family, the desktop program Instant JChem, JChem for Excel, JChem in KNIME, Reactor, Markush enumeration and search, and Virtual reactions and Markush enumeration in KNIME. These overviews were helpful, but of course real understanding comes from using the software for a problem of interest. I was disappointed to learn that the exciting product JChem for Excel is not available for the Mac because Excel on the Mac is different from that on a Windows machine.

The informal comments of attendees ranged from excitement about the products and their continuing development to whispers about swapping to a JChem Cartridge from a Symyx or Accord cartridge. Switching from Daylight is now considered obvious. Another consistent theme of conversations was the great support people get from ChemAxon. This is true whether they are customers or academics.

In the UGM itself there were a total of 28 presentations, half of which were by ChemAxon. In addition, there were 11 brief presentations by partners who incorporate ChemAxon software into more specific applications.

As one would expect, the formal meeting was started with a presentation by Alex Drijver, the CEO of ChemAxon (View the Presentation). He reminded us that ChemAxon emphasizes transparency in business and development modeling. It also emphasizes partnering with clients by listening and taking on board ideas or joint projects that increase industry relevance as the field progresses; this was validated by the comments during user presentations and informal conversations. He emphasized that it is hard to predict the effect of the recent merger of Accelrys and Symyx on the competing cheminformatics software companies. However, it seems likely to bring some uncertainties, notably as the old Accord and ISIS platforms will be migrated or replaced sooner or later. In this regard ChemAxon presents a genuine alternative for an enterprise level platform for large pharmaceutical companies and biotechs alike. Responding to a question from the floor on how ChemAxon views the future of its primarily Java based technology, Ferenc Csizmadia, Head of R&D, pointed out that ChemAxon has already developed a .NET API for almost all tools as well as a Web Services interface. They are now looking at adding JavaScript too. In any event ChemAxon is not keeping its head in the sand where technology developments are concerned.

Presentations by Users


Evotec – 6 Years of using ChemAxon & Instant JChem to link chemistry and biology data · (View the 6 years of using ChemAxon tools at Evotec Presentation and View the IJC to link chemistry and biology data Presentation)
Ian Berry described the challenges that face a cheminformatics system at a contract research company with over 150 partners. Evotec activities range from screening, hit-to-lead, lead optimization, custom and large scale synthesis and analytical development, to clinical alliances and project management. Although originally they used Daylight and MDL cheminformatics software, in 2004 they switched to ChemAxon because of its well designed API and flexible licensing scheme. An additional attractive feature is that the cartridge can perform structural transformations for enumeration. In the succeeding years they have continued with ChemAxon because of ChemAxon’s excellent support and quick bug fixes, good structure standardization tool for business rules, good library enumeration tools, many plug-ins, both local and server-based databases via Instant JChem, and the addition of JChem for Excel and SharePoint capabilities. Although they developed their own chemical spreadsheet, it will be replaced by JChem for Excel. Evotec’s current cheminformatics suite ESMA includes a desktop tool for chemists (to be replaced by embedding calculators in other applications such as JChem for Excel), a chemical spreadsheet (also to be replaced by JChem for Excel), a data miner that supports searching on any fields associated with 1.9 million unique structures and an equal number of screening results, an ELN and a corporate compound database that enforces exclusivity of each compound and project and user security, and EVOsource for chemicals. The latter includes hazard data, shopping cart capabilities, support for ordering from stores, ordering from a supplier or getting a quote. Their plans include using KNIME to enhance chemistry workflows for such activities as combinatorial library design, selection of stored virtual compounds for synthesis, and QSAR analysis. 
They also plan to convert their own calculators as well as those from BioByte, MOE, ACDLabs, and Molecular Discovery into plug-ins for ChemAxon tools so that the user can access them in all pieces of software. SharePoint will aggregate data from internal and external services. The final word was an appreciation of the excellent support from ChemAxon.

Broad Institute – Migrating Chemical Toolkits · (View the Presentation)
Dan Durkin addressed the issues and rewards of migrating chemical toolkits, specifically from Daylight to ChemAxon. Their suite of tools includes support for reagent purchases and inventory, library design, an ELN, library production, library cleavage and formatting, library analysis, compound registration and management, screening LIMS, a data warehouse, and data visualization and analysis. ChemAxon was chosen because of the ease of Java integration, the functionality of the cartridge, the variety of end-user tools, and the fact that it is being actively enhanced. There were twin goals for the migration: to enhance end-user functionality and to minimize impact on existing data. A detailed analysis of 750,000 records revealed 1532 differences. Among these were positive differences: 1025 involved macrocyclic double bond stereochemistry, 37 had an additional stereocenter set due to ring geometry, and 32 were additional structures that can now be interpreted. Probable errors in the original files led to the loss of tetrahedral stereochemistry for 106 structures when there were bad 3D coordinates in the mol file and the change of a double bond from specified to unspecified for 30 structures. In summary, the majority of the differences favor ChemAxon. For compound registration and standardization, it took some work to configure Standardizer to produce the same results as the previous SMIRKS rules. This included issues with removing hydrogens or Standardizer adding hydrogens and some differences in desalting. For library enumeration 10 reactions had to be modified to be compatible with ChemAxon jc_react. The estimated FTE effort involved a three-month assessment followed by a three-month conversion and validation by two software engineers and one application engineer.
He concluded that ChemAxon offers a great set of tools that provide a technical and functional match with their requirements; that migration had minimal impact on existing data; that one must be prepared for updates; and that the forum is very responsive and helpful.

IBM Research – Using ChemAxon Technology for Computer Curation of patents & scientific literature · (View the Presentation)
Stephen Boyer provided an update to his long term project on computer curation of patents and the scientific literature. Patents and the scientific literature contain data as bitmap images of chemical structures as well as chemical names. Computer curation involves the analysis of text and of image. In addition, manually curated content can be added. To convert text into chemical structures, the first issue is to identify text that is a chemical name. Then the name must be cleaned up to remove improper spacing and a variety of other formatting issues. ChemAxon Structure Checker identifies text that contains such issues. Optical processing of chemical structure images involves isolating the separate images followed by OCR. ChemAxon Structure to Name and Structure Checker tools are key to producing the SMILES string. The resulting JChem database currently contains 10 million unique compounds from 18 million documents. Search results are linked to patents and Medline. The ChemAxon tools that enable the application include Name to Structure, Structure to Name, Structure Checker, Marvin View, JChem Manager, JChem, and Instant JChem.

Beth Israel Deaconess Medical Center & Harvard Medical School – A Robotic Chemistry System for the Discovery of Cancer-Specific Targeting Ligands · (View the Presentation)
John Frangioni described how they incorporated ChemAxon tools into their robotic chemistry efforts to discover cancer-specific targeting ligands. The goal is to produce a chip that contains 5000 compounds specific for breast cancer and use this chip to find the compound(s) that are the best match for a particular patient. To identify such compounds, their strategy is to link cancer-targeting ligands with a contrast agent, radiotracer, or therapeutic agent to produce compounds that specifically target cancer cells and produce a signal for diagnosis and/or treatment. To accomplish this, they employ automated split-pool synthesis of ligands attached to the tracer and single-bead screening to identify active compounds. ChemAxon JChem integrated with KNIME is used for in silico enumeration, to search and store chemical databases, for pharmacophore modeling, and for 2D and 3D viewing. The mass spectra of the active compounds are matched with the compounds in the enumerated library to provide the structure of the active compounds.

Genomics Institute of the Novartis Research Foundation – Chemical-text Hybrid Search Engines · (Read the Presentation)
Yingyao Zhou presented their approach to chemical-text hybrid search engines. A Microsoft Office SharePoint Server (MOSS) can index all of the data available on portals, on shares, and in business applications. However, text search engines such as MOSS don’t recognize that a compound might have several names; for example, a drug has its trade name, generic name, IUPAC name, and company name. Additionally, they do not support substructure or similarity searching. Chemical search engines, on the other hand, overcome these deficiencies, but they ignore the context of a text search. To solve this problem, these workers implemented an entity-based canonical keyword indexing of the structures in documents. This index is connected by a custom filter to both the structure database and the previously generated content index database. The query search engine performs both a structure and a text search to identify documents that match the query. ChemAxon will offer this text-structure hybrid search technology in JChem for SharePoint.
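The canonical keyword idea described above can be illustrated with a toy sketch. This is not the GNF or ChemAxon implementation: the synonym table, the compound keys such as "CMPD-0001", and the function names are all hypothetical, and a real system would resolve names through a structure database rather than a hand-written dictionary. The point is only that every name of a compound is mapped to one canonical key before indexing, so a query by any synonym finds the same documents, and the result can then be narrowed by an ordinary text keyword.

```python
# Hypothetical synonym table: every known name maps to one canonical entity key.
SYNONYMS = {
    "aspirin": "CMPD-0001",
    "acetylsalicylic acid": "CMPD-0001",
    "2-acetoxybenzoic acid": "CMPD-0001",
    "ibuprofen": "CMPD-0002",
}

# Made-up documents standing in for indexed content.
DOCS = {
    "doc1": "Stability report for acetylsalicylic acid tablets.",
    "doc2": "Aspirin dose-response assay results.",
    "doc3": "Ibuprofen stability report.",
}


def build_entity_index(documents):
    """Map each canonical compound key to the set of documents naming it."""
    index = {}
    for doc_id, text in documents.items():
        lowered = text.lower()
        for name, key in SYNONYMS.items():
            if name in lowered:
                index.setdefault(key, set()).add(doc_id)
    return index


def hybrid_search(index, documents, compound_name, keyword):
    """Return documents that mention the compound (by any name) and the keyword."""
    key = SYNONYMS.get(compound_name.lower())
    if key is None:
        return []
    hits = index.get(key, set())
    return sorted(d for d in hits if keyword.lower() in documents[d].lower())


ENTITY_INDEX = build_entity_index(DOCS)
```

Searching for "aspirin" together with "stability" returns doc1, even though that document only uses the name acetylsalicylic acid. Substructure and similarity search, which the talk also covers, would need a chemistry toolkit and are left out of this sketch.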

Microsoft Corporation – SharePoint for Scientists · (View the Presentation)
Gabor Fari reminded us that by using SharePoint an organization can set up web sites that give users the ability to search for information in local and provider databases, scientific literature, internal reports and other applications such as ELNs. It supports the ability of the scientist to search, integrate, analyze, collaborate and report. With the use of Smart Tags a page about a particular chemical may be indexed by the name of the compound; this then provides a key to other data about the compound, such as in ChemSpider. Its document control feature can be used in a workflow for witnessing and signing ELN documents. Applications can also be embedded into SharePoint. Examples include the Collective Molecular Environment application at Scripps, which combines molecular graphics and e-mail conversations about a particular crystal structure; one that collects data to provide management with an overview of the state of several HTS screens; and another, shown as a screenshot, with bar charts of molecular properties, pie charts of the distribution of several biological properties, and 2D and 3D molecular structures. SharePoint enhances the formation of communities that allow users to collaborate with peers online or offline and makes expertise available across the whole enterprise. Effective data analysis is tied to Excel and other visualization services.

GlaxoSmithKline – Implementation of ChemAxon in a SOA Environment · (View the Presentation)
Brett Hiemenz described their efforts to simplify the architecture, technologies, and vendors used to reduce the cost of on-going support and maintenance of their application portfolio. For the past decade they have worked to establish a Services Oriented Architecture. However, their chemistry web services used components from three different cheminformatics vendors. To consolidate and reduce license and support costs, they chose ChemAxon as a key vendor for chemistry features because it provides both a toolkit and cartridge, it provides products that might replace other vendor software, its languages are also those preferred at GSK, it continually improves products, it contains as standard items replacements for GSK custom code, and its performance is excellent. The transition was aided by the SOA architecture because they could swap out the previous application with the new one without involvement of the users. An analysis showed that SOA saves on total cost of ownership and provides consistency because there is only one place to house and change business rules. The next phase will deploy Instant JChem to replace ISIS HVIEWs for forms-based data delivery.

GlaxoSmithKline – Helium: An Excel based User Tool for SAR Analysis · (View the Presentation)
Charles Wilkins discussed their Helium project to expand Microsoft Excel 2007 with special capabilities for SAR analysis. The objective was to combine all the capabilities for medicinal chemists into an easy-to-use application. The first iterations based on Spotfire were abandoned because users found Spotfire difficult to learn and to use and it proved unlikely that a forms view could be integrated. Because ChemAxon offers JChem for Excel and Instant JChem provides a tool for creating forms, the current project uses these tools. To ensure a successful project, key end users from all disciplines and sites were identified and invited to weekly user group meetings and given early access to the software. A further key to the success was the excellent support and response from ChemAxon. Biological data is accessed directly through Oracle; web services supply non-biological data. Microsoft ClickOnce is used to update desktop Excel. A JChem Ribbon provides access to JC4XL functionality and a Helium Ribbon exposes functionality for non data specific tasks such as highlighting and deleting duplicates. Helium also provides a datatype sensitive task panel: for example, when a structure is selected, one can calculate properties, fragment the molecule, search for similar molecules in a database, or perform a substructure search in the table or a database. The integration of Helium, JChem, and Excel brings new capabilities for SAR and selectivity investigations in an interface familiar to Discovery scientists. It also provides one application to support in contrast to multiple tools. It will be deployed to 1000 users by the end of the year. Integration with Spotfire will also be available in late 2010. Still to be resolved are issues with copy/paste functionality of OLE objects, how to economically train scientists in Excel, and linking Instant JChem data with JChem for Excel database access.

UCSF – Will docking work? · (View the Presentation)
John Irwin reminded us that the ZINC JChem database contains 35 million commercially available compounds and associated properties and vendor information. The ChemAxon software provides a very fast search, it supports many concurrent queries, and it was simple to interface to their previous system. The system uses the Apache web server, a Python script, CURL, Tomcat, and MySQL. The talk itself highlighted their efforts to analyze the causes of failures in docking molecules to a 3D protein structure. Binding sites are complicated, many interactions are involved in protein ligand binding, and every such interaction involves a competition with water. Although many successful predictions of binding orientations are published, failures are also known. To investigate this problem they decided to automate docking so that each aspect of the docking process (site preparation, software configuration, parameter choices, and file manipulations) could be investigated. For example, they investigated the interplay of the scoring function (Polarized versus Amber) and sampling (coarser versus finer) and found that the best results for pose-fidelity and enrichment were obtained with a combination of Amber scoring and coarser sampling. To gain a broad view of how successful their automation is at this point, they started with 7408 PDB structures with ligands; were able to start docking of 65% (4826) of these structures; automated docking completed for 54% (4018) of the proteins; and 2500 runs showed good enrichment. Further work will examine reasons for failures.
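The funnel figures quoted above are easy to check with a few lines of arithmetic. The sketch below takes the four counts from the report and expresses each as a percentage of the starting set; the stage labels are paraphrases and the function name is made up for illustration.

```python
# Counts as quoted in the report; the stage labels are paraphrases.
STAGES = [
    ("PDB structures with ligands", 7408),
    ("docking could be started", 4826),
    ("automated docking completed", 4018),
    ("runs with good enrichment", 2500),
]


def funnel_percentages(stages):
    """Return (label, count, percent of the first stage) for each stage."""
    total = stages[0][1]
    return [(label, count, round(100 * count / total)) for label, count in stages]


for label, count, pct in funnel_percentages(STAGES):
    print(f"{label}: {count} ({pct}%)")
```

The computed values reproduce the 65% and 54% given in the talk summary. The final step works out to roughly 34% of the starting structures, a figure that is derived here and not stated in the report.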

Thomson Reuters – Structure-based approaches to the indexing and retrieval of patent chemistry · (View the Presentation)
Donald Walter (co-author Tim Miller) discussed structure-based approaches to the indexing and retrieval of patent chemistry. He reminded us that 70% of patent information is never published elsewhere. Topological Markush searching became possible in the 1980s with applications from the University of Sheffield (GENSAL); Derwent, INPI and Questel (Markush DARC); and CAS (MARPAT). Currently there is technology to extract structures from patents by text mining, name to structure, and chemical OCR. ChemAxon, Digital Chemistry, DecrIPt, and Symyx offer tools that search and enumerate the compounds in a Markush structure. These support identification of prior art overlaps with planned compounds. However, there are challenges to be faced. First, indexing patents remains an issue because there are many Markush patents in many languages and some of the Markush structures are very large and complex. Beyond this, a person searching patents typically wants to know if the structures retrieved are active materials or those used in the synthesis; if there are limitations that cannot be coded in the structure; and the associated biological information. The final challenge is to rank hits by nearness to the core of the invention; for this, JChem’s selective enumeration of the structures nearest the query can help. A tool to visualize the results of a patent search would help one identify the most relevant patents.

Presentations by Partners


Nóra Lapusnyik (View the Presentation) introduced the session by pointing out the wide range of applications that are built on ChemAxon tools. This reflects the commitment of ChemAxon to ongoing development of both existing products and new ones. The partners have benefited from the addition of .NET, web services, SharePoint, JChem for Excel, Markush structure handling and Structure Checker.

William Lindstrom from Acelot Inc. (View the Presentation) presented their plug-in for Instant JChem that provides alternative structure searching using proprietary graph algorithms such as graph editing distance for similarity searches.

Chip Allee from Ceutical Soft (View the Presentation) provided an overview of their OpenHTS Excel-based modules that use JChem for chemical structure processing and registration.

Michael Dippolito from DeltaSoft (View the Presentation) reminded the audience that they provide either pre-made or custom solutions for a variety of cheminformatics needs of companies. They support any chemistry cartridge and sketcher, including ChemAxon products.

Jim Moeder from Kelaroo (View the Presentation) discussed their inventory management system that can be based on any chemistry cartridge and sketcher. It provides real time access to electronic material safety data sheets and automatic management of commercial catalogues.

Derek Hayes from KineMatik (View the Presentation), which focuses on knowledge management, described their ELN and eNovator™ project management tool.

Frank Schaffer from KNIME (View the Presentation) described the commercial organization that provides support for KNIME. For example, it includes an enterprise server to save protocols or initiate remote or scheduled executions.

Jeffrey Nauss from Linguamatics (View the Presentation) described their embedding of JChem into their natural language processing text-mining product.

Bill Fisher from Rescentris (View the Presentation) described their Collaborative Electronic Research Framework (CERF) biologically focused ELN. It is database agnostic.

James Baxendale from Synaptic Science (View the Presentation) described SEURAT, which brings together into one application all information, independent of where or how it is stored.

David Mosenkis from TIBCO (View the Presentation) described the integration of ChemAxon into Spotfire.

Presentations on the Present and Future of ChemAxon Products


Szabolcs Csepregi – 2010 – a year of JChem · (View the Presentation)
Szabolcs reminded us that JChem Base is a chemical database management toolkit to handle molecules, chemical reactions and Markush structures and associated data stored in relational databases. JChem Cartridge provides similar functionality highly integrated into Oracle as well as an SQL interface to other ChemAxon products. The JChem DB family also includes the desktop application for scientists, Instant JChem, and JChem for Excel. The chemical file formats SKC, CDX, CDXML and DARC have been added to the MDL mol/rxn/sdf/rdf, SMILES, CML, MRV, IUPAC, traditional names, InChI, mol2 and PDB formats previously recognized. JChem can use Derby Composite database engines in addition to those previously supported: Oracle, MySQL, MS SQL, MS Access, PostgreSQL, and IBM DB2. Version 5.3.X of JChem Base now supports homology (such as “alkyl” or “aryl”) specification in queries. The R-group decomposition API is integrated with JChemSearch and its results can be output as a Markush structure. Additional improvements in searching involve undefined R-atom matching and the handling of fused rings. For managers of JChem databases it is now possible to pre-regenerate new versions while the old version is still operational, and there is a new method of handling log tables and register caches to improve batch loading. Cartridge-specific enhancements include Markush tables and indices, user-defined fingerprints for similarity search, screening of molecular descriptors in a similarity search, and increased security and improved exception handling. Version 5.4 will provide multi-threading of fingerprint screening and similarity searches, MCS highlighting of similarity search results, handling of symmetry in R-group decomposition, and homology properties such as “alkyl C1-6” or “heteroaromatic N1-2”. Planned future enhancements also include a maximum common substructure search type, an arbitrary JChem index table, and a JChem Server.
It is also planned to add R-group decomposition to the cartridge and to further enhance search speed.

György Pirok – Restoration of Molecular Artifacts · (View the Presentation)
György presented a new software tool, the Structure Checker. The current checkers can detect and fix layout issues like overlapping atoms or bonds and unpreferred bond lengths or angles, as well as chemical errors like invalid valences, wedge bonds, chiral flags, reaction maps, ring strains, and incorrect aromatic rings. Checkers can identify and repair common problems originating from atom aliases, pseudo atoms, abbreviated groups, metallocenes and many more. Structure Checker is integrated in ChemAxon’s drawing tool, Marvin, to identify and fix issues during molecule drawing. Batch validation of structures is possible with a new wizard that allows both manual and automatic fixing with advanced reporting services. Checkers and fixers are integrated into the Chemical Terms language, allowing the detection of chemical errors in many other applications including JChem Base, Instant JChem, JChem Cartridge, and JChem for Excel. Software developers can access these advanced functionalities through the Java and .NET programming APIs and via the JChem Web Services interface.

György Pirok – Virtual reaction design for chemists · (View the Presentation)
György also presented an overview of using Reactor for chemists to design virtual reactions. Although it is relatively easy to draw a virtual reaction, to be practical for the organic chemist, the software must eliminate reactions that do not occur and recognize the specificity if more than one position may react. For generality one may wish to indicate that several types of atoms (such as oxygen and nitrogen) may react. With ChemAxon tools one can specify, for example, that amines and alcohols can participate in a reaction but that amides, esters, and carboxylic acids cannot. However, reactivity is also governed by physicochemical properties such as the charge on the reacting atom; this can be specified in a reaction by a REACTIVITY rule. Similarly, a SELECTIVITY rule can specify the property that determines specificity. In addition, an EXCLUDE rule can specify properties of molecules that should not be reacted because they generate side reactions. Scientists can access Reactor from the Wizard, Instant JChem, and JChem for Excel. However, it is also available as a component in Pipeline Pilot or KNIME. The more computer savvy chemists can also access Reactor via JChem Web Services, the JChem Cartridge, the react command line tool, and the Java and .NET APIs. In summary, Reactor can be programmed to provide only chemically feasible reactions that can be modified by the user; it provides support for SMIRKS and RXN based enumerations; all property calculations can be used within the reaction rules; and it ships with ChemAxon’s reaction library of named generic reactions. Currently it doesn’t support multistep reaction schemes, pro-chirality or reactions that are governed by HOMO-LUMO properties. Enhancement plans include the possibility to select products manually within the Reactor wizard, improving property calculations relevant to reactivity, extending stereo features to include prochirality, and improving the reaction library.
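The interplay of the three rule types described above can be pictured with a small toy model. This is not Reactor's rule syntax or API: the Site class, the group labels, the charge values, the threshold, and the excluded group are all invented for illustration. The sketch only shows the order of decisions, in which an exclusion check rejects a molecule outright, a reactivity check filters candidate atoms by functional group and charge, and a selectivity check picks one site when several remain.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Site:
    atom: str      # element symbol of the candidate reacting atom
    group: str     # functional-group label (hypothetical vocabulary)
    charge: float  # made-up partial charge on the atom


# Groups allowed to react, following the example in the text.
ALLOWED_GROUPS = {"amine", "alcohol"}
# Hypothetical group assumed to cause side reactions.
EXCLUDED_GROUPS = {"thiol"}


def pick_site(sites: Sequence[Site], max_charge: float = -0.3) -> Optional[Site]:
    """Return the site predicted to react, or None if the molecule is rejected."""
    # EXCLUDE-like step: reject the whole molecule if it carries an excluded group.
    if any(s.group in EXCLUDED_GROUPS for s in sites):
        return None
    # REACTIVITY-like step: keep allowed groups whose atom is negative enough.
    reactive = [s for s in sites if s.group in ALLOWED_GROUPS and s.charge <= max_charge]
    if not reactive:
        return None
    # SELECTIVITY-like step: prefer the most negatively charged remaining atom.
    return min(reactive, key=lambda s: s.charge)


MOLECULE = [Site("N", "amine", -0.8), Site("O", "alcohol", -0.6), Site("N", "amide", -0.9)]
```

For the sample molecule the amide nitrogen is filtered out despite having the most negative charge, and the amine is chosen over the alcohol. In Reactor itself these conditions are expressed with property calculations inside the reaction rules, as the talk describes.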

Tamás Pelcz – ChemAxon SharePoint Technologies · (View the Presentation)
Tamás presented the work at ChemAxon to integrate Marvin and JChem functionality into Microsoft SharePoint using .NET based API and the Marvin applet. The presentation included demos of managing a structure field list; examples for blogs, discussion boards, and Wikis; and Markush search using Filter. JChem for SharePoint renders structures with the option to resize or edit them as well as to calculate structural properties. It supports using structures in blogs, Wikis, and discussion boards. Future plans include connecting with databases; adding chemical tools such as R-group decomposition, Standardizer, and Reactor; and integrating with Excel. JChem Search for SharePoint indexes and allows the user to find chemical entities within documents on file shares, e-mail, and pages from the Internet or corporate intranet. The index includes chemical structures in SharePoint sites, files, internal and external websites, databases, and e-mail. Compounds are recognized from file extensions, JChem for Excel workbooks, as well as names and corporate IDs in documents. The structure list can be sorted or filtered. Enhancements available soon include the ability to import and export structures and lists as well as a desktop editor of SharePoint content based on Silverlight. For these enhancements the technology plans include JavaScript jQuery for the client side. The Web Services will include custom SharePoint services and the ability to offload chemical services from the SharePoint server. A demo version is available online.

Jonathan Lee – Interfacing the JChem Suite outside of Java · (View the Presentation)
Jonathan presented the efforts at ChemAxon to provide tools to interface the JChem Suite with Web Services, .NET, SQL cartridges, and workflow platforms. The exact situation affects the choice of which to use. The current version of JChem Web Services includes database related services, Standardizer, Chemical Terms, Reactor, and molecular conversion as well as molecular search. These services follow WS-I, SOAP and WSDL standards and support web languages, application languages and many JChem modules. The server architecture is based on Apache open source components. It operates on Windows, Linux, Unix, and Mac OS X. Version 5.4 of JChem Web Services will include expanded batch processing, SQL execution, Markush/R-group decomposition, and JChem table management. ChemAxon components exist for Pipeline Pilot, KNIME, and InforSense workflow managers. The pure .NET components are faster than the previous JNBridge. They are available for all capabilities except the cartridge and Marvin Bean classes.

Ákos Papp – Quicker, better, sketcher · (View the Presentation) · return to TOC
Ákos described tricks and tips for using MarvinSketch. For example, typing “1:” adds a single bond and “2:” adds a double bond, typing “O, N, S” adds the query atoms to the selected position, and “esc” selects the current position. The expanded template library provides drawn structures of many common molecular fragments. Tricks used in sketching a query include shortcuts for atom types, implicit hydrogen atoms, and R-group definitions. Link nodes and position variations of substituents are also supported in a query. Marvin now supports searching in ChemSpider and PubChem. There are many options for the format used when copying a structure: not only an OLE or MRV file, but also MDL or Daylight formats, InChI, or images. The Structure Checker is now in MarvinSketch. Version 5.4 will include enhanced double bond drawing; improvements in projections of 3D structures and 3D sketching; and the ability to change the stereochemistry of groups. It will also store the molecule source in images. Technical improvements include full 64-bit support.

Szabolcs Csepregi – Markush project: Introducing Markush DARC support · (View the Presentation) · return to TOC
Szabolcs described the accomplishments and challenges in the Markush DARC project. The objective is to provide a tool to store and query the information in Thomson Reuters’ Merged Markush Service. However, one can also use Markush structures to store a combinatorial chemistry database made either by reagent clipping or R-group decomposition. Markush functionality includes substructure searching, full or partial enumeration, calculating the number of compounds in the library, and scaffold alignment and coloring in the display of enumerated compounds. The hits can be visualized in a number of ways, including a reduced result that shows only the compounds that overlap the query. Supporting the DARC format presented special challenges. Up to 8 attachment points for groups can be specified, but especially challenging is the ability to embed attachment points or R-groups within other R-groups. It was also necessary to develop a system to designate which atom of a group is the attachment point. Although Marvin had the capability to use abbreviations in structures, the attachment points had to be correctly displayed. Homology groups such as alkyl or aryl are represented as a pseudo atom. The challenge was to devise long names that are descriptive enough for the user. Special code had to be developed to handle frequency variation within a Markush structure. This can involve link nodes of various lengths, repeating units, etc. The final challenge was to account for the variation in position of a bond. This was solved with a special S-group type or relocatable multicenter atom. Note that the UGM was followed by a morning devoted to the issues with Markush structures, which I did not attend.
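
To make the ideas of library size and full versus partial enumeration concrete, here is a minimal sketch in Python. It is not ChemAxon code and does not use any ChemAxon API. The R-group names and fragment labels are hypothetical toy data, and the sketch assumes a flat Markush structure with independent R-groups. Nested R-groups, homology groups, frequency variation, and position variation, all of which the talk identified as the hard parts, are deliberately left out.

```python
from itertools import product
from math import prod

# Hypothetical toy library: one scaffold with three independent R-group positions.
# The fragment labels are illustrative only and are not taken from the talk.
r_groups = {
    "R1": ["H", "CH3", "Cl"],
    "R2": ["OH", "NH2"],
    "R3": ["phenyl", "pyridyl", "thienyl", "furyl"],
}


def library_size(groups):
    """Count the enumerated compounds without generating them.

    For independent R-groups the count is the product of the number
    of alternatives at each position.
    """
    return prod(len(members) for members in groups.values())


def enumerate_library(groups, limit=None):
    """Yield one substituent assignment per compound.

    Passing `limit` stops early, which mimics a partial enumeration.
    """
    names = list(groups)
    combos = product(*(groups[name] for name in names))
    for index, combo in enumerate(combos):
        if limit is not None and index >= limit:
            return
        yield dict(zip(names, combo))


size = library_size(r_groups)                            # 3 * 2 * 4
first_five = list(enumerate_library(r_groups, limit=5))  # partial enumeration
full = list(enumerate_library(r_groups))                 # full enumeration
```

Counting by multiplication is what makes it practical to report the size of a library that is far too large to enumerate in full. Once R-groups are nested inside other R-groups, the count for a position becomes a sum over its members of the products of their own nested counts.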

Tim Dudgeon – Making development of chemistry based systems easier · (View the Presentation) · return to TOC
Tim described investigations into using modern persistence and web frameworks to build chemistry-based web applications. The advantage of using modern frameworks is that they can provide new approaches bringing type-safety, better support for refactoring, more reuse, extensibility, and testability to the software development process. It would also avoid the need for specialist “JChem developer” skills. Using such tools to handle chemical structures has traditionally been difficult because of the complexities of storing and searching chemical structures in a database. The result is little use of the productivity-enhancing frameworks that are commonly used in other areas. A prototype system using a chemical suppliers database was demonstrated that incorporated JChemBase and JChem Cartridge into the Hibernate persistence framework using the extension hooks that Hibernate provides. This allows the persistence and query aspects of molecules to be treated in the same manner as other data, and would significantly simplify the generation of chemistry-based applications and reduce the amount of specialist knowledge required for programming. This effort is currently at an early stage of development. Components being developed include an engine architecture, a Groovy-JChem library, and Web components such as a JSP taglib, GSP, and Wicket, as well as a sketcher, structure display, and search options. They will be used internally and may become available as an extension to JChem.

Tim Dudgeon – Instant JChem – Chemistry on your laptop · (View the Presentation) · return to TOC
Tim presented an overview of the current and future features of the Instant JChem desktop application developed to be used by any chemist or biologist. It includes a report designer, database management, and chemical search and property predictions. The current system provides improvements in database features, list management, printing, and the structure renderer. In addition, Reactor has been integrated and Markush DARC files can be imported. Version 5.4 includes more form widgets and a visualization module with chart widgets for histograms and scatter plots. It also provides support for training the logP/D and pKa calculators with user data. Somewhat later, widgets will be available for line, X-Y, radar, pie, and Tukey plots and curve fitting. It is also planned to add more database features and visualization tools, improved cherry picking and browser capabilities, and additional molecular descriptors such as ECFP, BCUT, and pharmacophore. Case studies illustrated using IJC to report from Activity Base and supplying Markush/Patent databases to customers. The Thomson Reuters Markush data includes VMN files of the Markush structures, XML files of patent data, and other files such as images and PDFs. The VMN structures were directly imported into JChem and the XML files were loaded into additional database tables using a custom Groovy script.

Tamás Pelcz – Presentations on JChem for Excel · (View the Drug Discovery in Microsoft Excel Presentation) or (View the Integrating with and Extending JChem for Excel Presentation) · return to TOC
Tamás provided two talks on JChem for Excel. He first listed the new features in 5.3.x: inclusion of Marvin .NET, more settings for structure drawing, the option to display the Bemis structural framework, the ability to use Standardizer to clean up a structure and Reactor to generate new molecules, and the capability of copy-paste operations with Marvin, other editors, and other Office applications. For example, plug-ins are integrated with the display of chemical structures to label such atomic features as orbital electronegativity, pKa, or atomic contribution to log P. In coming releases users will be able to use third party editors such as ChemDraw®, ISIS Draw®, or Symyx Draw® and third party cartridges such as Symyx and IDBS. Future developments will enhance importing from databases and web services, integrate SharePoint, and provide R-group decomposition and Markush structure enumeration. In his second talk he reviewed methods for developers to integrate with and extend JChem for Excel. The JChem for Excel API supports existing Excel Add-Ins in VBA, VSTO, or COM. Functions to read and write JChem for Excel specific workbooks can be written without Excel using the Apache POI library.

Szabolcs Csepregi – Methods for tautomer enumeration, -searching and -duplicate filtering · (View the Presentation) · return to TOC
Szabolcs presented a summary of the ChemAxon tools for handling tautomers. The tautomerization calculator plug-in is the basis of most methods. It can identify tautomerizable regions, enumerate all or dominant tautomers, and predict the distribution of dominant tautomers. Furthermore, it can provide generic and canonical tautomers that are used by other methods. It first identifies possible proton donors and acceptors and finds the tautomerization paths between them. Depending on the desired operation, it then combines the paths into regions (generic tautomer), combinatorially enumerates all possible tautomeric forms (all tautomers), filters and ranks enumerated structures based on pKa and other criteria (dominant tautomers), or canonicalizes using empirical rules (canonical tautomer). The tautomerization plug-in is also used to improve results of other calculations, such as macro pKa and log P. The tautomer duplicate search uses canonical tautomers combined with a hash key. This method provides fast filtering of tautomers in chemical database tables and handles tautomeric migration of hydrogen isotopes and interactions with stereochemistry. A tautomer substructure search enumerates tautomers of the query and searches each of them separately. Standardizer, the tool for performing custom and built-in transformations on molecules, generates a canonical tautomer so that database and query structures are automatically transformed by the specified transformations. Custom Standardizer transformations also allow handling of ring-chain tautomerism, and development is underway to support ring-chain tautomerism directly.
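
The duplicate search described above (canonical tautomer plus hash key) can be illustrated with a short Python sketch. This is only an illustration of the general idea, not the ChemAxon implementation. The lookup table `CANONICAL_FORM` is a hypothetical, hand-written stand-in for a real canonical-tautomer generator, the choice of which form counts as canonical is arbitrary, and SHA-1 is simply an assumed choice of hash. The sketch also assumes each structure arrives as a consistently written SMILES string.

```python
import hashlib
from collections import defaultdict

# Hypothetical stand-in for a canonical-tautomer generator: a hand-written
# lookup that maps each toy SMILES to one agreed representative form.
CANONICAL_FORM = {
    "Oc1ccccn1": "O=C1C=CC=CN1",     # 2-hydroxypyridine -> 2-pyridone
    "O=C1C=CC=CN1": "O=C1C=CC=CN1",  # 2-pyridone (already canonical here)
    "CC(O)=C": "CC(=O)C",            # propen-2-ol (enol) -> acetone
    "CC(=O)C": "CC(=O)C",            # acetone (already canonical here)
}


def tautomer_key(smiles):
    """Return a hash key that is identical for all tautomers of a structure."""
    canonical = CANONICAL_FORM.get(smiles, smiles)  # unknown input maps to itself
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def group_tautomer_duplicates(structures):
    """Bucket structures by tautomer key and keep buckets with more than one entry."""
    buckets = defaultdict(list)
    for smiles in structures:
        buckets[tautomer_key(smiles)].append(smiles)
    return [group for group in buckets.values() if len(group) > 1]


table = ["Oc1ccccn1", "CC(=O)C", "c1ccccc1", "O=C1C=CC=CN1", "CC(O)=C"]
duplicates = group_tautomer_duplicates(table)
```

Because every structure is reduced to one key, finding duplicates takes a single pass over the table with an equality lookup per row rather than pairwise comparison, which is why this approach suits fast filtering of database tables.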

Daniel Bonniot & Alex Allardyce – Naming and Chemicalize.org · (View the Presentation) · return to TOC
Daniel and Alex presented advances in naming tools and chemicalize.org. Structure to Name now includes support for isotopes. The major effort has been on Name to Structure. With Version 5.4.0 the error rate for converting systematic names to structure is approximately 2.5%, a three-fold improvement from 5.3.8. The error in converting mixed names to structures is also reduced by half in Version 5.4.0. Document to Structure extracts names from text documents and, in Version 5.4, from PDF documents. chemicalize.org is a free public service to add chemical structures to web pages and to display predicted properties for these structures. It uses Name to Structure, JChem, Marvin, Calculator Plugins, and JChem Web Services. To access the functionality, the user pastes the URL of interest into the chemicalize.org web page. The application also keeps a database of the URLs accessed and the structures found in them. For a given structure, these other pages as well as calculations can then be displayed as a data page. ChemSpider and Linguamatics I2E link out to the data page and there are WordPress and Safari browser plug-ins. Developments envisioned for chemicalize include sorting URLs for relevance, personalization, more browser and media plug-ins, and HTML 5.

Miklós Vargyas – Drug Discovery in the Sandbox · (View the Presentation) · return to TOC
Miklós introduced the ChemAxon Drug Discovery Sandbox, which aims to accelerate product development relevant to scientists’ needs. It allows users to test new applications without waiting for a formal release. Their experiences and comments help guide development, so that the outcome is what the user expects and there are fewer bugs in the release version. In one example, the generation of 3D structures of 737 molecules with MMFF, the speed increased 20-fold over eight revisions and the accuracy improved from 0.11 Å to 0.002 Å. Capabilities developed this way include van der Waals volume, minimal projection area, length perpendicular to the maximum area, 3D screening, and a view of chemical clustering.



In summary, it was an intensive, fact-filled meeting. The enthusiasm of users and partners for the software and of the ChemAxon developers bodes well for the future of the company. Development continues quickly, both in the existing products and in new areas. ChemAxon is in upbeat mode (as always) and well they might be. The business is expanding and their standing within the community is certainly at a more confident level than two or three years ago. This is a tribute to the hard work of a dedicated team of cheminformaniacs. I don’t expect them to slow down any time soon.



Boston & the Event Hotel

Training Day

Evening of 13th of September: Yankee cuisine in Durgin Park with steaks and seafood

Sessions of the User Group Meeting

Grand Dinner in the Harvard Faculty Club

Social program on the evening of 15th September in the Bell in Hand tavern

The morning of the Markush structures and enumeration – Markush Forum