2010 European User Group Meeting Archive

18th May Training Day, 19-20th UGM, 21st Markush Forum. Budapest, Hungary

Partner Session

Posters

Meeting Report

The ChemAxon Double Bond Show - ChemAxon User Group Meeting, Budapest May 19-20, 2010

Lead-in
ELNs and ChemAxon
Chemical catalogues
More from Evotec
GSK and SOA
SkunkWerks and Web Services
SharePoint
Partners and Posters
Markush
MedChem and more
Enhancements to ChemAxon software
Conclusion
UGM archive: all presentations

Lead-in
I take my title from this year’s T-shirt. ChemAxon, believe it or not, has a band of some repute and it entertained the captive audience (a record number of 70 non-ChemAxon attendees this year) on the conference outing: a boat trip on the Danube. I have to admit that I am not the right person to do a critique of this type of music. Put it down to my age if you like, but I’ll stick to reviewing the technical programme.
That said, I’m also going to skip quickly over the first two presentations concerning tautomers because:

  • the first was my own and the work has been published (Warr, W. A. Tautomerism in chemical information management systems. J. Comput.-Aided Mol. Des. 2010, 24(6), 497-520)
  • ChemAxon’s approach is well established and well known. Szabolcs Csepregi described it in the talk that followed my own.
Let’s turn to other topics.

ELNs and ChemAxon
Ian Berry of Evotec said that his company chose the Contur ELN because it has a simple, intuitive, flexible user interface for chemists and biologists and it is integrated with ChemAxon software. Evotec has linked Contur ELN with Evotec in-house systems: four applications are linked together, three of them relying on ChemAxon technology. The Contur ELN is a rich client application using JChem Cartridge, MarvinSketch, Reactor and JChem tools. EVOuser is a Java Web application that Evotec chose to integrate with the ELN instead of using Contur’s admin console to set up projects, users and roles. EVOsource, the compound sourcing and ordering solution, is a Java Web application using JChem Cartridge, the JChem API and the Marvin Applet. It links directly to Evotec’s ERP system. Searches of EVOsource can be done directly from the ELN: results are obtained in less than five seconds; exact searching takes only one second. Integration of the Corporate Chemical Database registration system (CCD) with Contur ELN is a work in progress. Bob Marmon told us more about EVOsource.

Chemical catalogues
Catalogue data comes in a variety of types, usable (e.g., SDfiles) and unusable (e.g., PDF). Evotec combines the data in Instant JChem and loads it via an XML file, said Bob Marmon of Evotec. Structures are cleaned with Standardizer; salts are detected and stored so that both compound information and “de-salted” structures can be retrieved. Evotec currently produces IUPAC names for all structures using ACD/Labs software but soon ChemAxon Structure to Name will be used instead. Bob discussed some of the errors that occur and how they can be corrected. More challenging data such as PDF catalogues, polymers, mixtures, etc. cannot be loaded. Life would be so much easier if only suppliers would supply catalogues as SDfiles, keep to a consistent layout between different catalogues, keep the same layout between issues of catalogues, follow guidelines for structure depiction, put data in separate fields, consistently separate items in fields, and check the data for consistency.
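
The idea of detecting salts and keeping both the counter-ions and the “de-salted” structure can be illustrated with a minimal sketch. This is not ChemAxon Standardizer and not Evotec’s code: it assumes the catalogue structure is available as a SMILES string, treats the fragment with the most atoms as the parent, and uses hypothetical function names. A real standardisation tool applies far more careful rules.

```python
import re

# One atom token in a SMILES string: a bracket atom, a two-letter halogen,
# or a single-letter organic-subset atom (aliphatic or aromatic).
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def count_atoms(fragment: str) -> int:
    """Crude atom count for one dot-separated SMILES fragment."""
    return len(ATOM_TOKEN.findall(fragment))


def split_salt(smiles: str) -> tuple[str, list[str]]:
    """Return (parent, counter_ions) for a possibly multi-component SMILES.

    Assumption for illustration only: the largest fragment is the parent.
    """
    fragments = [f for f in smiles.split(".") if f]
    if not fragments:
        raise ValueError("empty SMILES")
    parent = max(fragments, key=count_atoms)
    counter_ions = list(fragments)
    counter_ions.remove(parent)  # keep the rest so the salt form is not lost
    return parent, counter_ions


if __name__ == "__main__":
    for record in ("CC(=O)[O-].[Na+]", "Cl.CCN(CC)CC"):
        print(record, "->", split_salt(record))
```

Storing both return values alongside the supplier record is what allows either the original salt form or the parent structure to be retrieved later.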

Pyxis Discovery also has a chemical catalogue application, described by Anna Rzepiela. Chemonaut is an Oracle/JChem Cartridge database of screening compounds and building blocks from about 50 reliable suppliers. Pyxis Discovery has tackled some of the same problems that Evotec did in standardising structures and data but Anna also discussed leadlike filtering, using both ChemAxon tools and in-house developed algorithms. Pyxis has constructed more than 80 structural filters to remove the false positives, the covalent-acting electrophiles, the “suicide inhibitors”, the “magic bullets”, and the “warheads”, not to mention the dyes, fertilisers, photographic chemicals, and so on. Plans are in hand to enumerate tautomers, protomers, and conformers.

More from Evotec
In another talk Ian Berry of Evotec described the use of Instant JChem (IJC) to make a centralised database to replace multiple ISIS project databases. Oracle 10g and JChem Cartridge are mounted on a dedicated server. There is one Oracle schema per drug discovery project and a project IJC schema is connected to each Oracle schema. Built-in IJC security and roles are used for user management. The system is deployed via web server URLs. Integrating Oracle views has proved to be useful for calculations across multiple tables and for viewing dynamic URLs when the related table needs to be updated. Ian’s kludge to do IJC imports into table A and get the dynamically linked images of concentration response curves from table B will not be necessary once there are graphing tools in IJC.
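
The use of database views for calculations across tables and for dynamic links can be sketched as follows. This is an illustration only: it uses SQLite from the Python standard library in place of Oracle and JChem Cartridge, and the table names, column names and URL prefix are all hypothetical rather than taken from Evotec’s schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE compound (
    compound_id TEXT PRIMARY KEY,
    smiles      TEXT NOT NULL
);
CREATE TABLE assay_result (
    compound_id TEXT NOT NULL REFERENCES compound(compound_id),
    ic50_nm     REAL NOT NULL,
    curve_file  TEXT NOT NULL
);
-- One row per result, with the image link built at query time.
CREATE VIEW result_with_curve AS
SELECT r.compound_id, c.smiles, r.ic50_nm,
       'https://example.invalid/curves/' || r.curve_file AS curve_url
FROM assay_result r
JOIN compound c ON c.compound_id = r.compound_id;
-- One row per compound, with values calculated across both tables.
CREATE VIEW compound_summary AS
SELECT c.compound_id,
       COUNT(r.ic50_nm) AS n_results,
       AVG(r.ic50_nm)   AS mean_ic50_nm
FROM compound c
LEFT JOIN assay_result r ON r.compound_id = c.compound_id
GROUP BY c.compound_id;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one compound and two results."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO compound VALUES (?, ?)", ("CPD-1", "CCO"))
    conn.executemany(
        "INSERT INTO assay_result VALUES (?, ?, ?)",
        [("CPD-1", 10.0, "cpd1_run1.png"), ("CPD-1", 30.0, "cpd1_run2.png")],
    )
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    print(db.execute("SELECT * FROM compound_summary").fetchall())
    print(db.execute("SELECT compound_id, curve_url FROM result_with_curve").fetchall())
```

Because a view is evaluated when it is queried, new rows in the results table show up in the summary and in the links without any re-import, which is the property that makes views attractive here.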

Evotec is also constantly enhancing its screening library of over 250,000 small molecules with new chemotypes. Michael Mazanetz’s poster described how the company is integrating a KNIME workflow into the library enhancement programme, using the JChem Extensions set of KNIME nodes. The company also provides access to a library of 30,000 fragments for screening using its fragment-based drug discovery platform, EVOlution.

GSK and SOA
Shane Weaver said that GSK has spent years establishing a Service Oriented Architecture (SOA) environment and SOA is now proving very useful as GSK goes through an upheaval reducing the number of (a) different technologies used in the environment, (b) software architecture components, and (c) vendor software solutions, all with the aim of simplifying systems and reducing costs. Currently Vendor 1 generates good molfile coordinates from SMILES, Vendor 2 supplies a commercial chemistry toolkit and Oracle cartridge integration, Vendor 3 generates the best quality molfile to SMARTS conversions, and GSK fills in the gaps with its own algorithms. (OK, Shane, we know who these guys are, but since you can’t name names we won’t point fingers at anyone either.) In future, away with all these vendors: ChemAxon will supply SMILES to molfile, molfile to SMARTS, SMILES canonicalization, SMARTS matching, calculation of “mainstream” properties, calculations from a toolkit, and Oracle chemistry cartridge solutions, and GSK will fill in fewer gaps with its own algorithms (for business rules and legacy information). The advantage of SOA is that the changes can be done with minimal impact on the business, and total cost of ownership is reduced. Shane briefly outlined one case study: structure rendering. Web applications require a method to display images of structures but a chemistry toolkit on every desktop is slow and difficult to support. The solution was to provide rendering as a service through SOAP and HTTP APIs.
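
To make the rendering case study concrete, here is a minimal sketch of how a thin client might ask a central HTTP service for a structure image instead of running a toolkit locally. The endpoint, the parameter names and the function name are all hypothetical; GSK’s actual service interface was not described in this detail.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_render_url(base_url: str, smiles: str,
                     width: int = 200, height: int = 150) -> str:
    """Build the URL a web page could use as the source of an <img> tag.

    All parameter names ("structure", "w", "h", "format") are assumptions
    made for this illustration.
    """
    query = urlencode({
        "structure": smiles,  # characters such as '#', '=', '+' get escaped
        "w": width,
        "h": height,
        "format": "png",
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    url = build_render_url("https://chem.example.invalid/render",
                           "CC(=O)Oc1ccccc1C(=O)O")
    print(url)
    # Round trip: the service side can recover the original SMILES.
    print(parse_qs(urlsplit(url).query)["structure"][0])
```

The design point is that only the service needs a chemistry toolkit: the client needs nothing beyond the ability to request an image, so the toolkit behind the service can be swapped without touching the applications.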

SkunkWerks and Web Services
Julio Carneiro is from SkunkWerks, a software house that writes Web applications. A client using ChemOffice found that ChemOffice did not integrate well with a high throughput screening system. ChemAxon software was chosen as an alternative because of its excellent technical support and open architecture. Julio talked about the user interface that SkunkWerks added. An existing application, which already implemented Marvin, was based on Adobe Flex, a Rich Internet Application (RIA) framework that is easy to interface with Web Services. SkunkWerks used JChem’s Web Services to communicate with JChem Base to integrate substructure and similarity search into the application. Another use of JChem Web Services was a system for adding chemicals to the database, calculating SMARTS, molecular formulae and weights, and IUPAC names, and generating structure images.

SharePoint
Luke Bullard from Pfizer gave a remote presentation about OnePoint, Pfizer’s project team collaboration environment combining Microsoft’s desktop productivity application OneNote and Web based collaboration solution SharePoint. A Microsoft case study and video about OnePoint can be viewed on this webpage. OnePoint is specifically for project teams: a corporate wiki is used for sharing knowledge enterprise-wide, and a document management system for even wider dissemination.

Unfortunately there is no simple way to use a chemical structure data type in SharePoint, and chemists in a project team communicate in terms of structures. If structures could be incorporated they would need to be searchable. It would also be nice to have calculated properties available in lists plus intra-list structural filtering. Unfortunately SharePoint search does not provide for complex data type queries and although structures can be embedded within documents in document libraries, lists, blogs and wikis they cannot be surfaced. Luke has no desire to replicate Instant JChem but live structures are a necessity and search becomes much more important as project teams embrace using SharePoint. After Luke’s talk Richard Bolton of GSK led a lively discussion about integration of SharePoint with JChem. Indeed, the subject provoked sufficient interest for an impromptu discussion session to be set up early the following morning.

Partners and Posters
The second morning proper began with brief presentations from some ChemAxon partners: Agilent, BioChemFusion, DeltaSoft, InforSense (now part of IDBS), Infocom, KineMatik, KNIME, Linguamatics and The Edge Consultancy. BioChemFusion also presented a poster. István Bágyi and József Kovács had a poster about integration of ChemAxon software into the SZTAKI Desktop Grid based CancerGrid System. Ákos Tarcsay and György Balogh of Richter Gedeon have compared pKa predictors on an in-house data set, and (not surprisingly) found that the ChemAxon algorithm worked best.

Markush
Tim Miller of Thomson Reuters reminded us that the company has been indexing chemistry in patents for Derwent World Patents Index since before the development of commercial structure search engines. He took us from the early days, through the development of Markush search engines such as MARPAT and Markush DARC and on (after a gap of nearly 20 years) to the possibilities of the new millennium. Text mining software has been developed by TEMIS, IBM, ReelTwo, and others; and name-to-structure algorithms by ACD/Labs, CambridgeSoft, and ChemAxon inter alia. Optical character recognition of chemical structures has been carried out with CLiDE, Kekulé, and chemoCR, and new products and services such as SureChem and Elsevier’s Reaxys have appeared. A new generation of Markush tools is being touted by ChemAxon, DecrIPt, Digital Chemistry, and Symyx.

Recent developments are re-invigorating the Markush space, opening up the potential of “end-user” Markush searching and the development of new tools for understanding and using the “patent space” especially in drug development. One can envisage medicinal chemists doing quick searches to establish where best to focus their efforts; and carrying out overlap and difference analysis to find holes in an intellectual property (IP) portfolio or IP screening of combinatorial libraries. It sounds exciting but there are challenges.

The first is the sheer volume of data: Thomson Reuters has to employ around 90 indexers to keep up with it and some patents are a nightmare to index. The second challenge is connecting up information that appears in different places in the patent (e.g., provisos (“we do not claim these things”), and biological information); and matching specifically disclosed compounds. The third is what Tim called “making sense of Markush” with facilities such as ranking hits based on “nearness to the core of the invention”. Here JChem’s selective enumeration is useful. Can we also visualise the patent landscape in some way?

A new generation of Markush services could be in the offing. Future versions of ChemAxon software could be used for refining and visualising patent search hits, for white space analysis, for patent busting, for Markush structure curation, and for in-house storage of small Markush databases. Szabolcs Csepregi gave a presentation on support for the Markush DARC format, the latest enhancement in ChemAxon’s Markush project. This gives compatibility with Thomson Reuters MMS patent Markush database; a test set is available. Other enhancements are handling of multiple attachment points, new homology definitions obtained from statistical analysis of a large chemical database, and an Instant JChem batch converter of .vmn to .mrv format.

Markush structures can be made by reagent clipping (with Reactor) or by R-group decomposition. György Pirok gave a separate presentation on Reactor, with interesting reaction examples where rules can be applied to allow for reactivity, selectivity, and exclusion (unexpected breaking of a 3-membered ring) and so on. Reactor is now integrated with Instant JChem and JChem for Excel. Future requirements for Markush structures were discussed in much more detail at a half-day satellite meeting, organised by both Thomson Reuters and ChemAxon, on the day after the user meeting.

MedChem and more
And now for something completely different. Steve Muskal of Eidogen-Sertanty presented a major enhancement to his company’s iKinase iPhone application: structure-based searching of Eidogen-Sertanty’s Kinase Knowledgebase (KKB) on iPhone/iPod/iPad devices using ChemAxon structure-search technology. iKinasePro is an iPad App in the cloud. Steve also mentioned Kinasedata.com, a Web portal encouraging kinase-related discussion and community building.

There were two talks that could be considered under the heading of computational chemistry. Both fragment counts and pharmacophore pattern fingerprints are classical, well-established molecular descriptors, thought to be complementary. Dragos Horváth of CNRS has been exploring hybrid intermediates of these two extreme classes of descriptors, such as pharmacophore fragments: substructures of the pharmacophore feature-coloured molecular graph, having atom symbols replaced by their types. He has used the ChemAxon microspecies calculator to devise a pH-dependent flagging scheme and the ChemAxon charge plugin to colour atoms by their partial charge. In his various experiments pharmacophore-coloured tree descriptors seemed to be the most versatile. Various symbol- and pharmacophore-coloured augmented atoms, sequences and pairs were also quite successful in QSAR and reasonably steady in neighbourhood behaviour (NB) tests. Electrostatic potential-coloured descriptors failed in QSARs, but some were useful NB monitors. This may be down to high dimensionality.
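
A toy example may help to show what a pharmacophore-coloured pair descriptor looks like. The sketch below is not Dragos Horváth’s implementation: the molecule is given as a hand-written adjacency list, the pharmacophore types are assigned by hand (in the talk they came from ChemAxon calculations), and all names are hypothetical. Each descriptor element counts pairs of atom types at a given topological distance.

```python
from collections import Counter, deque


def topological_distances(adjacency: dict, start: int) -> dict:
    """Breadth-first search: bond-count distance from start to each atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def coloured_pair_counts(adjacency: dict, colours: dict) -> Counter:
    """Count (type, type, distance) triples over all connected atom pairs."""
    counts = Counter()
    atoms = sorted(adjacency)
    for i in atoms:
        dist = topological_distances(adjacency, i)
        for j in atoms:
            if j > i and j in dist:
                a, b = sorted((colours[i], colours[j]))
                counts[(a, b, dist[j])] += 1
    return counts


if __name__ == "__main__":
    # Ethanolamine heavy atoms as a chain N-C-C-O (atoms 0..3).
    adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    # Hand-assigned types: D = donor, H = hydrophobic, A = acceptor.
    colours = {0: "D", 1: "H", 2: "H", 3: "A"}
    for key, n in sorted(coloured_pair_counts(adjacency, colours).items()):
        print(key, n)
```

Replacing the type labels with element symbols gives the corresponding symbol-coloured pair counts, which is the sense in which the two descriptor families sit at opposite ends of one scheme.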

The use of the Tanimoto coefficient with 2D fingerprints is surprisingly effective as a simple robust technique for selecting focused libraries from multi-million compound libraries. György Dormán and his colleagues at TargetEx defined their Reference Space by collecting known PDE inhibitors (“seeds”) from available literature and databases, and its property space was determined by applying the Lipinski and Veber parameters using Instant JChem. Some seeds are overrepresented; the number of virtual hits coming from those seeds can be reduced. György finally proposed an application of JKlustor/LibMCS for obtaining an optimal distribution of chemotypes. Incidentally, JKlustor now features sphere exclusion and k-means based clustering, plus clustering on frameworks such as those of Bemis and Murcko.
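
For readers unfamiliar with the technique, the Tanimoto coefficient of two fingerprints with on-bit sets A and B is |A ∩ B| / |A ∪ B|. The sketch below shows seed-based selection and a simple sphere exclusion pass in plain Python. It is an illustration under stated assumptions, not the TargetEx workflow or the JKlustor implementation: fingerprints are represented as sets of on-bit positions, and the function names and thresholds are hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def select_focused(library: dict, seeds: dict, threshold: float = 0.7) -> list:
    """Keep library compounds whose best similarity to any seed passes."""
    hits = []
    for compound_id, fp in library.items():
        best = max((tanimoto(fp, s) for s in seeds.values()), default=0.0)
        if best >= threshold:
            hits.append((compound_id, best))
    return sorted(hits, key=lambda h: (-h[1], h[0]))


def sphere_exclusion(fps: dict, threshold: float = 0.7) -> list:
    """Pick diverse representatives: take the next remaining compound as a
    centre, drop everything at least `threshold` similar to it, repeat."""
    remaining = list(fps)
    centres = []
    while remaining:
        centre = remaining.pop(0)
        centres.append(centre)
        remaining = [c for c in remaining
                     if tanimoto(fps[centre], fps[c]) < threshold]
    return centres


if __name__ == "__main__":
    seeds = {"seed1": frozenset({1, 2, 3, 4})}
    library = {
        "a": frozenset({1, 2, 3, 4, 5}),
        "b": frozenset({1, 2, 9}),
        "c": frozenset({1, 2, 3}),
    }
    print(select_focused(library, seeds))
    print(sphere_exclusion(library, threshold=0.55))
```

Running sphere exclusion on the hits rather than on the whole library is one simple way to stop an overrepresented seed from dominating the selection.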

Enhancements to ChemAxon software
So much for the user presentations. When it comes to the ChemAxon presentations it is hard to know which aspects to single out: so much is going on and so many enhancements are constantly being released. In an overview Nóra Lapusnyik gave her selection of what’s hot: JChem for SharePoint, .NET implementations, and JChem Web Services programming interfaces (Web services plus AJAX, a Web services API for relational tables, and pre-regeneration of calculated fields at upgrade). To that I would add Markush searching (see above) which now features homology groups (alkyl, heteroaryl, etc.), multiple attachment points, Instant JChem integration, and VMN file import.

Tamás Pelcz talked about the integration of JChem functionality in SharePoint. The first product will be JChem for SharePoint and the second JChem Search for SharePoint. It will be possible to handle and filter structures in lists and do calculations on structures in lists, add and edit structures in SharePoint blogs, wikis, and discussion boards, and carry out hybrid structure and text search. Crawling and indexing structures in SharePoint sites, file systems, emails etc. will present interesting challenges for hit ranking and relevance. A demonstration Web site URL is available.

Tímea Polgár demonstrated Infocom’s JChem KNIME nodes and Szilárd Dóránt the latest Pipeline Pilot components. Jonathan Lee (Interfacing the JChem suite outside of Java) and Tim Dudgeon are doing all sorts of good geeky stuff, the latter on new languages and frameworks on the Java virtual machine. Alex Allardyce drew attention to the new chemicalize.org page. Tamás Pelcz gave a couple of presentations on JChem for Excel (Drug Discovery in Microsoft Excel; Integrating with and extending JChem for Excel). This product now supports some third party editors (ChemDraw, ISIS Draw, and Symyx Draw) and third party data cartridges from IDBS, Symyx and CambridgeSoft. Marvin OLE in Marvin .NET no longer requires Java. R-group decomposition can be run on the same sheet with structures, by adding new columns.

Ákos Papp described what’s new in Marvin and demonstrated some “quickies” for drawing Markush structures. Marvin 5.3 has a .NET version, new Template Library handling, structure recognition by OSRA, import of VMN files, import and export of CDX and SKC files, and sprout drawing (attaching templates and abbreviated groups by automatically inserting a bond). The quality of Marvin objects embedded in Microsoft Office documents has been enhanced. Marvin 5.3 also has a Structure Checker add-on: a tool for detecting molecule features or drawing errors. This is a new tool under the Standardizer Pro licence, available from Chemical Terms. It has separate checkers and fixers for invalid structures. The interactive single molecule mode is free in Marvin; there will soon be a GUI and API in Standardizer Pro for checking structures in batch mode. György Pirok and Attila Szabó revealed the new Structure Checker framework. Szabolcs Csepregi covered new things in JChem Cartridge, and Tim Dudgeon those in Instant JChem, including row level security, URL fields (link to images and other files), a binary field type for storing small images and binary data, and Markush support including VMN import.

Miklós Vargyas gave a talk on tools for similarity calculations but more exciting was his electronic poster board with lovely visualisations. ChemAxon may be thought of as a 2D company but there is actually a 3D world underneath: flexible 3D alignment, molecular mechanics, and even molecular dynamics. I have suggested to Miklós and his bosses that some of this stuff should be published.



Conclusion
The cognoscenti will have noticed that I did not start this article with Alex Drijver’s opening address. That’s because I thought it would make a nice conclusion. He said that ChemAxon is a dynamic, innovative, growing company. Even though the market is shrinking, ChemAxon saw 60% growth in revenues last year, after 30% the year before. The company had seven record months last year and revenues are likely to be 50% up this year. Seventy percent of revenues are ploughed back into R&D. ChemAxon is light on organisation and much of its development is done in partnership. All this means that total cost of ownership for customers is lower than for any other commercial provider. Open source solutions might seem cheaper but they are free only if you don’t value your time. You may have uncertainties about the competition (its name and its future platform) but you can always pronounce ChemAxon and ChemAxon will allow you to migrate.

That’s a positive ending in itself but some customer comments are probably even more significant. GSK is consolidating on ChemAxon as the key vendor not because ChemAxon is the best or fastest at everything but because ChemAxon software overall is the best. ChemAxon’s continuous development is also liked, and user perception of performance has been acceptable. The ChemAxon cartridge is a thin layer on the database, unlike the previous GSK cartridge that was replaced. A vote of confidence from a company as large as GSK means more than any words from ChemAxon itself. What more can I myself add?


Gallery

Event Hotel

Training Day

Evening of 18th of May: 1-2-1 Session & the Unforgettable Rainy Garden Party

Sessions of the User Group Meeting

Grand dinner and frenetic Double Bond concert on the Gróf Széchenyi Boat

Social program on the evening of 20th May in the Castro’s Bistro