ChemAxon 2007 User Group Meeting
June 13th and 14th, 2007 Budapest, Hungary
This was, in more than one respect, the hottest meeting I have attended in many a year. Unfortunately, the Hotel Gellért does not have air conditioning. When I heard that Patcore Inc., ChemAxon’s Japanese distributors, were to supply fans, I had blissful visions of great noisy things blowing oodles of cold air, but Patcore’s personal air conditioning system was much less sophisticated:
Mike Dippolito and Yvonne Shimshock of DeltaSoft (United States), pictured above, were just two of the truly international, hot audience: there were 57 attendees from Australia, Austria, France, Germany, Hungary, India, Japan, Latvia, Malaysia, Sweden, Switzerland, the United Kingdom, the United States, and Vietnam. ChemAxon personnel and associated staff added another 43 heads.
Partners presenting and exhibiting included Chemistry Logic, DeltaSoft (http://www.deltasoftinc.com), Genedata (http://www.genedata.com), Jubilant Biosys (http://www.jubilantbiosys.com/leadact.htm), KLEE Group (http://www.klee.fr/homeenglish.htm), Synaptic Science (http://www.synapticscience.com/products.html), The Edge Software Consultancy (http://www.edgesoftwareconsultancy.com; http://www.biorails.com), and Virtua Drug (http://www.virtuadrug.com).
Poster topics were clustering 5-HT4 ligands; 3D structure generation; evaluating compounds for the Biopharmaceutics Classification System (BCS) in ADME; collaborative drug discovery; compound database systems; pKa calculation (ChemAxon) and MCS-based hierarchical clustering (ChemAxon). In short, a great variety of applications is based on ChemAxon tools. One of my take-home messages from this meeting was “science”. Few academics attend MDL meetings, alas, but ChemAxon meetings attract them. At an MDL User meeting one will learn a lot about multi-tier architectures, chemical registration systems, logistics, and databases, but one does not hear much about force fields, solvation parameters and flexible docking. The two companies may compete in some respects but they have different core competencies. Nor is ChemAxon the new Daylight. ChemAxon may be a toolkit provider, as Daylight once was, but it has a different niche and its user meeting has a unique ambiance.
David Spender opened the meeting with a company and product overview. ChemAxon has more than 200 corporate clients, and almost 1000 academic users. (All ChemAxon products are free to academics.) It now has more than 35 staff, 31 of them with higher degrees. I won’t go into detail here about the product range; interested readers can find more detail on the Web:
MarvinSketch and MarvinView: MarvinSketch_View.ppt
Calculator plugins: Calculator_Plugins.ppt
Structural Search: Structural_Search.ppt
JChem Base: JChem_Base.ppt
Instant JChem: Instant_JChem.ppt
JChem Cartridge: JChem_Cartridge.ppt
ChemAxon also has a very active forum at http://www.chemaxon.com/forum. User meeting attendees can privately view some of Alex Allardyce’s excellent meeting photography there; the PowerPoint presentations are free for all to view at http://www.chemaxon.com/UGM/07/program.html.
Andrew Lemon has produced an interesting white paper for the Web site (http://www.chemaxon.com/conf/Migrating_chemical_information_to_new_architectures.pdf) entitled “Migrating chemical information to new architectures”. Like many people in IT, he seems to think that “migrate” is a transitive verb, but what can we expect of an industry that thinks it can “leverage” things? At the meeting, Andrew made some useful points about ChemAxon’s strengths: the company is very good on issue tracking and help (supplying patches and getting you up and running again); it is strong on focused chemistry components for use in modern architectures; and it is strong on adding chemistry to standard application architectures. “They don’t force you into proprietary routes.”
György Pirok of ChemAxon next described Chemical Terms, a simple but extensible language to combine chemical functions for various cheminformatics purposes. It is a way to add more chemical intelligence to software programs and a general interface for chemists to customise cheminformatics applications. A paper on the use of Chemical Terms in reaction rules has been published (Pirok, G.; Mate, N.; Varga, J.; Szegezdi, J.; Vargyas, M.; Dorant, S.; Csizmadia, F. Making “Real” Molecules in Virtual Space. J. Chem. Inf. Model. 2006, 46, 563-568). The syntax of the Chemical Terms language (http://www.chemaxon.com/jchem/doc/user/EvaluatorTables.html) is designed to be parsable by computer software programs as well as understandable by chemists. Its function list contains arithmetic and logic operators, substructure matching and similarity functions, and many property calculations (pKa, logP, logD, partial charge distribution, Hückel localisation energy, etc.). The available functions are integrated via an open plugin system. Apart from reaction rules, user defined chemical expressions can be used in other applications, such as pharmacophore screening, chemical searching, evolutionary drug design, or QSAR.
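The general idea of combining property calculations with arithmetic and logic operators through a plugin system can be illustrated with a minimal Python sketch. This is not Chemical Terms syntax and it calls no ChemAxon API: the registry, the function names (`mass`, `logp`, `donors`), the `drug_like` rule and the property values are all hypothetical, and the "molecules" are plain dictionaries of made-up, precomputed numbers.

```python
# Toy plugin registry. NOT ChemAxon's Chemical Terms; purely illustrative.
from typing import Callable, Dict

Molecule = Dict[str, float]  # hypothetical stand-in: precomputed properties
PLUGINS: Dict[str, Callable[[Molecule], float]] = {}


def plugin(name: str):
    """Register a property function under a name (hypothetical mechanism)."""
    def register(func: Callable[[Molecule], float]):
        PLUGINS[name] = func
        return func
    return register


@plugin("mass")
def mass(mol: Molecule) -> float:
    return mol["mass"]


@plugin("logp")
def logp(mol: Molecule) -> float:
    return mol["logp"]


@plugin("donors")
def donors(mol: Molecule) -> float:
    return mol["donors"]


def evaluate(rule, mol: Molecule) -> bool:
    """Apply a rule; the rule receives a lookup f(name) -> value for mol."""
    return rule(lambda name: PLUGINS[name](mol))


def drug_like(f) -> bool:
    """A rule combining plugin results with comparison and logic operators."""
    return f("mass") <= 500 and f("logp") <= 5 and f("donors") <= 5


# Made-up property values, for illustration only.
small = {"mass": 180.0, "logp": 1.2, "donors": 1}
large = {"mass": 812.0, "logp": 6.3, "donors": 7}

results = [evaluate(drug_like, m) for m in (small, large)]
```

New property functions can be added by registering them under a new name, without changing `evaluate` or existing rules, which is the point of keeping the function list open.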
Szabolcs Csepregi’s talk on the latest in JChem Base and Cartridge was designed for the techies. Some query atom types that seemed obvious and basic to me are still in alpha test but interesting new query atom types (e.g., member of group n in the Periodic Table) were also mentioned. The session ended with an informal “hands-up” count of votes for future functionality. This sort of thing also happened in other sessions and we were actively encouraged to fill in a hard copy questionnaire about Marvin features. All this reminded me of the early MDL meetings where the user questionnaire was analysed each year and became more and more complicated. New(ish) small companies, and well established “plumbing” companies (such as MDL is now), have very different cultures. I will be returning to this underlying theme.
Teijin Pharma has changed from a client-server ISIS/Base system to a JChem Web-based system for its main compound database and is now integrating other systems such as laboratory informatics and Spotfire, said Ryo Sogawa. It also uses Activity Base for HTS; Spotfire for data analysis; and Accord for Excel, Accelrys and academic software for design. The company seems to be dependent on a significant number of vendors but maybe that is a trend nowadays.
Tim Dudgeon opened a session on Instant JChem, an end-user desktop application for chemists and biologists that uses Marvin and JChem in an extensible architecture (though extensibility is at an early stage). The GUI is easy to use and set up. Most JChem functionality is available, although some tools are not. Version 1.0 was released in November 2006. The emphasis for the second version has been on relational data (the data hierarchy is defined using a data tree), a form builder, multi-user access, a schema editor (an advanced tool), improved query, and a more robust architecture. Petr Hamernik gave more details of the plans for extending Instant JChem (the architecture and API). Tim Dudgeon then led a discussion in which users could influence the development of the product. This is not an ISIS/Base replacement yet: it does some things better but some features are missing. The development session gathered the priorities for future development.
Pat Walters talked about Vertex’s integrated approach to library design. The company’s Reaction Planner links virtual combinatorial libraries with a well-validated set of computational models which reduce the size of the library and focus on the most relevant compounds. Models are constructed by experts using proprietary software (previously) called NOMAD and are then published to Reaction Planner for medicinal chemists to use. At another point in the programme, Pat described the Vertex Research Database Interface (VERDI), an extensible cheminformatics system that provides an intuitive, user-friendly means of retrieving and analysing chemical and biological data. The software employs a multi-tier client-server architecture which simplifies the integration of multiple databases with in-house and third party analysis components. Built on ChemAxon software and Java, it offers different views on the data for different disciplines, and multiple results views. The Gene Family Central system handles targets, mutants, reagents, sequence similarity etc. ADME tools, four different ways of clustering, and all sorts of SAR and visualisation tools are available.
Reactor is a ChemAxon virtual synthesis tool, transforming molecules to products according to given reaction schemes. It is used for combinatorial library enumeration, reaction prediction and other transformation based applications. In his update on this tool, György Pirok gave an example of Baeyer-Villiger oxidation as a selective reaction. The oxidation is in the ChemAxon Reaction Library and the Reaction Editor and Chemical Terms are used to design the selectivity rules for it.
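Combinatorial library enumeration, in its simplest form, pairs every reagent of one class with every reagent of another under a single reaction scheme. The minimal Python sketch below assumes an amide coupling and builds product SMILES by naive string concatenation. The reagent fragments and the `enumerate_amides` helper are hypothetical, and this is not how Reactor itself works: Reactor applies reaction schemes, with selectivity rules, to real structures.

```python
from itertools import product


def enumerate_amides(acyl_fragments, amines):
    """Join every acyl fragment with every amine (full Cartesian product).

    Assumes each acyl fragment is a SMILES string ending at the carbonyl
    carbon and each amine SMILES starts at the nitrogen, so that simple
    concatenation yields a valid amide SMILES.
    """
    return [acyl + amine for acyl, amine in product(acyl_fragments, amines)]


# Hypothetical reagent lists, for illustration only.
acyls = ["CC(=O)", "c1ccccc1C(=O)"]   # acetyl, benzoyl
amines = ["NC", "NCC", "N1CCCCC1"]    # methylamine, ethylamine, piperidine

library = enumerate_amides(acyls, amines)
for smiles in library:
    print(smiles)
```

The library size is the product of the reagent list sizes, which is why rules that exclude unreactive or non-selective combinations matter as the lists grow.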
David Roush of FMC Corporation gave his presentation remotely by video link. He and his colleagues have evaluated Marvin and JChem and compared them with ISIS/Draw and ISIS/Host. The cynic might note that they were comparing an evolving system, JChem, with a ten-year-old product that has already been replaced (by Isentris). FMC’s business evaluation concluded that the customer base is sound: ChemAxon is vibrant and will be around for a good while; customer support is better than MDL’s; and JChem is flexible, so that it was easier to put together what the chemist wanted. (I would note that MDL has been criticised in the past for being interested only in the life sciences, but smaller, newer companies are culturally more likely to respond fast to new customers.)
The scientific evaluation involved 1.8 million compounds, 51 simple substructure searches, 51 similarity searches, and 64 complex searches. FMC concluded that ChemAxon searches faster, and has the advantage that you can specify aromatic or aliphatic atoms and do searches for bidentate R groups. FMC does not like MDL’s definition of aromaticity, but likes the ChemAxon Standardizer. ISIS, they say, is a black box with an opaque data/table structure (and many will agree with them). MDL’s advantages are the MDDR, ACD and REACCS databases. FMC will get the MDDR data (more up to date) directly from Prous Science and will use an alternative commercial chemicals database. They will use SciFinder in place of REACCS (and good luck to them if they want to integrate it, say I).
ChemAxon is developing Metabolizer, a metabolic transformation prediction tool based on the Reactor engine and other technologies. The biotransformation library contains 183 generic phase I human xenobiotic cytochrome P450 (CYP450) biotransformations. Amongst other functions, Metabolizer will generate all possible metabolites, predict metabolic stability and predict dominating metabolites. It is being incorporated in a collaborative toxicity project in which Sanofi-Aventis, Aureus Pharma, ChemAxon, and the University of Budapest are building Knowtox, a knowledge base on drug-induced hepatotoxicity. Elodie Dubus of Aureus Pharma talked about this project. ChemAxon tools are integrated in AurQUEST, the query interface to Aureus’ AurSCOPE knowledge bases.
Christophe Cleva reported on Discrete Substructure Analysis which has been used routinely by Merck Serono since 2002 for virtual screening, focused set design, and selectivity and toxicity prediction. The frequency of occurrence of substructures in actives and inactives is calculated in order to identify structural patterns associated with activity. These patterns are recorded in a database. Christophe discussed substructure enumeration and scoring, the implementation of the system, and validation of virtual screening. Marvin was used because it was fast enough to allow (a) on-the-fly reconstruction (and “cleaning”) of the fragment structure from the fragment code; (b) dynamic SDfile reading (with file indexing and structure caching); and (c) on-the-fly aromaticity detection and valence checking. ChemAxon’s responsive support and openness to suggestions were also commended.
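The core counting step, tallying how often each substructure occurs among actives and inactives, can be sketched in Python as follows. The fragment codes, the toy data set, the `min_count` cut-off and the simple "fraction of occurrences in actives" score are assumptions made for illustration; the scoring function actually used at Merck Serono is not described here.

```python
from collections import Counter


def fragment_scores(compounds, min_count=2):
    """Score each fragment by the fraction of its occurrences in actives.

    compounds: iterable of (set_of_fragment_codes, is_active) pairs.
    Fragments seen in fewer than min_count compounds are dropped, since a
    frequency based on one compound says little.
    """
    active, total = Counter(), Counter()
    for fragments, is_active in compounds:
        for frag in fragments:
            total[frag] += 1
            if is_active:
                active[frag] += 1
    return {frag: active[frag] / n for frag, n in total.items() if n >= min_count}


# Hypothetical fragment codes and activity labels, for illustration only.
data = [
    ({"F1", "F2"}, True),
    ({"F1", "F3"}, True),
    ({"F2", "F3"}, False),
    ({"F1"}, True),
    ({"F3", "F4"}, False),
]

scores = fragment_scores(data)
for frag in sorted(scores, key=scores.get, reverse=True):
    print(frag, round(scores[frag], 2))
```

Fragments with a high score are candidates for structural patterns associated with activity; a production system would also need a statistical test to separate real enrichment from chance.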
Paul Laffort, of the Centre Européen des Sciences du Goût, said that the best way of determining solvation parameters is probably experimental, using GLC with five different stationary phases. Some solutes are hard to study experimentally, however, so a simplified molecular topology algorithm has been developed based on various Java functionalities from ChemAxon (Laffort, P.; Héricourt, P. Solvation Parameters. 2. A Simplified Molecular Topology to Generate Easily Optimised Values. J. Chem. Inf. Model. 2006, 46, 1723-1734).
David Spender led a session on what’s new in Marvin and a discussion on future priorities. I was most surprised to find that OLE has not been implemented until very recently. I can’t imagine life without a smooth link between Word and ISIS/Draw. ChemAxon still has some way to go to catch up with MDL Draw functionality, and with ChemDraw’s presentation quality graphics, but future plans include making it easier for Marvin users to continue to use ChemDraw and ISIS/Draw as their preferred structure editors.
IUPAC naming has been available in Marvin since April 2007, with preferred IUPAC name or traditional name options. Daniel Bonniot claims that it is better than, or comparable to, the ACD/Labs, Autonom, and ChemOffice products, but only 193 molecules were used in the test. He succeeded in generating a name for 99.9% of 23,000 PubChem molecules. ChemAxon is also working on a name to structure algorithm.
Miklos Vargyas, ChemAxon’s JKlustor expert, described recent advances in maximum common substructure based, hierarchical clustering (LibraryMCS). This is reportedly more intuitive than similarity based clustering. The dendrogram view has a slider for zoom and move. Miklos showed the SAR-table view, with cluster statistics and structure filtering by properties, and the R-table view. LibraryMCS performance is not far off that of Jarvis-Patrick for up to 40,000 structures and it scales linearly. In the near future more “chemical sense” will be added, i.e., rings will not be broken and chirality will be considered. Additive clustering is another plan for the future: after you have clustered your corporate database, it will be possible to add updates to the clusters.
Szabolcs Csepregi led a Markush development discussion. The first results of the Markush project were shown at the 2006 user meeting. Recent improvements are special Markush tables in JChem Base to register “combinatorial Markush” structures, and speed improvements in Markush registration and searching. New features are planned with a view to handling the more complicated Markush structures of patents; ChemAxon plans to work closely with publishers in this venture. Searching Markush structures in Markush tables (i.e., Markush queries) is also planned, so that overlap between libraries can be studied.
Dragos Horvath and colleagues at the Centre National de la Recherche Scientifique (CNRS) are developing Docking@Grid (http://dockinggrid.gforge.inria.fr/), a Web portal for massively parallel flexible docking. The primary goal is developing efficient GRID-based conformational sampling and docking methods (http://paradiseo.gforge.inria.fr) but pre-docking treatment of the ligand is also very important. ChemAxon tools are ideal, claims Dragos, for standardisation, protonation, ligand charges, force field management, 3D structure building, storage and retrieval, and visualisation.
Ödön Farkas of Eötvös Loránd University (http://organ.chem.elte.hu/farkas/) has introduced the Parameter-Free Linear Relationship for 3D QSAR, by generalisation of the DIIS (Direct Inversion in the Iterative Subspace) interpolation scheme, used mostly in quantum chemistry. This allows parameter-free prediction of molecular properties if a linear relation holds with the descriptors or scores. I commend readers to the slides on the Web: this work is in a theoretical chemistry field of which I have no experience.
Collaborative Drug Discovery (CDD, http://collaborativedrug.com/) enables scientists (particularly academic ones, I imagine) to archive, mine, and collaborate to develop new drug candidates, says Barry Bunin. His company helps you to organise and upload your experimental data into an easy to search database; to use informatics tools to search through those data and suggest new drug candidates; and to keep your data private, and exchange them confidentially with collaborators, or share them openly within the CDD community. Community open access data are already available, mainly related to malaria. The venture uses Marvin, Calculator Plugins and JChem Cartridge, plus flexible toolkits for Web development such as Ruby on Rails, Java, MySQL and Oracle.
Last but not least, Akos Papp of ChemAxon described plans for a corporate registration system. This brought back memories of generations of “How we developed our new registration system” papers at MDL meetings; and of course, MDL itself now has the Isentris-based system, MDL Registration. Akos went through lots of familiar stuff about salts, batches, mixtures and database architectures. Is ChemAxon beginning to move away from toolkits and towards products? First Instant JChem and now a registration system, but I’m told that the company is not really abandoning its core competence.
The technical programme was not the only attraction of this meeting. Networking opportunities included taking the (hot) waters at the Gellért Spa, and a conference dinner at the Vasarely Museum (also hot), after an interesting guided tour. The meeting began with a very enjoyable garden party (cool, in more than one respect) at ChemAxon’s facility up in the hills. It ended with an informal get-together for those still remaining on the last night: a pleasant breeze wafted across Margaret Island, and ChemAxon paid for all the drinks and the fast food. I will try to remember the hot software tools rather than the suffocatingly hot bedroom at the Gellért, and my best memory will be the cool air of the ChemAxon garden and the island in the Danube, where I had useful discussions in stimulating company.