Renaissance Meeting for ChemAxon
This year’s user meeting moved from its traditional Budapest location to Visegrád, in recognition of Hungary’s Year of Renaissance, commemorating the 550th anniversary of King Matthias Corvinus’s ascent to the throne. The conference banquet was held in his lower castle, after a visit to the summer palace. Visegrád is situated north of Budapest on the Danube Bend. The meeting proper took place in the Thermal Hotel, where I enjoyed a room with a superb view, and, this year, the all-important, promised air conditioning. The hotel’s spa pools, steam rooms, and multiple swimming pools with currents and waterfalls were an added bonus.
While Hungary celebrates a 550-year history, ChemAxon is enjoying an anniversary of its own: 10 years of successful business, and still growing. To mark this achievement, representatives of about 14 nations present at the meeting banquet sang Happy Birthday to You, or the equivalent, in their many languages, in a charming moment which brought home to me the very friendly nature of this user meeting, and the esteem in which ChemAxon’s user base holds the company. There were fewer attendees this year (about 45, excluding ChemAxon staff), partly because this is the first year that ChemAxon will be holding an additional user meeting in the United States, but we were still a jolly, international bunch, in our cheminformagicians’ T-shirts or other preferred attire.
The meeting began with a day-long workshop, which I did not attend, followed by the traditional garden party at ChemAxon’s facility in Budapest, a networking session that should definitely not be missed. The disadvantage was the long bus-ride to Visegrád for a late-night check-in.
In the two days that followed we had 25 presentations, about half of them from ChemAxon staff (most of the company’s 35-40 employees and adherents were present) and half from partners or users. There was also a small poster session. The meeting kicked off with a strategic overview from Alex Drijver, the newly appointed CEO of ChemAxon. Ferenc Csizmadia, founder and former CEO, is now chairman and head of research and development. The changes at the top are an indication of ChemAxon’s increased focus on business development and market position.
Alex said that ChemAxon is known for its quality, support and responsiveness. Support staff produce answers within 24 hours, or even within 24 minutes. ChemAxon has had an advantage in not being first-to-market: it can do better, having learned from the deficiencies of earlier offerings. Users who have done benchmark comparisons of the Daylight and ChemAxon cartridges have found that the ChemAxon cartridge is better. ChemAxon is also known for toolkits, flexibility and integration. It is not only “an alternative” (a different type of offering and service from that of the larger players): it is also known for “being alternative”, i.e., flexible, innovative and free thinking.
There is a downside to being alternative and ChemAxon wants to position itself as “The Alternative”. It wants to be the company that is looked at first, not the company of last resort. Is ChemAxon beginning to move away from toolkits and towards products? Yes, and no. At last year’s meeting Andrew Lemon said, “They don’t force you into proprietary routes” and that is still true; ChemAxon will be adding more components that can form part of an integrated solution. The company intends to be the market leader in niche new products and will do things better than anyone has done before. It will remain independent and will aim for user-oriented development and organic growth. More offices will be opened for support and development in major markets. ChemAxon is emerging as a more significant force, and a more dedicated, more rounded business partner for its clients. It will stand taller and prouder as a major cheminformatics supplier for the life sciences.
In the first of the user presentations, Peter Condron of the Experimental Therapeutics Centre (ETC) at the Singapore Biopolis gave a talk from afar by WebEx. ETC has implemented ChemAxon software as the cheminformatics platform in its “electronic research habitat”. Thus far, a “chemical library” and an ELN have been built, and development of a results manager is ongoing. The chemical inventory is not yet built. Peter showed some chemical registration and spreadsheet screens. It is impossible to register a compound without also entering an ELN reference for the synthesis. Everything Peter showed had been designed, developed, tested and employed in 1.5 months by 1.5 FTE developers.
JChem Cartridge was chosen rather than JChem Base because search and registration could be implemented easily and rapidly, and integration with other systems and customisation were also easy. ETC also had existing Oracle skills. The software development approach involved an iterative methodology (delivery in small pieces), with time-box development (develop what you can within six weeks) and a Phase-Gate process control mechanism.
Peter reported that the cartridge worked out of the box and the core functionality was really well thought out and efficient. The software was simple to integrate with .NET and rapidly deployable. On the downside, valence checking was not possible [it will be in a later version], JChem Manager was a bit “clunky”, the .NET documentation and examples were lacking in some respects, and the licence files for calculators were a bit of a problem [but version 5.0 has a new system for setting up the licences, say ChemAxon]. Peter praised ChemAxon’s knowledgeable and responsive support.
Evotec has developed a registration system based on Marvin and JChem Cartridge technology, and also an ELN, a supplier database and screening database, a structure activity database, and a searching and modelling application. Java is the company’s preferred development platform and a single vendor toolkit had appeal. Evotec has changed a lot over the last few years and is still changing fast; writing a registration system in-house gave the company much needed flexibility. Catherine Reisser discussed structure standardisation, and showed screens from the reaction maintenance and combinatorial library enumeration applications. She, too, praised ChemAxon for great support, saying that such support, within the same time zone, was a contributing factor in Evotec’s choice of the ChemAxon platform.
Later in the meeting Alistair Sedwell described Evotec’s Library Profiler tool based on ChemAxon toolkits. Evotec’s previous system worked reasonably well for small numbers of compounds but it involved extra file export/import stages and the use of Spotfire and ISIS for Excel, for which Evotec had only a limited number of licences. The new system uses MarvinView and MarvinSketch, JChem Cartridge for filtering, and JChem tools for many of the property calculations. The Library Profiler spreadsheet is scalable; its data cells are colour-coded for desirable property ranges. Alistair showed screen shots of scatter plot and histogram charting options. In chart-based filtering the before and after effects of molecular weight filtering, for example, can be viewed. Removing reagents to get a combinatorial subset of a library can be done after viewing a structure matrix of reagents and products created with MarvinView and JTable. A pivot view of properties with coloured cells suggests which reagents might be removed (i.e., any with a row of many red cells). A spider plot of the library properties can then be viewed. Alistair asserted that ChemAxon’s tools are flexible and support from ChemAxon has always been top notch.
Lutz Weber of OntoChem explained how his company is creating intellectual property for clients by finding new uses for known compounds and new molecules for old applications. His team previously used the Daylight reaction toolkit but now uses ChemAxon’s Reactor for intelligent product generation. The resultant databases are non-combinatorial (because non-druglike compounds are eliminated) and “non-Markush”. They contain more than 100 million synthesisable compounds.
Lutz aimed to discover if fingerprints designed for substructure search will work in the case of large chemical databases with many similar compounds. Is it true that we need an index in memory for fast searching and can the index be made smaller? Disks (NAS and solid state drives) are becoming competitive, so Lutz compared three pieces of hardware: a PC with two cores, an SGI Altix 300 with four cores, and a Sun 4600 server with 16. For an 11-million compound database, ChemAxon software is very efficient and very selective on the PC. Access to memory is faster with the AMD chip on the PC than with the Itanium chip on the SGI. For 40 million compounds, the SGI machine is faster, but not 25 times faster, yet it costs 25 times as much as the PC. For a 200-million compound database, using PostgreSQL and JChem 5.0.2, the Sun machine was 10 times faster than the SGI, after the software had been appropriately tuned.
Lutz went on to illustrate the advantages of OntoChem’s topological torsions (“ToTo’s”) in similarity search, when compared with JChem Tanimoto and Daylight similarity searching methods, using a 20-million compound database on a Sun 16-core machine. Application in an MDM2-p53 inhibitors project was published very recently (Rothweiler, U. et al. Isoquinolin-1-one Inhibitors of the MDM2-p53 Interaction. ChemMedChem, published online 21st April 2008) but details of the ToTo fingerprint have not been published.
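For readers unfamiliar with Tanimoto similarity searching, the general idea can be sketched in a few lines of Python. This is a generic illustration only: it is not the JChem, Daylight or OntoChem implementation, the fingerprints are represented simply as sets of "on" bit positions, and the function names, the toy database and the 0.7 threshold are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    if not fp_a and not fp_b:
        # Assumed convention for two empty fingerprints; other tools may return 0.
        return 1.0
    common = len(fp_a & fp_b)
    return common / (len(fp_a) + len(fp_b) - common)


def similarity_search(
    query: Set[int], database: Dict[str, Set[int]], threshold: float = 0.7
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs at or above the threshold, best score first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: bit positions are arbitrary, not real fingerprints.
toy_database = {
    "cpd-1": {1, 2, 3, 4},
    "cpd-2": {1, 2, 3, 9},
    "cpd-3": {7, 8, 9},
}
query_fp = {1, 2, 3, 4}
hits = similarity_search(query_fp, toy_database, threshold=0.5)
```

A linear scan like this is the simplest possible search strategy; the talk's point was that fingerprint design and hardware both matter once the database reaches tens of millions of compounds.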
The VSEngine virtual laboratory has been developed by a team at Université Louis Pasteur Strasbourg to open up access to Alexandre Varnek’s atom-bond sequence and augmented tunable “ISIDA” fragments, the fuzzy pharmacophore triplets and similarity metrics developed by Dragos Horvath while he was in Lille, and the QSAR-building experience of both researchers. Dragos says that the ChemAxon API is a convenient way of getting a heterogeneous set of tools to talk to a MySQL chemical database, albeit by means of temporary files. He is not necessarily looking for high speed, since he is looking at more costly virtual screening options, but the approach has the advantage of being easily deployable on grids.
AstraZeneca has built a search application called IBEX for the reference-centric parts of the GVKBIO database. The database includes over three million records corresponding to two million unique structures, linked to 3200 sequences extracted from patents and journal articles. IBEX has links to other AstraZeneca systems and includes additional in-house derived data. The search application uses an Oracle database and the JChem chemistry engine, with a web-based interface. Péter Várkonyi showed numerous screen shots from the system, which also links to the external applications MicroPatent, PubChem, GeneNames and Entrez. Péter, too, reported a high level of satisfaction with ChemAxon support.
Eszter Hazai of Virtua Drug discussed DockingServer, a molecular docking application on the web developed by Virtua Drug and Delta Informatika. A complete docking solution needs a number of software packages plus very expensive integration packages or expertise in computational chemistry script writing. The goal of Eszter and her colleagues was to produce user-friendly integrated software that allows reasonable control of the whole docking procedure but can be highly automated as well. DockingServer integrates, in a PHP framework, AutoDock for docking calculations, MOPAC for ligand geometry, VMD for visualisation of the ligand-protein complex, and ChemAxon tools for ligand preparation and visualisation. In version II, ChemAxon software is used for file conversion, and generation of SMILES, molfiles, and PDB files. In version III, ChemAxon software is used for input and visualisation of the ligands in 2D and 3D, and for visualisation of the binding site-ligand interaction. Yet more use will be made of ChemAxon tools in future.
Heat Shock Protein 90 (HSP90) is a promising target for cancer therapy. Davide Audisio and colleagues from Université Paris-Sud and Aureus Pharma have organised SAR data on HSP90 inhibitors into the Aureus Pharma knowledge base format and analysed the data using descriptors calculated by ChemAxon software, with a radar-type visualisation, or by clustering with JKlustor. The ChemAxon Fragmenter was also used to understand the target space. Active molecules were extracted and diverse representative derivatives were selected as query molecules to screen the ZINC database. Virtual screening was performed using ChemAxon pharmacophoric fingerprints. Biological testing of selected hits is ongoing.
Representatives of three ChemAxon partners, DeltaSoft, Aureus Pharma and Chemistry Logic, made short presentations and demonstrated their offerings in a small exhibition. DeltaSoft markets ChemCart, a dynamic, web-based, forms interface to research information, including structures and reactions (in a chemistry cartridge), data, images, documents and files. It has also built a number of applications based on ChemCart. Aureus produces eight target space and three ADME/Tox knowledge databases, plus decision making applications integrating ChemAxon technology, improving access to data, profiling and risk. Chemistry Logic has developed an accelerator, ChemInfoLogic, which makes use of the Reconfigurable Application Specific Computing (RASC) blade from SGI (a field-programmable gate array or FPGA) to speed up cheminformatics calculations. In similarity search, a query set of 128 compounds can be searched in the 24-million compound ChemNavigator database in half a minute. ChemInfoLogic can also be used in library-to-library comparison and in pharmacophore search.
Infocom Corporation has developed a node (component) set, JChem Extensions, on the KNIME workbench, with ChemAxon’s toolkit. Shunichi Ozawa demonstrated a number of nodes and showed how easy it is to make effective workflows using ChemAxon functionality and other third party nodes (e.g., from Tripos and Schrödinger). His preliminary, but seemingly excellent benchmark results are unfortunately not included in the electronic proceedings supplied to attendees at the end of the meeting.
ChemAxon has taken over responsibility for supporting the ChemAxon Pipeline Pilot Component Collection (with help from Accelrys). Significant improvements have been made to existing nodes and there is a brand new Reactor node. ChemAxon plans to offer the following components: JChem Cartridge, name to structure and vice versa, tautomer and conformer generation, file format conversions, and MCS clustering. There will also be an Instant JChem end user cheminformatics solution. The release cycle is fast and flexible.
A number of ChemAxon speakers outlined the new features of some of the products and spoke of things to come; the following is just a selection. MarvinSketch has a new GUI with a customisable menu and toolbar. It can be configured to look like ISIS/Draw or ChemDraw. OLE has been implemented for Microsoft Office documents. A new query tab allows generic query features and periodic table groups. MarvinView has a spreadsheet-like view for SDfiles and RDfiles and the GUI is being redesigned. Multistep reaction support, name-to-structure functionality, and further Markush related features are promised. Markush enumeration is a new feature of the Calculator Plugins. In future the plugins will have trainable pKa, improved logP calculation, faster conformer generation, shape descriptors in topology analysis and an extensive traditional name dictionary for IUPAC naming.
A recent test of ChemAxon’s IUPAC name generation used 25,000 compounds from PubChem with an average of 56 heavy atoms per compound. Version 4.1.8 took eight minutes and produced 107 failures and 15 timeouts. Version 5.1.0 took only 4.5 minutes and had only 2 failures and 5 timeouts. In addition, 3629 names were improved using version 5.1.0. In future ChemAxon will work on complex fused rings, and on specialised nomenclatures such as for sugars, on demand. Name import is scheduled for release in Marvin and JChem 5.1. Hantzsch-Widman, von Baeyer, spiro and functional names (in both prefix and suffix mode), and traditional names from a database will be recognised. Fused nomenclature and multiplied parents will not be supported in the first version. ChemAxon can currently convert structures to names and back to structures in more than 95% of cases, and with more than 95% of the structures strictly identical to the original.
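The name generation figures quoted for the two versions are easier to compare as rates. The short sketch below does only that arithmetic; it assumes that failures and timeouts are separate counts (no compound appears in both), which was not stated in the talk, and the variable and function names are invented for illustration.

```python
COMPOUNDS = 25_000  # size of the PubChem test set quoted in the talk

# Figures as reported for each version of the name generator.
runs = {
    "4.1.8": {"minutes": 8.0, "failures": 107, "timeouts": 15},
    "5.1.0": {"minutes": 4.5, "failures": 2, "timeouts": 5},
}


def summarise(run: dict) -> dict:
    """Derive the unnamed-compound rate and throughput for one test run."""
    # Assumption: failures and timeouts do not overlap.
    unnamed = run["failures"] + run["timeouts"]
    return {
        "unnamed": unnamed,
        "unnamed_pct": 100.0 * unnamed / COMPOUNDS,
        "compounds_per_second": COMPOUNDS / (run["minutes"] * 60.0),
    }


summaries = {version: summarise(run) for version, run in runs.items()}
speedup = runs["4.1.8"]["minutes"] / runs["5.1.0"]["minutes"]

for version, summary in summaries.items():
    print(version, summary)
print("speed-up:", round(speedup, 2))
```

On these assumptions, the proportion of compounds left without a name falls from about 0.49% (122 of 25,000) to about 0.03% (7 of 25,000), while the run is roughly 1.8 times faster.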
A registration system, JChem for Excel, and handling of Markush structures in patents are in the pipeline. I made a mental note about the registration system and JChem for Excel, because they fit in with my speculation last year about ChemAxon beginning to move away from toolkits and towards products. JChem for Excel will be released this summer. It is implemented in C#, with no VBA code. There are plans to use Visual Studio Tools for Office in the future. In JChem for Excel, ChemAxon has proved that its APIs can be used in a purely .NET environment.
Standardizer has a number of “actions” including aromatisation, addition and subtraction of implicit hydrogens, various ‘clean’ and stereo actions, salt handling, and reaction mapping. In future expect multiprocessor support, a structure checker function and graphical design improvements. There will also be some new actions and certain complex actions will be converted to smaller ones.
Reactor is an engine for conversion of starting materials into products according to a given reaction scheme. Future plans for Reactor include multiprocessor support and manual reaction site assignment, and handling of multistep reactions, reactant ratio, and reactant statistics (i.e., the success rate for combinatorial chemistry). The reaction library will be improved. ChemAxon is also developing a tool for metabolite prediction that will be released later this year.
ChemAxon already offers tools for drawing, visualisation and generation of Markush structures, plus Markush enumeration techniques, and searching. Libraries of more than 10^30 structures are handled, with the generic description including R-groups, atom and bond lists, link nodes, and position variation. Markush database tables are now available in Instant JChem as well as in JChem Base. In version 5.1, sketching of position variation will be easier, and scaffold alignment and R-group colouring will be available in enumeration. In Markush search, it will be possible to include abbreviations (superatom s-groups) in Markush structures, and position variation will be allowed in both the query and the database. In the longer term, homology variation and properties (e.g., number of atoms), multiple attachment points for R-groups, larger repeating groups, and bridging of multiple R-groups (e.g., R1 and R2 form a ring) will be allowed. In future it will be possible to check the overlap of two patents or libraries.
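The size of an enumerated Markush library is the product of the number of alternatives at each variable position, which is why such libraries become astronomically large and cannot simply be written out in full. The toy sketch below illustrates the counting and a naive enumeration. It is not ChemAxon's method: the scaffold is a hypothetical SMILES template filled in by plain string substitution, the R-group lists are invented, and real Markush features such as atom lists, link nodes and position variation are ignored.

```python
from itertools import product
from math import prod
from typing import Dict, Iterator, Sequence


def library_size(r_groups: Dict[str, Sequence]) -> int:
    """Number of enumerated structures: product of the alternatives per R-group."""
    return prod(len(options) for options in r_groups.values())


def enumerate_library(template: str, r_groups: Dict[str, Sequence[str]]) -> Iterator[str]:
    """Lazily yield one string per combination of R-group choices."""
    names = list(r_groups)
    for combo in product(*(r_groups[name] for name in names)):
        yield template.format(**dict(zip(names, combo)))


# Hypothetical example: an ortho-disubstituted benzene with two variable positions.
template = "{R1}c1ccccc1{R2}"
small = {"R1": ["C", "CC", "O"], "R2": ["F", "Cl"]}
structures = list(enumerate_library(template, small))

# Hypothetical large case: 10 variable positions with 1000 alternatives each.
# Only the count is computed; nothing is enumerated.
large = {f"R{i}": range(1000) for i in range(1, 11)}
large_count = library_size(large)
```

The small case yields six structures, whereas the large case already corresponds to 10^30 combinations, so anything at that scale has to be handled through the generic description rather than by exhaustive enumeration.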
Since last year, Instant JChem, the desktop application for chemists, uses Marvin and JChem version 5.0 and has a number of new features. Query-by-form, an improved query builder, query-by-example, federated search, and list and query management functions have been implemented. Deployment and collaboration have been enhanced and a new management system for time-based licensing has been introduced. Miscellaneous improvements include export to Excel and import and export of InChI format (“and more”, as the marketing communications people always say). Features under development are calculated fields, improved printing, additional form widgets, cartridge support, Instant JChem server and relational data export. ChemAxon is interested to know whether users would like to see content (e.g., drug or supplier databases) provided for Instant JChem.
Apart from Alex Drijver, I have not named any of the ChemAxon speakers. I am making an exception for Miklós Vargyas who gave the very last talk, about maximum common structure (MCS) based hierarchical library clustering. I give this talk my prize for the best presentation, because Miklós has a real talent for engaging an audience, even in the dying moments of a meeting. It will be a long time before I forget his comparison of clustering molecules with the clustering of toy cars and with star clusters. LibraryMCS is not a new product so I will not go into detail here but I would recommend that ChemAxon consider doing a podcast of this talk. The slides are not available, but even if they were, on their own they would not do justice to this presentation.
On the final evening a residue of delegates and staff had an informal gathering in the swimming pools, spa and bar, ending the meeting on an even more informal note than that of the garden party.
As ChemAxon seeks to become a more “serious” contender in the cheminformatics marketplace I do hope that it will not lose too much of its informality and friendliness. The users will not want it to lose its justifiably high reputation for support. ChemAxon aims to retain its existing virtues with a more confident and serious business face. Alex Drijver talked of the company entering puberty after its 10 years in business. Puberty can be a trying time. Small and newer companies have an advantage when it comes to responsiveness. Smaller companies are more agile than lumbering great corporations. As companies grow in size and age they have to face all sorts of new challenges. Fans love to stick up for the little guy; Microsoft and MDL never scored highly in the popularity stakes. I do have high hopes for ChemAxon and I wish the company well. It will be interesting to watch its progress over the next 10 years.