“In Silico Veritas” ChemAxon European User Group Meeting (UGM), Budapest, May 21-24, 2012

July 2012 Author:
  • Some all important social aspects
  • Markush Technology
  • Introduction to user meeting
  • JChem user experience
  • JChem: ChemAxon presentations
  • Registration as a service
  • JChem for SharePoint
  • Instant JChem
  • The future of Marvin
  • Naming technologies and text mining
  • ChemAxon professional services
  • Reactions
  • Some science
  • Calculators
  • Partner presentations
  • Final comments
  • Conclusions
  • Archive of 2012 Budapest UGM

There is little I can say about ChemAxon that I have not already said before: it is financially stable, it has still not reached a plateau, it continues to develop software at a remarkable rate of knots, it is renowned for its customer service, and so on. There are minor changes: it produces more “products” (as opposed to toolkits) than when I first started attending user meetings, it is more “enterprise-oriented” and it has attracted business from really big companies such as major pharmas and Thomson Reuters. It has had to move to bigger offices. The trends are obvious but I see no warning signs yet about an exit strategy: no open source outfit can overtake ChemAxon, and being gobbled up by You Know Who is not on the cards. So the star is still on the ascent. As to the user meeting, it is beginning to be a highlight of my annual calendar, as is evidenced by the increasing length of my reports. It is hard to find anything negative to say about ChemAxon; even in the networking sessions (i.e., at the social events and in the bar) you do not detect grumbles about bugs, or missed deadlines, or licensing issues. The ChemAxon family seems to be pretty happy and I look forward to joining it again next year.

Some all important social aspects

I take the title of my report from this year’s T-shirt, which had a vinous theme as a memento of the wine tasting that we enjoyed in the crypt of the Kiscelli Museum on the Tuesday night, before a museum tour and the conference dinner in the ghostly “church”.

ChemAxon staff are a sociable, friendly bunch and attendees are treated to entertainment of one sort or another every night, should they so wish. On the Sunday night a group of us enjoyed an informal trip on the metro, a stroll around Pest, and dinner outdoors, by St. Stephen’s Basilica, on a superb summer night. On the Monday night Alex Drijver, ChemAxon’s CEO, entertained a few of us with bubbly on a water taxi trip down the Danube, ending up at the traditional ChemAxon garden party, this year held on the riverside science park where the company’s offices are now located. The final night’s outing is always more informal: this year a walk along Pest’s famous pedestrian shopping thoroughfare, Vaci Street, followed by dinner at a pub, I believe. Unfortunately, this year my flight time was such that I had to miss the final outing.

I make no apology for starting a technical report with a description of all this revelry. The nature of the hospitality is in fact a reflection on the spirit of ChemAxon. This is not a “stuffy” sort of company. I am told repeatedly that as a software company they have a reputation for high quality support and this is reflected in the way they entertain us at user meetings. After you have been to one or two meetings you feel as if you are being drawn into the family, but let us now turn to the more serious aspects of the UGM.

Markush technology

Markush Forum
Another half-day “Markush forum” was held on the day before the user meeting proper. This provided an opportunity for forum members to meet and discuss developments and future directions. Steve Hajkowski of Thomson Reuters presented an update on the data: 580,000 patent families, plus 1.8 million related specific compounds, from 28 issuing authorities are now included. A sample dataset of 300 patents in pharmaceuticals, agrochemicals and general chemistry is available for evaluation on the desktop in-house. The full dataset is available for use with ChemAxon software in the cloud. In the future, Thomson Reuters will be adding the 1978-1987 backfile, auditing and correcting the current database (1987 to the present date) and adding patents which contain only specific structures and no Markush structures.

David Deng of ChemAxon presented an overview of the current functionality of the company’s Markush software. The technology is independent of the Thomson Reuters data: it can be used for drawing Markush structures, for Markush enumeration and for Markush search on the Amazon cloud. Markush search currently can handle the following generic features: R-groups (nested to any depth), atom and bond lists, link nodes and repeating units, position variation bonds, and homology variation (predefined generic atoms, such as alkyl and aryl, and user-defined ones, such as protecting group). Markush structures can be sketched or imported from Thomson Reuters files.

The Markush enumeration interface has been redesigned. Full, random or partial enumeration can be carried out; the software is also capable of calculating the total number of specific structures present in a Markush library. Homology groups can be enumerated. Each homology group comes with a certain number of representative examples. The enumeration algorithm uses these examples to enumerate a homology group. You may also give specific properties to a homology group (e.g., C3-7 for cycloalkyl), and then the enumeration algorithm will give you only examples that match that property. A customisable physicochemical property filter can be applied after enumeration.
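To make the enumeration options concrete, here is a minimal sketch in Python. It is not ChemAxon's implementation: the core template, the R-group lists, the function names and the representation of the cycloalkyl homology group by ring size are all assumptions made for illustration. It shows counting the library without generating it, full enumeration, random enumeration, and restricting a homology group to examples that match a property (here C3-7).

```python
import itertools
import math
import random

# Hypothetical toy Markush: a core template with two R-group slots.
CORE = "c1ccc({R1})cc1C(=O)N{R2}"

# Hypothetical homology group "cycloalkyl": representative examples keyed by ring size.
CYCLOALKYL = {n: "C1" + "C" * (n - 1) + "1" for n in range(3, 9)}


def expand_homology(examples, min_size, max_size):
    """Keep only the representative examples that satisfy the size property."""
    return [smi for size, smi in sorted(examples.items())
            if min_size <= size <= max_size]


R_GROUPS = {
    "R1": ["F", "Cl", "Br", "OC"],
    "R2": ["C", "CC"] + expand_homology(CYCLOALKYL, 3, 7),  # C3-7 cycloalkyl
}


def library_size(r_groups):
    """Count the specific structures without generating any of them."""
    return math.prod(len(members) for members in r_groups.values())


def enumerate_full(core, r_groups):
    """Yield every combination of R-group definitions (full enumeration)."""
    names = sorted(r_groups)
    for combo in itertools.product(*(r_groups[name] for name in names)):
        yield core.format(**dict(zip(names, combo)))


def enumerate_random(core, r_groups, count, seed=0):
    """Draw `count` random members (with replacement) of the library."""
    rng = random.Random(seed)
    return [
        core.format(**{name: rng.choice(members)
                       for name, members in r_groups.items()})
        for _ in range(count)
    ]
```

Counting is a product of the R-group list lengths, which is why the total can be reported even when full enumeration would be impractical.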

Libraries can be searched in Instant JChem (IJC). David described atom and bond query features (e.g., chain/ring), equal and broad translation, and simple R-group queries. Homology translation can be switched on and off. During search against a Markush library, you can choose to do equal translation, or broad translation to match partial structures to homology groups. For searching a non-Markush library, you may use narrow translation to match homology groups to actual groups in structures. Full hydrogen matching is now supported in search: previously only the explicitly drawn hydrogens were considered in the Markush, but now implicit hydrogens are also taken into account.

The Markush viewer has been released since the last user meeting in Budapest. It can display very complex structures, with over 20 R-groups, and nested R-groups, within IJC. Hit alignment, hit colouring and structure cleaning are used. The display of a query within an original Markush structure is quite complex. There are two additional ways to visualise substructure hits better: firstly, only the relevant (matching) R-group definitions are shown, and secondly, Markush structure reductions are used. Structure reduction picks out only the most relevant R-groups and uses an expanded core to show how a query and a hit structure match. Previously, Markush reduction was available only in the enumeration window; now it can also be used in search result visualisation.

Overlap analysis (i.e., the use of multiple queries) is integrated in IJC. The new Markush technology interface brings all the information together in one place. New buttons allow easier export of information, “fetching” a patent (to get a full patent document as a PDF file), adding notes to an invention, and batch search for multiple queries. R-group decomposition can be carried out in molecule tables and a Markush structure can be created from selected rows in a table of structures.

In future, ChemAxon will be further improving search speed and accuracy, supplying additional query variations and even better visualisation, and integrating Markush search with Document to Structure, to extract chemical information from patent documents. ChemAxon also continues to collaborate with the text mining company Linguamatics.

After David’s talk, Krisztián Niesz of ChemAxon gave a demonstration of IJC and Markush enumeration and then Szabolcs Csepregi of ChemAxon introduced a 47-question survey that attendees were asked to discuss and complete in writing. Brian Larner of Thomson Reuters provided examples to show how Markush structures in patents have been getting more complex since 2010. A complex patent takes more than 1.5 hours to index. The trend towards more complex Markush structures means that ChemAxon tools are even more useful.

Szabolcs described what is new in the Markush project, including the search and viewer functionality already covered by David Deng. ChemAxon has been very successful in speeding up online Markush searching and speed-up is even greater with complex structures. Substructure search is now about 24 times faster, full structure search is 72 times faster and enumeration is more than 100 times faster. Robustness has also been improved. The search service hosted on Amazon EC2 is currently for evaluation only. The IJC thick client can be run in-house. There have been many improvements to core IJC, including improved scripts for loading VMN files, reading in of new fields, handling of updates, more automation, and further customisation of imported fields and forms.

Five Markush fora have taken place over two years, with 178 non-ChemAxon attendees in all. Thomson Reuters data, use cases, ChemAxon technologies, and developments were discussed. Szabolcs presented a summary of user feedback (from questionnaires) and ChemAxon’s responses to the many suggestions. Half of the top 13 requests have been fully or partially addressed.

User meeting Markush talks
In the user meeting proper Brian Larner of Thomson Reuters told us how chemical structures in patents are disclosed, indexed and retrieved, and what they mean. The easy to use Markush search and analysis tool can be used by any chemist in an improved workflow; the results are displayed visually and can be exported for further analysis. The structure data are displayed with high quality Derwent World Patents Index abstracts. Visualisation and enumeration capabilities save time for busy chemists and enumerated structures are in a standard format for easy integration with other chemistry datasets and software.

Consider the patent WO 2006/044176 A1. Here, just 11 specific compounds are exemplified but over 10^7 compounds are covered by a relatively simple Markush structure. How would you find this patent if you needed to find disclosures of one of the structures, or a query Markush which overlaps this patented Markush? Can you determine which of the 10^7 structures may have properties unexpectedly superior to the 11 exemplified structures? Thomson Reuters’ structure indexing is designed to enable retrieval and, importantly, it records the role of the compound in the patent. The abstracts are comprehensive, organised and searchable. Besides the structures, there is much valuable text information.

The system can be used for patentability, freedom to operate, validity and other technical searching by Intellectual Property (IP) departments, to assess the structural proximity of a query structure to a hit Markush structure; and for early stage screening of new structures against current IP. Other potential uses are in white-space identification and “patent busting”, and in creation of libraries of specific compounds for use in modelling software or other systems.

David Walsh of Grail Entropix discussed the integration of chemical space. He made reference to Pfizer’s use cases, which were derived up to 3 years ago and may since have been modified. The integration of internal and external chemical information is a vital and complex activity for the pharmaceutical industry. Generating quality, comprehensive external data (plus metadata) is expensive but necessary, for two reasons: knowledge of a competitor company’s molecular approaches is valuable competitor intelligence, and determining the creative processes and medicinal chemistry rules that drive lead optimisation assists internal drug discovery processes. This external information describes prior art and areas of chemical space that are congested, and allows decisions to be made on freedom to operate. It allows determination of comparable physicochemical properties between internal and external molecules and modelling of those properties. External molecules must be in formats which can be integrated with internally derived molecules.

Pfizer required a large collection of exemplified molecules from patents, supplemented by data on pharmaceutically relevant exemplified molecules, and approaches to analyse Markush space were also investigated. The IBM database of exemplified structures has image to structure conversions for all patents back to 2000. Belonging to a consortium of nine companies reduces the costs for Pfizer and drives technical improvements. Another advantage is that Pfizer owns the data (8 million unique structures) and they can be easily integrated with internal and external datasets. On the other hand, the automated process used to generate the database results in errors from poor name to structure conversions, and only structures that are named in the text are indexed for patents prior to 2000. Pfizer also licenses the GVK Biosciences GoStar database, but does not own the contents. It is integrated into protocols for patent analysis. It has high quality structures and bioassay data, with little overlap with the IBM database, and the data can be easily integrated with internal datasets and tools, but its patent coverage is only about one tenth of that of the IBM database.

Integration of data is not in itself enough: it is important to integrate expertise, that is, to develop cross-disciplinary teams to develop use cases across a range of drug discovery disciplines, since individuals have differing requirements and skills. The teams also allow ownership to be distributed amongst silos in the organisation.

The objectives were to integrate third-party and in-house sources of data from chemical patents, to incorporate new sources and delete existing ones, and to develop a concordance between Pfizer patents and bioassays on the compounds. Any system had to be easily updatable and adaptable (anticipating change) and operate within third-party guidelines for use of data.

Use cases included a patent chemistry landscape, a target landscape and patent alerts. For the patent chemistry landscape it was necessary to identify relevant exemplified and Markush structures in the patent literature (and the Pfizer data warehouse) together with any biological data; to identify potential IP overlap; to visualise and analyse top-ranked patents; and to identify patents which claim or disclose selected Pfizer compounds. The target chemistry landscape should identify biologically relevant exemplified and Markush structures that have activity against a target or family of targets; identify key compounds for follow-up and the most similar matches within the file; and provide visualisation of property and chemical space landscapes to facilitate analysis of a competitor’s activity in the target space.

Since more than 70,000 chemical patents are filed every year there is a problem of information overload. An easier way of identifying relevant patents was needed. Searching for Pfizer compounds within a database of curated patent structures is an exact structure match, which is not difficult, but Pfizer is currently investigating alternatives to substructure and similarity searching (Bayesian models) to improve identification of patents of interest. Substructure and similarity are well known concepts but there may be a need for other methods of “relatedness” based on connectivity, such as feature analysis. David also presented an example of property landscaping where plots of logP versus PSA showed that Company X patents had very few compounds in the desired property space, while Pfizer needed to reduce compound lipophilicity and TPSA.
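The property landscaping idea can be reduced to a simple calculation: what fraction of each company's compounds falls inside a desired logP/PSA window? The sketch below is purely illustrative. The compound values, the window limits and all names (`Compound`, `fraction_in_window`) are invented for the example and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    source: str   # which company's patents the compound comes from
    logp: float   # calculated lipophilicity
    psa: float    # polar surface area, in square angstroms


def in_window(compound, logp_range, psa_range):
    """True if the compound lies inside the desired property window."""
    return (logp_range[0] <= compound.logp <= logp_range[1]
            and psa_range[0] <= compound.psa <= psa_range[1])


def fraction_in_window(compounds, source, logp_range, psa_range):
    """Fraction of one source's compounds inside the window (0.0 if none)."""
    subset = [c for c in compounds if c.source == source]
    if not subset:
        return 0.0
    hits = sum(in_window(c, logp_range, psa_range) for c in subset)
    return hits / len(subset)


# Hypothetical data and a hypothetical "desired" window.
COMPOUNDS = [
    Compound("Company X", 5.6, 150.0),
    Compound("Company X", 4.9, 135.0),
    Compound("Company X", 2.8, 85.0),
    Compound("Company X", 6.1, 160.0),
    Compound("In-house", 3.1, 90.0),
    Compound("In-house", 4.4, 120.0),
]
LOGP_WINDOW = (1.0, 3.5)
PSA_WINDOW = (40.0, 100.0)
```

With these made-up numbers, `fraction_in_window(COMPOUNDS, "Company X", LOGP_WINDOW, PSA_WINDOW)` gives one compound in four, the kind of sparse coverage of the desired space that a logP versus PSA plot makes visible at a glance.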

Introduction to user meeting

Instead of starting with a “state-of-the-union” address, ChemAxon’s CEO Alex Drijver conducted an interview with Kent Stewart of Abbott (soon to be AbbVie) on informatics driving science. Kent said that for pharma to remain competitive there must be changes and computer-aided drug design will play a role in “doing more with less”. He foresaw two growth areas. One is GPCRs: 30% of all drugs target GPCRs and 3D crystal structures are now becoming available. The second is visualisation. Medicinal chemists are bombarded with data nowadays, from 10 or more different sources (including “patinformatics” sources), and they need ways to interpret them and take action.

Kent is the guru behind Drug Guru (Bioorg. Med. Chem. 2006, 14(20), 7011–7022), a program for drug generation using rules. Over and over again he had seen chemists changing carboxylic acid for tetrazole, for example, so he decided to encode, as SMIRKS, transformations corresponding to medicinal chemistry design rules-of-thumb. Drug Guru could be applicable to the field of patinformatics: in future it could be used to explore chemically feasible space, for example, the white space of kinase patents. ChemAxon software could meet the formerly unmet need to compare large collections of molecules.

The ChemAxon Markush software was rolled out to 10 users in Abbott in the last quarter of 2011. They liked it a lot although they discovered some issues with the data and data mining process. This was a learning exercise because there are differences in the ways humans and machines interpret the data. Nine months later, the newest version of the software is very good and the 10 Abbott users are re-evaluating it and learning more. “ChemAxon Markush” is still a project, not a product, but in 1-2 years the technology could be ready for embedding in a medicinal chemistry project team and delivering measurable impact.

JChem user experience

Structure normalisation in chemical extraction from patents
Andrew Hinton spoke about use of ChemAxon technology in Digital Science’s SureChem products which allow users to search 20 million full text patents, 19 million MEDLINE abstracts and 12 million unique chemical structures hosted on the Amazon cloud. Digital Science’s text miners use proprietary software to extract chemical entities from patent text, followed by multiple name to structure tools, and they use CLiDE image-to-structure software, to obtain structures for registration and search in JChem.

Unfortunately a lot of “noise” is generated in patent mining so a robust structure “normaliser” is required. Digital Science used ChemAxon’s Standardizer, which transforms structures using SMIRKS and normalises aromaticity, charge and hydrogens, and Structure Checker, which detects empty or erroneous atom types. The software is used in image-to-structure and name-to-structure processes, and in storage and searching, to filter and transform. Most “noise” from image-to-structure is removed and fewer duplicate structures are stored once all structures are changed into canonical form. Andrew showed structures mined from a patent where artefacts were removed, wedge bonds were cleaned, structures with irresolvable issues were removed and protecting groups were expanded.

Digital Science found that Standardizer and Structure Checker were straightforward to implement in a Java API and the GUI was easy to use. The products offered most of the functionalities needed and examples of best practice were available (although they were limited). On the other hand, error handling was not ideal, tautomerisation was not suitable for unguided automated use, and documentation and features overlapped between products.

Moving away from ISIS software with ChemAxon’s help
This was the topic of Anna Pelliccioli’s talk. In 2010 the Novartis Institutes for Biomedical Research (NIBR) had ISIS/Draw, Base and Direct and ISIS for Excel. Novartis’ CERES, in ISIS/Base and Host, held Novartis’ historical collection of reactions, the Available Chemicals Database (ACD), a screening sample database, and some commercial databases, and there were some personal databases in ISIS/Base. CambridgeSoft’s ChemOffice and ELN were also used. ISIS has been around at Novartis for a long time, and everyone (1600 users worldwide) knows how to use it, but something more up to date was needed. There was duplication of functionality, ISIS software is no longer supported and the ELN was slow to search. At the end of 2011 NIBR chose the JChem cartridge, hoping eventually to replace the mixture of legacy software. ISIS/Draw was replaced with ChemDraw and ISIS for Excel was replaced with ChemDraw/Excel.

In evaluating IJC as a global replacement for ISIS, NIBR gave IJC to users at three sites and compiled a list of use cases. Local structure and reaction databases were set up and IJC was used as an interface to a large Oracle database (10 million structures) running JChem Cartridge. In October 2011 a decision was made. The chemists liked IJC and found it did fast data import and structure search. Form creation was easier and the connection to Oracle worked well, but NIBR requested some new features from ChemAxon: chemists wanted to export a subset of data from the central database into a local database, they wanted better support to move projects from development to test to production environments, and they wanted greater flexibility when adding data to the existing database. Anna said that the overall experience with JChem was excellent and she praised the great support and responsiveness from ChemAxon. In February 2012 IJC was rolled out, using Java WebStart with UrlWebApp, plus JChem Cartridge, and a customised security module for IJC to integrate with Novartis’ authentication and authorisation infrastructure.

The ISIS to IJC migration is ongoing. The screening sample database, ACD and ChirBase will be moved to IJC. Other databases have been cancelled or await a decision. NIBR is talking to ChemAxon about the future of the historical reaction collection: a later talk in the user meeting described a web application under development. Novartis still has ISIS licences and will continue to use ISIS until it breaks: the problem is not one of technology but persuading people to stop using ISIS. Once ISIS replacement is accomplished, NIBR hopes to integrate its own in-house chemical properties calculators, and to do some custom development around the Data Analysis and Reporting Tool (DART) for the data warehouse, using IJC’s API to access DART’s middleware services.

A chemical tracking system
Melanie Sanderson described the new chemical tracking system at Imperial College. The business driver was the identification of the laboratory supply chain as an area of major risk for higher education institutions in the UK; the college’s Oracle i-Procurement system did not meet the requirements for tracking hazardous chemicals. SciQuest was selected as the obvious supplier and SciQuest Enterprise Reagent Manager (ERM) had been implemented by February 2012. The need for the ChemAxon data cartridge was not identified until December 2011. JChem went live on May 14, 2012 with four groups in the chemistry department; other groups in chemistry will be involved over the next 4-6 months.

One technical challenge was interfacing with other college systems (ERM to i-Procurement & i-Procurement to ERM). There is an ongoing debate about structure drawing tools: JChemPaint (supplied with ERM) is not liked and users want ChemDraw. The late discovery of a requirement for a data cartridge caused delays while more money was raised. The integration of third party products with the ERM (ChemDraw and the ChemAxon data cartridge) was another challenge, as was the hidden configuration: test and production instances turned out to differ and search by common name was not possible at first.

There are also process-related challenges. The processes for ordering and receiving chemicals are different across five campuses. The level of engagement in Chemistry is not mirrored in other departments where there are fewer technicians and the system is seen as an administrative burden. The Chemistry model may not be scalable (or down scalable) to other departments. In retrospect it seems that there was insufficient consultation with other departments on the initial configuration of the system and there is no one business owner. The success of the pilot will now be monitored before the next candidate department is chosen. Other plans are to gain an understanding of the methods of ensuring compliance; to use ERM for inventory control of laboratory supplies and management of radioisotopes; and to implement a new version of the system with enhanced functionality.

JChem: ChemAxon presentations

JChem as a platform
Szabolcs Csepregi updated us on JChem. The ChemAxon modular software suite supports the tasks of modern cheminformatics and chemical communication through a range of industry-standard components. Core capabilities include structure visualisation, search and management, property prediction, virtual synthesis, screening and drug design.

JChem is customisable, feature-rich, efficient and scalable, and accessible with many different interfaces. Multiple chemical file formats and database engines, and all operating systems are supported. ChemAxon’s partner network includes over 30 content providers, consultants, and software vendors who offer compatible software for applications including ELN, LIMS, registration, inventory, data mining analysis and visualisation, and workflow. Evotec are using ChemAxon software for compound registration, chemical information management and compound sourcing and the Dow Chemical Company uses the technology for compound registration, through DeltaSoft’s ChemCart. These case studies are described in the white paper Future-proofing cheminformatics platforms.

Since the last Budapest UGM versions 5.6-5.10 of JChem have been released. Certain searches are 3.8 times faster, JChemBase has duplicate filtering of a stream in a command line and JChem Manager has an improved connection window. There have been numerous improvements to the cartridge, for example, in tracking of errors, in licences and in convenience functions for reading and writing molecules to and from disk. Markush technology has been emphasised recently but Szabolcs reminded us that JChem also handles reactions and polymers. In future, the speed of tautomer search and search in structure tables will be further improved, there will be a computational cluster solution, it will be easier to use JChem Manager, and a JChem index table API will be supported.

JChem for Excel and Reactor
David Deng demonstrated the following features in JChem for Excel 5.9 (and 5.10):

  • the option to use ChemDraw as structure editor
  • preservation of orientation when a structure is entered as SMILES
  • conversion of a structure to an image
  • property calculation and populating a whole column with property values
  • adding a structure and identifier using name, CAS RN or InChI
  • importing multiple structures from an SDfile
  • substructure search to give a spreadsheet with only the hits listed (optionally highlighted)
  • R-group decomposition (hiding or showing structures)
  • the pivot function to get an SAR table and optionally colour cells
  • in version 5.10, copying and pasting a spreadsheet with structures to Word and PowerPoint.
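The pivot step in the list above, which turns R-group decomposition output into an SAR table, can be pictured with a short Python sketch. This is not JChem for Excel's code: the row layout, the `pIC50` column and the function name `pivot_sar` are assumptions chosen for illustration, and the data are invented.

```python
def pivot_sar(rows, value_key="pIC50"):
    """Pivot decomposition rows into a grid: one row per R1, one column per R2.

    Each input row is a dict with "R1", "R2" and a value column.
    Combinations that were never made are left as None.
    """
    r1_values = sorted({row["R1"] for row in rows})
    r2_values = sorted({row["R2"] for row in rows})
    cells = {(row["R1"], row["R2"]): row[value_key] for row in rows}

    table = [["R1 / R2"] + r2_values]  # header row
    for r1 in r1_values:
        table.append([r1] + [cells.get((r1, r2)) for r2 in r2_values])
    return table


# Hypothetical decomposition output for four compounds.
ROWS = [
    {"R1": "F", "R2": "Me", "pIC50": 6.1},
    {"R1": "F", "R2": "Et", "pIC50": 6.8},
    {"R1": "Cl", "R2": "Me", "pIC50": 7.2},
    {"R1": "Cl", "R2": "cPr", "pIC50": 7.9},
]

SAR_TABLE = pivot_sar(ROWS)
```

Empty cells in the resulting grid are often as informative as the filled ones, since they show which R-group combinations have not yet been made; colouring cells by value, as in the demonstration, would be a presentation layer on top of this grid.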

He also demonstrated some features of Reactor:

  • SMIRKS entry and conversion to reactions in JChem for Excel
  • enumeration of products sequentially or combinatorially
  • in Reactor 5.10 (not Excel) combinatorial enumeration of three educts
  • inputting reactivity and selectivity rules using chemical terms (with reaction examples and conditions).
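The difference between sequential and combinatorial enumeration is easy to show in a few lines of Python. The sketch assumes that "sequential" means pairing the i-th member of each educt list and "combinatorial" means taking every combination; the `react` function is a placeholder that only labels a product and stands in for a real reaction engine such as Reactor. The educt names are hypothetical.

```python
import itertools


def react(*educts):
    """Placeholder for a real reaction engine: just labels the product."""
    return "+".join(educts)


def enumerate_sequential(*educt_lists):
    """Pair the i-th educt of every list; stops at the shortest list."""
    return [react(*combo) for combo in zip(*educt_lists)]


def enumerate_combinatorial(*educt_lists):
    """Combine every educt of each list with every educt of the others."""
    return [react(*combo) for combo in itertools.product(*educt_lists)]


# Hypothetical educt lists.
ACIDS = ["acid_1", "acid_2"]
AMINES = ["amine_1", "amine_2", "amine_3"]
ALDEHYDES = ["aldehyde_1", "aldehyde_2"]
```

With two acids and three amines, sequential enumeration gives two products and combinatorial enumeration gives six; adding the two aldehydes as a third educt list raises the combinatorial count to twelve, which illustrates how quickly three-educt libraries grow.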

Reactor can be used in KNIME or Pipeline Pilot to handle multistep reactions. Reactor is also implemented in IJC.

Registration as a service

Richard Bolton of GSK updated us on the new GSK registration system. The background is described in my report on the San Diego UGM. In 2010 a prototype registration system was developed by GSK using ChemAxon components and a potential configuration was developed that would allow ChemAxon the control to manage the software as a Software as a Service (SaaS) product, but not expose GSK intellectual property to unacceptable security concerns. SaaS delivery had the advantages of lowered support and maintenance costs, quicker turnaround of releases, and a simplified payment structure compared with previous options. Liability issues could be avoided if GSK data were protected at GSK sites using GSK security. A new contract was signed in March 2011 for GSK and ChemAxon jointly to develop a new small molecule registration system with ChemAxon supporting and maintaining the system on GSK-owned hardware. The system is now in full production at GSK, with live registration, and the legacy registration system is no more.

The proportion of registrations that are fully automated is currently 50%; this will be increased in future. Scientists, not registrars, are soon to own the quality of registered structures, so interactive registration checking tools and interactive feedback will be provided to avoid discrepancies or errors. Scientists will be allowed to correct mistakes in their own structures, but business process change must come first.

Ákos Papp of ChemAxon described the features of the system. Compounds from the ELN, the web registration client and the bulk loader (which maps SDfile fields to database fields) are automatically registered. A poller web service submits any pending data to the registry service. Single and multicomponent compounds and their versions can be registered. Salts and solvates are automatically split (each parent can have multiple salts or solvates) and a salt/solvate dictionary is maintained. Standardizer and Structure Checker do automatic validation and fixing.
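Salt and solvate splitting can be pictured with a small Python sketch. It relies on the standard SMILES convention that disconnected components are separated by a dot; everything else is an assumption for illustration, including the dictionary entries and the function name, and it is not the logic of the GSK system. A real registry would compare canonicalised structures rather than raw strings.

```python
# Hypothetical salt/solvate dictionary: component SMILES -> name.
SALT_SOLVATE_DICTIONARY = {
    "Cl": "hydrochloride",
    "Br": "hydrobromide",
    "O": "hydrate",
    "[Na+]": "sodium salt",
}


def split_components(smiles):
    """Split a dot-separated SMILES into parent fragments and known salts/solvates.

    Fragments found in the dictionary are reported by name; anything else
    is treated as (part of) the parent. Matching is by exact string, which
    only works if fragments are written the same way as the dictionary keys.
    """
    parents, salts_and_solvates = [], []
    for fragment in smiles.split("."):
        if fragment in SALT_SOLVATE_DICTIONARY:
            salts_and_solvates.append(SALT_SOLVATE_DICTIONARY[fragment])
        else:
            parents.append(fragment)
    return parents, salts_and_solvates
```

For example, `split_components("NCCc1ccccc1.Cl.O")` separates the phenethylamine parent from a hydrochloride and a hydrate, so that the same parent can be linked to several salt or solvate forms.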

Ákos showed a shot of the “MyStaging” screen where status and details are tabulated and searchable, and items are assigned to registrars. Other tabs are “AllSubmissions” and “Unassigned”. On a submission correction page structures and data can be manually corrected, using system, quality and stereo checkers and fixers; batch registration is possible using the fixers. The match service compares the record to be registered with records already in the database. Exact, “metadata”, 2D, tautomer, and 2D plus tautomer matches can be found. (Note that “metadata” in the ChemAxon sense means a data field with additional text that is considered when deciding on the uniqueness of a compound; this is not to be confused with the term “metadata” as used by open data and semantic web experts.) Mock registration can be carried out to find possible hits. The match list offers unique, replace, and accept options. Registered compounds can be amended at parent, version or lot level. An audit history of initial registrations, amendments, deletions and undeletions is maintained. In future, all the business logic will be made easily configurable. Chemists will be given live feedback about quality with a report on the status of submissions and will be able to fix submissions without the intervention of the registrar.
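One plausible reading of the match levels is sketched below in Python. It is an interpretation, not the documented behaviour of the match service: the four structure keys are assumed to be precomputed by a chemistry toolkit, the order in which levels are tested is a guess, and treating a “metadata” match as "same structure, different metadata text" is an assumption. All names (`Record`, `classify_match`, `find_matches`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    full_key: str         # stereo-aware structure key (assumed precomputed)
    key_2d: str           # same structure with stereo information removed
    tautomer_key: str     # stereo-aware key of a canonical tautomer
    tautomer_2d_key: str  # canonical tautomer with stereo removed
    metadata: str = ""    # extra text that affects uniqueness


def classify_match(candidate, registered):
    """Return the strongest match level between two records, or None."""
    if candidate.full_key == registered.full_key:
        # Assumption: identical structures that differ only in metadata text
        # are reported separately from fully identical records.
        return "exact" if candidate.metadata == registered.metadata else "metadata"
    if candidate.key_2d == registered.key_2d:
        return "2D"
    if candidate.tautomer_key == registered.tautomer_key:
        return "tautomer"
    if candidate.tautomer_2d_key == registered.tautomer_2d_key:
        return "2D plus tautomer"
    return None


def find_matches(candidate, registry):
    """Mock registration: list (index, level) for every matching record."""
    matches = []
    for index, registered in enumerate(registry):
        level = classify_match(candidate, registered)
        if level is not None:
            matches.append((index, level))
    return matches
```

A match list built this way is what a registrar would then resolve with the unique, replace, or accept options described above.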

JChem for SharePoint

According to Microsoft, SharePoint 2010 makes it easier for people to work together: people can set up websites to share information with others, manage documents from start to finish, and publish reports to help everyone make better decisions. Tamás Pelcz of ChemAxon said that SharePoint is present in almost all large companies and is getting popular with small and medium-sized companies. It is becoming the standard corporate intranet portal and will soon probably be as commonplace as Microsoft Office on the desktop. ChemAxon’s extensions in JChem for SharePoint will make the Microsoft product even more attractive for pharmaceutical companies.

Concepts in SharePoint are:

  • Sites (a collection of pages, lists, and libraries)
  • Lists (e.g., blogs, discussion boards and custom lists, collecting pieces of information)
  • Libraries (lists of items which refer to files stored in SharePoint)
  • Web Parts (Filters, Lists and other sections that can be inserted into Pages), and
  • Pages (Wiki pages, Web Part pages, and Publishing pages).

SharePoint allows certain extension points, for example, Web Parts and Custom List Fields. The extensions in JChem for SharePoint are a set of ChemAxon components not related to Microsoft SharePoint Search, for example, handling structures on SharePoint Sites. Some of these component building blocks could be used separately. New in JChem for SharePoint are CDX and SKC support, and the linked file structure field. Import from File has been enhanced.

JChem for SharePoint Search has extensions for searching structures on SharePoint Sites, recognising chemical names in SharePoint document libraries, searching chemical structures in SharePoint document libraries, combining chemical structure searches with text search, and structured visualisation of search results in document and structure view. Performance and indexing have been recently improved. Both JChem for SharePoint and JChem for SharePoint Search now have a Service Application based architecture and robust MSI based deployment.

There are plans for further external data integration and using SharePoint as an application platform. Besides out of the box SharePoint technologies such as Business Connectivity Services and InfoPath, several other custom approaches will be examined. JChem for SharePoint Search will have corporate ID search and precise hit location. Image to Structure and OCR will be integrated. FAST Search and external systems such as Documentum and OpenText will be supported.

Many companies have implemented SharePoint but are having trouble making it popular with users; if there is no useful content in SharePoint researchers see no point in using SharePoint instead of other applications. To be able to search for structures and text in existing files is a good starting point for increasing user adoption among researchers. SharePoint is also an ideal choice for publishing and collaborating on new synthesis ideas. Import of chemical files makes it easy to populate SharePoint with content from other applications.

Ian Berry described the introduction of JChem for SharePoint at Evotec (View the Presentation). While SharePoint has many useful features it does present some problems, for example, handling scientific files, such as Pymol, and painful installation of add-ons. Governance can be a full-time job and user acceptance is affected by the fact that file shares are easier. Nevertheless, Evotec launched SharePoint in April 2011, with management backing, across all sites, as the new company intranet, to replace an intranet that was hardly used. A SharePoint developer (actually 1.2 FTE) was available to support “EVONET” across the organisation; this support was very necessary. One year on, there has been significant buy-in from one site and some buy-in from other departments (but not enough). Some people “get it” and drive others to take up EVONET.

Ian reported that the software has a fairly good installer process but developers were on hand to help with problems. It was necessary to install a Marvin server. Without the continued support of the ChemAxon team, the installation of JChem for SharePoint would have been challenging. It is still installed only on a test environment at present; a Document-to-Structure licence needs to be acquired soon. Potential applications are project work plans, a project ideas “database”, discussions around problematic synthetic routes, and requesting systems for chemistry-related services. JChem for SharePoint Search can be used for searching through documents and reports containing JChem, ISIS or ChemDraw embedded objects.

Ian concluded that JChem for SharePoint and JChem for SharePoint Search are nearly ready for the enterprise; project leaders who have seen these applications are enthused by the possibilities. The main challenge is that you will need a lot of support from ChemAxon to get a system set up, but, said Ian, once again ChemAxon has shown itself to be an innovative supplier of cheminformatics tools backed up by a superior support network. In his own words: “ChemAxon support is the best in the industry.”

Instant JChem

User presentations
EpiTherapeutics is a Copenhagen-based biopharmaceutical start-up that is using IJC as an integrated tool in the discovery process. The company required software that was easy to use and needed no in-house expertise, was compatible with other software, scalable, reliable, proven and versatile. These requirements influenced the choice of MySQL and IJC. Thomas Boesen briefly described the structure, batch, project, primary assays, secondary assays, and Pivot SAR tables of the database he implemented. Scientists can easily upload and access data, and quality control of data is easy, but from the administrator point of view pivoting requires MySQL programming which is time consuming and complex and introduces performance issues. Possible developments in future may include the addition of data reporting and export options. (View the Presentation)

Stephen Swanson of GSK, a project leader in the respiratory area, led GSK’s SAR tools replacement project. The object was to replace a chemistry desktop suite of applications with Helium in Excel/Spotfire and IJC. The IT-business partnership was critical to the success of this high change programme; it was driven by a core user group (a business lead and seven scientists meeting weekly with the IT team) and an extended user group of about 40 scientists. Scientists had enthusiastically adopted Helium for Excel/Spotfire by 2010 and did not complain about the loss of legacy applications but the switch from Spotfire Decision Site to TIBCO Spotfire was less successful: customisations were lost and many Decision Site users found it difficult to adjust. So, for the support model of the SAR tools replacement project two new user roles, business system owner and business expert users, were instituted.

It was recognised that the change from ISIS to IJC would be hard to sell to scientists and the GSK federation of ISIS/Host databases would be hard to replace: many processes evolved around ISIS functionality, the structure search systems were well liked, and ISIS Hviews had been tuned for optimal performance, although performance appeared to be better in the UK than in the US.

A small group of UK users first tested IJC and found that structure searches were slower than in ISIS but adequate; functionality was fairly intuitive and “similar to ISIS but different”. The experience in the US was much worse: there was a significant drop in search performance, and performance with Hviews converted to IJC Projects was unacceptable. (The GSK biological database is located in the UK.) The solution was access to IJC on Citrix servers but users were still frustrated, and the high priorities are clearly to remove reliance on Citrix and to improve overall performance. Conversion of ISIS Hviews to IJC Projects has been carried out by the IT team: 184 have been converted and 203 have not, although the underlying data are available via Helium. The “owner” responsible for the future customisation and maintenance of IJC is now a scientist, not an IT person.

A number of functionality gaps and issues remain. Substructure searching is still very slow for more complex queries and large hit lists. A workaround for slowness of domain searching (carrying out a search across a subset of the database, such as a previous hit set) is “acceptable” to key users, but is still significantly longer than the ISIS process and the query builder is non-intuitive. Additional IJC requirements have been identified for exporting SAR data, though Helium provides an alternative approach for creating SAR tables. Effective printing of IJC project forms needs addressing. Last but not least, sorting is an issue: it could be that the architecture makes effective sorting (e.g., on biological activity) unachievable. Project owner “pain points” include working with development and production environments and creating links to biological data tables but the owners are very grateful for the creation of the IJC admin website.

Stephen made some final comments about structure editors. An earlier project had seen ISIS/Draw replaced by ChemDraw, with the goal of enforcing a single drawing package, since ChemDraw is also used in the ELN and is incorporated into Helium. The desire to consolidate chemical drawing tools was driven by IT support considerations, not by chemists. When IJC ChemDraw integration was enabled issues were identified with substructure searches using ChemDraw structures, so IJC was launched with Marvin as the default drawing tool until the issues were resolved, but users now continue to use Marvin and they like it for querying. Thus two editors are still being used. (View the Presentation)

IJC ChemAxon talks
Since May 2011 IJC has moved from version 5.6 to version 5.10. Large companies such as GSK and Novartis (and another that cannot be named) are now deploying IJC, said Petr Hamernik of ChemAxon, so ChemAxon has concentrated on issues such as performance, extensibility, visualisation and general usability. Schema initialisation and browsing of relational data are functions for which performance has been improved. In terms of extensibility, Spotfire has been integrated, and scripting support, the API and documentation have been improved. The new BoxPlot widget and multiple widget customisation have improved visualisation. Cherry picking is a much easier approach to building up lists. R-group decomposition was new in version 5.6. Printing has been improved. Plans for version 5.11 include an outline widget (a new form widget for displaying data grouped by categories in a hierarchical manner, in effect a combination of a tree and a table), an IJC Server and web client, IJC as a platform, and continual improvements in performance.

Erin Bolstad (who now works for ChemAxon) presented a paper on IJC as a platform (“co-authored” by Tim Dudgeon). There are two ways of extending the basic IJC platform: with Java extension modules or Groovy scripting. Much of IJC’s functionality is done with Java extension modules which are very powerful, but complex, and redeployment is necessary if the extensions change. Groovy scripting was introduced with IJC 5.4 in January 2011. It is simpler and faster than the other method, but less powerful. Runtime deployment and execution is an advantage. Schema and data tree scripts are good for building or modifying databases and for custom import and export. Form scripting is good for adding interactivity to forms and for extending forms.

Erin showed how scripting is used in the GSK registration and inventory systems and in the Thomson Reuters Fetch Patent button. She also described a “drill down” button in WOMBAT that displayed all fields from a table for only the selected rows, and pivoted the data. This effectively allowed the user to drill down through a simplified table to provide all the data for comparison. A prime example is a form that displays only the average value for each row of assay data: the user selects several rows and a pop-up shows, in a pivoted fashion, all the values that went into each data point. In the future Erin envisages “hooks everywhere”. Scripts can become big and cumbersome, so another future direction is libraries. There will be IJC “MiniApps” for clustering, screening, interfaces with external data such as ChEMBL and ZINC, statistics and analysis, structure cleaning, analogue and library design, simple registration and inventory, and intellectual property protection.
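The drill-down idea can be seen in miniature outside IJC. The following is only a minimal Python sketch, not IJC Groovy scripting and not the WOMBAT button itself: the compound identifiers, assay names and values are invented, and `summary_table` and `drill_down` are hypothetical helper names. It shows a simplified view holding one average per compound and assay, and a pivoted detail view returning every value behind the averages for the selected rows.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw assay measurements: (compound_id, assay, value).
# Names and numbers are invented for illustration only.
MEASUREMENTS = [
    ("CMPD-1", "IC50_assayA", 12.0),
    ("CMPD-1", "IC50_assayA", 18.0),
    ("CMPD-1", "IC50_assayB", 250.0),
    ("CMPD-2", "IC50_assayA", 5.0),
    ("CMPD-2", "IC50_assayB", 90.0),
    ("CMPD-2", "IC50_assayB", 110.0),
]


def summary_table(rows):
    """Simplified view: one averaged value per (compound, assay)."""
    grouped = defaultdict(list)
    for compound, assay, value in rows:
        grouped[(compound, assay)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


def drill_down(rows, selected_compounds):
    """Pivoted detail view: compound -> assay -> all individual values."""
    pivot = defaultdict(lambda: defaultdict(list))
    for compound, assay, value in rows:
        if compound in selected_compounds:
            pivot[compound][assay].append(value)
    # Convert to plain dicts so the result is easy to print and compare.
    return {c: dict(assays) for c, assays in pivot.items()}


if __name__ == "__main__":
    print(summary_table(MEASUREMENTS))
    print(drill_down(MEASUREMENTS, {"CMPD-1"}))
```

In a real form script the selection would come from the rows highlighted by the user rather than from a hard-coded set.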

The future of Marvin

Eufrozina (“Efi”) Hoffmann of ChemAxon explained that Marvin web applet technology is not adequate for the era of tablets, touch screen devices and new developments such as HTML5. As an effective web application, a lightweight chemical editor is needed that downloads fast, runs on both PCs and touch screen devices, and has the stable chemical background of Marvin. A JavaScript implementation appeals because JavaScript is the leading web application technology, it is platform independent, it can run without a browser plugin, it is lightweight (although there is frequent server-client communication) and original Java code can be reused.

Efi outlined five milestones for Marvin-JS. The first addresses basic drawing, editing and display features, abbreviated groups and S-groups, atom properties, basic display of biomolecules, import and export of MRV files on the client side and other file formats supported by Marvin on the server side, and basic server side calculations (e.g., cleaning, name import and checking). Milestone 2 includes optimising the application for touch screen devices: introducing multi-touch gesture handling, and special touch events. Next come 3D display, advanced templates, and user defined templates and other calculators on the server side. Milestone 4 delivers reaction drawing, reaction query features, and the text box, and the final milestone encompasses all query features, complex S-groups, and graphical objects. The planned Milestones 3-5 will be revised according to user needs. The JavaScript API will be available. Marvin-JS will be a part of the Marvin package and the other parts will be developed and supported in the future as well.

After describing these future plans, Efi highlighted some recent developments in Marvin. Marvin 5.7 introduced the Align&Distribute function, the curved chain tool, and hashed bond as a new bond type. Marvin 5.8 uses native PDF format as the default image pasteboard format on Mac systems. Marvin 5.9 allows electron flow arrows to be drawn in a reaction. Marvin 5.10 allows disconnected molecules to be exported separately, the abbreviated group list is easily extensible, there are new graphical arrows, and peptides are displayed with one-letter amino acid codes.

Naming technologies and text mining

Introduction
ChemAxon’s naming technologies Structure to Name (S2N) and Name to Structure (N2S) are mature products but David Deng demonstrated some of their features before showing us Document to Structure, which now works with scanned PDFs and returns structures and their locations in the document. Document to Structure uses ChemAxon naming technologies and OSRA (Optical Structure Recognition Application) for images. OCR and syntax correction are being continually improved. New features in Version 5.9 included support for Microsoft Office documents and embedded structure objects. Results are progressively displayed. IJC was integrated, the API was simplified, and speed was improved. Version 5.10 has image-to-structure “confidence scores”: structures below a certain threshold will not be output. Therefore, “nonsense” structures from OSRA will not be generated, so Structure Checker will not be needed for post processing. Also in version 5.10, fragment groups are integrated with Markush generation, and biological names such as PDB codes and EC numbers are handled. Chemicalize.org is a free showcase for the technology: it “chemicalises” web pages. All structures are indexed and can be found through structure and keyword search on the site.

User talks
Andrew Hinton’s talk on structure normalisation earlier in the meeting is also of relevance to this section. Lutz Weber discussed OntoChem’s data and knowledge extraction technology OCMiner. Chemical ontologies are useful even for users with chemical expertise. For example, they allow high-level queries such as “What diseases can be treated with terpenes?” Current ontologies such as PubMed’s and ChEBI (Chemical Entities of Biological Interest) are manually created, leading to redundant, missing and wrong relationships and problems with homonyms in the hierarchy path. A common understanding of terms such as “backbone”, “superstructure” and “tautomer” is needed. Lutz wanted to create an ontology editor for chemistry that allows him to classify chemical compounds into structure classes, automatically assign compound classes to compounds in structure files or databases, and perform quality checks. It should also be optimised for recognising compounds and compound classes in text, e.g., propane, propanes, propane derivatives and propyl substituent.

The cycloalkane compound class must be distinguished from the cycloalkane derivatives class (where the compound contains a cycloalkane ring but is not a cycloalkane). Class type distinguishes pregnanes from androstanes, which are steroids, within the hierarchy organic compounds, lipids, prenol lipids, terpenes and triterpenes. In OntoChem’s ontology, compounds are described with SMILES and classes with SMARTS. If there is no defined substructure (e.g., “lipids”) SMARTS for children can be used. The ontology editor, SODIAC, released in 2012, is integrated with ChemAxon software. OCMiner can be tried out on the web. (View the Presentation)

Alexander Klenner of the Fraunhofer Institute for Algorithms and Scientific Computing (Fraunhofer SCAI) presented a grid-based solution for chemical named entity recognition (NER) in full text patent PDFs. In the pre-processing stage images and text are separated using chemoCR and OCR (Optical Character Recognition) is performed using tesseract-ocr from Google. Several tools are available for chemical NER processing and patent zoning. The chemical names identified and extracted are translated into InChIKeys that can be used to generate structures with Marvin. All structures are finally stamped into the original PDF as “pop-ups” with links to ChemSpider and PubMed. All retrieved chemicals are also stored in an IJC database with a reference to the original patent. The workflow is based on UIMA (Unstructured Information Management Architecture) and can be adapted to incorporate different chemical NER tools. UNICORE (Uniform Interface to Computing Resources) is used to access grid resources for efficient parallelisation of all processes.

Chris Southan (ChrisDS Consulting) has been using chemicalize.org with other open resources to extract SAR from patents and explore overlap or similarities in PubChem. He concluded that chemicalize.org is a powerful, flexible and free tool that significantly enables small-scale “roll-your-own” mining of patents and journal articles and abstracts. A certain amount of cunning is still needed to discern SAR details. Chemicalize.org is complementary to commercial patent databases populated by manual extraction: it can extract more structures. Commercial automated patent extraction databases typically combine ChemAxon’s N2S with other algorithms, so they out-perform chemicalize.org, but chemicalize.org is still very useful for intersecting journal articles, or other sources, and databases.

Significant novel content (compared with public databases) is accumulating through “default crowdsourcing” in the chemicalize.org archive, which is becoming an important cross-check source. “Walking” among documents is also enabled. The combination of OPSIN (Open Parser for Systematic IUPAC Nomenclature), OSRA and chemicalize.org allows structures from most sources to be extracted. (Note that Document to Structure does use OSRA but at the moment chemicalize.org does not.) Chris’s take-home message was that synergies of chemicalize.org with sources such as PubChem, PubMed Central, ChEMBL and SureChemOpen will advance academic drug discovery and chemical biology. (View the Presentation)

ChemAxon professional services

ChemAxon has created a new informatics services team (currently Erin Bolstad in the US and Tim Dudgeon and Daniel Butler in Europe) which can also draw on the expertise of external consultants and ChemAxon application scientists and product teams. The relevant product areas at the moment are likely to be IJC, the JChem cartridge and Markush, but soon chemical registration and SharePoint may be included. The two main areas of work anticipated are migration support and application development. The team wants to understand the customer’s needs, work out with the customer how to solve the problem, solve it efficiently, transfer the knowledge and ensure satisfaction. “Consultancy toolkits” created along the way could well become products in the fullness of time.

Three examples of migration assistance already exist: ChemAxon is helping GSK to convert ISIS forms to IJC for hundreds of projects, is helping another company to move a reporting solution to IJC and the JChem cartridge, and is soon to start assisting Novartis in migrating hundreds of local ISIS databases to IJC and Oracle. In application development ChemAxon is helping EpiTherapeutics to build a compound and assay database in IJC and MySQL. For Novartis a reactions data warehouse is being built by importing data from various ISIS legacy databases (see below). It will have a live feed from the CambridgeSoft ELN. This is an AJAX-style web application providing query and filtering capabilities with a web services interface for fitting into SOA. ChemAxon is listening and invites other consultancy business!


Reactions

Greg Landrum talked about Novartis’ new reaction database application. He first referred to the move away from ISIS as described by Anna Pelliccioli earlier in the meeting. Currently, literature reactions and Novartis reactions before the implementation of the CambridgeSoft ELN are stored in an ISIS/Host database searchable from ISIS/Base. ELN reactions are stored in the CambridgeSoft schema, searchable, but only very slowly, from the ELN. A new system was needed to allow NIBR scientists to search both sets of reactions quickly and efficiently, in one place. This was to have a web-based user interface and to use web services for searching and retrieving the data. There is a serious effort underway in Novartis to expose every computation, if possible, via web service.

User stories were collected within NIBR, informed by Pistoia Alliance questions together with experience with the existing system CERES and various commercial tools. Requirements and design were done collaboratively with ChemAxon and implementation was by ChemAxon. The aim was for queries to retrieve results based on exact structure and substructure searches of reactants and/or products, retrieve additional information for a particular reaction, query across multiple data sources, and limit results by yield, reagent, solvent, etc. The multiple different collections of historic data were so heterogeneous that it was decided to bring over only the pieces of information that were actually needed for searching and browsing and to present additional data using a “details” page that could be different for each original source.

The architecture is fairly standard: a web browser with JavaScript on top of a Tomcat/Wicket web application, accessing an Oracle/JChem database through a Tomcat data service layer. The database is updated automatically from the Oracle-based ELN. The interface is in HTML5 and supports a range of browsers. The system supports Novartis’ internal authentication and authorisation framework and uses REST-style web services. A beta version of the web interface and services is complete; 260,000 historic reactions have been loaded; and chemists are starting to test the application. The conversion of about 600,000 reactions from the ELN to JChem is being planned. Greg showed some screen shots of the application, which looks like all other NIBR applications, with a search pane, a results pane and a filter pane. (View the Presentation)

The next user talk was from Jonathan Davies of IDBS. IDBS has called upon the ChemAxon Reactor toolkits to assist in building a solution for chemists running parallel synthesis experiments. The impetus for the development was BASF’s requirement for extra features after adopting IDBS’ E-WorkBook ChemBook. BASF wanted the same features for small library syntheses as they had for single syntheses. InforSense (now part of IDBS) had experience of working with ChemAxon, although times have changed and mashups are now the way ahead. IDBS likes the ChemAxon kit and thought the company could work with ChemAxon. The IDBS-ChemAxon-BASF partnership turned out to be a win-win-win situation.

The solution supports enumeration of reactions and products plus entry of associated stoichiometry parameters. It is available as part of IDBS’s E-WorkBook ChemBook product so that scientists can enumerate reaction products and capture all of the context and management and sign-off information associated with the experiment in a single application. Feedback has been very positive although Jonathan reported just a few problems. IDBS works in an ISO 9001 and GxP environment, which adds overheads. The frequency of new releases of ChemAxon software is also a problem for IDBS and IDBS was not always sure which bits of the bundled ChemAxon software were needed. All in all, the experience has been positive and now that Markush searching is becoming important, there is an opportunity for further collaboration. (View the Presentation)
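To give a flavour of what enumeration for a small parallel synthesis involves, here is a minimal Python sketch. It is not the IDBS solution and does not call Reactor: the building block labels, the default equivalents and the `enumerate_library` helper are all hypothetical, and the "product" is only a text label standing in for the structure that a real reaction engine would generate.

```python
from itertools import product

# Hypothetical building blocks for a two-component coupling.
# The labels are placeholders, not real structures or real Reactor input.
ACIDS = ["acid_A", "acid_B"]
AMINES = ["amine_1", "amine_2", "amine_3"]


def enumerate_library(acids, amines, acid_equiv=1.0, amine_equiv=1.2):
    """Return one record per acid/amine combination with its stoichiometry."""
    library = []
    for well, (acid, amine) in enumerate(product(acids, amines), start=1):
        library.append({
            "well": well,
            # Equivalents of each reactant (assumed values, for illustration).
            "reactants": {acid: acid_equiv, amine: amine_equiv},
            # Placeholder label; a reaction engine would supply the structure.
            "product": f"{acid}+{amine}",
        })
    return library


if __name__ == "__main__":
    for record in enumerate_library(ACIDS, AMINES):
        print(record)
```

Two acids and three amines give six library members, each carrying its own stoichiometry record, which is the kind of per-experiment context the notebook then has to capture.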

Some science

The UGM always includes a few genuine computational chemistry talks (which tend to be relegated to the final afternoon!). This time there were three volunteers. Firstly Dragos Horvath reported some work done with his colleagues at the University of Strasbourg on the use of self-organising maps (SOMs) to accelerate similarity search. I look forward to hearing Dragos talk, and not only because of the quality of the work done by this team. With a deadpan delivery Dragos raises laugh after laugh from the audience: very much in contrast to his rather serious and dense prose style.

The Strasbourg team has used the ChemAxon API to develop high-quality, high information content, pH sensitive and otherwise chemically meaningful descriptors such as fuzzy pharmacophore fingerprints (J. Chem. Inf. Model. 2006, 46, 2457-77; J. Chem. Inf. Model. 2008, 48, 409-425) and ISIDA coloured fragment counts (Curr. Comput.-Aided Drug Des. 2008, 4(3), 191-198). It is time to exploit them in similarity-driven virtual screening but these fingerprints are neither binary, nor really short. One solution is to use SOMs to map molecules and search a neighbourhood. The query is compared only to neighbouring references, where a neuron “radius” is defined to establish the neighbourhood.

Dragos concluded that maps should be trained on relatively small (but diverse) compound sets: too many input molecules just make convergence more difficult. It was good to find that there is no need to retrain the maps when expanding the database. Too much fitting of the code vectors may be detrimental: interestingly, unsupervised learning methods may suffer from overfitting too. SOM acceleration works in both Tanimoto and Euclidean spaces, and with different sets of descriptors: you get about 90% of the expected virtual hits in 10% of the time. Losing a few virtual hits is not really a problem but should it become an issue, a larger neuron radius can be chosen.
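The neighbourhood search can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Strasbourg implementation: the map is taken as already trained (the 3x3 `CODEBOOK` is invented), descriptors are two-dimensional toy vectors, similarity is Euclidean distance, and the neuron "radius" is interpreted here as a Chebyshev distance on the grid. The function names are hypothetical.

```python
import math

# Hypothetical trained 3x3 map: grid position -> code vector.
CODEBOOK = {(r, c): (5.0 * r, 5.0 * c) for r in range(3) for c in range(3)}

# Hypothetical reference compounds with toy two-dimensional descriptors.
REFERENCES = {
    "ref_a": (0.5, 0.5),
    "ref_b": (1.0, 4.0),
    "ref_c": (9.5, 9.0),
    "ref_d": (5.2, 4.9),
}


def best_matching_unit(vector, codebook):
    """Return the grid position whose code vector is closest to `vector`."""
    return min(codebook, key=lambda pos: math.dist(vector, codebook[pos]))


def build_index(references, codebook):
    """Assign every reference compound to its best matching neuron."""
    index = {}
    for name, vector in references.items():
        index.setdefault(best_matching_unit(vector, codebook), []).append(name)
    return index


def som_search(query, references, codebook, index, radius, max_distance):
    """Compare the query only to references in neurons near its own neuron."""
    q_row, q_col = best_matching_unit(query, codebook)
    hits = []
    for (row, col), names in index.items():
        # Chebyshev distance on the grid defines the neighbourhood (assumed).
        if max(abs(row - q_row), abs(col - q_col)) > radius:
            continue
        for name in names:
            distance = math.dist(query, references[name])
            if distance <= max_distance:
                hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    index = build_index(REFERENCES, CODEBOOK)
    for radius in (0, 1):
        print(radius, som_search((0.8, 1.0), REFERENCES, CODEBOOK, index, radius, 3.5))
```

With a radius of 0 the toy query is compared to a single neuron and one of the two references within the distance limit is missed; a radius of 1 recovers it at the cost of more comparisons, which is the trade-off Dragos described.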

The next speaker was Ágnes Peragovics of Eötvös Loránd University, who talked about Drug Profile Matching (DPM), an affinity fingerprinting method capable of predicting the complete effect profiles of small molecules based on their interaction patterns which are generated by flexible docking to a series of rigidly handled non-target protein active sites. DPM was found to classify molecules excellently. Ágnes also compared the performance of DPM with 2D and 3D similarity fingerprinting approaches using ChemAxon JChem Base, Screen and Calculator Plugins. Drug classification was carried out for a two-level hierarchical effect database using a set of about 1200 FDA-approved small molecule drugs. To get a more realistic view of the feasibility domain of DPM, its predictive power was also tested on external data. Ágnes found that 2D and 3D similarity fingerprinting of rigid structural categories had a similarly high predictive power to DPM, but for many effects DPM was able to overcome the common screening problems of 2D and 3D similarity searches arising from the presence of structurally diverse molecules.

György Dormán of TargetEx reported on virtual screening of potential PDE5 inhibitors combining 2D and 3D similarity search (Mol. Div. 2012, 16(1), 59-72). 2D fingerprint-based similarity search is commonly used to select focused libraries from multi-million compound repositories based on the structure of known biologically active compounds. Even though the hit rate often reaches 2-5%, further improvement is required. One of the standard methods is to filter the similarity search results using calculated physicochemical parameters. Another ligand-based approach is to determine the shape/flexibility similarities of the 2D search results to the known actives; such values can be used for ranking and filtering the 2D similarity results. The tool used for this, ChemAxon’s Screen3D, allows fully flexible matching and can also align flexible database compounds to rigid query molecules using on-the-fly generated conformation ensembles. Two types of 3D similarity screening methods are available in Screen3D: “Shape” and “Match”. While “Shape” evaluates full molecular shape similarity measures that can model ligand binding to a protein’s active site, “Match” aims to explore key atoms in ligand binding while the actual shape of the molecule is not considered.

György’s team applied the Match function in analysing a previous phosphodiesterase 5 (PDE5) 2D similarity search and screening campaign in order to identify the 3D similarity threshold. For 2D similarity search using IJC Screen they set the similarity threshold to a Tanimoto coefficient of 0.65. For the first screening round they identified a cut-off value of 0.35 for Screen3D, while for the second screening round, which used close analogues, a value of 0.5 was determined. (2D similarity values had a Tanimoto coefficient of over 0.8.) In order to validate the results they applied the 0.35 cut-off to refine a completely new 2D similarity search generating PDE4 focused libraries. In future, further refinement and investigation for a more accurate combination score will be carried out.
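The two-stage filtering can be sketched as follows. This is a minimal Python illustration, not the TargetEx workflow and not a call into Screen3D: fingerprints are represented as sets of "on" bit positions, the 3D scores are assumed to be precomputed with higher values meaning greater similarity, the thresholds are treated as inclusive, and the candidate data and function names are invented.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def two_stage_filter(query_fp, candidates, t2d=0.65, t3d=0.35):
    """Keep candidates passing the 2D threshold and then the 3D cut-off.

    `candidates` maps a name to (fingerprint_bits, score_3d). The 3D score
    is assumed to come from a separate 3D similarity tool.
    """
    kept = []
    for name, (bits, score_3d) in candidates.items():
        if tanimoto(query_fp, bits) >= t2d and score_3d >= t3d:
            kept.append(name)
    return kept


# Hypothetical query and candidates, for illustration only.
QUERY = {1, 2, 3, 4, 5, 6}
CANDIDATES = {
    "cand_1": ({1, 2, 3, 4, 5, 7}, 0.40),     # passes both stages
    "cand_2": ({1, 2, 3, 4, 5, 6, 8}, 0.20),  # fails the 3D cut-off
    "cand_3": ({1, 2, 9, 10}, 0.90),          # fails the 2D threshold
}

if __name__ == "__main__":
    print(two_stage_filter(QUERY, CANDIDATES))
```

The default values mirror the 0.65 and 0.35 figures quoted in the talk; raising `t3d` to 0.5 would correspond to the stricter cut-off used for the close analogues.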


Calculators

Csaba Fábri presented ChemAxon’s new NMR Predictor (in Marvin 5.10) which can predict 13C and 1H NMR spectra for standard organic molecules containing the most frequent atoms (H, C, N, O, F, Cl, Br, I, P, S, Si). It has an easy to use graphical user interface, can compare predicted and experimental NMR spectra, is ready for the web, and can be used via a MarvinSketch applet. The physicochemical and topological descriptors for the description of atomic environments were reported by Meiler et al. in J. Chem. Inf. Comput. Sci. 2000, 40, 1169-1176. Model fitting is with multilinear least-squares regression (MLR) and support vector machine (SVM). The 13C NMR chemical shift model has 9 categories based on hybridisation and the number of attached protons. The 1H NMR chemical shift model has 2 categories: C-H protons and heteroatomic protons. Training and test data were obtained from the NMRShiftDB database.

Csaba did a demonstration. Multiplets and shifts in a table are dynamically linked to lines in the spectrum. The units (ppm or Hz) are interchangeable. There are vertical and horizontal zooming features and a neat help function. PDF, JCAMP-DX files and molfiles can be exported; spectra are imported as JCAMP-DX files. In future H-F and C-F coupling constants will be calculated, diastereotopic protons will be handled, more NMR data formats will be supported, the chemical shift training set will be extended to get a more accurate and versatile model than the current one (trained only with NMRShiftDB), additional fitting algorithms besides MLR and SVM will be considered, and chemicalize.org will include NMR prediction. ChemAxon is also working on a solubility predictor, a beta version of which will appear in the 5.11 ChemAxon software release.
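The ppm/Hz switch in the demonstration rests on a simple relationship: a shift in ppm multiplied by the spectrometer frequency in MHz gives the offset in Hz. The following Python sketch shows the arithmetic only; it is not NMR Predictor code, and the function names and the 400 MHz example frequency are assumptions made for illustration.

```python
def ppm_to_hz(shift_ppm, spectrometer_mhz):
    """Convert a chemical shift in ppm to a frequency offset in Hz."""
    return shift_ppm * spectrometer_mhz


def hz_to_ppm(offset_hz, spectrometer_mhz):
    """Convert a frequency offset in Hz back to a chemical shift in ppm."""
    return offset_hz / spectrometer_mhz


if __name__ == "__main__":
    # Example with an assumed 400 MHz spectrometer.
    print(ppm_to_hz(2.0, 400.0))
    print(hz_to_ppm(800.0, 400.0))
```

Because the Hz value depends on the spectrometer frequency while the ppm value does not, predicted and experimental spectra are normally compared on the ppm scale.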

Over the years, Evotec has built up a library of custom calculators. Bob Marmon showed how, with the new services interface built into Marvin, Evotec is now easily able to integrate these across the company’s own applications as well as ChemAxon’s. Cxcalc can be used to access the custom calculators; they can be mixed with ChemAxon calculators, and results can be output to SDfiles. Chemical Terms calculations can also access the custom calculators from the command line and in IJC. Again, combination with ChemAxon calculators is possible. Not only are the custom calculators integrated with Marvin and IJC, they can also be combined with ChemAxon/Infocom nodes in KNIME workflows. This is a case of “write once, deploy everywhere” for the calculators and has enhanced what the chemists can do for themselves on their desktops, without the need to ask for help.

Partner presentations

Biochemfusion’s Proteax software bridges the gap between cheminformatics and bioinformatics; it can be used with Microsoft Excel or Oracle databases to register and analyse protein structures. CWM Global Search now has a Proteax button and search results can be manipulated in JChem for Excel. Chemicalinventory, originally built in collaboration with the University of Copenhagen Department of Medicinal Chemistry, is a web-based application to manage the chemical stock in a laboratory. Its cheminformatics features are powered by JChem. DeltaSoft’s Discovery in a Box is an integrated suite of applications for drug discovery. The ChemCart suite of applications is powered by the ChemAxon cartridge and tools. eADMET offers an online QSPR modelling platform at http://www.ochem.eu.

Infocom supports JChem Extensions, a set of KNIME nodes with which users can build their own workflows and data mining applications. The extensions contain over 90% of ChemAxon’s cheminformatics functionality; Marvin Family Nodes and Marvin are freely available in KNIME. Mcule is a combination of services providing users with a fast and cost-effective way of identifying and ordering new drug candidates. Chemical suppliers can integrate their compounds and software vendors can integrate their tools. MolPort is an outsourced procurement service. It is a one-stop shop for acquiring thousands of compounds quickly from dozens of suppliers.

Seurat is a data sharing and visualisation tool from Schrödinger. JChem Cartridge, MarvinSketch, and ChemAxon property calculations and substructure search are embedded in it. SciQuest supplies chemical life-cycle management and procurement solutions. Its combination of software and services is integrated into existing tools such as those from ChemAxon, SAP, Oracle and Ariba. Joint SciQuest-ChemAxon customers include GSK, Novartis and Imperial College.

The Sysment reaction tool handles both traditional medicinal chemistry and biopharmaceuticals and also offers image handling in reaction drawing. It does automated calculations and can be integrated with existing ELN systems and corporate databases. It incorporates Marvin and Proteax. TIBCO Spotfire LeadDiscovery supports IJC, Marvin and JChem.

Final comments

Alex Drijver, the CEO of ChemAxon, made some final comments under the heading “END: Energy Never Dies”. A lot is still happening at ChemAxon. Alex listed some milestones, and top of the list was “a gong”: GSK, in partnership with ChemAxon, has won one of the 8th Real IT awards for a cloud-based application. There have been global roll-outs of end user tools. Registration has gone live. Document to Structure has been added. A consulting group has been formed. It is also worth noting some products that usually get little attention: Screen3D is one of these “stealth products”. NMR Prediction has been launched. Marvin is being used at GSK: the chemists used to like ChemDraw but now GSK admits that Marvin is liked. JChem for SharePoint is “oh so nearly there”. Marvin JavaScript is off the starting blocks. All the products are in active development and the product range is so wide. Maybe version 5.10 will solve all the world’s problems, quipped Alex.

He asked the users if they had noticed what ChemAxon calls its platform. It had the same name 10 years ago. It has had the same core since 1998, although, of course, it has been improved. Key ChemAxon products are never discontinued, unlike You Know Who’s. ChemAxon has a customer focus. It is building a community of users, has long-term partnerships from which development strategies and product ideas emerge, and is very active in the Pistoia Alliance. JChem is the platform of choice: it has stability and longevity.

A suitable conclusion is also evident in what Nóra Lapusnyik said in her introduction to the partner presentation session. ChemAxon was not the first company on the market but it is the first alternative. The company has been able to learn from the mistakes made by others. ChemAxon is not forcing unwanted decisions, duplication or functionality on users. It has the chemistry cartridge. New directions for the future include a new “Lite Marvin” (a JavaScript one that will be good for mobile applications); improvements in naming and Document-to-Structure; more features for SharePoint; a registration system (complementing ones built by partners) and NMR prediction.


Conclusions

There is little I can say about ChemAxon that I have not already said before: it is financially stable, it has still not reached a plateau, it continues to develop software at a remarkable rate of knots, it is renowned for its customer service, and so on. There are minor changes: it produces more “products” (as opposed to toolkits) than when I first started attending user meetings, it is more “enterprise-oriented” and it has attracted business from really big companies such as major pharmas and Thomson Reuters. It has had to move to bigger offices. The trends are obvious but I see no warning signs yet about an exit strategy: no open source outfit can overtake ChemAxon, and being gobbled up by You Know Who is not on the cards. So the star is still on the ascent. As to the user meeting, it is beginning to be a highlight of my annual calendar, as is evidenced by the increasing length of my reports. It is hard to find anything negative to say about ChemAxon; even in the networking sessions (i.e., at the social events and in the bar) you do not detect grumbles about bugs, or missed deadlines, or licensing issues. The ChemAxon family seems to be pretty happy and I look forward to joining it again next year.
