ChemAxon US User Group Meeting, September 29-30, 2015, San Diego, California
The meeting was held at the Catamaran Resort Hotel and Spa, San Diego, CA, as in 2011 and 2013, but, sadly, the number of attendees was rather lower. Dinner on the night preceding the UGM was in the precincts of the hotel:
The moon was particularly notable: this was the night after the total lunar eclipse with a “supermoon”. The conference party was held at the end of the first day of the conference at a rooftop bar overlooking the Pacific. Beds were a feature! But now to more serious stuff.
My previous reports have tended to concentrate on user talks. This time there were a great many ChemAxon presentations worth reporting, and they were in a block of their own, rather than interspersed with user talks. I have therefore decided to tackle the meeting in chronological order. In the ChemAxon talks, I have tried to emphasize new software features, and simply provide hyperlinks for readers who are not familiar with the product line.
The first morning was devoted to presentations from ChemAxon staff. Jon Patterson of ChemAxon welcomed the delegates, and then Aurora Costache opened the session with a product overview. In the small molecule arena of the pharmaceutical industry, ChemAxon is becoming the industry standard in chemical representation and search, as indicated by the growing number of strategic collaborations with partners who select the ChemAxon Cartridge and JChem Base as their chemistry platform. The company has also realized, however, that handling just small molecules is not enough, so it has started to develop a biomolecule toolkit for the storage and search of biological entities such as peptides, proteins, protein complexes or bioconjugates. Document repositories and analytical data support are other more recent areas of focus, and the company is also putting more effort into supporting general chemistry, polymer chemistry, the petroleum industry, agrochemistry, and flavors and fragrances.
ChemAxon is also expanding the way the technology is supplied to customers. Traditionally, ChemAxon was a toolkit company, and providing APIs for integration into custom solutions is still very important, but the company also realized that providing full, out of the box solutions for chemists is a must: hence Instant JChem (IJC), and now Plexus Suite. The collection of desktop tools, including MarvinSketch and IJC, is still important, but in addition, ChemAxon has developed web-based applications to give easy access to the chemistry platform.
One problem with the product portfolio was that it was constructed from individual bits and pieces that could communicate with each other, but still lived as separate components. Instead of using this approach, ChemAxon is now moving toward a single, unified platform that integrates all its technologies and applications. Last, but not least, are tools enabling collaboration, to meet the needs of the emerging networks of smaller (and larger) organizations working together at different locations, in different time zones.
Despite the expansion of the product line, ChemAxon still wants to increase the quality of its code and the responsiveness of development. In July 2014 the company therefore introduced a new policy of “rolling releases” (a standard for many major companies), to address the problem of bugs that could accumulate over long development periods. A new test system also means that the code is continuously tested in module, integration, system and performance tests, so that bugs are caught very early. A comparison of the number of bugs in JChem Base and JChem Cartridge in the 10-month periods before and after the frequent releases were introduced shows that the number of bugs has dropped to one third under the new infrastructure. A lot of effort is now put into validating ideas, and trying to fail fast and fail often, to achieve something valuable. The new test system is easily applied to new products, which are therefore much more robust from the beginning, and are released very quickly.
The rest of the morning was devoted to presentations about ChemAxon products, mainly exemplified by solutions to problems faced by the pharmaceutical industry. The following, rather busy chart shows how these solutions fit together.
The first group of the ChemAxon talks was concerned with the “Create, Design, Register” section of the above chart.
Marvin Live also comes under the “Create” heading. András Strácz first explained why this solution is needed. Drug discovery teams are increasingly spread out over the world, and involve CROs or consultants. The research partners suffer from outdated or missing data, complex tools, and cumbersome reporting and data sharing, because of the way that screen sharing is done. Lack of co-location leads to knowledge silos and duplicate effort. Marvin Live, however, enables real time collaboration from any location and any device, with productivity features for automatically characterizing and searching molecules on the canvas, generating actions, and capturing meetings in a searchable format.
András discussed some use cases. The first concerned industrial users: Company A brings in consultants every week; Company B has multiple US East Coast sites, and frequent traveling between them; and Company C is a global enterprise, with facilities on three continents. András showed a video of a medicinal chemist in San Diego and a computational chemist in Zürich, Switzerland working together on improving the properties of a lead candidate. Using Marvin Live, they see the same molecules and information, so they can continue each other’s ideas. Automatic updates send any changes over to the other user’s screen. The chemists can access physicochemical properties and databases, and automatic searches provide them with information to guide the design. When they notice a problem, such as low solubility, they can work together to improve the property. Useful Web Services, databases, and applications can all be accessed. Once the chemists achieve a good design, they can save it with notes, and continue on to other ideas. They can also generate meeting reports that contain all the information from a discussion. Built-in PowerPoint reports show all the saved structures, together with the author and time of the idea, and all calculation results. Chemical structure files are also available for use in other chemistry software.
The second case concerned a university professor whose virtual “office hours” involve tutoring, and working on publications. Since Marvin Live keeps a history, the professor and his students or collaborators can continue discussions after days or weeks. If only one person uses Marvin Live, it becomes a personal idea repository where he or she can refine ideas before sharing them with others. It is very easy to create “meeting rooms”, and using the real-time feedback and reporting features, preparation and administration become much faster.
Critical needs for these two use cases were data access and security. In the second use case, students have Lightweight Directory Access Protocol (LDAP) accounts; Company B uses Active Directory; Company C has a cloud solution that needs single sign-on (Security Assertion Markup Language, SAML). For different projects or classes, separate rooms and security contexts are available. Data access also needs to be flexible. Plugins allow exploration of physicochemical property space, and connection to Reaxys, SciFinder and corporate databases. Ligand protein complexes can also be displayed: a new ChemAxon 3D viewer is significant here.
Csaba Peltz started with an example of registering selected results from scaffold enumeration in Plexus Design. The Compound Registration user expects to check for uniqueness and assign a corporate ID using a simple workflow: submit, validate and normalize, evaluate matches, and then store data. In practice the workflow can get complicated because of the need to handle salts, make manual corrections, register new batches, and so on. Existing chemical business rules can be imported into Standardizer. Structure Checker is an interactive tool to detect and fix issues with structures using JChem technology. Customized checkers can be included. The user interface has been improved recently. Fast and single click batch upload, and handling of compound libraries are now supported. The same automatic fixing options that you can apply for single compounds can now be applied to batches. Upload from an Excel spreadsheet is possible. The search page has been redesigned, and gives the same search experience as in other ChemAxon tools. Csaba showed a typical integration flowchart for feeding the registration, registering, and using and analyzing the registered data. Pipeline Pilot components are available for registration. KNIME nodes will be available in the near future.
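The four-step workflow described above (submit, validate and normalize, evaluate matches, store) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ChemAxon's Compound Registration: the `Registry` class, its `normalize` rule (keep the largest dot-separated fragment as a crude stand-in for salt stripping) and the `CMPD-` ID format are all hypothetical, and plain string comparison stands in for a real chemical uniqueness check.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Registry:
    """Toy in-memory registry; every name here is hypothetical."""
    prefix: str = "CMPD"
    by_key: Dict[str, str] = field(default_factory=dict)   # normalized key -> corporate ID
    batches: Dict[str, int] = field(default_factory=dict)  # corporate ID -> number of batches

    def normalize(self, structure: str) -> str:
        # Stand-in for real standardization: keep the largest
        # dot-separated fragment (a crude form of salt stripping).
        fragments = [f for f in structure.strip().split(".") if f]
        if not fragments:
            raise ValueError("empty structure")
        return max(fragments, key=len)

    def register(self, structure: str) -> Tuple[str, int, bool]:
        """Submit -> validate/normalize -> evaluate matches -> store.

        Returns (corporate_id, batch_number, is_new_compound).
        """
        key = self.normalize(structure)          # validate and normalize
        corp_id = self.by_key.get(key)           # evaluate matches
        is_new = corp_id is None
        if is_new:                               # assign a new corporate ID
            corp_id = f"{self.prefix}-{len(self.by_key) + 1:06d}"
            self.by_key[key] = corp_id
            self.batches[corp_id] = 0
        self.batches[corp_id] += 1               # store as a new batch
        return corp_id, self.batches[corp_id], is_new


registry = Registry()
first = registry.register("CC(=O)O.[Na+]")   # salt form of a new compound
second = registry.register("CC(=O)O")        # same parent, so a new batch
```

In this sketch the second submission matches the first after normalization, so it receives the same corporate ID and a new batch number. That is the kind of case (salts, new batches) that complicates the workflow in practice.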
ChemAxon’s vision is to deliver an integrated and entity-agnostic data management and analysis solution for scientists, who collaborate closely in discovery and development research across specialty domains in chemistry and biology. A major goal is to provide a toolkit to represent any biomolecule properly; the broader strategy is to build an intuitive, out-of-the-box solution around the toolkit.
Roland Knispel explained that chemists and biologists view entities (small molecules and “biologics”) from a different perspective. There are cheminformatics tools to deal with small molecules as chemical structures, and bioinformatics tools to process macromolecular structures as monomer sequences, but there are areas (e.g., macromolecule-drug conjugates) where macromolecules need to be handled as chemical structures and areas (such as synthetic peptides) where small molecules need to be handled as sequences.
There is a need to store the description of a diverse set of entities in a single data management environment; to access those entities with common cheminformatics and bioinformatics tools; to interconvert between domain-specific file format standards; and to pass data between collaborators in different domains. The solution is the Biomolecule Toolkit. It gives a complete, consistent, unambiguous, machine-readable description for any molecular and non-molecular asset. Biomolecules are described as sets of attribute value pairs; molecular entities are described as a combination of chemical building blocks. Rules for connecting the building blocks are defined at the atomic level. Ambiguity is supported at various levels, and the description can be complemented with metadata.
Roland ran through a number of cases of supported entities. The first was nucleic acids with standard, non-standard or unnatural bases and backbone chain chemistries, exemplified by Mipomersen (Kynamro), a synthetic oligonucleotide in which the natural backbone chemistry of the nucleic acid is changed by replacing some of the ribose sugars by 2'-O-methoxyethyl ribose, and all the phosphates by thiophosphate. The second example was ribosomal, non-ribosomal or synthetic peptides with standard, post-translationally modified, non-standard or unnatural amino acids, exemplified by Goserelin (Zoladex). The third was protein sequences, including post-translationally modified residues, and intrachain and interchain cross-links, exemplified by the monoclonal antibody Trastuzumab (Herceptin).
The fourth case was macromolecule conjugates: molecule(s) bound with known chemistry to a known building block at known position(s). Roland’s example was Thio-trastuzumab emtansine, an antibody drug conjugate. In this, a Trastuzumab molecule has an engineered Cys in the variable region of the antibody. This unpaired cysteine may be used to couple the payload (DM1, a maytansinoid cytotoxic agent) through an appropriate linker (bis-maleimido-trioxyethylene glycol). The fifth use case example was a related antibody drug conjugate, Ado-trastuzumab emtansine (Kadcyla), involving ambiguous conjugation site location. In the final use case, the structure and sequence of the antibody are not yet determined, but the antibody is conjugated with the payload: the small molecule is bound to it using a chemistry that, in the example, targets lysine side chains of the antibody.
Entities may be assigned to entity types which define the set of attributes collected for the entity type, as well as the business rules stating whether the values for the attributes must be provided or not (mandatory and optional fields), and which set of attribute values must be unique.
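The idea of entity types with mandatory fields, optional fields and uniqueness rules can be illustrated with a small data structure. The sketch below is an assumption-laden toy, not the Biomolecule Toolkit API: the `EntityType` class, its field names and the example "peptide" type are all invented for illustration, and entities are represented simply as dictionaries of attribute value pairs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class EntityType:
    """Hypothetical entity type: attribute set plus business rules."""
    name: str
    mandatory: Tuple[str, ...]
    optional: Tuple[str, ...] = ()
    unique_on: Tuple[str, ...] = ()            # attribute set that must be unique
    _seen: Set[tuple] = field(default_factory=set, repr=False)

    def validate(self, attributes: Dict[str, str]) -> List[str]:
        """Return a list of rule violations (empty if the entity is valid)."""
        problems = []
        for attr in self.mandatory:
            if not attributes.get(attr):
                problems.append(f"missing mandatory attribute: {attr}")
        allowed = set(self.mandatory) | set(self.optional)
        for attr in attributes:
            if attr not in allowed:
                problems.append(f"unknown attribute: {attr}")
        key = tuple(attributes.get(attr) for attr in self.unique_on)
        if self.unique_on and key in self._seen:
            problems.append("duplicate values for: " + ", ".join(self.unique_on))
        return problems

    def add(self, attributes: Dict[str, str]) -> None:
        """Store the entity's uniqueness key, or raise if a rule is violated."""
        problems = self.validate(attributes)
        if problems:
            raise ValueError("; ".join(problems))
        self._seen.add(tuple(attributes.get(attr) for attr in self.unique_on))


peptide = EntityType(
    name="peptide",
    mandatory=("name", "sequence"),
    optional=("source",),
    unique_on=("sequence",),
)
peptide.add({"name": "example-1", "sequence": "ACDEFGH"})
```

A second entity with the same sequence would then be rejected by the uniqueness rule, while an entity lacking a sequence would fail the mandatory field check.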
The Biomolecule Toolkit provides tools to visualize and search the data. Visualization of the data is possible at multiple levels: a schematic view of the entire entity, a schematic view of the entity as composed from building blocks, or a view of the full chemical structure of the entity (where applicable). Entities can be searched by sequence, chemical substructure or keyword.
The architecture for the Biomolecule Toolkit as a platform is as follows.
Hierarchical editing language for macromolecules (HELM) notation support is a native capability of the Biomolecule Toolkit, and has been built in since development started; ChemAxon is committed to implementing version 2.0 specifications once they are published. V3000 molfile round-tripping to and from ChemAxon’s internal representation was introduced at the end of October 2015.
Mike Braden said that Minireg is a lightweight compound registration and biological assay data management solution designed mostly for small companies and workgroups. It is an add-on to IJC and is deliberately simple and highly customizable. It is not a fully-fledged system for managing chemical and biological data; it is not an out-of-the-box, all-inclusive solution; it is not a standalone or thin client product; and it is not designed for large companies.
Because Minireg is based on IJC plus Groovy scripts, it needs no extra IT infrastructure if an Apache Derby database is used; if a shared database is needed, the only addition is a database server (MySQL, Oracle, or Microsoft SQL Server are supported). The handling of chemistry information is simple, but reasonably robust. Compounds and batches are handled out of the box, and handling of samples has been added as customization for some customers.
Handling of biological assay data is the most complex part of the system. Most common assay types are supported, but this area typically needs customization according to customer needs. The assay handling is relatively simple, and should not be directly compared to more complex systems such as ActivityBase or BioRails. Assay data are aggregated (with repeat values typically being averaged) to allow reporting of combined chemistry and biology data. Essentially there are two types of report: a compound detail report of all data for a single compound and a SAR views report. The SAR pivot table views are customizable to report any assay data for all compounds, in columns by compound. Typically this is done with the same assay “type” (e.g., comparing dose-response curve IC50/Ki values for different species or targets).
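The aggregation and pivoting described above can be sketched in plain Python. This is an illustration only, not Minireg's Groovy implementation: the function names and the example data are invented, repeats are combined with an arithmetic mean (an assumption; other averaging schemes are common for IC50 values), and the layout chosen here puts one compound per row and one assay per column.

```python
from collections import defaultdict
from statistics import mean


def aggregate(results):
    """Average repeat measurements for each (compound, assay) pair."""
    grouped = defaultdict(list)
    for compound, assay, value in results:
        grouped[(compound, assay)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


def pivot(aggregated):
    """Build a SAR-style table: one row per compound, one column per assay.

    Missing compound/assay combinations are reported as None.
    """
    compounds = sorted({compound for compound, _ in aggregated})
    assays = sorted({assay for _, assay in aggregated})
    header = ["compound"] + assays
    rows = [
        [compound] + [aggregated.get((compound, assay)) for assay in assays]
        for compound in compounds
    ]
    return header, rows


# Invented example: IC50 values (uM) for two compounds in two species.
raw = [
    ("C1", "human", 10.0),
    ("C1", "human", 14.0),   # repeat measurement, averaged with the one above
    ("C1", "rat", 7.0),
    ("C2", "human", 3.0),
]
header, rows = pivot(aggregate(raw))
```

Here the two human values for C1 are combined into a single cell, and the missing rat value for C2 appears as an empty (None) cell in the table.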
ChemAxon is working on a thin client solution (IJC is a thick client): Marvin JS and JChem Web Services can be used to submit compounds to be reviewed by a registrar in IJC. ChemAxon is also working on a Plexus solution.
The second group of the ChemAxon talks was concerned with the Plexus Suite:
Max Šauer described IJC as the “Swiss army knife for everyone”, for creating, exploring and sharing chemical data. He demonstrated form design, highlighting of cells with the same data in a spreadsheet, editing cells, and the options for data warehouses in IJC. Import from PDF is now possible and a logo can be added to reports. The adaptive search and data retrieval algorithm is now almost twice as fast. Other 2015 features include:
• synchronization of Groovy scripts between IJC schema and a local disk folder
• a “Filter Search Results” option for filtering child data results in 1:N and M:N relationships
• time support for date fields
• customizable actions according to user role.
Plexus Connect is the central module of Plexus Suite. It is a Web Services toolkit that uses IJC. Max Šauer demonstrated data import and synchronization of Groovy scripts between IJC schema and a local disk folder (available for both IJC and Plexus Connect). Views can be unlinked to prevent synchronization. A long running search can be canceled. Colleagues can view your search results. Max demonstrated widget selection, and grid view column selection for data export. Recent new features include:
• file export to Excel, SDfile, RDfile, and .txt
• search speed, data loading, and various performance improvements
• chemical structure clipboard operations (paste to Marvin, ChemDraw, query in Plexus, etc.)
• visualization of images and external web pages in forms
• new query operators (in IJC as well)
• similarity search.
Features to be implemented soon include:
• user lists based on selection (cherry picking)
• a customizable dashboard
• set operations on lists
• customizable search options
• a performance testing framework, deployable at the customer site, and able to determine required hardware specifications.
Plexus Mining produces a database index of a document collection, with metadata and chemistry. It handles changes intelligently: the index in the JChem Base database is constantly updated as the document collection changes. The Plexus web interface means that no software has to be installed for end-users and that the application is consistent with other Plexus tools. It works standalone too. Plexus Mining also integrates with existing systems because it is modular. Daniel Bonniot de Ruisselet demonstrated a search for “Merck” in mined US patents. The exemplified structures popped up from a list of names.
Anna Tomin demonstrated Plexus Design for fast enumeration of virtual libraries. It offers library enumeration, property prediction, and Markush technology within a simple, optimized interface. Anna showed how to select a reaction from over 100 supplied, or create a reaction of your own using Marvin JS. Reaction-based enumeration was running in the background. Anna then provided reactants from a file or drawn with Marvin JS. She previewed the enumerated reactions, checked them, and stored them. Next she explored the physicochemical property space by calculating the property values and displaying them in the same spreadsheet as the chemical information. These values can be used for filtering and searching. Anna searched with various operators and ranges of values and examined one compound in more detail, displaying various curves. Interactive charts can be viewed; Anna showed some charge values on a structure. Next she demonstrated SAR analysis. She used scaffold enumeration, providing a scaffold and R-groups, and generating all possible combinations. The same features were available on the resulting dataset as the features she had already described. Plexus Design has been integrated with Optibrium’s StarDrop and with Schrödinger’s LiveDesign.
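Scaffold enumeration, in which a scaffold and lists of R-groups are expanded into all possible combinations, is essentially a Cartesian product. The sketch below shows the idea with plain string substitution on SMILES-like templates. It is not how Plexus Design works internally: the `enumerate_scaffold` function, the `{R1}`/`{R2}` placeholder convention and the example scaffold are assumptions made for illustration, and no chemical validation is performed.

```python
from itertools import product


def enumerate_scaffold(scaffold, r_groups):
    """Yield every combination of R-groups substituted into the scaffold.

    `scaffold` is a template with named placeholders such as {R1};
    `r_groups` maps each placeholder name to a list of substituents.
    """
    names = sorted(r_groups)
    for combination in product(*(r_groups[name] for name in names)):
        yield scaffold.format(**dict(zip(names, combination)))


# Invented example: a disubstituted benzene scaffold with two R-group positions.
library = list(
    enumerate_scaffold(
        "c1ccc({R1})cc1{R2}",
        {"R1": ["C", "OC"], "R2": ["F", "Cl"]},
    )
)
```

With two substituents at each of two positions the library contains four structures. The size of the library is the product of the R-group list lengths, which is why enumerated libraries grow so quickly and why property-based filtering of the results is useful.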
ChemAxon has gathered user requirements for a data visualization tool and found that scatter plots were the most wanted feature, followed by histograms as second choice. Users want to be able to handle millions of data points, not just thousands. One available solution is reportedly powerful, but it takes time to set it up correctly, and it takes time to learn how to use it. ChemAxon decided to aim for powerful visualization on the client; analysis can be done elsewhere. Plexus Analysis for visualization of chemical and biological data is tightly integrated with Plexus Connect, Web Service calls, and pipelining tools, and it takes data from files and external databases.
György Pirok showed a movie of quick charting. He put some data into a spreadsheet, searched on a partial IUPAC name, and at least five hydrogen bond acceptors, produced a scatter plot of XlogP versus molecular weight, and zoomed in and manipulated it. Next he demonstrated relational data analysis from multiple forms and tables. He changed the shape, size and colors of the scatter plot, and also made a bubble chart which he scaled and zoomed. Finally he demonstrated visualization of large data: 1 million records from ZINC. He selected axes (logP and molecular weight), chose color and shapes for the spots, and created a scatter plot.
Plexus Analysis will be released in 2015 with interactive scatter plots, support for multiple millions of points, histograms, data synchronization (coordinating a spot and its data value), and reporting to PowerPoint. In 2016, R-group decomposition, SAR and pivot tables, curve fitting, and parallel coordinates are promised.
Miklós Szabó gave a brief overview. In terms of discovery user interfaces, ChemAxon released IJC in 2006, Plexus Connect and Plexus Design in 2014, and Plexus Mining in 2015. Plexus Analysis will be released later in 2015. As for data capture UIs, Compound Registration was released in 2013, and Plexus Assay and Plexus Inventory (better interfaces for small companies) are planned for 2016. Plexus Cloud Solutions are coming later in 2015.
The third and final group of the ChemAxon talks was concerned with searching, analyzing, reporting, and sharing, starting with search, which fits into the scheme as follows.
András Volford gave this talk. JChem Base is at the heart of ChemAxon. It is available in JChem Oracle Cartridge, Instant JChem, JChem for Office, Plexus Suite, Biomolecule Toolkit, chemicalize.org, Compound Registration, JChem Web Services, JChem for SharePoint, Compliance Checker, KNIME, and Pipeline Pilot. JChem Base adds chemical intelligence to relational databases. JChem Web Services offer chemical intelligence for web applications. They allow molecule entities using different file formats to be searched through a RESTful API. Stateless property calculation, Markush searching and reaction enumeration can be carried out in thin client solutions.
The JChem Oracle Cartridge has been on the market for more than 10 years. A recent improvement is similarity search speed-up. The JChem PostgreSQL Cartridge was launched in May 2015. It features substructure, full fragment, duplicate, and similarity searching (but not yet reactions). Tautomers and stereoisomers are handled. IJC integration is in progress. ChemAxon built this cartridge because Oracle is expensive and complex for small companies, and even in big companies, small groups of users might like to manage a database separately from the central company database. In parallel with the development of the JChem PostgreSQL Cartridge, ChemAxon is working on a new search engine, which sits behind the scenes of the new cartridge and uses different technologies in order to take advantage of developments such as distributed systems and NoSQL.
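For readers unfamiliar with similarity searching, a common approach in cheminformatics is to compare structural fingerprints using the Tanimoto coefficient. The sketch below illustrates that general technique only. It makes no claim about how the JChem cartridges implement similarity search: the fingerprints are invented sets of "on" bit positions, and the function names and the threshold value are assumptions.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def similarity_search(query, database, threshold=0.5):
    """Return (identifier, score) pairs at or above the threshold, best first."""
    hits = [
        (identifier, tanimoto(query, fingerprint))
        for identifier, fingerprint in database.items()
    ]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Invented fingerprints: each set lists the bit positions that are switched on.
database = {
    "mol-A": {1, 2, 3, 4},
    "mol-B": {2, 3, 4, 5, 6, 7},
    "mol-C": {10, 11},
}
hits = similarity_search({1, 2, 3, 4}, database, threshold=0.5)
```

In this toy database mol-A is identical to the query (score 1.0), mol-B shares three of seven distinct bits and falls below the threshold, and mol-C shares none.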
The remaining talks were related to reporting and sharing. Efi Hoffmann’s subject was the Marvin suite applications and API for chemical sketching, visualization and data exploration. It has various interfaces:
There have been improvements in publication quality: in the label editor, IUPAC standard abbreviations, more flexible double bond drawing, new parameters, and custom and journal styles. The pKa, logP, and logD plugins have been enhanced. The tautomer plugin has improved tautomer models for tautomer generation, and new stereocenter types are recognized in the stereoanalysis plugin. Bond and atom label drawing has been changed, and the new parameters (such as bond length, bond spacing, hash spacing) can be applied on the exported images as well.
Ákos Papp updated us on the JChem for Office application. A prototype of chemistry in OneNote has been built; so far it allows only adding and editing of structures, and single structure copy and paste across Office. Chemical filtering is now 5-10 times faster: the first search is about five times faster and further searches (even if the query is changed) are about 10 times faster. An unlimited number of R-groups in a Markush structure is now supported and properties (e.g., logP) can be displayed in colored cells for R-group pairs in an SAR table. Multiple hits are represented by an average value. A fixed scale feature in Office documents keeps the bond length and ring sizes the same everywhere in the document. The scale and display options are reflected during copy and paste, are pushed into MarvinSketch, and are kept after editing. The zoom feature is retained. The options are applied from local settings. Ákos demonstrated data import using JChem Web Services. In future, import through Plexus Web Services will enable data import from hierarchical databases. Next, Office 365, Office 2016, and Office Online must be considered. Save to share already works in Office Online.
József Dávid ran through many of the features and benefits of the IT platform toolkit JChem for SharePoint. With the help of the SharePoint architecture, it is possible to connect external data sources to SharePoint. If you use Microsoft OLE DB provider to connect your external data, JChem for SharePoint will index it and make it chemically searchable for you. Corporate and compound IDs can be searched chemically from SharePoint. When the corporate ID exists in any document, the structure search will find it. A corporate identifier database (MS SQL, Oracle, and Web Services, under development) can be connected through ChemAxon’s administration interface. The system can even convert the ID into a structure and use it as the basis for a structure search. ChemAxon’s compound registration database can be used as a source. If the source data grows, you can scale out the chemical indexing and searching together with SharePoint, and provide the same user experience regardless of the amount of source data behind the scenes.
Daniel Bonniot de Ruisselet presented some enhancements to ChemAxon’s naming technologies. Name to Structure and Document to Structure have been improved. Plexus Mining is an out of the box solution for chemical data mining in ChemAxon’s web-based chemistry platform, Plexus Suite. Chemically annotated HTML is generated from internal and external documents stored as PDF, text, XML, and HTML. Daniel closed with an IUPAC naming conundrum. A customer reported in 2014 that no vendor had solved the problem of naming the following compound, but ChemAxon had the closest solution.
Nevertheless, even ChemAxon’s best names ignored the full, extended stereochemistry:
The structure can be represented using the V3000 molfile format but IUPAC has not recommended a name. So ChemAxon extended its rules to produce the name:
This solves a problem for the ChemAxon customer, and maybe one day IUPAC will adopt the name.
Of the various Markush technologies (search, enumeration, hit visualization, non-hit visualization, overlap and composer), Árpád Figyelmesi selected non-hit visualization, overlap, and composer for special mention. Non-hit visualization in ChemCurator is new in 2015. It highlights the non-matching parts of the exemplified structure and the Markush structure.
Compliance Checker is a combined software system and content package providing a way to check whether your compounds are controlled according to the relevant laws of the countries of interest. Norbert Sas ran through an example of the benefits. Compliance Checker has been rewritten so that it will scale. The Compliance Checker plugin runs an automatic controlled substance compliance check during design in Marvin Live, and alerts the user if any legislation is violated.
In the partner session there were presentations from Jeffrey Nauss of Linguamatics, Denise Williams of Core Informatics, John McNeil of John McNeil & Company, Diana Soto of DeltaSoft and Sol Reisberg of Schrödinger. Extended talks were given by BSSN Software, IDBS, DeltaSoft, and InnoCentive in the main technical program.
I2E Chemistry combines the search capabilities of JChem Base, Name to Structure and Document to Structure with the natural language processing (NLP) capabilities of Linguamatics’ I2E. It offers an interactive text mining system for chemistry that can be used to identify chemical structures and understand their role within documents such as patents, scientific articles, and internal reports. Using NLP, textual information associated with a chemical structure can be found and extracted in a structured format.
Core Informatics provides Laboratory Information Management Systems (LIMS), Electronic Laboratory Notebooks (ELN), Scientific Data Management Software (SDMS), and collaboration solutions, to customers across multiple industries. Underlying each of the products is Platform for Science (PFS), a web-based, scalable and extensible informatics platform. PFS systems require no client software installation and are available as hosted or installed solutions configured to meet customers’ needs without custom coding. Core Informatics’ solutions incorporate many ChemAxon technologies.
John McNeil & Company produces integrated solutions for life sciences. Schrödinger is the exclusive reseller of these. The compound registration integrates with LiveDesign and Seurat. Also available are the Assay Capture & Analysis System (ACAS) and its Simple Experiment Loader (SEL); a biologics registration system; ELN Protocols (in ACAS); ELN Experiments (a lightweight ELN); a plate-based HTS analysis module; and a curve fit tool. Additionally, there is a Contracts Manager module. The company also offers professional services, and has been a ChemAxon integrator since 2008.
DeltaSoft supplies a fully integrated suite of applications for discovery research. The company has been a ChemAxon partner since 2005. The ChemCart platform, powered by ChemAxon technology, includes ChemCart Registration, ChemCart BioAssay, ChemCart ELN, ChemCart Reagent Inventory, ChemCart Sample Inventory, and a Structure Activity Browser. The solutions can be used in the cloud or on-premises.
Schrödinger’s LiveDesign is a browser-based platform for collaborative drug discovery informatics: a “Google Docs for chemistry”. It enables medicinal chemists to do basic modeling, and track compound ideas and status, and it facilitates communication across project teams. Data visualization (a 3D viewer, and plotting) and predictive modeling are integrated. Users have access to both experimental and predicted property values, they can do 2D and 3D model predictions for real and virtual compounds, and they can run models in a single click. Each plot is integrated with LiveReport: the plot updates in real time with new data. Hovering the mouse over a point produces a “form view” of data for the relevant compound. LiveDesign can be integrated with Schrödinger’s modeling software. MarvinSketch and Plexus enumeration support the user interface, and JChem Base standardization and searching, and ChemAxon property prediction tools support the backend.
Analytical data come off many instruments, in many data formats, from many analytical techniques. The data end up in silos, not linked to chemistry. Total cost of ownership would be reduced by reducing the number of software tools deployed. To address these challenges, BSSN Software and ChemAxon are partnering to bring analytical data to ChemAxon tools, and provide best-of-breed chemistry and cross-vendor analytical data capabilities, plus an integrated experience of analytical data in a chemical context.
Burkhard Schaefer of BSSN Software said that the partnership had begun with Instant JChem and mass spectrometry, and would be extended to Plexus Suite and HPLC, NMR, IR, UV, and Raman spectra in the future. Joint products are an off-the-shelf plugin for Instant JChem and, later, for Plexus Suite. In a reseller relationship, BSSN Software’s Seahorse Scientific Workbench will be available from ChemAxon as an upgrade.
The initial project, in collaboration with PepsiCo, involved a database of structures and spectra, searchable by structure or spectrum. A new widget, bound to spectral data fields, is added to the collection of widgets in IJC. A compound, its data, and its mass spectrum can be displayed together on one screen. Spectra and chromatograms will be available in IJC forms and tables later this year. In future, spectra and chromatograms will be available on Plexus views.
The Analytical Information Markup Language (AnIML) data standard is at the heart of this application. AnIML is an emerging ASTM XML standard for analytical data, for multiple analytical techniques, possibly combined. It allows sample and process data to be captured, and supports audit trails, digital signatures, and validation for regulatory compliance.
The Seahorse software comes as thick client, thin client and mobile apps for analytical data access. There are Windows and Mac versions of the desktop solution, Seahorse Scientific Workbench. Seahorse Web Edition does not yet have all the features of the desktop version, but it is an HTML5, cross-browser product requiring no plugins; users can jump to the desktop application with one click. The mobile app (for Android and iOS) has low bandwidth and CPU requirements, and no data are kept on the device. Seahorse supports most open standard formats (including AnIML and JCAMP-DX) and a growing list of proprietary (vendor) formats.
Scott Weiss of IDBS outlined the rationale behind his company’s latest partnership with ChemAxon. IDBS’ flagship products are E-WorkBook and ActivityBase. The company does do chemistry. ActivityBase has chemical registration, assay execution and data management, and SAR reporting. E-WorkBook has a chemistry ELN, ChemBook (with stoichiometry, reaction planning, reaction enumeration, and ActivityBase registration); a chemically aware spreadsheet with property calculators; sample testing and results management; analytical system integration and data management; and reagent inventory management. Nevertheless IDBS has a functionality gap. The company also needs to move its platforms over to the cloud because the industry is moving away from the desktop. How could all this be achieved as quickly as possible? In addition, user expectations have changed in terms of ease of use, so IDBS is rethinking its user interface. IDBS also needs to target the virtual laboratory of the future. E-WorkBook 10 is supporting diverse chemical market verticals, and an IDBS chemistry decision had to be made. IDBS is excellent as regards biology but is only number seven in the chemistry market.
The company wanted to deliver an end-to-end solution across chemistry and biology. Updating IDBS’ own technology to deliver web and cloud products would have been expensive and time consuming; web-enabled components with a known name were needed soon. IDBS also wanted to acquire an existing chemistry customer base for cross-selling, immediately. So IDBS made a strategic technology partnership with ChemAxon which sees ChemAxon’s chemical sciences software, including chemical indexing and web rendering, integrated into E-WorkBook and ActivityBase. The ChemAxon indexing technology will be used to store all structural chemistry data and make them searchable across the IDBS platform. The move also accelerates IDBS’ web migration. Why did IDBS choose ChemAxon? The two companies’ existing partnership helped develop ChemBook for the desktop. All IDBS’ primary customers named ChemAxon; ChemAxon clearly had a sterling reputation as a technology and services provider. There was also an excellent technology fit.
E-WorkBook Connect, a chemically aware collaboration portal, is scheduled for October 2015, E-WorkBook Inventory for November, and E-WorkBook 10.1 for December. The “ChemBook” ELN on the web should be ready in the first half of 2016, together with the updated chemistry registration system. ActivityBase Web is planned for release in the second half of 2016, completing the legacy IDBS chemistry migration.
Matthew Pustelnik of Global Blood Therapeutics (GBT) entitled his talk “Building a symbiotic informatics platform to support drug discovery” or “Getting heterogeneous tools to ‘play nicely’ in the informatics sandbox”. There are a great many vendors and tools in GBT’s “sandbox”; Matthew had to decide how to fit them together to support his various scientific constituents (e.g., chemistry, biology, and pharmacology), and he had to think how the tools interact together to support various workflows.
As GBT is a start-up, Matthew had to “wear many hats”, so he focused on key tasks, and prioritized others. An informatics strategy and architecture was established with senior management buy-in. GBT planned to use Office 365 as the central core and add chemistry functionality through ChemDraw, JChem, IJC, and JChem for Excel; data analysis via Prism, R, and XLfit; and data visualization via Excel and Vortex. Data persistence and storage are handled by an Oracle database through Office, and external integration services use custom Java code.
The research informatics workflow is overlaid on this foundation. In a small molecule project, various knowledge sources (patents, scientific literature, seminars, or modeling) initiate the formation of concepts and some of them materialize into compounds which are tested in assays. Assays generate data that are processed to produce results that are used in progressing compounds. Throughout this iterative process the informatics systems capture data into the warehouse.
Matthew gave four examples to illustrate symbiosis. The first was the structure drawing and name service. Here a toggle button displays a structure drawing form using the ChemDraw component, along with a single text entry box that accepts various chemical identifiers, calls out to an identifier Web Service, and returns the molecular structure. The identifier service is based on ChemAxon’s Name to Structure with customizations to support GBT-specific identifiers. After the user has drawn a structure, another Web Service (created using JChem) is called to display the image of the drawn structure in the toggle button.
The interface pattern for GBT’s compound searching and registration system is also used in the chemical reagent and concept management systems. In the top section are search input widgets, and below that are search results. Selecting any of the search result records will display the record details in the right frame (structure, properties, registration information, radar plot, etc.). At the bottom of the form, the status bar allows the user to toggle between various modes (viewing versus registration) and data display (compound versus lot). Rarely do chemists draw structures at registration; they select them from the sources (ELN, Concept, Compound) frame which populates the registration details. In most cases the chemists just scan the vial barcode and click “register”. The players involved in this tool are Agilent Technologies, PerkinElmer (ChemDraw), Microsoft (Excel), and ChemAxon.
Matthew’s next example was the HEMOX data workflow. One of GBT’s primary assays measures blood oxygen equilibrium curves. The result output of the HEMOX analyzer instrument proved to be sub-optimal, but the raw data file output had a wealth of information that GBT used to drive a new calculation algorithm that produced a more accurate assessment of patients. GBT enabled the clinical CROs running patient samples to use the new calculation by means of the cloud, using Dropbox to store the data temporarily. Players involved in the HEMOX solution are Linux, Java, R, Excel, and Dropbox.
At GBT, standard assay data analysis starts with instrument file exports which are processed to produce assay results. The automated data analysis engine is based on Excel and XLfit which output quantitative values and visual charts that are reviewed and verified by a scientist. Only verified results are broadly available outside the assay group. The result publication process is linked to data capture into the corporate scientific data warehouse. Thus the assay data management system is an invaluable tool when publications such as an IND, journal article, or patent are being drafted. In the system, assays can be tracked and searched; results and graphics can be displayed. For plate-based assays, standard layouts enable biologists to establish new assay types. Instrument definitions are externalized into a JSON configuration file so instruments can be added or edited. Users input samples and select instrument files to initiate the analysis, the data are analyzed, and the results are displayed in Excel. The players for assay data analysis are BMG Labtech, PerkinElmer, IDBS, and Excel.
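The idea of externalizing instrument definitions into a JSON file can be illustrated with a minimal sketch. The field names (`file_extension`, `plate_format`, `skip_header_rows`) and the instrument names are hypothetical, chosen only for illustration; the talk did not describe GBT's actual schema.

```python
import json

# Hypothetical instrument definitions. Field names and values are
# illustrative assumptions, not GBT's actual configuration schema.
CONFIG_TEXT = """
{
  "instruments": [
    {"name": "plate_reader_a", "file_extension": ".csv",
     "plate_format": 96, "skip_header_rows": 2},
    {"name": "plate_reader_b", "file_extension": ".txt",
     "plate_format": 384, "skip_header_rows": 0}
  ]
}
"""


def load_instruments(config_text):
    """Parse the JSON text and index instrument definitions by name."""
    config = json.loads(config_text)
    return {entry["name"]: entry for entry in config["instruments"]}


def pick_instrument(instruments, filename):
    """Return names of instruments whose export extension matches the file."""
    return [name for name, entry in instruments.items()
            if filename.lower().endswith(entry["file_extension"])]


instruments = load_instruments(CONFIG_TEXT)
```

With this layout, adding or editing an instrument means changing the JSON text only; the parsing code stays untouched.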
Compound progression workflow is the process in which compounds are systematically evaluated along the drug discovery process for their potential to become a clinical candidate. It requires an informatics platform which helps project teams focus on the most important compounds. Project leaders create a data packet for the compound progression meeting which includes all project compounds divided into status sections, with all the summary data and results for each compound. This PDF file is emailed out to the project team for review before the meeting. At the meeting the project leader projects the decision capture tool that mimics the PDF data packet, and will capture any status changes and action assignments made by the team. The summary of the changes is emailed to the project team, so assay owners know which compounds need to be tested, and action owners can prioritize and track. The players in the compound progression workflow are Adobe, Excel, ChemAxon, Java, and Microsoft Office.
In summary, by using Office 365 as a foundation, GBT built add-ins integrating standard industry software tools; replaced costly, dated LIMS; established a framework for third party tools to participate in the informatics ecosystem; reduced licensing costs, and maximized the return on investment of core software tools; and streamlined drug discovery processes, in order to fulfill the company’s vision.
Lee Schaller of GSK described how GSK has worked closely with ChemAxon scientists to develop the Plexus Connect tool. The initial replacement for ISISBase in GSK began in 2009 and the first version using ChemAxon software was released in 2011. ChemAxon converted about half of the 400 H-views to IJC projects like-for-like. Bespoke tools were used to create test-production environments. Unfortunately, slow US performance necessitated Citrix. The JChem Web Browser was evaluated in 2013, but the GSK database architecture (joining across multiple data sources) made performance slow. Since then, GSK has been working with ChemAxon on the Plexus roadmap and vision, because web-based tools allow for more rapid deployment, easier support, fewer desktop conflicts, and easier migration to touch interfaces, and also address architectural and security problems with desktop data analysis.
GSK also has a partnership with Schrödinger on LiveDesign, and released the software in GSK in the second quarter of 2015. LiveDesign integrates Marvin JS. GSK has licensed Plexus Design for use in Markush enumeration and reaction-based enumeration. GSK and ChemAxon collaborated on “IJC Web” with Plexus Connect. Project form load took 10 minutes with IJC in November 2014, but only 10 seconds in March 2015 with Plexus Connect. In August 2015 phase 1 of IJC Web was initiated to deliver the five most used IJC projects via Plexus Connect.
A web-based chemistry desktop gives a more unified experience. Performance is enhanced because the Plexus Connect server is co-located with scientific data. There are fewer application conflicts and dependencies such as Java: Marvin JS has replaced Marvin Beans for structure editing and rendering, Plexus Connect is used instead of IJC for search, analysis, and reporting, and Plexus enumeration is used instead of bespoke solutions. The system is easier to update globally: there is no need to script a new client for Citrix deployment and updates are deployed once via the server. Opening a project took 2-5 minutes with IJC over Citrix; it takes less than 30 seconds with Plexus Connect. The web interface for form-based searches and visualizations looks similar to the IJC thick client version, but it has been improved and will include a Marvin JS query builder.
In phase 1 of the Plexus Connect deployment, molfile V3000 search and visualization capabilities will be added, and a selection of prioritized features will be included. In phase 2 (to be completed by July 2016), the rest of the 170 or more IJC projects will be moved to Plexus Connect; the IJC thick client and JChem Cartridge will be updated; the Citrix interface will disappear; 90% of desktop IJC Java dependency will be removed; and additional features will be added.
The future chemistry desktop will have integrated interfaces. There will be a web-based editing tool. Plexus Connect searches will be “bookmarkable”. The server will take advantage of connection pools to improve search performance, and more web-based tools will allow for more rapid deployment, easier support, fewer desktop conflicts, and easier migration to touch interfaces. The vision is, however, complicated by multiple ChemAxon versions across the GSK infrastructure. Despite a complicated architecture at GSK, ChemAxon and GSK are making progress toward delivering the vision of a web-based chemistry desktop.
Yingyao Zhou of the Genomics Institute of the Novartis Research Foundation (GNF) described the redesign of an important application at GNF. The two most used cheminformatics applications in drug discovery at the company are an IJC-like SAR analysis application, and the so-called “Compound Report”. The SAR application shows a large activity matrix with multiple compounds as rows, and multiple assays as columns. Since not all assay data can be represented as numbers or are suitable for the matrix display, Compound Report acts as a complementary tool that displays all the data known about a given compound. Previously an HTML report like an ISIS form, rendered by a server-side CGI, was used for this purpose.
One improvement was to accommodate different display form factors. Users find it difficult to browse data if they have to use both horizontal and vertical scroll bars. In the JS architecture, instead of server-side rendering, the client obtains the underlying data from the server, and takes care of all the rendering work. Because of this, the client JS can determine the dimensions of the current browser window, and decide how many columns the report page should adopt. By choosing the right number of columns, the need for horizontal scrolling can be eliminated in most cases, leading to a much improved browsing experience. The application might choose one column on an iPad, and three on a large PC monitor. As the PC browser window is resized, the number of columns may vary as well.
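The column choice reduces to simple arithmetic on the available width. The sketch below shows that logic in Python rather than the client-side JavaScript GNF uses. The minimum column width of 480 pixels and the cap of three columns are assumptions made for illustration, not values given in the talk.

```python
def column_count(viewport_width_px, min_column_width_px=480, max_columns=3):
    """Number of report columns that fit without horizontal scrolling.

    The default width and cap are illustrative assumptions.
    """
    fit = viewport_width_px // min_column_width_px
    # Always show at least one column, and never more than the cap.
    return max(1, min(max_columns, fit))
```

Under these assumed defaults, a 768-pixel-wide tablet viewport gets one column and a 1920-pixel-wide monitor gets three; re-running the function on each resize event gives the behavior described.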
Another feature of the new design is “lazy rendering”. Chemists often use Compound Report to browse data obtained on dozens of new compounds they synthesized recently. In the past, the page was not displayed until the data for all those compounds had been rendered, which led to a poor user experience. Now, the data for the compound within the field of vision are rendered, and if data for other compounds are not visible, and two seconds have passed, the rendering process stops. This lazy rendering of content, one page at a time, leads to very fast display of web page content, and time is not wasted on rendering data the user does not want to browse. When the user scrolls down to a particular compound, its data are then retrieved and rendered. Popular compounds have a great deal of associated data. In most cases, users do not need to read all the data, so only the 1000 most recent data points are rendered by default (unless the user chooses “Show all data”).
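A minimal sketch of the two decisions involved in lazy rendering follows: which compound sections overlap the visible window, and which data points to show by default. The `Section` type, the pixel-based layout, and the `(timestamp, value)` point format are assumptions for illustration. The two-second cut-off mentioned in the talk is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Section:
    compound_id: str
    top_px: int      # vertical offset of the section on the page
    height_px: int


def visible_sections(sections, scroll_top_px, viewport_height_px):
    """Return the sections that overlap the currently visible window."""
    bottom = scroll_top_px + viewport_height_px
    return [s for s in sections
            if s.top_px < bottom and s.top_px + s.height_px > scroll_top_px]


def points_to_render(points, show_all=False, limit=1000):
    """Order points newest first; keep only `limit` unless show_all is set.

    Each point is assumed to be a (timestamp, value) pair.
    """
    ordered = sorted(points, key=lambda p: p[0], reverse=True)
    return ordered if show_all else ordered[:limit]
```

Calling `visible_sections` again after each scroll event yields the sections whose data should then be retrieved and rendered.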
It was important to enable search and filter on the Compound Report, because users are often interested only in specific data items. Since many compounds must be rendered on one report page and each report contains about eight data grids, hosting dozens of data grids (180 for 30 compounds) quickly makes the page unresponsive. The solution is to provide a combination of a printer-friendly report and a powerful, interactive analysis interface. By default, data are rendered in non-grid format, but users may click to open a popup window, where data are displayed within a data grid; users can search, filter, export, and then update the results in the main report.
If the user is interested in finding out which of the dozens of newly profiled compounds is most active in a certain assay, the standard “list view”, with one compound per report and static sections, is not adequate. GNF therefore designed a “merge view”, where data from all compounds are combined, section by section, into sortable, filterable grid views. The user can choose either list view or merge view. Filtering data in one section of a grid reduces matched data entries in other sections. This offers a compact SAR interface, where more data types can be handled, including those not most suitable for the matrix format.
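The merge view can be pictured as a regrouping of rows by report section, with a filter in one section narrowing the others. The sketch below assumes that a filter propagates by compound, that is, other sections keep only rows for compounds that survive the filter. This is one plausible reading of the description, not a confirmed detail of GNF's implementation, and the row fields are hypothetical.

```python
from collections import defaultdict


def merge_view(records):
    """Group rows from all compounds by report section."""
    sections = defaultdict(list)
    for row in records:
        sections[row["section"]].append(row)
    return dict(sections)


def filter_section(sections, section_name, predicate):
    """Filter one section, then keep only surviving compounds elsewhere.

    Propagation by compound is an assumption of this sketch.
    """
    kept_rows = [r for r in sections[section_name] if predicate(r)]
    kept_compounds = {r["compound"] for r in kept_rows}
    result = {}
    for name, rows in sections.items():
        if name == section_name:
            result[name] = kept_rows
        else:
            result[name] = [r for r in rows if r["compound"] in kept_compounds]
    return result
```

For example, filtering a potency section to values below a threshold would also reduce a solubility section to the same compounds.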
Compound Report contains a compound summary section, where the JChem Base library and JChem Cartridge enable users to render compound structures, calculate in silico properties, and carry out substructure search. They can also use a drop-down list to expand a compound quickly into its analogue list and browse the analogues within the same Compound Report page.
The Chemours company was created in July 2015 from the DuPont performance chemicals businesses. Yvonne Shimshock of DeltaSoft gave a talk about the systems that DeltaSoft is developing for Chemours. DuPont has been a DeltaSoft customer since 2004. After a competitive bidding process with many vendors, Chemours chose the DeltaSoft ChemCart platform for its robustness, flexibility, ease of use, ability to support large worldwide environments, and ability to integrate with ChemAxon components.
Data needed to be migrated from the Chemical Information Management System (CIMS) and the Electronic Document Library (EDL), and applications were needed for compound registration, compound browsing, document submission, document search and request, document approval, and thesaurus maintenance.
For CIMS, requirements were to replace Dassault Systèmes BIOVIA components with ChemAxon components; track samples and properties; set up a customized registration interface for new groups on-the-fly; demonstrate good performance, even with complex structures and Fine Grained Access Control (FGAC) security; integrate the system with the EDL and Lightweight Directory Access Protocol (LDAP); and represent all materials of interest, including mixtures and complex polymers. A new EDL system was needed to store, index, and search both document content and metadata; make requests; approve requests and deliver the document securely; manage the thesaurus used for automatic generation of keywords; provide a single document storage location, accessible company-wide; link to CIMS; and integrate with LDAP. Work began in January 2015 with the aim of having systems in place in July. The strategy was to use the ChemCart Platform and ChemAxon tools; install systems in the cloud; develop in parallel; and work with users to develop applications, bringing them in-house once ready.
The DuPont ChemCart CIMS system has applications based on the ChemCart server, which sits on the CIMS server, under ISIS Direct, which accesses an Oracle database plus the Direct cartridge, through Cheshire and ISIS PL from Dassault Systèmes BIOVIA. Chemours CIMS has applications based on the ChemCart server, but the CIMS server uses JChem, chemistry is added to the Oracle database through the JChem Cartridge, and Cheshire and ISIS PL are replaced by ChemAxon’s Standardizer. DeltaSoft identified the business rules needed for CIMS structure standardization and replaced the 20,000 lines of proprietary Cheshire code with Oracle stored procedures that call JChem Standardizer functions. These are integrated with the Structure Checker API. The registrar is able to bypass the rules if necessary. Novelty checking uses configurable ChemAxon Structure Match, instead of Flexmatch, and includes the components of multi-component substances. Molecular formula search is available for incompletely described substances and “non-routine” compositions. Business rules were verified by users before the chemistry migration, and then a molfile column in Oracle was generated using a Direct function, and a JChem index was built on the molfile column. The interface to Chemours CIMS looks just like the CIMS interface at DuPont; users really appreciate this.
In the EDL system, thesaurus administration, document submission, document search and request, and document fulfillment are built on the ChemCart Server, which accesses the Oracle Database (plus text cartridge) through Oracle WebCenter Content. Users have requested that ChemAxon Document to Structure be added. The EDL submission system differentiates required and optional fields; PDF and other file types can be dragged and dropped; there are controlled vocabulary picklists; keywords are generated automatically, but a user can modify or suggest new keywords for the thesaurus; and there are links to the thesaurus viewer, EDL search, and CIMS.
ChemCart design mode was used in conjunction with users to design the EDL search screen. The system searches by form on any field, or combined fields. There are controlled vocabulary picklists. A new search type using WebCenter Content allows full text and metadata to be searched in one search, with Boolean operators. The system also features hitlist management, report generation, and links to the thesaurus viewer and CIMS. The EDL request system allows justification of the request, and notifies the approver of the business. Once the request is approved, a PDF is delivered to the user. A status workflow is included. The Oracle thesaurus provides a data model and procedures to create relationships between terms, and the ChemCart thesaurus application was put on top. To build the database, documents were batch-loaded through WebCenter Content, and metadata were bulk loaded using Oracle import.
CIMS and EDL were built in the cloud, and then went through quality assurance, before being put into production. No deployment was needed. Web-based training of teams around the world was carried out. DeltaSoft and Chemours have received very positive feedback from users about CIMS and EDL.
Patcore is ChemAxon’s exclusive distributor in Japan. Fumiaki Aruga of Patcore made interesting comments about the Japanese market. Japan is an attractive market for pharmaceutical companies; it is the second largest drug market in the world. The country is also the “most aged” nation in the world: 25% of the population is older than 65. Japan is the third largest new drug creator in the world, according to 2010 figures. Thus Japan could be one of the key markets for ChemAxon.
Ten years ago, MDL had an almost 100% share of the cheminformatics market in Japan. Other vendors were offering some enterprise cheminformatics solutions, but they were not strong enough to replace MDL products at that time. Before ChemAxon entered the Japanese market, Japanese users had no choice but to go for MDL, but customers were not happy, because the price was high, choices were limited, and solutions were not flexible (e.g., ways of development such as PL language were limited).
This provided Patcore with an excellent opportunity to propose ChemAxon. One day, a research IT person from a major pharmaceutical company approached Patcore looking for an alternative to MDL’s Cheshire. Patcore checked ChemAxon’s Standardizer and found out that it would work nicely. That led Patcore to sell Standardizer, and the customer was happy, as the cost was significantly reduced. Ten years later, ChemAxon has about a 60% market share in enterprise cheminformatics in Japan. Japanese users are enjoying the benefit of the free competition. Patcore provides freedom of choice for Japanese users, in terms of business conditions as well as a development environment, operating systems, and even database management systems.
The research IT environment is different in Japan from that in western countries. Firstly, Japanese clients tend to have fewer IT resources in-house, even in a large pharmaceutical company. The majority of software engineers (75%) work for IT vendors in Japan (compared with 28.5% in the United States) so Japanese clients tend to rely on IT vendors, regardless of industry. Secondly, Japanese customers are conservative and tend to avoid changes in their business workflows. Since they stick to existing workflows, the requirements are very specific to each company. Finally, customers’ expectation of quality is very high: the vendor will get into hot water when the client discovers bugs. Patcore took these characteristics into account in order to market ChemAxon products.
Back in 2005, ChemAxon had no end-user solutions such as IJC or Plexus. To meet the needs of the Japanese market, Patcore has built packaged solutions on top of the ChemAxon products. The controlled substance check system called CRAIS Checker was the first successful application, and now it is offered by ChemAxon as the Compliance Checker. In recent years, integration solutions for ELNs are being well received. Patcore has about 50 customers in various industries. The biggest clients are pharmaceutical companies that have drug discovery laboratories. They are followed by agrochemical and biotechnology companies, and academic and governmental institutions. Patcore also serves many reagent suppliers.
Fumiaki discussed some use cases, beginning with ELN Integration. Patcore is offering modules to integrate with ELNs and inventory management systems, Compliance Checker, and Compound Registration. Keeping compliant is essential in any company in any country. Compliance Checker enables chemists with one click to check if a material is controlled. It checks reactants and reagents to remind chemists of safety. If the material is unchecked, data cannot be saved, so users have to go through the check. This integration is very popular in Japan: almost all E-Notebook users are using this combination to improve their compliance level.
The registration application is one of the most difficult to standardize as packaged software in Japanese customers’ environments. Theoretically, it can be unified as one application package, but, in reality, it is not so simple. Most of the Japanese users do not want to change their existing workflows, so a lot of customization is required in implementing a registration package. So Patcore has developed configurable modules for registration workflow, modifying numbering systems, and salt or solvate processing, and adding compliance check. Users can submit compounds from PerkinElmer’s E-Notebook for registration. Properties fields are fully configurable. This user interface has been built by Fujitsu, a distributor of PerkinElmer. Thanks to collaboration with ELN vendors, Patcore can offer integration and support services to clients.
Patcore also has clients in sectors other than the pharmaceutical industry. For instance, the National Institute of Advanced Industrial Science and Technology (AIST) is one of the largest governmental institutions in Japan. It is using JChem Base and Marvin to build a freely available, structure-searchable, sugar chain database with about 6,000 glycans and 78,000 glycosides. Marvin is very convenient for this database because it supports custom templates. ChemAxon products are very popular among reagent suppliers, too, for structure searching. Reagent suppliers are using ChemAxon products internally as well as on their commercial websites.
DART NeuroScience (DNS) searches commercial catalogs to provide “novel” chemical matter for projects. Scenarios include “SAR-by-catalog”, scaffold-hopping orders, and fragment-based orders. Jesper Soerensen said that historically, DNS ordered 10,000-30,000 compounds sold as a library for a class of target. The problem with this is that other companies are acquiring the same libraries, and everyone is buying inactive compounds. From 2013 onwards, the new goal is to use scientific computing, and order 60,000 compounds at a rate of about 2,000 per week. This is more efficient. Each of the smaller orders is based on a scientific hypothesis, and automated validation and statistics are added.
Ordering from catalogs presents challenges. The purchasing department bears the effort of negotiation. Scientists’ time is wasted on purchasing. Receiving the samples requires effort: compound management staff have to re-plate mini-tubes by project, and scientific computing staff have to deal with inconsistent data file formats. In addition, duplicate compounds may be ordered. One solution is to limit suppliers to, say, a subset of eMolecules, order more mini-tubes per plate, send pre-tared vials and plates to vendors, and negotiate for a lower price based on a minimum number of samples, with a standardized data file format.
“Catalogs” containing multiple vendors are innately messy. Curation involves different file formats, different numbers of data columns, different chemical structure file formats, and, crucially, lost stereochemistry in some cases. This is where the JChem API came to the rescue. Compounds can be standardized by desalting and keeping the largest fragment, clearing isotopes, aromatizing, enumerating stereocenters (or requiring that they be defined), tautomerizing (at the relevant pH), adding explicit hydrogens, and cleaning in 2D.
The method used for compound selection depends on the scenario. Substructure searching is easy via JChem. DNS chooses to use alternative methods, but these work best when ChemAxon software is used for post-filtering. Filtering is necessary to remove insoluble compounds, compounds with undesirable properties, duplicates, and compounds that are already “known” (i.e., add no new value) to the project. As a standard, DNS removes duplicate stereoisomers, compounds with poor properties (e.g., by using Rule of Five), reactive compounds (e.g., using PAINS filters), insoluble compounds (using a knowledge base), compounds already owned by DNS, and compounds already in the order queue. Most of this can be done by substructure searching via the JChem API.
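Part of the standard filter chain can be sketched without a chemistry toolkit, assuming the properties and a canonical structure key have already been computed (for example, by ChemAxon tools). The dictionary fields and the `structure_key` are hypothetical; the key is assumed to ignore stereochemistry so that duplicate stereoisomers collapse to one entry. The Rule of Five is applied here in a strict form with no violations allowed, which is a choice of this sketch. The PAINS and solubility filters need substructure searching and are not modeled.

```python
def passes_rule_of_five(props):
    """Lipinski's Rule of Five, applied strictly (no violations allowed)."""
    return (props["mol_weight"] <= 500
            and props["logp"] <= 5
            and props["h_donors"] <= 5
            and props["h_acceptors"] <= 10)


def standard_filter(candidates, owned, queued):
    """Drop duplicates, poor-property compounds, and owned or queued ones.

    `owned` and `queued` are sets of structure keys (hypothetical inputs).
    """
    seen = set()
    kept = []
    for cand in candidates:
        key = cand["structure_key"]
        if key in seen or key in owned or key in queued:
            continue
        if not passes_rule_of_five(cand):
            continue
        seen.add(key)
        kept.append(cand)
    return kept
```

Keeping the filters as small independent predicates makes it straightforward to add or reorder them per ordering scenario.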
Diversity filters are used to remove “known” compounds. Unfortunately a scaffold is not an objective quantity. You cannot automate something that is not objective, but you can make best scientific efforts. In an internal study, five chemists in three projects were given a set of 200 molecules “about to be ordered” and asked to exclude scaffolds similar to a set of five “actives”. The JChem filters (Bemis-Murcko atomic scaffold, graph scaffold, and fingerprinting methods) were tested to see which agreed with the majority opinion from the chemists. It turned out that chemists’ opinion is project-specific and relies on prior knowledge, i.e., it is subjective. Fortunately, a fingerprinting method (undisclosed) available in the JChem API, if used with specific settings, agreed with the majority of the chemists.
Jesper described an example of the use of the diversity filter. He was to order novel compounds for a target, use different diversity metrics, and generate 3,000 compounds per method. The selection method returned 100,000 ranked compounds, which were reduced to 65,000 by the standard filters. To test the Bemis-Murcko method, 65,000 compounds were ranked against a target. Jesper took about 650,000 molecules tested for that target and reduced them to 186,584 Bemis-Murcko frameworks. He then compared the frameworks to the ranked molecules to order them and he excluded duplicates. The method filtered 65,000 molecules to 34,000. In a similar procedure the fingerprint method filtered 65,000 to 7,000 molecules. Jesper showed a plot of cumulative count against compound rank, showing that the fingerprint method was indeed a lot better than Bemis-Murcko, and it led to three hits (with EC50s of 386 nM, 588 nM and 621 nM), at least one of which has a good scaffold for modification.
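The framework comparison can be sketched as a single pass over the ranked list, assuming each compound already carries a precomputed scaffold key (a Bemis-Murcko framework string, say). The sketch skips compounds whose scaffold is already known for the target and, as one reading of “excluded duplicates”, keeps only the best-ranked compound for each new scaffold. The actual DNS procedure may differ, and the identifiers are hypothetical.

```python
def diversity_filter(ranked, known_scaffolds):
    """Walk a ranked list of (compound_id, scaffold_key) pairs.

    Skips scaffolds already known for the target and, as an assumption of
    this sketch, later compounds that repeat a scaffold already picked.
    """
    seen = set(known_scaffolds)
    picked = []
    for compound_id, scaffold in ranked:
        if scaffold in seen:
            continue
        seen.add(scaffold)
        picked.append(compound_id)
    return picked
```

Because the list is walked in rank order, the surviving compound for each new scaffold is always the highest-ranked one.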
Jesper concluded with some screenshots from the compound ordering tool. Computational chemists produce an order queue for input to “Kevin Neal’s magic algorithm” which produces a purchasing requisition. Automatic emails are generated from order requests. An order dashboard shows the status of orders. A vendor commitments dashboard shows price and delivery time statistics.
Work has changed from the artisans of the 19th century, through the hierarchies of the 20th century, to the networks of the 21st century. InnoCentive accelerates innovation by delivering creative insights and novel solutions for customers through crowdsourcing and open innovation programs. The business model has moved from traditional to transformational: networked and quick, finding and using the best minds anywhere, using the wisdom of the crowds. This is a journey in innovation: “the world is my lab”, the challenge is asked, and inside and outside “ecosystems” are used. Crowdsourcing brings greater diversity. It harnesses the long tail of outside resources who might know a solution. There are three types of crowdsourcing: micro-task crowdsourcing, idea crowdsourcing, and solution crowdsourcing. Employees, invitational networks, and external communities can provide input. InnoCentive@work software encourages collaboration, and enables social connection between employees and invited networks.
The use of Marvin JS within InnoCentive@work is exemplified by AstraZeneca’s open innovation program. AstraZeneca is a client of InnoCentive and of ChemAxon, and uses the Marvin JS features. AstraZeneca writes a description of a challenge on the website and adds a structure using Marvin JS. Marvin JS is used by all of InnoCentive’s customers.
The benefits of crowdsourcing are knowledge capture, novel thinking, speed, and engagement. These lead to business results such as new products, markets, and services. These in turn inform return on investment: revenue enhancements, cost savings, and time to market. As the proof of concept validates itself, InnoCentive will develop extended capabilities. They will add capabilities such as calculations and editing, and expand the system into a suite of tools for external use and “crowdlabor” chemistry services.
Medicinal chemists are extremely busy, with a variety of responsibilities on a daily basis. The keys to adoption of new tools are simplicity and effectiveness. Sami Bahmanyar, Joe McDonald, and their colleagues at Celgene wanted to provide medicinal chemists with desktop tools that catalyze a shift in thinking about idea generation and hypothesis testing in drug discovery. Instead of having only relatively few compounds represent the ideas that fuel the next synthesis, the company wanted to facilitate the enumeration of a larger number of compounds upon which to test hypotheses. They saw that Plexus Design enumeration might be the answer, and integrated it with Optibrium’s StarDrop software, as a menu item with options to run Plexus scaffold enumeration, run Plexus reaction enumeration, or load a dataset into StarDrop. A widget in StarDrop accesses the Plexus web server.
Sami first described a medicinal chemist’s workflow for generating reaction-based virtual libraries. The chemist inputs an idea; selects a building block collection; carries out a substructure search of the building blocks (adding appropriate filters); selects the list as the building blocks; defines the reaction; generates a virtual library; and pushes the results to StarDrop. Before the building blocks are uploaded into Plexus, there is a building block “preparation” stage (desalting, calculating descriptors, etc.), which is carried out by the computational chemistry group in Pipeline Pilot. It would be great if this were automated in Plexus and occurred upon import.
Upon import, each file becomes a new list entry in the table. There is no obvious way to arrange the building blocks in a specific order. The enumerated libraries are stored in the same table as the building blocks. Searching for specific entries and lists in the table is not possible. Plexus performance diminishes as the number of entries and lists increases. As part of the searching menu, Celgene asked ChemAxon for a “filtering” option based on vendors and other descriptors. Searching on multiple vendors simultaneously is not possible. There is no obvious way in Plexus to save a subset of building blocks from a search. Instead, the user has to export and re-import their selected set of building blocks as a new list (or redo the search each time). New features in Plexus would be extremely useful to the chemists. The “refine” search button, used when searching and selecting building blocks, works only for the entire building blocks list; the user cannot further refine a search within existing search results to eliminate (or keep) certain subsets. Chemists cannot save a table and go back to it later. There is a hiccup with “aliphatic or aromatic amine”; sp2 and sp3 are not differentiated.
Currently, there is no obvious way for the user to access saved reactions and manage them (e.g., view reactions that others have saved, and delete historical reactions). This means that the users might repeat the same work. Saved reactions should ideally be organized and transparent to the user. Defining a reaction in Plexus is slow because Plexus exhaustively searches for reagent 1 and reagent 2 features in each building block; Celgene learned that atom mapping is more efficient.
Celgene requested a few features to facilitate downstream analysis of the library and purchasing of building blocks. In the final table of enumerated products, virtual compounds should have unique ID numbers, and their attributes should be labeled accordingly (e.g., R1: name, R1: SMILES, R1: Catno, etc.). At present the results cannot be copied from the window that displays the table of enumerated products; they should instead be saved in a table that facilitates sending data back into StarDrop for downstream analysis.
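The kind of output table Celgene asked for can be sketched as follows. This is not Plexus output or a Plexus API: the function names, the `VC-` ID prefix, and the dictionary keys `name`, `smiles`, and `catno` are assumptions for illustration. The sketch only pairs reagents combinatorially and labels their attributes; generating the product structure itself would need a chemistry toolkit and is left out.

```python
import csv
import io
from itertools import product


def enumerate_products(r1_blocks, r2_blocks, id_prefix="VC"):
    """Pair every R1 reagent with every R2 reagent and label the columns.

    Each building block is a dict with the (assumed) keys
    'name', 'smiles' and 'catno'.
    """
    rows = []
    for n, (r1, r2) in enumerate(product(r1_blocks, r2_blocks), start=1):
        row = {"Virtual ID": f"{id_prefix}-{n:06d}"}  # unique per product
        for label, block in (("R1", r1), ("R2", r2)):
            row[f"{label}: name"] = block["name"]
            row[f"{label}: SMILES"] = block["smiles"]
            row[f"{label}: Catno"] = block["catno"]
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the table so it can be loaded by a downstream tool."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    amines = [
        {"name": "methylamine", "smiles": "CN", "catno": "A-001"},
        {"name": "aniline", "smiles": "Nc1ccccc1", "catno": "A-002"},
    ]
    acids = [
        {"name": "acetic acid", "smiles": "CC(=O)O", "catno": "B-001"},
    ]
    print(to_csv(enumerate_products(amines, acids)))
```

Carrying the catalog number of each reagent through to the product row is what makes the later purchasing step straightforward.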
Next Sami described the medicinal chemist’s workflow for generating scaffold-based (Markush) virtual libraries in Plexus. The chemist draws a scaffold and adds connectivity points for R1, R2, R3 etc., and then defines R1’s, R2’s, etc. (Here Celgene insisted on a “save” option.) Sketcher compatibility is with Marvin, but many chemists at Celgene prefer to use ChemDraw. They have the ability to import CDX files, but then they have to edit each entry individually (redefining connection points, etc.). This is very time consuming. At the enumeration stage, the “save” button allows the chemists to return to editing the R1’s and R2’s, etc., but these also appear in the same table as the building blocks and enumerated libraries, which can confuse the user. The library is finally generated and the results are sent to StarDrop, but, again, the results cannot be copied from the relevant window.
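The combinatorial core of scaffold-based enumeration can be shown with a toy sketch. It is not how Plexus or Marvin handles Markush structures: it treats the scaffold as a text template with hypothetical placeholders such as `[R1]` and substitutes SMILES fragments as plain strings, with no valence or validity checking. It assumes each fragment starts with its attachment atom and does not reuse a ring-closure digit that is still open in the scaffold.

```python
from itertools import product


def enumerate_markush(scaffold, r_groups):
    """Expand a scaffold template into every combination of R-group fragments.

    scaffold: SMILES-like template containing placeholders '[R1]', '[R2]', ...
    r_groups: dict mapping a label such as 'R1' to a list of fragments.
    Pure string templating; a real tool would validate every structure.
    """
    labels = sorted(r_groups)
    for label in labels:
        if f"[{label}]" not in scaffold:
            raise ValueError(f"placeholder [{label}] not found in scaffold")

    library = []
    for combination in product(*(r_groups[label] for label in labels)):
        smiles = scaffold
        for label, fragment in zip(labels, combination):
            smiles = smiles.replace(f"[{label}]", fragment)
        library.append(smiles)
    return library


if __name__ == "__main__":
    template = "c1ccc([R1])cc1[R2]"
    groups = {"R1": ["C", "Cl"], "R2": ["O", "N"]}
    for structure in enumerate_markush(template, groups):
        print(structure)
```

The library size is the product of the R-group list lengths, which is why saving and reusing the R1 and R2 definitions matters once the lists grow.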
Synthesis is the next step after scaffold enumeration. Chemists always want to know what is commercially available first, before thinking about custom synthesis of building blocks. It would be very useful if the chemists could use the R1 and R2’s to search for commercially available building blocks.
Sami presented his ideal workflow for reaction-based virtual library enumeration, starting with a database of building block files that are “cleaned-up” and have descriptors for the reagents precalculated. Reagents that exist as duplicates (various salt forms, etc.) and reagents with known protecting groups should be tagged. Chemists should be able to search the entire database of building blocks, or select specific vendors, for their chosen scaffold. The next steps are:
• select a subset of building blocks and save it
• generate a virtual library and tag duplicate molecules
• triage the virtual library
• cross-check the triaged library against the internal library of compounds and remove duplicates
• register virtual compounds
• cross-check the virtual compounds against building blocks available in-house
• order vendor compounds
• order in-house building blocks
• execute synthesis.
The ideal workflow for scaffold-based (Markush) virtual library enumeration, starting from a desired starting point scaffold, is as follows:
• draw R groups and save them
• generate a Markush library
• triage the virtual library
• cross-check the triaged library against the internal library of compounds and remove duplicates
• register virtual compounds
• extract the “building blocks” from the elaborated molecules
• cross-check the building blocks against the databases of available building blocks, in-house and from vendors
• order vendor compounds
• order in-house building blocks
• execute synthesis.
In conclusion, Plexus Design is a good starting point as an enumeration tool: it gets the job done, but based on the feedback received from the medicinal chemists, there is room for improvement.
As I said earlier, this meeting was a little different from previous ones. The balance between user talks and talks from ChemAxon and its partners was non-ideal, but there was still a lot of interesting material. What I did miss was a definite conclusion to the meeting. For someone in my business, a concluding “state-of-the-nation” address by the CEO or, failing that, an SVP of business development, is a sine qua non. When I spoke to Csizi (i.e., the CEO) he assured me that no one wants to hear such a talk, that ChemAxon has not changed, and so on. He may well think so, but market perception has certainly changed, the company is at a turning point, and rumors need to be scotched. The opening talk outlined the company strategy, but more was needed. I wanted to hear more about company growth, revenues, the number of customers, agile development, flat management structures, staff turnover, and so on. Readers of this report (including ChemAxon’s competitors, who may well be the biggest rumor-mongers) want to know these things. The closest the meeting came to a corporate PR exercise was the interesting talk by Fumiaki Aruga, who made it clear that ChemAxon now has the biggest market share (at least in Japan), and is perceived to be giving users a choice, by supplying flexible solutions, at a reasonable cost. Once a company really dominates the market it can become unpopular and prices start to rise. Where is ChemAxon now on the S-curve? That is the question that I have to answer each year. Perhaps Fumiaki Aruga should give the closing talk next time. It would be nice to close with some cherry blossom.