Integration of Chemical Space: The Case for Use Cases
The integration of internal and external chemical information is a vital and complex activity for the pharmaceutical industry. First, finding non-redundant data sources that offer comprehensive data sets in usable chemical formats, so that internal and external chemistry collections can be merged, is difficult. Assessing the yield of text-mining approaches and the quality of human annotation is important for determining whether externally derived data sets are comprehensive enough to support decision-making on novelty, infringement, or patentability. The cost of procuring data, the redundancy of information, and the labour required to derive accurate data sets are further concerns. These issues already arise for exemplified molecules; integrating and matching Markush definitions poses still more complex data resolution problems. Developing use cases guides the prioritization of the integrations that may be required, promotes methods to achieve them, and identifies the level of confidence that can be expected from such operations, so that it is clear whether decisions can rest on the assurance that the underlying assumptions are correct. It is also important to build cross-disciplinary teams of discovery chemists, informatics specialists, and patent practitioners to develop these use cases: their combined requirements are needed to make the most of the information, while their individual usage patterns, data-handling skills, and knowledge differ widely. Such teams can broaden the range of use cases, capture meta-knowledge about data sources and about different perspectives on using chemical information, and allow ownership to be distributed across silos in the organization.
This talk will describe selected use cases, the methods by which they can be achieved, and the underlying data issues that must be resolved for them to work, enabling comparisons between internal chemical understanding and trends identified in external data collections.