Data Mining Small Molecule Drug Discovery

Despite having more information and technology than at any point in history, drug discovery is becoming harder. It is tempting to believe that there was ‘low-hanging fruit’ in the past, and that previous generations had easier-to-treat diseases, simpler biology and a large number of drug-like leads to optimize. Regardless of the cause, there is now a pressing need to understand fundamental, complex biological systems, especially those linked to disease pathology. Small molecules are often the most definitive tools for illuminating this biology, and there is now intense interest in developing, in a cost-effective way, potent, well-distributed and selective chemical probes, and then applying these to understand the role of novel genes, potentially leading to new medicines.
The development of chemical probes and drug leads rests on what is known from the past and on the general rules that can be learnt from it for future use. The presentation will detail the background and development of two large, now public domain, chemical biology databases: ChEMBL and SureChEMBL. These databases, ChEMBL in particular, have led to the development of many new algorithms for target prediction, chemical library design and other applications. Next, four examples of data mining of ChEMBL and other public domain data will be described.

1) A framework to anticipate the effect of mutations in the target and to integrate it into compound design processes. This is of special importance for anti-infective and anti-cancer drugs, where resistance is a significant healthcare issue.
2) An analysis of drug properties according to target class for antibiotics, where differences in physicochemical properties can be correlated with target properties.
3) Addressing the problem of target validation using genetics, which could de-risk the development of chemical tools and leads, and place novel targets into an appropriate therapeutic setting.
4) Is the concept of ‘Druggability’ real, or has it led to a restriction in the number of systems that the community is prepared to work on?
