Predictive models to detect incorrect atom-atom mapping of reactions using condensed graph of reactions
Atom-atom mapping of chemical reactions is a difficult task because it should rely on knowledge of the reaction mechanism. In fact, no algorithm provides a unique and definitive solution to this problem; one can speak only of more or less successful techniques. Here, we propose to identify cases of incorrect mapping by combining the Condensed Graph of Reaction (CGR) approach [1] with machine-learning methods. A CGR condenses a chemical reaction (the molecular graphs of all reactants and products) into a single graph. Fragment descriptors generated from an ensemble of CGRs can then be used for model building.
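The condensation step can be illustrated with a minimal sketch. It assumes that each side of the reaction is given as a table of bonds keyed by atom-map numbers; the helper names (`bond`, `condense`, `dynamic_bonds`, `label_counts`) and the toy substitution reaction are hypothetical and are not taken from the ISIDA tools. Counting edge labels is only a crude stand-in for real fragment descriptors.

```python
from collections import Counter

def bond(a, b):
    """Undirected bond key built from two atom-map numbers."""
    return frozenset((a, b))

def condense(reactant_bonds, product_bonds):
    """Merge two mapped bond tables into one CGR-like edge table.

    Each edge is labelled (order_in_reactants, order_in_products);
    an order of 0 means the bond is absent on that side.
    """
    cgr = {}
    for pair in set(reactant_bonds) | set(product_bonds):
        cgr[pair] = (reactant_bonds.get(pair, 0), product_bonds.get(pair, 0))
    return cgr

def dynamic_bonds(cgr):
    """Edges whose bond order differs between reactants and products."""
    return {pair: label for pair, label in cgr.items() if label[0] != label[1]}

def label_counts(cgr):
    """Toy descriptor vector: how often each edge label occurs."""
    return Counter(cgr.values())

# Toy example (assumed): CH3-CH2-Br + OH- -> CH3-CH2-OH + Br-
# Atom-map numbers: 1 = CH3 carbon, 2 = CH2 carbon, 3 = Br, 4 = O
reactants = {bond(1, 2): 1, bond(2, 3): 1}
products = {bond(1, 2): 1, bond(2, 4): 1}

cgr = condense(reactants, products)
changed = dynamic_bonds(cgr)
counts = label_counts(cgr)
```

In this sketch the C-C bond keeps the label (1, 1), while the broken C-Br bond and the formed C-O bond carry the labels (1, 0) and (0, 1). A different atom numbering on the product side would change which edges are labelled as dynamic, which is the kind of signal a classifier can pick up.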
In this work, metabolic reactions from the KEGG database belonging to the first three enzymatic classes were mapped first with ChemAxon tools and then manually, following the published mechanisms. Comparison of the two mappings allowed us to select a dataset of 95 incorrectly mapped reactions, split into training (62 reactions) and test (33 reactions) sets. These were complemented by 62 (training) and 33 (test) correctly mapped reactions. The resulting datasets were transformed into CGRs, for which ISIDA fragment descriptors were generated. Two methods, SVM and a rule learner, were used to develop classification models separating correctly and incorrectly mapped reactions. Application of the models to the test set demonstrates very high performance of the approach: all incorrectly mapped reactions were retrieved. We believe that this approach could be useful for detecting incorrect mappings produced by any popular mapping algorithm.
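The classification and evaluation steps can be sketched as follows. This is not the SVM or the rule learner used in the work: it is a deliberately simple one-rule (decision stump) learner chosen as a stand-in, and the descriptor columns and all numbers are invented toy values, not data from the study. Label 1 stands for an incorrectly mapped reaction, label 0 for a correctly mapped one.

```python
def fit_stump(X, y):
    """Find the single rule 'feature j > threshold t => incorrect (1)'
    with the best accuracy on the training data."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            preds = [1 if row[j] > t else 0 for row in X]
            acc = sum(p == label for p, label in zip(preds, y)) / len(y)
            if best is None or acc > best[0]:
                best = (acc, j, t)
    return best[1], best[2]

def predict(rule, X):
    """Apply a (feature index, threshold) rule to each descriptor row."""
    j, t = rule
    return [1 if row[j] > t else 0 for row in X]

def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive cases that were retrieved."""
    hits = sum(1 for a, b in zip(y_true, y_pred) if a == positive and b == positive)
    total = sum(1 for a in y_true if a == positive)
    return hits / total if total else 0.0

# Hypothetical descriptor columns: [dynamic bonds, broken C-C bonds]
X_train = [[2, 0], [2, 0], [3, 0], [6, 2], [5, 1], [4, 1]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[2, 0], [5, 2]]
y_test = [0, 1]

rule = fit_stump(X_train, y_train)
y_pred = predict(rule, X_test)
test_recall = recall(y_test, y_pred)
```

Recall on the incorrectly mapped class is the quantity that corresponds to the statement that all incorrectly mapped reactions were retrieved; a full assessment would also report how many correctly mapped reactions are flagged by mistake.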
[1] “Chemoinformatics Approaches to Virtual Screening”, A. Varnek and A. Tropsha, Eds., RSC Publishing, 2008.