Computational toxicology – in silico assessment of the hERG channel inhibition potential for the early drug cardiotoxicity testing

Introduction – Acquired long QT syndrome (LQTS) can lead to fatal ventricular arrhythmia, and drugs are among the most common causes of LQTS, which presents clinically as torsade de pointes (TdP). LQTS and TdP are principally caused by inhibition of the potassium channels encoded by hERG (the human ether-a-go-go related gene). Drug affinity for hERG channels and the resulting life-threatening interference with cardiac electrophysiology have led to the withdrawal of many substances from the pharmaceutical market, and other drugs have received black-box warnings as potentially dangerous. Current regulations governing studies of drug candidates require in-depth assessment of hERG liability, including various in vitro and in vivo tests. These conventional techniques are hampered either by cost-effectiveness concerns or by ethical obstacles, especially at the early stage of drug development. Accurate screening tests for drug candidates are therefore valuable and widely used. The main objective of this work was to develop a reliable in silico model that predicts the relationship between drug concentration and potassium channel inhibition from the chemical structure and a description of the in vitro experimental setting. Materials and methods – The database used for modeling was recently published and is freely available from the CompTox project website. Input data were derived from published in vitro experiments. The final set contains 1969 records describing 200 drugs, which were used for model building and validation. Enhanced 10-fold cross-validation was applied, in which whole drugs, with all their records, were held out together for testing. An external test set of 193 records was used for external validation, covering drugs both previously present in the original dataset (under different in vitro settings) and absent from it (25 substances). The Random Forest (RF) algorithm, as implemented in WEKA, was used with 10, 50, or 100 trees and unlimited tree depth.
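The drug-wise cross-validation described above can be illustrated with a minimal sketch. It is not the authors' WEKA workflow: the function name drug_wise_folds, the seed, and the toy drug identifiers are all hypothetical and serve only to show how records can be split so that no drug contributes to both the training and the test part of a fold.

```python
import random


def drug_wise_folds(drug_ids, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs in which all records of a drug
    share one fold, so a test drug is never seen during training.

    drug_ids: one drug identifier per record (hypothetical input layout).
    """
    drugs = sorted(set(drug_ids))
    random.Random(seed).shuffle(drugs)
    # Assign whole drugs (not individual records) to folds, round-robin.
    fold_of = {drug: i % n_folds for i, drug in enumerate(drugs)}
    for fold in range(n_folds):
        test = [i for i, d in enumerate(drug_ids) if fold_of[d] == fold]
        train = [i for i, d in enumerate(drug_ids) if fold_of[d] != fold]
        yield train, test


# Toy example (assumed data): 100 records spread over 20 made-up drugs.
records = [f"drug_{i % 20}" for i in range(100)]
folds = list(drug_wise_folds(records, n_folds=10))
```

Splitting by drug rather than by record matters here because one drug appears in many records (different concentrations and in vitro settings); a record-level split would let the model see a test drug during training and would likely give an optimistic error estimate.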
Artificial neural networks (ANNs) were trained with the back-propagation (BP) algorithm, and various activation functions were tested. The input consisted of 39 parameters: 8 describing the in vitro setting, 30 physico-chemical properties selected on the basis of sensitivity analysis results, and the drug concentration (1). Chemical structures were drawn in MarvinSketch or downloaded from the PubChem Compound database. The structures were then optimized with the molconvert command-line program included in the Marvin Beans package. The resulting SDF files were used for descriptor calculation. In total, 107 numeric chemical descriptors characterizing each compound were obtained using the cxcalc command-line program with 41 selected plugins. Default settings were used for both cxcalc and molconvert. The output was continuous: hERG channel inhibition expressed as a fraction (range 0 to 1) at a defined drug concentration. Results – The RMSE of the best RF-based models (10 and 100 trees), estimated in 10-fold CV, was 0.24. The same procedure applied to the ANN (15_7_5 architecture, logistic activation function) gave an RMSE of 0.23. An expert committee approach was also tested: the combination of RF10, RF100, and two ANN architectures (15_7_5 and 40_10_5, both with a logistic activation function) gave an RMSE of 0.22. The predictive performance of all the best models was tested on the external dataset. Discussion and conclusions – Owing to its high specificity and sensitivity, the relative ease of obtaining a description of a new chemical entity, and its flexibility (user-friendliness), the developed Random Forest-based model can be considered a screening model for cardiotoxicity testing at the early stage of drug development. The results, although open to improvement, can be used for further in vitro – in vivo extrapolation.
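The expert committee idea and the RMSE metric reported above can be sketched as follows. The abstract does not state how the member outputs were combined, so the unweighted mean used here is an assumption, and all numbers are invented toy values rather than study results.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equally long sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def committee_predict(member_predictions):
    """Combine models by an unweighted mean per record (assumed rule)."""
    return [sum(values) / len(values) for values in zip(*member_predictions)]


# Toy inhibition fractions in the 0 to 1 range (hypothetical values).
observed = [0.10, 0.50, 0.90]
rf10 = [0.20, 0.40, 0.80]
rf100 = [0.00, 0.60, 1.00]
ann_a = [0.15, 0.55, 0.85]  # stands in for the 15_7_5 network
ann_b = [0.10, 0.40, 0.90]  # stands in for the 40_10_5 network

committee = committee_predict([rf10, rf100, ann_a, ann_b])
print("RF10 RMSE:", rmse(rf10, observed))
print("Committee RMSE:", rmse(committee, observed))
```

Averaging can lower the error when the members' mistakes partly cancel, which is one plausible reason a committee would edge out its individual members; a weighted or trained combiner would be an equally valid design under different assumptions.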