Training of the correction library for pKa calculations
If you have experimental data that could improve the performance of the default pKa calculator, you can use the supervised pKa learning method that is built into it. Special structural features can affect the pKa values calculated by the built-in method, so a correction library built from experimental data for your compound family helps the pKa calculator predict more accurately.
How to improve the accuracy of the pKa calculation?
First, you need to identify which ionization center(s) the pKa calculator predicted inaccurately, and collect experimental data for those center(s). The learning algorithm is based on linear regression analysis, so you need a sufficient amount of experimental pKa data; otherwise the regression analysis will fail. There is no strict rule for how large a pool of data is required for reliable pKa training. If your purpose is to create a local model only for a certain type of chemical environment of the ionization center, then a few representative structures may be enough. A more robust model, however, requires as many diverse structures and pKa values of the ionization center in question as possible.
The first step of the training process is to enter the collected data into an sdf file. After that, you run the training algorithm, which creates a correction library from your data. The library is stored on your computer, and you can use it via MarvinSketch, cxcalc, or Chemical Terms.
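To illustrate why a regression-based correction needs more than a single measurement, the Python sketch below fits a straight line through (calculated, experimental) value pairs by ordinary least squares. It is only a toy: this page says the learning algorithm is based on linear regression but does not describe its descriptors or internals, so the one-variable model, the function name fit_line and the numbers are all assumptions made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit: ys is approximately slope * xs + intercept."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up numbers for illustration only (not measured pKa values).
calculated = [1.0, 2.0, 3.0]
experimental = [3.0, 5.0, 7.0]
slope, intercept = fit_line(calculated, experimental)

# A single data point cannot determine a line, so the fit is rejected.
try:
    fit_line([8.0], [11.0])
    single_point_ok = True
except ValueError:
    single_point_ok = False
```

Even in this simplified setting the fit is undefined for one point or for points that all share the same calculated value, which is the kind of failure the text warns about when too little or too uniform data is collected.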
How to create a training set and generate a correction library
- Create a training set in sdf file (.sdf) format.
This can easily be done with the graphical user interface of Instant JChem. Your sdf file must contain the following fields:
- structure of the molecule
- pKa value 1 (field name: pKa1)
- ID of the atom which has the pKa1 value (field name: ID1). It can be viewed by checking the Atom number option in MarvinView (menu: View > Misc).
Example
The picture below shows the details of the training set (pKa_trainingset.sdf). ID1 is the index of the atom with the experimental pKa1 value. ID2 would be the index of the atom with the second measured pKa value (pKa2), and so on.
- Generate the correction library
Execute the following command from the command line:
cxtrain pka -i [library name] [training file]
Example
cxtrain pka -i mypka mydata.sdf
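The two steps above can be sketched in Python. The helper below appends the pKa1 and ID1 data fields to an existing MOL block to form one sdf record, and assembles the cxtrain command as an argument list. The function names, the placeholder MOL block and the values 4.76 and 3 are hypothetical; only the field names pKa1 and ID1 and the command syntax come from this page.

```python
def sdf_record(molblock, fields):
    """Return one SDF record: the MOL block followed by its data fields."""
    lines = [molblock.rstrip("\n")]
    for name, value in fields.items():
        lines.append(f"> <{name}>")
        lines.append(str(value))
        lines.append("")      # a blank line ends each data item
    lines.append("$$$$")      # record terminator
    return "\n".join(lines) + "\n"


def cxtrain_command(library_name, training_file):
    """Build the arguments for: cxtrain pka -i [library name] [training file]."""
    return ["cxtrain", "pka", "-i", library_name, training_file]


# Placeholder only: a real MOL block (ending in "M  END") must come from
# your structure source, for example an export from Instant JChem.
molblock = "...MOL block of the structure...\nM  END"

# Hypothetical values: an experimental pKa of 4.76 measured on atom 3.
record = sdf_record(molblock, {"pKa1": 4.76, "ID1": 3})
command = cxtrain_command("mypka", "mydata.sdf")
```

Writing the concatenated records to mydata.sdf and passing command to subprocess.run would then correspond to the example above, provided cxtrain is available on the PATH.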
Usage of the pKa plugin with correction library
MarvinSketch
- In MarvinSketch, select Tools > Protonation > pKa.
- Check the 'Use correction library' box to activate the training option (see figure below).
- If you have created multiple training sets, choose the most accurate one from the dropdown list below the checkbox.

The next figure shows the results with (I) and without (II) the correction library applied.
Figure panels: I. pKa calculation with training data; II. pKa calculation without training data.
cxcalc
To apply your corrections to the pKa calculation, use the --correctionlibrary parameter (or its short form, -L).
cxcalc pKa --correctionlibrary [library name] [input file/string]
Example
$ cxcalc pKa --correctionlibrary mypka "CSC1=NC2=C(N1)C=NC(O)=N2"
Result
id apKa1 apKa2 bpKa1 bpKa2 atoms
1 11.19 16.01 2.34 -2.59 7,11,9,4
If you run the cxcalc pKa calculation without the correction library, the results are calculated with the built-in dataset.
Example
$ cxcalc pKa "CSC1=NC2=C(N1)C=NC(O)=N2"
Result
id apKa1 apKa2 bpKa1 bpKa2 atoms
1 8.34 16.01 2.34 -2.59 7,11,9,4
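A quick way to see what the correction library changed is to compare the two result tables column by column. The Python sketch below parses the two outputs shown above and reports the columns whose values differ. parse_cxcalc_table and changed_columns are hypothetical helper names, and the parser assumes whitespace-separated columns as printed on this page.

```python
def parse_cxcalc_table(text):
    """Parse a header line plus one row per molecule into a list of dicts."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def changed_columns(trained_row, untrained_row):
    """Map each differing column to its (untrained, trained) values."""
    return {
        key: (untrained_row[key], trained_row[key])
        for key in trained_row
        if trained_row[key] != untrained_row[key]
    }


# The two result tables copied from the examples above.
with_library = """
id apKa1 apKa2 bpKa1 bpKa2 atoms
1 11.19 16.01 2.34 -2.59 7,11,9,4
"""
without_library = """
id apKa1 apKa2 bpKa1 bpKa2 atoms
1 8.34 16.01 2.34 -2.59 7,11,9,4
"""

trained = parse_cxcalc_table(with_library)[0]
untrained = parse_cxcalc_table(without_library)[0]
diff = changed_columns(trained, untrained)
print(diff)  # {'apKa1': ('8.34', '11.19')}
```

In this example only apKa1 differs (8.34 without the library, 11.19 with it); the other three values are unchanged.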
For more options see this page.
Chemical Terms
A pKa calculation that applies a correction library can be performed via Chemical Terms, either from the Evaluator command line or from Instant JChem.
Chemical Terms Evaluator
The Chemical Terms Evaluator is designed to evaluate mathematical expressions on molecules. To use your correction library, type the following expression into the command line:
evaluate -e "pKa('correctionlibrary:[library name]')" "[input file/string]"
Example
evaluate -e "pKa('correctionlibrary:mypka')" "CSC1=NC2=C(N1)C=NC(O)=N2"
Result
;;;-2,59;;;11,19;;2,34;;16,01;
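The Evaluator output above appears to hold one semicolon-separated field per atom, left empty where no pKa is assigned, with a decimal comma. That reading is an inference from comparing it with the cxcalc result for the same molecule (atoms 7, 11, 9 and 4), not something this page states, and the decimal comma may depend on the locale. Under that assumption, a small Python sketch (parse_atom_values is a hypothetical helper) turns the line into a mapping from atom number to pKa:

```python
def parse_atom_values(line, separator=";"):
    """Map each 1-based field position to a float, skipping empty fields."""
    values = {}
    for position, field in enumerate(line.strip().split(separator), start=1):
        if field:
            # Convert the decimal comma to a decimal point before parsing.
            values[position] = float(field.replace(",", "."))
    return values


pka_by_atom = parse_atom_values(";;;-2,59;;;11,19;;2,34;;16,01;")
print(pka_by_atom)  # {4: -2.59, 7: 11.19, 9: 2.34, 11: 16.01}
```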
For more details see this page.
Chemical Terms in Instant JChem
Instant JChem is an out-of-the-box tool that allows scientists to create, manage and analyze chemical structures and their data. You can also apply your pKa correction library in it via Chemical Terms.
- Choose the 'New Chemical Terms Field' icon on the panel on the right side.
- Type the chemical term into the window, using the correctionlibrary:[library name] parameter. Do not forget to adjust the Name, the Type and the DB Column Name.
Example
The following figure presents the usage of pKa training in the 'New Chemical terms' window. The expression
pKa('correctionlibrary:mypKa type:acidic','1') specifies that the plugin uses the correction library named mypKa and calculates the strongest acidic pKa of the molecule(s).
Part of the results of this calculation is shown in the next figure. You can see the difference between the untrained (column 5, Strongest acidic pKa) and trained (column 6, Trained strongest acidic pKa) pKa values.
