Training of Calculator Plugins via cxtrain

Version 5.8.2

Introduction

    Property predictions (such as logP and pKa) can be improved when experimental data are available for molecules that are similar to the target. Such user-specific information can be incorporated into so-called training libraries, which are generated with ChemAxon's command-line tool cxtrain. The tool is part of the JChem and Marvin Beans program packages. The generated training library, stored on the user's own computer, is later used by the calculator plugins to improve property predictions.

 

Installation

    Download and launch the platform-specific installer, following the installation instructions.

Usage

    cxtrain <prediction> [options] [input file (training set)] 
    Prediction:
    pka                                   train pKa prediction
    logp                                  train logP prediction
    prediction                            train custom prediction
    General options:
    cxtrain -h, --help                    this help message
     -i, --training-id <training>         sets the training ID
     -l, --list                           list available training IDs
     -g, --ignore-error                   continue with next molecule on error
    pKa options:
     -V, --validation <filepath>          validation results file path
    logP options:
     -t, --tag <tag name>                 name of the SDFile tag that stores the experimental logP values
     -a, --add-built-in-training-set      add built-in logP training set
    Custom prediction options:
     -t, --tag <tag name>                 name of the SDFile tag that stores the experimental property values 
    The training is run by calling cxtrain as follows:
    cxtrain <prediction> [options] [input file (training set)] 
    where 'prediction' must be one of "pka", "logp" or "prediction" (used for a custom property).
    There are general options, which can be used with each training type, as well as property-specific options.
    General options
    • The option --training-id (-i) sets the ID of your training. Afterwards, this ID refers to the given training during calculation.
    • The available training IDs can be listed using the option --list (-l).
    • --ignore-error (-g) skips a molecule that causes an error and continues with the next one.
    pKa-specific option
    --validation <filepath> (-V) creates validation data; optionally, the file path of the pKa training validation chart can be specified.
    logP-specific options
    • --add-built-in-training-set (-a) merges your data with the data from the built-in logP training set.
    • Option --tag (-t) defines the name of the SDFile tag that stores the experimental logP values.
    Custom prediction option
    Option --tag (-t) defines the name of the SDFile tag that stores the experimental values of the custom-defined property.
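
    The option layout described above can be checked before a real training run with a small dry-run helper. The sketch below is not part of cxtrain: the function name cxtrain_cmd is hypothetical, and it only prints a command line built from the options documented in this section. It rejects --tag for pKa because the usage text lists that option only under logP and custom predictions.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with cxtrain): prints a cxtrain command
# line instead of running it, so the option layout can be reviewed first.
# Usage: cxtrain_cmd <prediction> <training-id> <input-file> [tag]
cxtrain_cmd() {
    prediction=$1
    training_id=$2
    input=$3
    tag=${4:-}

    # Only the three prediction types named in the usage text are accepted.
    case "$prediction" in
        pka|logp|prediction) ;;
        *) echo "unknown prediction type: $prediction" >&2; return 1 ;;
    esac

    cmd="cxtrain $prediction"

    # --tag is listed for logP and custom predictions only.
    if [ -n "$tag" ]; then
        if [ "$prediction" = pka ]; then
            echo "--tag is not listed as a pKa option" >&2
            return 1
        fi
        cmd="$cmd -t $tag"
    fi

    echo "$cmd -i $training_id $input"
}

cxtrain_cmd logp mylogp logP_trainingset.sdf LOGP
cxtrain_cmd pka mypka pKa_trainingset.sdf
```

    Printing the command rather than executing it keeps the sketch usable on machines where cxtrain is not installed; the printed line can be pasted into a terminal once it looks right.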

Input

    The input is a file in a format that can store molecular properties (such as SDfile, MDL molfile, compressed molfile or compressed SDfile).
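
    Because the experimental values are read from a named SDFile tag, it can help to confirm that the tag is present before training. The following sketch counts the records in an uncompressed SD file and how many of them carry a given tag. It assumes the usual SD file layout, in which each record ends with a "$$$$" line and each data item starts with a header line such as "> <LOGP>". The function name sdf_tag_report is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check: reports how many records an SD file holds
# and how many of them contain the given data tag.
# Assumes "$$$$" record terminators and "> <TAG>" data header lines.
# Usage: sdf_tag_report <file.sdf> <tag>
sdf_tag_report() {
    awk -v tag="$2" '
        # A data header line starts with ">" and names the tag in angle brackets.
        /^>/ && index($0, "<" tag ">") { seen = 1 }
        # "$$$$" closes the current record; count it and reset the flag.
        /^\$\$\$\$/ { records++; if (seen) tagged++; seen = 0 }
        END { printf "records=%d tagged=%d\n", records, tagged }
    ' "$1"
}
```

    For example, sdf_tag_report logP_trainingset.sdf LOGP prints a line of the form records=N tagged=M; if the two numbers differ, some molecules have no experimental value under that tag name.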

The place of the training library

    The generated training library is stored on your computer, and it can be used via Marvin, Chemical Terms, Instant JChem or cxcalc.
 

Usage examples

    • Training
    • This command trains pKa calculation, using the datafile pKa_trainingset.sdf and setting training ID to "mypka":
      cxtrain pka -i mypka pKa_trainingset.sdf
    • Calculation
    • The following example shows how the generated training library can be used in pKa calculations via cxcalc:
      cxcalc pKa --correctionlibrary mypka "CSC1=NC2=C(N1)C=NC(O)=N2"
    • Result
    •  id      apKa1   apKa2   bpKa1   bpKa2   atoms
       1       11.19   16.01   2.34    -2.59   7,11,9,4

    • Training
    • This command trains logP calculation in the same way, using the datafile logP_trainingset.sdf, with the experimental logP values stored in the SDF tag named "LOGP", setting training ID to "mylogp" and including data from the built-in training set:
      cxtrain logp -t LOGP -i mylogp -a logP_trainingset.sdf
    • Calculation
    • To apply your generated logP training library in calculations, use the parameter --trainingid combined with the parameter --method in cxcalc:
      cxcalc logp --method user --trainingid mylogp "CC(C)CCO"
    • Result
    • id      logP
      1       1,13
    • Training
    • The following command lists the available training IDs for logP calculation:
      cxtrain logp --list
    • Training
    • This command trains a custom property calculation, using the datafile pampa_trainingset.sdf, with the experimental values stored in the SDF tag named "PAMPA", setting training ID to "mypampa":
      cxtrain prediction -t PAMPA -i mypampa pampa_trainingset.sdf
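
    The tabular results shown in the examples above can be post-processed in a script. The sketch below picks one named column from such a table and prints it next to the molecule id. It assumes whitespace-separated columns with a header row and the id in the first column, and it rewrites a decimal comma (as in the logP result 1,13) as a decimal point, on the assumption that the comma is a locale-dependent decimal separator. The function name pick_column is hypothetical, and it is meant for numeric columns only (not for the comma-separated atoms column).

```shell
#!/bin/sh
# Hypothetical post-processing helper for tabular results.
# Reads a table (header row + data rows) on standard input and prints
# "id<TAB>value" for the requested column.
# Assumes whitespace-separated columns and the molecule id in column 1.
# Usage: <command producing the table> | pick_column <column-name>
pick_column() {
    awk -v want="$1" '
        NR == 1 {
            # Locate the requested column by its header name.
            for (i = 1; i <= NF; i++) if ($i == want) col = i
            if (!col) { print "column not found: " want > "/dev/stderr"; exit 1 }
            next
        }
        {
            value = $col
            sub(/,/, ".", value)   # decimal comma to decimal point
            print $1 "\t" value
        }
    '
}

# Sample input mirrors the logP result shown in the examples above.
printf 'id\tlogP\n1\t1,13\n' | pick_column logP
```

    With a working installation, the same function could be placed at the end of a pipeline, for example cxcalc logp --method user --trainingid mylogp "CC(C)CCO" | pick_column logP.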
    See also the logP, pKa and Predictor training pages.