The Administration Guide of JChem provides background information about the administration of structure tables in relational databases using JChemManager. It contains information about table creation, importing and exporting structure files.
JChemManager is a tool for creating and deleting structure tables and for importing and exporting structure files into and out of these tables. The program is a two-tier Java application that runs on any operating system for which Java is available (see Installing and Starting JChemManager for more on the system requirements).

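JChemManager stores its connection settings in the .jchem file located under the .chemaxon (UNIX / Linux) or chemaxon (Windows) subdirectory in the user's home directory. Parameters of structure tables are stored in a property table in the database (JChemProperties by default).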
To run JChemManager, the following software needs to be installed:
Supported database systems:
The database server does not have to be installed on the same machine as JChemManager, provided that the computers are connected over a local network (an Internet connection is also sufficient, but such a setup may be slow).
If needed, create a new database and/or a new database user with appropriate rights.
Prepare the jcman script as described in Preparing the Usage of JChem Batch Files and Shell Scripts.
Run JChemManager by entering
jcman
Warning:
To avoid conflicts between different versions of classes, jchem.jar should not be included in the system's class path (CLASSPATH). Such a conflict might occur, for example, when you view an HTML page containing a Marvin applet in a browser while the system's class path contains jchem.jar. See jcman or jcman.bat in the bin subdirectory as examples.
Avoid using directory names with spaces (e.g. use PROGRA~1 instead of Program Files in Win32).
Many operations of jcman can also be invoked without the graphical interface. Command line usage makes administration from a remote machine easier: log in using telnet or ssh and invoke jcman.
The list of options (as listed with jcman -h):
Usage of GUI program:
jcman
Usage of command line program:
jcman <command> [options]
Commands:
c <table> create table in database
t list structure tables
t <table> show information on a structure table
t <table> <row> show the specified row
v show version information
a <table> <file> import (add molecules) from a file into a table
x <table> <format> export table to standard output.
format:
sdf (MDL SDfile)
mol (MDL Molfile)
rdf (MDL RDMolfile)
smi (Daylight smiles)
mol2 (Tripos Mol2 file)
inchi (IUPAC InChI file)
mrv (Marvin document)
x <table> <file> export a table into a file. (The format will be
determined from the extension: .sdf/.mol/.smi/.rdf/.mol2/.inchi/.mrv)
x <table> <file> <field>[:<field>]
exports specific fields from a table into a file
d <table> drop (remove) a structure table
u updates database structure and regenerates old
tables if necessary (typically after upgrade)
r regenerate all structure tables
r <table> regenerate a structure table
r --ct <table> regenerate only the Chemical Terms columns of a table
k precalculate all structure tables with Chemical Terms
k <table> precalculate a structure table with Chemical Terms
k --remove [<table>] remove precalculated data for one or for every table
s <table> calculates and prints statistics for table
m <table> miscellaneous operations on the specified table
g <global-option> set options affecting all tables
Options (general):
-h --help this help message
--driver <JDBC driver> the JDBC driver to use
--dburl <url> the database URL to connect
--proptable <table> the name of property table
-l --login <login> login name
-p --password <password> password
-s --saveconf save settings into
"<an actual directory in your file system appears here>"
Options for table creation:
--fplength <n> fingerprint size in bits (default: 512)
--bits <n> bits to be set for patterns (default: 2)
--bonds <n> pattern length (default: 6)
--coldefs <column defs> column definitions. If not empty, then syntax is
", name1 type1, name2 type2, ..."
see doc of CREATE TABLE on how to define columns
--stconfig <file> standardizer configuration. If not given, default
standardization is used.
--relative only treats as absolute stereo if chiral flag set.
--ctcolcfg <cols-exprs> a semicolon separated list of pairs of
column names and Chemical Terms expressions.
Each pair specifies that the values of the given
column should be automatically calculated using
the given Chemical Terms expression.
In each pair, the column name and the
Chemical Terms expression is separated with
an equal sign ('=').
--t:<type> the type of the structure table
--t:molecules Specific structures, like single molecules,
mixtures, salts, polymers
--t:any All types of structures are allowed, but no
structure type-specific searching
--t:reactions single step reactions
--t:markush for the storage of Markush structures (this table type
is not allowed for Ms Access dbms)
--t:query query structures
--tds:[y/n] specify "y" to consider tautomers during duplicate
search (default is "n")
Option for regeneration :
--stconfig <file> standardizer configuration. If not given, there is
no change in the standardizer configuration.
Specify "reset_to_default" to change to default
standardization.
Options for import:
--connect <connections> assign custom table fields to SDFile tags
Note: If not given, only those SDFile tags will be imported
which have identical field names in the structure table
Example: --connect "dbfield1=sdfield1;dbfield2=sdfield2"
--skip <n> skip the first n molecules in SD file
--lines <n> check only the first n lines for field names
--nodup does not import molecules already in database
--noempty does not import empty molecules
--setchiralflag sets chiral flag for MDL file formats.
--diff does not import ANY molecules, only output
--duplicates write duplicate molecules to output
--nonduplicates write non-duplicate molecules to output
--output redirects output to a file (otherwise stdout)
Options for export:
--where <condition> where clause. Example: --where "cd_id<1000"
Options for miscellaneous table management operations (command 'm'):
--add-ctcolcfg <opt> adds Chemical Terms expressions to existing user-
defined columns. <opt> has the same format as the
argument to the --ctcolcfg parameter of the 'c'
(create table) command. The columns specified
must have no Chemical Terms expression
assigned to them.
--set-ctcolcfg <opt> sets Chemical Terms expressions to existing user-
defined columns. <opt> has the same format as the
argument to the --ctcolcfg parameter of the 'c'
(create table) command. The columns specified may
have Chemical Terms expressions already assigned
to them.
--del-ctcolcfg <opt> removes the assignment of Chemical Terms expressions
from existing user-defined columns. <opt> is
a semi-colon-separated list of column names.
Examples:
$ jcman c strdata --coldefs ", name CHAR(200), stock INTEGER" \
    --fplength 16 -l joe --driver org.gjt.mm.mysql.Driver \
    --dburl jdbc:mysql://localhost/mydb -s
$ jcman c ctcols --coldefs ", logp numeric(18,9), rotbl_bnd_cnt numeric(1,0)" \
    --ctcolcfg "logp=logp();rotbl_bnd_cnt=rotatableBondCount()>4"
...
$ jcman a str3d str3d.sdf.gz
$ jcman a str3d str3d.sdf.gz --connect "MOLNAME=NAME;PH=PH_VAL"
$ jcman t --rowcount
| | Table name | Version | Rows | Precalc. recommended | Precalc. version | Precalc. status | Precalc. valid | Precalc. invalid rows |
|---|---|---|---|---|---|---|---|---|
| 1 | str3d | 5020100 | 198556 | N | | | | |
| 2 | testdata | 5020100 | 4260 | Y | 5020100 | READY | Y | 16 |
$ jcman t ncidata
Column name Type name
1 cd_id LONG
2 cd_structure BLOB
...
18 cd_fp16 LONG
$ jcman m strtable --add-ctcolcfg "logp=logp();rotbl_bnd_cnt=rotatableBondCount()>4"
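The --ctcolcfg and --add-ctcolcfg arguments shown above take a semicolon-separated list of column=expression pairs. As a minimal sketch, such an argument could be assembled from individual pairs before it is passed to jcman. The helper name join_ctcolcfg is hypothetical and not part of JChem, and the script only prints the resulting command line rather than running it.

```shell
# Hypothetical helper (not part of JChem): joins "column=expression" pairs
# with semicolons, the format described for --ctcolcfg and --add-ctcolcfg.
join_ctcolcfg() (
    result=""
    for pair in "$@"; do
        case "$pair" in
            *=*) ;;  # each pair must contain an equal sign
            *) echo "invalid pair: $pair" >&2; exit 1 ;;
        esac
        if [ -z "$result" ]; then
            result="$pair"
        else
            result="$result;$pair"
        fi
    done
    printf '%s\n' "$result"
)

cfg=$(join_ctcolcfg "logp=logp()" "rotbl_bnd_cnt=rotatableBondCount()>4")
# Dry run: print the command instead of executing it.
printf 'jcman m strtable --add-ctcolcfg "%s"\n' "$cfg"
```

Quoting the whole argument matters here, because both the semicolon and the ">" character are otherwise interpreted by the shell.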
After starting JChemManager the "Connecting to a Database" dialog box appears. The dialog box can also be displayed by selecting the Connect icon on the tool bar of JChemManager.
After filling in the form and selecting Ok, the system attempts to connect to a database using the settings entered. JChem uses the JDBC protocol to connect to relational databases. Before trying to connect, make sure that the appropriate JDBC driver is included in the CLASSPATH environment variable. To establish a JDBC connection, the following parameters have to be set:
The Java class name that serves as the entry point of the JDBC driver has to be specified here. If a JDBC driver is available for the database, consult the documentation of the driver for the proper name. If you would like to use an ODBC driver, enter sun.jdbc.odbc.JdbcOdbcDriver.
See FAQ
for more details on JDBC and ODBC drivers.
You can find the most common driver names here.
A JDBC URL identifies a database so that the appropriate driver recognizes it and establishes a connection with it. Please check the documentation of the driver to determine the JDBC URL format that the particular driver expects.
In the case of an ODBC driver, the full syntax is
jdbc:odbc:<data-source-name>[<attributes>]
where each attribute has the following form
;<attribute-name>=<attribute-value>
For other common URL formats please click here.
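The ODBC URL syntax above (a data source name followed by optional ;name=value attributes) can be illustrated with a small sketch. The helper name odbc_url, the data source name mydsn, and the attribute names UID and PWD are assumptions chosen for illustration only; consult your ODBC driver's documentation for the attributes it actually accepts.

```shell
# Hypothetical helper (not part of JChem): builds a URL of the form
# jdbc:odbc:<data-source-name>[;<attribute-name>=<attribute-value>]...
odbc_url() (
    url="jdbc:odbc:$1"     # first argument: data source name
    shift
    for attr in "$@"; do   # remaining arguments: name=value attributes
        url="$url;$attr"
    done
    printf '%s\n' "$url"
)

# Illustrative values only.
odbc_url mydsn "UID=joe" "PWD=secret"
```

When such a URL is passed on a command line, for example to the --dburl option, it should be quoted so that the shell does not treat the semicolons as command separators.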
Enter the name of the property table. The default value
is JChemProperties. Since version 1.6 this can be
changed by the user to support flexible
multiuser capabilities.
Enter the user ID needed to access the database. If a login name is not needed, leave the field empty.
Enter the password for the login name. If you want the
system to save the password in the .jchem file
for later use, check Remember password.
The above settings are stored in the .jchem file located under "chemaxon" or ".chemaxon" in your home directory. At the first connection, a new table called
JChemProperties (or the name you specify) is generated in the database, which contains parameters of structure tables.
Structure tables contain chemical structures and associated data, including both data used by the JChem system internally and custom, user defined data (static/imported or calculated). For more information about JChem table structure, see JChem database concepts.
To create a structure table select the Create icon. The Create a Table dialog box appears.
Parameters to be specified:
The name of the table to be created in the database that was specified at the Connecting to a Database dialog box.
Specify the number of INTEGER columns that will contain the chemical hashed fingerprints of the molecules. A higher number provides better screening performance for substructure searching, but too many columns may significantly increase the size of the structure cache.
The number of bits to be set for each substructure pattern in the chemical hashed fingerprints. 1 or 2 bits for a pattern are the best choice.
The maximum length of linear substructure patterns used for chemical hashed fingerprints.
You may specify a custom standardization XML for the structure table. You have to regenerate the table if you want to change the standardization.
If checked, all query and target structures are treated as absolute stereo. This setting can be changed later without regenerating the table.
Here you can specify a file containing a fixed set of structures that will be used as structural keys. The fingerprint will be extended with the appropriate number of integer columns to provide 1 bit for each structure.
If checked, tautomers are considered during duplicate search. Enabling this feature increases import time. This option is described in detail in the JChem Database Concepts.
You can select from the following table types according to the desired scope of use:
Compatibility notes: Tables created before JChem version 3.2 will be treated as "Any structures" to maintain previous behaviour. The default type for new tables is "Molecules".
* See the Chemical hashed fingerprint documentation about the optimization of these parameters. Statistics about the fingerprint darkness of existing tables can also be obtained by running jcman s <table_name>. If you have not done any testing of your own, use the defaults, which are optimized for typical compounds of pharmaceutical interest. The default values differ according to table type.
After pressing Ok in the Create a Table dialog box, the SQL statement for creating the structure table is displayed in the Create Table Statement dialog box.
If you would like to add more columns, then modify the SQL statement (though this can also be done later in most RDBMS-s).
If you would like to have Chemical Terms calculated columns (additional columns whose values are automatically calculated from Chemical Terms expressions), you can specify them at the bottom of the dialog along with their respective Chemical Terms expressions. (The columns listed at the bottom of the dialog as Chemical-Terms-based columns must also be specified in the CREATE TABLE statement above. If you add columns later using your RDBMS and decide to make them Chemical-Terms-based columns, you can configure the Chemical Terms expressions for the new columns using the command line version of JChem Manager.)
NOTE:
There is a default limit on the length of the field cd_smiles
for most RDBMS-s. If the majority of your molecules' SMILES representation
is longer than this limit (in case of HUGE molecules), the search process can
become slower. In this case you may try to increase the limit.
After selecting Ok, the SQL statement is executed.
For each structure table, the fingerprint properties are stored in the JChemProperties table. If the
RDBMS supports schemata, username is also attached to the table name in the name column of the property table, like in the
following example:
| prop_name | prop_value |
|---|---|
Import formats in JChemManager:
Specify the database table and the input file. The data fields in the file can be imported into the columns of the table.
To support connecting the corresponding field names and column names, the program will detect field names in the file. If the file is too big, checking may be time consuming. Use the Check whole file for field names in selected file check box and the Number of lines to check input box to decide whether you want the whole file or just a given number of lines to be searched for field names.
If an error occurs during import, the error message and the corresponding stack trace are written to standard error. Check the Halt if an error occurs box if you would like the system to stop when a molecule cannot be imported.
When Allow duplicate structures is unchecked, JChemManager will not import a structure if the database contains another structure with the same topology.
If Allow empty structures is unchecked, JChemManager will not import empty structures (structures where the atom count is zero).
If Set chiral flag for MDL formats is checked, JChemManager will set the chiral flag (absolute stereo flag) for the imported structures.
In case of SMILES*, InChI, Mol2, Molfiles, RDfiles and Marvin Documents importing starts after selecting the Ok button.
For SDfiles and JTF, the "Connecting Fields" window appears, where you can connect the corresponding field names and column names.
* Some SMILES files may contain additional data separated by whitespace from the structure string. The additional data columns are separated by tab characters ("\t"). In this case, the "Connecting Fields" dialog will also appear, and the fields will be named "field_0", "field_1", etc.
When you click a cell in the Field in file column of the window, a list box appears with the alternative field names. In the case of the cd_id column, Auto-incrementing can also be selected, which means that the value is increased by one after each new record. This works even if the RDBMS does not support auto-incrementing, because in that case JChemManager takes care of incrementing the value. (Some RDBMS-s that have an auto-incrementing feature for columns do not allow cd_id to be set explicitly.)
Importing starts after selecting the Ok button. A progress window displays the progress of the import.
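The import settings described above have counterparts among the command line options listed by jcman -h (--connect, --nodup, --noempty). The following sketch prints an import command line built from a table name, a file name and a list of field connections. The helper name import_cmd is hypothetical, and the pairing of the GUI check boxes with these options is an assumption based on the option descriptions, not a documented equivalence.

```shell
# Hypothetical helper (not part of JChem): prints a jcman import command
# line (dry run). Usage: import_cmd <table> <file> [dbfield=filefield ...]
import_cmd() (
    table=$1
    file=$2
    shift 2
    connect=""
    for pair in "$@"; do
        # Join the field connections with semicolons, as --connect expects.
        connect="${connect:+$connect;}$pair"
    done
    # --nodup and --noempty skip duplicate and empty structures.
    cmd="jcman a $table $file --nodup --noempty"
    if [ -n "$connect" ]; then
        cmd="$cmd --connect \"$connect\""
    fi
    printf '%s\n' "$cmd"
)

import_cmd str3d str3d.sdf.gz MOLNAME=NAME PH=PH_VAL
```

Printing the command first makes it easy to review the field connections before running the import for real.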
Export formats in JChemManager:

Specify the database table and the output file.
File format is determined from the extension:
| Format | Extension | Examples |
|---|---|---|
| MDL SDfile | extension starts with "sdf" | .sdf, .sdfile |
| MDL Molfile | extension starts with "mol" | .mol, .molfile |
| MDL RDfile | extension starts with "rd" | .rd, .rdf |
| MDL Rxnfile | extension starts with "rxn" | .rxn, .rxnfile |
| SMILES | extension starts with "smi" | .smi, .smiles |
| InChI | extension is "inchi" | .inchi |
| Mol2 file | extension is "mol2" | .mol2 |
| Marvin Document | extension is "mrv" | .mrv |
| JTF | extension starts with "jtf" | .jtf, .jtfile |
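The extension rules in the table can be expressed as a simple pattern match. The sketch below is illustrative only: the helper name format_for_file is hypothetical, and it assumes that the exact "mol2" extension takes priority over the "starts with mol" rule, since the table lists Mol2 separately.

```shell
# Hypothetical helper (not part of JChem): maps an output file name to an
# export format, following the extension table above.
format_for_file() (
    name=${1##*/}                  # drop any directory part
    case "$name" in
        *.*) ext=${name##*.} ;;    # text after the last dot
        *)   ext="" ;;             # no extension at all
    esac
    case "$ext" in
        sdf*)  echo "MDL SDfile" ;;
        mol2)  echo "Mol2 file" ;;   # assumption: exact match wins over mol*
        mol*)  echo "MDL Molfile" ;;
        rd*)   echo "MDL RDfile" ;;
        rxn*)  echo "MDL Rxnfile" ;;
        smi*)  echo "SMILES" ;;
        inchi) echo "InChI" ;;
        mrv)   echo "Marvin Document" ;;
        jtf*)  echo "JTF" ;;
        *)     echo "unknown" ;;
    esac
)

format_for_file export/str3d.sdf
format_for_file str3d.mol2
```

The order of the patterns matters: checking mol2 before mol* keeps Mol2 files from being classified as Molfiles.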
After pressing the Ok button, the next dialog appears.

On the left panel, you can specify which fields to export in the case of formats
that support additional data.
By default, cd_id and the additional fields are selected (fields not
beginning with "cd_"). You can add more fields by pressing the Add button.
To remove one or more fields, select them and press the Remove button.
You can restore the default setting by pressing Reset. The Sort
button arranges fields according to the original order in the database rows.
The last dialog follows, where you can specify

Exporting starts after selecting the Ok button. A progress window displays the progress of the export.
When the Delete icon is selected, the "Delete" dialog box appears.

In this dialog you can
JChemProperties table.



Change the table name in the combo box on the top to view / edit settings for other tables.
Changing some settings (e.g. standardization, tautomer duplicate search) requires the regeneration of the table. This will be performed after pressing the "Ok" button. The regeneration can take a considerable amount of time, depending on the size of the structure tables and other factors.
NOTE: You can make changes for multiple tables; your changes are remembered when you select another table. The actual changes in the database and the regeneration (if needed) will take place for all tables after pressing the "OK" button.

If "Assume absolute stereo flag" is set for a table, all query and target
structures are treated as absolute stereo ("chiral flag" in MDL files).
Changing this setting does not require regeneration.
One can also specify if tautomers are considered during duplicate search.
Enabling this feature increases import time.
Changing this setting requires regeneration.
When the Options icon is selected, the "Options" dialog box appears. The options set here are stored in the property table.
You can set advanced options in the second tab.

cd_structure field in the
database. This has the following benefits:
When a new version of JChem is released, the calculated data in the structure tables usually have to be refreshed to be consistent with the new version. Normally JChem Manager offers this when connecting, but in some cases you may want to initiate regeneration by hand.
To regenerate the fixed columns, select the File -> Regenerate menu option. The "Regenerate" dialog will appear:

You can select one table, or regenerate all tables.
You can also regenerate the tables from the command line.
After updating to a new JChem version, either JChem Manager or the command line mode may display a message asking you to regenerate some of the existing structure tables. In general, the following rules are applied to decide whether regeneration is required for a new JChem release.
Chemical Terms columns can be recalculated manually using the API or jcman in command line mode. In the API, the recalculateCTColumns() method of UpdateHandler can be used.
To regenerate all Chemical Terms columns of a structure table, execute the following on the command line:
jcman r --ct <tableName>
If you want to skip Chemical Terms calculations during the regeneration, use the recalculateWithoutCTColumns method in the API, or the --noct option of command line jcman:
jcman r --noct <tableName>
One rule applies to table version numbers: they must follow each other in ascending order, which makes the regeneration check simpler. The table version can be viewed using the
jcman t [--rowcount]
command for all structure tables, or
jcman t <tableName>
for one particular table. The number 5020105 is an example of a table version. This value contains information about the JChem version: the first five digits define the JChem version 5.2.1, while the rest of the sequence distinguishes between different alpha and beta releases. From version 5.3, the current JChem and table version can be retrieved by executing the
jcman v
command. In older versions, run the
java -jar jchem.jar
command in the lib directory, where jchem.jar can be found.
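The relationship between a table version number and the JChem version can be illustrated with a short sketch. It assumes, based only on the single example above (5020105 corresponding to 5.2.1), that the minor version, the patch version and the trailing release counter each occupy two decimal digits. The helper name decode_table_version is hypothetical.

```shell
# Hypothetical helper (not part of JChem): splits a table version number
# into major.minor.patch plus the trailing release counter.
# Assumed layout (inferred from the example 5020105): M mm pp rr
decode_table_version() (
    v=$1
    case "$v" in
        ''|*[!0-9]*) echo "not a number: $v" >&2; exit 1 ;;
    esac
    major=$(( $v / 1000000 ))
    minor=$(( $v / 10000 % 100 ))
    patch=$(( $v / 100 % 100 ))
    sub=$(( $v % 100 ))
    printf '%d.%d.%d (sub-release %d)\n' "$major" "$minor" "$patch" "$sub"
)

decode_table_version 5020105   # prints "5.2.1 (sub-release 5)"
```

Because the numbers are compared in ascending order, a plain integer comparison of two table versions is enough to tell which one is newer.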
The regeneration process can take a long time, and since it runs on the live system, modifying the molecules or the table structure is not recommended until it has finished. Precalculation is implemented to make this waiting time as short as possible. The precalculation (preregeneration) process runs in the background, and the precalculated data are stored in temporary tables. When it has finished, you can copy these data to the structure tables in one step. Meanwhile, some molecules or the table structure may of course still change. In the first case, the modified molecules have to be regenerated after the precalculated data have been applied (copied), but this happens automatically and takes much less time than regenerating the entire table. A change to the structure of the precalculated JChem table invalidates the earlier precalculation, so it has to be restarted.
You can precalculate those tables which need regeneration according to the current version of JChem. You can get information about the need for precalculation, and about the status and validity of a previously started precalculation process, by using
jcman t [--rowcount]
If precalculation is recommended for a table but no preregenerated data are available for the corresponding version of JChem, or the data are not valid, or too many rows have changed (become invalid) since the last precalculation, we suggest running the precalculation process on that table.
You will need precalculation when you update JChem from a previous version to a newer one. To run precalculation, you need to install the new version, because its preregeneration process has to be used. During and after the execution of this process the older version can still be used, so work on the JChem databases does not need to stop. To run precalculation, use the
jcman k
command for all structure tables, or
jcman k <tableName>
for one particular table.
Precalculated data can be removed with the
jcman k --remove <tableName>
command, where tableName is the structure table to which the temporary table belongs.
When you start JChem from the GUI or use the command line jcman u option, JChem automatically detects whether complete and valid precalculated data are available. In this case it asks whether to apply these to the structure tables. This happens before JChem requires table regeneration, and you can skip this step by answering No.

Otherwise, the preregenerated data are copied from the temporary table to the JChem table. It is possible that some molecules have changed since the last precalculation. In this case, a regeneration process is started for only these structures after the precalculated data have been successfully copied. The regeneration time of these rows is usually much shorter than recalculating the whole table. Structure tables that have already received the appropriate data through precalculation will not appear later in the list of tables that need to be regenerated.
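The upgrade workflow described in this section (precalculate, check the status, then apply) can be summarized as a sequence of jcman commands. The sketch below only prints the commands in order. The helper name precalc_plan is hypothetical, and jcman is assumed to refer to the script of the newly installed JChem version.

```shell
# Hypothetical helper (not part of JChem): prints the jcman commands of the
# precalculation workflow in order (dry run, nothing is executed).
precalc_plan() {
    if [ "$#" -eq 0 ]; then
        echo "jcman k"               # precalculate all structure tables
    else
        for table in "$@"; do
            echo "jcman k $table"    # precalculate one structure table
        done
    fi
    echo "jcman t --rowcount"        # check precalculation status and validity
    echo "jcman u"                   # update; offers to apply precalculated data
}

precalc_plan str3d testdata
```

Checking the status before running the update shows whether the precalculated data are still valid and how many rows have changed in the meantime.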
Precalculation is supported on the following DBMSs now: