JChem Cartridge for Oracle

Introduction

JChem Cartridge adds chemical knowledge to your Oracle platform and you automatically take advantage of Oracle's security, scalability, and replication. You can search data by structure, substructure and similarity through extensions to Oracle's native SQL language. Chemical data can be easily inserted and modified using SQL procedures of the JChem Cartridge.

SQL operators of the Cartridge can be used to search for chemical data (molecules or reactions) more efficiently than it is possible using traditional mechanisms such as PL/SQL Stored Procedures. To increase the speed of the searching process, the jc_idxtype index type has to be applied.

Architecture

High Level Overview

The chemical knowledge for JChem Cartridge comes from ChemAxon's core Java libraries. Oracle offers the possibility to use existing Java libraries by calling them from Java Stored Procedures. Java Stored Procedures run in a special, in many respects unconventional Java runtime environment which is tightly integrated with the Oracle RDBMS. One of the most important characteristics of the Oracle Java runtime environment is that Java libraries are run in a byte-code-interpreted mode. This aspect puts Java Stored Procedures at a serious disadvantage in terms of performance compared to modern Java runtime environments. These latter use sophisticated techniques to compile Java byte-code into native binary executable at runtime. Native executables can then be run much faster than Java byte-code.

The solution used for JChem Cartridge to compensate for the performance disadvantage with Oracle Java Stored Procedures is to perform the computation intensive functions in an external Java runtime environment capable of runtime byte-code compilation ("external" meaning: external to Oracle). We will call this external Java runtime environment (as well as the JChem Cartridge components running in it) JChem Server.

At an abstract level, the JChem Server comprises the following components:

For structured database access, the Search and Update engines use JDBC connection(s) (how to configure the JDBC connection for the Search Engine).

To reduce hard disk access and network round trip, the Search Engine extensively caches the chemical information stored in database tables. (How to configure memory for the search cache.)

The JChem Cartridge components (PL/SQL and Java Stored Procedures) which are located within Oracle, perform the following functions:

The Oracle-resident Java classes of JChem Cartridge are packaged in jcart.jar.

Database Access Modes

The standard way for an Oracle Java Stored Procedure to access database objects is to use a JDBC-connection which has been pre-opened by Oracle in the context of the database session, in which invocation of a given stored procedure has been initiated.

However, as JChem Cartridge stored procedures make extensive use of the JChem Server for all kind of tasks to speed up computation, there are cases where it is much more efficient to access the database (as part of some sub-tasks) directly from within the JChem Server rather than relying entirely on stored procedures for database access. Relying entirely on stored procedures for database access would require moving (in some cases) a large amount of data back and forth between the JChem Server and the stored procedures which would have an obvious impact on performance (and in the case of earlier Oracle versions also on stability).

Direct database access from within the JChem Server thus involves opening a new database session (within JChem Server). This results in certain cartridge operations using two database sessions: the "original" (initiator) database session and an auxiliary database session. INSERT operations into regular Oracle tables as well as inserts into JChem tables using the jchem_table_pkg.jc_insert function are currently performed exclusively using the initiator session. A single session is also used for updates and deletes.

The dual-session access mode is used for searches that use the jc_idxtype operators in domain index scan mode (where the table is scanned using the jc_idxtype domain index). The functional equivalents of operators as well as the operators executed in "non-domain-index-scan" mode use one singe database session.

Dual-Session Database Access Example

When you use a search operator (such as jc_contains ) in the WHERE clause of a SQL query, the following execution path is traversed:

  1. The application sends Oracle Server a SQL statement involving substructure search.
  2. The Oracle Execution Engine processes the statement and the substructure search operator of JChem Cartridge is called with the appropriate parameters.
  3. The parameters of the search are passed onto the JChem Server over RMI.
  4. The Search Engine checks (over the JDBC connection) to see if the database table involved in the search has been modified since the corresponding structure cache was loaded the last time.
  5. The structure cache is refreshed (over the JDBC connection), if necessary.
  6. A pre-screening is used on fingerprints in the cache, then an atom-by-atom graph search on the structures from the cache.
  7. The results are communicated over RMI back to the cartridge components located in Oracle.
  8. The ROWIDs of the matching structures are returned to the Oracle Execution engine.
  9. The Oracle Execution engine further processes the original SQL statement, if necessary.

Single-Session Database Access Example

  1. The application sends Oracle Server a SQL statement involving an INSERT (in the case of regular structure tables) or jc_insert (in the case of JChem structure tables).
  2. The Oracle Execution Engine processes the statement and calls JChem Cartridge to either update the index table for the newly inserted structure (in the case of regular SQL INSERT) or directly execute the jc_insert function. (jc_insert includes both inserting the structure and updating the index information in the same JChem structure table.)
  3. Additional information (Standardizer configuration, fingerprint generation rules) is fetched from the index table.
  4. The molecular structure is passed onto the JChem Server over RMI along with the additional information.
  5. JChem Base's Update Handler computes the chemical information to update the index table with.
  6. The computed chemical information is communicated over RMI back to the JChem Cartridge components located in Oracle.
  7. The chemical information is inserted into the index table.

Implications of the Dual-Session Database Access Mode

Using two separate JDBC connections for searching has the following implications:

User Credentials

While it is currently impossible to retain "normal" transaction semantics with dual-session database access, beginning with JChem version 3.0.12 a simple mechanism is provided to allow to retain security semantics: i. e. to execute all database operations with the credentials of the initiating database session. The mechanism consists of

  1. the JChem Server maintaining in transient form (in system memory) the user credentials (user name, and password);
  2. propagate the user name of the initiating user for each call to JChem Server, so the proper credentials can be used to open the JDBC connection in the JChem Server on behalf of the user action "in Oracle".

A PL/SQL procedure jchem_core_pkg.use_password(password VARCHAR2) is made available for users to set their transient password in JChem Server. While the password is kept in JChem Server's main memory, it will be remembered and used as long as JChem Server is shut down. It is therefore recommended but not strictly necessary for users (or applications on their behalves) to call the use_password procedure at the beginning of each database session. (It must called once after each time the JChem Server is (re)started.)

If the password of a given user has not been set for JChem Server via the use_password PL/SQL procedure, the user credentials (if any) that were set as part of the JChem Server JDBC configuration will be used to open database connections in JChem Server (how to configure the JDBC connection for the Search Engine).

jchem_core_pkg.use_password(password VARCHAR2) transmits the password to JChem Server via in plain text via a regular RMI call. If JChem Server and Oracle are running on different machines, malicious users may be able to sniff the passwords as it travels on the network. The necessary steps must be taken to prevent this (e.g. by limiting user's privileges on client machines so that they are unable to put their Ethernet card in promiscuous mode, or using a dedicated/closed network connection between the Oracle host and the JChem Server host).

Support For JChem tables

In addition to regular structure tables containing molecular structures, structure tables that have been created and populated by tools included in JChem Base can also be used with JChem Cartridge. The advantages of JChem tables over plain Oracle tables are:

The disadvantage of JChem tables is that you often cannot use the standard SQL commands to manipulate/access them. The following limitations apply:

From an architectural perspective, the main difference between using/indexing regular structure tables and using/indexing JChem tables is the following:

When you create a JChem Cartridge domain index on a regular table, the chemical information on the structures found in the table (called in this context base table) is stored in a separate table (called index table). Since the same chemical information which is stored in an index table is already present in a JChem table, the base table and index table are physically the same table in case of a JChem table. (Thus creating a jc_idxtype index on a JChem table is an instantaneous operation.)

Parameters representing target structures can be either a structure string in any format recognized by JChem or a column containing such structures.

Some of the parameters representing target structures can only be a structure string (literal-only target structures).

Parameters representing query structures can be in any format supported by JChem (like SMARTS, Molfile, Rxnfile, etc.). Starting with JChem 3.1.1, multiple query structures can also be specified for a single search operator at a time. The individual query structures are to be separated as specified by the given format (e.g. in the case of SMILES/SMARTS, SDFiles with a newline character). The search is performed for each individual query structure and the union of the hits is returned. (See also our Future Plans.) Multiple query structures are accepted only for operators evaluated in domain index scans. See the example for jc_compare.

Some of the operators and functions have an option list parameter. Option list parameters are of type VARCHAR2 and accept a list of options (specific to the given operator or function). The default separator between the individual options is the space character. In case the space character is not appropriate as a separator (because, for example, the value of any of the options itself contains space(s)), a custom separator string can be used by specifying the special option "sep=" as the first option followed by a string that will be used as the custom separator. The custom separator string must be delimited by a space at the end (the space will not be part of the separator string). If a custom separator string is specified (always at the beginning of the options string), each option following option must be delimited using the custom separator string.

Without creating a jc_idxtype index of a chemical table, operators occurring in expressions are evaluated for each row of the result. If the value of the smiles column of the current row is NULL then a NULL value will be returned by the operator.

Examples:
SELECT cd_id, jc_contains(cd_smiles, smiles) FROM scott.jchemtable WHERE cd_id < 10;
SELECT cd_id, jc_tanimoto(smiles, 'CN1C=NC2=C1C(=O)N(C)C(=O)N2C') FROM scott.jchemtable WHERE cd_id < 10;
SELECT cd_id FROM scott.jchemtable WHERE jc_contains(smiles, 'CN1C=NC2=C1C(=O)N(C)C(=O)N2C') = 1;
SELECT cd_id FROM scott.jchemtable WHERE jc_evaluate(smiles, 'logd( "7.4" )' ) > 2;

NOTE: Operators appearing in the WHERE clause are evaluated much faster with the use of the JChem index (jc_idxtype).

Example:

SELECT cd_id FROM scott.jchemtable WHERE jc_contains(smiles, 'CN1C=NC2=C1C(=O)N(C)C(=O)N2C') = 1;

Functional equivalents

Each operator supported by the jc_idxtype index type have a functional equivalent in the jcf package. The name of the functional equivalents have the 'jc_' prefix of their operator peers removed. E.g. the functional equivalent of jc_molconvert is jcf.molconvert.

One possible use of functional equivalents is in SQL constructs where operators cannot be used such as embedded in another operator:

select jc_compare('c1ccccc1', jcf.molconvert('C1C=CC=CC=1', 'smiles:au'), 't:s') from dual;

Default properties

The behavior of operators and functions can be configured by using default properties and index parameters. Currently, two global properties are supported:

For example to use chemaxon-style aromatization as default:
call jc_set_default_property('standardizerConfig','aromatize:b');
The effect of setting this property will be two fold:

Default properties provide default values only to indexes that are created in the current user's schema. The scope of default properties is the schema. Each user requiring daylight style aromatization with dehydroganization must call

call jc_set_default_property('standardizerConfig','dehydrogenize:optional..aromatize');
to initialize their environment. If the user does not specify their own default, a built-in default standardization will be used (which includes dehydrogenization and a ChemAxon-style aromatization).

Extensible Index, jc_idxtype

The purpose of the jc_idxtype index type is to enable efficient search and retrieval functions. Currently the CREATE INDEX and the ALTER INDEX SQL statements are supported.

CREATE INDEX

The index can be created on a column of any table, provided that the column contains molecular structures in any of the supported representation. Use the following general statement to create the index:

 CREATE INDEX <index name> ON <table>(<structure-column>) INDEXTYPE IS jc_idxtype PARAMETERS('param1=paramvalue1,param2=paramvalue2,...');
The index name cannot currently be longer than 22 characters.

The following optional parameters can be specified for both JChem and non-JChem tables:

The following option can be specified only for JChem tables: In addition to the preceding parameters the following can be used for plain Oracle tables (tables not generated by JChem Base): Here are some examples to create an index:
  1. Create index with default parameters:
    CREATE INDEX jc_idx ON jchemtest(structure_col) INDEXTYPE IS jc_idxtype;
  2. Create index with default parameters on a JChem table for use with VARCHAR2 operators:
    CREATE INDEX jc_idx ON jchemtest(cd_smiles) INDEXTYPE IS jc_idxtype;
  3. Create index with default parameters on a JChem table for use with BLOB operators:
    CREATE INDEX jc_idx ON jchemtest(cd_structure) INDEXTYPE IS jc_idxtype;
  4. Create an index with structural keys stored in the stkey column of the stkeys table:
    CREATE INDEX jc_idx ON jchemtest(cd_smiles) INDEXTYPE IS jc_idxtype PARAMETERS('structuralfp_config:SELECT stkey FROM stkeys');
  5. Create index with Daylight-style aromatization:
    CREATE INDEX jc_idx_2 ON jchemtest2(structure_col) INDEXTYPE IS jc_idxtype PARAMETERS('std_config=aromatize:d');
  6. Specify that query and target structures must be treated as absolute stereo by default:
    CREATE INDEX jc_idx_2 ON jchemtest2(structure_col) INDEXTYPE IS jc_idxtype PARAMETERS('absoluteStereo=y');
  7. Specify that
    1. the LogP of the structures being indexed be stored in a column of type numeric(30,15)
    2. the rotatableBondCount be stored in a column of type numeric(10,0)
    3. the pKa in a column of type numeric(30,15):
    CREATE INDEX jcxautocalccttest ON autocalccttest(structure)
            INDEXTYPE IS jchem_cc.jc_idxtype
            PARAMETERS('sep=! autoCalcCt=numeric(30,15);logp()!autoCalcCt=numeric(10,0);rotatableBondCount()!autoCalcCt=numeric(30,15);pKa("acidic","2")')
  8. Create index so that JChem stores metainformation about the index in a table called jchemprops:
    CREATE INDEX jc_idx_2 ON jchemtest2(structure_col) INDEXTYPE IS jc_idxtype PARAMETERS('JChemPropertiesTable=JChemProps');
  9. Create an index on a structure table generated by an earlier, incompatible version of JChem so that the table will be upgraded to the current JChem table version:
    CREATE INDEX jc_idx_2 ON jchemtest2(structure_col) INDEXTYPE IS jc_idxtype PARAMETERS('RegenerateTable=true');
  10. Create an index on the USERS tablespace with the storage_clause 'INITIAL 2M' using Daylight style aromatization: CREATE INDEX jc_idx_2 ON jchemtest2(structure_col) INDEXTYPE IS jc_idxtype PARAMETERS('TABLESPACE=USERS,STORAGE=INITIAL 2M,std_config=aromatize:d');

ALTER INDEX

Currently only the rebuild_clause is supported with the ALTER INDEX SQL statement:
ALTER INDEX <index-name> REBUILD PARAMETERS( 'param1=paramvalue1,param2=paramvalue2,...');
The following parameter can be specified for jc_idxtype indexes created on columns of non-JChem tables (plain Oracle tables) containing molecular structures:

Examples

  1. Move the index table for a jc_idxtype index to the SYSTEM tablespace, with storage_clause 'INITIAL 3M BUFFER_POOL KEEP':
    ALTER INDEX jcidx_nci_1k REBUILD PARAMETERS('TABLESPACE=SYSTEM,STORAGE=INITIAL 3M BUFFER_POOL KEEP');
  2. Adds the following Chemical Terms expressions to the list of precalcuated expressions:
    1. the LogP of to be stored in a column of type numeric(30,15)
    2. the rotatableBondCount to be stored in a column of type numeric(10,0)
    3. the pKa to be stored in a column of type numeric(30,15):
    alter index jcxautocalccttest
    parameters('sep=! addAutoCalcCt=numeric(30,15);logp()!addAutoCalcCt=numeric(10,0);rotatableBondCount()!addAutoCalcCt=numeric(30,15);pKa("acidic","2")')
  3. Adds the following Chemical Terms expressions to the list of precalcuated expressions:
    1. the LogP of to be stored in a column of type numeric(30,15)
    2. the rotatableBondCount to be stored in a column of type numeric(10,0)
    3. the pKa to be stored in a column of type numeric(30,15):
    alter index jcxautocalccttest
    parameters('sep=! delAutoCalcCt=logp()!delAutoCalcCt=rotatableBondCount()!delAutoCalcCt=pKa("acidic","2")')

Index operators

The valid operator comparison using the index are:

jc_matchcount, jc_tanimoto, jc_dissimilarity, jc_molweight, jc_evaluate operators return real values. These index operators support the full range of comparison functions: ">", ">=", "<", "<=", and "=". Note that the "!=" comparison does not use the index and will use the functional form.

The jc_contains and jc_equals operators perform substructure search and perfect structure search respectively. These operators take two parameters, the first is the smiles column, the second is the query structure in any format supported by JChem (like SMARTS, Molfile, Rxnfile, etc.). jc_contains returns the structures from the table, which contain the query as a substructure, jc_equals returns the ones that are equal to the query structure.

NOTE: In case of the jc_contains and jc_matchcount operators, if the query structure is given in SMILES format it will be converted to SMARTS.

SELECT id FROM JCHEMTEST WHERE jc_contains(smiles, 'CN1C=NC2=C1C(=O)N(C)C(=O)N2C') = 1;

SELECT id FROM JCHEMTEST WHERE jc_equals(smiles, 'CN1C=NC2=C1C(=O)N(C)C(=O)N2C') = 1;

The jc_matchcount operator performs substructure search and returns the occurrence of the query structure in the target structure.

If, for some reason, you need a search behavior other than provided with jc_contains or jc_equals (e.g. you want to perform perfect search with query imported as SMARTS, or you want to perform substructure search with the query imported as SMILES), have a look at jc_compare.

Similarity search using Tanimoto formula can be performed by using the jc_tanimoto and jc_dissimilarity operators. The return values of these operators are in the range of 0.0 - 1.0, inclusive.

SELECT id FROM jchemtest WHERE jc_tanimoto(smiles, 'CN1C=NC2=C1C(=O)N(C)C(=O)N2C') < 0.6;

SELECT id FROM jchemtest WHERE jc_dissimilarity(smiles, 'CN1C=NC2=C1C(=O)N(C)C(=O)N2C') > 0.4;

Molecular weights are calculated by the jc_molweight operator.

Select id from JCHEMTEST where jc_molweight(smiles) < 23;

Calculations can be performed by using the jc_evaluate, jc_compare operators.

SELECT id FROM jchemtest WHERE jc_evaluate(smiles, 'logp()') < 11;
SELECT id FROM jchemtest WHERE jc_evaluate(smiles, 'logd( "7.4" )' ) > 4.2;
SELECT id FROM jchemtest WHERE jc_evaluate(smiles, 'psa()') > 3;

If two or more operators are used in the where clause of one select statement it is more efficient to use the jc_compare operator with a filterQuery.

Index statistics

The jchem_core_pkg.get_idx_stats function can be used to to collect statistics on the chemical information in jc_idxtype indexes. The jchem_core_pkg.get_idx_stats function accepts three parameters:
  1. the index owner's schema;
  2. the second the name of the index;
  3. the third the name of the index partition (for partioned local jc_idxtype indexes).
Example:
    select jchem_core_pkg.get_idx_stats('jchemuser', 'jcxnci_10k', null) from dual
A sample output follows:
Statistics for table: JCHEMUSER.JC_NCI_10K
--------------------
Row count: 10000
NULL SMILES count: 0
Average SMILES length: 27.66
Average compressed SMILES length: 12.96
Markush structure count: 0 (0.0%)
Min. CFP bits: 0.78%      cd_id: 2110
Max. CFP bits: 68.35%     cd_id: 3035
Avg. CFP bits: 21.48%

Chemical Fingerpint distribution:
--------------------------------
0% - 10% : 15.38 %
10% - 20% : 34.04 %
20% - 30% : 29.55 %
30% - 40% : 14.69 %
40% - 50% : 4.91 %
50% - 60% : 1.21 %
60% - 70% : 0.22 %
70% - 80% : 0.0 %
80% - 90% : 0.0 %
90% - 100% : 0.0 %

User defined function

JChem Cartridge supports the ability to execute external program objects written in Java. To make an external Java program available from SQL, a PL/SQL function (and an according operator) and a Java class has to be defined.

The PL/SQL function has to call the send_user_func function of the jchem_core_pkg PL/SQL package to send the data to the external Java program. This function has three VARCHAR2 type parameters:

For example the my_func PL/SQL function and the my_op operator to call the MyClass external Java class:

CREATE FUNCTION my_func(query VARCHAR2, param VARCHAR2) RETURN NUMBER AS
BEGIN
RETURN jchem_core_pkg.send_user_func('MyClass', '{DELIM}', query || {DELIM} || param);
END;
/
show errors;

CREATE OPERATOR my_op BINDING(VARCHAR2, VARCHAR2) RETURN NUMBER
USING my_func;

The external Java class has to implement the chemaxon.jchem.cartridge.JChemCartModule interface. This interface contains the doFunc function that performs the operation on the data was sent by the previously defined PL/SQL function. The doFunc function accepts a String array parameter. This String array contains the items of the parameter list that was defined in the third parameter of the send_user_func function. The doFunc function of the external Java class returns an object and sends it back to the PL/SQL function. The newly created Java class has to be found on the classpath of JChem Server.

For example the MyClass Java class:

     public class MyClass implements chemaxon.jchem.cartridge.JChemCartModul
    {
        public Object doFunc(String[] args) throws Exception
            String query = args[0];
            String param = args[1];
            Object result = null;
            ...
            result = ...
            return result;
        }
    }

Make sure that the JCART_XCLASSPATH environment variable includes the parent directory of the MyClass.class file. Now the new function is ready to be called from PL/SQL, for example:

SELECT my_op(smiles) FROM jchemtable WHERE id < 11;
SELECT count(*) FROM jchemtable WHERE my_op(smiles) >= 4;

In the example above the doFunc function of the MyClass external Java class gets the value of the smiles column of the current row as parameter.

Take a look at the molconverter and getatomcount user defined function examples.

A more efficient way

There is a more efficient way to evaluate user defined operators appearing in the where clause.

  1. Drop all indexes (SQL: drop index idx_name) created before with the jc_idxtype indextype.

  2. Execute as the JChem owner (the owner of the schema, in which JChem Cartridge has been installed):
        alter indextype jc_idxtype add my_op(VARCHAR2, VARCHAR2);
            
  3. Grant execute privileges to users:
        grant execute on my_func to jchemuser;
        grant execute on my_op to jchemuser;
        grant execute on jc_idxtype to jchemuser;
            
  4. Execute the following insert statement:
    INSERT INTO jc_idx_udop VALUES('operator_name', 'java_class_name', 'separator', 'params_list');

    Description of the parameters see here.

    For example:

    INSERT INTO jc_idx_udop VALUES('MY_OP', 'MyClass', '{SEP}', '$1{SEP}$2');
    (The table jc_idx_udop must currently be in the schema where the jc_idxtype index will be created, but we plan to support user defined operator in the scope of the entire JChem Cartridge installation.)
  5. Implement the Java class which will evaluate the relation in the where clause. The name of the class has to be the same as the name of the previously created Java class (implements JChemCartModul.java) with the "_eval" end. For example: MyClass_eval.java

    This class has to implement the chemaxon.jchem.cartridge.JChemCartEvalModul interface. The eval function of this class is called on each object returned by the JChemCartModul.java class. It returns a boolean value that represents whether the object evaluate the expression given in the SQL statement or not. The newly created Java class has to be found on the classpath of JChem Server by setting the JCART_XCLASSPATH environment variable.

  6. Re-create your jc_idxtype indexes again. An index with the jc_idxtype indextype also has to be created on the table on which you want to use the new operator.

Performing these changes on a user defined operator it will be evaluated in a more efficient way if it is appearing in the where clause of the select statement.

Take a look at the getatomcount user defined function example.

JChem-table functions

JChem Cartridge defines two procedures to perform DML operations on a table:

These procedures can be called with or without jc_idxtype index.

call jchem_table_pkg.create_jctable('myjctable',  'JChemProperties', 16, 2, 6, ', RECNO NUMBER', 'aromatize', 1);
call jchem_table_pkg.drop_jctable('myjctable', 'JChemProperties');
cdid_array := jchem_table_pkg.jc_insert('CC(N)Cc1ccccc1', 'scott.jchemtable', null, 'true', 'false');
call jchem_table_pkg.jc_update('CN1C=NC2=C1C(=O)N(C)C(=O)N2C', 'scott.jchemtable', 12, 'scott.JChemProp');
call jchem_table_pkg.jc_delete('scott.jchemtable', 'WHERE cd_id > 1000 AND cd_id < 1100', 'scott.JChemProp');

It is possible to insert more than one structure with one call. If the first parameter starts with "select" the procedure will perform the select statement and insert the resulting structures to the table. The result of the select statement should contain only one column with the structure.

cdid_array := jchem_table_pkg.jc_insert('select cd_structure from jchemtable2 where cd_id < 11', 'scott.jchemtable', null);

Currently, jchem_table_pkg.jc_insert and jchem_table_pkg.jc_update can be used exclusively to manipulate JChem generated tables.

To manipulate plain (non-JChem) structure tables, the standard INSERT, UPDATE and DELETE SQL operations can be used.

Note, that since the caching search engine uses its own JDBC connection to Oracle, the updates to the structure tables must be committed before searches include the recent changes in their results.

Miscellaneous functions

For transformation, structure enumeration you can use jc_react, jc_transform jc_standardize.

To convert a molecular structure into various supported text-based formats, use jc_molconvert. For binary formats, or text-based formats that do not fit into 4000 characters (limitation of VARCHAR2), use jc_molconvertb.

Cost estimation for the Oracle Optimizer

In order for JChem Cartridge to provide cost estimations to the Oracle Optimizer you have to perform the following steps:

  1. Enable cost estimation in JChem Cartridge
  2. Gather statistics on the structure tables
  3. Calibrate the cost estimation factors for your environment
You can override/modify configured defaults at both session level or operator-call level.

Enable cost estimation in JChem Cartridge

Execute the assoc_stats.sql script in the cartridge directory as the JChem-owner. For example:
sqlplus jchem/tiger@mydb @assoc_stats.sql
You can disable cost estimation by executing the disassoc_stats.sql script in the same directory. For example:
sqlplus jchem/tiger@mydb @disassoc_stats.sql

Gather statistics on the structure tables

Gather statistics on the structure tables which you plan include for query optimization including their indexes and index tables. For example, for structure table JCHEMUSER.PLAIN_NCI execute the following command:
  1. Oracle 10g:
    call DBMS_STATS.GATHER_TABLE_STATS('JCHEMUSER', 'PLAIN_NCI')
  2. Oracle 9i:
    call dbms_stats.GATHER_SCHEMA_STATS('JCHEMUSER')
It is recommended to repeat statistics collection when the content of the table has significantly changed compared to what was captured during the last collection.

Calibrate the cost estimation factors for your environment.

This subsection is not applicable to Oracle 9i.

The relative cost of JChem Cartridge operations depends on a number of factors specific to your operational environment. Cost estimation needs therefore to be calibrated. The calibration process consists of

  1. executing two simple SQL statements with known query plans (one with domain index scan and one with the cartridge operator applied as function filter) measuring their actual execution time;
  2. iteratively running EXPLAIN PLANs against the SQL statements and adjusting at each iteration the estimation factors (used internally by JChem Cartridge) until the optimizer's prediction of the execution time (appearing in the output of EXPLAIN PLAN) closely matches the actual execution time.
The output of the calibration is an SQL statement that can be used to update the default JChem Cartridge cost estimation factors with the calibrated results.

Configuring the calibrator

The calibrator.properties file contains the database connection properties and the calibration configuration. The first group of properties include the properties of the database connection and they are self-explaining.

The second group of properties describe the two SQL statements to be used for the measurements. The first a statement is used to calibrate the index cost factor. In PL/SQL code:

    execute immediate 'select count(*) from ' || struct_table
                        || ' where jc_compare(' || struct_col
                        || ', :a, :b) = 1'
            using search_queries(1), search_options;
which expands, for example, to:
    select count(*) from plain_nci where jc_compare(structure, 'c1ccccc1', 't:s') = 1;
    
The second SQL statement is used to calibrate the function-based cost factor. In PL/SQL code:
    execute immediate 'select count(*) from (select ' || struct_col
                        || ' from ' || struct_table || ' where ' ||  pk_col 
                        || ' < :a) a where jc_compare(' || struct_col
                        || ', :b, :c) = 1'
            using less_than_ids(1), search_queries(1), search_options;
which expands, for example, to:
    select count(*) from (
        select structure from plain_nci where id < 10000
        ) a
        where jc_compare(structure, 'c1ccccc1', 't:s') = 1;
    
The dots in the calibrator property names have been replaced in the codes above by the underscore (_) character.

struct.table should contain a structure column (struct.col) having data representative of your production strurcture tables. It must have a primary key column of numeric type (pk.col). The name of the structure column can is to be specified by the struct.col property. Currently, only the first element of the search.queries and the less.than.ids is evaluated. When selecting the value for the less.than.ids calibrator property, bear in mind that the function based execution of the operators is typically significantly slower than the index-scan based execution: the higher the value, the slower the calibration process will be.

Starting the calibrator

The calibration process is automated and can be executed by calling the calibrate.{sh|bat} script in the cartridge directory
on Windows:

calibrate.bat calibrator.properties
on Unix/Linux:
bash calibrate.sh calibrator.properties

Using the output of the calibrator

The output of the calibration is an UPDATE statement. You need to execute it as the JChem owner in order to update the cost factors used by JChem Cartridge for the estimations.

Modifying configured default cost estimation factors

Session level overrides
The jchem_opti_pkg.set_volatile_cost_factors procedure can be used to override the default cost estimation factors for the session. It can be repeatedly called in a session changing the factors as many times as needed for subsequent calls.

The procedure accepts the following parameters:

  1. opName varchar2: the identifier of the operator to which the new estimation factors are to be applied; currently the following identifiers are accepted:
    1. CONTAINS
    2. COMPARE
    3. EQUALS
    4. MATCHCOUNT
    5. EVALUATE
    with each of them standing for to the corresponding JC_XXXX operator or jcf.XXXX function.
  2. idxCpuCost number: the CPU cost component of the index scan estimation factor;
  3. idxIoCost number: the IO cost component of the index scan estimation factor;
  4. idxNetworkCost number: the network cost component of the index scan estimation factor;
  5. funcCpuCost number: the CPU cost component of the function-based estimation factor;
  6. funcIoCost number: the IO cost component of the function-based estimation factor;
  7. funcNetworkCost number: the network cost component of the function-based estimation factor.
You can obtain the current estimation factors using the jchem_opti_pkg.get_cost_factor function which accepts the following parameters:
  1. costType varchar2: accepted values are index and func
  2. opName varchar2: accepted values are
    1. CONTAINS
    2. COMPARE
    3. EQUALS
    4. MATCHCOUNT
    5. EVALUATE
  3. options varchar2: the options (third argument) of the JC{_|.}COMPARE operator/function, NULL for other operators/functions.
  4. resrc varchar2: the identifier of the resource, whose cost estimation component is being queried; accepted values are:
    1. cpu
    2. io
    3. net
  5. idxSchema varchar2 must currently be NULL;
  6. idxName varchar2 must currently be NULL.

Modifying defaults for a single operator execution

The jc_compare operator accepts the following options (as part of the option list argument):

Example:

In order to declare the index scan mode of substructure search with Lipinski's rule of 5s as Chemical Terms filter two times more costly than the default estimation:

SELECT count(*) FROM nci_3m WHERE jc_compare(structure, 'O=C1ONC(N1c2ccccc2)-c3ccccc3','sep=! t:s!
ctFilter:(mass() <= 500) && (logP() <= 5) && (donorCount() <= 5) && (acceptorCount() <= 10)!indexCost=2.0') = 1

Examples and Demos

The JChem installation package contains (in the examples/cartridge/asp directory) an example WEB-application that shows how to use JChem Cartridge to insert, update, retrieve molecular structures and associated non-structural data in an ASP environment. The example application combines JChem Cartridge with Marvin to display and edit structural data. You can find a description of how to set up the example ASP application here.

Known issues

  1. In "user" schemata, the index type name must be qualified with the name of "index owner's" schema.

  2. dbms_lob.freetemporary does not free temporary BLOBs returned by JChem Cartridge functions.

    Certain functions (jcf_reactb, jcf_molconvert(b), jcf_standerdizeb, jcf_evaluateb_x) return a temporary BLOB (as the corresponding operators do). Due probably to what we believe to be an Oracle bug, when such functions are called from a PL/SQL stored procedure/function or a PL/SQL anonymous procedure, dbms_lob.freetemporary completes without an error or warning but will have no effect: the temporary BLOB created by JChem Cartridge (in a Java stored procedure) will not be freed.

    Workaround:

    For these functions we provide an additional parameter (temp_blob) of type BLOB, which can be used to pass to the function the locator of the BLOB which is to hold the results. If this additional parameter is set to NULL (default), JChem Cartridge will create a temporary BLOB that will be used to store the result of the conversion and this temporary BLOB will be returned to the caller. If the user assigns a BLOB locator to this additional parameter (one created by the caller using dbms_lob.createtemporary, for example), the BLOB represented by this locator will be used to store the results. In this case, the BLOB locator assigned to this additional parameter will be also returned to the caller.

Future plans

 
Copyright © 1999-2008 ChemAxon Ltd.    All rights reserved.