JChem Cartridge adds chemical knowledge to your Oracle platform and you automatically take advantage of Oracle's security, scalability, and replication. You can search data by structure, substructure and similarity through extensions to Oracle's native SQL language. Chemical data can be easily inserted and modified using SQL procedures of the JChem Cartridge.
SQL operators of the Cartridge can be used to search for chemical
data (molecules or reactions) more efficiently than it is possible
using traditional mechanisms such as PL/SQL Stored Procedures.
To increase the speed of the
searching process, the jc_idxtype
index type has to be applied.
The chemical knowledge for JChem Cartridge comes from ChemAxon's core Java libraries. Oracle offers the possibility to use existing Java libraries by calling them from Java Stored Procedures. Java Stored Procedures run in a special, in many respects unconventional Java runtime environment which is tightly integrated with the Oracle RDBMS. One of the most important characteristics of the Oracle Java runtime environment is that Java libraries are run in a byte-code-interpreted mode. This aspect puts Java Stored Procedures at a serious disadvantage in terms of performance compared to modern Java runtime environments. These latter use sophisticated techniques to compile Java byte-code into native binary executable at runtime. Native executables can then be run much faster than Java byte-code.
The solution used for JChem Cartridge to compensate for the performance disadvantage with Oracle Java Stored Procedures is to perform the computation intensive functions in an external Java runtime environment capable of runtime byte-code compilation ("external" meaning: external to Oracle). We will call this external Java runtime environment (as well as the JChem Cartridge components running in it) JChem Server.

At an abstract level, the JChem Server comprises the following components:
For structured database access, the Search and Update engines use JDBC connection(s) (how to configure the JDBC connection for the Search Engine).
To reduce hard disk access and network round trip, the Search Engine extensively caches the chemical information stored in database tables. (How to configure memory for the search cache.)
The JChem Cartridge components (PL/SQL and Java Stored Procedures) which are located within Oracle, perform the following functions:
The standard way for an Oracle Java Stored Procedure to access database objects is to use a JDBC-connection which has been pre-opened by Oracle in the context of the database session, in which invocation of a given stored procedure has been initiated.
However, as JChem Cartridge stored procedures make extensive use of the JChem Server for all kind of tasks to speed up computation, there are cases where it is much more efficient to access the database (as part of some sub-tasks) directly from within the JChem Server rather than relying entirely on stored procedures for database access. Relying entirely on stored procedures for database access would require moving (in some cases) a large amount of data back and forth between the JChem Server and the stored procedures which would have an obvious impact on performance (and in the case of earlier Oracle versions also on stability). Direct database access from within the JChem Server thus involves opening a new database session (within JChem Server). This results in certain cartridge operations using two database sessions: the "original" (initiator) database session and an auxiliary database session. INSERT operations into regular Oracle tables as well as inserts into JChem tables using thejchem_table_pkg.jc_insert
function are currently performed exclusively using the initiator session. A
single session is also used for updates and deletes.
The dual-session access mode is used for searches that use the
jc_idxtype operators in domain index scan mode (where
the table is scanned using the jc_idxtype domain
index). An exception is the duplicate search, when the requireCommit option is
specified with the n value. Another (related) exception is the
duplicate search performed to enforce uniqueness during structure insert either using
jchem_table_pkg.jc_insert
with the checkDuplicates option set to true or using regular
INSERT into a regular structure table indexed with the duplicateFiltering index parameter.
The functional equivalents of operators as well as the operators executed
in "non-domain-index-scan" mode use one singe database session.
When you use a search operator (such as
jc_contains
) in the WHERE clause of a SQL query,
the following execution path is traversed:


jc_insert (in the case
of JChem structure tables).
jc_insert function. (jc_insert includes both
inserting the structure and updating the index information in
the same JChem structure table.)
Using two separate JDBC connections for searching has the following implications:
While it is currently impossible to retain "normal" transaction semantics with dual-session database access, beginning with JChem version 3.0.12 a simple mechanism is provided to allow to retain security semantics: i. e. to execute all database operations with the credentials of the initiating database session. The mechanism consists of
A PL/SQL procedure
jchem_core_pkg.use_password(password VARCHAR2)
is made available for users to set their transient password
in JChem Server. While the password is kept in JChem Server's
main memory, it will be remembered and used until
JChem Server is shut down. It is therefore recommended
but not strictly necessary for users (or applications on their
behalves) to call the use_password procedure
at the beginning of each database session. (It must called
once after each time the JChem Server is (re)started.)
If the password of a given user has not been set for JChem Server
via the use_password PL/SQL procedure,
the user credentials (if any) that were set as part
of the JChem Server JDBC configuration will be used
to open database connections in JChem Server
(how
to configure the JDBC connection for the Search Engine).
jchem_core_pkg.use_password(password VARCHAR2)
transmits the password to JChem Server via in plain text via a regular RMI
call. If JChem Server and Oracle are running on different machines,
malicious users may be able to sniff the passwords as it travels on the
network. The necessary steps must be taken to prevent this (e.g. by
limiting user's privileges on client machines so that they are unable to
put their Ethernet card in promiscuous mode, or using a dedicated/closed
network connection between the Oracle host and the JChem Server host).
In addition to regular structure tables containing molecular structures, structure tables that have been created and populated by tools included in JChem Base can also be used with JChem Cartridge. The advantages of JChem tables over plain Oracle tables are:
jchem_table_pkg.jc_insert
function provided with JChem Cartridge
to insert into a JChem table.
jchem_table_pkg.jc_update,
procedure provided with JChem Cartridge
to update structures in a JChem table.
jchem_table_pkg.jc_delete
procedure provided with JChem Cartridge to remove structures from a
JChem table.
From an architectural perspective, the main difference between using/indexing regular structure tables and using/indexing JChem tables is the following:
When you create a JChem Cartridge domain index on a regular table, the
chemical information on the structures found in the table (called in this
context base table) is stored in a separate table (called
index table).
Since the same chemical information which is stored in an index table is
already present in a JChem table, the base table and index table are
physically the same table in case of a JChem table. (Thus creating a
jc_idxtype index on a JChem table is an instantaneous
operation.)
CREATE INDEX
and the ALTER INDEX SQL statements are supported.
The index can be created on a column of any table, provided that the column contains molecular structures in any of the supported representation. Use the following general statement to create the index:
CREATE INDEX <index name> ON <table>(<structure-column>) INDEXTYPE IS jc_idxtype PARAMETERS('param1=paramvalue1,param2=paramvalue2,...');
The index name cannot currently be longer than 22 characters.
The following optional parameters can be specified for both JChem and non-JChem tables:
JChemPropertiesTable is required for determining the
properties of fingerprints
during the queries. These properties, that are stored in the JChemProperties
table, can be different for each JChem molecule table. If the JChemPropertiesTable
parameter is not defined, the method uses the default property table
name in the user's schema: JChemProperties.
RegenerateTable:
specify this option if you want to create index on a structure table
generated by an earlier, incompatible version of JChem. Trying to
create an index on an incompatible JChem-table version without
specifying this option will result in an error message. This option
accepts no parameter value.idxSubType: the kind of content being indexed. Two kinds of contents are supported:
fp_size: the number of INTEGER columns
that will contain the chemical hashed fingerprints of the molecules.
(For more details see the JChem
Administration Guide.)
fp_bit: the number of bits to be set for each
pattern in the structure.(For more details see the JChem Administration Guide.)
pat_length: the number of edges in a pattern.(For
more details see the JChem
Administration Guide.)
structureType: the type of the structures being indexed.
The following structure types are currently valid:
anyStructures:
All types of structures are allowed, but no structure type-specific
searching.
molecules:
Specific structures, like single molecules, mixtures, salts,
polymers. Other structures (e.g. with query or Markush features or
reactions) are not allowed in database (exception is thrown).
reactions:
Index on single step reactions.
queries:
Index on queries. Only superstructure, full and duplicate
searches are allowed.
markush:
Structures with markush features.
structuralfp_config: a select statement returning
the structural keys.
std_config: the configuration for custom standardization.
The standardizer configuration is used during structure search. See here how. When not
specified, the standardization configuration set as a default property is used. When not
specified and no default standardization property is configured, the
built-in default standardizer configuration is used. For details see
the Standardizer
documentation. You can use either the XML format or the action string variant of the Standardizer configuration. Special
care has to be taken when custom standardization (other than the
build-in default) is used on the database table. The "aromatize" action
should be present in the configuration, and it is safest to put it
first. See also the examples below.
absoluteStereo [y|n]: if set to "y" all query and target
structures are treated as absolute stereo.
This setting can be changed later with ALTER INDEX.
This setting can be overridden for individual searches with the
corresponding option of the
jc_compare
operator.
haltOnError [y|n|nf]:
duplicateFiltering
option is turned on). In case error occurred, indexing will abort
after all structures were processed with a generic error message
indicating the number of errors occurred. The errors ignored during
indexing can be retrieved in JChem Server's logfiles.
duplicateFiltering [y|n]: if set to "y", indexing and INSERT will
check whether the structure being indexed are a duplicate of a
structure already indexed in the same column. If the new structure is a
duplicate, an exception is thrown. (Specify the haltOnError parameter, if you want
to avoid restarting indexing on each problematic structure in the
column being indexed.)
TDF [y|n]: If set to "y", DUPLICATE search will consider
tautomers as duplicates. If set to "y" and the duplicateFiltering option
is also turned on, duplicate filtering will consider tautomers of the
structure being checked on duplication as well.
More details about the theory behind this option can be found in the
JChem concepts chapter
of the JChem Developers Guide.
exclusiveDF [y|n]: if set to "y" and the duplicateFiltering option
is also turned on, only one database session will be given the chance
at a time to insert or update the structure column in order to
effectively serialize otherwise concurrent non-atomic duplicate search
and update/insert operations. This means that once a database session
starts inserting into (or updating) a structure column, other database
sessions subsequently starting to insert into (or update) the same
structure column will block until the first session commits. Concurrent
searches are not affected: they can proceed as usual.
autoCalcCt=<column-type-definition>;<ct-expression>
specifies Chemical
Terms calculated columns: that the expression should be
automatically evaluated on structures
being indexed with the value being stored in a column of the specified
type.
The expression will be evaluated during CREATE INDEX for the structures
already inserted and automatic calculation of the values during future
structure inserts is arranged.
The precalculated values will be used by jc_evaluate and jc_evaluate_x functions.
(See example)
TABLESPACE to specify the table space for the
index. Use: TABLESPACE=<tablespace-name>.
STORAGE to specify any of the elements of the
standard Oracle storage_clause. Use:
STORAGE=<storage-clause-elements>.
CREATE INDEX jc_idx ON jchemtest(structure_col) INDEXTYPE IS jc_idxtype;
CREATE INDEX jc_idx ON jchemtest(structure_col) INDEXTYPE IS jc_idxtype;
CREATE INDEX jcxextfp ON extfp(fp) INDEXTYPE IS pkovacs_priv.jc_idxtype
PARAMETERS('idxSubType=fp,fp_size=32')
CREATE INDEX jc_idx ON jchemtest(structure_col) INDEXTYPE IS jc_idxtype;
CREATE INDEX jc_idx ON jchemtest(cd_structure) INDEXTYPE IS jc_idxtype;
stkey
column of the stkeys table:
CREATE INDEX jc_idx ON jchemtest(cd_structure) INDEXTYPE IS jc_idxtype PARAMETERS('structuralfp_config:SELECT stkey FROM stkeys');
CREATE INDEX jc_idx_2 ON jchemtest2(structure_col) INDEXTYPE IS jc_idxtype PARAMETERS('std_config=aromatize:d');
CREATE INDEX jc_idx_2 ON jchemtest2(structure_col) INDEXTYPE IS jc_idxtype PARAMETERS('absoluteStereo=y');
numeric(30,15)
numeric(10,0)
numeric(30,15):
CREATE INDEX jcxautocalccttest ON autocalccttest(structure)
INDEXTYPE IS jchem_cc.jc_idxtype
PARAMETERS('sep=! autoCalcCt=numeric(30,15);logp()!autoCalcCt=numeric(10,0);rotatableBondCount()!autoCalcCt=numeric(30,15);pKa("acidic","2")')
jchemprops:
CREATE INDEX jc_idx_2 ON jchemtest2(structure_col) INDEXTYPE IS jc_idxtype PARAMETERS('JChemPropertiesTable=JChemProps');
CREATE INDEX jc_idx_2 ON jchemtest2(structure_col) INDEXTYPE IS jc_idxtype PARAMETERS('RegenerateTable=true');
storage_clause
'INITIAL 2M' using Daylight style aromatization:
CREATE INDEX jc_idx_2 ON jchemtest2(structure_col) INDEXTYPE IS jc_idxtype
PARAMETERS('TABLESPACE=USERS,STORAGE=INITIAL 2M,std_config=aromatize:d');
jc_idxtype indexes created
on columns of regular structure tables:
addAutoCalcCt=<column-type-definition>;<ct-expression>
adds a Chemical Terms
expression which should be automatically evaluated on structures being
indexed with the value being stored in a column of the specified type.
The expression will be evaluated during ALTER INDEX for the structures
already inserted and automatic calculation of the values during future
structure inserts is arranged.
The precalculated values will be used by jc_evaluate and jc_evaluate_x functions.
(See example)
delAutoCalcCt=<ct-expression> stops
automatically evaluating the given Chemical Terms expression on
structures being indexed and drops the column holding the already
calculated values. (See example)
addMdDesc=<pointer-to-descriptor><pointer-to-descriptor> can be either
numeric(30,15)
numeric(10,0)
numeric(30,15):
alter index jcxautocalccttest
parameters('sep=! addAutoCalcCt=numeric(30,15);logp()!addAutoCalcCt=numeric(10,0);rotatableBondCount()!addAutoCalcCt=numeric(30,15);pKa("acidic","2")')
numeric(30,15)
numeric(10,0)
numeric(30,15):
alter index jcxautocalccttest
parameters('sep=! delAutoCalcCt=logp()!delAutoCalcCt=rotatableBondCount()!delAutoCalcCt=pKa("acidic","2")')
This command is supported only for indexes created on structure columns of regular structure tables. The equivalent functionality for JChem structure tables is available through JChemManager or Instant JChem.
When specified without parameters for a jc_idxtype index, this
command will rebuild the index.
upgradeOnly
If this parameter is specified with the value y, the index
will be rebuilt only, if the index data has been calculated using a JChem
version older than the current.
skiprecalc
If this parameter is specified with the value y, index
data will not be recalculated. Only the index's internal structure will
be cleaned up base on parameters as were used (last time) to create (or
alter) the index. Use this command with precaution, because it will
clear the status flags of the index and an index with missing or fault
data will be marked as valid.
skipAutoCalcCt
If this parameter is specified with the value y, index
data recalculation will not include the recalculation of the auto
calculated values of Chemical Terms expressions associated with the
given index. (Recalculation of the auto calculated values of any Chemical
Terms expression associated with the given index can be achieved by
first removing the Chemical
Terms expression from the index, then readding it.)
absoluteStereo to specify/change default chirality.
TABLESPACE to specify the table space for the
index. Use: TABLESPACE=<tablespace-name>.
STORAGE to specify any of the elements of the
standard Oracle storage_clause. Use:
STORAGE=<storage-clause-elements>.
skipRecalc to specify that the index data should not be
recalculated. With this parameter, it will only be checked wether
the structure and version of the specified index is consistent with
the currently installed JChem Cartridge version. (This mode can also
come handy, if you want to bail out from an alter index operation
aborted due to a syntax problem with addAutoCalcCt.
Examples
alter index <index-name> rebuild;
jc_idxtype index to the SYSTEM
tablespace, with storage_clause 'INITIAL 3M BUFFER_POOL KEEP':
alter index jcidx_nci_1k rebuild parameters('TABLESPACE=SYSTEM,STORAGE=INITIAL 3M BUFFER_POOL KEEP');
Parameters representing target structures can be either a structure string in any format recognized by JChem or a column containing such structures.
Some of the parameters representing target structures can only be a structure string (literal-only target structures).
Parameters representing query structures can be in any format supported by
JChem (like SMARTS, Molfile, Rxnfile, etc.).
Starting with JChem 3.1.1, multiple query structures can also be specified
for a single search operator at a time. The individual query structures are
to be separated as specified by the given format (e.g. in the case of
SMILES/SMARTS, SDFiles with a newline character). The search is performed
for each individual query structure and the union of the hits is returned.
(See also our Future Plans.) Multiple
query structures are accepted only for operators evaluated in domain
index scans. See the example
for jc_compare.
Some of the operators and functions have an option list parameter.
Option list parameters are of type VARCHAR2 and accept a list of
options (specific to the given operator or function). The default separator
between the individual options is the space character. In case the space
character is not appropriate as a separator (because, for example, the value of
any of the options itself contains space(s)), a custom separator string can be
used by specifying the special option "sep=" as the first option followed by a
string that will be used as the custom separator. The custom separator string
must be delimited by a space at the end (the space will not be part of the
separator string). If a custom separator string is specified (always at the
beginning of the options string), each option following option must be
delimited using the custom separator string.
Without creating a jc_idxtype index of a chemical
table, operators occurring in expressions are evaluated for each row of the
result. The operators will therefore throw an exception to remind you of a
missing index.
If the value of the smiles column of the current
row is NULL then a NULL value will
be returned by the operator.
SELECT cd_id, jc_contains(smiles, smiles) FROM scott.jchemtable WHERE cd_id < 10;
SELECT cd_id, jc_tanimoto(smiles, 'CN1C=NC2=C1C(=O)N(C)C(=O)N2C') FROM scott.jchemtable WHERE cd_id < 10;
SELECT cd_id FROM scott.jchemtable WHERE jc_contains(smiles, 'CN1C=NC2=C1C(=O)N(C)C(=O)N2C') = 1;
SELECT cd_id FROM scott.jchemtable WHERE jc_evaluate(smiles, 'logd( "7.4" )' ) > 2;
NOTE: Operators appearing in the WHERE clause
are evaluated much faster with the use of the JChem index (jc_idxtype).
Example:
SELECT cd_id FROM scott.jchemtable WHERE jc_contains(smiles, 'CN1C=NC2=C1C(=O)N(C)C(=O)N2C') = 1;
Each operator supported by the jc_idxtype index type have a
functional equivalent in the jcf package. The name of the
functional equivalents have the 'jc_' prefix of their operator
peers removed. E.g. the functional equivalent of jc_molconvert
is jcf.molconvert.
One possible use of functional equivalents is in SQL constructs where operators cannot be used such as embedded in another operator:
select jc_compare('c1ccccc1', jcf.molconvert('C1C=CC=CC=1', 'smiles:au'), 't:s') from dual;
The behavior of operators and functions can be configured by using default properties and index parameters. Currently, two global properties are supported:
standardizerConfig, whose value
can be a list of
actions.
absoluteStereo, whose
value can be false and true.
call jc_set_default_property('standardizerConfig','aromatize:b');
The effect of setting this property will be two-fold:
select ... from dual).
jc_idxtype.
You can override the default value for a given index
by specifying the corresponding index parameter with
a different value (the std_config index parameter
for the standardizerConfig default property).
Default properties provide default values only to indexes that are created in the current user's schema. The scope of default properties is the schema. Each user requiring daylight style aromatization with dehydroganization must call
call jc_set_default_property('standardizerConfig','dehydrogenize:optional..aromatize');
to initialize their environment. If the user does not specify
their own default, a built-in default standardization will be used
(which includes dehydrogenization and a Generic aromatization).
The valid operator comparison using the index are:
- jc_contains(...) = 0
- jc_contains(...) = 1
- jc_equals(...) = 0
- jc_equals(...) = 1
- jc_compare(...) = 0
- jc_compare(...) = 1
jc_matchcount, jc_tanimoto, jc_dissimilarity,
jc_molweight, jc_evaluate
operators return real values.
These index operators support the full range of comparison functions:
">", ">=", "<", "<=", and "=". Note that the "!="
comparison does
not use the index and will use the functional form.
The jc_contains
and jc_equals
operators perform substructure search and duplicate structure search
respectively. These operators take two parameters, the first is the smiles
column, the second is the query structure in any format supported by JChem
(like SMARTS, Molfile, Rxnfile, etc.). jc_contains
returns the structures from the table, which contain the query as a
substructure, jc_equals returns the ones that are equal
to the query structure.
NOTE: In case of the jc_contains
and jc_matchcount
operators, if the query structure is given in SMILES format it will be
converted to SMARTS.
SELECT id FROM JCHEMTEST WHERE jc_contains(smiles, 'CN1C=NC2=C1C(=O)N(C)C(=O)N2C') = 1;
SELECT id FROM JCHEMTEST WHERE jc_equals(smiles, 'CN1C=NC2=C1C(=O)N(C)C(=O)N2C') = 1;
The jc_matchcount
operator performs substructure search and returns the occurrence of the
query structure in the target structure.
If, for some reason, you need a search behavior other than provided
with jc_contains or jc_equals (e.g. you
want to perform duplicate search with query imported as SMARTS, or you
want to perform substructure search with the query imported as SMILES),
have a look at jc_compare.
Similarity search using Tanimoto formula can be performed by using
the jc_tanimoto and jc_dissimilarity
operators. The return values of these operators are in the range of 0.0
- 1.0, inclusive.
SELECT id FROM jchemtest WHERE jc_tanimoto(smiles, 'CN1C=NC2=C1C(=O)N(C)C(=O)N2C') < 0.6;
SELECT id FROM jchemtest WHERE jc_dissimilarity(smiles, 'CN1C=NC2=C1C(=O)N(C)C(=O)N2C') > 0.4;
Molecular weights are calculated by the jc_molweight operator.
Select id from JCHEMTEST where jc_molweight(smiles) < 23;
Calculations can be performed by using the jc_evaluate, jc_compare operators.
SELECT id FROM jchemtest WHERE jc_evaluate(smiles, 'logp()') < 11;
SELECT id FROM jchemtest WHERE jc_evaluate(smiles, 'logd( "7.4" )' ) > 4.2;
SELECT id FROM jchemtest WHERE jc_evaluate(smiles, 'psa()') > 3;
If two or more operators are used in the where clause of one select
statement it is more efficient to use the jc_compare
operator with a filterQuery.
Note, that since the caching search engine uses its own JDBC connection to Oracle, the updates to the structure tables must be committed before searches include the recent changes in their results.
jchem_core_pkg.get_idx_stats function can be used to
to collect statistics on the chemical information in
jc_idxtype indexes. The
jchem_core_pkg.get_idx_stats function accepts three
parameters:
jc_idxtype indexes).
select jchem_core_pkg.get_idx_stats('jchemuser', 'jcxnci_10k', null) from dual
A sample output follows:
Statistics for table: JCHEMUSER.JC_NCI_10K -------------------- Row count: 10000 NULL SMILES count: 0 Average SMILES length: 27.66 Average compressed SMILES length: 12.96 Markush structure count: 0 (0.0%) Min. CFP bits: 0.78% cd_id: 2110 Max. CFP bits: 68.35% cd_id: 3035 Avg. CFP bits: 21.48% Chemical Fingerpint distribution: -------------------------------- 0% - 10% : 15.38 % 10% - 20% : 34.04 % 20% - 30% : 29.55 % 30% - 40% : 14.69 % 40% - 50% : 4.91 % 50% - 60% : 1.21 % 60% - 70% : 0.22 % 70% - 80% : 0.0 % 80% - 90% : 0.0 % 90% - 100% : 0.0 %
JChem Cartridge supports the ability to execute external program objects written in Java. To make an external Java program available from SQL, a PL/SQL function (and an according operator) and a Java class has to be defined.
The PL/SQL function has to call the send_user_func
function of the jchem_core_pkg PL/SQL package to send the
data
to the external Java program. This function has three VARCHAR2
type parameters:
For example the my_func PL/SQL function and the my_op
operator to call the MyClass external Java class:
CREATE FUNCTION my_func(query VARCHAR2, param VARCHAR2) RETURN NUMBER AS
BEGIN
RETURN jchem_core_pkg.send_user_func('MyClass', '{DELIM}', query || {DELIM} || param);
END;
/
show errors;
CREATE OPERATOR my_op BINDING(VARCHAR2, VARCHAR2) RETURN NUMBER
USING my_func;
The external Java class has to implement the chemaxon.jchem.cartridge.JChemCartModule
interface. This interface contains the doFunc
function that performs the operation on the data was sent by the
previously defined PL/SQL function. The doFunc function
accepts a String array parameter. This String array contains the items
of the parameter list that was defined in the third parameter of the send_user_func
function. The doFunc function of the external Java class
returns an object and sends it back to the PL/SQL function. The newly
created Java class has to be found on the classpath of JChem Server.
For example the MyClass Java class:
public class MyClass implements chemaxon.jchem.cartridge.JChemCartModul
{
public Object doFunc(String[] args) throws Exception
String query = args[0];
String param = args[1];
Object result = null;
...
result = ...
return result;
}
}
Make sure that the JCART_XCLASSPATH environment variable includes the parent
directory of the MyClass.class file.
Now the new function is ready to be called from PL/SQL, for example:
SELECT my_op(smiles) FROM jchemtable WHERE id < 11;SELECT count(*) FROM jchemtable WHERE my_op(smiles) >= 4;
In the example above the doFunc function of the MyClass
external Java class gets the value of the smiles column
of the current row as parameter.
Take a look at the molconverter
and getatomcount
user defined function examples.
where clause.
drop index idx_name)
created before with the jc_idxtype indextype.
alter indextype jc_idxtype add my_op(VARCHAR2, VARCHAR2);
grant execute on my_func to jchemuser;
grant execute on my_op to jchemuser;
grant execute on jc_idxtype to jchemuser;
insert statement:
INSERT INTO jc_idx_udop VALUES('operator_name', 'java_class_name', 'separator', 'params_list');
Description of the parameters see here.
For example:
INSERT INTO jc_idx_udop VALUES('MY_OP', 'MyClass', '{SEP}', '$1{SEP}$2');
(The table jc_idx_udop must currently be in the schema
where the jc_idxtype index will be created, but we plan to
support user defined operator in the scope of the entire JChem
Cartridge installation.)
Implement the Java class which will evaluate the relation in the
where clause. The name of the class has to be the same as
the name of the previously created Java class (implements
JChemCartModul.java) with the "_eval" end. For example:
MyClass_eval.java
This class has to implement the chemaxon.jchem.cartridge.JChemCartEvalModul
interface. The eval function of this class is called on
each object returned by the JChemCartModul.java class. It
returns a boolean value that represents whether the object evaluate the
expression given in the SQL statement or not. The newly created Java
class has to be found on the classpath of JChem Server by setting the
JCART_XCLASSPATH environment variable.
jc_idxtype indexes again. An index with the
jc_idxtype indextype also has to be created on the table
on which you want to use the new operator.
Performing these changes on a user defined operator it will be
evaluated in a more efficient way if it is appearing in the where
clause of the select statement.
Take a look at the getatomcount
user defined function example.
JChem Cartridge defines two procedures to perform DML operations on a table:
jchem_table_pkg.create_jctablejchem_table_pkg.drop_jctablejchem_table_pkg.list_jctablesjchem_table_pkg.jc_insertjchem_table_pkg.jc_updatejchem_table_pkg.jc_deletejc_idxtype
index.
call jchem_table_pkg.create_jctable('myjctable', 'JChemProperties', 16, 2, 6, ', RECNO NUMBER', 'aromatize', 1);call jchem_table_pkg.drop_jctable('myjctable', 'JChemProperties');cdid_array := jchem_table_pkg.jc_insert('CC(N)Cc1ccccc1', 'scott.jchemtable', null, 'true', 'false');call jchem_table_pkg.jc_update('CN1C=NC2=C1C(=O)N(C)C(=O)N2C', 'scott.jchemtable', 12, 'scott.JChemProp');call jchem_table_pkg.jc_delete('scott.jchemtable', 'WHERE cd_id > 1000 AND cd_id < 1100', 'scott.JChemProp');
It is possible to insert more than one structure with one call. If the first parameter starts with "select" the procedure will perform the select statement and insert the resulting structures to the table. The result of the select statement should contain only one column with the structure.
cdid_array := jchem_table_pkg.jc_insert('select cd_structure from jchemtable2 where cd_id < 11', 'scott.jchemtable', null);
Currently, jchem_table_pkg.jc_insert and jchem_table_pkg.jc_update can be
used exclusively to manipulate JChem generated tables.
To manipulate plain (non-JChem) structure tables, the standard INSERT,
UPDATE and DELETE SQL operations can be used.
For transformation, structure enumeration you can use jc_react, jc_transform
jc_standardize.
To convert a molecular structure into various supported text-based
formats, use jc_molconvert.
For binary formats,
or text-based formats that do not fit into 4000 characters (limitation
of VARCHAR2),
use jc_molconvertb.
In order for JChem Cartridge to provide cost estimations to the Oracle Optimizer you have to perform the following steps:
assoc_stats.sql script in the cartridge directory as
the JChem-owner. For example:
sqlplus jchem/tiger@mydb @assoc_stats.sqlYou can disable cost estimation by executing the
disassoc_stats.sql script
in the same directory. For example:
sqlplus jchem/tiger@mydb @disassoc_stats.sql
call DBMS_STATS.GATHER_TABLE_STATS('JCHEMUSER', 'PLAIN_NCI')
call dbms_stats.GATHER_SCHEMA_STATS('JCHEMUSER')
This subsection is not applicable to Oracle 9i.
The relative cost of JChem Cartridge operations depends on a number of factors specific to your operational environment. Cost estimation needs therefore to be calibrated. The calibration process consists of
The calibrator.properties file contains the database
connection properties and the calibration configuration. The first group of
properties include the properties of the database connection and they are
self-explaining.
The second group of properties describe the two SQL statements to be used for the measurements. The first a statement is used to calibrate the index cost factor. In PL/SQL code:
execute immediate 'select count(*) from ' || struct_table
|| ' where jc_compare(' || struct_col
|| ', :a, :b) = 1'
using search_queries(1), search_options;
which expands, for example, to:
select count(*) from plain_nci where jc_compare(structure, 'c1ccccc1', 't:s') = 1;
The second SQL statement is used to calibrate the function-based cost
factor. In PL/SQL code:
execute immediate 'select count(*) from (select ' || struct_col
|| ' from ' || struct_table || ' where ' || pk_col
|| ' < :a) a where jc_compare(' || struct_col
|| ', :b, :c) = 1'
using less_than_ids(1), search_queries(1), search_options;
which expands, for example, to:
select count(*) from (
select structure from plain_nci where id < 10000
) a
where jc_compare(structure, 'c1ccccc1', 't:s') = 1;
The dots in the calibrator property names have been replaced in the codes
above by the underscore (_) character.
struct.table should contain a structure column
(struct.col) having data representative of your production
strurcture tables. It must have a primary key column of numeric type
(pk.col). The name of the structure column can is to be
specified by the struct.col property. Currently, only the first
element of the search.queries and the less.than.ids
is evaluated. When selecting the value for the less.than.ids
calibrator property, bear in mind that the function based execution of the
operators is typically significantly slower than the index-scan based
execution: the higher the value, the slower the calibration process will
be.
The calibration process is automated and can be executed by calling the
calibrate.{sh|bat} script in the cartridge
directory
on Windows:
calibrate.bat calibrator.propertieson Unix/Linux:
bash calibrate.sh calibrator.properties
The output of the calibration is an UPDATE statement. You need to execute it as the JChem owner in order to update the cost factors used by JChem Cartridge for the estimations.
jchem_opti_pkg.set_volatile_cost_factors procedure can be used
to override the default cost estimation factors for the session. It can be
repeatedly called in a session changing the factors as many times as needed for
subsequent calls.
The procedure accepts the following parameters:
opName varchar2: the identifier of the operator to which the
new estimation factors are to be applied; currently the following
identifiers are accepted:
CONTAINS
COMPARE
EQUALS
MATCHCOUNT
EVALUATE
JC_XXXX
operator or jcf.XXXX function.
idxCpuCost number: the CPU cost component of the index scan
estimation factor;
idxIoCost number: the IO cost component of the index scan
estimation factor;
idxNetworkCost number: the network cost component of the index
scan estimation factor;
funcCpuCost number: the CPU cost component of the function-based
estimation factor;
funcIoCost number: the IO cost component of the function-based
estimation factor;
funcNetworkCost number: the network cost component of the
function-based estimation factor.
jchem_opti_pkg.get_cost_factor function which accepts the
following parameters:
costType varchar2: accepted values are index and func
opName varchar2: accepted values are
CONTAINS
COMPARE
EQUALS
MATCHCOUNT
EVALUATE
JC{_|.}COMPARE operator/function, NULL for other
operators/functions.
resrc varchar2: the identifier of the resource, whose cost
estimation component is being queried; accepted values are:
cpu
io
net
idxSchema varchar2 must currently be NULL;
idxName varchar2 must currently be NULL.
The jc_compare operator
accepts the following options (as part of the option list argument):
indexCost: <decimal-number><decimal-number> will be used to multiply the default cost
estimation for index-scanning.
funcCost: <decimal-number><decimal-number> will be used to multiply the default cost
estimation for function based execution.
In order to declare the index scan mode of substructure search with Lipinski's rule of 5s as Chemical Terms filter two times more costly than the default estimation:
SELECT count(*) FROM nci_3m WHERE
jc_compare(structure, 'O=C1ONC(N1c2ccccc2)-c3ccccc3','sep=! t:s!
ctFilter:(mass() <= 500)
&& (logP() <= 5) && (donorCount() <= 5) &&
(acceptorCount() <= 10)!indexCost=2.0') = 1
examples/cartridge/asp directory) an example WEB-application that shows how to use JChem Cartridge to insert, update, retrieve molecular structures and associated non-structural data in an ASP environment. The example application combines JChem Cartridge with Marvin to display and edit structural data. You can find a description of how to set up the example ASP application here.
Parametrized similirity metrics (such as tversky) cannot be used with locales where the decimal separator is comma. (FS#9922)
In "user" schemata, the index type name must be qualified with the name of "index owner's" schema.
dbms_lob.freetemporary
does not free temporary BLOBs returned by JChem Cartridge functions.
Certain functions (jcf_reactb,
jcf_molconvert(b), jcf_standerdizeb,
jcf_evaluateb_x) return a temporary BLOB (as the
corresponding operators do). Due probably to
what we believe to be an Oracle bug, when such functions are called
from a PL/SQL stored procedure/function or a PL/SQL anonymous procedure,
dbms_lob.freetemporary completes without an error or warning
but will have no effect: the temporary BLOB created by JChem Cartridge
(in a Java stored procedure) will not be freed.
temp_blob)
of type BLOB, which can be used to pass to the function the locator of the
BLOB which is to hold the results. If this additional
parameter is set to NULL (default), JChem Cartridge will create a
temporary BLOB that will be used to store the result of the conversion and
this temporary BLOB will be returned to the caller.
If the user assigns a BLOB locator to this additional parameter
(one created by the caller using dbms_lob.createtemporary,
for example), the BLOB represented by this locator
will be used to store the results. In this case, the BLOB locator
assigned to this additional parameter will be also returned to the caller.