JChem Cartridge FAQ


  • What computer resources are required by JChem Cartridge?

    • CPU:

      The faster the better. JChem Cartridge also uses the parallel processing capabilities of the host, if available. See our benchmarks with a 3GHz dual Xeon machine for indexing and searching. (Indexing 12 x NCI, which is more than 3 million structures, takes 5,801 seconds (about 97 minutes) on the same machine.)

    • Memory:

      1. Oracle
        The memory requirement of the JChem Cartridge specific stored procedures depends on the size of the result set returned by JChem Server. Currently, processing of the results starts on the Oracle side only after JChem Server has returned the entire result set, so the entire result set has to be buffered temporarily. For structure searching, the temporary memory requirement is about 8 bytes per hit. We are working to remove this bottleneck, so that results from JChem Server are passed on for processing as soon as they are available (which will both reduce the memory buffer needed and improve response times).

        Although JChem Cartridge does not directly require it, the size of the Oracle buffer cache is also relevant: it speeds up the SELECT SQL statement that converts the JChem-internal compound identifiers of the hits to Oracle-internal ROWIDs.

        Oracle needs no memory (directly or indirectly) to buffer the structures themselves, because target compounds are cached in highly optimized data structures in JChem Server. See below.

      2. JChem Server
        JChem Server caches molecular structures using a mechanism highly optimized for speed and memory footprint. The memory requirement of the structure cache depends primarily on the number of structures to be cached (the number of molecular structures in your structure tables) and on the size of the fingerprint used to capture the main attributes of the target molecular structures. As a rule of thumb, 100 MB of main memory can hold the cache for 1 million structures. (See here for a detailed description of how to calculate the memory requirement of the structure cache.)
    • Storage capacity:

      Domain index information for the NCI 2000 data set (250,251 molecular structures), that is, the index table together with the auxiliary indexes on the index table, requires 96 MB of disk space with default Oracle storage settings, regardless of the format in which the molecular structures are stored in the base table.
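The rules of thumb above can be turned into a rough sizing estimate. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that the quoted figures (about 8 bytes per hit in Oracle, about 100 MB of JChem Server cache per million structures, 96 MB of domain index storage for the 250,251 NCI 2000 structures) scale linearly. Real requirements also depend on the fingerprint size and on the Oracle storage settings.

```python
# Rough sizing sketch based on the rule-of-thumb figures quoted in this FAQ.
# Function names are hypothetical; linear scaling is an assumption.

BYTES_PER_HIT = 8             # temporary Oracle-side buffer per search hit
CACHE_MB_PER_MILLION = 100    # JChem Server structure cache, rule of thumb
NCI_STRUCTURES = 250_251      # size of the NCI 2000 data set
NCI_INDEX_MB = 96             # domain index storage measured for NCI 2000


def oracle_hit_buffer_bytes(hit_count: int) -> int:
    """Temporary memory needed in Oracle to buffer a structure search result."""
    return hit_count * BYTES_PER_HIT


def structure_cache_mb(structure_count: int) -> float:
    """Approximate JChem Server cache size for the given number of structures."""
    return structure_count / 1_000_000 * CACHE_MB_PER_MILLION


def domain_index_mb(structure_count: int) -> float:
    """Approximate domain index disk space, extrapolated from NCI 2000."""
    return structure_count / NCI_STRUCTURES * NCI_INDEX_MB


if __name__ == "__main__":
    structures = 3_000_000
    hits = 50_000
    print(f"Oracle hit buffer: {oracle_hit_buffer_bytes(hits)} bytes")
    print(f"Structure cache:   {structure_cache_mb(structures):.0f} MB")
    print(f"Domain index:      {domain_index_mb(structures):.0f} MB")
```

Treat the results as order-of-magnitude guidance rather than as guaranteed limits.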

  • Which environments are supported by JChem Cartridge?

    JChem Cartridge is made up of two major components:

    • Stored procedures running in the Oracle server.
      Oracle stored procedures require Oracle 9i or later (tested with Oracle 9.2 and Oracle 10g).
    • JChem Server
      JChem Server requires JDK 1.6 or later. Please see Supported System Configurations - Java SE 6 for the list of platforms supported by the JDK.

    The two components can run on separate hosts, which adds the flexibility of running them on different platforms.

    Note that Oracle Express Edition is not supported.

  • How do I set up JChem Cartridge in environments with "one-way" virtual IP addresses?
    In environments using "one-way" network virtualization (where virtual IP addresses exist only as destination addresses and the physical addresses are used as source addresses, such as in SunClusters), the url property has to be set in jchem/cartridge/conf/jcart.properties using the Oracle JDBC URL notation (see http://www.chemaxon.com/jchem/doc/admin/JChemBaseFAQ.html#dburl). Setting this property overrides the oracle.server.host, oracle.server.port and oracle.server.instance settings and disables IP sanity checking in the JChem Cartridge (JCC) server. IP sanity checking is meant to ensure that the only Oracle server allowed to connect to the JCC server is the one that the JCC server is itself configured to connect to.

  • Which molecular formats are supported by JChem Cartridge?
    All major molecular formats (formats created by MDL, Daylight, ChemAxon) can be stored in structure tables. The formats supported in structure tables are the same as with JChem Base.

    In addition to the formats for structure tables, a number of other formats are supported for various operations.

  • How fast is inserting in JChem Cartridge?

    • Inserting 10000 structures into a jc_idxtype-indexed regular structure table: 4 minutes 18 seconds
      INSERT INTO mytable VALUES(<structure>);
    • Inserting 10000 structures into a JChem structure table: 6 minutes 4 seconds
      jc_insert('<structure>', 'mytable', null, 'false', 'false');
    • Inserting 10000 structures into a JChem structure table from a temporary table in one batch without duplicate filtering: 1 minute 20 seconds
      jc_insert('SELECT structure FROM tmptable', 'mytable', null, 'false', 'false');
    The configuration was the same as in the cartridge benchmark.
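To compare the three approaches, the timings above can be converted to throughput. The Python sketch below only does that arithmetic; the labels and the helper name are illustrative and are not part of JChem Cartridge.

```python
# Throughput implied by the insert timings quoted above.
# The labels and the helper name are illustrative only.

STRUCTURES = 10_000

# (description, elapsed time in seconds)
TIMINGS = [
    ("INSERT into jc_idxtype-indexed regular table", 4 * 60 + 18),
    ("jc_insert, one structure per call", 6 * 60 + 4),
    ("jc_insert, one batch from a temporary table", 1 * 60 + 20),
]


def structures_per_second(count: int, seconds: float) -> float:
    """Average number of structures inserted per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return count / seconds


if __name__ == "__main__":
    for label, seconds in TIMINGS:
        rate = structures_per_second(STRUCTURES, seconds)
        print(f"{label}: {rate:.1f} structures/s")
```

On these figures, the single batch from a temporary table (80 seconds) is about 4.5 times faster than calling jc_insert once per structure (364 seconds).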

  • How fast is indexing in JChem Cartridge?

    The following table shows the duration of indexing in some cases. The configuration was exactly the same as in the cartridge benchmark.

    Number of structures   Elapsed time
    10,000                 25 sec
    100,000                3 min 50 sec
    200,000                5 min 43 sec
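The indexing figures in the table above can be converted to throughput to see how indexing scales with table size. This Python sketch is illustrative: the duration parser and the helper names are hypothetical, and it works only from the three measurements listed.

```python
import re

# Indexing benchmark rows from the table above: (structures, elapsed time).
BENCHMARK = [
    (10_000, "25 sec"),
    (100_000, "3 min 50 sec"),
    (200_000, "5 min 43 sec"),
]


def parse_duration(text: str) -> int:
    """Convert strings such as '3 min 50 sec' or '25 sec' to seconds."""
    match = re.fullmatch(r"(?:(\d+) min)?\s*(?:(\d+) sec)?", text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return minutes * 60 + seconds


def indexing_rate(structures: int, duration: str) -> float:
    """Average number of structures indexed per second."""
    return structures / parse_duration(duration)


if __name__ == "__main__":
    for structures, duration in BENCHMARK:
        rate = indexing_rate(structures, duration)
        print(f"{structures:>7} structures: {rate:.0f} structures/s")
```

The implied rate is not constant across the three rows (it rises from 400 to roughly 583 structures per second), so linear extrapolation from a single row should be treated with caution.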

  • How can I make searching with JChem Cartridge faster?
    • Try setting the CURSOR_SHARING Oracle parameter to SIMILAR. It can be set either globally or in the session scope:
      ALTER SESSION SET CURSOR_SHARING = SIMILAR
      This avoids reparsing SQL queries that contain literal parameters, of which JChem Cartridge potentially uses many.
    • Either gather table/schema statistics (preferred) or set the OPTIMIZER_DYNAMIC_SAMPLING parameter to 1. Both prevent the Oracle Optimizer from executing sampling queries, which might take much longer than the actual queries (executed internally by JChem Cartridge) that they are supposed to make faster.
    • Enabling cost estimation for use by the Oracle Optimizer can help in a number of cases. It may have the most dramatic effect with Oracle Enterprise Edition when JChem Cartridge operators are called in the inner loop of a nested loop, as in:
      -----------------------------------------------------------------------------------------------------
      | Id  | Operation                           | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
      -----------------------------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT                    |               |     1 |  2041 |    27   (4)| 00:00:01 |
      |   1 |  VIEW                               | VM_NWVW_1     |     1 |  2041 |    27   (4)| 00:00:01 |
      |   2 |   HASH UNIQUE                       |               |     1 |    86 |    27   (4)| 00:00:01 |
      |   3 |    NESTED LOOPS                     |               |       |       |            |          |
      |   4 |     NESTED LOOPS                    |               |     1 |    86 |    26   (0)| 00:00:01 |
      |*  5 |      TABLE ACCESS FULL              | HIGHSAMP      |     1 |    26 |     7   (0)| 00:00:01 |
      |   6 |      BITMAP CONVERSION TO ROWIDS    |               |       |       |            |          |
      |   7 |       BITMAP AND                    |               |       |       |            |          |
      |   8 |        BITMAP CONVERSION FROM ROWIDS|               |       |       |            |          |
      |*  9 |         INDEX RANGE SCAN            | SYS_C0032145  |     1 |       |     1   (0)| 00:00:01 |
      |  10 |        BITMAP CONVERSION FROM ROWIDS|               |       |       |            |          |
      |  11 |         SORT ORDER BY               |               |       |       |            |          |
      |* 12 |          DOMAIN INDEX               | JCXPBCH_4500K |     1 |       |            |          |
      |  13 |     TABLE ACCESS BY INDEX ROWID     | PBCH_4500K    |     1 |    60 |    26   (0)| 00:00:01 |
      -----------------------------------------------------------------------------------------------------                                      
      
    • Set the _optimizer_rownum_pred_based_fkr Oracle parameter to false if the Oracle Optimizer generates an execution plan similar to the above for SQL queries which include ROWNUM as a condition. (The parameter setting is the solution suggested by Oracle for problem ID 833286.1.)
    • Use the filterQuery search option to move the evaluation of non-chemical search conditions from the top level SQL statement into the jc_compare operator. This option is worth considering if, broadly speaking, the number of rows selected by the filter query is likely to be smaller than the number of rows the jc_compare operator would return without the filter query.
    • Use the maxHitCount or the maxTime options to restrict the number of hits to be processed or the time the search can take. The maxHitCount option can be especially useful for similarity searches: the structures with the highest scores are returned, so users do not have to wait for the remaining, less interesting matches of the search.
    • Create the index with the structuralfp_config parameter if there are query structures which are likely to be searched for many times in the given column.
    • Create the index with the autoCalcCt parameter if there are Chemical Terms expressions which are likely to be used often in searches.
    • See also the Fingerprint settings section of this forum topic. Note that with regular structure tables, you will want to use this SQL function for finding out fingerprint statistics:
      jchem_core_pkg.get_idx_stats(idx_schema varchar2, idx_name varchar2, idx_partition varchar2) return varchar2
      You can also use the t:na search type option with the jchem_core_pkg.get_hit_count function to find out the number of structures which were marked positive during fingerprint screening. For example:
      select jchem_core_pkg.get_hit_count('PKOVACS_TRUNK_USER', 'PBCH_4500K', 'STRUCTURE', 'c1nc2cncnc2n1', 't:na') from dual;

If your question is not answered here, please check out our forum.
