Technical Support Forum
Access ChemAxon scientists and developers here. For confidential or other support please email.
The time now is Sat Jul 04, 2009 11:54 pm
JChem search exact search speed issue
amri
Joined: 20 Apr 2006
Posts: 20


Posted: Tue Dec 18, 2007 4:26 pm

If we switch off caching mode, the exact structure search becomes about 10 times faster. (I know that running without the cache is deprecated.)
With caching mode the search takes 700-800 ms per molecule; we would expect about 100 ms per molecule.
We have 7,000,000 molecules in our structure table.

Here is the log from JChem:
with cache:
Tue Dec 18 16:01:28 CET 2007
Search mode: EXACT
Structure table: DBO.MOLECULES
Query: [#7]
Screened: 1
Hits: 1
Total time: 727 ms Screening: 696 ms
Processing threads: 2
Current / peak / maximum searches per minute: 9 / 9 / Unlimited

no cache:
Tue Dec 18 16:30:59 CET 2007
Search mode: EXACT
Structure table: DBO.MOLECULES
Query: [#8-]
Screened: 1
Hits: 1
Total time: 93 ms Screening: 23 ms
Processing threads: 2
Current / peak / maximum searches per minute: 9 / 9 / Unlimited

Any idea?
Thanks
Gabor
Szilard
Joined: 21 May 2004
Posts: 935
ChemAxon personnel

Posted: Tue Dec 18, 2007 7:35 pm

Hi,

The difference is in the screening time (the phase for selecting hit candidates for the slower graph search).
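
As a rough check of that statement, the arithmetic below splits each logged total into its screening part and the remainder. The numbers are copied from the two log excerpts in the first post; the variable and function names are only illustrative. Note that the two runs used different queries ([#7] and [#8-]), so this is an approximate comparison.

```python
# Timings in milliseconds, copied from the two JChem log excerpts in this thread.
RUNS = {
    "with cache": {"total": 727, "screening": 696},
    "no cache": {"total": 93, "screening": 23},
}


def screening_share(run):
    """Fraction of the total search time spent in the screening phase."""
    return run["screening"] / run["total"]


def remaining_ms(run):
    """Time left for everything after screening (graph search, overhead)."""
    return run["total"] - run["screening"]


for name, run in RUNS.items():
    print(
        f"{name}: screening is {screening_share(run):.1%} of the total, "
        f"{remaining_ms(run)} ms for everything else"
    )
```

Screening accounts for about 95.7% of the cached run but only about 24.7% of the non-cached run, while the non-screening remainder is 31 ms and 70 ms respectively, so the gap comes almost entirely from the screening phase.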

The discrepancy is due to a "trick" we apply in this phase:
The cd_hash column in the database table is normally used for speeding up duplicate filtering (PERFECT search).
This cannot be used for EXACT search in general, as the hits are not necessarily identical (e.g. a "single-or-double" bond should find both, an "any atom" can match on anything).
If the query does not contain such features, we can "cheat" and use the hash code to select the candidates.
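
The idea can be sketched with a toy model. This is not JChem code: structures are plain strings, `toy_hash` is a CRC32 stand-in for whatever is stored in cd_hash, and the "generic feature" check just looks for the SMARTS symbols `*` (any atom) and `~` (any bond). All function names here are hypothetical.

```python
import zlib
from collections import defaultdict

# Symbols borrowed from SMARTS: "*" is any atom, "~" is any bond.
# A real implementation would inspect query atoms and bonds, not characters.
GENERIC_FEATURES = ("*", "~")


def toy_hash(structure):
    """Stand-in for the stored hash code; NOT how cd_hash is computed."""
    return zlib.crc32(structure.encode("utf-8"))


def build_hash_index(table):
    """Map each hash code to the row ids that carry it."""
    index = defaultdict(list)
    for row_id, structure in table.items():
        index[toy_hash(structure)].append(row_id)
    return index


def has_generic_features(query):
    """True if the query could match structures that are not identical to it."""
    return any(marker in query for marker in GENERIC_FEATURES)


def screen(query, table, index):
    """Return the candidate row ids handed on to the slower graph search."""
    if has_generic_features(query):
        # The hash shortcut is not valid here, so every row stays a candidate.
        return list(table)
    # Fully specified query: only rows with the same hash can be hits.
    return list(index.get(toy_hash(query), []))


table = {1: "CCO", 2: "c1ccccc1", 3: "CC=O"}
index = build_hash_index(table)
print(screen("CCO", table, index))  # hash lookup, one candidate
print(screen("C~C", table, index))  # generic bond, full candidate list
```

With the hash lookup the candidate list shrinks to the rows sharing the query's hash, which is why the screening phase can drop from hundreds of milliseconds to tens on a large table. The graph search still has to confirm each candidate.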

This speedup is currently not applied in cached mode, which explains the discrepancy in the search times.
We are planning to improve on this in the future.

By the way, do you use EXACT search to find duplicate structures?
If so, I recommend PERFECT search mode, which is specifically designed for this.
Please see the chemistry differences in our Query Guide:
http://www.chemaxon.com/jchem/doc/user/Query.html#otherSearchTypes
The search time should be similar to your faster measurement.

Best regards,

Szilard
Szilard
Joined: 21 May 2004
Posts: 935
ChemAxon personnel

Posted: Wed May 28, 2008 4:13 pm

Hi,

In future releases, EXACT search will use the hash code for screening whenever possible, which will greatly reduce the screening time.

Szilard
All times are GMT + 1 Hour