Posted: Tue Sep 13, 2016 9:45 am
Post subject: "Wall time limit reached" error while indexing
Hello,
I got the following error while indexing a large table (about 100 million records):
ERROR: ChemIndex.cpp:61 OperationAborted:
*** class com.chemaxon.zetor.api.exceptions.OperationAbortedException
Wall time limit reached
I also noticed that at first the index calculation was spread across several CPUs (20% to 30% load each), but after some time only one CPU was busy (near 100%) while the others were all idle, and then the error above appeared.
Does anyone have any clue where this error comes from and which configuration change could avoid it?
Also, since indexing such a big table is very time consuming, I would like to know if there are any tips to speed up the indexing.
Thanks,
William
Krisztina ChemAxon personnel
Joined: 27 May 2011
Posts: 375
Unfortunately, 100 million records is beyond the current scope of our development and testing. We are aware that the indexing process takes increasingly long for tables of this size, and we are working on a good solution for the performance issues.
At the moment, we can recommend the following workaround:
1. Drop the chemical indexes.
2. Stop the service:
   sudo service jchem-psql stop
3. In the file jchem-psql.conf in the folder /etc/chemaxon, change the setting 'mapdb' to 'rocksdb' in the row:
   com.chemaxon.jchem.psql.env.scheme=mapdb
Yes, I did set Xmx to 20g and restarted the jchem-psql service.
In /etc/chemaxon/jchem-psql.conf I set "com.chemaxon.jchem.psql.env.scheme=rocksdb". Should I also change "com.chemaxon.jchem.psql.main.scheme" and "com.chemaxon.jchem.psql.idx.scheme"?
Are there any specific settings for rocksdb, as there are for the other backends (mapdb, mvstore, hashed, cassandra)? I didn't find any in the configuration file.
Also, I noticed that the indexing is usually multithreaded, but at some point jchem gets stuck in a single-threaded task for a long time (only 1 processor at nearly 100% while the others are all idle). Is that normal?
Thanks,
William
Roland ChemAxon personnel
Joined: 13 May 2016
Posts: 2
Setting 'com.chemaxon.jchem.psql.env.scheme' to 'rocksdb' in /etc/chemaxon/jchem-psql.conf is sufficient as long as the other two options you mentioned are commented out (with a '#' sign).
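For readers who want to script this change, here is a minimal sketch of the edit described above. The helper name `set_rocksdb_scheme` is hypothetical and not part of any ChemAxon tool. It assumes the three option lines, where present, start at the beginning of the line, and it only rewrites an `env.scheme` line that is already active (not commented out). A backup is written next to the file as `<conf>.bak`.

```shell
#!/usr/bin/env bash
# Sketch only: set env.scheme to rocksdb and comment out main.scheme and
# idx.scheme, as described in the post above.
# set_rocksdb_scheme is a hypothetical helper name.
set_rocksdb_scheme() {
    local conf="$1" tmp
    tmp="$(mktemp)" || return 1
    # 1st expression: replace the value of an active env.scheme line.
    # 2nd/3rd expressions: prefix active main.scheme / idx.scheme lines with '#'.
    # Lines that are already commented out do not match and stay untouched.
    sed \
        -e 's/^\(com\.chemaxon\.jchem\.psql\.env\.scheme\)=.*/\1=rocksdb/' \
        -e 's/^\(com\.chemaxon\.jchem\.psql\.main\.scheme=\)/#\1/' \
        -e 's/^\(com\.chemaxon\.jchem\.psql\.idx\.scheme=\)/#\1/' \
        "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Keep a backup, then overwrite in place so ownership and mode are preserved.
    cp "$conf" "$conf.bak" && cat "$tmp" > "$conf"
    local status=$?
    rm -f "$tmp"
    return "$status"
}

# Possible usage, with the service stopped and write access to the file:
#   set_rocksdb_scheme /etc/chemaxon/jchem-psql.conf
```

The function only touches the file passed to it, so it can be tried on a copy of the configuration first.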
Unfortunately there are no specific settings for rocksdb in the configuration file at the moment.
Regarding indexing running in a single thread for a long time at some points: this is known behavior that we have also observed. It does not indicate a bug or failure.
Thanks for your reply. Here's another question: is there any significant performance difference between JChem PostgreSQL Cartridge and JChem Oracle Cartridge?
Thanks,
William
Volfi ChemAxon personnel
Joined: 07 Jun 2004
Posts: 993
Thanks for your reply, and here's another question, is there any significant performance difference between JChem PostgreSQL Cartridge and JChem Oracle Cartridge ?
Yes, definitely. JChem PostgreSQL Cartridge (JPC) is faster for queries returning only a small number of hits (say, under a few thousand). But there is another major difference. JPC has a higher memory footprint than JChem Oracle Cartridge (JOC); however, if the memory needed to cache all the structures is not available, JOC simply cannot work, while JPC still can. So, as you can see, there are multiple factors to consider.
I reproduced the previous "wall time limit reached" error, this time while copying 10 million SMILES into an indexed table. I tried many times, and the error always appears at the same point (for me, while copying the batch that starts at line 5535000). Could it be caused by some invalid SMILES? I didn't get any further error information.
ERROR: ChemIndex.cpp:61 OperationAborted:
*** class com.chemaxon.zetor.api.exceptions.OperationAbortedException
Wall time limit reached
CONTEXT: COPY jchem_10m_mol, line 5535000
Thanks,
William
Krisztina ChemAxon personnel
Joined: 27 May 2011
Posts: 375
Yes, unfortunately a single erroneous or invalid SMILES can produce this error. Could you identify this molecule and send it to us as SMILES? If the molecule is confidential, you can send it to jpc-support _at_ chemaxon.com.
We think the erroneous molecule is between lines 5530000 and 5535000, because by default indexing runs in batches of 5000 molecules.
Would you copy these 5000 lines (5000 SMILES) into a new text file and try to import and index them separately? Before starting the create index process, please run
set chemaxon.index_creation_batch_size to 1;
This changes the batch size to 1 at session level.
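The suspect batch can be cut out of the original input file with a short shell helper. This is only a sketch: the helper name `extract_batch` and the file names in the usage comment are hypothetical, and the line range is taken as inclusive on both ends, so the range quoted above yields 5001 lines.

```shell
#!/usr/bin/env bash
# Sketch only: copy lines FIRST..LAST (inclusive) of a text file into a new file.
# extract_batch is a hypothetical helper name.
extract_batch() {
    local src="$1" first="$2" last="$3" dst="$4"
    # -n suppresses default output; 'p' prints the selected range and
    # 'q' stops at LAST so the rest of a very large file is never read.
    sed -n "${first},${last}p;${last}q" "$src" > "$dst"
}

# Possible usage (file names are placeholders):
#   extract_batch all_smiles.smi 5530000 5535000 suspect_batch.smi
```

The resulting file can then be imported into a separate table and indexed with the batch size set to 1, as described above, so that the failing line number points at a single molecule.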
Another, independent idea is to increase the wall_time_limit by
All of these settings can also be modified in the /etc/postgresql/9.5/main/postgresql.conf file, but in that case the postgresql service must be restarted after the modification.
Thank you for the molecule. Unfortunately, it really does freeze the indexing in PostgreSQL Cartridge. This molecule freezes the indexing in JChem Oracle Cartridge as well. We are now investigating what causes this behavior and will let you know when the issue is fixed.
Until then, the only workaround we can recommend is deleting this molecule from the dataset.