Technical Support Forum Index
'sphere exclusion' clustering - the speed of performance
bakhras

Joined: 31 Mar 2009
Posts: 7

Posted: Thu Jun 09, 2011 3:58 pm    Post subject: 'sphere exclusion' clustering - the speed of performance


Could you please advise me on some clustering performance issues?
I need to cluster a number of compound libraries using ECFP fingerprints. As I understand it, 'sphere exclusion' clustering is the most suitable and fastest method in JKlustor for this purpose. I started a calculation on a set of 300k compounds about twenty-four hours ago and it is still running. Should it take so long? If so, how long will it take to cluster libraries of 1-2 million compounds? Is there any way to speed up the calculation?

System specifications:
Windows XP x64 Edition
Intel(R) Core(TM) i5 CPU 760 @ 2.80 GHz
RAM: 4 GB

Command line:
jklustor -c sphex:0.2 -d ecfp carb.smi -o "wrclus:smiles:carb_clus.txt:descs"



P.S. As can be seen in Task Manager, the process uses 32-bit Java (1 processor and up to 500 MB of memory), although 64-bit Java is present on the system and works perfectly for InstantJChem.
bakhras

Joined: 31 Mar 2009
Posts: 7

Posted: Mon Jun 20, 2011 10:10 pm

Is there anybody who can help me with this question? Has anybody run JKlustor calculations on thousands of compounds?



Last edited by bakhras on Mon Jun 20, 2011 10:44 pm; edited 2 times in total
gimre
ChemAxon personnel
Joined: 29 May 2005
Posts: 291

Posted: Mon Jun 20, 2011 10:34 pm

Sorry for the late answer. Consider using -v to turn on verbose mode, which will print progress messages during the clustering process. In the case of sphere exclusion (and also Bemis-Murcko) clustering, the clustering is done while the input is being read.

Increasing the dissimilarity threshold (0.2 in your case) will decrease the cluster count and speed up the clustering process.
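
To illustrate why a larger radius helps, here is a minimal Python sketch of a one-pass sphere exclusion scheme. It is not JKlustor's code: the function names, the use of sets of bit positions as fingerprints, and the rule of assigning each structure to the first centre found within the radius are all assumptions made for illustration. The sketch counts fingerprint comparisons, which grow with the number of cluster centres.

```python
def tanimoto_dissimilarity(a, b):
    """1 - |a & b| / |a | b| for fingerprints given as sets of set-bit positions."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def sphere_exclusion(fingerprints, radius):
    """Hypothetical one-pass sphere exclusion.

    Returns (centres, assignments, comparisons).
    """
    centres = []        # input indices of structures that started a cluster
    assignments = []    # cluster index of each structure, in input order
    comparisons = 0
    for i, fp in enumerate(fingerprints):
        for cluster, centre in enumerate(centres):
            comparisons += 1
            if tanimoto_dissimilarity(fp, fingerprints[centre]) <= radius:
                assignments.append(cluster)
                break
        else:
            # No existing centre is close enough: this structure starts a new cluster.
            centres.append(i)
            assignments.append(len(centres) - 1)
    return centres, assignments, comparisons


# Toy fingerprints, invented for this example.
toy = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 6, 7}, {8, 9, 10}, {8, 9, 11}]
for radius in (0.2, 0.5, 0.9):
    centres, assignments, comparisons = sphere_exclusion(toy, radius)
    print(radius, len(centres), comparisons)
```

On this toy set the cluster count and the number of comparisons both fall as the radius grows. With a small radius nearly every structure becomes its own centre, so each new structure has to be compared against almost all previous ones.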

Regards,

Gabor

nanomed

Joined: 12 Jul 2011
Posts: 15

Posted: Thu Aug 25, 2011 3:36 pm    Post subject: 'sphere exclusion' clustering

Dear Gabor and Bakhtiyor,

I also recently tried JKlustor sphere exclusion clustering with the "-d ecfp" option on a 250k set.

It ran for more than 3 days and then failed, but with the default descriptor (CF fingerprints, I presume) it gave me results within 20 min.

Is it possible that we are facing some intrinsic problem with ECFP here, similar to the issues with "compr"? See my post: https://www.chemaxon.com/forum/viewpost36439.html#36439

Thank you very much for your suggestions,

Lex

nanomed

Joined: 12 Jul 2011
Posts: 15

Posted: Fri Aug 26, 2011 3:01 pm    Post subject: 'sphere exclusion' clustering

I have to apologize: there was obviously some system failure when I used sphex with the "-d ecfp:tanimoto" option.

I recently tried it once more and it worked just fine, although much more slowly than with the default cfp.

By the way, the default (CFP, as I understand it) produced clustering results that were more meaningful to me for compounds with fused heterocyclic systems. So I am curious which parameters JKlustor uses for both types of fingerprints:

bond depth, bit length, number of bits?

Is it possible to adjust these parameters for JKlustor in some XML configuration file?

Thank you very much in advance for your suggestions,

Lex

gimre
ChemAxon personnel
Joined: 29 May 2005
Posts: 291

Posted: Tue Aug 30, 2011 3:28 am

Dear Lex,

It seems that ecfp tends to produce higher dissimilarity values than cfp. Currently the sphere exclusion dissimilarity radius parameter should be adjusted depending on the fingerprint type used.

The default fingerprint parameters are used and they cannot be modified, but this is a planned feature.

A brief introduction to sphere exclusion parameter tuning will be linked here in the next few days.

Regards,

Gabor

nanomed

Joined: 12 Jul 2011
Posts: 15

Posted: Tue Aug 30, 2011 2:53 pm    Post subject: 'sphere exclusion' clustering

Dear Gabor,

Thank you very much for your prompt response.

If possible, would you mind telling me the default parameters for both fingerprints bundled with JKlustor?

This would be helpful for publishing data, as well as for working out the range for the sphere radius adjustment.

Thank you very much in advance,

Lex

gimre
ChemAxon personnel
Joined: 29 May 2005
Posts: 291

Posted: Fri Sep 02, 2011 4:18 am

Dear Lex,

Sorry for the late answer.

CFP config XML (as far as I know, also available in a locally installed JChem under examples/config):

<ChemicalFingerprintConfiguration Version="0.3" schemaLocation="cfp.xsd">
  <Parameters Length="1024" BondCount="7" BitCount="2"/>
  <StandardizerConfiguration Version="0.1">
    <Actions>
      <Action ID="aromatize" Act="aromatize"/>
    </Actions>
  </StandardizerConfiguration>
  <ScreeningConfiguration>
    <ParametrizedMetrics>
      <ParametrizedMetric Name="Tanimoto" ActiveFamily="Generic" Metric="Tanimoto" Threshold="0.2"/>
      <ParametrizedMetric Name="Euclidean" ActiveFamily="Generic" Metric="Euclidean" Threshold="15"/>
    </ParametrizedMetrics>
  </ScreeningConfiguration>
</ChemicalFingerprintConfiguration>

ECFP config XML:

<ECFPConfiguration Version="0.1">
  <Parameters Length="1024" Diameter="4" Counts="no"/>
  <IdentifierConfiguration>
    <!-- Default atom properties (switched on by Value=1) -->
    <Property Name="AtomicNumber" Value="1"/>
    <Property Name="HeavyNeighborCount" Value="1"/>
    <Property Name="HCount" Value="1"/>
    <Property Name="FormalCharge" Value="1"/>
    <Property Name="IsRingAtom" Value="1"/>
    <!-- Other built-in atom properties (switched off by Value=0) -->
    <Property Name="ConnectionCount" Value="0"/>
    <Property Name="Valence" Value="0"/>
    <Property Name="Mass" Value="0"/>
    <Property Name="MassNumber" Value="0"/>
    <Property Name="HasAromaticBond" Value="0"/>
    <Property Name="IsTerminalAtom" Value="0"/>
    <Property Name="IsStereoAtom" Value="0"/>
  </IdentifierConfiguration>
  <StandardizerConfiguration Version="0.1">
    <Actions>
      <Action ID="aromatize" Act="aromatize"/>
      <RemoveExplicitH ID="RemoveExplicitH" Groups="target"/>
    </Actions>
  </StandardizerConfiguration>
  <ScreeningConfiguration>
    <ParametrizedMetrics>
      <ParametrizedMetric Name="Tanimoto" ActiveFamily="Generic" Metric="Tanimoto" Threshold="0.2"/>
      <ParametrizedMetric Name="Euclidean" ActiveFamily="Generic" Metric="Euclidean" Threshold="10"/>
    </ParametrizedMetrics>
  </ScreeningConfiguration>
</ECFPConfiguration>
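
For reporting purposes, the main parameters can be pulled out of such a file with a few lines of Python. The sketch below uses only the standard library and an abbreviated copy of the ECFP XML; the helper name main_parameters is made up for this example and is not part of any ChemAxon tool.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the ECFP configuration, for illustration only.
ECFP_XML = """
<ECFPConfiguration Version="0.1">
  <Parameters Length="1024" Diameter="4" Counts="no"/>
  <IdentifierConfiguration>
    <Property Name="AtomicNumber" Value="1"/>
    <Property Name="ConnectionCount" Value="0"/>
  </IdentifierConfiguration>
</ECFPConfiguration>
"""


def main_parameters(xml_text):
    """Return the Parameters attributes and the names of switched-on atom properties."""
    root = ET.fromstring(xml_text)
    params = dict(root.find("Parameters").attrib)
    enabled = [p.get("Name") for p in root.iter("Property") if p.get("Value") == "1"]
    return params, enabled


params, enabled = main_parameters(ECFP_XML)
print(params)
print(enabled)
```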

Quote: "It might be helpful for publishing data as well as to figure out the range for the sphere radius adjustment."

Internally, JKlustor uses a 0 .. 1 dissimilarity range, usually with 0 as the most similar (identical fingerprints) and 1 as the most dissimilar possible value (*). Generally, a useful approach seems to be to start sphere exclusion clustering from a high dissimilarity radius (even 0.9) and then decrease it while checking the cluster sizes, either by following the import with the "-v" option (verbose mode) or by using the "-o wrstat" option to obtain statistics.
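
The sweep described above could be scripted along these lines. This is only a sketch: it builds the command lines and prints them rather than running anything. The input file name is taken from the first post, the radii are arbitrary, and combining "-v" and "-o wrstat" in exactly this way is an assumption that should be checked against the jklustor help for your version.

```python
def sweep_commands(input_file, descriptor="ecfp", radii=(0.9, 0.7, 0.5, 0.3)):
    """Build one jklustor command line per radius, from the highest radius down."""
    commands = []
    for radius in sorted(radii, reverse=True):
        commands.append([
            "jklustor",
            "-c", f"sphex:{radius}",  # sphere exclusion with this dissimilarity radius
            "-d", descriptor,          # fingerprint type, as in the original command
            input_file,
            "-v",                      # verbose progress messages
            "-o", "wrstat",            # cluster statistics (exact syntax is an assumption)
        ])
    return commands


# Dry run: print the commands instead of executing them.
for command in sweep_commands("carb.smi"):
    print(" ".join(command))
```

Starting with the largest radius keeps the first runs short, so the radius at which the cluster sizes become useful can be found before committing to a long run.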

The JKlustor web GUI provides a "matrix" view for comparing the dissimilarity values of cluster representatives/centroids and individual structures. The binary representation of the fingerprint is also visualized on the individual structure pages.

An online demo is available at http://discoverygroup.chemaxon.com/MGSandbox/jkdemo.jsp (select structures to fetch from a URL or upload, set clustering parameters and launch jklustor). Locally, you can use the jklustor option "-s 88" to launch web server mode after clustering and connect to it by opening http://localhost:88 in a browser.

(*) For LibraryMC(E)S, a metric called "commonbits" is implemented, which is calculated by dividing the count of simultaneously set bits by the fingerprint length and subtracting the result from 1.0. This dissimilarity metric will not give 0 for identical structures.
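
A small worked example of the "commonbits" formula as described in the footnote, assuming fingerprints are represented as sets of set-bit positions. The function name and the 40-bit toy fingerprint are invented for illustration.

```python
def commonbits_dissimilarity(a, b, length):
    """1 - (number of bits set in both fingerprints) / (fingerprint length)."""
    return 1.0 - len(a & b) / length


# Hypothetical fingerprint with 40 of 1024 bits set, compared with itself.
fp = set(range(40))
print(commonbits_dissimilarity(fp, fp, 1024))
```

Here the identical pair scores 1 - 40/1024, roughly 0.96, rather than 0. The value reaches 0 only if every bit of the fingerprint is set in both structures.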

Regards,

Gabor

nanomed

Joined: 12 Jul 2011
Posts: 15

Posted: Fri Sep 02, 2011 2:50 pm    Post subject: 'sphere exclusion' clustering

Dear Gabor,

Thank you very much for the configuration files.

I am sorry for a silly question, but how can I be sure that JKlustor will read a particular configuration file?

In other words, in which directory (is it '\JChem\examples\config'?) should I place these .XML files, or how can I specify the path to my custom configuration .XML files on the command line?

An additional naive question: if I understand correctly, are the parameters given in these configuration files the defaults for the fingerprints whenever they are called by any ChemAxon routine?

Thank you very much for your great support,

Lex

gimre
ChemAxon personnel
Joined: 29 May 2005
Posts: 291

Posted: Tue Sep 06, 2011 3:31 pm

Dear Lex,

Sorry for the misunderstanding.

In JKlustor the default fingerprint parameters are used, and these are hardwired in the code. They cannot be modified in JKlustor; some kind of parametrization is a planned feature for the near future. In 5.7 the verbose mode in JKlustor will be extended to print the main parameters used (length, etc.).

The referenced files contain fingerprint configuration examples (which can be used in other products); the main fingerprint parameters in those files match the hardwired defaults. The configurations in these example files are not (and cannot be) read by JKlustor.

The actual hardwired configuration used for cfp (this discrepancy in the contents of cfp.xml will be corrected in release 5.7):

<?xml version="1.0" encoding="UTF-8"?>
<ChemicalFingerprintConfiguration Version="0.3" schemaLocation="cfp.xsd">
  <Parameters Length="1024" BondCount="7" BitCount="2"/>
  <ScreeningConfiguration>
    <ParametrizedMetrics>
      <ParametrizedMetric Name="Tanimoto" ActiveFamily="Generic"
                          Metric="Tanimoto" Threshold="0.2"/>
      <ParametrizedMetric Name="Euclidean" ActiveFamily="Generic"
                          Metric="Euclidean" Threshold="10"/>
      <ParametrizedMetric Name="Tversky" ActiveFamily="Generic"
                          Metric="Tversky" Threshold="0.5" TverskyAlpha="1" TverskyBeta="1"/>
    </ParametrizedMetrics>
  </ScreeningConfiguration>
</ChemicalFingerprintConfiguration>

Regards,

Gabor

gimre
ChemAxon personnel
Joined: 29 May 2005
Posts: 291

Posted: Fri Sep 09, 2011 3:33 am

Dear Lex,

Quote: "A brief introduction to sphere exclusion parameter tuning will be linked here in the next days."

This introduction to sphere exclusion clustering and parameter handling is available at https://docs.google.com/document/pub?id=1C5xYiV4Gk_UWSWV2UQ-PhWKPGBkZitnOlKICT2qXHKk .

A tracker topic with notifications of major modifications to this document is available at https://www.chemaxon.com/forum/ftopic8015.html

If you have any further questions please do not hesitate to ask them,

Regards,

Gabor

nanomed

Joined: 12 Jul 2011
Posts: 15

Posted: Fri Sep 09, 2011 2:24 pm    Post subject: 'sphere exclusion' clustering

Dear Gabor,

Thank you very much for your help.

Actually, the most important piece of information for me was that the default parameters correspond to my personal preferences for clustering, i.e.

ECFP Parameters: Length="1024" Diameter="4"; as I understand it, bit counts aren't used for ECFP.

I am also curious whether FCFP could be invoked on its own.

Many thanks,

Lex


gimre
ChemAxon personnel
Joined: 29 May 2005
Posts: 291

Posted: Tue Sep 13, 2011 3:23 pm

Dear Lex,

Quote: "Also I am curious, whether FCFP could be invoked on its own."

Could you please clarify this question?

Regards,

Gabor

nanomed

Joined: 12 Jul 2011
Posts: 15

Posted: Tue Sep 13, 2011 5:21 pm

Dear Gabor,

According to your release notes

http://www.chemaxon.com/news/marvin_jchem5-4_launched/

"...Screen (Fast/robust 2D and now 3D ligand-based virtual screening)

- Extended connectivity fingerprint (ECFP) now available (includes FCFP)

- Available both as hashed binary fingerprint and as a list of integer features..."

I thought that FCFP was also implemented.

Maybe I got it wrong, and FCFP is available only within the Screen module (http://www.chemaxon.com/products/screen/)?

Thanks,

Lex

mvargyas
ChemAxon personnel
Joined: 21 May 2004
Posts: 1185

Posted: Tue Sep 13, 2011 5:40 pm

Hi Lex,

Indeed, ECFP/FCFP are fully supported in Screen. At present FCFP cannot be used in JKlustor, I'm afraid.

Regards,

Miklos
