Technical Support Forum
Access ChemAxon scientists and developers here. For registration and login issues contact website support.
similarity search calculation
renjutj
Joined: 01 Mar 2003
Posts: 160

Posted: Wed Nov 10, 2010 6:47 pm
Post subject: similarity search calculation

Hi,

I have attached 2 structures and our chemist believes that they should be reported as 85% or more similar when running a similarity search. In fact, Benchware Dataminer, which uses the Unity fingerprint, does that. However, all the ChemAxon tools I tried (jcsearch, compr, Instant JChem, JChem Web Services) report that they are less than 20% similar. Any ideas on why this is happening?

Thanks




Attachments: 36440.mol (3.44 KB), 39048.mol (1.75 KB)

rwagner
ChemAxon personnel
Joined: 23 Nov 2007
Posts: 215

Posted: Mon Nov 15, 2010 11:31 am

Hi,

With the general aromatization method, the 39048.mol structure is treated as aliphatic while the other one is treated as aromatic.

If the aromaticity of the aliphatic one is changed to match (see the attached structure), then a high similarity value is achieved.

Using basic aromatization, all the structures are considered aliphatic and the two original structures have a similarity of 78%. Which aromatization method are you using?

Bye,

Robert




Attachment: 39048_2.mol (1.72 KB)

renjutj
Posted: Mon Nov 15, 2010 7:07 pm

I do not specify any aromatic definitions when using JChem (during import, structure searches etc). So, it is using whatever comes with it as the default. I am thinking that the default standardizer when using jcman to import structures into the database should have aromatized this structure. I know this has caused issues for us in the past when we upgraded JChem and that upgrade would change the default aromaticity. So, the question is how do I overcome this problem. Should I use standardize to specify an aromaticity definition? What are the drawbacks in doing that? Any pointers would be helpful.

Thanks

rwagner
ChemAxon personnel
Posted: Tue Nov 16, 2010 9:52 am

Hi,

 

If you are not specifying any special aromatization, then the default one is used, which is general aromatization. This is generally suitable; however, in your case it returns different aromaticity for your two structures.

You can read about aromatization here:

http://www.chemaxon.com/jchem/marvin/help/sci/aromatization-doc.html

Using basic-aromatization the two structures have the same aromaticity and thus have higher similarity.

You (or your chemists) can read the documentation above, which explains the differences. Although basic aromatization yields a better result for you in this case, I strongly suggest reading the docs to find out which method is generally preferable for you.

Ways for specifying the aromatization method are listed here:

http://www.chemaxon.com/jchem/doc/user/query_standard.html#aromatization

Bye,

Robert

renjutj
Posted: Tue Nov 16, 2010 10:39 pm

Thanks for your response, Robert. I think JChem changed their default aromatization from 'basic' to 'general' after the 3.2 release. So, I am assuming that since I currently work with v 5.3, all my structures (even those that had been in the compounds table prior to 3.2 release) are standardized to the general method. Am I right?

Now, the documentation also states that 

"All transformation methods work only in structures which are in non-aromatic representation. If the molecules are in partially aromatic form (containing any aromatic bond) the transformation method may fail."

Since we have structures from many different vendors with different representations of aromaticity, I wonder if we first need to run Standardizer with the de-aromatize option and then run it again with one of the aromatization options to ensure consistent representation of aromatic rings. Please advise.

Szabolcs
ChemAxon personnel
Joined: 03 Jun 2004
Posts: 1924

Posted: Wed Nov 17, 2010 8:49 am

Hi Renju,

Yes, the general way to overcome this problem is to first dearomatize and then aromatize the molecules. If some of your sources may already be aromatized by an unknown aromaticity model, it may make sense to de-aromatize indeed.

But I think that the documentation is only referring to very complex cases where fused ring systems are partially aromatized.

 

You do not need to run Standardizer twice for this. It is possible to specify both actions in the same standardizer configuration. ("dearomatize..aromatize" using the short configuration.)

 

Furthermore, you can assign the standardization directly to the JChem table or JChem index.

See more details here:

https://www.chemaxon.com/jchem/doc/user/query_standard.html#standardizationDB

 

Another note is that descriptor tables provided by JChem Screen offer many different similarity calculations. However, they all start from the standardized form, so the desired aromatization method needs to be found first.

You can find more information about the other available similarity descriptors and metrics here:

https://www.chemaxon.com/products/screen/

 

Best regards,

Szabolcs

renjutj
Posted: Thu Nov 18, 2010 5:15 am

I ran the dearomatize..aromatize:b option with standardize as suggested and it looks like both the structures have been converted into their aliphatic forms. However, in their aliphatic forms the similarity of course does not match the 78%. How did you manage to convert the above structure (39048.mol) into the aromatic form? I have attached the structures after I ran the standardizer.




Attachments: 36440_STD.sdf (3.35 KB), 39048_STD.sdf (1.72 KB)

rwagner
ChemAxon personnel
Posted: Fri Nov 19, 2010 8:51 pm

Hi,

 

The attached two structures give me a similarity of ~78% (dissimilarity 21.68%) when using basic aromatization and 21% similarity (79% dissimilarity) when using general aromatization.

How did you calculate the similarity value? If you do the similarity search in memory and the calculation performs standardization, then you need to specify the aromatization method; otherwise general aromatization is performed:

e.g. jcsearch:

jcsearch -t:i:0.9 -q 36440_STD.sdf  39048_STD.sdf -f mrv

shows 79% dissimilarity

jcsearch -t:i:0.9 -q 36440_STD.sdf  39048_STD.sdf -f mrv -S "aromatize:b"

shows 21%  dissimilarity.
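
Since jcsearch reports dissimilarity here while the chemists think in terms of similarity, a small conversion may help when reading the two outputs. This is only a sketch in Python, assuming similarity = 1 - dissimilarity as the figures quoted in this thread imply; the function name is made up for illustration and is not part of any ChemAxon tool.

```python
def similarity_from_dissimilarity(dissimilarity: float) -> float:
    """Convert a dissimilarity in [0, 1] to the complementary similarity."""
    if not 0.0 <= dissimilarity <= 1.0:
        raise ValueError("dissimilarity must lie in [0, 1]")
    return 1.0 - dissimilarity


# Dissimilarity values quoted in this thread for the two "_STD" structures.
reported = {
    "default (general aromatization)": 0.79,
    "with -S aromatize:b (basic aromatization)": 0.2168,
}

for label, d in reported.items():
    s = similarity_from_dissimilarity(d)
    print(f"{label}: dissimilarity {d:.2%} -> similarity {s:.2%}")
```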

Does this help you?

Bye,

Robert

mvargyas
ChemAxon personnel
Joined: 21 May 2004
Posts: 1185

Posted: Sun Nov 21, 2010 1:45 am

Hi, 

Do you know which similarity metric Benchware Dataminer uses? That can also make a difference.

I checked the similarity score of these two using ECFP (soon to be released) and while the ChemAxon fingerprint resulted in 0.88 dissimilarity, ECFP showed 0.79 (so not much different). I used Tanimoto.
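
For readers unfamiliar with the metric, here is a minimal Python sketch of the Tanimoto coefficient on two fingerprints represented as sets of "on" bit positions. The bit positions are hypothetical and do not come from the ChemAxon, ECFP or UNITY fingerprints of the two structures; the handling of two empty fingerprints is a convention chosen for this sketch.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity: shared 'on' bits divided by all 'on' bits."""
    if not a and not b:
        return 1.0  # convention used here: two empty fingerprints are identical
    return len(a & b) / len(a | b)


# Hypothetical fingerprints, for illustration only.
fp_query = {1, 4, 7, 9, 12}
fp_target = {1, 4, 7, 10}

print(f"Tanimoto similarity: {tanimoto(fp_query, fp_target):.2f}")
```

Because the score depends only on which bits are set, two tools that both use Tanimoto can still disagree strongly if their fingerprints encode the same structure differently.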

Regards

Miklos

renjutj
Posted: Sun Nov 21, 2010 1:56 am

Benchware uses UNITY fingerprints with Tanimoto comparison. So, in response to Robert's response: when you compare the aliphatic forms of 2 compounds you get approx 20% similarity and when you compare its aromatic forms you get 80% similarity. My question is why is that the case? Shouldn't it be  within 5-10% range?

Now, the 2 structures I supplied with the _STD in their names have been first dearomatised and then aromatised using the basic option. But if you look at the structure they both look to be in their aliphatic forms. So my next question is why did they not convert into their aromatic forms when I ran them through the standardizer?

Also, Robert specified the basic aromatize option when he ran jcsearch. Is there a way to specify that when using the web services? I use the beginsearch (https://www.chemaxon.com/webservices/soap/JChemSearchWS.html#beginsearch) SOAP service to run my searches. Can the 'vaguebond' option in beginsearch be used to specify the aromaticity type?

Thanks a lot for helping me get a handle on this.

rwagner
ChemAxon personnel
Posted: Mon Nov 22, 2010 9:19 am

Hi

High similarity is always achieved if the rings in the two molecules have the same aromaticity.

Regarding second aromatization: The dearomatize-aromatize with basic returned a molecule with double/single bonds because using basic aromatization both molecules are aliphatic.
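
To see why the ring representation matters so much, here is a toy Python sketch. It is not ChemAxon's fingerprint: it is a deliberately simplified path-based feature set, made up for illustration, in which bond symbols ('-', '=', ':') are part of each feature. The two hypothetical molecules share a six-membered carbon ring and differ in one substituent bond.

```python
def ring_features(bonds):
    """Collect one-bond and two-bond carbon path features around a ring.

    `bonds` lists the bond symbols in ring order:
    '-' single, '=' double, ':' aromatic.
    """
    features = set()
    n = len(bonds)
    for i in range(n):
        one = f"C{bonds[i]}C"
        two = f"C{bonds[i]}C{bonds[(i + 1) % n]}C"
        features.add(one)
        # A path read backwards is the same path, so keep one canonical spelling.
        features.add(min(two, two[::-1]))
    return features


def tanimoto(a, b):
    """Shared features divided by all features."""
    return len(a & b) / len(a | b)


KEKULE = ["-", "=", "-", "=", "-", "="]  # alternating single/double bonds
AROMATIC = [":"] * 6                     # the same ring flagged as aromatic

# Toy molecules: same ring, different substituent bond.
mol_a_kekule = ring_features(KEKULE) | {"C-N"}
mol_a_aromatic = ring_features(AROMATIC) | {"C-N"}
mol_b_kekule = ring_features(KEKULE) | {"C-O"}

same_representation = tanimoto(mol_a_kekule, mol_b_kekule)
mixed_representation = tanimoto(mol_a_aromatic, mol_b_kekule)
print(f"both rings non-aromatic: {same_representation:.2f}")
print(f"one aromatic, one not:   {mixed_representation:.2f}")
```

In this toy model the pair with matching ring representation shares all of its ring features, while the mixed pair shares none, which is the same effect seen between general and basic aromatization in this thread.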

Bye,

Robert

jlee
Joined: 11 Aug 2008
Posts: 720

Posted: Mon Nov 22, 2010 11:22 am

Renju,

You can use the standardize web service to run standardization rules on a molecule before running the search.

https://www.chemaxon.com/webservices/soap/StandardizerWS.html#standardize

 

Jonathan Lee

 

renjutj
Posted: Mon Nov 22, 2010 3:56 pm

Jonathan and Robert,

Thanks for responding. So, if I understand correctly, the similarity between two structures in their aliphatic forms can be completely different from, and unrelated to, the similarity between their aromatic forms. My earlier understanding was that if the similarity between 2 structures in their aliphatic forms is X and between their aromatic forms is Y, then X and Y should be within 10% of each other. It looks like that is not the case.

Jonathan, how will your solution to standardize work? If you follow the thread here, I actually used standardize to dearomatize+aromatize the structures in question. That does not help me achieve the desired similarity. What does help is the aromatization you specify when running the search. If you look at Robert's post earlier, he specifies the basic aromatization option here:
jcsearch -t:i:0.9 -q 36440_STD.sdf  39048_STD.sdf -f mrv -S "aromatize:b" 

That's when the standardized structures achieve 78% similarity. So, I don't think I am looking for a solution on how to standardize the structures before running the search. And anyway, if I had to do that I would obviously run standardizer on my compounds table rather than running it at search time. My question is whether the vaguebond option is analogous to the -S "aromatize:b" option in jcsearch.

Thanks again..

volfi
ChemAxon personnel
Joined: 07 Jun 2004
Posts: 801

Posted: Mon Nov 22, 2010 5:17 pm

The discussion has already passed over the important question:

"Are 36440.mol and 39048.mol aromatic?"

No, they are not aromatic.

But ChemAxon has two aromatization methods: basic and general (both have advantages and disadvantages). The basic aromatization method correctly leaves the two molecules unchanged. The general method converts the rings in 36440.mol to aromatic form (which is not correct). What happens in this case is that neither of the two rings can be aromatic on its own, but the ring system as a whole can be converted to aromatic form according to the general aromatization (see attached picture).

I hope this helps

Andras




Attachment: 39048.png (13.19 KB)

renjutj
Posted: Mon Nov 22, 2010 9:49 pm

Thanks for responding. I think I am now confused by your suggestion to go back to general aromatization. So, bottom line: what must I do to get high similarity between these two structures? I use the JChem Web Services to run all my structure searches. From your suggestion, it is clear that the standardizer is not going to help me. If I standardize using basic aromatization, it converts them both to aliphatic and that does not yield good similarity. If I use general, it considers 36440 aromatic and 39048 aliphatic, which still does not solve the problem.

Szabolcs
ChemAxon personnel
Posted: Tue Nov 23, 2010 9:03 am

Hi Renju,

Volfi did not suggest going back to general aromatization; he only explained why this method works differently.

From the above discussion, it seems that you need to stick with basic aromaticity, and maybe explore other descriptors and / or similarity metrics. (See more information here: https://www.chemaxon.com/products/screen/ )

 

Regarding your question about vague bond options: the vague bond option is only available for substructure, full structure, full fragment and superstructure searches. It does not work for similarity search.

rwagner
ChemAxon personnel
Posted: Wed Nov 24, 2010 12:58 pm

Hi,

 

Some points are not clear yet.

In case of general aromatization, the two structures are aromatized differently and therefore have low similarity. This is clear to both of us.

To avoid this, the two structures were dearomatized and then aromatized again using the basic method, giving the "_STD" structures. These are aliphatic because basic aromatization doesn't turn them aromatic.

What is unclear is why you obtained a low similarity value for these. There should be a similarity above 70%.

Please check:

jcsearch -t:i:0.9 -q 36440_STD.sdf  39048_STD.sdf -f mrv -S "aromatize:b"

Please note that this command prints dissimilarity by default, which is around 21% in this case, meaning 78% similarity (JChem 5.3.8). Do you obtain these values?

If you use a tool that performs standardization to determine the similarity between the two "_STD" structures, then you have to ensure that it uses basic aromatization; otherwise they are aromatized again with general aromatization. This will result in low similarity again (as explained earlier), e.g.:

jcsearch -t:i:0.9 -q 36440_STD.sdf  39048_STD.sdf -f mrv

which yields 79% dissimilarity, meaning 21% similarity.
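
If the two calls are scripted, the only difference between them is the trailing -S option. A minimal Python sketch that builds both argument lists is shown below; it uses only the jcsearch options quoted in this thread, does not run anything, and the helper name is made up for illustration.

```python
import shlex
from typing import List, Optional


def jcsearch_similarity_args(query: str, target: str,
                             std_config: Optional[str] = None) -> List[str]:
    """Build the argument list for the in-memory jcsearch calls quoted in this thread."""
    args = ["jcsearch", "-t:i:0.9", "-q", query, target, "-f", "mrv"]
    if std_config is not None:
        # -S passes a standardizer configuration, e.g. "aromatize:b" for basic aromatization.
        args += ["-S", std_config]
    return args


# Without -S, standardization falls back to the default aromatization.
default_call = jcsearch_similarity_args("36440_STD.sdf", "39048_STD.sdf")
# With -S "aromatize:b", basic aromatization is requested at search time.
basic_call = jcsearch_similarity_args("36440_STD.sdf", "39048_STD.sdf", "aromatize:b")

print(shlex.join(default_call))
print(shlex.join(basic_call))
```

The lists can be handed to `subprocess.run` on a machine where jcsearch is installed and on the PATH; that step is left out here because it depends on the local JChem installation.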

 

Hence, running the similarity search on the basic-aromatized versions, in a way that does not apply general aromatization again, leads to the high similarity values your chemists desire.

Please consider the differences between general and basic aromatization. It may be true that basic aromatization is more suitable in this special case, but otherwise general aromatization is better.

 

Bye,

Robert

renjutj
Posted: Wed Nov 24, 2010 6:54 pm

Hi Robert,

Thanks for thinking ahead and clarifying the issue that was looming in my mind. So, clearly running the standardizer to dearomatize and aromatize does not help me achieve the high similarity, because jcsearch uses general aromatization when doing the comparison. What does help is specifying the basic aromatization WHEN RUNNING THE SEARCH. Now I see that I also get a high similarity when doing that. I use 5.3.1 and I also get 78% similarity when running:

jcsearch -t:i:0.9 -q 36440_STD.sdf  39048_STD.sdf -f mrv -S "aromatize:b"

So now, my only question is how to do this using Web Services. Jonathan Lee, from Web services suggested running the standardizer. As you clearly noted above, running the standardizer before running the search does not help us achieve this high similarity. We need a way within Web Services search program to specify the basic aromatization option AT RUN-TIME to get this result. Is there a way to do this? I asked about the vaguebond option since this is a run-time option within JChemSearch; but unfortunately that does not work with similarity searching as Szabolcs noted above. So, is there a way at all to get this result using Web Services? The software program that I use is entirely web-based and I don't like running command-based programs on the server to get this result.

Renju


jlee
Posted: Thu Nov 25, 2010 4:57 pm
Post subject: Web Service solution

Renju,

If you want to use the JChem Search Web Service, you must create a database table and import the molecules to search through. Then pass the query molecule to the JChem Search Web Service.

Upon creating the database table (using JChem Manager, for example) you should select a standardization configuration (e.g. aromatize:b).  When using the JChem Search Web Service, the query molecule will be standardized using the configuration before being compared with the targets in the table (which are standardized upon import).  During the comparison, if both query and target are aliphatic, then it will achieve a high similarity score. 

Please remember to be careful about the standardizations that your molecules will go through. A molecule that has general aromatization and undergoes a basic aromatization will not be similar to a dearomatized molecule that undergoes a basic aromatization. This is because the first molecule is already aromatized when the second aromatization is attempted. So as a precaution, it might be beneficial to include a dearomatization and a basic aromatization in the standardization configuration you supply when creating the database table.

Jon

renjutj
Posted: Fri Dec 10, 2010 8:26 pm

Hi Jon,

To test your suggestion, I created a table within Instant JChem with the dearomatize + aromatize:b option in the standardizer. I then imported the two structures in question and was unable to achieve the high similarity. If you refer to this thread, you can see that in their aliphatic forms the two structures do not report high similarity. That's why Robert had to specify basic aromatization during the search even after these structures ran through the custom standardizer with the dearom + arom:b option. For the structures to achieve high similarity, they need to be searched in this way:

jcsearch -t:i:0.9 -q 36440_STD.sdf  39048_STD.sdf -f mrv -S "aromatize:b"

And that's my question: is there a way to do this within web services? Running them through a standardizer during import is not helping because, as I said earlier, that only converts them into their aliphatic forms, which does not give high similarity.

Thanks for your help.

rwagner
ChemAxon personnel
Posted: Mon Dec 13, 2010 8:53 pm

Hi,

 

The search on a table created with dearomatization-basic aromatization option should work.

To test this try:

jcman c simtest --stconfig dearomArom.xml   (attached here)

jcman a simtest 36440.mol    (your structure attached earlier in this topic)

jcman a simtest 39048.mol

jcsearch -q 36440.mol DB:simtest -t:i:0.9 -f mrv

This last command should dump an mrv with the two structures, one with zero dissimilarity and the other with 21.6% dissimilarity, which means a high similarity. When searching a DB table you don't need to specify a standardization configuration, because the standardization configuration of the DB table is used. Do you obtain the same results when entering these commands?

We are still investigating why Instant JChem doesn't yield the desired results.

Robert




Attachment: dearomArom.xml (333 Bytes)

renjutj
Posted: Mon Jan 17, 2011 9:49 pm

Yes, I ran this using JChem Base UI. I am able to get the high similarity. Thanks for your patience. So, I guess Instant JChem does not use the custom aromatization settings when running the search.

vbors
ChemAxon personnel
Joined: 05 Oct 2010
Posts: 20

Posted: Tue Jan 18, 2011 3:13 pm

Hi All,

The problem has been found. It is not in the search mechanism, but rather in the tool that displays the results.

This tool recalculates the similarity value, but it uses different fingerprint and standardization settings. A fix will be available in version 5.5.

 

Kind regards,

Vencel
