Performing an Overlap Analysis
Background
Instant JChem provides a simple way to look for identical or similar structures
in two database tables. This allows you to answer questions like:
- Which structures in this vendor library are already contained in my own database?
- What proportion of structures in one database are more than 85% similar to a
structure in another database?
To perform an overlap analysis:
- Create a JChem table for
each of the sets of structures you wish to analyse. These do not have to be in the
same database, but typically are. Here we will assume that you have created two
JChem tables, one with the query structures and one with the target structures.
- Choose Tools -> Chemistry -> Overlap Analysis. The Overlap Analysis
wizard opens.
Step 1: Query and Target selection
- Select the query and target tables by choosing each one in the mini
Project Explorer and clicking the appropriate 'Set as...' button. The query
and target can be the same table if you want to do a self-comparison.
Step 2: Search options
- Specify the maximum number of hits to report for each query structure (this is most
useful when running a similarity search).
- Specify the type of search (perfect, exact, similarity).
- Specify search options. These are the same as the structure search options when
running a query.
- Click Next to move to the next step.
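The interplay between the hit limit and a similarity search can be sketched in a few lines of Python. This is only an illustration, not Instant JChem's implementation: it assumes fingerprints represented as sets of bit positions, a Tanimoto score, and a cap that keeps the highest-scoring hits. The names `tanimoto`, `top_hits`, `threshold` and `max_hits` are hypothetical and not part of any ChemAxon API.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def top_hits(
    query: Set[int],
    targets: Dict[int, Set[int]],
    threshold: float,
    max_hits: int,
) -> List[Tuple[int, float]]:
    """Return up to max_hits (cd_id, score) pairs scoring at least threshold.

    Assumption: when more hits exist than the cap allows, the best-scoring
    ones are kept. Ties are broken by the lower cd_id.
    """
    scored = [(cd_id, tanimoto(query, fp)) for cd_id, fp in targets.items()]
    kept = [(cd_id, score) for cd_id, score in scored if score >= threshold]
    kept.sort(key=lambda hit: (-hit[1], hit[0]))
    return kept[:max_hits]


# Toy target table: cd_id -> fingerprint (hypothetical data).
targets = {1: {1, 2, 3, 4}, 2: {1, 2, 3}, 3: {7, 8}, 4: {1, 2, 3, 4, 5}}
hits = top_hits({1, 2, 3, 4}, targets, threshold=0.7, max_hits=2)
print(hits)
```

Three targets pass the 0.7 threshold here, but with `max_hits=2` only the two best are reported. This is why the limit matters most for similarity searches, where a single query can match many targets.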
Step 3: Output options
- Specify a name to use as a prefix for the resulting fields. When
running several overlap analyses on the same query table, give each one
a unique name to use as the field prefix.
- Click 'Finish' to start the analysis. For large numbers of structures this can
take some time to complete.
- A report is written to the Output window, and progress can be followed in
the progress monitor in the bottom right corner of the main window.
Results
Fields are added to the query table displaying the results of the search for each structure.
The following fields may be added:
- A field with a count of the number of hits found
- A field with the highest similarity score (only present when a similarity search is run)
- A field listing the cd_id values of the hits (and the similarity score for each hit
if a similarity search was run)
These fields can be sorted and queried like any normal field. This makes it easy to, for
example, find every query structure that has at least one hit in the target table.
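As a rough illustration of the kind of analysis these fields support, the sketch below answers the two questions from the Background section using plain Python. The rows are made-up data exported from a query table. The field names `vendor_hit_count` and `vendor_max_similarity` are hypothetical (the real names depend on the prefix you chose), and representing a missing similarity as `None` is also an assumption.

```python
from typing import Dict, List, Optional, Union

Row = Dict[str, Union[int, float, None]]

# Made-up query-table rows after an overlap analysis with prefix "vendor".
rows: List[Row] = [
    {"cd_id": 1, "vendor_hit_count": 1, "vendor_max_similarity": 1.0},
    {"cd_id": 2, "vendor_hit_count": 2, "vendor_max_similarity": 0.91},
    {"cd_id": 3, "vendor_hit_count": 0, "vendor_max_similarity": None},
    {"cd_id": 4, "vendor_hit_count": 1, "vendor_max_similarity": 0.72},
]


def with_hits(table: List[Row]) -> List[int]:
    """cd_id values of query structures that found at least one hit."""
    return [int(r["cd_id"]) for r in table if (r["vendor_hit_count"] or 0) > 0]


def proportion_above(table: List[Row], cutoff: float) -> float:
    """Fraction of query structures whose best hit scores above cutoff."""
    if not table:
        return 0.0

    def best(r: Row) -> Optional[float]:
        value = r["vendor_max_similarity"]
        return None if value is None else float(value)

    above = [r for r in table if best(r) is not None and best(r) > cutoff]
    return len(above) / len(table)


print(with_hits(rows))
print(proportion_above(rows, 0.85))
```

Inside Instant JChem the same questions would be answered by querying or sorting on the added fields directly. The sketch only shows the logic behind such a query.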
Copyright © 1998-2008
ChemAxon Ltd.
All rights reserved.