3. Structure Searching

  • 3.1 Database Searching
  • 3.2 Structure Searching in memory and flat files
  • 3.3 Stereo Notes
    3.1 Database Searching

    To assist searching structures in a database, JChem provides the chemaxon.jchem.db.JChemSearch JavaBean. The following search types are supported:

    Using a connection object (ConnectionHandler) passed by the caller method, JChemSearch retrieves all structures that match the search criteria from the given structure table and returns their cd_id values in an int array.

    Oracle users may also use JChem Cartridge for Oracle to perform search and other operations via SQL commands.

    Comments:

    Defining Queries in Web Applications

    MarvinSketch is the recommended tool for drawing query structures.

    Steps of creating a web page for entering query structures:

    1. Include a form with a hidden input variable in the page that contains MarvinSketch.
    2. On submitting the form call MSketch.getMol(...). For example, if the name of the hidden input variable is "molfile" and you need the structure in Marvin format use this call in JavaScript:
          form.molfile.value=document.MSketch.getMol('mrv');
      
      (You can also get the structure in other formats, like MDL's Molfile or SMILES, but the Marvin format is recommended as it can represent all molecule and query features that are available in MarvinSketch. You can find more information about file formats here.)
    3. Read the form's variable in your servlet or server-side script and pass it to the JChemSearch class.

    Initializing Search

    After creating a JChemSearch object, setting the following properties is necessary:
    queryStructure: the query structure in SMILES, Molfile, or other format
    connectionHandler: specifies the connection
    structureTable: the table in the database where the structures are stored
    searchType: specifies the type of the search. Values:
        JChemSearch.SUBSTRUCTURE: substructure search (default)
        JChemSearch.SUPERSTRUCTURE: superstructure search
        JChemSearch.PERFECT: perfect structure search
        JChemSearch.EXACT: exact structure search
        JChemSearch.EXACT_FRAGMENT: exact fragment search
        JChemSearch.SIMILARITY: similarity search
    In addition, various other structure search options can be specified that modify structure search behaviour. For further customization, see the API of the JChemSearch class. Many of these options are detailed in the Substructure Search section.

    Example:

        JChemSearchOptions jcSearchOptions = new JChemSearchOptions();
        jcSearchOptions.setSearchType(SearchConstants.SUBSTRUCTURE);
        JChemSearch jChemSearch = new JChemSearch();
        jChemSearch.setStructureTable(structTableName);
        jChemSearch.setQueryStructure("Brc1ccccc1");
        jChemSearch.setSearchOptions(jcSearchOptions);
        jChemSearch.setConnectionHandler(connectionHandler);
        jChemSearch.run();
    
    Please see the InitializingSearch.java example in the examples/java/search directory.

    Perfect Structure Search

    This search type can be used to retrieve the same molecule as the query. It is used to check whether a chemical structure already exists in the database, and also during duplicate filter import. All structural features (atom types, isotopes, stereochemistry, query features, etc.) must be the same for matching, but for example coordinates and dimensionality are usually ignored.

    In JChemBase, the searches-per-minute license limit does not apply to this search mode: perfect searches are not counted.

    Java example: Throwing an exception if a given structure exists.

        ...				   // Initialize connection
    
        String mol = "Clc1cccc(Br)c1"; // Query in SMILES/SMARTS,
    				   // MDL Molfile or other format
        String structureTableName = "cduser.structures";
    
        JChemSearch searcher = new JChemSearch(); // Create searcher object
        searcher.setQueryStructure(mol);
        searcher.setConnectionHandler(conHandler);
        searcher.setStructureTable(structureTableName);
        searcher.setRunMode(JChemSearch.RUN_MODE_SYNCH_COMPLETE);
        JChemSearchOptions searchOptions = new JChemSearchOptions();
        searchOptions.setSearchType(JChemSearch.PERFECT);
        searcher.setSearchOptions(searchOptions);
        searcher.run();
        if(searcher.getResultCount()>0)
    	throw new Exception("Structure already exists (cd_id=" +
    		searcher.getResult(0) + ")");
        ...
    
    More examples:

    Substructure Search

    Substructure search finds all structures that contain the query structure as a subgraph. Sometimes the query provides not only the chemical subgraph but also query features that further restrict the structure. If special molecular features are present on the query (e.g. stereochemistry, charge, etc.), only those targets that also contain the feature match. However, if a feature is missing from the query, by default it is not required to be missing from the target. For more information, see the JChem Query Guide.

    Searching starts with a fast screening phase where query and database fingerprints are compared. If the result of the screening is positive (meaning that a fit is possible) for a database structure, then an atom-by-atom search (ABAS) is also performed. Query structures may contain query atoms and bonds described earlier.
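
    The screening idea can be illustrated in plain Java. The sketch below is not JChem's implementation: it assumes that fingerprints are plain bit sets and applies the commonly used rule that a target can contain the query only if every bit set in the query fingerprint is also set in the target fingerprint. The class name, method names and toy bit patterns are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

final class ScreeningSketch {

    /** Returns true if every bit set in queryFp is also set in targetFp. */
    static boolean passesScreen(BitSet queryFp, BitSet targetFp) {
        BitSet missing = (BitSet) queryFp.clone();
        missing.andNot(targetFp); // bits the query has but the target lacks
        return missing.isEmpty();
    }

    /** Returns the indices of the targets that survive screening. */
    static List<Integer> screen(BitSet queryFp, List<BitSet> targetFps) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < targetFps.size(); i++) {
            if (passesScreen(queryFp, targetFps.get(i))) {
                candidates.add(i); // only these would go on to atom-by-atom search
            }
        }
        return candidates;
    }

    /** Helper that builds a toy fingerprint from bit positions. */
    static BitSet bits(int... positions) {
        BitSet fp = new BitSet();
        for (int p : positions) {
            fp.set(p);
        }
        return fp;
    }

    public static void main(String[] args) {
        BitSet query = bits(1, 5);
        List<BitSet> targets = Arrays.asList(bits(1, 5, 9), bits(1, 9), bits(0, 1, 5));
        System.out.println(screen(query, targets)); // prints [0, 2]
    }
}
```

    A positive screen only means that a match is possible; the surviving candidates still need the atom-by-atom check, because different substructures can set the same bits in a hashed fingerprint.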

    Starting a Search

    The initialization of substructure searching is similar to perfect searching, but the searchType property of JChemSearch should be set to JChemSearch.SUBSTRUCTURE.

    Java example:

        ...				         // Initialize connection
    
        String mol = "[*]c1cccc([Cl,Br])c1"; // Query structure
        String structureTableName = "cduser.structures";
    
        JChemSearch searcher = new JChemSearch(); // Create searcher object
        searcher.setQueryStructure(mol);
        searcher.setConnectionHandler(conHandler);
        searcher.setStructureTable(structureTableName);
        JChemSearchOptions searchOptions = new JChemSearchOptions();
        searchOptions.setSearchType(JChemSearch.SUBSTRUCTURE);
        searcher.setSearchOptions(searchOptions);
        searcher.run();
        ...
    
    More examples:

    Running Search in a Separate Thread

    Since substructure searching can be time consuming, it is reasonable to create a new thread for the search. If the runMode property of JChemSearch is set to JChemSearch.RUN_MODE_ASYNCH_COMPLETE, then searching runs in a separate thread.

    The progress of the search can be checked by the following properties of JChemSearch:
    running: checks if searching is still running
    progressMessage: textual information about the phase of the search process
    resultCount: the number of hits found so far
    currentId: the cd_id value of the molecule being checked

    Java application example:

        searcher.setRunMode(JChemSearch.RUN_MODE_ASYNCH_COMPLETE);
        searcher.run();
        while(searcher.isRunning()) {
    	String msg = searcher.getProgressMessage();
    	int count = searcher.getResultCount();
    	int lastId = searcher.getCurrentId();
    	...                 // Displaying
    	Thread.sleep(2000);
        }
    
    Please see the SeparateSearchThread.java example in the examples/java/search directory.

    JSP example:

        ...
        if(searcher.isRunning()) {
    	%>
    	<p>Please wait. Searching.....
    	<p><%= searcher.getProgressMessage() %>
    	<p>Hits: <%= searcher.getResultCount() %>
    	<p>
    	<%
    	if(searcher.getCurrentId()>0) {
    	    %>
    	    Current id: <%= searcher.getCurrentId() %>
    	    <%
    	}
    	...
    	%>
    	<script LANGUAGE="JavaScript">
    	<!--
    	window.setTimeout('window.location.reload(false)', 5000);
    	//-->
    	</script>
    	<%
        }
        ...
    
    More examples:

    Retrieving Results

    If the resultTableMode property of JChemSearch is set to JChemSearch.NO_RESULT_TABLE, then the following properties can be used for retrieving the results:
    resultCount: the number of hits found
    maxTimeReached: returns true if the search stopped because the time elapsed since the start of the search had reached the maximum value
    maxResultCountReached: returns true if the search stopped because the number of hits had reached the maximum value
    result: returns the cd_id value of a found compound specified by an index value
    exception, error, errorMessage: if an error occurred during the search, these properties provide information about the problem

    The recommended way of retrieving the results of the search

    Java example:

            int[] cdIds = jChemSearch.getResults();
    
            String retrieverSql =
                    "SELECT cd_molweight from " + structTableName
                            + " where cd_id = ?";
            PreparedStatement ps =
                    connectionHandler.getConnection().prepareStatement(
                            retrieverSql);
            try {
                for (int i = 0; i < cdIds.length; i++) {
                    int cdId = cdIds[i];
                    ps.setInt(1, cdId);
                    ResultSet rs = ps.executeQuery();
                    if (rs.next()) {
                        System.out.println("Mass: " + rs.getDouble(1));
                    } else {
                        ; // has been deleted in the meantime?
                    }
                }
            } finally {
                ps.close();
            }
    
    Please see the RetrievingResults.java example in the examples/java/search directory.

    JSP example for displaying search results on a web page by a JSP script:

        ...
        Hits: <%= searcher.getResultCount() %> 
        <font size="1">
        <%= searcher.isMaxResultCountReached()? " (maximum hits reached)" : "" %>
        <%= searcher.isMaxTimeReached()? " (maximum time reached)" : "" %>
        </font>
        <center>
        <script LANGUAGE="JavaScript1.1" SRC="../../../marvin/marvin.js">
        </script>
        <script LANGUAGE="JavaScript1.1">
        <!--
        mview_name="mview";
        mview_begin("../../../marvin",
    	    "<%= cols*cellWidth+cols-1 %>",
    	    "<%= rows*cellHeight+rows-1 %>");
        mview_param("rows", "<%= rows %>");
        mview_param("cols", "<%= cols %>");
        mview_param("navmode", "rot3d");
        mview_param("molbg", "#000000");
        mview_param("bgcolor", "#e0e0e0");
        mview_param("border", "1");
        mview_param("animate", "0");
        mview_param("layout0", ":4:1:"+
    	"L:0:0:1:1:w:n:0:10:"+ <%/* ID              */%>
    	"L:1:0:1:1:w:n:0:10:"+ <%/* Stock           */%>
    	"M:2:0:1:1:c:n:1:10"); <%/* Molecule        */%>
        mview_param("param0", ":"+
    	"L:11b:"+
    	"L:10:"+
    	"M:<%= structureWidth %>:<%= structureHeight %>");
        <%
        /*
         * Writing data into the cells in the applet
         */
        int cellIndex=0;
        for(int i=start; i<start+cells; i++) {
    	int id = searcher.getResult(i);
    	/*
    	 * SQL statement for retrieving structures
    	 */
    	String sql =
    		"SELECT " +
    		structureTableName + ".cd_id, "+
    		structureTableName + ".cd_structure, " +
    		stockTableName + ".quantity\n" +
    		"FROM "+ structureTableName + ", " +
    		stockTableName + "\n" +
    		"WHERE " +
    		structureTableName + ".cd_id = " +
    		stockTableName + ".cd_id AND " +
    		structureTableName + ".cd_id = " + id;
    	Statement stmt = con.createStatement();
    	ResultSet rs = stmt.executeQuery(sql);
    	try {
    	    if(rs.next()) {
    		String dbMolfile = new String(DatabaseTools.readBytes(rs, 2),"ASCII");
    		float dbQuantity = rs.getFloat(3);
    		%>
    		mview_param("cell<%= cellIndex %>",
    		    "|ID: <%= id %>"+
    		    "|<%= Math.round(dbQuantity) %> mg"+
    		    "|<%= HTMLTools.convertForJavaScript(
    			dbMolfile) %>");
    		<%
    	    }
    	} finally {
    	    rs.close();
    	    stmt.close();
    	}
    	cellIndex++;
        }
        %>
        mview_end();
        //-->
        </script>
        </center>
        ...
    

    More JSP examples:

    To store the results in a table, the name of the table should be specified by the resultTable property of JChemSearch, and also the resultTableMode property should be set to either JChemSearch.CREATE_OR_REPLACE_RESULT_TABLE or JChemSearch.APPEND_TO_RESULT_TABLE.

    Retrieving hits as soon as they are found

    If the runMode property of JChemSearch is set to JChemSearch.RUN_MODE_ASYNCH_PROGRESSIVE, then searching runs in a separate thread and hits can be retrieved as soon as they are found. Note that this mode does not support any ordering:
        ArrayList hitsByPages = new ArrayList();
        int[] nextPage = new int[NR_OF_HITS_PER_PAGE];
        int idxForNextPage = 0;
        searcher.setOrder(JChemSearch.NO_ORDERING);
        searcher.setRunMode(JChemSearch.RUN_MODE_ASYNCH_PROGRESSIVE);
        searcher.run(); // start the search; hits are consumed in the loop below
        while(searcher.hasMoreHits()) {
            nextPage[idxForNextPage++] = searcher.getNextHit();
            if (idxForNextPage == NR_OF_HITS_PER_PAGE) {
                synchronized (hitsByPages) {
                    hitsByPages.add(nextPage);
                    hitsByPages.notifyAll(); // notify any who may be in wait for
                                             // the next page
                    nextPage = new int[NR_OF_HITS_PER_PAGE];
                    idxForNextPage = 0;
                }
            }
        }
        // hits for the last page if any
        if (idxForNextPage > 0) {
            int[] lastPage = new int[idxForNextPage];
            // copy all idxForNextPage hits collected for the partial last page
            System.arraycopy(nextPage, 0, lastPage, 0, idxForNextPage);
            synchronized (hitsByPages) {
                hitsByPages.add(lastPage); 
                hitsByPages.notifyAll(); // notify any who may be in wait for
                                         // the next page
            }
        }
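
    The paging logic of the loop above can be tried without a database. The following sketch is an illustration under stated assumptions: a java.util.PrimitiveIterator.OfInt stands in for the hasMoreHits()/getNextHit() pair of the searcher, and the class and method names are hypothetical. It shows in particular that the partially filled last page has to be trimmed to exactly the number of hits collected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

final class HitPagingSketch {

    /** Groups hits into pages of pageSize; the last page may be shorter. */
    static List<int[]> toPages(PrimitiveIterator.OfInt hits, int pageSize) {
        List<int[]> pages = new ArrayList<>();
        int[] page = new int[pageSize];
        int filled = 0;
        while (hits.hasNext()) {
            page[filled++] = hits.nextInt();
            if (filled == pageSize) {   // page is complete, start a new one
                pages.add(page);
                page = new int[pageSize];
                filled = 0;
            }
        }
        if (filled > 0) {               // trim the partially filled last page
            pages.add(Arrays.copyOf(page, filled));
        }
        return pages;
    }

    public static void main(String[] args) {
        // Seven stand-in cd_id values, three hits per page
        List<int[]> pages = toPages(IntStream.rangeClosed(1, 7).iterator(), 3);
        for (int[] p : pages) {
            System.out.println(Arrays.toString(p));
        }
    }
}
```

    With seven hits and a page size of three, the sketch produces two full pages and a final page that holds the single remaining hit.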
    

    Caching Structures

    To boost the speed of substructure searching, JChem caches fingerprints and structures in the searcher application's memory. For more information, see the JChem database concepts section.

    Combining SQL queries with Structure Searching

    Many times structure information is only one of several conditions that a complex query has to check. In those cases structure searches should be combined with SQL queries.

    Example: Suppose that quantities on stock are stored in a table different from the structure table. We are querying compounds that contain a given substructure and their quantity on stock is not less than a given value.
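
    As a rough illustration of this scenario, the sketch below builds a parameterized SQL statement that filters a stock table by quantity and by the cd_id values returned from a structure search. It is a generic JDBC-style sketch rather than a JChem-specific mechanism. The cd_id and quantity column names follow the JSP example earlier in this section; the table name "stock" and the class and method names are assumptions made for illustration.

```java
final class CombinedQuerySketch {

    /**
     * Builds a statement with one placeholder for the minimum quantity
     * followed by one placeholder per structure-search hit.
     */
    static String buildStockQuery(String stockTable, int hitCount) {
        if (hitCount < 1) {
            throw new IllegalArgumentException("at least one hit is required");
        }
        StringBuilder sql = new StringBuilder("SELECT cd_id, quantity FROM ")
                .append(stockTable)
                .append(" WHERE quantity >= ? AND cd_id IN (");
        for (int i = 0; i < hitCount; i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        return sql.append(")").toString();
    }

    public static void main(String[] args) {
        // Three hits from a structure search, hypothetical table name
        System.out.println(buildStockQuery("stock", 3));
        // prints SELECT cd_id, quantity FROM stock WHERE quantity >= ? AND cd_id IN (?, ?, ?)
    }
}
```

    The placeholders would then be bound with PreparedStatement.setDouble and setInt. For large hit lists this approach is likely to run into database limits on the length of an IN list, so storing the hits in a result table and joining against it may scale better.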

    Two ways of performing the combined query:

    Setting More Properties

    There are several other properties that modify the behavior of JChemSearch, like
    maxResultCount: The maximum number of molecules that can be found by the search.
    maxTime: The maximum amount of time in milliseconds, which is available for searching.
    stringToAppend: A string (like an ORDER BY sub-expression) to be appended to the SQL expression used for screening and retrieving rows from the structure table.
    infoToStdError: If set to true, information useful for testing will be written in the servlet server's error log file.
    order: Specifies the order of the result.

    Java example:

            JChemSearchOptions jcSearchOptions = new JChemSearchOptions();
            jcSearchOptions.setSearchType(SearchConstants.SIMILARITY);
            jcSearchOptions.setDissimilarityThreshold((float) 0.6); 
            JChemSearch jChemSearch = new JChemSearch();
            jChemSearch.setStructureTable(structTableName);
            jChemSearch.setQueryStructure("c1ccccc1N");
            jChemSearch.setSearchOptions(jcSearchOptions);
            jChemSearch.setConnectionHandler(connectionHandler);
            // Change the default which is by similarity and id:
            jChemSearch.setOrder(JChemSearch.ORDERING_BY_ID);
    
    Please see the HitsInSpecificOrder.java example in the examples/java/search directory.

    Superstructure Search

    Superstructure search finds all molecules for which the query is a superstructure of the target. It is invoked in the same way as substructure search, with the search type set to JChemSearch.SUPERSTRUCTURE.

    Exact Structure Search

    An exact structure search finds molecules that are equal (in size) to the query structure. (No additional fragments or heavy atoms are allowed.) Molecular features (by default) are evaluated the same way as described above for substructure search.

    For this search type, the searchType property of JChemSearch should be set to JChemSearch.EXACT.

    Exact Fragment Search

    Exact fragment search lies between substructure and exact search: the query must exactly match a full fragment of the target. Other fragments may be present in the target; they are ignored. This search type is useful for performing an "Exact structure search" that ignores salts or solvents beside the main structure in the target.

    For this search type, the searchType property of JChemSearch should be set to JChemSearch.EXACT_FRAGMENT.

    Similarity Search

    Similarity searching finds molecules that are similar to the query structure. The search uses the Tanimoto coefficient, which has two arguments:

    The dissimilarity formula contains the Tanimoto coefficient and measures how dissimilar the two molecules are from each other. The formula is defined below:

    $$ \text{dissimilarity}(A, B) = 1 - \frac{N_{A \& B}}{N_A + N_B - N_{A \& B}} $$

    where $N_A$ and $N_B$ are the number of bits set in the fingerprint of molecule A and B, respectively, and $N_{A \& B}$ is the number of bits that are set in both fingerprints.

    The dissimilarity threshold is a number between 0 and 1, which specifies a cutoff limit in the similarity calculation. If the dissimilarity value is less than the threshold, then the query structure and the given database structure are considered similar.
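
    The dissimilarity formula and the threshold test can be checked with a few lines of plain Java. This is a minimal sketch, assuming fingerprints are represented as java.util.BitSet objects; the class and method names are hypothetical, and treating two empty fingerprints as identical is an assumption of the sketch, not documented JChem behaviour.

```java
import java.util.BitSet;

final class TanimotoSketch {

    /** Returns 1 - N(A&B) / (N(A) + N(B) - N(A&B)) for two fingerprints. */
    static double dissimilarity(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);                         // bits set in both fingerprints
        int nAB = common.cardinality();
        int union = a.cardinality() + b.cardinality() - nAB;
        if (union == 0) {
            return 0.0;                        // assumption: two empty fingerprints are identical
        }
        return 1.0 - (double) nAB / union;
    }

    /** A pair counts as similar if its dissimilarity is below the threshold. */
    static boolean isSimilar(BitSet a, BitSet b, double threshold) {
        return dissimilarity(a, b) < threshold;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(0, 4);                           // bits 0..3, so N(A) = 4
        BitSet b = new BitSet();
        b.set(2, 6);                           // bits 2..5, so N(B) = 4 and N(A&B) = 2
        System.out.println(dissimilarity(a, b));   // 1 - 2/6, about 0.667
        System.out.println(isSimilar(a, b, 0.7));  // prints true
        System.out.println(isSimilar(a, b, 0.6));  // prints false
    }
}
```

    In the toy case, two of the six distinct bits are shared, so the Tanimoto coefficient is 1/3 and the dissimilarity is 2/3: the pair passes a threshold of 0.7 but not one of 0.6.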

    See more details on fingerprints in the section Parameters for Generating Chemical Hashed Fingerprints

    Similarity searching should be used the same way as substructure searching. To enable similarity searching, the searchType property of JChemSearch should be set to JChemSearch.SIMILARITY. If the order property is set to DatabaseSearch.ORDERING_BY_ID_OR_SIMILARITY (which is the default), then the hits returned by the getResult() method will be sorted in increasing order of dissimilarity.

    The dissimilarity threshold is set on JChemSearchOptions with this function:
    setDissimilarityThreshold(float dissimilarityThreshold): Sets the dissimilarity threshold. Expects a float value between 0 and 1. A lower threshold results in hits that are more similar to the query structure.

    The dissimilarity values predicted in the similarity calculation are retrieved from the JChemSearch instance with this function:
    getDissimilarity(int index): Returns the predicted dissimilarity value for the hit corresponding to the given index.

    Java example:

        ...
        JChemSearch searcher = new JChemSearch(); // Create searcher object
        searcher.setQueryStructure(mol);
        searcher.setConnectionHandler(conHandler);
        searcher.setStructureTable(structureTableName);
        JChemSearchOptions searchOptions = new JChemSearchOptions();
        searchOptions.setSearchType(JChemSearch.SIMILARITY);
        searchOptions.setDissimilarityThreshold(0.2f); // the setter expects a float
        searcher.setSearchOptions(searchOptions);
        searcher.run();
        ...
        for(int i=0; i<searcher.getResultCount(); i++) {
    	float dissimilarity = searcher.getDissimilarity(i);
    	...
        }
        ...
    
    More examples:

    If a result table is generated during a similarity search, then the table will contain both the cd_id and the calculated similarity values.

    Similarity Searching With Molecular Descriptors

    Similarity searching can be extended beyond the default chemical hashed fingerprints by using other built-in molecular descriptor types, including CF, PF, the Burden eigenvalue descriptor (BCUT™) and various scalar descriptors.

    The following example shows how to set up molecular descriptors for a compound library. The first command creates a table called compound_library and the second command adds the molecules from an SDF file. The third command uses the 'c' option to create the molecular descriptor, with the name of the structure table set by the -a flag and the chemical fingerprint descriptor type set by the -k flag. The commands omit the database login information, which was stored previously with the -s option. See the jcman command options and the GenerateMD command options for more information. Creating and assigning molecular descriptors to database structure tables is discussed with the GenerateMD command.

    
        jcman c compound_library
        jcman a compound_library my_compound_group.sdf
        generatemd c -a compound_library -k CF chemical_fingerprint     
    
    Below is an example that runs the similarity search with the new chemical fingerprint. The molecular descriptor name, chemical_fingerprint, is set as a search option and the similarity search is run normally:
        ...
        JChemSearch searcher = new JChemSearch(); // Create searcher object
        searcher.setQueryStructure(mol);
        searcher.setConnectionHandler(conHandler);
        searcher.setStructureTable("compound_library");
        JChemSearchOptions searchOptions = new JChemSearchOptions();
        searchOptions.setSearchType(JChemSearch.SIMILARITY);
        searchOptions.setDescriptorName("chemical_fingerprint");
        searchOptions.setDissimilarityThreshold(0.2f); // the setter expects a float
        searcher.setSearchOptions(searchOptions);
        searcher.run();
        ...
    

    Molecular Descriptor Configuration Options

    Application end-users may need further information about the molecular descriptors to select an appropriate one for their search. Application developers can extract this information from the database and display it to the end-user to help with the selection. In this Java example, an MDTableHandler is created using the database connection and the name of the structure table. The MDTableHandler provides access to the molecular descriptors and the embedded configurations, metrics, and default dissimilarity thresholds.
      ...
      //Start with database connection handler and name of the structure table.
      MDTableHandler mdth = new MDTableHandler(connectionHandler, structureTableName);
      String[] descriptor_ids = mdth.getMolecularDescriptors();
      for (int x= 0; x < descriptor_ids.length; x++){
        String mdName = descriptor_ids[x];
        MolecularDescriptor descriptor = mdth.createMD(mdName);
        //getting descriptor names:
        String descriptorName = descriptor.getName();
        //getting descriptor comments:
        String descriptorComment = mdth.getMDComment(mdName);
        //getting available metrics for each configuration:
        String[] configNames = mdth.getMDConfigs(mdName);
        for (int i=0; i < configNames.length; i++) {
            MolecularDescriptor tempDesc=(MolecularDescriptor)descriptor.clone();
            String config = mdth.getMDConfig(mdName,configNames[i]);
            tempDesc.setScreeningConfiguration(config);
            
            //getting metric name:
            String metricName = tempDesc.getMetricName();
            //getting default thresholds:
            String defaultThreshold = tempDesc.getThreshold();    
            ...
        }
        //Display code can go here
    	...
      }
    

    After selecting a molecular descriptor and other desired parameters, such as the descriptor configuration and the metric, the custom molecular descriptor name is set as a search option and the similarity search is run as normal. If the descriptor name, configuration or metric is omitted, a stored default value is used.

        ...
        JChemSearch searcher = new JChemSearch(); // Create searcher object
        searcher.setQueryStructure(mol);
        searcher.setConnectionHandler(conHandler);
        searcher.setStructureTable(structureTableName);
        JChemSearchOptions searchOptions = new JChemSearchOptions();
        searchOptions.setSearchType(JChemSearch.SIMILARITY);
        searchOptions.setDescriptorName(selectedDescriptor);
        searchOptions.setDescriptorConfig(selectedConfig);
        searchOptions.setDissimilarityMetric(selectedMetric);
        searchOptions.setDissimilarityThreshold(0.8f); // the setter expects a float
        searcher.setSearchOptions(searchOptions);
        searcher.run();
        ...
    
    More examples:

    Customizing the Molecular Descriptor

    In addition, a cheminformatics expert can generate and fine tune a custom made molecular descriptor. Further information on generating the custom molecular descriptors can be found here.

    Search Access Level

    The maximum number of substructure and similarity searches allowed by the system per minute is determined by the license key entered using JChemManager. If no license key has been specified, then the program is in demo mode that allows one search per minute.

    If a query is started when the number of searches has exceeded the quota, JChemSearch throws MaxSearchFrequencyExceededException. It is recommended to catch this exception and display a friendly message advising the user to try searching later. If this exception occurs frequently, please contact ChemAxon and request a license key allowing more searches. Click here to display a table that helps you to determine the access level that suits your needs.

    3.2 Structure Searching in memory and flat files

    The searching of in-memory molecules (chemaxon.struc.Molecule objects) can be performed by the use of chemaxon.sss.search.MolSearch or chemaxon.sss.search.StandardizedMolSearch classes.

    Files and strings

    If the structures to be searched are available only as strings in a molecular file format or as files in the file system, they have to be imported into Molecule objects using the chemaxon.formats.MolImporter or chemaxon.util.MolHandler classes. The code example in the MolSearch API description shows how to use both classes.

    Various Java examples for importing molecules using JChem API are available in Java and HTML format.

    An easy to use command line tool for searching and comparing molecules in files, databases or given as SMILES strings is jcsearch.

    Usage of MolSearch and StandardizedMolSearch

    A search object of these classes compares two Molecule objects (a query and a target) to each other. Usually a MolSearch object is used in the following scenario:
      MolSearch ms = new MolSearch(); // search object creation
      queryMol.aromatize(); // aromatization of query molecule
      ms.setQuery(queryMol); // assignment of query to search
      targetMol.aromatize(); // aromatization of target molecule
      ms.setTarget(targetMol); // assignment of target molecule to search
      ms.getSearchOptions().setSearchType(chemaxon.sss.SearchConstants.SUBSTRUCTURE); // search type: SUBSTRUCTURE, PERFECT, etc.
      // set other search options. For more info, see MolSearchOptions and its superclass, SearchOptions
      // search operation
    For StandardizedMolSearch, the aromatization steps can be removed, as this class takes care of this internally. The search operation can be one of the following:

    For further information, see the following resources:

    Duplicate search

    There are several ways of searching for duplicates in a file. First, import the file as described in files and strings. Then search the imported array of chemaxon.struc.Molecule objects in one of the following ways:
    1. Make a double loop through all the molecules and compare them using MolSearch. (see example code)
    2. Generate unique SMILES representation of the Molecule objects and compare these Strings. For generating unique SMILES strings see: smiles export
      An efficient data structure (e.g. java.util.TreeSet) can be used for the comparison. An example scenario:
        Molecule[] mols;
        ... //import molecules...
        TreeSet smilesTree=new TreeSet(); //for faster searching
        String[] smiles = new String[mols.length];
        for (int i=0;i<mols.length;i++) {
            smiles[i] = mols[i].toFormat("smiles:u"); // create unique smiles
            if (!smilesTree.add(smiles[i])) { // process, if already contained
                ... //handle duplicates
            }
        }
    3. Generate the molecules' hash codes (chemaxon.sss.screen.HashCode.getHashCode) and compare them. These hash codes are equivalent if the molecules are the same, but the equivalence of the code doesn't necessarily imply that the molecules are the same. This should be verified using structure searching. Thus this way of comparison is efficient if the number of duplicates is relatively small compared to the number of molecules.
      Example:
        Molecule[] mols;
        MolSearch ms = new MolSearch();
        ... //import molecules...
        HashCode hc = new HashCode();
        int[] codes = new int[mols.length];
        for (int i=0;i<mols.length;i++)
            codes[i] = hc.getHashCode(mols[i]); //generate hash code
        for (int q=0;q<mols.length;q++)
            for (int t=q+1;t<mols.length;t++)
                if (codes[q]==codes[t]) { //if codes are equal, check with structure searching
                    ms.setQuery(mols[q]);
                    ms.setTarget(mols[t]);
                    if (ms.isMatching()) {
                        ... //handle duplicates
                    }
                }
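
    The hash-then-verify pattern of the third approach can be sketched without the JChem classes. In the illustration below, plain strings stand in for molecules, String.hashCode() stands in for the molecule hash code, and String.equals() stands in for the structure search that confirms a match. These stand-ins and all names are assumptions of the sketch. It also groups items by hash code first, so the expensive check runs only within a group instead of over every pair.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DuplicateSketch {

    /** Returns index pairs {q, t} with q < t whose items are confirmed equal. */
    static List<int[]> findDuplicates(String[] items) {
        Map<Integer, List<Integer>> buckets = new HashMap<>();
        List<int[]> duplicates = new ArrayList<>();
        for (int t = 0; t < items.length; t++) {
            int code = items[t].hashCode();         // cheap pre-filter
            List<Integer> bucket = buckets.computeIfAbsent(code, k -> new ArrayList<>());
            for (int q : bucket) {
                if (items[q].equals(items[t])) {    // exact check, only on equal codes
                    duplicates.add(new int[] {q, t});
                }
            }
            bucket.add(t);
        }
        return duplicates;
    }

    public static void main(String[] args) {
        String[] items = {"CCO", "c1ccccc1", "CCO", "CCN"};
        for (int[] pair : findDuplicates(items)) {
            System.out.println(pair[0] + " duplicates " + pair[1]); // prints 0 duplicates 2
        }
    }
}
```

    As in the hash code approach described above, equal codes are treated only as a hint, so every candidate pair is still confirmed before it is reported.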

    3.3 Stereo Notes

    Tetrahedral centers and double bond stereo configurations are recognized during searching. The information applied by JChem for stereo recognition is

    JChemSearch handles all reasonable structures appropriately. When the query structure is specified in MDL Molfile or Marvin mrv format for JChemSearch and E/Z stereoisomers are searched, the stereo search attribute (or stereo care flag) of the bonds has to be set. See the relevant section of the Query Guide. Furthermore, only the following formats support the enhanced stereo configuration of stereocenters: MDL extended (V3000) formats, Marvin mrv, and ChemAxon extended SMILES/SMARTS. More details on these are available at the following sources:

    The database structures may be imported in MDL SDfile, Molfile or Daylight SMILES format. The JChem Class Library provides classes and applications for interconverting between different formats (e.g. see the chemaxon.util.MolHandler object).

     
    Copyright © 1999-2008 ChemAxon Ltd.    All rights reserved.