chemaxon.jchem.db
Class Importer

java.lang.Object
  extended by java.lang.Thread
      extended by chemaxon.jchem.db.Importer
All Implemented Interfaces:
chemaxon.jchem.db.Transfer, java.lang.Runnable

public class Importer
extends java.lang.Thread
implements chemaxon.jchem.db.Transfer

Tool for importing molecules to database tables from a File or InputStream object. Example of usage: File Import/Export Tools.

Author:
Szilard Dorant

Nested Class Summary
 
Nested classes/interfaces inherited from class java.lang.Thread
java.lang.Thread.State, java.lang.Thread.UncaughtExceptionHandler
 
Field Summary
 
Fields inherited from class java.lang.Thread
MAX_PRIORITY, MIN_PRIORITY, NORM_PRIORITY
 
Fields inherited from interface chemaxon.jchem.db.Transfer
INCHI, JTF, MOL2FILE, MOLFILE, MRV, RDFILE, RXNFILE, SDFILE, SMILES, VMN
 
Constructor Summary
Importer()
          Constructor.
 
Method Summary
 void cancel()
          Stops the import process.
 ConnectionHandler getConnectionHandler()
          Getter for property connectionHandler.
 java.lang.String getConnections()
          Deprecated. since 2.2 replaced by getFieldConnections().
 java.util.ArrayList<java.lang.Integer> getDuplicateIDs()
          Returns the IDs (cd_id column in database table) of duplicate structures.
 int getDuplicates()
          Returns the number of molecules that were not imported, because they are duplicates.
 int getEmptyStructures()
          Returns the number of molecules that were not imported because they are empty structures.
 boolean getEmptyStructuresAllowed()
          Gets whether empty structures are allowed.
 java.lang.Throwable getErrorCause()
          Retrieves the Throwable caught in the run() method.
 java.lang.String getErrorMessage()
          If an error occurs, this method returns the error message.
 java.lang.String getFieldConnections()
          Returns the specified table field - file field pairs.
 java.util.Vector<java.lang.String> getFieldNames()
          Returns field names in an SDfile.
static java.util.Vector<java.lang.String> getFieldNames(java.io.InputStream is, int linesToCheck)
          Returns field names in an SDfile.
 IntArray getImportedIDs()
          Returns the IDs (cd_id column in database table) of imported structures.
 int getImportedNumber()
          Returns the number of imported molecules.
 java.lang.Object getInput()
          Gets the source object.
 int getLinesToCheck()
          Gets the number of lines to check for file format.
 java.lang.String getNameFieldInDB()
           
 java.lang.String getNote()
          Returns the note of the ProgressWriter.
 long getProgress()
          Gets the status of the importing progress.
 ProgressWriter getProgressWriter()
          Gets the ProgressWriter object used for monitoring.
 boolean getSetChiralFlag()
          Gets whether chiral flag is set on import.
 int getSkip()
          Gets the number of molecules to skip from the beginning of the file.
 int getStructCount()
          Returns the current count of structures which were examined by the import process.
 java.lang.String getTableName()
          Gets the name of the table to import into.
 int importMols()
          Imports molecules.
 void init()
          Initialization, checking given number of lines for file format and fields.
 boolean isDuplicateImportAllowed()
          Gets whether duplicate structures are allowed.
 boolean isFinished()
          Returns true if importing has finished, else returns false.
 boolean isHaltOnError()
          Gets whether the import should stop when an error occurs.
 void run()
          Starts execution as a thread.
 void setConnectionHandler(ConnectionHandler conh)
          Setter for property connectionHandler.
 void setConnections(java.lang.String connections)
          Deprecated. since 2.2 replaced by setFieldConnections(String).
 void setDuplicateImportAllowed(boolean b)
          Deprecated. Since JChem 5.4 this import option is a table option; use the setDuplicateImportAllowed(int) method instead.
 void setDuplicateImportAllowed(int duplicateFilteringOption)
          Sets the duplicate filtering option on import.
 void setEmptyStructuresAllowed(boolean b)
          If set to false, empty molecules are not imported.
 void setFieldConnections(java.lang.String connections)
          Specifies which data fields correspond to which table fields.
 void setHaltOnError(boolean b)
          Sets whether the import should stop when an error occurs.
 void setInfoStream(java.io.PrintStream st)
          Sets the stream where information about the import process will be written (e.g. skipped duplicates and empty structures).
 void setInput(java.io.File inputFile)
          Sets the source object as a file.
 void setInput(java.io.InputStream is)
          Sets the source object as a stream.
 void setInput(java.lang.String fileName)
          Sets the source object as a file, specifying the name of the file.
 void setLinesToCheck(int linesToCheck)
          Sets the number of lines to check for file format.
 void setNameFieldInDB(java.lang.String fieldName)
          Sets the DB field that will contain the structure name.
 void setOutputOptions(boolean printDuplicates, boolean printNonDuplicates, java.io.OutputStream os, boolean doNotImport)
          With this option one can print duplicate or non-duplicate molecules to a stream.
 void setProgressWriter(ProgressWriter pwriter)
          Sets the ProgressWriter object that tracks the progress of the actual import.
 void setSetChiralFlag(boolean setChiralFlag)
          Sets if chiral flag should be set to true during import.
 void setSkip(int skip)
          Sets the number of molecules to skip from the beginning of the file.
 void setStoreDuplicates(boolean value)
          Specifies whether the IDs of duplicate structures should be stored.
 void setStoreImportedIDs(boolean value)
          Specifies whether the IDs of imported structures should be stored.
 void setTableName(java.lang.String tname)
          Sets the name of the table to import into.
 void skip(int offset)
          Skips the given number of molecules.
 
Methods inherited from class java.lang.Thread
activeCount, checkAccess, countStackFrames, currentThread, destroy, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, stop, suspend, toString, yield
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Importer

public Importer()
Constructor.

Method Detail

setConnectionHandler

public void setConnectionHandler(ConnectionHandler conh)
Setter for property connectionHandler. The ConnectionHandler must represent an open connection to the database.

Parameters:
conh - the connection handler

getConnectionHandler

public ConnectionHandler getConnectionHandler()
Getter for property connectionHandler.

Returns:
the connection handler

setInput

public void setInput(java.io.File inputFile)
Sets the source object as a file.

Parameters:
inputFile - the source file

setInput

public void setInput(java.io.InputStream is)
Sets the source object as a stream.

Parameters:
is - the source stream

setInput

public void setInput(java.lang.String fileName)
Sets the source object as a file, specifying the name of the file.

Parameters:
fileName - the source file name

getInput

public java.lang.Object getInput()
Gets the source object. The object may be File or InputStream.

Returns:
the source object

setTableName

public void setTableName(java.lang.String tname)
Sets the name of the table to import into.

Parameters:
tname - the table name

getTableName

public java.lang.String getTableName()
Gets the name of the table to import into.

Returns:
the table name

setConnections

@Deprecated
public void setConnections(java.lang.String connections)
Deprecated. since 2.2 replaced by setFieldConnections(String).

Specifies which data fields correspond to which table fields.

The fields are given as pairs in the following format: "databaseField1=dataField1;databaseField2=dataField2"

Parameters:
connections - the connection string

setFieldConnections

public void setFieldConnections(java.lang.String connections)
Specifies which data fields correspond to which table fields.

The fields are given as pairs in the following format: "databaseField1=dataField1;databaseField2=dataField2"

Parameters:
connections - the connection string
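
The pair format above can be assembled programmatically. The following is a minimal sketch that uses only the JDK; the helper class FieldConnectionsSketch and the column and field names are hypothetical, chosen for illustration, and are not part of JChem.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not part of JChem) that assembles a field connection string. */
final class FieldConnectionsSketch {

    /** Joins table-field to file-field pairs as "dbField=dataField", separated by ';'. */
    static String build(Map<String, String> tableToFileField) {
        StringJoiner joined = new StringJoiner(";");
        for (Map.Entry<String, String> pair : tableToFileField.entrySet()) {
            joined.add(pair.getKey() + "=" + pair.getValue());
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the pairs appear as written.
        Map<String, String> pairs = new LinkedHashMap<>();
        pairs.put("compound_name", "NAME"); // hypothetical table column and SDfile field
        pairs.put("supplier", "VENDOR");    // hypothetical table column and SDfile field
        System.out.println(build(pairs));
    }
}
```

The resulting string would be the value passed to setFieldConnections(String).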

getConnections

@Deprecated
public java.lang.String getConnections()
Deprecated. since 2.2 replaced by getFieldConnections().

Returns the specified table field - file field pairs.

Returns:
the connection string

getFieldConnections

public java.lang.String getFieldConnections()
Returns the specified table field - file field pairs.

Returns:
the connection string

setLinesToCheck

public void setLinesToCheck(int linesToCheck)
Sets the number of lines to check when detecting the file format. For SDfiles, the same number of lines is also checked for field names.
The default value is 500 lines.
Note: When an InputStream is used as the source, these lines are buffered in memory. Make sure the JVM has enough memory (-Xmx parameter) when setting this value very high.
Using a File as input is recommended where feasible, since it does not need buffering.

Parameters:
linesToCheck - the number of lines to check for file format

getLinesToCheck

public int getLinesToCheck()
Gets the number of lines to check for file format.

Returns:
the number of lines to check for file format

setProgressWriter

public void setProgressWriter(ProgressWriter pwriter)
Sets the ProgressWriter object that tracks the progress of the actual import. (Format checking and skipping are not monitored by this object.)
It can be null if no monitoring is necessary.

Parameters:
pwriter - the progress writer

getProgressWriter

public ProgressWriter getProgressWriter()
Gets the ProgressWriter object used for monitoring.

Returns:
the progress writer

setHaltOnError

public void setHaltOnError(boolean b)
Sets whether the import should stop when an error occurs.

Parameters:
b - true if halt on error

isHaltOnError

public boolean isHaltOnError()
Gets whether the import should stop when an error occurs.

Returns:
true if halt on error

setDuplicateImportAllowed

@Deprecated
public void setDuplicateImportAllowed(boolean b)
Deprecated. Since JChem 5.4 this import option is a table option; use the setDuplicateImportAllowed(int) method instead.

If set to false, molecules that already exist in the table with the same topology are not imported. This check may slow down the import.

See Also:
DatabaseProperties.setDuplicateFilteringOption(String, boolean), StructureTableOptions.duplicateFiltering

setDuplicateImportAllowed

public void setDuplicateImportAllowed(int duplicateFilteringOption)
Sets the duplicate filtering option on import. Duplicates can be rejected, allowed, or handled according to the table option.

Parameters:
duplicateFilteringOption -
  • If set to DUPLICATE_FILTERING_ON, molecules that already exist in the table with the same topology are not imported. Forces duplicate filtering ON regardless of the table setting. This check may slow down the import.
  • If set to DUPLICATE_FILTERING_OFF, duplicates are allowed. Forces duplicate filtering OFF regardless of the table setting.
  • If set to DUPLICATE_FILTERING_TABLE_OPTION, the value of the table option (StructureTableOptions.duplicateFiltering) controls the filtering of duplicates.
Warning: using a duplicate filtering option on import that differs from the table's duplicate filtering option may leave the table content inconsistent with the table option.
See Also:
UpdateHandler.DUPLICATE_FILTERING_ON, UpdateHandler.DUPLICATE_FILTERING_OFF, UpdateHandler.DUPLICATE_FILTERING_TABLE_OPTION, StructureTableOptions.duplicateFiltering
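
The way the three options interact with the table setting can be written as a small decision function. This is only an illustrative sketch under stated assumptions: the enum Choice is a hypothetical stand-in for the int constants in UpdateHandler, and the class is not JChem code.

```java
/** Hypothetical sketch (not JChem code) of how the three options resolve. */
final class DuplicateFilteringSketch {

    /** Stand-ins for the UpdateHandler constants; the real constants are ints. */
    enum Choice { ON, OFF, TABLE_OPTION }

    /**
     * Returns true when duplicates would be filtered out during import.
     * tableFiltering stands for the table's own duplicate filtering setting.
     */
    static boolean filteringActive(Choice choice, boolean tableFiltering) {
        switch (choice) {
            case ON:
                return true;            // forced on, table setting ignored
            case OFF:
                return false;           // forced off, table setting ignored
            case TABLE_OPTION:
                return tableFiltering;  // defer to the table option
            default:
                throw new IllegalArgumentException("Unknown choice: " + choice);
        }
    }

    /** True when the import deviates from the table option, the case the warning covers. */
    static boolean deviatesFromTable(Choice choice, boolean tableFiltering) {
        return filteringActive(choice, tableFiltering) != tableFiltering;
    }

    public static void main(String[] args) {
        System.out.println(filteringActive(Choice.TABLE_OPTION, true));
        System.out.println(deviatesFromTable(Choice.OFF, true));
    }
}
```

Only the two forcing options can deviate from the table setting, which is when the table content may become inconsistent with its option.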

isDuplicateImportAllowed

public boolean isDuplicateImportAllowed()
Gets whether duplicate structures are allowed.

Throws:
java.lang.IllegalArgumentException - if duplicate filtering option of the table cannot be determined.

setEmptyStructuresAllowed

public void setEmptyStructuresAllowed(boolean b)
If set to false, empty molecules are not imported.


getEmptyStructuresAllowed

public boolean getEmptyStructuresAllowed()
Gets whether empty structures are allowed.


setSetChiralFlag

public void setSetChiralFlag(boolean setChiralFlag)
Sets if chiral flag should be set to true during import.

Parameters:
setChiralFlag - if set to true, chiral flag is set to true for imported molecules. The default setting is false.
Since:
2.3

getSetChiralFlag

public boolean getSetChiralFlag()
Gets whether chiral flag is set on import.


setNameFieldInDB

public void setNameFieldInDB(java.lang.String fieldName)
Sets the DB field that will contain the structure name.

Parameters:
fieldName - the name of the DB field

getNameFieldInDB

public java.lang.String getNameFieldInDB()

isFinished

public boolean isFinished()
Returns true if importing has finished, else returns false.


getErrorMessage

public java.lang.String getErrorMessage()
If an error occurs, this method returns the error message.

Returns:
the error message

getErrorCause

public java.lang.Throwable getErrorCause()
Retrieves the Throwable caught in the run() method. WARNING: This mechanism is expected to be revised in the near future; use with extreme caution!

Returns:
the error cause or null

getStructCount

public int getStructCount()
Returns the current count of structures which were examined by the import process.


getImportedNumber

public int getImportedNumber()
Returns the number of imported molecules.


getDuplicates

public int getDuplicates()
Returns the number of molecules that were not imported, because they are duplicates.


getEmptyStructures

public int getEmptyStructures()
Returns the number of molecules that were not imported because they are empty structures.


getNote

public java.lang.String getNote()
Returns the note of the ProgressWriter.


setSkip

public void setSkip(int skip)
Sets the number of molecules to skip from the beginning of the file.


getSkip

public int getSkip()
Gets the number of molecules to skip from the beginning of the file.


getProgress

public long getProgress()
Gets the status of the importing progress.

Returns:
the position of the ProgressWriter, -1 if the object is not set (null)

run

public void run()
Starts execution as a thread. Calls init(), skip(int), and importMols(). Exceptions are caught and printed to stderr.

Specified by:
run in interface java.lang.Runnable
Overrides:
run in class java.lang.Thread
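
Because run() catches exceptions rather than propagating them, a caller has to ask the object afterwards whether something went wrong. The sketch below illustrates that general pattern with a hypothetical Thread subclass, CapturingWorker, built only on the JDK. It is not the Importer implementation, and setting the finished flag after a failure is an assumption of the sketch.

```java
/** Hypothetical worker (not JChem code) whose run() captures failures instead of throwing. */
final class CapturingWorker extends Thread {
    private final Runnable task;
    private volatile Throwable errorCause; // stays null when the task succeeds
    private volatile boolean finished;

    CapturingWorker(Runnable task) {
        this.task = task;
    }

    @Override
    public void run() {
        try {
            task.run();
        } catch (Throwable t) {
            errorCause = t;                           // keep the cause for later inspection
            System.err.println("Task failed: " + t);  // report on stderr
        } finally {
            finished = true;                          // assumption: set on success and on failure
        }
    }

    Throwable getErrorCause() {
        return errorCause;
    }

    boolean isFinished() {
        return finished;
    }

    /** Starts the task on its own thread and blocks until it ends. */
    static CapturingWorker runAndWait(Runnable task) {
        CapturingWorker worker = new CapturingWorker(task);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return worker;
    }

    public static void main(String[] args) {
        CapturingWorker worker = runAndWait(() -> {
            throw new IllegalStateException("simulated failure");
        });
        System.out.println("finished=" + worker.isFinished() + ", cause=" + worker.getErrorCause());
    }
}
```

With an Importer, the analogous checks after the thread ends would be isFinished(), getErrorMessage() and getErrorCause().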

setInfoStream

public void setInfoStream(java.io.PrintStream st)
Sets the stream where information about the import process will be written (e.g. skipped duplicates and empty structures).

Parameters:
st - the stream. The default is null (no info is written).

setOutputOptions

public void setOutputOptions(boolean printDuplicates,
                             boolean printNonDuplicates,
                             java.io.OutputStream os,
                             boolean doNotImport)
With this option one can print duplicate or non-duplicate molecules to a stream. Output is produced only if duplicate filtering is enabled.


setStoreDuplicates

public void setStoreDuplicates(boolean value)
Specifies whether the IDs of duplicate structures should be stored.


setStoreImportedIDs

public void setStoreImportedIDs(boolean value)
Specifies whether the IDs of imported structures should be stored.

Since:
JChem 3.1.7
See Also:
getImportedIDs()

getDuplicateIDs

public java.util.ArrayList<java.lang.Integer> getDuplicateIDs()
Returns the IDs (cd_id column in database table) of duplicate structures.

Returns:
the IDs as an ArrayList containing Integer objects.

getImportedIDs

public IntArray getImportedIDs()
Returns the IDs (cd_id column in database table) of imported structures.

Returns:
the IDs stored in an IntArray object
Since:
JChem 3.1.7
See Also:
setStoreImportedIDs(boolean)

importMols

public int importMols()
               throws TransferException
Imports molecules.

Returns:
the number of molecules imported
Throws:
TransferException

cancel

public void cancel()
Stops the import process.


skip

public void skip(int offset)
          throws TransferException
Skips the given number of molecules.

Parameters:
offset - the number of molecules to be skipped
Throws:
TransferException

init

public void init()
          throws TransferException
Initialization: checks the given number of lines for file format and fields. If not called explicitly, it is called automatically by skip or importMols when necessary.

Throws:
TransferException

getFieldNames

public java.util.Vector<java.lang.String> getFieldNames()
                                                 throws TransferException,
                                                        java.io.IOException
Returns field names in an SDfile. The file may come from an InputStream; import may follow without reopening the stream. Calls init() if initialization is necessary.

Returns:
a vector of String objects, the names of the SDfile fields.
Throws:
TransferException
java.io.IOException
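
To make "field names in an SDfile" concrete: in the SDfile format, each data item is introduced by a header line that starts with '>' and carries the field name in angle brackets. The following sketch collects those names from the first lines of a stream using only the JDK. The class SdFieldNameScan is hypothetical and deliberately simplified; it is not how JChem parses files.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified scan for SDfile data header lines (not JChem code). */
final class SdFieldNameScan {

    /** Returns the distinct field names found in the first linesToCheck lines, in order. */
    static List<String> fieldNames(InputStream in, int linesToCheck) {
        Set<String> names = new LinkedHashSet<>();
        try {
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            String line;
            int checked = 0;
            while (checked < linesToCheck && (line = reader.readLine()) != null) {
                checked++;
                if (!line.startsWith(">")) {
                    continue; // not a data header line
                }
                int open = line.indexOf('<');
                int close = line.indexOf('>', open + 1);
                if (open >= 0 && close > open + 1) {
                    names.add(line.substring(open + 1, close));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        // Made-up record fragment: only the data items matter for this scan.
        String sd = "M  END\n>  <NAME>\nethanol\n\n>  <VENDOR>\nacme\n\n$$$$\n";
        InputStream in = new ByteArrayInputStream(sd.getBytes(StandardCharsets.UTF_8));
        System.out.println(fieldNames(in, 500));
    }
}
```

A field that first appears beyond the checked lines is missed by such a scan, which is why the number of lines to check matters for SDfiles.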

getFieldNames

public static java.util.Vector<java.lang.String> getFieldNames(java.io.InputStream is,
                                                               int linesToCheck)
                                                        throws java.io.IOException,
                                                               MRecordParseException
Returns field names in an SDfile. NOTE: in order to return to the initial position, the InputStream has to be reopened or repositioned (BufferedInputStream).

Returns:
a vector of String objects, the names of the SDfile fields.
Throws:
java.io.IOException
MRecordParseException
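
Repositioning with a BufferedInputStream relies on the standard mark and reset methods. The sketch below shows that mechanism alone, using only the JDK; the class StreamRewindSketch and its peek method are hypothetical, and the read limit passed to mark must cover everything that is read before reset.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch (not JChem code) of rewinding a stream after inspecting it. */
final class StreamRewindSketch {

    /** Reads up to maxBytes from the stream as text, then rewinds to where it started. */
    static String peek(BufferedInputStream in, int maxBytes) {
        try {
            in.mark(maxBytes); // remember the current position
            byte[] buffer = new byte[maxBytes];
            int total = 0;
            while (total < maxBytes) {
                int n = in.read(buffer, total, maxBytes - total);
                if (n < 0) {
                    break; // end of stream
                }
                total += n;
            }
            in.reset(); // return to the marked position
            return new String(buffer, 0, total, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = ">  <NAME>\nethanol\n".getBytes(StandardCharsets.UTF_8);
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        System.out.println(peek(in, 9)); // inspect the first bytes
        System.out.println(peek(in, 9)); // the same bytes are still available
    }
}
```

In the same way, a caller could mark the stream before passing it to the static getFieldNames method and reset it afterwards, provided the read limit is large enough for the lines that were checked.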