chemaxon.formats
Class MolImporter

java.lang.Object
  extended by chemaxon.marvin.io.MDocSource
      extended by chemaxon.formats.MolImporter

public class MolImporter
extends MDocSource

Molecule file importer.

The input file format is guessed automatically or specified as an import option to the constructor. Many formats are supported, including "mol", "rgf", "sdf", "rdf", "csmol", "csrgf", "cssdf", "csrdf", "mol2", "cml", "mrv", "smiles", "cxsmiles", "pdb", "xyz", "cube". For more information on formats, see File Formats in Marvin. MolImporter can also import gzip-compressed and base64-encoded structures.
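
As a rough illustration of what a gzip-compressed, base64-encoded structure looks like on the producing side, the sketch below uses only JDK classes. The class and method names are hypothetical, and the gzip-then-base64 layering is an assumption: this page does not specify how the two encodings are combined or which prefix, if any, MolImporter expects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the ChemAxon API.
final class CompressedStructureSketch {

    // Compresses the structure text with gzip, then encodes the bytes as base64.
    static String gzipThenBase64(String structureText) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(structureText.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(buf.toByteArray());
    }

    // Reverses gzipThenBase64: base64-decodes, then gunzips back to text.
    static String decode(String encoded) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

A string produced this way could then be handed to one of the importMol(String, ...) methods, subject to whatever format options the installed Marvin version requires.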

Serialized Molecule objects can also be imported using the "chemaxon.struc.Molecule" format.

Version:
5.1.1, 08/28/2008
Author:
Peter Csizmadia, Tamas Vertse, Szilveszter Juhos

Field Summary
static int F_MOLMOVIE
          Read multi-molecule XYZ files as movies.
 
Constructor Summary
MolImporter()
          Create an empty MolImporter object.
MolImporter(java.io.File f, java.lang.String opts)
          Create a molecule importer for a file.
MolImporter(java.io.InputStream is)
          Create a molecule importer for an input stream.
MolImporter(java.io.InputStream is, java.lang.String opts)
          Create a molecule importer for an input stream.
MolImporter(java.io.InputStream is, java.lang.String opts, java.lang.String enc)
          Create a molecule importer for an input stream.
MolImporter(java.lang.String fname)
          Create a molecule importer for a file.
MolImporter(java.lang.String fname, java.lang.Object component, java.lang.String msg)
          Create a molecule importer with a progress monitor.
 
Method Summary
 void close()
          Close the underlying input stream.
 Molecule createMol()
          Creates a target molecule object for import.
 int estimateNumRecords()
          Estimates the total number of records.
 java.io.File getFile()
          Gets the file object for the input.
 java.lang.String getFileName()
          Gets the name of the input file.
 java.lang.String getFormat()
          Get the file format.
 MPropertyContainer getGlobalProperties()
          Gets the container of global properties that was retrieved earlier from the input stream.
 java.lang.String getGrabbedMoleculeString()
          Gets the last grabbed molecule string.
 int getLineCount()
          Gets the current line number.
 int getOptionFlags()
          Gets options.
 java.lang.String getOptions()
          Gets the import options.
 boolean getQueryMode()
          Gets query mode.
 int getRecordCount()
          Gets the current record number.
 int getRecordCountMax()
          Gets the total number of records read.
static MDocument importDoc(byte[] b)
          Reads a document from a byte array.
static MDocument importDoc(byte[] b, java.lang.String opts, java.lang.String enc)
          Reads a document from a byte array.
static Molecule importMol(byte[] b)
          Read a molecule from a byte array.
static boolean importMol(byte[] b, Molecule mol)
          Read a molecule from a byte array.
static Molecule importMol(byte[] b, java.lang.String opts, java.lang.String enc)
          Read a molecule from a byte array.
static boolean importMol(byte[] b, java.lang.String opts, java.lang.String enc, Molecule mol)
          Read a molecule from a byte array.
static Molecule importMol(java.lang.String s)
          Read a molecule from a string.
static boolean importMol(java.lang.String s, Molecule mol)
          Read a molecule from a string.
static Molecule importMol(java.lang.String s, java.lang.String opts)
          Read a molecule from a string.
static Molecule importMol(java.lang.String s, java.lang.String opts, java.lang.String enc)
          Read a molecule from a string.
static boolean importMol(java.lang.String s, java.lang.String opts, java.lang.String enc, Molecule mol)
          Read a molecule from a string.
 boolean isEndReached()
          Tests whether the end of input is already reached.
 boolean isGrabbingEnabled()
          Tests whether molecule file content grabbing is enabled.
 boolean isMultiSet()
          Are the imported molecules merged into one multi-set molecule?
 boolean isRewindable()
          Tests whether rewinding (seeking backwards) is possible in the underlying input stream.
 MDocument nextDoc()
          Reads the next document.
 Molecule read()
          Read the next molecule.
 boolean read(Molecule mol)
          Read the next molecule.
 MDocument readDoc(MDocument doc, Molecule buf)
          Read the next document.
 Molecule readMol(Molecule mol)
          Read the next molecule.
 java.lang.String readRecordAsText()
          Reads the next molecule in text format without creating a Molecule object.
 void seekRecord(int k, MProgressMonitor pmon)
          Seek the specified record.
protected  void seekVisitedRecord(int k)
          Seeks an already visited position in case of rewindable input.
 void setFileName(java.lang.String fname)
          Sets the name of the input file and (re)initializes the molecule importer.
 void setGrabbingEnabled(boolean v)
          Enables or disables molecule file content grabbing.
 void setOptionFlags(int f)
          Sets options.
 void setOptions(java.lang.String opts)
          Sets the import options.
 void setQueryMode(boolean q)
          Sets query mode.
 boolean skipRecord()
          Skips the next molecule or document instead of reading it into memory.
 boolean skipToNext()
          Deprecated. As of Marvin 5.0, the record reading/molecule import separation makes this method unusable.
 long tell()
          Returns the current file offset.
 
Methods inherited from class chemaxon.marvin.io.MDocSource
getDocLabel, getMoleculeIterator, seekForward, seekRecordAtFraction, skipRecords
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

F_MOLMOVIE

public static final int F_MOLMOVIE
Read multi-molecule XYZ files as movies.

Since:
Marvin 4.1, 04/18/2006
See Also:
Constant Field Values

Constructor Detail

MolImporter

public MolImporter()
Create an empty MolImporter object. Use the "setFileName" method to specify the input file.

See Also:
setFileName(java.lang.String)

MolImporter

public MolImporter(java.io.InputStream is)
            throws java.io.IOException,
                   MolFormatException
Create a molecule importer for an input stream. Begins reading the input stream and determines the file format.

Parameters:
is - the input stream to read
Throws:
java.io.IOException - If an I/O error occurred while determining the file format.
MolFormatException - If the molecule file is in a format that cannot be read
java.nio.charset.IllegalCharsetNameException - if illegal encoding is used
java.nio.charset.UnsupportedCharsetException - if unsupported encoding is used

MolImporter

public MolImporter(java.io.InputStream is,
                   java.lang.String opts)
            throws java.io.IOException,
                   MolFormatException
Create a molecule importer for an input stream. Begins reading the input stream and determines the file format. If the options start with the substring "MULTISET" or "MULTISET,", then all the molecules in the stream are merged into one molecule object containing multiple atom sets. The input character encoding can also be set in "enc{encoding}" form. Other parts of the option string are passed to the import module.

Parameters:
is - the input stream to read
opts - the file format and/or options separated by a colon; use null for automatic format recognition and default options
Throws:
java.io.IOException - If an I/O error occurred while determining the file format.
MolFormatException - If the molecule file is in a format that cannot be read
java.nio.charset.IllegalCharsetNameException - if illegal encoding is used
java.nio.charset.UnsupportedCharsetException - if unsupported encoding is used

MolImporter

public MolImporter(java.io.InputStream is,
                   java.lang.String opts,
                   java.lang.String enc)
            throws java.io.IOException,
                   MolFormatException
Create a molecule importer for an input stream. Begins reading the input stream and determines the file format. If the options start with the substring "MULTISET" or "MULTISET,", then all the molecules in the stream are merged into one molecule object containing multiple atom sets. Other parts of the option string are passed to the import module. The character encoding can also be specified.

Parameters:
is - the input stream to read
opts - the file format and/or options separated by a colon; use null for automatic format recognition and default options
enc - charset name or null
Throws:
java.io.IOException - If an I/O error occurred while determining the file format.
MolFormatException - If the molecule file is in a format that cannot be read
java.nio.charset.IllegalCharsetNameException - if illegal encoding is used
java.nio.charset.UnsupportedCharsetException - if unsupported encoding is used
Since:
Marvin 3.5.5, 01/02/2006

MolImporter

public MolImporter(java.io.File f,
                   java.lang.String opts)
            throws java.io.IOException,
                   MolFormatException
Create a molecule importer for a file. Begins reading the input stream and determines the file format. If the option string starts with "MULTISET" or "MULTISET,", then all the molecules in the stream are merged into one molecule object containing multiple atom sets. The input character encoding can also be set in "enc{encoding}" form. Other parts of the option string are passed to the import module.

Parameters:
f - the file to read
opts - the file format and/or options separated by a colon; use null for automatic format recognition and default options
Throws:
java.io.IOException - If an I/O error occurred while determining the file format.
MolFormatException - If the molecule file is in a format that cannot be read
java.nio.charset.IllegalCharsetNameException - if illegal encoding is used
java.nio.charset.UnsupportedCharsetException - if unsupported encoding is used
See Also:
tell(), close()

MolImporter

public MolImporter(java.lang.String fname)
            throws java.io.IOException,
                   MolFormatException
Create a molecule importer for a file. Begins reading the input stream and determines the file format. The filename string can contain options in the "file{options}" form. If the option string starts with "MULTISET" or "MULTISET,", then all the molecules in the stream are merged into one molecule object containing multiple atom sets. The input character encoding can also be set in "enc{encoding}" form. Other parts of the option string are passed to the import module.

Parameters:
fname - name of the file to read
Throws:
java.io.IOException - If an I/O error occurred while determining the file format.
MolFormatException - If the molecule file is in a format that cannot be read
java.nio.charset.IllegalCharsetNameException - if illegal encoding is used
java.nio.charset.UnsupportedCharsetException - if unsupported encoding is used
See Also:
tell(), close()
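
The "file{options}" form described above can be pictured with a small string-splitting sketch. It is only an illustration of the documented string shape, not ChemAxon's parsing code: `FileSpecSketch`, its fields, and the rule of splitting at the first '{' and the final '}' are assumptions, and the example strings in the usage note are illustrative.

```java
// Hypothetical illustration of the "file{options}" string shape.
final class FileSpecSketch {
    final String fileName;
    final String options; // null when the spec carries no {options} part

    private FileSpecSketch(String fileName, String options) {
        this.fileName = fileName;
        this.options = options;
    }

    // Splits at the first '{' and the trailing '}' so that a nested
    // "enc{encoding}" entry stays inside the options text.
    static FileSpecSketch parse(String spec) {
        int open = spec.indexOf('{');
        if (open < 0 || !spec.endsWith("}")) {
            return new FileSpecSketch(spec, null);
        }
        String opts = spec.substring(open + 1, spec.length() - 1);
        return new FileSpecSketch(spec.substring(0, open), opts);
    }

    // Mirrors the documented rule: options starting with "MULTISET"
    // request merging into one multi-set molecule.
    boolean startsWithMultiSet() {
        return options != null && options.startsWith("MULTISET");
    }
}
```

For instance, parsing "mols.sdf{MULTISET}" with this sketch yields the file name "mols.sdf" and the options "MULTISET", while "input.smi{enc{UTF-8}}" yields the options "enc{UTF-8}".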

MolImporter

public MolImporter(java.lang.String fname,
                   java.lang.Object component,
                   java.lang.String msg)
            throws java.io.IOException,
                   MolFormatException
Create a molecule importer with a progress monitor. Begins reading the input stream and determines the file format. The filename string can contain options in the "file{options}" form. If the option string starts with "MULTISET" or "MULTISET,", then all the molecules in the stream are merged into one molecule object containing multiple atom sets. The input character encoding can also be set in "enc{encoding}" form. Other parts of the option string are passed to the import module.

Parameters:
fname - name of the file to read
component - the parent component
msg - displayed message, where %p is replaced by the file path
Throws:
java.io.IOException - If an I/O error occurred while determining the file format.
MolFormatException - If the molecule file is in a format that cannot be read
java.nio.charset.IllegalCharsetNameException - if illegal encoding is used
java.nio.charset.UnsupportedCharsetException - if unsupported encoding is used
See Also:
tell(), close()

Method Detail

getFileName

public java.lang.String getFileName()
Gets the name of the input file.

Returns:
the name of the input file

setFileName

public void setFileName(java.lang.String fname)
                 throws java.io.IOException,
                        MolFormatException
Sets the name of the input file and (re)initializes the molecule importer. Starts reading the input stream and determines the file format. The filename string can contain options in the "file{options}" format. If options are specified in the above format, the import options will be overwritten by the new values. If the options string starts with "MULTISET" or "MULTISET,", then all the molecules in the stream are merged into one molecule containing multiple atom sets. The input character encoding can also be set in "enc{encoding}" form. Other parts of the option string are passed to the import module.

Parameters:
fname - name of the file to read
Throws:
java.io.IOException - If an I/O error occurred while determining the file format.
MolFormatException - If the molecule file is in a format that cannot be read
See Also:
getOptions(), setOptions(java.lang.String)

getFile

public java.io.File getFile()
Gets the file object for the input.

Returns:
the File or null (if the input is not a File)

getOptions

public java.lang.String getOptions()
Gets the import options.

Returns:
the options

setOptions

public void setOptions(java.lang.String opts)
Sets the import options.

Parameters:
opts - options passed to the import module or null

isGrabbingEnabled

public boolean isGrabbingEnabled()
Tests whether molecule file content grabbing is enabled.

Returns:
true if enabled, false if disabled
Since:
4.0, 01/05/2005

setGrabbingEnabled

public void setGrabbingEnabled(boolean v)
Enables or disables molecule file content grabbing.

Parameters:
v - true enables, false disables it
Since:
4.0, 01/05/2005

getGrabbedMoleculeString

public java.lang.String getGrabbedMoleculeString()
Gets the last grabbed molecule string.

Returns:
the molecule as a string
Since:
4.0, 01/05/2005

getOptionFlags

public int getOptionFlags()
Gets options.

Returns:
the options
Since:
Marvin 4.1, 04/18/2006
See Also:
F_MOLMOVIE

setOptionFlags

public void setOptionFlags(int f)
Sets options.

Parameters:
f - the options
Since:
Marvin 4.1, 04/18/2006
See Also:
F_MOLMOVIE

isMultiSet

public boolean isMultiSet()
Are the imported molecules merged into one multi-set molecule?

Returns:
true if the input is a multi-set molecule

getQueryMode

public boolean getQueryMode()
Gets query mode. SMILES strings are imported as SMARTS if query mode is set.

Returns:
query mode
Since:
Marvin 3.3, 11/14/2003

setQueryMode

public void setQueryMode(boolean q)
Sets query mode. SMILES strings are imported as SMARTS if query mode is set.

Parameters:
q - query mode
Since:
Marvin 3.3, 11/14/2003

read

public Molecule read()
              throws java.io.IOException
Read the next molecule.

Returns:
the next molecule, or null at end of file
Throws:
java.io.IOException - If an I/O error occurred
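
The read-until-null contract of read(), paired with close(), suggests a loop of the following shape. Because the ChemAxon classes are not assumed to be on the classpath here, the sketch uses a hypothetical `RecordReader` interface as a stand-in for the read()/close() pair; with a real MolImporter the same loop would call its read() and close() methods directly.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in types; only the loop shape reflects the documentation.
final class ReadLoopSketch {

    // Models the documented contract: read() returns null at end of input.
    interface RecordReader<T> extends Closeable {
        T read() throws IOException;
    }

    // Drains the reader and always closes it, even if read() throws.
    static <T> List<T> readAll(RecordReader<T> reader) throws IOException {
        List<T> out = new ArrayList<>();
        try {
            T item;
            while ((item = reader.read()) != null) {
                out.add(item);
            }
        } finally {
            reader.close();
        }
        return out;
    }

    // In-memory reader used only to exercise the loop.
    static RecordReader<String> over(List<String> records) {
        final Iterator<String> it = records.iterator();
        return new RecordReader<String>() {
            @Override public String read() {
                return it.hasNext() ? it.next() : null;
            }
            @Override public void close() { }
        };
    }
}
```

Closing in a finally block follows the note on close() that the underlying stream should be released when the importer opened it itself.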

createMol

public Molecule createMol()
                   throws java.io.IOException
Creates a target molecule object for import.

Returns:
new target molecule object
Throws:
java.io.IOException
Since:
Marvin 3.4, 05/08/2004

nextDoc

public MDocument nextDoc()
                  throws java.io.IOException
Reads the next document.

Specified by:
nextDoc in class MDocSource
Returns:
the next document or null at end of file
Throws:
java.io.IOException - If an I/O error occurred
Since:
Marvin 4.1, 04/14/2006

readDoc

public MDocument readDoc(MDocument doc,
                         Molecule buf)
                  throws MolFormatException,
                         java.io.IOException
Read the next document. The target document is cleared before reading.

Parameters:
doc - target document object or null
buf - target molecule object or null
Returns:
the next document or null at end of file
Throws:
java.io.IOException - If an I/O error occurred
MolFormatException
See Also:
F_MOLMOVIE

readMol

public Molecule readMol(Molecule mol)
                 throws MolFormatException,
                        java.io.IOException
Read the next molecule. All the nodes, edges, and properties are removed from mol before reading.

Parameters:
mol - target molecule object
Returns:
the molecule if success, null at end of file
Throws:
java.io.IOException - If an I/O error occurred
MolFormatException

read

public boolean read(Molecule mol)
             throws java.io.IOException
Read the next molecule. All the nodes, edges, and properties are removed from mol before reading.

Parameters:
mol - target molecule object
Returns:
true after success, false at end of file
Throws:
java.io.IOException - If an I/O error occurred

skipRecord

public boolean skipRecord()
                   throws MolFormatException,
                          java.io.IOException
Skips the next molecule or document instead of reading it into memory.

Specified by:
skipRecord in class MDocSource
Returns:
true if the end of molecule is found, false if there is no chance to continue
Throws:
java.io.IOException - if a read error occurred
MolFormatException
Since:
Marvin 4.1, 04/20/2006

skipToNext

public boolean skipToNext()
Deprecated. As of Marvin 5.0, the record reading/molecule import separation makes this method unusable.


readRecordAsText

public java.lang.String readRecordAsText()
                                  throws MRecordParseException,
                                         MolExportException,
                                         java.io.IOException
Reads the next molecule in text format without creating a Molecule object.

Returns:
the grabbed record in its original format or null at end of file
Throws:
MRecordParseException - If the record could not be parsed
MolExportException - if binary data cannot be exported to MRV format text
java.io.IOException - if a read error occurred
Since:
Marvin 5.0, 11/13/2006

isRewindable

public boolean isRewindable()
Tests whether rewinding (seeking backwards) is possible in the underlying input stream.

Specified by:
isRewindable in class MDocSource
Returns:
true if rewinding is possible, false otherwise
Since:
Marvin 4.1, 04/20/2006
See Also:
seekRecord(int, MProgressMonitor)

seekRecord

public void seekRecord(int k,
                       MProgressMonitor pmon)
                throws java.io.EOFException,
                       java.io.IOException
Seek the specified record. Backward seeking (rewinding) in the stream is only possible if the underlying input stream is seekable. Forward seeking is always possible. Seeking terminates before reaching the specified position if the user cancels the progress dialog.

Specified by:
seekRecord in class MDocSource
Parameters:
k - position
pmon - progress monitor or null
Throws:
java.io.EOFException - if end of file reached while trying to seek
java.io.IOException - if a read error occurred
Since:
Marvin 4.1, 04/19/2006
See Also:
isRewindable()

seekVisitedRecord

protected void seekVisitedRecord(int k)
                          throws java.io.IOException
Seeks an already visited position in case of rewindable input.

Specified by:
seekVisitedRecord in class MDocSource
Parameters:
k - the record index
Throws:
java.io.IOException - if a read error occurred
Since:
Marvin 4.1, 06/28/2006

isEndReached

public boolean isEndReached()
Tests whether the end of input is already reached.

Specified by:
isEndReached in class MDocSource
Returns:
true if the end was reached, false otherwise
Since:
Marvin 4.1, 06/18/2006

estimateNumRecords

public int estimateNumRecords()
Estimates the total number of records. If the end of file is already reached, then it returns the exact value. Otherwise, in case of a file with known length, it extrapolates from the last read record index and the value of the file pointer at the last read position. If the input is a stream with unknown total length, then it returns two times the current highest record number.

Specified by:
estimateNumRecords in class MDocSource
Returns:
estimated number of records or -1 at the beginning of file
Since:
Marvin 4.1, 04/18/2006
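
The estimation rule described above can be read as the following sketch. It is an interpretation of the prose, not ChemAxon's code: `RecordEstimateSketch`, its parameters, the linear extrapolation formula, and the use of a non-positive length to mean "unknown" are all assumptions.

```java
// Hypothetical illustration of the documented estimation rule.
final class RecordEstimateSketch {

    /**
     * @param highestRecord highest record number read so far
     * @param offset        file pointer at the last read position
     * @param totalLength   total input length, or a value <= 0 if unknown
     * @param endReached    whether the end of input was already reached
     */
    static int estimate(int highestRecord, long offset,
                        long totalLength, boolean endReached) {
        if (endReached) {
            return highestRecord;            // exact value
        }
        if (highestRecord == 0) {
            return -1;                       // nothing read yet
        }
        if (totalLength > 0 && offset > 0) {
            // Assume records are roughly equal in size and scale linearly.
            double scaled = (double) highestRecord * totalLength / offset;
            return (int) Math.min(Integer.MAX_VALUE, Math.round(scaled));
        }
        return 2 * highestRecord;            // stream of unknown length
    }
}
```

Under these assumptions, 10 records read in the first 1000 bytes of a 5000-byte file extrapolate to 50 records, while the same 10 records from a stream of unknown length give 20.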

tell

public long tell()
          throws java.io.IOException
Returns the current file offset.

Returns:
the file pointer
Throws:
java.io.IOException - if the position cannot be determined

getLineCount

public int getLineCount()
Gets the current line number.

Returns:
the line number

getRecordCount

public int getRecordCount()
Gets the current record number.

Specified by:
getRecordCount in class MDocSource
Returns:
the record number
Since:
Marvin 4.1, 04/18/2006

getRecordCountMax

public int getRecordCountMax()
Gets the total number of records read.

Specified by:
getRecordCountMax in class MDocSource
Returns:
the number of records
Since:
Marvin 4.1, 04/18/2006

close

public void close()
           throws java.io.IOException
Close the underlying input stream. Needed only if the input file name is set in the constructor.

Throws:
java.io.IOException - If an I/O error has occurred.

getFormat

public java.lang.String getFormat()
Get the file format.

Returns:
the format

importMol

public static Molecule importMol(byte[] b)
                          throws MolFormatException
Read a molecule from a byte array. If the array contains multiple molecules (for example, it is an SDfile), only the first one is read.

Parameters:
b - the molecule file contents
Returns:
the molecule
Throws:
MolFormatException - If the molecule file is in a format that cannot be read

importMol

public static Molecule importMol(byte[] b,
                                 java.lang.String opts,
                                 java.lang.String enc)
                          throws MolFormatException
Read a molecule from a byte array. If the array contains multiple molecules (for example, it is an SDfile), only the first one is read.

Parameters:
b - the molecule file contents
opts - the file format and/or options separated by a colon; use null for automatic format recognition and default options
enc - encoding or null
Returns:
the molecule
Throws:
MolFormatException - If the molecule file is in a format that cannot be read
Since:
Marvin 5.0, 12/27/2007

importMol

public static boolean importMol(byte[] b,
                                Molecule mol)
                         throws MolFormatException
Read a molecule from a byte array. If the array contains multiple molecules (for example, it is an SDfile), only the first one is read.

Parameters:
b - the molecule file contents
mol - target molecule object
Returns:
true in case of successful reading, false if no more molecules
Throws:
MolFormatException - If the molecule file is in a format that cannot be read

importMol

public static boolean importMol(byte[] b,
                                java.lang.String opts,
                                java.lang.String enc,
                                Molecule mol)
                         throws MolFormatException
Read a molecule from a byte array. If the array contains multiple molecules (for example, it is an SDfile), only the first one is read.

Parameters:
b - the molecule file contents
opts - the file format and/or options separated by a colon; use null for automatic format recognition and default options
enc - encoding or null
mol - target molecule object
Returns:
true in case of successful reading, false if no more molecules
Throws:
MolFormatException - If the molecule file is in a format that cannot be read
Since:
Marvin 5.0, 12/27/2007

importDoc

public static MDocument importDoc(byte[] b)
                           throws MolFormatException
Reads a document from a byte array. If the array contains multiple molecules (for example, it is an SDfile), only the first one is read.

Parameters:
b - the file contents
Returns:
the document or null if no document found in input
Throws:
MolFormatException - If the molecule file is in a format that cannot be read
Since:
Marvin 4.1.8, 04/20/2007

importDoc

public static MDocument importDoc(byte[] b,
                                  java.lang.String opts,
                                  java.lang.String enc)
                           throws MolFormatException
Reads a document from a byte array. If the array contains multiple molecules (for example, it is an SDfile), only the first one is read.

Parameters:
b - the file contents
opts - the file format and/or options separated by a colon; use null for automatic format recognition and default options
enc - encoding or null
Returns:
the document or null if no document found in input
Throws:
MolFormatException - If the molecule file is in a format that cannot be read
Since:
Marvin 5.0, 12/27/2007

importMol

public static Molecule importMol(java.lang.String s)
                          throws MolFormatException
Read a molecule from a string. If the string contains multiple molecules (for example, it is an SDfile), only the first one is read. If the format is known, importMol(String,String) is faster because it skips format recognition.

Parameters:
s - the molecule file contents
Returns:
the molecule
Throws:
MolFormatException - If the molecule file is in a format that cannot be read

importMol

public static Molecule importMol(java.lang.String s,
                                 java.lang.String opts)
                          throws MolFormatException
Read a molecule from a string. If the string contains multiple molecules (for example, it is an SDfile), only the first one is read.

Parameters:
s - the molecule file contents
opts - the file format and/or options separated by a colon; use null for automatic format recognition and default options
Returns:
the molecule
Throws:
MolFormatException - If the molecule file is in a format that cannot be read

importMol

public static Molecule importMol(java.lang.String s,
                                 java.lang.String opts,
                                 java.lang.String enc)
                          throws MolFormatException
Read a molecule from a string. If the string contains multiple molecules (for example, it is an SDfile), only the first one is read.

Parameters:
s - the molecule file contents
opts - the file format and/or options separated by a colon; use null for automatic format recognition and default options
enc - encoding or null
Returns:
the molecule
Throws:
MolFormatException - If the molecule file is in a format that cannot be read
Since:
Marvin 5.0, 12/27/2007

importMol

public static boolean importMol(java.lang.String s,
                                Molecule mol)
                         throws MolFormatException
Read a molecule from a string. If the string contains multiple molecules (for example, it is an SDfile), only the first one is read.

Parameters:
s - the file contents
mol - target molecule object
Returns:
true in case of successful reading, false if no more molecules
Throws:
MolFormatException - If the molecule file is in a format that cannot be read

importMol

public static boolean importMol(java.lang.String s,
                                java.lang.String opts,
                                java.lang.String enc,
                                Molecule mol)
                         throws MolFormatException
Read a molecule from a string. If the string contains multiple molecules (for example, it is an SDfile), only the first one is read.

Parameters:
s - the file contents
opts - the file format and/or options separated by a colon; use null for automatic format recognition and default options
enc - encoding or null
mol - target molecule object
Returns:
true in case of successful reading, false if no more molecules
Throws:
MolFormatException - If the molecule file is in a format that cannot be read

getGlobalProperties

public MPropertyContainer getGlobalProperties()
Gets the container of global properties that was retrieved earlier from the input stream. Only MRV import supports global properties. They are read when the record importer is initialized.

Returns:
global properties in a container or null.
Since:
Marvin 5.0 06/05/2007