chemaxon.formats
Class MFileFormatUtil

java.lang.Object
  extended by chemaxon.formats.MFileFormatUtil

public class MFileFormatUtil
extends java.lang.Object

File format related utility functions.

Since:
Marvin 4.1, 12/15/2005
Version:
5.3, 07/06/2009
Author:
Peter Csizmadia, Szilard Dorant, Szilveszter Juhos

Field Summary
static int MOLMOVIE
          Read multi-molecule files as movies.
static int MULTISET
          The multi-molecule file really contains multiple atom sets of one molecule.
static int NOMOLMOVIE
          Do not read multi-molecule XYZ files as movies.
 
Constructor Summary
MFileFormatUtil()
           
 
Method Summary
static boolean canBeAbbrevgroup(java.lang.String line)
          Deprecated. as of Marvin 5.0, AbbrevGroupRecognizer.testLine(String) must be used instead
static boolean canBeJTF(java.lang.String line)
          Deprecated. as of Marvin 5.0, JTFRecognizer.canBeJTFHeader(String) must be used instead
static boolean canBePDBRecord(java.lang.String recName)
          Deprecated. as of Marvin 5.0, PDBRecognizer.testRecord(String) must be used instead
static java.lang.String[] convertToSmilingFormat(Molecule m)
          Tries to convert a molecule to a SMILES related format.
static java.lang.String[] convertToSmilingFormat(MProp p)
          Tries to convert a property to text with a SMILES related format argument.
static MolExportModule createExportModule(java.lang.String fmt)
          Creates an export module for the specified format.
static MRecordReader createRecordReader(java.io.InputStream is, java.lang.String opts)
          Creates a record reader for an input stream.
static MRecordReader createRecordReader(java.io.InputStream is, java.lang.String opts, java.lang.String enc, java.lang.String path)
          Creates a record reader for an input stream.
static MFileFormat[] findFormats(java.lang.String fmt, long flags, long mask)
          Gets a list of formats.
static java.lang.String[] getEncodingFromOptions(java.lang.String fmtopts)
          Gets the encoding that was explicitly given as an import option.
static java.lang.String getFileExtensionLC(java.io.File f)
          Gets the file extension in lower case.
static java.lang.String getFileExtensionLC(java.lang.String fname)
          Gets the file extension in lower case.
static MFileFormat getFormat(java.lang.String fmt)
          Gets the file format descriptor for the specified codename.
static java.lang.String[] getJTFFields(java.lang.String line)
          Gets fields delimited with {space} {tab} {;} {:} or {,}.
static java.lang.String getKnownExtension(java.lang.String fname)
          Returns the file extension if it is a known extension.
static java.lang.String[] getMolfileExtensions()
          Gets the array of known molecule file extensions.
static java.lang.String[] getMolfileFormats()
          Gets the array of known molecule file formats.
static java.lang.String getMostLikelyMolFormat(java.lang.String fname)
          Gets the most likely molecule file format from the file name extension.
static java.lang.String getUnguessableFormat(java.lang.String fname)
          Gets the file format from the file name extension for formats that are not guessable from the file content.
static java.lang.String guessPeptideFormat(java.lang.String header)
          Deprecated. as of Marvin 5.0, PeptideRecognizer.guessPeptideFormat(String) must be used instead
static boolean isCubeLine(java.lang.String line, int count)
          Deprecated. as of Marvin 5.0, CubeRecognizer.isCubeLine(String, int) must be used instead
static boolean isOutputCleanable(java.lang.String fmt)
          Tests whether the specified output format is cleanable.
static boolean isSubFormatOf(java.lang.String f, java.lang.String other)
          Tests whether a format is a sub-format of another format.
static boolean isURLOrFileName(java.lang.String s)
          Tests whether the specified string is a URL (absolute or relative) or a file name.
static int preprocessFormatAndOptions(java.lang.String[] fmtopts)
          Parses options like "MULTISET", "MOLMOVIE" or "NOMOLMOVIE".
static java.lang.String recognizeOneLineFormat(java.lang.String s)
          Recognizes a one-line string as CxSMILES, CxSMARTS, AbbrevGroup, Peptide or IUPAC name.
static void registerFormat(MFileFormat mff)
          Registers a user defined file format.
static java.lang.String[] splitFileAndOptions(java.lang.String arg)
          Parses "file{options}" strings used in molecule file import.
static java.lang.String[] splitFormatAndOptions(java.lang.String opts)
          Parses "format:options" strings used in molecule file import and export.
static void testEncoding(java.lang.String enc)
          Tests whether the given charset name is supported by this JVM.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MULTISET

public static final int MULTISET
The multi-molecule file really contains multiple atom sets of one molecule.

See Also:
Constant Field Values

MOLMOVIE

public static final int MOLMOVIE
Read multi-molecule files as movies.

Since:
Marvin 5.2, 02/12/2009
See Also:
Constant Field Values

NOMOLMOVIE

public static final int NOMOLMOVIE
Do not read multi-molecule XYZ files as movies.

Since:
Marvin 5.2, 02/12/2009
See Also:
Constant Field Values
Constructor Detail

MFileFormatUtil

public MFileFormatUtil()
Method Detail

isURLOrFileName

public static boolean isURLOrFileName(java.lang.String s)
Tests whether the specified string is a URL (absolute or relative) or a file name.

Parameters:
s - the string
Returns:
true if it is a URL or file name, false otherwise

isSubFormatOf

public static boolean isSubFormatOf(java.lang.String f,
                                    java.lang.String other)
Tests whether a format is a sub-format of another format.

Parameters:
f - the format codename
other - the other format
Returns:
true if f is a sub-format (format variant) of other, false otherwise
Since:
Marvin 4.1, 04/07/2006

recognizeOneLineFormat

public static java.lang.String recognizeOneLineFormat(java.lang.String s)
Recognizes a one-line string as CxSMILES, CxSMARTS, AbbrevGroup, Peptide or IUPAC name.

Parameters:
s - the input string
Returns:
the most probable format or null
Since:
Marvin 4.1, 04/06/2006

canBeAbbrevgroup

@Deprecated
public static boolean canBeAbbrevgroup(java.lang.String line)
Deprecated. as of Marvin 5.0, AbbrevGroupRecognizer.testLine(String) must be used instead


canBeJTF

@Deprecated
public static boolean canBeJTF(java.lang.String line)
Deprecated. as of Marvin 5.0, JTFRecognizer.canBeJTFHeader(String) must be used instead


canBePDBRecord

@Deprecated
public static boolean canBePDBRecord(java.lang.String recName)
Deprecated. as of Marvin 5.0, PDBRecognizer.testRecord(String) must be used instead


isCubeLine

@Deprecated
public static boolean isCubeLine(java.lang.String line,
                                            int count)
Deprecated. as of Marvin 5.0, CubeRecognizer.isCubeLine(String, int) must be used instead


getJTFFields

public static java.lang.String[] getJTFFields(java.lang.String line)
Gets fields delimited with {space} {tab} {;} {:} or {,}. Fields may be enclosed in double quotes ("") or single quotes (''). The two quote styles can be mixed within a line, but the opening and closing quotes of a single field must match. Example of a valid line: "23";'345.45';"asdf asdf";;'CCC1CC1'

Returns:
the contents of the fields.
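
The quoting and delimiter rules above can be illustrated with a small stand-alone tokenizer. This is only a sketch, not the library's implementation: the class name JtfFieldsSketch and the method fields are hypothetical, and it assumes that every delimiter outside quotes ends a field (so two consecutive delimiters produce an empty field) and that unterminated quotes are not reported.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the documented field rules; not Marvin code.
final class JtfFieldsSketch {
    private static final String DELIMITERS = " \t;:,";

    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        char quote = 0; // 0 means "not inside a quoted field"
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quote != 0) {
                // Inside quotes: only the matching quote character closes the field.
                if (c == quote) {
                    quote = 0;
                } else {
                    cur.append(c);
                }
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (DELIMITERS.indexOf(c) >= 0) {
                // Assumption: each unquoted delimiter terminates one field.
                out.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        out.add(cur.toString());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields("\"23\";'345.45';\"asdf asdf\";;'CCC1CC1'"));
    }
}
```

Under these assumptions the sample line from the description splits into five fields, the fourth of which is empty.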

splitFileAndOptions

public static java.lang.String[] splitFileAndOptions(java.lang.String arg)
Parses "file{options}" strings used in molecule file import.

Parameters:
arg - string containing the filename and the options (if any)
Returns:
a two-element array containing the filename and the options.
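
As a rough illustration of the "file{options}" syntax, the sketch below splits such a string with plain string operations. It is not the library's code: the class FileOptionsSketch is hypothetical, and returning null for the options when no braces are present is an assumption, since the description does not say what the second element holds in that case.

```java
import java.util.Arrays;

// Hypothetical illustration of the "file{options}" syntax; not Marvin code.
final class FileOptionsSketch {
    static String[] split(String arg) {
        int open = arg.lastIndexOf('{');
        if (open > 0 && arg.endsWith("}")) {
            String file = arg.substring(0, open);
            String options = arg.substring(open + 1, arg.length() - 1);
            return new String[] { file, options };
        }
        // Assumption: no options are represented as null.
        return new String[] { arg, null };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("mols.xyz{MULTISET}")));
        System.out.println(Arrays.toString(split("mols.xyz")));
    }
}
```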

splitFormatAndOptions

public static java.lang.String[] splitFormatAndOptions(java.lang.String opts)
Parses "format:options" strings used in molecule file import and export. Examples:
 splitFormatAndOptions("xyz:f1.4") returns {"xyz", "f1.4"}
 splitFormatAndOptions("f1.4") returns {null, "f1.4"}
 splitFormatAndOptions("xyz:") returns {"xyz", ""}
 splitFormatAndOptions("gzip:xyz:f1.4") returns {"gzip", "xyz:f1.4"}
 
The colon can be omitted in the case of Marvin's built-in input formats. Example:
 splitFormatAndOptions("xyz") returns { "xyz", ""}
 

Parameters:
opts - string containing the format and the options
Returns:
an array containing the format(s) and the options.
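
The documented examples can be reproduced with a few lines of stand-alone Java. The sketch below is not the library's implementation and only mirrors the five examples listed above. The class FormatOptionsSketch and its KNOWN_FORMATS set are hypothetical stand-ins for Marvin's registry of built-in formats.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of "format:options" splitting; not Marvin code.
final class FormatOptionsSketch {
    // Assumption: a tiny stand-in for the built-in format codenames.
    private static final Set<String> KNOWN_FORMATS =
            new HashSet<>(Arrays.asList("gzip", "xyz", "mol", "sdf", "smiles"));

    static String[] split(String opts) {
        int colon = opts.indexOf(':');
        if (colon >= 0) {
            // Everything before the first colon is the format, the rest is options.
            return new String[] { opts.substring(0, colon), opts.substring(colon + 1) };
        }
        if (KNOWN_FORMATS.contains(opts)) {
            // A bare built-in format name: the colon may be omitted.
            return new String[] { opts, "" };
        }
        // Otherwise the whole string is treated as options.
        return new String[] { null, opts };
    }

    public static void main(String[] args) {
        for (String s : new String[] { "xyz:f1.4", "f1.4", "xyz:", "gzip:xyz:f1.4", "xyz" }) {
            System.out.println(s + " -> " + Arrays.toString(split(s)));
        }
    }
}
```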

preprocessFormatAndOptions

public static int preprocessFormatAndOptions(java.lang.String[] fmtopts)
Parses options like "MULTISET", "MOLMOVIE" or "NOMOLMOVIE". Example:
 String[] fmtopts = splitFormatAndOptions("gzip:xyz:MULTISET,f1.4");
 // fmtopts == {"gzip", "xyz:MULTISET,f1.4"}
 int result = preprocessFormatAndOptions(fmtopts);
 // fmtopts == {"gzip", "xyz:f1.4"}, result == MULTISET
 

Parameters:
fmtopts - two-element array containing the format and the options
Returns:
flags corresponding to the options
See Also:
splitFormatAndOptions(java.lang.String), MULTISET, MOLMOVIE, NOMOLMOVIE
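
The in-place rewriting of the options string shown in the example can be sketched without the library. Everything below is illustrative: the class OptionFlagsSketch is hypothetical, the numeric flag values are placeholders (the real ones are listed under Constant Field Values), and the sketch assumes the flag keywords appear as comma-separated tokens after the last colon.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of flag extraction from options; not Marvin code.
final class OptionFlagsSketch {
    // Assumption: placeholder bit values, not the library's constants.
    static final int MULTISET = 1;
    static final int MOLMOVIE = 2;
    static final int NOMOLMOVIE = 4;

    static int preprocess(String[] fmtopts) {
        String opts = fmtopts[1];
        if (opts == null) {
            return 0;
        }
        int colon = opts.lastIndexOf(':');
        String prefix = opts.substring(0, colon + 1); // e.g. "xyz:" or ""
        List<String> kept = new ArrayList<>();
        int flags = 0;
        for (String token : opts.substring(colon + 1).split(",")) {
            switch (token) {
                case "MULTISET":   flags |= MULTISET;   break;
                case "MOLMOVIE":   flags |= MOLMOVIE;   break;
                case "NOMOLMOVIE": flags |= NOMOLMOVIE; break;
                default:           kept.add(token);     // ordinary option, keep it
            }
        }
        fmtopts[1] = prefix + String.join(",", kept);
        return flags;
    }

    public static void main(String[] args) {
        String[] fmtopts = { "gzip", "xyz:MULTISET,f1.4" };
        int result = preprocess(fmtopts);
        System.out.println(Arrays.toString(fmtopts) + " flags=" + result);
    }
}
```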

getEncodingFromOptions

public static java.lang.String[] getEncodingFromOptions(java.lang.String fmtopts)
Gets the encoding that was explicitly given as an import option. The format is enc{name}, where name is a charset name supported by Java.

Parameters:
fmtopts - the input format and options
Returns:
a two-element array: the first element is the encoding, the second contains the remaining import options.
Throws:
java.nio.charset.IllegalCharsetNameException - if the encoding is illegal
java.nio.charset.UnsupportedCharsetException - if the encoding is unsupported

testEncoding

public static void testEncoding(java.lang.String enc)
                         throws java.lang.IllegalArgumentException
Tests whether the given charset name is supported by this JVM.

Parameters:
enc - the name of the charset
Throws:
java.lang.IllegalArgumentException
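
A comparable check can be written with the standard java.nio.charset API alone. The sketch below is not the library's implementation, and the class EncodingCheckSketch is hypothetical. It relies on Charset.isSupported, which throws IllegalCharsetNameException for a syntactically illegal name. Both that exception and UnsupportedCharsetException are subclasses of IllegalArgumentException, which is consistent with the throws clause above.

```java
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical illustration of a charset support check; not Marvin code.
final class EncodingCheckSketch {
    static void check(String enc) {
        // isSupported throws IllegalCharsetNameException for illegal names.
        if (!Charset.isSupported(enc)) {
            throw new UnsupportedCharsetException(enc);
        }
    }

    public static void main(String[] args) {
        check("UTF-8"); // returns normally on every Java platform
        try {
            check("x-no-such-charset-42");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```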

getUnguessableFormat

public static java.lang.String getUnguessableFormat(java.lang.String fname)
Gets the file format from the file name extension for formats that are not guessable from the file content. Used to distinguish SMARTS and SMILES.

Parameters:
fname - the filename
Returns:
the file format or null if the file contents can be used to recognize the format

getFileExtensionLC

public static java.lang.String getFileExtensionLC(java.io.File f)
Gets the file extension in lower case.

Parameters:
f - the file
Returns:
the extension in lower case

getFileExtensionLC

public static java.lang.String getFileExtensionLC(java.lang.String fname)
Gets the file extension in lower case.

Parameters:
fname - the filename
Returns:
the extension in lower case
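
For orientation, a lower-case extension lookup might look like the following. This is a sketch under stated assumptions, not the library's code: the class ExtensionSketch is hypothetical, and returning an empty string when the name has no extension is an assumption, because the description does not specify that case.

```java
import java.util.Locale;

// Hypothetical illustration of lower-case extension extraction; not Marvin code.
final class ExtensionSketch {
    static String extensionLC(String fname) {
        int separator = Math.max(fname.lastIndexOf('/'), fname.lastIndexOf('\\'));
        int dot = fname.lastIndexOf('.');
        if (dot <= separator) {
            // No dot at all, or the last dot belongs to a directory name.
            return ""; // assumption: "no extension" is reported as ""
        }
        return fname.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(extensionLC("Caffeine.SDF"));
        System.out.println(extensionLC("data.v1/README"));
    }
}
```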

getMostLikelyMolFormat

public static java.lang.String getMostLikelyMolFormat(java.lang.String fname)
Gets the most likely molecule file format from the file name extension.

Parameters:
fname - the filename
Returns:
the file format or null if the format cannot be determined from the file name

getKnownExtension

public static java.lang.String getKnownExtension(java.lang.String fname)
Returns the file extension if it is a known extension. Known extensions are the following: mrv t gz mol mol2 rgf rxn csmol csrgf csrxn sdf cssdf rdf smi smiles sma smarts cml xml xyz txt html htm cgi gif jpg jpeg msbmp png ppm svg svgz

Parameters:
fname - the filename
Returns:
the extension

getMolfileExtensions

public static java.lang.String[] getMolfileExtensions()
Gets the array of known molecule file extensions.

Returns:
the array of known molecule file extensions

getMolfileFormats

public static java.lang.String[] getMolfileFormats()
Gets the array of known molecule file formats.

Returns:
the array of known molecule file formats

isOutputCleanable

public static boolean isOutputCleanable(java.lang.String fmt)
                                 throws java.lang.SecurityException
Tests whether the specified output format is cleanable. For a non-cleanable output format, cleaning is meaningless because coordinates are not stored.

Parameters:
fmt - the format string
Returns:
true if the specified output format is cleanable, false otherwise
Throws:
java.lang.SecurityException
Since:
Marvin 4.1, 02/13/2006

guessPeptideFormat

@Deprecated
public static java.lang.String guessPeptideFormat(java.lang.String header)
Deprecated. as of Marvin 5.0, PeptideRecognizer.guessPeptideFormat(String) must be used instead


registerFormat

public static void registerFormat(MFileFormat mff)
Registers a user defined file format. The MFileFormat.F_USER_DEFINED flag is automatically set.

Parameters:
mff - the file format
Since:
Marvin 5.0, 05/23/2007

getFormat

public static MFileFormat getFormat(java.lang.String fmt)
Gets the file format descriptor for the specified codename.

Parameters:
fmt - the format codename
Returns:
the descriptor or null if not found
Since:
Marvin 5.0, 05/23/2007

findFormats

public static MFileFormat[] findFormats(java.lang.String fmt,
                                        long flags,
                                        long mask)
Gets a list of formats.

Parameters:
fmt - the format name or null if not important
flags - select formats of which the specified flags are set
mask - only bits specified here are taken into account
Returns:
the list
Since:
Marvin 5.0, 05/24/2007
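
One common reading of the flags and mask parameters is a masked comparison: a format is selected when its flag bits agree with flags on every bit set in mask. The sketch below illustrates that reading only. It is an assumption about the semantics, the class FlagMaskSketch and its flag constants are hypothetical, and a plain map stands in for the format registry.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of flags/mask selection; not Marvin code.
final class FlagMaskSketch {
    // Assumption: placeholder flag bits, not the MFileFormat constants.
    static final long F_IMPORT = 1L;
    static final long F_EXPORT = 2L;
    static final long F_USER = 4L;

    // Only the bits set in mask take part in the comparison.
    static boolean matches(long formatFlags, long flags, long mask) {
        return (formatFlags & mask) == (flags & mask);
    }

    static List<String> find(Map<String, Long> registry, long flags, long mask) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Long> e : registry.entrySet()) {
            if (matches(e.getValue(), flags, mask)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Long> registry = new LinkedHashMap<>();
        registry.put("a", F_IMPORT | F_EXPORT);
        registry.put("b", F_EXPORT);
        registry.put("c", F_IMPORT | F_USER);
        // Importable formats that are not user defined.
        System.out.println(find(registry, F_IMPORT, F_IMPORT | F_USER));
    }
}
```

Passing the same value for flags and mask selects formats that have all of those bits set, regardless of any other bits.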

createRecordReader

public static MRecordReader createRecordReader(java.io.InputStream is,
                                               java.lang.String opts)
                                        throws MolFormatException,
                                               java.io.IOException
Creates a record reader for an input stream.

Parameters:
is - the input stream
opts - input options or null
Returns:
the record reader or null if the format was not recognized
Throws:
java.nio.charset.IllegalCharsetNameException - if illegal encoding is used
java.nio.charset.UnsupportedCharsetException - if unsupported encoding is used
java.lang.SecurityException - if the module cannot be loaded because of a firewall problem
MolFormatException
java.io.IOException
Since:
Marvin 5.0, 06/03/2007
See Also:
MFileFormat.createRecordReader(MolInputStream, String)

createRecordReader

public static MRecordReader createRecordReader(java.io.InputStream is,
                                               java.lang.String opts,
                                               java.lang.String enc,
                                               java.lang.String path)
                                        throws MolFormatException,
                                               java.io.IOException
Creates a record reader for an input stream.

Parameters:
is - the input stream
opts - input options or null
enc - the input encoding or null
path - the file path (it can also be a URL) or null
Returns:
the record reader or null if the format was not recognized
Throws:
java.nio.charset.IllegalCharsetNameException - if illegal encoding is used
java.nio.charset.UnsupportedCharsetException - if unsupported encoding is used
java.lang.SecurityException - if the module cannot be loaded because of a firewall problem
MolFormatException
java.io.IOException
Since:
Marvin 5.0, 06/03/2007, Marvin 5.3
See Also:
MFileFormat.createRecordReader(MolInputStream, String)

createExportModule

public static MolExportModule createExportModule(java.lang.String fmt)
                                          throws MolExportException
Creates an export module for the specified format.

Parameters:
fmt - the format name
Throws:
java.lang.SecurityException - if the module cannot be loaded because of a firewall problem
MolExportException
See Also:
MFileFormat.createExportModule()

convertToSmilingFormat

public static java.lang.String[] convertToSmilingFormat(Molecule m)
                                                 throws MolExportException
Tries to convert a molecule to a SMILES related format. SMILES, SMARTS, CxSMILES and CxSMARTS are tried in this order.

Returns:
the result of the first successful conversion, the 0th array element is the converted text, the 1st element is the format
Throws:
MolExportException - if conversion was not successful
Since:
Marvin 5.0, 11/11/2007

convertToSmilingFormat

public static java.lang.String[] convertToSmilingFormat(MProp p)
                                                 throws MolExportException
Tries to convert a property to text with a SMILES related format argument. SMILES, SMARTS, CxSMILES and CxSMARTS are tried in this order.

Returns:
the result of the first successful conversion, the 0th array element is the converted text, the 1st element is the format
Throws:
MolExportException - if conversion was not successful
Since:
Marvin 5.0, 11/11/2007