The jcsearch program is a command-line interface to the JChem
chemical structure search. It can perform substructure, superstructure,
full (formerly called exact), full fragment (formerly exact fragment),
similarity and duplicate (formerly perfect) searches, as well as match counts,
on the specified query and target molecules. These molecules can be specified
as filenames, SMARTS/SMILES strings or database tables (target only). A number
of different molecular file formats are supported. Refer to
the JChem Query Guide for a detailed description of
search options and query features.
Note that the R-group decomposition functionality has been moved to a different script. The R-group Decomposition documentation contains specific information on this subject, with examples.
For correct behaviour, set up the
jcsearch script or batch file as described in Preparing the Usage of JChem Batch Files
and Shell Scripts.
The program should be invoked in one of the following forms:
jcsearch [options] [files...]
or jcsearch [options] DB:[table name]
With no file, or when file is -, it reads the standard input.
When DB is specified, the search is done in the database, using connection
information saved by other JChem programs (e.g.
jcman).
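The invocation forms above can be sketched as follows. This is a minimal sketch, not an authoritative recipe: the JCSEARCH variable is an assumption introduced here so that the commands are only printed unless you opt in, and input.smi and the molecules table are placeholder names borrowed from the usage examples at the end of this page.

```shell
# Dry run by default: each command is only printed. Export JCSEARCH=jcsearch
# to execute the searches for real (this variable is an assumption of the
# sketch, not a JChem setting).
JCSEARCH=${JCSEARCH:-echo jcsearch}

QUERY='c1ccccc1Cl'   # query taken from the usage examples on this page

# Form 1: targets are read from a file (input.smi is a placeholder name).
$JCSEARCH -q "$QUERY" input.smi

# Form 1 with "-" as the file: targets are read from standard input
# (here two SMILES lines written by printf).
printf '%s\n' 'Clc1ccccc1' 'c1ccccc1' | $JCSEARCH -q "$QUERY" -

# Form 2: targets come from a database table (placeholder table name); the
# connection information must have been saved by a JChem program such as jcman.
$JCSEARCH -q "$QUERY" DB:molecules
```

In dry-run mode the printed command lines lose their shell quoting, so they are meant for inspection rather than for copy and paste.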
Options:
-h help message
-H help on output file formats
-q query SMARTS string or name of file that contains the query structure(s)
(More than one can be specified in non-database mode. Please see
options --and and --or.) For a detailed description of how
to formulate queries, see the JChem Query Guide.
With -t:p or --tautomer, SMILES is expected instead of SMARTS.
-t:type search type.
-t:s substructure search (default)
-t:f full search
-t:ff full fragment search
-t:d duplicate search
-t:i:[dissimilarity_threshold] similarity search
In case of mrv or sdf output, the
dissimilarity value is stored as a
molecule property.
-t:u superstructure search
(default for query tables)
-t:c count all hits
--hitColoring if the output format is
MRV, colors the hits depending on search type.
--markushDisplayMode:o/r/rhg In case of markush searching and hit coloring
specifies the type of the resulting molecule.
Values:
o Default: The result is shown on the given target structure.
r Markush reduction to hit: The Markush structure is reduced
according to the hit.
rhg Markush reduction to hit and the homology groups
are expanded according to the matching part of the query.
--align align or template based clean hits if DB option
has been set, and output format is MRV.
--align:r rotate. If query molecule has 0 dimension, it
will be cleaned in 2d for alignment.
--align:p partial clean (template based clean). If query
molecule has 0 dimension, same as rotate.
--queryAbsoluteStereo:y/n All chiral atoms are absolute (y, default) or
consider chiral flag (n) in case of MDL mol files
without enhanced stereo labels. Has no effect in
database mode.
--targetAbsoluteStereo:y/n All chiral atoms are absolute (y, default) or
consider chiral flag (n) in case of MDL mol files
without enhanced stereo labels. Has no effect in
database mode.
--DBAbsoluteStereo:T/C/A In database mode, sets the above two
AbsoluteStereo flags.
T:(default) as set for table in database.
C: always check chiral flag(false)
A: always absolute stereo(true)
--orderSensitive Switches on order sensitive search
--tautomer Switches on tautomer search
--exactAtomMatching:y/n Exact atom matching(y) or not(n). Default is n.
(Deprecated.)
--exactQueryAtomMatching:y/n Exact query atom matching(y) or not(n).
Default is n.
--exactRadicalMatching:y/n Exact radical matching(y) or not(n).
Default is n. (--radical is preferred instead.)
--exactIsotopeMatching:y/n Exact isotope matching(y) or not(n).
Default is n. (--isotope is preferred instead.)
--exactChargeMatching:y/n Exact charge matching(y) or not(n).
Default is n. (--charge is preferred instead.)
--charge:d/e/i Charge matching mode: d-default,
e-exact, i-ignore
--isotope:d/e/i Isotope matching mode: d-default,
e-exact, i-ignore
--radical:d/e/i Radical matching mode: d-default,
e-exact, i-ignore
--valence:d/i Valence matching mode: d-default, i-ignore
--vagueBond:n/1/2/3/4 Vague handling of bond types: n-off, 1-handling of
certain 5-membered ambiguous aromatic rings,
like [C,N]1C=CC=C1 (default)
2-all ring single and double bonds match aromatic
3-all single and double bonds match aromatic
4-ignore all bond types.
--completeHG:y/n Sets whether a homology group can be matched only by
structures that form an entire group (e.g.
alkyl can't match on a cycloalkyl). default:y
--chekSpHyb Switch on sp hybridization checking.
--mix:d/i Handling of com, mix and for brackets: d-default,
i-ignore
--polymer:d/i Handling of polymer brackets: d-default,
i-ignore
--endGroupMatching:y/n Polymer end groups must match: y-yes,
n-no (default: yes)
--transformMonomer:y/n Polymers in their source-based representation
are transformed to structure-based representation: y-yes,
n-no (default: yes)
--phaseShift:y/n Polymers match the phase shifted variant:
y-yes, n-no (default: yes)
--copolymerMatching:y/n Polymers in copolymers can only be matched by
copolymers: y-yes, n-no (default: no)
--homologyHandling:y/n Homology pseudo atoms match the
represented group: y-yes, n-no (they match only pseudo atoms)
(default: yes)
--doubleBondStereo:N/M/A Double bond stereo Matching mode:None/Marked/All
Default is M.
--stereoSearchType:s/i/e/d Sets the stereo search type.
Possible values:
s - stereo specific searching (default), i - ignore stereo
e - exact stereo, d - diastereomer search
--stereoModel:l/g/c Sets the used stereo model (for tetrahedral and
double bond stereo). Possible values:
l - local(default), g - global, c - comprehensive
--reactionUnpairedMap:All/unpairedOnly Option for reaction search unpaired maps:
All(default): match to any atom map,
unPairedOnly: match to unpaired map only.
--HCountMatching:G/E/A Hydrogen count query property interpretation.
Values:
G (greater or equal, MDL behaviour): the target atom must have a number of
H atoms greater than or equal to the query H count, in excess of explicit H atoms.
H0 means no extra H other than those explicitly drawn.
E (equal, Daylight behaviour): the target atom must have a number of H atoms
equal to the H count number.
A: automatically determine whether G or E should be used, from the
query source (SMILES and SMARTS source: E, all other: G).
--implicitHMatching:d/y/n Describes the matching of implicit and explicit hydrogens.
Values:
d default: the behaviour will depend on the circumstances of the search.
y Implicit and explicit hydrogens can match.
n Implicit and explicit hydrogens cannot match.
--keepQueryOrder Does not rearrange the atoms of the query (rearrangement
is normally done to achieve the best search performance).
--markush:n/y Disable/enable special handling of
Markush targets. Default is n.
Enabling requires special license.
--hitIndexType:m/i For Markush targets returns hits for the
original Markush diagram (m - default) or for the
inner compiled representation (i)(See --allHits).
--optimizeQueries:y/n Tries to speed up search when query molecule
contains special query features (atom lists,
bond lists) Default is y.
--distinctFirstAtomMatching:n/y Disable/enable special findAll algorithm.
If set, the hits must have different first atoms.
Default is n.
--attachedDataMatch Describes whether attached data
is compared.
Values:
i Default: ignores attached data when checking for a match.
g general: if attached data is present in query, it must be
present in target as well.
e exact: existing attached data must match
both in query and target.
--attachedDataMatchPrefixes Comma separated list of name prefixes
(of attached data labels), that will be
compared. When not set or set to empty
string all attached data is checked.
Effective only when attachedDataMatch
is not set to 'i'.
--undefinedRAtom:g/gh/ghe/a/u Describes the matching of an undefined
R atom in query. Effective only when
exactQueryAtomMatching is not set.
Values:
g Default: Undefined R atom matches a group of
one or more connected atoms in target,
including at least one heavy atom.
gh Undefined R atom matches a group of
one or more connected atoms in target,
which can also be a single H atom.
ghe Undefined R atom matches a group of
one or more connected atoms in target,
which can also be a single H atom or the empty set
(empty set match is allowed for isolated or
one-attachment R-atoms only).
a Undefined R atom matches any single atom in target.
u Undefined R atom matches only an undefined R atom in target.
--bridgingRAllowed:n/y Forbid/allow different R-atoms matching
the same group. Default is n.
--RLigandEqualityCheck:y/n Switch on/off the requirement that R-atoms
with the same R-group ID should match ligands
with the same structure. Default is y.
--maxResults:<n> Limits the number of molecules returned.
-f format output format (default: smiles). Run jcsearch -H for details.
Possible formats: mrv, mol, sdf, rdf, rxn, csmol, cssdf, csrdf,
csrxn, cxsmiles, cxsmarts, cml, smiles, smarts, sybyl, pdb, pov,
cube or xyz
-f :T<SDF field> write the value of the SDF field in matching targets
-f :Tname write the molecule names of matching targets
-f :M<SDF field1:...:SDF fieldn> write the values of the specified SDF
fields in the matching targets.
-o file write output to file
-s SMILES read target from SMILES string
-v verbose
-vv very verbose, stack trace on error
-0 skip coordinate calculation for SMILES input
-d use Daylight-type aromatization (Huckel-rule) instead of
the standard ChemAxon aromatization.
-2[:[On][e]] 2D coordinate calculation (useful if the input is SMILES)
-2 coordinate calculation with default options (O1)
-2:O0 no optimization -2:O1 optimize if needed
-2:O2 optimize -2:e make double either (cis/trans) bonds
-n List non-hits. For use with multiple queries, see options --and
and --or.
--and If two or more queries are present, all are required to match.
(Default) For DB targets, only the first query is considered.
If used together with option -n, a hit is returned if none of the
query molecules match.
--or If more than one query is present, at least one is required to
match. For DB targets, only the first query is considered.
If used together with option -n, a hit is returned if at least
one query molecule does not match.
--allHits Instead of only checking whether a match exists, all matches of
the query molecule(s) are reported.
Symbols used in hit arrays in place of specific query atoms:
R - R-group
M - multicenter
U - unmappable (e.g. polymer star atom)
LP - lone pair
E - R-atom matching the empty set
-e "expression" |<file> A Chemical Terms filtering expression
or --expression "expression"|<file> for filtering hits. For syntax, see the
filtering expression syntax described below.
--ignoreCTExceptions:n/y If set to y, only syntactical exceptions
will be thrown during search. Molecules
that raise an exception during evaluation
are left out of the hit list. Default is n.
-c config file Configuration xml file for Chemical Terms (optional)
or --config config file
-S, --standardize <file/string> Standardize query and target
according to config file/string.
See the Standardizer manual.
Default: -S 'aromatize'.
Set -S '' to skip standardization.
-g, --ignore-error Continue with next molecule on error.
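The way -n combines with --and and --or, as described in the option list above, can be summarised in a small truth table. The sketch below only models the documented rule for file targets; is_reported is a hypothetical helper written for this page, not a jcsearch feature, and it takes the per-query results as 1 (the query matches the target) or 0 (it does not). It ignores the special case of DB targets, where only the first query is considered.

```shell
# is_reported MODE NEGATE RESULT...
#   MODE    "and" or "or"   (models --and / --or)
#   NEGATE  "n" when -n is given, "-" otherwise
#   RESULT  1 if that query matches the target, 0 if it does not
# Prints "yes" if the target is reported, "no" otherwise.
is_reported() {
  mode=$1
  negate=$2
  shift 2
  all=1
  any=0
  for r in "$@"; do
    # With -n, a query counts towards the result when it does NOT match.
    if [ "$negate" = n ]; then
      r=$((1 - r))
    fi
    if [ "$r" -eq 1 ]; then
      any=1
    else
      all=0
    fi
  done
  if [ "$mode" = and ]; then
    ok=$all
  else
    ok=$any
  fi
  if [ "$ok" -eq 1 ]; then echo yes; else echo no; fi
}

is_reported and - 1 1   # --and:    both queries match           -> yes
is_reported and n 0 0   # --and -n: neither query matches        -> yes
is_reported and n 1 0   # --and -n: one query still matches      -> no
is_reported or  n 1 0   # --or -n:  one query does not match     -> yes
is_reported or  n 1 1   # --or -n:  every query matches          -> no
```

Modelling -n as a negation of each individual query result, rather than of the combined result, is what reproduces the documented behaviour: with --and a target is listed only if no query matches it.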
The expression syntax is described in the Chemical Terms Language Reference. Search-specific functions in the search context provide access to the query and target molecules, the search hit array and its elements:
mol(), target(): both refer to the search target molecule
query(): refers to the search query molecule
m(int i): refers to the query atom index with atom map i
hit(), h(): both refer to the search hit array
hit(int i), h(int i): both refer to the i-th element of the
search hit array, that is, the target atom index matching the query atom with
atom index i
hm(int i): refers to the target atom index matching the query atom with
atom map i (shorthand for h(m(i)))
The default input molecule is the target molecule (e.g. mass() is the same as
mass(target()), both refer to the molecule mass of the target molecule).
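The equivalences stated above (hm(i) as shorthand for h(m(i)), and mass() as shorthand for mass(target())) can be illustrated with pairs of filters that should select the same hits. This is a sketch under assumptions: the JCSEARCH variable is introduced here only to make the commands print by default, target.mol is a placeholder file name, and the pka and mass filters are borrowed from the usage examples on this page.

```shell
# Dry run by default: each command is only printed. Export JCSEARCH=jcsearch
# to execute the searches for real (an assumption of this sketch).
JCSEARCH=${JCSEARCH:-echo jcsearch}

QUERY='[H][O:1]C=[O:2]'   # query with atom maps 1 and 2

# hm(1): the target atom matched by the query atom that carries atom map 1.
$JCSEARCH --allHits -e "pka('acidic',hm(1)) > 4" -q "$QUERY" target.mol

# The same filter written out with h() and m(), per the shorthand rule.
$JCSEARCH --allHits -e "pka('acidic',h(m(1))) > 4" -q "$QUERY" target.mol

# Without a molecule argument, mass() is evaluated on the target ...
$JCSEARCH -e "mass() >= 250" -q "$QUERY" target.mol

# ... so naming the target explicitly should give the same result.
$JCSEARCH -e "mass(target()) >= 250" -q "$QUERY" target.mol
```

--allHits is used with the atom-level filters so that every matching of the query is evaluated, not just the existence of one.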
In most cases the function and plugin definitions provided by the
built-in evaluator.xml
are sufficient, but a user-defined configuration XML file can be specified
with the --config parameter.
The user-defined configuration is added to the definitions contained in the
built-in evaluator.xml.
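Passing a user-defined configuration might look like the sketch below. The file name my_evaluator.xml is hypothetical, its contents are not shown because the configuration schema is outside the scope of this page, and the JCSEARCH variable is an assumption that keeps the commands in dry-run mode by default.

```shell
# Dry run by default: each command is only printed. Export JCSEARCH=jcsearch
# to execute the searches for real (an assumption of this sketch).
JCSEARCH=${JCSEARCH:-echo jcsearch}

CONFIG=my_evaluator.xml   # hypothetical user-defined Chemical Terms configuration

# Short form of the option.
$JCSEARCH -c "$CONFIG" -e "mass() >= 250" -q query.mol targets.sdf

# Long form of the same option.
$JCSEARCH --config "$CONFIG" -e "mass() >= 250" -q query.mol targets.sdf
```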
The syntax is described in the
Chemical Terms Language Reference, which includes
a set of search filter examples.
The short reference tables give a
summary of the functions and plugins provided by the built-in configuration.
A set of working examples
is also available.
jcsearch -q "c1ccccc1Cl" -f smiles input.smi
jcsearch -q "c1ccccc1Cl" --and -q "Br" -f smiles:Tfield_0 input.smi
jcsearch -q "c1ccccc1Cl" -f sdf -o hits.sdf input.sdf
jcsearch -q clbenz.mol -f sdf input.sdf | mview -f ID -
jcsearch -q clbenz.mol -f sdf DB:molecules | mview -f ID -
jcsearch --allHits -e "charge(h(0)) < -0.3" -q '[*]' '[O-]C(=O)CCCCCC(=O)CCCC([O-])=O'
jcsearch --allHits -e "pka('acidic',hm(1)) > 4" -q "[H][O:1]C=[O:2]" target.mol
jcsearch -e "mass() >= 250" -q query.mol targets.sdf
jcsearch -q "CC(C)(O)C#N" input.smi -t:i:0.4
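The last example above runs a similarity search with a dissimilarity threshold of 0.4. As a complement, the sketch below runs the same query with each of the other search types listed under -t. It is an illustration only: input.smi and hits.sdf are placeholder file names, and the JCSEARCH variable is an assumption that makes the commands print instead of run unless it is set to jcsearch.

```shell
# Dry run by default: each command is only printed. Export JCSEARCH=jcsearch
# to execute the searches for real (an assumption of this sketch).
JCSEARCH=${JCSEARCH:-echo jcsearch}

QUERY='CC(C)(O)C#N'   # query from the similarity example above

# s: substructure (default), f: full, ff: full fragment,
# d: duplicate, u: superstructure.
for type in s f ff d u; do
  $JCSEARCH -q "$QUERY" -t:"$type" input.smi
done

# Similarity search written to SDF, so that the dissimilarity value is
# stored as a molecule property of each hit.
$JCSEARCH -q "$QUERY" -t:i:0.4 -f sdf -o hits.sdf input.smi

# Count all hits instead of listing the matching targets.
$JCSEARCH -q "$QUERY" -t:c input.smi
```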