ChemAxon Technical Support Forum

simple question
akshay

Posted: Sun Oct 21, 2007 7:37 am

This is a bit of a basic question, but I can't work it out myself.

1. I have an academic license for JChem and generateMD.
2. I have a Windows XP desktop PC.
3. I have a file containing about 50,000 molecules represented as SMILES.
4. I want to compute as many descriptors as possible for QSPR/QSAR.

Please help.
Thanks a lot.
Tobias

Posted: Tue Oct 23, 2007 9:14 am

Hi,
Do you know how to program in Java and use Eclipse?
Then you can take the book Molecular Descriptors by Todeschini and
program all the descriptors with the JChem API.

If you don't know Java, it's going to be harder. In principle you can
use cxcalc or generateMD to generate all the descriptors using
an XML command sheet. Or you can use Instant-JChem to
generate most of the descriptors, but due to current
restrictions Instant-JChem can only have 256 or 512 columns,
so you cannot include all the possible descriptors.
Furthermore, it is not possible to apply a general XML sheet
(write once, use often) to generate most of the descriptors.

You also need to be clear about what kind of molecular descriptors you
want to use: 0D, 1D, 2D, 3D or 4D? All of them? (See Engel/Gasteiger, Cheminformatics.)

* 0D - bond counts, molecular weight, atom counts
* 1D - fragment counts, H-bond acceptors/donors, Crippen, PSA, SMARTS
* 2D - topological descriptors (Balaban, Randic, Wiener, BCUT, kappa, chi)
* 3D - geometrical descriptors (3D WHIM, 3D autocorrelation, 3D-MoRSE) + surface properties + CoMFA
* 4D - 3D coordinates + conformations (JChem conformer, CORINA, gold set, CrystalEye)

The good thing about the JChem API is that, in principle, you can implement most of these very easily. The
supported functions are listed at the bottom of this post. The 1D fragment counts can be implemented using a SMARTS matcher function.

Suitable pattern sets include the PubChem fingerprints or the public
OpenBabel SMARTS patterns. You can also use maximum common substructures (MCS, LIBMCS) to derive such
patterns specifically for your dataset or for any other dataset (like PubChem).
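
To illustrate the SMARTS-based fragment counts: a pattern file in the "Name: SMARTS" layout used in the listing at the bottom of this post can be loaded with a few lines of Python. This is only a sketch. The function name parse_smarts_patterns is made up for this example, and the matching step itself is not shown because it depends on the SMARTS matcher you use.

```python
def parse_smarts_patterns(text):
    """Parse 'Name: SMARTS' lines into a list of (name, smarts) pairs.

    A list is used instead of a dict because some names occur more than
    once in the pattern listing (for example Amidine).
    """
    patterns = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only: a SMARTS string may contain ':'
        # itself (aromatic bond), but the pattern names do not.
        name, sep, smarts = line.partition(":")
        if not sep or not smarts.strip():
            continue
        patterns.append((name.strip(), smarts.strip()))
    return patterns


sample = """\
# SMARTS Patterns for Functional Group Classification
Alkyne: [CX2]#[CX2]
Nitrile: [NX1]#[CX2]
Vinylogous_halide: [#6X3](=[OX1])[#6X3]=,:[#6X3][FX1,ClX1,BrX1,IX1]
"""

patterns = parse_smarts_patterns(sample)
for name, smarts in patterns:
    print(name, smarts)
```

Each (name, smarts) pair would then give one descriptor column: the number of matches of that pattern in a molecule.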

You can easily calculate 2000 descriptors with different
software applications, see moleculardescriptors.eu.
For a small test set of 150 molecules you can use VCCLAB from Igor Tetko to test the effectiveness of some of
the descriptors (the ones you want to implement with the JChem API).
Or you can use JOELib or, better, the CDK Descriptor Calculator GUI from Rajarshi Guha.

Beware! Most of the descriptors you can calculate
will have no impact. You need to use feature selection to find useful descriptors for regression or classification.
It also helps to prevent overfitting if you divide your dataset into a 70% development set and a 30% test set
and have an independent external validation set at hand.
You can additionally use v-fold cross-validation or bootstrapping on your development set.
All of these methods have been known since the 1970s.
Do not use the R^2=0.999999999 linear fit scam.
Use prediction errors or R^2, Q^2 on independent datasets, or other measures (do not fool yourself).
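
A minimal sketch of the split and cross-validation described above, in plain Python. The function names, the fixed random seed and the choice of 5 folds are assumptions made for this example, not something prescribed by any particular package.

```python
import random


def split_dev_test(n_items, test_fraction=0.3, seed=0):
    """Return (dev_indices, test_indices) after shuffling 0..n_items-1."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_items * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# 50,000 molecules: 70% development set, 30% test set.
dev, test = split_dev_test(50000)
# 5-fold cross-validation inside the development set only.
folds = list(k_fold(dev, k=5))
print(len(dev), len(test), [len(v) for _, v in folds])
```

Applied to predictions for the held-out folds, r_squared gives a cross-validated value of the kind usually reported as Q^2. The test set is only touched once, after descriptor selection and model building are finished.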

For the classification or regression statistics it absolutely
does not matter which method you use. The best approach is to test all methods or build ensemble methods or group contribution methods, which may include:

* Generalized Linear Models (GLM)
* General Discriminant Analysis
* Binary logit (logistic) regression
* Binary probit regression
* Nonlinear models
  * Multivariate adaptive regression splines (MARS)
* Tree models
  * Standard Classification Trees (CART)
  * Standard General Chi-square Automatic Interaction Detector (CHAID)
  * Exhaustive CHAID
  * Boosting classification trees
* Neural Networks
  * Multilayer Perceptron neural network (MLP)
  * Radial Basis Function neural network (RBF)
* Machine Learning
  * Support Vector Machines (SVM)
  * Naive Bayes classifier
  * k-Nearest Neighbors (KNN)

You can implement such methods with MEV, Statistica Dataminer, Yale or WEKA.
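
To give an idea of how simple some of these methods are, here is a sketch of k-nearest-neighbour regression in plain Python. It is only an illustration and not a replacement for the packages above: the function name knn_predict and the toy descriptor values are made up for this example.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict a value for `query` as the mean of its k nearest neighbours."""
    # Pair every training point's Euclidean distance to the query with its
    # target value, then sort so the closest points come first.
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = [y for _, y in distances[:k]]
    return sum(nearest) / len(nearest)


# Toy data: two descriptors per molecule and one property value each.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_y = [1.0, 2.0, 3.0, 10.0, 12.0]

prediction = knn_predict(train_x, train_y, (0.2, 0.1), k=3)
print(prediction)  # 2.0, the mean of the three closest points
```

With real descriptors you would scale the columns first, otherwise descriptors with large numeric ranges dominate the distance.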

Tobias

JCHEM descriptors supported in the API:
Code:

   <Descriptor Name="ChemicalFingerprint"/>
   <Descriptor Name="PharmacophoreFingerprint"/>
   <Descriptor Name="BCUT"/>
   <Descriptor Name="HDon"/>
   <Descriptor Name="HAcc"/>
   <Descriptor Name="Heavy"/>
   <Descriptor Name="LogD"/>
   <Descriptor Name="LogP"/>
   <Descriptor Name="Mass"/>
   <Descriptor Name="TPSA"/>
   <Plugin ID="majorMs" Class="chemaxon.marvin.calculations.MajorMicrospeciesPlugin" JAR="MajorMicrospeciesPlugin.jar"/>
   <Plugin ID="msCount" Class="chemaxon.marvin.calculations.MajorMicrospeciesPlugin" JAR="MajorMicrospeciesPlugin.jar">
   <Plugin ID="ms" Class="chemaxon.marvin.calculations.MajorMicrospeciesPlugin" JAR="MajorMicrospeciesPlugin.jar">
   <Plugin ID="msDistr" Class="chemaxon.marvin.calculations.MajorMicrospeciesPlugin" JAR="MajorMicrospeciesPlugin.jar">
   <Plugin ID="tautomer" Class="chemaxon.marvin.calculations.TautomerizationPlugin" JAR="MultiformPlugin.jar"/>
   <Plugin ID="canonicalTautomer" Class="chemaxon.marvin.calculations.TautomerizationPlugin" JAR="MultiformPlugin.jar">
   <Plugin ID="tautomers" Class="chemaxon.marvin.calculations.TautomerizationPlugin" JAR="MultiformPlugin.jar">
   <Plugin ID="tautomerCount" Class="chemaxon.marvin.calculations.TautomerizationPlugin" JAR="MultiformPlugin.jar">
   <Plugin ID="dominantTautomer" Class="chemaxon.marvin.calculations.TautomerizationPlugin" JAR="MultiformPlugin.jar">
   <Plugin ID="dominantTautomers" Class="chemaxon.marvin.calculations.TautomerizationPlugin" JAR="MultiformPlugin.jar">
   <Plugin ID="dominantTautomerCount" Class="chemaxon.marvin.calculations.TautomerizationPlugin" JAR="MultiformPlugin.jar">
   <Plugin ID="resonant" Class="chemaxon.marvin.calculations.ResonancePlugin" JAR="MultiformPlugin.jar"/>
   <Plugin ID="canonicalResonant" Class="chemaxon.marvin.calculations.ResonancePlugin" JAR="MultiformPlugin.jar">
   <Plugin ID="resonants" Class="chemaxon.marvin.calculations.ResonancePlugin" JAR="MultiformPlugin.jar">
   <Plugin ID="resonantCount" Class="chemaxon.marvin.calculations.ResonancePlugin" JAR="MultiformPlugin.jar">
   <Plugin ID="charge" Class="chemaxon.marvin.calculations.ChargePlugin" JAR="ChargePlugin.jar"/>
   <Plugin ID="ionCharge" Class="chemaxon.marvin.calculations.IonChargePlugin" JAR="IonChargePlugin.jar"/>
   <Plugin ID="sigmaOrbitalElectronegativity" Class="chemaxon.marvin.calculations.OrbitalElectronegativityPlugin" JAR="OrbitalElectronegativityPlugin.jar">
   <Plugin ID="sOEN" Class="chemaxon.marvin.calculations.OrbitalElectronegativityPlugin" JAR="OrbitalElectronegativityPlugin.jar">
   <Plugin ID="piOrbitalElectronegativity" Class="chemaxon.marvin.calculations.OrbitalElectronegativityPlugin" JAR="OrbitalElectronegativityPlugin.jar">
   <Plugin ID="pOEN" Class="chemaxon.marvin.calculations.OrbitalElectronegativityPlugin" JAR="OrbitalElectronegativityPlugin.jar">
   <Plugin ID="polarizability" Class="chemaxon.marvin.calculations.PolarizabilityPlugin" JAR="PolarizabilityPlugin.jar">
   <Plugin ID="pol" Class="chemaxon.marvin.calculations.PolarizabilityPlugin" JAR="PolarizabilityPlugin.jar">
   <Plugin ID="atomPol" Class="chemaxon.marvin.calculations.PolarizabilityPlugin" JAR="PolarizabilityPlugin.jar">
   <Plugin ID="molPol" Class="chemaxon.marvin.calculations.PolarizabilityPlugin" JAR="PolarizabilityPlugin.jar">
   <Plugin ID="avgPol" Class="chemaxon.marvin.calculations.PolarizabilityPlugin" JAR="PolarizabilityPlugin.jar">
   <Plugin ID="averagePol" Class="chemaxon.marvin.calculations.PolarizabilityPlugin" JAR="PolarizabilityPlugin.jar">
   <Plugin ID="axxPol" Class="chemaxon.marvin.calculations.PolarizabilityPlugin" JAR="PolarizabilityPlugin.jar">
   <Plugin ID="ayyPol" Class="chemaxon.marvin.calculations.PolarizabilityPlugin" JAR="PolarizabilityPlugin.jar">
   <Plugin ID="azzPol" Class="chemaxon.marvin.calculations.PolarizabilityPlugin" JAR="PolarizabilityPlugin.jar">
   <Plugin ID="pKa" Class="chemaxon.marvin.calculations.pKaPlugin" JAR="pKaPlugin.jar">
   <Plugin ID="acidicpKa" Class="chemaxon.marvin.calculations.pKaPlugin" JAR="pKaPlugin.jar">
   <Plugin ID="apKa" Class="chemaxon.marvin.calculations.pKaPlugin" JAR="pKaPlugin.jar">
   <Plugin ID="basicpKa" Class="chemaxon.marvin.calculations.pKaPlugin" JAR="pKaPlugin.jar">
   <Plugin ID="bpKa" Class="chemaxon.marvin.calculations.pKaPlugin" JAR="pKaPlugin.jar">
   <Plugin ID="acidicpKaLargeModel" Class="chemaxon.marvin.calculations.pKaPlugin" JAR="pKaPlugin.jar">
   <Plugin ID="basicpKaLargeModel" Class="chemaxon.marvin.calculations.pKaPlugin" JAR="pKaPlugin.jar">
   <Plugin ID="logD" Class="chemaxon.marvin.calculations.logDPlugin" JAR="logDPlugin.jar"/>
   <Plugin ID="logP" Class="chemaxon.marvin.calculations.logPPlugin" JAR="logPPlugin.jar">
   <Plugin ID="logPi" Class="chemaxon.marvin.calculations.logPPlugin" JAR="logPPlugin.jar">
   <Plugin ID="orderE" Class="chemaxon.marvin.calculations.HuckelAnalysisPlugin" JAR="HuckelAnalysisPlugin.jar">
   <Plugin ID="orderNu" Class="chemaxon.marvin.calculations.HuckelAnalysisPlugin" JAR="HuckelAnalysisPlugin.jar">
   <Plugin ID="energyE" Class="chemaxon.marvin.calculations.HuckelAnalysisPlugin" JAR="HuckelAnalysisPlugin.jar">
   <Plugin ID="energyNu" Class="chemaxon.marvin.calculations.HuckelAnalysisPlugin" JAR="HuckelAnalysisPlugin.jar">
   <Plugin ID="piEnergy" Class="chemaxon.marvin.calculations.HuckelAnalysisPlugin" JAR="HuckelAnalysisPlugin.jar">
   <Plugin ID="piChargeDensity" Class="chemaxon.marvin.calculations.HuckelAnalysisPlugin" JAR="HuckelAnalysisPlugin.jar">
   <Plugin ID="totalChargeDensity" Class="chemaxon.marvin.calculations.HuckelAnalysisPlugin" JAR="HuckelAnalysisPlugin.jar">
   <Plugin ID="PSA" Class="chemaxon.marvin.calculations.TPSAPlugin" JAR="TPSAPlugin.jar"/>
   <Plugin ID="vanDerWaalsSurfaceArea" Class="chemaxon.marvin.calculations.MSAPlugin" JAR="MSAPlugin.jar">
   <Plugin ID="solventAccessibleSurfaceArea" Class="chemaxon.marvin.calculations.MSAPlugin" JAR="MSAPlugin.jar">
   <Plugin ID="pI" Class="chemaxon.marvin.calculations.IsoelectricPointPlugin" JAR="IsoelectricPointPlugin.jar"/>
   <Plugin ID="elemanal" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin" JAR="ElementalAnalyserPlugin.jar"/>
   <Plugin ID="mass" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin" JAR="ElementalAnalyserPlugin.jar">
   <Plugin ID="exactMass" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin" JAR="ElementalAnalyserPlugin.jar">
   <Plugin ID="atomCount" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin" JAR="ElementalAnalyserPlugin.jar">
   <Plugin ID="formula" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin" JAR="ElementalAnalyserPlugin.jar">
   <Plugin ID="isotopeFormula" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin" JAR="ElementalAnalyserPlugin.jar">
   <Plugin ID="dotDisconnectedFormula" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin" JAR="ElementalAnalyserPlugin.jar">
   <Plugin ID="composition" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin" JAR="ElementalAnalyserPlugin.jar">
   <Plugin ID="isotopeComposition" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin" JAR="ElementalAnalyserPlugin.jar">
   <Plugin ID="topanal" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar"/>
   <Plugin ID="aliphaticAtomCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="aromaticAtomCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="bondCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="aliphaticBondCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="aromaticBondCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="rotatableBondCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="ringCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="aliphaticRingCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="aromaticRingCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="heteroRingCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="heteroaromaticRingCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="carboRingCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="carboaromaticRingCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="ringAtomCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="ringBondCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="chainAtomCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="chainBondCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="smallestRingSize" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="largestRingSize" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="fusedRingCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="fusedAliphaticRingCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="fusedAromaticRingCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="asymmetricAtomCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="chiralCenterCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="aromaticAtom" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="aliphaticAtom" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="chainAtom" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="ringAtom" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="asymmetricAtom" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="chiralCenter" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="cyclomaticNumber" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="plattIndex" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="randicIndex" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="balabanIndex" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="distanceDegree" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="eccentricity" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="hararyIndex" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="hyperWienerIndex" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="szegedIndex" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="wienerIndex" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="wienerPolarity" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="stericEffectIndex" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="smallestAtomRingSize" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="largestAtomRingSize" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="shortestPath" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="connected" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="connectedGraph" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="bondType" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="chainBond" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="ringBond" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="rotatableBond" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="ringCountOfAtom" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">
   <Plugin ID="HBDA" Class="chemaxon.marvin.calculations.HBDAPlugin" JAR="HBDAPlugin.jar"/>
   <Plugin ID="acc" Class="chemaxon.marvin.calculations.HBDAPlugin" JAR="HBDAPlugin.jar">
   <Plugin ID="don" Class="chemaxon.marvin.calculations.HBDAPlugin" JAR="HBDAPlugin.jar">
   <Plugin ID="accSiteCount" Class="chemaxon.marvin.calculations.HBDAPlugin" JAR="HBDAPlugin.jar">
   <Plugin ID="donSiteCount" Class="chemaxon.marvin.calculations.HBDAPlugin" JAR="HBDAPlugin.jar">
   <Plugin ID="acceptorCount" Class="chemaxon.marvin.calculations.HBDAPlugin" JAR="HBDAPlugin.jar">
   <Plugin ID="donorCount" Class="chemaxon.marvin.calculations.HBDAPlugin" JAR="HBDAPlugin.jar">
   <Plugin ID="refrac" Class="chemaxon.marvin.calculations.RefractivityPlugin" JAR="RefractivityPlugin.jar"/>
   <Plugin ID="refractivity" Class="chemaxon.marvin.calculations.RefractivityPlugin" JAR="RefractivityPlugin.jar"/>
   <Plugin ID="refraci" Class="chemaxon.marvin.calculations.RefractivityPlugin" JAR="RefractivityPlugin.jar">
   <Plugin ID="refractivityIncrements" Class="chemaxon.marvin.calculations.RefractivityPlugin" JAR="RefractivityPlugin.jar">
   <Plugin ID="conformer" Class="chemaxon.marvin.calculations.ConformerPlugin" JAR="ConformerPlugin.jar"/>
   <Plugin ID="conformers" Class="chemaxon.marvin.calculations.ConformerPlugin" JAR="ConformerPlugin.jar">
   <Plugin ID="conformerCount" Class="chemaxon.marvin.calculations.ConformerPlugin" JAR="ConformerPlugin.jar">
   <Plugin ID="leconformer" Class="chemaxon.marvin.calculations.ConformerPlugin" JAR="ConformerPlugin.jar">
   <Plugin ID="hasValidConformer" Class="chemaxon.marvin.calculations.ConformerPlugin" JAR="ConformerPlugin.jar">
   <Plugin ID="stereoisomer" Class="chemaxon.marvin.calculations.StereoisomerPlugin" JAR="StereoisomerPlugin.jar">
   <Plugin ID="stereoisomers" Class="chemaxon.marvin.calculations.StereoisomerPlugin" JAR="StereoisomerPlugin.jar">
   <Plugin ID="stereoisomerCount" Class="chemaxon.marvin.calculations.StereoisomerPlugin" JAR="StereoisomerPlugin.jar">
   <Plugin ID="doubleBondStereoisomer" Class="chemaxon.marvin.calculations.StereoisomerPlugin" JAR="StereoisomerPlugin.jar">
   <Plugin ID="doubleBondStereoisomers" Class="chemaxon.marvin.calculations.StereoisomerPlugin" JAR="StereoisomerPlugin.jar">
   <Plugin ID="doubleBondStereoisomerCount" Class="chemaxon.marvin.calculations.StereoisomerPlugin" JAR="StereoisomerPlugin.jar">
   <Plugin ID="tetrahedralStereoisomer" Class="chemaxon.marvin.calculations.StereoisomerPlugin" JAR="StereoisomerPlugin.jar">
   <Plugin ID="tetrahedralStereoisomers" Class="chemaxon.marvin.calculations.StereoisomerPlugin" JAR="StereoisomerPlugin.jar">
   <Plugin ID="tetrahedralStereoisomerCount" Class="chemaxon.marvin.calculations.StereoisomerPlugin" JAR="StereoisomerPlugin.jar">
   <Plugin ID="dreidingEnergy" Class="chemaxon.marvin.calculations.GeometryPlugin" JAR="GeometryPlugin.jar">
   <Plugin ID="distance" Class="chemaxon.marvin.calculations.GeometryPlugin" JAR="GeometryPlugin.jar">
   <Plugin ID="angle" Class="chemaxon.marvin.calculations.GeometryPlugin" JAR="GeometryPlugin.jar">
   <Plugin ID="dihedral" Class="chemaxon.marvin.calculations.GeometryPlugin" JAR="GeometryPlugin.jar">
   <Plugin ID="stericHindrance" Class="chemaxon.marvin.calculations.GeometryPlugin" JAR="GeometryPlugin.jar">
   <Plugin ID="name" Class="chemaxon.marvin.calculations.IUPACNamingPlugin" JAR="IUPACNamingPlugin.jar"/>
   <Plugin ID="traditionalName" Class="chemaxon.marvin.calculations.IUPACNamingPlugin" JAR="IUPACNamingPlugin.jar">
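
The listing above is not well-formed XML as posted (most Plugin elements are opened but never closed), so an XML parser would reject it. If you want a quick overview of which IDs belong to which plugin class, a regular expression is enough. This is a sketch in Python; the function name plugin_ids_by_class is made up for this example.

```python
import re
from collections import defaultdict

# Matches the ID and Class attributes of one Plugin line from the listing.
PLUGIN_RE = re.compile(r'<Plugin\s+ID="([^"]+)"\s+Class="([^"]+)"')


def plugin_ids_by_class(listing):
    """Group Plugin IDs by the short name of their implementing class."""
    groups = defaultdict(list)
    for plugin_id, class_name in PLUGIN_RE.findall(listing):
        # Keep only the part after the last dot of the class name.
        groups[class_name.rsplit(".", 1)[-1]].append(plugin_id)
    return dict(groups)


listing = '''
   <Plugin ID="logP" Class="chemaxon.marvin.calculations.logPPlugin" JAR="logPPlugin.jar">
   <Plugin ID="logPi" Class="chemaxon.marvin.calculations.logPPlugin" JAR="logPPlugin.jar">
   <Plugin ID="PSA" Class="chemaxon.marvin.calculations.TPSAPlugin" JAR="TPSAPlugin.jar"/>
'''

groups = plugin_ids_by_class(listing)
for name, ids in groups.items():
    print(name, ids)
```

Run over the full listing, this shows for example that all the ring and bond counts and the topological indices come from the same TopologyAnalyserPlugin class.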



Fragment counts using the OpenBabel SMARTS patterns and the
JChem SMARTS matching function:

Code:

#              SMARTS Patterns for Functional Group Classification
#
#              written by Christian Laggner
#              Copyright 2005 Inte:Ligand Software-Entwicklungs und Consulting GmbH
#
#              Released under the Lesser General Public License (LGPL license)
#              see http://www.gnu.org/copyleft/lesser.html
#              Modified from Version 221105
#              Project homepage: http://sourceforge.net/projects/openbabel

Primary_carbon: [CX4H3][#6]
Secondary_carbon: [CX4H2]([#6])[#6]
Tertiary_carbon: [CX4H1]([#6])([#6])[#6]
Quaternary_carbon: [CX4]([#6])([#6])([#6])[#6]
Alkene: [CX3;$([H2]),$([H1][#6]),$(C([#6])[#6])]=[CX3;$([H2]),$([H1][#6]),$(C([#6])[#6])]
Alkyne: [CX2]#[CX2]
Allene: [CX3]=[CX2]=[CX3]
Alkylchloride: [ClX1][CX4]
Alkylfluoride: [FX1][CX4]
Alkylbromide: [BrX1][CX4]
Alkyliodide: [IX1][CX4]
Alcohol: [OX2H][CX4;!$(C([OX2H])[O,S,#7,#15])]
Primary_alcohol: [OX2H][CX4H2;!$(C([OX2H])[O,S,#7,#15])]
Secondary_alcohol: [OX2H][CX4H;!$(C([OX2H])[O,S,#7,#15])]
Tertiary_alcohol: [OX2H][CX4D4;!$(C([OX2H])[O,S,#7,#15])]
Dialkylether: [OX2]([CX4;!$(C([OX2])[O,S,#7,#15,F,Cl,Br,I])])[CX4;!$(C([OX2])[O,S,#7,#15])]
Dialkylthioether: [SX2]([CX4;!$(C([OX2])[O,S,#7,#15,F,Cl,Br,I])])[CX4;!$(C([OX2])[O,S,#7,#15])]
Alkylarylether: [OX2](c)[CX4;!$(C([OX2])[O,S,#7,#15,F,Cl,Br,I])]
Diarylether: [c][OX2][c]
Alkylarylthioether: [SX2](c)[CX4;!$(C([OX2])[O,S,#7,#15,F,Cl,Br,I])]
Diarylthioether: [c][SX2][c]
Oxonium: [O+;!$([O]~[!#6]);!$([S]*~[#7,#8,#15,#16])]
Amine: [NX3+0,NX4+;!$([N]~[!#6]);!$([N]*~[#7,#8,#15,#16])]
Primary_aliph_amine: [NX3H2+0,NX4H3+;!$([N][!C]);!$([N]*~[#7,#8,#15,#16])]
Secondary_aliph_amine: [NX3H1+0,NX4H2+;!$([N][!C]);!$([N]*~[#7,#8,#15,#16])]
Tertiary_aliph_amine: [NX3H0+0,NX4H1+;!$([N][!C]);!$([N]*~[#7,#8,#15,#16])]
Quaternary_aliph_ammonium: [NX4H0+;!$([N][!C]);!$([N]*~[#7,#8,#15,#16])]
Primary_arom_amine: [NX3H2+0,NX4H3+]c
Secondary_arom_amine: [NX3H1+0,NX4H2+;!$([N][!c]);!$([N]*~[#7,#8,#15,#16])]
Tertiary_arom_amine: [NX3H0+0,NX4H1+;!$([N][!c]);!$([N]*~[#7,#8,#15,#16])]
Quaternary_arom_ammonium: [NX4H0+;!$([N][!c]);!$([N]*~[#7,#8,#15,#16])]
Secondary_mixed_amine: [NX3H1+0,NX4H2+;$([N]([c])[C]);!$([N]*~[#7,#8,#15,#16])]
Tertiary_mixed_amine: [NX3H0+0,NX4H1+;$([N]([c])([C])[#6]);!$([N]*~[#7,#8,#15,#16])]
Quaternary_mixed_ammonium: [NX4H0+;$([N]([c])([C])[#6][#6]);!$([N]*~[#7,#8,#15,#16])]
Ammonium: [N+;!$([N]~[!#6]);!$(N=*);!$([N]*~[#7,#8,#15,#16])]
Alkylthiol: [SX2H][CX4;!$(C([SX2H])~[O,S,#7,#15])]
Dialkylthioether: [SX2]([CX4;!$(C([SX2])[O,S,#7,#15,F,Cl,Br,I])])[CX4;!$(C([SX2])[O,S,#7,#15])]
Alkylarylthioether: [SX2](c)[CX4;!$(C([SX2])[O,S,#7,#15])]
Disulfide: [SX2D2][SX2D2]
1,2-Aminoalcohol: [OX2H][CX4;!$(C([OX2H])[O,S,#7,#15,F,Cl,Br,I])][CX4;!$(C([N])[O,S,#7,#15])][NX3;!$(NC=[O,S,N])]
1,2-Diol: [OX2H][CX4;!$(C([OX2H])[O,S,#7,#15])][CX4;!$(C([OX2H])[O,S,#7,#15])][OX2H]
1,1-Diol: [OX2H][CX4;!$(C([OX2H])([OX2H])[O,S,#7,#15])][OX2H]
Hydroperoxide: [OX2H][OX2]
Peroxo: [OX2D2][OX2D2]
Organolithium_compounds: [LiX1][#6,#14]
Organomagnesium_compounds: [MgX2][#6,#14]
Organometallic_compounds: [!#1;!#5;!#6;!#7;!#8;!#9;!#14;!#15;!#16;!#17;!#33;!#34;!#35;!#52;!#53;!#85]~[#6;!-]
Aldehyde: [$([CX3H][#6]),$([CX3H2])]=[OX1]
Ketone: [#6][CX3](=[OX1])[#6]
Thioaldehyde: [$([CX3H][#6]),$([CX3H2])]=[SX1]
Thioketone: [#6][CX3](=[SX1])[#6]
Imine: [NX2;$([N][#6]),$([NH]);!$([N][CX3]=[#7,#8,#15,#16])]=[CX3;$([CH2]),$([CH][#6]),$([C]([#6])[#6])]
Immonium: [NX3+;!$([N][!#6]);!$([N][CX3]=[#7,#8,#15,#16])]
Oxime: [NX2](=[CX3;$([CH2]),$([CH][#6]),$([C]([#6])[#6])])[OX2H]
Oximether: [NX2](=[CX3;$([CH2]),$([CH][#6]),$([C]([#6])[#6])])[OX2][#6;!$(C=[#7,#8])]
Acetal: [OX2]([#6;!$(C=[O,S,N])])[CX4;!$(C(O)(O)[!#6])][OX2][#6;!$(C=[O,S,N])]
Hemiacetal: [OX2H][CX4;!$(C(O)(O)[!#6])][OX2][#6;!$(C=[O,S,N])]
Aminal: [NX3v3;!$(NC=[#7,#8,#15,#16])]([#6])[CX4;!$(C(N)(N)[!#6])][NX3v3;!$(NC=[#7,#8,#15,#16])][#6]
Hemiaminal: [NX3v3;!$(NC=[#7,#8,#15,#16])]([#6])[CX4;!$(C(N)(N)[!#6])][OX2H]
Thioacetal: [SX2]([#6;!$(C=[O,S,N])])[CX4;!$(C(S)(S)[!#6])][SX2][#6;!$(C=[O,S,N])]
Thiohemiacetal: [SX2]([#6;!$(C=[O,S,N])])[CX4;!$(C(S)(S)[!#6])][OX2H]
Halogen_acetal_like: [NX3v3,SX2,OX2;!$(*C=[#7,#8,#15,#16])][CX4;!$(C([N,S,O])([N,S,O])[!#6])][FX1,ClX1,BrX1,IX1]
Acetal_like: [NX3v3,SX2,OX2;!$(*C=[#7,#8,#15,#16])][CX4;!$(C([N,S,O])([N,S,O])[!#6])][FX1,ClX1,BrX1,IX1,NX3v3,SX2,OX2;!$(*C=[#7,#8,#15,#16])]
Halogenmethylen_ester_and_similar: [NX3v3,SX2,OX2;$(**=[#7,#8,#15,#16])][CX4;!$(C([N,S,O])([N,S,O])[!#6])][FX1,ClX1,BrX1,IX1]
NOS_methylen_ester_and_similar: [NX3v3,SX2,OX2;$(**=[#7,#8,#15,#16])][CX4;!$(C([N,S,O])([N,S,O])[!#6])][NX3v3,SX2,OX2;!$(*C=[#7,#8,#15,#16])]
Hetero_methylen_ester_and_similar: [NX3v3,SX2,OX2;$(**=[#7,#8,#15,#16])][CX4;!$(C([N,S,O])([N,S,O])[!#6])][FX1,ClX1,BrX1,IX1,NX3v3,SX2,OX2;!$(*C=[#7,#8,#15,#16])]
Cyanhydrine: [NX1]#[CX2][CX4;$([CH2]),$([CH]([CX2])[#6]),$(C([CX2])([#6])[#6])][OX2H]
Chloroalkene: [ClX1][CX3]=[CX3]
Fluoroalkene: [FX1][CX3]=[CX3]
Bromoalkene: [BrX1][CX3]=[CX3]
Iodoalkene: [IX1][CX3]=[CX3]
Enol: [OX2H][CX3;$([H1]),$(C[#6])]=[CX3]
Endiol: [OX2H][CX3;$([H1]),$(C[#6])]=[CX3;$([H1]),$(C[#6])][OX2H]
Enolether: [OX2]([#6;!$(C=[N,O,S])])[CX3;$([H0][#6]),$([H1])]=[CX3]
Enolester: [OX2]([CX3]=[OX1])[#6X3;$([#6][#6]),$([H1])]=[#6X3;!$(C[OX2H])]
Enamine: [NX3;$([NH2][CX3]),$([NH1]([CX3])[#6]),$([N]([CX3])([#6])[#6]);!$([N]*=[#7,#8,#15,#16])][CX3;$([CH]),$([C][#6])]=[CX3]
Thioenol: [SX2H][CX3;$([H1]),$(C[#6])]=[CX3]
Thioenolether: [SX2]([#6;!$(C=[N,O,S])])[CX3;$(C[#6]),$([CH])]=[CX3]
Acylchloride: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[ClX1]
Acylfluoride: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[FX1]
Acylbromide: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[BrX1]
Acyliodide: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[IX1]
Acylhalide: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[FX1,ClX1,BrX1,IX1]
Carboxylic_acid: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[$([OX2H]),$([OX1-])]
Carboxylic_ester:  [CX3;$([R0][#6]),$([H1R0])](=[OX1])[OX2][#6;!$(C=[O,N,S])]
Lactone: [#6][#6X3R](=[OX1])[#8X2][#6;!$(C=[O,N,S])]
Carboxylic_anhydride: [CX3;$([H0][#6]),$([H1])](=[OX1])[#8X2][CX3;$([H0][#6]),$([H1])](=[OX1])
Carboxylic_acid_derivative: [$([#6X3H0][#6]),$([#6X3H])](=[!#6])[!#6]
Carbothioic_acid: [CX3;!R;$([C][#6]),$([CH]);$([C](=[OX1])[$([SX2H]),$([SX1-])]),$([C](=[SX1])[$([OX2H]),$([OX1-])])]
Carbothioic_S_ester: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[SX2][#6;!$(C=[O,N,S])]
Carbothioic_S_lactone: [#6][#6X3R](=[OX1])[#16X2][#6;!$(C=[O,N,S])]
Carbothioic_O_ester: [CX3;$([H0][#6]),$([H1])](=[SX1])[OX2][#6;!$(C=[O,N,S])]
Carbothioic_O_lactone: [#6][#6X3R](=[SX1])[#8X2][#6;!$(C=[O,N,S])]
Carbothioic_halide: [CX3;$([H0][#6]),$([H1])](=[SX1])[FX1,ClX1,BrX1,IX1]
Carbodithioic_acid: [CX3;!R;$([C][#6]),$([CH]);$([C](=[SX1])[SX2H])]
Carbodithioic_ester: [CX3;!R;$([C][#6]),$([CH]);$([C](=[SX1])[SX2][#6;!$(C=[O,N,S])])]
Carbodithiolactone: [#6][#6X3R](=[SX1])[#16X2][#6;!$(C=[O,N,S])]
Amide: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[#7X3;$([H2]),$([H1][#6;!$(C=[O,N,S])]),$([#7]([#6;!$(C=[O,N,S])])[#6;!$(C=[O,N,S])])]
Primary_amide: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[NX3H2]
Secondary_amide: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[#7X3H1][#6;!$(C=[O,N,S])]
Tertiary_amide: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[#7X3H0]([#6;!$(C=[O,N,S])])[#6;!$(C=[O,N,S])]
Lactam: [#6R][#6X3R](=[OX1])[#7X3;$([H1][#6;!$(C=[O,N,S])]),$([H0]([#6;!$(C=[O,N,S])])[#6;!$(C=[O,N,S])])]
Alkyl_imide: [#6X3;$([H0][#6]),$([H1])](=[OX1])[#7X3H0]([#6])[#6X3;$([H0][#6]),$([H1])](=[OX1])
N_hetero_imide: [#6X3;$([H0][#6]),$([H1])](=[OX1])[#7X3H0]([!#6])[#6X3;$([H0][#6]),$([H1])](=[OX1])
Imide_acidic: [#6X3;$([H0][#6]),$([H1])](=[OX1])[#7X3H1][#6X3;$([H0][#6]),$([H1])](=[OX1])
Thioamide: [$([CX3;!R][#6]),$([CX3H;!R])](=[SX1])[#7X3;$([H2]),$([H1][#6;!$(C=[O,N,S])]),$([#7]([#6;!$(C=[O,N,S])])[#6;!$(C=[O,N,S])])]
Thiolactam: [#6R][#6X3R](=[SX1])[#7X3;$([H1][#6;!$(C=[O,N,S])]),$([H0]([#6;!$(C=[O,N,S])])[#6;!$(C=[O,N,S])])]
Oximester: [#6X3;$([H0][#6]),$([H1])](=[OX1])[#8X2][#7X2]=,:[#6X3;$([H0]([#6])[#6]),$([H1][#6]),$([H2])]
Amidine: [NX3;!$(NC=[O,S])][CX3;$([CH]),$([C][#6])]=[NX2;!$(NC=[O,S])]
Hydroxamic_acid: [CX3;$([H0][#6]),$([H1])](=[OX1])[#7X3;$([H1]),$([H0][#6;!$(C=[O,N,S])])][$([OX2H]),$([OX1-])]
Hydroxamic_acid_ester: [CX3;$([H0][#6]),$([H1])](=[OX1])[#7X3;$([H1]),$([H0][#6;!$(C=[O,N,S])])][OX2][#6;!$(C=[O,N,S])]
Imidoacid: [CX3R0;$([H0][#6]),$([H1])](=[NX2;$([H1]),$([H0][#6;!$(C=[O,N,S])])])[$([OX2H]),$([OX1-])]
Imidoacid_cyclic: [#6R][#6X3R](=,:[#7X2;$([H1]),$([H0][#6;!$(C=[O,N,S])])])[$([OX2H]),$([OX1-])] 
Imidoester: [CX3R0;$([H0][#6]),$([H1])](=[NX2;$([H1]),$([H0][#6;!$(C=[O,N,S])])])[OX2][#6;!$(C=[O,N,S])]
Imidolactone: [#6R][#6X3R](=,:[#7X2;$([H1]),$([H0][#6;!$(C=[O,N,S])])])[OX2][#6;!$(C=[O,N,S])]
Imidothioacid: [CX3R0;$([H0][#6]),$([H1])](=[NX2;$([H1]),$([H0][#6;!$(C=[O,N,S])])])[$([SX2H]),$([SX1-])]
Imidothioacid_cyclic: [#6R][#6X3R](=,:[#7X2;$([H1]),$([H0][#6;!$(C=[O,N,S])])])[$([SX2H]),$([SX1-])] 
Imidothioester: [CX3R0;$([H0][#6]),$([H1])](=[NX2;$([H1]),$([H0][#6;!$(C=[O,N,S])])])[SX2][#6;!$(C=[O,N,S])]
Imidothiolactone: [#6R][#6X3R](=,:[#7X2;$([H1]),$([H0][#6;!$(C=[O,N,S])])])[SX2][#6;!$(C=[O,N,S])]
Amidine: [#7X3v3;!$(N([#6X3]=[#7X2])C=[O,S])][CX3R0;$([H1]),$([H0][#6])]=[NX2v3;!$(N(=[#6X3][#7X3])C=[O,S])]
Imidolactam: [#6][#6X3R;$([H0](=[NX2;!$(N(=[#6X3][#7X3])C=[O,S])])[#7X3;!$(N([#6X3]=[#7X2])C=[O,S])]),$([H0](-[NX3;!$(N([#6X3]=[#7X2])C=[O,S])])=,:[#7X2;!$(N(=[#6X3][#7X3])C=[O,S])])] 
Imidoylhalide: [CX3R0;$([H0][#6]),$([H1])](=[NX2;$([H1]),$([H0][#6;!$(C=[O,N,S])])])[FX1,ClX1,BrX1,IX1]
Imidoylhalide_cyclic: [#6R][#6X3R](=,:[#7X2;$([H1]),$([H0][#6;!$(C=[O,N,S])])])[FX1,ClX1,BrX1,IX1]
Amidrazone: [$([$([#6X3][#6]),$([#6X3H])](=[#7X2v3])[#7X3v3][#7X3v3]),$([$([#6X3][#6]),$([#6X3H])]([#7X3v3])=[#7X2v3][#7X3v3])]
Alpha_aminoacid: [NX3,NX4+;!$([N]~[!#6]);!$([N]*~[#7,#8,#15,#16])][C][CX3](=[OX1])[OX2H,OX1-]
Alpha_hydroxyacid: [OX2H][C][CX3](=[OX1])[OX2H,OX1-]
Peptide_middle: [NX3;$([N][CX3](=[OX1])[C][NX3,NX4+])][C][CX3](=[OX1])[NX3;$([N][C][CX3](=[OX1])[NX3,OX2,OX1-])]
Peptide_C_term: [NX3;$([N][CX3](=[OX1])[C][NX3,NX4+])][C][CX3](=[OX1])[OX2H,OX1-]
Peptide_N_term: [NX3,NX4+;!$([N]~[!#6]);!$([N]*~[#7,#8,#15,#16])][C][CX3](=[OX1])[NX3;$([N][C][CX3](=[OX1])[NX3,OX2,OX1-])]
Carboxylic_orthoester: [#6][OX2][CX4;$(C[#6]),$([CH])]([OX2][#6])[OX2][#6]
Ketene: [CX3]=[CX2]=[OX1]
Ketenacetal: [#7X2,#8X3,#16X2;$(*[#6,#14])][#6X3]([#7X2,#8X3,#16X2;$(*[#6,#14])])=[#6X3]
Nitrile: [NX1]#[CX2]
Isonitrile: [CX1-]#[NX2+]
Vinylogous_carbonyl_or_carboxyl_derivative: [#6X3](=[OX1])[#6X3]=,:[#6X3][#7,#8,#16,F,Cl,Br,I]
Vinylogous_acid: [#6X3](=[OX1])[#6X3]=,:[#6X3][$([OX2H]),$([OX1-])]
Vinylogous_ester: [#6X3](=[OX1])[#6X3]=,:[#6X3][#6;!$(C=[O,N,S])]
Vinylogous_amide: [#6X3](=[OX1])[#6X3]=,:[#6X3][#7X3;$([H2]),$([H1][#6;!$(C=[O,N,S])]),$([#7]([#6;!$(C=[O,N,S])])[#6;!$(C=[O,N,S])])]
Vinylogous_halide: [#6X3](=[OX1])[#6X3]=,:[#6X3][FX1,ClX1,BrX1,IX1]
Carbonic_acid_dieester: [#6;!$(C=[O,N,S])][#8X2][#6X3](=[OX1])[#8X2][#6;!$(C=[O,N,S])]
Carbonic_acid_esterhalide: [#6;!$(C=[O,N,S])][OX2;!R][CX3](=[OX1])[OX2][FX1,ClX1,BrX1,IX1]
Carbonic_acid_monoester: [#6;!$(C=[O,N,S])][OX2;!R][CX3](=[OX1])[$([OX2H]),$([OX1-])]
Carbonic_acid_derivatives: [!#6][#6X3](=[!#6])[!#6]
Thiocarbonic_acid_dieester: [#6;!$(C=[O,N,S])][#8X2][#6X3](=[SX1])[#8X2][#6;!$(C=[O,N,S])]
Thiocarbonic_acid_esterhalide: [#6;!$(C=[O,N,S])][OX2;!R][CX3](=[SX1])[OX2][FX1,ClX1,BrX1,IX1]
Thiocarbonic_acid_monoester: [#6;!$(C=[O,N,S])][OX2;!R][CX3](=[SX1])[$([OX2H]),$([OX1-])]
Thiourea: [#7X3;!$([#7][!#6])][#6X3](=[SX1])[#7X3;!$([#7][!#6])]
Isourea: [#7X2;!$([#7][!#6])]=,:[#6X3]([#8X2&!$([#8][!#6]),OX1-])[#7X3;!$([#7][!#6])]
Isothiourea: [#7X2;!$([#7][!#6])]=,:[#6X3]([#16X2&!$([#16][!#6]),SX1-])[#7X3;!$([#7][!#6])]
Guanidine: [N;v3X3,v4X4+][CX3](=[N;v3X2,v4X3+])[N;v3X3,v4X4+]
Carbaminic_acid: [NX3]C(=[OX1])[O;X2H,X1-]
Urethan: [#7X3][#6](=[OX1])[#8X2][#6]
Biuret: [#7X3][#6](=[OX1])[#7X3][#6](=[OX1])[#7X3]
Semicarbazide: [#7X3][#7X3][#6X3]([#7X3;!$([#7][#7])])=[OX1]
Carbazide: [#7X3][#7X3][#6X3]([#7X3][#7X3])=[OX1]
Semicarbazone: [#7X2](=[#6])[#7X3][#6X3]([#7X3;!$([#7][#7])])=[OX1]
Carbazone: [#7X2](=[#6])[#7X3][#6X3]([#7X3][#7X3])=[OX1]
Thiosemicarbazide: [#7X3][#7X3][#6X3]([#7X3;!$([#7][#7])])=[SX1]
Thiocarbazide: [#7X3][#7X3][#6X3]([#7X3][#7X3])=[SX1]
Thiosemicarbazone: [#7X2](=[#6])[#7X3][#6X3]([#7X3;!$([#7][#7])])=[SX1]
Thiocarbazone: [#7X2](=[#6])[#7X3][#6X3]([#7X3][#7X3])=[SX1]
Isocyanate: [NX2]=[CX2]=[OX1]
Cyanate: [OX2][CX2]#[NX1]
Isothiocyanate: [NX2]=[CX2]=[SX1]
Thiocyanate: [SX2][CX2]#[NX1]
Carbodiimide: [NX2]=[CX2]=[NX2]
Orthocarbonic_derivatives: [CX4H0]([O,S,#7])([O,S,#7])([O,S,#7])[O,S,#7,F,Cl,Br,I]
Phenol: [OX2H][c]
1,2-Diphenol: [OX2H][c][c][OX2H]
Arylchloride: [Cl][c]
Arylfluoride: [F][c]
Arylbromide: [Br][c]
Aryliodide: [I][c]
Arylthiol: [SX2H][c]
Iminoarene: [c]=[NX2;$([H1]),$([H0][#6;!$([C]=[N,S,O])])]
Oxoarene: [c]=[OX1]
Thioarene: [c]=[SX1]
Hetero_N_basic_H: [nX3H1+0]
Hetero_N_basic_no_H: [nX3H0+0]
Hetero_N_nonbasic: [nX2,nX3+]
Hetero_O: [o]
Hetero_S: [sX2]
Heteroaromatic: [a;!c]
Nitrite: [NX2](=[OX1])[O;$([X2]),$([X1-])]
Thionitrite: [SX2][NX2]=[OX1]
Nitrate: [$([NX3](=[OX1])(=[OX1])[O;$([X2]),$([X1-])]),$([NX3+]([OX1-])(=[OX1])[O;$([X2]),$([X1-])])]
Nitro: [$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8]
Nitroso: [NX2](=[OX1])[!#7;!#8]
Azide: [NX1]~[NX2]~[NX2,NX1]
Acylazide: [CX3](=[OX1])[NX2]~[NX2]~[NX1]
Diazo: [$([#6]=[NX2+]=[NX1-]),$([#6-]-[NX2+]#[NX1])]
Diazonium: [#6][NX2+]#[NX1]
Nitrosamine: [#7;!$(N*=O)][NX2]=[OX1]
Nitrosamide: [NX2](=[OX1])N-*=O
N-Oxide: [$([#7+][OX1-]),$([#7v5]=[OX1]);!$([#7](~[O])~[O]);!$([#7]=[#7])]
Hydrazine: [NX3;$([H2]),$([H1][#6]),$([H0]([#6])[#6]);!$(NC=[O,N,S])][NX3;$([H2]),$([H1][#6]),$([H0]([#6])[#6]);!$(NC=[O,N,S])]
Hydrazone: [NX3;$([H2]),$([H1][#6]),$([H0]([#6])[#6]);!$(NC=[O,N,S])][NX2]=[#6]
Hydroxylamine: [NX3;$([H2]),$([H1][#6]),$([H0]([#6])[#6]);!$(NC=[O,N,S])][OX2;$([H1]),$(O[#6;!$(C=[N,O,S])])]
Sulfon: [$([SX4](=[OX1])(=[OX1])([#6])[#6]),$([SX4+2]([OX1-])([OX1-])([#6])[#6])]
Sulfoxide: [$([SX3](=[OX1])([#6])[#6]),$([SX3+]([OX1-])([#6])[#6])]
Sulfonium: [S+;!$([S]~[!#6]);!$([S]*~[#7,#8,#15,#16])]
Sulfuric_acid: [SX4](=[OX1])(=[OX1])([$([OX2H]),$([OX1-])])[$([OX2H]),$([OX1-])]
Sulfuric_monoester: [SX4](=[OX1])(=[OX1])([$([OX2H]),$([OX1-])])[OX2][#6;!$(C=[O,N,S])]
Sulfuric_diester: [SX4](=[OX1])(=[OX1])([OX2][#6;!$(C=[O,N,S])])[OX2][#6;!$(C=[O,N,S])]
Sulfuric_monoamide: [SX4](=[OX1])(=[OX1])([#7X3;$([H2]),$([H1][#6;!$(C=[O,N,S])]),$([#7]([#6;!$(C=
akshay

Joined: 28 Nov 2006
Posts: 10

Posted: Tue Oct 23, 2007 12:50 pm

Hey, thanks a lot.
I was hoping that there would be some kind of simple answer.
I know a lot about ML stuff,
but the biggest problem is getting the descriptors.
VCCLAB allows only 150 molecules to be processed at a time, so it seems I will have to go back to CDK etc.
Thanks for the help.
Tobias

Joined: 26 Jan 2005
Posts: 580

Posted: Tue Oct 23, 2007 9:12 pm

akshayubhat wrote:
hey thanks a lot.
i was hoping that there would be some kind of simple answer.
A) Open the DOS command line and call cxcalc.
I am not quite sure what is simpler than that.
The output will be something like:
Code:

D:\temp>cxcalc plattIndex randicIndex balabanIndex hararyindex wienerindex fusedRingcount largestringsize c6h6.smi

1       12      3.00    2.00    10.00   27      0       6
2       20      2.97    1.64    10.33   27      0       3
3       20      2.97    1.74    10.67   25      2       4
4       28      2.98    2.21    11.50   22      3       5
5       36      3.00    1.29    12.00   21      4       4
6       36      3.00    1.29    12.00   21      4       4
7       8       2.91    2.34    8.70    35      0       0
8       14      2.93    1.88    9.50    31      0       3
9       14      2.93    2.01    9.75    29      0       4
10      22      2.93    1.65    10.25   28      2       3
11      12      3.00    2.00    10.00   27      0       6
12      14      2.93    1.88    9.50    31      0       3


B) Going back to CDK does not help you if you
cannot program in Java. If you can program in Java,
it goes like this:

1) Use MolImporter
2) Load and loop through all molecules
3) Initialize the plugin (see table above)
4) Perform calculation
5) Output calculation

For each of the plugins from the large list above
you can repeat that by simply calling them and
adding more functions. For the topological descriptors it
looks like the code below, and to be honest I am not quite
sure what is simpler than that (if you know Java).
The code is not pretty, but it works and it is quick to build.

Code:

package examples;
import chemaxon.formats.*;
import chemaxon.struc.*;
import chemaxon.marvin.calculations.*;
import chemaxon.marvin.plugin.*;
import java.io.*;

public class CalcDescSimple {

   /** Defines a MolImporter object to the structure file. */
   private static MolImporter createMolImporter(String filename) {
      MolImporter mi = null;
      try{
         File f = new File(filename);
         FileInputStream fis = new FileInputStream(f);
         mi = new MolImporter(fis);

      } catch(FileNotFoundException ex) {
         System.err.println(filename+": not found");
         System.exit(1);
      } catch(MolFormatException ex) {
         System.err.println(filename+": "+ex.getMessage());
         System.exit(1);
      } catch(Exception ex) {
         System.err.println("Error: "+filename+" is not a structure file.");
         System.exit(1);
      }
      return mi;
   }
   /** counts molecules from a structure file. */
   private static  long countMolecules(String filename) throws PluginException, MolFormatException, IOException
   {
      MolImporter mi = createMolImporter(filename);
      long globalmolcounter = 0;
      while (( mi.read()) != null) {
         globalmolcounter++;
      }
      mi.close();
      return globalmolcounter;
   }

   public static  void main(String[] args) throws PluginException, MolFormatException, IOException {

      String    filename = "d:/temp/c6h6.smi";
      System.out.println("Number of molecules in " + filename+ ": "+ countMolecules(filename));
      MolImporter mi = createMolImporter(filename);
      TopologyAnalyserPlugin topologyplugin = new TopologyAnalyserPlugin();

      // for each input molecule run the calculation and display the results
      // (molcounter is never incremented below, so the first output column is always 0)
      Molecule target = null; long molcounter = 0; long totalerrors = 0;
      while ((target = mi.read()) != null) {

         // set the input molecule
         topologyplugin.setMolecule(target);
         try {

            // run the calculation
            topologyplugin.run();

            //conversion double to string - if you want calculations with doubles use tempXXX
            //loss of precision possible 12-decimals
            java.text.DecimalFormat df12 = new java.text.DecimalFormat("0.000000000000");

            // maybe prettier to put them in array or LIST ?
            int count = target.getAtomCount();
            int aliphaticatomCount = topologyplugin.getAliphaticAtomCount();
            int aliphaticbondcount = topologyplugin.getAliphaticBondCount();
            int aliphaticringcount = topologyplugin.getAliphaticRingCount();
            int aromaticatomcount = topologyplugin.getAromaticAtomCount();
            int aromaticbondcount = topologyplugin.getAromaticBondCount();
            int aromaticringcount = topologyplugin.getAromaticRingCount();
            int asymmetricatomcount = topologyplugin.getAsymmetricAtomCount();
            double tempbalabanindex = topologyplugin.getBalabanIndex();
            String balabanindex = df12.format(tempbalabanindex);
            int bondcount = topologyplugin.getBondCount();
            int carboaromaticringcount = topologyplugin.getCarboaromaticRingCount();
            int carboringcount = topologyplugin.getCarboRingCount();
            int chainatomcount = topologyplugin.getChainAtomCount();
            int chainbondcount = topologyplugin.getChainBondCount();
            int chiralcentercount = topologyplugin.getChiralCenterCount();
            boolean tempconnectedGraph =  topologyplugin.isConnectedGraph();
            int connectedGraph= tempconnectedGraph?1:0;
            int cyclomaticNumber = topologyplugin.getCyclomaticNumber();
            int fusedaliphaticringcount = topologyplugin.getFusedAliphaticRingCount();
            int fusedaromaticringcount = topologyplugin.getFusedAromaticRingCount();
            int fusedringcount = topologyplugin.getFusedRingCount();
            double temphararyIndex = topologyplugin.getHararyIndex();
            String hararyIndex = df12.format(temphararyIndex);
            int heteroaromaticringcount = topologyplugin.getHeteroaromaticRingCount();
            int heteroringcount = topologyplugin.getHeteroRingCount();
            int hyperWienerIndex = topologyplugin.getHyperWienerIndex();
            int largestringsize = topologyplugin.getLargestRingSize();
            int plattIndex = topologyplugin.getPlattIndex();
            double temprandicIndex = topologyplugin.getRandicIndex();
            String randicIndex = df12.format(temprandicIndex);
            int ringatomcount = topologyplugin.getRingAtomCount();
            int ringbondcount = topologyplugin.getRingBondCount();
            int ringcount = topologyplugin.getRingCount();
            int rotatablebondcount = topologyplugin.getRotatableBondCount();
            int smallestringsize = topologyplugin.getSmallestRingSize();
            int szegedIndex = topologyplugin.getSzegedIndex();
            int wienerIndex = topologyplugin.getWienerIndex();
            int wienerPolarity = topologyplugin.getWienerPolarity();

            //*******************************************************************

            String TopologyResult = molcounter + "\t"+count+"\t" + aliphaticatomCount + "\t" + aliphaticbondcount + "\t" + aliphaticringcount + "\t";
            TopologyResult = TopologyResult + aromaticatomcount + "\t" +aromaticbondcount + "\t" + aromaticringcount + "\t";
            TopologyResult = TopologyResult + asymmetricatomcount + "\t" +balabanindex+ "\t"+bondcount+ "\t";
            TopologyResult = TopologyResult + carboaromaticringcount + "\t" +carboringcount + "\t" +chainatomcount + "\t" + chainbondcount + "\t";
            TopologyResult = TopologyResult + chiralcentercount +"\t" + connectedGraph + "\t" + cyclomaticNumber+ "\t";
            TopologyResult = TopologyResult + fusedaliphaticringcount +"\t" + fusedaromaticringcount +"\t" + fusedringcount +"\t" ;
            TopologyResult = TopologyResult + hararyIndex+"\t" +heteroaromaticringcount+"\t" +heteroringcount+"\t"+hyperWienerIndex+"\t" ;
            TopologyResult = TopologyResult + largestringsize +"\t" +plattIndex+"\t" +randicIndex+"\t";
            TopologyResult = TopologyResult + ringatomcount+"\t"+ringbondcount+"\t"+ringcount +"\t";
            TopologyResult = TopologyResult + rotatablebondcount+"\t"+smallestringsize +"\t"+szegedIndex+"\t";
            TopologyResult = TopologyResult + wienerIndex  +"\t"+ wienerPolarity  +"\t";

            System.out.println();
            System.out.print(TopologyResult);

         } //this is for plugin-errors
         catch (Exception e)
         {
            System.out.println ("Error - " + e );
            totalerrors++;
         }
      }
      System.out.println();
      System.out.println("Number of errors:"+totalerrors);
      mi.close();
   }
}



The output is something like:
Code:

Number of molecules in d:/temp/c6h6.smi: 217
                                    
0  6  0  0  0  6  6  1  0  2.000000000000  12  1  1  0  0  0  1  1  0  0  0  10.00000000000  0  0  42  6  12  3.000000000000  6  6  1  0  6  54  27  3 
0  6  6  7  2  0  0  0  0  1.641897173182  13  0  2  0  1  0  1  2  0  0  0  10.33333333333  0  0  43  3  20  2.966326495189  6  6  2  1  3  27  27  4 
0  6  6  7  2  0  0  0  0  1.738063991517  13  0  2  0  0  2  1  2  2  0  2  10.66666666666  0  0  37  4  20  2.966326495189  6  7  2  0  4  59  25  2 
0  6  6  8  3  0  0  0  0  2.213093912396  14  0  3  0  0  4  1  3  3  0  3  11.50000000000  0  0  29  5  28  2.983163247594  6  8  3  0  3  33  22  0 
0  6  6  9  4  0  0  0  0  1.285714285714  15  0  4  0  0  6  1  4  4  0  4  12.00000000000  0  0  27  4  36  3.000000000000  6  9  4  0  3  51  21  0 
0  6  6  9  4  0  0  0  0  1.285714285714  15  0  4  0  0  6  1  4  4  0  4  12.00000000000  0  0  27  4  36  3.000000000000  6  9  4  0  4  81  21  0 
0  6  6  5  0  0  0  0  0  2.339092314976  11  0  0  6  5  0  1  0  0  0  0  8.700000000000  0  0  70  0  8   2.914213562373  0  0  0  2  0  35  35  3 
0  6  6  6  1  0  0  0  0  1.876285894838  12  0  1  3  3  0  1  1  0  0  0  9.500000000000  0  0  56  3  14  2.931851652578  3  3  1  2  3  31  31  3 
0  6  6  6  1  0  0  0  1  2.014266206296  12  0  1  2  2  1  1  1  0  0  0  9.750000000000  0  0  49  4  14  2.931851652578  4  4  1  1  4  45  29  3 
0  6  6  7  2  0  0  0  2  1.647800297284  13  0  2  2  2  2  1  2  2  0  2  10.25000000000  0  0  47  3  22  2.931851652578  4  5  2  1  3  34  28  3 
0  6  6  6  1  0  0  0  0  2.000000000000  12  0  1  0  0  0  1  1  0  0  0  10.00000000000  0  0  42  6  12  3.000000000000  6  6  1  0  6  54  27  3 
0  6  6  6  1  0  0  0  0  1.876285894838  12  0  1  3  3  0  1  1  0  0  0  9.500000000000  0  0  56  3  14  2.931851652578  3  3  1  1  3  31  31  3 
0  6  6  6  1  0  0  0  0  2.014266206296  12  0  1  2  2  0  1  1  0  0  0  9.750000000000  0  0  49  4  14  2.931851652578  4  4  1  1  4  45  29  3 
0  6  6  7  2  0  0  0  2  1.795593921009  13  0  2  0  0  2  1  2  2  0  2  10.83333333333  0  0  34  5  20  2.966326495189  6  7  2  0  3  34  24  1 
0  6  6  6  1  0  0  0  0  2.184105569636  12  0  1  1  1  0  1  1  0  0  0  10.16666666666  0  0  39  5  14  2.893846850117  5  5  1  0  5  33  26  2 
0  6  6  6  1  0  0  0  0  2.014266206296  12  0  1  2  2  0  1  1  0  0  0  9.750000000000  0  0  49  4  14  2.931851652578  4  4  1  1  4  45  29  3 
0  6  6  6  1  0  0  0  0  1.876285894838  12  0  1  3  3  0  1  1  0  0  0  9.500000000000  0  0  56  3  14  2.931851652578  3  3  1  1  3  31  31  3 
0  6  6  7  2  0  0  0  0  1.641897173182  13  0  2  0  1  0  1  2  0  0  0  10.33333333333  0  0  43  3  20  2.966326495189  6  6  2  1  3  27  27  4 


...snip


I attached the three files; the example SMILES of all
C6H6 isomers were calculated using the CDK.

Given that you only need 5 lines of code with the JChem API
to actually perform the calculation,
I think it's quite simple. It's actually a no-brainer: you just keep
adding routines. The only thing that would be nice to have is
a parser that loops through all the XML properties
and then automatically adds each new descriptor
and a calculation line to the Java code. But this would require
some serious programming and I am just too lazy for that,
or let's say it goes beyond my programming knowledge.
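
The parser idea above can be approximated without touching any XML, assuming (as in the topology example) that a plugin exposes each result through a public no-argument getter. What follows is only a sketch under that assumption: it uses plain java.lang.reflect, the DemoPlugin class is a made-up stand-in and not a ChemAxon class, and it has not been run against the JChem API.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

public class GetterDump {

    /** Hypothetical stand-in for a calculator plugin; NOT a ChemAxon class. */
    public static class DemoPlugin {
        public int getRingCount() { return 1; }
        public double getBalabanIndex() { return 2.0; }
        public boolean isConnectedGraph() { return true; }
        public String getName() { return "demo"; }     // skipped: not a primitive result
        public int getAtomCount(int z) { return z; }   // skipped: takes a parameter
    }

    /** Calls every public no-arg get/is method that returns int, double or boolean. */
    public static Map<String, Object> dumpGetters(Object target) {
        Map<String, Object> values = new TreeMap<>();  // sorted by method name
        for (Method m : target.getClass().getMethods()) {
            String name = m.getName();
            Class<?> type = m.getReturnType();
            boolean getter = name.startsWith("get") || name.startsWith("is");
            boolean simple = type == int.class || type == double.class || type == boolean.class;
            if (getter && simple && m.getParameterCount() == 0) {
                try {
                    values.put(name, m.invoke(target));
                } catch (ReflectiveOperationException ex) {
                    throw new IllegalStateException("Could not call " + name, ex);
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // One tab-separated "name value" line per discovered getter.
        for (Map.Entry<String, Object> e : dumpGetters(new DemoPlugin()).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

With a real plugin you would call dumpGetters after run() for each molecule. Expect to filter the result, since not every int-returning getter on a real class is necessarily a descriptor.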

Kind regards
Tobias
Miklos
ChemAxon personnel
Joined: 21 May 2004
Posts: 1194

Posted: Wed Oct 24, 2007 9:22 am

Hi Tobias, great man, thank you for all the useful suggestions, detailed explanations, and the source code you provided.

Btw: have you received a ChemAxon User Forum t-shirt yet?

Best regards,
Miklos
akshay

Joined: 28 Nov 2006
Posts: 10

Posted: Wed Oct 24, 2007 9:53 am

THANKS TOBIAS
FOR ALL THE CODE AND HELP.
I WILL TRY IT.
THANKS AGAIN
Miklos
ChemAxon personnel
Joined: 21 May 2004
Posts: 1194

Posted: Wed Oct 24, 2007 10:26 am

Quote:

1. i have academic license for jchem and generatemd
2. i have a windows xp desktop pc
3. i have a file conitaing 50,000~ molecules represented as smiles
4. i wish to compute as many possible descriptors as i can for qspr/qsar
Hi,

As an academic user you are entitled to use all JChem tools without any size or other kind of limitation. Your ~50K compounds can be processed without any problems (i.e. practical memory or time limits will not be reached).

You can use generatemd to calculate complex descriptors like fingerprints. You can even incorporate your own descriptors, in which case, however, you need to write some Java code.

As Tobias mentioned, cxcalc can also be quite useful and relevant for your project. That program can calculate a large number of physico-chemical properties as well as topological and geometrical descriptors, and it writes results to standard text files that are easy to process further. For a detailed list of available properties you may wish to follow this link:
http://www.chemaxon.com/marvin/chemaxon/marvin/help/calculator-plugins.html

Regards,
Miklos
akshay

Joined: 28 Nov 2006
Posts: 10

Posted: Tue Nov 06, 2007 11:17 am  Post subject: finally done!

Finally I could calculate the chemical fingerprints using
generatemd c aids.sdf -k CF -o descp.txt
However, descp.txt contains 34 integers.
What do these integers represent? Are these binary fingerprints?
If yes, how can I get 1/0 values?
Miklos
ChemAxon personnel
Joined: 21 May 2004
Posts: 1194

Posted: Tue Nov 06, 2007 1:30 pm

Hi,

great!
What you got in the output file is a binary fingerprint in decimal text representation. Each consecutive 32 bits of the binary fingerprint are interpreted as one integer value, and that value is printed in decimal format as readable text. This is a compact representation, much shorter than 0,1 text. If you insist on using 0,1 text, add the -2 flag to the generatemd command line (see the command line help, generatemd -x). The user's guide may also be useful: http://www.chemaxon.com/jchem/doc/user/GenerateMD.html
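
To make the packing concrete, here is a small sketch of how a line of such decimal integers could be expanded back into 0,1 text outside of generatemd. The bit order within each integer (most significant bit first) and the class and method names are assumptions for illustration only; check a few molecules against the -2 output before relying on it.

```java
public class FingerprintBits {

    /** Expands whitespace-separated decimal integers into a 0/1 string, 32 bits per integer. */
    public static String toBitString(String decimalLine) {
        String trimmed = decimalLine.trim();
        StringBuilder bits = new StringBuilder();
        if (trimmed.isEmpty()) {
            return "";
        }
        for (String token : trimmed.split("\\s+")) {
            // Parse as long and narrow to int, so signed and unsigned 32-bit text both work.
            int word = (int) Long.parseLong(token);
            // Assumption: most significant bit is written first.
            for (int i = 31; i >= 0; i--) {
                bits.append((word >>> i) & 1);
            }
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        // Two integers give 64 characters of 0/1 text.
        System.out.println(toBitString("5 -1"));
    }
}
```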

I still do not understand your real goal, but in most cases the binary text format is neither needed nor particularly useful. For any kind of calculation the integers are just fine; you can compare them directly using Tanimoto etc.
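
As an illustration of comparing the integers directly, the sketch below computes the Tanimoto coefficient (common bits divided by bits set in either fingerprint) on two int arrays with bitwise operations. It assumes the decimal values of each molecule have already been read into an int[]; the class name and the choice to return 1.0 for two all-zero fingerprints are assumptions, not JChem behaviour.

```java
public class TanimotoInts {

    /** Tanimoto coefficient of two bit fingerprints stored as 32-bit words. */
    public static double tanimoto(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Fingerprints must have the same length");
        }
        int common = 0;   // bits set in both fingerprints
        int either = 0;   // bits set in at least one fingerprint
        for (int i = 0; i < a.length; i++) {
            common += Integer.bitCount(a[i] & b[i]);
            either += Integer.bitCount(a[i] | b[i]);
        }
        // Assumption: two empty fingerprints are treated as identical.
        return either == 0 ? 1.0 : (double) common / either;
    }

    public static void main(String[] args) {
        int[] x = {12};   // binary 1100
        int[] y = {10};   // binary 1010
        System.out.println(tanimoto(x, y));
    }
}
```

The same idea carries over to MATLAB with bitand, bitor and a bit count over each pair of words.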

Does this help at all?
regards,
Miklos
akshay

Joined: 28 Nov 2006
Posts: 10

Posted: Tue Nov 06, 2007 5:10 pm

I want to use those fingerprints as descriptors.
I also want to load them into MATLAB for further calculations (Tanimoto etc.), which is why I am using the text format: I can load the delimited file into MATLAB.
However, I would like to know whether it is possible to calculate a similarity matrix. There are 4773 molecules (Bursi mutagenicity) and I want a 4773 x 4773 matrix of Tanimoto similarity values. Is that possible with screenmd?
Thanks
Tobias

Joined: 26 Jan 2005
Posts: 580

Posted: Wed Nov 07, 2007 5:25 am

Hi,
besides using generateMD, generFP and all the other tools,
you can again use the Evaluator very easily.

Assume you have the SMARTS you want to use, for example from
Derivation and Validation of Toxicophores for Mutagenicity Prediction
(Jeroen Kazius, Ross McGuire, and Roberta Bursi;
J. Med. Chem. 2005, 48(1), 312-320; DOI: 10.1021/jm040835a),
or any other SMARTS, like those from
Performance of Kier-Hall E-state Descriptors in Quantitative
Structure Activity Relationship (QSAR) Studies of
Multifunctional Molecules
(Darko Butina; Molecules 2004, 9, 1004-1009):

Code:

RowNo smarts-definitions estates-atom-types-Kier-Hall
1 [OH1][*] sOH
2 O=[*] dO
3 [OH0]([*])[*] ssO
4 [o] aaO
5 [NH2][*] sNH2
6 [NH1]=[*] dNH
7 [NH1]([*])[*] ssNH
8 [nH1] aaNH
9 N#[*] tN
10 [ND2](=[*])[*] dsN
11 [nH0] aaN
12 N([*])([*])[*] sssN
13 N(=[*])(=[*])[*] ddsN
14 [N;+]([*])([*])([*])[*] ssssN+
15 [SH1][*] sSH
16 S=[*] dS
17 [SX2]([*])[*] ssS
18 [s] aaS
19 S(=[*])(=[*])([*])[*] ddssS
20 [F][*] sF
21 [Cl][*] sCl
22 [Br][*] sBr
23 [I][*] sI
24 [CH3][*] sCH3
25 [CH2]([*])[*] ssCH2
26 [CH2]=[*] dCH2
27 [CH1]([*])([*])[*] sssCH1
28 [CH1](=[*])[*] dsCH1
29 [CH1]#[*] tCH
30 [cH] aaCH
31 [cH0] aasC
32 C(=[*])=[*] ddC
33 C(#[*])[*] tsC
34 C(=[*])([*])[*] dssC
35 C([*])([*])([*])[*] ssssC


What you do is create an evaluator expression file:


Code:

array(
matchCount("[OH1][*]"),
matchcount("O=[*]"),
matchcount("[OH0]([*])[*]"),
matchcount("[o]"),
matchcount("[NH2][*]"),
matchcount("[NH1]=[*]"),
matchcount("[NH1]([*])[*]"),
matchcount("[nH1]"),
matchcount("N#[*]"),
matchcount("[ND2](=[*])[*]"),
matchcount("[nH0]"),
matchcount("N([*])([*])[*]"),
matchcount("N(=[*])(=[*])[*]"),
matchcount("[N;+]([*])([*])([*])[*]"),
matchcount("[SH1][*]"),
matchcount("S=[*]"),
matchcount("[SX2]([*])[*]"),
matchcount("[s]"),
matchcount("S(=[*])(=[*])([*])[*]"),
matchcount("[F][*]"),
matchcount("[Cl][*]"),
matchcount("[Br][*]"),
matchcount("[I][*]"),
matchcount("[CH3][*]"),
matchcount("[CH2]([*])[*]"),
matchcount("[CH2]=[*]"),
matchcount("[CH1]([*])([*])[*]"),
matchcount("[CH1](=[*])[*]"),
matchcount("[CH1]#[*]"),
matchcount("[cH]"),
matchcount("[cH0]"),
matchcount("C(=[*])=[*]"),
matchcount("C(#[*])[*]"),
matchcount("C(=[*])([*])[*]"),
matchcount("C([*])([*])([*])[*]"))
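
Typing such a file by hand gets tedious for longer SMARTS lists, so here is a hedged sketch that builds the same kind of array(...) text from a list of SMARTS strings. It only does string assembly; the class and method names are made up for illustration, the matchCount spelling follows the first line of the file above, and no escaping of quotes or backslashes inside a SMARTS is attempted.

```java
import java.util.Arrays;
import java.util.List;

public class EvaluatorFileBuilder {

    /** Wraps each SMARTS in matchCount("...") and joins the calls inside array(...). */
    public static String buildArrayExpression(List<String> smartsList) {
        if (smartsList.isEmpty()) {
            throw new IllegalArgumentException("At least one SMARTS is required");
        }
        StringBuilder sb = new StringBuilder("array(\n");
        for (int i = 0; i < smartsList.size(); i++) {
            sb.append("matchCount(\"").append(smartsList.get(i)).append("\")");
            // Comma and newline between entries, closing parenthesis after the last one.
            sb.append(i < smartsList.size() - 1 ? ",\n" : ")");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // First three SMARTS from the Kier-Hall table above.
        List<String> smarts = Arrays.asList("[OH1][*]", "O=[*]", "[OH0]([*])[*]");
        System.out.println(buildArrayExpression(smarts));
    }
}
```

Redirect the printed text into a file and pass that file to the evaluator in place of the hand-written one.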


and you call it with the evaluator like this (but beware: this does not work
for logP, because it is a real number and the array function
is only defined for integers):

evaluate -f SMARTS-kier-hall-QSAR.txt NCI2000.smi >kier-hall-smarts-out.txt

The output is a nice matrix for any tool like Statistica or WEKA.

Code:

0;2;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;1;0;0;0;3;0;0;0;0;0;3;0
0;0;0;0;0;0;0;0;0;0;2;0;0;0;0;0;2;2;0;0;0;0;0;0;0;0;0;0;0;8;6;0;0;0;0
1;2;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;2;4;0;0;0;0
0;1;0;0;0;1;0;1;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;0;0;0;1;2;0;0;0;0
0;2;0;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;7;5;0;0;2;0
2;2;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;2;0;0;0;0;0;0;0;8;11;0;0;1;0
0;2;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;1;0;0;2;0;0;0;0;0;4;2;0;0;4;0
0;3;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;6;6;0;0;2;0
2;0;0;0;0;0;0;0;0;2;0;0;0;0;0;0;0;0;0;0;0;0;0;2;0;0;0;0;0;0;0;0;0;2;0
0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;15;3;0;0;0;0
2;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;6;0;0;0;0;0;2;4;0;0;0;2
0;1;0;0;0;0;0;0;0;1;0;1;0;0;0;0;0;0;0;0;0;0;0;1;1;0;0;0;0;5;1;0;0;2;0
0;0;0;0;1;0;0;0;0;0;1;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;5;4;0;0;0;0
...snip


Tobias
akshay

Joined: 28 Nov 2006
Posts: 10

Posted: Wed Nov 07, 2007 10:50 am

Thanks Tobias.

To calculate the similarity matrix I tried the following command using screenmd.
csa.sdf and csa2.sdf are the same file, containing the same molecules.
I used
screenmd csa.sdf csa2.sdf -g -k CF -M Tanimoto -o output.txt
It seems to work.
Actually, the calculation is still running while I write this post.
Miklos
ChemAxon personnel
Joined: 21 May 2004
Posts: 1194

Posted: Wed Nov 07, 2007 11:07 am

Indeed, you can calculate the similarity matrix using screenmd; just make sure that the dissimilarity threshold is 1 for Tanimoto (or a very large number when using the Euclidean metric).

Regarding the use of the chemical fingerprint as a descriptor: it is possible to use the decimal values for further analysis, e.g. in MATLAB; there is no need to use the binary 0,1 text format. However, if you would like to perform any kind of dimension reduction, then the binary form must be used.

Does this help?

Regards,
Miklos
akshay

Joined: 28 Nov 2006
Posts: 10

Posted: Wed Nov 07, 2007 5:58 pm

Thanks Miklos,
I got the similarity matrix.
Could you tell me how the dissimilarity threshold affects the whole procedure, and how I can set it on the command line?
Also, I want similarity values between 0 and 1, where 1 indicates the most similar or an identical molecule.
Thanks
akshay

Joined: 28 Nov 2006
Posts: 10

Posted: Sat Dec 08, 2007 11:14 pm

When I calculated the similarity matrix using the above procedure, all diagonal elements came out as 0, while they should have been 1!
Please help.
Thanks
Miklos
ChemAxon personnel
Joined: 21 May 2004
Posts: 1194

Posted: Mon Dec 10, 2007 10:17 am

This is because you calculate the dissimilarity... The dissimilarity is often preferred over similarity as there are many common metrics (e.g. Euclidean) that aren't similarity but distance metrics and thus aren't upper bounded.
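
If similarity values are what you need, they can be derived afterwards. The sketch below assumes the reported Tanimoto dissimilarity is simply 1 minus the Tanimoto similarity (which matches the 0 on the diagonal) and that the matrix has already been loaded into a double[][]; both points, and the class name, are assumptions. The conversion does not apply to unbounded metrics such as Euclidean distance.

```java
public class DissimilarityToSimilarity {

    /** Returns a new matrix with similarity = 1 - dissimilarity for every element. */
    public static double[][] toSimilarity(double[][] dissimilarity) {
        double[][] similarity = new double[dissimilarity.length][];
        for (int i = 0; i < dissimilarity.length; i++) {
            similarity[i] = new double[dissimilarity[i].length];
            for (int j = 0; j < dissimilarity[i].length; j++) {
                similarity[i][j] = 1.0 - dissimilarity[i][j];
            }
        }
        return similarity;
    }

    public static void main(String[] args) {
        // Toy 2 x 2 dissimilarity matrix with zeros on the diagonal.
        double[][] d = {{0.0, 0.25}, {0.25, 0.0}};
        double[][] s = toSimilarity(d);
        System.out.println(s[0][0] + " " + s[0][1]);
    }
}
```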

Hope this helps.

Regards,
Miklos