Standardizer Configuration
Version 5.9.4
Introduction
Standardizer brings molecules to a standardized form by applying standardization actions to them. These actions can be simple tasks such as converting explicit H atoms to implicit form or vice versa, aromatization or dearomatization, keeping only one fragment of a salt molecule, clearing stereo data, setting or removing the absolute stereo (chiral) flag, and recalculating atom coordinates (clean). The other action type is defined by a reaction equation given in a reaction molecule file or as a SMARTS string. Such reactions specify transformations of functional groups (e.g. transforming nitro groups) and are applied to all matching functional groups in the input molecules.
Standardizer actions are specified in a configuration XML. Alternatively, actions can be defined in a simple action string; this is practical when the simple actions mostly use their default options and the reaction actions can be given in SMARTS form.
Configuration XML
The standardization transformations performed by the Standardizer are
determined by the configuration file, which is specified after the mandatory
--config command line parameter. Transformations can be given as:
- reactions mapping one functional group into another
- reactions removing a functional group (e.g. counter ion removal)
- fragment removal actions ("keep largest fragment" or "remove smallest fragment")
- tautomerize or mesomerize actions
- clean action (2D clean, "always" or "only if needed")
- clear stereo action (clear chirality and/or double bond stereo data)
- set/clear absolute stereo action
- specific actions (e.g. aromatize)
The configuration file is an XML file.
The transformations are given in subelements of the
<Actions> subsection, identified with an
ID attribute and processed in the order they are given
in the configuration XML:
- <Aromatize> subsections: perform aromatization. Can also be referenced from the simple action string. In most cases this task is essential and is best placed first; see the Notes on task ordering and aromatization.
  This section can optionally have a Type attribute, which sets the aromatization type (method). Available aromatization types are:
  - "general": general (Daylight) type aromatization (default)
  - "basic": basic (ChemAxon) type aromatization
  - "loose": loose type aromatization
  If the Type attribute is not specified, general aromatization is performed.
  The corresponding action strings are:
  - "aromatize": general (Daylight) type aromatization
  - "aromatize:b": basic (ChemAxon) type aromatization
  - "aromatize:l": loose type aromatization
  Refer to the aromaticity documentation for a detailed description of the methods.
  Note that aromatization does not change existing aromatic bonds, so to rearomatize an incorrectly aromatized molecule, place a Dearomatize action before Aromatize.
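As a minimal sketch of the rearomatization advice above, the following configuration fragment places a Dearomatize action before an Aromatize action. The element and attribute names are the ones described on this page; the ID values are arbitrary labels chosen for illustration.

```xml
<StandardizerConfiguration Version="0.1">
  <Actions>
    <!-- Drop any existing (possibly incorrect) aromatic bonds first -->
    <Dearomatize ID="dearomatize"/>
    <!-- Then aromatize; Type may be "general" (default), "basic" or "loose" -->
    <Aromatize ID="rearomatize" Type="basic"/>
  </Actions>
</StandardizerConfiguration>
```

Based on the action string syntax described in the Action string section, the same sequence would presumably be written as dearomatize..aromatize:b.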
- <Dearomatize> subsections: perform dearomatization. Can also be referenced from the simple action string.
- <AddExplicitH> (formerly <Hydrogenize>) subsections: transform implicit H atoms to explicit hydrogens. Can also be referenced from the simple action string.
- <RemoveExplicitH> (formerly <ImplH> or <Dehydrogenize>) subsections: transform explicit H atoms to implicit hydrogens. These elements enable the user to fine-tune explicit hydrogen removal: specific attributes specify which hydrogen atoms should be removed. By default, only bound, non-isotope, neutral, non-radical, non-mapped hydrogen atoms are removed. The following attributes can be set to "true" to remove specific hydrogen types:
  - Lonely
  - Isotope
  - Charged
  - Radical
  - Mapped
  - Wedged
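The attribute-based fine tuning of explicit hydrogen removal could look like the sketch below. The attribute names are taken from the list above; the ID values are illustrative only, and the two elements are alternatives rather than a recommended sequence.

```xml
<StandardizerConfiguration Version="0.1">
  <Actions>
    <!-- Default behaviour: removes only bound, non-isotope, neutral,
         non-radical, non-mapped hydrogens -->
    <RemoveExplicitH ID="removeHDefault"/>
    <!-- Alternative: additionally remove lonely and isotope hydrogens -->
    <RemoveExplicitH ID="removeHExtended" Lonely="true" Isotope="true"/>
  </Actions>
</StandardizerConfiguration>
```

According to the action string list later on this page, the second variant corresponds to "removeexplicitH:lonely:isotope".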
- <ClearIsotopes> subsections: convert isotopes to non-isotopic form. Can also be referenced from the simple action string as "clearisotopes".
- <Neutralize> subsections: neutralize molecules (convert charged atoms to uncharged ones where this does not cause valence errors). Can also be referenced from the simple action string as "neutralize".
- <Transformation> subsections: these elements specify the reaction based functional group transformations and removals. The reaction is specified in the Structure attribute either as a SMILES string or as a molecule file path. An optional Type attribute can be added to specify whether the structure is given as a string (Type="string") or as a file path (Type="path"). If the Type attribute is omitted, the structure type is decided automatically based on its format, which gives the correct result in most cases.
  For a description of reaction mapping, see the Reaction mapping section of the Reactor Manual.
  Removal of a functional group is given by a reaction with an empty product side.
  An additional optional Exact attribute can be set to "true" to specify full matching on molecule fragments. This means that only isolated molecule fragments fully matching the given functional group are transformed. A typical application of this feature together with functional group removal is counter ion removal: the "[Cl-]>>" transformation with the Exact attribute set to "true" removes the "Cl-" counter ions. Another application is solvent removal. With functional group removal in exact mode, the molecule may contain only fragments that would be removed by the transformation, so the molecule would become empty. This is prevented by an additional rule: if all fragments would be removed in a single exact transformation, one of the fragments is kept. This is useful, for example, when we want to remove benzene as a solvent but keep benzene itself when it is the only component. The default value of the Exact attribute is "false".
- <Reaction> subsections: same as the <Transformation> subsections above, used mainly in earlier configurations and kept for backward compatibility.
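The Structure, Type and Exact attributes of <Transformation> described above can be combined as in the following sketch. The reaction strings, the file path and the IDs are the ones used in the Example section of this page; the explicit Type attributes are added here only to illustrate the option and could be omitted.

```xml
<StandardizerConfiguration Version="0.1">
  <Actions>
    <!-- Functional group transformation given as a reaction string -->
    <Transformation ID="PlusMinus" Type="string"
                    Structure="[*+:1][*-:2]>>[*:1]=[*:2]"/>
    <!-- The same kind of transformation read from a molecule file -->
    <Transformation ID="PlusMinusDouble" Type="path"
                    Structure="molfiles/PlusMinusDouble.mol"/>
    <!-- Removal (empty product side) in exact mode: only isolated Cl-
         fragments are removed, bound chlorine atoms are left untouched -->
    <Transformation ID="ClMinus" Structure="[Cl-]>>" Exact="true"/>
  </Actions>
</StandardizerConfiguration>
```

With Exact="true", an input consisting solely of chloride fragments would keep one of them, following the "one fragment is kept" rule described above.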
- <Removal> subsections: these elements specify fragment removal actions.
  The method to be applied is set in the Method attribute. There are four such methods:
  - keep the largest fragment, remove all others (Method="keepLargest")
  - remove the smallest fragment if there is more than one fragment (Method="removeSmallest")
  - keep the smallest fragment, remove all others (Method="keepSmallest")
  - remove the largest fragment if there is more than one fragment (Method="removeLargest")
  The default method is "keepLargest".
  The measure that determines the fragment size is specified in the Measure attribute:
  - the number of atoms (Measure="atomCount")
  - the molecule mass (Measure="molMass")
  The default measure is "atomCount".
  Note:
  - the "keepone" action in the simple action string corresponds to the default behavior (Method="keepLargest" Measure="atomCount")
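To make the Method and Measure combinations of <Removal> concrete, here is a short sketch with three alternative settings. The attribute values come from the lists above; the IDs are hypothetical, and a real configuration would normally contain only one of these elements.

```xml
<StandardizerConfiguration Version="0.1">
  <Actions>
    <!-- Same as the default: keep the fragment with the most atoms -->
    <Removal ID="keepLargestByAtoms" Method="keepLargest" Measure="atomCount"/>
    <!-- Alternative: keep the heaviest fragment instead -->
    <Removal ID="keepLargestByMass" Method="keepLargest" Measure="molMass"/>
    <!-- Alternative: remove only the smallest fragment when more than one is present -->
    <Removal ID="dropSmallest" Method="removeSmallest"/>
  </Actions>
</StandardizerConfiguration>
```

The two measures can give different results, for example when a small fragment contains a heavy atom.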
- <RemoveRGroupDefinitions> subsections: these elements enable the user to remove R-group definitions (keep the root structure). Can also be referenced from the simple action string as "removergroupdefinitions".
- <RemoveAttachedData> subsections: these elements enable the user to remove attached data from molecules. Can also be referenced from the simple action string as "removeattacheddata".
- <RemoveAtomValues> subsections: these elements enable the user to remove atom values (also called extra atom labels). Can also be referenced from the simple action string as "removeatomvalues".
- <Sgroups> subsections: these elements enable the user to convert S-groups. The value of the Act attribute specifies the conversion:
  - "Contract": contract S-groups (use abbreviated notation)
  - "Expand": expand S-groups (use atoms and bonds notation)
  - "Ungroup": ungroup S-groups (replace the S-group by atoms and bonds)
  The action will not be performed on S-groups listed in the Exclude attribute. In the Exclude attribute, the names of the S-groups should be listed, separated by commas (e.g. Exclude="Ph,Ac").
  Note:
  - in the simple action string "sgroups:contract", "sgroups:expand", and "sgroups:ungroup" refer to these actions
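A brief sketch of the Act and Exclude attributes of <Sgroups> follows. The attribute values are those listed above; the IDs are illustrative, and the two elements are shown as independent alternatives.

```xml
<StandardizerConfiguration Version="0.1">
  <Actions>
    <!-- Contract every S-group to its abbreviated notation -->
    <Sgroups ID="contractAll" Act="Contract"/>
    <!-- Alternative: expand S-groups, but leave those named Ph and Ac as they are -->
    <Sgroups ID="expandExceptPhAc" Act="Expand" Exclude="Ph,Ac"/>
  </Actions>
</StandardizerConfiguration>
```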
- <AliasToGroup> subsections: these elements enable the user to convert pseudo and alias atoms to S-groups. Can also be referenced from the simple action string as "aliastogroup".
- <ClearStereo> subsections: these elements enable the user to clear stereo data. By default, both chirality information and double bond stereo data are cleared. The Type attribute can restrict the action to one of these stereo types:
  - "Chirality": only chirality is cleared
  - "DoubleBond": only double bond stereo is cleared
  - "SingleUpOrDownBond": only single UP or DOWN bonds (also known as wiggly bonds) connected to chiral centers are cleared
  Note:
  - the "clearstereo" action in the simple action string corresponds to the default behavior, while "clearstereo:chirality", "clearstereo:doublebond" and "clearstereo:singleupordownbond" specify that only chirality, double bond stereo, or single UP or DOWN bonds connected to chiral centers are cleared, respectively.
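The default and the restricted forms of <ClearStereo> might be written as in this sketch. The Type value is one of those listed above; the IDs are hypothetical, and only one of the two elements would normally be used.

```xml
<StandardizerConfiguration Version="0.1">
  <Actions>
    <!-- No Type attribute: clears both chirality and double bond stereo -->
    <ClearStereo ID="clearAllStereo"/>
    <!-- Alternative: clear double bond stereo only, keep chirality -->
    <ClearStereo ID="clearDoubleBondStereo" Type="DoubleBond"/>
  </Actions>
</StandardizerConfiguration>
```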
- <AbsoluteStereo> subsections: these elements enable the user to set or clear the absolute stereo flag (used in MDL molfiles). The Act attribute specifies the action:
  - "Clear": clears the absolute stereo flag
  - "Set": sets the absolute stereo flag if relevant (the molecule contains chiral atom(s) with no enhanced stereo property), clears it otherwise
  Note:
  - in the simple action string "absolutestereo:clear" and "absolutestereo:set" refer to these actions
- <ConvertToEnhancedStereo> subsections: these elements enable the user to convert the molecule to enhanced stereo representation.
  The AbsStereo attribute specifies whether the molecule should be considered absolute stereo, regardless of the chiral flag setting in the molecule:
  - "true": the molecule is considered to be absolute stereo
  - "false": the chiral flag is used to determine if the molecule is absolute stereo
  The DefaultNonAbsStereoGroup attribute specifies the stereo group type of unlabeled stereo atoms that are not considered to be absolute stereo:
  - "and": unlabeled stereo atoms go into a new "and" group
  - "or": unlabeled stereo atoms go into a new "or" group
  Note:
  - in the simple action string "converttoenhancedstereo:abs", "converttoenhancedstereo:and" and "converttoenhancedstereo:or" refer to these actions
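A possible <ConvertToEnhancedStereo> element using both attributes described above is sketched below. That the two attributes may appear together on one element is an assumption based on the description; the ID is illustrative.

```xml
<StandardizerConfiguration Version="0.1">
  <Actions>
    <!-- Use the chiral flag to decide on absolute stereo; unlabeled stereo
         atoms of non-absolute molecules go into a new "and" group -->
    <ConvertToEnhancedStereo ID="toEnhancedStereo"
                             AbsStereo="false"
                             DefaultNonAbsStereoGroup="and"/>
  </Actions>
</StandardizerConfiguration>
```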
- <ConvertDoubleBonds> subsections: these elements enable the user to convert the representation of double bonds with unspecified CIS/TRANS stereo information to wiggly or crossed type. The Type attribute specifies one of these representation types:
  - "Wiggly": double bonds with unspecified CIS/TRANS stereo information will be converted to wiggly representation
  - "Crossed": double bonds with unspecified CIS/TRANS stereo information will be converted to crossed representation
  If input molecules have no 2D coordinates, a 2D clean is also performed before converting the double bond representation.
  Note:
  - in the simple action string "convertdoublebonds:wiggly" and "convertdoublebonds:crossed" refer to these actions
- <WedgeClean> subsections: rearrange the stereo wedges according to the IUPAC recommendations. Can also be referenced from the simple action string as "wedgeclean".
- <ConvertWedgeInterpretation> subsections: convert each wedge between two stereo centers into two wedges. Can also be referenced from the simple action string as "convertwedgeinterpretation".
- <RemoveStereoCareBox> subsections: remove stereo search markers (also known as stereo care boxes) from double bonds. Can also be referenced from the simple action string as "removestereocarebox".
- <Expand> subsections: expand stoichiometry data. This means that copies of molecule fragments are added so that the multiplicities of the fragments reflect the stoichiometry data. Since these multiplicities must be integers, a calculation is made to find the least possible integers with the required ratios. Finally the stoichiometry data is removed, since after the expansion it is represented by the structure itself.
  The Data attribute specifies the attached data field name that stores the stoichiometry data (default: "Stoichiometry"). The stoichiometry data of an atom refers to the connected fragment containing the atom. If several atoms in a fragment have different stoichiometry data, the behaviour is undefined. The stoichiometry data can be specified in any of the following forms:
  - decimal fraction (e.g. 0.5, 0.667)
  - common fraction (e.g. 1/2, 2/3)
  - integer (e.g. 2, 3)
  The fragment coefficients are the corresponding integer values that preserve the stoichiometry ratio. The expansion will contain each fragment with these multiplicities.
  [Figure: an example structure with stoichiometry data and its expanded form]
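The reduction to the least possible integers can be illustrated with a small worked calculation. This is only a sketch of the arithmetic implied by the description of the Expand action; the exact procedure used by Standardizer is not specified here, and the two fragments A and B with their stoichiometry values are hypothetical.

```latex
% Hypothetical fragments A and B with stoichiometry data 1/2 and 2/3.
% Multiplying both values by lcm(2, 3) = 6 gives integers with the same ratio.
\[
  \frac{1}{2} : \frac{2}{3}
  \;=\; \left(6 \cdot \frac{1}{2}\right) : \left(6 \cdot \frac{2}{3}\right)
  \;=\; 3 : 4
\]
```

Since 3 and 4 have no common divisor greater than 1, these are the least integers with the required ratio, so the expanded structure would contain three copies of fragment A and four copies of fragment B.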
- <Tautomerize> subsections: create a canonical/standard tautomer form of the molecule. Can also be referenced from the simple action string as "tautomerize".
  The created tautomer:
  - represents a group of tautomers which can transform to each other
  - is supposed to be a stable tautomer
  Notes:
  - The created tautomer is always returned in dearomatized form, so after the "tautomerize" action molecules will be in dearomatized form.
  - The "Tautomerize" action ungroups all S-groups. After performing "tautomerize" the molecules will not contain any S-groups.
  - The created tautomer is not identical to the canonical tautomer generated by the Tautomerization Plugin.
- <Mesomerize> subsections: take the canonical resonance form of the molecule. Can also be referenced from the simple action string as "mesomerize".
  Note:
  - The canonical resonance form is always returned in dearomatized form, so after the "mesomerize" action molecules will be in dearomatized form.
- <MapReaction> subsections: map reactions by identifying and assigning numbers to the corresponding atoms on the two sides of the reaction arrow. Can also be referenced from the simple action string as "mapreaction".
  Note:
  - The "MapReaction" action maps only reactions.
- <Map> subsections: map atoms of a molecule or reaction. Can also be referenced from the simple action string as "map".
- <Unmap> subsections: remove all map numbers from the atoms. Can also be referenced from the simple action string as "unmap".
- <Clean> subsections: these elements perform automatic atom coordinate calculation. The Dim attribute specifies the molecule dimension (2 or 3); the default is the molecule dimension in case of full clean, or 2 if the molecule dimension is 0. For partial and template based clean the dimension is always 2. The Type attribute specifies the clean type:
  - "Partial": partial clean, preserves original atom coordinates where possible and calculates coordinates for new/changed atoms (default, available in 2D only)
  - "Full": full clean, recalculates all atom coordinates
  - "TemplateBased": template based clean, the template file is specified in the TemplateFile attribute (available in 2D only)
  Currently partial and template based clean are available in 2D only. If the template molecules are not in 2D, they are cleaned in 2D upon startup.
  Template based clean is performed in the following way: templates are searched in the target molecule in the order they are given in the template file. The first match is processed: template atom coordinates are copied to the corresponding target atoms and the remaining atoms are cleaned with partial clean. Some difficult-to-clean structures processed with template based cleaning are shown below.
  [Table of figures with columns: skeleton, compound to clean, template, cleaned result; rows: crown ether, porphyrine, bicycle]
  Notes:
  - Full clean is performed when the dimension specified in the Dim attribute is different from the original molecule dimension (e.g. SMILES input with cleaning in 2D).
  - The "clean" action in the simple action string corresponds to the default behaviour, while the "clean:full" action always performs full clean.
  - In case of an output format containing atom coordinates, a full clean in 2D is performed at the end if the input is in 0D; otherwise a partial clean on changed atoms is performed.
  - If the output format is not 0D (as e.g. SMILES is), a built-in clean task is performed at the end of the standardization process to recalculate the coordinates of atoms changed in the transformations. If the molecule is in 3D, this is a full clean with the recalculation of all atom coordinates, since partial clean is not available in 3D.
  - Full clean performed in 2D converts all unspecified double bonds to "wiggly" representation (this is the representation preferred by IUPAC). See also the description of the <ConvertDoubleBonds> action.
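The three clean types of the <Clean> action could be configured as in the sketch below. The Type, Dim and TemplateFile attributes are the ones described for <Clean>; templates.mrv is the file name used in the Example section, and the IDs are illustrative. The elements are alternatives, not a sequence to be run together.

```xml
<StandardizerConfiguration Version="0.1">
  <Actions>
    <!-- Partial clean (the default type), always 2D -->
    <Clean ID="cleanPartial" Type="Partial"/>
    <!-- Alternative: recalculate all atom coordinates in 3D -->
    <Clean ID="cleanFull3D" Type="Full" Dim="3"/>
    <!-- Alternative: template based 2D clean using the given template file -->
    <Clean ID="cleanTemplate" Type="TemplateBased" TemplateFile="templates.mrv"/>
  </Actions>
</StandardizerConfiguration>
```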
- <Action> subsections: specific transformation actions that do not have a corresponding transformation reaction are given in the Act attribute with a predefined keyword referring to the action to be performed. Note that this configuration section is now deprecated and maintained only for backward compatibility. These actions are now replaced by their corresponding specific subsections, listed below. The available actions:
  - "aromatize" (see section Aromatize),
  - "dearomatize" (see section Dearomatize),
  - "addexplicitH" (former "hydrogenize") (see section AddExplicitH),
  - "removeexplicitH" (former "dehydrogenize") (see section RemoveExplicitH).
  These actions can also be referenced from the simple action string.
If Standardizer is run with the --active-groups parameter specified (API: setActiveGroups(String[] groups) or
setActiveGroup(String group)), then only those tasks are processed which:
- belong to at least one of the active groups, or
- belong to no groups
Groups are specified in the Groups attribute in the configuration XML or between curly braces
in the simple action string. Multiple groups can be specified as a
comma-separated list.
Grouping tasks can be useful for query and target standardization before
substructure search: some of the tasks may be required for target standardization only
(e.g. removing explicit hydrogens). Add these tasks to the group "target" in the configuration
and then run Standardizer with the active group "query" when you standardize the query structure.
In this way, tasks belonging to the "target" group will be skipped.
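The query/target scenario can be sketched with the Groups attribute as follows. The attribute name and the group name "target" come from the text above; the IDs and the particular choice of actions are illustrative assumptions.

```xml
<StandardizerConfiguration Version="0.1">
  <Actions>
    <!-- No Groups attribute: belongs to no group, so it is always processed -->
    <Aromatize ID="aromatize"/>
    <!-- Belongs to the "target" group: skipped when the active groups
         are specified and "target" is not among them -->
    <RemoveExplicitH ID="removeH" Groups="target"/>
  </Actions>
</StandardizerConfiguration>
```

Running with the active group "query" would then apply only the aromatization, while running with the active group "target" (or without specifying active groups) would apply both tasks.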
Example
<?xml version="1.0" encoding="UTF-8"?>
<!-- Standardizer configuration file -->
<StandardizerConfiguration Version ="0.1">
<Actions>
<Action ID="aromatize" Act="aromatize"/>
<Transformation ID="PlusMinus" Structure="[*+:1][*-:2]>>[*:1]=[*:2]"/>
<Transformation ID="PlusMinusDouble" Structure="molfiles/PlusMinusDouble.mol"/>
<Transformation ID="Enamine" Structure="[H]N[C:1]=[C:2]>>[H][C:2][C:1]=N"/>
<Transformation ID="Enol" Structure="[H:4][O:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[O:3]"/>
<Transformation ID="ClMinus" Structure="[Cl-]>>" Exact="true" Groups="target,g1"/>
<RemoveExplicitH ID="removeH" Charged="true" Radical="true" Mapped="true"/>
<Removal ID="keepOne" Method="keepLargest" Measure="molMass"/>
<RemoveRGroupDefinitions ID="removeRGroupDefinitions"/>
<RemoveAttachedData ID="removeAttachedData"/>
<RemoveAtomValues ID="removeAtomValues"/>
<Aromatize ID="chemaxonaromatize" Type="basic"/>
<AddExplicitH ID="addH"/>
<AliasToGroup ID="aliastogroup"/>
<AliasToAtom ID="aliastoatom"/>
<Sgroups ID="expand" Act="Expand" Exclude="Ph,Ac"/>
<ClearStereo ID="clearstereo" Type="Chirality"/>
<AbsoluteStereo ID="setstereo" Act="Set"/>
<Expand ID="stoichiometry" Data="COEFF"/>
<Dearomatize ID="dearomatize"/>
<Neutralize ID="neutralize"/>
<ClearIsotopes ID="clearisotopes"/>
<Clean Type="TemplateBased" TemplateFile="templates.mrv" ID="clean"/>
<Tautomerize ID="tautomer"/>
<Mesomerize ID="mesomer"/>
</Actions>
</StandardizerConfiguration>
The molecule files referred to in the above configuration XML file contain the corresponding reaction structures. Map indices must be unique among the atoms of a given molecule. For a detailed description of the reaction definitions, see the Reactor manual.
[Figures: structures stored in the referenced molecule files]
Action string
Simple actions can be listed in an action string, replacing the configuration XML. Actions should be separated by either ".." or newline characters. The actions can also be specified in a file, with each action on a separate line. Action strings are case-insensitive. The available simple actions are listed below:
- any reaction SMARTS (defines a transformation as a chemical reaction)
- "aromatize" (Daylight style, general)
- "aromatize:b" (ChemAxon style, basic)
- "aromatize:l" (loose)
- "dearomatize"
- "addexplicitH" (former "hydrogenize")
- "removeexplicitH[:lonely:isotope:charged:radical:mapped:wedged]" (former "dehydrogenize") (removes explicit H atoms; by default the listed special H types are skipped, but any types given in the ':' separated list are removed as well)
- "clearisotopes" (converts isotopes to non-isotopic form)
- "neutralize" (neutralize molecule)
- "clean" (partial clean in 2D, full clean in 3D, no clean in 0D)
- "clean:full" (full clean in 2D and 3D, no clean in 0D)
- "clean:<template file>" (template based clean in 2D)
- "clean:3" (clean in 3D)
- "aliastogroup" (convert pseudo and alias atoms to S-groups)
- "aliastoatom" (convert pseudo and alias atoms to normal atoms)
- "keepone" (keep the connected fragment with largest atom count)
- "keepone:mass" (keep the connected fragment with largest molecule mass)
- "removergroupdefinitions" (remove R-group definitions)
- "removeatomvalues" (remove atom values)
- "removeattacheddata" (remove attached data)
- "sgroups:contract" (contract Sgroups)
- "sgroups:expand" (expand Sgroups)
- "sgroups:ungroup" (ungroup Sgroups)
- "clearstereo" (clear stereo, both chirality and double bond)
- "clearstereo:chirality" (clear chirality)
- "clearstereo:doublebond" (clear double bond stereo)
- "clearstereo:singleupordownbond" (clear single up or down bonds connected to chiral centers)
- "absolutestereo:clear" (clear absolute stereo flag)
- "absolutestereo:set" (set absolute stereo flag)
- "converttoenhancedstereo:abs" (converts to enhanced stereo representation, unlabeled stereo atoms go into a new "abs" group)
- "converttoenhancedstereo:and" (converts to enhanced stereo representation, unlabeled stereo atoms go into a new "and" group)
- "converttoenhancedstereo:or" (converts to enhanced stereo representation, unlabeled stereo atoms go into a new "or" group)
- "wedgeclean" (rearranges stereo wedges according to the IUPAC recommendations)
- "convertwedgeinterpretation" (converts each wedge between two stereo centers into two wedges)
- "convertdoublebonds:wiggly" (converts double bonds with unspecified CIS/TRANS stereo information to wiggly representation)
- "convertdoublebonds:crossed" (converts double bonds with unspecified CIS/TRANS stereo information to crossed representation)
- "removestereocarebox" (remove stereo care box)
- "expand" (expand stoichiometry data, data field = "Stoichiometry")
- "expand:<datafield>" (expand stoichiometry data, stored in the specified data field)
- "tautomerize" (take canonical tautomer form)
- "mesomerize" (take canonical mesomer form)
- "mapreaction" (add atom maps to reaction)
- "map" (map atoms of a molecule or reaction)
- "unmap" (remove atom maps)
Each action can be prefixed with a comma-separated list of group names between curly braces,
in which case the task belongs to the listed groups. Such a task is skipped
if the active group list is specified (command line: --active-groups, API: setActiveGroups(String[] groups) or
setActiveGroup(String group)) and none of the task's groups belongs to the active group set. Group names are case-insensitive.
This group setting corresponds to the Groups attribute in the XML configuration.
Example:
First extract the largest fragment, then aromatize, then standardize nitro groups, finally standardize enamine groups.
keepone..aromatize:b..[O-:2][N+:1]=O>>[O:2]=[N:1]=O..[H:4][N:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[N:3]
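For comparison, the action string above can be expressed as a configuration XML along the following lines. This is a sketch based on the element descriptions in the Configuration XML section; the ID values are arbitrary labels, and the equivalence with the action string is inferred from those descriptions rather than stated by the original page.

```xml
<StandardizerConfiguration Version="0.1">
  <Actions>
    <!-- keepone: keep the fragment with the largest atom count -->
    <Removal ID="keepOne" Method="keepLargest" Measure="atomCount"/>
    <!-- aromatize:b: basic (ChemAxon) type aromatization -->
    <Aromatize ID="aromatize" Type="basic"/>
    <!-- nitro group standardization -->
    <Transformation ID="Nitro" Structure="[O-:2][N+:1]=O>>[O:2]=[N:1]=O"/>
    <!-- enamine group standardization -->
    <Transformation ID="Enamine" Structure="[H:4][N:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[N:3]"/>
  </Actions>
</StandardizerConfiguration>
```

As in the action string, the actions are processed in the order in which they are listed.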
Notes
- The order of the transformations in the configuration file or in the action string is important, since the output of each transformation is the input of the next one. For example, if the first action is aromatization and the second one transforms an aromatized functional group, this group may not be matched at all when the order is reversed, i.e. when the aromatized functional group is searched first and aromatization is done afterwards.
- Aromatization should be present in order to receive correct substructure search results. In most cases it is best to put it first in the configuration (see the above note).
- Some Standardizer actions ("aromatize", "aromatize:b", "dearomatize", "addexplicitH", "removeexplicitH", "wedgeclean", "convertwedgeinterpretation") do not require a license key: a standardization composed of these actions only can be run without limitation, even without a license.