The Dictionary of Carbohydrates covers the following classes of compound.
The parent monosaccharides and their important derivatives. All of the fundamental aldoses and ketoses are extensively documented. The coverage of their derivatives is broad, although of course not comprehensive. Entries have been compiled to present a wide range of the derivatives of most interest in synthetic carbohydrate chemistry, including a good selection of those containing the more recently developed blocking groups.
Sugars differently derivatised at several functional groups are common in carbohydrate chemistry and these are presented as 'derivatives of derivatives', sometimes branching off into their own entries (see below).
Modified monosaccharides. Compounds such as halodeoxy-, aminodeoxy-, thio- and anhydrosugars, glycuronic acids, etc. are covered extensively.
Disaccharides. The coverage of disaccharides derived from two unmodified sugar residues is virtually complete. There is selective coverage of modified disaccharides (e.g. aminodeoxysucroses).
Tri-, oligo- and polysaccharides. The coverage of these has also been increased and concentrates on those with biochemical significance and/or industrial usage.
Alditols and cyclitols. These are extensively documented along with their most important derivatives.
Nucleosides. The number of these included has been considerably increased. The coverage centres on naturally occurring nucleosides such as those found in RNA, their analogues and other nucleosides with pronounced drug activities.
Glycoside antibiotics and related compounds. An up-to-date coverage of these is given.
Other glycosides. Some natural products containing markedly unusual sugars or unusual glycosidic linkages are included. A very large number of plant and animal glycosides is now known, and for comprehensive coverage of these the user is referred to the Dictionary of Natural Products website (www.chemnetbase.com) or CD-ROM. The glucosinolates are covered.
In compiling the printed, on-line and CD-ROM versions, the primary literature has been reviewed up to late 2004.
Organisation of Entries
The Dictionary is arranged alphabetically by entry name. Every entry is numbered to assist ready location. Many compounds are included as derivatives of main entry compounds but important derivatives have their own individual cross-referenced entries. Use of the indexes enables the rapid location of all compounds in the Dictionary by name or compound type, regardless of their location.
In most cases the stereoisomeric and ring-form (i.e. pyranose/furanose) variants of a given carbohydrate are included in the same entry, with a few exceptions (for example, glucoseptanose has its own entry).
The entry for a typical important monosaccharide is therefore organised according to the following scheme:
Entry Compound
D-form
  Derivatives generally applicable to the D-form (e.g. derivatives of the open chain form)
D-Pyranose-form
  Derivatives not specifically assigned to either the α- or β-anomer
α-D-Pyranose-form
  Derivatives
β-D-Pyranose-form
  Derivatives
D-Furanose-form
  Derivatives not specifically assigned to either the α- or β-anomer
α-D-Furanose-form
  Derivatives
β-D-Furanose-form
  Derivatives
L-form
  etc.
A representative dictionary entry is shown in Fig. 1
Fig. 1
Chemical Names and Synonyms; Nomenclature of Carbohydrates
The Dictionary contains a wide range of synonyms which may be (a) those found in the primary literature, (b) Chemical Abstracts names, or (c) names added editorially to achieve as much consistency as possible with other closely related substances. Names corresponding to those used by CAS during the 9th and subsequent Collective Index periods (1973-) are labelled 9CI (there have been no substantial changes in CAS nomenclature for most carbohydrates since 1973). Names used during the 8th Collective Index period (1967-1972) are labelled 8CI. All important derivatives embedded within entries are named (but see comment on CAS nomenclature below).
The most authoritative current statement of good practice on carbohydrate nomenclature is the IUPAC/IUBMB Joint Commission on Biochemical Nomenclature document Nomenclature of Carbohydrates (Recommendations 1996) (Pure Appl. Chem., 1996, 68, 1919), the full text of which can be read on the Dictionary of Carbohydrates.
For most carbohydrates documented in the Dictionary there is full agreement between the IUPAC/IUBMB recommendations and current practice among authors active in carbohydrate science, and therefore no difficulty in deciding on the entry name. However, a number of complications arise at a more detailed level.
The complexity of the names of carbohydrate derivatives containing a variety of different substituents means that trivial errors in naming them according to the correct IUPAC alphabetical rules are quite widespread in the primary literature. In general, these errors have been corrected in the Dictionary and the incorrect versions are not given as synonyms.
IUPAC rules for nomenclature of a few types of carbohydrate contain certain features that differ from those for general organic compounds. For carbohydrates which do not possess a functional group to receive the locant 1 such as alditols, two names are possible. In such cases precedence is given to the name deriving from the parent structure coming first in alphabetical order, then to assignment to the D- rather than L-series; these rules take precedence over the usual IUPAC numbering principle of lowest possible locants (rule 2-Carb-2-1-3). This is illustrated by the following examples.
The rule can lead to names which are less than user-friendly. D-Galactal, for example, is 2,6-anhydro-5-deoxy-D-arabino-hex-5-enitol rather than 1,5-anhydro-2-deoxy-D-lyxo-hex-1-enitol, which is more intuitive and would be the choice of workers in the field (in the Dictionary all three names are quoted).
Not all authors follow strict IUPAC principles. In this Dictionary the policy has been to follow IUPAC in the choice of the entry name for the compound, but to give the 'incorrect' forms as synonyms. Such entries are often given notes explaining the nomenclature situation with a view to eliminating as much confusion as possible.
The -CHO (or potential CHO) group in aldoses (but not the CO group in ketoses) takes numbering precedence over any other group in the molecule, including groups such as -COOH which in general organic nomenclature are senior to -CHO.
Complexities arise at the interface between carbohydrate and aliphatic chemistry, where it is frequently possible to name compounds either as modified sugars or as aliphatic structures such as lactones or furans. The transition between the two systems is not rigidly defined by IUPAC and compounds treated in Chemical Abstracts as sugars may be named by other authors as aliphatic compounds, and vice-versa. The numbering scheme invariably changes and so does the preferred system for denoting configurations, meaning that great care is necessary to avoid errors. The following example is typical:
In this Dictionary, compounds containing three or more chiral centres are usually named and numbered as carbohydrates, while the majority of compounds containing one or two chiral centres are named and numbered as aliphatics (Lichtenthaler, F.W. et al., Annalen, 1989, 1153). The alternative schemes are usually given as synonyms.
Although Chemical Abstracts names for carbohydrates follow IUPAC principles, the CAS names for some common groups found in carbohydrate derivatives do not correspond to those used by the majority of carbohydrate chemists as given in this Dictionary. For example:
Dictionary name    CAS name
Isopropylidene     1-Methylethylidene
Benzylidene        Phenylmethylene
In the majority of cases, the CAS alternative is not given as a synonym in the Dictionary entry.
CAS Registry Numbers
CAS numbers are identifying numbers allocated to each distinctly definable chemical substance indexed by CAS since 1965 (plus some retrospective allocation of numbers by CAS to compounds from earlier index periods). The numbers have no chemical significance but they provide a label for each substance independent of any system of nomenclature. They are extensively used for exchanging information between individuals and databases. The numbers take the form NNNNNN-NN-R, where the total number of digits is five or more and R is a check digit.
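The role of the check digit R can be illustrated with a short sketch. The text above does not describe how R is derived; the weighting used here (each preceding digit multiplied by its position counted from the right, summed, and reduced modulo 10) is the scheme published by CAS and is stated as an assumption. The function names and the length limits on each group are illustrative choices, not part of the Dictionary.

```python
def cas_check_digit(body_digits: str) -> int:
    """Check digit for the digits that precede R, given left to right.

    Assumed scheme (published by CAS, not stated in the Dictionary text):
    the digit nearest R has weight 1, the next weight 2, and so on;
    R is the weighted sum modulo 10.
    """
    total = sum(int(d) * weight
                for weight, d in enumerate(reversed(body_digits), start=1))
    return total % 10


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` has the N...N-NN-R shape and a consistent check digit."""
    parts = cas.split("-")
    if len(parts) != 3:
        return False
    if not all(p.isascii() and p.isdigit() for p in parts):
        return False
    first, second, check = parts
    # Assumed limits: 2 to 7 digits, then exactly 2, then the single check digit.
    if not (2 <= len(first) <= 7 and len(second) == 2 and len(check) == 1):
        return False
    return cas_check_digit(first + second) == int(check)


if __name__ == "__main__":
    for number in ("50-99-7", "57-50-1", "50-99-8"):
        print(number, is_valid_cas(number))
```

A check of this kind only detects transcription errors. It cannot tell whether a well-formed number is the main number for a substance or one of the additional or replaced numbers discussed below.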
For practical purposes, CAS numbers have certain shortcomings arising from their free allocation, which can result in one substance having more than one potential number. Duplication may arise for one of several reasons to do with the detailed chemistry of the substance, for example tautomerism, solvate formation or partially unspecified stereochemistry. There are also replaced numbers. For this reason, Carbohydrates entries will often contain one or more Additional CAS numbers, which may help the user to obtain further information about the substance, especially by online searching.
Clearly, the additional CAS numbers given in Carbohydrates have to be used with care. Their inclusion in the entry is the result of an editorial decision by the Carbohydrates contributor that they refer to what is essentially the same substance, but this decision may be a subjective one. Care has been taken to ensure that the main CAS number given in Carbohydrates for each substance is the correct one.
Further information on CAS number allocation policy can be obtained from CAS indexes or The Organic Chemist's Desk Reference.
Structural formulae
Every attempt has been made to present the structures of chemical substances as accurately as possible according to best current practice and recommendations of IUPAC (The International Union of Pure and Applied Chemistry). As much consistency as possible has been aimed at between closely-related structures. For example, all sugars are shown as Haworth formulae, and whenever possible in complex structures the rings are oriented in the standard Haworth convention so that structural comparisons can be quickly made.
Molecular formula and molecular weight
The elements in the molecular formula are given according to the Hill convention (C, H, then other elements in alphabetical order). The molecular weights given are formula weights (or more strictly, molar masses in daltons) and are rounded to three decimal places. In the case of some high molecular mass substances, such as proteins, the value quoted may be that taken from an original literature source and may be an aggregate molar mass.
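The Hill ordering can be sketched in a few lines. This is a minimal illustration, not the Dictionary's own software: the function name `hill_formula` and the dictionary-of-counts input are hypothetical, and the handling of carbon-free formulae (all elements alphabetical, hydrogen included) follows the usual statement of the Hill system rather than anything said in the text above.

```python
def hill_formula(counts: dict) -> str:
    """Format element counts, e.g. {"C": 6, "H": 12, "O": 6}, in Hill order."""
    symbols = [sym for sym, n in counts.items() if n > 0]
    if "C" in symbols:
        # Carbon first, hydrogen second, everything else alphabetically.
        lead = ["C"] + (["H"] if "H" in symbols else [])
        rest = sorted(sym for sym in symbols if sym not in ("C", "H"))
        ordered = lead + rest
    else:
        # Assumption: with no carbon present, all symbols are alphabetical.
        ordered = sorted(symbols)
    # A count of 1 is written without a number.
    return "".join(sym if counts[sym] == 1 else f"{sym}{counts[sym]}"
                   for sym in ordered)


if __name__ == "__main__":
    print(hill_formula({"O": 6, "H": 12, "C": 6}))          # a hexose
    print(hill_formula({"N": 1, "O": 5, "C": 6, "H": 13}))  # an aminodeoxyhexose
```

Because the ordering is deterministic, two formulae for the same composition always produce the same string, which is what makes the molecular formula usable as an index key.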
Physical data
Carbohydrates gives the following physical characteristics of substances, when available: appearance, melting point, boiling point, optical rotation, density, refractive index, solubility and pKa. All of these fields are searchable by numerical value (including range searching) in the CD-ROM version of Carbohydrates.
Appearance
Organic compounds are considered to be colourless unless otherwise stated. Where the compound contains a chromophore which would be expected to lead to visible colour, but no colour is mentioned in the literature, the Carbohydrates entry will mention this fact if it has been noticed by the contributor. An indication of crystal form and of recrystallisation solvent is often given but these are imprecise items of data; most compounds can be crystallised from several solvent systems and the crystal form often varies. In the case of the small number of compounds where crystal behaviour has been intensively studied (e.g. pharmaceuticals), it is found that polymorphism is a very common phenomenon and there is no reason to believe that it is not widespread among organic compounds generally.
Melting points and boiling points
The policy followed in the case of conflicting data is as follows:
Where the literature melting points are closely similar, only one figure (the highest or most probable) is quoted
Where two or more melting points are recorded and differ by several degrees (the most likely explanation being that one sample was impure) the lower figure is given in parentheses, thus Mp 139° (134-135°)
Where quoted figures differ widely and some other explanation such as polymorphism or incorrect identity seems the most likely explanation, both figures are quoted without parentheses, thus Mp 142°, Mp 205-206°
Known cases of polymorphism or double melting points are noted
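The melting point policy above can be sketched as a small decision function. The policy itself rests on editorial judgement, so the numerical thresholds here (`close` and `wide`, in degrees) are purely hypothetical stand-ins for "closely similar" and "differ widely", and the function always takes the highest figure where the policy allows "the highest or most probable". The names `fmt` and `quote_mp` are likewise illustrative.

```python
def fmt(reading):
    """Format a (low, high) melting range; a sharp melting point has low == high."""
    low, high = reading
    return f"{low:g}°" if low == high else f"{low:g}-{high:g}°"


def quote_mp(a, b, close=2.0, wide=20.0):
    """Combine two literature melting points, each a (low, high) tuple.

    `close` and `wide` are hypothetical thresholds, not Dictionary values.
    """
    higher, lower = (a, b) if a[1] >= b[1] else (b, a)
    gap = higher[1] - lower[1]
    if gap <= close:
        # Closely similar: quote one figure only (here, the highest).
        return f"Mp {fmt(higher)}"
    if gap < wide:
        # A few degrees apart: lower figure in parentheses.
        return f"Mp {fmt(higher)} ({fmt(lower)})"
    # Widely different: quote both without parentheses.
    return f"Mp {fmt(lower)}, Mp {fmt(higher)}"


if __name__ == "__main__":
    print(quote_mp((139, 139), (134, 135)))
    print(quote_mp((142, 142), (205, 206)))
```

With these assumed thresholds the two calls reproduce the forms quoted in the policy, Mp 139° (134-135°) and Mp 142°, Mp 205-206°. Deciding whether a discrepancy reflects impurity, polymorphism or mistaken identity remains a matter for the contributor.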
Boiling point determination is less precise than that of melting points, and conflicting boiling point data are not usually reported except when there appears to be a serious discrepancy between authors.
Optical rotations
These are given wherever possible, and normally refer to what the Carbohydrates contributor believes to be the best characterised sample of highest chemical and optical purity. Where available an indication of the optical purity (op) or enantiomeric excess (ee) of the sample measured follows the specific rotation value.
Specific rotations are dimensionless numbers and the degree sign which was formerly universal in the literature has been discontinued.
Densities and refractive indexes
Densities and refractive indexes are now of less importance for the identification of liquids than has been the case in the past, but are quoted for common or industrially important substances, or where no boiling point can be found in the literature.
Densities and refractive indexes are not quoted where the determination appears to refer to an undefined mixture of stereoisomers.
Solubilities
Solubilities are given only where the solubility is unusual for a compound. Typical organic compounds are soluble in the usual organic solvents such as ether and chloroform, and virtually insoluble in water. The presence of polar groups (OH, NH2, and especially COOH, SO3H and NR3+) increases water solubility.
pKa values
pKa values are given for both acids and bases. The pKb of a base can be obtained by subtracting its pKa from 14.17 (at 20°) or from 14.00 (at 25°).
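The subtraction can be written out as a short sketch. The two constants are those quoted above; the function name `pkb_from_pka` and the example pKa of 9.25 are illustrative only, and the pKa given for a base is taken to be that of its conjugate acid.

```python
# Ionic product constants quoted in the text, keyed by temperature in °C.
PKW = {20: 14.17, 25: 14.00}


def pkb_from_pka(pka: float, temp_c: int = 25) -> float:
    """Return pKb = pKw - pKa at 20° or 25°, rounded to two decimal places."""
    if temp_c not in PKW:
        raise ValueError("only 20° and 25° are covered by the quoted constants")
    return round(PKW[temp_c] - pka, 2)


if __name__ == "__main__":
    # Hypothetical base whose conjugate acid has pKa 9.25.
    print(pkb_from_pka(9.25, 25))
    print(pkb_from_pka(9.25, 20))
```

For the illustrative value, the result is 4.75 at 25° and 4.92 at 20°, which shows why the temperature of the original measurement matters when converting.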
Spectroscopic data
Many Carbohydrates entries include ultraviolet spectra which are presented in the format:
where ε is the absorption coefficient for a given UV maximum (λmax). A description of the solvent conditions used, if reported in the literature, is listed at the beginning and end of the UV data in parentheses. All peak absorptions cited are maxima unless otherwise described, e.g. shoulder/inflection (sh) and end absorption (end). In addition, UV data may be followed by the term 'Berdy' or 'DEREP', indicating the database from which the data originated. The absence of these terms implies that the data were abstracted from the primary literature.
In the on-line version, all the λmax values are indexed in the UV Maxima field and can be searched numerically, including by range. Similarly, the solvent data associated with the UV data are indexed in the UV Solvent field.
Hazard and toxicity information
General
Toxicity and hazard information is highlighted by a symbol and has been selected to assist in risk assessments for experimental, manufacturing and manipulative procedures with chemicals.
Physical, reactive and toxic properties all contribute to the hazard associated with a particular chemical. As part of the physical data, flash points, explosive limits and autoignition temperatures have been included where appropriate. Flammable classifications, which are based on flash point measurements and boiling points, are also mentioned, and the opportunity has been taken to include UK occupational exposure limits or, for some compounds, threshold limit values published by the American Conference of Governmental Industrial Hygienists (ACGIH).
For the reactive hazards, a brief comment is made on any explosive (or violent polymerisation) properties and on aspects of the chemical reactivity of a substance which are of concern. These include the potential for peroxidation, oxidising/reducing properties and incompatibility with commonly available chemicals.
Toxicity information has been chosen to show hazardous effects from short-term or long-term exposure. Observations from human exposure are summarised if available (including possible adverse effects of drugs); otherwise experimental (exp.) tests are quoted. Included in the toxicity data are the results of irritancy tests, acute lethality data, target organ toxicity, and carcinogenic and reproductive properties where appropriate. Those chemicals which have been classified by the International Agency for Research on Cancer (IARC) as human carcinogens, probable human carcinogens or possible human carcinogens have been identified in Carbohydrates accordingly.
The Publishers cannot be held responsible for any inaccuracies in the reported information, neither does the omission of hazard data in the Dictionary imply an absence of this data from the literature. Widely recognised hazards are included, however, and where possible key toxicity reviews are identified in the references. Further advice on the storage, handling and disposal of chemicals is given in The Organic Chemist's Desk Reference.
Finally, it should be emphasised that any chemical has the potential for harm if it is carelessly used.
Many entries in the Dictionary contain one or more RTECS® Accession Numbers. Possession of these numbers allows users to locate toxicity information on relevant substances from the NIOSH Registry of Toxic Effects of Chemical Substances. The Registry is a compendium of toxicity data extracted from the scientific literature, and each substance is identified by a unique nine-character alphanumeric RTECS® Accession Number.
For each Accession Number, the RTECS database provides the following data where available: substance prime name and synonyms; update data; CAS registry number; molecular weight and formula; reproductive, tumorigenic and toxic dose data; citations to aquatic toxicity ratings, IARC reviews, ACGIH Threshold Limit Values, toxicological reviews, existing Federal standards, the NIOSH criteria document program for recommended standards, the NIOSH current intelligence program, the NCI Carcinogenesis Testing Program and the EPA Toxic Substances Control Act inventory. Each data line and citation is referenced to the source from which the information was extracted.
Bibliographic References
The selection of references is made with the aim of facilitating entry into the literature for the user who wishes to locate more detailed information about a particular compound.
The contents of most references are indicated by reference tags (suffixes) indicating their content and in particular the stereoisomers and derivatives of the parent compound which they document. Many carbohydrates are documented in several different literature references, and unless there are marked differences in their reported physical properties or syntheses, recent and accessible references are preferred to older and/or less accessible ones. The number of references cited does not indicate the relative importance of a compound; one key recent citation may supersede a number of older ones.
Journal abbreviations generally follow the practice of the Chemical Abstracts Service Source Index (CASSI), except for a short list of very well known journals where the Dictionary gives shorter abbreviations to save space (e.g. J.A.C.S. instead of J. Am. Chem. Soc.).
Further References
Further useful information on a variety of topics concerned with the structure, description, stereochemistry and nomenclature of organic compounds, including carbohydrates, can be found in the Organic Chemist's Desk Reference (Chapman & Hall, 1995).
Abbreviations
The following is a selection of the most common Database abbreviations used:
Table 1. Abbreviations
Abbreviation  Meaning
[α]           specific rotation
acac          acetylacetonato
Ac            acetyl
ACGIH         American Conference of Governmental Industrial Hygienists
Ac2O          acetic anhydride
AcOH          acetic acid
ADI           Acceptable Daily Intake
alk.          alkaline
amorph.       amorphous
ANSI          American National Standards Institute
anhyd.        anhydrous
approx.       approximately
aq.           aqueous
asym.         asymmetrical, unsymmetrical
B             base
BAN           British Approved Name
biol.         biological
bipy          2,2′-bipyridine
Bp            boiling point
br            broad
BSI           British Standards Institution
Bu            butyl (Buᵗ for tert-butyl etc.)
bwd           bird (wild)
Bz            benzyl
c.            concentration
ca.           (circa) about
CAS           Chemical Abstracts Service
ccp           cubic close packed
cdt           1,5,9-cyclododecatriene
C6H6          benzene
C5Me5         pentamethylcyclopentadienyl
CNS           central nervous system
cod           1,5-cyclooctadiene
col.          colour, coloration
comly.        commercially
compd(s)      compound(s)
conc.         concentrated
const.        constant
constit.      constituent
coord         coordinate(d), coordination
cot           1,3,5,7-cyclooctatetraene
Cp            cyclopentadienyl
C5Ph5         pentaphenylcyclopentadienyl
cryst.        crystal(s)
cv            cultivar
CVD           chemical vapour deposition
Cy            cyclohexyl
d             density
dba           dibenzylideneacetone
dck           duck
dec.          decomposes, decomposition
degradn.      degradation
depe          1,2-bis(diethylphosphino)ethane
descr.        described
diars         diarsine (generalised ligand)
dil.          dilute, dilution
dimorph.      dimorphic
diphos        diphosphine (generalised ligand)
diss.         dissolves, dissolved
dissoc.       dissociates
dist.         distil, distillation
DMA           dimethylacetamide
DMF           dimethylformamide
dmpe          1,2-bis(dimethylphosphino)ethane
dmpm          bis(dimethylphosphino)methane
DMSO          dimethyl sulfoxide
dppe          1,2-bis(diphenylphosphino)ethane
dppm          bis(diphenylphosphino)methane
dppp          1,3-bis(diphenylphosphino)propane
EDTA          ethylenediaminetetraacetate(4−)
ee            enantiomeric excess
Eg            band gap (electron volts)
en            ethylenediamine
equilib.      equilibrium
esp.          especially
Et            ethyl
EtOAc         ethyl acetate
EtOH          ethanol
EtOH aq.      aqueous ethanol
evapn.        evaporation
exp.          exposure
exp.          experimental
fac           facial
Fc            ferrocenyl
fl. p.        flash point
fluor.        fluoresces, fluorescence
formn.        formation
Fp            freezing point
g             gram(s)
ΔG°f          standard free energy of formation
Glc           β-D-glucopyranosyl
gpg           guinea pig
ham           hamster
ΔH°f          standard enthalpy of formation
hcp           hexagonal close packed
hydrol.       hydrolyses, hydrolysed, hydrolysis
ihl           inhalation
im            imidazolato
ims           intramuscular
INN           International Non-proprietary Name
inorg.        inorganic
insol.        insoluble
intermed.     intermediate
ipr           intraperitoneal
ISO           International Standards Organisation
ivg           intravaginal
ivn           intravenous
JAN           Japanese Accepted Name
JMAF          Japanese Ministry for Agriculture, Forestry and Fisheries
K             temperature (Kelvin)
L             generalised ligand
LC            lethal concentration
LD            lethal dose; LD50: a dose which is lethal to 50% of the animals tested
M             relative molecular mass (formula weight)
M             metal
m             medium
mcd           magnetic circular dichroism
Me            methyl
MEL           maximum exposure limit
MeOH          methanol
mer           meridional
mes           mesityl (2,4,6-trimethylphenyl)
Me2CO         acetone
misc.         miscible
misc.         miscellaneous
mixt.         mixture
mky           monkey
MOCVD         metal-organic chemical vapour deposition
mod.          moderately
Mp            melting point
mus           mouse
n             index of refraction (e.g. n20D for 20° and sodium light)
nbd           norbornadiene
nqr           nuclear quadrupole resonance spectrum
obt.          obtained
oc            open cup
oep           octaethylporphyrinato
OES           occupational exposure standard
Oh            octahedral
op            optical purity
org.          organic
orl           oral
ox            oxalato
Ph            phenyl (C6H5)
pH            Measure of soln. acidity, where pH = log10(1/[H+]) and [H+] is the hydrogen ion concentration
phen          1,10-phenanthroline
phys.         physical
pK            Measure of dissoc. const. (K), where pK = log10(1/K)
pm            picometres (10⁻¹² m)
PMDET         pentamethyldiethylenetriamine
polarog.      polarography
polym.        polymerised, polymerisation
ppm           parts per million
Pr            propyl (Prⁱ for isopropyl)
prob.         probably
purifn.       purification
Py            pyridine
pz            pyrazolato
R             generalised alkyl group
rbt           rabbit
ref.          reference
rel.          relative(ly)
r.t.          room temperature
s             strong
S°            standard entropy
scu           subcutaneous
skn           skin
sl.           slightly
sol.          soluble
soln(s)       solution(s)
solv(s)       solvent(s)
soly.         solubility
sp.           species (singular)
spar.         sparingly
spp.          species (plural)
ssp.          subspecies
subl.         sublimation, sublimes
tbp           trigonal bipyramidal
Td            tetrahedral
Tf            triflate
THF           tetrahydrofuran
tht           tetrahydrothiophene
TLV           Threshold Limit Value
TMED          tetramethylethylenediamine
tpp           tetraphenylporphyrinato
triphos       triphosphine (generalised ligand)
Ts            tosyl
μeff          effective magnetic moment (in Bohr magnetons μB)
unsatd.       unsaturated
USAN          United States Adopted Name
uv            ultraviolet spectrum
v.            very
var.          variety
vis.          visible
vol.          volume
w             weak
WSSA          Weed Science Society of America
X             generalised anion, usually halide
Table 2. Reference tags
The following is a selection of the most common reference tags that are used:
Abbreviation  Meaning
abs config    absolute configuration
anal          analysis
bibl          bibliography
biodistribn   biodistribution
biosynth      biosynthesis
cd            circular dichroism
chromatog     chromatography
cmr           13C nuclear magnetic resonance spectrum
config        configuration
conformn      conformation
cryst struct  X-ray crystal structure determination
deriv(s)      derivative(s)
detn          determination, detection
epr           electron paramagnetic (spin) resonance spectrum
glc           gas-liquid chromatography
haz           hazard
hplc          high performance liquid chromatography
ir            infrared spectrum
isol          isolation
isom          isomerism
manuf         manufacture
metab         metabolism
ms            mass spectrum
nmr           nuclear magnetic resonance spectrum
occur         occurrence
ord           optical rotatory dispersion
pharmacol     pharmacology
pmr           proton (1H) nuclear magnetic resonance spectrum
props         properties (chemical or physical)
resoln        resolution
rev           review
sepn          separation
spectra
struct        structure
synonyms
synth         synthesis
tautom        tautomerism
tlc           thin layer chromatography
tox           toxicity
use(s)
uv            ultraviolet-visible spectrum
*RTECS® Accession Numbers are compiled and distributed by the National Institute for Occupational Safety and Health Service of the U.S. Department of Health and Human Services of the United States of America. All rights reserved (1996