The IPA standard treats its characters as a separate alphabet that includes the lowercase Latin characters "a" through "z" as well as other symbols. IPA characters that share a shape with ordinary Latin letters are not duplicated in Unicode; they use the existing Latin code points, so many IPA characters are found outside the standard phonetic and modifier blocks.
Because the IPA is used primarily by linguists to transcribe spoken language in print, the characters in the standard phonetic block are arranged by their visual resemblance to the Latin characters "a" through "z", not by phonetic similarity. As a result, adjacent Unicode values may be phonetically unrelated.
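Python's standard `unicodedata` module makes the ordering described above easy to inspect. In this sketch, `describe` is an illustrative helper, not part of any library: it lists the names of the first five code points of the standard phonetic block. U+0253 (B WITH HOOK, a consonant) sits between vowel symbols such as U+0252 and U+0254 because its glyph is a modified "b". The last line shows that an IPA symbol identical to a Latin letter, such as "b", simply reuses the Latin code point.

```python
import unicodedata


def describe(start, stop):
    """Return 'U+XXXX NAME' strings for code points start .. stop-1."""
    return [f"U+{cp:04X} {unicodedata.name(chr(cp))}" for cp in range(start, stop)]


# First five code points of the standard phonetic (IPA Extensions) block.
for line in describe(0x0250, 0x0255):
    print(line)

# The IPA symbol for the voiced bilabial plosive is the ordinary Latin letter.
print(describe(ord("b"), ord("b") + 1))
```

This prints TURNED A, ALPHA, TURNED ALPHA, B WITH HOOK, and OPEN O (each prefixed with LATIN SMALL LETTER) for U+0250 through U+0254, followed by `['U+0062 LATIN SMALL LETTER B']`.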
The following table lists groups of IPA characters and the Unicode blocks in which they can be found. The U+ prefix is the conventional way of writing a Unicode value; the digits that follow it are a 16-bit value in hexadecimal.
IPA characters | Unicode block |
Standard Latin | U+0041 – U+00FF |
European and Extended Latin | U+0100 – U+01F0 |
Standard phonetic characters | U+0250 – U+02AF |
Modifier letters (spacing) | U+02B0 – U+02FF |
Diacritical marks (nonspacing) | U+0300 – U+036F |
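As a sketch of how a character can be placed in one of these groups, the lookup below uses the formal block names and boundaries from the Unicode Standard (Basic Latin, Latin-1 Supplement, Latin Extended-A and -B, IPA Extensions, Spacing Modifier Letters, Combining Diacritical Marks). This is an assumption: those blocks subdivide the table's groups slightly differently. Python's `unicodedata` has no block lookup, so the ranges are hard-coded, and `block_of` is a hypothetical helper.

```python
# Formal Unicode block boundaries covering the groups in the table above.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0250, 0x02AF, "IPA Extensions"),
    (0x02B0, 0x02FF, "Spacing Modifier Letters"),
    (0x0300, 0x036F, "Combining Diacritical Marks"),
]


def block_of(ch):
    """Return (U+ notation, block name), or None as the name if outside BLOCKS."""
    cp = ord(ch)
    label = f"U+{cp:04X}"  # four hex digits, i.e. a 16-bit value
    for lo, hi, name in BLOCKS:
        if lo <= cp <= hi:
            return label, name
    return label, None


for ch in ["\u00E6", "\u014B", "\u0283", "\u02C8", "\u03B8"]:
    print(ch, *block_of(ch))
```

Note that GREEK THETA (U+03B8), used for "th" in the phoneme table, comes from the Greek block and falls outside all of these ranges.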
The symbols used for American English phonemes are listed below. Each phoneme symbol is accompanied by example words, the IPA description, the Unicode name of the glyph shape used in the IPA standard phonetic charts, and the Unicode value. Some phonemes are described as diphthongs or affricates made up of two component phonemes. For these, it may be preferable to use the MS labels rather than the Unicode sequence of their component phonemes, because some TTS engines provide a single combined data point for such a phoneme instead of synthesizing it from separately modeled phonemes. In the Unicode names, 'LATIN' means 'LATIN SMALL LETTER', 'GREEK' means 'GREEK SMALL LETTER', and 'W/' means 'WITH'.
MS | Example | IPA Description | Unicode name | Unicode |
iy | feel, eve, me | front close unrounded | LATIN I | U+0069 |
ih | fill, hit, lid | front close unrounded (lax) | LATIN LETTER SMALL CAPITAL I | U+026A |
ae | at, carry, gas | front open unrounded (tense) | LATIN AE | U+00E6 |
aa | father, ah, car | back open unrounded | LATIN ALPHA | U+0251 |
ah | cut, bud, up | open-mid back unrounded | LATIN TURNED V | U+028C |
ao | dog, lawn, caught | open-mid back rounded | LATIN OPEN O | U+0254 |
ay | tie, ice, bite | diphthong with quality: aa + ih | ||
ax | ago, comply | central close-mid (schwa) | LATIN SCHWA | U+0259 |
ey | ate, day, tape | front close-mid unrounded (tense) | LATIN E | U+0065 |
eh | pet, berry, ten | front open-mid unrounded | LATIN OPEN E | U+025B |
er | turn, fur, meter | central open-mid unrounded rhoticized | LATIN SCHWA W/HOOK | U+025A |
ow | go, own, tone | back close-mid rounded | LATIN O | U+006F |
aw | foul, how, our | diphthong with quality: aa + uh | ||
oy | toy, coin, oil | diphthong with quality: ao + ih | ||
uh | book, pull, good | back close-mid rounded (lax) | LATIN UPSILON | U+028A |
uw | tool, crew, moo | back close rounded | LATIN U | U+0075 |
b | big, able, tab | voiced bilabial plosive | LATIN B | U+0062 |
p | put, open, tap | voiceless bilabial plosive | LATIN P | U+0070 |
d | dig, idea, wad | voiced alveolar plosive | LATIN D | U+0064 |
t | talk, sat | voiceless alveolar plosive | LATIN T | U+0074 |
t | meter | alveolar flap | LATIN R W/FISHHOOK | U+027E |
g | gut, angle, tag | voiced velar plosive | LATIN SCRIPT G | U+0261 |
k | cut, oaken, take | voiceless velar plosive | LATIN K | U+006B |
f | fork, after, if | voiceless labiodental fricative | LATIN F | U+0066 |
v | vat, over, have | voiced labiodental fricative | LATIN V | U+0076 |
s | sit, cast, toss | voiceless alveolar fricative | LATIN S | U+0073 |
z | zap, lazy, haze | voiced alveolar fricative | LATIN Z | U+007A |
th | thin, nothing, truth | voiceless dental fricative | GREEK THETA | U+03B8 |
dh | then, father, scythe | voiced dental fricative | LATIN ETH | U+00F0 |
sh | she, cushion, wash | voiceless postalveolar fricative | LATIN ESH | U+0283 |
zh | genre, azure | voiced postalveolar fricative | LATIN EZH | U+0292 |
l | lid | alveolar lateral approximant | LATIN L | U+006C |
l | elbow, sail | velarized alveolar lateral approximant | LATIN L W/MIDDLE TILDE | U+026B |
r | red, part, far | retroflex approximant | LATIN R | U+0072 |
y | yacht, onion, yard | palatal sonorant glide | LATIN J | U+006A |
w | with, away | labiovelar sonorant glide | LATIN W | U+0077 |
hh | help, ahead, hotel | voiceless glottal fricative | LATIN H | U+0068 |
m | mat, amid, aim | bilabial nasal | LATIN M | U+006D |
n | no, end, pan | alveolar nasal | LATIN N | U+006E |
nx | sing, anger, drink | velar nasal | LATIN ENG | U+014B |
ch | chin, archer, march | voiceless postalveolar affricate: t + sh | LATIN TESH DIGRAPH | U+02A7 |
jh | joy, agile, edge | voiced postalveolar affricate: d + zh | LATIN DEZH DIGRAPH | U+02A4 |
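The table can be read as a lookup from MS labels to IPA code points. The sketch below covers a subset of the rows (the names `MS_TO_IPA`, `DIPHTHONGS`, and `ms_to_ipa` are hypothetical, not part of any Microsoft API). It expands the diphthongs into their component symbols, which is useful for displaying IPA text; as noted above, engine input may be better served by keeping the single MS label.

```python
# Subset of the phoneme table: MS label -> IPA character(s).
MS_TO_IPA = {
    # vowels
    "iy": "i", "ih": "\u026A", "ae": "\u00E6", "aa": "\u0251",
    "ah": "\u028C", "ao": "\u0254", "ax": "\u0259", "ey": "e",
    "eh": "\u025B", "er": "\u025A", "ow": "o", "uh": "\u028A", "uw": "u",
    # consonants
    "b": "b", "p": "p", "d": "d", "t": "t", "k": "k", "f": "f", "v": "v",
    "s": "s", "z": "z", "th": "\u03B8", "dh": "\u00F0", "sh": "\u0283",
    "zh": "\u0292", "l": "l", "r": "r", "y": "j", "w": "w", "hh": "h",
    "m": "m", "n": "n", "nx": "\u014B",
    # affricates have single digraph code points
    "ch": "\u02A7", "jh": "\u02A4",
}

# Diphthongs have no single code point; the table gives their components.
DIPHTHONGS = {"ay": ("aa", "ih"), "aw": ("aa", "uh"), "oy": ("ao", "ih")}


def ms_to_ipa(labels):
    """Convert a list of MS labels to an IPA string (KeyError if unknown)."""
    out = []
    for label in labels:
        if label in DIPHTHONGS:
            out.extend(MS_TO_IPA[part] for part in DIPHTHONGS[label])
        else:
            out.append(MS_TO_IPA[label])
    return "".join(out)


ipa = ms_to_ipa(["b", "ay", "t"])
print(ipa, " ".join(f"U+{ord(c):04X}" for c in ipa))
```

For "bite" this yields the sequence U+0062 U+0251 U+026A U+0074.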
The following symbols can be used to construct phoneme strings and other phonetic input to a TTS engine. Their precise effects may vary between TTS engines.
MS | Description | Unicode name | Unicode | Usage/Effect |
- | syllable boundary | HYPHEN-MINUS | U+002D | separates syllables |
# | word boundary | NUMBER SIGN | U+0023 | separates words |
(space) | word boundary | SPACE | U+0020 | separates words |
_ | silence | LOW LINE | U+005F | indicates silent period |
1 | primary stress | DIGIT ONE | U+0031 | follows affected vowel |
2 | secondary stress | DIGIT TWO | U+0032 | follows affected vowel |
(blank) | unstressed | DIGIT ZERO | U+0030 | follows affected vowel |
. | period | FULL STOP | U+002E | pitch fall, pause |
? | question mark | QUESTION MARK | U+003F | pitch rise, pause |
! | exclamation | EXCLAMATION MARK | U+0021 | raised pitch range, pause |
, | comma | COMMA | U+002C | continuation rise, pause |
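The stress and boundary symbols are positional: a stress digit modifies the vowel before it, "-" separates syllables, and "#" or a space separates words. A minimal sketch of that grouping follows, assuming the phoneme string has already been split into a list of tokens (how a given engine tokenizes Prn input is not covered here). `parse` and its nested-list output are hypothetical. Following the stress rows of the table, a vowel with no digit, or with "0", is treated as unstressed.

```python
VOWELS = {"iy", "ih", "ae", "aa", "ah", "ao", "ay", "ax",
          "ey", "eh", "er", "ow", "aw", "oy", "uh", "uw"}
STRESS = {"1": "primary", "2": "secondary", "0": "unstressed"}
WORD_BOUNDARIES = {"#", " "}
SYLLABLE_BOUNDARY = "-"


def parse(tokens):
    """Group pre-split MS tokens into words -> syllables -> (token, stress).

    Vowels default to "unstressed" until a following digit changes them;
    consonants and other symbols (for example "_" or ".") get stress None.
    """
    words, syllables, current = [], [], []
    for tok in tokens:
        if tok in STRESS:
            # A stress digit follows the vowel it affects.
            if not current or current[-1][0] not in VOWELS:
                raise ValueError(f"stress mark {tok!r} does not follow a vowel")
            current[-1] = (current[-1][0], STRESS[tok])
        elif tok == SYLLABLE_BOUNDARY:
            syllables.append(current)
            current = []
        elif tok in WORD_BOUNDARIES:
            syllables.append(current)
            words.append(syllables)
            syllables, current = [], []
        else:
            current.append((tok, "unstressed" if tok in VOWELS else None))
    if current or syllables:
        syllables.append(current)
        words.append(syllables)
    return words


print(parse(["hh", "ax", "-", "l", "ow", "1"]))
```

Raising an error for a digit that follows a consonant keeps malformed input from silently stressing the wrong phoneme.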
Use the Prn control tag to indicate how to pronounce text by passing the phonetic equivalent to the engine. For information about Prn, see Appendix A, "Text-to-Speech Control Tags."
The rest of this appendix provides tables and diagrams of common phonemes and their Unicode values. For more information about IPA characters and Unicode, see the following publications:
· The Unicode Standard, Volumes 1 and 2 (Addison-Wesley).
· The Unicode Standard, Version 1.1 (The Unicode Consortium). This document can be found in the Microsoft Development Library.
· Phonetic Symbol Guide (Pullum, Geoffrey K. and Ladusaw, William A., 1986, University of Chicago Press).