Unicode Values for IPA Characters

The IPA standard considers its characters to be a separate alphabet that includes the lowercase Latin characters "a" through "z" and other symbols. IPA Latin characters are not duplicated in Unicode, so many IPA characters are found outside the standard phonetic and modifier blocks.

Because the IPA is primarily used by linguists to capture spoken language in print, IPA characters in the standard phonetic block are arranged in order of their resemblance to the Latin characters "a" through "z" and not their phonetic similarity. As a result, adjacent Unicode values may bear no relation to each other phonetically.

The following table lists groups of IPA characters and the Unicode blocks in which they can be found. The U+ prefix is a convention that identifies a Unicode value; the digits that follow it are a 16-bit value written in hexadecimal.

IPA characters                  Unicode block

Standard Latin                  U+0041 – U+00FF
European and Extended Latin     U+0100 – U+01F0
Standard phonetic characters    U+0250 – U+02AF
Modifier letters (spacing)      U+02B0 – U+02FF
Diacritical marks (nonspacing)  U+0300 – U+036F
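The block ranges above can be applied mechanically. The following Python fragment is a minimal sketch, not part of any TTS API: the names IPA_BLOCKS, parse_u_plus, and block_of are hypothetical, and the lower bound U+0100 for "European and Extended Latin" is an assumption (the start of Latin Extended-A). It converts a U+ label to a code point and reports which of the listed blocks a character falls in.

```python
from typing import Optional

# Block ranges from the table of IPA character groups.
# Assumption: "European and Extended Latin" starts at U+0100.
IPA_BLOCKS = [
    (0x0041, 0x00FF, "Standard Latin"),
    (0x0100, 0x01F0, "European and Extended Latin"),
    (0x0250, 0x02AF, "Standard phonetic characters"),
    (0x02B0, 0x02FF, "Modifier letters (spacing)"),
    (0x0300, 0x036F, "Diacritical marks (nonspacing)"),
]


def parse_u_plus(label: str) -> int:
    """Convert a 'U+XXXX' label to its integer code point."""
    if not label.upper().startswith("U+"):
        raise ValueError(f"not a U+ label: {label!r}")
    return int(label[2:], 16)


def block_of(ch: str) -> Optional[str]:
    """Return the listed block name for one character, or None if unlisted."""
    cp = ord(ch)
    for low, high, name in IPA_BLOCKS:
        if low <= cp <= high:
            return name
    return None


print(block_of(chr(parse_u_plus("U+0251"))))  # Standard phonetic characters
print(block_of("\u03b8"))                     # None (Greek theta is in no listed block)
```

The second call illustrates the point made earlier: some characters used by the IPA, such as the Greek theta, lie outside all of the blocks listed here.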


The symbols used for American English phonemes are listed below. Each phoneme symbol is accompanied by an example, the IPA description, the Unicode name for the glyph shape used in the IPA standard phonetic charts, and the Unicode value. Some phonemic labels are described as diphthongs or affricate clusters. For these, it may be preferable to use the MS labels rather than the Unicode sequences of their component phonemes, because some TTS engines provide a single combined data point for each of these phonemes rather than synthesizing it as a combination of separately modeled phonemes. In the Unicode names, 'LATIN' means 'LATIN SMALL LETTER' and 'GREEK' means 'GREEK SMALL LETTER'.

MS    Example               IPA description                        Unicode name            Unicode

iy    feel, eve, me         front close unrounded                  LATIN I                 U+0069
ih    fill, hit, lid        front close unrounded (lax)            LATIN CAPITAL I         U+026A
ae    at, carry, gas        front open unrounded (tense)           LATIN AE                U+00E6
aa    father, ah, car       back open unrounded                    LATIN ALPHA             U+0251
ah    cut, bud, up          open-mid back unrounded                LATIN TURNED V          U+028C
ao    dog, lawn, caught     open-mid back rounded                  LATIN OPEN O            U+0254
ay    tie, ice, bite        diphthong with quality: aa + ih
ax    ago, comply           central close mid (schwa)              LATIN SCHWA             U+0259
ey    ate, day, tape        front close-mid unrounded (tense)      LATIN E                 U+0065
eh    pet, berry, ten       front open-mid unrounded               LATIN OPEN E            U+025B
er    turn, fur, meter      central open-mid unrounded rhoticized  LATIN SCHWA W/HOOK      U+025A
ow    go, own, tone         back close-mid rounded                 LATIN O                 U+006F
aw    foul, how, our        diphthong with quality: aa + uh
oy    toy, coin, oil        diphthong with quality: ao + ih
uh    book, pull, good      back close-mid unrounded (lax)         LATIN UPSILON           U+028A
uw    tool, crew, moo       back close rounded                     LATIN U                 U+0075
b     big, able, tab        voiced bilabial plosive                LATIN B                 U+0062
p     put, open, tap        voiceless bilabial plosive             LATIN P                 U+0070
d     dig, idea, wad        voiced alveolar plosive                LATIN D                 U+0064
t     talk, sat             voiceless alveolar plosive             LATIN T                 U+0074
      meter                 alveolar flap                          LATIN R W/FISHHOOK      U+027E
g     gut, angle, tag       voiced velar plosive                   LATIN SCRIPT G          U+0067
k     cut, oaken, take      voiceless velar plosive                LATIN K                 U+006B
f     fork, after, if       voiceless labiodental fricative        LATIN F                 U+0066
v     vat, over, have       voiced labiodental fricative           LATIN V                 U+0076
s     sit, cast, toss       voiceless alveolar fricative           LATIN S                 U+0073
z     zap, lazy, haze       voiced alveolar fricative              LATIN Z                 U+007A
th    thin, nothing, truth  voiceless dental fricative             GREEK THETA             U+03B8
dh    then, father, scythe  voiced dental fricative                LATIN ETH               U+00F0
sh    she, cushion, wash    voiceless postalveolar fricative       LATIN ESH               U+0283
zh    genre, azure          voiced postalveolar fricative          LATIN EZH               U+0292
l     lid                   alveolar lateral approximant           LATIN L                 U+006C
      elbow, sail           velar lateral approximant              LATIN L W/MIDDLE TILDE  U+026B
r     red, part, far        retroflex approximant                  LATIN R                 U+0072
y     yacht, onion, yard    palatal sonorant glide                 LATIN J                 U+006A
w     with, away            labiovelar sonorant glide              LATIN W                 U+0077
hh    help, ahead, hotel    voiceless glottal fricative            LATIN H                 U+0068
m     mat, amid, aim        bilabial nasal                         LATIN M                 U+006D
n     no, end, pan          alveolar nasal                         LATIN N                 U+006E
nx    sing, anger, drink    velar nasal                            LATIN ENG               U+014B
ch    chin, archer, march   voiceless alveolar affricate: t + sh                           U+02A7
jh    joy, agile, edge      voiced alveolar affricate: d + zh                              U+02A4
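As an illustration of how the phoneme table might be used, the following Python fragment maps a subset of the MS labels to their Unicode values and joins a sequence of labels into an IPA string. This is a sketch for display purposes only: MS_TO_IPA and to_ipa are hypothetical names, the dictionary covers only some rows of the table, and building the diphthongs from their listed components is an assumption made here because the table gives them no single Unicode value. As noted above, passing the MS labels themselves to an engine may be preferable for diphthongs and affricates.

```python
# Subset of the phoneme table: MS label -> Unicode character.
MS_TO_IPA = {
    "iy": "\u0069", "ih": "\u026A", "ae": "\u00E6", "aa": "\u0251",
    "ah": "\u028C", "ao": "\u0254", "ax": "\u0259", "eh": "\u025B",
    "ow": "\u006F", "uh": "\u028A", "uw": "\u0075",
    "hh": "\u0068", "l": "\u006C", "n": "\u006E", "t": "\u0074",
    "th": "\u03B8", "sh": "\u0283", "nx": "\u014B",
    "ch": "\u02A7", "jh": "\u02A4",
}

# Assumption: diphthongs are shown as the sequence of their components,
# following the "aa + ih" style descriptions in the table.
MS_TO_IPA["ay"] = MS_TO_IPA["aa"] + MS_TO_IPA["ih"]
MS_TO_IPA["aw"] = MS_TO_IPA["aa"] + MS_TO_IPA["uh"]
MS_TO_IPA["oy"] = MS_TO_IPA["ao"] + MS_TO_IPA["ih"]


def to_ipa(labels):
    """Join a sequence of MS labels into one IPA string."""
    try:
        return "".join(MS_TO_IPA[label] for label in labels)
    except KeyError as err:
        raise ValueError(f"unknown MS label: {err.args[0]!r}") from None


print(to_ipa(["hh", "eh", "l", "ow"]))  # hɛlo
```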


The following symbols can be used to construct phoneme strings and phonetic input to a TTS engine.

The precise effects may vary in different TTS engines.

MS       Description        Unicode name      Unicode  Usage/Effect

-        syllable boundary  HYPHEN-MINUS      U+002D   separates syllables
#        word boundary      NUMBER SIGN       U+0023   separates words
(space)  word boundary      SPACE             U+0020   separates words
_        silence            LOW LINE          U+005F   indicates silent period
1        primary stress     DIGIT ONE         U+0031   follows affected vowel
2        secondary stress   DIGIT TWO         U+0032   follows affected vowel
(blank)  unstressed         DIGIT ZERO        U+0030   follows affected vowel
.        period             FULL STOP         U+002E   pitch fall, pause
?        question mark      QUESTION MARK     U+003F   pitch rise, pause
!        exclamation        EXCLAMATION MARK  U+0021   raised pitch range, pause
,        comma              COMMA             U+002C   continuation rise, pause
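The boundary and stress symbols can be interpreted with a small amount of code. The following Python fragment is a rough sketch under stated assumptions: it assumes the input has already been split into a list of tokens (MS labels and control symbols), which is engine-specific and not shown here, and group_tokens is a hypothetical helper, not an engine function. It groups tokens into words and syllables and attaches each stress digit to the phoneme it follows. The silence and punctuation symbols are not handled specially and pass through as ordinary tokens.

```python
WORD_BOUNDARIES = {"#", " "}
SYLLABLE_BOUNDARY = "-"
STRESS = {"1": "primary", "2": "secondary", "0": "unstressed"}


def group_tokens(tokens):
    """Group tokens into words -> syllables -> (phoneme, stress) pairs."""
    words, word, syllable = [], [], []

    def close_syllable():
        nonlocal syllable
        if syllable:
            word.append(syllable)
            syllable = []

    def close_word():
        nonlocal word
        close_syllable()
        if word:
            words.append(word)
            word = []

    for tok in tokens:
        if tok in WORD_BOUNDARIES:
            close_word()
        elif tok == SYLLABLE_BOUNDARY:
            close_syllable()
        elif tok in STRESS:
            # A stress digit follows the vowel it affects.
            if not syllable:
                raise ValueError("stress digit must follow a phoneme")
            phoneme, _ = syllable[-1]
            syllable[-1] = (phoneme, STRESS[tok])
        else:
            syllable.append((tok, None))  # None: no stress digit given
    close_word()
    return words


tokens = ["hh", "ax", "-", "l", "ow", "1", "#", "w", "er", "1", "l", "d"]
for w in group_tokens(tokens):
    print(w)
```

In the result, a stress value of None means that no digit followed the phoneme, which the table treats the same as unstressed.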


Use the Prn control tag to indicate how to pronounce text by passing the phonetic equivalent to the engine. For information about Prn, see Appendix A, "Text-to-Speech Control Tags."

The rest of this appendix provides tables and diagrams of common phonemes and their Unicode values. For more information about IPA characters and Unicode, see the following publications:

· The Unicode Standard, Volumes 1 and 2 (Addison-Wesley).

· The Unicode Standard, Version 1.1 (The Unicode Consortium). This document can be found in the Microsoft Development Library.

· Phonetic Symbol Guide (Pullum, Geoffrey K. and Ladusaw, William A., 1986, University of Chicago Press).