Phonemes and graphemes: the sounds and spellings of English

What do all these words have in common?
away
they
sleigh
cafe
matinee
cabaret
The words rhyme because each ends in the same sound, or phoneme, called EY. In this case, the phoneme is a vowel more commonly known as "long a".
aw-ay
th-ey
sl-eigh
caf-e
matin-ee
cabar-et
However, the groups of letters that produce the sound EY are all different. A single letter or group of letters that produces a phoneme is called a grapheme. In the examples, the graphemes are ay, ey, eigh, e, ee, and et.
Now, what do these words have in common?
cat
cello
ceiling
oceanic
accountancy
Each word contains the same grapheme, c. This single-letter grapheme, however, is responsible for a different sound, or phoneme, in each word.
c-at
c-ello
c-eiling
o-c-eanic
accountan-c-y
Each marked c corresponds to a different phoneme (K, CH, S, SH, T:S). The first four are consonants. The last, T:S, is a consonant diphoneme, or merger of 2 phonemes.
In the previous examples, we saw a phoneme can map to multiple graphemes:
EY  →  ay,  ey,  eigh,  e,  ee,  et
Vice versa, we saw a grapheme can map to multiple phonemes:
c  →  K,  CH,  S,  SH,  T:S
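The two directions of mapping can be sketched with a pair of lookup tables. This is only a minimal illustration in Python: the alignments for cat, cello and ceiling are split by hand for this example (they are an assumption, not the article's data), and the variable names are hypothetical.

```python
from collections import defaultdict

# Hand-aligned (grapheme, phoneme) pairs for three of the example words.
# These alignments are illustrative assumptions, not the article's dataset.
aligned_words = {
    "cat": [("c", "K"), ("a", "AE"), ("t", "T")],
    "cello": [("c", "CH"), ("e", "EH"), ("ll", "L"), ("o", "OW")],
    "ceiling": [("c", "S"), ("ei", "IY"), ("l", "L"), ("i", "IH"), ("ng", "NG")],
}

grapheme_to_phonemes = defaultdict(set)
phoneme_to_graphemes = defaultdict(set)

# Record every pair in both directions.
for pairs in aligned_words.values():
    for grapheme, phoneme in pairs:
        grapheme_to_phonemes[grapheme].add(phoneme)
        phoneme_to_graphemes[phoneme].add(grapheme)

print(sorted(grapheme_to_phonemes["c"]))  # ['CH', 'K', 'S']
print(sorted(phoneme_to_graphemes["L"]))  # ['l', 'll']
```

Even with three words, one grapheme (c) already fans out to three phonemes, and one phoneme (L) to two graphemes.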
But where does this knowledge come from? How many phonemes are there? How many graphemes? And how many mappings are there between graphemes and phonemes?
To answer these questions, I looked at ~35K English words from the CMU Pronouncing Dictionary. Its data helps break words into their graphemes and phonemes, like the word box:

Graphemes:  B  |  O   |  X
Phonemes:   B  |  AA  |  K:S

Silent letters, like the c in czar, become part of the nearest sound-producing grapheme.

Graphemes:  CZ  |  A   |  R
Phonemes:   Z   |  AA  |  R
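One way to hold such an alignment in code is a list of (grapheme, phoneme) pairs per word. The sketch below assumes that layout and assumes the colon in a label like K:S simply joins two phonemes; the helper names `spelling` and `phonemes` are made up for this example.

```python
# Each word is a list of (grapheme, phoneme) pairs, in spelling order.
box = [("B", "B"), ("O", "AA"), ("X", "K:S")]
# The silent C is absorbed into the grapheme CZ, which produces Z.
czar = [("CZ", "Z"), ("A", "AA"), ("R", "R")]


def spelling(alignment):
    """Join the graphemes back into the written word."""
    return "".join(grapheme for grapheme, _ in alignment)


def phonemes(alignment):
    """List the single phonemes, splitting merged labels such as K:S."""
    return [p for _, merged in alignment for p in merged.split(":")]


print(spelling(box), phonemes(box))    # BOX ['B', 'AA', 'K', 'S']
print(spelling(czar), phonemes(czar))  # CZAR ['Z', 'AA', 'R']
```

Because every letter belongs to exactly one grapheme, joining the graphemes always gives back the original spelling.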

39 Phonemes
CMU modified ARPAbet's alphabet of sounds, resulting in 39 phonemes: 15 vowels & 24 consonants. It also has 21 diphonemes (2 phonemes merged).
Phoneme   Count
AH        21,686
N         16,694
IH        15,992
T         15,950
S         13,095
L         12,966
R         12,664
K         10,597
IY        9,388
D         8,904
ER        7,750
M         7,385
EH        7,091
P         7,072
AE        5,944
B         5,130
AA        4,846
EY        4,434
F         4,134
OW        3,486
AY        3,460
NG        3,224
V         3,149
SH        3,080
AO        2,880
G         2,784
Z         2,428
W         2,036
JH        2,020
UW        1,829
HH        1,751
AH:L      1,308
CH        1,188
AW        888
TH        849
Y:UW      673
K:S       666
UH        528
Y         385
OY        323
Y:AH      290
DH        243
AH:M      231
ZH        193
HH:W      104
Y:UH      86
G:Z       77
AH:W      29
W:AH      21
K:SH      19
W:AA      11
T:S       7
W:EY      4
NG:K      3
Y:EH      3
G:ZH      3
Y:AE      2
Y:IH      2
AH:R      1
T:TH      1
The numbers show how many times a phoneme appears in the 35K CMU words. Some phonemes appear twice or more in the same word, like the N in banana.
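Counting this way means tallying every occurrence, not every word. A minimal sketch with Python's `collections.Counter`, using a made-up two-word corpus (the `corpus` dictionary is an assumption for illustration; stress marks are left out):

```python
from collections import Counter

# Hypothetical mini-corpus: word -> phoneme sequence (stress marks omitted).
corpus = {
    "banana": ["B", "AH", "N", "AE", "N", "AH"],
    "box": ["B", "AA", "K:S"],
}

counts = Counter()
for phoneme_seq in corpus.values():
    # update() adds one per occurrence, so repeats within a word all count.
    counts.update(phoneme_seq)

print(counts["N"])    # 2 (both from banana)
print(counts["B"])    # 2 (one from each word)
print(counts["K:S"])  # 1
```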
275 Graphemes
From the mappings in the ~35K English words, there are 275 graphemes: 33 vowel graphemes, 101 consonant graphemes and 141 mixed.
Grapheme  Count
I         19,637
A         18,757
N         16,101
T         14,235
E         14,051
O         12,537
R         11,973
L         11,289
S         10,817
C         8,315
D         7,580
M         7,001
P         6,508
U         6,407
Y         5,232
ER        4,923
B         4,919
G         3,446
F         3,175
NG        2,638
V         2,374
H         1,727
LL        1,616
TI        1,529
K         1,427
W         1,390
OU        1,368
LE        1,327
SS        1,304
EA        1,164
TE        1,155
ED        1,153
TH        1,068
CH        1,033
SH        1,014
OR        896
CK        771
VE        764
CE        762
X         726
OO        716
EE        641
AI        622
SE        548
TT        542
NE        525
UR        520
OW        505
PH        503
AR        462
RE        452
J         430
Z         428
GE        411
PP        408
Q         383
FF        358
AU        354
RR        348
DE        331
AY        315
NN        303
IE        302
MM        297
WH        290
ZE        271
IR        244
OA        243
OI        224
ME        223
GG        203
IGH       196
KE        190
DD        189
EY        180
AW        174
CI        166
BB        162
NT        150
SI        146
TCH       146
CA        143
EW        140
PE        135
SC        133
CC        124
EI        118
URE       113
OY        112
SSI       98
EU        93
WR        87
URR       86
UE        83
KN        74
GN        72
ST        71
DGE       64
GU        61
UI        57
EAR       57
ZZ        56
AE        55
OE        52
OUR       51
OUGH      51
DG        47
FE        46
BE        46
GH        45
LK        45
EIGH      44
MB        42
PS        40
TTE       39
GI        38
QUE       36
ORR       36
XC        34
RH        34
ERR       33
GUE       32
CHE       30
THE       29
QU        28
LM        26
EYE       24
AH        23
ET        22
AUGH      21
SCH       21
HO        21
DJ        20
ARR       20
SHI       20
EUR       19
CQ        17
ND        17
IA        17
BT        17
CT        16
MN        16
EAU       16
AA        14
ON        14
PT        14
YE        14
YR        14
HI        13
LLE       13
SCI       13
FT        12
UO        11
SSE       11
EO        11
PA        10
KH        10
XI        10
NNE       10
HL        10
FFE       10
SW        9
HER       9
CCH       7
LF        7
HAU       7
OH        7
SCE       6
AUR       6
IS        6
ORRH      6
PPE       6
HOU       6
HU        6
AIGH      6
NM        5
AWE       5
IRR       5
UY        5
OT        5
LD        5
MME       5
EWE       5
PN        5
SLE       4
LVE       4
DI        4
AO        4
TW        4
CHT       4
IEU       3
ERE       3
STH       3
ULL       3
EAUX      3
RRE       3
RPS       3
GM        3
HA        3
NGUE      3
HEI       3
SL        3
IU        2
AER       2
ARRH      2
XE        2
DA        2
OIS       2
UGH       2
UOY       2
UA        2
OL        2
MBE       2
OUP       2
PB        2
OAR       2
AGH       2
CZ        2
UT        2
IL        2
DH        2
VV        2
EH        2
IT        2
OWE       2
GL        2
IO        2
NGE       2
TG        2
TS        2
OS        1
AWR       1
AYE       1
AT        1
AS        1
BH        1
IN        1
AIS       1
GNE       1
IRRH      1
OLO       1
ZI        1
OUE       1
NGH       1
ALL       1
ELL       1
XS        1
SZ        1
LV        1
UH        1
JH        1
UER       1
JU        1
KK        1
ES        1
URRE      1
YRRH      1
EOU       1
IER       1
AIT       1
PF        1
NK        1
OOH       1
RRH       1
UEUE      1
RT        1
EZ        1
OUS       1
OUX       1
PPH       1
PHO       1
BP        1
AL        1
LLA       1
EAH       1
The numbers show how many times a grapheme appears in the corpus. Some graphemes also appear twice or more in the same word, again like the N in banana.
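The vowel / consonant / mixed split of graphemes could be computed along these lines. This is a sketch under an assumption: it treats only A, E, I, O and U as vowel letters, which may not match the rule used for the totals above (for example, in how Y and W are handled). The function name `classify` is hypothetical.

```python
# Assumption: only A, E, I, O, U are vowel letters; Y and W count as consonants.
VOWEL_LETTERS = set("AEIOU")


def classify(grapheme):
    """Label a grapheme as 'vowel', 'consonant' or 'mixed' by its letters."""
    letters = set(grapheme.upper())
    if letters <= VOWEL_LETTERS:
        return "vowel"
    if letters.isdisjoint(VOWEL_LETTERS):
        return "consonant"
    return "mixed"


for g in ["OU", "TCH", "EIGH"]:
    print(g, classify(g))
# OU vowel
# TCH consonant
# EIGH mixed
```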
Phonemes ↔ Graphemes
The graphic on the following page lets you explore all the unexpected mappings between sounds and letters in English.
You can discover oddities like the phoneme P mapping to the bizarre grapheme GH in hiccough. The OED labeled this folk-etymology relic "a mere error," yet its use continues.
Following this intro to linguistics, I'll write a couple more articles diving into alternate spellings and the poetic features of words. Lastly, if you're interested in the data, check out the accompanying GitHub repo.
Phonemes ↔ Graphemes
Phoneme   Count     Graphemes
AH        21,686    a, o, u, e
N         16,694    n
IH        15,992    i
T         15,950    t
S         13,095    s
L         12,966    l
R         12,664    r
K         10,597    c
IY        9,388     y
D         8,904     d
ER        7,750     er
M         7,385     m
EH        7,091     e
P         7,072     p
AE        5,944     a
B         5,130     b
[Graphic legend: phoneme and grapheme markers; scale: # occurrences in 35K word corpus (1K, 5K, 10K, 15K); categories: vowel, consonant, diphoneme; display modes: frequency, count.]