Department of Linguistics

SPEECH RESOURCES - HELP PAGES

Phonetic Fonts: ASCII and Unicode Fonts - An Explanation

Robert Mannell, 2009

All fonts consist of shape definitions for each character and code tables that allocate ("map") each character in the font to a unique number.

ASCII Mapped Phonetic Fonts

In the past, phonetic fonts have been supplied by various sources in the form of ASCII mapped fonts. ASCII stands for "American Standard Code for Information Interchange" and is pronounced "assky".

Strictly speaking, ASCII is a 7-bit standard that defines 128 characters, numbered 0 to 127 (2^7 = 128). These are referred to here as the Primary ASCII characters. They consist of the upper and lower case English alphabet, the numbers 0 to 9, numerous additional characters (the ones that appear on the US English computer keyboard) and some "control" codes such as the "carriage return" and "line feed" characters. A further set of 128 characters, numbered 128 to 255 and referred to here as the Secondary ASCII characters, was added by various 8-bit extensions of ASCII (often loosely called "extended ASCII"). In the versions used on Western computers, the secondary set consists of the additional characters found in various parts of Western Europe (but few, if any, Cyrillic or Greek characters). Together the two sets provide 256 characters, each allocated a unique number between 0 and 255. The combined system is known as an 8-bit character system, as 8 binary digits (0's and 1's) encode the numbers 0 to 255 (2^8 = 256 numbers ≡ 0-255).
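The allocation of a number to each character can be inspected directly. The following is a minimal sketch in Python (used here purely for illustration; it is not part of these resources), and the function name `code_table` is a hypothetical helper, not a standard one.

```python
def code_table(text):
    """Return (character, code number, 8-bit binary string) for each character."""
    return [(ch, ord(ch), format(ord(ch), "08b")) for ch in text]


# Print the code number and its 8 binary digits for a few keyboard characters.
for ch, number, bits in code_table("Az0"):
    print(ch, number, bits)

# Every primary ASCII character has a code number below 128,
# so the leading (eighth) binary digit is always 0.
assert all(number < 128 for _, number, _ in code_table("Az0"))
```

For example, "A" is allocated the number 65, which is 01000001 when written as 8 binary digits.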

Other writing systems had two choices:-

  • mapping to the ASCII character set
  • 16-bit character systems (2^16 = 65536 numbers ≡ 0-65535)

16-bit character systems had to be used by writing systems with more than 256 characters (e.g. Chinese), as 16-bit numbers encode the numbers 0 to 65535 and so can handle up to 65536 characters. Various 16-bit character standards were developed in different countries, but they were not compatible with one another and there was sometimes more than one "standard" for a single language.
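The relationship between the number of binary digits and the number of available character codes is simple arithmetic. As a small sketch (the function name `code_range` is hypothetical and chosen only for this illustration):

```python
def code_range(bits):
    """Return (count, lowest, highest) code numbers for a given number of bits."""
    count = 2 ** bits
    return count, 0, count - 1


# Compare 7-bit, 8-bit and 16-bit character systems.
for bits in (7, 8, 16):
    count, low, high = code_range(bits)
    print(f"{bits}-bit: {count} numbers ({low}-{high})")
```

Each extra binary digit doubles the number of available codes, which is why moving from 8 to 16 bits raises the limit from 256 to 65536 characters.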

The situation with ASCII-mapped fonts was even more chaotic. These fonts re-used the code numbers of standard ASCII for the characters of another writing system. That is why Russian or Greek text set in an older ASCII-mapped font looks like a random selection of Western European characters when displayed on a computer without the correct font. There were, however, at least local standards for individual languages such as Russian or Greek.
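This effect is easy to reproduce. The sketch below assumes that the Russian text was stored using the Windows Cyrillic code table ("cp1251") and is then read with a Western European table ("latin-1"); these two tables are chosen only as examples and are not named elsewhere on this page. The helper name `show_as` is hypothetical.

```python
def show_as(raw, encodings):
    """Decode the same stored code numbers with several code tables."""
    return {name: raw.decode(name) for name in encodings}


# "Привет" ("hello") stored as one 8-bit code number per letter.
raw = "Привет".encode("cp1251")

# The code numbers are identical in both cases; only the table differs.
for name, text in show_as(raw, ["cp1251", "latin-1"]).items():
    print(f"{name:8} {text}")
```

The stored numbers never change. With the Cyrillic table they display as "Привет", while with the Western European table the same numbers display as "Ïðèâåò".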

The IPA phonetic character set, however, fared far worse. Virtually every font had a different mapping. This means that a document that made sense with one font would display garbage when a different font was used.

To add to the chaos, Microsoft and Apple used a different default ordering of the secondary ASCII characters on their operating systems and so there needed to be a Microsoft and an Apple version of each font (but often fonts only came in either an Apple or a Microsoft version). This made most ASCII mapped phonetic fonts only usable on one brand of operating system. Only very recently has it been possible to use a "Windows" font on both systems (I'm not sure how this works with the secondary ASCII characters, but I assume that OS X identifies that a Windows font is being used and re-maps the secondary ASCII characters to the Apple character order before displaying).
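The difference between the two orderings can be sketched as follows, assuming that the "cp1252" (Windows Western European) and "mac_roman" (classic Mac OS) code tables are representative of the Microsoft and Apple defaults described above. That choice, and the helper name `platform_bytes`, are assumptions made only for this illustration.

```python
def platform_bytes(ch):
    """Return the single-byte code for ch under the Windows and classic Mac tables."""
    return {"windows": ch.encode("cp1252"), "mac": ch.encode("mac_roman")}


codes = platform_bytes("é")
print(codes)

# The Windows code number, read with the Mac table, is a different character.
print(codes["windows"].decode("mac_roman"))
```

The same accented letter has a different code number on each system, so a font built around one ordering shows the wrong shapes on the other unless the codes are re-mapped.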

With the widespread adoption of the web, this proliferation of language-specific standards (as well as conflicting standards) has become a serious problem which can only be overcome by a single international standard.

Around the year 2000 I created a few ASCII mapped phonetic fonts (including "IPAASCII", pronounced "I P A assky", and three earlier fonts) and experimented with now defunct technologies, such as an early proprietary "web fonts" technology, in an attempt to display phonetic characters on the early web based versions of these resources. I eventually replaced these fonts with my own Unicode phonetic font (see below).

Unicode Phonetic Fonts

The Unicode standard (http://www.unicode.org/charts/) has been under development since the early 1990s, but only very recently have we finally seen the advent of operating systems and web browsers that fully support it.

Unicode is a standard character mapping system that gives every character from every writing system (including those of some dead or artificial languages) a single unique code number. The standard has room for more than a million code numbers (1,114,112 in total) and includes code numbers for the complete IPA character and diacritic set. The first 128 characters of the Unicode standard are identical to the 128 primary ASCII characters, and the first 256 are identical to the ISO 8859-1 ("Latin-1") Western European character set.
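The Unicode code numbers of individual IPA characters can be looked up with Python's standard `unicodedata` module. This is a minimal sketch; the helper name `describe` is hypothetical, and the three characters are chosen only as examples.

```python
import unicodedata


def describe(ch):
    """Return the Unicode code number (as U+XXXX) and official name of a character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


# Three common IPA characters and their Unicode code numbers.
for ch in "ʃŋə":
    print(ch, *describe(ch))

# The 128 primary ASCII code numbers mean the same thing in Unicode.
assert bytes(range(128)).decode("ascii") == "".join(chr(n) for n in range(128))
```

Because every font that follows the standard uses these same code numbers, a document containing "ʃ" (U+0283) remains readable when one Unicode phonetic font is swapped for another, provided the replacement font contains that character.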

Unfortunately, only a very small number of fonts actually contain the complete set of IPA characters and diacritics, and some of these fonts are not perfect (e.g. the widely distributed Microsoft "Lucida Sans Unicode" font chops off the bottom of a number of IPA diacritics). Some other fonts that contain the complete set of IPA characters (e.g. Microsoft's "Arial Unicode MS", which is supplied with recent versions of MS Office) are very large (~25 MB) fonts containing thousands of characters and so are rather unwieldy.

In about 2003 I created my own Unicode phonetic font, "IPAUNI" (pronounced "I P A uni"), and recommended its use for these resource pages until early 2008. There were numerous rendering (character display) issues with this font and it has now been discarded. The availability of free high-quality alternatives has now made the IPAUNI font an unnecessary and unattractive option. If you downloaded a copy of it before I withdrew it, please don't use it any more.

This site expects you to install and use a fairly compact free Unicode font ("Charis SIL") which contains all of the Western European characters as well as the IPA characters and diacritics. It will also display with "Lucida Sans Unicode" (a Microsoft font) and "Lucida Grande" (an Apple font) as backup fonts but you are very strongly urged to install Charis SIL. Some of the IPA diacritics don't display accurately with Microsoft's "Lucida Sans Unicode" and with Apple's "Lucida Grande" font. If you are one of our students, we insist that you install "Charis SIL" before viewing these pages (or submitting assignments).