Font

The original Atom VDG chip's text font uses a simple 5 x 7 pixel array for 64 characters. Lower case letters were omitted to reduce the ROM needed and to avoid dealing with their pesky descenders. Instead, the VDG provides inverted versions of its characters and 64 semi-graphic characters. The semi-graphics are economical because they need no ROM at all: they can be generated with simple logic. Other niggles include the lack of characters such as the British pound sign (£) and the vertical bar (|). To make things worse, the font order is not an exact match to ASCII, so some re-ordering is required to turn ASCII into the equivalent VDU code.
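The re-ordering can be illustrated with a short Python sketch. It assumes the standard MC6847 layout, in which VDU codes 00 to 1F hold @, A to Z, [, \, ], up-arrow and left-arrow (ASCII 40 to 5F), and VDU codes 20 to 3F hold space to ? (ASCII 20 to 3F). Since there is no lower case, the sketch shows lower case as inverted capitals (bit 7 set). That is one possible convention, and the function name `ascii_to_vdu` is hypothetical.

```python
def ascii_to_vdu(ch: int) -> int:
    """Convert a printable 7-bit ASCII code (20-7F) to an Atom VDG screen code.

    Assumes the MC6847 alphanumeric order: screen codes 00-1F hold ASCII 40-5F
    and screen codes 20-3F hold ASCII 20-3F.
    """
    if not 0x20 <= ch <= 0x7F:
        raise ValueError(f"not a printable ASCII code: {ch:#04x}")
    if ch < 0x40:            # space, digits and punctuation keep their position
        return ch
    if ch < 0x60:            # @, capitals and [\]^_ move down to 00-1F
        return ch - 0x40
    # No lower case glyphs: show the inverted capital instead (bit 7 = invert).
    # This is an assumed convention, not the only possible one.
    return (ch - 0x60) | 0x80


print([hex(ascii_to_vdu(ord(c))) for c in "A0a"])  # ['0x1', '0x30', '0x81']
```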

Original

These limitations were just about acceptable in 1982; they are definitely not today. Even a simple text editor requires lower case, so it is desirable to have some way of providing a more versatile font.

ASCII

ASCII has only 7 bits for the character code, the 8th bit being reserved for parity. Many systems used an 8-bit character set in which the lower half was ASCII but the upper half was whatever the manufacturer wished. Since this led to incompatibility, these ad hoc 8-bit codings were superseded by ISO-8859.

ISO-8859

ISO-8859-1 Character set

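This uses 8 bits for the character code. Codes 00 to 7F are compatible with the most common form of ASCII, and codes 80 to 9F are not used for printable characters.
The standard is split into regional parts that differ only in codes A0 to FF, which mainly hold accented characters: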

8859-1 Latin 1 (West European)
8859-2 Latin 2 (Central and East European)
8859-3 Latin 3 (South European)
8859-4 Latin 4 (North European, Scandinavian/Baltic languages)
8859-5 Cyrillic
8859-6 Arabic
8859-7 Greek
8859-8 Hebrew
8859-9 Latin 5 (Turkish)
8859-10 Latin 6 (Nordic: Lappish, Eskimo)
8859-11 Latin/Thai
8859-12 (never published; work on this part was abandoned)
8859-13 Latin 7 (Baltic Rim)
8859-14 Latin 8 (Celtic)
8859-15 Latin 9 (New West European)
8859-16 Latin 10 (Romanian)

See http://czyborra.com/charsets/iso8859.html for more details.

If designing a display chip from scratch, I would be tempted to implement the common characters in ROM and the regional variations in RAM. However, FPGAs often have blocks of RAM that can be used equally easily to implement ROM or RAM.

Translating ISO 8-bit codes to Unicode

A useful table that highlights the characters that change between the regional variants of ISO-8859.
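Python's standard codecs can generate the same kind of table. This quick sketch lists the codes in A0 to FF that decode differently in two parts of ISO-8859. Comparing Latin 1 with Latin 9 (ISO-8859-15) shows the eight characters that Latin 9 replaces, such as the euro sign at A4. The function name `changed_codes` is just for illustration.

```python
def changed_codes(enc_a: str, enc_b: str) -> dict[int, tuple[str, str]]:
    """Return the byte values A0-FF that decode differently in two ISO-8859 parts."""
    diffs = {}
    for code in range(0xA0, 0x100):
        a = bytes([code]).decode(enc_a, errors="replace")
        b = bytes([code]).decode(enc_b, errors="replace")
        if a != b:
            diffs[code] = (a, b)
    return diffs


# Print code points only, so the output is plain ASCII on any console.
for code, (old, new) in changed_codes("iso8859_1", "iso8859_15").items():
    print(f"{code:02X}: U+{ord(old):04X} -> U+{ord(new):04X}")
```

Only these few glyphs would need reloading to switch a Latin 1 font to Latin 9.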

Unicode

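Unicode originally allocated 16 bits per character, to cope with the wide range of characters in the world's languages. It has since grown beyond 16 bits, but the first 65,536 codes (the Basic Multilingual Plane) cover almost all characters in common use. It is convenient to think of these codes as pages of 256 characters, where the most-significant 8 bits are the page number.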

Unicode characters 00 to FF

There isn't room to support 64K of character glyphs, so just the first 256 are to be implemented.
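A minimal sketch of what "just the first 256" means in practice: split a 16-bit code point into page and offset, and only page 00 has glyphs. What to show for other pages is not specified here, so the `?` fallback and the function names are assumptions.

```python
FALLBACK_GLYPH = 0x3F  # assumption: show '?' for anything outside page 00


def split_code_point(cp: int) -> tuple[int, int]:
    """Split a 16-bit code point into (page number, offset within the page)."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("expected a 16-bit code point")
    return cp >> 8, cp & 0xFF


def glyph_index(cp: int) -> int:
    """Map a code point onto one of the 256 implemented glyphs."""
    page, offset = split_code_point(cp)
    return offset if page == 0x00 else FALLBACK_GLYPH


print([hex(glyph_index(cp)) for cp in (0x41, 0xE9, 0x20AC)])  # ['0x41', '0xe9', '0x3f']
```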

0000-001F Basic Control codes

0020-007F Basic Latin characters (20-7F)

0080-009F Extra Control codes

00A0-00FF Extra Latin characters (A0-FF)

Other interesting groups

0100-017F Latin extended A

0180-024F Latin extended B

0250-02AF International Phonetic Alphabet

0900-097F Devanagari (Hindi script)

2800-28FF Braille dot patterns

3040-309F Hiragana (Japanese)

30A0-30FF Katakana (Japanese)

Many of the world's scripts are rather ornate and hard to pack into an 8-pixel-wide cell. For these it might be best to render the characters on the graphics screen.

Character cell

5 x 7 dots are enough to define the basic Latin characters, though lower case letters look a bit kludgy because those with descenders have to be shifted up to fit in the matrix. One economical solution might be a programmable attribute bit that shifts the 5x7 glyph down by 3 lines within a 5x10 cell.
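A minimal model of the proposed attribute bit, assuming each glyph row is stored as a 5-bit integer and the attribute is a per-character flag (both are assumptions; the hardware could equally well derive it from the character code).

```python
def place_in_cell(rows: list[int], descender: bool,
                  height: int = 10, shift: int = 3) -> list[int]:
    """Place a 5x7 glyph (7 row bitmaps, 5 bits each) in a taller cell.

    Normal glyphs sit at the top of the cell; glyphs flagged as having a
    descender are moved down by `shift` lines.
    """
    if len(rows) != 7:
        raise ValueError("expected 7 rows")
    top = shift if descender else 0
    if top + 7 > height:
        raise ValueError("shifted glyph does not fit in the cell")
    cell = [0] * height
    cell[top:top + 7] = rows          # same-length slice: cell stays `height` long
    return cell
```

The design cost is one bit per character plus a small adder on the row counter, compared with storing 3 extra rows for every glyph.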

The simplest method is to have an 8x12 character cell for all 256 characters. Since I am not strapped for FPGA RAM at the moment, this is the method I intend to use.

I currently use 6K of the 8K of block RAM for firmware. This leaves 2K: enough for 256 character cells of 8x8 pixels (256 x 8 bytes), but not the 3K needed to define them in 8x12 pixels (256 x 12 bytes). The font memory has to be inside the FPGA, whereas the firmware can live in external ROM or RAM, so the firmware has to give way.
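The arithmetic above, as a small Python sketch. It assumes one byte per 8-pixel row and glyphs packed back to back; the function names are hypothetical.

```python
def font_bytes(chars: int, height: int, width: int = 8) -> int:
    """Bytes needed to store `chars` glyphs of `width` x `height` pixels."""
    return chars * height * width // 8


def font_address(code: int, row: int, height: int = 12) -> int:
    """Byte address of one glyph row, with glyphs packed back to back."""
    if not 0 <= row < height:
        raise ValueError("row outside the character cell")
    return code * height + row


print(font_bytes(256, 8))    # 2048: fits in the 2K left over
print(font_bytes(256, 12))   # 3072: needs 3K
```

Packing glyphs back to back needs a multiply by 12 in the address path (a shift-and-add, since 12 = 8 + 4). A common alternative is to pad each glyph to 16 rows so the address is simply code * 16 + row, at the cost of 4K instead of 3K.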

Figure: the 8x12 character cell, with space for accents over capitals at the top, the main space for lower case letters, and space for descenders at the bottom.

Design choices

The best compromise seems to be a default character coding that corresponds to codes 0000 to 00FF of Unicode.
The ISO-8859-1 character codes then map directly onto their Unicode equivalents.
It also means the least work when adapting the font to national variants.
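A one-line check of the direct mapping, using Python's built-in `latin-1` codec: every ISO-8859-1 byte decodes to the Unicode code point with the same value, so the byte can be used as the glyph index unchanged.

```python
# Each byte 00-FF decoded as ISO-8859-1 gives the code point of equal value.
latin1_is_identity = all(
    ord(bytes([b]).decode("latin-1")) == b for b in range(256)
)
print(latin1_is_identity)  # True
```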

There is another matter to consider: the control codes do not have associated glyphs that appear in a font, but their values can still be poked into display RAM. Therefore it seems sensible to use such values for semigraphic glyphs (as the original Atom VDG does) or for attribute-changing bytes (as teletext chips do).

The two control-code ranges, 00 to 1F and 80 to 9F, give 64 codes: enough to enumerate the full 64 semigraphic characters of the original Atom VDG.
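A sketch of how the 64 semigraphic glyphs could be generated rather than drawn by hand, since each is just a 2 x 3 grid of blocks. It assumes the MC6847 bit order (bit 5 top left through bit 0 bottom right) and an 8x12 cell split into 4-pixel-wide, 4-line-high blocks. Placing patterns 0 to 31 at 00 to 1F and 32 to 63 at 80 to 9F is a hypothetical layout, not necessarily the one used here.

```python
def sg6_glyph(pattern: int, width: int = 8, height: int = 12) -> list[int]:
    """Build a bitmap for a 6-bit semigraphic pattern (2 x 3 blocks).

    Assumed bit order: bit 5 = top left, bit 4 = top right, bit 3 = middle
    left, bit 2 = middle right, bit 1 = bottom left, bit 0 = bottom right.
    """
    if not 0 <= pattern < 64:
        raise ValueError("pattern must be 0-63")
    half = width // 2
    right_mask = (1 << half) - 1          # 0x0F for an 8-pixel row
    left_mask = right_mask << half        # 0xF0
    band = height // 3                    # scan lines per block row
    rows = []
    for line in range(height):
        block_row = min(line // band, 2)
        left_bit = 5 - 2 * block_row
        row = 0
        if (pattern >> left_bit) & 1:
            row |= left_mask
        if (pattern >> (left_bit - 1)) & 1:
            row |= right_mask
        rows.append(row)
    return rows


def sg6_font_code(pattern: int) -> int:
    """Hypothetical placement: patterns 0-31 at 00-1F, 32-63 at 80-9F."""
    return pattern if pattern < 32 else 0x80 + (pattern - 32)
```

This is also why the original VDG needs no ROM for these characters: the pixels follow directly from the bits.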

Given the choice between Atom semigraphics and Teletext attributes, the latter seems more appealing, because it allows coloured words and graphics to appear on screen without needing extra memory to store colour data.
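The reason no colour memory is needed is that teletext attributes are stored in the character stream itself. A minimal sketch of how one row is interpreted, using the broadcast teletext alphanumeric colour codes 01 to 07. Which byte values this design will use for attributes is not stated, so treat them as an assumption; graphics colours, flashing and other attributes are left out, and other bytes are treated as ordinary characters.

```python
# Broadcast teletext alphanumeric colour attributes (01-07).
COLOURS = {0x01: "red", 0x02: "green", 0x03: "yellow", 0x04: "blue",
           0x05: "magenta", 0x06: "cyan", 0x07: "white"}


def decode_row(row: bytes) -> list[tuple[str, str]]:
    """Turn one row of display bytes into (character, foreground colour) pairs.

    Each row starts in white. A colour attribute occupies a cell, which is
    shown as a space, and the new colour applies from the next cell onwards.
    """
    colour = "white"
    cells = []
    for byte in row:
        if byte in COLOURS:
            cells.append((" ", colour))   # the attribute cell displays as a space
            colour = COLOURS[byte]        # "set-after": affects following cells
        else:
            cells.append((chr(byte), colour))
    return cells


print(decode_row(b"\x01DANGER")[:2])  # [(' ', 'white'), ('D', 'red')]
```

The price is that each colour change costs one character cell on screen, and the colour resets at the start of every row.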

Implementation

The 256-character font has been implemented and the results are as expected. Some minor details have been changed.

Code	Change
4F	Letter O rounded off to improve appearance.
30	Number 0 has a stroke added to differentiate it from letter O.
23	Hash sign improved.
2A	Star sign modified.
5E	Up arrow replaced with caret (^).
5F	Left arrow replaced with underscore (_).

The first two changes make it easier to read listings.

The last two changes are more than cosmetic: they display different glyphs, but this should not matter much since the original arrows are so seldom used. If any program does find this a problem, those glyphs can be changed as required.

There is also a register to control various features. For instance, one bit selects the full 256-character set when set, or Atom-compatible mapping when clear (the default):

Atom char	Font char	Output
00 to 1F	40 to 5F	Alphanumeric
20 to 3F	20 to 3F	Alphanumeric
40 to 7F	00 to 1F	Graphic
80 to 9F	40 to 5F	Alphanumeric inverted
A0 to BF	20 to 3F	Alphanumeric inverted
C0 to FF	80 to 9F	Graphic

Note that the inverted glyphs really are inverted: they are produced by inverting the pixels of the normal alphanumeric glyphs, so any change to a normal glyph also changes its inverted version.
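A sketch of the alphanumeric half of the Atom-compatible mapping, assuming the standard MC6847 order (Atom 00 to 1F are @, A to Z and so on, which sit at 40 to 5F in the new font; Atom 20 to 3F are space to ?, which stay at 20 to 3F). It also shows why editing a normal glyph changes its inverted twin: the inverted row is computed from the same font data. Function names and the font-as-list representation are hypothetical; graphic codes are not handled.

```python
def atom_alpha_to_font(atom_code: int) -> tuple[int, bool]:
    """Map an Atom alphanumeric code (00-3F or 80-BF) to (font code, inverted)."""
    if atom_code & 0x40:
        raise ValueError("graphic code, not alphanumeric")
    inverted = bool(atom_code & 0x80)     # bit 7 selects inversion
    low = atom_code & 0x3F
    # VDG order: @, A-Z etc. first (font 40-5F), then space to ? (font 20-3F).
    font_code = low + 0x40 if low < 0x20 else low
    return font_code, inverted


def display_row(font: list[list[int]], atom_code: int, line: int) -> int:
    """Pixels for one scan line; inverted glyphs are the normal glyph XOR 0xFF."""
    font_code, inverted = atom_alpha_to_font(atom_code)
    pixels = font[font_code][line]
    return pixels ^ 0xFF if inverted else pixels
```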

Teletext font support

Teletext requires 24 rows of 40 characters, or 25 rows if a status row is used. This is not a problem for a TV, but a QVGA screen has only 240 lines. It can show 24 rows if the character cells are 10 lines high (24x10=240). To show 25 rows one would have to settle for cells 8 lines high (8x25=200) with 40 lines free, or 9 lines high (9x25=225) with 15 lines free. A programmable character cell height (up to 12 lines) would be very useful.
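The trade-off above can be tabulated with a few lines of Python, assuming 240 visible lines and a whole number of scan lines per character cell; the function name `spare_lines` is hypothetical.

```python
from typing import Optional

QVGA_LINES = 240


def spare_lines(rows: int, cell_height: int,
                lines: int = QVGA_LINES) -> Optional[int]:
    """Scan lines left over after `rows` text rows, or None if they don't fit."""
    used = rows * cell_height
    return lines - used if used <= lines else None


# Cell heights 8-12: spare lines for 24 rows and for 25 rows.
for height in range(8, 13):
    print(height, spare_lines(24, height), spare_lines(25, height))
```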

Teletext Character set demo
Warning text is enhanced when ramping the intensity.

Not wise to demonstrate if taking through customs! :-)

The text is taken from the self-destruct mechanism of the spaceship Nostromo, from the movie Alien.

DANGER

EMERGENCY DESTRUCTION SYSTEM

ON ACTIVATION SHIP WILL DETONATE IN T MINUS 10 MINUTES

FAILSAFE WARNING

CUT OFF SYSTEM WILL NOT OPERATE AFTER T MINUS 5 MINUTES

SCUTTLE PROCEDURE

DANGER, THE EMERGENCY DESTRUCT SYSTEM IS NOW ACTIVATED.
THE SHIP WILL DETONATE IN T MINUS TEN MINUTES.
THE OPTION TO OVER-RIDE AUTOMATIC DETONATION EXPIRES IN T MINUS FIVE MINUTES.

ATTENTION. THE COOLING UNITS FOR THE LIGHT-PLUS ENGINES ARE NOT FUNCTIONING.
ENGINES WILL OVERLOAD IN FOUR MINUTES, FIFTY SECONDS.
ATTENTION. ENGINES WILL OVERLOAD IN THREE MINUTES, TWENTY SECONDS.
ATTENTION. ENGINES WILL OVERLOAD IN THREE MINUTES.

TOO LATE FOR REMEDIAL ACTION. THE CORE HAS BEGUN TO MELT.
ENGINES WILL OVERLOAD IN TWO MINUTES, THIRTY-FIVE SECONDS.

ATTENTION. ENGINES WILL OVERLOAD IN TWO MINUTES.
ATTENTION. ENGINES WILL EXPLODE IN NINETY SECONDS.
ATTENTION. ENGINES WILL EXPLODE IN SIXTY SECONDS.
