
Decoding Acorn BBC Basic Files

Posted on Sunday, June 23, 2019 by TheBlackzone

During my archeological expedition into ancient computer stuff, I came across a bunch of old BBC BASIC programs I had written on my Acorn Archimedes A5000 computer in the early 1990s. For nostalgia's sake I wanted to look at them again, but was thwarted by the fact that they are stored in a tokenized format that is not directly readable. So I did some research on the format and wrote a converter...

Format of a BBC BASIC file

First, let's have a look at the structure of the lines in a BBC BASIC file. Each line looks like this:

+------+--------+--------+--------+----------------------------------+
| 0x0D | lno hi | lno lo | length | line data (text and tokens)....  |
+------+--------+--------+--------+----------------------------------+
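Walking over these lines takes only a few lines of C. The sketch below rests on two assumptions that the diagram does not state: the length byte is taken to count the whole line (the four header bytes plus the line data), and the program is taken to end with the two bytes 0x0D 0xFF. The names next_line and basic_line are made up for illustration.

```c
#include <stddef.h>

/* One program line, split into the header fields shown in the diagram. */
typedef struct {
    unsigned lineno;             /* (lno hi << 8) | lno lo */
    const unsigned char *data;   /* tokenized line data (text and tokens) */
    size_t data_len;             /* number of line data bytes */
} basic_line;

/*
 * Parses the line starting at buf[*pos] and, on success, advances *pos to the
 * next line. Returns 1 for a line, 0 at the end of the program, -1 on bad data.
 * ASSUMPTIONS: the length byte counts the whole line (4 header bytes plus the
 * line data), and the program ends with the marker bytes 0x0D 0xFF.
 */
static int next_line(const unsigned char *buf, size_t size, size_t *pos,
                     basic_line *out)
{
    size_t p = *pos;

    if (p >= size)
        return 0;                          /* no end marker, but no more data */
    if (p + 1 < size && buf[p] == 0x0D && buf[p + 1] == 0xFF)
        return 0;                          /* end-of-program marker */
    if (p + 4 > size || buf[p] != 0x0D)
        return -1;                         /* not at the start of a line */

    size_t len = buf[p + 3];
    if (len < 4 || len > size - p)
        return -1;                         /* length byte does not fit the buffer */

    out->lineno = ((unsigned)buf[p + 1] << 8) | buf[p + 2];
    out->data = buf + p + 4;
    out->data_len = len - 4;
    *pos = p + len;
    return 1;
}
```

A converter would call next_line in a loop while it returns 1, printing each line number followed by the expanded line data.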

Tokens

Detokenizing the line data is pretty straightforward: tokens are either one or two bytes long and lie in the range 0x7F..0xFF, while everything in the range 0x20..0x7E is treated as normal text.

The list of tokens is included in Appendix B of the BBC BASIC Reference Manual, so it is quite easy to create a decoding table.
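A detokenizing loop might look like the following sketch. The token table is deliberately partial: it holds only a handful of single-byte tokens, and a real converter would fill in the complete list from Appendix B. Several behaviours are assumptions rather than statements from the text above: nothing is expanded inside string literals or after REM, line references (0x8D) are decoded inline with the formula from the Line References section, and, as in BBC BASIC V, the bytes 0xC6, 0xC7 and 0xC8 are treated as prefixes of two-byte tokens, which this sketch only prints as placeholders. The names detokenize, append and token_names are illustrative.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Partial, illustrative token table; unlisted entries are NULL.
 * The complete list is in Appendix B of the BBC BASIC Reference Manual. */
static const char *const token_names[256] = {
    [0x80] = "AND",   [0x84] = "OR",   [0x88] = "STEP", [0x8B] = "ELSE",
    [0x8C] = "THEN",  [0xB8] = "TO",   [0xE0] = "END",  [0xE3] = "FOR",
    [0xE4] = "GOSUB", [0xE5] = "GOTO", [0xE7] = "IF",   [0xED] = "NEXT",
    [0xF1] = "PRINT", [0xF4] = "REM",
};

/* Appends s to out, truncating so that out always stays NUL-terminated. */
static void append(char *out, size_t out_size, size_t *used, const char *s)
{
    size_t n = strlen(s);
    if (n > out_size - 1 - *used)
        n = out_size - 1 - *used;
    memcpy(out + *used, s, n);
    *used += n;
    out[*used] = '\0';
}

/* Expands tokenized line data into plain text; returns the characters written. */
static size_t detokenize(const unsigned char *data, size_t len,
                         char *out, size_t out_size)
{
    size_t used = 0;
    int in_string = 0;  /* ASSUMPTION: nothing is tokenized inside "..." */
    int literal = 0;    /* ASSUMPTION: after REM the rest is plain text */
    char buf[32];

    if (out_size == 0)
        return 0;
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        unsigned char c = data[i];
        if (c == '"')
            in_string = !in_string;
        if (c < 0x7F || in_string || literal) {
            buf[0] = (char)c;               /* copy text bytes unchanged */
            buf[1] = '\0';
            append(out, out_size, &used, buf);
        } else if (c == 0x8D && i + 3 < len) {
            /* Line reference, decoded with the formula from the text. */
            unsigned b0 = data[i + 1], b1 = data[i + 2], b2 = data[i + 3];
            unsigned lineno =
                (((b2 ^ (b0 << 4)) << 8) + (b1 ^ ((b0 << 2) & 0xC0))) & 0xFFFF;
            snprintf(buf, sizeof buf, "%u", lineno);
            append(out, out_size, &used, buf);
            i += 3;
        } else if (c >= 0xC6 && c <= 0xC8 && i + 1 < len) {
            /* ASSUMPTION (BBC BASIC V): two-byte token, not in this table. */
            snprintf(buf, sizeof buf, "[&%02X&%02X]", (unsigned)c,
                     (unsigned)data[i + 1]);
            append(out, out_size, &used, buf);
            i++;
        } else if (token_names[c] != NULL) {
            append(out, out_size, &used, token_names[c]);
            if (c == 0xF4)
                literal = 1;                /* REM */
        } else {
            snprintf(buf, sizeof buf, "[&%02X]", (unsigned)c);  /* unknown */
            append(out, out_size, &used, buf);
        }
    }
    return used;
}
```

With this sketch, the line data F1 20 22 48 49 22 3A E5 20 8D 54 4A 40 expands to PRINT "HI":GOTO 10.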

Line References

Line references, such as those used in GOTO nnnnn or GOSUB nnnnn statements, are stored in an internal format. The sequence starts with 0x8D, followed by three bytes:

[0x8D] [b0] [b1] [b2] 

The line number in a line reference is calculated as follows:

lineno = ((b2 EOR (b0 * 16)) * 256 + (b1 EOR ((b0 * 4) AND 0xC0))) AND 0xFFFF; 
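In C, the formula maps directly onto bit operations (EOR becomes ^, AND becomes &, *16 and *4 become shifts). The encoder next to it is a hypothetical helper, derived here by reversing the formula rather than taken from the RISC OS source; it exists only so the round trip can be checked. It moves the top two bits of each byte of the line number into b0, which leaves b1 and b2 in the range 0x40..0x7F. The constant 0x54 is an assumption about what BBC BASIC writes; any value with the same low six bits decodes identically.

```c
/* Decodes the three bytes following 0x8D into a line number (formula above). */
static unsigned decode_lineref(unsigned b0, unsigned b1, unsigned b2)
{
    unsigned hi = b2 ^ (b0 << 4);            /* bits above 0xFF vanish below */
    unsigned lo = b1 ^ ((b0 << 2) & 0xC0);
    return ((hi << 8) + lo) & 0xFFFF;
}

/* HYPOTHETICAL inverse of decode_lineref, derived from the formula.
 * ASSUMPTION: 0x54 is the constant mixed into b0 by BBC BASIC. */
static void encode_lineref(unsigned lineno, unsigned char out[3])
{
    unsigned lo = lineno & 0xFF, hi = (lineno >> 8) & 0xFF;
    out[0] = (unsigned char)((((lo & 0xC0) >> 2) | ((hi & 0xC0) >> 4)) ^ 0x54);
    out[1] = (unsigned char)((lo & 0x3F) | 0x40);   /* low 6 bits of lo */
    out[2] = (unsigned char)((hi & 0x3F) | 0x40);   /* low 6 bits of hi */
}
```

Under this encoder, GOTO 10 would be stored as 0x8D 0x54 0x4A 0x40, and decode_lineref(0x54, 0x4A, 0x40) gives back 10.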

Actually, the internal format of line references was not so easy to figure out. I found the decoding algorithm in the RISC OS source code (in the file "s.bastxt"), and later learned that a document on J.G. Harston's website describes it as well.

Putting it together

With this information at hand, I wrote a small program that takes a tokenized BBC BASIC file as its input and translates it into plain text. The program is implemented in C, and you can download the source code here.

Conclusion

Although my old BBC BASIC programs have no practical use these days, I find it interesting (and sometimes inspiring) to look at the stuff I created almost 30 years ago. To achieve this goal, it would certainly have been easier to just use an emulator, but researching an old file format and building a tool to read it is the most fun part of digital archeology.

Tags: ancient, riscos
