Unpacking !PackDir archives on Linux

Posted on Sunday, August 18, 2019 by TheBlackzone

While wading through the heaps of data that I had recovered from my old Acorn machines, I found some interesting coding projects I had worked on in the mid-1990s. I wanted to have a closer look at them and maybe even reuse some of the code. However, there was a little obstacle: I had stored almost all of my source code in non-standard archive files...

The Back Story

Back in the day I did not use a version control system. Instead, I took snapshots of the various development states of my projects by copying the corresponding project directory and compressing it into an archive file. My tool of choice for this task was !PackDir, an easy-to-use archiving application for RISC OS with a simple drag-and-drop interface and reasonable compression rates.

!PackDir takes a RISC OS directory as input and compresses its content recursively into a single archive file. When unpacking, it restores the original directory structure and content.

Unfortunately, the archives created by !PackDir are not compatible with other archiving tools. And although !PackDir can still be downloaded today, it only runs on old ("26-bit") versions of RISC OS and cannot be used with more modern versions, such as the one for the Raspberry Pi.

Since my old A5000 computer is broken and still awaiting repair, my options for getting at the contents of these archives narrowed down to setting up an emulation of an old A5000 computer, or trying to get !PackDir to run on a modern RISC OS system using a 26-bit emulation layer. Or figuring out how !PackDir works and hacking up a tool to read its archives natively on my Linux laptop...

I decided to give the latter approach a shot...

The !PackDir Archive File Format

Because there is no official documentation of the !PackDir archive format, the hardest part was obtaining information about its file structure and the compression algorithm it uses. But after extensive research on the web, I had gathered all the bits and pieces I needed.

The file format is actually pretty straightforward and looks like this:

    +-------------------------------+
    |            HEADER             |
    +-------------------------------+
    | Object information and data 1 |
    +-------------------------------+
    | Object information and data 2 |
    +-------------------------------+
                   ...
    +-------------------------------+
    | Object information and data n |
    +-------------------------------+

The HEADER consists of just 9 bytes:

    Offset  Length  Data
    ------------------------------------------------------------
    0       5       Null-terminated string "PACK\0"
    5       4       LZW code bit width minus 12

The value found at offset 5 is the bit width used in the LZW compression algorithm that has been set in the "Options" dialogue of the !PackDir application. The value stored here is the number of bits minus 12 for the corresponding LZW bit width ("0" = 12-bit, "1" = 13-bit, etc.).
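As a sketch of how the header might be checked in C, the hypothetical `packdir_code_bits()` below validates the signature and returns the code width. It assumes that the word at offset 5 is stored little-endian, as is usual for ARM programs on RISC OS; this is not the code from pkdir.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit word stored least significant byte first.
 * Assumption: !PackDir, as an ARM program, writes little-endian words. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Validate the header ("PACK\0" followed by one word) and return the LZW
 * code bit width it encodes (stored value + 12), or -1 if the buffer does
 * not start with a !PackDir header. */
int packdir_code_bits(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 9 || memcmp(buf, "PACK\0", 5) != 0)
        return -1;
    return 12 + (int)read_le32(buf + 5);
}
```

For a header whose word at offset 5 is 1, it returns 13, matching the "1" = 13-bit example above.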

The header is followed by a sequence of variable-length "objects", each of which is either a "directory object" or a "file object". The first object is always a directory object (the root directory).

Root directory object (immediately after the header)

    Offset  Length  Data
    --------------------------------------------------------------------------
    0       x       Directory name, zero-terminated
    x       4       "LOAD" address (file type and topmost byte of timestamp)
    x+4     4       "EXEC" address (lower bytes of timestamp)
    x+8     4       Attributes
    x+12    4       Number of entries in this directory
    x+16    -       Start of the next object

Directory object

    Offset  Length  Data
    --------------------------------------------------------------------------
    0       x       Directory name, zero-terminated
    x       4       "LOAD" address (file type and topmost byte of timestamp)
    x+4     4       "EXEC" address (lower bytes of timestamp)
    x+8     4       Attributes
    x+12    4       Number of entries in this directory
    x+16    4       Object type: "1" = directory
    x+20    -       Start of the next object

Note: The root directory object lacks the object type field because it is always a directory.

File object

    Offset  Length  Data
    --------------------------------------------------------------------------
    0       x       File name, zero-terminated
    x       4       "LOAD" address (file type and topmost byte of timestamp)
    x+4     4       "EXEC" address (lower bytes of timestamp)
    x+8     4       Attributes
    x+12    4       Uncompressed file size
    x+16    4       Object type: "0" = file
    x+20    4       Compressed size, or -1 = uncompressed, -2 = no data stored
    x+24    -       Compressed or uncompressed data of the file
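All three layouts share the same leading fields (name, LOAD, EXEC, attributes, size or entry count), so one parser can handle them. The following is a minimal sketch, not the code from pkdir.c; `struct pd_object`, `pd_parse_object()` and the little-endian byte order are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of one object header. */
struct pd_object {
    const char *name;       /* points into the archive buffer */
    uint32_t load, exec, attr;
    uint32_t size_or_count; /* file: uncompressed size; directory: entries */
    uint32_t type;          /* 0 = file, 1 = directory */
    int32_t compressed;     /* files: size, -1 or -2; directories: 0 */
};

/* Little-endian 32-bit read (assumed byte order). */
static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse the object starting at buf[*pos]. The root directory (is_root != 0)
 * has no object type word. On success *pos is advanced past the header, so
 * for a file it points at the stored data. Returns 0, or -1 if truncated. */
int pd_parse_object(const unsigned char *buf, size_t len, size_t *pos,
                    int is_root, struct pd_object *obj)
{
    assert(buf != NULL && pos != NULL && obj != NULL);
    size_t p = *pos;
    if (p >= len)
        return -1;
    const unsigned char *nul = memchr(buf + p, 0, len - p);
    if (nul == NULL)
        return -1;
    obj->name = (const char *)(buf + p);
    p = (size_t)(nul - buf) + 1;          /* skip the terminator */

    size_t fixed = is_root ? 16 : 20;     /* load, exec, attr, size/count, [type] */
    if (len - p < fixed)
        return -1;
    obj->load = rd32(buf + p);
    obj->exec = rd32(buf + p + 4);
    obj->attr = rd32(buf + p + 8);
    obj->size_or_count = rd32(buf + p + 12);
    obj->type = is_root ? 1 : rd32(buf + p + 16);
    p += fixed;

    obj->compressed = 0;
    if (obj->type == 0) {                 /* file objects carry one more word */
        if (len - p < 4)
            return -1;
        /* 0xFFFFFFFF becomes -1 on two's-complement compilers */
        obj->compressed = (int32_t)rd32(buf + p);
        p += 4;
    }
    *pos = p;
    return 0;
}
```

Keeping `name` as a pointer into the archive buffer avoids copying strings while walking the archive.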

Let's have a closer look at the fields of these objects.

The directory/file name is pretty straightforward: just a zero-terminated ASCII string.

The fields "LOAD address" and "EXEC address" may seem to be misnomers as they hold the file type and date stamp of the object. These names are used in the RISC OS "Programmer's Reference Manuals" (PRM 2-16) and the data stored in this location was originally used to hold the load and execution addresses of simple machine code programs.

The two data fields have the following structure:

    LOAD ADDRESS: 0xFFFtttdd
                    |  |  |
                    |  |  +---- Topmost byte of timestamp
                    |  +------- File type
                    +---------- Fixed 0xFFF

    EXEC ADDRESS: 0xdddddddd
                    |
                    +---------- Lower bytes of timestamp

The file type part is pretty straightforward and its 12 bits directly translate to the RISC OS filetypes.
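Extracting the file type is a matter of shifting and masking. This small sketch (the function name `riscos_filetype()` is hypothetical) also rejects LOAD words whose top 12 bits are not 0xFFF; that check is a defensive assumption rather than something the format description requires.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the 12-bit RISC OS file type from a LOAD address of the form
 * 0xFFFtttdd. Returns -1 if the top 12 bits are not 0xFFF, in which case
 * the word presumably holds a genuine load address instead. */
int riscos_filetype(uint32_t load)
{
    if ((load >> 20) != 0xFFFu)
        return -1;
    return (int)((load >> 8) & 0xFFFu);
}
```

For example, a LOAD word of 0xFFF68E12 yields 0x68E, the !PackDir archive file type.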

The timestamp is stored as a 40-bit value of centiseconds since January 1st, 1900, with the topmost byte held in the LOAD address and the lower four bytes in the EXEC address. To convert this value into a Unix timestamp (which counts seconds since January 1st, 1970), we need to subtract 0x336E996A00 (the number of centiseconds between January 1st, 1900 and January 1st, 1970) and divide the result by 100.
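In C, this conversion might look like the following. `riscos_to_unix()` is a hypothetical helper: it reassembles the 40-bit value from the low byte of LOAD and all of EXEC, then applies the 0x336E996A00 offset and the division by 100.

```c
#include <assert.h>
#include <stdint.h>

/* Centiseconds between January 1st, 1900 and January 1st, 1970. */
#define RISCOS_UNIX_EPOCH_CS 0x336E996A00ULL

/* Combine the 40-bit RISC OS timestamp held in the LOAD and EXEC words
 * and convert it to Unix time (seconds since January 1st, 1970). */
int64_t riscos_to_unix(uint32_t load, uint32_t exec)
{
    uint64_t cs = ((uint64_t)(load & 0xFFu) << 32) | exec;
    /* Signed arithmetic so that pre-1970 stamps come out negative;
     * C integer division truncates toward zero. */
    return ((int64_t)cs - (int64_t)RISCOS_UNIX_EPOCH_CS) / 100;
}
```

A stamp of exactly 0x336E996A00 centiseconds (LOAD low byte 0x33, EXEC 0x6E996A00) maps to 0, the Unix epoch.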

The attributes byte corresponds to the RISC OS file attributes:

    Bit  Meaning when set
    -------------------------------
    0    Owner read access
    1    Owner write access
    2    Owner execute access
    3    Owner delete protection
    4    Others read access
    5    Others write access
    6    Undefined
    7    Others delete protection
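Linux has no direct equivalent of these attributes, so an unpacker has to pick some mapping. One possible choice, shown here purely as a hypothetical sketch, maps the owner bits to the user permissions, the "others" bits to both group and world permissions, and ignores the delete-protection bits.

```c
#include <assert.h>

/* RISC OS attribute bits, as listed in the table above. */
enum {
    RO_OWNER_READ   = 1 << 0,
    RO_OWNER_WRITE  = 1 << 1,
    RO_OWNER_EXEC   = 1 << 2,
    RO_OWNER_LOCKED = 1 << 3,  /* no Unix equivalent, ignored */
    RO_OTHER_READ   = 1 << 4,
    RO_OTHER_WRITE  = 1 << 5,
    RO_OTHER_LOCKED = 1 << 7   /* no Unix equivalent, ignored */
};

/* Hypothetical mapping to Unix permission bits (octal, as for chmod). */
unsigned int riscos_attr_to_mode(unsigned int attr)
{
    unsigned int mode = 0;
    if (attr & RO_OWNER_READ)  mode |= 0400;
    if (attr & RO_OWNER_WRITE) mode |= 0200;
    if (attr & RO_OWNER_EXEC)  mode |= 0100;
    if (attr & RO_OTHER_READ)  mode |= 0044;  /* group + world read */
    if (attr & RO_OTHER_WRITE) mode |= 0022;  /* group + world write */
    return mode;
}
```

The result could be passed to chmod(); for example, attributes 0x13 (owner read/write, others read) give 0644.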

The remaining fields of the object information are pretty straightforward. The "uncompressed file size" at offset x+12 holds the original file size. The "object type" at offset x+16 distinguishes between a file object ("0") and a directory object ("1"). The "compressed size" at offset x+20 holds either the number of bytes of compressed data, "-1" if the data is stored uncompressed, or "-2" if no data is stored at all (zero-length files).

Finally, if the object is a "file object", its data follows at offset x+24. The length of the data is either the value stored in the "compressed size" field or, if "compressed size" is "-1", the value stored in the "uncompressed file size" field. If "compressed size" is "-2", there is no data at all and the next object follows immediately.
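These rules translate directly into a small helper. The sketch below (the name `pd_stored_length()` is hypothetical) returns how many bytes to read or skip after a file object's header; only a non-negative compressed size means the data has to go through the LZW decoder.

```c
#include <assert.h>
#include <stdint.h>

/* Number of data bytes stored after a file object's header:
 *   compressed_size >= 0  -> that many bytes of LZW-compressed data
 *   compressed_size == -1 -> uncompressed_size bytes of raw data
 *   compressed_size == -2 -> no data at all
 * Returns -1 for values the format description does not cover. */
int64_t pd_stored_length(uint32_t uncompressed_size, int32_t compressed_size)
{
    if (compressed_size >= 0)
        return compressed_size;
    if (compressed_size == -1)
        return uncompressed_size;
    if (compressed_size == -2)
        return 0;
    return -1;
}
```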

Compression

The data itself is compressed using the GIF variant of the LZW compression algorithm, with a fixed root code size of 8 bits (the input symbols are bytes) and the maximum code width given in the archive header. I will not go into the details of the LZW compression algorithm as this would be beyond the scope of this post. There is an excellent article (and a second one here) by Mark Nelson that explains the LZW algorithm in detail and also includes sample C code. The specifics of the GIF variant of the LZW algorithm are explained in a document accompanying the GIFLIB library.
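Since the decompressor is the heart of such a tool, here is a compact sketch of a GIF-style LZW decoder in C. It is not the code from pkdir.c, and several details are assumptions that the description above does not pin down: codes are packed least significant bit first (as in GIF), the stream may contain clear (256) and end-of-information (257) codes, the code width grows from 9 bits up to the maximum from the header using the GIF rule, and decoding stops once the expected uncompressed size has been produced. `lzw_decode()` and `LZW_MAX_BITS` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LZW_CLEAR 256     /* GIF-style clear code for an 8-bit root size */
#define LZW_EOI   257     /* GIF-style end-of-information code */
#define LZW_FIRST 258     /* first code available for new table entries */
#define LZW_MAX_BITS 16   /* hypothetical upper bound on the code width */

/* Decode GIF-style LZW data whose codes start 9 bits wide and grow up to
 * max_bits. Assumes LSB-first bit packing. Writes at most out_cap bytes
 * (e.g. the object's uncompressed size) and returns the number of bytes
 * written, or -1 if the stream is corrupt. Not reentrant (static tables). */
long lzw_decode(const unsigned char *in, size_t in_len, int max_bits,
                unsigned char *out, size_t out_cap)
{
    static uint16_t prefix[1 << LZW_MAX_BITS];
    static unsigned char suffix[1 << LZW_MAX_BITS];
    static unsigned char stack[1 << LZW_MAX_BITS];
    assert(max_bits >= 9 && max_bits <= LZW_MAX_BITS);

    size_t bitpos = 0, n = 0;
    int width = 9, next = LZW_FIRST, prev = -1;
    unsigned char first = 0;              /* first byte of prev's string */

    while (n < out_cap && bitpos + (size_t)width <= in_len * 8) {
        int code = 0;                     /* read `width` bits, LSB first */
        for (int i = 0; i < width; i++, bitpos++)
            if (in[bitpos >> 3] & (1u << (bitpos & 7)))
                code |= 1 << i;

        if (code == LZW_CLEAR) {          /* reset the string table */
            width = 9;
            next = LZW_FIRST;
            prev = -1;
            continue;
        }
        if (code == LZW_EOI)
            break;
        if (prev < 0) {                   /* first code after a reset */
            if (code > 255)
                return -1;
            out[n++] = (unsigned char)code;
            prev = code;
            first = (unsigned char)code;
            continue;
        }
        if (code > next)
            return -1;

        /* Push the string for `code` in reverse. If code == next (the
         * "KwKwK" case) it is prev's string plus prev's first byte. */
        size_t sp = 0;
        int c = code;
        if (code == next) {
            stack[sp++] = first;
            c = prev;
        }
        while (c > 255) {
            stack[sp++] = suffix[c];
            c = prefix[c];
        }
        stack[sp++] = (unsigned char)c;
        first = (unsigned char)c;
        while (sp > 0 && n < out_cap)
            out[n++] = stack[--sp];

        /* New entry: prev's string + first byte of this string. */
        if (next < (1 << max_bits)) {
            prefix[next] = (uint16_t)prev;
            suffix[next] = first;
            next++;
            if (next == (1 << width) && width < max_bits)
                width++;
        }
        prev = code;
    }
    return (long)n;
}
```

With this layout, a tool would call it with the width from the header and the object's uncompressed size as `out_cap`. If real archives decode to garbage, the bit order and the width-growth rule are the first assumptions to revisit.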

The Tool

With this information at hand, I hacked up a small C program that takes a !PackDir archive as input and unpacks it, recreating the original directory structure and content. The source code can be downloaded here.

You can compile it simply with

gcc -o pkdir pkdir.c

and use it like this

pkdir {-d} [packdirfile]

where the optional -d flag selects a "dry run" (i.e., it only shows what would be done without actually doing it) and [packdirfile] is a !PackDir archive (file type &68E).

Conclusion

Creating this tool has been fun, and so far I have recovered the content of many old !PackDir archives with it. It's quite handy to be able to unpack an archive instantly without having to fire up an emulator first.

