Linking research & learning technologies through standards » Nick Nicholas

The perishability of Word

Peter Sefton's trying to recover his 1994 Word thesis into a sustainable document format, and migrating from 10-year-old Word formats and media is no fun at all. He's right: act now, while Mac Classic is still somewhat accessible. Been there, doing that again soon with my PhD (Word 5, 1998). I did styles like Pete did, so I was somewhat virtuous, but I did go a bit ape, so I'll be making life difficult for myself anyway.

I have two major problems Peter didn't. One, I used Endnote 4. Proprietary bibliographical software which didn't migrate well: the author names in the Endnote library itself autovanished long ago, and there was a serious compatibility issue resulting in Endnote not talking to the migrated version of the document. I've decided to cut my losses, go with Bookends as biblio software (more proprietary software, but I'm not switching to TeX in a hurry), not bother about migrating the library, and convert the version of the thesis with the Endnote references spelt out. Problem here is, Endnote 4 used control characters to delimit references, which, when you migrate the Word file, turn up as ugly splotchy fields. Fields you cannot globally find and delete -- you cannot search inside a field for text, so the only global option is to delete all fields. And I don't want to do that, because I occasionally used fields in mathematical typesetting, to get diacritics positioned correctly. *snarl*
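If the text can be got out of Word with the field delimiters intact, the "delete some fields but not others" job is easy to script. What follows is only a rough sketch under assumptions: the two delimiter characters are placeholders for whatever actually brackets a field in the export, fields are assumed not to nest, and `looks_like_citation` is a made-up heuristic (a spelt-out reference contains a four-digit year).

```python
import re

# Placeholder delimiters (an assumption): substitute whatever control
# characters actually open and close a field in the exported text.
FIELD_BEGIN = "\x13"
FIELD_END = "\x15"


def looks_like_citation(inner):
    """Hypothetical test: treat a field as a reference if it contains a year."""
    return re.search(r"\b(1[89]|20)\d{2}\b", inner) is not None


def unwrap_fields(text, should_unwrap=looks_like_citation,
                  begin=FIELD_BEGIN, end=FIELD_END):
    """Drop the delimiters around matching fields; leave other fields alone.

    Assumes fields are not nested inside one another.
    """
    pattern = re.compile(re.escape(begin) + r"(.*?)" + re.escape(end), re.DOTALL)

    def replace(match):
        inner = match.group(1)
        # Keep the visible text of a citation field, minus its delimiters;
        # return any other field (e.g. a typesetting one) unchanged.
        return inner if should_unwrap(inner) else match.group(0)

    return pattern.sub(replace, text)


sample = "See \x13Smith 1998\x15 and \x13EQ \\o(a,b)\x15."
cleaned = unwrap_fields(sample)
```

In the sample, the reference field loses its delimiters while the `EQ` field used for positioning a diacritic survives untouched. The year test would need tightening for a real thesis, since a typesetting field could contain a number too.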

Second problem is the thesis predates Unicode -- or rather, Microsoft allowing Unicode into the Mac version. So lots of non-future-proof 8-bit fonts: Ismini for the Greek, SILDoulosIPA 93 for the IPA, TimesDiacrit for Latin-2 characters, and (because I went ape) the occasional instance of Arabic, Hebrew, Cyrillic, and Linear B. Lots of tedious global replaces. And some hurdles:

* Word 2004 will import the Word 5 files, but is UNUSABLE on a MacBook.
* Word 2004 will do Unicode alright, but it will not even display SILDoulosIPA 93: it turns it into blank squares.
* NeoOffice is usable on a MacBook, but OpenOffice has so far forgotten to implement "replace in all open documents". We're talking 10 documents here. This means macros.
* NeoOffice LOSES the font information for 8-bit fonts. And yes, I used styles, but I didn't use character styles (the main reason being that char styles weren't supported in Word 5). Which means I'll be opening these files in Word 2000 (so I can still see the 8-bit fonts), globally replacing each font with a different colour, and working off global replaces based on the colours in NeoOffice. (I just did that with someone else, and the colours didn't always come through; maybe I'll try char styles after all instead.)
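The global replaces themselves come down to a per-font lookup table, which is why losing the font information hurts so much: the same byte means different things in different 8-bit fonts. A minimal sketch, assuming the document has already been broken into (font name, text) runs by some other means (colours, char styles, whatever survives). The tables here are illustrative placeholders, not the real layouts of Ismini or SILDoulosIPA 93.

```python
# Illustrative placeholder tables (an assumption): the real mappings would
# have to be read off each font's actual character layout.
FONT_TABLES = {
    "Ismini": {"a": "α", "b": "β", "g": "γ"},
    "SILDoulosIPA 93": {"N": "ŋ"},
}


def to_unicode(runs, tables=FONT_TABLES):
    """Convert (font_name, text) runs to a single Unicode string.

    Text in a font with no table is passed through unchanged, as is any
    character missing from its font's table.
    """
    out = []
    for font, text in runs:
        table = tables.get(font)
        if table is None:
            out.append(text)
        else:
            out.append("".join(table.get(ch, ch) for ch in text))
    return "".join(out)


runs = [
    ("Times", "Greek "),
    ("Ismini", "abg"),
    ("Times", " and IPA "),
    ("SILDoulosIPA 93", "N"),
]
converted = to_unicode(runs)
```

Passing unmapped characters through rather than raising an error is a deliberate choice for a first pass: the leftovers show up in the output and can be added to the table on the next run.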

You can see why I've been putting this off for so long. But again: a couple of years from now is probably too late. A couple of years ago, as a research assistant, I was asked to recover a file of Don Laycock's from Word for DOS 2 -- it was a published dictionary of a Papuan language, but we couldn't grep a dead tree. Nothing on campus would read Word 84 -- Microsoft had taken their converter offline months before, and was showing no inclination to put it back up. The only way I was able to get anything out of it was ... opening it in Word 5, minted in 1991. And in a couple of years, with Classic going extinct, even that will be impossible. Needless to say, the IPA font Don had used was unrecoverable and long gone; I ended up having to infer the engmas by elimination.

Yeah, proprietary, binary word processing formats really do bite. Thank God I went easy on the diagrams: the preservability of old MacDraw PICTs is even worse...
