2006-11-09

BPMN jottings

In re:
http://www.bpmn.org/

BPMN is something I'll be spending more time on in the day job, and I've just started looking at it. Some notes:


  • Looks a lot like UML activity diagrams. That's a good thing. More expansive icon set for their activities and notifications, which is conceptually extremely helpful, although potentially just syntactic sugar as far as the underlying formal model is concerned. Or maybe not: I'll find out from the spec.
  • These guys at Potsdam Uni are algebraically manipulating BPMN diagrams to reason about them, using a process algebra called the pi-calculus. The wiki is spotty, but at the least they're doing soundness checks. This is very cool, and a result of the same formality of such graphs that enables them to be turned into working code. Process diagrams ain't just napkin fodder.
  • The Potsdamites use OmniGraffle to generate their BPMN diagrams, and AppleScript it to generate XML; the XML is what they feed to their pi-calculus engine (in Ruby). Très nifty again. But it highlights a problem with the napkin-to-XML translation: no standard XML tagset. Actually, that is a malicious and repulsive lie; of course there is an XML, BPEL4WS --- but that involves translation, not just restyling; and the BPMI says in their spec of BPMN, sect. 2.3, that they intend to create a diagram exchange format between tools, which may be an XML or an XMI -- they just haven't yet. In the meantime, we're left with generic diagram exchange formats. For my two target apps, OmniGraffle and WebSphere, that means Visio XML. I'd hate to think Visio XML becomes the de facto standard; at any rate, my first attempt to go from OmniGraffle to WebSphere via Visio XML failed. I'll come back to this, and I may well be buying a vowel or two from the Potsdamites' AppleScript.
  • WebSphere costs a lot more than OmniGraffle, so you'd hope it does more. And it certainly does. Of course, the diagrams OmniGraffle produces are of a transcendent beauty immanent in their Macness. On the other hand, live syntax checking of your BPMN diagram in WebSphere as you draw it? You gotta love that. There'll be a lot of stuff to explore in that package over the next few weeks. But remember: if you don't have at least 1GB of RAM on your PC, don't even bother: it's 800 MB of Java Virtual Machine goodness. (It seems to be running just fine on Parallels on my MacBook, but I've juiced it up to 2GB.)

The perishability of Word

In re:
http://ptsefton.com/blog/2006/11/08/self_preservation_1

Peter Sefton's trying to recover his 1994 Word thesis into a sustainable document format, and migrating from 10-year-old Word formats and media is no fun at all. He's right: act now, while Mac Classic is still somewhat accessible. Been there, doing that again soon with my PhD (Word 5, 1998). I did styles like Pete did, so I was somewhat virtuous, but I did go somewhat ape, so I'll be making life difficult for myself anyway.

I have two major problems Peter didn't. One, I used EndNote 4. Proprietary bibliographical software which didn't migrate well: the author names in the EndNote library itself autovanished long ago, and there was a serious compatibility issue resulting in EndNote not talking to the migrated version of the document. I've decided to cut my losses, go with Bookends as biblio software (more proprietary software, but I'm not switching to TeX in a hurry), not bother about migrating, and convert the version of the thesis with the EndNote references spelt out. Problem here is, EndNote 4 used control characters to delimit references, which turn up as ugly splotchy fields when you migrate the Word file. Fields you cannot globally find and delete -- you cannot search inside the field for text, so you'd end up deleting all fields. And I don't want to do that, because I occasionally used fields in mathematical typesetting, to get diacritics positioned correctly. *snarl*

Second problem is the thesis predates Unicode -- or rather, Microsoft allowing Unicode into the Mac version. So lots of non-future-proof 8-bit fonts: Ismini for the Greek, SILDoulosIPA 93 for the IPA, TimesDiacrit for Latin-2 characters, and (because I went ape) the occasional instance of Arabic, Hebrew, Cyrillic, and Linear B. Lots of tedious global replaces. And some hurdles:

* Word 2004 will import the Word 5 files, but is UNUSABLE on a MacBook.
* Word 2004 will do Unicode alright, but it will not even display SILDoulosIPA 93: turns it to blank squares.
* NeoOffice is usable on a MacBook, but OpenOffice has forgotten so far to implement "replace in all open documents". We're talking 10 documents here. This means macros.
* NeoOffice LOSES the font information for 8-bit fonts. And yes, I used styles, but I didn't use character styles (the main reason being that char styles weren't supported in Word 5). Which means I'll be opening these files in Word 2000 (so I can still see the 8-bit fonts), globally replacing each font with a different colour, and working off global replaces based on the colours in NeoOffice. (I just did that with someone else, and the colours didn't always come through; maybe I'll try char styles after all instead.)
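
The per-font global replaces above amount to a lookup keyed on the font name, which is why losing the font information is fatal: the same byte means a different character in each 8-bit font. A minimal sketch in Python, assuming the document has already been split into (font name, text) runs by some other tool. The function name `remap_runs` and the table entries are made up for illustration; they are not the real encodings of Ismini or SILDoulosIPA 93.

```python
# Per-font lookup tables: legacy 8-bit character -> Unicode character.
# PLACEHOLDER entries for illustration only; they are NOT the real
# encodings of these fonts, which would have to come from the fonts'
# own documentation or from inspecting the glyphs.
LEGACY_TABLES = {
    "Ismini": {"a": "\u03b1", "b": "\u03b2"},  # hypothetical: a -> alpha, b -> beta
    "SILDoulosIPA 93": {"N": "\u014b"},        # hypothetical: N -> eng
}


def remap_runs(runs):
    """Convert a list of (font_name, text) runs to one Unicode string.

    Runs in a font with no table pass through unchanged; characters
    missing from a font's table are also left as they are.
    """
    out = []
    for font_name, text in runs:
        table = LEGACY_TABLES.get(font_name)
        if table is None:
            out.append(text)
        else:
            out.append("".join(table.get(ch, ch) for ch in text))
    return "".join(out)


if __name__ == "__main__":
    print(remap_runs([("Times", "ab "), ("Ismini", "ab")]))
```

Without the font name on each run there is no way to pick the right table, which is what the colour (or character style) workaround is trying to preserve.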

You can see why I've been putting this off for so long. But again: a couple of years from now is probably too late. A couple of years ago, as a research assistant, I was asked to recover a file of Don Laycock's from Word for DOS 2 -- it was a published dictionary of a Papuan language, but we couldn't grep a dead tree. Nothing on campus would read Word 84 -- Microsoft had taken their converter offline months before, and was showing no inclination to put it back up. The only way I was able to get anything out of it was ... opening it in Word 5, minted in 1991. And in a couple of years with Classic going extinct, even that will be impossible. Needless to say, the IPA font Don had used was unrecoverable and long gone; I ended up having to infer the engmas by elimination.

Yeah, proprietary, binary word processing formats really do bite. Thank God I went easy on the diagrams: the preservability of old MacDraw PICTs is even worse...

2006-11-08

The Complutensian Polyglot, ahead of the times

In re:
http://www.supakoo.com/rick/ricoblog/Permalink.aspx?guid=873cc194-46b8-4dca-a0a8-d6ab8b688a3b

As I added to the Wikipedia entry, the Complutensian Polyglot edition of the Bible in the 1520s marked the high point of the initial trend in Greek typography to come up with an unconnected Greek typeface. By the time of the Complutensian, 40 years on from the first attempts of the 1470s, the results were beautiful. Around that time, Aldus Manutius decided to go with the contemporary cursive as the model for both his Roman and Greek typefaces; and everyone followed suit for the next couple of centuries of Greek. Now (as I've seen in a typographer's blog someplace), this made commercial sense --- Aldus used the bookhand his scholarly audience was familiar with from their manuscripts; and the result for Roman script was the beauty of italics. The result for Greek was squiggle, and by the 19th century it was considered ugly. (That's because it *is* ugly.)

So as typographers tried to distance themselves from Aldus' typeface, there was a trend to try to go back to a lost ideal of Greek typography, nicely commented on in John Bowman's paper (in Greek) on British typography of Greek. The Complutensian begat Robert Proctor's Otter Greek font (see p. 158 of Bowman); Otter Greek begat Scholderer's Neohellenic (cf. GFS Neohellenic); Neohellenic begat Athenian font; and the Complutensian again begat the Greek Font Society's GFS Complutensian Greek, which I'm informed is planned for release by the Greek Font Society next year. [EDIT: since released as GFS Complutum] (See also the enlightening thread on the Typophile blog.)

The point is, the Complutensian has long been fetishised as a lost ideal of Greek typography, and I wanted to get me some. I've just received a 500 MB PDF of the Complutensian, and it contains a surprise I hadn't noticed. But first, a brief comment on what it looks like.

The Complutensian is pretty well described in a blog entry by Rick Brannan (ricoblog), and I suggest you pop out to it before continuing.

OK, you're back. :-) It's apparent from the gifs Brannan provides, but it truly hits you when you see the pages; the Old and the New Testament look totally different. The Old Testament looks impressive, and is quite a technical feat; but it does not look pretty. It's very busy, for one:




Septuagint Greek (with interlinear Latin) | Vulgate Latin | Hebrew
Targum Onkelos Aramaic paraphrase | Latin translation of Targum Onkelos


Some malicious bishop commented that the Vulgate text looks like Jesus with the two thieves crucified either side of him, and I can see why now. The Hebrew is Hebrew; it does look out of place next to the Latin, which is inevitable, although I'm not familiar enough with Hebrew script to tell if it's a good looking Dysmas. The Septuagint gets to be Gestas, the Bad Thief. The Septuagint column is not evil per se, and it's very utilitarian, but it's also quite messy: the Greek's in squiggle, the interlinear Latin's in a Bastarda that crowds out the spindly Greek it's meant to be a crutch for; ick. In the middle, the Latin's in a gorgeous, self-assured Antiqua. The Vulgate wins.

Zooming forwards to the New Testament is a shock to the eyes; it's sort of a Darien moment. Just two simple columns: no prima donna in the centre. The Latin's back in Bastarda, but it's a Bastarda that's been given room to breathe, instead of tripping over interlinear squiggle; and at full size, it's quite elegant. The shock of course is the Greek. It is simply gorgeous.

But the real shock is when you zoom in. (You can see it in the first gif on Ricoblog, but you have to click to enlarge and concentrate). The Complutensian typeface, the pinnacle of early Greek typography, the Eden from which Aldus' serpentine Greek expelled us and which has haunted several 20th century typographers, the bestest Greek font ever...

... is monotonic.

Seriously. No circumflexes or graves; no accents on monosyllables; no iota subscripts; no smooth breathings. There are rough breathings, but they're actually displaced to the left of the vowel, as they are normally on capitals; the Complutensian's pretty much treating them as letters not diacritics. (You can see it in the Ricoblog gif, ῾υπέρ, second line from the end.)

That's a shock alright. And it's a deliberate aesthetic choice: Jimenez' Spaniards certainly knew about accents, and their squiggle font in the Septuagint is drenched in them. The forerunners of their typeface -- da Spira and Jenson in 1470 -- used accents (see Zapf's paper on the history of Greek typefaces, p. 6). It's like the Complutensians said, we're designing the most beautiful Greek letters ever --- and we say we have no room on top of those letters for distracting squiggles. It's a deliciously bold decision.

2006-11-07

Nicholas, Defender of Accents

In re:
http://www.sarantakos.com/language/l-akrotites.html

Yesterday I was rereading my friend Nikos Sarantakos' pages on language, among them his condemnations of the defence of the polytonic as the latest episode in the quarrel between language defenders and... other language defenders. (Lest we forget Peter Mackridge's apt observation that literary Demotic was, in the end, no less artificial a language than Katharevousa.) And although I largely agree with 40κος (Sarantakos), I see that on the matter of the polytonic I'm moving more and more towards conservatism, as I also am on some other social issues (e.g. adultery --- the series "Και οι παντρεμένοι έχουν ψυχή" makes me furious every time I see it; and don't get me started on the recital of ethics and social responsibility that is "Θα βρεις το δάσκαλό σου"). To be clear: I agree that Demotic as we know it can't be written in polytonic -- or rather, it gets written with considerable arbitrariness, until you decide what to call long and what short in a language that has no clue about longs and shorts. But when I see Ancient Greek in monotonic (something 40κος does experimentally, but which has now become customary), it jars. And incidentally I feel the same about the mixed language of the early vernacular literary works. If it's arbitrary to turn <ούζο> into <οὖζο>, it's just as arbitrary to turn <ᾦ> into <ώ>.
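
The <ᾦ> to <ώ> direction, at least, is mechanical, which is part of why it gets done so casually. A minimal sketch in Python of that collapse, using Unicode decomposition: breathings and iota subscripts are dropped, and grave and circumflex become acute. The function name `to_monotonic` is made up for illustration, and this is only a character-level mapping, not the official monotonic orthography (it does not, for instance, drop accents on monosyllables). The reverse direction can't be done this way at all, since the information is gone.

```python
import unicodedata

# Combining marks that the mapping simply discards.
DROPPED = {
    "\u0313",  # psili (smooth breathing)
    "\u0314",  # dasia (rough breathing)
    "\u0345",  # ypogegrammeni (iota subscript)
}
# Combining marks that are collapsed into the single acute accent.
TO_ACUTE = {
    "\u0300",  # varia (grave)
    "\u0342",  # perispomeni (circumflex)
}
ACUTE = "\u0301"


def to_monotonic(text):
    """Collapse polytonic diacritics character by character."""
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for ch in decomposed:
        if ch in DROPPED:
            continue
        out.append(ACUTE if ch in TO_ACUTE else ch)
    # Recompose so that e.g. omega + acute becomes a single code point.
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    print(to_monotonic("\u1fa6"))                 # the <ᾦ> of the example
    print(to_monotonic("\u03bf\u1f56\u03b6\u03bf"))  # <οὖζο>
```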

And this, I think, stems from what is peculiar to my generation. I'm in the first generation that renounced the polytonic at school --- in '81 I was in fifth grade of primary school, and I renounced the "άγιος και αγνός" with pleasure. I shared the previous generation's dread of the reactionary and deadened Katharevousa --- I couldn't even read the Katharevousa of Nikolaos Politis without discomfort. But partly through rubbing up against the TLG (where years later I relearned the polytonic, and also had to enforce it when proofing texts), partly through exposure to the elegant Katharevousa of Hatzidakis (elegant in its syntax, that is, because his screeds don't have a shred of coherence), and largely through my absence from Greece, my old discomfort has been checked. I now find the accents charming, though not to the point of using them regularly in my own Modern Greek: which is why the masthead is polytonic and Bost-like ("ὁπουτζοῦ"), while the individual posts are in monotonic.

Our ability to see our linguistic past as charming has been damaged both by the imposition of the archaising idiom, and by the political turn the language problem took: the use of an archaising idiom by a Modern Greek is now politically loaded. (Still, I found Τέττιγας' contribution on the Ιστολόγιον most gracious: he rightly thunders against the usual nonsense about the poverty of Modern Greek, but he does it... in Ancient Greek. And I'll forgive him the monotonic.)

I think of the brilliant blog of pseudo-Chaucer, Geoffrey Chaucer Hath A Blog, and I wonder whether it is even possible nowadays to write something like it in Greek. Pseudo-Chaucer, his pseudo-sister-in-law and pseudo-de Mandeville send up Paris Hilton and the war with Iraq, or the religious mania of their own age, and they are terribly funny. Funny, because the language is a signifier in this case too (the way we say "the medium is the message", after McLuhan? the code is the message too, always in Jakobson's sense -- sender, receiver, channel, code). But what Chaucerian English signifies as a code to today's reader doesn't extend much further than "I was written in 1400"; so the humour of the unexpected and of anachronism comes effortlessly. If someone blogs about Tatiana Stefanidou and Panikos Psomiadis in the language and persona, I won't say of Plato or even of St Paul (heaven help us), but at least of Ptochoprodromos, will our minds go simply to anachronism? I think not. Either we'll think "reactionary" -- the politically loaded signified of diglossia; or we'll think Bost -- because Bost is the precedent for humorous use of the archaising idiom, but what is being mocked there is not the content of the message, but once again the code: the pompous pseudo-archaic Greek, the linguistic conceit. (Is Asterix in Ancient Greek a counterexample? I'm not asking rhetorically.)

If that doesn't hold, then the "language question" chapter really is closed by now. And to make the connection with my own accent-defending --- I can now be fond of a polytonic I've been rid of, because somehow, for me at least, that quarrel has by now been bled dry...

Terminology for digital libraries

In re:
http://conference.lis.upatras.gr/topics.php

I stumbled on the page above by chance while looking into matters of my new profession. I was floored by the entry:


* Οντολογίες (Ontologies)


... !! And the untranslated Tutorials struck me as ugly too. As for library "Μάνατζμεντ" (a transliterated "Management") instead of διαχείριση --- enough already! Of course, after a glance at the mixed English of Maxim in Greek (well, God help us if that's Greek, but anyway), I'm not entitled to make many demands. (Is it just me, or was the Greek of the old ΚΛΙΚ more spontaneous?)

In fact the terminology had its good points too: with these exceptions, it looks like a conscious effort to produce Greek terminology for the field. I particularly liked the rendering of Institutional Repository: "Ιδρυματικό Αποθετήριο". (Pity, of course, about the resemblance to "αποχωρητήριο" (lavatory)... :-) )

2006-11-06

Thoughts on permanent identifiers

In re:
http://ptsefton.com/blog/2006/11/01/repository-maintenance


Some random thoughts on permanent identifiers (my day job), triggered by Peter Sefton's post above.



  • The HTTP proxy address that resolves a Handles (or whatever else) permanent identifier for a resource binds the permanent identifier to a particular protocol (HTTP) and a particular host (hdl.handle.net, arrow.monash.edu.au, whatever). This has the advantage of actually working in the current web infrastructure, which a URI based on Handles (or whatever) does not. This in turn links up with Norman Walsh's contention that "if I want DNS I know where to find it" --- i.e. why come up with and fund a shadow to DNS in Handles (or whatever), when DNS is already working? That's a question I'm not getting into yet, but it is true enough that HTTP addresses are real, and hdl: URIs (or whatever) currently are not, outside a very small number of browsers.

  • However, there is nothing permanent about an HTTP link to begin with -- that's the whole point of having a persistent identifier that isn't a URL. After all, there may not always be an HTTP; and HTTP URLs as is have a half-life of what, six months? As Sefton points out, there may also not always be an arrow.monash.edu.au, so rewriting URLs containing arrow.monash to something else is a big risk. A plus of having a national infrastructure for identifiers would be that, while there may not always be an arrow.monash (or even, heavens forfend, a Monash), there will always be an Australian Government(*), and one can expect the Australian Government to always be able to resolve those identifiers.

    * NOTE: by "always", I mean of course "next few decades". I'll save the "I'm laminating my papers and burying them in Spitzbergen" tirade for another time.

  • So a couple of things I think should happen (right now, a week into the job, and with no idea of what I'm talking about) are



    1. While having the Handles-resolving URL at your HTTP proxy (http://hdl.handle.net/<HANDLE> or http://arrow.monash.edu.au/<HANDLE> ) is a good and valid and practical thing, it's not a persistent identifier itself; just a link to one. Argal, the digital object should include a Handle URI, distinct from the HTTP link, for future-proofing's sake. Similarly, people should be encouraged to cite the Handle URI, as well as or instead of the URL. After all, HTTP proxies can change (and will, and will be autogenerated from your repository). But the data itself should bear and contain its permanent identifier, which should travel with the digital object to wherever it ends up. To recover the <HANDLE> from the proxy URL requires that I know where the proxy ends and where the handle begins. Since a Handle can contain more than one slash, it ain't unambiguous: given http://example.com/hdl/77/99 , I cannot know whether the handle is hdl/77/99 (naming authority: hdl) or 77/99 (naming authority: 77). And knowing which Handle proxy servers were around at the time the URL was minted shouldn't be necessary for me to recover the identifier.

    2. We may have a national infrastructure for Handles (or whatever), but that need not mean national-level management of the Handles. It would be pointless to make a request to Canberra every time a repository in Australia needs to register a new object --- even if the request is instantaneous and light enough not to require human intervention. One of the unsung assets of the Handles system is that individual fields of the Handle record can be managed by different administrators. To me, that means a federated identifier infrastructure; Canberra can override and step in in case of emergency or disaster, but the day-to-day management of identifiers can stay with the repository managers who actually know what's going on in their repository.

    3. Accordingly, the national identifier infrastructure should make migration of permanent identifiers possible: if a naming authority is dissolved, the national-level identifier management should either pass on the naming authority to some other institution, or take over the naming authority itself. If there's no such guarantee, the identifiers are not permanent. (That is assuming there will always be an Australian government, for which see above.)



  • I agree with Peter that the browser should (for RFC 2119 values of "should") display a Handles-like rather than VITAL-like URL, since the VITAL URL is not even a shadow of a permanent identifier. A common URL format is also a "should". But without minimising the importance of getting the HTTP links migratable, I still think it's the Handles URI inclusion that is the "must".
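
To make the ambiguity in point 1 concrete, here is a minimal sketch in Python that lists every handle a proxy URL could be carrying, given only the URL. The function names `candidate_handles` and `handle_uri` are made up for illustration, and the sketch assumes nothing about how any real Handle proxy lays out its paths; it only splits on slashes, as the example in the text does.

```python
from urllib.parse import urlsplit


def candidate_handles(proxy_url):
    """List every (naming_authority, local_name) split the URL path allows.

    Without knowing where the proxy's own path ends, any path segment
    except the last could be the naming authority.
    """
    segments = [s for s in urlsplit(proxy_url).path.split("/") if s]
    candidates = []
    for i in range(len(segments) - 1):
        authority = segments[i]
        local_name = "/".join(segments[i + 1:])
        candidates.append((authority, local_name))
    return candidates


def handle_uri(authority, local_name):
    """Protocol- and host-independent form, to travel with the object."""
    return f"hdl:{authority}/{local_name}"


if __name__ == "__main__":
    for authority, local_name in candidate_handles("http://example.com/hdl/77/99"):
        print(handle_uri(authority, local_name))
```

For http://example.com/hdl/77/99 this yields two candidates, naming authority hdl and naming authority 77, and nothing in the URL says which is right. If the object itself carries the hdl: form, the question never comes up.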






The "Wherefore Identifiers" post that preceded the above on Pete's blog is more of a challenge; the Norman Walsh riposte and Pete's query on full-text local names made me forget who I was and what I was doing here. I'll come back to it when I have more time and less confusion...
