Thoughts on permanent identifiers
Labels:
Information Technology
In re:
http://ptsefton.com/blog/2006/11/01/repository-maintenance
Some random thoughts on permanent identifiers (my day job), triggered by Peter Sefton's post above.
- The HTTP proxy address used to resolve a Handle (or whatever other) permanent identifier for a resource binds that identifier to a particular protocol (HTTP) and a particular host (hdl.handle.net, arrow.monash.edu.au, whatever). This has the advantage of actually working in the current web infrastructure, which a URI based on Handles (or whatever) does not. This in turn links up with Norman Walsh's contention that "if I want DNS I know where to find it" --- i.e. why come up with and fund a shadow of DNS in Handles (or whatever), when DNS is already working? That's a question I'm not getting into yet, but it is true enough that HTTP addresses are real, and hdl: URIs (or whatever) currently are not, outside a very small number of browsers.
- However, there is nothing permanent about an HTTP link to begin with -- that's the whole point of having a persistent identifier that isn't a URL. After all, there may not always be an HTTP, and HTTP URLs as they stand have a half-life of, what, six months? As Sefton points out, there may also not always be an arrow.monash.edu.au, so rewriting URLs containing arrow.monash to something else is a big risk. A plus of having a national infrastructure for identifiers would be that, while there may not always be an arrow.monash (or even, heaven forfend, a Monash), there will always be an Australian Government(*), and one can expect the Australian Government to always be able to resolve those identifiers.
* NOTE: by "always", I mean of course "the next few decades". I'll save the "I'm laminating my papers and burying them in Spitzbergen" tirade for another time.
- So, a couple of things I think should happen (right now, a week into the job, and with no idea of what I'm talking about):
- While having the Handles-resolving URL at your HTTP proxy (http://hdl.handle.net/<HANDLE> or http://arrow.monash.edu.au/<HANDLE>) is a good and valid and practical thing, it's not a persistent identifier itself, just a link to one. Argal, the digital object should include a Handle URI, distinct from the HTTP link, for future-proofing's sake. Similarly, people should be encouraged to cite the Handle URI, as well as or instead of the URL. After all, HTTP proxies can change (and will, and will be autogenerated from your repository). But the data itself should bear and contain its permanent identifier, which should travel with the digital object to wherever it ends up. Recovering the <HANDLE> from the proxy URL requires that I know where the proxy ends and where the handle begins. Since a Handle can contain more than one slash, it ain't unambiguous: given http://example.com/hdl/77/99, I cannot know whether the handle is hdl/77/99 (naming authority: hdl) or 77/99 (naming authority: 77). And I shouldn't need to know which Handle proxy servers were around at the time the URL was minted in order to recover the identifier.
- We may have a national infrastructure for Handles (or whatever), but that need not mean national-level management of the Handles. It would be pointless to make a request to Canberra every time a repository in Australia needs to register a new object --- even if the request is instantaneous and light enough not to require human intervention. One of the unsung assets of the Handles system is that individual fields of the Handle record can be managed by different administrators. To me, that means a federated identifier infrastructure; Canberra can override and step in in case of emergency or disaster, but the day-to-day management of identifiers can stay with the repository managers who actually know what's going on in their repository.
- Accordingly, the national identifier infrastructure should make migration of permanent identifiers possible: if a naming authority is dissolved, the national-level identifier management should either pass on the naming authority to some other institution, or take over the naming authority itself. If there's no such guarantee, the identifiers are not permanent. (That is assuming there will always be an Australian government, for which see above.)
- I agree with Peter that the browser should (for RFC 2119 values of "should") display a Handles-like rather than VITAL-like URL, since the VITAL URL is not even a shadow of a permanent identifier. A common URL format is also a "should". But without minimising the importance of getting the HTTP links migratable, I still think it's the Handles URI inclusion that is the "must".
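To make the slash ambiguity in the point about recovering a <HANDLE> from a proxy URL concrete, here is a minimal Python sketch. It assumes only the Handle syntax described above (a naming authority, a slash, then a local name that may itself contain slashes); the function names are hypothetical and not part of any Handle System library.

```python
from urllib.parse import urlsplit


def candidate_handles(proxy_url: str) -> list[str]:
    """List every Handle that a proxy URL could be wrapping.

    A Handle is <naming authority>/<local name>, and the local name may
    itself contain slashes, so any tail of the URL path with at least one
    slash in it is a syntactically valid Handle.
    """
    segments = [s for s in urlsplit(proxy_url).path.split("/") if s]
    # Treat the first `start` segments as the proxy's own path prefix;
    # stop while at least two segments (authority + local name) remain.
    return ["/".join(segments[start:]) for start in range(len(segments) - 1)]


def handle_from_uri(uri: str) -> str:
    """Recover the Handle from a hdl: URI; no guessing about proxies needed."""
    scheme = "hdl:"
    if not uri.startswith(scheme):
        raise ValueError(f"not a hdl: URI: {uri!r}")
    return uri[len(scheme):]


def naming_authority(handle: str) -> str:
    """The naming authority is everything before the first slash."""
    return handle.split("/", 1)[0]


if __name__ == "__main__":
    for h in candidate_handles("http://example.com/hdl/77/99"):
        print(h, "-> naming authority:", naming_authority(h))
    print(handle_from_uri("hdl:77/99"))
```

Run against http://example.com/hdl/77/99, the proxy URL yields two equally plausible handles (hdl/77/99 and 77/99), while the hdl: URI yields exactly one. That asymmetry is the argument for embedding and citing the Handle URI itself rather than only the proxy link.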
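The federated arrangement in the points about national infrastructure and naming-authority migration can also be sketched, again hypothetically: the class names, the in-memory dictionaries and the take_over step below are illustrative assumptions, not how the Handle System or any Australian service actually works. Local repositories register handles without a round trip to the national layer; the national layer only records which repository administers each naming authority, and can hand one over to a successor when its institution is dissolved.

```python
class LocalRepository:
    """Hypothetical day-to-day manager of one naming authority."""

    def __init__(self, prefix: str, operator: str) -> None:
        self.prefix = prefix
        self.operator = operator
        self.locations: dict[str, str] = {}  # local name -> current URL

    def register(self, local_name: str, url: str) -> str:
        # Registration is purely local: no request goes to the national layer.
        self.locations[local_name] = url
        return f"{self.prefix}/{local_name}"


class NationalLayer:
    """Hypothetical national layer: knows who runs each naming authority."""

    def __init__(self) -> None:
        self.authorities: dict[str, LocalRepository] = {}

    def delegate(self, repo: LocalRepository) -> None:
        self.authorities[repo.prefix] = repo

    def resolve(self, handle: str) -> str:
        prefix, local_name = handle.split("/", 1)
        return self.authorities[prefix].locations[local_name]

    def take_over(self, prefix: str, successor: str) -> LocalRepository:
        # The institution behind `prefix` is gone: pass the naming authority
        # (and its records) to a successor. The handles themselves do not change.
        repo = self.authorities[prefix]
        repo.operator = successor
        return repo


if __name__ == "__main__":
    national = NationalLayer()
    repo = LocalRepository("77", "Example University")
    national.delegate(repo)
    handle = repo.register("99", "http://repo.example.edu/items/99")

    # Later the university is dissolved; a successor moves the content.
    heir = national.take_over("77", "Example State Library")
    heir.locations["99"] = "http://archive.example.org/items/99"
    print(handle, "->", national.resolve(handle))
```

In this sketch the cited identifier 77/99 survives both the dissolution and the move; only the location behind it changes, which is the guarantee the migration point asks the national layer to provide.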
The "Wherefore Identifiers" post that preceded the above on Pete's blog is more of a challenge; the Norman Walsh riposte and Pete's query on full-text local names made me forget who I was and what I was doing here. I'll come back to it when I have more time and less confusion...