
20 April 2011

How Can Your Content Live After You Die?

The current computer scene is notable for the role played by user-generated content (UGC): Facebook, Twitter, Flickr, YouTube etc. are all driven by people's urge to create and share.

Most of this is done by relatively young people; this means death is unlikely to be high on their list of preoccupations. Which also implies that they are probably not thinking about what will happen to all the content they create when they do die.

So we find ourselves in a situation where more and more content is being produced - not all of it great, by any means, but certainly characteristic of our time and important to the people that create it and their family, friends and users. Despite that rapid accumulation, no one is really trying to address the issue of what is going to happen to it all as users die.

This is quite separate from the more immediate problem of services shutting down, as is happening with Google Video. At least in these cases, you generally have the option to transfer your content to some other site. But what happens when you - the creator, the uploader, the one that is nominally responsible for that content - are no longer around to do that?

You might hope that your heirs, whoever they might be, would carry on with things. But that presupposes that you leave all your passwords with them - in your will, perhaps? There are probably also issues to do with changing over the ownership of accounts - again, something that has not needed tackling much yet.

But is it really realistic to expect your family and friends to carry on caring for your content? After all, they will probably have their own to worry about. And what happens when they die? Will they then pass on not only their own UGC, but yours too? Won't that create a huge digital ball and chain that grows as it is passed on to the unlucky recipient? Hardly a recipe for sustainability.

Doubtless at some point some sharp entrepreneur will interpret this coming need as an opportunity. Just as you can pay a company to keep your cryogenically-preserved body against the day when a cure will be found for whatever ailment you eventually die of, so there will be companies offering digital immortality for your content.

The key question - as for those cryogenic preservation companies - is: will they really be around in hundreds of years' time? Of course, that's not really a problem for those sharp entrepreneurs that have your money *now*; and there's also not much you will be able to do about it if they don't make good on their side of the bargain...

What we need are repositories where content can be stored safely with a very particular audience in mind: posterity. To a certain extent, the Internet Archive already does that, but as I know from my own blog posts, its coverage is very patchy. And that's to be expected: a single organisation cannot hope to archive the entire Internet, including its second-by-second changes.

Moreover, depending on one organisation is like putting all of the world's knowledge in the Library of Alexandria and nowhere else: after a good fire or two, you have lost everything. No, the solution is clearly to store the world's digital heritage in a distributed fashion.

We could start with national repositories, like the great deposit libraries that hold a copy of every book published in their land. The corresponding Net holdings might be national in scope too - after all, if every country did this, the world's output would be covered.

But clearly that's not a safe option either: ideally, you want multiple backups of national material to build in redundancy. You'd also want vertical markets to be stored by relevant organisations - every architectural site by some architectural body, every fishing site by some suitable organisation. You might have even more local stores of data in local libraries, or in local universities. Obviously the more the merrier (although it would be good to have some protocol so that they could all signal their existence and what they held to each other).
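
To make that concrete, here's a rough sketch in Python of what such a signalling protocol might look like - entirely hypothetical, with made-up names and endpoints, but enough to show how little machinery would be needed:

    # Entirely hypothetical sketch of a holdings-announcement protocol;
    # none of these repository names or endpoints exist.
    import json
    import urllib.request

    ANNOUNCEMENT = {
        "name": "Example National Web Archive",         # hypothetical repository
        "endpoint": "https://archive.example.org/api",  # hypothetical URL
        "scope": {"regions": ["GB"], "subjects": ["architecture", "fishing"]},
        "item_count": 1250000,
    }

    def announce(peer_url):
        """POST our holdings summary to a peer repository's /announce endpoint."""
        req = urllib.request.Request(
            peer_url + "/announce",
            data=json.dumps(ANNOUNCEMENT).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return resp.status

    # Each repository would keep a list of peers and forward announcements,
    # so gaps and duplication in coverage become visible across the network.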

Of course, none of this is going to happen, because the intellectual monopolists would be squawking their heads off about the inclusion of "their" content. This would have knock-on consequences for UGC, since, as we know, the boundary between fair use and copyright infringement is ill-defined without hugely-expensive court cases. No organisation is going to take the risk of getting it wrong given the insanely litigious nature of the content companies.

And so we must sit back and contemplate not only the inevitability of our own demise - however far off that might be - but also the inevitable destruction of all that really ace content we have created and will create. Because, you know, maintaining that 18th-century intellectual monopoly is just so much more important than preserving the unparalleled global explosion of human creativity we are currently witnessing online.

Follow me @glynmoody on Twitter or identi.ca.

10 July 2009

Do We Need Open Access Journals?

One of the key forerunners of the open access idea was arxiv.org, set up by Paul Ginsparg. Here's what I wrote a few years back about that event:

At the beginning of the 1990s, Ginsparg wanted a quick and dirty solution to the problem of putting high-energy physics preprints (early versions of papers) online. As it turns out, he set up what became the arXiv.org preprint repository on 16 August, 1991 – nine days before Linus made his fateful “I'm doing a (free) operating system (just a hobby, won't be big and professional like gnu) for 386(486) AT clones” posting. But Ginsparg's links with the free software world go back much further.

Ginsparg was already familiar with the GNU manifesto in 1985, and, through his brother, an MIT undergraduate, even knew of Stallman in the 1970s. Although arXiv.org only switched to GNU/Linux in 1997, it has been using Perl since 1994, and Apache since it came into existence. One of Apache's founders, Rob Hartill, worked for Ginsparg at the Los Alamos National Laboratory, where arXiv.org was first set up (as an FTP/email server at xxx.lanl.org). Other open source programs crucial to arXiv.org include TeX, GhostScript and MySQL.

arxiv.org was and is a huge success, and that paved the way for what became the open access movement. But here's an interesting paper - hosted on arxiv.org:

Contemporary scholarly discourse follows many alternative routes in addition to the three-century old tradition of publication in peer-reviewed journals. The field of High-Energy Physics (HEP) has explored alternative communication strategies for decades, initially via the mass mailing of paper copies of preliminary manuscripts, then via the inception of the first online repositories and digital libraries.

This field is uniquely placed to answer recurrent questions raised by the current trends in scholarly communication: is there an advantage for scientists to make their work available through repositories, often in preliminary form? Is there an advantage to publishing in Open Access journals? Do scientists still read journals or do they use digital repositories?

The analysis of citation data demonstrates that free and immediate online dissemination of preprints creates an immense citation advantage in HEP, whereas publication in Open Access journals presents no discernible advantage. In addition, the analysis of clickstreams in the leading digital library of the field shows that HEP scientists seldom read journals, preferring preprints instead.

Here are the article's conclusions:

Scholarly communication is at a cross road of new technologies and publishing models. The analysis of almost two decades of use of preprints and repositories in the HEP community provides unique evidence to inform the Open Access debate, through four main findings:

1. Submission of articles to an Open Access subject repository, arXiv, yields a citation advantage of a factor five.

2. The citation advantage of articles appearing in a repository is connected to their dissemination prior to publication: 20% of citations of HEP articles over a two-year period occur before publication.

3. There is no discernible citation advantage added by publishing articles in “gold” Open Access journals.

4. HEP scientists are between four and eight times more likely to download an article in its preprint form from arXiv rather than its final published version on a journal web site.

On the one hand, it would be ironic if the very field that acted as a midwife to open access journals should also be the one that begins to undermine them through a move to repository-based open publishing of preprints. On the other, it doesn't really matter; what's important is open access to the papers. Whether these are in preprint form, or appear as fully-fledged articles in peer-reviewed open access journals, is a detail, for the users at least; it's more of a challenge for publishers, of course... (Via @JuliuzBeezer.)

Follow me @glynmoody on Twitter or identi.ca.

10 April 2009

How Apt: Apt-urls Arrive

One of the unsung virtues of open source is the ease with which you can add, remove and upgrade programs. In part, this comes down to the fact that all software is freely available, so you don't need to worry about cost: if you want it, you can have it. This makes installation pretty much a one-click operation using managers like Synaptic.

Now things have become even easier:

As of this morning, apt-urls are enabled on the Ubuntu Wiki. What does this mean? In simple terms, this feature provides a simple, wiki-based interface for apt, the base of our software management system. It means that we can now insert clickable links on the wiki that can prompt users to install software from the Ubuntu repositories.

That's pretty cool, but even more amazing is the fact that when I click on the link in the example on the above page, it *already* works:

If you are a Firefox user on Ubuntu, you will also note that the link I’ve provided here works, too. This is because Firefox also allows apt-urls to work in regular web pages.

Free software is just *so* far ahead of the closed stuff: how could anyone seriously claim that it doesn't innovate?
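
For the curious, the mechanism behind those links is easy to sketch: the browser hands anything with an apt: scheme to a small helper, which pulls out the package name and drives the package manager. Here's a minimal Python illustration - a sketch only, not apturl's actual code, which uses a graphical front end and proper privilege handling:

    # Minimal sketch of an apt: URL handler - not apturl's actual code.
    import subprocess
    import sys

    def handle_apt_url(url):
        # Accept both the apt:gimp and apt://gimp forms.
        package = url.split(":", 1)[1].lstrip("/")
        answer = input("Install '%s' from the repositories? [y/N] " % package)
        if answer.lower() == "y":
            # The real handler prompts graphically and uses PolicyKit;
            # calling apt-get directly is just the simplest illustration.
            subprocess.run(["sudo", "apt-get", "install", "-y", package],
                           check=True)

    if __name__ == "__main__":
        handle_apt_url(sys.argv[1])   # e.g. handle_apt_url("apt:gimp")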

Follow me on Twitter @glynmoody

18 October 2007

A Library of Open Access Digital Libraries

If you're a fan of digital libraries - and, let's face it, who isn't? - you'll find this mega-list useful, especially because:

The sites listed here are mainly open access, which means that the digital formats are viewable and usable by the general public.

That's not to say it's anywhere near complete, not least because it has its own, self-confessed biases:

This list contains over 250 libraries and archives that focus mainly on localized, regional, and U.S. history, but it also includes larger collections, eText and eBook repositories, and a short list of directories to help you continue your research efforts.

(Via DigitalKoans.)

30 September 2007

OpenOffice.org Extends Itself

One of the great strengths of open source is its extensibility. This might be at the code level, or through self-standing extensions, as with Firefox. So it's really good news - if long overdue - that OpenOffice.org is doing something similar with a formal repository.

12 April 2006

Windows Live Academic Search Goes Live

Microsoft has rolled out the first beta of its academic search engine. It has some nice Web 2.0-y features that make it look far cooler than Google Scholar (Google, are you listening?). One of the FAQs made me smile:

What about open source repositories? Do you have content from them in your index?

Academic search has implemented the Open Archives Initiative (OAI) protocol for indexing OAI-compliant repositories. For example, we indexed the content present in ArXiv.org for the launch. We will continue to index more repositories after the launch.

I don't know of anybody except Microsoft that calls these OAI repositories "open source": you don't think that Microsoft's hung up on something?
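
For the record, OAI stands for the Open Archives Initiative, whose harvesting protocol is OAI-PMH. Pulling records out of a compliant repository takes remarkably little code; the following Python sketch lists record identifiers from arXiv's public OAI endpoint (first page only - a real harvester would also follow resumption tokens):

    # Quick sketch of OAI-PMH harvesting against arXiv's public endpoint.
    # Fetches only the first page of results; a real harvester would also
    # follow the resumptionToken to page through the full set.
    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    OAI = "{http://www.openarchives.org/OAI/2.0/}"

    def list_identifiers(base_url):
        query = urllib.parse.urlencode(
            {"verb": "ListIdentifiers", "metadataPrefix": "oai_dc"}
        )
        with urllib.request.urlopen(base_url + "?" + query) as resp:
            tree = ET.parse(resp)
        for header in tree.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")

    for identifier in list_identifiers("http://export.arxiv.org/oai2"):
        print(identifier)   # e.g. oai:arXiv.org:0704.0001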

Another FAQ talks about something new to me: the OpenURL. This turns out to be a wonderful piece of Orwellian double-speak, since it is a way of ensuring that people only get to see the content they are "entitled" to - that is, have paid for. In other words, OpenURL is all about closing off your options.
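
To see how that works in practice: an OpenURL is just citation metadata packed into a query string and pointed at an institutional "link resolver", which decides which copy - if any - you are entitled to read. Here's a Python sketch; resolver.example.edu is made up, while the key names come from the Z39.88 standard:

    # Sketch of building an OpenURL (ANSI/NISO Z39.88). The resolver
    # address is hypothetical; the key names are from the standard.
    from urllib.parse import urlencode

    BASE_RESOLVER = "https://resolver.example.edu/openurl"  # hypothetical

    citation = {
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.jtitle": "Physical Review Letters",
        "rft.atitle": "An example article title",
        "rft.volume": "96",
        "rft.date": "2006",
    }

    print(BASE_RESOLVER + "?" + urlencode(citation))
    # The resolver checks the institution's subscriptions and redirects
    # to whichever copy the reader has paid-for access to - precisely the
    # gatekeeping described above.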