Showing posts with label internet archive. Show all posts

25 July 2014

Copyright Strikes Again: No Online Access To UK Internet Archive

Last week we wrote about how Norway had come up with a way to provide online access to all books in Norwegian, including the most recent ones, available to anyone in the country. Here, by contrast, is how not to do it, courtesy of publishers in the UK: 

On Techdirt.

20 April 2011

How Can Your Content Live After You Die?

The current computer scene is notable for the role played by user-generated content (UGC): Facebook, Twitter, Flickr, YouTube etc. are all driven by people's urge to create and share.

Most of this is done by relatively young people; this means death is unlikely to be high on their list of preoccupations. Which also implies that they are probably not thinking about what will happen to all the content they create when they do die.

So we find ourselves in a situation where more and more content is being produced - not all of it great, by any means, but certainly characteristic of our time and important to the people that create it and their family, friends and users. Despite that rapid accumulation, no one is really trying to address the issue of what is going to happen to it all as users die.

This is quite separate from the more immediate problem of services shutting down, as is happening with Google Video. At least in these cases, you generally have the option to transfer it to some other site. But what happens when you - the creator, the uploader, the one that is nominally responsible for that content - are no longer around to do that?

You might hope that your heirs, whoever they might be, would carry on with things. But that presupposes that you leave all your passwords with them - in your will, perhaps? There are probably also issues to do with changing over the ownership of accounts - again, something that has not needed tackling much yet.

But is it really realistic to expect your family and friends to carry on caring for your content? After all, they will probably have their own to worry about. And what happens when they die? Will they then pass on not only their own UGC, but yours too? Won't that create a huge digital ball and chain that grows as it is passed on to the unlucky recipient? Hardly a recipe for sustainability.

Doubtless at some point some sharp entrepreneur will interpret this coming need as an opportunity. Just as you can pay a company to keep your cryogenically-preserved body against the day when a cure will be found for whatever ailment you eventually die of, so there will be companies offering digital immortality for your content.

The key question - as for those cryogenic preservation companies - is: will they really be around in hundreds of years' time? Of course, that's not really a problem for those sharp entrepreneurs that have your money *now*; and there's also not much you will be able to do about it if they don't make good on their side of the bargain...

What we need are repositories where content can be stored safely with a very particular audience in mind: posterity. To a certain extent, the Internet Archive already does that, but as I know from my own blog posts, its coverage is very patchy. And that's to be expected: a single organisation cannot hope to archive the entire Internet, including its second-by-second changes.

Moreover, depending on one organisation is like putting all of the world's knowledge in the Library of Alexandria and nowhere else: after a good fire or two, you have lost everything. No, the solution is clearly to store the world's digital heritage in a distributed fashion.

We could start with national repositories, like the great deposit libraries that have a copy of every book published in their land. Those national Net holdings might also be national - after all, if every country did this, the world's output would be covered.

But clearly that's not a safe option either: ideally, you want multiple backups of national material to build in redundancy. You'd also want vertical markets to be stored by relevant organisations - every architectural site by some architectural body, every fishing site by some suitable organisation. You might have even more local stores of data in local libraries, or in local universities. Obviously the more the merrier (although it would be good to have some protocol so that they could all signal their existence and what they held to each other.)

Of course, none of this is going to happen, because the intellectual monopolists would be squawking their heads off about the inclusion of "their" content. This would have knock-on consequences for UGC, since, as we know, the boundary between what is fair use and what is copyright infringement is ill-defined without hugely-expensive court cases. No organisation is going to take the risk of getting it wrong given the insanely litigious nature of the content companies.

And so we must sit back and contemplate not only the inevitability of our own demise - however far off that might be - but also the inevitable destruction of all that really ace content we have created and will create. Because, you know, maintaining that 18th-century intellectual monopoly is just so much more important than preserving the unparalleled global explosion of human creativity we are currently witnessing online.

Follow me @glynmoody on Twitter or identi.ca.

09 July 2008

Come to the World eBook Fair

Every year, some of the top ebook companies and organisations come together to offer extremely large numbers of ebooks, absolutely free (mostly as in beer, but often as in freedom) as part of the World eBook Fair. Here are the facts and figures:


Third Annual World eBook Fair: July 4th to August 4th

Just two years ago The First World eBook Fair came on the scene with about 1/3 million books, doubled to 2/3 million in 2008, and now over one million.

Created by contributions from 100+ eLibraries from around the world, here are the largest collections.

As of midnight Central Daylight Time July 4, 2008 these are the approximate numbers:

100,000+ from Project Gutenberg
500,000+ from The World Public Library
450,000+ from The Internet Archive
160,000+ from eBooks About Everything

17,000+ from IMSLP

1,227,000+ Grand Total

Pretty impressive.

And while we're on the subject of free, here is a good list of "100+ Sources for Free-As-In-Beer Books & Texts Online", which includes a lot of fairly obscure but highly worthy sites. Recommended.

13 December 2007

Building the Zotero Commons

One of the many insights that have come out of open source is what might be called the "pebble on the cairn" effect - the idea that by combining the small, even negligible, individual efforts we can create something large and durable.

Here's a perfect example that builds on the fact that scholars very often scan books in the public domain during the course of their research, but then don't do anything with those scans. What if they were all brought together, and then fed into an OCR system?

If many researchers have had to scan rare documents or books for their own perusal, there’s a potential treasure trove of material that exists among their combined efforts. Rather than let all that scholarship rot, or waste away in data files, the university’s Center for History and New Media sees an opportunity to create an open archive of scholarly resources in the public domain.

...

In partnership with the Internet Archive, and with funding from the Andrew W. Mellon Foundation, the center is creating a way for scholars to upload existing data files to be optically scanned (to make them text-searchable) and stored in a database available to the public.

Even better is the fact that open source software can be used to realise this idea:

The vehicle for the new environment will be the Zotero plug-in for the Firefox browser, also developed by the center. The software stores Web pages, collects citations and lets scholars annotate and organize online documents. A new feature of the plug-in will allow people to collaborate and share materials through a dedicated server. Building on that functionality, according to Cohen, the system will allow scholars to drag and drop documents onto an icon in Zotero that essentially sends it to the Internet Archive for storage and free optical character recognition.

The eventual result of the project, called Zotero Commons, could be a reduced need for research trips, Cohen suggested.

(Via Open Access News.)

23 November 2007

We Demand Books on Demand

One of the interesting results of the move to digital texts is a growing realisation that analogue books still have a role to play. Similarly, it's clear that analogue books serve different functions, and that feeds into their particular physical form. So some books may be created as works of art, produced to the very highest physical standards, while others may simply be convenient analogue instantiations of digital text.

Public domain books are likely to fall into the latter class, which means that ideally there should be an easy way to turn such e-texts into physical copies. Here's one:

This is an experiment to see what the demand for reprints of public domain books would be. This free service can take any book from the Internet Archive (that is in public domain) and reprint it using Lulu.com. Prices of the books are rounded up from Lulu.com cost prices to the nearest $0.99 to cover the bandwidth and processing power that we rent from Amazon using their EC2 service. There is also a short post on my blog about it.

How Does It Work

Anyone with an email address can place a request on this page using an Internet Archive link or ID. Your request will be forwarded to our conversion server, which will convert the appropriate book to printable form and send it off to Lulu.com. When the book has been uploaded, it will be made available for immediate ordering and shipping, and you will receive a link to it via email. Currently, only soft cover books are supported in 6"x9", 6.625"x10.25" or 8"x11" trim sizes.
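The pricing rule quoted above (cost price rounded up to the nearest $0.99) is simple enough to sketch. This is only my reading of the rule, not the service's actual code: the function name `round_up_to_99` is made up for illustration, and I assume prices are handled as whole cents to avoid floating-point surprises.

```python
def round_up_to_99(cost_cents: int) -> int:
    """Return the smallest price ending in 99 cents that is >= cost_cents.

    Hypothetical helper: assumes a non-negative price expressed in whole cents.
    """
    if cost_cents < 0:
        raise ValueError("cost must be non-negative")
    # The cents part of any price is at most 99, so keeping the dollar
    # part and setting the cents to 99 never goes below the cost.
    return (cost_cents // 100) * 100 + 99


def format_price(cents: int) -> str:
    """Format a price in cents as a dollar string."""
    return f"${cents // 100}.{cents % 100:02d}"
```

On this reading, a book that costs $7.50 to print would be listed at $7.99, and one that costs exactly $8.00 would go up to $8.99.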

Interesting to see Lulu.com here, confirming its important place as a mediator between the digital and analogue worlds. (Via Open Access News.)

01 November 2007

Long Live LibriVox

To my shame, I only discovered the wonderful LibriVox recently. Now it's passed a milestone in its short history:

Well, we did it. We just cataloged our 1,000th book, and for that a huge thank you must go out to everyone who has ever said or written the word LibriVox. Thank you first to the readers for lending their voices to something wonderful; to the Book Coordinators who pull things together; to the Meta Coordinators who get all this audio up on the net; to the Moderators who keep things running smoothly on our forum. And of course the other people: the proof listeners, the catalog development team, the web site designers and fixers, and all the forum volunteers of every stripe.

And more: to our listeners, and supporters, to Dan for keeping the servers running; to the Internet Archive for providing hosting for all our media, which makes it all possible; to Project Gutenberg (and other public domain projects) for liberating all this wonderful text onto the web.

And of course a big thank you to all our families and friends who live with our varying levels of LibriVox addiction.

Interesting to note that LibriVox feeds (in the nicest possible way) off Project Gutenberg, another great digital commons. Interesting, too, to see that they call LibriVox an "addiction"; that's what makes these projects so great: sheer, unadulterated dependency.... (Via Michael Geist.)

07 March 2007

Remembrance of Sims Past

There's a fascinating post over on 3pointD.com, which exhumes some screenshots of sims as they were three years ago, and contrasts them with their present form. It's impressive to see how far Second Life has come in that time - and exciting to consider how far it might go in the next three years.

But seeing these old sims made me wonder whether we are in danger of losing our virtual past, since these screenshots are the exception, rather than the rule. When the history of virtual worlds comes to be written, vital data about how things looked in those days - nowadays, too - will have gone for ever.

Clearly, what we need is a kind of Internet Archive for virtual worlds that preserves not just the screenshots, but maybe the actual data files for "historic" and representative sims - a Virtual World Archive. Brewster Kahle, are you listening?

23 January 2007

Have Pity on the Orphans

Oh dear, Larry's still having no luck rolling back US copyright law:

In a move that's a blow to the U.S. movement to reform copyright law, the U.S. 9th Circuit Court of Appeals ruled against the Internet Archive's Brewster Kahle, in his lawsuit to allow orphaned works into the public domain.

Rejecting the argument of Larry Lessig, the court decided the case was too close to Lessig's Eldred copyright suit of 2002, and that's settled business.

09 November 2006

Tapping into the Digital Tipping Point

For some reason, the idea of open source film is one that exerts a strong fascination on people. I've written about it before, and here's another one:

The Digital Tipping Point film project is an open source film project about the big changes that open source software will bring to our world. Like the printing press before it, open source software will empower average people to create an immense wave of new literature, art, and science.

...


The first DTP film will follow my individual personal growth from being an attorney who feared computer technology to being a community activist who picks up technology tips while shooting this movie, and brings that technology back to a local public school.

So far, so dull, you might think. But more interestingly:

We will make as many films as the open source film community would like to make. The DTP project will actually be many, many films made about free open source software. We are giving away our footage under a Creative Commons license on the Internet Archive's Digital Tipping Point Video Collection.

There's another aspect to this. The 300 or so hours of interviews that have been conducted for this film will form an invaluable record of some of the key people in the open source world, a resource that future historians will be able to tap.

Which reminds me: I really must put online the hundreds of hours of interviews that I did for Rebel Code six years ago: it would make an interesting foil to the present material.

29 September 2006

European Digital Library, European Archive

Some time back I wrote about the European Digital Library. But it seems that this isn't enough: now we have the European Archive, too, which seems even more ambitious. For as well as providing access to digital versions of traditional content, it seems to be aiming to become a European mirror of the wonderful Internet Archive:

The European Archive is a non-profit foundation working towards universal access to all knowledge. The archive will achieve this through partnerships with libraries, museums, other collection bodies, and through building its own collections. The primary goal of collecting this knowledge is to make it as publicly accessible as possible, via the Internet and other means.

...

As the web has grown in importance as a publishing medium, we are behind in bringing into operation the archiving and library services that will provide enduring access to many important resources. Where some assumed web site owners would archive their own materials, this has not generally been the case. If properly archived, the Web history can provide a tremendous base for time-based analysis of the content, the topology including emerging communities and topics, trends analysis etc. as well as an invaluable source of information for the future.

The foremost effort to archive the Web has been carried on in the US by the Internet Archive, a non-profit foundation based in San Francisco. Every two months since 1996, large snapshots of the surface of the web have been archived by the Internet Archive.

This entire collection offers 500 terabytes of data of major significance in all domains that have been impacted by the development of the Internet, that is, almost all. This represents a large amount of data (petabytes in the coming years) to crawl, organize and give access to.

By partnering with the Internet Archive, the European Archive is laying down the foundation of a global Web archive based in Europe.

Obviously, all this begs scads of questions to do with access and copyright, but at least it's a start.

09 September 2006

Sorry, Larry...

...but I can't agree on this one. You write:

Check out webcitation.org -- a project run at the University of Toronto. The basic idea is to create a permanent URL for citations, so that when the Supreme Court, e.g., cites a webpage, there's a reliable way to get back to the webpage it cited. They do this by creating a reference URL, which then will refer back to an archive of the page created when the reference was created. E.g., I entered the URL for my blog ("http://lessig.org/blog"). It then created an archive URL "http://www.webcitation.org/5IlFymF33". Click on it and it should take you to an archive page for my blog.

This is the TinyURL problem all over again. It destroys one of the greatest features of the Web: its transparency. You can generally see where you are going and some of the structure of what you will find there. TinyURLs and Larry's recommendation do away with this.

Another point is that it's actually harder to enter gobbledygook like "http://www.webcitation.org/5IlFymF33" than even long, but comprehensible URLs, so this system doesn't even achieve the goal of making addresses easier to enter.

Agreed, we need an archive of the Web: but we already have one in the wonderful Internet Archive. What we really need to do is to support it better, with more dosh and more infrastructure.
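For what it's worth, the reference-URL scheme Larry describes can be sketched in a few lines. This is a toy, in-memory illustration under my own assumptions, not how webcitation.org actually works: the class `CitationArchive`, the `archive.example` base URL and the counter-based short IDs are all invented for the example.

```python
import string

# 62-character alphabet used to build short, opaque IDs (an assumption).
ALPHABET = string.digits + string.ascii_letters


def encode_id(n: int) -> str:
    """Encode a non-negative integer in base 62."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


class CitationArchive:
    """Hypothetical archive: stores a snapshot when a citation is made."""

    def __init__(self, base_url: str = "http://archive.example/"):
        self._base_url = base_url
        self._snapshots = {}  # short ID -> (original URL, page content)

    def cite(self, url: str, content: str) -> str:
        """Snapshot the page as it is now and return a reference URL."""
        short_id = encode_id(len(self._snapshots) + 1)
        self._snapshots[short_id] = (url, content)
        return self._base_url + short_id

    def resolve(self, reference_url: str) -> tuple:
        """Return the (original URL, content) stored for a reference URL."""
        short_id = reference_url[len(self._base_url):]
        return self._snapshots[short_id]
```

Note that the original address survives only inside the archive's lookup table: nothing in the reference URL itself tells you where you are going, which is exactly the loss of transparency I'm complaining about.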

03 May 2006

The Nitty-Gritty of Net Neutrality

Net neutrality - the idea that the underlying technologies of the Internet should never care or even know about the details of who you are or what you are doing with the data packets they are conveying - is much in the news lately, what with outrageous demands from telecommunications companies to be allowed to charge different rates for different traffic. If you ever had any doubts that we need Net neutrality, here's someone who might convince you, since he knows a thing or two about this area.

24 April 2006

Murdering Memory

This press release from the US National Archives raises a key issue for the digital age: the need for archives to act in a completely transparent fashion. If, as has been happening, archives can be silently "disappeared" by security forces, history - built on sand at the best of times - becomes even more unstable.

The words of the grandly-named Archivist of the United States should be framed on the walls of everyone working in the world of digital memories:

There can never be a classified aspect to our mission. Classified agreements are the antithesis of our reason for being.

Imagine, for example, if the great and wonderful Internet Archive were forced to delete materials, without even leaving a notice to that effect. Perhaps they already have.