Showing posts with label google book search. Show all posts

17 September 2009

Analogue or Digital? - Both, Please

Recently, I bought the complete works of Brahms. Of course, I was faced with the by-now common problem of whether to buy nostalgic CDs, or evanescent MP3s. The price was about the same, so there was no guidance there. Of course, ecologically, I should have gone for the downloads, but in the end I chose the CDs - partly for the liner stuff you never get with an MP3, and partly because I have the option of degrading the CD bits to lossy MP3, which doesn't work so well the other way.

So imagine my surprise - and delight - when I discovered after paying for said CDs that the company - Deutsche Grammophon - had also given me access to most of the CDs as streams from its Web site, for no extra cost (I imagine the same would have been true of the MP3s). This was a shrewd move because (a) it made me feel good about the company, even though it cost them very little, and (b) I'm now telling people about this fact, which is great publicity for them.

But maybe my delight is actually a symptom of something deeper: that having access to both analogue and digital instantiations of information is getting the best of both worlds.

This struck me when I read the following story:

Google will make some 2 million out-of-copyright books that it has digitally scanned available for on-demand printing in a deal announced Thursday. The deal with On Demand Books, a private New York-based company, lets consumers print a book in about 10 minutes, and any title will cost around $8.

The books are part of a 10 million title corpus of texts that Google has scanned from libraries in the U.S. and Europe. The books were published before 1923, and therefore do not fall under the copyright dispute that pits Google against interests in technology, publishing and the public sector that oppose the company's plans to allow access to the full corpus.

That, in itself, is intriguing: Google getting into analogue goods? But the real importance of this move is hinted at in the following:

On Demand already has 1.6 million titles available for print, but the Google books are likely to be more popular, as they can be searched for and examined through Google's popular engine.

That's true, but not really the key point, which is that as well as being able to search *for* books, you can search *through* them. That is, Google is giving you an online search capability for the physical books you buy from them.

This is a huge breakthrough. At the moment, you have to choose between the pleasure of reading an analogue artefact, and the convenience of its digital equivalent. With this new scheme, Google will let you find a particular phrase - or even word - in the book you have in your hands, because the latter is a physical embodiment of the one you use on the screen to search through its text.
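
To make the idea concrete, here is a minimal sketch of what "searching through" a book amounts to: look for a phrase in the scanned text, and get back the page numbers to turn to in the printed copy. The `pages` dictionary and the `find_phrase` function are purely illustrative assumptions of mine - this is not how Google stores or searches its scans.

```python
# Hypothetical sketch: map a phrase to page numbers in a scanned book.
# `pages` stands in for per-page OCR text; it is an assumed format,
# not the one Google Book Search actually uses.

def find_phrase(pages: dict[int, str], phrase: str) -> list[int]:
    """Return the page numbers whose text contains the phrase (case-insensitive)."""
    # Normalise case and whitespace in the phrase being searched for.
    needle = " ".join(phrase.lower().split())
    hits = []
    for number, text in sorted(pages.items()):
        # Collapse whitespace so line breaks in the scan do not hide a match.
        haystack = " ".join(text.lower().split())
        if needle in haystack:
            hits.append(number)
    return hits


pages = {
    1: "It was the best of times,\nit was the worst of times",
    2: "it was the age of wisdom, it was the age of foolishness",
}
print(find_phrase(pages, "worst of times"))
```

The page numbers that come back are the link between the two worlds: the search happens on the screen, the reading happens in the book in your hands.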

The trouble is, of course, that this amazing facility is only available for those books out of copyright that Google has scanned. Which gives us yet another reason for repealing the extraordinarily stupid copyright laws that stop this kind of powerful service being offered for *all* text.

Follow me @glynmoody on Twitter or identi.ca.

06 April 2009

Google's Perpetual Monopoly on Orphan Works

Here's an interesting analysis of the Google Book Search settlement. This, you will recall, resolved the suit that authors and publishers brought against Google for scanning books without permission - something it maintained it could do without, since it only wanted to index their contents, not display them in their entirety.

At first this looked like an expensive and unnecessary way out for Google: many hoped that it would fight in the courts to determine what was permitted under fair use. But as people have had time to digest its implications, the settlement is beginning to look like a very clever move:


Thanks to the magic of the class action mechanism, the settlement will confer on Google a kind of legal immunity that cannot be obtained at any price through a purely private negotiation. It confers on Google immunity not only against suits brought by the actual members of the organizations that sued Google, but also against suits brought by anyone who doesn’t explicitly opt out. That means that Google will be free to mine the vast body of orphan works without fear of liability.

Any competitor that wants to get the same legal immunity Google is getting will have to take the same steps Google did: start scanning books without the publishers’ and authors’ permission, get sued by authors and publishers as a class, and then negotiate a settlement. The problem is that they’ll have no guarantee that the authors and publishers will play along. The authors and publishers may like the cozy cartel they’ve created, and so they may have no particular interest in organizing themselves into a class for the benefit of the new entrant. Moreover, because Google has established the precedent that “search rights” are something that need to be paid for, it’s going to be that much harder for competitors to make the (correct, in my view) argument that indexing books is fair use.

It seems to me that, in effect, Google has secured for itself a perpetual monopoly over the commercial exploitation of orphan works. Google’s a relatively good company, so I’d rather they have this monopoly than the other likely candidates. But I certainly think it’s a reason to be concerned.

Cunning.

Follow me on Twitter @glynmoody

13 November 2007

Google Book Search: Boons and Banes

You're probably not big into antedating - the academic game of finding earlier citations of words and phrases. But here's an interesting tale, because it shows the pros and cons of Google Book Search:

this discovery is typical of how Google Book Search now provides limited assistance to participants in what Erin McKean recently called "the competitive sport of antedating." Bonnie Taylor-Blake happened upon the relevant volume of Car Life but had no way of determining the precise context or even the correct issue and page number because of the limitations of Google's "snippet view." Fortunately, the metadata for this record includes accurate volume information ("v.9 1962-1963"), which allowed Bonnie to zero in on the correct page in a library copy of Car Life.

In other words, Google Book Search tantalises the antedater by showing earlier uses, but makes it awkward if you want to pin down the details. This is yet another reason why we need full-text open access to all books: otherwise, imagine the antedaters' anguish.

02 August 2007

Google's Choice of Hercules

Further to yesterday's post about a call to respect free use of copyrighted material, here's an interesting point about Google's participation:

it certainly seems ironic that Google is being associated with this complaint, at the same time as they are putting highly misleading notices on scanned public domain works:

The Google notice, found as page 1 on downloadable PDFs of public domain works available via Google Book Search, "asks" users to:

Make non-commercial use of the files. We designed Google Book Search for use by individuals, and we request that you use these files for personal, non-commercial purposes...

Maintain attribution The Google “watermark” you see on each file is essential for informing people about this project and helping them find additional materials through Google Book Search. Please do not remove it.

There is clear U.S. precedent that scanning a public domain work does not create a new copyright so there seems to be absolutely zero legal basis for restricting use or forcing users to preserve inserted per-page watermarks-cum-advertisements.

So, which side are you on, Google? (Via Michael Hart.)

05 July 2007

Google Books Open Up - A Bit

One of the problems with the otherwise laudable Google Book Project is that it's not actually providing access to the texts, just adding searchability. That's useful, but not really what we need. And since many of the books that it is scanning are in the public domain, there seems no reason not to offer full access.

Google seems to have realised this, finally:

I work on a project at Google called Google Accessible Search, which helps promote results that are more accessible to visually impaired users. Building on that work is today's release of accessible public domain works through Google Book Search. It's opening up hundreds of thousands of books to people who use adaptive technologies such as speech output, screen readers, and Braille displays.

As this notes, one of the advantages of opening up in this way is that the text may be re-purposed for adaptive technologies. Put another way, texts that remain closed, locked up behind DRM or similar, are largely denied to people who rely on those technologies - another reason why closing up knowledge in this way is ethically wrong.

02 March 2007

Googling World of Warcraft

WoW - or rather World of Warcraft - real search for a virtual world:

The Armory is a vast searchable database of information for World of Warcraft - taken straight from the real servers, updated in real time, and presented in a user-friendly interface. Since the Armory pulls its data from the actual game servers, it is the most comprehensive and up-to-date database on the characters, arena teams, and guilds of World of Warcraft in existence.

(Via Clickable Culture.)

05 February 2007

Lifelogging

I've touched on the subject of lifelogging - recording every moment of your waking day - before, but this feature is by far the best exploration of the subject I've come across.

What's fascinating is that it draws together so many apparently disparate threads: openness, privacy, security, search technologies, storage, memories, blogging, online videos, virtual worlds, etc. etc. (Via 3pointD.com.)

30 November 2006

The Digital Library of India

There's plenty of noise in the press (and blogs) about the Google Book project, or the Million Book Project. These are all interesting and laudable (well, those bits of it in the public domain, at least), but what about elsewhere?

Here's an interesting piece about the Digital Library of India (DLI) initiative. Here, for example, is an issue I bet you've never considered before - I know I haven't:

Designing an accurate OCR in the Indian languages is one of the greatest challenges in computer science. Unlike European languages, Indian languages have more than 300 characters to distinguish, a task that is an order of magnitude greater than distinguishing 26 characters. This also means that the training set needed is significantly larger for Indian languages. It is estimated that at least a ten million-word corpus would be needed in any font to recognize Indian languages with an acceptable level of accuracy. DLI is expected to provide such a phenomenally large amount of data for training and testing of OCRs in Indian Languages. Many of the contents, besides scanned images, have been manually entered for this purpose. Using this extremely large repertoire of data, a Kannada OCR has been developed.

(Via Open Access News.)

25 September 2006

Searching for an Edge

Search lies at the heart of modern desktop computing (just ask Google). So if free software wants to make a breakthrough on the desktop, coming up with a better search tool might just be the way to do it. Perhaps this could help.

31 August 2006

Books Be-Googled

I've not really been paying much attention to the Google Book Search saga. Essentially, I'm totally in favour of what they're up to, and regard publishers' whines about copyright infringement as pathetic and wrong-headed. I'm delighted that Digital Code of Life has been scanned and can be searched.

It seems obvious to me that scanning books will lead to increased sales, since one of the principal obstacles to buying a book is being uncertain whether it's really what you want. Being able to search for a few key phrases is a great way to try before you buy.

Initially, I wasn't particularly excited by the news that Google Book Search now allows public domain books to be downloaded as images (not as text files - you need Project Gutenberg for that). But having played around with it, I have to say that I'm more impressed: being able to see the scan of venerable and often obscure books is a delightful experience.

It is clearly an important step in the direction of making all knowledge available online. Let's hope a few publishers will begin to see the project in the same light, and collaborate with the thing rather than fight it reflexively.

04 March 2006

The European Digital Library: Dream, but Don't Touch

With all the brouhaha over the Google Book Search Library Project, it is easy to overlook other efforts directed along similar lines. I'm certainly guilty of this sin of omission when it comes to The European Library, about which I knew nothing until very recently.

The European Library is currently most useful for carrying out integrated searches across many European national libraries (I was disappointed to discover that neither Serbia nor Latvia has any of my books in their central libraries). Its holdings seem to be mainly bibliographic, rather than links to the actual text of books (though there are some exceptions).

However, a recent press release from the European Commission seems to indicate that The European Library could well be transmogrified into something altogether grander: The European Digital Library. According to the release:

At least six million books, documents and other cultural works will be made available to anyone with a Web connection through the European Digital Library over the next five years. In order to boost European digitisation efforts, the Commission will co-fund the creation of a Europe-wide network of digitisation centres.

Great, but it adds:

The Commission will also address, in a series of policy documents, the issue of the appropriate framework for intellectual property rights protection in the context of digital libraries.

Even more ominously, the press release concludes:

A High Level Group on the European Digital Library will meet for the first time on 27 March 2006 and will be chaired by Commissioner Reding. It will bring together major stakeholders from industry and cultural institutions. The group will address issues such as public-private collaboration for digitisation and copyrights.

"Stakeholders from industry and cultural institutions": but, as usual, nobody representing the poor mugs who (a) will actually use this stuff and (b) foot the bill. So will our great European Digital Library be open access? I don't think so.

24 February 2006

Google's Creeping Cultural Imperialism

Another day, another Google launch.

As the official Google blog announced, the company is launching a pilot programme to digitise national archive content "and offer it to everyone in the world for free."

And what national archives might these be? Well, not just any old common-or-garden national archives, but "the National Archives", which as Google's blog says:

was founded with the express purpose of ... serving America by documenting our government and our nation.

Right, so these documents are fundamentally "serving America". A quick look at what's on offer reveals the United Motion Newsreel Pictures, a series which, according to the accompanying text, "was produced by the Office of War Information and financed by the U. S. government", and was "[d]esigned as a counter-propaganda medium."

So there we have it: this is (literally) vintage propaganda. And nothing wrong with that: everybody did it, and it's useful to be able to view how they did it. But as with the Google Print/Books project, there is a slight problem here.

When Google first started, it did not set out to become a search engine for US Web content: it wanted it all - and went a long way to achieving that, which is part of its power. But when it comes to books, and even more where films are concerned, there is just too much to hope to encompass; of necessity, you have to choose where to start, and where to concentrate your efforts.

Google, quite sensibly, has started with those nearest home, the US National Archives. But I doubt somehow that it will be rushing to add to other nations' archives. Of course, those nations could digitise and index their own archives - but it wouldn't be part of the Google collection, which would always have primacy, even if the indexed content were submitted to them.

It's a bit like Microsoft's applications: however much governments tell the company to erect Chinese walls between the programmers working on Windows and those working on applications, there is bound to be some leakiness. As a result, Windows programs from Microsoft have always had an advantage over those from other companies. The same will happen with Google's content: anything it produces will inevitably be more tightly integrated into its search engine.

And so, wittingly or not, Google becomes an instrument of cultural imperialism, just like that nice Mr Chirac warned. The problem is that there is nothing so terribly wrong with what Google is doing, or even the way that it is doing it; but it is important to recognise that these little projects that it sporadically announces are not neutral contributions to the sum of the world's open knowledge, but come with very particular biases and knock-on effects.

13 December 2005

Publish and Be Damned!

The wilful misunderstanding of Google Books by traditional publishers is truly sad to see. They continue to propagate the idea that Google is somehow going to make the entire text of their titles available, whereas in fact it simply wants to index that text, and make snippets available in its search results.

As an author I welcome this; nothing makes me happier than to see that a search for the phrase "digital code" at Google Books brings up my own title as the top hit. The fact that anyone can dip into the book can only increase sales (assuming the book is worth reading, at least). Yes, it might be possible for a gang of conspirators to obtain scans of the entire book if they had enough members and enough time to waste doing so. But somehow, I think it would be easier to buy the book.

Of course, what is really going on here is a battle for control - as is always the case with open technologies. The old-style publishers are fighting a losing battle against new technologies (and open content) by being as obstructive as possible. Instead, they should be spending their energies working out new business models that let them harness the Internet and search engines to make their books richer and more available to readers.

They are bound to lose: the Internet will continue to add information until it is "good enough" for any given use. This may take time, and the mechanisms for doing so still need some work (just look at Wikipedia), but the amount of useful information is only going in one direction. Traditional publishers will cling on to the few titles that offer something beyond this, but the general public will have learned to turn increasingly to online information that is freely available. More importantly, they will come to expect that free information will be there as a matter of course, and will unlearn the habit of buying expensive stuff printed on dead trees.

It is this dynamic that is driving all of the "opens" - open source, open access, open genomics. The availability of free stuff that slowly but inexorably gets better means that the paid stuff will always be superseded at some point. It happened with the human genome data, when the material made available by the public consortium matched that of Celera's subscription service, which ultimately became irrelevant. It is happening with open source, as GNU/Linux is being swapped in at every level, replacing expensive Unix and Microsoft Windows systems. And it will happen with open content.