04 April 2006

Exploring the Digital Universe

The Digital Universe - a kind of "When Larry (Sanger) left Jimmy (Wales)" story - remains a somewhat nebulous entity. In some ways, it's forward to the past, representing a return to the original Nupedia that Larry Sanger worked on before Wikipedia was founded. In other respects, it's trying a new kind of business model that looks brave, to put it mildly.

Against this background, any insight into the what and how of the Digital Universe is welcome, and this article on the "eLearning Scotland" site (CamelCase, anyone?) provides both (via Open Access News). Worth taking a look.

Coughing Genomic Ink

One of the favourite games of scholars working on ancient texts that have come down to us from multiple sources is to create a family tree of manuscripts. The trick is to look for groups of textual divergences - a word added here, a mis-spelling there - to spot the gradual accretions, deletions and errors wrought by incompetent, distracted or bored copyists. Once the tree has been established, it is possible to guess what the original, founding text might have looked like.

You might think that this sort of thing is on the way out; on the contrary, though, it's an extremely important technique in bioinformatics - hardly a dusty old discipline. The idea is to treat genomes deriving from a common ancestor as a kind of manuscript, written using just the four letters - A, C, G and T - found in DNA.

Then, by comparing the commonalities and divergences, it is possible to work out which manuscripts/genomes came from a common intermediary, and hence to build a family tree. As with manuscripts, it is then possible to hazard a guess at what the original text - the ancestral genome - might have looked like.
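To make the analogy concrete, here is a toy sketch in Python - emphatically not the methods Haussler's group actually uses, and with invented "manuscripts" - that counts the divergences between each pair of texts and then repeatedly joins the closest groups, which is the family-tree logic in miniature:

from itertools import combinations

# Invented "manuscripts": four short texts written in the DNA alphabet.
manuscripts = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",   # one late copying "error"
    "C": "ACGAACGTAC",   # a different single change
    "D": "ACGAACTTAC",   # shares C's change, plus one of its own
}

def divergence(s, t):
    """Number of positions at which two equal-length texts disagree."""
    return sum(1 for a, b in zip(s, t) if a != b)

dist = {frozenset(p): divergence(*map(manuscripts.get, p))
        for p in combinations(manuscripts, 2)}

# Greedy clustering: repeatedly join the two closest groups of texts.
clusters = {name: (name,) for name in manuscripts}
while len(clusters) > 1:
    (x, y), d = min(
        ((pair, min(dist[frozenset((a, b))]
                    for a in clusters[pair[0]] for b in clusters[pair[1]]))
         for pair in combinations(clusters, 2)),
        key=lambda item: item[1])
    print(f"join {clusters[x]} and {clusters[y]} (divergence {d})")
    clusters[f"({x},{y})"] = clusters.pop(x) + clusters.pop(y)

Run it and A pairs with B, C with D, and the two pairs are finally united - exactly the kind of tree a scholar would sketch from the same divergences, and a guide to what the common ancestor probably said.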

That, broadly, is the idea behind some research that David Haussler at the University of California at Santa Cruz is undertaking, and which is reported on in this month's Wired magazine (freely available thanks to the magazine's enlightened approach to publishing).

As I described in Digital Code of Life, Haussler played an important role in the closing years of the Human Genome Project:

Haussler set to work creating a program to sort through and assemble the 400,000 sequences grouped into 30,000 BACs [large-scale fragments of DNA] that had been produced by the laboratories of the Human Genome Project. But in May 2000, when one of his graduate students, Jim Kent, inquired how the programming was going, Haussler had to admit it was not going well. Kent had been a professional programmer before turning to research. His experience in writing code against deadlines, coupled with a strongly-held belief that the human genome should be freely available, led him to volunteer to create the assembly program in short order.

Kent later explained why he took on the task:

There was not a heck of a lot that the Human Genome Project could say about the genome that was more informative than 'it's got a lot of As, Cs, Gs and Ts' without an assembly. We were afraid that if we couldn't say anything informative, and thereby demonstrate 'prior art', much of the human genome would end up tied up in patents.

Using a cluster of 100 800-MHz Pentiums - powerful machines in the year 2000 - running GNU/Linux, Kent was able to lash up a program, assemble the fragments and save the human genome for mankind.
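The assembly idea itself is simple enough to sketch, even if doing it at genome scale and against the clock was anything but. Here is a toy Python version - the reads are hypothetical, and a real assembler must also cope with sequencing errors, repeats and reverse strands - that just keeps merging the two fragments with the largest overlap:

# Toy greedy assembly: repeatedly glue together the two fragments that
# overlap the most, until a single sequence remains.

def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for size in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def assemble(fragments):
    frags = list(fragments)
    while len(frags) > 1:
        size, a, b = max(((overlap(a, b), a, b)
                          for a in frags for b in frags if a != b),
                         key=lambda t: t[0])
        frags.remove(a)
        frags.remove(b)
        frags.append(a + b[size:])   # merge the best-overlapping pair
    return frags[0]

# Hypothetical reads covering the (equally hypothetical) sequence ACCGTAGTTCGAT
reads = ["TAGTTC", "ACCGTAG", "TCGAT"]
print(assemble(reads))               # -> ACCGTAGTTCGAT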

Haussler's current research depends not just on the availability of the human genome, but also on all the other genomes that have been sequenced - the different manuscripts written in DNA that have come down to us. Using bioinformatics and even more powerful hardware than that available to Kent back in 2000, it is possible to compare and contrast these genomes, looking for tell-tale signs of common ancestors.

But the result is no mere dry academic exercise: if things go well, the DNA text that will drop out at the end will be nothing less than the genome of one of our ancient forebears. Even if Wired's breathless speculations about recreating live animals from the sequence seem rather wide of the mark - imagine trying to run a computer program recreated in a similar way - the genome on its own will be treasure enough. Certainly not bad work for those scholars who "cough in ink" in the world of open genomics.

Ozymandias in Blogland

A fascinating post on Beebo (via C|net): a list of the top 50 blogs, six years ago. It's interesting to see some familiar names at the top, but even more interesting to see so many (to me) unknown ones.

"Look on my works, ye mighty, and despair!" was my first thought. My second, was to create a blog called "Ozymandias" on Blogger, so that I could link to it from this post. But somebody already beat me to it.

Its one - and only - post is dated Sunday, January 07, 2001.

Look on my works, ye mighty, and despair!

03 April 2006

To DRM or Not to DRM - That is the Question

Digital Rights Management - or Digital Restrictions Management as Richard Stallman likes to call it - is a hot topic at the moment. It figured largely in an interview I did with the FSF's Eben Moglen, which appeared in the Guardian last week. Here's the long version of what he had to say on DRM:

In the year 2006, the home is some real estate with appliances in it. In the year 2016, the home is a digital entertainment and data processing network with some real estate wrapped around it.

The basic question then is, who has the keys to your home? You or the people who deliver movies and pizza? The world that they are thinking about is a world in which they have the keys to your home because the computers that constitute the entertainment and data processing network which is your home work for them, rather than for you.

If you go to a commercial MIS director and you say, Mr VP, I want to put some computers inside your walls, inside your VPN, on which you don't have root, and you can't be sure what's running there. But people outside your enterprise can be absolutely certain what software is running on that device, and they can make it do whatever they think necessary. How do you feel about that? He says, No, thank you. And if we say to him, OK, how about if we do that instead in your children's house? He says, No, thank you - not there either.

That's what this is about for us. Users' rights have no deeper meaning than who controls the computer your kid uses at night when he comes home. Who does that computer work for? Who controls what information goes in and out on that machine? Who controls who's allowed to snoop, about what? Who controls who's allowed to keep tabs, about what? Who controls who's allowed to install and change software there? Those are the questions which actually determine who controls the home in 2016.

This stuff seems far away now because, unless you manage computer security for a business, you aren't fully aware of what it is to have computers you don't control as part of your network. But 10 years from now, everybody will know.

Against this background, discussions about Sun's open source DRM solution DReaM - the name derived from "DRM/everywhere available", apparently - seem utterly moot. Designing open source DRM is a bit like making armaments in an energy-efficient fashion: it rather misses the point.

DRM serves one purpose, and one purpose only: to control users. It is predicated on the assumption that most people - not just a few - are criminals ready to rip off a company's crown jewels - its "IP" - at a moment's notice unless the equivalent of titanium steel bars are soldered all over the place.

I simply do not accept this. I believe that most people are honest, and the dishonest ones will find other ways to get round DRM (like stealing somebody's money to pay for it).

I believe that I am justified in making a copy of a CD, or a DVD, or a book provided it is for my own use: what that use is, is no business of the company that sold it to me. What I cannot do is make a copy that I sell to someone else for their use: clearly that takes away something from the producers. But if I make a backup copy of a DVD, or a second copy of a CD to play in the car, nobody loses anything, so I am morally quite justified in this course of action.

Until the music and film industries address the fundamental moral issues - and realise that the vast majority of their customers are decent, honest human beings, not crypto-criminals - the DRM debate will proceed on a false basis, and inevitably be largely vacuous. DRM is simply the wrong solution to the wrong problem.

The Birth of Emblogging

I've written before about the blogification of the cyber union - how everything is adopting a blog-like format. Now comes a complementary process: emblogging, or embedding blogs directly into other formats.

This flows from the observation that blogs are increasingly at the sharp end of reporting, beating staider media like mere newspapers (even their online incarnations) to the punch. There is probably some merit in this idea, since bloggers - the best of them - are indeed working around the clock, unable to unplug, whereas journalists tend to go home and stop. And statistically this means that some blogger, somewhere, is likely to be online and writing when any given big story breaks. So why not embed the best bits from the blogs into slow-moving media outlets? Why not emblog?

Enter BlogBurst, a new company that aims to mediate between those bloggers and the traditional publications (I discovered this, belatedly, via TechCrunch). The premise seems sensible enough, but I have my doubts about the business model. At the moment, the mainstream media get the goods, BlogBurst gets the dosh, and the embloggers get, er, the glory.

Still, an interesting and significant development in the rise and rise of the blog.

02 April 2006

Wiki Wiki Wikia

Following one of my random wanders through the blogosphere I alighted recently on netbib. As the site's home page explains, this is basically about libraries, but there's much more than this might imply.

As well as a couple of the obligatory wikis (one on public libraries, the other - the NetBibWiki - containing a host of diverse information, such as a nice set of links for German studies), there is also a useful collection of RSS feeds from the library world, saved on Bloglines.

The story that took me here was a post about something called Wikia, which turns out to be Jimmy Wales' wiki company (and a relaunch of the earlier Wikicities). According to the press release:

Wikia is an advertising-supported platform for developing and hosting community-based wikis. Specifically, Wikia enables groups to share information, news, stories, media and opinions that fall outside the scope of an encyclopedia. Jimmy Wales and Angela Beesley launched Wikia in 2004 to provide community-based wikis inspired by the model of Wikipedia--the free, open source encyclopedia founded by Jimmy Wales and operated by the Wikimedia Foundation, where Wales and Beesley serve as board members.

Wikia is committed to openness, inviting anyone to contribute web content. Authors retain their own copyrights, but allow others to freely reuse their content under the GNU Free Documentation License, allowing widespread distribution of knowledge and ideas.

Wikia supports the development of the open source software that runs both Wikipedia and Wikia, as well as thousands of other wiki sites. Among other contributions, Wikia plans to enhance the software with usability features, spam prevention, and vandalism control. All of Wikia's development work will, of course, be fed back into the open source code.

In a sense, then, this is yet more of the blogification of the online world, this time applied to wikis.

But I'm not complaining: if that nice Mr Wales can make some money and feed back improvements to the underlying MediaWiki software used by Wikipedia and many other wikis, all to the good. I just hope that the dotcom 2.0 bubble lasts long enough (so that's why they used the Hawaiian word for "quick" in the full name "wiki wiki").

01 April 2006

Open Access Opens the Throttle

It's striking that, so far, open access has had a relatively difficult time making the breakthrough into the mainstream - despite the high-profile example of open source to help pave the way. Whether this says something about institutional inertia, or the stubbornness of the forces ranged against open access, is hard to say.

Against this background, a post (via Open Access News) on the splendidly-named "The Imaginary Journal of Poetic Economics" blog (now why couldn't I have thought of something like that?) is good news.

Figures from that post speak for themselves:

In the last quarter, over 780,000 records have been added to OAIster, suggesting that those open access archives are beginning to fill! There are 170 more titles in DOAJ, likely an understated increase due to a weeding project. 78 titles have been added to DOAJ in the past 30 days, a growth rate of more than 2 new titles per day.

OAIster refers to a handy central search site for "freely available, previously difficult-to-access, academically-oriented digital resources", while DOAJ is a similarly-indispensable directory of open access journals. The swelling holdings of both augur well for open access, and offer the hope that the breakthrough may be close.

Update: An EU study on the scientific publishing market comes down squarely in favour of open access. As Peter Suber rightly notes, "this is big", and is likely to give the movement a considerable boost.

When Blogs Are Beyond a Joke

As any fule kno, today is April Fool's Day. It has long been a tradition among publications - even, or perhaps especially, the most strait-laced - to show that they are not really so cold, callous and contemptible as most people think, by trying to sneak some wry little joke past their readers. Ideally, this will achieve the tricky combination of being both outrageous and just about plausible.

This was fine when news stories came sequentially and slowly: it was quite good fun sifting through a publication trying to work out which item was the fake story. But in the modern, blog-drenched world that we inhabit these days, the net effect of April Fool's Day combined with headline aggregators is to find yourself confronted by reams of utter, wilful nonsense, lacking any redeeming counterweight of real posts.

As many people have suspected when it comes to blogs, you really can have too much of a good thing.

Update: Maybe the solution is this cure for information overload.

31 March 2006

Open Source Rocks

There's nothing new about companies deciding to open source their products and make money in other ways. But it's still good to come across new examples of the breed to confirm that the logic remains as strong as ever.

A case in point is Symfony, which describes itself as "a web application framework for PHP5 projects". It is unusual in two respects: first, because it uses the liberal MIT licence, and second, because it is sponsored by a French company, Sensio. And according to them, open source rocks.

30 March 2006

Googling the Genome

I came across this story about Google winning an award as part of the "Captain Hook Awards for Biopiracy", held in the suitably piratical-sounding Curitiba, Brazil. The story links to the awards Web site - rather fetching in black, white and red - where there is a full list of the lucky 2006 winners.

I was particularly struck by one category: Most Shameful Act of Biopiracy. This must have been hard to award, given the large field to choose from, but the judges found a worthy winner in the shape of the US Government for the following reason:

For imposing plant intellectual property laws on war-torn Iraq in June 2004. When US occupying forces “transferred sovereignty” to Iraq, they imposed Order no. 84, which makes it illegal for Iraqi farmers to re-use seeds harvested from new varieties registered under the law. Iraq’s new patent law opens the door to the multinational seed trade, and threatens food sovereignty.

Google's citation for Biggest Threat to Genetic Privacy read as follows:

For teaming up with J. Craig Venter to create a searchable online database of all the genes on the planet so that individuals and pharmaceutical companies alike can ‘google’ our genes – one day bringing the tools of biopiracy online.

I think it unlikely that Google and Venter are up to anything dastardly here: from studying the background information - and from my earlier reading on Venter when I was writing Digital Code of Life - I think it is much more likely that they want to create the ultimate gene reference, but on a purely general, not personal basis.

Certainly, there will be privacy issues - you won't really want to be uploading your genome to Google's servers - but that can easily be addressed with technology. For example, Google's data could be downloaded to your PC in encrypted form, decrypted by Google's client application running on your computer, and compared with your genome; the results could then be output locally, but not passed back to Google.
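As a sketch of what I mean - every name below is hypothetical, the XOR "cipher" is a stand-in for real encryption, and none of this is anything Google actually offers - the whole comparison can happen on your own machine, ending in nothing more than a local report:

import hashlib

def toy_decrypt(blob: bytes, key: bytes) -> str:
    """Stand-in for the real decryption done by the vendor's client app."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(blob)).decode()

def compare(reference: str, genome: str) -> list:
    """Positions where the local genome differs from the reference."""
    return [i for i, (r, g) in enumerate(zip(reference, genome)) if r != g]

key = hashlib.sha256(b"session key agreed with the server").digest()
reference = "ACGTACGTAC"                       # what the server holds
encrypted_download = bytes(
    b ^ key[i % len(key)] for i, b in enumerate(reference.encode()))

my_genome = "ACGTACCTAC"                       # never leaves this machine
diffs = compare(toy_decrypt(encrypted_download, key), my_genome)
print(f"{len(diffs)} difference(s) at position(s) {diffs}")  # local output only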

It is particularly painful for me to disagree with the Coalition Against Biopiracy, the organisation behind the awards, since their hearts are clearly in the right place - they even kindly cite my own 2004 Googling the Genome article in their background information to the Google award.

29 March 2006

Linus Torvalds' First Usenet Posting

It was 15 years ago today that Linus made his first Usenet posting, to the comp.os.minix newsgroup. This is how it began:

Hello everybody,
I've had minix for a week now, and have upgraded to 386-minix (nice), and duly downloaded gcc for minix. Yes, it works - but ... optimizing isn't working, giving an error message of "floating point stack exceeded" or something. Is this normal?

Minix was the Unix-like operating system devised by Andy Tanenbaum as a teaching aid, and gcc a key hacker program that formed part of Stallman's GNU project. Linus' question was pretty standard beginner's stuff, and yet barely two days later, he answered a fellow-newbie's question as if he were some Minix wizard:

RTFSC (Read the F**ing Source Code :-) - It is heavily commented and the solution should be obvious (take that with a grain of salt, it certainly stumped me for a while :-).

He may have been slightly premature in according himself this elevated status, but it wasn't long before he not only achieved it but went far beyond. For on Sunday, 25 August, 1991, he made another posting to the comp.os.minix newsgroup:

Hello everybody out there using minix -
I'm doing a (free) operating system (just a hobby, won't be big and professional like gnu) for 386(486) AT clones. This has been brewing since april, and is starting to get ready.

The hobby, of course, was Linux, and this was its official announcement to the world.

But imagine, now, that Linus had never made that first posting back in March 1991. It could have happened: as Linus told me in 1996 when I interviewed him for a feature in Wired, back in those days

I was really so shy I didn't want to speak in classes. Even just as a student I didn't want to raise my hand and say anything.

It's easy to imagine him deciding not to “raise his hand” in the comp.os.minix newsgroup for fear of looking stupid in front of all the Minix experts (including the ultimate professor of computing, Tanenbaum himself). And if he'd not plucked up courage to make that first posting, he probably wouldn't have made the others or learned how to hack a simple piece of code he had written for the PC into something that grew into the Linux kernel.

What would the world look like today, had Linux never been written? Would we be using the GNU Hurd – the kernel that Stallman intended to use originally for his GNU operating system, but which was delayed so much that people used Linux instead? Or would one of the BSD derivatives have taken off instead?

Or perhaps there would simply be no serious free alternative to Microsoft Windows, no open source movement, and we would be living in a world where computing was even more under the thumb of Bill Gates. In this alternative reality, there would be no Google either, since it depends on the availability of very low-cost GNU/Linux boxes for the huge server farms that power all its services.

It's amazing how a single post can change the world.

28 March 2006

Dancing Around Openness

The concept of "openness" has featured fairly heavily in these posts - not surprisingly, given the title of this blog. But this conveniently skates over the fact that there is no accepted definition of what "open" really means in the context of technology. This has fairly serious implications, not least because it means certain companies can try to muddy the waters.

Against this background I was delighted to come across this essay by David A. Wheeler on the very subject, entitled "Is OpenDocument an Open Standard? Yes!" As his home page makes clear, David is well-placed to discuss this at the deepest level; indeed, he is the author of perhaps the best and most thorough analysis of why people should consider using open source software.

So if you ever wonder what I'm wittering on about, try reading David's essay on openness to find out what I really meant.

27 March 2006

The Science of Open Source

The OpenScience Project is interesting. As its About page explains:

The OpenScience project is dedicated to writing and releasing free and Open Source scientific software. We are a group of scientists, mathematicians and engineers who want to encourage a collaborative environment in which science can be pursued by anyone who is inspired to discover something new about the natural world.

But beyond this canonical openness to all, there is another, very important reason why scientific software should be open source. With proprietary software, you simply have to take on trust that the output has been derived correctly from the inputs. But this black-box approach is really anathema to science, which is about examining and checking every assumption along the way from input to output. In some sense, proprietary scientific software is an oxymoron.

The project supports open source scientific software in two ways. It has a useful list of such programs, broken down by category (and it's striking how bioinformatics towers over them all); in addition, those behind the site also write applications themselves.

What caught my eye in particular was a posting asking an important question: "How can people make money from open source scientific software?" There have been two more postings so far, exploring various ways in which free applications can be used as the basis of a commercial offering: Sell Hardware and Sell Services. I don't know what the last one will say - it's looking at dual licensing as a way to resolve the dilemma - but the other two have not been able to offer much hope, and overall, I'm not optimistic.

The problem goes to the root of why open source works: it requires lots of users doing roughly the same thing, so that a single piece of free code can satisfy their needs and feed off their comments to get better (if you want the full half-hour argument, read Rebel Code).

That's why the most successful open source projects deliver core computing infrastructure: operating system, Web server, email server, DNS server, databases etc. The same is true on the client-side: the big winners have been Firefox, OpenOffice.org, The GIMP, Audacity etc. - each serving a very big end-user group. Niche projects do exist, but they don't have the vigour of the larger ones, and they certainly can't create an ecosystem big enough to allow companies to make money (as they do with GNU/Linux, Apache, Sendmail, MySQL etc.)

Against this background, I just can't see much hope for commercial scientific open source software. But I think there is an alternative. Because this open software is inherently better for science - thanks to its transparency - it could be argued that funding bodies should make it as much of a priority as more traditional areas.

The big benefit of this approach is that it is cumulative: once the software has been funded to a certain level by one body, there is no reason why another shouldn't pick up the baton and pay for further development. This would allow costs to be shared, along with the code.

Of course, this approach would take a major change of mindset in certain quarters; but since open source and the other opens are already doing that elsewhere, there's no reason why they shouldn't achieve it in this domain too.

Searching for an Answer

I have always been fascinated by search engines. Back in March 1995, I wrote a short feature about the new Internet search engines - variously known as spiders, worms and crawlers at the time - that were just starting to come through:

As an example of the scale of the World-Wide Web (and of the task facing Web crawlers), you might take a look at Lycos (named after a spider). It can be found at the URL http://lycos.cs.cmu.edu/. At the time of writing its database knew of a massive 1.75 million URLs.

(1.75 million URLs - imagine it.)

A few months later, I got really excited by a new, even more amazing search engine:

The latest pretender to the title of top Web searcher is called Alta Vista, and comes from the computer manufacturer Digital. It can be found at http://www.altavista.digital.com/, and as usual costs nothing to use. As with all the others, it claims to be the biggest and best and promises direct access to every one of 8 billion words found in over 16 million Web pages.

(16 million pages - will the madness never end?)

My first comment on Google, in November 1998, by contrast, was surprisingly muted:

Google (home page at http://google.stanford.edu/) ranks search result pages on the basis of which pages link to them.

(Google? - it'll never catch on.)
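In fairness to my younger self, that one dry sentence hid a very big idea: a page matters if pages that matter link to it. Here is a minimal Python sketch of link-based ranking in that PageRank spirit - the four-page "web" is invented, and the real algorithm has many refinements:

# A page is important if important pages link to it; iterate until the
# scores settle. The link graph below is made up.
links = {
    "home":  ["news", "about"],
    "news":  ["home"],
    "about": ["home", "news"],
    "blog":  ["home"],
}

damping = 0.85
rank = {page: 1 / len(links) for page in links}

for _ in range(50):                   # power iteration
    new = {}
    for page in links:
        inbound = sum(rank[src] / len(out)
                      for src, out in links.items() if page in out)
        new[page] = (1 - damping) / len(links) + damping * inbound
    rank = new

for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(f"{page:6} {score:.3f}")

After a few dozen iterations the scores converge and the much-linked-to home page comes out on top - which, scaled up to billions of pages, is roughly why Google did catch on.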

I'd thought that my current interest in search engines was simply a continuation of this story, a historical relic, bolstered by the fact that Google's core services (not some of its mickey-mouse ones like Google Video - call that an interface? - or Google Finance - is this even finished?) really are of central importance to the way I and many people now work online.

But when I arrived at this page on the OA Librarian blog, all became clear. Indeed, the title alone explained why I am still writing about search engines in the context of the opens: "Open access is impossible without findability."

Ah. Of course.

Update: Peter Suber has pointed me to an interesting essay of his looking at the relationship between search engines and open access. Worth reading.

26 March 2006

DE-commerce, XXX-commerce

One of the nuggets that I gathered from reading the book Naked Conversations is that there are relatively few bloggers in Germany. So I was particularly pleased to find that one of these rare DE-bloggers had alighted, however transiently, on these very pages, and carried, magpie-like, a gewgaw back to its Teutonic eyrie.

The site in question is called Exciting Commerce, with the slightly pleonastic subheading "The Exciting Future of E-commerce". It has a good, clean design (one that coincidentally seems to use the same link colour as the HorsePigCow site I mentioned yesterday).

The content is good, too, not least because it covers precisely the subject that I lament is so hard to observe: the marriage of Web 2.0 and e-commerce. The site begs to differ from me, though, suggesting that there is, in fact, plenty of this stuff around.

Whichever camp you fall into, it's a useful blog for keeping tabs on some of the latest e-commerce efforts from around the world (and not just in the US), even if you don't read German, since many of the quotations are in English, and you can always just click on the links to see where they take you.

My only problem is the site's preference for the umbrella term "social commerce" over e-commerce 2.0: for me, the former sounds perilously close to a Victorian euphemism.

25 March 2006

Not Your Average Animal Farm

And talking of the commons, I was pleased to find that the Pinko Marketing Manifesto has acquired the tag "commons-based unmarketing" (and it's a wiki).

This site is nothing if not gutsy. Not content with promoting something proudly flying the Pinko flag (in America?), it is happy to make an explicit connection with another, rather more famous manifesto (and no, we're not talking about the Cluetrain Manifesto, although that too is cited as a key influence).

And talking of Charlie, another post says:

I started researching elitism versus the voice of the commons and I happened upon something I haven't read since second year university, The Communist Manifesto.

(So, that's re-reading The Communist Manifesto: how many brownie points does this woman want?)

And to top it all, HorsePigCow - for so it is named - has possibly the nicest customisation of the standard Minima Blogger template I've seen, except that the posts are too wide: 65 characters max is the rule, trust me.

Do take a gander.

Update: Sadly, I spoke too soon: the inevitable mindless backlash has begun....

The Commonality of the Commons

Everywhere I go these days, I seem to come across the commons. The Creative Commons is the best known, but the term refers to anything held in common for the benefit of all. A site I've just come across, called On the Commons, puts it well, stressing the concomitant need to conserve the commons for the benefit of future generations:

The commons is a new way to express a very old idea — that some forms of wealth belong to all of us, and that these community resources must be actively protected and managed for the good of all. The commons are the things that we inherit and create jointly, and that will (hopefully) last for generations to come. The commons consists of gifts of nature such as air, water, the oceans, wildlife and wilderness, and shared “assets” like the Internet, the airwaves used for broadcasting, and public lands. The commons also includes our shared social creations: libraries, parks, public spaces as well as scientific research, creative works and public knowledge that have accumulated over centuries.

It's also put together a free report that spells out in more detail the various kinds of commons that exist: the atmosphere, the airwaves, water, culture, science and even quiet.

What's fascinating for me is how well this maps onto the intertwined themes of this blog and my interests in general, from open content, open access and open spectrum to broader environmental issues. The recognition that there is a commonality between different kinds of commons seems to be another idea that is beginning to spread.

Picture This

I wrote about Riya.com a month ago; now it's out in beta, so you can try out its face recognition technology. I did, and was intrigued to find that this photo was tagged as "Bill Gates". Maybe Riya uses more artificial intelligence than they're letting on.

It's certainly a clever idea - after all, the one thing people (misanthropes apart) are interested in is people. But you do have to wonder about the underlying technology when it uses addresses like this:

http://www.riya.com/highRes?search=1fSPySWh
FrHn7AnWgnSyHaqJl6bzuGByoFKJuG1H%2Fv
otjYbqlIMI22Qj88Vlcvz2uSnkixrhzHJP%0Aej%
2B9VuGvjiodlKDrBNS8pgy%2FaVqvckjfyo%2
BjhlL1sjK5CgHriGhifn3s2C1q%2B%2FnL1Emr
0OUPvn%2FM%0AJ0Ire5Zl2QUQQLUMi2Naq
Ny1zboiX7JtL77OG96NmV5VT8Buz4bzlyPFmi
ppcvmBJagMcftZjHUG%0AFlnXYIfp1VOGWx
gYijpgpDcsU9M4&pageNumber=9&e=bIaIR30d
SGNoZcG8jWL8z2LhcH%2FEg1LzsBF%2F6pr
Fd2Jm7tpMKFCXTu%2FBsOKk%2FVdS

I know a picture is supposed to be worth a thousand words, but not in the URL, surely....

A Question of Standards

Good to see Andy Updegrove's blog getting Slashdotted. This is good news not just for him, but also for his argument, which is that open source ideas are expanding into new domains (no surprise there to readers of this blog), and that traditional intellectual property (IP) models are being re-evaluated as a result.

Actually, this piece is rather atypical, since most of the posts are to do with standards, rather than open source or IP (though these are inevitably bound up with standards). Andy's blog is simply the best place to go for up-to-the-minute information on this area; in particular, he is following the ODF saga more closely - and hence better - than anyone. In other words, he's not just reporting on standards, but setting them, too.

24 March 2006

A Little Note About Microformats

Further proof that things are starting to bubble: small but interesting ideas like microformats pop up out of nowhere (well, for me, at least). As the About page of the eponymous Web site says:

Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards.

The key thing is that they are built around XHTML, which is effectively HTML done properly. Examples based on things that may be familiar include hCard, built on the very old vCard; hCalendar, based on the equally venerable iCalendar; moderately old stuff like the XHTML Friends Network (XFN), which you stumble across occasionally on the Web; and the inscrutable XOXO (which I've heard of, but never seen brandished in anger).
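To give a flavour of what "humans first and machines second" means, here is a simplified sketch - the markup and the parser are my own invention, and a real hCard carries many more properties - showing that the same XHTML a reader sees as ordinary text can be mined for structured data with a few lines of standard-library Python:

from html.parser import HTMLParser

# A microformat is ordinary markup with agreed class names ("vcard", "fn",
# "org", "url"...), so one page serves both people and programs.
XHTML = """
<div class="vcard">
  <span class="fn">Ada Lovelace</span>,
  <span class="org">Analytical Engines Ltd</span>,
  <a class="url" href="http://example.org/ada">home page</a>
</div>
"""

class HCardParser(HTMLParser):
    """Collect a few hCard properties from the markup above."""
    TEXT_PROPS = {"fn", "org"}   # properties whose value is the element text

    def __init__(self):
        super().__init__()
        self.current, self.card = None, {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set(attrs.get("class", "").split())
        if "url" in classes and "href" in attrs:
            self.card["url"] = attrs["href"]      # the url lives in the href
        wanted = classes & self.TEXT_PROPS
        if wanted:
            self.current = wanted.pop()

    def handle_data(self, data):
        if self.current and data.strip():
            self.card[self.current] = data.strip()
            self.current = None

parser = HCardParser()
parser.feed(XHTML)
print(parser.card)   # {'fn': 'Ada Lovelace', 'org': 'Analytical Engines Ltd', 'url': 'http://example.org/ada'}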

That's the upside; on the downside, Bill Gates has started talking about it.

23 March 2006

Open Data in the Age of Exponential Science

There's a very interesting article in this week's Nature, as part of its 2020 Computing Special (which miraculously is freely available even to non-subscribers), written by Alexander Szalay and Jim Gray.

I had the pleasure of interviewing Gray a couple of years back. He's a Grand Old Man of the computing world, with a hugely impressive curriculum vitae; he's also a thoroughly charming interviewee with some extremely interesting ideas. For example:

I believe that Alan Turing was right and that eventually machines will be sentient. And I think that's probably going to happen in this century. There's much concern that that might work out badly; I actually am optimistic about it.

The Nature article is entitled "Science in an exponential world", and it considers some of the approaching problems that the vast scaling up of Net-based, collaborative scientific endeavour is likely to bring us in the years to come. Here's one key point:

A collaboration involving hundreds of Internet-connected scientists raises questions about standards for data sharing. Too much effort is wasted on converting from one proprietary data format to another. Standards are essential at several levels: in formatting, so that data written by one group can be easily read and understood by others; in semantics, so that a term used by one group can be translated (often automatically) by another without its meaning being distorted; and in workflows, so that analysis steps can be executed across the Internet and reproduced by others at a later date.

The same considerations apply to all open data in the age of exponential science: without common standards that allow data from different groups, gathered at different times and in varying circumstances, to be brought together meaningfully in all sorts of new ways, the openness is moot.

Synchronicity

I'm currently reading Naked Conversations (sub-title: "how blogs are changing the way businesses talk with customers"). It's full of well-told anecdotes and some interesting ideas, although I'm not convinced yet that it will add up to more than a "corporate blogging is cool" kind of message.

That notwithstanding, I was slightly taken aback to find myself living out one of the ideas from the book. This came from the inimitable Dave Winer, who said, speaking of journalists:

They don't want the light shone on themselves, which is ironic because journalists are experts at shining the light on others.... This is why we have blogs. We have blogs because we can't trust these guys.

Speaking as a journalist, can I just say "Thanks, Dave," for that vote of confidence.

But the idea that bloggers can watch the journalists watching them is all too true, as I discovered when I went to Paul Jones' blog and found this posting, in which he not only tells the entire world what I'm up to (no great secret, to be sure), but also effectively says that he will be publishing his side of the story when my article comes out, so that readers can check up on whether I've done a good job.

The only consolation is that at least I can leave a comment on his posting on my article about him....

22 March 2006

Digital Libraries - the Ebook

It seems appropriate that a book about digital libraries has migrated to an online version that is freely available. Digital Libraries - for such is the nicely literalist title - is a little long in the tooth in places as far as the technical information is concerned, but very clearly written (via Open Access News).

It also presents things from a librarian's viewpoint, which is quite different from that of your usual info-hacker. I found Chapter 6, on Economic and legal issues, particularly interesting, since it touches most directly on areas like open access.

Nonetheless, I was surprised not to see more (anything? - there's no index at the moment) about Project Gutenberg. Now, it may be that I'm unduly influenced by an extremely thought-provoking email conversation I'm currently engaged in with the irrepressible Michael Hart, the founder and leader of the project.

But irrespective of this possible bias, it seems to me that Project Gutenberg - a library of some 17,000 ebooks, with more being added each day - is really the first and ultimate digital library (or at least it will be, once it's digitised the other million or so books that are on its list), and deserves to be recognised as such.

21 March 2006

Why the GPL Doesn't Need a Test Case

There was an amusing story in Groklaw yesterday, detailing the sorry end of an utterly pointless legal action taken against the Free Software Foundation (FSF) on the grounds that

FSF has conspired with International Business Machines Corporation, Red Hat Inc., Novell Inc. and other individuals to “pool and cross license their copyrighted intellectual property in a predatory price fixing scheme.”

It sounded serious, didn't it? Maybe a real threat to free software and hence Civilisation As We Know It? Luckily, as the Groklaw story explains, the judge threw it out in just about every way possible.

However, welcome as this news is, it is important to note that the decision does not provide the long-awaited legal test of the GPL in the US (a court has already ruled favourably on one in Germany). Some people seem to feel that such a test case is needed to establish the legal foundation of the GPL - and with it, most of the free software world. But one person who disagrees is Eben Moglen, General Counsel for the FSF, and somebody who should know.

As he explained to me a few weeks ago:

The stuff that people do with GPL code – like they modify it, they copy it, they give it to other people – is stuff that under the copyright law you can't do unless you have permission. So if they've got permission, or think they have permission, then the permission they have is the GPL. If they don't have that permission, they have no permission.

So the defendant in a GPL violation situation has always been in an awkward place. I go to him and I say basically, Mr So and So, you're using my client's copyrighted works, without permission, in ways that the copyright law says that you can't do. And if you don't stop, I'm going to go to a judge, and I'm going to say, judge, my copyrighted works, their infringing activity, give me an injunction, give me damages.

At this point, there are two things the defendant can do. He can stand up and say, your honour, he's right, I have no permission at all. But that's not going to lead to a good outcome. Or he can stand up and say, but your honour, I do have permission. My permission is the GPL. At which point, I'm going to say back, well, your honour, that's a nice story, but he's not following the instructions of the GPL, so he doesn't really have the shelter he claims to have.

But note that either way, the one thing he can't say is, your honour, I have this wonderful permission and it's worthless. I have this wonderful permission, and it's invalid, I have this wonderful permission and it's broken.

In other words, there is no situation in which the brokenness or otherwise of the GPL is ever an issue: whichever is true, violators are well and truly stuffed.

(If you're interested in how, against this background, the GPL is enforced in practice, Moglen has written his own lucid explanations.)

20 March 2006

What Open Source Can Learn from Microsoft

In case you hadn't noticed, there's been a bit of a kerfuffle over a posting claiming that a Firefox 2.0 alpha had been released. However, this rumour has been definitively scotched by one of the top Firefox people on his blog, so you can all relax now (well, for a couple of days, at least, until the real alpha turns up).

And who cares whether the code out there is an alpha, or a pre-alpha or even a pre-pre-alpha? Well, never mind who cares, there's another point that everyone seems to be missing: that this flurry of discoveries, announcements, commentaries, denials and more commentaries is just what Firefox needs as it starts to become respectable and, well, you know, slightly dull.

In fact, the whole episode should remind people of a certain other faux-leak about a rather ho-hum product that took place fairly recently. I'm referring to the Origami incident a couple of weeks ago, which produced an even bigger spike in the blogosphere.

It's the same, but different, because the former happened by accident in a kind of embarrassed way, while the latter was surely concocted by sharp marketing people within Microsoft. So, how about if the open source world started to follow suit by "leaking" the odd bit of code to selected bloggers who can be relied upon to get terribly agitated and to spread the word widely?

At first sight, this seems to be anathema to a culture based on openness, but there is no real contradiction. It is not a matter of hiding anything, merely making the manner of its appearance more tantalising - titillating, even. The people still get their software, the developers still get their feedback. It's just that everyone has super fun getting excited about nothing - and free software's market share inches up another notch.