01 April 2006

When Blogs Are Beyond a Joke

As any fule kno, today is April Fool's Day. It has long been a tradition among publications - even, or perhaps especially, the most strait-laced - to show that they are not really so cold, callous and contemptible as most people think, by trying to sneak some wry little joke past their readers. Ideally, this will achieve the tricky combination of being both outrageous and just about plausible.

This was fine when news stories came sequentially and slowly: it was quite good fun sifting through a publication trying to work out which item was the fake story. But in the blog-drenched world we now inhabit, the net effect of April Fool's Day combined with headline aggregators is that you find yourself confronted by reams of utter, wilful nonsense, lacking any redeeming counterweight of real posts.

As many people have suspected when it comes to blogs, you really can have too much of a good thing.

Update: Maybe the solution is this cure for information overload.

31 March 2006

Open Source Rocks

There's nothing new about companies deciding to open source their products and make money in other ways. But it's still good to come across new examples of the breed to confirm that the logic remains as strong as ever.

A case in point is Symfony, which describes itself as "a web application framework for PHP5 projects". It is unusual in two respects: first, because it uses the liberal MIT licence, and second, because it is sponsored by a French company, Sensio. And according to them, open source rocks.

30 March 2006

Googling the Genome

I came across this story about Google winning an award as part of the "Captain Hook Awards for Biopiracy", taking place in the suitably piratical-sounding Curitiba, Brazil. The story links to the awards Web site - rather fetching in black, white and red - where there is a full list of the lucky 2006 winners.

I was particularly struck by one category: Most Shameful Act of Biopiracy. This must have been hard to award, given the large field to choose from, but the judges found a worthy winner in the shape of the US Government for the following reason:

For imposing plant intellectual property laws on war-torn Iraq in June 2004. When US occupying forces “transferred sovereignty” to Iraq, they imposed Order no. 84, which makes it illegal for Iraqi farmers to re-use seeds harvested from new varieties registered under the law. Iraq’s new patent law opens the door to the multinational seed trade, and threatens food sovereignty.

Google's citation for Biggest Threat to Genetic Privacy read as follows:

For teaming up with J. Craig Venter to create a searchable online database of all the genes on the planet so that individuals and pharmaceutical companies alike can ‘google’ our genes – one day bringing the tools of biopiracy online.

I think it unlikely that Google and Venter are up to anything dastardly here: from studying the background information - and from my earlier reading on Venter when I was writing Digital Code of Life - I think it is much more likely that they want to create the ultimate gene reference, but on a purely general, not personal basis.

Certainly, there will be privacy issues - you won't really want to be uploading your genome to Google's servers - but that can easily be addressed with technology. For example, Google's data could be downloaded to your PC in encrypted form, decrypted by Google's client application running on your computer, and compared with your genome; the results could then be output locally, but not passed back to Google.
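To make that concrete, here is a minimal sketch of the kind of client-side flow just described. It is an illustration only, not anything Google or Venter have proposed: the file names, data format and keys are invented, and it assumes the third-party Python cryptography package for the decryption step.

```python
# A minimal sketch (not Google's actual design): the reference data arrives
# encrypted, is decrypted only on the user's machine, and the comparison
# result never leaves that machine.
from cryptography.fernet import Fernet

def load_reference(path: str, key: bytes) -> dict:
    """Decrypt the downloaded reference data locally."""
    with open(path, "rb") as f:
        plaintext = Fernet(key).decrypt(f.read())
    # Hypothetical format: one "position<TAB>expected_base" pair per line.
    reference = {}
    for line in plaintext.decode().splitlines():
        pos, base = line.split("\t")
        reference[int(pos)] = base
    return reference

def compare_locally(reference: dict, my_genome: dict) -> list:
    """List positions where the user's genome differs from the reference.

    The result is printed or saved locally, never sent back to the server.
    """
    return [(pos, base, my_genome.get(pos))
            for pos, base in reference.items()
            if my_genome.get(pos) not in (None, base)]

if __name__ == "__main__":
    key = open("reference.key", "rb").read()          # supplied with the client
    reference = load_reference("reference.enc", key)  # downloaded, encrypted
    my_genome = {101: "A", 202: "G"}                  # the user's own data
    print(compare_locally(reference, my_genome))      # stays on this PC
```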

It is particularly painful for me to disagree with the Coalition Against Biopiracy, the organisation behind the awards, since their hearts are clearly in the right place - they even kindly cite my own 2004 Googling the Genome article in their background information to the Google award.

29 March 2006

Linus Torvalds' First Usenet Posting

It was 15 years ago today that Linus made his first Usenet posting, to the comp.os.minix newsgroup. This is how it began:

Hello everybody,
I've had minix for a week now, and have upgraded to 386-minix (nice), and duly downloaded gcc for minix. Yes, it works - but ... optimizing isn't working, giving an error message of "floating point stack exceeded" or something. Is this normal?

Minix was the Unix-like operating system devised by Andy Tanenbaum as a teaching aid, and gcc a key hacker program that formed part of Stallman's GNU project. Linus' question was pretty standard beginner's stuff, and yet barely two days later, he answered a fellow-newbie's question as if he were some Minix wizard:

RTFSC (Read the F**ing Source Code :-) - It is heavily commented and the solution should be obvious (take that with a grain of salt, it certainly stumped me for a while :-).

He may have been slightly premature in according himself this elevated status, but it wasn't long before he not only achieved it but went far beyond. For on Sunday, 25 August, 1991, he made another posting to the comp.os.minix newsgroup:

Hello everybody out there using minix -
I'm doing a (free) operating system (just a hobby, won't be big and professional like gnu) for 386(486) AT clones. This has been brewing since april, and is starting to get ready.

The hobby, of course, was Linux, and this was its official announcement to the world.

But imagine, now, that Linus had never made that first posting back in March 1991. It could have happened: as Linus told me in 1996 when I interviewed him for a feature in Wired, back in those days

I was really so shy I didn't want to speak in classes. Even just as a student I didn't want to raise my hand and say anything.

It's easy to imagine him deciding not to “raise his hand” in the comp.os.minix newsgroup for fear of looking stupid in front of all the Minix experts (including the ultimate professor of computing, Tanenbaum himself). And if he'd not plucked up courage to make that first posting, he probably wouldn't have made the others or learned how to hack a simple piece of code he had written for the PC into something that grew into the Linux kernel.

What would the world look like today, had Linux never been written? Would we be using the GNU Hurd – the kernel that Stallman intended to use originally for his GNU operating system, but which was delayed so much that people used Linux instead? Or would one of the BSD derivatives have taken off instead?

Or perhaps there would simply be no serious free alternative to Microsoft Windows, no open source movement, and we would be living in a world where computing was even more under the thumb of Bill Gates. In this alternative reality, there would be no Google either, since it depends on the availability of very low-cost GNU/Linux boxes for the huge server farms that power all its services.

It's amazing how a single post can change the world.

28 March 2006

Dancing Around Openness

The concept of "openness" has featured fairly heavily in these posts - not surprisingly, given the title of this blog. But this conveniently skates over the fact that there is no accepted definition of what "open" really means in the context of technology. This has fairly serious implications, not least because it means certain companies can try to muddy the waters.

Against this background I was delighted to come across this essay by David A. Wheeler on the very subject, entitled "Is OpenDocument an Open Standard? Yes!" As his home page makes clear, David is well-placed to discuss this at the deepest level; indeed, he is the author of perhaps the best and most thorough analysis of why people should consider using open source software.

So if you ever wonder what I'm wittering on about, try reading David's essay on openness to find out what I really meant.

27 March 2006

The Science of Open Source

The OpenScience Project is interesting. As its About page explains:

The OpenScience project is dedicated to writing and releasing free and Open Source scientific software. We are a group of scientists, mathematicians and engineers who want to encourage a collaborative environment in which science can be pursued by anyone who is inspired to discover something new about the natural world.

But beyond this canonical openness to all, there is another, very important reason why scientific software should be open source. With proprietary software, you simply have to take on trust that the output has been derived correctly from the inputs. But this black-box approach is really anathema to science, which is about examining and checking every assumption along the way from input to output. In some sense, proprietary scientific software is an oxymoron.

The project supports open source scientific software in two ways. It has a useful list of such programs, broken down by category (and it's striking how bioinformatics towers over them all); in addition, those behind the site also write applications themselves.

What caught my eye in particular was a posting asking an important question: "How can people make money from open source scientific software?" There have been two more postings so far, exploring various ways in which free applications can be used as the basis of a commercial offering: Sell Hardware and Sell Services. I don't know what the last one will say - it's looking at dual licensing as a way to resolve the dilemma - but the other two have not been able to offer much hope, and overall, I'm not optimistic.

The problem goes to the root of why open source works: it requires lots of users doing roughly the same thing, so that a single piece of free code can satisfy their needs and feed off their comments to get better (if you want the full half-hour argument, read Rebel Code).

That's why the most successful open source projects deliver core computing infrastructure: operating system, Web server, email server, DNS server, databases etc. The same is true on the client-side: the big winners have been Firefox, OpenOffice.org, The GIMP, Audacity etc. - each serving a very big end-user group. Niche projects do exist, but they don't have the vigour of the larger ones, and they certainly can't create an ecosystem big enough to allow companies to make money (as they do with GNU/Linux, Apache, Sendmail, MySQL etc.)

Against this background, I just can't see much hope for commercial scientific open source software. But I think there is an alternative. Because this open software is inherently better for science - thanks to its transparency - it could be argued that funding bodies should make it as much of a priority as more traditional areas.

The big benefit of this approach is that it is cumulative: once the software has been funded to a certain level by one body, there is no reason why another shouldn't pick up the baton and pay for further development. This would allow costs to be shared, along with the code.

Of course, this approach would take a major change of mindset in certain quarters; but since open source and the other opens are already doing that elsewhere, there's no reason why they shouldn't achieve it in this domain too.

Searching for an Answer

I have always been fascinated by search engines. Back in March 1995, I wrote a short feature about the new Internet search engines - variously known as spiders, worms and crawlers at the time - that were just starting to come through:

As an example of the scale of the World-Wide Web (and of the task facing Web crawlers), you might take a look at Lycos (named after a spider). It can be found at the URL http://lycos.cs.cmu.edu/. At the time of writing its database knew of a massive 1.75 million URLs.

(1.75 million URLs - imagine it.)

A few months later, I got really excited by a new, even more amazing search engine:

The latest pretender to the title of top Web searcher is called Alta Vista, and comes from the computer manufacturer Digital. It can be found at http://www.altavista.digital.com/, and as usual costs nothing to use. As with all the others, it claims to be the biggest and best and promises direct access to every one of 8 billion words found in over 16 million Web pages.

(16 million pages - will the madness never end?)

My first comment on Google, in November 1998, by contrast, was surprisingly muted:

Google (home page at http://google.stanford.edu/) ranks search result pages on the basis of which pages link to them.

(Google? - it'll never catch on.)
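That one-line description is essentially the PageRank idea: pages that are linked to by important pages become important themselves. Purely as an illustration, here is a toy version of the calculation; the link graph, damping factor and iteration count are my own assumptions, not anything taken from Google.

```python
# A toy PageRank calculation over an invented four-page link graph.
links = {
    "a": ["b", "c"],   # page "a" links to "b" and "c"
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

damping = 0.85                       # the usual damping factor
rank = {page: 1.0 / len(links) for page in links}

for _ in range(50):                  # iterate until the ranks settle
    new_rank = {page: (1 - damping) / len(links) for page in links}
    for page, outgoing in links.items():
        share = damping * rank[page] / len(outgoing)
        for target in outgoing:
            new_rank[target] += share
    rank = new_rank

for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")    # "c", the most linked-to page, comes top
```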

I'd thought that my current interest in search engines was simply a continuation of this story, a historical relict, bolstered by the fact that Google's core services (not some of its mickey-mouse ones like Google Video - call that an interface? - or Google Finance - is this even finished?) really are of central importance to the way I and many people now work online.

But when I arrived at this page on the OA Librarian blog, all became clear. Indeed, the title alone explained why I am still writing about search engines in the context of the opens: "Open access is impossible without findability."

Ah. Of course.

Update: Peter Suber has pointed me to an interesting essay of his looking at the relationship between search engines and open access. Worth reading.

26 March 2006

DE-commerce, XXX-commerce

One of the nuggets that I gathered from reading the book Naked Conversations is that there are relatively few bloggers in Germany. So I was particularly pleased to find that one of these rare DE-bloggers had alighted, however transiently, on these very pages, and carried, magpie-like, a gewgaw back to its Teutonic eyrie.

The site in question is called Exciting Commerce, with the slightly pleonastic subheading "The Exciting Future of E-commerce". It has a good, clean design (one that coincidentally seems to use the same link colour as the HorsePigCow site I mentioned yesterday).

The content is good, too, not least because it covers precisely the subject that I lament is so hard to observe: the marriage of Web 2.0 and e-commerce. The site begs to differ with me, though, suggesting that there is, in fact, plenty of this stuff around.

Whichever camp you fall into, it's a useful blog for keeping tabs on some of the latest e-commerce efforts from around the world (and not just in the US), even if you don't read German, since many of the quotations are in English, and you can always just click on the links to see where they take you.

My only problem is the site's preference for the umbrella term "social commerce" over e-commerce 2.0: for me, the former sounds perilously close to a Victorian euphemism.

25 March 2006

Not Your Average Animal Farm

And talking of the commons, I was pleased to find that the Pinko Marketing Manifesto has acquired the tag "commons-based unmarketing" (and it's a wiki).

This site is nothing if not gutsy. Not content with promoting something proudly flying the Pinko flag (in America?), it is happy to make an explicit connection with another, rather more famous manifesto (and no, we're not talking about the Cluetrain Manifesto, although that too is cited as a key influence).

And talking of Charlie, another post says:

I started researching elitism versus the voice of the commons and I happened upon something I haven't read since second year university, The Communist Manifesto.

(So, that's re-reading The Communist Manifesto: how many brownie points does this woman want?)

And to top it all, HorsePigCow - for so it is named - has possibly the nicest customisation of the standard Minima Blogger template I've seen, except that the posts are too wide: 65 characters max is the rule, trust me.

Do take a gander.

Update: Sadly, I spoke too soon: the inevitable mindless backlash has begun....

The Commonality of the Commons

Everywhere I go these days, I seem to come across the commons. The Creative Commons is the best known, but the term refers to anything held in common for the benefit of all. A site I've just come across, called On the Commons, puts it well, stressing the concomitant need to conserve the commons for the benefit of future generations:

The commons is a new way to express a very old idea — that some forms of wealth belong to all of us, and that these community resources must be actively protected and managed for the good of all. The commons are the things that we inherit and create jointly, and that will (hopefully) last for generations to come. The commons consists of gifts of nature such as air, water, the oceans, wildlife and wilderness, and shared “assets” like the Internet, the airwaves used for broadcasting, and public lands. The commons also includes our shared social creations: libraries, parks, public spaces as well as scientific research, creative works and public knowledge that have accumulated over centuries.

It's also put together a free report that spells out in more detail the various kinds of commons that exist: the atmosphere, the airwaves, water, culture, science and even quiet.

What's fascinating for me is how well this maps onto the intertwined themes of this blog and my interests in general, from open content, open access and open spectrum to broader environmental issues. The recognition that there is a commonality between different kinds of commons seems to be another idea that is beginning to spread.

Picture This

I wrote about Riya.com a month ago; now it's out in beta, so you can try out its face recognition technology. I did, and was intrigued to find that this photo was tagged as "Bill Gates". Maybe Riya uses more artificial intelligence than they're letting on.

It's certainly a clever idea - after all, the one thing people (misanthropes apart) are interested in is people. But you do have to wonder about the underlying technology when it uses addresses like this:

http://www.riya.com/highRes?search=1fSPySWh
FrHn7AnWgnSyHaqJl6bzuGByoFKJuG1H%2Fv
otjYbqlIMI22Qj88Vlcvz2uSnkixrhzHJP%0Aej%
2B9VuGvjiodlKDrBNS8pgy%2FaVqvckjfyo%2
BjhlL1sjK5CgHriGhifn3s2C1q%2B%2FnL1Emr
0OUPvn%2FM%0AJ0Ire5Zl2QUQQLUMi2Naq
Ny1zboiX7JtL77OG96NmV5VT8Buz4bzlyPFmi
ppcvmBJagMcftZjHUG%0AFlnXYIfp1VOGWx
gYijpgpDcsU9M4&pageNumber=9&e=bIaIR30d
SGNoZcG8jWL8z2LhcH%2FEg1LzsBF%2F6pr
Fd2Jm7tpMKFCXTu%2FBsOKk%2FVdS

I know a picture is supposed to be worth a thousand words, but not in the URL, surely....

A Question of Standards

Good to see Andy Updegrove's blog getting Slashdotted. This is good news not just for him, but also for his argument, which is that open source ideas are expanding into new domains (no surprise there to readers of this blog), and that traditional intellectual property (IP) models are being re-evaluated as a result.

Actually, this piece is rather atypical, since most of the posts are to do with standards, rather than open source or IP (though these are inevitably bound up with standards). Andy's blog is simply the best place to go for up-to-the-minute information on this area; in particular, he is following the ODF saga more closely - and hence better - than anyone. In other words, he's not just reporting on standards, but setting them, too.

24 March 2006

A Little Note About Microformats

Further proof that things are starting to bubble: small but interesting ideas like microformats pop up out of nowhere (well, for me, at least). As the About page of the eponymous Web site says:

Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards.

The key thing is that they are built around XHTML, which is effectively HTML done properly. Examples based on things that may be familiar include hCard, built on the very old vCard; hCalendar, based on the equally venerable iCalendar; moderately old stuff like XHTML Friends Network (XFN), which you stumble across occasionally on the Web; and the inscrutable XOXO (which I've heard of, but never seen brandished in anger).
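To give a flavour of how they work in practice, here is a minimal sketch of an hCard being consumed: the contact details sit in ordinary XHTML, flagged only by class names, and a script can pull them back out. The markup, property list and parser below are simplified assumptions of mine, not the full hCard specification.

```python
# A deliberately simplified hCard consumer using only the standard library.
# Real microformat parsers handle nesting, multiple values and many more
# properties; this just shows the basic data-in-class-attributes idea.
from html.parser import HTMLParser

HCARD = """
<div class="vcard">
  <a class="url fn" href="http://example.org/">Jane Hacker</a>
  works at <span class="org">Example Org</span>
</div>
"""

class HCardParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self._pending = []   # hCard properties of the element being read
        self.card = {}       # extracted property -> value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = attrs.get("class", "").split()
        if "url" in classes and "href" in attrs:
            self.card["url"] = attrs["href"]
        self._pending = [c for c in classes if c in ("fn", "org")]

    def handle_data(self, data):
        text = data.strip()
        if text:
            for prop in self._pending:
                self.card[prop] = text

    def handle_endtag(self, tag):
        self._pending = []

parser = HCardParser()
parser.feed(HCARD)
print(parser.card)   # {'url': 'http://example.org/', 'fn': 'Jane Hacker', 'org': 'Example Org'}
```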

That's the upside; on the downside, Bill Gates has started talking about them.

23 March 2006

Open Data in the Age of Exponential Science

There's a very interesting article in this week's Nature, as part of its 2020 Computing Special (which miraculously is freely available even to non-subscribers), written by Alexander Szalay and Jim Gray.

I had the pleasure of interviewing Gray a couple of years back. He's a Grand Old Man of the computing world, with a hugely impressive curriculum vitae; he's also a thoroughly charming interviewee with some extremely interesting ideas. For example:

I believe that Alan Turing was right and that eventually machines will be sentient. And I think that's probably going to happen in this century. There's much concern that that might work out badly; I actually am optimistic about it.

The Nature article is entitled "Science in an exponential world", and it considers some of the approaching problems that the vast scaling up of Net-based, collaborative scientific endeavour is likely to bring us in the years to come. Here's one key point:

A collaboration involving hundreds of Internet-connected scientists raises questions about standards for data sharing. Too much effort is wasted on converting from one proprietary data format to another. Standards are essential at several levels: in formatting, so that data written by one group can be easily read and understood by others; in semantics, so that a term used by one group can be translated (often automatically) by another without its meaning being distorted; and in workflows, so that analysis steps can be executed across the Internet and reproduced by others at a later date.

The same considerations apply to all open data in the age of exponential science: without common standards that allow data from different groups, gathered at different times and in varying circumstances, to be brought together meaningfully in all sorts of new ways, the openness is moot.

Synchronicity

I'm currently reading Naked Conversations (sub-title: "how blogs are changing the way businesses talk with customers"). It's full of well-told anecdotes and some interesting ideas, although I'm not convinced yet that it will add up to more than a "corporate blogging is cool" kind of message.

That notwithstanding, I was slightly taken aback to find myself living out one of the ideas from the book. This came from the inimitable Dave Winer, who said, speaking of journalists:

They don't want the light shone on themselves, which is ironic because journalists are experts at shining the light on others.... This is why we have blogs. We have blogs because we can't trust these guys.

Speaking as a journalist, can I just say "Thanks, Dave," for that vote of confidence.

But the idea that bloggers can watch the journalists watching them is all too true, as I discovered when I went to Paul Jones' blog and found this posting, in which he not only tells the entire world what I'm up to (no great secret, to be sure), but also effectively says that he will be publishing his side of the story when my article comes out, so that readers can check up on whether I've done a good job.

The only consolation is that at least I can leave a comment on his posting on my article about him....

22 March 2006

Digital Libraries - the Ebook

It seems appropriate that a book about digital libraries has migrated to an online version that is freely available. Digital Libraries - for such is the nicely literalist title - is a little long in the tooth in places as far as the technical information is concerned, but very clearly written (via Open Access News).

It also presents things from a librarian's viewpoint, which is quite different from that of your usual info-hacker. I found Chapter 6, on Economic and legal issues, particularly interesting, since it touches most directly on areas like open access.

Nonetheless, I was surprised not to see more (anything? - there's no index at the moment) about Project Gutenberg. Now, it may be that I'm unduly influenced by an extremely thought-provoking email conversation I'm currently engaged in with the irrepressible Michael Hart, the founder and leader of the project.

But irrespective of this possible bias, it seems to me that Project Gutenberg - a library of some 17,000 ebooks, with more being added each day - is really the first and ultimate digital library (or at least it will be, once it's digitised the other million or so books that are on its list), and deserves to be recognised as such.

21 March 2006

Why the GPL Doesn't Need a Test Case

There was an amusing story on Groklaw yesterday, detailing the sorry end of an utterly pointless legal action taken against the Free Software Foundation (FSF) on the grounds that

FSF has conspired with International Business Machines Corporation, Red Hat Inc., Novell Inc. and other individuals to “pool and cross license their copyrighted intellectual property in a predatory price fixing scheme.”

It sounded serious, didn't it? Maybe a real threat to free software and hence Civilisation As We Know It? Luckily, as the Groklaw story explains, the judge threw it out in just about every way possible.

However, welcome as this news is, it is important to note that the decision does not provide the long-awaited legal test of the GPL in the US (a German court has already ruled in the licence's favour). Some people seem to feel that such a test case is needed to establish the legal foundation of the GPL - and with it, most of the free software world. But one person who disagrees is Eben Moglen, General Counsel for the FSF, and somebody who should know.

As he explained to me a few weeks ago:

The stuff that people do with GPL code – like they modify it, they copy it, they give it to other people – is stuff that under the copyright law you can't do unless you have permission. So if they've got permission, or think they have permission, then the permission they have is the GPL. If they don't have that permission, they have no permission.

So the defendant in a GPL violation situation has always been in an awkward place. I go to him and I say basically, Mr So and So, you're using my client's copyrighted works, without permission, in ways that the copyright law says that you can't do. And if you don't stop, I'm going to go to a judge, and I'm going to say, judge, my copyrighted works, their infringing activity, give me an injunction, give me damages.

At this point, there are two things the defendant can do. He can stand up and say, your honour, he's right, I have no permission at all. But that's not going to lead to a good outcome. Or he can stand up and say, but your honour, I do have permission. My permission is the GPL. At which point, I'm going to say back, well, your honour, that's a nice story, but he's not following the instructions of the GPL, so he doesn't really have the shelter he claims to have.

But note that either way, the one thing he can't say is, your honour, I have this wonderful permission and it's worthless. I have this wonderful permission, and it's invalid, I have this wonderful permission and it's broken.

In other words, there is no situation in which the brokenness or otherwise of the GPL is ever an issue: whichever is true, violators are well and truly stuffed.

(If you're interested in how, against this background, the GPL is enforced in practice, Moglen has written his own lucid explanations.)

20 March 2006

What Open Source Can Learn from Microsoft

In case you hadn't noticed, there's been a bit of a kerfuffle over a posting claiming that a Firefox 2.0 alpha had been released. However, this rumour has been definitively scotched by one of the top Firefox people on his blog, so you can all relax now (well, for a couple of days, at least, until the real alpha turns up).

And who cares whether the code out there is an alpha, or a pre-alpha or even a pre-pre-alpha? Well, never mind who cares, there's another point that everyone seems to be missing: that this flurry of discoveries, announcements, commentaries, denials and more commentaries is just what Firefox needs as it starts to become respectable and, well, you know, slightly dull.

In fact, the whole episode should remind people of a certain other faux-leak about a rather ho-hum product that took place fairly recently. I'm referring to the Origami incident a couple of weeks ago, which produced an even bigger spike in the blogosphere.

It's the same, but different, because the former happened by accident in a kind of embarrassed way, while the latter was surely concocted by sharp marketing people within Microsoft. So, how about if the open source world started to follow suit by "leaking" the odd bit of code to selected bloggers who can be relied upon to get terribly agitated and to spread the word widely?

At first sight, this seems to be anathema to a culture based on openness, but there is no real contradiction. It is not a matter of hiding anything, merely making the manner of its appearance more tantalising - titillating, even. The people still get their software, the developers still get their feedback. It's just that everyone has super fun getting excited about nothing - and free software's market share inches up another notch.

19 March 2006

How Do I Blog Thee?

Let me count the ways.

List blog

The original: lots and lots of links to things with no theme but their sum.

Diary blog

The other original - but don't try this at home unless you are really interesting.

Shard blog

Not quite a list blog, not quite a diary blog: instead, small fragments of a life refracted through the links encountered each day.

News blog

Lots of useful links on a well-defined subject area, plus quotes and the odd dash of intelligent comment.

Essay blog

Longer, more thoughtful postings, typically one per day: mental meat to chew on.

Photo blog

A picture is worth a thousand blog postings.

Video blog

Done well, this is the ultimate magic casement in the middle of your screen, a window on another world.

18 March 2006

Economistical with the Truth

The Economist is a strange beast. It has a unique writing style, born of the motto "simplify, then exaggerate"; and it has an unusual editorial structure, whereby senior editors read every word written by those reporting to them - which means the editor reads every word in the magazine (at least, that's the way it used to work). Partly for this reason, nearly all the articles are anonymous: the idea is that they are in some sense a group effort.

One consequence of this anonymity is that I can't actually prove I've written for the title (which I have, although it was a long time ago). But on the basis of a recent showing, I don't think I want to write for it any more.

The article in question, which is entitled "Open, but not as usual", is about open source, and about some of the other "opens" that are radiating out from it. Superficially, it is well written - as a feature that has had multiple layers of editing should be. But on closer examination, it is full of rather tired criticisms of the open world.

One of these in particular gets my goat:

...open source might already have reached a self-limiting state, says Steven Weber, a political scientist at the University of California at Berkeley, and author of “The Success of Open Source” (Harvard University Press, 2004). “Linux is good at doing what other things already have done, but more cheaply—but can it do anything new? Wikipedia is an assembly of already-known knowledge,” he says.

Well, hardly. After all, the same GNU/Linux can run globe-spanning grids and supercomputers; it can power back-office servers (a market where it bids fair to overtake Microsoft soon); it can run on desktops without a single file being installed on your system; and it is increasingly appearing in embedded devices - MP3 players, mobile phones and the like. No other operating system has ever achieved this portability or scalability. And then there are the more technical aspects: GNU/Linux is simply the most stable, most versatile and most powerful operating system out there. If that isn't innovative, I don't know what is.

But let's leave GNU/Linux aside, and consider what open source has achieved elsewhere. Well, how about the Web for a start, whose protocols and underlying software have been developed in a classic open source fashion? Or what about programs like BIND (which runs the Internet's name system), or Sendmail, the most popular email server software, or maybe Apache, which is used by two-thirds of the Internet's public Web sites?

And then there's MediaWiki, the software that powers Wikipedia (and a few other wikis): even if Wikipedia were merely "an assembly of already-known knowledge", MediaWiki (built on the open source PHP language and MySQL database) supports an unprecedentedly large assembly, unmatched by any proprietary system. Enough innovation for you, Mr Weber?

But the saddest thing about this article is not so much these manifest inaccuracies as the reason why they are there. Groklaw's Pamela Jones (PJ) has a typically thorough commentary on the Economist piece. From corresponding with its author, she says "I noticed that he was laboring under some wrong ideas, and looking at the finished article, I notice that he never wavered from his theory, so I don't know why I even bothered to do the interview." In other words, the feature is not just wrong, but wilfully wrong, since others, like PJ, had carefully pointed out the truth. (There's an old saying among journalists that you should never let the facts get in the way of a good story, and it seems that The Economist has decided to adopt this as its latest motto.)

But there is a deeper irony in this sad tale, one carefully picked out by PJ:

There is a shocking lack of accuracy in the media. I'm not at all kidding. Wikipedia has its issues too, I've no doubt. But that is the point. It has no greater issues than mainstream articles, in my experience. And you don't have to write articles like this one either, to try to straighten out the facts. Just go to Wikipedia and input accurate information, with proof of its accuracy.

If you would like to learn about Open Source, here's Wikipedia's article. Read it and then compare it to the Economist article. I think then you'll have to agree that Wikipedia's is far more accurate. And it isn't pushing someone's quirky point of view, held despite overwhelming evidence to the contrary.

When Wikipedia gets something wrong, you can correct it by pointing to the facts; when The Economist gets it wrong - as in the piece under discussion - you are stuck with an article that is, at best, Economistical with the truth.

17 March 2006

Google's Grief, Open Source's Gain?

The news that a judge has ordered Google to turn over all emails from a Gmail account, including deleted messages, has predictably sent a shiver of fear down the collective spine of the wired community, all of whom by now have Gmail accounts. Everybody can imagine themselves in a similar situation, with all their most private online thoughts suddenly revealed in this way.

The really surprising thing about this development is not that it's happened, but that anyone considers it surprising. Lawyers were bound to be tempted by all the unguarded comments lying in emails, and judges were bound to be convinced that, since they existed, it was legitimate to look at them for evidence of wrong-doing. And Google, ultimately, is bound to comply: after all, it's in the business of making money, not of martyrdom.

So the question is not so much "What can we do to stop such court orders being made and executed?" as "What can we do to mitigate them?"

Moving to another email provider like Yahoo or Hotmail certainly won't help. And even setting up your own SMTP server to send email won't do much good, since your ISP probably has copies of bits of your data lying around on its own servers that sooner or later will be demanded by somebody with a court order.

The only real solution seems to be to use strong encryption to make each email message unreadable except by the intended recipient (though even this has its obvious weaknesses).

It would, presumably, be relatively simple for Google to add this to Gmail. But even if it won't, there is also a fine open source project called Enigmail, which is an extension to the Mozilla family of email readers - Thunderbird et al. - currently nearing version 1.0. The problem is that installation is fairly involved, since you must first set up GnuPG, which provides the cryptographic engine. If the free software world could make this process easier - a click, a passphrase and you're done - Google's present grief could easily be turned into open source's opportunity.
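As a rough sketch of what that "click, passphrase, done" flow has to wrap, the snippet below shells out to the gpg command line to encrypt a message so that only the holder of the recipient's private key can read it. It assumes GnuPG is installed and the recipient's public key has already been imported; the email address and message are purely illustrative.

```python
# A minimal sketch of the encryption step that Enigmail automates: hand the
# message body to GnuPG, get back ciphertext that only the intended
# recipient can decrypt. Everything else about the mail stays as it was.
import subprocess

def encrypt_for(recipient: str, message: str) -> str:
    """Return an ASCII-armoured OpenPGP encryption of message for recipient."""
    result = subprocess.run(
        ["gpg", "--encrypt", "--armor", "--recipient", recipient],
        input=message.encode(),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode()

if __name__ == "__main__":
    ciphertext = encrypt_for("alice@example.org", "Meet me at the data centre.")
    print(ciphertext)   # this, not the plaintext, is what ends up on the server
```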

16 March 2006

The Power of Open Genomics

The National Human Genome Research Institute (NHGRI), one of the National Institutes of Health (NIH), has announced the latest round of mega genome sequencing projects - effectively the follow-ons to the Human Genome Project. These are designed to provide a sense of genomic context, and to allow the interesting hidden structures within the human genome to be teased out bioinformatically, by comparing it with the genomes of species whose lineages diverged from ours at various points in the distant past.

Three more primates are getting the NHGRI treatment: the rhesus macaque, the marmoset and the orangutan. But alongside these fairly obvious choices, eight more mammals will be sequenced too. As the press release explains:

The eight new mammals to be sequenced will be chosen from the following 10 species: dolphin (Tursiops truncates), elephant shrew (Elephantulus species), flying lemur (Dermoptera species), mouse lemur (Microcebus murinus), horse (Equus caballus), llama (Llama species), mole (Cryptomys species), pika (Ochotona species), a cousin of the rabbit, kangaroo rat (Dipodomys species) and tarsier (Tarsier species), an early primate and evolutionary cousin to monkeys, apes, and humans.

If you are not quite sure whom to vote for, you might want to peruse a great page listing all the genomes currently being sequenced for the NHGRI, which provides links to a document (.doc, alas, but you can open it in OpenOffice.org) explaining why each is important (there are pix, too).

More seriously, it is worth noting that this growing list makes ever more plain the power of open genomics. Since all of the genomes will be available in public databases as soon as they are completed (and often before), this means that bioinformaticians can start crunching away with them, comparing species with species in various ways. Already, people have done the obvious things like comparing humans with chimpanzees, or mice with rats, but the possibilities are rapidly becoming extremely intriguing (tenrec and elephant, anyone?).

And beyond the simple pairing of genomes - which already offers a richness that grows as the square of the number of genomes available - there are even more inventive combinations involving the comparison of multiple genomes that may reveal particular aspects of the Great Digital Tree of Life, since everything may be compared with everything, without restriction. Now imagine trying to do this if genomes had been patented, and groups of them belonged to different companies, all squabbling over their "IP". The case for open genomics is proved, I think.
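To put a rough number on that square-law point, here is a tiny illustrative calculation of how fast the comparison space grows; the species list is just a sample drawn from those mentioned above, not the full roster.

```python
# How quickly the comparison space grows: n genomes give n*(n-1)/2 pairings,
# and far more once three-way and larger combinations are allowed.
from math import comb

genomes = ["human", "chimpanzee", "rhesus macaque", "marmoset", "orangutan",
           "mouse", "rat", "dolphin", "elephant shrew", "tenrec", "elephant"]

n = len(genomes)
print(f"{n} genomes -> {comb(n, 2)} pairwise comparisons")        # 11 -> 55
print(f"{n} genomes -> {comb(n, 3)} three-way comparisons")       # 11 -> 165
print(f"{n} genomes -> {2**n - n - 1} multi-genome combinations") # all subsets of two or more
```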

15 March 2006

Microsoft Goes (a Bit More) Open Source

Many people were amazed back in 2004 when Microsoft released its first open source software, Windows Installer XML (WiX). But this was only the first step in a long journey towards openness that Microsoft is making - and must continue to make - for some time to come.

It must make it because the traditional way of writing software simply doesn't work for the ever-more complex, ever-more delayed projects that Microsoft is engaged upon: Brooks' Law, which states that "adding manpower to a late software project makes it later", will see to this if nothing else does.

Microsoft itself has finally recognised this. According to another fine story from Mary Jo Foley, who frequently seems to know more about what's happening in the company than Bill Gates does:

Beta testing has been the cornerstone of the software development process for Microsoft and most other commercial software makers for as long as they've been writing software. But if certain powers-that-be in Redmond have their way, betas may soon be a thing of the past for Microsoft, its partners and its customers.

The alternative is to adopt a more fluid approach that is a commonplace in the open source world:

Open source turned the traditional software development paradigm on its head. In the open source world, testers receive frequent builds of products under development. Their recommendations and suggestions typically find their way more quickly into developing products. And the developer community is considered as important to writing quality code as are the "experts" shepherding the process.

One approach to mitigating the effects of Brooks' Law is to change the fashion in which the program is tested. Instead of doing this in a formal way with a few official betas - which tend to slow down the development process - the open source method allows users to make comments earlier and more frequently on multiple builds as they are created, and without hindering the day-to-day working of developers, who are no longer held hostage by artificial beta deadlines that become ends in themselves rather than means.

E-commerce 2.0

It is striking how everybody is talking about Web 2.0, and yet nobody seems to mention e-commerce 2.0. In part, this is probably because few have managed to work out how to apply Web 2.0 technologies to e-commerce sites that are not directly based on selling those technologies (as most Web 2.0 start-ups are).

For a good example of what an e-commerce 2.0 site looks like, you could do worse than try Chinesepod.com (via Juliette White), a site that helps you learn Mandarin Chinese over the Net.

The Web 2.0-ness is evident in the name - though I do wish people would come up with a different word for what is, after all, just an mp3 file. It has a viral business model - make the audio files of the lessons freely available under a Creative Commons licence so that they can be passed on, and charge for extra features like transcripts and exercises. The site even has a wiki (which has some useful links).

But in many ways the most telling feature is the fact that as well as a standalone blog, the entire opening page is organised like one, with the lessons arranged in reverse chronological order, complete with some very healthy levels of comments. Moreover, the Chinesepod people (Chinese podpeople?) are very sensibly drawing on the suggestions of their users to improve and extend their service. Now that's what I call e-commerce 2.0.

14 March 2006

Will Data Hoarding Cost 150 Million Lives?

The only thing separating mankind from a pandemic that could kill 150 million people is a few changes in the RNA of the H5N1 avian 'flu virus. Those changes would make it easier for the virus to infect humans and to pass between them, rather than between birds. Research into the causes of the high death-rate among those infected by the Spanish 'flu - which killed between 50 and 100 million people in 1918 and 1919, even though the world population was far lower then than now - shows that it was similar changes in a virus otherwise harmless to humans that made the Spanish 'flu so lethal.

The good news is that with modern sequencing technologies it is possible to track those changes as they happen, and to use this information to start preparing vaccines that are most likely to be effective against any eventual pandemic virus. As one recent paper on the subject put it:

monitoring of the sequences of viruses isolated in instances of bird-to-human transmission for genetic changes in key regions may enable us to track viruses years before they develop the capacity to replicate with high efficiency in humans.

The bad news is that most of those vital sequences are being kept hidden away by the various national laboratories that produce them. As a result, thousands of scientists outside those organisations do not have the full picture of how the H5N1 virus is evolving, medical communities cannot plan properly for a pandemic, and drug companies are hamstrung in their efforts to develop effective vaccines.

The apparent reason for the hoarding - that some scientists want to publish their results in slow-moving printed journals first, so as to be sure of being accorded full credit by their peers - beggars belief against a background of growing pandemic peril. Open access to data has never looked more imperative.

Although the calls to release this vital data are gradually becoming more insistent, they still seem to be falling on deaf ears. One scientist who has been pointing out the folly of the current situation for longer than most is the respected researcher Harry Niman. He has had a distinguished career in the field of viral genomics, and is the founder of the company Recombinomics.

The news section of that site has long been the best place to find out about the latest developments in the field of avian 'flu. This is for three reasons: Niman's deep knowledge of the subject, his meticulous scouring of otherwise-neglected sources to find out the real story behind the news, and - perhaps just as important - his refusal meekly to toe the line that everything is under control. For example, he has emphasised that the increasing number of infection clusters indicates that human-to-human transmission is now happening routinely, in flat contradiction to the official analysis of the situation.

More recently, he has pointed out that the US decision to base its vaccine on a strain of avian 'flu found in Indonesia is likely to be a waste of time, since the most probable pandemic candidate has evolved away from this.

The US Government's choice is particularly worrying because human cases of avian 'flu in North America may be imminent. In another of Niman's characteristically forthright analyses, he suggests that there is strong evidence that H5N1 is already present in North America:

Recombinomics is issuing a warning based on the identification of American sequences in the Qinghai strain of H5N1 isolated in Astrakhan, Russia. The presence of the America sequences in recent isolates in Astrakhan indicates H5N1 has already migrated to North America. The levels of H5N1 in indigenous species will be supplemented by new sequences migrating into North America in the upcoming months.

Niman arrived at this conclusion by tracking the genomic changes in the virus as it travelled around the globe with migrating birds, using some of the few viral sequences that have been released.
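As a purely illustrative sketch of what that kind of tracking involves, the snippet below lines up a short stretch of a reference sequence against the same region from a new isolate and reports the substitutions; the sequences, coordinates and region are invented, not real H5N1 data.

```python
# A toy version of mutation tracking: compare the same genomic region from a
# reference strain and a new isolate, and list the positions that differ.
# Real surveillance works on full segments pulled from sequence databases.
def substitutions(reference: str, isolate: str, offset: int = 0):
    """Yield (position, reference_base, isolate_base) for each difference."""
    for i, (ref_base, iso_base) in enumerate(zip(reference, isolate)):
        if ref_base != iso_base:
            yield offset + i + 1, ref_base, iso_base

REGION_START = 623                      # hypothetical coordinate of the region
reference_region = "ATGGAGAAAATAGTGCTTCTT"
isolate_region   = "ATGGAGAAGATAGTACTTCTT"

for pos, old, new in substitutions(reference_region, isolate_region, REGION_START):
    print(f"position {pos}: {old} -> {new}")   # flags the two changed bases
```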

Let's hope for the sake of everyone that WHO and the other relevant organisations see the light and start making all the genomic data available. This would allow Niman and his many able colleagues to monitor even the tiniest changes, so that the world can be alerted at the earliest possible moment to the start of a pandemic that may be closer than many think.

Update: In an editorial, Nature is now calling for open access to all this genomic data. Unfortunately, the editorial is not open access....