28 February 2006

Open Source, Opener Source

Brian Behlendorf is an interesting individual: one of those quietly-spoken but impressive people you meet sometimes. When I talked to him about the birth of Apache - which he informed me was not "a patchy" server, as the folklore would have it, just a cool name he rather liked - he was working at CollabNet, which he had helped to found.

He's still there, even though the company has changed somewhat from those early days. But judging from a characteristically thought-provoking comment reported in one of ZDNet's blogs, he's not changed so much, and is still very much thinking and doing at the leading edge of open source.

In the blog, he was reported as saying that he saw more and more "ordinary people" being enfranchised as coders thanks to the new generation of programming models around - notably Ajax and the mashup - that made the whole process far easier and less intimidating.

If he's right, this is actually a profound shift, since ironically the open source model has been anything but open to the general public. Instead, you had to go through a kind of apprenticeship before you could stalk the hallowed corridors of geek castle.

If open really is becoming opener, it will be no bad thing, because the power of free software depends critically on having lots of programmers, and a good supply of new ones. Anything that increases the pool from which they can be drawn will have powerful knock-on effects.

Blogroll, Drumroll

This fellow Blogger blogger is well worth taking a look at if you're interested in science and technology (well, that's everybody, isn't it?). Not so much for the blog entries - which are interesting enough - but for the astonishing blogroll, which includes several hundred links to a wide variety of interesting-looking sites, all neatly categorised. I have literally never seen anything like it - but maybe I'm just provincial.

I found Al Fin - for such is its suitably gnomic moniker - through another site that is worth investigating. Called Postgenomic (well, with a name like that, I had to take a look), it "collates posts from life science blogs and then does useful and interesting things with that data" according to the site. The "interesting things" seem to amount largely to collation of data from various sources; there is also a good blog list, though not quite as impressive as Al Fin's.

Wanted: a Rosetta for the MegaWikipedia

As I write, Wikipedia has 997,131 articles - close to the magic, if totally arbitrary, one million (if we had eleven fingers, we'd barely be halfway to the equally magic 1,771,561): the MegaWikipedia.
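For anyone who wants to check the eleven-finger aside: a "million" is a one followed by six zeros, so in base 11 it is 11 to the power 6. The short sketch below works that out and compares it with the article count quoted above; the function name is purely illustrative.

```python
def magic_million(base: int) -> int:
    """Return the value written as 1,000,000 in the given base (base ** 6)."""
    return base ** 6


# Article count quoted in the post for the English-language Wikipedia.
articles = 997_131

decimal_million = magic_million(10)    # 1,000,000
undecimal_million = magic_million(11)  # 1,771,561

# Fraction of the way to each "magic" number.
share_base_10 = articles / decimal_million
share_base_11 = articles / undecimal_million

print(f"Base-10 million: {decimal_million:,} ({share_base_10:.1%} reached)")
print(f"Base-11 million: {undecimal_million:,} ({share_base_11:.1%} reached)")
```

On these figures the count is a little over half of the base-11 target, which is what "barely halfway" amounts to.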

Except that it's not there, really. The English-language Wikipedia may be approaching that number, but there are just eight other languages with more than 100,000 articles (German, French, Italian, Japanese, Dutch, Polish, Portuguese and Swedish), and 28 with more than 10,000. Most have fewer than 10,000.

Viewed globally, then, Wikipedia is nowhere near a million articles on average, across the languages.

The disparity between the holdings in different languages is striking; it is also understandable, given the way Wikipedia - and the Internet - arose. But the question is not so much Where are we? as Where do we go from here? How do we bring most of the other Wikipedias - not just five or six obvious ones - up to the same level of coverage as the English one?

Because if we don't, Wikipedia will never be that grand, freely-available summation of knowledge that so many hope for: instead, it will be a grand, freely-available summation of knowledge for precisely those who already have access to much of it. And the ones who actually need that knowledge most - and who currently have no means of accessing it short of learning another language (for which they probably have neither the time nor the money) - will be excluded once more.

Clearly, this global Wikipedia cannot be achieved simply by hoping that there will be enough volunteers to write all the million articles in all the languages. In any case, this goes against everything that free software has taught us - that the trick is to build on the work of others, rather than re-invent everything each time (as proprietary software is forced to do). This means that most of the articles in non-English tongues should be based on those in English. Not because English is "better" as a language, or even because its articles are "better": but simply because they are there, and they provide the largest foundation on which to build.

A first step towards this is to use machine translations, and the new Wikipedia search engine Qwika shows the advantages and limitations of taking this approach. Qwika lets you search in several languages through the main Wikipedias and through machine-translations of the English version. In effect, it provides a pseudo-conversion of the English Wikipedia to other tongues.

But what is needed is something more thoroughgoing, something formal - a complete system for expediting the translation of all of the English-language articles into other languages. And not just a few: the system needs to be such that any translator can use it to create new content based on the English items. The company behind Ubuntu, Canonical, already has a system that does something similar for people who are translating open source software into other languages. It's called, appropriately enough, Rosetta.

Now that the MegaWikipedia is in sight - for Anglophones, at least - it would be the perfect time to move beyond the successful but rather ad hoc approach currently taken to creating multilingual Wikipedia content, and to put the Net's great minds to work on the creation of something better - something scalable: a Rosetta for the MegaWikipedia.

What better way to celebrate what is, for all the qualifications above, truly a milestone in the history of open content, than by extending it massively to all the peoples of the world, and not just to those who understand English?

27 February 2006

(B)looking Back

I wondered earlier whether blogified books were bloks or blooks, and the emerging view seems to be the latter, not least because there is now a Blooker Prize, analogous to the (Man)Booker Prize for dead-tree stuff.

I was delighted to find that the Blooker is run by Lulu.com, discussed recently by Vic Keegan in the Guardian. Lulu is essentially micro-publishing, or publishing on demand: you send your digital file, they send the physical book - as many or as few copies as you like. You can also create music CDs, video DVDs and music downloads in the same way; Lulu.com handles the business end of things, and takes a cut for its troubles.

Nonetheless, the prices are extremely reasonable - if you live in the US: as Vic points out, the postage costs for books, for example, tend to nullify the attractiveness of this approach for anyone elsewhere in the world, at least from a financial point of view. But I don't think that this will be a problem for long. For Lulu.com is the brainchild of Bob Young, the marketing brains behind Red Hat, still probably the best-known GNU/Linux distribution for corporates.

I emphasise the marketing side, since the technical brains behind the company was Marc Ewing, who also named the company. As he explained to me when I was writing Rebel Code:

In college I used to wear my grandfather's lacrosse hat, which was red and white striped. It was my favourite hat, and I lost it somewhere in Philadelphia in my last year. I named the company to memorialise the hat. Of course, Red and White Hat Software wasn't very catchy, so I took a little liberty.

Young, a Canadian by birth, was the perfect complement to the hacker Ewing. He is the consummate salesman, constantly on the lookout for opportunities. His method is to get close to his customers, to let them tell him what he should be selling them. The end-result of this hands-on approach was that he found himself in the business of selling a free software product: GNU/Linux. It took him a while to understand this strange, topsy-turvy world he tumbled into, but being a shrewd chap, and a marketeer of genius, he managed it when he finally realised:

that the one unique benefit that was creating this enthusiasm [for GNU/Linux] was not that this stuff was better or faster or cheaper, although many would argue that it is all three of those things. The one unique benefit that the customer gets for the first time is control over the technology he's being asked to invest in.

Arguably it was this early understanding of what exactly he was selling - freedom - that helped him make Red Hat the first big commercial success story of the open source world: on 11 August 1999, the day of its IPO, Red Hat's share price went from $14 to $52, valuing the company that sold something free at $3.5 billion.

It also made Young a billionaire or thereabouts. He later left Red Hat, but has not lost the knack for pursuing interesting ideas. Even if Lulu.com isn't much good for those of us on the wrong side of the Atlantic, it can only be a matter of time before Bob listens to us Brit users (to say nothing of everyone else in the world outside the US) and puts some infrastructure in place to handle international business too.

26 February 2006

The First Blogger - and His Chaos

Wandering around the Net (as one does) I came across this: certainly one of the least-attractive sites that I've seen in a long time. But as soon as I noticed that familiar face in the top left-hand corner, I knew where I was: back in Chaos Manor.

Now, for younger readers, those two words might not mean much, but for anyone privileged enough to have lived through the early years of the microcomputing revolution, as chronicled in Byte magazine (now a rather anonymous Web site), they call forth a strange kind of appalled awe.

For Pournelle's columns - which still seem to exist in cyber-versions if you are a subscriber - consisted of the most mind-numbingly precise descriptions of his struggles to install software or add an upgrade board to one of his computers, all of which were endowed with names like "Anastasia" and "Alex".

Along the way he'd invariably drop in references to what he was doing while conducting this epic struggle, the latest goings-on in space exploration (one of his enthusiasms) plus the science-fiction book he was writing at the time (he always seemed to be writing new ones each month - impressive).

The net effect was that his articles ran to pages and pages of utterly irrelevant - but compulsively fascinating - detail about his daily and working life. I half-dreaded and half-longed for the monthly delivery of Byte, since I knew that I would soon be swept away on this irresistible and unstoppable torrent of high-tech logorrhea.

Visiting the site, I noticed the line "The Original Blog", linked to the following text:

I can make some claim to this being The Original Blog and Daybook. I certainly started keeping a day book well before most, and long before the term "blog" or Web Log was invented. BIX, the Byte exchange, preceeded the Web by a lot, and I also had a daily journal on GE Genie.

And in a flash, I realised why I had been mesmerised by but ambivalent about Pournelle's outpourings all those years ago. They did indeed form a proto-blog, with all the blog's virtues - a captivating first-person voice weaving a story over time - and all of its vices - a level of information way beyond what any sane person should really want to know, given the subject-matter.

Pournelle is right: he probably was the first blogger, but working on pre-Internet time - one posting a month, not one a day. However, it is hard to tell whether what we now know as blogs took off all those years later because of his pioneering example - or in spite of it.

JISC for Fun

I've written before about what seems to me the huge missed opportunity for free software in education. Of course, this is a two-way thing: as well as coders making more effort to serve the education sector, it would be nice to see education deploying more open source.

And hey presto, along comes the Joint Information Systems Committee (JISC), a government-funded UK body that offers advice and strategic guidance to further and higher education establishments, with a neat briefing paper on the very subject. What makes this doubly interesting is that last year it published a similar paper on open access: clearly things are beginning to click here.

What makes this trebly interesting - well, to me at least - is that I've been asked to speak at the Open Source and Sustainability conference in Oxford this April, organised by the JISC-funded OSS Watch.

"JISC for fun" as Linus almost said.

24 February 2006

Google's Creeping Cultural Imperialism

Another day, another Google launch.

As the official Google blog announced, the company is launching a pilot programme to digitise national archive content "and offer it to everyone in the world for free."

And what national archives might these be? Well, not just any old common-or-garden national archives, but "the National Archives", which as Google's blog says:

was founded with the express purpose of ... serving America by documenting our government and our nation.

Right, so these documents are fundamentally "serving America". A quick look at what's on offer reveals the United Motion Newsreel Pictures, a series which, according to the accompanying text, "was produced by the Office of War Information and financed by the U. S. government", and was "[d]esigned as a counter-propaganda medium."

So there we have it: this is (literally) vintage propaganda. And nothing wrong with that: everybody did it, and it's useful to be able to view how they did it. But as with the Google Print/Books project, there is a slight problem here.

When Google first started, it did not set out to become a search engine for US Web content: it wanted it all - and went a long way to achieving that, which is part of its power. But when it comes to books, and even more where films are concerned, there is just too much to hope to encompass; of necessity, you have to choose where to start, and where to concentrate your efforts.

Google, quite sensibly, has started with those nearest home, the US National Archives. But I doubt somehow that it will be rushing to add to other nations' archives. Of course, those nations could digitise and index their own archives - but it wouldn't be part of the Google collection, which would always have primacy, even if the indexed content were submitted to them.

It's a bit like Microsoft's applications: however much governments tell the company to erect Chinese walls between the programmers working on Windows and those working on applications, there is bound to be some leakiness. As a result, Windows programs from Microsoft have always had an advantage over those from other companies. The same will happen with Google's content: anything it produces will inevitably be more tightly integrated into its search engine.

And so, wittingly or not, Google becomes an instrument of cultural imperialism, just like that nice Mr Chirac warned. The problem is that there is nothing so terribly wrong with what Google is doing, or even the way that it is doing it; but it is important to recognise that these little projects that it sporadically announces are not neutral contributions to the sum of the world's open knowledge, but come with very particular biases and knock-on effects.

Watching IP Watch

Another great site revealed by Open Access News and the indefatigable Peter Suber: IP Watch. Intellectual property - the very term is hated by people like Richard Stallman - is one of those areas that can make your toes curl just at the thought of the mind-numbing subtleties and complexities involved. But make no mistake: this is an area that matters, and more than ever as the analogue and digital worlds are becoming fuzzy at the edges, and people are tempted to apply old analogue rules to new digital creations.

This lends one story on IP Watch a particular importance, since it deals with the thorny area of balancing rights of digital access and protection. What makes the story particularly interesting is that it reports on a "side event" of a World Intellectual Property Organization (WIPO) meeting.

Now, this may not seem like a big deal, but traditionally WIPO has been all about protecting the intellectual property rights of big content "owners" - record companies, the film industry, etc. So the idea that there could be some discussion about the other side of the coin - access rights - even at a non-official meeting, was a tiny but momentous step.

The end of the journey, which promises to be both long and arduous, is a recognition that access is just as important as protection, that the open approach is as valid as the proprietary one. Although this might seem milk and water stuff to long-time proponents of free software and related movements, it would be truly revolutionary if WIPO ever reached such a point.

Indeed, I would go so far as to predict that WIPO will be one of the last bastions to fall when it comes to recognising that a fresh wind is blowing through the world of intellectual domains - and that these domains might just be commons rather than property as WIPO's own name so prominently and problematically assumes.

23 February 2006

The Blogification of the Cyber Union

I suppose it was inevitable that Google would go from being regarded as quite the dog's danglies to being written off as a real dog's breakfast, but I think that people are rather missing the point of the latest service, Google Page Creator.

Despite what many think, Google is not about ultra-cool, Ajaxic, Javascripty, XMLifluous Web 2.0 mashups: the company just wants to make it as easy as possible for people to do things online. Because the easier it is, the more people will turn to Google to do these things - and the more the advertising revenue will follow.

Google's search engine is a case in point, and Blogger is another. As Blogger's home page explains, you can:

Create a blog in 3 easy steps: (1) Create an account (2) Name your blog (3) Choose a template

and then start typing.

Google Page Creator is just the same - you don't even have to choose a name, you just start typing into the Web page template. In other words, it has brought the blog's ease of use to the creation of Web sites.

This blogification of the Internet is a by-product of the extraordinary recent rise of blogs. As we know, new blogs are popping up every second (and old ones popping their clogs only slightly more slowly). This means that for many people, the blog is the new face of the Web. There is a certain poetic justice in this, since the original WorldWideWeb created by Tim Berners-Lee was a browser-editor, not simply a read-only application.

For many Net users, then, the grammar of the blog - the way you move round it and interact with its content - is replacing the older grammar of traditional Web pages. These still exist, but they are being shadowed and complemented by a new set of Web 2.0 pages - the blogs that are being bolted on by sites everywhere. They function as a kind of gloss explaining the old, rather incomprehensible language of Web 1.0 to the inhabitants of the brave new blogosphere.

Even books are being blogified. For example, Go It Alone!, by Bruce Judson, is freely available online, and supported by Google Ads alongside the text (like a blog) that is broken up into small post-like chunks. The only thing missing is the ability to leave comments, and I'm sure that future blogified books (bloks? blooks?) will offer this and many other blog-standard features.

Update: Seems that it's "blook" - and there's even a "Blooker Prize" - about which, more anon.

22 February 2006

Wackypedia

In case you haven't seen this on digg.com yet: a Wikipedia page devoted to "unusual articles" in Wikipedia. What's impressive is not just the sheer wackiness of some of these, but how many have been brought together here.

This is of more than passing interest. As the Edit history shows, the page's length - and richness - is down to the number of people who have contributed. In a way, this page single-handedly demonstrates the real strengths of the collaborative approach when applied to appropriate domains.

10 Things to Build a Blog Readership

1: A clear idea of what you are trying to do

If you want to get and keep an audience for a blog, you need to have a clear idea about a couple of things: what you are writing about, and who for. The word “blog” may derive from “web log”, but if all you do is write a kind of haphazard online diary, only your mother will read it. Don't be fooled by the top bloggers: even though they often seem to be jotting down random thoughts about their day, there is terrific method in their madness – which is why they are top bloggers. Even if they don't articulate it, there is usually a strong underlying theme to their posts. Until you become an A-list blogger, and can do it reflexively, you need to think about what you are trying to say, and to whom, all the time.

2: Strong, interesting and consistent opinions

Having a clear idea of what you are trying to do is not enough: unless you have something interesting to contribute on the subject, people who visit your blog once will not return. The good news is that it doesn't really matter what your opinions are, provided they are strong, well-expressed – and consistent. People like to have a mental image of what a blog is doing, and where it fits in the blogosphere: once you've established a certain tone or approach, don't keep changing it. It will only confuse your readers, who will get annoyed and move on.

3: The ability to think fast

This is related to the previous point. You not only need strong opinions, you need to be able to form them quickly. Blogs are not the right medium if you want to ruminate deeply about something and then post a 10,000 word essay some months later: time is of the essence (the “log” bit in web log). You need to be able to form snap judgements - but sensible snap judgements, and consistent with your previous posts. Although not crucial, it also helps if you can type fast: the sooner you get your opinion out there, the more likely it is to be a fresh viewpoint for your potential audience.

4: A self-critical attitude

So you've had an idea and typed it up: don't press the Publish button just yet. First of all, read through what you've written: check whether it's clear, and whether you could improve it. In particular, cut out anything unnecessary. People tend to skim through blogs, so you've got to make it as quick to read as possible. Once you've checked the whole thing (good grammar and correct spelling are not obligatory – blogs are informal, after all – but they do make it easier for visitors) ask yourself one final question: is it worth posting this? One of the hardest things to learn when blogging is the discipline of deleting sub-standard posts. A weak post leaves a bad impression, and lowers someone's opinion of the whole blog. If in any doubt, scrub the post and write another one.

5: Marketing skills

Once you've posted something you're happy with, you need to let the world know. One common misconception among fledgling bloggers is that quality will out – that the world will somehow guess you've written some deeply witty/profound/amazing post and rush to read it. It doesn't work like that. Instead, you need to get out there and sell your story. The first thing to do is to make sure that all the blog search engines know about your posting (an easy way to do this is to sign up with Ping-o-matic). Then you need to start posting on other people's blogs to drive some traffic back to yours.

There are two ways to do this. One is to post on anything about which you have an opinion: most blogs let you include your main blog address as a matter of course. The other is to include in your comment a direct link to one of your posts – only do this if your post is strictly relevant to what you are commenting on. One technique is to use a blog search engine like Technorati to find other blogs that are writing about the subject matter of your post: then visit them to see if you can make a sensible comment with a link back to your own blog. However, use this approach sparingly or your comments may simply not be posted.

6: Self-confidence

Even if you've left plenty of hints around the blogosphere that you've just put up an interesting post, you may well find that there's little activity on your blog. Relatively few people leave comments on blogs (and many of those are just publicising their own postings), so the absence of comments doesn't mean that nobody's visited (see below for a good way of tracking all visitors). But you mustn't give up: around 50% of all new bloggers throw in the towel within three months of starting, so if you can last longer than this you're already ahead of the field. Moreover, the longer you keep going, the more posts there will be on your site, and the more interesting material for anyone when they do visit. Rather than viewing all the unvisited posts as a waste of effort, consider them as an investment for the future.

7: Thick skin/Self-restraint

After you've waited what can seem an age to get a few comments on your posts, you may be disappointed to find that some of them are, shall we say, less than complimentary. It is a sad fact that among those most likely to take the time and trouble to write a comment on your posts are people who feel strongly that you are a complete idiot. Assuming their comment is not libellous or obscene (in which case, just delete it) the best thing to do is to reply as sensibly as possible. Do not return in kind, do not retaliate, do not mock: if you descend to their level, you just look as stupid as them. If, on the other hand, you are seen to be responding in a mature and intelligent fashion, visitors will value your blog all the more highly because they will attribute those same characteristics to the rest of your site.

8: Stamina

If you are serious about blogging, you are making a major, long-term commitment. At the beginning, you will need stamina (as well as the self-confidence mentioned above) to keep churning out posts even though few seem to be reading them. But it's even worse, later on, as you gradually gain a readership. Because now there is an implicit contract between you and your visitors: they will keep reading you, and give some of their valuable time and attention if – and it's a big “if” – you continue to post. This doesn't mean every day without fail – though that would be ideal – but it does mean several posts a week, every week. Above all, you need to establish a rhythm that visitors can depend on.

9: Humility

One of the most daunting aspects of blogging is that whatever rank you achieve on Technorati or elsewhere, you are only as good as your future posts: if you begin to post less, or start skimping in your posts, you will inevitably lose the readership you have built up so laboriously. This is obviously related to the last point: you will need stamina to keep any position you have earned. But another danger is that as you rise through the blogging ranks you might start to believe in your own importance: after all, the A-list bloggers do indeed wield great power. But that power comes from the readers, and as soon as arrogance starts to creep into posts, that audience will diminish, and with it the power. It is noticeable how humble many of the top bloggers are towards their audience, constantly thanking them for their attention and loyalty.

10: A day job

Don't delude yourself: you will never make enough money from your blog to give up the day job. Just look at the A-list: almost all of them do something else as well as blog (though quite how they find the time is one of the blogosphere's great mysteries). By all means use Google's AdSense on your site – its online Reports are invaluable, because they give an up-to-the-minute figure for the number of visitors your site has had each day. But as these will show, the sad fact is you are only likely to get one click-through per thousand visitors: even with valuable keywords paying up to a dollar a time, you will need absolutely vast traffic to make a living from this.
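To put rough numbers on that, here is a minimal back-of-the-envelope sketch. It takes the two figures above at face value (one click-through per thousand visitors, and a best-case dollar per click); the target income of $30,000 a year is purely an assumed example, and the function name is illustrative.

```python
def visitors_needed(target_income: float,
                    visitors_per_click: int = 1000,
                    revenue_per_click: float = 1.0) -> float:
    """Visitors required to earn target_income from ad click-throughs.

    visitors_per_click: how many visitors it takes, on average, to get one click.
    revenue_per_click:  dollars earned per click (a dollar is the best case above).
    """
    clicks_required = target_income / revenue_per_click
    return clicks_required * visitors_per_click


# Assumed example: a modest annual income target in dollars.
annual_target = 30_000

per_year = visitors_needed(annual_target)
per_day = per_year / 365

print(f"Visitors needed per year: {per_year:,.0f}")
print(f"Visitors needed per day:  {per_day:,.0f}")
```

Even with every click paying the top rate, that assumed target works out at 30 million visitors a year, or a little over 80,000 a day; at more typical rates per click the figure climbs further still.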

Blogging won't ever be your main job, but it might well help you get a better one. The thing to remember about blogs is that they are a great way of marketing yourself to the world – especially the parts you never knew might be interested. This is something else that you need to keep in mind when you write your blog: the fact that at any moment a future employer may be reading it.

(These comments are based on twenty-five years of journalism, including fifteen covering the Net, colliding with a few months of active blogging. Obviously, they are very early thoughts on the matter, based on limited experience. I'd be interested in the views of other people - especially those with more blogging under their belt - on what are the things you need to build up an audience for your blog.)

21 February 2006

A Question of Value

Although not quite in the same class as Open Access News in terms of signal-to-noise ratio, Slashdot does have its uses, not least as raw entertainment (it's also quite useful for a bit of blog boosting, too....). And sometimes it throws up something that is pure gold.

A case in point tonight, with a link to a story by James Boyle in today's FT. Since that story will more likely than not sink irrevocably into the limbo of subscriber-only content, I won't even bother wasting angled brackets and Href= on it. Happily, though - and this, perhaps, is the real value of Slashdot - one of the comments linked to a much fuller version of the article's underlying arguments that is freely available on Boyle's rich if horribly-designed Website (frames, in the 21st century: can you believe it?).

The item is called, intriguingly, "A Politics of Intellectual Property: Environmentalism For the Net?" Here's a short excerpt from the opening section:

The theme of cyberpunk is that the information age means the homologisation of all forms of information -- whether genetic, electronic, or demographic. I grew up believing that genes had to do with biology, petri dishes and cells and that computers had to do with punch cards and magnetic disks. It would be hard to imagine two more disparate fields. In contrast cyberpunk sees only one issue ~ code ~ expressed in binary digits or the C's,G's, A's and T's on a gene map.

This should give a hint of just how spot-on the whole piece is.

I won't attempt to summarise the whole thing here - partly because I've not digested it fully myself, and it seems too important to vitiate with my own ham-fisted approximations, and partly because Boyle's essay is, in any case, precisely the length it needs to be for a deep analysis of a complicated domain. You might read it now, or wait until I come up with some vaguely coherent thoughts in due course: this, I most certainly will do, since the issues it touches on are central to much of what I am writing about here.

You have been warned.

20 February 2006

Open Business on Open Content

Once more, the indispensable Open Access News takes me somewhere I didn't know I wanted to go. This time it's to a site called Open Business. According to its home page:

OpenBusiness is a platform to share and develop innovative Open Business ideas - entrepreneurial ideas which are built around openness, free services and free access. The two main aims of the project are to build an online resource of innovative business models, ideas and tools, and to publish an OpenBusiness Guidebook.

At the moment it seems to be another tripod, with legs in the UK, Brazil and South Africa. Its basic form is a blog, topped off with a dash of wiki.

The link that brought me here led to an interview with Esther Dyson. I have to confess that I tend to find her a little, er, light, shall we say? But this interview was an exception, and she had some interesting background to give on Del.icio.us, in which she was an angel investor.

I'm still not entirely clear what the site is doing - either strategically or structurally - but it has pointers to stuff I wasn't aware of, so it gets brownie points for that if for nothing else. One to return to.

Freedom, in Other Words

Recently the blogosphere went slightly bonkers over a story that "the Korean government plans to select a city and a university late next month where open-source software like Linux will become the mainstream operating programs." It seems to have been the concept of a "Linux city" that really caught people's attention (though surely "Linux city" must refer to Helsinki?). But the significance of this announcement is perhaps not quite what most people think.

As the article rightly pointed out, there's nothing new in the idea of a city powered by free software: the much-ballyhooed Munich conversion to open source was along the same lines. And there are plenty of other, smaller-scale deployments that show that free software is up to the job. But there is another element in this story that is in some ways more interesting.

Yes, open source can now offer all the usual elements - operating system, browser, email, office apps - that people need for their daily computing; but even more impressively, it can do this in many languages. In other words, there is now a multi-dimensional depth to free software: not only in terms of the apps that have been created, but also with regard to the languages in which many of those apps are available natively.

For example, Firefox boasts versions in languages such as Asturian, Basque, Macedonian and Slovenian, with several others - Armenian, Gujarati and Mongolian - listed as "Not Yet Available", implying that they will be. OpenOffice.org does even better, offering versions in languages such as Albanian, Azerbaijani, Galician, Khmer and Sango (new to me). There is an even longer list of languages that includes others that are being worked on.

And if that doesn't impress you, well, you must be pretty jaded linguistically. So consider this: for many of these languages, you can download the program for three platforms: GNU/Linux, Windows and MacOS X. "Get The Facts", as Microsoft loves to say; but the ones it cares to give are partial, and they strangely omit any mention of this factual strength-in-depth that only free software can offer.

Indeed, there have been various cases where national governments of smaller nations have all but begged Microsoft to port some of its products to their language, only to be refused on the grounds that it wasn't economically "viable" to do so. Which just goes to show how complicated is the warp and weft of software and power, culture and money.

Viability has never been much of an issue for free software; if it had been, Richard Stallman would never have bothered starting it all off in the first place. As far as he is concerned, there's only one word that matters, whatever the language, and that's freedom. As the sterling work underway to produce localised versions of open source software emphasises, that includes the freedom to work in your own tongue.

17 February 2006

The Economics of Open Access Books

I've written before about open access books; but such is my sad state of excitement when I come across good examples that I feel obliged to pass on another one. The home page for the book Introductory Economic Analysis by R. Preston McAfee declares itself to be "the open source introduction to microeconomics" no less, and the online blurb explains why, and also offers some interesting thoughts on the economics of academic book publishing:

Why open source? Academics do an enormous amount of work editing journals and writing articles and now publishers have broken an implicit contract with academics, in which we gave our time and they weren't too greedy. Sometimes articles cost $20 to download, and principles books regularly sell for over $100. They issue new editions frequently to kill off the used book market, and the rapidity of new editions contributes to errors and bloat. Moreover, textbooks have gotten dumb and dumber as publishers seek to satisfy the student who prefers to learn nothing. Many have gotten so dumb ("simplified") so as to be simply incorrect. And they want $100 for this schlock? Where is the attempt to show the students what economics is actually about, and how it actually works? Why aren't we trying to teach the students more, rather than less?

(This closely mirrors Linus' own feelings about the high cost and low quality of proprietary software.)

As a consequence of this unholy alliance of greed and shoddiness, McAfee suggests:

The publishers are vulnerable to an open source project: rather than criticize the text, we will be better off picking and choosing from a free set of materials. Many of us write our own notes for the course anyway and just assign a book to give the students an alternate approach. How much nicer if in addition that "for further reading" book is free.

Introductory Economic Analysis is truly open access, as the author explains: "You are free to use any subset of this work provided you don't charge for it, and you make any additions or improvements to it available under the same terms." To be precise, it is published under the Creative Commons Attribution-NonCommercial-ShareAlike 2.0 licence. Although I've only just started working through the text, I can heartily recommend it for its pervasive clarity and gentle wit - not qualities that you normally associate with economics textbooks.

As well as for the generous gift of the book itself, I'm also grateful to McAfee for the link he gives to a site that's new to me, called The Assayer, which describes itself as "the web's largest catalog of free books." It turns out the vast majority of these are in the fields of science, maths and computing. It's not a huge collection, but since it includes titles as fine as McAfee's, things are clearly looking up in the world of open access books.

Goodbye Goobuntu, We Hardly Knew Ye

Mark Shuttleworth has finally put the ridiculous Goobuntu rumour out of its misery. The idea that Google might come out with an operating system - even one based on the very-wonderful Ubuntu - was just so crazy that I really find it hard to understand why everyone - especially the blogosphere - swallowed it.

There is absolutely no fit between the commoditised operating systems market and the core competencies of Google - even if Google itself depends on GNU/Linux to keep its server farms humming. There is a world of difference between using something, and trying to sell it to everybody and their dog.

16 February 2006

There's No Such Thing as a Free...Culture

Well, Becky Hogge probably wouldn't agree. She's written a useful summary of all the IP ins and outs this year, wondering whether this will be "the year of free culture?"

I'm not holding my breath, but I was grateful to be told about the Adelphi Charter on "creativity, innovation and intellectual property", which I'd not heard of before.

Hogge concludes:

What is crucial now is that defenders of the public good vested in the democratic dissemination of information step forward to make their voices heard. Expect this columnist to return to the matter, and to amplify these voices.

I look forward to it.

15 February 2006

Can Google Measure up to Technorati?

Google has acquired Measure Map, a service that tracks visitors and links to blogs. This is of double interest to me.

First, because like all that pathetic crew known to the wider world as bloggers, I am hopelessly addicted to learning who has visited and linked to my blog (this sad human need will surely form the basis of several killer business applications - if only I could think of them...).

This acquisition places another company offering similar services, Technorati, squarely in Google's sights. It also makes Technorati rather more desirable to Google's rivals - no names, no pack drill, but you know who you are. Which brings me neatly to the second reason why this move is of interest to me, since I have an interview with Technorati's founder and CEO, Dave Sifry, in the Guardian today, which touches on many of these points.

I first interviewed Dave some six years ago, when I was writing Rebel Code. At that time, he was riding the dotcom wave with his earlier company, Linuxcare. This had come up with the wizard idea of offering third-party support for all the main open source programs that were widely used in business at the time. As a result, it had mopped up just about every top hacker outside the Linux kernel - people like Andrew Tridgell, the creator of Samba, a program that allows GNU/Linux machines to interoperate with Windows networks by acting as a file and printer server.

There is a certain irony in the fact that Google will now be a competitor to Sifry's Technorati, since Linuxcare anticipated two key Google practices: mopping up those hackers, and then encouraging them to work on ancillary projects on company time.

Sifry's explanation back in 2000 of the logic behind this approach throws some interesting light on Google's adoption of the idea:

Number one, it encourages us to get the best developers in the world. When you are actually telling people, hey, I want you to work on open source software while you're at work, that is pretty unique. And then once you get some [of the best coders], you end up getting more. Because everybody wants to work with the best people. Number two is, the more good open source software that's out there, the more people who are using open source software [there are]. And guess what, that means the more people who are going to need the services of a company like Linuxcare. Number three, when you encourage people to work on open source software while they're at work, you end up getting leaders and teams of open source engineers who now work for your company. And then lastly, but by no means least, it's great PR.

It was good to talk to Dave again, because I found that he hadn't really changed from the lively, enthusiastic, generous individual I'd discovered those years ago. It was particularly good to find that success - as I say in my Guardian piece, Technorati is either going to be bought by someone for lots of money, or make lots of money with an IPO soon - hasn't changed any of that.

More conclusive proof, if any were needed, that free software really is good for the soul.

14 February 2006

Microsoft and Open Source: Two Tales

One is about Microsoft joining with SugarCRM, which produces open source customer relationship management software, "to enhance interoperability between their respective Windows Server and SugarCRM products." A key point of this "technical collaboration" is that SugarCRM becomes "the first commercial open source application vendor to adopt the Microsoft Community License."

The latter is an interesting beast. It forms one of several Shared Source Licenses - "Shared Source" being Microsoft's attempt to counter the good vibes and reap some of the proven benefits enjoyed by open source's development methodology. Despite its name hinting otherwise, the Microsoft Community License is not included in the official list of open source licences.

One of the most interesting documents to come out of the Shared Source initiative is called "Microsoft and Open Source". What it presents is Microsoft's public analysis of free software; particularly fascinating is the wedge that it attempts to drive between what it calls "commercial and non-commercial segments". The key paragraph is as follows:

A common misperception about software developed under the open source model is that a loosely-coupled group of distributed developers is creating the software being adopted by business. Although this is true for some smaller projects, the reality is that professional corporate teams or highly structured not-for-profit organizations are driving the production, testing, distribution, and support of the majority of the key OSS technologies.

What Microsoft's document fails to note is that those "professional corporate teams" and "highly structured not-for-profit organizations" are largely filled with hackers: the fact that they are in some cases receiving good salaries instead of the usual love and glory is neither here nor there. The hacker spirit is not tamed just because it wears a corporate hat (just ask Alan Cox).

What is interesting about this is that it shows Microsoft attempting to co-opt the bulk of open source as part of the commercial world, implicitly arguing that it is the commercialism, not the open sourceness that really counts when it comes to great coding. Microsoft, needless to say, has commercialism in abundance, and so - the fallacious syllogism implies - it must be knocking out some damn fine code.

More generally, Microsoft's continuing efforts to cozy up to open source - as with the SugarCRM announcement - are really just further attempts to blur the distinction between the two, to imply that it's six of one and half a dozen of the other, and so to diminish the attractive exoticism of what has hitherto been perceived as a radically different approach.

Which brings us to the second tale.

Daniel Robbins may not be the best-known name in the free software world, but his creation, the Gentoo GNU/Linux distribution, possesses an undeniable popularity - and a certain cachet, given its reputation as being not for the faint-hearted. So when he announced that he was joining Microsoft's new Linux and Open Source Lab in order to help the company "understand open source", there was a sharp intake of collective breath around the free software world.

The news that Robbins is now leaving Microsoft, less than a year after he joined, is therefore likely to produce some similarly audible sighs of relief from that community. Although it is not entirely clear what happened, it is not hard to guess that Microsoft's desire to understand open source was not with a view to entering upon a harmonious relationship between equals.

In other words, the two tales are but one: that Microsoft is applying its considerable corporate intelligence to the conundrum of open source - hitting it, probing it, squeezing it, stroking it, modelling it, copying it, engaging with it, even - to find out where are its edges, what it's made of, how it works; and how it might be defeated.

For all the sweet talk in Microsoft's open source document mentioned above, no one should be in any doubt about the company's real objectives in all this. Its view that open source is some crazy, anti-capitalist, crypto-communistic canker, though not expressed quite so vehemently in public, is surely just as deeply held in private by its core managers (though not necessarily by its coders) as it ever was. Everything else is just a story for the children.

13 February 2006

XML Made Extravagant and Extraordinary

One of the most interesting areas in the world of open standards is the OpenDocument format, which promises to do to Microsoft Office what GNU/Linux is doing to Windows Server. I'm on various mailing lists related to this, and on one of them, from the standards body OASIS, a press release turned up in my inbox today. It proudly informed me that "its members have approved the Election Markup Language (EML) version 1.0 as an OASIS Standard, a status that signifies the highest level of ratification."

The OASIS press release told me that "EML provides a high-level overview of the processes within an electronic voting system and XML schemas for the various data interchange points between the e-voting processes," but naturally I wanted more than this dry description. So I went off to find out more. And the place I turned to was one of the most extraordinary sites on the Internet: the Cover Pages (hosted, in fact, by the self-same OASIS).

This, basically, is the fount of all wisdom for XML standards. And since XML lies at the heart of open data (and OpenDocument), this makes the Cover Pages one of the central sites for the open world. Naturally, it had all the details on EML. And here to whet your appetite are a few more of the XML Applications listed:

Weather Markup Language
Intrusion Detection Message Exchange Format
Historical Event Markup and Linking
Open Philanthropy Exchange
Green Building XML
Robotic Markup Language
Meaning Definition Language

The only question I have is how one man - since the Cover Pages seem to be the work of Robin Cover - can possibly stay on top of what seems to be all human life, neatly expressed as an XML application. Gaze, wonder and be grateful.

10 February 2006

Scrying an Oracle

This story has so many interesting elements in it that it's just got to be true.

According to Business Week, Oracle is poised to snap up no less than three open source companies: JBoss, Zend and Sleepycat Software. JBoss - which calls itself the "professional open source company", making everyone else unprofessional, I suppose - is one of the highest-profile players in this sector. Not least because its founder, the Frenchman Marc Fleury, has a tongue as sharp as his mind (you can sample his blog with this fab riff on genomics, Intelligent Design and much else).

His controversial remarks and claims in the past have not always endeared him to others in the free software world. Take, for example, the "disruptive Professional Open Source model" he proudly professes, "which combines the best of the open source and proprietary software worlds to make open source a safe choice for the enterprise and give CIOs peace of mind." Hmm, I wonder what Richard Stallman has to say about that.

JBoss has been highly successful in the middleware market: if you believe the market research, JBoss is the leader in the Java application server sector. Oracle's acquisition would make a lot of sense, since databases on their own aren't much fun these days: you need middleware to hook them up to the Internet, and JBoss fits the bill nicely. It should certainly bolster Oracle in its battle against IBM and Microsoft in the fiercely-fought database sector.

While many might regard the swallowing up of an ambivalent JBoss by the proprietary behemoth Oracle as just deserts of some kind, few will be happy to see Zend suffer the same fate. Zend is the company behind the PHP scripting language - one of the most successful examples of free software. (If you're wondering, PHP stands for "PHP: Hypertext Preprocessor" - employing your standard hacker recursive acronym naming convention.)

Where JBoss is mostly key for companies running e-commerce Web sites, say, PHP is a core technology of the entire open source movement. Its centrality is indicated by the fact that it is one of the options for the ubiquitous LAMP software stack: Linux/Apache/MySQL/PHP or Perl or Python. The fact that Oracle will own the engine that powers PHP will be worrying for many in the free software world.

About Sleepycat, I can only say: er, who? - but that's just ignorance on my part. This article explains that Sleepycat's product, Berkeley DB, is actually the "B" in LAMP. Got that? The Sleepycat blog may throw some more light on this strange state of affairs - or maybe not.

Whatever the reason that Oracle wants to get its mitts on Sleepycat as well as Zend and JBoss, one thing is abundantly clear if these rumours prove true: Oracle is getting very serious about open source.

In the past, the company has had just about the most tortuous relationship with open source of any of the big software houses. As I wrote in Rebel Code, in early July 1998, an Oracle representative said "we're not seeing a big demand from our customer that we support it" - "it" being GNU/Linux. And yet just two weeks later, Oracle announced that it was porting Oracle8 to precisely that platform. This was one of the key milestones in the acceptance of free software by business: no less a person than Eric Raymond told me that "the Oracle port announcement...made the open source concept unkillable by mere PR" - PR from a certain company being a big threat in the early days of corporate adoption.

Open source has come on by leaps and bounds since then, and these moves by Oracle are not nearly so momentous - at least for free software. But I wonder whether the otherwise canny Larry Ellison really knows what he's getting into.

Until now, Oracle has mainly interacted with open source through GNU/Linux - that is, at arm's length. If it takes these three companies on board - especially if it acquires Zend - it will find itself thrown into the maelstrom of open source culture. Here's a hint for Mr Ellison: you don't get to assimilate that culture, whatever you might be thinking of doing with the companies. You either work with it, or it simply routes around you.

Yes, I'm talking about forks here: if Oracle misplays this, and tries to impose itself on the PHP or JBoss communities, I think it will be in for a rude surprise. To its credit, IBM really got this, which is why its embrace of open source has been so successful. Whether Oracle can follow in its footsteps, only time will tell.

But the rumoured acquisitions, if they go ahead, will have one other extremely significant effect. They will instantly add credibility, viability and desirability to a host of other second-generation open source companies that have grown up in the last few years. Free software will gain an immediate boost, and hackers will suddenly find themselves in great demand again.

Given the astonishing lift-off of Google's share price, and the palpable excitement surrounding Web 2.0 technologies (and the start-ups that are working on them), the hefty price-tags on open source companies being bandied around in the context of Oracle have a feeling of déjà-vu all over again: didn't we go through all this with Red Hat and VA Linux a few years back?

You don't have to be clairvoyant - or an oracle - to see that if these deals go through, the stage is well and truly set for Dotcom Delirium 2.0.

08 February 2006

Rich Media Search

Two rather large straws in the wind: Riya - which offers face recognition in pictures with automatic tagging - and Nexidia, which claims to be "the only technology to fully leverage the actual phonemes that define human speech", creating a "highly scalable tool for audio mining and speech analytics."

Assuming these deliver on their promises, they could be very big. Imagine being able to let your PC search through thousands of photos for people - or, more interestingly, groups of people; imagine being able to upload thousands of hours of recorded conversation, and find just the spoken phrase you were looking for.

Combine this with the continuing drop in the price of hard drives - and hence the ability to store as many photos and recordings as you like - and you are getting close to a situation where you could archive all the significant visual and aural moments in your life - and find them again.

In Praise of Google Utopianism

An executive at one of the main US telecommunication companies, Verizon, has decried the fact that:

"The network builders [i.e. telecommunication companies] are spending a fortune constructing and maintaining the networks that Google intends to ride on with nothing but cheap servers."

Excuse me, isn't that what Google pays for when it buys connectivity to the Internet, and what you and I pay for when we sign up for Internet access (like here, for example)? So, in fact, the networks are already being paid twice for this service. Isn't that enough?

The only consolation is that this bizarre accusation has given birth to the rather wonderful concept of "Google Utopianism" to describe this state of affairs. Me? - I call that the latest incarnation of the Internet, personally, where those naughty "cheap servers" - mostly running GNU/Linux - are creating a vast range of exciting new services.

If that's Google Utopianism, you can call me Sir Thomas.

Word of the Week: Podfading

Podfading describes the kind of burn-out that podcasters are prone to - when the effort of recording yet another podcast proves too much, and they just give up.

Of course, the same effect can be observed more generally in the blogosphere. According to Dave Sifry, in his latest State of the Blog Nation analysis, "13.7 million bloggers are still posting 3 months after their blogs are created." Since there are around 27.2 million blogs according to Sifry, this means that nearly 50% aren't still posting after 3 months. Blogfading, anyone?
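For anyone who wants to check that "nearly 50%" for themselves, here is a minimal sketch of the sum, using only the two figures quoted from Sifry above. The function name faded_share is purely my own label for illustration, and the calculation assumes that both numbers refer to the same population of blogs, which the report may or may not intend.

```python
# Figures as quoted above from Sifry's report, in millions of blogs.
total_blogs = 27.2      # all blogs tracked
still_posting = 13.7    # blogs still posting 3 months after creation


def faded_share(total, active):
    """Return the fraction of blogs that are no longer posting.

    Hypothetical helper: assumes 'active' is a subset of 'total'.
    """
    return (total - active) / total


share = faded_share(total_blogs, still_posting)
print(f"{share:.1%} of blogs have faded")
```

On those assumptions the faded share comes out just under a half, which is where the "nearly 50%" comes from.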

07 February 2006

The Horror! The Horror!

Can any fellow-countrymen (or anyone, for that matter) explain to me why the UK now sits wallowing at the bottom of the Firefox market share table? Only 11% of us Brits use Firefox, compared to 38% of the enlightened Finnish nation, and against a Europe-wide average of 20%. Even the North Americans use it more than we do.

Why is this?