23 December 2007

Beaten to the Blog

News that IBM was buying Solid Information Technology, a company with close ties to MySQL, set off a distant bell ringing in my head in connection with something I'd written a while back, but I didn't have the time to pursue it.

Now, it seems, I don't need to:

When [Monty Widenius] started MySQL, I worked for this other small database company, Solid Information Technology. I told Monty that his project was just going to fail, and that it was a stupid thing to do, and that he didn't have a chance because we had a chance.

GM: What was your view of the Free Software world when you were at Solid--were you even aware of it?

MM: I was getting more aware of it, and I was getting excited about it. At Solid, I drove an initiative of not open-sourcing the product, but making it very popular on the Linux platform--and that was why I was an advertiser in Linux Journal, because we were the leading Linux database in the world in 1996. We gave it away free of charge, so we had taken a step in that direction.

Then Solid decided to cancel the project and just focus on high-end customers, and that's when I left the company. So in that sense, when I got to MySQL, I had some unfinished business. By that time, I had completely bought into the notion of code being open.

Thanks, Matt, for beating me to it....

22 December 2007

What's Up, UOF Doc?

The battle for the soul of the document is usually presented as a two-horse race between ODF and OOXML. But that's a very parochially Western view of things - there is, after all, a third format available: UOF, China's "Uniform Office Document Format", which I've written about several times before. If, like me, you were wondering what's happening in that world, here's a short update from Andy Updegrove.

Citizendium Goes CC-BY-SA

Good news:

In a much-awaited move, the non-profit Citizendium (http://www.citizendium.org/) encyclopedia project announced that it has adopted the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-by-sa) as the license for its own original collaborative content. The license permits anyone to copy and redevelop the thousands of articles that the Citizendium has created within its successful first year.

The license allows the Citizendium to join the large informal club of free resources associated especially with Creative Commons and the Free Software Foundation. Wikipedia uses the FSF’s GNU Free Documentation License (GFDL), which is expected to be made fully compatible with CC-by-sa in coming months. Therefore, Wikipedia and the Citizendium will be able to exchange content easily. A minority of Citizendium articles started life on Wikipedia and so have been available under the GFDL.

Avoiding a Balkanisation of the digital content commons through incompatible licences is critically important.

21 December 2007

Kids Today - The People Tomorrow

Nice story here:

I just could not find a spot on the spectrum that would trigger these kids' morality alarm. They listened to each example, looking at me like I was nuts.

Finally, with mock exasperation, I said, "O.K., let's try one that's a little less complicated: You want a movie or an album. You don't want to pay for it. So you download it."

There it was: the bald-faced, worst-case example, without any nuance or mitigating factors whatsoever.

"Who thinks that might be wrong?"

Two hands out of 500.

Now, maybe there was some peer pressure involved; nobody wants to look like a goody-goody.

Maybe all this is obvious to you, and maybe you could have predicted it. But to see this vivid demonstration of the generational divide, in person, blew me away.

I don't pretend to know what the solution to the file-sharing issue is. (Although I'm increasingly convinced that copy protection isn't it.)

Er, David, it's called changing the business model. It is just not sustainable to try to enforce analogue-type laws on digital content, and ultimately it's counterproductive - as the music industry is finding to its cost.

Samba's Big Step

On Open Enterprise blog.

Pootling Around with PDFs

I'm no big fan of PDFs, but if you've got to use them you may as well do it properly with some open source tools, such as those included here.

Hypocrisy, Thy Name is Gambling

John Naughton points us to a nicely-written piece by John Lanchester about the way the City - and its global mates - work using derivatives to the tune of $85,000,000,000,000 (sorry, no mistake in that number of zeroes.)

It's a long piece because it's describing something that's complicated - sometimes made intentionally more complicated by the banking industry for the purposes of obfuscation - but at its heart it amounts to a very simple thing: gambling. As Lanchester writes:

The list of individual traders who have lost more than a billion dollars at a time betting on derivatives is not short: Robert Citron of Orange County, Toshihide Iguchi at Daiwa, Yasuo Hamanaka at Sumitomo and Nick Leeson at Barings, just to take examples from the early 1990s. In Leeson’s case in 1995, it was a huge unauthorised position in futures on the Nikkei 225, the main Japanese stock exchange. Leeson had been doubling and redoubling his bets in the belief/hope that the index would rise, and hiding the resulting open position – a gigantic open-ended bet – in a secret account. (Incidentally, Leeson’s big bet was on the Nikkei holding its level above 18,000. At the time of writing, 12½ years later, the index sits at 15,454 – proof, if it were needed, that when prices go down they can stay that way for a long time.) The loss eventually amounted to £827 million, and destroyed Barings, Britain’s oldest merchant bank.

Got that? These are bets, pure and simple, on the way that things will work out. You can dress them up as you will, you can complexify them as you will, but at bottom they are simply gambles.

Now, add that fact to the distasteful sight of the US - a country that probably uses derivatives more than any other, and also probably makes more money from derivatives than any other - trying to stop online gambling with non-US companies, for example by buying off pathetically greedy entities like the EU:

The United States has reached a deal with the European Union, Japan and Canada to keep its Internet gambling market closed to foreign companies, but is continuing talks with India, Antigua and Barbuda, Macau and Costa Rica, U.S. trade officials said on Monday.

Since I'm no expert on derivatives, I don't know the extent to which you can buy them online from anyone anywhere, but I would be utterly astonished if you couldn't (and this suggests you can.) So you have a fundamental cognitive dissonance between the extraordinary use of derivatives worldwide, and the US attempt to ban online gambling through non-US companies.

Maybe the idea is that only the ultra-rich should be allowed to gamble wherever they want.

Steve, the Artful Tagger

Folksonomies - the ad hoc tagging by anyone of anything - sound terribly democratic compared to your top-down authoritarian imposition of taxonomies, but it's easy to see why people are sceptical about them: how can anything useful arise out of something so chaotic?

Del.icio.us is one example of how such folksonomies can be really useful, and here's another (and note the groovy .museum domain - the first time I've seen this):

“Steve” is a collaborative research project exploring the potential for user-generated descriptions of the subjects of works of art to improve access to museum collections and encourage engagement with cultural content. We are a group of volunteers, primarily from art museums, who share a common interest in improving access to our collections. We are concerned about barriers to public access to online museum information. Participation in steve is open to anyone with a contribution to make to developing our collective knowledge, whether they formally represent a museum or not.

Very cool - both in terms of adding metadata to objects, and as far as getting the public involved with art. Indeed, this idea should really be extended to everything - imagine a database of public places that people could tag.

Great idea, then, but why "Steve"?

So Farewell, Then, Matthew Szulik

The announcement that Red Hat's CEO and President, Matthew Szulik, is moving on (back?) to become its Chairman, is obviously pretty big news, since Szulik has led the company for nearly a decade, a long time in the still-young open source world. His valedictory message is well worth a read; I particularly liked the following section:

My early days at Red Hat were sitting in a small office with no door in Durham, NC, across from the free soda machine. People by the hour would stop and punch their selection for Mountain Dew or Coke. My challenge was that I was tasked to go and raise venture money for this free software company. And over the phone, in the middle of my sales pitch, corporate types at Dell, IBM and HP and others would hear the constant banging of soda cans dropping in the soda machine and would ask if there were fights going on outside my office. So, after a while, I told the prospective investors that YES there were fights going on. And yes, these fights happened frequently. It’s how people at Red Hat settled technical issues like software bugs and features in new releases. Red Hat was a real tough place to work. Dell, HP and IBM became investors because they liked the fighting spirit of Red Hat.

Says it all, really.

20 December 2007

RMS Tells It As It Is

Nice to hear it from the, er, horse's mouth:

A patent is an artificial government-imposed monopoly on implementing a certain method or technique. If the method or technique can be implemented by software, so that the patent prohibits the distribution and use of certain programs, we call it a software patent.

Norway's Beautiful Plumage...

...openness:

The government has decided that all information on state websites shall be available in the open document formats HTML, PDF or ODF. The time when public documents were available only in Microsoft's Word format will thereby come to an end.

I particularly liked the last sentence, which is basically a gratuitous kick where it hurts for Microsoft. (Via The Open Sourcerer.)

19 December 2007

Wikimedia, Der Blog

The Wikimedia Foundation has a blog - or, rather, ein blog.

Happy Birthday, Perl

Er, yesterday....

The VC Floodgates Open for Drupal and Acquia

On Open Enterprise blog.

UK To Have Biggest Population in Europe?

Here's a curious thing or two:

The UK population could almost double over the coming 75 years, according to official government projections.

The previously unpublished figures suggest the British population could hit almost 110m in 2081, if immigration, fertility and longevity rates are high.

The figures are higher than those released just a month ago by the Office for National Statistics.

In October, the ONS projected the population could go from around 60m today to as high as 77m in 2051.

A *conservative* estimate for 2051 is 77m, while on the high side we have:

According to the ONS, if all of these factors were on the high side over the coming decades, the population across the UK would hit 91,053,000 by the middle of the century

Got that? Between 77 and 91 million?

Now take a look at this table, which shows Germany, with currently the biggest European population, shrinking to 74 million in 2050.

Funny old world, innit? (Via Andrew Leonard.)

Can We Avoid the Great Schism?

At Linux Journal.

18 December 2007

More Icing on the SugarCRM Cake

On Open Enterprise blog.

Wikipedia Goes Open...

OpenDocument, that is:


The third stage, planned for mid-2008, will be the addition of the OpenDocument format for word processors to the list of export formats. "Imagine that you want to use a set of wiki articles in the classroom. By supporting the OpenDocument format, we will make it easy for educators to customize and remix content before printing and distributing it from any desktop computer," Sue Gardner explained.

The first stage, in case you were wondering,

is a public beta test running on WikiEducator.org of functionality for remixing collections of wiki pages and downloading them in the PDF format.

while the second stage is

the deployment of the technology on the projects hosted by the Wikimedia Foundation, including Wikipedia. At this point, users will also be given the option to order printed copies of wiki content directly from PediaPress.com. "The integration into Wikipedia will be a milestone for print-on-demand technology. Users will literally be empowered to print their own encyclopedias", according to Heiko Hees, product manager at PediaPress.com.

Hmm, well, maybe: I think the amount of work involved might make buying an encyclopaedia rather more attractive.... (Via Open Access News.)

New Creative Commons Licences

I mentioned en passant the new CCZero licence, but here's news of yet another:

CC+ is a protocol to enable a simple way for users to get rights beyond the rights granted by a CC license. For example, a Creative Commons license might offer noncommercial rights. With CC+, the license can also provide a link to enter into transactions beyond access to noncommercial rights — most obviously commercial rights, but also services of use such as warranty and ability to use without attribution, or even access to physical media.

Coincidence? I Don't Think So

Here's a nice analysis of what makes today's Internet services tick:

Dopplr can show me when a distant friend will be near and vice versa. Twitter can show me what my friends are doing right now. Wesabe can show me what others have learned about saving money at the places where I spend my money. Among many other things Flickr can show me how to look differently at the things I see when I take photos. And del.icio.us can show me things that my friends are reading every day.

It's all about making connections, creating a community and finding a commonality. The post calls this "surfacing coincidences" but I think that "coincidence" is the wrong word, since it suggests something random and casual; what we're talking about is an action that is much more directed: people looking for like-minded, like-thinking, like-doing people. (Via John Battelle.)

What's the Use of Free Software?

On Open Enterprise blog.

Are Closed Source Databases Doomed?

On Open Enterprise blog.

17 December 2007

Mindquarry Dies - and Lives!

Mindquarry, which

provides a powerful set of collaborative tools for use online and offline to streamline teamwork between information workers in small to large enterprises.

has just announced that it

will stop providing commercial services and products.

That's regrettable, obviously. But there's some good news too:

The Mindquarry GO and Mindquarry PRO products will be discontinued as of today. Our Open Source product will remain publicly available (see below for more information). To those with a Mindquarry GO Beta account, we now offer the possibility to migrate their data to the Open Source version of Mindquarry. This means that they can install Mindquarry themselves and use existing data from their Mindquarry GO Beta instance. Please write to support@mindquarry.com if you want us to extract your data from Mindquarry GO Beta to send it to you.

Keeping our Open Source software alive

Our developers team is currently working on finishing the Mindquarry 1.2beta release, which will be available around end of October. Beginning with 1.2beta, Mindquarry source code will be hosted on Sourceforge as well as the mindquarry.com Web site. Hence, our software as well as all necessary information such as installation documentation and forum discussions will still be available. Further details and links will be available in the next and probably final Mindquarry community newsletter.

This is an object lesson in one of free software's great virtues: whatever happens, the code lives on. This means that even commercial customers can migrate to free versions where they have been paying for other varieties. (Via NetworkWorld.)

Open Access Data - A Question of Protocol

Something calling itself a “Protocol for Implementing Open Access Data” sounds about as exciting as a list of ingredients for paint. But this memo from the Science Commons is one of the most important documents in this field to date. Its scope is explained in the opening paragraph:

This memo provides information for the Internet community interested in distributing data or databases under an “open access” structure. There are several definitions of “open” and “open access” on the Internet, including the Open Knowledge Definition and the Budapest Declaration on Open Access; the protocol laid out herein is intended to conform to the Open Knowledge Definition and extend the ideas of the Budapest Declaration to data and databases.

Again, that may not sound very exciting, but trying to come up with definitions of “open data” or “open access data” has proved extraordinarily hard, and in the course of the memo we learn why:

3. Principles of open access data
Legal tools for an open access data sharing protocol must be developed with three key principles in mind:
3.1 The protocol must promote legal predictability and certainty.
3.2 The protocol must be easy to use and understand.
3.3 The protocol must impose the lowest possible transaction costs on users.


These principles are motivated by Science Commons’ experience in distributing a database licensing Frequently Asked Questions (FAQ) file. Scientists are uncomfortable applying the FAQ because they find it hard to apply the distinction between what is copyrightable and what is not copyrightable, among other elements. A lack of simplicity restricts usage and as such restricts the open access flow of data. Thus any usage system must both be legally accurate while simultaneously very simple for scientists, reducing or eliminating the need to make the distinction between copyrightable and non-copyrightable elements.

The terms also need to satisfy the norms and expectations of the disciplines providing the database. This makes a single license approach difficult – archaeology data norms for citation will differ from those in physics, and yet again from those in biology, and yet again from those in the cultural or educational spaces. But those norms must be attached in a form that imposes the lowest possible costs on users (now and in the future).

The solution is at once obvious and radical:

4. Implementing the Science Commons Database Protocol for open access data
4.1 Converge on the public domain by waiving all rights based on intellectual property

The conflict between simplicity and legal certainty can be best resolved by a twofold measure: 1) a reconstruction of the public domain and 2) the use of scientific norms to express the wishes of the data provider.

Reconstructing the public domain can be achieved through the use of a legal tool (waiving the relevant rights on data and asserting that the provider makes no claims on the data).

Requesting behavior, such as citation, through norms and terms of use rather than as a legal requirement based on copyright or contracts, allows for different scientific disciplines to develop different norms for citation. This allows for legal certainty without constraining one community to the norms of another.

Thus, to facilitate data integration and open access data sharing, any implementation of this protocol MUST waive all rights necessary for data extraction and re-use (including copyright, sui generis database rights, claims of unfair competition, implied contracts, and other legal rights), and MUST NOT apply any obligations on the user of the data or database such as “copyleft” or “share alike”, or even the legal requirement to provide attribution. Any implementation SHOULD define a non-legally binding set of citation norms in clear, lay-readable language.
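To make the MUST, MUST NOT and SHOULD clauses quoted above a little more concrete, here is a minimal sketch of how a set of database terms might be checked against them. Everything in it is an assumption made for illustration: the `DataTerms` class, the `check_terms` function and the short label strings are hypothetical names, and the list of rights only covers those the quoted paragraph names explicitly, not the "other legal rights" it also mentions.

```python
from dataclasses import dataclass, field

# Rights the quoted paragraph says an implementation MUST waive.
# Illustrative only: the memo also mentions "other legal rights".
REQUIRED_WAIVERS = {
    "copyright",
    "sui generis database rights",
    "unfair competition claims",
    "implied contracts",
}

# Obligations the quoted paragraph says an implementation MUST NOT impose.
FORBIDDEN_OBLIGATIONS = {"copyleft", "share alike", "legal attribution"}


@dataclass
class DataTerms:
    """Hypothetical description of the terms attached to a database."""
    name: str
    waived_rights: set = field(default_factory=set)
    obligations: set = field(default_factory=set)
    has_citation_norms: bool = False  # non-binding norms (the SHOULD clause)


def check_terms(terms):
    """Return (problems, advice): MUST-level failures and SHOULD-level notes."""
    problems = []
    for right in sorted(REQUIRED_WAIVERS - terms.waived_rights):
        problems.append(f"MUST waive: {right}")
    for obligation in sorted(FORBIDDEN_OBLIGATIONS & terms.obligations):
        problems.append(f"MUST NOT impose: {obligation}")
    advice = []
    if not terms.has_citation_norms:
        advice.append("SHOULD define non-binding citation norms")
    return problems, advice


# A public domain dedication with citation norms passes every check.
dedication = DataTerms(
    "public domain dedication",
    waived_rights=set(REQUIRED_WAIVERS),
    has_citation_norms=True,
)

# A share-alike licence waives nothing and imposes legal obligations.
share_alike = DataTerms(
    "share-alike licence",
    obligations={"share alike", "legal attribution"},
)

for terms in (dedication, share_alike):
    problems, advice = check_terms(terms)
    print(terms.name, "->", "conforms" if not problems else problems, advice)
```

The point of the sketch is only that a share-alike licence fails on both counts, by keeping rights and by adding obligations, while citation is handled as advice rather than as a legal requirement.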

The solution is obvious because the public domain is the zero state of copyright (in fact, the new Creative Commons public domain licence is called simply CCZero.) It is radical because previous attempts have tried to build on the evident success of the GNU GPL by taking a kind of copyleft approach: using copyright to limit copyright. But the new protocol explicitly negates the use of both GPL's copyleft and the Creative Commons Sharealike licences because, minimal as they are, they are still too restrictive – even though they are both predicated on maximising sharing.

One knock-on consequence of this is that attribution requirements are out. This is not just a matter of belief or principle, but of practicality:

In a world of database integration and federation, attribution can easily cascade into a burden for scientists if a category error is made. Would a scientist need to attribute 40,000 data depositors in the event of a query across 40,000 data sets? How does this relate to the evolved norms of citation within a discipline, and does the attribution requirement indeed conflict with accepted norms in some disciplines? Indeed, failing to give attribution to all 40,000 sources could be the basis for a copyright infringement suit at worst, and at best, imposes a significant transaction cost on the scientist using the data.
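The cascade described in that passage is easy to picture with a toy example. The sketch below assumes a hypothetical federation of 40,000 data sets with one depositor each; the `attribution_burden` function and the `depositor-N` names are invented for illustration and do not come from the memo.

```python
def attribution_burden(datasets, matching_ids):
    """Collect the distinct depositors a user would have to credit."""
    depositors = set()
    for dataset_id in matching_ids:
        depositors.update(datasets[dataset_id])
    return depositors


# Hypothetical federation: 40,000 data sets, each with a single depositor.
datasets = {i: {f"depositor-{i}"} for i in range(40_000)}

# One query that touches every data set in the federation.
credits = attribution_burden(datasets, datasets.keys())
print(len(credits))  # 40000
```

Under a legally binding attribution requirement, each of those 40,000 names is a separate obligation; under non-binding citation norms, the discipline decides what a reasonable citation looks like.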

It is this pragmatism, rooted in how science actually works, that makes the current protocol particularly important: it might actually be useful. It's also significant that it plugs in to previously existing work in related fields. For example, as the accompanying blog post explains:

We are also pleased to announce that the Open Knowledge Foundation has certified the Protocol as conforming to the Open Knowledge Definition. We think it’s important to avoid legal fragmentation at the early stages, and that one way to avoid that fragmentation is to work with the existing thought leaders like the OKF.

Moreover, the protocol has already been applied in drawing up another important text, the Open Data Commons Public Domain Dedication & Licence:

The Open Data Commons Public Domain Dedication & Licence is a document intended to allow you to freely share, modify, and use this work for any purpose and without any restrictions. This licence is intended for use on databases or their contents (“data”), either together or individually.

Many databases are covered by copyright. Some jurisdictions, mainly in Europe, have specific special rights that cover databases called the “sui generis” database right. Both of these sets of rights, as well as other legal rights used to protect databases and data, can create uncertainty or practical difficulty for those wishing to share databases and their underlying data but retain a limited amount of rights under a “some rights reserved” approach to licensing. As a result, this waiver and licence tries to the fullest extent possible to eliminate or fully license any rights that cover this database and data.

Again, however dry and legalistic this stuff may seem, it's not: we're talking about the rigorous foundations of new kinds of sharing - and we all know how important and powerful that can be.

Update: John Wilbanks has pointed me to his post about the winnowing process that led to this protocol - fascinating stuff.