Showing posts with label science commons. Show all posts

11 March 2009

Open Science, Closed Source

One of the things that disappoints me is the lack of understanding of what's at stake with open source among some of the other open communities. For example, some in the world of open science seem to think it's OK to work with Microsoft, provided it furthers their own specific agenda. Here's a case in point:

John Wilbanks, VP of Science for Creative Commons, gave O'Reilly Media an exclusive sneak preview of a joint announcement that they will be making with Microsoft later today at the O'Reilly Emerging Technology Conference.

According to John, who talked to us shortly after getting off a plane from Brazil, Microsoft will be releasing, under an open source license, Word plugins that will allow scientists to mark up their papers with scientific entities directly.

"The scientific culture is not one, traditionally, where you have hyperlinks," Wilbanks told us. "You have citations. And you don't want to do cross-references of hyperlinks between papers, you want to do links directly to the gene sequences in the database."

Wilbanks says that Science Commons has been working for several years to build up a library of these scientific entities. "What Microsoft has done is to build plugins that work essentially the same way you'd use spell check, they can check for the words in their paper that have hyperlinks in our open knowledge base, and then mark them up."
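To make the spell-check analogy concrete, here's a rough sketch in Python of how that kind of dictionary-driven markup might work. It is not Microsoft's plugin code, which I haven't seen: the `KNOWLEDGE_BASE` table, the `mark_up` function and the example.org addresses are all invented for illustration.

```python
import re

# Hypothetical stand-in for the open knowledge base: term -> link.
# The terms and the example.org URLs are assumptions, not real entries.
KNOWLEDGE_BASE = {
    "BRCA1": "https://example.org/entity/BRCA1",
    "TP53": "https://example.org/entity/TP53",
}


def mark_up(text, knowledge_base=KNOWLEDGE_BASE):
    """Wrap every known term found in text with a link to its entry."""
    if not knowledge_base:
        return text
    # Try longer terms first so a long name wins over a shorter prefix.
    terms = sorted(knowledge_base, key=len, reverse=True)
    # \b keeps us from matching a term buried inside a longer word.
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")

    def link(match):
        term = match.group(1)
        return '<a href="{}">{}</a>'.format(knowledge_base[term], term)

    return pattern.sub(link, text)


draft = "Mutations in BRCA1 are often studied alongside TP53."
print(mark_up(draft))
```

The point of the sketch is that the markup logic only needs a term list and plain text, so nothing about it depends on which word processor hosts it.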

That might sound fine - after all, the plugins are open source, right? But no. Here's the problem:

Wilbanks said that Word is, in his experience, the dominant publishing system used in the life sciences, although tools like LaTeX are popular in disciplines such as chemistry or physics. And even then, he says it's probably the place that most people prepare drafts. "Almost everything I see when I have to peer review is in a .doc format."

In other words, he doesn't see any problem with perpetuating Microsoft's stranglehold on word processing. But Microsoft has consistently abused that monopoly, using its proprietary data formats to lock out commercial rivals and free alternatives, and pushing through pseudo-standards like OOXML that aren't truly open, and which have essentially destroyed ISO as a legitimate forum for open standards.

Working with Microsoft on open source plugins might seem innocent enough, but it's really just entrenching Microsoft's power yet further in the scientific community, weakening openness in general - which means, ultimately, undermining all the other excellent work of the Science Commons.

It would have been far better to work with OpenOffice.org to produce similar plugins, making the free office suite even more attractive, and thus giving scientists yet another reason to go truly open, with all the attendant benefits, rather than making do with a hobbled, faux-openness, as here.

Follow me on Twitter @glynmoody

23 July 2008

Open Access to Drugs (Data)

Here's an interesting confluence of trends:


The Wellcome Trust has awarded £4.7 million [€5.8 million] to EMBL's European Bioinformatics Institute [EMBL-EBI] to support the transfer of a large collection of information on the properties and activities of drugs and a large set of drug-like small molecules from the publicly listed company Galapagos NV to the public domain. It will be incorporated into the EMBL-EBI's collection of open-access data resources for biomedical research and will be maintained by a newly established team of scientists at the EMBL-EBI.

So here we have commercial drugs data being put into the public domain - no restrictions - and managed by one of the key public databases.

The transfer will empower academia to participate in the first stages of drug discovery for all therapeutic areas, including major diseases of the developing world. In future it could also result in improved prediction of drug side-effects.

Given that the current, capital-intensive method of drug development is highly skewed towards coming up with drugs for rich, obese Westerners, this openness to all is important: it means that one of the key barriers to discovering new therapies is down, in part at least.

And as Peter Suber rightly notes:

Kudos to Galapagos and Wellcome Trust not only for opening these data, but for choosing the public domain rather than a license. This fits with Science Commons' latest thinking on barrier-free research and collaboration in the Protocol for Implementing Open Access Data.

Public domain redux....

26 May 2008

The Healthiest Kind of Commons

Creating a commons is all about sharing, and there can be few areas where sharing is more mutually beneficial than health. After all, everyone aspires to good health, and the best way to get that is to pool what we know. Surprisingly, that doesn't happen as much as it could at the moment, because antiquated ways of looking at medical knowledge - shaped by pharmaceutical companies - try to enclose as much of the commons as possible.

Happily, others are fighting that tendency. Here's the latest manifestation, called the Health Commons, from the same bunch of idealistic nutters that brought you the Science Commons:

Health Commons is a coalition of parties interested in changing the way basic science is translated into the understanding and improvement of human health. Coalition members agree to share data, knowledge, and services under standardized terms and conditions by committing to a set of common technologies, digital information standards, research materials, contracts, workflows, and software. These commitments ensure that knowledge, data, materials and tools can move seamlessly from partner to partner across the entire drug discovery chain. They enable participants to offer standardized services, ranging from simple molecular assays to complex drug synthesis solutions, that others can discover in directories and integrate into their own processes to expedite development — or assemble like LEGO blocks to create new services.

The Health Commons is too complex for any one organization or company to create. It requires a coalition of partners across the spectrum. It is also too complex for public, private, or non-profit organizations alone - reinventing therapy development for the networked world requires, from the beginning, a commitment to public-private partnership. Only through a public-private partnership can the key infrastructure of the Commons be created: the investments in the public domain of information and materials will only be realized if that public domain is served by a private set of systems integrators and materials, tools and service providers motivated by profit. And in turn, the long-term success of the private sector depends on a growing, robust, and self-replenishing public domain of data, research tools, and open source software.

Good to see open source being mentioned explicitly here: it does, indeed, form the basis of all these commons efforts, because it provides a completely flexible infrastructure that is also completely free.

27 April 2008

John Wilbanks on the Knowledge Web

Here's a nice meditation from Science Commons' John Wilbanks on openness, access and innovation, which includes the following thoughts on the "knowledge web":

Just to be clear, here’s what I mean by a knowledge web: it’s when today’s web has enough power to work as well for science as it currently works for culture. That means databases are integrated as easily as web documents, and it means that powerful search engines let scientists ask complex research questions and have some comfort that they’re seeing all the relevant public information in the answers. A knowledge web is when journal articles have hyperlinks inside them, not just citations, letting systems like Google do their job properly.

A knowledge web is predicated on access, and not control, of knowledge. There will never be a competition to provide the best single-point query to the full-text of journals without access- unless the journals all merge down into one company. That’s the only way a controlled system covers the whole world, through monopoly. There will never be a knowledge web where the entire backfile is hyperlinked to databases for relevance based indexing without access. Scientists won’t get to use the newest and best technologies until those companies that control knowledge decide to adopt those technologies. Control is the enemy of testing the newest technologies, of building one’s own system to suit one’s own needs. We have to have access to build a knowledge web, at least if we hope to replicate the success of the regular Web and the Internet.

17 December 2007

Open Access Data - A Question of Protocol

Something calling itself a “Protocol for Implementing Open Access Data” sounds about as exciting as a list of ingredients for paint. But this memo from the Science Commons is one of the most important documents in this field to date. Its scope is explained in the opening paragraph:

This memo provides information for the Internet community interested in distributing data or databases under an “open access” structure. There are several definitions of “open” and “open access” on the Internet, including the Open Knowledge Definition and the Budapest Declaration on Open Access; the protocol laid out herein is intended to conform to the Open Knowledge Definition and extend the ideas of the Budapest Declaration to data and databases.

Again, that may not sound very exciting, but trying to come up with definitions of “open data” or “open access data” has proved extraordinarily hard, and in the course of the memo we learn why:

3. Principles of open access data
Legal tools for an open access data sharing protocol must be developed with three key principles in mind:
3.1 The protocol must promote legal predictability and certainty.
3.2 The protocol must be easy to use and understand.
3.3 The protocol must impose the lowest possible transaction costs on users.


These principles are motivated by Science Commons’ experience in distributing a database licensing Frequently Asked Questions (FAQ) file. Scientists are uncomfortable applying the FAQ because they find it hard to apply the distinction between what is copyrightable and what is not copyrightable, among other elements. A lack of simplicity restricts usage and as such restricts the open access flow of data. Thus any usage system must both be legally accurate while simultaneously very simple for scientists, reducing or eliminating the need to make the distinction between copyrightable and non-copyrightable elements.

The terms also need to satisfy the norms and expectations of the disciplines providing the database. This makes a single license approach difficult – archaeology data norms for citation will differ from those in physics, and yet again from those in biology, and yet again from those in the cultural or educational spaces. But those norms must be attached in a form that imposes the lowest possible costs on users (now and in the future).

The solution is at once obvious and radical:

4. Implementing the Science Commons Database Protocol for open access data
4.1 Converge on the public domain by waiving all rights based on intellectual property

The conflict between simplicity and legal certainty can be best resolved by a twofold measure: 1) a reconstruction of the public domain and 2) the use of scientific norms to express the wishes of the data provider.

Reconstructing the public domain can be achieved through the use of a legal tool (waiving the relevant rights on data and asserting that the provider makes no claims on the data).

Requesting behavior, such as citation, through norms and terms of use rather than as a legal requirement based on copyright or contracts, allows for different scientific disciplines to develop different norms for citation. This allows for legal certainty without constraining one community to the norms of another.

Thus, to facilitate data integration and open access data sharing, any implementation of this protocol MUST waive all rights necessary for data extraction and re-use (including copyright, sui generis database rights, claims of unfair competition, implied contracts, and other legal rights), and MUST NOT apply any obligations on the user of the data or database such as “copyleft” or “share alike”, or even the legal requirement to provide attribution. Any implementation SHOULD define a non-legally binding set of citation norms in clear, lay-readable language.

The solution is obvious because the public domain is the zero state of copyright (in fact, the new Creative Commons public domain licence is called simply CCZero). It is radical because previous attempts have tried to build on the evident success of the GNU GPL by taking a kind of copyleft approach: using copyright to limit copyright. But the new protocol explicitly rules out both the GPL's copyleft and the Creative Commons Sharealike licences because, minimal as they are, they are still too restrictive – even though they are both predicated on maximising sharing.

One knock-on consequence of this is that attribution requirements are out. This is not just a matter of belief or principle, but of practicality:

In a world of database integration and federation, attribution can easily cascade into a burden for scientists if a category error is made. Would a scientist need to attribute 40,000 data depositors in the event of a query across 40,000 data sets? How does this relate to the evolved norms of citation within a discipline, and does the attribution requirement indeed conflict with accepted norms in some disciplines? Indeed, failing to give attribution to all 40,000 sources could be the basis for a copyright infringement suit at worst, and at best, imposes a significant transaction cost on the scientist using the data.
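For anyone who wants to see that arithmetic spelled out, here's a toy sketch in Python. The `DataSet` record and the `attribution_burden` function are my own inventions, not anything defined by the protocol, and the model assumes one depositor per data set purely to keep things simple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    depositor: str
    # True for an attribution-style licence, False for a public domain waiver.
    requires_attribution: bool


def attribution_burden(data_sets):
    """Return the depositors a user is legally obliged to credit
    after running one query across all of data_sets."""
    return sorted({d.depositor for d in data_sets if d.requires_attribution})


# The same 40,000 hypothetical data sets under the two regimes.
licensed = [DataSet(f"set-{i}", f"depositor-{i}", True) for i in range(40_000)]
public_domain = [DataSet(f"set-{i}", f"depositor-{i}", False) for i in range(40_000)]

print(len(attribution_burden(licensed)))       # 40000
print(len(attribution_burden(public_domain)))  # 0
```

Under the waiver the legal obligation drops to zero, and citation is left to whatever norms the discipline in question has evolved.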

It is this pragmatism, rooted in how science actually works, that makes the current protocol particularly important: it might actually be useful. It's also significant that it plugs in to previously existing work in related fields. For example, as the accompanying blog post explains:

We are also pleased to announce that the Open Knowledge Foundation has certified the Protocol as conforming to the Open Knowledge Definition. We think it’s important to avoid legal fragmentation at the early stages, and that one way to avoid that fragmentation is to work with the existing thought leaders like the OKF.

Moreover, the protocol has already been applied in drawing up another important text, the Open Data Commons Public Domain Dedication & Licence:

The Open Data Commons Public Domain Dedication & Licence is a document intended to allow you to freely share, modify, and use this work for any purpose and without any restrictions. This licence is intended for use on databases or their contents (“data”), either together or individually.

Many databases are covered by copyright. Some jurisdictions, mainly in Europe, have specific special rights that cover databases called the “sui generis” database right. Both of these sets of rights, as well as other legal rights used to protect databases and data, can create uncertainty or practical difficulty for those wishing to share databases and their underlying data but retain a limited amount of rights under a “some rights reserved” approach to licensing. As a result, this waiver and licence tries to the fullest extent possible to eliminate or fully license any rights that cover this database and data.

Again, however dry and legalistic this stuff may seem, it's not: we're talking about the rigorous foundations of new kinds of sharing - and we all know how important and powerful that can be.

Update: John Wilbanks has pointed me to his post about the winnowing process that led to this protocol - fascinating stuff.

21 October 2007

Weekend Reading

Here are two online journals that may be of interest. Both, happily, are open access, so you can root around to your heart's content.

The first is the inaugural issue of the International Journal of the Commons. I have to declare a very tangential interest here in that they asked me to review a submitted paper: obviously my well-intentioned comments were devastating, since it's not included in the present issue...

The other journal is Innovations from MIT Press. This has an interesting mix of articles, including one by Cory Ondrejka on Second Life, and others on the Science Commons and Open-Sourcing Social Solutions.

18 January 2007

ScientificCommons.org

Access to all open access science? Ambitious, if nothing else, this:

The major aim of the project is to develop the world’s largest communication medium for scientific knowledge products which is freely accessible to the public. A key challenge of the project is to support the rapidly growing number of movements and archives who admit the free distribution and access to scientific knowledge. These are the valuable sources for the ScientificCommons.org project. The ScientificCommons.org project makes it possible to access the largely distributed sources with their vast amount of scientific publications via just one common interface. ScientificCommons.org identifies authors from all archives and makes their social and professional relationships transparent and visible to anyone across disciplinary, institutional and technological boundaries. Currently ScientificCommons.org has indexed about 10 million scientific publications and successfully extracted 4 million authors out of this data.

(Via eHub.)