01 May 2009

Why Pig Flu is Better than Bird Flu: Open Data

As I wrote two years ago, one of the most worrying aspects of bird flu (remember that?) was that virus sequences were not being shared well, which meant that it was hard for experts to track its development and come up with a vaccine. Well, in one respect, swine flu seems to be an improvement over the avian variety:

In contrast to H5N1 bird flu, all the genetic sequences of this H1N1 are being posted on bulletin boards like GISAID, where scientists can access them and compare preliminary analyses.

The GISAID system was set up in 2006 by scientists who protested that H5N1 sequences were not being made freely available.

Here's what the GISAID site says:

This platform is designed and maintained by scientists for scientists from various disciplines e.g. veterinary and human virology, bioinformatics, epidemiology, immunology and clinical analysis etc. From here on, you will find a series of services, including the EpiFlu Database (developed by the Swiss Institute of Bioinformatics in conjunction with other partners of this initiative) providing secure storage and the analysis of genetic, epidemiological and clinical data.

Researchers like you have come together to empower this publicly accessible platform, free-of-charge to all researchers in the world who agree to the same terms, to foster a better understanding of the influenza virus. Following the correspondence letter in Nature, we have all pledged to share the data, to analyze the findings jointly, and to publish the results collaboratively, on the basis of open sharing of data respecting the rights and interests of all involved parties.

One fascinating aspect of this is that to view the data you must agree to the data-sharing that lies at the heart of the site:

Before you can enter, you are required to register and agree to the Terms of Use of our platform, as GISAID implements a particular data-sharing concept that has facilitated the flow of influenza sequence data to the public.

This creates an information commons, just as free software does.

Maybe there's hope for us yet.


Eleftherios Kosmas said...

Actually, I think Open Data is the only way! But open data is only the first step: we need open formats and, in the long run, free software in scientific practice.

Glyn Moody said...

σωστά ("right") - couldn't agree more.

Burke Squires said...

Hi Glyn,
I could not agree more that open data is of paramount importance, even more so in light of a global pandemic.
I would like to hear how you would compare a system like GISAID, and its implementation of "open data" as you describe it (with the inherent restrictions), to GenBank, where there are absolutely no restrictions. Is GenBank not the true host of "open data"?
Disclaimer: I work with another flu database, the BioHealthBase (www.biohealthbase.org).

Glyn Moody said...

@Burke: that's an interesting question.

Here's my understanding of the situation. To access the GISAID data you must agree to its terms, which include sharing your data (not quite sure of the details); to use GenBank, you don't have to agree to anything.

So GISAID is like the GNU GPL: it's do as you would be done by. GenBank is more or less public domain.

Now, you could argue that public domain is more open, because there are no restrictions; but equally, the GNU GPL has worked better than "weaker" licences like BSD precisely because it insists on reciprocity, which prevents people taking software closed.

So, you might argue that the open data approach of GISAID actually achieves more, in terms of the viral effect of its licence, than GenBank does.


Carla Del Rio said...

Glyn is spot on. There are absolutely no attempts to promote any form of scientific etiquette in databases such as GenBank, which are keen on stripping data of all protection while at the same time enabling those resourceful enough to attach their own restrictions, thus hampering others from using data that was actually unrestricted at the time it was deposited. That’s what placing data into the public domain (a legal definition) does. This dilemma has done little to promote data sharing, especially for countries that are economically less fortunate.

GISAID has become so incredibly successful because its mechanism does indeed prevent this form of exploitation at the expense of the whole community. And to enforce this data-sharing concept, all that’s effectively asked is for users to identify themselves to the rest of the community.

So GISAID’s concept of equal access for all and true transparency does anything but translate into restrictions, as Ms. Squires is suggesting.

flugenome said...

hi burke
thank you for sharing your perspective. maybe you misunderstood the wish of the gisaid community and the reason why they added measures of transparency on their site. they are keen on establishing a level playing field by making sure all its users adhere to a rather basic code of ethics that asks what many of us practice anyway, i.e. the acknowledgment of the labs that provided the sample in the first place as well as the acknowledgment of those folks that go through the trouble of isolating and sequencing the virus before they upload it to its database.

them asking their peers not to attach any form of restrictions is no different from what open-source rather successfully spells out for the internet. by contrast genbank & public domain databases leave data totally exposed to those that have the means to apply restrictions on the data. that's the problem that glyn pointed out.

on another note, i'd be curious to know why you created on your website http://www.iterasi.net/public/users/burkesquires an open gateway to the gisaid database, which practically allowed the bypassing of gisaid's registration/login procedure. cheers, hans

Glyn Moody said...

@hans: nicely put.

Unknown said...

@hans...Thank you so much for alerting me to the open link to GISAID. How stupid of me! Sorry about that! I have since removed that link.
@all, I appreciate all the points here and when I have a moment I look forward to continuing the discourse.