Showing posts with label open science. Show all posts

01 June 2009

Why Scientific Software Wants To Be Free

Not sure if I missed this earlier, but it strikes me as a hugely important issue that deserves a wider audience whether or not it is brand new:

Astronomical software is now a fact of daily life for all hands-on members of our community. Purpose-built software for data reduction and modeling tasks becomes ever more critical as we handle larger amounts of data and simulations. However, the writing of astronomical software is unglamorous, the rewards are not always clear, and there are structural disincentives to releasing software publicly and to embedding it in the scientific literature, which can lead to significant duplication of effort and an incomplete scientific record.

We identify some of these structural disincentives and suggest a variety of approaches to address them, with the goals of raising the quality of astronomical software, improving the lot of scientist-authors, and providing benefits to the entire community, analogous to the benefits provided by open access to large survey and simulation datasets. Our aim is to open a conversation on how to move forward.

We advocate that: (1) the astronomical community consider software as an integral and fundable part of facility construction and science programs; (2) that software release be considered as integral to the open and reproducible scientific process as are publication and data release; (3) that we adopt technologies and repositories for releasing and collaboration on software that have worked for open-source software; (4) that we seek structural incentives to make the release of software and related publications easier for scientist-authors; (5) that we consider new ways of funding the development of grass-roots software; (6) and that we rethink our values to acknowledge that astronomical software development is not just a technical endeavor, but a fundamental part of our scientific practice.

Leaving aside the obvious and welcome element of calling for an open source approach (and, presumably, open source release if possible), there is a deeper issue here: the fact that astronomy - and by extension, all science - is increasingly bound up with software, and that software is no longer an incidental factor in its practice.

A consequence of this is that as software moves ever closer to the heart of the scientific process, so the need to release that code under free software licences increases. First, so that others can examine it for flaws and/or reproduce its results. And secondly, so that other scientists can build on that code, just as they build on its results. In other words, it is becoming evident that open source is indispensable for *all* science, and not just the kind that proudly proclaims itself open.

31 May 2009

Open Government: the Latest Member of the Open Family

One of the most exciting developments in the last few years has been the application of some of the core ideas of free software and open source to completely different domains. Examples include open content, open access, open data and open science. More recently, those principles are starting to appear in a rather surprising field: that of government, as various transparency initiatives around the world start to gain traction....

On Linux Journal.

05 April 2009

Who Can Put the "Open" in Open Science?

One of the great pleasures of blogging is that your mediocre post tossed off in a couple of minutes can provoke a rather fine one that obviously took some time to craft. Here's a case in point.

The other day I wrote "Open Science Requires Open Source". This drew an interesting comment from Stevan Harnad, pretty much the Richard Stallman of open access, as well as some tweets from Cameron Neylon, one of the leading thinkers on and practitioners of open science. He also wrote a long and thoughtful reply to my post (including links to all our tweets, rigorous chap that he is). Most of it was devoted to pondering the extent to which scientists should be using open source:

It is easy to lose sight of the fact that for most researchers software is a means to an end. For the Open Researcher what is important is the ability to reproduce results, to criticize and to examine. Ideally this would include every step of the process, including the software. But for most issues you don’t need, or even want, to be replicating the work right down to the metal. You wouldn’t after all expect a researcher to be forced to run their software on an open source computer, with an open source chipset. You aren’t necessarily worried what operating system they are running. What you are worried about is whether it is possible to read their data files and reproduce their analysis. If I take this just one step further, it doesn’t matter if the analysis is done in MatLab or Excel, as long as the files are readable in Open Office and the analysis is described in sufficient detail that it can be reproduced or re-implemented.

...

Open Data is crucial to Open Research. If we don’t have the data we have nothing to discuss. Open Process is crucial to Open Research. If we don’t understand how something has been produced, or we can’t reproduce it, then it is worthless. Open Source is not necessary, but, if it is done properly, it can come close to being sufficient to satisfy the other two requirements. However it can’t do that without Open Standards supporting it for documenting both file types and the software that uses them.

The point that came out of the conversation with Glyn Moody for me was that it may be more productive to focus on our ability to re-implement rather than to simply replicate. Re-implementability, while an awful word, is closer to what we mean by replication in the experimental world anyway. Open Source is probably the best way to do this in the long term, and in a perfect world the software and support would be there to make this possible, but until we get there, for many researchers, it is a better use of their time, and the taxpayer’s money that pays for that time, to do that line fitting in Excel. And the damage is minimal as long as source data and parameters for the fit are made public. If we push forward on all three fronts, Open Data, Open Process, and Open Source then I think we will get there eventually because it is a more effective way of doing research, but in the meantime, sometimes, in the bigger picture, I think a shortcut should be acceptable.

I think these are fair points. Science needs reproducibility in terms of the results, but that doesn't imply that the protocols must be copied exactly. As Neylon says, the key is "re-implementability" - the fact that you *can* reproduce the results with the given information. Using Excel instead of OpenOffice.org Calc is not a big problem as long as enough detail is provided.
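Neylon's line-fitting case is easy to make concrete. The numbers below are invented purely for illustration, but they show the point: once the source data and the method ("a least-squares straight-line fit", which is what Excel's trendline performs) are disclosed, anyone can re-implement the analysis from scratch and check it against the published parameters, without ever touching the original tool:

```python
# Re-implementing a published straight-line fit from first principles.
# The data points below are invented for illustration; in practice they
# would come from the paper's released data file, and the published
# slope/intercept would be the values being checked.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares: slope from covariance over variance,
# intercept from the means.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
intercept = mean_y - slope * mean_x

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
# slope ≈ 1.990, intercept ≈ 1.040
```

If the re-implemented numbers match the published ones, the analysis has been replicated in the sense that matters - regardless of which spreadsheet produced the original.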

However, it's easy to think of circumstances where *new* code is being written to run on proprietary engines where it is simply not possible to check the logic hidden in the black boxes. In these circumstances, it is critical that open source be used at all levels so that others can see what was done and how.

But another interesting point emerged from this anecdote from the same post:

Sometimes the problems are imposed from outside. I spent a good part of yesterday battling with an appalling, password protected, macroed-to-the-eyeballs Excel document that was the required format for me to fill in a form for an application. The file crashed Open Office and only barely functioned in Mac Excel at all. Yet it was required, in that format, before I could complete the application.

Now, this is a social issue: the fact that scientists are being forced by institutions to use proprietary software in order to apply for grants or whatever. Again, it might be unreasonable to expect young scientists to sacrifice their careers for the sake of principle (although Richard Stallman would disagree). But this is not a new situation. It's exactly the problem that open access faced in the early days, when scientists just starting out in their career were understandably reluctant to jeopardise it by publishing in new, untested journals with low impact factors.

The solution in that case was for established scientists to take the lead by moving their work across to open access journals, allowing the latter to gain in prestige until they reached the point where younger colleagues could take the plunge too.

So, I'd like to suggest something similar for the use of open source in science. When established scientists with some clout come across unreasonable requirements - like the need to use Excel - they should refuse. If enough of them put their foot down, the organisations that lazily adopt these practices will be forced to change. It might require a certain courage to begin with, but so did open access; and look where *that* is now...

Follow me on Twitter @glynmoody

02 April 2009

Open Science Requires Open Source

As Peter Suber rightly points out, this paper offers a reversal of the usual argument, where open access is justified by analogy with open source:


Astronomical software is now a fact of daily life for all hands-on members of our community. Purpose-built software for data reduction and modeling tasks becomes ever more critical as we handle larger amounts of data and simulations. However, the writing of astronomical software is unglamorous, the rewards are not always clear, and there are structural disincentives to releasing software publicly and to embedding it in the scientific literature, which can lead to significant duplication of effort and an incomplete scientific record. We identify some of these structural disincentives and suggest a variety of approaches to address them, with the goals of raising the quality of astronomical software, improving the lot of scientist-authors, and providing benefits to the entire community, analogous to the benefits provided by open access to large survey and simulation datasets. Our aim is to open a conversation on how to move forward.

The central argument is important: that you can't do science with closed source software, because you can't examine its assumptions or logic (that "incomplete scientific record"). Open science demands open source.

Follow me on Twitter @glynmoody

16 March 2009

Opening Minds about Closed Source

One of the most exciting experiences in blogging is when a post catches fire - metaphorically, of course. Often it happens when you least expect it, as is the case with my rant about Science Commons working with Microsoft, which was thrown off in a fit of pique, without any hope that anybody would pay much attention to it.

Fortunately, it *was* picked up by Bill Hooker, who somehow managed to agree and disagree with me in a long and thoughtful post. That formed a bridge for the idea into the scientific community, where Peter Murray-Rust begged to differ with its thesis.

Given all this healthy scepticism, I was delighted to find that Peter Sefton is not only on my side, but has strengthened my general point by fleshing it out with some details:

Looking at the example here and reading Pablo’s Blog I share Glyn Moody’s concern. They show a chunk of custom XML which gets embedded in a word document. This custom XML is an insidious trick in my opinion as it makes documents non-interoperable. As soon as you use custom XML via Word 2007 you are guaranteeing that information will be lost when you share documents with OpenOffice.org users and potentially users of earlier versions of Word.

He also makes some practical suggestions about how the open world can work with Microsoft:

In conclusion I offer this: I would consider getting our team working with Microsoft (actually I’m actively courting them as they are doing some good work in the eResearch space) but it would be on the basis that:

* The product (eg a document) of the code must be interoperable with open software. In our case this means Word must produce stuff that can be used in and round tripped with OpenOffice.org and with earlier versions, and Mac versions of Microsoft’s products. (This is not as simple as it could be when we have to deal with stuff like Sun refusing to implement import and preservation for data stored in Word fields as used by applications like EndNote.)

The NLM add-in is an odd one here, as on one level it does qualify in that it spits out XML, but the intent is to create Word-only authoring so that rules it out – not that we have been asked to work on that project other than to comment, I am merely using it as an example.

* The code must be open source and as portable as possible. Of course if it is interface code it will only work with Microsoft’s toll-access software but at least others can read the code and re-implement elsewhere. If it’s not interface code then it must be written in a portable language and/or framework.

Great stuff.

Update: Peter has written more on the subject.

14 March 2009

Why We Need Open Data

Despite the good-natured ding-dong he and I are currently engaged in on another matter, Peter Murray-Rust is without doubt one of the key individuals in the open world. He's pretty much the godfather of the term "open data", as he writes:

Open Data has come a long way in the last 2-3 years. In 2006 the term was rarely used - I badgered SPARC and they generously set up a mailing list. I also started a page on Wikipedia in 2006 so it’s 2-and-a-half years old.

The same post gives perhaps the best explanation of why open data is important; it's nominally about open data in science, but its points are valid elsewhere too:

* Science rests on data. Without complete data, science is flawed.

* Many of today's global challenges require scientific data. Climate, Health, Agriculture…

* Scientists are funded to do research and to make the results available to everyone. This includes the data. Funders expect this. So does the world.

* The means of dissemination of data are cheap and universal. There is no technical reason why all the data in all the chemistry research in the world should not be published into the cloud. It’s small compared with movies…

* Data needs cleaning, filtering, repurposing, re-using. The more people who have access to this, the better the data and the better the science.

Open data is still something of a Cinderella in the open world, but as Peter's comments make clear, that's likely to change as more people realise its centrality to the entire open endeavour.

22 September 2008

Peer-Reviewed Open Journal of Science Online

One of the most eloquent proponents of the idea of open science is Cameron Neylon. Here's an interesting post about bringing peer review to online material:

many of the seminal works for the Open Science community are not peer reviewed papers. Bill Hooker’s three parter [1, 2, 3] at Three Quarks Daily comes to mind, as does Jean-Claude’s presentation on Nature Precedings on Open Notebook Science, Michael Nielsen’s essay The Future of Science, and Shirley Wu’s Envisioning the scientific community as One Big Lab (along with many others). It seems to me that these ought to have the status of peer reviewed papers which raises the question. We are a community of peers, we can referee, we can adopt some sort of standard of significance and decide to apply that selectively to specific works online. So why can’t we make them peer reviewed?

11 September 2008

The Real Reason to Celebrate GNU's Birthday

As you may have noticed, there's a bit of a virtual shindig going on in celebration of GNU's 25th birthday (including Stephen Fry's wonderfully British salute, which really, er, takes the cake....). Most of these encomiums have dutifully noted how all the free and open source software we take for granted today – GNU/Linux, Firefox, OpenOffice.org and the rest – would simply not exist had Richard Stallman not drawn his line in the digital sand. But I think all of these paeans rather miss the point....

On Open Enterprise blog.

10 June 2008

Recursive Publics: Hardly a Two-Bit Idea

For some years I have contemplated – and even planned out in some detail - a kind of follow-up to Rebel Code, which would look at the ways the ideas underlying free software have radiated out ever wider, to open content, open access, open courseware, open science – well, if you're reading this blog, you can fill in the rest. Happily, I couldn't find a publisher willing to take this on, so I was spared all the effort (non-authors have no idea what an outrageous amount of work books entail).

Now someone else has gone ahead, done all that work, and written pretty much that book, albeit with a more scholarly, anthropological twist than I could aspire to. Moreover, in true open source fashion, its author, Christopher Kelty, has made it freely available, not only to read, but to hack. The following paragraph expresses the core idea of this (and my) book:

The significance of Free Software extends far beyond the arcane and detailed technical practices of software programmers and “geeks” (as I refer to them herein). Since about 1998, the practices and ideas of Free Software have extended into new realms of life and creativity: from software to music and film to science, engineering, and education; from national politics of intellectual property to global debates about civil society; from UNIX to Mac OS X and Windows; from medical records and databases to international disease monitoring and synthetic biology; from Open Source to open access. Free Software is no longer only about software—it exemplifies a more general reorientation of power and knowledge.

I've only speed-read it – it's a dense and rich book – but from what I've seen, I can heartily recommend it to anyone who finds some of the ideas on this blog vaguely amusing: it's the work of a kindred spirit. The only thing I wasn't so keen on was its title: “Two Bits”. Now, call me parochial, but the only connotation of “two bits” for me is inferiority, as in a two-bit solution. A far better title, IMHO, would have been one of the cleverest concepts in the book: that of “recursive publics”:

Recursive publics are publics concerned with the ability to build, control, modify, and maintain the infrastructure that allows them to come into being in the first place and which, in turn, constitutes their everyday practical commitments and the identities of the participants as creative and autonomous individuals. In the cases explored herein, that specific infrastructure includes the creation of the Internet itself, as well as its associated tools and structures, such as Usenet, e-mail, the World Wide Web (www), UNIX and UNIX-derived operating systems, protocols, standards, and standards processes. For the last thirty years, the Internet has been the subject of a contest in which Free Software has been both a central combatant and an important architect.

By calling Free Software a recursive public, I am doing two things: first, I am drawing attention to the democratic and political significance of Free Software and the Internet; and second, I am suggesting that our current understanding (both academic and colloquial) of what counts as a self-governing public, or even as “the public,” is radically inadequate to understanding the contemporary reorientation of knowledge and power.

The arch-recursionist himself, RMS, would love that.

03 May 2008

OOXML? For Pete's Sake, No

Peter Murray-Rust is one of the key figures in the world of open data and open science, and deserves a lot of the credit for making these issues more visible. Here's an interesting post in which he points out that PDF files are not ideal from an archiving viewpoint:


I should make it clear that I am not religiously opposed to PDF, just to the present incarnation of PDF and the mindset that it engenders in publishers, repositarians, and readers. (Authors generally do not use PDF).

He then discusses in detail what the problems are and what solutions might be. Then he drops this clanger:

I’m not asking for XML. I’m asking for either XHTML or Word (or OOXML)

Word? OOXML??? Come on, Peter, you want open formats and you're willing to accept one of the most botched "standards" around, knocked up for purely political reasons, that includes gobs of proprietary elements and is probably impossible for anyone other than Microsoft to implement? *That's* open? I don't think so....

XHTML by all means, and if you want a document format the clear choice is ODF - a tight and widely-implemented standard. Anything but OOXML.

03 December 2007

A Question of Open Chemistry

I've written about open science and open notebook science before, but here's an excellent round-up of open chemistry:

The next generation of professional chemists are far more likely to be in tune with web-based chemistry, treating blogs and social networking sites as professional tools in the same manner as email. For Open Chemistry advocates, the inevitable passage of time may be enough to usher in their revolution.

(Via Open Access News.)

05 September 2007

Microsoft Loves Openness

Well, some openness:

BioMed Central, the world’s largest publisher of peer-reviewed, open access research journals, is pleased to announce that Microsoft Research has agreed to be the premium sponsor of the BioMed Central Research Awards for 2007. The BioMed Central Research Awards, which began accepting nominations in late July, recognize excellence in research that has been made universally accessible by open access publication in one of the publisher’s 180 journals.

"Microsoft’s External Research group is proud to be a sponsor of the BioMed Central Research Awards and feel it is important to recognize excellence in research," said Lee Dirks, director, scholarly communications, Microsoft Research. "We are very supportive of the open science movement and recognize that open access publication is an important component of overall scholarly communications."

It may only be promoting open science and open access at the moment, but I predict Microsoft will one day love open source just as much. (Via Open Access News.)

03 September 2007

Open Science Means Open Source

The need for open access and open data in science seems obvious enough - even though some persist in denying it. But as science becomes increasingly digital, with ever-greater dependence on computers and software, there is another aspect, as Nature Methods has recognised (albeit some months back - I've only just caught this):

The minimum level of disclosure that Nature Methods requires depends on how central the software is to the paper. If a software program is the focus of the report, we expect the programming code to be made available. Without the code, the software—and thus the paper—would become a black box of little use to the scientific community. In many papers, however, the software is only an ancillary part of the method, and the focus is on the methodological approach or an insight gained from it.

In these cases, releasing the code may not be a requirement for publication, but such custom-developed software will often be as important for the replication of the procedure as plasmids or mutant cell lines. We therefore insist that software or algorithms be made available to readers in a usable form. The guiding principle is that enough information must be provided so that users can reproduce the procedure and use the method in their own research at reasonable cost—both monetary and in terms of labor.

However, the editorial rightly points out that releasing the code as open source has huge advantages:

Some authors who favor the highest degree of transparency and sharing for their software elect to develop their programs in an open-source environment. By doing so, the authors not only provide accessibility and transparency, they also allow the community to build upon their own developments and make continuous improvements to the tool. Open-source software has become extremely popular in various fields. In microscopy, for example, image analysis software tends to be modular, and users benefit from the flexibility of being able to replace some modules with others in an open-source framework. Despite the tremendous added value of open source, other authors prefer to release a compiled version of their program, so as to protect commercial interests tied to sophisticated custom-designed software. This option is not optimal because it turns the program into a black box, but it may be acceptable if the operations performed by the software are sufficiently clear.

Although it is probably appropriate that Nature Methods, given its focus, should be the first to articulate this issue, it is important to appreciate that its logic applies to all scientific publishing where computers are involved. Without open source, there can be no open science - the only kind that is worthy of the name. (Belatedly via Flags and Lollipops.)

15 August 2007

They Gave Me of the Fruit...

...and I did eat:

But now an international scientific counterculture is emerging. Often referred to as “open science,” this growing movement proposes that we err on the side of collaboration and sharing. That’s especially true when it comes to creating and using the basic scientific tools needed both for downstream innovation and for solving broader human problems.

Open science proposes changing the culture without destroying the creative tension between the two ends of the science-for-innovation rope. And it predicts that the payoff – to human knowledge and to the economies of knowledge-intensive countries like Canada – will be much greater than any loss, by leveraging knowledge to everyone’s benefit.

"Sharing the fruits of science", it's called. Nothing new, but interesting for the outsider's viewpoint. (Via Open Access News.)

16 July 2007

This is GOOD SCIENCE!

"Open notebook science" is a great term devised by Jean-Claude Bradley - great, because it makes explicit exactly where you can find, read and hack the source code that underlies open science. One of the best observers of that world, Bill Hooker, has an interesting comment on a fellow researcher's adoption of the open notebook approach:


It's also, to be honest, just plain fun to snoop around in someone else's lab notes! I was amused to note that Jeremiah talks to and about himself in his notebook, the same way I do -- "if I weren't so stupid I'd...", "next time load the control first, doofus", etc. I wonder if everyone does that?

Now, where have I heard this sort of thing before?

This is GOOD CODE!

Yeah, yeah, it's ugly, but I cannot find how to do this correctly, and this seems to work...Most of this was trial and error...Urghh

The programming comments of a very young Linus Torvalds as he hacked version 0.01 of a little program called Linux during the summer of 1991. Coincidence? I don't think so....

25 June 2007

Of Open Knowledge and Closed Minds

Extraordinary:

US university students will not be able to work late at the campus, travel abroad, show interest in their colleagues' work, have friends outside the United States, engage in independent research, or make extra money without the prior consent of the authorities, according to a set of guidelines given to administrators by the FBI.

Better shut down that pesky Internet thingy while you're at it - who knows what knowledge may be seeping out through it? (Via The Inquirer.)

29 May 2007

The Wisdom of Metrics

I like reading Nicholas Carr's stuff because it is often provocative and generally thought-provoking. A good example is his recent "Ignorance of Crowds" which asserts:

Wikipedia’s problems seem to stem from the fact that the encyclopedia lacks the kind of strong central authority that exerts quality control over the work of the Linux crowd. The contributions of Wikipedia’s volunteers go directly into the product without passing through any editorial filter. The process is more democratic, but the quality of the product suffers.

I think this misses a key point about the difference between open source and open content that has nothing to do with authority. Software has clear metrics for success: the code runs faster, requires less memory, or is less CPU-intensive, etc. There is no such metric for content, where it essentially comes down to matters of opinion much of the time. Without a metric, monotonic improvement is impossible to achieve: the best you can hope for is a series of jumps that may or may not make things "better" - whatever that means in this context.
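The claim that software has objective metrics can be made concrete with a minimal sketch. The two functions here are hypothetical stand-ins for competing implementations of the same task: correctness is a hard, checkable constraint, and speed is a repeatable measurement - neither is a matter of opinion, which is exactly what encyclopedia prose lacks:

```python
import timeit

# Two hypothetical implementations of the same task:
# summing the squares of the integers 0..9999.
def with_loop():
    total = 0
    for i in range(10000):
        total += i * i
    return total

def with_genexpr():
    return sum(i * i for i in range(10000))

# Correctness is a hard constraint: both must agree exactly.
assert with_loop() == with_genexpr() == 333283335000

# Speed is an objective, repeatable metric.
t_loop = timeit.timeit(with_loop, number=200)
t_gen = timeit.timeit(with_genexpr, number=200)
print(f"loop: {t_loop:.4f}s  generator: {t_gen:.4f}s")
```

Whichever version is faster while producing identical output is unambiguously "better" - a verdict no two readers of a Wikipedia article can reach about a disputed paragraph.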

This is an important issue for many domains where the open source "method" is being applied: the better the metric available, the more sustained and unequivocal the progress will be. For example, the prospects for open science, powered by open access + open data, look good, since a general metric is available in the form of fit of theory to experiment.

23 May 2007

Blooming Science Blogs

It is rather ironic that science, which is a paradigmatic example of openness in action, should be a relative laggard when it comes to getting formally behind open science. So it's good to see a couple of new blogs on the subject, as noted by Bill Hooker.

Better blooming late than never.

04 March 2007

Open Science Enters the Mainstream

Interesting not so much for what it says, but that it's Business Week that's saying it:


Just as the Enlightenment ushered in a new organizational model of knowledge creation, the same technological and demographic forces that are turning the Web into a massive collaborative work space are helping to transform the realm of science into an increasingly open and collaborative endeavor. Yes, the Web was, in fact, invented as a way for scientists to share information. But advances in storage, bandwidth, software, and computing power are pushing collaboration to the next level. Call it Science 2.0.

21 December 2006

Scanning the Big Delta

"Delta Scan" sounds like one of those appalling airport potboilers involving mad scientists, terrorists and implausibly durable secret agents, but it's actually something much more exciting: an attempt to peek into the future of our science and technology. A hopeless task, clearly, but worth attempting if only as a five-neuron exercise.

The results are remarkably rich; considerable credit must go to the UK's Office of Science and Innovation for commissioning the report and - particularly - making it freely available. I was glad to see that there are plenty of links in the documents, which are short and to the point. Great for, er, scanning.