After I published Rebel Code in 2001, there was a natural instinct to think about writing another book (a natural masochistic instinct, I suppose, given the work involved). I decided to write about bioinformatics – the use of computers to store, search through, and analyse the billions of DNA letters that started pouring out of the genomics projects of the 1990s, culminating in the sequencing of the human genome in 2001.
One reason I chose this area was the amazing congruence between two struggles: the battle between free and closed-source software, and the fight to place genomic data in the public domain, for all to use, rather than having it locked up in proprietary databases and enclosed by gene patents. As I like to say, Digital Code of Life is really the same story as Rebel Code, with just a few words changed.
Another reason for the similarity between the stories is the fact that genomes can be considered as a kind of program – the “digital code” of my title. As I wrote in the book:
In 1953, computers were so new that the idea of DNA as not just a huge digital store but a fully-fledged digital program of instructions was not immediately obvious. But this was one of the many profound implications of Watson and Crick's work. For if DNA was a digital store of genetic information that guided the construction of an entire organism from the fertilised egg, then it followed that it did indeed contain a preprogrammed sequence of events that created that organism – a program that ran in the fertilised cell, albeit one that might be affected by external signals. Moreover, since a copy of DNA existed within practically every cell in the body, this meant that the program was not only running in the original cell but in all cells, determining their unique characteristics.
That characterisation of the genome is something of a cliché these days, but back in 2003, when I wrote Digital Code of Life, it was less common. Of course, the interesting question is: to what extent is the genome *really* like an operating system? What are the similarities and differences? That's what a bunch of researchers wanted to find out by comparing the Linux kernel's control structure to that of the bacterium Escherichia coli:
The genome has often been called the operating system (OS) for a living organism. A computer OS is described by a regulatory control network termed the call graph, which is analogous to the transcriptional regulatory network in a cell. To apply our firsthand knowledge of the architecture of software systems to understand cellular design principles, we present a comparison between the transcriptional regulatory network of a well-studied bacterium (Escherichia coli) and the call graph of a canonical OS (Linux) in terms of topology and evolution.
We show that both networks have a fundamentally hierarchical layout, but there is a key difference: The transcriptional regulatory network possesses a few global regulators at the top and many targets at the bottom; conversely, the call graph has many regulators controlling a small set of generic functions. This top-heavy organization leads to highly overlapping functional modules in the call graph, in contrast to the relatively independent modules in the regulatory network.
We further develop a way to measure evolutionary rates comparably between the two networks and explain this difference in terms of network evolution. The process of biological evolution via random mutation and subsequent selection tightly constrains the evolution of regulatory network hubs. The call graph, however, exhibits rapid evolution of its highly connected generic components, made possible by designers' continual fine-tuning. These findings stem from the design principles of the two systems: robustness for biological systems and cost effectiveness (reuse) for software systems.
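To make the contrast between the two hierarchies a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the toy edge lists, the node names, and the `layer_counts` helper are invented here and are not taken from the paper, which analyses the real E. coli regulatory network and the real Linux call graph with a more careful definition of hierarchy. The sketch simply counts how many nodes sit at the top (they control others but are not controlled), in the middle, and at the bottom (they are controlled but control nothing).

```python
def layer_counts(edges):
    """Count top, middle and bottom nodes in a directed graph.

    edges is a list of (controller, target) pairs.
    top    = nodes with outgoing edges only
    bottom = nodes with incoming edges only
    middle = nodes with both
    """
    has_out, has_in = set(), set()
    for src, dst in edges:
        has_out.add(src)
        has_in.add(dst)
    return {
        "top": len(has_out - has_in),
        "middle": len(has_out & has_in),
        "bottom": len(has_in - has_out),
    }


# Hypothetical regulatory-style network: one global regulator,
# two intermediate regulators, six target genes.
regulatory_edges = (
    [("global", "regA"), ("global", "regB")]
    + [("regA", f"gene{i}") for i in range(3)]
    + [("regB", f"gene{i}") for i in range(3, 6)]
)

# Hypothetical call-graph-style network: six callers, two helpers,
# one generic function that everything ends up reusing.
call_edges = (
    [(f"caller{i}", "helperA") for i in range(3)]
    + [(f"caller{i}", "helperB") for i in range(3, 6)]
    + [("helperA", "generic"), ("helperB", "generic")]
)

print("regulatory:", layer_counts(regulatory_edges))
print("call graph:", layer_counts(call_edges))
```

In this toy setup the regulatory-style network is a pyramid, with few nodes at the top and many at the bottom, while the call-graph-style network is the same pyramid turned upside down, which is the "top-heavy" shape the abstract describes.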
The paper's well worth reading, but if you find it heavy going (it's really designed for bioinformaticians and their ilk), there's an excellent, easy-to-read summary and analysis by Carl Zimmer in Discover magazine. Alternatively, you could just buy a copy of Digital Code of Life...
Follow me @glynmoody on Twitter or identi.ca.