Showing posts with label genomics. Show all posts

25 July 2014

Open Source Genomics

There's a revolution underway. It's digital, but not in the computing sector. I'm referring to the world of genomics, which deals with the data that resides inside all living things: DNA. As most people know, DNA uses four chemical compounds - adenine, cytosine, guanine and thymine - to encode various structures, most notably proteins, which are represented by stretches of DNA called genes. 

On Open Enterprise blog.

23 June 2012

Your Genome, Your Data

The computing revolution is not the only one driven by constant scaling of technologies: the field of genomics -- the study of DNA sequences -- has also enjoyed rapid falls in basic costs over the last decade and a half. This means that whereas the first human genome cost around $3 billion to sequence, we are fast approaching the point where it will cost first a few thousand, and then a few hundred dollars to sequence anyone's complete DNA. An interesting post on the Health Affairs Blog points out that neither the law nor society is ready for this

On Techdirt.

25 January 2012

Adding Your DNA To A Biobank Is A Noble Move -- But Is It A Wise One?

One new approach to teasing apart the complex relationships between genes and common diseases such as cancer, heart disease, asthma and diabetes is by creating huge biobanks of medical data and samples. The idea is that by tracking the health and habits of very large populations across many years, and then examining their DNA, it will be possible to spot factors in common. Here's a major biobank that is shortly opening up its holdings for research

On Techdirt.

05 May 2010

The GNU/Linux Code of Life

After I published Rebel Code in 2001, there was a natural instinct to think about writing another book (a natural masochistic instinct, I suppose, given the work involved). I decided to write about bioinformatics – the use of computers to store, search through, and analyse the billions of DNA letters that started pouring out of the genomics projects of the 1990s, culminating in the sequencing of the human genome in 2001.

One reason I chose this area was the amazing congruence between the battle between free and closed-source software and the fight to place genomic data in the public domain, for all to use, rather than having it locked up in proprietary databases and enclosed by gene patents. As I like to say, Digital Code of Life is really the same story as Rebel Code, with just a few words changed.

Another reason for the similarity between the stories is the fact that genomes can be considered as a kind of program – the “digital code” of my title. As I wrote in the book:

In 1953, computers were so new that the idea of DNA as not just a huge digital store but a fully-fledged digital program of instructions was not immediately obvious. But this was one of the many profound implications of Watson and Crick's work. For if DNA was a digital store of genetic information that guided the construction of an entire organism from the fertilised egg, then it followed that it did indeed contain a preprogrammed sequence of events that created that organism – a program that ran in the fertilised cell, albeit one that might be affected by external signals. Moreover, since a copy of DNA existed within practically every cell in the body, this meant that the program was not only running in the original cell but in all cells, determining their unique characteristics.

That characterisation of the genome is something of a cliché these days, but back in 2003, when I wrote Digital Code of Life, it was less common. Of course, the interesting question is: to what extent is the genome *really* like an operating system? What are the similarities and differences? That's what a bunch of researchers wanted to find out by comparing the Linux kernel's control structure to that of the bacterium Escherichia coli:

The genome has often been called the operating system (OS) for a living organism. A computer OS is described by a regulatory control network termed the call graph, which is analogous to the transcriptional regulatory network in a cell. To apply our firsthand knowledge of the architecture of software systems to understand cellular design principles, we present a comparison between the transcriptional regulatory network of a well-studied bacterium (Escherichia coli) and the call graph of a canonical OS (Linux) in terms of topology and evolution.

We show that both networks have a fundamentally hierarchical layout, but there is a key difference: The transcriptional regulatory network possesses a few global regulators at the top and many targets at the bottom; conversely, the call graph has many regulators controlling a small set of generic functions. This top-heavy organization leads to highly overlapping functional modules in the call graph, in contrast to the relatively independent modules in the regulatory network.

We further develop a way to measure evolutionary rates comparably between the two networks and explain this difference in terms of network evolution. The process of biological evolution via random mutation and subsequent selection tightly constrains the evolution of regulatory network hubs. The call graph, however, exhibits rapid evolution of its highly connected generic components, made possible by designers' continual fine-tuning. These findings stem from the design principles of the two systems: robustness for biological systems and cost effectiveness (reuse) for software systems.
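To make the "few regulators, many targets" versus "many callers, few generic functions" contrast concrete, here is a toy sketch. The two edge lists are invented for illustration (the node names are hypothetical, and neither comes from E. coli or the Linux kernel); the function simply counts how many distinct nodes sit on the controlling side and how many on the controlled side.

```python
def layer_profile(edges):
    """Count distinct controlling nodes and distinct controlled nodes.

    `edges` is a list of (source, destination) pairs, read as
    "source regulates destination" or "source calls destination".
    """
    controllers = {src for src, _ in edges}
    controlled = {dst for _, dst in edges}
    return len(controllers), len(controlled)


# Hypothetical regulatory-style network: 2 global regulators, 5 targets each.
reg_edges = [(f"regulator{r}", f"gene{r}_{t}") for r in range(2) for t in range(5)]

# Hypothetical call-graph-style network: 10 callers sharing 2 generic helpers.
call_edges = [(f"caller{c}", f"helper{u}") for c in range(10) for u in range(2)]

reg_profile = layer_profile(reg_edges)    # few at the top, many at the bottom
call_profile = layer_profile(call_edges)  # many at the top, few at the bottom

print("regulatory-style (controllers, controlled):", reg_profile)
print("call-graph-style (controllers, controlled):", call_profile)
```

In the second toy network every caller depends on the same two helpers, which is the kind of overlap between modules that the abstract attributes to the call graph.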

The paper's well worth reading, but if you find it heavy going (it's really designed for bioinformaticians and their ilk), there's an excellent, easy-to-read summary and analysis by Carl Zimmer in Discover magazine. Alternatively, you could just buy a copy of Digital Code of Life...

Follow me @glynmoody on Twitter or identi.ca.

02 May 2009

Swine Flu in the Nude

This is what the virus really looks like:

1 atgaaggcaa tactagtagt tctgctatat acatttgcaa ccgcaaatgc agacacatta
61 tgtataggtt atcatgcgaa caattcaaca gacactgtag acacagtact agaaaagaat
121 gtaacagtaa cacactctgt taaccttcta gaagacaagc ataacgggaa actatgcaaa
181 ctaagagggg tagccccatt gcatttgggt aaatgtaaca ttgctggctg gatcctggga
241 aatccagagt gtgaatcact ctccacagca agctcatggt cctacattgt ggaaacatct
301 agttcagaca atggaacgtg ttacccagga gatttcatcg attatgagga gctaagagag
361 caattgagct cagtgtcatc atttgaaagg tttgagatat tccccaagac aagttcatgg
421 cccaatcatg actcgaacaa aggtgtaacg gcagcatgtc ctcatgctgg agcaaaaagc
481 ttctacaaaa atttaatatg gctagttaaa aaaggaaatt catacccaaa gctcagcaaa
541 tcctacatta atgataaagg gaaagaagtc ctcgtgctat ggggcattca ccatccatct
601 actagtgctg accaacaaag tctctatcag aatgcagatg catatgtttt tgtggggtca
661 tcaagataca gcaagaagtt caagccggaa atagcaataa gacccaaagt gagggatcaa
721 gaagggagaa tgaactatta ctggacacta gtagagccgg gagacaaaat aacattcgaa
781 gcaactggaa atctagtggt accgagatat gcattcgcaa tggaaagaaa tgctggatct
841 ggtattatca tttcagatac accagtccac gattgcaata caacttgtca gacacccaag
901 ggtgctataa acaccagcct cccatttcag aatatacatc cgatcacaat tggaaaatgt
961 ccaaaatatg taaaaagcac aaaattgaga ctggccacag gattgaggaa tgtcccgtct
1021 attcaatcta gaggcctatt tggggccatt gccggtttca ttgaaggggg gtggacaggg
1081 atggtagatg gatggtacgg ttatcaccat caaaatgagc aggggtcagg atatgcagcc
1141 gacctgaaga gcacacagaa tgccattgac gaaattacta acaaagtaaa ttctgttatt
1201 gaaaagatga atacacagtt cacagcagta ggtaaagagt tcaaccacct ggaaaaaaga
1261 atagagaatt taaataaaaa agttgatgat ggtttcctgg acatttggac ttacaatgcc
1321 gaactgttgg ttctattgga aaatgaaaga actttggact accacgattc aaatgtgaag
1381 aacttatatg aaaaggtaag aagccagcta aaaaacaatg ccaaggaaat tggaaacggc
1441 tgctttgaat tttaccacaa atgcgataac acgtgcatgg aaagtgtcaa aaatgggact
1501 tatgactacc caaaatactc agaggaagca aaattaaaca gagaagaaat agatggggta
1561 aaactggaat caacaaggat ttaccagatt ttggcgatct attcaactgt cgccagttca
1621 ttggtactgg tagtctccct gggggcaatc agtttctgga tgtgctctaa tgggtctcta
1681 cagtgtagaa tatgtattta a

Amazing what a few As, Cs, Gs and Ts can do.... (Via Common Knowledge.)
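For anyone who wants to tally those As, Cs, Gs and Ts, here is a minimal sketch. It assumes the listing follows the layout shown above: each line starts with the position of its first base, followed by space-separated blocks of letters. Only the first two lines of the listing are pasted in as a sample; the function names are my own.

```python
from collections import Counter


def parse_sequence_lines(text):
    """Join numbered sequence lines into one string of bases."""
    pieces = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        # The first token is the position of the line's first base; drop it.
        if parts[0].isdigit():
            parts = parts[1:]
        pieces.append("".join(parts))
    return "".join(pieces)


def base_counts(sequence):
    """Count how often each letter occurs in the sequence."""
    return Counter(sequence)


# First two lines of the listing above, used as a small sample.
sample = """\
1 atgaaggcaa tactagtagt tctgctatat acatttgcaa ccgcaaatgc agacacatta
61 tgtataggtt atcatgcgaa caattcaaca gacactgtag acacagtact agaaaagaat
"""

sequence = parse_sequence_lines(sample)
counts = base_counts(sequence)
print(len(sequence), "bases in the sample")
for base in "acgt":
    print(base, counts[base])
```

Pasting the full listing into `sample` would give the totals for the whole sequence.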

06 October 2008

Googling the Genome

We're getting close:

The cost of determining a person’s complete genetic blueprint is about to plummet again — to $5,000.

That is the price that a start-up company called Complete Genomics says it will start charging next year for determining the sequence of the genetic code that makes up the DNA in one set of human chromosomes. The company is set to announce its plans on Monday.

Such a price would represent another step toward the long-sought goal of the “$1,000 genome.” At that price point it might become commonplace for people to obtain their entire DNA sequences, giving them information on what diseases they might be predisposed to or what drugs would work best for them.

25 January 2008

Genomics Goes Read-Write

One of Larry Lessig's favourite tropes is that we live in a read-write world these days, where creation is just as important as consumption. Well, hitherto, genomics has been pretty much read only: you could sequence the DNA of an organism, but creating entire genomes of complex organisms (such as bacteria) has been too tricky. Now that nice Dr Venter says he's gone and done it:

A team of 17 researchers at the J. Craig Venter Institute (JCVI) has created the largest man-made DNA structure by synthesizing and assembling the 582,970 base pair genome of a bacterium, Mycoplasma genitalium JCVI-1.0. This work, published online today in the journal Science by Dan Gibson, Ph.D., et al, is the second of three key steps toward the team’s goal of creating a fully synthetic organism. In the next step, which is ongoing at the JCVI, the team will attempt to create a living bacterial cell based entirely on the synthetically made genome.

The team achieved this technical feat by chemically making DNA fragments in the lab and developing new methods for the assembly and reproduction of the DNA segments. After several years of work perfecting chemical assembly, the team found they could use homologous recombination (a process that cells use to repair damage to their chromosomes) in the yeast Saccharomyces cerevisiae to rapidly build the entire bacterial chromosome from large subassemblies.


He even gives some details (don't try this at home):

The process to synthesize and assemble the synthetic version of the M. genitalium chromosome began first by resequencing the native M. genitalium genome to ensure that the team was starting with an error free sequence. After obtaining this correct version of the native genome, the team specially designed fragments of chemically synthesized DNA to build 101 “cassettes” of 5,000 to 7,000 base pairs of genetic code. As a measure to differentiate the synthetic genome versus the native genome, the team created “watermarks” in the synthetic genome. These are short inserted or substituted sequences that encode information not typically found in nature. Other changes the team made to the synthetic genome included disrupting a gene to block infectivity. To obtain the cassettes the JCVI team worked primarily with the DNA synthesis company Blue Heron Technology, as well as DNA 2.0 and GENEART.

From here, the team devised a five stage assembly process where the cassettes were joined together in subassemblies to make larger and larger pieces that would eventually be combined to build the whole synthetic M. genitalium genome. In the first step, sets of four cassettes were joined to create 25 subassemblies, each about 24,000 base pairs (24kb). These 24kb fragments were cloned into the bacterium Escherichia coli to produce sufficient DNA for the next steps, and for DNA sequence validation.

The next step involved combining three 24kb fragments together to create 8 assembled blocks, each about 72,000 base pairs. These 1/8th fragments of the whole genome were again cloned into E. coli for DNA production and DNA sequencing. Step three involved combining two 1/8th fragments together to produce large fragments approximately 144,000 base pairs or 1/4th of the whole genome.

At this stage the team could not obtain half genome clones in E. coli, so the team experimented with yeast and found that it tolerated the large foreign DNA molecules well, and that they were able to assemble the fragments together by homologous recombination. This process was used to assemble the last cassettes, from 1/4 genome fragments to the final genome of more than 580,000 base pairs. The final chromosome was again sequenced in order to validate the complete accurate chemical structure.
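The piece counts in that description can be sanity-checked with some simple division. This is a rough sketch only: it takes the 582,970 base pair genome length and the piece counts from the press release, and assumes pieces of equal size with no overlap between neighbours, which the real cassettes would not have satisfied exactly. The half-genome stage is my reading of the step between quarter genomes and the whole chromosome.

```python
GENOME_BP = 582_970  # genome length quoted in the press release

# (description, number of pieces) at each level of the assembly hierarchy.
STAGES = [
    ("cassettes", 101),
    ("24kb subassemblies", 25),
    ("1/8 genome blocks", 8),
    ("1/4 genome fragments", 4),
    ("1/2 genome fragments", 2),  # assumed intermediate stage, built in yeast
    ("whole genome", 1),
]


def average_piece_sizes(genome_bp, stages):
    """Return (name, count, average size in bp), assuming equal, non-overlapping pieces."""
    return [(name, count, genome_bp / count) for name, count in stages]


sizes = average_piece_sizes(GENOME_BP, STAGES)
for name, count, avg_bp in sizes:
    print(f"{name:22s} {count:4d} pieces, about {avg_bp:,.0f} bp each")
```

The averages land close to the quoted figures: cassettes fall inside the 5,000 to 7,000 base pair range, and the later stages come out near 24kb, 72kb and 144kb.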

But the real kicker was this comment:

“This is an exciting advance for our team and the field. However, we continue to work toward the ultimate goal of inserting the synthetic chromosome into a cell and booting it up to create the first synthetic organism,” said Dan Gibson, lead author.

Yup, you read that correctly: we're talking about porting and then *booting-up* an artificial genome, aka digital code of life.

10 June 2007

The Bad Boy of Genomics Strikes Again

When I was writing Digital Code of Life, I sought to be scrupulously fair to Craig Venter, who was often demonised for his commercial approach to science. In fact, it seemed to me he had often gone out of his way to make the results of his work available.

So it's with some sadness that I note that the "Bad Boy of Genomics" epithet seems justified in this more recent case:


A research institute has applied for a patent on what could be the first largely artificial organism. And people should be alarmed, claims an advocacy group that is trying to shoot down the bid.

...

The artificial organism, a mere microbe, is the brainchild of researchers at the Rockville, Md.-based J. Craig Venter Institute. The organization is named for its founder and CEO, the geneticist who led the private sector race to map the human genome in the late 1990s.

The researchers filed their patent claim on the artificial organism and on its genome. Genetically modified life forms have been patented before; but this is the first patent claim for a creature whose genome might be created chemically from scratch, Mooney said.

This is problematic on a number of levels. For a start, it shouldn't be possible to patent DNA, since it is not an invention. Simply combining existing sequences is not an invention either. There is also the worry that what is being created here is the first genomic operating system: locking others out with patents means repeating all the mistakes that have been made in some jurisdictions by allowing the patenting of conventional software.

29 January 2007

'Omics - Oh My!

One of the fun aspects of writing my book Digital Code of Life was grappling with all the 'omics: not just genomics, but proteomics and metabolomics too. Here's what I wrote about the latter:

"Metabolome" is the name given to all the molecules - not just the proteins - involved in metabolic processes within a given cell.

And here's the big news:

Scientists in Alberta say they are the first team to finish a draft of the chemical equivalent of the human genome, paving the way for faster, cheaper diagnoses of disease.

The researchers on Wednesday said the Human Metabolome Project, led by the University of Alberta, has listed and described some 2,500 chemicals found in or made by the body (three times as many as expected), and double that number of substances stemming from drugs and food. The chemicals, known as metabolites, represent the ingredients of life just as the human genome represents the blueprint of life.

This does seem to differ from my definition, but hey, my shoulders are broad.
(Via Slashdot.)

08 January 2007

Google Reaches for the Stars

One of the most important shifts in science at the moment is towards dealing with the digital deluge. Whether in the field of genomics, particle physics or astronomy, science is starting to produce data in not just gigabytes, or even terabytes, but petabytes, exabytes and beyond (zettabytes, yottabytes, etc.).

Take the Large Synoptic Survey Telescope, for starters:

The Large Synoptic Survey Telescope (LSST) is a proposed ground-based 8.4-meter, 10 square-degree-field telescope that will provide digital imaging of faint astronomical objects across the entire sky, night after night. In a relentless campaign of 15 second exposures, LSST will cover the available sky every three nights, opening a movie-like window on objects that change or move on rapid timescales: exploding supernovae, potentially hazardous near-Earth asteroids, and distant Kuiper Belt Objects. The superb images from the LSST will also be used to trace billions of remote galaxies and measure the distortions in their shapes produced by lumps of Dark Matter, providing multiple tests of the mysterious Dark Energy.

How much data?

Over 30 thousand gigabytes (30TB) of images will be generated every night during the decade-long LSST sky survey.

Or for those of you without calculators, that's 10x365x30x1,000,000,000,000 bytes, roughly 100 petabytes. And where there's data, there's also information; and where there's information...there's Google:

Google has joined a group of nineteen universities and national labs that are building the Large Synoptic Survey Telescope (LSST).

...

"Partnering with Google will significantly enhance our ability to convert LSST data to knowledge," said University of California, Davis, Professor and LSST Director J. Anthony Tyson. "LSST will change the way we observe the universe by mapping the visible sky deeply, rapidly, and continuously. It will open entirely new windows on our universe, yielding discoveries in a variety of areas of astronomy and fundamental physics. Innovations in data management will play a central role."
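The back-of-envelope data volume quoted earlier in this post is easy to check. The sketch below uses the figures given above (30TB a night, a decade-long survey) and assumes, as that calculation does, that the telescope observes on all 365 nights of every year, so the result is an upper bound.

```python
BYTES_PER_TB = 10**12
BYTES_PER_PB = 10**15

nightly_bytes = 30 * BYTES_PER_TB  # 30TB of images per night
survey_years = 10                  # decade-long survey
nights_per_year = 365              # assumes no nights are lost to weather or maintenance

total_bytes = survey_years * nights_per_year * nightly_bytes
total_petabytes = total_bytes / BYTES_PER_PB

print(f"{total_bytes:,} bytes, or about {total_petabytes:.1f} petabytes")
```

That works out at 109.5 petabytes, in line with the "roughly 100 petabytes" estimate.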

(Via C|net.)

30 August 2006

The UK Biobank Time-bomb

It sounds so exciting, so good:

UK Biobank is a long-term project aimed at building a comprehensive resource for medical researchers. The full project will get underway in 2006, when it will begin to gather information on the health and lifestyle of 500,000 volunteers aged between 40 and 69.

Following consent, each participant will be asked to donate a blood and urine sample, have some standard measurements (such as blood pressure) and complete a confidential lifestyle questionnaire. Over the next 20 to 30 years UK Biobank will allow fully approved researchers to use these resources to study the progression of illnesses such as cancer, heart disease, diabetes and Alzheimer’s disease. From this they hope to develop new and better ways of preventing, diagnosing and treating such problems.

Data and samples will only be used for ethically and scientifically approved research. Issues such as consent, confidentiality, and security of the data are guided by an Ethics and Governance Framework overseen by an independent council chaired by Professor Alastair V. Campbell of Bristol University.

But read the access policy, and you find this:

Access will not be permitted for police or forensic use except where required by court order. It is likely that UK Biobank will take steps to resist access for police or forensic use, in particular by seeking to be represented in all court applications for access in order to defend participants’ trust and public confidence in UK Biobank.

Since court orders can always be taken for granted given the right legislative framework, and since the current UK Government already has such a poor track record for invasive laws that create such frameworks, what this means in practice is that anyone taking part in this otherwise laudable scheme is creating a biological time-bomb.

Inside the main UK Biobank database will be their DNA, just waiting for somebody, someday - perhaps long after their death - to obtain that court order. Then, practically everything genomic about them will be revealed: genetic propensities, biological relationships, you name it. And, of course, it will provide the authorities with a reliable way of tracking them and, to a lesser extent, all their children, for ever.

I am sure that the UK Biobank will fight this kind of use; and I am equally sure that they will lose. Which is why my DNA will only form part of such a database over my dead body. Probably literally.

18 July 2006

The Mega-Important MicroRNAs

Yesterday, when I was writing about the structures found in DNA, I said:

Between the genes lie stretches of the main program that calls the subroutines

This is, of course, a gross over-simplification. One of the most interesting discoveries of recent years is that between your common or garden genes there are other structures that do not code for proteins, but for strings of RNA. It turns out that the latter play crucial roles in many biological processes, for example development. Indeed, they are fast emerging as one of genomics' superstars.

So it is only right that Nature Genetics should devote an entire issue to the subject; even better, it's freely available until August 2006. So get downloading now. Admittedly, microRNAs aren't the lightest of subject-matters, but they're mega-important.