Showing posts with label petabytes.

12 October 2009

Windows Does Not Scale

Who's afraid of the data deluge?


Researchers and workers in fields as diverse as bio-technology, astronomy and computer science will soon find themselves overwhelmed with information. Better telescopes and genome sequencers are as much to blame for this data glut as are faster computers and bigger hard drives.

While consumers are just starting to comprehend the idea of buying external hard drives for the home capable of storing a terabyte of data, computer scientists need to grapple with data sets thousands of times as large and growing ever larger. (A single terabyte equals 1,000 gigabytes and could store about 1,000 copies of the Encyclopedia Britannica.)

The next generation of computer scientists has to think in terms of what could be described as Internet scale. Facebook, for example, uses more than 1 petabyte of storage space to manage its users’ 40 billion photos. (A petabyte is about 1,000 times as large as a terabyte, and could store about 500 billion pages of text.)
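A quick back-of-envelope check on those figures (a sketch only; it assumes decimal units, 1 TB = 10^12 bytes and 1 PB = 10^15 bytes, and rounds "more than 1 petabyte" down to exactly one):

    TB = 10**12                     # 1 terabyte in bytes (decimal units)
    PB = 10**15                     # 1 petabyte in bytes

    photos = 40_000_000_000         # Facebook's 40 billion photos
    storage = 1 * PB                # "more than 1 petabyte", rounded down

    print(PB / TB)                  # 1000.0 -- a petabyte is ~1,000 terabytes
    print(storage / photos)         # 25000.0 -- roughly 25 KB per photo on average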

Certainly not GNU/Linux: in the latest Top500 supercomputer rankings, the GNU/Linux family accounts for 88.6% of the systems on the list. Windows? Glad you asked: 1%.

So, forget about whether there will ever be a Year of the GNU/Linux Desktop: the future is about massive data-crunchers, and there GNU/Linux already reigns supreme, and has done for years. It's Windows that's got problems....

Follow me @glynmoody on Twitter or identi.ca.

08 January 2007

Google Reaches for the Stars

One of the most important shifts in science at the moment is towards dealing with the digital deluge. Whether in the field of genomics, particle physics or astronomy, science is starting to produce data in not just gigabytes, or even terabytes, but petabytes, exabytes and beyond (zettabytes, yottabytes, etc.).

Take the Large Synoptic Survey Telescope, for starters:

The Large Synoptic Survey Telescope (LSST) is a proposed ground-based 8.4-meter, 10 square-degree-field telescope that will provide digital imaging of faint astronomical objects across the entire sky, night after night. In a relentless campaign of 15 second exposures, LSST will cover the available sky every three nights, opening a movie-like window on objects that change or move on rapid timescales: exploding supernovae, potentially hazardous near-Earth asteroids, and distant Kuiper Belt Objects. The superb images from the LSST will also be used to trace billions of remote galaxies and measure the distortions in their shapes produced by lumps of Dark Matter, providing multiple tests of the mysterious Dark Energy.

How much data?

Over 30 thousand gigabytes (30TB) of images will be generated every night during the decade-long LSST sky survey.

Or for those of you without calculators, that's 10 × 365 × 30 × 1,000,000,000,000 bytes (a decade of nights at 30TB a night), roughly 100 petabytes. And where there's data, there's also information; and where there's information...there's Google:
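Spelling that sum out in code (a trivial sketch, decimal units assumed):

    TB = 10**12                     # bytes per terabyte (decimal units)
    PB = 10**15                     # bytes per petabyte

    nightly = 30 * TB               # 30 TB of images per night
    total = nightly * 365 * 10      # every night for a decade

    print(total / PB)               # 109.5 -- call it roughly 100 petabytes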

Google has joined a group of nineteen universities and national labs that are building the Large Synoptic Survey Telescope (LSST).

...

"Partnering with Google will significantly enhance our ability to convert LSST data to knowledge," said University of California, Davis, Professor and LSST Director J. Anthony Tyson. "LSST will change the way we observe the universe by mapping the visible sky deeply, rapidly, and continuously. It will open entirely new windows on our universe, yielding discoveries in a variety of areas of astronomy and fundamental physics. Innovations in data management will play a central role."

(Via C|net.)

06 December 2006

Wayback: 85,898,456,616 and Counting

The Wayback Machine is one of the Internet's best-kept secrets:

A snapshot of the World Wide Web is taken every 2 months and donated to the Internet Archive by Alexa Internet. Further, librarians all over the world have helped curate deep and frequent crawls of sites that could be especially important to future researchers, historians and scholars.

As web pages are changed or deleted every 100 days, on average, having a resource like this is important for the preservation of our emerging cultural heritage.

And even for someone like me, who uses it all the time, numbers like this still take my breath away:

The Internet Archive's Wayback Machine now has 85,898,456,616 archived web objects in it

plus

The database contains over 1.5 petabytes of data that came from the web (that is 1.5 million gigabytes) which makes it one of the largest databases of any kind.
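Dividing one headline figure by the other gives a feel for what an average "archived web object" weighs (a rough sketch only, assuming decimal units and taking "over 1.5 petabytes" at face value):

    objects = 85_898_456_616        # archived web objects
    data = 1.5 * 10**15             # 1.5 petabytes, in bytes (decimal units)

    print(data / objects)           # ~17,460 bytes -- roughly 17 KB per object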

And a cyber-pearl beyond price. (Via Open Access News.)

15 November 2006

Crumbs from Google's Bigtable

For a company that is so big and important, Google is remarkably opaque to the outside world (blogs? - we don't need no stinkin' blogs.) Any info-morsels that drop from the Big Table are always welcome - which makes this downright gobbet of stuff about Bigtable particularly, er, meaty:

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance.
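For anyone who hasn't read the paper, the data model it describes is a sparse, distributed, persistent, multi-dimensional sorted map, keyed by row, column and timestamp, with values treated as uninterpreted strings of bytes. Here is a minimal single-process sketch of just that data model (an illustration, not Google's implementation; the example row and columns follow the paper's webtable example):

    import time

    # Toy Bigtable-style data model: (row key, column key, timestamp) -> bytes.
    # The real system splits the sorted row space into "tablets" and spreads
    # them across thousands of commodity servers; this is one dict in memory.
    table = {}

    def put(row, column, value, ts=None):
        """Store a value under (row, column, timestamp)."""
        table[(row, column, ts if ts is not None else time.time())] = value

    def scan(row_prefix):
        """Return cells whose row key starts with row_prefix, in sorted order."""
        return sorted((k, v) for k, v in table.items() if k[0].startswith(row_prefix))

    # The paper's example: web pages keyed by reversed URL, with a column
    # family for page contents and another for anchor text.
    put("com.cnn.www", "contents:", "<html>...</html>")
    put("com.cnn.www", "anchor:cnnsi.com", "CNN")

    for key, value in scan("com.cnn."):
        print(key, value)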

Get some while it's hot (and hasn't been taken down by the Google Thought Police.)