
14 July 2009

I Fear Microsoft Geeks Bearing Gifts...

Look, those nice people at Microsoft Research are saving science from its data deluge:

Addressing an audience of prominent academic researchers today at the 10th annual Microsoft Research Faculty Summit, Microsoft External Research Corporate Vice President Tony Hey announced that Microsoft Corp. has developed new software tools with the potential to transform the way much scientific research is done. Project Trident: A Scientific Workflow Workbench allows scientists to easily work with large volumes of data, and the specialized new programs Dryad and DryadLINQ facilitate the use of high-performance computing.

Created as part of the company’s ongoing efforts to advance the state of the art in science and help address world-scale challenges, the new tools are designed to make it easier for scientists to ingest and make sense of data, get answers to questions at a rate not previously possible, and ultimately accelerate the pace of achieving critical breakthrough discoveries. Scientists in data-intensive fields such as oceanography, astronomy, environmental science and medical research can now use these tools to manage, integrate and visualize volumes of information. The tools are available as no-cost downloads to academic researchers and scientists at http://research.microsoft.com/en-us/collaboration/tools.

Aw, shucks, isn't that just *so* kind? Doing all this out of the goodness of their hearts? Or maybe not:

Project Trident was developed by Microsoft Research’s External Research Division specifically to support the scientific community. Project Trident is implemented on top of Microsoft’s Windows Workflow Foundation, using the existing functionality of a commercial workflow engine based on Microsoft SQL Server and Windows HPC Server cluster technologies. DryadLINQ is a combination of the Dryad infrastructure for running parallel systems, developed in the Microsoft Research Silicon Valley lab, and the Language-Integrated Query (LINQ) extensions to the C# programming language.
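To be fair, the programming model on offer is genuinely attractive: you write a declarative query over a dataset and the runtime decides how to farm the work out across a cluster. The real DryadLINQ API is LINQ in C#; what follows is only a rough sketch of that data-parallel style, written in Python with the standard multiprocessing module and a made-up toy dataset.

    # Not the DryadLINQ API - just an illustration of the declarative,
    # data-parallel style it offers, rendered in Python for readability.
    from multiprocessing import Pool

    def count_bright_stars(partition):
        # Per-partition work: count observations brighter than magnitude 6.
        return sum(1 for magnitude in partition if magnitude < 6.0)

    if __name__ == "__main__":
        # Toy stand-in for a dataset partitioned across a cluster's nodes.
        partitions = [
            [2.1, 7.3, 5.9, 8.0],
            [1.4, 6.5, 3.3],
            [9.1, 4.2, 5.5, 6.9],
        ]
        with Pool() as pool:
            counts = pool.map(count_bright_stars, partitions)
        print(sum(counts))  # 6 - combined across all partitions

The selling point is that the same query, written once, can be pushed out to hundreds of machines - which is exactly where Microsoft's own cluster stack comes in.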

So basically Project Trident is more Project Trojan Horse - an attempt to get Microsoft HPC Server cluster technologies into the scientific community without anyone noticing. And why might Microsoft be so keen to do that? Maybe something to do with the fact that Windows currently runs just 1% of the top 500 supercomputing sites, while GNU/Linux has over 88% share.

Microsoft's approach here can be summed up as: accept our free dog biscuit, and be lumbered with a dog.

Follow me @glynmoody on Twitter or identi.ca.

13 December 2005

Driving Hard

Hard discs are the real engines of the computer revolution. More than rising processor speeds, it is the constantly expanding capacity of hard discs that has made most of the exciting recent developments possible.

This is most obvious in the case of Google, which now not only searches most of the Web and stores its (presumably vast) index on cheap hard discs, but also offers a couple of Gbytes of storage to everyone who uses - or will use - its Gmail. Greatly increased storage has also driven the MP3 revolution: the cheap availability of Gigabytes of storage means that people can - and do - store thousands of songs, and now routinely expect to have every song they want on tap, instantly.

Yet another milestone was reached recently, when even the Terabyte (=1,000 Gbytes) became a relatively cheap option. For most of us mere mortals, it is hard to grasp what this kind of storage will mean in practice. One person who has spent a lot of time thinking hard about such large-scale storage and what it means is Jim Gray, whom I had the pleasure of interviewing last year.

On his Web site (at Microsoft Research), he links to a fascinating paper by Michael Lesk that asks the question "How much information is there in the world?" (There is also a more up-to-date version available.) It is clear from the general estimates that we are fast approaching the day when it will be possible to have just about every piece of data (text, audio, video) that relates to us throughout our lives and to our immediate (and maybe not-so-immediate) world, all stored, indexed and cross-referenced on a hard disc somewhere.

Google and the other search engines already give us a glimpse of this "Information At Your Fingertips" (now where did I hear that phrase before?), but such all-encompassing Exabytes (1,000,000 Terabytes) go well beyond this.
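To put rough numbers on these units - the per-item sizes below are my own assumptions for illustration, not figures from Lesk or Gray - a quick back-of-the-envelope calculation:

    # Rough storage arithmetic (decimal units); the 4 MB-per-song and
    # 1 MB-per-page figures are assumptions, chosen only for illustration.
    MB = 1
    GB = 1_000 * MB          # a Gigabyte is a thousand Megabytes
    TB = 1_000 * GB          # a Terabyte is a thousand Gigabytes
    EB = 1_000_000 * TB      # an Exabyte is a million Terabytes

    print(TB // (4 * MB))    # 250,000 MP3s on a single Terabyte drive
    print(EB // (1 * MB))    # 1,000,000,000,000 pages of text in an Exabyte

Even a single Terabyte, in other words, dwarfs the "thousands of songs" that already feels extravagant today; an Exabyte is of another order altogether.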

What is interesting is how intimately this scaling process is related to the opening up of data. In fact, this kind of super-scaling, which takes us to realms several orders of magnitude beyond even the largest proprietary holdings of information, only makes sense if data is freely available for cross-referencing (something that cannot happen if there are isolated bastions of information, each with its own gatekeeper).

Once again, technological developments that have been in train for decades are pushing us inexorably towards an open future - whatever the current information monopolists might want or do.