30 April 2007

Of Modules, Atoms and Packages

I commented before that I thought Rufus Pollock's use of the term "atomisation" in the context of open content didn't quite capture what he was after, so I was pleased to find that he's done some more work on the concept and come up with the following interesting refinements:

Atomization

Atomization denotes the breaking down of a resource such as a piece of software or collection of data into smaller parts (though the word atomic connotes irreducibility it is never clear what the exact irreducible, or optimal, size for a given part is). For example a given software application may be divided up into several components or libraries. Atomization can happen on many levels.

At a very low level when writing software we break thinks down into functions and classes, into different files (modules) and even group together different files. Similarly when creating a dataset in a database we divide things into columns, tables, and groups of inter-related tables.

But such divisions are only visible to the members of that specific project. Anyone else has to get the entire application or entire database to use one particular part of it. Furthermore anyone working on any given part of one of the application or database needs to be aware of, and interact with, anyone else working on it — decentralization is impossible or extremely limited.

Thus, atomization at such a low level is not what we are really concerned with, instead it is with atomization into Packages:

Packaging

By packaging we mean the process by which a resource is made reusable by the addition of an external interface. The package is therefore the logical unit of distribution and reuse and it is only with packaging that the full power of atomization’s “divide and conquer” comes into play — without it there is still tight coupling between different parts of a given set of resources.

Developing packages is a non-trivial exercise precisely because developing good stable interfaces (usually in the form of a code or knowledge API) is hard. One way to manage this need to provide stability but still remain flexible in terms of future development is to employ versioning. By versioning the package and providing ‘releases’ those who reuse the packaged resource can use a specific (and stable) release while development and changes are made in the ‘trunk’ and become available in later releases. This practice of versioning and releasing is already ubiquitous in software development — so ubiquitous it is practically taken for granted — but is almost unknown in the area of knowledge.

Tricky stuff, but I'm sure it will be worth the effort if the end-result is a practical system for modularisation, since this will allow open content to enjoy many of the evident advantages of open code.

No comments: