(This was originally published in The H Open in November 2012.)
I was lucky enough to interview Linus
quite early in the history of Linux – back in 1996, when he was
still living in Helsinki (you can read the fruits of that meeting in
this old Wired feature). It was at an important moment for him, both
personally – his first child was born at this time – and in terms
of his career. He was about to join the chip design company
Transmeta, a move that didn't really work out, but led to him
relocating to America, where he remains today.
That makes his trips to Europe somewhat
rare, and I took advantage of the fact that he was speaking at the
recent LinuxCon Europe 2012 in Barcelona to interview him again, reviewing
the key moments for the Linux kernel and its community since we last
spoke.
Glyn Moody: Looking
back over the last decade and a half, what do you see as the key events
in the development of the kernel?
Linus Torvalds: One
big thing for me is all the scalability work that we did. We've gone
from being OK on 2 or 4 CPUs to the point where basically you can
throw 4000 [at it] – you won't scale perfectly, but most of the
time it's not the kernel that's the bottleneck. If your workload is
somewhat sane we actually scale really well. And that took a lot of
effort.
SGI in particular worked a lot on
scaling past a few hundred CPUs. Their initial patches could just
not be merged. There was no way we could take the work they did and
use it on a regular PC because they added all this infrastructure to
work on thousands of CPUs. That was way too expensive to do when you
had only a couple.
I was afraid for the longest time that
we would have the high-performance kernel for the big machines, and
the source code would be separate from the normal kernel. People
worked a lot on just making sure that we had a clean code base where
you can say at compile time that, hey, I want the kernel that works
for 4000 CPUs, and it generates the code for that, and at the same
time, if you say no, I want the kernel that works on 2 CPUs, the same
source code compiles.
It was something that in retrospect is
really important because it actually made the source code much
better. All the effort that SGI and others spent on unifying the
source code, actually a lot of it was clean-up – this doesn't work
for a hundred CPUs, so we need to clean it up so that it works. And
it actually made the kernel more maintainable. Now on the desktop 8
and 16 CPUs are almost common; it used to be that we had trouble
scaling to an 8, now it's like child's play.
But there's been other things too. We
spent years again at the other end, where the phone people were so
power conscious that they had ugly hacks, especially on the ARM side,
to try to save power. We spent years doing power management in
general, doing the kind of same thing – instead of having these
specialised power management hacks for ARM, and the few devices that
cellphone people cared about, we tried to make it across the kernel.
And that took like five years to get our power management working,
because it's across the whole spectrum.
Quite often when you add one device,
that doesn't impact any of the rest of the kernel, but power
management was one of those things that impacts all the thousands of
device drivers that we have. It impacts core functionality, like
shutting down CPUs, it impacts schedulers, it impacts the VM, it
impacts everything.
It not only affects everything, it has
the potential to break everything which makes it very painful. We
spent so much time just taking two steps forward, one step back
because we made an improvement that was a clear improvement, but it
broke machines. And so we had to take the one step back just to fix
the machines that we broke.
Realistically, every single release,
most of it is just driver work. Which is kind of boring in the sense
there is nothing fundamentally interesting in a driver, it's just
support for yet another chipset or something, and at the same time
that's kind of the bread and butter of the kernel. More than half of
the kernel is just drivers, and so all the big exciting smart things
we do, in the end it pales when compared to all the work we just do
to support new hardware.
Glyn Moody: What
major architecture changes have there been to support new hardware?
Linus Torvalds: The
USB stack has basically been re-written a couple of times just because
some new use-case comes up and you realise that hey, the original USB
stack just never took that into account, and it just doesn't work.
So USB 3 needs new host controller support and it turns out it's
different enough that you want to change the core stack so that it
can work across different versions. And it's not just USB, it's PCI,
and PCI becomes PCIe, and hotplug comes in.
That's another thing that's a huge
difference between traditional Linux and traditional Unix. You have
a [Unix] workstation and you boot it up, and it doesn't change
afterwards – you don't add devices. Now people are taking adding a
USB device for granted, but realistically that did not use to be the
case. That whole being able to hotplug devices, we've had all these
fundamental infrastructure changes that we've had to keep up with.
Glyn Moody: What
about kernel community – how has that evolved?
Linus Torvalds: It
used to be way flatter. I don't know when the change happened, but
it used to be me and maybe 50 developers – it was not a deep
hierarchy of people. These days, patches that reach me sometimes go
through four levels of people. We do releases every three months; in
every release we have like 1000 people involved. And 500 of the 1000
people basically send in a single line change for something really
trivial – that's how some people work, and some of them never do
anything else, and that's fine. But when you have a thousand people
involved, especially when some of them are just these drive-by
shooting people, you can't have me just taking patches from everybody
individually. I wouldn't have time to interact with people.
Some people just specialise in drivers,
they have other people who they know who specialise in that
particular driver area, and they interact with the people who
actually write the individual drivers or send patches. By the time I
see the patch, it's gone through these layers, it's seldom four, but
it's quite often two people in between.
Glyn Moody: So what
impact does that have on your role?
Linus Torvalds:
Well, the big thing is I don't read code any more. When a patch has
already gone through two people, at that point, I can either look at
the patch and say: no, all your work was wasted, and micromanage at
that level – and quite frankly I don't want to do that, and I don't
have the capacity to do that.
So most of the time, when it comes to
the major subsystem maintainers, I trust them because I've been
working with them for 5, 10, 15 years, so I don't even look at the
code. They tell me these are the changes and they give me a very
high-level overview. Depending on the person, it might be five lines
of text saying this is roughly what has changed, and then they give
me a diffstat, which just says 15 lines have changed in that file,
and 25 lines have changed in that file and diffstat might be a few
hundred lines because there's a few hundred files that have changed.
But I don't even see the code itself, I just say: OK, the changes
happen in these files, and by the way, I trust you to change those
files, so that's fine. And then I just say: I'll take it.
Glyn Moody: So
what's your role now?
Linus Torvalds:
Largely I'm managing people. Not in the logistical sense – I
obviously don't pay anybody, but I also don't have to worry about
them having access to hardware and stuff like that. Largely what
happens is I get involved when people start arguing and there's
friction between people, or when bugs happen.
Bugs happen all the time, but quite
often people don't know who to send the bug report to. So they will
send the bug report to the Linux Kernel mailing list – nobody
really is able to read it much. After people don't figure it out on
the kernel mailing list, they often start bombarding me, saying: hey,
this machine doesn't work for me any more. And since I didn't even
read the code in the first place, but I know who is in charge, I end
up being a connection point for bug reports and for the actual change
requests. That's all I do, day in and day out, is I read email. And
that's fine, I enjoy doing it, but it's very different from what I
did.
Glyn Moody: So does
that mean there might be scope for you to write another tool like
Git, but for managing people, not code?
Linus Torvalds: I
don't think we will. There might be some tooling, but realistically
most of the things I do tend to be about human interaction. So we do
have tools to figure out who's in charge. We do have tools to say:
hey, we know the problem happens in this area of the code, so who
touched that code last, and who's the maintainer of that subsystem,
just because there are so many people involved that trying to keep
track of them any other way than having some automation just doesn't
work. But at the same time most of the work is interaction, and
different people work in different ways, so having too much
automation is actually painful for people.
We're doing really well. The kind of
pain points we had ten years ago just don't exist any more. And
that's largely because we used to be this flat hierarchy, and we just
fixed our tools, we fixed our workflows. And it's not just me, it's
across the whole kernel: there's no single person who's in the way of
any particular workflow.
I get a fair amount of email, but I
don't even get overwhelmed by email. I love reading email on my
cellphone when I travel, for example. Even during breaks, I'll read
email on my cellphone because 90% of them I can just read for my
information that I can archive. I don't need to do anything, I was
cc'd because there was some issue going on, I need to be aware of it,
but I don't need to do anything about that. So I can do 90% of my
work while travelling, even without having a computer. In the
evening, when I go back to the hotel room, I'll go through [the other
10%].
Glyn Moody: 16 years
ago, you said you were mostly driven by what the outside world was
asking for; given the huge interest in mobiles and tablets, what has
been their impact on kernel development?
Linus Torvalds: In
the tablet space, the biggest issue tends to be power management,
largely because they're bigger than phones. They have bigger
batteries, but on the other hand people expect them to have longer
battery life and they also have bigger displays, which use more
battery. So on the kernel side, a tablet from the hardware
perspective and a usage perspective is largely the same thing as a
phone, and that's something we know how to do, largely because of
Android.
The user interface side of a tablet
ends up being where the pain points have been – but that's far
enough removed from the kernel. On a phone, the browser is not a
full browser – they used to have the mobile browsers; on the tablets,
people really expect to have a full browser – you have to be able
to click that small link thing. So most of the tablet issues have
been in the user space. We did have a lot of issues in the kernel
over the phones, but tablets kind of we got for free.
Glyn Moody: What
about cloud computing: what impact has that had on the kernel?
Linus Torvalds: The
biggest impact has been that even on the server side, but especially
when it comes to cloud computing, people have become much more aware
[of power consumption]. It used to be that all the power work
originally happened for embedded people and cellphones, and just in
the last three-four years it's the server people who have become very
power aware. Because they have lots of them together; quite often
they have high peak usage. If you look at someone like Amazon, their
peak usage is orders of magnitude higher than their regular idle
usage. For example, just the selling side of Amazon, late November,
December, the one month before Christmas, they do as much business as
they do the rest of the year. The point is they have to scale all
their hardware infrastructure for the peak usage that most of the
rest of the year they only use a tenth of that capacity. So being
able to not use power all the time [is important] because it turns
out electricity is a big cost of these big server providers.
Glyn Moody: Do
Amazon people get involved directly with kernel work?
Linus Torvalds:
Amazon is not the greatest example, Google is probably better because
they actually have a lot of kernel engineers working for them. Most
of the time the work gets done by Google themselves. I think Amazon
has had a more standard components thing. Actually, they've changed
the way they've built hardware – they now have their own hardware
reference design. They used to buy hardware from HP and Dell, but it
turns out that when you buy 10,000 machines at some point it's just
easier to design the machines yourself, and to go directly to the
original equipment manufacturers and say: I want this machine, like
this. But they only started doing that fairly recently.
I don't know whether [Amazon] is behind
the curve, or whether Google is just more technology oriented. Amazon
has worked more on the user space, and they've used a fairly standard
kernel. Google has worked more on the kernel side, they've done
their own file systems. They used to do their own drivers for their
hard discs because they had some special requirements.
Glyn Moody: How
useful has Google's work on the kernel been for you?
Linus Torvalds: For
a few years – this is five or ten years ago – Google used to be this
black hole. They would hire kernel engineers and they would
completely disappear from the face of the earth. They would work
inside Google, and nobody would ever hear from them again, because
they'd do this Google-specific stuff, and Google didn't really feed
back much.
That has improved enormously, probably
because Google stayed a long time on our previous 2.4 releases. They
stayed on that for years, because they had done so many internal
modifications for their specialised hardware for everything, that
just upgrading their kernel was a big issue for them. And partly
because of the whole Android project they actually wanted to be much
more active upstream.
Now they're way more active, people
don't disappear there any more. It turns out the kernel got better,
to the point where a lot of their issues just became details instead
of being huge gaping holes. They were like, OK, we can actually use
the standard kernel and then we do these small tweaks on top instead
of doing these big surgeries to just make it work on their
infrastructure.
Glyn Moody: Finally,
you say that you spend most of your time answering email: as someone
who has always seemed a quintessential hacker, does that worry you?
Linus Torvalds: I
wouldn't say that worries me. I end up not doing as much programming
as sometimes I'd like. On the other hand, it's like some kinds of
programming I don't want to do any more. When I was twenty I liked
doing device drivers. If I never have to do a single device driver
in my life again, I will be happy. Some kinds of headaches I can do
without.
I really enjoyed doing Git, it was so
much fun. When I started the whole design, started doing programming
in user space, which I had not done for 15 years, it was like, wow,
this is so easy. I don't need to worry about all these things, I
have infinite stack, malloc just works. But in the kernel space, you
have to worry about locking, you have to worry about security, you
have to worry about the hardware. Doing Git, that was such a relief.
But it got boring.
The other project I still am involved
in is the dive computer thing. We had a break-in on the kernel.org site.
It was really painful for the maintainers, and the FBI got involved
just figuring out what the hell happened. For two months we had
almost no kernel development – well, people were still doing kernel
development, but the main site where everybody got together was down,
and a lot of the core kernel developers spent a lot of time checking
that nobody had actually broken into their machines. People got a
bit paranoid.
So for a couple of months my main job,
which was to integrate work from other people, basically went away,
because our main integration site went away. And I did my divelog
software, because I got bored, and that was fun. So I
still do end up doing programming, but I always come back to the
kernel in the end.