02 April 2008

Linux: Too Much of a Good Thing?

The transformation of the Linux Foundation from a rather sleepy, peripheral player into one of the main voices for open source has been fascinating to watch. It's certainly welcome, too, because one of the problems of Linux in particular, and of open source in general, is that distributed production has tended to dissipate the message rather than focus it.

Now, in addition to a useful series of interviews with open source luminaries, the Linux Foundation is getting into surveys:

The Linux Foundation (LF), the nonprofit organization dedicated to accelerating the growth of Linux, today announced it is publishing a new report written by kernel developers Jonathan Corbet and Greg Kroah-Hartman, and LF Director of Marketing Amanda McPherson.

The report titled “Linux Kernel Development: How Fast is it Going, Who is doing it and Who is Sponsoring it?” is available today. The paper finds that over the last three years the number of developers contributing to the kernel has tripled and that there has been a significant increase in the number of companies supporting kernel development.

Even though Linux has achieved near-ubiquity as a technology platform powering Internet applications, corporate servers, embedded and mobile devices and desktops, mainstream users know very little about how Linux is actually developed. This community paper exposes those dynamics and describes a large and distributed developer and corporate community that supports the expansion and innovation of the Linux kernel. The Linux kernel has become a common resource developed on a massive scale by companies who are fierce competitors in other areas.


Among its findings:

o Every Linux kernel is being developed by nearly 1,000 developers working for more than 100 different corporations. This is the foundation for the largest distributed software development project in the world.

o Since 2005, the number of active kernel developers has tripled, reflecting the growing importance of Linux in the embedded systems, server, and desktop markets.

o Between 70 and 95 percent of those developers are being paid for their work, dispelling the “hobbyist” myth present from the start of open source development.

...

o More than 70 percent of total contributions to the kernel come from developers working at a range of companies including IBM, Intel, The Linux Foundation, MIPS Technology, MontaVista, Movial, NetApp, Novell and Red Hat. These companies, and many others, find that by improving the kernel they have a competitive edge in their markets.

But one result seems slightly worrying to me:

o An average of 3,621 lines of code are added to the kernel tree every day, and a new kernel is released approximately every 2.7 months.

o The kernel, since 2005, has been growing at a steady rate of 10 percent per year.

Surely that means that Linux is steadily becoming more and more bloated? I've always been of the view that one of Linux's great virtues is leanness, especially compared to a Certain Other operating system. While change can be good, I don't think that more is necessarily better when it comes to lines of code. Perhaps the Linux Foundation's next project could be to study how much of the kernel could be trimmed away to return it to its earlier, svelte self.
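
To put that 10 percent a year in perspective: if the rate holds, compound growth means the source tree doubles roughly every seven years (1.1^7.27 ≈ 2).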

37 comments:

  1. On the 10% increase in the size of the kernel, could that be due to the number of drivers being added to it, given that drivers mostly run in kernel space?

    If that is the case, then the actual kernel may not be growing, just the number of devices it supports.

  2. This issue has come up before, but keep in mind that the kernel isn't like a Web browser or word processor. It's responsible for abstracting access to the system's hardware, and the steady increase in the size of "the kernel" is primarily due to the addition of new drivers. As long as you don't monolithically compile in everything under the sun, you still only load the drivers you actually need for your hardware.
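
    For anyone curious, this is roughly the shape of one of those loadable pieces: a minimal, do-nothing module sketch against the standard 2.6-era module API (the "example" names are purely illustrative, not from any real driver):

        #include <linux/init.h>
        #include <linux/module.h>

        MODULE_LICENSE("GPL");
        MODULE_DESCRIPTION("Illustrative do-nothing module");

        /* Called once when the module is loaded, e.g. via modprobe. */
        static int __init example_init(void)
        {
                printk(KERN_INFO "example: loaded\n");
                return 0;
        }

        /* Called when the module is unloaded again. */
        static void __exit example_exit(void)
        {
                printk(KERN_INFO "example: unloaded\n");
        }

        module_init(example_init);
        module_exit(example_exit);

    Real drivers have the same skeleton, with hardware registration in place of the printk calls; until the module is actually loaded, it costs nothing at run time.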

  3. Anonymous, 9:44 am

    I wouldn't worry about it.
    Most of the extra code is going into new drivers and platform support.

    If the code isn't for your platform, it won't be compiled, and so won't be running; and if you're not using the newly developed drivers (presumably because you don't have the hardware they support), the kernel won't load the modules, so again, they won't be running.

    The Linux kernel is actually getting /faster/ in each release.

    All is well.

  4. Anonymous, 10:31 am

    Come on, Glyn. That's a very weak conclusion to draw. Stop and think about it for a moment (and perhaps try to avoid using loaded emotional terms like "bloat").

    More architectures have been added. More orthogonal components like filesystems are added. More drivers for hardware (remembering that a goal of the kernel is to have as many drivers in-tree as possible). This is adding features, and many of them require non-trivial numbers of lines of source code to be added. Ditto for inline documentation (which has increased over the last few years).

    Also, putting in a number of safety features to make functionality harder to misuse sometimes requires more code at the source level. Has a particular kernel containing the same featureset increased dramatically and unnecessarily in its compiled size? If not, then it's certainly not "bloat".

    Comparing apples to oranges is only good when making fruit salad, not when drawing conclusions about code quality. If you don't know why the code increase is occurring, then concluding that the simple increase of the line count is bad is unjustified. It's about as useful as saying "I like blue". There might well be reasons to dislike the symptom, but you have to dig much deeper than you have.

  5. Anonymous, 12:15 pm

    Is it possible they are counting the addition of drivers in that line count? If that was the case then it, at least to me, would be a good thing.

  6. Anonymous, 2:41 pm

    "Surely that means that Linux is steadily becoming more and more bloated?"

    Though not 100% of it, a great deal of the extra code added to the kernel is in new hardware drivers, alternative process schedulers and other additional features which may replace their legacy equivalents.

    As we (the users) can compile the kernel as we wish, many of the new features, or the old ones they replace, can be disabled at build time, so the binary kernel that runs on your computer is not necessarily bigger at all. Compatibility with legacy features and systems is what keeps a great deal of the "old code" in the kernel. You don't have to use it :-)

    Unlike closed source monopolies, you have the control and choice...

  7. While I'm no expert on the Linux kernel by any means, my feeling is that the kernel grows mainly for two reasons: adding more device support, and adding new features (e.g. no_hz, preempt_rt). The device support is not a big issue, since you can exclude the modules you don't need; so while the code expands, the actual kernel binary running on your machine is not much affected. The new features, again, mean improvements in kernel behaviour (e.g. lower power consumption), so kernel size is a reasonable sacrifice in my opinion.

  8. Anonymous, 9:19 pm

    While "bloat" is always going to be rather subjective, I think Linux is some way from acquiring that label just yet.

    Unlike the leading commercial alternative, Linux comes with pretty much every driver you'll need. Every time they add support for a new chipset or esoteric USB peripheral they add more code.

    And, kind of connected, Linux is much more modular. So although there is more code to be supported, there might not be more code actually running. Most people don't run ext4, JFS, RAID, FireWire, Bluetooth, etc.

    For more numbers see here. Note the high number of changes in the drivers directory and the number of lines that were actually removed.

    As long as the core, the bit that everyone runs, is small and stable, I'm not sure it's a problem, provided there are enough people to support the modules/drivers.

  9. Well, the vast majority of the Linux kernel code is drivers, CPU architecture-support code, hardware subsystems' code (USB, PCI, ALSA, etc.), and so forth. This code constantly accumulates as the kernel adds support for new hardware (partly because hardware vendors constantly ship new hardware) and new architectures. I don't see a point in trying to reduce this particular aspect of the code, because it is manifested in many kernel modules that are not loaded by default, or not even compiled.

    The core Linux kernel could use some trimming, but if you still want to support more and more hardware, you need a lot of driver code.

    One way I originally thought of to mitigate the problem of the huge kernel codebase (and other problems I've seen) is the Comprehensive Linux (Kernel) Archive Network. However, I lost interest in it very quickly, and now think it may have some disadvantages compared with a unified and central kernel source, such as problems in upgrading to newer APIs across modules. My original proposal had a way to create various branches of the modules and even to manage patches intelligently (similar to what's proposed for CPAN6), but it may still add a lot of complexity to the process of managing the kernel.

  10. Anonymous, 3:42 am

    About kernel bloat: being modular, I don't think bloat is an issue with the Linux kernel. Power users can go in and 'remove' parts of the kernel that they don't need or won't use for their servers. That gives them a very lean kernel with only the stuff they will actually use.

    So yes, the kernel is growing, and that's good. It means support for more hardware and peripherals.

  11. Anonymous, 3:47 am

    Just because the size of the kernel is growing doesn't mean the kernel is becoming BLOATED...

    Remember that Linux uses modules... as such it may mean that more hardware/features are being added... and likely to go into modules.

    So unless you're loading them... you won't see any problem in memory consumption.

  12. Anonymous, 4:42 am

    I wouldn't worry too much because:

    - Thousands of lines of code compile down to significantly smaller quantities of executable binary.

    - The kernel is modular. A lot of those lines of code are supporting hardware you don't use and therefore won't be loading the module for, rather than being in the core of the kernel itself.

  13. Anonymous, 5:28 am

    I always had the impression that most of that is drivers. Inevitably as new devices are created and old drivers remain for use in older hardware, the amount of driver code continues to increase.

    That doesn't necessarily mean that the working code is becoming bloated.

  14. Anonymous, 5:56 am

    With respect to your worry that Linux is becoming "bloated" and may need trimming to return to its earlier svelte self (let's call it Sveltix just for fun), the natural process for this in the free software world is called a "fork".

    Forking a project is usually viewed as a Bad Thing. However, if Linux indeed becomes unusable for a significant market, while remaining well suited for its mainstream market, then forking Sveltix from Linux to better address both markets is a Good Thing instead.

    Note that I disagree we need Sveltix, as kernel customizations appear feasible to address most of the market space. But it's comforting (and motivating) to remember that the option exists.

  15. Anonymous, 6:48 am

    A lot of the new code is new drivers, which don't really bloat the design of the kernel. I also had the impression that the 3000+ lines number was lines changed, not lines added. The number of added lines is still very large, though.

  16. Yes, I agree, it does seem a bit dangerous that the kernel is both changing and growing at such an extreme pace. I mean, 3,600 changed lines of code in a day is sick...

    But I guess a large part of that is comments, "external" drivers, manual pages and other peripheral material.

  17. The key question here is modularity, not scale. That said, Andy's microkernel peeve comes to mind.

  18. Anonymous, 8:33 am

    An increase in the number of lines of code in the Linux kernel doesn't mean a more bloated kernel. The increase in size is not in the central kernel features but in hardware support and specific features, usually available as modules. So a larger kernel source base means more possibilities for kernel integrators, not a more bloated kernel.

  19. Anonymous, 9:39 am

    When thousands of extra LOC go into dozens or hundreds of modules WHICH ARE LOADED ON DEMAND, the kernel is still lean. Kernel bloat is only a worry if a single module gets mired in a fiddly mess.

    Far more serious code bloat is in the applications.

  20. Anonymous, 11:25 am

    There is no need to 'trim' the kernel; it isn't becoming bloated, and it never will, because of the very nature of Linux. Once you install Linux on a machine it automatically becomes 'lean': it installs itself and everything necessary to function, but nothing more, no matter the platform you're installing Linux on.

    Those thousands of lines that are added all the time to the kernel are bug fixes, new drivers etc.

    You can have a look at the Kernel Mailing List here http://lkml.org/ as well as the following site: http://kerneltrap.org/ to see a bit more of what's going on on an everyday basis.

    Pretty much everything that is added to the kernel these days is drivers; very little is new features (of course, new features are added, but not in the majority of updates).

    There was a need for a distributed system that allows those thousands of developers to work together and merge their work very quickly. That need was met by Linus Torvalds himself, who (with other developers) created Git (http://git.or.cz/) to replace BitKeeper.

    Git allows fast and regular merging through an excellent tree-merging system, with the choice at any moment to come back to any previous version of the code from any branch that exists in the Git tree.

    Linux kernel development is based on a network of trust (Linus Torvalds's network of trust): patches and updates are 'merged' into the kernel by Linus depending on who wrote the patch (basically, whom he trusts).

    This has been the way of it for the last two or three years (since they adopted Git), and the kernel has never been as good and as active as it is now; it's becoming anything but bloated and out of control.

    - bentob0x

  21. This is an interesting conclusion, and one which I really don't think I agree with. While the kernel is growing at a steady rate, what I will call the effective kernel, that which is used by a single user, isn't.

    While the kernel NEEDS to grow to accommodate new technologies and support an ever-increasing array of hardware, as an end user you're never going to need more than the modules in use for your particular hardware. On most machines, that means a custom kernel could throw away the vast majority of the code as unnecessary. And the Linux kernel's already great module system makes even a custom kernel unnecessary for most distributions.

    Even then, if you want to talk about bloat, it's the end-user applications that make bloat more noticeable than the kernel ever will. Your system is seeming slow? Try switching from GNOME or KDE (which I believe have both taken on quite a bit of bloat) to Xfce, Fluxbox, or Window Maker. Ah, it's fast and lean again! OpenOffice got you down? GOffice or KOffice to the rescue! It's this slew of options for every task, ranging from "bloaty with lots of user-friendly features" to "bare minimum of helpers", that makes GNU/Linux such an amazing tool.

    Also, bloat doesn't necessarily mean inefficient. All of the examples I gave above as "bloat" are still very efficient programs that make the most of the resources given to them. Except maybe OpenOffice.....

  22. Anonymous, 1:41 pm

    I don't believe that a large number of lines being added to the kernel each day equates to bloat, because a lot of those lines are for new architecture support, modular drivers and other modular features. The real measure of bloat would be how much is added to the core kernel code that must be loaded all the time, not the number of lines added to the whole tree; I suspect the former is much lower than the number in the report.

  23. Anonymous, 1:45 pm

    Is it bloat? Or features? New powerful kernel features greatly expand how easy it is to mold Linux to new uses.

    Example: Asus uses UnionFS in an innovative way to have the "restore" partition simply be the union mounted base image of the system. Simply zap the 2nd user partition and you are "factory restored" in 30 seconds.

    KVM is a new feature that makes virtualization *EASY*.

    FUSE. Laptop support for Suspend and Hibernate. And more new goodies. Is it bloat or features?

    Is it device drivers? If the kernel size were growing mostly as a function of device drivers, this would be a good thing.

  24. Anonymous, 2:46 pm

    The fact that the kernel tree is growing doesn't necessarily mean that the kernel is becoming bloated. Can you point to performance benchmarks to support your hypothesis?

    I doubt that the kernel performance has gone down over time, especially when one compiles the necessary kernel options as modules in order to keep the kernel as lean as possible.

  25. Isn't it true, however, that a lot of that additional code is to accommodate new and emerging technology and hardware support? At the same time, while the Linux kernel in total might grow and grow, for my particular system, can't I build a kernel binary that is lean since it is targeted to my own use? That's something that will never be possible on Certain Other Systems.

  26. It is definitely better!

    Just for starters, think of hardware support. Every hardware driver you implement adds lines of code, so the line count in the kernel source tree has to increase if you want to support more hardware.

    Of course, if you want to keep the lean status, you can just compile your kernel without the support for all the hardware you don't have.

    Anyway, these are most of the time compiled as modules that are loaded/unloaded as needed, keeping things lean automatically.

    Go Linux!

  27. Anonymous, 5:45 pm

    The devil is always in the details. First, you are talking about averages. And even at that, using the numbers cited here, a little over 3.5 lines of code per developer per day are being added. That is below the rule-of-thumb 10 lines a day.
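
    Spelling that average out with the report's own headline figures (3,621 lines a day, nearly 1,000 developers):

        3,621 lines/day ÷ ~1,000 developers ≈ 3.6 lines per developer per day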

    Also, where are those changes occurring? If the bulk is in driver modules, then bloat is not the issue; how much testing is performed on relatively low-use devices would be of much more concern. Since drivers have direct access to kernel data, they can create more serious problems than apps.

    Another thing about averages is that they tell you nothing about the largest and smallest contributions. Are there dozens of 100+ line changes or just a few? Actually, large changes may be preferable because they tend to get more attention. But a one-liner can be catastrophic under the right conditions.

    So while the surveys are a good start, I would not suggest using them to set policy just yet.

    Later . . . Jim

  28. I suppose "bloated" could be a term of art, if what we were worried about was the kernel becoming excessively complex.

    But I don't see that happening. The layers that are definitively "the Linux kernel", like the SCI, process manager, virtual file system manager, memory manager and network stack, are all still as lean and mean as they were five years ago. If anything, they've gotten better.

    Those "lines of code," when they apply to the bottom layers mentioned above, refer to different implementations from which one can pick and choose: SLUB vs SLAB memory management, or the low-latency "desktop" process manager vs the "server" process manager. These represent the evolution of the kernel, not necessarily its bloating.

    The biggest suite of changes to the kernel is in device drivers: new devices are being added every day. Most of the "thousands of developers" are working on device drivers: each has one particular peripheral that he needs working, and so he contributes to the development of that device driver.

    I suppose if you're an all-things-to-all-people house like Red Hat or Ubuntu, the kernel will be big because it has to include everything an install might encounter. For embedded device manufacturers, however, Linux can easily be pared down to the bare essentials: a lean kernel arguably faster and more efficient than any "smaller" kernel in the past.

    When users complain about bloat in a commercial desktop vendor like MS or Apple, we talk about the whole package. Comparing just the kernel to an MS or Apple install is an apples to orchard comparison. It's better to talk about the layers in isolation, and how they succeed because they're isolated (kernel, X, Gnome/KDE, Applications) and talk about the value proposition brought by each.

    As a longtime Gentoo user and device driver contributor (Sidewinder Joystick with Gameport-- hey, I wanted to play Freespace 2, okay?), I don't see bloat at the kernel level as being an issue at all.

  29. Anonymous, 7:26 pm

    There is also a graph in the article that shows the lines of code that have been deleted from the kernel source. It's not just a matter of code being added; many lines of obsolete code are being removed as well.

  30. Anonymous, 12:57 am

    I agree there is a code bloat issue. It may not matter too much for a high-end server, but it very well could be an issue - even today - for network edge devices and mobile/embedded/consumer devices. I document one possible solution, from a micro-kernel perspective, in my blog post here:

    The Value of a Board Support Package

  31. Read the rest of the report. A lot of the growth is due to drivers as more hardware is supported. The kernel itself sees a lot of churn but I wouldn't describe it as bloat.

  32. Anonymous, 12:16 pm

    It would be interesting to know where these lines of code are added, and also what method was used to estimate those numbers.
    If the said lines are mostly added to drivers/modules, as I think they might be, then it just means more hardware being supported, or modular code being added that is probably not compiled in most distros' kernels.

  33. I wouldn't say that the kernel is becoming more bloated. If you follow the release process for a while, you will notice that most added lines go into drivers and the like. There was even a recent change to semaphores that replaced some arch-specific code with much simpler (and smaller) generic code.

  34. Maybe there are too many drivers....

    Sorry about the delay - Google is eating comments....

  35. Thanks to everyone for their enlightening comments - and apologies again for the huge delay in posting: Google is treating most comments as spam, and only now am I retrieving them....

  36. You are confusing the amount of SOURCE CODE with the size of a compiled kernel.

    The Linux tarball growing just means more drivers, docs, comments and features...

    BUT when you compile the kernel, most of the capabilities are TURNED OFF and not compiled, and therefore don't go into making a bloated compiled kernel.

    END OF STORY
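
    To make that concrete, here is a sketch of the pattern the kernel source uses for optional capabilities. CONFIG_EXAMPLE_FEATURE is a made-up config symbol (real options follow the same pattern under their own CONFIG_* names); code behind a disabled option simply never reaches the compiled kernel:

        #ifdef CONFIG_EXAMPLE_FEATURE
        /* Built only when the option is switched on in the kernel config. */
        int example_feature_init(void)
        {
                /* ... real setup work would live here ... */
                return 0;
        }
        #else
        /* Option off: callers get a stub the compiler optimises away. */
        static inline int example_feature_init(void)
        {
                return 0;
        }
        #endif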

  37. @Matt: well, that's why I wrote about source code. What concerns me is that as more lines are added, the whole thing becomes more complex - whether or not it is compiled.
