Planet Skolelinux

September 16, 2014

Petter Reinholdtsen

Speeding up the Debian installer using eatmydata and dpkg-divert

The Debian installer could be a lot quicker. When we install more than 2000 packages in Skolelinux / Debian Edu using tasksel in the installer, unpacking the binary packages take forever. A part of the slow I/O issue was discussed in bug #613428 about too much file system sync-ing done by dpkg, which is the package responsible for unpacking the binary packages. Other parts (like code executed by postinst scripts) might also sync to disk during installation. All this sync-ing to disk do not really make sense to me. If the machine crash half-way through, I start over, I do not try to salvage the half installed system. So the failure sync-ing is supposed to protect against, hardware or system crash, is not really relevant while the installer is running.

A few days ago, I thought of a way to get rid of all the file system sync()-ing in a fairly non-intrusive way, without the need to change the code in several packages. The idea is not new, but I have not heard anyone propose the approach using dpkg-divert before. It depend on the small and clever package eatmydata, which uses LD_PRELOAD to replace the system functions for syncing data to disk with functions doing nothing, thus allowing programs to live dangerous while speeding up disk I/O significantly. Instead of modifying the implementation of dpkg, apt and tasksel (which are the packages responsible for selecting, fetching and installing packages), it occurred to me that we could just divert the programs away, replace them with a simple shell wrapper calling "eatmydata $program $@", to get the same effect. Two days ago I decided to test the idea, and wrapped up a simple implementation for the Debian Edu udeb.

The effect was stunning. In my first test it reduced the running time of the pkgsel step (installing tasks) from 64 to less than 44 minutes (20 minutes shaved off the installation) on an old Dell Latitude D505 machine. I am not quite sure what the optimised time would have been, as I messed up the testing a bit, causing the debconf priority to get low enough for two questions to pop up during installation. As soon as I saw the questions I moved the installation along, but do not know how long the question were holding up the installation. I did some more measurements using Debian Edu Jessie, and got these results. The time measured is the time stamp in /var/log/syslog between the "pkgsel: starting tasksel" and the "pkgsel: finishing up" lines, if you want to do the same measurement yourself. In Debian Edu, the tasksel dialog do not show up, and the timing thus do not depend on how quickly the user handle the tasksel dialog.

Machine/setup Original tasksel Optimised tasksel Reduction
Latitude D505 Main+LTSP LXDE 64 min (07:46-08:50) 44 min (11:27-12:11) >20 min 18%
Latitude D505 Roaming LXDE 57 min (08:48-09:45) 34 min (07:43-08:17) 23 min 40%
Latitude D505 Minimal 22 min (10:37-10:59) 11 min (11:16-11:27) 11 min 50%
Thinkpad X200 Minimal 6 min (08:19-08:25) 4 min (08:04-08:08) 2 min 33%
Thinkpad X200 Roaming KDE 19 min (09:21-09:40) 15 min (10:25-10:40) 4 min 21%

The test is done using a netinst ISO on a USB stick, so some of the time is spent downloading packages. The connection to the Internet was 100Mbit/s during testing, so downloading should not be a significant factor in the measurement. Download typically took a few seconds to a few minutes, depending on the amount of packages being installed.

The speedup is implemented by using two hooks in Debian Installer, the pre-pkgsel.d hook to set up the diverts, and the finish-install.d hook to remove the divert at the end of the installation. I picked the pre-pkgsel.d hook instead of the post-base-installer.d hook because I test using an ISO without the eatmydata package included, and the post-base-installer.d hook in Debian Edu can only operate on packages included in the ISO. The negative effect of this is that I am unable to activate this optimization for the kernel installation step in d-i. If the code is moved to the post-base-installer.d hook, the speedup would be larger for the entire installation.

I've implemented this in the debian-edu-install git repository, and plan to provide the optimization as part of the Debian Edu installation. If you want to test this yourself, you can create two files in the installer (or in an udeb). One shell script need do go into /usr/lib/pre-pkgsel.d/, with content like this:

#!/bin/sh
set -e
. /usr/share/debconf/confmodule
info() {
    logger -t my-pkgsel "info: $*"
}
error() {
    logger -t my-pkgsel "error: $*"
}
override_install() {
    apt-install eatmydata || true
    if [ -x /target/usr/bin/eatmydata ] ; then
        for bin in dpkg apt-get aptitude tasksel ; do
            file=/usr/bin/$bin
            # Test that the file exist and have not been diverted already.
            if [ -f /target$file ] ; then
                info "diverting $file using eatmydata"
                printf "#!/bin/sh\neatmydata $bin.distrib \"\$@\"\n" \
                    > /target$file.edu
                chmod 755 /target$file.edu
                in-target dpkg-divert --package debian-edu-config \
                    --rename --quiet --add $file
                ln -sf ./$bin.edu /target$file
            else
                error "unable to divert $file, as it is missing."
            fi
        done
    else
        error "unable to find /usr/bin/eatmydata after installing the eatmydata pacage"
    fi
}

override_install

To clean up, another shell script should go into /usr/lib/finish-install.d/ with code like this:

#! /bin/sh -e
. /usr/share/debconf/confmodule
error() {
    logger -t my-finish-install "error: $@"
}
remove_install_override() {
    for bin in dpkg apt-get aptitude tasksel ; do
        file=/usr/bin/$bin
        if [ -x /target$file.edu ] ; then
            rm /target$file
            in-target dpkg-divert --package debian-edu-config \
                --rename --quiet --remove $file
            rm /target$file.edu
        else
            error "Missing divert for $file."
        fi
    done
    sync # Flush file buffers before continuing
}

remove_install_override

In Debian Edu, I placed both code fragments in a separate script edu-eatmydata-install and call it from the pre-pkgsel.d and finish-install.d scripts.

By now you might ask if this change should get into the normal Debian installer too? I suspect it should, but am not sure the current debian-installer coordinators find it useful enough. It also depend on the side effects of the change. I'm not aware of any, but I guess we will see if the change is safe after some more testing. Perhaps there is some package in Debian depending on sync() and fsync() having effect? Perhaps it should go into its own udeb, to allow those of us wanting to enable it to do so without affecting everyone.

September 16, 2014 12:00 PM

September 10, 2014

Petter Reinholdtsen

Good bye subkeys.pgp.net, welcome pool.sks-keyservers.net

Yesterday, I had the pleasure of attending a talk with the Norwegian Unix User Group about the OpenPGP keyserver pool sks-keyservers.net, and was very happy to learn that there is a large set of publicly available key servers to use when looking for peoples public key. So far I have used subkeys.pgp.net, and some times wwwkeys.nl.pgp.net when the former were misbehaving, but those days are ended. The servers I have used up until yesterday have been slow and some times unavailable. I hope those problems are gone now.

Behind the round robin DNS entry of the sks-keyservers.net service there is a pool of more than 100 keyservers which are checked every day to ensure they are well connected and up to date. It must be better than what I have used so far. :)

Yesterdays speaker told me that the service is the default keyserver provided by the default configuration in GnuPG, but this do not seem to be used in Debian. Perhaps it should?

Anyway, I've updated my ~/.gnupg/options file to now include this line:

keyserver pool.sks-keyservers.net

With GnuPG version 2 one can also locate the keyserver using SRV entries in DNS. Just for fun, I did just that at work, so now every user of GnuPG at the University of Oslo should find a OpenGPG keyserver automatically should their need it:

% host -t srv _pgpkey-http._tcp.uio.no
_pgpkey-http._tcp.uio.no has SRV record 0 100 11371 pool.sks-keyservers.net.
%

Now if only the HKP lookup protocol supported finding signature paths, I would be very happy. It can look up a given key or search for a user ID, but I normally do not want that, but to find a trust path from my key to another key. Given a user ID or key ID, I would like to find (and download) the keys representing a signature path from my key to the key in question, to be able to get a trust path between the two keys. This is as far as I can tell not possible today. Perhaps something for a future version of the protocol?

September 10, 2014 11:10 AM