GNU Guix at the MDC

This document is a comprehensive tutorial explaining how GNU Guix is used at the MDC. If you are in a hurry and don't want to understand Guix right now, here's what you need to know to get started:

Take a look at the most common commands to learn more about how to install or remove software in your personal profile.

1 Background

1.1 The GNU project

Guix is part of the GNU project. One of the GNU project's core goals is to empower users. Primarily this is done by developing and maintaining free/libre software to provide freedom and independence from vendors of proprietary software. GNU Guix empowers users by giving them the tools to take software management into their own hands, without having to rely on the approval of "super users".

Guix achieves this by implementing the functional package management scheme.

1.2 Functional package management

Guix (pronounced "geeks") is a functional package manager. "Functional" here means "like a pure mathematical function". The value produced by a pure function only depends on its inputs, not on any global state.

In functional package management we model the package build process as a pure function that only depends on declared inputs. A package's inputs are the source code archive, the compiler toolchain, any libraries the code depends on, and more. Global state would be things like the current date and time, or software with a "global", system-wide scope, such as programs under /usr, /bin, or /lib, or really anything that has not been declared as an input.

Since the inputs of a package are themselves expressed as packages with inputs (and their inputs likewise, recursively), the functional model allows us to capture the complete dependency graph of any single package. The roots of the graph are a small set of bootstrap binaries. Capturing the complete dependency graph and ignoring global state makes packages portable across different variants of the GNU system: Guix packages are independent of any libraries on the host system (be that Ubuntu, Debian, Fedora, CentOS, or any other GNU variant).

1.3 The store

The output of every package is a directory tree with files. Every package installs its files into a unique prefix (generated from the hash of all inputs) in a place that is called the store. All items in the store are immutable; one can only append items to the store.

The unique prefix ensures that packages with even slightly different inputs (such as an added patch, different configuration flags, or linked with a different library) are installed into different locations.
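
To make this concrete, here is a minimal shell sketch of input-derived prefixes. It is not how Guix actually computes store paths: the prefix_for helper, the truncated SHA-256 digest, the path layout, and the input descriptions are all assumptions made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: derive a prefix from a package name and a
# description of all of its inputs (NOT Guix's real hashing scheme).
prefix_for() {
    name=$1
    inputs=$2
    # Hash the declared inputs and keep the first 32 hex digits.
    hash=$(printf '%s' "$inputs" | sha256sum | cut -c1-32)
    printf '/gnu/store/%s-%s\n' "$hash" "$name"
}

# Same package, same inputs: same prefix.
plain=$(prefix_for samtools-1.3.1 "source=samtools-1.3.1.tar.bz2 flags=")
again=$(prefix_for samtools-1.3.1 "source=samtools-1.3.1.tar.bz2 flags=")
# Same package with an added patch: different prefix.
patched=$(prefix_for samtools-1.3.1 "source=samtools-1.3.1.tar.bz2 flags= patch=fix.patch")

echo "$plain"
echo "$patched"
```

The two printed paths differ only in the hash part, so both variants could sit side by side without overwriting each other.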

1.4 Profiles

The paths of store items are terribly long, making it very hard to use software directly out of the store. Instead of using store items directly, users can install them into one or many profiles. A profile is a store item itself (and thus immutable); its contents are the union of all the directory trees of all packages that are installed in it.

Whenever a profile is "modified" (such as by installing or removing applications) a new generation of the profile is created; the old generation is still available. For convenience, the latest generation of the default profile is linked to $HOME/.guix-profile. Since the store is immutable, rolling back to a previous profile generation is trivial: only the link to the current profile generation has to be updated to point to the store item of one of the previous profile generations.
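
The "rolling back is just updating a link" idea can be sketched with plain directories and symbolic links. This is only a toy model under stated assumptions: the directory names are invented, ordinary directories stand in for immutable store items, and no Guix command is involved.

```shell
#!/bin/sh
# Sketch only: directories stand in for immutable store items.
work=$(mktemp -d)
mkdir -p "$work/store/gen1-profile/bin" "$work/store/gen2-profile/bin"
touch "$work/store/gen1-profile/bin/R"
touch "$work/store/gen2-profile/bin/R" "$work/store/gen2-profile/bin/samtools"

# The profile is just a link to the current generation.
ln -s "$work/store/gen2-profile" "$work/profile"
ls "$work/profile/bin"          # lists R and samtools

# Rolling back: repoint the link; nothing in the "store" changes.
ln -sfn "$work/store/gen1-profile" "$work/profile"
ls "$work/profile/bin"          # lists only R

# Clean up with: rm -rf "$work"
```

After the rollback the second generation is still present in the toy store, which is why switching forward again would be just as cheap.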

Users can have any number of profiles. They can be used in addition to or instead of the default profile at $HOME/.guix-profile.

1.5 Garbage collection

To reclaim disk space Guix comes with a garbage collector. Any profile generation that is no longer "alive" (i.e. linked to) can automatically be purged from the store. Profile generations are always alive until the user decides to free them up for garbage collection by deleting their links.

2 The shared Guix installation at the MDC

At the MDC we have a shared Guix installation, which makes it simple to use the very same software across all cluster nodes and even user workstations. We offer a central store on one of the file servers. The store is mounted read-only on every node of the Max cluster and can be mounted manually on workstations.

Profile links are located on a separate file server share, which is mounted read-write on the Max cluster nodes. This allows profile management to be performed on all Max cluster nodes (except for the login nodes).

To simplify package management BIMSB support (that's me!) manages a central Guix installation with a version of the Guix upstream package database (or really "package library") and an additional custom BIMSB package library for non-free software and custom package variants.

The Guix upstream package library is updated semi-regularly and does not diverge from what has been published by the GNU Guix project. This is to ensure that the software environment we provide here can be reproduced elsewhere.

The BIMSB package library is made available as a Git repository at https://github.com/BIMSBbioinfo/guix-bimsb

Since only the daemon can modify the store on behalf of users and the daemon only runs on a single server, we provide a wrapper that talks to the daemon over the network. It also ensures that the BIMSB package library is enabled. The wrapper is available on all cluster nodes as "guixr" (for "guix remote" or start-up speak for "even guixer than guix"). Due to a cluster configuration error you need to enter the interactive cluster session with the following command:

qrsh -l h_stack=128M

We also provide a web interface that allows users to browse all packages that are currently available via the MDC installation of Guix. It is located at http://guix.mdc-berlin.de

One last thing: accessing the store on the file servers over NFS makes Guix inexcusably slow. I'm trying to speed this up, but for now Guix at the MDC also serves as a personality trainer with a focus on enhancing your capacity for patience. I suggest using the delays for meditation, reflecting on what life choices brought you here, sitting in front of a computer screen as Berlin's short summer is washed away with rain from indifferent clouds.

3 Common Guix commands for profile management

3.1 Search, install, remove

Searching for packages can be done via the web interface or using the command guixr package -A. You can use a subset of regular expressions for package search. For example, this command will return a list of all packages with a name starting with "r-" (that's the Guix convention for names of R packages):

guixr package -A "^r-"

To install packages into your default profile at $HOME/.guix-profile just use guixr package -i followed by the names of the packages. The following command will install R along with the "genomation" package (and all of its dependencies) into your default profile:

guixr package -i r r-genomation

To remove a package from your profile use the package command with the -r option. The following command removes the "python" package from the default profile and adds the "guile" package in the same transaction:

guixr package -r python -i guile

3.2 Installing variants

We offer a couple of variants for certain packages, such as samtools. See for yourself by searching:

guixr package -A samtools

This produces something like this:

r-rsamtools	1.24.0	out	gnu/packages/bioinformatics.scm:4454:2
samtools	1.3.1	out	gnu/packages/bioinformatics.scm:2982:2
samtools	0.1.19	out	gnu/packages/bioinformatics.scm:3043:2
samtools	1.1	out	bimsb/packages/bioinformatics-variants.scm:67:2
samtools	0.1.8	out	bimsb/packages/bioinformatics-variants.scm:80:2

Guix defaults to installing the latest available version of a given package (in the case of samtools this would be version 1.3.1). To pick a different version just append an @ followed by a prefix of the version. If the prefix matches more than one version, Guix will pick the highest of the matching versions. This command installs version 0.1.19 as there are two versions starting with "0" and the highest of them is "0.1.19":

guixr package -i samtools@0
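
If you want to convince yourself of the selection rule, here is a small shell sketch that mimics it on the samtools versions from the listing above. The pick_version helper is hypothetical and only approximates the behaviour described here; it is not Guix's implementation. It assumes GNU sort for the -V (version sort) option.

```shell
#!/bin/sh
# Versions taken from the samtools listing above.
versions="1.3.1
0.1.19
1.1
0.1.8"

# Hypothetical helper: keep the versions that start with the given
# prefix, sort them as version numbers, and print the highest one.
pick_version() {
    printf '%s\n' "$versions" \
        | awk -v p="$1" 'index($0, p) == 1' \
        | sort -V \
        | tail -n 1
}

pick_version 0        # prints 0.1.19
pick_version 1        # prints 1.3.1
pick_version 0.1.8    # prints 0.1.8
```

Note that a plain alphabetical sort would put 0.1.8 after 0.1.19; version sorting compares the numeric components instead.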

3.3 Rolling back and switching profile generations

If you are unhappy with a profile change you can trivially roll back to the previous generation with:

guixr package --roll-back

Guix keeps not just the most recent but all previous profile generations, so you can switch back to any of them with the -S switch. The following series of commands lists all profile generations and then switches to generation number 42:

guixr package -l
guixr package -S 42

Note that after going back to a previous generation you can accidentally overwrite the generations that follow, because Guix does not branch off generations in a tree; profile generations are a simple list. Say you have 10 generations and go back to generation 3. If you then install or remove a package, you create a new profile generation with the number 4, thereby overwriting the link to what used to be generation 4. Removing links to previous generations frees them up for garbage collection.
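
The overwriting behaviour is easier to see in a toy model. The following shell sketch uses numbered links loosely modelled on profile generation links; the directory names, the five-generation history, and the "install" step are assumptions for illustration, and no Guix command is run.

```shell
#!/bin/sh
# Sketch only: numbered links mimic profile generations.
work=$(mktemp -d)
for n in 1 2 3 4 5; do
    mkdir "$work/item-$n"                        # stand-in for a store item
    ln -s "$work/item-$n" "$work/profile-$n-link"
done
ln -s "profile-5-link" "$work/profile"           # current generation: 5

# Go back to generation 3 ...
ln -sfn "profile-3-link" "$work/profile"

# ... then "install" something: the new generation gets the number
# current + 1 = 4, so the link of the old generation 4 is replaced.
mkdir "$work/item-new"
ln -sfn "$work/item-new" "$work/profile-4-link"
ln -sfn "profile-4-link" "$work/profile"

readlink "$work/profile-4-link"    # now points at item-new
```

Nothing links to item-4 any more, which is the situation in which a real store item would become eligible for garbage collection.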

3.4 Deleting profile generations

If you are sure that you won't need any generation of a profile except for the one that is currently selected, you can delete all but the current generation with guixr package -d. Optionally, you can pass a pattern identifying the generations to delete. This is neither a regular expression nor a glob pattern; it is a comma-separated list of generation numbers, a range such as 2..5, or a duration such as 1m (see the Guix manual for the exact syntax). Note that the generations won't be removed immediately; only their links are removed, which frees them up for garbage collection at a later point.

4 Recommended practices for reproducible software environments

4.1 Use different profiles for different projects

The guix package command takes an optional argument --profile or -p, allowing you to operate on a profile other than the default at ~/.guix-profile.

You can use this feature to create project-specific profiles containing project-specific software. The following snippet installs R and genomation into a new profile in $HOME/projects/analysis/.guix-profile:

export PROJECT=$HOME/projects/analysis
guixr package --profile=$PROJECT/.guix-profile -i r r-genomation

I suggest naming the profile link .guix-profile (with a leading period) for consistency and for aesthetic reasons. Whenever a new profile generation is created, Guix will create a new symbolic link to the store item of the generation. This link will be a variation of the main profile link, so it will be hidden when the main profile link is hidden.

BIMSB support can also create shareable custom profiles under /gnu/var/guix/profiles/custom if you find that useful. Just contact BIMSB support (that's still me!) and they'll help you to set this up.

4.1.1 Load the environment of another profile

When software is installed to a separate profile it won't automatically appear on your PATH, so you would have to prefix all tools with their full path prefix. This may be inconvenient, so there's a way to load up the environment of a given profile. Here's a template:

bash
export PROJECT=$HOME/projects/analysis
export GUIX_PROFILE=$PROJECT/.guix-profile
source $GUIX_PROFILE/etc/profile
# do things here
exit

This creates a subshell with bash and closes it at the very end (exit). In between, project-specific environment variables are set by sourcing the profile's ./etc/profile file. This file contains environment variable definitions, such that the PATH and all other variables are set to include this project's software.

We use a subshell to avoid polluting the environment with stray variables. Once the subshell is terminated (e.g. with exit or ^D) the environment is reset to what it used to be before starting the subshell.
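
Here is a minimal sketch showing that variables exported inside a child shell do not leak back into the parent. DEMO_TOOLS and its value are made up; they stand in for whatever a profile's etc/profile file would export.

```shell
#!/bin/sh
# DEMO_TOOLS is a made-up variable for this demonstration.
unset DEMO_TOOLS

# Child shell: the export is visible inside ...
inside=$(sh -c 'export DEMO_TOOLS=/some/profile/bin; printf "%s" "$DEMO_TOOLS"')

# ... but it is gone once the child has exited.
outside=${DEMO_TOOLS:-unset}

echo "inside:  $inside"     # prints: inside:  /some/profile/bin
echo "outside: $outside"    # prints: outside: unset
```

The same holds for the interactive template above: everything that sourcing etc/profile sets disappears when you leave the inner bash.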

Guix also has a subcommand for creating environments in an ad-hoc fashion. This is useful for when you don't really want to have a permanent profile for particular tools (such as when testing or developing software). Take a look at the Guix manual for "guix environment" if you are interested.

4.2 Prefer manifests over sequential profile changes

When Guix is upgraded between package installations, the resulting profile state could be hard to reproduce. Instead of incrementally adding packages to existing profiles, write a manifest and instantiate it. Whenever the manifest is changed, all packages are upgraded together. This avoids having a profile with some packages that require version A of Guix and others that require the newer version B of Guix.

Instantiating a manifest works like this:

guixr package --manifest=/path/to/manifest

Note that this will remove software that is currently part of the profile but not mentioned in the manifest file.

We are hosting manifest files that can serve as examples at /gnu/var/guix/profiles/custom/manifests/. Manifests are simple Scheme files that load a number of package modules and then list all of the package variables that refer to packages you want to have installed. Usually, the variable name is identical to the package name, but in some cases that is not so. Rather than trying to figure out exactly how this works in detail (which would involve taking a look at the Guix package library source code), you are more than welcome to ask BIMSB support to help you write a manifest.

Here's a simple manifest example:

(define packages '("python@2"
                   "python2-h5py"
                   "python2-joblib"
                   "python2-bx-python"
                   "python2-hiclib"
                   "python2-mirnylib"
                   "python2-pandas"
                   "python2-statsmodels"
                   "samtools@0.1"))

;; Turn the list of package names into a manifest
(use-modules (gnu packages))
(packages->manifest (map specification->package packages))

Note that upgrading all packages in a profile before adding a new package is almost equivalent to instantiating a manifest, so you could do that if you don't really want to mess with manifests.

4.3 Use guix.scm for your own projects

This applies only if you are developing software on your own. Make it easier for other developers to enter a suitable environment, no matter what GNU variant they use. Ask BIMSB support for more information.

5 Caveats

5.1 Mixing libraries is bad

One side-effect of having software that is completely isolated from system software is that binary compatibility is lost (aka ABI mismatch). You cannot have a system binary that loads a Guix binary as a module (or the other way around). This is a problem that likely won't ever be fixed, because it relates to how compilers behave.

Due to the way many languages such as R and Python load modules we strongly discourage mixing Guix stuff with system stuff, e.g. using Guix R packages with a system R installation. The same applies to using install.packages() inside of a Guix R, because in some cases this will use the system's C compiler and linker and thus create binaries that are incompatible with R from Guix.

This is not as bad as it sounds, though. BIMSB support can easily create R packages for Guix. In fact, our goal is to have all of CRAN and all of Bioconductor available via Guix.

When in doubt ask BIMSB support about your particular use case.

5.2 LD_LIBRARY_PATH is bad

Many users at the MDC set the LD_LIBRARY_PATH environment variable. This is a dirty hack and it was never meant for users to set it. It tells the system to dynamically load libraries from a different location. When using Guix applications you cannot simply use libraries from a different location due to the aforementioned lack of binary compatibility. If you are confused about all these environment variables and wonder whether you really need them, talk to BIMSB support.
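
If you are not sure whether the variable is set in your session, a quick check like the following may help. This is only a sketch: it reports the current value and then shows how to run a single command without the variable using env -u, without changing your shell. The inner sh -c command is a placeholder for whatever program you actually want to run.

```shell
#!/bin/sh
# Report whether LD_LIBRARY_PATH is set in the current environment.
if [ -n "${LD_LIBRARY_PATH:-}" ]; then
    echo "LD_LIBRARY_PATH is set to: $LD_LIBRARY_PATH"
else
    echo "LD_LIBRARY_PATH is not set"
fi

# Run one command without the variable; the current shell is untouched.
# 'env -u NAME' removes NAME from the environment of the child process.
seen=$(env -u LD_LIBRARY_PATH sh -c 'printf "%s" "${LD_LIBRARY_PATH:-}"')
echo "child sees: '$seen'"
```

The child always sees an empty value here, whatever your own session has set.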

5.3 Incremental profile modifications are bad

See the section above on preferring manifests over incremental profile modification. This is a problem caused by the fact that we offer a central installation of the Guix package libraries.

6 Sharing environments

Guix makes it easy to share complete software environments. There are different ways to achieve this: either by a symbolic definition of the environment, or an export of binaries.

6.1 Symbolic sharing

This means that we only share a manifest of our profile together with information describing the state of the shared Guix installation. The manifest refers to packages only by variable name, which is dependent on the state of the Guix package library.

To determine the state of the Guix upstream package library and the BIMSB package library run these commands:

git -C /gnu/remote/guix describe --always
git -C /gnu/remote/guix-bimsb describe --always

This returns something like "v0.10.0-1012-g68bf2f9" for each of the repositories, and it is sufficient for a third party to clone the repositories and check out the described state.
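
For reference, the described string has the form tag, number of commits since that tag, and abbreviated commit ID prefixed with "g". Here is a small sketch that takes the example value from the text apart with plain shell parameter expansion. It assumes the three-part form; with --always the output can also be a bare commit ID when no tag is reachable, which this sketch does not handle.

```shell
#!/bin/sh
# Example value from the text; a real one comes from 'git describe'.
described="v0.10.0-1012-g68bf2f9"

# Format: <tag>-<commits since tag>-g<abbreviated commit>
commit=${described##*-g}     # strip everything up to the last "-g"
rest=${described%-g*}        # drop the "-g<commit>" suffix
count=${rest##*-}            # text after the last remaining "-"
tag=${rest%-*}               # everything before it

echo "tag=$tag commits_since_tag=$count commit=$commit"
# prints: tag=v0.10.0 commits_since_tag=1012 commit=68bf2f9

# A third party could then run, inside their clone (not executed here):
#   git checkout "$commit"
```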

The upstream Guix repository is available at http://git.savannah.gnu.org/cgit/guix.git/, and the BIMSB repository is at https://github.com/BIMSBbioinfo/guix-bimsb.

Note that this approach assumes that package sources will remain available in the future. (Bioconductor, for example, removes old source archives; we are working on a fix.)

6.2 Binary sharing

This involves exporting a package closure for sharing software environments at the binary level (e.g. for archiving). This is somewhat slow (several minutes).

The following snippet can be used to export a profile recursively and store it in my-profile.nar.gz.

guix archive --export --recursive \
  $(readlink -f /project/.guix-profile) | \
  gzip --stdout - > my-profile.nar.gz

The recipient can take this archive and import it into another Guix store. Exported archives are signed, so importing requires prior authentication of the public key. Ask BIMSB support for assistance when planning to export from the store.

7 Creating package variants with GNU Guix and Guile Scheme

You should have a rudimentary understanding of Scheme before trying to create package variants. The syntax of Scheme is very simple and you won't need to know much more than this:

  • expressions are possibly nested lists that start with ( and end with )
  • always keep parentheses balanced (in Emacs use paredit)
  • expressions can be quoted, which keeps them from being evaluated
    ;; will be evaluated to print "hello"
    (display "hello")
    
    ;; will not be evaluated; it is treated as data
    '(display "hello")
    
  • quasiquotation allows switching between data and code; a comma unquotes an expression so that it is evaluated
    ;; evaluates to the list (1 2 3 4 5 6)
    `(1 2 3 ,(+ 2 2) 5 ,(+ 2 2 2))
    

A great resource to learn Scheme is the Guile website.

Creating package variants is usually a matter of binding a package object with minor modifications to a variable in a custom module. Let's do this for bedtools.

7.1 Create a new module

You must make sure that the name of the module matches its path. Here is an example module, in which we take the package object identified by the "bedtools" variable and use it to create a package variant for a different version. This means that we must inherit from the "bedtools" package and override a couple of fields.

I place the following expressions in a file at $PACKAGE_ROOT/custom/packages/rekado.scm where PACKAGE_ROOT is an arbitrary empty directory.

(define-module (custom packages rekado)
  #:use-module (guix packages)
  #:use-module (guix download)
  #:use-module (guix utils)
  #:use-module (gnu packages)
  #:use-module (gnu packages bioinformatics))

(define-public bedtools-2.23.0
  (package (inherit bedtools)
    (version "2.23.0")
    (name "bedtools")
    (source
     (origin
       (method url-fetch)
       (uri (string-append "https://github.com/arq5x/bedtools2/archive/v"
                           version ".tar.gz"))
       (file-name (string-append name "-" version ".tar.gz"))
       (sha256
        (base32
         "0mk0lg3bl6k7kbn675hinwby3jrb17mml7nms4srikhi3mbamb4x"))))))

I copied the source field from the original definition of the "bedtools" package in (gnu packages bioinformatics). The hash was generated with guix download https://github.com/arq5x/bedtools2/archive/v2.23.0.tar.gz
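
The rule that the module name must match its path can be sketched as a simple text transformation. The module_path helper below is hypothetical and exists only to illustrate the naming convention; Guix and Guile do this lookup themselves.

```shell
#!/bin/sh
# Hypothetical helper: turn a module name such as
# "(custom packages rekado)" into the matching relative file path.
module_path() {
    printf '%s\n' "$1" | tr -d '()' | tr ' ' '/' | sed 's/$/.scm/'
}

module_path "(custom packages rekado)"
# prints: custom/packages/rekado.scm

module_path "(gnu packages bioinformatics)"
# prints: gnu/packages/bioinformatics.scm
```

The resulting path is relative to the directory that contains the module directory tree, which is PACKAGE_ROOT in this example.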

7.2 Use it with GUIX_PACKAGE_PATH

Now all we need to do is tell Guix where to find our custom module. To do that we set the environment variable GUIX_PACKAGE_PATH to what I called PACKAGE_ROOT earlier, i.e. the directory containing the module directory tree.

Assuming a PACKAGE_ROOT of $HOME/guix-stuff we could do:

export GUIX_PACKAGE_PATH=$HOME/guix-stuff
guixr package -i bedtools@2.23.0

8 Creating custom packages with Guile Scheme

The easiest way to create custom packages with Guile Scheme is to let somebody or something else do the work. Guix comes with importers that can generate package expressions from third-party repositories such as CRAN or Bioconductor. See the manual of guix import for more information.

9 Learn more

The Guix project provides a very comprehensive manual on its website.

You can also always contact the author (that's me!) via email. Just write to Ricardo.

Author: Ricardo Wurmus

Created: 2016-11-23 Wed 17:12
