Guix package structure: build system phases and modify-phases

As we discussed in the previous post, Guix has the concept of a build-system, where a build receives the source and outputs a package. To configure a build we can provide build arguments and control the build phases. We covered the build arguments last time, so this time we're going to dig into the build phases themselves. We'll look at what they are, how we can modify them and how to create our own.

This is post 9 in a series covering Guix Packaging - everything from handling source code to contributing packages to Guix.

Standard phases

The GNU build system is used as the base for the majority of the other build systems. It defines the standard phases in guix/build/gnu-build-system.scm in the %standard-phases variable. The most common phases can be controlled through build arguments. The phases, in the order they run, and the arguments that affect them are:

Phase | Argument | Explanation
set-SOURCE-DATE-EPOCH | - | Sets the environment variable which is used by build tools to define a fixed timestamp for reproducible builds.
set-paths | - | Sets the search path environment variables (e.g. PATH) so they include all the input packages
install-locale | - | Sets the locale to en_US.UTF8
unpack | - | Unpack the source tarball
bootstrap | - | Runs the Autotools bootstrap script if configure is missing
patch-usr-bin-file | - | Patch occurrences of /usr/bin/<blah> in all executable configure files
patch-source-shebangs | - | Patch shebangs in source files so they point to the right Store locations (e.g. #!/gnu/store/... )
configure | #:configure-flags | Run configure with some default options (e.g. --prefix=/gnu/store)
patch-generated-file-shebangs | - | Same as patch-source-shebangs but for files generated during configure
build | #:make-flags | Run make with a list of flags.
check | #:test-target #:tests? | Run make check (or a target that's specified), unless #:tests? is set to false
install | #:make-flags | Run make install.
patch-shebangs | - | Patch shebangs in the final executable files
strip | #:strip-binaries? | Strip debugging symbols from ELF files (libraries) unless #:strip-binaries? is false
validate-runpath | #:validate-runpath? | Validate the Runpath of the ELF binaries unless #:validate-runpath? is false
validate-documentation-location | - | Check documentation will go to share/info and share/man
delete-info-dir-file | - | Delete any share/info/dir files
patch-dot-desktop-files | - | Replace references to executables in Desktop files with their absolute Store locations
make-dynamic-linker-cache | - | Speed up app start-up by creating a linker cache at etc/ld.so.cache in the package output
install-license-files | #:license-file-regexp | Install the licenses in share/doc
reset-gzip-timestamps | - | Reset embedded timestamps in gzip files that are part of output
compress-documentation | #:compress-documentation? | Compress the documentation unless #:compress-documentation? is false

As we discussed in the previous post, a lot of the time the easiest way to influence a phase is to use the appropriate argument. For example, let's say that we're playing with a package and temporarily don't want it to run tests. We can turn them off with the #:tests? argument:

(arguments
  (list
    #:tests? #f))

If we can't alter a phase to our requirements through the build argument then we need to change the phase itself.

Modifying phases

Altering the build phases is required when we want to remove a particular phase because it isn't needed, or to add an additional phase to deal with some element of the build. There are also complex packages that use more than one build-system, and the phases are merged together! The ability to add custom phases means that the build can be changed to meet any requirement. A build phase is just a function, so adding or replacing one means writing a short function - for a reminder see the post on creating functions using lambda in Functions in Guile and Guix.

The procedure modify-phases is in the module (guix build utils) found in <top-level>/guix/build/utils.scm. The utils module contains lots of useful utility procedures for packaging (worth reading through!).

Modify-phases alters the build phases of the package. We write it in the package definition, but the code isn't evaluated there: it's evaluated later, by the builder, during the build process. It's passed to the builder as part of the arguments, in the #:phases keyword argument.

How do we tell Guix to pass this code to the builder without evaluating it? There are two ways to do this: the standard Scheme quote (see quote and quasiquote) or a Guix-specific Gexp. Either the entire argument list is quoted, or just the modify-phases section is wrapped in a Gexp:

;; quote the arguments list
(arguments
  '(#:tests? #f
    #:phases
    (modify-phases ...)))

;; use a Gexp - the #~ part
(arguments
  (list
    #:phases
    #~(modify-phases ...)))

Generally, the project prefers the Gexp approach but many older packages use quotes.

The modify-phases function is provided with the phases that it has to change. This is the %standard-phases that the build-system has provided to the package's build. The GNU build-system's phases are used by many of the other build-systems with some alterations, so in many cases the phases are similar. A couple of build-systems use completely different phases, for example the Lisp ones, which is worth remembering when altering one of those packages.

The modify-phases function is called with the phase that we want to delete, replace, add-before or add-after.

The delete argument is the easiest:

(modify-phases %standard-phases (delete <build phase>))

The build phase itself must evaluate to a symbol, for example (delete 'configure).

A good simple example of using modify-phases to delete a phase is the python-pytest-celery package in gnu/packages/python-check.scm. Here's a section of the package definition:

 1 (define-public python-pytest-celery
 2   (package
 3     (name "python-pytest-celery")
 4     (version "0.0.0")
 5     (source
 6      (origin
 7        (method url-fetch)
 8        (uri (pypi-uri "pytest-celery" version))
 9        (sha256
10         (base32 "01pli108qqiiyrn8qsqqabcpazrzj27r7cji9wgglsk76by61l6g"))))
11     (build-system python-build-system)
12     (arguments
13      `(#:tests? #f ; no tests and circular dependency on python-celery
14        #:phases
15        (modify-phases %standard-phases
16          (delete 'sanity-check)))) ; checks for celery

Line 11: the build system python-build-system defines the %standard-phases it uses in guix/build/python-build-system.scm. It takes the phases defined by the GNU build-system and makes some changes. One change is to add a phase called sanity-check which runs the sanity-check function.

Line 15: if we look in the source of the package we find (perhaps unsurprisingly!) that all its tests are written using python-celery. Due to the way that the sanity-check function works it would try to load python-celery. To avoid a circular dependency the package uses modify-phases on the %standard-phases for this build-system to delete the 'sanity-check phase.

The replace, add-before and add-after forms all work in a similar way: an argument is provided which is the new phase. This is always a procedure, created using lambda:

(modify-phases %standard-phases
    (replace 'some-phase
      (lambda _
        (do something))))

If the function that's being created with lambda isn't going to use any arguments then the underscore will be provided as the argument - this is a convention used in Lisps, it means "I don't care about the arguments".
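To see what the underscore does, here's a small sketch that runs in plain Guile, outside of Guix. The name fake-phase and the keyword values passed to it are made up for illustration. The point is that (lambda _ ...) accepts whatever arguments it's called with and ignores them, which is why a phase can be called with all of the builder's keyword arguments without declaring any of them.

```scheme
;; Hypothetical phase: `_' collects every argument into one ignored list.
(define fake-phase
  (lambda _
    (setenv "HOME" "/tmp")
    (getenv "HOME")))

;; Each phase is called with many keyword arguments;
;; the values here are made up for the example.
(define result
  (fake-phase #:inputs '()
              #:outputs '(("out" . "/tmp/out"))
              #:tests? #t))

(display result)
(newline)
```

Calling fake-phase with no arguments at all works just as well, because nothing in the body looks at them.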

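In cases where the arguments are going to be used, the most common form is a lambda* with keyword arguments. Here it's used with add-before, where 'new-phase is a placeholder name for the phase being added:

(modify-phases %standard-phases
    (add-before 'some-phase 'new-phase
        (lambda* (#:key tests? inputs outputs #:allow-other-keys)
            (when tests?
                (do something)))))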

In these cases #:key names the parameters that are passed in, which can be other parts of the package definition. Sometimes this will be a package build argument (e.g. tests?) so that something is only done when the condition holds. Often the package's inputs or outputs are provided so that something can be done to them (e.g. delete a particular file that stops the package compiling).
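Here's a sketch of that keyword handling which runs in plain Guile. The procedure maybe-run-tests and the values passed to it are invented for the example; a real phase receives its keyword arguments from the build system rather than from a hand-written call.

```scheme
;; Hypothetical phase that only does its work when tests are enabled.
(define maybe-run-tests
  (lambda* (#:key tests? outputs #:allow-other-keys)
    (if tests?
        (string-append "testing " (assoc-ref outputs "out"))
        "tests skipped")))

;; #:allow-other-keys lets the caller pass keys the phase doesn't name
;; (here #:inputs) without an error being raised.
(define with-tests
  (maybe-run-tests #:inputs '()
                   #:outputs '(("out" . "/tmp/out"))
                   #:tests? #t))

(define without-tests
  (maybe-run-tests #:inputs '()
                   #:outputs '(("out" . "/tmp/out"))
                   #:tests? #f))

(display with-tests) (newline)
(display without-tests) (newline)
```

Without #:allow-other-keys the first call would fail, because #:inputs isn't one of the declared keys. That's why almost every phase written with lambda* includes it.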

Here's an example from the package go-github-com-adrg-xdg (in gnu/packages/golang-xyz.scm) which uses add-before to add a new phase:

 1  (build-system go-build-system)
 2  (arguments
 3   (list
 4    #:import-path "github.com/adrg/xdg"
 5    #:phases
 6    #~(modify-phases %standard-phases
 7        ;; Tests need HOME to be set: could not create any of the following
 8        ;; paths: /homeless-shelter/.local/data,
 9        ;; /homeless-shelter/.local/data, /usr/share
10        (add-before 'check 'set-home
11          (lambda _
12            (setenv "HOME" "/tmp"))))))

Line 2: notice that this package uses the newer style definition, so the arguments are not quoted; instead it uses arguments (list ...) for them. Then, to pass the #:phases to the build side, line 6 begins with a #~( ... ) Gexp.

Line 6: inside the Gexp the modify-phases procedure is called on the %standard-phases for this package. It's using the go-build-system which defines its %standard-phases in guix/build/go-build-system.scm.

Line 10: a new phase is added called 'set-home before the existing 'check phase.

Line 11: as there's no use of the arguments it's a simple lambda, and then it sets an environment variable of HOME to /tmp so that the build works - easy!

Similarly, let's look at an example of add-after, from python-pytaglib (in gnu/packages/mp3.scm). Here's the key part:

1 (build-system python-build-system)
2 (arguments
3  '(#:phases
4    (modify-phases %standard-phases
5      ;; Ensure that the Cython file is regenerated.
6      (add-after 'unpack 'setup-environment
7        (lambda _
8          (setenv "PYTAGLIB_CYTHONIZE" "1"))))))

This time the build system is the python-build-system, which defines its build phases in guix/build/python-build-system.scm. It's exactly the same process, we're just using add-after rather than add-before.

For an easy example of using replace the python-zope-copy (in gnu/packages/python-web.scm) is a good one:

1 (build-system python-build-system)
2 (arguments
3  '(#:phases
4    (modify-phases %standard-phases
5      (replace 'check
6        (lambda _
7          (invoke "zope-testrunner" "--test-path=src" "\\[]"))))))

Line 3: it starts by using a quote to pass all the arguments to the build side.

Line 4: modify-phases is called to change the %standard-phases that this package is using. These are provided by the python-build-system.

Line 5: the 'check phase is replaced with the new function that's defined on lines 6-7.

Line 6: As the phase doesn't use any of its arguments it's a simple lambda.

Line 7: There's just one line which calls zope-testrunner using invoke, which is a Guix utility function. The invoke function is really useful: we can use it to run any command in the build environment, and we don't have to check for success as it will raise an exception if the command fails.

For something a bit more complicated let's look at photoflare (in gnu/packages/photo.scm):

 1 (build-system gnu-build-system)
 2 (arguments
 3  '(#:tests? #f                      ;no tests
 4    #:phases
 5    (modify-phases %standard-phases
 6      (replace 'configure
 7        (lambda* (#:key inputs outputs #:allow-other-keys)
 8          (let ((magickpp (assoc-ref inputs "graphicsmagick"))
 9                (out (assoc-ref outputs "out")))
10            (invoke "qmake"
11                    (string-append "INCLUDEPATH += " magickpp
12                                   "/include/GraphicsMagick")
13                    (string-append "PREFIX=" out)
14                    "Photoflare.pro")))))))

Line 3: it's quoting all the arguments to pass them into the builder.

Line 5: modify-phases is called on the standard phases of the GNU build system which the photoflare package is using. On the next line it replaces the 'configure phase.

Line 7: is a lambda* to create an anonymous function with some parameters passed in as keyword arguments: the package inputs and the package outputs.

Lines 8-9: set up two variables using let.

Line 8: the first is magickpp. On the build side inputs is an association list built from the inputs in the package definition, which are (list graphicsmagick libomp qtbase-5). The assoc-ref function looks up a key in an association list and returns the value stored with it. Here the key is the package name "graphicsmagick" and the value is the path in the Store where graphicsmagick is installed. So we're retrieving the path for graphicsmagick, basically.

Line 9: the second variable is out, which uses assoc-ref again to retrieve the path that the package is being written to. All packages have a default output called out, which is a directory in the Guix Store. In some cases a package may have multiple outputs (which we'll explore another day).

Line 10: a call to qmake (as this is a Qt application) with three parameters, given in lines 11-14. Line 13 is the simple one, where string-append is used to add the package's out directory to the PREFIX argument. Line 11 changes the INCLUDEPATH by creating a string of magickpp's path and then adding "/include/GraphicsMagick" to that.

In summary, this little function calls qmake with the Store's GraphicsMagick added correctly as an include path, and with the install prefix set to the correct out path.
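The two lookups can be tried in plain Guile with a sketch like the following. The association lists are made up to have the same shape as the ones the phase receives, and the Store paths are placeholders rather than real locations.

```scheme
;; Made-up alists: label strings paired with (placeholder) Store directories.
(define inputs
  '(("graphicsmagick" . "/gnu/store/<hash>-graphicsmagick")
    ("qtbase" . "/gnu/store/<hash>-qtbase")))

(define outputs
  '(("out" . "/gnu/store/<hash>-photoflare")))

;; The same string-building that the photoflare phase does for qmake.
(define include-flag
  (string-append "INCLUDEPATH += "
                 (assoc-ref inputs "graphicsmagick")
                 "/include/GraphicsMagick"))

(define prefix-flag
  (string-append "PREFIX=" (assoc-ref outputs "out")))

(display include-flag) (newline)
(display prefix-flag) (newline)

;; A label that isn't in the list gives #f rather than an error.
(display (assoc-ref inputs "libomp")) (newline)
```

The last line is worth remembering: a mistyped label doesn't fail at the lookup, it hands #f to string-append, which is where the error then shows up.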

For a final example we'll look at hash-extender (in gnu/packages/crypto.scm). This brings together the different ways of using modify-phases to alter a package's build process so that it works in Guix - here's an excerpt:

 1   (build-system gnu-build-system)
 2   (arguments
 3    `(#:phases
 4      (modify-phases %standard-phases
 5        (delete 'configure)
 6        (replace 'check
 7          (lambda _
 8            (invoke "./hash_extender_test")))
 9        (replace 'install
10          (lambda* (#:key outputs #:allow-other-keys)
11            (let* ((outdir (assoc-ref outputs "out"))
12                   (bindir (string-append outdir "/bin"))
13                   (docdir (string-append outdir
14                                          "/share/doc/hash-extender-"
15                                          ,version)))
16              (install-file "hash_extender" bindir)
17              (install-file "README.md" docdir)
18              #t))))))

Line 3: starts by quasiquoting (`) the entire arguments list to pass it to the build side. The only argument is a change to #:phases.

Line 4: the modify-phases call, and notice that there are three changes, so we're making multiple changes to the %standard-phases in one call to the function.

Line 5: the first one simply deletes the 'configure phase.

Line 6: the second replaces the 'check phase with an anonymous lambda function that runs a test program/script ("hash_extender_test").

Lines 9-18: the last alteration replaces the 'install phase and is more complicated. This time lambda* is used to provide outputs as a keyword parameter.

Lines 11-15: set up some variables using let*. We've seen assoc-ref used previously to retrieve the out path (line 11).

Line 12: create a bindir variable by taking the out path and adding "/bin" to it using string-append.

Lines 13-15: do the same thing to create a docdir variable: the slightly different part is on line 15 where an unquote (,) is used to evaluate the version string (from the package definition) so that the docs are put into a directory that has the version number in it. The unquote only works because the arguments list was quasiquoted on line 3.

Lines 16 and 17 use the utility function install-file to create the bindir and docdir directories and copy files into them.

In summary, our new install build phase copies the binary and docs to the right place.
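For comparison, here's a sketch of how the same three changes might look in the newer Gexp style. This is not the package's actual definition, just an illustration under the assumption that nothing else in the package needs to change. Inside a Gexp, #$output stands for the default output's Store path and #$version inserts the package's version string, so neither assoc-ref nor an unquote is needed. Honouring tests? in the check phase is an addition that the excerpt above doesn't have.

```scheme
;; Sketch only: a Gexp-style rewrite of the excerpt, not the real package.
(arguments
 (list
  #:phases
  #~(modify-phases %standard-phases
      (delete 'configure)
      (replace 'check
        (lambda* (#:key tests? #:allow-other-keys)
          ;; Respect #:tests? so the phase can still be switched off.
          (when tests?
            (invoke "./hash_extender_test"))))
      (replace 'install
        (lambda _
          (let* ((bindir (string-append #$output "/bin"))
                 (docdir (string-append #$output
                                        "/share/doc/hash-extender-"
                                        #$version)))
            (install-file "hash_extender" bindir)
            (install-file "README.md" docdir)))))))
```

This fragment belongs inside a package definition, so it can't be run on its own.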

Advanced build arguments

There are a couple of build arguments that we didn't discuss in the previous blog post that we can now cover.

Argument | Defined in | Summary
#:imported-modules | gnu.scm | Default value %gnu-build-system-modules. Modules that should be imported from the host into the build environment.
#:modules | gnu.scm | Default value %default-modules. Modules that the build code loads and uses.
#:phases | gnu.scm | Defaults to '%standard-phases'. The phases of the build.
#:outputs | gnu.scm | A list of additional outputs for the package.
#:allowed-references #:disallowed-references | gnu.scm | A list of package names and package references that this package is allowed (or not allowed) to reference.

As we've seen, the #:phases keyword argument controls which phases are run, while the #:outputs option is used for packages with multiple outputs. Multiple output packages are quite rare, but they do provide a way to control whether a package installs particular elements, for example whether all the docs are included.

As we know, packages contain references to other packages in the Store. While a package definition specifies the inputs, it's the build itself that decides which references are retained. If a package definition has some packages in its inputs lists that aren't actually used, then the final package won't record them as references. There are some rare instances where the packager wants to prevent the package from keeping a reference: this is the purpose of the #:allowed-references and #:disallowed-references lists. A somewhat common use-case is to use timezone data (tzdata-for-tests) in a package, but then to specify it as a disallowed reference, as the data was only required for running tests during the package build and isn't needed in the final package.
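As a sketch of that timezone use-case, the fragment below shows how the pieces might fit together in a Gexp-style package. It isn't taken from a particular package: the phase name set-timezone-data is invented, and it assumes the tests find their timezone data through the TZDIR environment variable. The search-input-directory procedure comes from (guix build utils).

```scheme
;; Sketch only: fields from a hypothetical package definition.
(native-inputs (list tzdata-for-tests))
(arguments
 (list
  ;; The build fails if the output still refers to tzdata-for-tests.
  #:disallowed-references (list tzdata-for-tests)
  #:phases
  #~(modify-phases %standard-phases
      (add-before 'check 'set-timezone-data
        (lambda* (#:key native-inputs inputs #:allow-other-keys)
          ;; Point the tests at the timezone data for this build only.
          (setenv "TZDIR"
                  (search-input-directory (or native-inputs inputs)
                                          "share/zoneinfo")))))))
```

The environment variable only exists while the build runs, so nothing in the installed files should end up pointing at tzdata-for-tests. The #:disallowed-references entry turns that expectation into a check.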

Imported modules

Sometimes it's necessary to run elements of multiple build-systems, or to manipulate the build using a module that's not normally part of the build-system that's being used. This requires loading additional modules so that the right functions can be used.

The #:modules keyword argument receives the list of modules that the build code loads and uses, so that their procedures are in scope inside the phases.

The #:imported-modules keyword argument specifies the Guile modules that must be imported from the host into the builder's environment, so that they are available to be loaded there.

These are different because making a module available on the build side (#:imported-modules) is a separate step from using it in the build code (#:modules). A module that isn't part of the package's normal build-system usually has to be listed in both.

For an example of using #:modules and #:imported-modules let's look at gmsh (in gnu/packages/maths.scm). Here's an excerpt of the package:

 1 (build-system cmake-build-system)
 2 (arguments
 3   `(#:imported-modules (,@%cmake-build-system-modules
 4                         (guix build python-build-system))
 5     #:modules (((guix build python-build-system) #:select (site-packages))
 6               (guix build cmake-build-system)
 7               (guix build utils))
 8    #:phases
 9    (modify-phases %standard-phases
10      (add-after 'unpack 'patch-paths
11        (lambda* (#:key inputs outputs #:allow-other-keys)
12          ;; Use the standard Guix site-package path for
13          ;; installation of the Python API.
14          (substitute* "CMakeLists.txt"
15            (("include\\(GNUInstallDirs\\)\n")
16             (string-append "include(GNUInstallDirs)\n"
17                            "  set(GMSH_PY_LIB "
18                            (site-packages inputs outputs) ")\n"))
19            (("\\$\\{GMSH\\_PY\\} DESTINATION \\$\\{GMSH\\_LIB\\}")
20             "${GMSH_PY} DESTINATION ${GMSH_PY_LIB}"))

Line 1: the build system is the cmake-build-system. If you haven't run into CMake before, the main thing that's useful to know is that all the build instructions for an application are placed in a "CMakeLists.txt" file.

Lines 5-7: a bit later the package needs to create a phase and manipulate some Python parts, so the build needs elements of the python-build-system. The #:modules keyword argument has a list, and in this there are three modules. The first one is Python ((guix build python-build-system) #:select (site-packages)), where the #:select part reduces the import to only the specified function (site-packages). The default for the cmake build-system is that it uses the modules (guix build cmake-build-system) and (guix build utils) - see the cmake-build function for the default modules - so the other two entries simply repeat the defaults.

Line 3: the imported modules for the CMake build system are put into a variable called %cmake-build-system-modules (see the definition in guix/build-system/cmake.scm). Again, for the customised build this package adds the python-build-system so that the Python functions can be used. Rather than manually adding all the cmake build modules, there's an unquote splice (,@) to splice in the %cmake-build-system-modules list: it's just copying in all the cmake build modules and adding the Python one.

Lines 8-9: as previously, to add a phase to #:phases the package uses modify-phases with the %standard-phases from the cmake build-system. These are defined in guix/build/cmake-build-system.scm.

Line 10: Adds a new phase after 'unpack called 'patch-paths.

Line 11: The new function uses a #:key to provide the package's inputs and outputs.

Lines 14-20: It then uses substitute* to find the "GNUInstallDirs" location in the CMakeLists.txt file and adds a "GMSH_PY_LIB" variable. Line 18: it calls the site-packages function, which returns the current output's Python site-packages directory. This is then used to create the right path for the "GMSH_PY_LIB".

Lines 19-20: a more complex second substitution that searches for "${GMSH_PY} DESTINATION ${GMSH_LIB}" and replaces the last part with the newly created "GMSH_PY_LIB". The pattern looks complicated only because the $, { and } characters have to be escaped in the regular expression.

Overall, the function is using some capabilities from the Python build-system to find the right Python libraries and then adding them to the package's build.
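To get a feel for substitute* outside of a package, here's a small sketch that can be tried from guix repl. It needs Guix's own modules on the load path, so it won't run in a bare Guile. The file name demo-CMakeLists.txt and its contents are made up as a stand-in for the real file. Each substitute* clause is a regular expression in its own parentheses, followed by the replacement string.

```scheme
;; Assumes Guix's modules are available, e.g. when run inside `guix repl'.
(use-modules (guix build utils)
             (ice-9 textual-ports))

;; Write a tiny stand-in for CMakeLists.txt (the contents are made up).
(define file "demo-CMakeLists.txt")

(call-with-output-file file
  (lambda (port)
    (display "include(GNUInstallDirs)\n" port)
    (display "install(FILES ${GMSH_PY} DESTINATION ${GMSH_LIB})\n" port)))

;; Rewrite the file in place: $, { and } are escaped in the pattern
;; because they are special in regular expressions.
(substitute* file
  (("\\$\\{GMSH_PY\\} DESTINATION \\$\\{GMSH_LIB\\}")
   "${GMSH_PY} DESTINATION ${GMSH_PY_LIB}"))

;; Show the file after the substitution.
(display (call-with-input-file file get-string-all))
```

The replacement string is plain text, so it needs none of the escaping that the pattern does.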

Summary

In the last couple of posts we've covered how Guix packages use a build-system to create the package. As packagers we can control how the build operates through arguments and by altering the build phases. In this post we've seen how to use modify-phases to delete and replace phases, and to add our own with add-before and add-after. Practically, we can alter builds in any way we want at this point: we're limited only by our knowledge of Guile Scheme. Guix has lots of useful functions to help with this. If you'd like to explore packaging more, the most useful thing to do is to read a lot of package definitions and examine how they alter the build process to achieve the final package.

We've covered the ways packages have their inputs specified and how those references are retained. In the next post we'll look at ways that we can wrap a program or script's binary and how to reduce propagated inputs.

Did this post cover everything needed about Guix build-systems? If you have questions, comments or thoughts on other areas that should be covered please email or leave a comment on futurile@mastodon.social.


Posted in Tech Tuesday 23 July 2024
Tagged with tech ubuntu guix