By Evan Sangaline | March 8, 2018
Exodus started with one simple goal in mind: to make it as easy as possible for a user to relocate working binaries from one Linux machine to another.
For example, say that your laptop has a more recent version of gzip
than what’s available through your server’s package manager, but that you really want to use a command-line flag that the older version doesn’t support.
exodus gzip | ssh intoli.com
There you go, you can use the newer version on the server now. I think it’s fair to say that that’s pretty darn easy.
On the surface, Exodus might sound very similar in spirit to tools such as AppImage, Snap, FlatPak, Singularity, and even Docker. The difference is that those tools have additional competing goals which fundamentally conflict with the goal of the relocation process being as easy as possible. Don’t get me wrong, I don’t mean this as a criticism of any of these tools in the slightest. My point is just that they serve a different, and complementary, purpose to Exodus (in fact, quickly relocating tools into running Docker containers is one of the main things that we use Exodus for here at Intoli).
These other tools, to varying degrees, really aim to provide solutions on par with distro-specific packaging formats and repositories. This necessitates a lot of configuration during packaging, as well as a mostly manual specification of dependencies. That’s fine if you’re packaging something for distribution, but it’s not so fun if you just want to quickly have access to a program on another machine that isn’t available through the system package manager. Beyond that, these tools generally also require significant configuration, dependencies, and/or daemons on the machines where the packages will run. That’s also not so fun if you want to relocate something quickly and the other machine isn’t already configured to support the packages.
With Exodus, the “easy as possible” part is the goal. It has no ambitions to automatically update your software, to install desktop launchers and icons, or to sandbox applications. There will never be an Exodus daemon, an Exodus Hub package repository, or anything else like that. This simplicity is what allows Exodus to provide a much more approachable and user-friendly interface than other similar tools, but it also means that Exodus is inherently far more limited in what it’s able to package.
The way that the first version of Exodus determined what to package was by iteratively running ldd
to determine all of the shared library dependencies for an executable.
This worked great when it worked, but there are a lot of cases where it simply was not enough.
Programs can use dlopen to programmatically load shared libraries, and Exodus previously had no way to know about these extra dependencies.
Additionally, plenty of programs require other non-library resources ranging from Lua scripts to image files, to pretty much anything.
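For reference, the iterative ldd approach described above can be sketched in a few lines of Python. This is only an illustration and not the actual Exodus implementation: the function names are hypothetical, and `run_ldd` is assumed to be a caller-supplied function that returns the text printed by ldd for a given path.

```python
import re

# Matches lines such as "libz.so.1 => /usr/lib/libz.so.1 (0x00007f...)"
# as well as bare absolute paths such as "/lib64/ld-linux-x86-64.so.2 (0x...)".
LDD_LINE = re.compile(r"^\s*(?:\S+\s+=>\s+)?(/\S+)\s+\(0x[0-9a-f]+\)\s*$")


def parse_ldd_output(output):
    """Return the absolute library paths named in `ldd` output, in order."""
    paths = []
    for line in output.splitlines():
        match = LDD_LINE.match(line)
        if match and match.group(1) not in paths:
            paths.append(match.group(1))
    return paths


def resolve_dependencies(binary, run_ldd):
    """Apply `run_ldd` repeatedly until no new libraries are discovered.

    `run_ldd` is a hypothetical helper mapping a path to `ldd` output text.
    """
    seen, queue = [], [binary]
    while queue:
        current = queue.pop(0)
        for path in parse_ldd_output(run_ldd(current)):
            if path not in seen:
                seen.append(path)
                queue.append(path)
    return seen
```

Nothing in this loop can see a library that is only opened with dlopen at runtime, or a data file that the program reads, which is exactly the gap described above.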
The primary motivator behind Exodus 2.0 was to address some of these limitations, while simultaneously aiming to preserve ease-of-use and keep the interface as simple as possible. As a result, Exodus can now successfully package far more sophisticated programs than it was able to previously. New features will always add a bit of complexity, but we have tried our best to remain focused on the initial project goals. We believe that, despite being far more powerful and robust, Exodus has remained as easy to use as ever. In the rest of this post, you’ll find a quick tour of some of our favorite new features.
Extra Files Can Be Added to Bundles
A good example of an executable that has a number of non-library dependencies is nmap
(thanks to @bmaia for pointing this out).
The Nmap Scripting Engine (NSE) allows users to extend nmap
with custom Lua scripts, but it also ships with a broad range of default scripts which provide a lot of the built-in functionality.
Exodus previously only supported including shared library dependencies, so although a relocated version of nmap
would run, it would not be able to find the Lua scripts that it needed for the NSE.
With Exodus 2.0 it’s now possible to explicitly add files as dependencies in two different ways: you can either pipe a list of files in through stdin, or you can use the new --add
argument.
For example, these two commands are effectively equivalent, and both are able to successfully bundle nmap
with a working NSE and set of default scripts.
# Recursively adds the `/usr/share/nmap` directory to the bundle.
exodus --add /usr/share/nmap nmap
# You could filter the stream with `grep`, or use any other command to list files.
find /usr/share/nmap | exodus nmap
The --add
flag can also be used more than once, or in combination with a piped-in list of files.
Included files are deduplicated at extraction time through the use of SHA-256 hashes as filenames in the /opt/Exodus/data/
directory.
This builds upon and extends one of the core ideas behind Exodus: bundles should be installed through simple tarball extraction without any central package database while still attempting to use disk space efficiently.
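To make the deduplication idea concrete, here is a minimal Python sketch of content-addressed storage. The function name and the exact directory layout are assumptions for illustration only; the real bundle format may differ in its details.

```python
import hashlib
import os


def store_deduplicated(source_path, bundle_root):
    """Copy a file into `<bundle_root>/data/<sha256>` and symlink it into place.

    Returns the path of the content-addressed copy. The layout is an
    illustrative assumption, not Exodus's exact on-disk format.
    """
    with open(source_path, "rb") as f:
        content = f.read()
    digest = hashlib.sha256(content).hexdigest()

    data_dir = os.path.join(bundle_root, "data")
    os.makedirs(data_dir, exist_ok=True)
    data_path = os.path.join(data_dir, digest)
    if not os.path.exists(data_path):  # identical content is stored only once
        with open(data_path, "wb") as f:
            f.write(content)

    # Mirror the original absolute path under the bundle root as a symlink.
    mirrored = os.path.join(bundle_root, os.path.abspath(source_path).lstrip("/"))
    os.makedirs(os.path.dirname(mirrored), exist_ok=True)
    if not os.path.lexists(mirrored):
        os.symlink(os.path.relpath(data_path, os.path.dirname(mirrored)), mirrored)
    return data_path
```

Because the filename is derived purely from the file's contents, two bundles extracted into the same location that ship the same library end up sharing a single copy, with no package database needed to keep track of it.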
The inclusion of additional files may seem like a fairly simple and straightforward change, but it actually has many subtle implications.
The most notable relates to the fact that many programs, including nmap
, use a readlink("/proc/self/exe")
system call to determine their own location and then construct the paths to their dependencies relative to that.
If the actual executable were to be located in the data
directory, then any relative paths would be wrong and the programs wouldn’t be able to find their own bundled dependencies.
This necessitated storing actual copies of the linkers within the mirrored directory structure in order to trick the interpreted ELF binaries into thinking that they were located in the correct location.
Along these same lines, a --no-symlink
option was added to allow specifying that certain interpreted scripts or other resources need to actually be stored within the mirrored directory structure instead of merely symlinked from there (more on this later).
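The symlink problem is easy to reproduce. The sketch below uses Python's `os.path.realpath` as a stand-in for what a program sees after a readlink("/proc/self/exe") call, since both yield the real file location with symlinks already resolved. The helper name `sibling_resource` is hypothetical.

```python
import os


def sibling_resource(executable_path, relative_resource):
    """Locate a resource relative to the *resolved* location of an executable.

    This mimics what a program sees after readlink("/proc/self/exe"): symlinks
    are already resolved, so the path is that of the real file on disk.
    """
    real_directory = os.path.dirname(os.path.realpath(executable_path))
    return os.path.normpath(os.path.join(real_directory, relative_resource))
```

If `usr/bin/tool` inside a bundle is only a symlink into the `data` directory, then `sibling_resource("usr/bin/tool", "../share/tool/script.lua")` resolves relative to `data` and points at a file that doesn't exist. When a real copy sits at `usr/bin/tool`, the same call lands in the mirrored `usr/share/tool` directory as intended.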
System Package Dependencies Can Be Auto-Detected
In order to add an explicit set of package dependencies, you need to first know which files will need to be included.
One way that you could find these is to use your system’s package manager to list all of the files that are included in a package.
On Arch Linux, for example, you could run pacman -Ql nmap
which would list the files in the nmap
package.
Each file is actually prefixed with the package name in the output, as you can see here from a bit of the command output.
nmap /usr/share/nmap/nse_main.lua
nmap /usr/share/nmap/nselib/
nmap /usr/share/nmap/nselib/afp.lua
nmap /usr/share/nmap/nselib/ajp.lua
nmap /usr/share/nmap/nselib/amqp.lua
nmap /usr/share/nmap/nselib/anyconnect.lua
nmap /usr/share/nmap/nselib/asn1.lua
nmap /usr/share/nmap/nselib/base32.lua
nmap /usr/share/nmap/nselib/base64.lua
nmap /usr/share/nmap/nselib/bin.lua
We’ll need to strip off this prefix using sed
, and we’ll also want to exclude parent directories like /usr/
because they would be pulled in recursively.
Applying these two filters to the pacman
output will allow us to create a bundle with every file that’s part of the nmap
package (in addition to all of the shared library dependencies that are discovered automatically).
pacman -Ql nmap | sed 's/^nmap //g' | sed '/\/$/d' | exodus nmap
That’s not that bad, but it’s definitely a little on the verbose side.
Exodus 2.0 includes a bit of syntactic sugar with the newly added -d
/--detect
option that handles detecting package dependencies like this automatically.
You can just run
exodus --detect nmap
and all of the package dependencies will get pulled in for you. This only works on Arch, Debian/Ubuntu, and CentOS/Fedora/RHEL at the moment, but support will be extended to additional distributions in the future.
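If you're curious what the manual sed pipeline from earlier is doing, the same two filters can be written as a short Python function. This is just a sketch of the filtering logic for the pacman output format shown above; it is not how the detection option is implemented, and the function name is made up for the example.

```python
def files_from_pacman_listing(listing, package):
    """Return file paths from `pacman -Ql <package>` output, skipping directories."""
    prefix = package + " "
    paths = []
    for line in listing.splitlines():
        if not line.startswith(prefix):
            continue
        path = line[len(prefix):]  # strip the leading "<package> " prefix
        if path.endswith("/"):     # directories would be pulled in recursively
            continue
        paths.append(path)
    return paths
```

The resulting list, printed one path per line, is the kind of input that can be piped into Exodus through stdin.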
Strace Can Be Used to Infer Runtime Dependencies
In addition to explicitly specifying files to include, runtime dependencies can now be automatically inferred from the output of strace
.
The primary purpose of this is to detect libraries that are loaded programmatically, making them undiscoverable to the linker.
For example, the game Minetest loads dozens of shared libraries at runtime that can’t be found using the linker (thanks to @Calinou for bringing this up).
By running strace -f minetest
we’re able to see the system calls that Minetest makes, which will include these programmatically-loaded shared libraries.
A snippet of the strace
output can be seen here.
openat(AT_FDCWD, "/usr/lib/libz.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360!\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=92056, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9cfc572000
mmap(NULL, 2187280, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f9cfc19a000
mprotect(0x7f9cfc1b0000, 2093056, PROT_NONE) = 0
mmap(0x7f9cfc3af000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = 0x7f9cfc3af000
close(3)
The game is reading in /usr/lib/libz.so.1
, a library that is required for it to function correctly.
In Exodus 2.0, you can simply pipe the output of strace
into Exodus in order to include all of these runtime dependencies.
To bundle Minetest, you can simply run
strace -f minetest 2>&1 | exodus --add /usr/share/minetest minetest
which will include all of the resources in /usr/share/minetest
in addition to any runtime dependencies that are detected while Minetest is running.
Arbitrary Linkers and Chroots Are Supported
Version one of Exodus found dependencies using the system’s ldd
script, but Exodus 2.0 actually parses the ELF program header of binaries to find the correct linker and invokes it directly.
The --ldd
option has been removed, and every executable included in a bundle will automatically be configured using the proper linker.
One major consequence of this is that non-GNU linkers are now supported, such as the ld-musl
linker from musl libc.
Among other things, this means that binaries can now be bundled from within Alpine Linux containers.
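Finding the linker in the ELF program header is less work than it might sound. The following Python sketch reads the `PT_INTERP` entry using only the standard library; the field offsets come from the ELF format itself, but the function is an illustration written for this post rather than the code that Exodus uses.

```python
import struct

PT_INTERP = 3  # program header type that names the interpreter (linker)


def read_interpreter(path):
    """Return the interpreter path from an ELF file's PT_INTERP header, or None."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64_bit = data[4] == 2               # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little, 2 = big

    # Location, entry size, and count of the program header table.
    if is_64_bit:
        e_phoff, = struct.unpack_from(endian + "Q", data, 0x20)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x36)
    else:
        e_phoff, = struct.unpack_from(endian + "I", data, 0x1C)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x2A)

    for index in range(e_phnum):
        base = e_phoff + index * e_phentsize
        p_type, = struct.unpack_from(endian + "I", data, base)
        if p_type != PT_INTERP:
            continue
        if is_64_bit:
            p_offset, = struct.unpack_from(endian + "Q", data, base + 0x08)
            p_filesz, = struct.unpack_from(endian + "Q", data, base + 0x20)
        else:
            p_offset, = struct.unpack_from(endian + "I", data, base + 0x04)
            p_filesz, = struct.unpack_from(endian + "I", data, base + 0x10)
        # The segment holds a NUL-terminated path to the linker.
        return data[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return None
```

Statically linked binaries have no `PT_INTERP` entry, in which case this sketch returns `None`.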
Extracting the proper linker from each executable and invoking it directly also made it possible to simulate a chroot environment during bundling without requiring root access.
A --chroot
argument has been added, which prefixes the linker and all library search paths in such a way that the chroot path is treated as the root.
This argument can be used to bundle binaries from system packages that were extracted in arbitrary directories, even if the bundles won’t run on the host system due to glibc compatibility errors.
For instance, a user could extract a set of RPM files on a Debian system, create a bundle using the --chroot
option, and then install that bundle locally.
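The path prefixing itself can be pictured with a small Python sketch. Both helper names here are hypothetical, and the real option also has to deal with invoking the linker; the sketch only shows how a library search can treat an arbitrary directory as the root.

```python
import os


def in_chroot(chroot, path):
    """Map an absolute path to its location inside the chroot directory."""
    return os.path.join(chroot, path.lstrip("/")) if chroot else path


def find_library(name, search_paths, chroot=None):
    """Search the (chroot-prefixed) library directories for `name`.

    Returns the path as the bundled program would see it, i.e. relative to
    the chroot root, or None when the library is not found.
    """
    for directory in search_paths:
        candidate = os.path.join(directory, name)
        if os.path.exists(in_chroot(chroot, candidate)):
            return candidate
    return None
```

The lookup happens on the prefixed path, but the result is reported without the prefix, so the bundle mirrors the directory structure of the extracted packages rather than that of the host.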
The other major consequence of the --chroot
argument is that it has allowed for far more sophisticated and robust testing of bundle creation because linkers no longer need to be present in system directories.
Exodus previously had only one executable that was actually bundled during testing; that number has now grown to six and covers multiple system architectures and libc implementations.
Exodus is “Self-Hosting”
With the changes listed above, it is now possible for Exodus to fully bundle itself.
Well… kind of.
Exodus is written in Python and this means that the entry-point for exodus
is a Python script with a shebang interpreter directive on the first line, like #! /usr/bin/env python
, that specifies which interpreter should be used to run the script.
You unfortunately can’t use relative paths when specifying the interpreter, so there’s no way to specify that the bundled version of Python should be used when running the entry point (without using awk
, perl
, or some other external dependency).
To get around this, a tiny little C program is used to launch the bundle’s version of Python with the appropriate arguments and environment variables set using relative paths.
#include <libgen.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
  char buffer[4096] = { 0 };
  // readlink() returns -1 on failure, so check for a positive length.
  // Reserving 25 bytes leaves room for the longest suffix appended below.
  if (readlink("/proc/self/exe", buffer, sizeof(buffer) - 25) > 0) {
    char *current_directory = dirname(buffer);
    // Path to the bundled Python interpreter.
    char python[4096] = { 0 };
    strcpy(python, current_directory);
    strcat(python, "/usr/bin/python");
    // Path to the bundled Exodus entry point script.
    char exodus[4096] = { 0 };
    strcpy(exodus, current_directory);
    strcat(exodus, "/usr/local/bin/exodus");
    // Build the argument list: python, the script, then the original arguments.
    char **combined_args = malloc(sizeof(char*) * (argc + 2));
    if (combined_args == NULL) {
      return 1;
    }
    combined_args[0] = python;
    combined_args[1] = exodus;
    memcpy(combined_args + 2, argv + 1, sizeof(char*) * (argc - 1));
    combined_args[argc + 1] = NULL;
    // Point PYTHONPATH at the bundled packages; the buffer is enlarged to fit the prefix.
    char *envp[2];
    char pythonpath[4096 + sizeof("PYTHONPATH=")] = { 0 };
    strcpy(pythonpath, "PYTHONPATH=");
    strcat(pythonpath, current_directory);
    strcat(pythonpath, "/usr/local/lib/python2.7/");
    envp[0] = pythonpath;
    envp[1] = NULL;
    // execve() only returns if launching Python failed.
    execve(python, combined_args, envp);
  }
  return 1;
}
This custom entry point, combined with some of the new Exodus 2.0 functionality, allows Exodus to bundle itself with zero external dependencies. The command for the actual build makes heavy use of some of the new features that we’ve highlighted above.
strace -f /usr/bin/python /usr/local/bin/exodus --shell-launchers -q /usr/bin/python -o /dev/null 2>&1 | \
exodus ./exodus --add /usr/local/lib/python2.7/dist-packages/exodus_bundler/ \
--no-symlink /usr/local/lib/python2.7/dist-packages/exodus_bundler/templating.py \
--no-symlink /usr/local/lib/python2.7/dist-packages/exodus_bundler/launchers.py
This command simply pipes a system call trace of Python running Exodus to package Python into Python running Exodus to package a binary called exodus
which invokes Python to run Exodus.
Any questions :-)?
This actually isn’t that complicated if we break it up piece-by-piece.
Let’s look at the strace
command first.
We’re producing a trace of the system calls made while using Python to run Exodus to package Python itself.
This trace will include all of the dynamically loaded resources that Python requires to run Exodus while producing a bundle.
The main purpose of bundling Python rather than any other program here is actually to avoid including unnecessary files in the final bundle.
It also ensures that Exodus will access all of the dynamically linked Python dependencies while producing the bundle, but this should be mostly redundant because these would show up in the strace
output anyway.
We then discard the actual Python bundle, and pipe the output of strace
into a second Exodus command where it will be parsed to extract the necessary dependencies.
The strace
output will actually include almost all of the Exodus package files already, but we’ll use the --add
option to explicitly include the entire package directory as well.
Finally, we specify that two files can’t be symlinked and should instead be placed directly in the mirrored directory tree.
These are templating.py
and launchers.py
, both of which use __file__
to locate relative dependencies in much the same way that C programs might use a readlink("/proc/self/exe")
call.
The output of this command is a 9 MB bundle which contains all of the dependencies necessary to install Exodus.
The bundle can be used to quickly install Exodus in a lightweight Docker container without needing to bring in Python, Pip, and their many dependencies.
You can create a Dockerfile
that starts with
FROM alpine
# Install the exodus bundle.
ADD https://circleci.intoli.com/artifacts/intoli/exodus/exodus-x64.tgz /
RUN tar --strip-components 1 -p -zxf exodus-x64.tgz
and then Exodus will be available at /bin/exodus
.
Conclusion
Well, that was our quick tour of some of the new features introduced in Exodus 2.0. The range of programs that can be successfully relocated with Exodus has grown tremendously, hopefully while preserving the simplicity and ease-of-use that made the project popular to begin with. Be sure to check the project out on GitHub, and to watch or star it to keep updated on future developments. You can find installation instructions, usage examples, and more in the project README!