Flowd-go

[FlowyGopher, the flowd-go logo]

Flowd-go is a network flow and packet marking daemon. It is heavily inspired by scitags/flowd, but instead of Python it's implemented in Go.

Why reimplement something that's already working? Well, because...

  • ... we wanted to try our hand at implementing the flow marking infrastructure leveraging vanilla eBPF instead of BCC.
  • ... Go produces statically compiled binaries which make it much easier to deploy on target machines: we don't need containerisation!
  • ... Go lends itself very well to the model underlying the solution, where channel-based concurrency feels natural.
  • ... Go's well-proven concurrency primitives make scaling to high-load scenarios readily achievable.
  • ... the SciTags effort might find our work useful!

Given how heavily flowd-go draws from flowd, the original authors have been included in the LICENSE and other documents to make that fact explicit. We apologise in advance for any oversights on this front...

The technical specification we try to adhere to can be found here. The SciTags Organization is the entity behind this effort of tagging network traffic to gain better insight into how network flows behave, in search of strategies for optimising data delivery in data-heavy realms such as High Energy Physics (HEP).

Quickstart

The golden rule is that 'if something can be done, then a make target can be leveraged for it'. This basically means that compiling, running, generating the documentation and all those common tasks can be accomplished by simply issuing the appropriate make <target>. To get an updated list of targets simply run:

$ make

This will provide more comprehensive information than we can include here. At any rate, the following lines go a bit more in depth into what's actually going on when compiling and running the code. There's also a section devoted to leveraging the purpose-built Docker containers to develop and test the code!

The code base should be compilable both on Linux and Darwin (i.e. macOS) machines. Bear in mind the eBPF backend won't be available on macOS machines by design, as it's a feature of the Linux kernel. In order to support eBPF, a handful of dependencies must be installed on a Linux-based machine. We're working on AlmaLinux 9.4, where the following installs everything needed:

# Enable the CRB Repo (check https://wiki.almalinux.org/repos/Extras.html)
$ dnf install epel-release; dnf config-manager --set-enabled crb

# Install libbpf together with headers and the static library (i.e. *.a), llvm, clang and the auxiliary bpftool
$ yum install libbpf-devel libbpf-static clang llvm bpftool

If you want to create the manpage you'll also need to install pandoc, which will convert the Markdown-formatted manpage into a Roff-formatted one:

# On Almalinux you can install pandoc from EPEL
$ yum install pandoc

# On macOS you can install it with Homebrew or an equivalent package manager
$ brew install pandoc

Also, if you want to build an RPM with all the necessary goodies be sure to install these additional dependencies:

$ yum install rpm-build rpm-devel rpmlint rpmdevtools

You can now create the necessary build infrastructure by simply running:

$ rpmdev-setuptree

Be sure to check the RPM Packaging Guide for a wealth of great information.
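
With the tree in place, building the package boils down to something along these lines (a sketch, assuming the sources have been staged into the build tree; check the make targets for the exact invocation the project uses):

$ rpmbuild -ba rpms/flowd-go.spec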

With all the above out of the way, one can leverage the Makefile with:

$ make build

The above will produce the flowd-go binary which one can run as usual with:

$ ./bin/flowd-go --conf cmd/conf.json --log-level debug run

Please bear in mind that if the eBPF backend is in use the binary should be started with privileges (i.e. by prepending sudo(8)). We are looking into setting the binary's capabilities(7) so that elevated permissions are not needed. Also, one can run make or make help to get a list of available targets together with an explanation of what they achieve.
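
Something along the following lines might end up doing the trick (a sketch only: the exact capability set is still being worked out, and older kernels may additionally need CAP_SYS_ADMIN):

# Hypothetical: grant the binary the capabilities the eBPF backend needs
$ sudo setcap cap_bpf,cap_net_admin+ep ./bin/flowd-go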

Also, be sure to ask the binary for help to be greeted with a message showing you what other commands besides run are available. You can also check the Markdown-formatted manpage on rpms/flowd-go.1.md to get a list of available flags and commands along with a more detailed description.
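
Assuming the conventional --help flag:

$ ./bin/flowd-go --help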

Not so quickstart

One can also leverage Docker containers to run flowd-go. However, given we'll be making use of some rather advanced technologies (in the sense that they are not for everyday use), we'll need to do some convincing so that the containers can actually run as expected. In order to maintain a sane degree of security, Docker containers are started with very few capabilities(7) by default. Things like loading eBPF programs and creating qdiscs require a great deal of privilege which we don't really have by default. The good news is we can just 'ask' for these capabilities; the bad news is that the resulting command is a bit frightening...

Please bear in mind the following has been only tested on Docker Desktop 4.30.0 running on macOS 13.5.1: YMMV!

Docker, docker, docker!

We have added three targets (i.e. docker-{start,shell,stop}) that take care of automating the following discussion away. With this, the workflow boils down to:

# Start the container in the background
$ make docker-start

# Open as many shells as you want in that container
$ make docker-shell

# Stop (and implicitly remove) the container
$ make docker-stop

Bear in mind you can explicitly request one of the other available container flavours by specifying a value for the FLAVOUR variable:

# By default, invoking 'make docker-start' with no other arguments is the same as running
$ make FLAVOUR=dev docker-start

# You can also run the image used for testing in the CI
$ make FLAVOUR=test docker-start

# And you can also take the image used for releases on the CI for a spin
$ make FLAVOUR=release docker-start

If in doubt, be sure to skim over mk/docker.mk to take a look at what's actually being run with the above targets. For more information on what each image flavour is trying to accomplish please check the What's what? section below.

The following paragraphs explain a bit more in depth what's actually going on behind the scenes in case you'd rather set things up yourself.

What if I despise Makefiles?

Without further ado:

$ docker run -v $(pwd):/root/flowd-go --cap-add SYS_ADMIN --cap-add BPF --cap-add NET_ADMIN -it --rm --name flowd-go ghcr.io/scitags/flowd-go:dev-v2.0 bash

To get an idea of what each option accomplishes be sure to take a look at mk/docker.mk.

With the above we should be dropped into a working shell where we can just run:

$ cd flowd-go; make build; ./bin/flowd-go --conf cmd/conf.json --log-level debug run

As always, we can open more shells in the same container with:

$ docker exec -it flowd-go bash

Now, if we want to have access to the eBPF program's debug output on a machine running Docker Desktop we need to manually mount the debugfs filesystem (see mount(8)). On Linux-based machines, debugfs should be mounted by default and these next steps should not be necessary. Anyway, we can mount debugfs manually by running the following within the container:

$ mount -t debugfs debugfs /sys/kernel/debug

We can also do the same thing persistently by running:

$ docker volume create --driver local --opt type=debugfs --opt device=debugfs debugfs

Then, we just need to add the following when invoking docker run ... bash to mount this new filesystem:

-v debugfs:/sys/kernel/debug:rw
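
Putting it all together with the invocation from before:

$ docker run -v $(pwd):/root/flowd-go -v debugfs:/sys/kernel/debug:rw --cap-add SYS_ADMIN --cap-add BPF --cap-add NET_ADMIN -it --rm --name flowd-go ghcr.io/scitags/flowd-go:dev-v2.0 bash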

Please be sure to check this site, which contains very valuable info on this topic! All in all, getting Docker to work with eBPF machinery can be a bit of a pain, but the payback is huge!

Configuration

As seen above, we need to provide the path to a JSON-formatted configuration file. We provide a sample on cmd/conf.json which should be suitable for running flowd-go locally to check everything's working as intended. If left unspecified, flowd-go will look for a configuration file at /etc/flowd-go/conf.json. For more information on what can be configured, please refer to the Markdown-formatted manpage on rpms/flowd-go.1.md.
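
Purely as a hypothetical illustration of the shape such a file takes (the plugin and backend names and options below are made up; the authoritative schema lives in the manpage and in each plugin's and backend's own documentation):

{
    "plugins": {
        "namedPipe": { "path": "/var/run/flowd-go/np" }
    },
    "backends": {
        "ebpf": { "targetInterface": "eth0" }
    }
}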

Also, each plugin and backend will have a bit of documentation in their respective directories which is worth a read.

Architecture

Flowd-go follows flowd's architecture in that its core revolves around the idea of plugins and backends. An external user or program can signal flow events through the configured plugins and these events will be propagated to the backends, each of which carries out whatever action it is responsible for. Please refer to each plugin's or backend's documentation to find out what it is they expect/do.

Within flowd-go, a flow event is represented as a struct as defined on types.go:

type FlowID struct {
    State      FlowState
    Protocol   Protocol
    Src        IPPort
    Dst        IPPort
    Experiment uint32
    Activity   uint32
    StartTs    time.Time
    EndTs      time.Time
    NetLink    string
}

Each of the fields is documented on the source file itself, but the gist of it is that these FlowIDs contain the source and destination addresses and ports together with the transport-level protocol and the experiment and activity identifiers. They can be regarded as a 5-tuple to 2-tuple mapping where we identify datagrams/segments with the first 5 values and then somehow 'mark' that flow with the latter two.

Internally, flowd-go makes heavy use of Go's channels and built-in concurrency constructs to handle the inner workings in the simplest and most elegant way we could think of.
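
The following is a minimal sketch of what that fan-out looks like; the names and structure are ours for illustration, not the actual implementation (see the Plugin and Backend interfaces further down):

// runCore is a hypothetical sketch of flowd-go's core: it fans flow
// events out from every plugin to every backend.
func runCore(plugins []Plugin, backends []Backend, done chan struct{}) {
	flows := make(chan FlowID)

	// Every plugin pushes its flow events into the shared channel...
	for _, p := range plugins {
		go p.Run(done, flows)
	}

	// ... and each backend consumes from its own channel so that a
	// slow backend cannot starve the others.
	outs := make([]chan FlowID, len(backends))
	for i, b := range backends {
		outs[i] = make(chan FlowID)
		go b.Run(done, outs[i])
	}

	// The core loop simply broadcasts each event to all the backends.
	for {
		select {
		case f := <-flows:
			for _, out := range outs {
				out <- f
			}
		case <-done:
			return
		}
	}
}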

Another key aspect separating flowd-go from flowd is how the eBPF backend is implemented. In the latter, the eBPF program's source code is embedded into the Python source and, every time the program starts, the eBPF program is compiled on the running machine. This of course implies the machine must have a full-blown clang and llvm installation available together with the bcc headers. On the other hand, flowd-go leverages libbpf, a thin C-based library handling the loading of a pre-compiled eBPF program so that it can run on different kernels. This is the basis of the Compile Once, Run Everywhere (CO-RE) paradigm. The compilation of the eBPF program is done on a machine including libbpf's headers and a statically-compiled implementation of the library so that there are truly no runtime dependencies: the precompiled eBPF program is also embedded into the binary! For a deeper and thoroughly referenced discussion be sure to refer to the documentation of the eBPF backend.
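
By way of illustration, the CO-RE compilation step boils down to something like the following (a sketch; the file name is hypothetical and the exact flags live in the project's Makefiles):

# Compile the eBPF program once, targeting the BPF 'architecture' and
# keeping BTF debug info around so it can be relocated on other kernels
$ clang -O2 -g -target bpf -c marker.bpf.c -o marker.bpf.o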

This eBPF backend has been shown to run on the following distros and kernels. The eBPF program is always compiled on a machine running AlmaLinux 9.4 with the 5.14.0-427.24.1.el9_4.x86_64 Linux kernel release (as given by uname(1)) and with libbpf-2:1.3.0-2.el9:

Distro          Kernel Release
AlmaLinux 9.4   5.14.0-427.24.1.el9_4.x86_64
Fedora 35       6.11.3

Please note these machines require no runtime dependencies, given libbpf is bundled into flowd-go itself.

What's what?

This project strives to adhere to Go's best practices in terms of module organisation as outlined here. Thus, it can be a bit overwhelming for people not familiar with Go's ecosystem. The following sheds some light on what-goes-where:

  • .: The main flowd-go module containing the type definitions and other utilities.
  • settings: A separate module handling the parsing of the configuration needed to avoid circular dependencies.
  • cmd: The flowd-go binary itself. It pulls dependencies from all over the repo.
  • backends: The implementations of the available backends. Each of them is an independent Go module.
  • plugins: The implementations of the available plugins. Each of them is an independent Go module.
  • enrichment: Implementation of several Linux interfaces allowing us to gather low-level information on ongoing connections.

Other than that, we also have another couple of directories with auxiliary files:

  • rpms: This directory contains all the goodies for bundling up RPM packages for distribution, including:

    • flowd-go.1.md: The Markdown-formatted manpage for flowd-go. It's converted into a normal Roff-formatted manpage by pandoc.
    • flowd-go.service: The SystemD Unit file for running flowd-go as a regular SystemD service.
    • flowd-go.spec: The RPM spec file used to build RPMs to make flowd-go easily available on RHEL-like systems.
    • conf.json: A configuration file meant for deployment on real machines. For development, the configuration one should use is the one located on cmd/conf.json.
  • mk: Several auxiliary Makefiles, included from the main Makefile, that provide convenient automation for the interactions we usually carry out with flowd-go when developing it.

  • dockerfiles: The different Dockerfiles we use to build the images used by the project. The current flavours are:

    • dev: A development image based on almalinux/9.4 that includes everything necessary to work on and develop flowd-go locally.
    • test: A lean image based on almalinux/9.4-minimal including the bare minimum needed to build flowd-go and check things are okay.
    • release: A lean image based on the previous one which also adds dependencies needed for RPM packaging.

As usual, you can check all the available images and their versions here.

Adding new backends or plugins

The code has been designed so that adding new plugins and backends is as easy as possible. Leaving configuration aside (which you can learn more about by looking at the implementation of any plugin and/or backend), you just need to provide something that adheres to the appropriate interfaces defined on types.go:

type Backend interface {
	Init() error
	Run(<-chan struct{}, <-chan FlowID)
	Cleanup() error
}

type Plugin interface {
	Init() error
	Run(<-chan struct{}, chan<- FlowID)
	Cleanup() error
}

These are documented more thoroughly in the source code itself.
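
As a quick illustration (a hypothetical example, not part of flowd-go itself), a do-nothing backend that simply logs every flow event it receives could look something like this, assuming the standard library log package is imported:

// loggingBackend is a hypothetical example backend: it just prints
// every flow event it is handed until it's told to shut down.
type loggingBackend struct{}

func (b *loggingBackend) Init() error { return nil }

func (b *loggingBackend) Run(done <-chan struct{}, flows <-chan FlowID) {
	for {
		select {
		case f := <-flows:
			log.Printf("flow %+v -> %+v: experiment %d, activity %d",
				f.Src, f.Dst, f.Experiment, f.Activity)
		case <-done:
			return
		}
	}
}

func (b *loggingBackend) Cleanup() error { return nil }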

Kudos

The logo is a composition of a couple of images, which were put together with Inkscape.

Questions or comments?

Feel free to reach out at [email protected] or open up an issue on the repo. PRs are also welcome!