Linux KVM + host nftables + guest networking

The difficulties of getting Linux KVM, modern host-side nftables packet filtering, and guest-side networking to work together without resorting to firewalld on the host are fairly well documented; for example, here. The usually recommended solution involves going back to iptables on the host, and sometimes defining libvirt-specific nwfilter rules. While that might be tolerable for dedicated virtualization hosts, it’s less than ideal for systems that also see other uses, especially uses where nftables’ expressive power and relative ease of use are desired.

Fortunately, it can be worked around without giving up on nftables.

I’m assuming that you have already set up a typical basic nftables client-style ruleset on the host, something along the lines of:

#!/usr/bin/nft -f
flush ruleset
table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;
        ct state invalid drop
        ct state established accept
        ct state related accept
        iifname "lo" accept
    }
    chain forward {
        type filter hook forward priority 0; policy drop;
    }
    chain output {
        type filter hook output priority 0; policy accept;
    }
}

Start out by setting the KVM network to start automatically on boot. The network startup will also cause libvirt to create some NAT post-routing tables through iptables, which through the magic of conversion tools get transformed into a corresponding nftables table ip nat. This might cause an error to be displayed initially, but that’s OK for now. Reboot the host, then run virsh net-list --all to check that the network is active, and nft list table ip nat to make sure that the table and chains were created. It should all look something like:

$ sudo virsh net-list --all
 Name      State    Autostart   Persistent
--------------------------------------------
 default   active   yes         yes

$ sudo nft list table ip nat
table ip nat {
    chain LIBVIRT_PRT {
        ... a few moderately complex masquerading rules ...
    }
    chain POSTROUTING {
        type nat hook postrouting priority srcnat; policy accept;
        counter packets 0 bytes 0 jump LIBVIRT_PRT
    }
}
$
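If the network turns out not to be active or not set to autostart, virsh net-start and virsh net-autostart take care of that. As a minimal sketch, the shell helper below (its name is made up for illustration) reads the output of virsh net-list --all and prints the names of the networks that still need attention. It assumes the four-column layout shown above.

```shell
# Hypothetical helper: read `virsh net-list --all` output on stdin and print
# the name of every network that is not both active and set to autostart.
networks_needing_attention() {
    awk '$1 != "Name" && $1 !~ /^-+$/ && NF >= 3 &&
         !($2 == "active" && $3 == "yes") { print $1 }'
}

# Typical use on the host (requires libvirt):
#   sudo virsh net-list --all | networks_needing_attention
#   sudo virsh net-autostart default
#   sudo virsh net-start default
```

An empty result means that every listed network is both active and autostarted.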

Letting libvirt’s magic and the iptables-to-nftables conversion tools handle the insertion of the routing filters makes it less likely that issues will develop later on due to, for example, changes in which rules newer versions need. An alternative approach, which works currently for me but might not work for you or in the future, is to manually create a postrouting chain; the nftables magic incantation can be reduced to something similar to:

table ip nat {
    chain postrouting {
        type nat hook postrouting priority 100; policy accept;
        ip saddr 192.168.122.0/24 masquerade
    }
}

(In the above snippet, 192.168.122.0/24 maps to the details from the <ip> node in the output of virsh net-dumpxml <name> for each network listed by virsh net-list earlier.)
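To find those details without reading the XML by eye, something along these lines might help. It is only a sketch: the function name is made up, and it assumes that the <ip> node carries its address and netmask attributes on one line and in that order, which is how libvirt writes the default network here. Networks defined with a prefix attribute instead of a netmask would not match.

```shell
# Hypothetical helper: read `virsh net-dumpxml <name>` output on stdin and
# print "address netmask" for each <ip> node that has both attributes.
extract_ip_nodes() {
    sed -n "s/.*<ip[^>]*address=['\"]\([^'\"]*\)['\"][^>]*netmask=['\"]\([^'\"]*\)['\"].*/\1 \2/p"
}

# Typical use on the host (requires libvirt):
#   sudo virsh net-dumpxml default | extract_ip_nodes
```

The address printed is that of the host’s end of the bridge; combine it with the netmask to get the network to masquerade.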

You do, however, need to add some rules to the table inet filter to allow incoming and forwarded packets to pass through to and from the physical network interface (eth0 here; substitute as appropriate, ip addr sh will tell you the interface name):

table inet filter {
    chain input {
        # ... add at some appropriate location ...
        iifname "virbr0" accept
    }
    chain forward {
        # ... add at some appropriate location ...
        iifname "virbr0" oifname "eth0" accept
        iifname "eth0" oifname "virbr0" accept
    }
}

The forward chain rules probably aren’t necessary if your forward chain has the default accept policy, but it’s generally better to have a drop or reject policy and only allow the traffic that is actually needed.

The finishing touch is to make sure that sysctl net.ipv4.ip_forward = 1 on the host; without it, IPv4 forwarding won’t work at all.
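A quick way to check the current setting is to read the corresponding file under /proc. The helper below is a small sketch; its name and the optional path argument are only there for illustration.

```shell
# Succeed if IPv4 forwarding is enabled. The optional argument is the path
# to read the setting from; it defaults to the kernel's procfs entry.
ipv4_forwarding_enabled() {
    ip_forward_file="${1:-/proc/sys/net/ipv4/ip_forward}"
    [ "$(cat "$ip_forward_file" 2>/dev/null)" = "1" ]
}

if ipv4_forwarding_enabled; then
    echo "IPv4 forwarding is enabled"
else
    echo "IPv4 forwarding is disabled; enable with: sudo sysctl -w net.ipv4.ip_forward=1"
fi
```

Note that sysctl -w only changes the running kernel. On most distributions, a line reading net.ipv4.ip_forward = 1 in a file under /etc/sysctl.d/ makes the setting survive a reboot, but check your distribution’s documentation.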

Unfortunately, KVM still tries to use iptables to create a NAT table when its network is started, and this can’t be done when an nftables NAT table already exists. The table ip nat portion, if manually configured, therefore needs to go into an nftables script that is loaded after the KVM network is started, thus replacing the automatically generated chain. Most distributions, however, are set up to load the normal nftables rule set quite early during the boot process, likely and hopefully before basic networking is even fully up and running (to close the window of opportunity for traffic to sneak through).

The easiest way to deal with this is very likely to just let the iptables compatibility tools handle it for you when the KVM network is started, and accept the need for a reboot during the early KVM configuration process. The most likely scenario in which this simple approach won’t work seems to be if you are already using nftables to do other IP forwarding magic as well; in that case, you may need to resort to a split nftables configuration, loading the post-routing NAT ruleset late during the boot process, such as perhaps through /etc/rc.local (which is typically executed very late during boot). If so, then it’s probably worth the trouble to rewrite one or the other in terms of nft add commands instead of a full-on, atomic nft -f script.
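As a rough sketch of what the nft add variant of the manual postrouting chain could look like, the function below (a made-up name) prints the three commands for a given guest network. Because the script does not start with flush ruleset, loading it leaves everything else that is already loaded alone. I have not tested this in every late-boot scenario, so treat it as a starting point.

```shell
# Hypothetical helper: print an incremental nftables script that adds a
# masquerading rule for the given guest network (for example 192.168.122.0/24).
nat_ruleset() {
    subnet="$1"
    cat <<EOF
add table ip nat
add chain ip nat postrouting { type nat hook postrouting priority 100; policy accept; }
add rule ip nat postrouting ip saddr $subnet masquerade
EOF
}

# Show what would be loaded:
nat_ruleset 192.168.122.0/24

# To actually load it, late during boot and as root:
#   nat_ruleset 192.168.122.0/24 | nft -f -
```

Adding a table or chain that already exists is harmless, but add rule appends a new copy of the rule each time, so this should run only once per boot.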

With all this in place, KVM guests should now be able to access the outside world over IPv4, NATed through the host, including after a reboot of the host.

A huge tip of the proverbial hat to user regox on the Gentoo forums, who posted what I was able to transform into most of the above.

Using Linux nftables to block traffic outside of a VPN tunnel

For systems that commonly connect to untrusted networks, such as laptops, it can be useful to only allow outgoing traffic through a pre-configured, known-trusted (to the extent that such is a thing) VPN tunnel. This serves to ensure that unprotected traffic isn’t routed through a potentially unknown, potentially adversarial uplink provider.

Fortunately, Linux’s nftables functionality provides everything we need for that.

Usually, nftables is configured in such a way that incoming traffic is filtered, but outgoing traffic is implicitly trusted. Take, for example, Debian 11/Bullseye’s /usr/share/doc/nftables/examples/workstation.nft:

#!/usr/sbin/nft -f

flush ruleset

table inet filter {
	chain input {
		type filter hook input priority 0;

		# accept any localhost traffic
		iif lo accept

		# accept traffic originated from us
		ct state established,related accept

		# activate the following line to accept common local services
		#tcp dport { 22, 80, 443 } ct state new accept

		# accept neighbour discovery otherwise IPv6 connectivity breaks.
		ip6 nexthdr icmpv6 icmpv6 type { nd-neighbor-solicit,  nd-router-advert, nd-neighbor-advert } accept

		# count and drop any other traffic
		counter drop
	}
}

In effect, this behaves as if there were an empty output (and forward) chain as well:

table inet filter {
	chain output {
		type filter hook output priority 0;
	}
}

Since this chain doesn’t have any policy, the default policy accept applies. In other words, everything is allowed.

To block the unwanted traffic, we need to identify the traffic that does need to be allowed. There are three kinds of traffic that need to be allowed to flow outside of the VPN tunnel:

  • Traffic for the purpose of bringing the interface up (DHCP, IPv6 neighbor discovery, …)
  • Traffic for the purpose of bringing the VPN tunnel up (DNS)
  • The VPN tunnel itself (Wireguard, OpenVPN, …)

Begin by determining on which interfaces you want to be able to establish an outgoing VPN connection. For some people this will be the wired interface, for some it might be the wireless interface, and for some, it might be both. Running ip addr sh in a terminal is one way to find the actual interface name, which will be needed in a moment. Also open the nftables configuration file (likely /etc/nftables.conf, but check your distribution’s documentation) in a text editor. If you don’t have one yet, you can start out with this, which is Debian’s example stripped of comments but the implicit chains included:

#!/usr/sbin/nft -f

flush ruleset

table inet filter {
	chain input {
		type filter hook input priority 0;
		iif "lo" accept
		ct state established,related accept
		ip6 nexthdr icmpv6 icmpv6 type { nd-neighbor-solicit,  nd-router-advert, nd-neighbor-advert } accept
		counter drop
	}

	chain forward {
		type filter hook forward priority 0;
	}

	chain output {
		type filter hook output priority 0;
	}
}

For our purposes, we will be focused on the output chain, so I will be eliding the other parts of the configuration.

It’s useful to allow traffic that is routed locally on the host, for example for inter-process communication, so immediately after the type stanza, add a rule to allow traffic over the loopback interface (oif is output interface):

oif "lo" accept

Since all interfaces may not have been brought up yet by the time nftables rules are initially loaded, for the next several stanzas use oifname instead of oif. The use of oifname comes at a bit of a performance penalty, but it is more flexible especially with interfaces that aren’t always there.

First, allow DHCP traffic, which uses UDP with source and destination ports both either 67 or 68:

oifname { "en...", "wl..." } udp sport { 67, 68 } udp dport { 67, 68 } accept

Replace the "en...", "wl..." part with the name of the interface(s) in question.
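If you are unsure which names apply, the kernel lists all current interfaces under /sys/class/net. The sketch below prints those that start with en or wl, which are the usual prefixes for wired and wireless interfaces with predictable interface naming. The function name and the optional directory argument are made up for illustration, and interfaces using other naming schemes (such as eth0) will not be listed.

```shell
# Hypothetical helper: print the names of interfaces that look like wired
# (en*) or wireless (wl*) uplinks. The optional argument is the directory
# to look in; it defaults to the kernel's list of network interfaces.
list_uplink_candidates() {
    net_dir="${1:-/sys/class/net}"
    for path in "$net_dir"/en* "$net_dir"/wl*; do
        # An unmatched glob is left as-is by the shell; skip it.
        [ -e "$path" ] || continue
        basename "$path"
    done
}

list_uplink_candidates
```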

Second, allow DNS traffic for initial name resolution, which uses UDP or TCP with a destination port of 53. If you configure your VPN tunnel with an IP address as a target instead of a DNS name, then you don’t need this.

oifname { "en...", "wl..." } meta l4proto { tcp, udp } th dport 53 accept

As an alternative, you can create two rules, one each for TCP and UDP; doing so will have the same effect, at a slight performance and maintenance penalty:

oifname { "en...", "wl..." } tcp dport 53 accept
oifname { "en...", "wl..." } udp dport 53 accept

Then add rules to allow traffic to the VPN concentrator. The more tightly scoped you can make this, the better. For example, if you know the IP address and the port used, you can add a stanza such as:

oifname { "en...", "wl..." } ip daddr 192.0.2.128 udp dport 29999 accept

If the VPN concentrator runs on either a standard port that is rarely used for other purposes (such as OpenVPN’s default 1194) or an uncommon port (as is often the case with Wireguard) but you don’t know its exact IP address ahead of time, you can either use a set, or elide the IP address specification:

oifname { "en...", "wl..." } ip daddr { 192.0.2.128/28, 198.51.100.0/27 } udp dport 29999 accept

or

oifname { "en...", "wl..." } udp dport 29999 accept

Then allow traffic as needed through the VPN tunnel interface. The exact name of this interface will vary with the VPN technology you’re using; for example, Wireguard tunnels typically allow you to specify the interface name, whereas OpenVPN tunnels use a semi-unpredictable interface name. For this, the ability of oifname to match a prefix by appending * can be useful. For example, for OpenVPN you might use:

oifname "tun*" accept

whereas for a Wireguard tunnel you might end up with:

oifname "wgmyvpn" accept
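It is worth being aware of just how much a trailing * matches. The following shell sketch imitates the comparison (it is an illustration of the matching rule, not how nftables implements it, and the function name is made up): everything before the * must match the start of the interface name, and anything may follow.

```shell
# Hypothetical helper: succeed if the interface name matches the pattern.
# A pattern ending in * matches any name starting with the part before
# the *; any other pattern must match the name exactly.
matches_ifname_pattern() {
    pattern="$1"
    name="$2"
    case "$pattern" in
        *\*) prefix="${pattern%\*}"
             case "$name" in "$prefix"*) return 0 ;; esac
             return 1 ;;
        *)   [ "$name" = "$pattern" ] ;;
    esac
}

for ifname in tun0 tun1 tunnel-test wgmyvpn; do
    if matches_ifname_pattern "tun*" "$ifname"; then
        echo "$ifname: matched"
    else
        echo "$ifname: not matched"
    fi
done
```

Here, the made-up interface name tunnel-test matches as well as tun0 and tun1, which is why the prefix should be kept as specific as the VPN software allows.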

As a final touch, add a policy to block traffic not matched by other rules. Since all output rules specify on which interfaces traffic is allowed to flow, this blocks traffic outside of the VPN tunnel except for the traffic that is explicitly allowed to flow outside of the VPN tunnel.

The policy typically goes at the top, just below the type stanza, whereas the reject stanza must appear below all other rules.

policy drop;
reject

The purpose of also having a reject stanza is to provide more immediate feedback. In its absence, packets will simply be dropped, resulting in long wait times before attempts time out; with it, clients will be notified immediately that the connection failed and can report this back to the user.

The final output chain might look something like:

chain output {
	type filter hook output priority 0;
	policy drop;

	oif "lo" accept
	oifname { "en...", "wl..." } udp sport { 67, 68 } udp dport { 67, 68 } accept
	oifname { "en...", "wl..." } meta l4proto { tcp, udp } th dport 53 accept
	oifname { "en...", "wl..." } ip daddr { 203.0.113.113, 203.0.113.114 } udp dport 1194 accept
	oifname "tun*" accept
	reject
}

Reload the nftables rule set (sudo nft -f /etc/nftables.conf) and verify that you can connect to the VPN and access the Internet (or the remote network) through it. Disconnect the VPN and verify that traffic is blocked, for example by attempting to reload a web page.
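Since a mistake in the output chain can cut off all network access, it can be worth having nft check the file before loading it; the -c (--check) option does that without applying anything. A minimal sketch, to be run as root, where the function name and the NFT variable (which only exists so that a different binary can be substituted) are my own inventions:

```shell
# Hypothetical helper: check the ruleset file first, and load it only if
# the check passes. NFT can be set to use a different nft binary.
reload_ruleset() {
    conf="${1:-/etc/nftables.conf}"
    if "${NFT:-nft}" -c -f "$conf"; then
        "${NFT:-nft}" -f "$conf"
    else
        echo "not loading $conf: check failed" >&2
        return 1
    fi
}

# Typical use, as root:
#   reload_ruleset /etc/nftables.conf
```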

Reboot the computer and verify that the network interface comes up and that you can connect to the VPN, access the Internet through it, and that traffic is again blocked when you disconnect from the VPN.

Keep in mind that this ruleset isn’t perfect. For example, if routes aren’t set up properly when starting the VPN tunnel, traffic can leak through ordinary DNS queries outside of it; and it relies on interface name matching which can match unexpected interfaces. Therefore, this does not serve as a proper “kill switch” for all traffic. However, it does form a decent second (or third) line of defense against unexpected but not actively malicious traffic leaks outside of the VPN tunnel, which for a system that would otherwise allow everything going out is very much an improvement.

Exposing pfSense uplink information to LAN hosts

Sometimes, it’s beneficial to be able to programmatically tell from a client which uplink connection is being used by pfSense to route traffic, or simply have access to the current value of some property that maps to each respective uplink. This can be the case if, for example, there is a desire to pause certain network-intense activities running on a client when a metered, data-capped or lower-bandwidth uplink (for example mobile broadband) is in use.

Unfortunately, this information is not readily exposed in any way I have been able to find. However, it also isn’t that difficult to get at.

This post is aimed mainly at simple primary/backup multi-homed configurations, not load-balancing configurations or primary/backup load-balanced configurations. Some adjusting may be required if your multi-homed pfSense configuration includes load-balancing.

On FreeBSD (on which pfSense is based), the way to print the routing table is netstat -r -n. Add either -4 or -6 to print only the IPv4 or IPv6 routing table, respectively; by default, it prints both.

The uplink that pfSense is using at any given time will typically be the one carrying the default IP route. When printing the routing table through netstat -r -n, the default route has the value default in its first field.

To view the full output through the web interface, use Diagnostics > Command Prompt > Execute Shell Command. Be very careful; a typo or errant whitespace can be critical!

pfSense also includes awk, which is quite handy for filtering table-like text output such as that produced by netstat. We are primarily interested in the “Netif” (network interface) column of the output, for the line where the “Destination” field (the first one) has the value default.

Log in to the administration interface. If you haven’t already installed the Cron package, do so first through System > Package Manager.

Once Cron is installed, go to Services > Cron > Settings, and add a new entry. The command to be executed should be something very similar to:

/usr/bin/netstat -rn4 | /usr/bin/awk '($1 == "default" && $4 == "mvnetaMM") { print "ONE" } ($1 == "default" && $4 == "mvnetaNN") { print "OTHER" }' >/usr/local/www/uplink.local.txt

This will write ONE to /usr/local/www/uplink.local.txt if the default route is through the interface mvnetaMM, and will write OTHER if the default route is through mvnetaNN. The directory /usr/local/www, in turn, is exposed to local clients as / by the built-in administration interface web server.
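To try out the awk logic before putting it into Cron, it can be wrapped in a shell function and fed some saved netstat output. This is just a sketch for experimenting on any machine with awk; the function name is made up, and the interface names are passed in as arguments instead of being hard-coded.

```shell
# Hypothetical helper: read `netstat -rn4` output on stdin and print ONE if
# the default route uses the first interface given, or OTHER if it uses the
# second one. Nothing is printed if neither matches.
classify_uplink() {
    primary_if="$1"
    backup_if="$2"
    awk -v primary="$primary_if" -v backup="$backup_if" '
        $1 == "default" && $4 == primary { print "ONE" }
        $1 == "default" && $4 == backup  { print "OTHER" }
    '
}

# Typical use on the firewall, with your own interface names:
#   /usr/bin/netstat -rn4 | classify_uplink mvnetaMM mvnetaNN
```

Keep in mind that if the default route is through neither interface, or there is no default route at all, the resulting file will be empty; clients should be prepared for that.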

You can add additional mappings (from physical interface name to an arbitrary value) on the same form if you have additional uplink interfaces. Look at Interfaces > Assignments in the administration web interface to see which physical interface name maps to which mnemonic name, and then from there decide what to expose if the default route is through that interface.

To avoid issues with quoting and encoding, I suggest only using US-ASCII alphanumeric characters in the awk print statements.

Do note that because Cron can only be configured to execute commands at a minute granularity, there will be a slight delay before a change in the default route is reflected in the file that is accessible from clients.

With the cron job in place, make the client request /uplink.local.txt from the firewall (no authentication required!) and take whatever action is desired based on its contents, or the change in its contents. For example, on Linux, you might do:

wget -q -O - --no-check-certificate https://pfsense.home.arpa/uplink.local.txt

or

curl -s --insecure https://pfsense.home.arpa/uplink.local.txt

The --no-check-certificate or --insecure respectively is needed if the respective tool does not trust the TLS certificate for the pfSense host. If your client trusts the certificate, it’s better to remove that part.
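What the client does with the value is up to you. As a sketch, assuming that ONE maps to the unmetered primary uplink and that anything else (including an empty or missing response) should be treated as a reason to hold back, a client-side script might be built around something like the following. The function name, and start_bulk_transfer in the usage comment, are made up.

```shell
# Hypothetical helper: succeed only if the reported uplink value is the one
# that is assumed to be the unmetered primary uplink.
uplink_allows_bulk_traffic() {
    case "$1" in
        ONE) return 0 ;;
        *)   return 1 ;;
    esac
}

# Typical use on a client:
#   state="$(curl -s --max-time 5 https://pfsense.home.arpa/uplink.local.txt)"
#   if uplink_allows_bulk_traffic "$state"; then start_bulk_transfer; fi
```

Treating unknown values as “do not proceed” means that a temporarily unreachable firewall or an empty file errs on the side of not using the metered connection.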

Why “How much should I feed my dog?” is the wrong question

I have lost count of how many times I’ve seen people ask some variation of “how much should I feed my dog?”, or “I think / someone said that my dog is overweight / too skinny, how much do you feed yours?”.

It’s generally a well-meaning question from an owner who wants to do the right thing, realizing that a dog who is either underweight or overweight is at much greater risk of a wide variety of ailments.

Usually, the question includes some information on the dog’s breed, age, gender and current weight. In the best of cases, it might even include some mention of the dog’s activity level, how much the dog is currently being fed, size (usually as height at the withers), and some pictures of the dog.

Unfortunately, even then, as asked, it’s also typically impossible to answer objectively in any manner that is actually useful to the individual dog owner.

Every dog is different. Even assuming that there are no underlying medical conditions that contribute to weight loss or obesity, a few of the things that are going to influence how much food a dog needs are:

  • The dog’s metabolism. Like humans, some dogs burn through more energy than others even given the exact same living conditions and activity levels, and this needs to be accounted for.
  • The qualities of the dog’s coat. Especially in the winter, whether or not the dog has a coat adapted for cold weather can make a huge difference in energy requirement, translating to a similar difference in dietary requirements.
  • The dog’s activity level. Unfortunately, a general label like “couch potato at home” or “very active” doesn’t help much at all, because different people have different ideas of what various activity levels mean. A sled dog enthusiast who regularly enters into competitions is going to have a rather different idea of what it means for a dog to be “moderately active”, compared to someone who spends a few hours each weekend walking in a forest.
  • The exact kind of food that the dog is being fed, or that one is considering feeding. This needs to go beyond specifying the manufacturer and, at an absolute minimum, specify the exact variety, because different foods have very different compositions and thus different energy content for the same volume, let alone weight.
  • The dog’s living conditions. A dog who spends a lot of time outdoors in extreme winter weather is going to have rather different needs compared to one who spends most of its time in a heated house, even if everything else about their lives is identical (which it usually won’t be).

And that’s just to begin with.

Consequently, such questions tend to get a huge range of answers; it’s not uncommon for them to differ by a factor of 2-3 or more, even for dogs of the same breed and similar weight.

That doesn’t help anyone, least of all the owner who just wants to make sure they are keeping their dog at a healthy weight!

At best, a person can look at the pictures and try to determine if the dog looks like it is overweight, underweight, or at an appropriate weight and, from there, suggest to either decrease, increase, or maintain its food intake, respectively, or to make the opposite adjustment to the dog’s activity level. But this, too, is fraught with issues; for example, a thick coat can easily obscure the dog’s condition, particularly in a situation where one can’t actually physically feel the dog.

Instead, learn how to tell whether the dog is at an appropriate weight. This varies slightly between breeds because of different body types, but it’s not that difficult, and the general principles transfer well between breeds. Certainly do consider asking your veterinarian, the dog’s breeder, or even just a local person who is knowledgeable about dogs for a second opinion, or even to show you how. If their opinion about your dog’s weight differs from yours, ask them to explain what they base theirs on. It’s not about the number displayed on the scale; it’s about how the dog is carrying that weight.

Once you know what your particular dog is supposed to look and feel like when it is at an appropriate weight, you can adjust the amount of food and/or the dog’s activity level continuously in order to maintain that appropriate weight; and you won’t need to rely on strangers on the Internet to do it.

Trust me. Long term, if it could, your dog would thank you for it.

Sound “clicks” on Debian 10, 11 Linux with ALSA and PulseAudio

Under some conditions, there can be repeated, clearly audible “clicks” in sound on at least Debian 10 and 11 (Buster and Bullseye) GNU/Linux, accompanied by momentary audio output device switches. Web searches indicate that other distributions (at least Debian derivatives) are affected as well; I have been able to locate cases where Ubuntu and Mint users have both been affected by this type of issue.

I haven’t dug very deeply into exactly why this happens, but it seems to be somehow related to ALSA port availability changes, which is kind of odd when it happens without any changes in what hardware is available.

The fix, however, is actually quite simple. Open /etc/pulse/default.pa in an editor running as root:

$ sudo nano /etc/pulse/default.pa

Locate the line

load-module module-switch-on-port-available

Prepend a # to comment it out:

#load-module module-switch-on-port-available
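If you prefer not to edit the file by hand, the same change can be made with sed. This is a sketch under the assumption that the line starts at the beginning of the line exactly as shown above; the function name is made up. It keeps a copy of the previous file with a .bak suffix, and lines that are already commented out are left alone.

```shell
# Hypothetical helper: comment out the module-switch-on-port-available line
# in the given PulseAudio configuration file (default: /etc/pulse/default.pa),
# keeping the previous version of the file as <file>.bak.
disable_switch_on_port_available() {
    conf="${1:-/etc/pulse/default.pa}"
    sed -i.bak 's/^load-module module-switch-on-port-available/#&/' "$conf"
}

# Typical use, as root:
#   disable_switch_on_port_available /etc/pulse/default.pa
```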

Save the file and exit the editor (in nano, press Ctrl+O and confirm to save, then Ctrl+X to exit), then under the user account suffering from this problem, stop the running PulseAudio daemon.

$ pulseaudio --kill

A new PulseAudio instance should start as soon as it is needed, reading the new configuration as it does so.

This should resolve the issue.

Turning off fwupdmgr and lvfs automatic updates on Debian 11/Bullseye

Debian Bullseye ships with the Linux Vendor Firmware Service (LVFS) fwupdmgr enabled by default.

There are many situations in which that’s a good thing; firmware is a central part of today’s hardware and software ecosystem, and you generally want to use the latest version available.

However, even though (supposedly; I haven’t yet been in a situation to actually experience this) actual updates need to be triggered manually, there are situations in which you want to reduce polling of external systems – especially when such polling could be used to deduce whether you have a particular piece of hardware or not.

Fortunately, it’s easy to disable the automatic checks in Debian. Simply enough:

sudo systemctl mask fwupd-refresh.timer

(For some reason, it is insufficient to simply disable the timer.)

You can still perform a manual check when appropriate by simply starting the unit that would normally be started by the timer:

sudo systemctl start fwupd-refresh.service

To see the result of the check, look at the unit log:

sudo journalctl --unit=fwupd-refresh.service
