Tag: Linux Page 1 of 2

Local IP port redirection using Linux nftables

It can occasionally be useful to expose a (TCP or UDP) port on a different port, without passing traffic on to a different host as is usually the case with port forwarding. In effect, this changes the destination port of incoming TCP or UDP traffic.

With Linux nftables, this is most easily done in a prerouting chain.

Starting with a typical example nftables configuration:

#!/usr/bin/nft -f
flush ruleset
table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;
        ct state invalid drop
        ct state established accept
        ct state related accept
        iifname "lo" accept
    }
    chain forward {
        type filter hook forward priority 0; policy drop;
    }
    chain output {
        type filter hook output priority 0; policy accept;
    }
}

Add a prerouting chain with the correct priority:

table inet filter {
    chain prerouting {
        type nat hook prerouting priority dstnat; policy accept;
    }
}

Add a rule to this new chain which results in a redirect action:

table inet filter {
    chain prerouting {
        type nat hook prerouting priority dstnat; policy accept;
        tcp dport 80 redirect to :22
    }
}

With the chain in place, to add the redirect rule programmatically, use something like:

# nft add rule inet filter prerouting tcp dport 80 redirect to :22

where inet filter maps to the table specification, and prerouting is the name of the chain to which to add the rule.
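The two steps (creating the chain, then adding the rule) can be wrapped in a small bash function, which also makes it easy to redirect UDP instead of TCP. This is a sketch rather than a tested tool: the function name add_port_redirect is hypothetical, it assumes the inet filter table from the example above already exists, and it must be run as root. It relies on nft add chain not failing when an identical chain already exists (unlike nft create chain).

```bash
#!/bin/bash
# Hypothetical helper: redirect incoming PROTO traffic for local port FROM
# to local port TO, using a prerouting NAT chain in the existing
# "inet filter" table. PROTO must be tcp or udp. Run as root.
add_port_redirect() {
  local proto=$1 from=$2 to=$3
  case $proto in
    tcp|udp) ;;
    *) echo "add_port_redirect: protocol must be tcp or udp" >&2; return 2 ;;
  esac
  # "add" (unlike "create") does not fail if an identical chain already exists.
  nft add chain inet filter prerouting \
    '{ type nat hook prerouting priority dstnat; policy accept; }' || return
  # Same as: nft add rule inet filter prerouting tcp dport 80 redirect to :22
  nft add rule inet filter prerouting "$proto" dport "$from" redirect to ":$to"
}
```

For example, add_port_redirect tcp 80 22 produces the same chain and rule as above.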

With this in place, any packets arriving at TCP port 80 will, in your subsequent input chain (and by the time they reach userspace), have a destination port of 22, thereby exposing your SSH server (listening on port 22) on the standard HTTP port (80) but subject to all normal tcp dport 22 conditions (and none of the tcp dport 80 ones) within the input chain.

In the words of the nft(8) man page:

The redirect statement is a special form of dnat which always translates the destination address to the local host’s one. It comes in handy if one only wants to alter the destination port of incoming traffic on different interfaces.

And yes, it works even with sysctl net.ipv4.ip_forward = 0.

Debian 12 with encrypted /boot, GRUB2 asking for root file system LUKS passphrase

After upgrading from Debian 11 to Debian 12 (which included upgrading GRUB 2 from 2.06-3~deb11u5 to 2.06-13) on a system with separately encrypted / and /boot, both using LUKS, GRUB began prompting for the LUKS passphrase to unlock the container holding the root file system, even though it had no need for it (and in fact booted perfectly fine if I just pressed Enter at that prompt).

The relevant part of the file system layout is:

  • GPT partitioning
    • partition 2
      • LUKS1 container
        • /boot
    • partition 3
      • LUKS2 container
        • /

This setup is based on the description of setting up encrypted /boot with GRUB 2 >=2.02~beta2-29 on Debian (also).

Repeated web searches did not bring up anything relevant, so armed with the LUKS container UUID (from cryptsetup luksDump) I started sleuthing through /boot/grub/grub.cfg to see where it referenced the LUKS container holding /. Surprisingly, I found it near the top, generated through /etc/grub.d/00_header, in a seemingly unrelated place: code intended to load fonts. This was somewhat unexpected because the second prompt actually appeared after a replacement font appeared to already have been loaded.
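The search itself can be reproduced with a couple of commands. Below is a minimal bash sketch, assuming cryptsetup is installed; the function name is hypothetical, and /dev/sdX3 in the example is a placeholder for the partition holding the root file system's LUKS container. In case grub.cfg spells the UUID without hyphens, it looks for that form too.

```bash
#!/bin/bash
# Hypothetical helper: print the lines (with line numbers) of a GRUB config
# file that mention a LUKS container UUID. The UUID is matched
# case-insensitively, both as given and with its hyphens removed.
# Usage: find_luks_uuid_in_grub_cfg UUID [CFG]
find_luks_uuid_in_grub_cfg() {
  local uuid=$1 cfg=${2:-/boot/grub/grub.cfg}
  grep -n -i -F -e "$uuid" -e "${uuid//-/}" -- "$cfg"
}
# Example (as root; /dev/sdX3 is a placeholder):
#   find_luks_uuid_in_grub_cfg "$(cryptsetup luksUUID /dev/sdX3)"
```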

Looking through /etc/grub.d/00_header and trying to match what I was seeing in grub.cfg against its generation logic, I found that the location of the container UUID within grub.cfg matched a prepare_grub_to_access_device call described in an immediately preceding comment as “Make the font accessible”.

That, in turn, was controlled by $GRUB_FONT.

With this newfound knowledge, I took a stab at /etc/default/grub and noted the commented-out GRUB_TERMINAL=console, described as “Uncomment to disable graphical terminal”.

Well, I’m fine with an 80×25 text menu and the BIOS font for GRUB, so I figured it was worth a try. After I created a new file /etc/default/grub.d/console-terminal.cfg setting that variable and ran update-grub, the generated /boot/grub/grub.cfg no longer referenced that LUKS container, and on rebooting, GRUB again only prompted me for the LUKS passphrase for /boot.
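As a script, the change amounts to writing a single line and regenerating the configuration. A minimal bash sketch, assuming Debian’s convention that grub-mkconfig (run by update-grub) sources drop-in files matching /etc/default/grub.d/*.cfg; the function name is hypothetical, and the directory parameter exists only so that the function can be tried out somewhere harmless.

```bash
#!/bin/bash
# Hypothetical helper: select GRUB's plain console terminal via a drop-in
# file. DIR defaults to Debian's /etc/default/grub.d (writing there needs root).
use_grub_console_terminal() {
  local dir=${1:-/etc/default/grub.d}
  mkdir -p -- "$dir" &&
    printf '%s\n' 'GRUB_TERMINAL=console' > "$dir/console-terminal.cfg"
}
# Usage (as root): use_grub_console_terminal && update-grub
```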

Success!

Extracting TLS certificate fingerprint on the Linux command line

It is sometimes helpful to get the details, not least the fingerprint, of a remote server’s TLS certificate from the command line.

Unfortunately, I’m not aware of any tool which is readily available on the typical Linux system to make this particularly easy.

Fortunately, one can be cobbled together using OpenSSL, which is rather universally available.

The first step is to get the certificate data itself in PEM format:

true | openssl s_client -connect www.example.com:443 -showcerts -no_ign_eof -certform PEM 2>/dev/null

The true | at the beginning simply provides an empty standard input to the OpenSSL connecting process, and I explicitly specify -no_ign_eof to make sure it will exit once there is no more data to be read from standard input (which in this case will be immediately). The 2>/dev/null silences the complete output from the certificate chain validation; you can use -verify_quiet instead, which in the absence of certificate chain problems has almost, but not quite, the same effect.

Note that since openssl s_client is a general-purpose debugging tool, the TCP port number must be specified. For HTTPS web sites, the typical port number is 443. If you are connecting by IP address, you can use -servername something.example.com to set the SNI server name in the TLS session.

Given the certificate data in PEM format from the above command, openssl x509 can be used to display information about the certificate:

openssl x509 -in filename.pem -noout -text -sha256 -fingerprint

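where filename.pem contains the output from the previous command. If -in is not specified, then certificate data is read from standard input. Even though -showcerts makes s_client print the whole certificate chain, openssl x509 only reads the first certificate, which is the server’s own.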

Useful variations are -text to print lots of technical details from the certificate, and -sha256 -fingerprint to print the SHA-256 fingerprint. Including both will cause both to be printed. If for some reason you need the insecure MD5 fingerprint, use -md5 instead of -sha256. Fingerprints are printed in colon-separated hexadecimal notation.

Putting all of this together, we get the following somewhat long command:

true | openssl s_client -connect www.example.com:443 -showcerts -no_ign_eof -certform PEM 2>/dev/null | openssl x509 -noout -sha256 -fingerprint

If you want to introduce something like torsocks to this, it should generally go with the openssl s_client command, as that is the part that is actually making the outbound network connection:

true | torsocks openssl s_client -connect www.example.com:443 -showcerts -no_ign_eof -certform PEM 2>/dev/null | openssl x509 -noout -sha256 -fingerprint

Both of these will, if successful, print the SHA-256 fingerprint of the TLS certificate received from the server. Currently, this results in this single line of output:

sha256 Fingerprint=5E:F2:F2:14:26:0A:B8:F5:8E:55:EE:A4:2E:4A:C0:4B:0F:17:18:07:D8:D1:18:5F:DD:D6:74:70:E9:AB:60:96

And there it is!
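For repeated use, the pipeline can be wrapped in a bash function, together with a small helper that reduces a fingerprint to a canonical form for comparison. This is a sketch: both function names are hypothetical, and tls_fingerprint simply reuses the OpenSSL options discussed above, adding -servername only when a third argument is given.

```bash
#!/bin/bash
# Hypothetical helper: print the SHA-256 fingerprint of the certificate that
# HOST presents on PORT (default 443). An optional third argument sets the
# SNI server name, which is useful when HOST is an IP address.
tls_fingerprint() {
  local host=$1 port=${2:-443}
  local -a sni=()
  if [ -n "${3:-}" ]; then
    sni=(-servername "$3")
  fi
  true | openssl s_client -connect "$host:$port" "${sni[@]}" \
      -showcerts -no_ign_eof -certform PEM 2>/dev/null |
    openssl x509 -noout -sha256 -fingerprint
}

# Reduce "sha256 Fingerprint=5E:F2:..." (or a bare colon-separated value) to
# lowercase hex without separators, so two fingerprints can be compared as
# plain strings regardless of how they were formatted.
normalize_fingerprint() {
  local fp=${1#*=}           # drop everything up to and including "="
  fp=${fp//:/}               # remove the colon separators
  printf '%s\n' "${fp,,}"    # lowercase (bash 4 or later)
}
```

For example, normalize_fingerprint "$(tls_fingerprint www.example.com)" prints the fingerprint as one lowercase hex string, ready to compare against a value recorded earlier.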

Disable clipboard sharing (clipboard integration) with QEMU/KVM and SPICE

For some reason that eludes me, sharing of the clipboard between the KVM host and the guest (sometimes referred to as clipboard integration) is on by default with QEMU/KVM and SPICE graphics.

To disable such clipboard sharing:

  1. Edit the guest definition XML, either with virsh edit $GUEST (locate domain/devices/graphics) or through virt-manager’s View > Details > Display Spice > XML
  2. Add or update an element <clipboard copypaste="no"/> under the <graphics> node
  3. Save/apply the change
  4. Fully shut down the VM (if running)

If starting the VM from the command line, another option is to try adding disable-copy-paste=on to the comma-separated suboptions of -spice on the qemu-system-* command line. (See here.)

Clipboard sharing between the host and the guest will be disabled the next time the VM is powered on.
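To confirm that the change was saved, the domain XML can be checked for the new element. A minimal bash sketch with a hypothetical function name; while the VM is still running, virsh dumpxml --inactive shows the saved definition rather than the live one.

```bash
#!/bin/bash
# Hypothetical helper: succeed if the libvirt domain XML read from standard
# input contains <clipboard copypaste="no"/>. libvirt writes attributes with
# single quotes, so either quote style is accepted. This is a plain text
# match; it does not check that the element sits inside the SPICE
# <graphics> element.
spice_clipboard_disabled() {
  grep -Eq "<clipboard[[:space:]]+copypaste=[\"']no[\"']"
}
# Usage: virsh dumpxml --inactive "$GUEST" | spice_clipboard_disabled && echo "clipboard sharing disabled"
```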

Linux gocryptfs on rclone mount giving “operation not permitted” errors

While trying to run a gocryptfs encrypted file system inside an rclone mount (which itself accessed the remote host over SFTP), I ran into an annoying issue: writes would fail with an “Operation not permitted” error, but the file in question would still appear within the gocryptfs file system (so clearly something was working).

Sleuthing through the output of running rclone mount with -vv but without --daemon to try to find clues as to what was actually going on, I came across these two lines:

yyyy/mm/dd hh:mm:ss ERROR : <encrypted filename>: WriteFileHandle: ReadAt: Can't read and write to file without --vfs-cache-mode >= minimal

yyyy/mm/dd hh:mm:ss DEBUG : &{<encrypted filename> (w)}: >Read: read=0, err=operation not permitted

Well, there’s the error being returned, and the cause.

Per the documentation, rclone mount --vfs-cache-mode takes one of off, minimal, writes, full; and the default, lo and behold, is off, which is clearly less than minimal.

Adding a --vfs-cache-mode minimal to the end of the rclone mount command seems to have fixed it, insofar as the error is gone and writes appear to go through fully as intended.
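Putting the two mounts together, the fix amounts to one extra option on the rclone side. A sketch in bash; the function name and all arguments are placeholders, it assumes the gocryptfs file system was already created (with gocryptfs -init) inside the remote, and it relies on rclone mount --daemon returning only once the mount is ready, which it should on Linux (waiting up to --daemon-wait).

```bash
#!/bin/bash
# Hypothetical helper: mount an rclone REMOTE (e.g. "mysftp:data") at
# RCLONE_DIR with the VFS cache mode gocryptfs needs, then mount the
# gocryptfs cipher directory CIPHER_SUBDIR inside it at PLAIN_DIR.
mount_gocryptfs_over_rclone() {
  local remote=$1 rclone_dir=$2 cipher_subdir=$3 plain_dir=$4
  # Without --vfs-cache-mode minimal (or higher), gocryptfs's read-while-
  # writing access fails with "operation not permitted".
  rclone mount "$remote" "$rclone_dir" --daemon --vfs-cache-mode minimal || return
  gocryptfs "$rclone_dir/$cipher_subdir" "$plain_dir"
}
```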

Using per-account SSH key files with OpenSSH

OpenSSH is an SSH server and client that, in its current incarnation, originates on OpenBSD but is used on many Unix-like operating systems; it is a common choice of both SSH server and client on many Linux systems.

Unfortunately, it (and perhaps other SSH clients as well) in the default configuration and typical use has a somewhat nasty information leak when used with key pair authentication.

This is because of the interaction between three things.

First, an SSH client will, during authentication, offer a series of keys to the server, effectively asking for each “will you let me authenticate as this user using this key?”.

Second, the initial exchange that offers each key in turn contains enough information that the key can, effectively, be uniquely identified. It has to, for the server to be able to give a meaningful answer.

Third, the OpenSSH SSH client will, by default, try every key that it knows about to find one that the server is willing to accept for a connection attempt.

All of this would already be bad enough from a potential information leak perspective, but in isolation, it still largely only allows a rogue server to learn what keys exist on the connecting system and user account while the user is actively connecting to it, but nothing more about them or what other context those keys exist within. Not great, but not horrible.

However, additional information exists. For example, as noted by Andrew Ayer, GitHub actually publishes each user’s authorized SSH keys. This in itself isn’t a huge problem either; only the public keys are published, so as long as the keys are secure enough, there’s no real risk of compromise of a person’s GitHub access.

Put all of this together, though, and it becomes quite possible for an SSH server to derive the GitHub username of a connecting user, if that person uses OpenSSH with its defaults.

All of a sudden, an SSH server can potentially deanonymize a connecting person by associating them, with a rather high degree of certainty, with a GitHub user account.

Similarly, if multiple services publish keys in this manner, it’s fairly easy to collate them together and look for matches. If the same key is authorized for more than one account, or for accounts with more than one service, there is a rather high probability that those accounts belong to the same person, even if there is nothing else to suggest this.

A necessary first step to protect against this information leak is to use different key pairs for each such service. ssh-keygen has -f to specify the base file name to which to save the newly generated key pair; ssh has -i to specify the identity file to use; and ssh_config (usually ~/.ssh/config and /etc/ssh/ssh_config) has the IdentityFile directive. However, this doesn’t necessarily prevent the SSH client from also offering other keys it knows about (for example, those loaded into ssh-agent) during public key authentication.

To prevent the latter, use the IdentitiesOnly yes directive in ssh_config. This causes the SSH client to only present any explicitly configured identities during public key authentication, protecting against the server you are connecting to learning more about what keys you have on the system you are connecting from than you intended.

Unfortunately, setting these on a per-host basis in the SSH client configuration quickly gets tiresome if you have multiple accounts, and is error-prone.

Thankfully, OpenSSH offers macro expansion in the IdentityFile value based on information about, among other things, your local user account and the connection you are making. (See the ssh_config(5) manual page for a full list and description of the macro expansion tokens.) This is especially useful in conjunction with wildcard Host stanzas to provide a set of defaults.

Putting all this together you can, for example, put at the bottom of your ssh_config something like

Host *
  IdentitiesOnly yes
  IdentityFile %d/.ssh/keys/%h/%r/current
  PasswordAuthentication yes
  PubkeyAuthentication yes
  PreferredAuthentications publickey,password
  User nobody

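and together with it, a Host stanza to simply set the correct username. The more specific stanza must go above the Host * one: ssh_config uses the first value it obtains for each parameter, so a stanza placed after Host * would never get to override User nobody.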

Host ssh.example.com
  User myself

With this in place, when you connect to ssh.example.com, OpenSSH will offer only the key pair in ~/.ssh/keys/ssh.example.com/myself/current for authentication. (%d expands to the path to your local home directory; %h expands to the name of the host you are connecting to; and %r expands to the remote username.)
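To see which key file a given connection will use, or which accounts already have keys, the IdentityFile pattern can be mirrored in a couple of bash functions. This is a sketch with hypothetical function names; it assumes %d is the same as $HOME and that %h is simply the host name as typed. Per ssh_config(5), %h reflects any HostName directive while %n is the host name as originally given, so the two can differ if you use host aliases.

```bash
#!/bin/bash
# Hypothetical helpers mirroring "IdentityFile %d/.ssh/keys/%h/%r/current".
# Assumes %d equals $HOME and %h is the host name exactly as used on the
# command line (no HostName rewriting).

# Print the key file that would be offered for HOST and remote USER.
ssh_account_key() {
  local host=$1 user=$2
  printf '%s/.ssh/keys/%s/%s/current\n' "$HOME" "$host" "$user"
}

# List every remote account (as user@host) that has a "current" key.
list_account_keys() {
  local key rel
  for key in "$HOME"/.ssh/keys/*/*/current; do
    [ -e "$key" ] || continue          # the glob matched nothing
    rel=${key#"$HOME"/.ssh/keys/}      # host/user/current
    rel=${rel%/current}                # host/user
    printf '%s@%s\n' "${rel#*/}" "${rel%%/*}"
  done
}
```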

To then add keys for a new account, use something like

$ mkdir -p ~/.ssh/keys/sftp.example.net/u1234567
$ ssh-keygen -f ~/.ssh/keys/sftp.example.net/u1234567/current

and either specify the username when connecting (for example, sftp u1234567@sftp.example.net ...), or add another Host stanza to your ssh_config specifying the username

Host sftp.example.net
  User u1234567

If you don’t do either, the OpenSSH client will try to read the key pair from ~/.ssh/keys/sftp.example.net/nobody/current (because of the Host * stanza’s User nobody), find nothing at that file location, and not offer any key pair at all for authentication to the server. In the example case above, it will then fall back to password authentication. Since nobody likely doesn’t have a valid password, this effectively blocks the login attempt in a non-destructive manner while leaking minimal information either over the network or to the remote server.

Setting these as defaults in your ssh_config also fits neatly into many tools’ SSH integration, where it can be tricky to pass additional parameters, especially parameters that depend on, for example, which host you are connecting to.

With this in place, you can still use the same key pair for more than one account by keeping the actual key pair files in some location and symlinking to them from each location that the IdentityFile directive expands to. The difference is that using the same key pair for multiple accounts becomes an active choice, rather than what happens passively by default unless you take special care to use separate key pairs for each account.
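The symlink arrangement might look like this in bash. A sketch under stated assumptions: the function name is hypothetical, the shared key pair is assumed to live at an absolute path of your choosing with the usual .pub companion file, and the function deliberately refuses to replace an existing key.

```bash
#!/bin/bash
# Hypothetical helper: reuse an existing key pair for another account by
# symlinking it into that account's per-account key location.
# Usage: share_account_key /abs/path/to/private_key HOST USER
# The private key path should be absolute, since the symlinks store it as-is.
share_account_key() {
  local src=$1 host=$2 user=$3
  local dir="$HOME/.ssh/keys/$host/$user"
  if [ ! -e "$src" ] || [ ! -e "$src.pub" ]; then
    echo "share_account_key: $src or $src.pub is missing" >&2
    return 1
  fi
  if [ -e "$dir/current" ] || [ -L "$dir/current" ]; then
    echo "share_account_key: $dir/current already exists" >&2
    return 1
  fi
  # The account directory itself is created with mode 700.
  mkdir -p -m 700 -- "$dir" &&
    ln -s -- "$src" "$dir/current" &&
    ln -s -- "$src.pub" "$dir/current.pub"
}
```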

It also becomes much easier to rotate a key pair if you ever have reason to, because with this in place, you don’t need to stop to consider where it’s used; where it’s stored locally tells you the one remote account for which it’s being used.

Someone could still look at your authorized SSH keys on GitHub, but now it’s very little more than an anonymous blob of encoded public key data that can’t be matched against any other keys that they might encounter.

Preventing DNS leaks with Linux NetworkManager VPN connections

A good software design principle is that of least surprise. Software should do what one can reasonably expect it to do in response to user actions and any configuration that has been made.

Another good design principle is to fail safely (or securely). If for some reason a program cannot perform a requested action, it should put itself, or the system, into a known-safe state. That state probably won’t be that which the user was seeking to achieve, but it should be one that does not cause the user to unexpectedly do anything dangerous or which might endanger the system further.

When used with VPN connections (both OpenVPN and WireGuard), the Linux NetworkManager tool unfortunately comes up horribly short in both areas.

In short: absent special configuration, DNS queries (and responses) can quite easily leak outside of the VPN tunnel to the DNS resolver provided by whatever network you are on. These DNS queries can allow whoever operates that DNS resolver to see the host names you are connecting to.

If you are on a trusted network, such as at home, while surprising, this is probably not a major issue.

If you are on an untrusted network, such as a public network at a café, an airport, or a hotel, where you might want to use a general-purpose VPN for traffic confidentiality, this can be a much larger issue.

By default, when connecting to a VPN, NetworkManager will combine the DNS servers provided by the VPN connection (either through fixed configuration or obtained dynamically) with whatever resolver configuration existed previously, likely obtained when connecting to the network you are already on and connecting to the VPN through.

Consequently, if an attacker can disrupt the VPN traffic at the right moment, they can cause DNS requests to be made to the lower-priority DNS resolver: that on the local network, outside the VPN.

Additionally, if you mistype a host name, then it is possible that the search suffix can lead to a slight loss of anonymity against the operator of the DNS resolver that happens to be used.

Worse, there is no way to configure this behavior through the NetworkManager GUI.

Fortunately, it’s easy to configure through nmcli. Open a terminal window, and check the current settings:

$ nmcli con show "vpn connection name" | grep 'ipv.\.dns-priority'

This will most likely show two lines of output, similar to:

ipv4.dns-priority:     50
ipv6.dns-priority:     50

(The exact value for both of these can vary.)

The actual encoding of the priority value is a little peculiar. In particular:

  • Lower values are considered higher priority
  • Negative values exclude configurations with higher values
  • DNS configurations obtained through networks with the same priority value are combined

The value itself is a signed 32-bit integer, so the valid range is -2147483647 through +2147483647.

To force only the DNS servers obtained through this particular connection to be used when this connection is active, use nmcli to modify the connection to set both of these to the largest negative value possible: -2147483647. This ensures that no more negative priority value can exist on a different connection, causing the DNS configuration for this particular connection to always have priority.

$ nmcli con modify "vpn connection name" ipv4.dns-priority -2147483647
$ nmcli con modify "vpn connection name" ipv6.dns-priority -2147483647
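Both commands can be combined into one small bash function, which makes it easy to apply the same setting to several VPN connections. A sketch with a hypothetical function name; the connection name is whatever nmcli con show lists, and the default priority is the value used above.

```bash
#!/bin/bash
# Hypothetical helper: set ipv4.dns-priority and ipv6.dns-priority on a
# NetworkManager connection. PRIORITY defaults to -2147483647.
# Usage: set_vpn_dns_priority "vpn connection name" [PRIORITY]
set_vpn_dns_priority() {
  local conn=$1 prio=${2:--2147483647} family
  for family in ipv4 ipv6; do
    nmcli con modify "$conn" "$family.dns-priority" "$prio" || return
  done
}
```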

For OpenVPN connections, after modifying the connection in this manner, you will need to provide the VPN user’s password (not your local login password) when connecting to it the next time.

Note that if you have multiple connections with the same value for the respective dns-priority properties, and connect to those networks simultaneously, the configurations are combined. Therefore, you do not want to set this on any potentially untrusted network that you might be connected to at the same time as the VPN connection.

Having made the configuration change, connect and disconnect the VPN connection repeatedly and observe the effect on the system name resolver configuration in /etc/resolv.conf:

$ watch cat /etc/resolv.conf

If everything is working as intended, you will see the set of search and nameserver directives being replaced as you connect and disconnect the VPN connection, instead of amended.
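Rather than watching the whole file, it can be easier to snapshot just the search and nameserver lines and compare them. A minimal bash sketch with a hypothetical function name. On systems where systemd-resolved manages DNS, /etc/resolv.conf may only list the local stub resolver 127.0.0.53; there, resolvectl status shows the per-interface DNS servers instead.

```bash
#!/bin/bash
# Hypothetical helper: print only the search and nameserver lines of a
# resolv.conf-style FILE (default /etc/resolv.conf).
resolver_lines() {
  grep -E '^[[:space:]]*(search|nameserver)[[:space:]]' -- "${1:-/etc/resolv.conf}"
}
# Example:
#   resolver_lines > /tmp/resolv.before
#   nmcli con up "vpn connection name"
#   diff /tmp/resolv.before <(resolver_lines)
```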

Server-side port knocking with Linux nftables

Port knocking is a technique to selectively allow for connections by sending a semi-secret sequence of packets, often called a “knocking” sequence.

While port knocking can very easily cut down on the amount of noise seen in logs, it’s important to keep in mind that it does not provide any significant level of security against a well-positioned adversary, as the knocking is done in the clear. The service that is hidden behind the port knocking still needs to be able to deal with being accessible from the outside network.

It used to be fairly complex on Linux to implement port knocking without relying on dedicated software to listen for the knocking packets and modify the firewall rules accordingly (which required the listening software to run as root, something usually best avoided if at all possible). Thankfully, with nftables, it’s relatively straightforward to implement port knocking without ever leaving the firewall configuration, by using nftables’ set support.

The idea is to maintain two lists (sets) for a particular service: one of currently knocking clients, and one of clients that have successfully completed the knocking sequence.

In nftables syntax, it boils down to something similar to:

table inet filter {
  define ssh_knock_1 = 10000
  define ssh_knock_2 = 2345
  define ssh_knock_3 = 3456
  define ssh_knock_4 = 1234
  define ssh_service_port = 22

  set ssh_progress_ipv4 {
    type ipv4_addr . inet_service;
    flags timeout;
  }
  set ssh_clients_ipv4 {
    type ipv4_addr;
    flags timeout;
  }

  chain input {
    type filter hook input priority 0; policy drop;

    # ... other rules as needed ... #

    tcp dport $ssh_knock_1 update @ssh_progress_ipv4 { ip saddr . $ssh_knock_2 timeout 5s } drop
    tcp dport $ssh_knock_2 ip saddr . tcp dport @ssh_progress_ipv4 update @ssh_progress_ipv4 { ip saddr . $ssh_knock_3 timeout 5s } drop
    tcp dport $ssh_knock_3 ip saddr . tcp dport @ssh_progress_ipv4 update @ssh_progress_ipv4 { ip saddr . $ssh_knock_4 timeout 5s } drop
    tcp dport $ssh_knock_4 ip saddr . tcp dport @ssh_progress_ipv4 update @ssh_clients_ipv4 { ip saddr timeout 10s } drop

    ip saddr @ssh_clients_ipv4 tcp dport $ssh_service_port ct state new accept

    # ... other rules as needed ... #
  }
}

Each time a knock (a TCP connection attempt to one of the knock ports) is received, the rules:

  • check that this particular knock is in @ssh_progress_ipv4 (with the exception of the first knock in the sequence)
  • for all but the last knock in the sequence, store the next expected knock in @ssh_progress_ipv4, and drop the knock packet (so to an outside observer, it looks no different from any other port)
  • for the last knock in the sequence, store the connecting IP address in @ssh_clients_ipv4, and drop the knock packet
  • when the actual connection attempt arrives, accept the connection only if the connecting IP address exists in @ssh_clients_ipv4

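The knocking status is stored with a brief timeout (in the example above: 5 seconds during knocking, and 10 seconds on successful completion), ensuring that any lingering knockers are evicted promptly from the status sets. The final timeout only limits how long a client has to open a new connection; once established, the connection keeps working through the usual ct state established accept rule (assuming one is among the other rules, as in a typical base ruleset).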

The above example is for a four-port knocking sequence, but it could easily be made shorter or longer. The only special state transition rules are the very first one and the last one before the final decision rule.

To also support port knocking and connections over IPv6, duplicate the two state sets (but use ipv6_addr for those instead of ipv4_addr), and duplicate the respective state transition and final decision stanzas (but use ip6 instead of ip).
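Because the rules follow such a regular pattern, they can be generated for a sequence of any length. The following bash function is a sketch, not part of nftables: its name is hypothetical, it prints rules rather than loading them, it hard-codes the 5 s and 10 s timeouts and the ssh_progress_*/ssh_clients_* set names from the example, and it assumes the sets are declared separately (with ipv6_addr types for the IPv6 variant).

```bash
#!/bin/bash
# Hypothetical generator: print nftables port-knocking rules.
# Usage: knock_rules ip|ip6 SET_SUFFIX SERVICE_PORT KNOCK_PORT...
knock_rules() {
  if [ $# -lt 4 ]; then
    echo "usage: knock_rules ip|ip6 SET_SUFFIX SERVICE_PORT KNOCK_PORT..." >&2
    return 2
  fi
  local family=$1 suffix=$2 service=$3
  shift 3
  local progress="@ssh_progress_$suffix" clients="@ssh_clients_$suffix"
  local -a knocks=("$@")
  local n=${#knocks[@]} i match
  for ((i = 0; i < n; i++)); do
    # Every knock except the first must match the (address, port) pair
    # that the previous knock stored in the progress set.
    match=""
    if ((i > 0)); then
      match=" $family saddr . tcp dport $progress"
    fi
    if ((i < n - 1)); then
      # Intermediate knock: remember which port must be knocked next.
      printf 'tcp dport %s%s update %s { %s saddr . %s timeout 5s } drop\n' \
        "${knocks[i]}" "$match" "$progress" "$family" "${knocks[i + 1]}"
    else
      # Final knock: admit the source address to the clients set.
      printf 'tcp dport %s%s update %s { %s saddr timeout 10s } drop\n' \
        "${knocks[i]}" "$match" "$clients" "$family"
    fi
  done
  # Decision rule: accept new connections to the service from admitted clients.
  printf '%s saddr %s tcp dport %s ct state new accept\n' "$family" "$clients" "$service"
}
```

For instance, knock_rules ip ipv4 22 10000 2345 3456 1234 prints the four knock rules and the decision rule from the ruleset above with the defines substituted, and knock_rules ip6 ipv6 22 10000 2345 3456 1234 prints their IPv6 counterparts.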

Client-side TCP port knocking in PowerShell, *nix

Port knocking is a technique to selectively allow for connections by sending a semi-secret sequence of packets, often called a “knocking” sequence.

On a *nix system, nc (netcat) is useful for port knocking. Individual knocks can be sent by nc -w 1 -z host port; this will send a TCP connection attempt to the specified host and port, with a timeout of 1 second, and without sending any data.

To use nc to send a TCP knock sequence of ports 10000, 2345, 3456, 1234 to 192.0.2.234, you might do something like

nc -w 1 -z 192.0.2.234 10000
nc -w 1 -z 192.0.2.234 2345
nc -w 1 -z 192.0.2.234 3456
nc -w 1 -z 192.0.2.234 1234
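On systems without nc, bash itself can send the knocks through its /dev/tcp pseudo-device. A minimal sketch, assuming bash was built with /dev/tcp support (as it usually is on Linux distributions) and that the timeout command from GNU coreutils is available; the function name knock is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: send one TCP connection attempt ("knock") to each
# PORT on HOST, in order, using bash's /dev/tcp redirection instead of nc.
# A knock that the firewall drops never completes, so each attempt is
# abandoned after 1 second.
# Usage: knock HOST PORT...
knock() {
  local host=$1 port
  shift
  for port in "$@"; do
    timeout 1 bash -c 'exec 3<>"/dev/tcp/$1/$2"' knock "$host" "$port" 2>/dev/null
  done
  return 0
}
# Example: knock 192.0.2.234 10000 2345 3456 1234
```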

Doing the same thing in Microsoft’s PowerShell is rather more verbose:

Start-Job -ScriptBlock {Test-NetConnection -ComputerName "192.0.2.234" -Port 10000 -InformationLevel Quiet} | Wait-Job -Timeout 1

Start-Job -ScriptBlock {Test-NetConnection -ComputerName "192.0.2.234" -Port 2345 -InformationLevel Quiet} | Wait-Job -Timeout 1

Start-Job -ScriptBlock {Test-NetConnection -ComputerName "192.0.2.234" -Port 3456 -InformationLevel Quiet} | Wait-Job -Timeout 1

Start-Job -ScriptBlock {Test-NetConnection -ComputerName "192.0.2.234" -Port 1234 -InformationLevel Quiet} | Wait-Job -Timeout 1

A simple bash shell script to perform port knocking and then connect and hand a connected pipe to the calling process might look something like:

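#!/bin/bash
# Usage: portknock-connect HOST KNOCK_PORT... FINAL_PORT
# Knocks on each KNOCK_PORT in turn, then connects to FINAL_PORT and hands
# the connection to the caller; a FINAL_PORT of 0 means "knock only".
host=$1
port=$2
nport=$3
while test -n "$nport"
do
  nc -w 1 -z "$host" "$port"   # one knock: connection attempt, no data
  shift
  port=$2
  nport=$3
done
test "$port" != "0" && exec nc "$host" "$port"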

The above script takes the host name or IP address of the remote host as the first parameter, followed by a series of TCP port numbers; the last port number is the final connection port. This can be used for example with OpenSSH’s ProxyCommand directive:

$ cat .ssh/config
Host 192.0.2.234
  ProxyCommand ~/.local/bin/portknock-connect %h 10000 2345 3456 1234 %p
$

Linux KVM + host nftables + guest networking

The difficulties of getting the combination of Linux KVM, host-side modern nftables packet filtering, and guest-side networking to work together without resorting to firewalld on the host are fairly well documented; for example, here. The recommended solution usually involves going back to iptables on the host, and sometimes defining libvirt-specific nwfilter rules. While that might be tolerable for dedicated virtualization hosts, it’s less than ideal for systems that also see other uses, especially uses where nftables’ expressive power and relative ease of use are desired.

Fortunately, it can be worked around without giving up on nftables.

I’m assuming that you have already set up a typical basic nftables client-style ruleset on the host, something along the lines of:

#!/usr/bin/nft -f
flush ruleset
table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;
        ct state invalid drop
        ct state established accept
        ct state related accept
        iifname "lo" accept
    }
    chain forward {
        type filter hook forward priority 0; policy drop;
    }
    chain output {
        type filter hook output priority 0; policy accept;
    }
}

Start out by setting the KVM network to start automatically on boot. The network startup will also cause libvirt to create some NAT post-routing rules through iptables, which through the magic of conversion tools get transformed into a corresponding nftables table ip nat. This might cause an error to be displayed initially, but that’s OK for now. Reboot the host, run virsh net-list --all to check that the network is active, and run nft list table ip nat to make sure that the table and chains were created. It should all look something like:

$ sudo virsh net-list --all
 Name      State    Autostart   Persistent
--------------------------------------------
 default   active   yes         yes

$ sudo nft list table ip nat
table ip nat {
    chain LIBVIRT_PRT {
        ... a few moderately complex masquerading rules ...
    }
    chain POSTROUTING {
        type nat hook postrouting priority srcnat; policy accept;
        counter packets 0 bytes 0 jump LIBVIRT_PRT
    }
}
$
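The autostart step and the check can also be scripted. A minimal bash sketch: virsh net-autostart is the standard virsh subcommand for enabling autostart, while the function name is hypothetical and simply looks for the expected columns in the virsh net-list --all output shown above.

```bash
#!/bin/bash
# Hypothetical helper: succeed if `virsh net-list --all` output on standard
# input shows network NAME (default: "default") as active with autostart on.
libvirt_net_ready() {
  awk -v net="${1:-default}" '
    $1 == net && $2 == "active" && $3 == "yes" { found = 1 }
    END { exit !found }'
}
# Usage (as root):
#   virsh net-autostart default
#   virsh net-list --all | libvirt_net_ready default && echo "network ready"
```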

Letting libvirt’s magic and the iptables-to-nftables conversion tools handle the insertion of the NAT rules makes it less likely that issues will develop later on, for example because newer versions need different rules. An alternative approach, which currently works for me but might not work for you or in the future, is to manually create a postrouting chain; the nftables magic incantation can be reduced to something similar to:

table ip nat {
    chain postrouting {
        type nat hook postrouting priority 100; policy accept;
        ip saddr 192.168.122.0/24 masquerade
    }
}

(In the above snippet, 192.168.122.0/24 maps to the details from the <ip> node in the output of virsh net-dumpxml <name> for each network listed by virsh net-list earlier.)
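Instead of copying the subnet by hand, the masquerade rule can be derived from the virsh net-dumpxml output. A bash sketch with a hypothetical function name; it assumes the network’s IPv4 ip element carries address= and netmask= attributes in that order, as libvirt’s default network does, and does not handle networks defined with prefix= instead.

```bash
#!/bin/bash
# Hypothetical helper: read `virsh net-dumpxml NAME` output on standard input
# and print the nftables masquerade rule for the network's IPv4 subnet.
libvirt_masquerade_rule() {
  local line addr mask
  line=$(sed -n "s/.*<ip address=['\"]\([0-9.]*\)['\"] netmask=['\"]\([0-9.]*\)['\"].*/\1 \2/p" | head -n 1)
  read -r addr mask <<< "$line"
  [ -n "$addr" ] && [ -n "$mask" ] || return 1

  local -a a m
  IFS=. read -r -a a <<< "$addr"
  IFS=. read -r -a m <<< "$mask"

  # Prefix length = number of set bits in the netmask.
  local i octet prefix=0
  for i in 0 1 2 3; do
    octet=${m[i]}
    while [ "$octet" -gt 0 ]; do
      prefix=$((prefix + (octet & 1)))
      octet=$((octet >> 1))
    done
  done
  # Network address = host address AND netmask, octet by octet.
  printf 'ip saddr %d.%d.%d.%d/%d masquerade\n' \
    $((a[0] & m[0])) $((a[1] & m[1])) $((a[2] & m[2])) $((a[3] & m[3])) "$prefix"
}
# Usage: virsh net-dumpxml default | libvirt_masquerade_rule
```

For the usual default network (192.168.122.1 with netmask 255.255.255.0), this prints the same rule as in the snippet above.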

You do, however, need to add some rules to the table inet filter to allow incoming and forwarded packets to pass through to and from the physical network interface (eth0 here; substitute as appropriate, ip addr sh will tell you the interface name):

table inet filter {
    chain input {
        # ... add at some appropriate location ...
        iifname "virbr0" accept
    }
    chain forward {
        # ... add at some appropriate location ...
        iifname "virbr0" oifname "eth0" accept
        iifname "eth0" oifname "virbr0" accept
    }
}

The forward chain rules probably aren’t necessary if your forward chain has the default accept policy, but it’s generally better to have a drop or reject policy and only allow the traffic that is actually needed.

The finishing touch is to make sure that sysctl net.ipv4.ip_forward = 1 on the host; without it, IPv4 forwarding won’t work at all.
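Setting this with sysctl directly only lasts until the next reboot. To make it persistent, the setting can go into a drop-in file under /etc/sysctl.d/, which most current distributions read at boot. A minimal bash sketch; the function name and the file name 90-ipv4-forward.conf are arbitrary choices, and the directory parameter is there only so the function can be tried out without root.

```bash
#!/bin/bash
# Hypothetical helper: persist net.ipv4.ip_forward = 1 in a sysctl drop-in
# file. DIR defaults to /etc/sysctl.d (writing there needs root).
write_ipv4_forward_conf() {
  local dir=${1:-/etc/sysctl.d}
  mkdir -p -- "$dir" &&
    printf '%s\n' 'net.ipv4.ip_forward = 1' > "$dir/90-ipv4-forward.conf"
}
# Usage (as root): write_ipv4_forward_conf && sysctl --system
```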

Unfortunately, KVM still tries to use iptables to create a NAT table when its network is started, and this can’t be done when an nftables NAT table already exists. A manually configured table ip nat therefore needs to go into a separate nftables script that is loaded after the KVM network has started, replacing the automatically generated chain. Most distributions, however, are set up to load the normal nftables ruleset quite early during the boot process, likely and hopefully before basic networking is even fully up and running (to close the window of opportunity for traffic to sneak through).

The easiest way to deal with this is very likely to just let the iptables compatibility tools handle it for you when the KVM network is started, and accept the need for a reboot during the early KVM configuration process. The most likely scenario in which this simple approach won’t work seems to be if you are already using nftables for other IP forwarding magic as well. In that case, you may need to resort to a split nftables configuration, loading the post-routing NAT ruleset late during the boot process, perhaps through /etc/rc.local (which is typically executed very late during boot). If so, it’s probably worth the trouble to rewrite one or the other in terms of nft add commands instead of a full-on, atomic nft -f script.

With all this in place, KVM guests should now be able to access the outside world over IPv4, NATed through the host, including after a reboot of the host.

A huge tip of the proverbial hat to user regox on the Gentoo forums, who posted what I was able to transform into most of the above.

