DDoS protection improvements for GrapheneOS servers

GrapheneOS

Twitter / X: https://twitter.com/GrapheneOS/status/1780240351225118813
Mastodon: https://grapheneos.social/@GrapheneOS/112281501381558085
Bluesky: https://bsky.app/profile/grapheneos.org/post/3kqb2ag7up52e

Due to frequent DDoS attacks, we're enforcing stricter limits on the number of connections to our servers. By default, each server enforces a limit of 16 or 32 TCP connections from each IPv4 address and IPv6 /64 block. During persistent attacks, these limits will be adjusted.

We've determined these limits are high enough to avoid causing issues due to CGNAT. Browsers open a single TCP connection to each domain or server due to HTTP/2 multiplexing. Our focus is tuning it to avoid it triggering for our network/update services (https://grapheneos.org/faq#default-connections).

The naive approach to enforcing TCP connection limits starts with the initial SYN packet. An attacker can leverage this to their advantage with a spoofed SYN packet flood to fill the connection limit tracking tables to bypass them or block all new connections if you fail closed.

Tracking all connections with conntrack is enough to open up a new denial of service attack vector since the conntrack table can be filled by an attacker. For this reason, we were previously making all inbound connections untracked and are still doing that for both UDP and ICMP.

To prevent conntrack table exhaustion, we're using synproxy for SYN packets above a rate limit of 1024/second with 128 burst.

To prevent abusing connections limits or filling the sets enforcing them, we're only counting successfully established connections towards the limits.

Both the official documentation for netfilter (iptables/nftables) on connection limits and every guide we've found are vulnerable to all 3 of the attacks described above. There's info on using synproxy, but not combining it with connection limits or rate limiting it kicking in.

Our firewall configuration is published at https://github.com/GrapheneOS/infrastructure/tree/7782c861cb560c91813ef6d85374830c3526f61a/nftables and provides a reference on how to do this.

There are 4 cases for the connection limits to handle both the non-synproxy and synproxy cases for both SYN packets and the first ACK for newly established connections.
GrapheneOS

Newly established connections (valid first ACK) without synproxy are added to connection limit sets or rejected if above the limit. The connection is marked to bypass the checks going forward. For synproxy, this has to be done for the spoofed SYN packets it sends via loopback.

For web services with HTTP/2 enabled, we're still enforcing a connection limit at the nginx layer since each concurrent HTTP/2 request over the same TCP connection is considered a connection. For other services, we've removed obsolete application layer per-IP connection limits.

Our new approach is superior because it enforces the limits at the firewall layer without needing applications to process and reject the connections. The reason we didn't previously enforce the limits at the firewall layer is because the typical approach opens up new weaknesses.

Implementing connection limits with nftables required coming up with a good approach to avoid spoofed SYN packets counting towards the limits or bypassing the limits by filling the sets. It also required using synproxy to prevent conntrack table exhaustion, but only when needed.

Synproxy uses Linux SipHash-based SYN cookies for stateless establishment of TCP connections, but unlike typical SYN cookies it happens at the firewall layer. On success, it injects an ESTABLISHED state connection into conntrack and spoofs the TCP handshake to backend server.

Linux SYN cookies rely on TCP timestamps to store full options. If timestamps are disabled as Windows does by default, window scaling and SACK are lost. Not having scaling is horrific (only 65535 bytes in transit at a time). Timestamps are useful so it hurts a bit with them too.