Ideas for firewalling

Jan Engelhardt

revision 2, January 2007

 

“Be conservative in what you accept and be liberal
in what you do.”
-RFC 793[1] turned around

Table of Contents

1. Kernel policies
2. Basic filtering
2.1. Traceroute filtering
3. TCP Stealth Scan detection
4. TCP SYN Scan detection
5. TCP Connect Scan detection
6. TCP Grab Scan detection
7. portscan match
8. CHAOS target
8.1. Tuning CHAOS
9. DELUDE target
10. Notes

1   Kernel policies

Other parts of the Linux kernel besides Netfilter may also have switches to control packet flow. The routing layer has some of these, which can be changed in /proc/sys/net/ipv4/conf/*/. The important ones are "accept_redirects", "accept_source_route", "rp_filter", and, secondarily, "send_redirects". The first two define whether the routing code should honor ICMP Redirect messages and source-routed packets (an IP option), respectively. "rp_filter" checks if the packet can legitimately come from the interface it was received on (see [2]).
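As a minimal sketch of how these switches might be set for every interface at once, the following Python function writes a value into each per-interface file. The values (0 for the redirect and source-route switches, 1 for rp_filter) are an assumption reflecting common hardening practice rather than something this document prescribes, and the name apply_policies is made up for illustration. Writing to the real /proc tree requires root.

```python
import os

# Assumed hardening values; the text names the switches but not their settings.
POLICIES = {
    "accept_redirects": "0",
    "accept_source_route": "0",
    "rp_filter": "1",
    "send_redirects": "0",
}

def apply_policies(conf_root="/proc/sys/net/ipv4/conf", policies=POLICIES):
    """Write each policy value into every interface directory under conf_root.

    Returns the list of files that were written.
    """
    written = []
    for iface in sorted(os.listdir(conf_root)):
        for name, value in policies.items():
            path = os.path.join(conf_root, iface, name)
            if os.path.isfile(path):
                with open(path, "w") as fh:
                    fh.write(value + "\n")
                written.append(path)
    return written
```

Passing a different conf_root makes it possible to try the function on a scratch directory before touching the live settings.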

2   Basic filtering

First, many common ICMP annoyances should be filtered out, such as ICMP Redirect, ICMP Router Advertisement and ICMP Router Solicitation. Windows 98 in particular likes to send them. A few packets related to routing can also be filtered out at the routing level, which happens to come before the Netfilter code[3]. Not all ICMP types are handled by the routing code, and a double check should hopefully not be too costly. These ICMP packets are best dropped, since replying to them may reveal that we are alive. The drop should be done for every interface (i.e. no -i or -o option), in the INPUT and FORWARD chains. The Linux kernel itself does not normally output these, so it should be safe to not have such rules in the OUTPUT chain. In fact, the OUTPUT chain should stay clear of this so that you can still do network hardening tests.

RFC 792[4] divides ICMP packets hierarchically into so-called types and codes. An iptables rule may match any type, all codes of a specific type, or one specific code of one type.

-N icmp_drop;
-A icmp_drop -p icmp --icmp-type redirect -j DROP;
-A icmp_drop -p icmp --icmp-type router-advertisement -j DROP;
-A icmp_drop -p icmp --icmp-type router-solicitation -j DROP;
-A INPUT -j icmp_drop;
-A FORWARD -j icmp_drop;

The redirect type includes four codes, hence the rule blocks both network-redirect and host-redirect, plus the two TOS variants thereof. router-advertisement and router-solicitation are types without any subcodes.
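The type/code hierarchy can be illustrated with a small Python model. It is only a sketch of the matching idea, not iptables code, and the function name icmp_rule_matches is invented here. The numeric values are the standard ICMP assignments for the names used in the icmp_drop chain.

```python
# ICMP type numbers for the names used in the icmp_drop chain.
ICMP_TYPES = {
    "redirect": 5,
    "router-advertisement": 9,
    "router-solicitation": 10,
}

# The four codes of the redirect type, using the iptables names.
REDIRECT_CODES = {
    0: "network-redirect",
    1: "host-redirect",
    2: "TOS-network-redirect",
    3: "TOS-host-redirect",
}

def icmp_rule_matches(pkt_type, pkt_code, rule_type, rule_code=None):
    """A rule without a code matches every code of its type."""
    if pkt_type != rule_type:
        return False
    return rule_code is None or pkt_code == rule_code
```

A rule given only as "redirect" therefore covers all four codes, while a rule for "host-redirect" would cover type 5, code 1 alone.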

2.1   Traceroute filtering

A remote host can send an ICMP Echo packet with the "IP Traceroute"[5] flag. This cannot reliably be blocked, since the ipv4options module does not have code to look at it. However, we can block the return packets passing the machine using:

-A icmp_drop -p icmp --icmp-type 30 -j DROP;

Type 30 must be specified numerically (as shown above), since iptables does not know the names for such extensions that do not seem to be widely deployed. Linux does not seem to support IP Traceroute as of this writing, so it is not necessary to block it in the OUTPUT chain.

Filtering UDP Traceroute and TCP Traceroute is quite impossible without deranging other traffic, since a TTL of zero -- which is what commonly happens during traceroute -- is legitimate.

3   TCP Stealth Scan detection

In the ideal case, a user should make a connection, do what needs to be done, and close it. Standardized TCP connections begin with a SYN packet[6]. Everything else can be considered anomalous. To match a scan, the inner workings of the scan program need to be known. Sometimes it also suffices to look at what traffic is coming in, because that is what can be matched. As said, normal TCP connections always begin with a SYN; anything else must be forged and is therefore easy to match. The following rule will match most anomalies, including TCP NULL, TCP FIN, TCP XMAS, and possibly other strange combinations:

-p tcp ! --syn -m conntrack --ctstate INVALID

It does not match TCP ACK scans, because a "spurious" ACK may very well be part of an already-existing connection where our machine just does not know about its state (e.g. after a reboot or conntrack flush).
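To make the named scan types concrete, here is a rough Python sketch that classifies the first packet of an unknown connection by its flag bits alone. It assumes the usual nmap definitions (NULL: no flags, FIN: only FIN, XMAS: FIN, PSH and URG) and does not consult conntrack, so it is a simplification of what the rule above does. The name classify_probe is hypothetical.

```python
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20

def classify_probe(flags):
    """Name the probe type suggested by the flags of a connection's first packet."""
    flags &= 0x3F                 # look at the six classic TCP flags only
    if flags == 0:
        return "NULL"
    if flags == FIN:
        return "FIN"
    if flags == FIN | PSH | URG:
        return "XMAS"
    if flags == SYN:
        return "SYN"              # a regular connection attempt
    if flags == ACK:
        return "ACK"              # deliberately not treated as a stealth scan
    return "other"
```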

An extra rule is required to be able to continue to receive "Connection refused" messages ourselves -- e.g. if you run `telnet somehost someClosedPort` -- the returned RST and/or RST-ACK packets are not associated with any connection in Netfilter. Hence, we need an exclusion rule to the above:

-p tcp --tcp-flags SYN,FIN,RST RST

According to RFC 793 page 65[7], an RST will not be replied to, hence no information leak will occur as a result of accepting it. (RST-ACK is required as a response to SYN to a closed port, see RFC 793 page 37[8] paragraph 3.)
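The two-argument form of --tcp-flags (a mask of flags to examine, then the flags that must be set among them) is easy to misread. The following Python sketch models that comparison; the helper names bits and tcp_flags_match are made up for this illustration.

```python
FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def bits(names):
    """Turn a comma-separated flag list such as "SYN,FIN,RST" into a bitmask."""
    value = 0
    for name in names.split(","):
        if name and name != "NONE":
            value |= 0x3F if name == "ALL" else FLAGS[name]
    return value

def tcp_flags_match(pkt_flags, mask, comp):
    """Of the flags listed in mask, exactly those listed in comp must be set."""
    return (pkt_flags & bits(mask)) == bits(comp)
```

With the mask SYN,FIN,RST, the ACK bit is not examined, so both a plain RST and an RST-ACK satisfy the exclusion rule, whereas a packet carrying SYN and RST together does not.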

You can incorporate this into your ruleset as follows. A user-defined chain is handy, but you can have it any way:

-N tcp_inval;
-A tcp_inval -p tcp --tcp-flags SYN,FIN,RST,ACK RST,ACK -j RETURN;
-A tcp_inval -j LOG --log-prefix "[STEALTH] ";
-A tcp_inval -j CHAOS;
-A INPUT -p tcp ! --syn -m conntrack --ctstate INVALID -j tcp_inval;

This allows the use of more targets, such as LOG (shown here), without repeatedly matching non-SYN, and is the preferred way. This ruleset can also be used in the FORWARD chain without fear of killing already-running connections. Active connections that have not yet been seen by Netfilter will become NEW and ESTABLISHED after the next two packets, respectively, without making the connection INVALID.

The CHAOS target is discussed in section 8.

4   TCP SYN Scan detection

A SYN scan half-opens a TCP connection and terminates the handshake in the middle. In other words, if a SYN is received, we send the obligatory SYN-ACK and then the scanner immediately sends an RST. (The exact implementation may differ from scanner to scanner and OS. For example, nmap sends a SYN using a raw socket, but when the Linux kernel receives the SYN-ACK it does not know anything about the connection and responds with RST.) Using a state machine (automaton) inside iptables, it is easy to match the third packet:


[Figure: state graph for SYN scan detection (svg, dot)]

When connecting to localhost, special attention needs to be given since we receive our own packets. When the socket is open, the server side sends its SYN-ACK. Under normal circumstances, this packet is only seen in the OUTPUT chain and hence is not of relevance for the state graph, which is modeled upon incoming packets. However, when the loopback interface is involved, we will see our own SYN-ACK packet again in the INPUT chain, so it is to be ignored. The ruleset can be modeled as follows with iptables:

SYN=$[0x401];
CLOSED=$[0x402];
SYNSCAN=$[0x403];
ESTAB=$[0x404];

-N mark_closed;
-A mark_closed -j CONNMARK --set-mark $CLOSED;

-N mark_estab;
-A mark_estab -j CONNMARK --set-mark $ESTAB;

-N tcp_new1;
-A tcp_new1 -i lo -p tcp --tcp-flags ALL SYN,ACK -j RETURN;
-A tcp_new1 -i lo -p tcp --tcp-flags ALL RST,ACK -g mark_closed;
-A tcp_new1 -p tcp --tcp-flags ALL ACK -g mark_estab;
-A tcp_new1 -j CONNMARK --set-mark $SYNSCAN;

-A INPUT -m connmark --mark $SYN -j tcp_new1;
-A INPUT -p tcp --syn -m conntrack --ctstate NEW -j CONNMARK --set-mark $SYN;

When a SYN packet in a new connection arrives, it does not have any mark set (assuming you did not set one), hence will only match the second rule in the INPUT chain (as shown here). The connection will then be marked with some integer that we define as the "SYN received" state. When the client then gives the third packet in the TCP handshake, the first rule in INPUT triggers and the second does not, because the connection is marked with $SYN and already has state ESTABLISHED. Note that the order of the rules is important.

If the third packet is an ACK, a goto (-g, note that this is different from -j) to the mark_estab chain is executed, the connection will be marked with $ESTAB and control is returned to the INPUT chain. Also note the special case for SYN-ACK, which is ignored by returning from the tcp_new1 chain and leaving the mark as-is, and the case for RST(-ACK), which will trigger the connection to be marked with $CLOSED so that it does not inadvertently match any other rule of the detection logic.
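The walk through the rules described above can be replayed with a small Python model. This is only a sketch under simplifying assumptions: the conntrack state is reduced to a boolean saying whether the packet is NEW, and the names input_chain and M_* are introduced here for illustration. The mark values are the ones from the listing.

```python
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20
M_NONE, M_SYN, M_CLOSED, M_SYNSCAN, M_ESTAB = 0, 0x401, 0x402, 0x403, 0x404

def tcp_new1(flags, iface):
    """Mirror the tcp_new1 chain; return the new mark, or None to keep the old one."""
    if iface == "lo" and flags == SYN | ACK:
        return None               # our own SYN-ACK seen again on loopback
    if iface == "lo" and flags == RST | ACK:
        return M_CLOSED
    if flags == ACK:
        return M_ESTAB
    return M_SYNSCAN

def input_chain(mark, flags, iface="eth0", ctstate_new=False):
    """Process one incoming packet and return the connection mark afterwards."""
    # First INPUT rule: connection already marked $SYN.
    if mark == M_SYN:
        new_mark = tcp_new1(flags, iface)
        if new_mark is not None:
            mark = new_mark
    # Second INPUT rule: --syn means that of SYN,RST,ACK,FIN only SYN is set.
    if ctstate_new and (flags & (SYN | RST | ACK | FIN)) == SYN:
        mark = M_SYN
    return mark
```

Feeding it a SYN (with ctstate_new=True) followed by a bare RST ends in the $SYNSCAN mark, while a SYN followed by an ACK ends in $ESTAB.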

Blocking SYN scans is impossible, because you cannot tell in advance whether a SYN sent by the remote side is intended to be a real connection or a scan attempt. However, you could, for example, block all further requests for a while from the host which already did a SYN scan, using the recent module. Assuming handle_evil is a user-defined chain doing that, you have two ways for implementation, varying in the position of the jump to handle_evil in your own ruleset:

-A tcp_new1 -j handle_evil; (or)
-A INPUT -m connmark --mark $SYNSCAN -j handle_evil;

5   TCP Connect Scan detection

SYN scans require a raw socket, which is not available to unprivileged users. Instead, such users have to use the regular interface involving the connect(2) system call, where the kernel does a standards-conformant three-packet TCP handshake. Connect scans then immediately terminate the connection using appropriate syscalls, like shutdown(2) or close(2). I have noticed that nmap manages to send an RST even though close() normally makes the kernel do the FIN sequence. Either way, it does not really matter if we get an RST or a FIN. The extended state graph is shown below, as are the iptables rules to model it.


[Figure: extended state graph including Connect Scan detection (svg, dot)]

CNSCAN=$[0x406];
VALID=$[0x408];

-N mark_cnscan;
-A mark_cnscan -j CONNMARK --set-mark $CNSCAN;

-N tcp_new3;
-A tcp_new3 -p tcp --tcp-flags SYN,FIN,RST RST -g mark_cnscan;
-A tcp_new3 -p tcp --tcp-flags SYN,FIN,RST FIN -g mark_cnscan;
-A tcp_new3 -j CONNMARK --set-mark $VALID;

-A INPUT -m connmark --mark $ESTAB -j tcp_new3;

Note that the last rule in this code snippet must come before the rule to jump to tcp_new1, i.e.:

-A INPUT -m connmark --mark $ESTAB -j tcp_new3;
-A INPUT -m connmark --mark $SYN -j tcp_new1;
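The document does not spell out why the order matters. One plausible reading is that a connmark match sees the mark as already updated by earlier rules for the same packet, so with the reverse order the handshake ACK would be marked $ESTAB and then immediately fall into tcp_new3. The Python sketch below models just that effect; the loopback special cases are omitted and all function names are hypothetical.

```python
FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10
M_SYN, M_SYNSCAN, M_ESTAB, M_CNSCAN, M_VALID = 0x401, 0x403, 0x404, 0x406, 0x408

def tcp_new1(flags):
    """Simplified tcp_new1 (loopback special cases omitted)."""
    return M_ESTAB if flags == ACK else M_SYNSCAN

def tcp_new3(flags):
    """tcp_new3: a lone RST or FIN among SYN,FIN,RST marks a connect scan."""
    if (flags & (SYN | FIN | RST)) in (RST, FIN):
        return M_CNSCAN
    return M_VALID

def run_input(mark, flags, estab_rule_first=True):
    """Traverse both INPUT rules in the given order for one packet."""
    rules = [(M_ESTAB, tcp_new3), (M_SYN, tcp_new1)]
    if not estab_rule_first:
        rules.reverse()
    for wanted_mark, chain in rules:
        if mark == wanted_mark:   # the match sees the mark as updated so far
            mark = chain(flags)
    return mark
```

In this model, run_input(M_SYN, ACK) leaves the connection at $ESTAB, whereas the reversed order skips straight to $VALID on the same packet, so a following RST or FIN would no longer be recognized as a Connect Scan.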

6   TCP Grab Scan detection

There is yet another type of scan, the banner grab scan, where a client connects solely to read bytes and then disconnects. Note that such behavior may very well be part of a non-malicious action (FTP DATA connections, for example), so connections should be handled with care. Some services, such as SSH, are always bidirectional from a Layer7 point of view, so it is safe to apply Grab Scan detection to them. Speaking of SSH, it is a prominent service that presents its data before the client takes any action, e.g. it shows the banner "SSH-2.0-OpenSSH_4.4" voluntarily. I would like to show a way to match such "grab scans". As already mentioned, a grab scan is where the client sends no data itself, hence its TCP packets have no payload.


[Figure: extended state graph including Grab Scan detection (svg, dot)]

Packets (even with a known amount of payload) can vary in size since the IP and TCP headers allow for variable-sized option fields. It is therefore impossible to unambiguously match Grab Scans with the matches available as of iptables 1.3.7. A typical empty Linux TCP packet is 52 octets; that includes 20 octets for the IP header (no options), 20 octets for the TCP header and 12 octets for the TCP timestamp option (10 octets plus 2 octets of padding) that Linux sends with packets. However, 52 octets could also be composed of 20 IP header octets, 12 IP option octets and 20 TCP header octets. Or 20 IP header octets, 20 TCP header octets and 12 TCP payload octets. Matching with -m length --length 52 only looks at the total Layer3 packet length, hence its use on Layer4 and above will be ambiguous. However, if one were to use it, this is how it would be done with iptables rules:

ESTAB2=$[0x405];
CNSCAN=$[0x406];
GRSCAN=$[0x407];
VALID=$[0x408];

-N mark_grscan;
-A mark_grscan -j CONNMARK --set-mark $GRSCAN;

-N mark_estab2;
-A mark_estab2 -j CONNMARK --set-mark $ESTAB2;

-N tcp_new3;
-A tcp_new3 -p tcp --tcp-flags SYN,FIN,RST RST -g mark_cnscan;
-A tcp_new3 -p tcp --tcp-flags SYN,FIN,RST FIN -g mark_cnscan;
-A tcp_new3 ! -i lo -p tcp --tcp-flags SYN,FIN,RST,ACK ACK -m length --length 52 -g mark_estab2;
-A tcp_new3 -j CONNMARK --set-mark $VALID;

-N tcp_new4;
-A tcp_new4 -p tcp --tcp-flags SYN,FIN,RST,ACK ACK -m length --length 52 -j RETURN;
-A tcp_new4 -p tcp --tcp-flags SYN,FIN,RST RST -g mark_grscan;
-A tcp_new4 -p tcp --tcp-flags SYN,FIN,RST FIN -g mark_grscan;
-A tcp_new4 -j CONNMARK --set-mark $VALID;

-A INPUT -m connmark --mark $ESTAB2 -j tcp_new4;
-A INPUT -m connmark --mark $ESTAB -j tcp_new3;

Note that the loopback interface is excluded again, because seeing our own packets makes it trigger early. There are possibly ways around this, but that is beyond the scope of this document. A full example iptables ruleset for use with iptables-restore can be found in the source distribution. If you load it, running, for example, the Grab Scan will look like this (kernel messages on master are the lines starting with a bracketed prefix such as [ESTAB1]):

master# iptables-restore scan_detect.ipt
master# ssh vm6402
vm6402# telnet master 22
[ESTAB1] IN=vmnet2 OUT= MAC= SRC=192.168.64.2 DST=192.168.64.1 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=31040 DF PROTO=TCP SPT=3180 DPT=22 WINDOW=1460 RES=0x00 ACK URGP=0
Trying 192.168.64.1...
Connected to 192.168.64.1.
Escape character is '^]'.
[ESTAB2] IN=vmnet2 OUT= MAC= SRC=192.168.64.2 DST=192.168.64.1 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=31041 DF PROTO=TCP SPT=3180 DPT=22 WINDOW=1460 RES=0x00 ACK URGP=0
SSH-2.0-OpenSSH_4.4
^]
telnet> exit
[GRSCAN] IN=vmnet2 OUT= MAC= SRC=192.168.64.2 DST=192.168.64.1 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=31042 DF PROTO=TCP SPT=3180 DPT=22 WINDOW=1460 RES=0x00 ACK FIN URGP=0
Connection closed.

The portscan kernel module (section 7) will correctly match a TCP packet with empty payload since it is able to inspect the IP and TCP headers more closely than the length module.
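What "inspecting the headers more closely" could mean is sketched below in Python: the payload size follows from the IP total length minus the two header lengths, which are themselves stored in the headers. This is an illustration of the arithmetic only, not the module's code, and both function names are invented. fake_packet builds zero-filled test packets whose length fields are the only meaningful parts.

```python
import struct

def tcp_payload_length(packet):
    """Return the TCP payload size of a raw IPv4 packet given as bytes."""
    ihl = (packet[0] & 0x0F) * 4                      # IP header length in octets
    total_len = struct.unpack("!H", packet[2:4])[0]   # IP total length field
    doff = (packet[ihl + 12] >> 4) * 4                # TCP header length in octets
    return total_len - ihl - doff

def fake_packet(ip_opts=0, tcp_opts=0, payload=0):
    """Build a zero-filled packet with valid length fields (option sizes in octets)."""
    ihl, doff = 20 + ip_opts, 20 + tcp_opts
    ip = bytearray(ihl)
    ip[0] = 0x40 | (ihl // 4)                         # version 4 plus header length
    ip[2:4] = struct.pack("!H", ihl + doff + payload)
    tcp = bytearray(doff)
    tcp[12] = (doff // 4) << 4                        # data offset in the upper nibble
    return bytes(ip + tcp + bytes(payload))
```

All three 52-octet layouts discussed in this section look identical to a total-length match, yet tcp_payload_length tells the one carrying 12 payload octets apart from the two that are empty.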

7   portscan match

As the number of rules for all this portscan logic grows, it becomes a little hard to keep track of it. By putting it all into one kernel module, it can be nicely wrapped up into a single match that is listed in your iptables chains. Processing speed will also improve since the netfilter stack is run through less often. This is a simple example for logging SYN scans:

-A INPUT -p tcp -m portscan --synscan -j LOG --log-prefix "[SYNSCAN] "

portscan marks the connection with different values while the connection is active, reflecting the current state as per the state graphs above. You must make sure it is configured so that it will not interfere with mark values that you use for the rest of your firewall. The mark values portscan uses can be configured at module load time or by use of the module sysfs interface in /sys/module/xt_portscan/parameters/. You will find connmark_mask, packet_mask and a number of mark_* files in this directory. connmark_mask can be used to limit portscan to using a specific set of bits of the mark value. For more details, see the iptables manual for the CONNMARK target.

When portscan is matched on a packet, the packet will be specially marked so that matching the same packet with portscan in another rule will not retrigger the detection logic and inadvertent state transitions. Like with the connection mark, you must also make sure this is configured properly. The module parameters (and sysfs attributes) packet_mask and mask_seen are used for a single packet.

Because matching on portscan acts as if you had matched the connmark value, the following two rulesets are equivalent:

-A INPUT -m portscan --synscan -j LOG --log-prefix "[SYNSCAN] ";
-A INPUT -m portscan --cnscan -j LOG --log-prefix "[CNSCAN] ";
-A INPUT -m portscan --grscan -j LOG --log-prefix "[GRSCAN] ";

-A INPUT -m portscan;
-A INPUT -m connmark --mark $SYNSCAN -j LOG --log-prefix "[SYNSCAN] ";
-A INPUT -m connmark --mark $CNSCAN -j LOG --log-prefix "[CNSCAN] ";
-A INPUT -m connmark --mark $GRSCAN -j LOG --log-prefix "[GRSCAN] ";

However, the first approach is preferred because you do not need to specify the mark values (possibly numeric, depending on your scripts). The iptables portscan part knows the following four options that match their corresponding state as explained before: --stealth, --synscan, --cnscan and --grscan.

8   CHAOS target

Network scanners such as nmap have extra measures for operating systems that ratelimit the number of return ICMP and/or RST packets comprising host-unreachable, net-unreachable, port closed, and other control messages. When this happens, nmap throttles its scan timing to accommodate this. Even though Linux is not one of those operating systems exhibiting this behavior naturally, it can be reproduced with iptables using something as simple as:

-N status;
-A status -m hashlimit --hashlimit-name st_limit --hashlimit-mode srcip --hashlimit 2/sec --hashlimit-burst 2 -j RETURN;
-A status -j DROP;
-A OUTPUT -p icmp -j status;
-A OUTPUT -p tcp --tcp-flags SYN,FIN,RST RST -j status;

The detour over RETURN is necessary because hashlimit does not support inversion which would have been handy here (! --hashlimit 2/sec -j DROP). This limits outgoing ICMP and RST packets to 2 per second. Note that if the loopback interface is used, the actual number of ICMP packets you can send is halved, since you send both an ICMP echo and an echo reply through OUTPUT, therefore reaching the limit earlier.
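The effect of a rate with a burst value can be pictured as a token bucket per hash key. The Python class below is a deliberately simplified model of that idea, not the kernel's hashlimit implementation; the class name and the explicit now argument are assumptions made so the behavior can be stepped through by hand.

```python
class HashLimiter:
    """Simplified per-key token bucket: `rate` tokens per second, at most `burst`."""

    def __init__(self, rate=2.0, burst=2):
        self.rate, self.burst = rate, burst
        self.buckets = {}         # key -> (tokens, time of last update)

    def allow(self, key, now):
        """Return True for RETURN (packet may leave), False for DROP."""
        tokens, last = self.buckets.get(key, (float(self.burst), now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[key] = (tokens - 1.0, now)
            return True
        self.buckets[key] = (tokens, now)
        return False
```

With the defaults, three packets for the same key at the same instant yield two passes and one drop, and another packet half a second later passes again because one token has been refilled.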

In a better setup, one would possibly use the srcip-destip hashlimit mode, or even limit based upon actual incoming traffic. For example, rate-limiting replies to TCP FNX (-sF, -sN, -sX) scans could be done with:

-N tcp_inval;
-A tcp_inval -m hashlimit --hashlimit-name rstlimit --hashlimit-mode srcip --hashlimit 1/sec --hashlimit-burst 1 -j RETURN;
-A tcp_inval -j DROP;
-A INPUT -p tcp ! --syn -m conntrack --ctstate INVALID -j tcp_inval;

To make things even more interesting, we can also be evil by using the nth or random matches (called statistic in Linux 2.6.18 and above).

-N xlimit;
-A xlimit -m statistic --mode nth --every 10 -j RETURN;
-A xlimit -j DROP;
-A INPUT -p tcp ! --syn -m conntrack --ctstate INVALID -j xlimit;

-A INPUT -p tcp ! --syn -m conntrack --ctstate INVALID -m statistic --mode random --probability 0.90 -j DROP;

The first code snippet will make the TCP stack reply to every 10th packet only; the second one drops packets with a 90% chance, hence giving an RST in response to closed ports 10% of the time. Note that these rules also work with UDP, since closed UDP ports will return an ICMP port unreachable message. (You are encouraged to experiment a bit yourself.) If all of this knowledge is combined, we get the CHAOS target, which, if represented with iptables rules, is equivalent to:

-N chaos;
-A chaos -m statistic --mode random --probability 0.01 -j REJECT --reject-with host-unreach;
-A chaos -m statistic --mode random --probability 0.0101 -j REJECT -p tcp -j DELUDE;
-A chaos -j DROP;

CHAOS sends host-unreachable messages at a probability of 1% (for details, see section 8.1); otherwise it sends the TCP connection (if applicable) to DELUDE/TARPIT or DROPs the packet (UDP and others). Using DELUDE/TARPIT has the extra bonus that ports will get listed as "open" in nmap even though there is nothing to see there. What's more, the use of random gives back non-deterministic information. Rerunning nmap multiple times on the same port or port range will yield different results.

"Interesting this use of random. I'll have to play with it when I get that rare bit of spare time for testing and fooling about with things not in prod or requirening immediate attention to fix! Which tend to be even more rare these days in our understaffed env. But, your reports of this random further confusing the scanner and slowing it down are extremely interesting..." -R. DuFresne[9]

Even in its "Insane timing" (as fast as possible) mode, nmap (3.81) reduces itself down to at most 2 TCP ports per second if it recognizes an ICMP rate limit, and even fewer ports per second on UDP. For the record, the Insane Timing mode is also a knock-often, i.e. nmap sends multiple packets per port. Anyway, random matches "every once in a while", as do nth and hashlimit, so which match to pick is basically personal preference. It is yet to be shown which of the three has the best effect, or if they all yield approximately the same slowdown.
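That the nth and random variants from earlier in this section let through about the same share of packets can be checked with a quick Python simulation. It only counts how many probes would receive a reply and says nothing about how nmap reacts to either pattern; the function names are made up for this sketch.

```python
import random

def nth_replies(packets, every=10):
    """Count packets that get a reply when every `every`-th packet is let through."""
    return sum(1 for i in range(packets) if i % every == 0)

def random_replies(packets, drop_probability=0.90, seed=None):
    """Count packets that survive an independent drop with the given probability."""
    rng = random.Random(seed)
    return sum(1 for _ in range(packets) if rng.random() >= drop_probability)
```

The nth variant answers exactly one packet in ten, at regular intervals, while the random variant answers roughly one in ten with no fixed pattern.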

8.1   Tuning CHAOS

I found out that reducing the RST/ICMP reply rate slows nmap more. Best results are at a reply rate of 1%. The big picture is that TCP FNX is slowed down about 50,000% and UDP about 60,000% (nmap 3.81). The exact details can be found in the subdocuments.

New benchmarks (nmap 4.x) are currently underway. It really takes time to collect empirical data, because the results are quite random (and so is nmap in its behavior), so the timings have to be run multiple times to get a sane average baseline. They will also show the yet-undiscovered behavior when the reply rate is below 1%, and will also present statistics about ACK and SYN scans (these two have not been done before because of the large amount of time it takes). Preliminary data mostly confirms the nmap 3.81 graph, but indications suggest that 4%-2% is a better reply range, especially for ACK and SYN scans.

9   DELUDE target

Due to the way TARPIT works, ports will be listed as "open" in nmap, though there is nothing interesting besides the TARPIT there. Because the TARPIT target leaves connections open, it will pollute the conntrack tables of your machine (when using conntrack - which is most likely the case) and of routers before you. In a distributed attack (or yet another automatic Internet worm trying to smash your SQL port), this can exhaust the conntrack space of routers and will most often lead to connections being dropped from the routers' tables. Many home routers only have support for about 1000 concurrent connections, and on embedded systems that run Linux/netfilter, having only 32 MB RAM will also set the conntrack list size to 1024 entries. In consequence, the connection, when picked up again by the conntrack code, may not get through the firewall rules -- consider the following ruleset:

-A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT;
-A FORWARD -m conntrack --ctstate NEW -p tcp --syn -j ACCEPT;
-A FORWARD -p icmp -j ACCEPT;
-A FORWARD -p udp -j ACCEPT;
-P FORWARD DROP;

With this ruleset, packets from a connection that got dropped from conntrack will not be able to pass.
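Why such packets fail can be traced by evaluating the ruleset by hand. The Python sketch below does so under the assumption, taken from the description above, that conntrack picks up the mid-stream packet of a forgotten connection as NEW. The function name forward_verdict is hypothetical.

```python
def forward_verdict(proto, ctstate, syn=False):
    """Evaluate the FORWARD ruleset shown above for one packet."""
    if ctstate in ("ESTABLISHED", "RELATED"):
        return "ACCEPT"
    if ctstate == "NEW" and proto == "tcp" and syn:
        return "ACCEPT"
    if proto in ("icmp", "udp"):
        return "ACCEPT"
    return "DROP"                 # chain policy
```

A TCP data packet of a dropped connection arrives as NEW but without the SYN flag, so forward_verdict("tcp", "NEW", syn=False) falls through to the DROP policy.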

To make a port look open, all we need is to send a SYN-ACK packet in the TCP handshake. A more light-weight variant of TARPIT is DELUDE, which only sends this SYN-ACK in response to SYN, and otherwise behaves like REJECT, sending RSTs out. Of course it lacks the keep-the-client-busy feature, but should not keep unnecessary entries in the conntrack tables. (This is yet to be shown by a test/attack.)
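The behavior just described fits in a few lines of Python. This is a sketch of the decision only, not the target's implementation: the function name is invented, reply packets are reduced to their flag bits, and the choice to stay silent on an incoming RST is an assumption borrowed from the RFC 793 rule that an RST is not replied to.

```python
FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10

def delude_reply(flags):
    """Return the flags of the reply packet, or None when nothing is sent."""
    if flags & RST:
        return None               # assumption: an RST is never answered
    if (flags & SYN) and not (flags & ACK):
        return SYN | ACK          # pretend the port is open
    return RST                    # behave like REJECT for everything else
```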

To nmap, it looks like the port is open. Programs running through the full handshake will end up with "Connection reset by peer" instead, giving a bit more useful info than just to hang the connection like TARPIT does. Whether that is a feature or something to watch out for lies in the eye of the beholder. (Maybe I'll find a better solution that satisfies both.)

10   Notes

Sometimes, "good" hosts send packets that get classified as a SYN scan or Connect Scan attempt. Windows XP is one such candidate (it also sets the URG/PSH flag in the TCP handshake in SMB connections). It is therefore advised to use the CHAOS target in conjunction with its --tarpit option cautiously in internal networks. --delude should be fine, but may be confusing to the end user.

References

[1] RFC 793: TCP
[2] linux/net/ipv4/fib_frontend.c from kernel 2.6.18
[3] Packet flow in the Linux kernel, by josh[at]imagestream.com
[4] RFC 792: ICMP
[5] RFC 1393: IP/ICMP Traceroute
[6] RFC 793: TCP, page 23: state diagram
[7] RFC 793: TCP, page 65: RST handling
[8] RFC 793: TCP, page 37: RST-ACK in response to SYN
[9] R. DuFresne on the Netfilter mailing list - response to the idea