Ideas for firewalling
chaostables 0.5
Inversion of the "robustness principle" makes us more
robust actually.
“Be conservative in what you accept and be liberal in what you do.”
1. Introduction
2. TCP Stealth Scan detection
3. TCP SYN Scan detection
4. TCP Connect Scan detection
5. TCP Grab Scan detection
6. portscan match
7. CHAOS target
7.1. Tuning CHAOS
8. DELUDE target
9. Basic filters
9.1. Kernel policies
9.2. RFC1256 routing packets
9.3. Traceroute filtering
10. Pitfalls to watch out for
10.1. Empty connections
Appendix
What chaostables is, is not, can, cannot, does, does not
chaostables is the software package that contains this document ("Ideas for firewalling"), the netfilter implementations for some of the modules described, etc. It is not a program like iptables, nor is it a table like mangle.
chaostables does not disguise the OS type -- at least not intentionally. It may happen, however, that nmap's OS detection gets confused, which is a welcome side effect.
1. Introduction
In the ideal case, a user should make a connection, do what needs to be done, and close it. Standardized TCP connections begin with a SYN packet [RFC793 p23]. Everything else can be considered anomalous. To match a scan, one needs to know the inner workings of the scan program. Sometimes it also suffices to look at what traffic is coming in, because that is what can be matched.
2. TCP Stealth Scan detection
As said, normal TCP connections always begin with a SYN; anything else must be forged and is therefore easy to match. The following rule will match most anomalies, including TCP NULL, TCP FIN, TCP XMAS, and possibly other strange combinations:
-p tcp ! --syn -m conntrack --ctstate INVALID
It does not match TCP ACK scans, because a "spurious" ACK may very well be part of an already-existing connection where our machine just does not know about its state (e.g. after a reboot or conntrack flush).
An extra rule is required so that we can continue to receive "Connection refused" messages ourselves. If you run `telnet somehost someClosedPort`, for example, the returned RST and/or RST-ACK packets are not associated with any connection in Netfilter. Hence, we need an exclusion rule to the above:
-p tcp --tcp-flags SYN,FIN,RST RST
According to [RFC793 p65], an RST will not be replied to, hence no information leak will occur as a result of accepting it. (RST-ACK is required as a response to a SYN sent to a closed port, see [RFC793 p37 par3].)
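The `--tcp-flags mask comp` test used in these rules examines only the flags listed in the mask and matches when exactly the flags in comp are set among them. The following is a minimal Python sketch of that semantics, not iptables code; the function and constant names are made up for illustration. It shows why the exclusion rule catches both RST and RST-ACK, and why NULL, FIN and XMAS packets never pass as a SYN:

```python
# TCP flag bits as laid out in the TCP header (RFC 793).
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def tcp_flags_match(flags: int, mask: int, comp: int) -> bool:
    """Model of `--tcp-flags mask comp`: only the bits in mask are examined,
    and exactly the bits in comp must be set among them."""
    return (flags & mask) == comp


def is_syn(flags: int) -> bool:
    """Model of `--syn`, shorthand for `--tcp-flags SYN,RST,ACK,FIN SYN`."""
    return tcp_flags_match(flags, SYN | RST | ACK | FIN, SYN)


def is_refusal(flags: int) -> bool:
    """Model of the exclusion rule `--tcp-flags SYN,FIN,RST RST`."""
    return tcp_flags_match(flags, SYN | FIN | RST, RST)


# Illustrative packets: the three stealth scan types, the two refusal
# replies, and a regular connection request.
samples = [("NULL", 0), ("FIN", FIN), ("XMAS", FIN | PSH | URG),
           ("RST", RST), ("RST-ACK", RST | ACK), ("SYN", SYN)]
for name, flags in samples:
    print(f"{name:8} syn={is_syn(flags)!s:5} refusal={is_refusal(flags)}")
```

Because ACK is not part of the mask in the exclusion rule, RST and RST-ACK are treated alike there.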
You can incorporate this into your ruleset as follows. A user-defined chain is handy, but you can have it any way:
-N tcp_inval;
-A tcp_inval -p tcp --tcp-flags SYN,FIN,RST,ACK RST,ACK -j RETURN;
-A tcp_inval -j LOG --log-prefix "[STEALTH] ";
-A tcp_inval -j CHAOS;
-A INPUT -p tcp ! --syn -m conntrack --ctstate INVALID -j tcp_inval;
This allows the use of more targets, such as LOG (shown here), without repeatedly matching non-SYN packets, so this is the preferred way. This ruleset can also be used in the FORWARD chain without fear of killing already-open connections. Active connections that have not yet been seen by Netfilter will become NEW and ESTABLISHED after the next two packets, respectively, without making the connection INVALID.
The CHAOS target is discussed in a later section.
3. TCP SYN Scan detection
A SYN scan half-opens a TCP connection and terminates the handshake in the middle. In other words, if a SYN is received, we send the obligatory SYN-ACK and then the scanner immediately sends an RST. (The exact implementation may differ from scanner to scanner and OS. For example, nmap sends a SYN using a raw socket, but when the Linux kernel receives the return SYN-ACK, it [the kernel] does not know anything about the connection and responds with RST.) Using a state machine (automaton) inside iptables, it is easy to match the third packet:
When connecting to localhost, special attention needs to be given since we receive our own packets. When the socket is open, the server side sends its SYN-ACK. Under normal circumstances, this packet is only seen in the OUTPUT chain and hence is not of relevance for the state graph, which is modeled upon incoming packets. However, when the loopback interface is involved, we will see our own SYN-ACK packet again in the INPUT chain, so it is to be ignored. The ruleset can be modeled as follows with iptables:
SYN=$[0x401];
CLOSED=$[0x402];
SYNSCAN=$[0x403];
ESTAB=$[0x404];
-N mark_closed;
-A mark_closed -j CONNMARK --set-mark $CLOSED;
-N mark_estab;
-A mark_estab -j CONNMARK --set-mark $ESTAB;
-N tcp_new1;
-A tcp_new1 -i lo -p tcp --tcp-flags ALL SYN,ACK -j RETURN;
-A tcp_new1 -i lo -p tcp --tcp-flags ALL RST,ACK -g mark_closed;
-A tcp_new1 -p tcp --tcp-flags ALL ACK -g mark_estab;
-A tcp_new1 -j CONNMARK --set-mark $SYNSCAN;
-A INPUT -m connmark --mark $SYN -j tcp_new1;
-A INPUT -p tcp --syn -m conntrack --ctstate NEW -j CONNMARK --set-mark $SYN;
When a SYN packet in a new connection arrives, it does not have any mark set (assuming you did not set one), hence it will only match the second rule in the INPUT chain (as shown here). The connection will then be marked with some integer that we define as the "SYN received" state. When the client then sends the third packet of the TCP handshake, the first rule in INPUT triggers and the second does not, because the connection is marked with $SYN and already has state ESTABLISHED. Note that the order of the rules is important.
If the third packet is an ACK, a goto (-g, note that this is different from -j) to the mark_estab chain is executed, the connection will be marked with $ESTAB and control is returned to the INPUT chain. Also note the special case for SYN-ACK, which is ignored by returning from the tcp_new1 chain and leaving the mark as-is, and the case for RST(-ACK), which will trigger the connection to be marked with $CLOSED so that it does not inadvertently match any other rule of the detection logic.
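The transitions described above can be traced with a small model. This is a sketch in Python rather than a replacement for the rules: the mark values are taken from the ruleset, while NONE (an unmarked connection), the function names and the simplified packet representation are assumptions made for illustration.

```python
# Mark values from the ruleset above; NONE stands for an unmarked connection.
NONE, SYN, CLOSED, SYNSCAN, ESTAB = 0, 0x401, 0x402, 0x403, 0x404


def next_mark(mark, flags, iface="eth0", ctstate="ESTABLISHED"):
    """Return the connection mark after one incoming packet.

    flags is the exact set of TCP flags on the packet, e.g. {"SYN", "ACK"}.
    The two INPUT rules are evaluated in the order given in the text.
    """
    flags = frozenset(flags)
    # First INPUT rule: connections marked SYN are sent through tcp_new1.
    if mark == SYN:
        if iface == "lo" and flags == {"SYN", "ACK"}:
            pass                      # our own SYN-ACK: leave the mark as-is
        elif iface == "lo" and flags == {"RST", "ACK"}:
            mark = CLOSED
        elif flags == {"ACK"}:
            mark = ESTAB              # handshake completed
        else:
            mark = SYNSCAN            # anything else as third packet
    # Second INPUT rule: a SYN opening a NEW connection sets the SYN mark.
    if ctstate == "NEW" and "SYN" in flags and not flags & {"RST", "ACK", "FIN"}:
        mark = SYN
    return mark


def run(packets):
    """Feed a sequence of (flags, ctstate) packets through the model."""
    mark = NONE
    for flags, ctstate in packets:
        mark = next_mark(mark, flags, ctstate=ctstate)
    return mark


handshake = [({"SYN"}, "NEW"), ({"ACK"}, "ESTABLISHED")]
syn_scan = [({"SYN"}, "NEW"), ({"RST"}, "ESTABLISHED")]
print(hex(run(handshake)), hex(run(syn_scan)))
```

A completed handshake ends in the ESTAB mark, while a handshake that is cut short by an RST ends in SYNSCAN.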
Blocking SYN scans up front is impossible, because you cannot tell in advance whether a SYN sent by the remote side is intended to be a real connection or a scan attempt. However, you could, for example, use the recent module to block all further requests for a while from a host that has already done a SYN scan. Assuming handle_evil is a user-defined chain doing that, there are two ways to implement this, differing in where the jump to handle_evil sits in your ruleset:
-A tcp_new1 -j handle_evil; (or)
-A INPUT -m connmark --mark $SYNSCAN -j handle_evil;
4. TCP Connect Scan detection
SYN scans require a raw socket, which is not available to unprivileged users. Instead, such users have to use the regular interface involving the connect(2) system call, where the kernel does a standards-conformant three-way TCP handshake. Connect scans then immediately terminate the connection using appropriate syscalls, like shutdown(2) or close(2). I have noticed that nmap manages to send an RST even though close() normally makes the kernel do the FIN sequence. Either way, it does not really matter whether we get an RST or a FIN. The iptables rules that model the extended state graph are shown below.
CNSCAN=$[0x406];
VALID=$[0x408];
-N mark_cnscan;
-A mark_cnscan -j CONNMARK --set-mark $CNSCAN;
-N tcp_new3;
-A tcp_new3 -p tcp --tcp-flags SYN,FIN,RST RST -g mark_cnscan;
-A tcp_new3 -p tcp --tcp-flags SYN,FIN,RST FIN -g mark_cnscan;
-A tcp_new3 -j CONNMARK --set-mark $VALID;
-A INPUT -m connmark --mark $ESTAB -j tcp_new3;
Note that the last rule in this code snippet must come before the rule that jumps to tcp_new1, i.e.:
-A INPUT -m connmark --mark $ESTAB -j tcp_new3;
-A INPUT -m connmark --mark $SYN -j tcp_new1;
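Why the order matters can be seen by running a connect scan through both orderings. The sketch below is a reduced Python model, not the real chains: tcp_new1 is cut down to the single ACK transition, and it assumes (as is the case for CONNMARK) that a changed mark is visible to later rules within the same traversal.

```python
SYN, ESTAB, CNSCAN, VALID = 0x401, 0x404, 0x406, 0x408


def tcp_new1(mark, flags):
    # Reduced tcp_new1: a bare ACK completes the handshake.
    return ESTAB if flags == {"ACK"} else mark


def tcp_new3(mark, flags):
    # --tcp-flags SYN,FIN,RST ...: only these three flags are examined.
    examined = flags & {"SYN", "FIN", "RST"}
    if examined in ({"RST"}, {"FIN"}):
        return CNSCAN      # connection torn down right after the handshake
    return VALID           # any other packet makes the connection VALID


def traverse(mark, flags, estab_rule_first):
    """Run one packet through the two INPUT rules in the given order."""
    rules = [(ESTAB, tcp_new3), (SYN, tcp_new1)]
    if not estab_rule_first:
        rules.reverse()
    for wanted_mark, chain in rules:
        if mark == wanted_mark:          # -m connmark --mark ...
            mark = chain(mark, flags)    # the mark may change mid-traversal
    return mark


def connect_scan(estab_rule_first):
    mark = SYN                                   # the SYN has been seen
    for flags in ({"ACK"}, {"RST", "ACK"}):      # handshake ACK, then reset
        mark = traverse(mark, flags, estab_rule_first)
    return mark


print(hex(connect_scan(True)), hex(connect_scan(False)))
```

With the $ESTAB rule first, the scan ends up marked CNSCAN. With the order reversed, the handshake ACK is marked ESTAB and then immediately falls into tcp_new3 during the same traversal, so the connection becomes VALID before the reset arrives and the scan goes unnoticed.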
5. TCP Grab Scan detection
There is yet another type of scan, the banner grab scan, where a client connects solely to read bytes and then disconnects. Note that such behavior may very well be part of a non-malicious action (FTP DATA connections, for example), so such connections should be handled with care. Some services, such as SSH, are always bidirectional from a Layer 7 point of view, so it is safe to apply Grab Scan detection to them. Speaking of SSH, it is a prominent service that presents its data before the client takes any action, e.g. it shows the banner "SSH-2.0-OpenSSH_4.4" voluntarily. I would like to show a way to match such "grab scans". As already mentioned, a grab scan is one where the client sends no data itself, hence its TCP packets have no payload.
Packets (even with a known amount of payload) can vary in size since the IP and TCP headers allow for variable-sized option fields. It is therefore impossible to unambiguously match Grab Scans with the matches available as of iptables 1.3.7. A typical empty Linux TCP packet is 52 octets: 20 octets for the IP header (no options), 20 octets for the TCP header and 12 octets for the TCP timestamp option (10 octets plus 2 octets of padding) that Linux sends with its packets. However, 52 octets could also be composed of 20 IP header octets, 12 IP option octets and 20 TCP header octets. Or 20 IP header octets, 20 TCP header octets and 12 TCP payload octets. Matching with -m length --length 52 only looks at the total Layer 3 packet length, hence its use for Layer 4 and above will be ambiguous. However, if one were to use it, this is how it would be done with iptables rules:
ESTAB2=$[0x405];
CNSCAN=$[0x406];
GRSCAN=$[0x407];
VALID=$[0x408];
-N mark_estab2;
-A mark_estab2 -j CONNMARK --set-mark $ESTAB2;
-N mark_grscan;
-A mark_grscan -j CONNMARK --set-mark $GRSCAN;
-N tcp_new3;
-A tcp_new3 -p tcp --tcp-flags SYN,FIN,RST RST -g mark_cnscan;
-A tcp_new3 -p tcp --tcp-flags SYN,FIN,RST FIN -g mark_cnscan;
-A tcp_new3 ! -i lo -p tcp --tcp-flags SYN,FIN,RST,ACK ACK -m length --length 52 -g mark_estab2;
-A tcp_new3 -j CONNMARK --set-mark $VALID;
-N tcp_new4;
-A tcp_new4 -p tcp --tcp-flags SYN,FIN,RST,ACK ACK -m length --length 52 -j RETURN;
-A tcp_new4 -p tcp --tcp-flags SYN,FIN,RST RST -g mark_grscan;
-A tcp_new4 -p tcp --tcp-flags SYN,FIN,RST FIN -g mark_grscan;
-A tcp_new4 -j CONNMARK --set-mark $VALID;
-A INPUT -m connmark --mark $ESTAB2 -j tcp_new4;
-A INPUT -m connmark --mark $ESTAB -j tcp_new3;
Note that the loopback interface is excluded again, because seeing our own packets makes it trigger early. There are possibly ways around this, but that is beyond the scope of this document. A full example iptables ruleset for use with iptables-restore can be found in the source distribution. If you load it, running, for example, the Grab Scan will look like this (kernel messages on master are the lines starting with a bracketed prefix such as [ESTAB1]):
master# iptables-restore scan_detect.ipt
master# ssh vm6402
vm6402# telnet master 22
[ESTAB1] IN=vmnet2 OUT= MAC= SRC=192.168.64.2 DST=192.168.64.1 LEN=52
TOS=0x10 PREC=0x00 TTL=64 ID=31040 DF PROTO=TCP SPT=3180 DPT=22 WINDOW=1460
RES=0x00 ACK URGP=0
Trying 192.168.64.1...
Connected to 192.168.64.1.
Escape character is '^]'.
[ESTAB2] IN=vmnet2 OUT= MAC= SRC=192.168.64.2 DST=192.168.64.1 LEN=52
TOS=0x10 PREC=0x00 TTL=64 ID=31041 DF PROTO=TCP SPT=3180 DPT=22 WINDOW=1460
RES=0x00 ACK URGP=0
SSH-2.0-OpenSSH_4.4
^]
telnet> exit
[GRSCAN] IN=vmnet2 OUT= MAC= SRC=192.168.64.2 DST=192.168.64.1 LEN=52
TOS=0x10 PREC=0x00 TTL=64 ID=31042 DF PROTO=TCP SPT=3180 DPT=22 WINDOW=1460
RES=0x00 ACK FIN URGP=0
Connection closed.
The portscan kernel module (section 6) will correctly match a TCP packet with empty payload since it is able to inspect the IP and TCP headers more closely than the length module.
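The 52-octet ambiguity, and the header inspection that resolves it, can be illustrated as follows. This is a Python sketch and not the module's code: it builds three different 52-octet packets (option fields are simply zero-filled, checksums are left at zero, and the port numbers are arbitrary) and then derives the TCP payload length from the IP header length and the TCP data offset.

```python
import struct


def build_packet(ip_opts=b"", tcp_opts=b"", payload=b""):
    """Build a minimal IPv4+TCP packet (checksums left at zero)."""
    assert len(ip_opts) % 4 == 0 and len(tcp_opts) % 4 == 0
    ihl = 5 + len(ip_opts) // 4       # IP header length in 32-bit words
    doff = 5 + len(tcp_opts) // 4     # TCP data offset in 32-bit words
    total = ihl * 4 + doff * 4 + len(payload)
    ip = struct.pack("!BBHHHBBH4s4s", (4 << 4) | ihl, 0, total, 0, 0,
                     64, 6, 0, bytes(4), bytes(4)) + ip_opts
    tcp = struct.pack("!HHIIBBHHH", 1024, 22, 0, 0, doff << 4, 0x10,
                      1460, 0, 0) + tcp_opts
    return ip + tcp + payload


def tcp_payload_len(pkt):
    """Payload = total length - IP header length - TCP header length."""
    ihl = (pkt[0] & 0x0F) * 4
    total = struct.unpack_from("!H", pkt, 2)[0]
    doff = (pkt[ihl + 12] >> 4) * 4
    return total - ihl - doff


candidates = {
    "12 octets of TCP options": build_packet(tcp_opts=bytes(12)),
    "12 octets of IP options": build_packet(ip_opts=bytes(12)),
    "12 octets of payload": build_packet(payload=b"hello world!"),
}
for name, pkt in candidates.items():
    print(f"{name:26} length={len(pkt)} payload={tcp_payload_len(pkt)}")
```

All three packets look identical to a pure length match, but only the last one carries data.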
6. portscan match
As the number of rules for all this portscan logic grows, it becomes a little hard to keep track of them. By putting it all into one kernel module, it can be nicely wrapped up into a single match that is listed in your iptables chains. Processing speed will also improve since the netfilter stack is run through less often. This is a simple example for logging SYN scans:
-A INPUT -p tcp -m portscan --synscan -j LOG --log-prefix "[SYNSCAN] "
portscan marks the connection with different values while the connection is active, reflecting the current state as per the state graph above. You must make sure it is configured so that it will not interfere with mark values that you use for the rest of your firewall. The mark values portscan uses can be configured at module load time or through the module's sysfs interface in /sys/module/xt_portscan/parameters/. You will find connmark_mask, packet_mask and a number of mark_* files in this directory. connmark_mask can be used to limit portscan to a specific set of bits of the mark value. For more details, see the iptables manual for the CONNMARK target.
When portscan is matched on a packet, the packet will be specially marked so that matching the same packet with portscan in another rule will not retrigger the detection logic and cause inadvertent state transitions. As with the connection mark, you must make sure this is configured properly. The module parameters (and sysfs attributes) packet_mask and mark_seen are used for a single packet.
Because matching on portscan acts as if you had matched the connmark value, the following two rulesets are equivalent (the second one starts at the bare -m portscan rule):
-A INPUT -m portscan --synscan -j LOG --log-prefix "[SYNSCAN] ";
-A INPUT -m portscan --cnscan -j LOG --log-prefix "[CNSCAN] ";
-A INPUT -m portscan --grscan -j LOG --log-prefix "[GRSCAN] ";
-A INPUT -m portscan;
-A INPUT -m connmark --mark $SYNSCAN -j LOG --log-prefix "[SYNSCAN] ";
-A INPUT -m connmark --mark $CNSCAN -j LOG --log-prefix "[CNSCAN] ";
However, the first approach is preferred because you do not need to specify the mark values (possibly numeric, depending on your scripts). The iptables portscan part knows the following four options that match their corresponding state as explained before: --stealth, --synscan, --cnscan and --grscan.
7. CHAOS target
Network scanners such as nmap have extra measures for operating systems that rate-limit the number of outgoing ICMP and/or RST packets, which comprise host-unreachable, net-unreachable, port-closed, and other control messages. Those operating systems do this to avoid flooding the network even more than a scan already does, not to actively stop scans. When the rate limit kicks in, nmap throttles its scan timing to accommodate it so that it does not lose scan result accuracy. Even though Linux is not one of the operating systems that exhibit this rate-limit behavior naturally, it can be reproduced with iptables using something as simple as:
-N status;
-A status -m hashlimit --hashlimit-name st_limit --hashlimit-mode srcip
--hashlimit 2/sec --hashlimit-burst 2 -j RETURN;
-A status -j DROP;
-A OUTPUT -p icmp -j status;
-A OUTPUT -p tcp --tcp-flags SYN,FIN,RST RST -j status;
The detour over RETURN is necessary because hashlimit does not support inversion which would have been handy here (! --hashlimit 2/sec -j DROP). This limits outgoing ICMP and RST packets to 2 per second. Note that if the loopback interface is used, the actual number of ICMP packets you can send is halved, since you send both an ICMP echo and an echo reply through OUTPUT, therefore reaching the limit earlier.
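The effect of a "2/sec, burst 2" limit can be pictured with a simple token bucket. The sketch below is a simplified Python model under stated assumptions, not the hashlimit implementation: time is counted in whole milliseconds, there is a single bucket (one source address), and the function name and parameters are made up for illustration.

```python
def rate_limit(arrivals_ms, rate_per_sec=2, burst=2):
    """Return the arrival times (in ms) of packets that pass the limit."""
    interval = 1000 // rate_per_sec    # ms of credit needed per packet
    cap = burst * interval             # a full bucket holds `burst` packets
    credit = cap
    last = None
    passed = []
    for t in arrivals_ms:
        if last is not None:
            credit = min(cap, credit + (t - last))   # refill over time
        last = t
        if credit >= interval:         # enough credit: the packet RETURNs
            credit -= interval
            passed.append(t)
        # otherwise the packet falls through to the DROP rule
    return passed


# One control packet every 100 ms for three seconds.
print(rate_limit(range(0, 3000, 100)))
```

In this model the first two packets pass as the initial burst, after which only one packet per half second gets through. If every request produced two packets through the same bucket, as with echo and echo reply on loopback, the budget would be used up twice as fast.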
In a better setup, one would possibly use the srcip-destip hashlimit mode, or even limit based upon actual incoming traffic. For example, rate-limiting replies to TCP FNX (-sF, -sN, -sX) scans could be done with:
-N tcp_inval;
-A tcp_inval -m hashlimit --hashlimit-name rstlimit --hashlimit-mode srcip
--hashlimit 1/sec --hashlimit-burst 1 -j RETURN;
-A tcp_inval -j DROP;
-A INPUT -p tcp ! --syn -m conntrack --ctstate INVALID -j tcp_inval;
To make things even more interesting, we can also be evil by using the nth or random matches (called statistic in Linux 2.6.18 and above).
-N xlimit;
-A xlimit -m statistic --mode nth --every 10 -j RETURN;
-A xlimit -j DROP;
-A INPUT -p tcp ! --syn -m conntrack --ctstate INVALID -j xlimit;
-A INPUT -p tcp ! --syn -m conntrack --ctstate INVALID -m statistic --mode random --probability 0.90 -j DROP;
The first code snippet will make the TCP stack reply to every 10th packet only; the second one drops packets with a 90% chance, hence giving an RST in response to closed ports 10% of the time. Note that these rules also work with UDP, since closed UDP ports will return an ICMP port unreachable message. (You are encouraged to experiment a bit yourself.) If all of this knowledge is combined, we get the CHAOS target, which, if represented with iptables rules, is equivalent to:
-N chaos;
-A chaos -m statistic --mode random --probability 0.01 -j REJECT
--reject-with host-unreach;
-A chaos -p tcp -m statistic --mode random --probability 0.0101
-j DELUDE;
-A chaos -j DROP;
CHAOS sends host-unreachable messages at a probability of 1% (the reason for this rate is given in the next subsection); otherwise it sends the TCP connection (if applicable) to a TARPIT (TCP), DELUDEs it (TCP) or DROPs it (UDP and others). Using DELUDE/TARPIT has the extra bonus that ports will get listed as "open" in nmap even though there is nothing to see there. What's more, the use of random returns non-deterministic information: rerunning nmap multiple times on the same port or port range will yield different results. The end result is that many more ports are listed as open than there really are, so the scanner is none the wiser as to which ports are "really" open without using more intrusive and detectable methods.
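The combined effect of the random rules can be worked out exactly. The sketch below assumes the chain is read as three steps: REJECT with probability 0.01 for any packet, then DELUDE with probability 0.0101 for TCP packets that were not rejected, then DROP for everything left. That reading, and the Python names used, are assumptions made for illustration.

```python
from fractions import Fraction

P_REJECT = Fraction(1, 100)        # --probability 0.01
P_DELUDE = Fraction(101, 10000)    # --probability 0.0101, TCP only


def chaos_outcomes(is_tcp):
    """Overall probability of each verdict for one packet entering the chain."""
    reject = P_REJECT
    remaining = 1 - reject                   # packets reaching the second rule
    delude = remaining * P_DELUDE if is_tcp else Fraction(0)
    drop = 1 - reject - delude
    return {"REJECT": reject, "DELUDE": delude, "DROP": drop}


for proto, is_tcp in (("tcp", True), ("udp", False)):
    print(proto, {k: float(v) for k, v in chaos_outcomes(is_tcp).items()})
```

Under this reading, 0.99 × 0.0101 is almost exactly 0.01, which would presumably explain the odd-looking second probability: it compensates for the 1% of packets already taken out by the first rule, so that about 1% of all TCP packets get each of the two special treatments.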
"Interesting this use of random. I'll have to play with it when I get that rare bit of spare time for testing and fooling about with things not in prod or requirening immediate attention to fix! Which tend to be even more rare these days in our understaffed env. But, your reports of this random further confusing the scanner and slowing it down are extremely interesting..." [DuFresne]
7.1. Tuning CHAOS
Even in its "Insane timing" (as fast as possible) mode, nmap (3.81) reduces itself down to at most 2 TCP ports per second if it recognizes an ICMP rate limit, and even fewer ports per second on UDP. For the record, the Insane Timing mode is also a knock-often, i.e. nmap sends multiple packets per port. Anyway, random matches "every once in a while", as do nth and hashlimit, so it is basically personal preference which match to pick. It is yet to be shown which of the three has the best effect, or if they all yield approximately the same slowdown.
I found out that reducing the RST/ICMP reply rate slows nmap more. Best results are at a reply rate of 1%. The big picture is that TCP FNX is slowed down about 50,000% and UDP about 60,000% (nmap 3.81) compared to a "minimal" firewall that DROPs all unwanted packets. The exact details can be found in the subdocuments listed below.
New benchmarks (nmap 4.x) are currently underway. Collecting empirical data really takes time: the results are quite random (and so is nmap in its behavior), so the timings have to be run multiple times to get a sane average baseline. The benchmarks will also show the as yet unexplored behavior when the reply rate is below 1%, and will present statistics about ACK and SYN scans (these two have not been done before because of the large amount of time it takes). Preliminary data mostly confirms the nmap 3.81 graph, but indications suggest that 2% to 4% is a better reply range, especially for ACK and SYN scans.
Do not expect to be able to run -T5 or -T4 over the Internet, at least not without losing accuracy. What is shown here are the best (shortest) scanning times, obtained by scanning localhost (which has the best "connection").
8. DELUDE target
Due to the way TARPIT works, ports will be listed as "open" in nmap, though there is nothing interesting besides the TARPIT there. Because the TARPIT target leaves connections open, it will fill up the conntrack tables of your machine (when using conntrack, which is most likely the case) and of the routers in front of you. In a distributed attack (or yet another automatic Internet worm trying to smash your SQL port), this can exhaust the conntrack space of routers and will most often lead to connections being evicted from the routers' tables. Many home routers only have support for about 1000 concurrent connections, and on embedded systems that run Linux/netfilter, having only 32 MB RAM will also set the conntrack list size to 1024 entries. In consequence, the connection, when picked up again by the conntrack code, may get reset or hang, depending on the firewall rules. Consider the following ruleset:
-A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT;
-A FORWARD -m conntrack --ctstate NEW -p tcp --syn -j ACCEPT;
-A FORWARD -p icmp -j ACCEPT;
-A FORWARD -p udp -j ACCEPT;
-P FORWARD DROP;
With this ruleset, packets from a connection that got dropped from conntrack will not be able to pass.
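The verdicts of this ruleset can be traced with a small model. It is a Python sketch, and it rests on the behavior described earlier in this document, namely that a connection not (or no longer) known to conntrack is picked up as NEW; the function name and arguments are illustrative.

```python
def forward_verdict(proto, ctstate, syn=False):
    """Evaluate the FORWARD ruleset above for a single packet."""
    if ctstate in ("ESTABLISHED", "RELATED"):
        return "ACCEPT"                # rule 1
    if ctstate == "NEW" and proto == "tcp" and syn:
        return "ACCEPT"                # rule 2: only SYNs may open connections
    if proto in ("icmp", "udp"):
        return "ACCEPT"                # rules 3 and 4
    return "DROP"                      # chain policy


# Mid-stream data packet of a tracked connection.
print(forward_verdict("tcp", "ESTABLISHED"))
# The same packet after the entry was evicted: it is NEW but carries no SYN.
print(forward_verdict("tcp", "NEW", syn=False))
```

Since the evicted connection never sends another SYN, none of its packets can match the second rule, and the connection hangs.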
To make a port look open, all we need is to send a SYN-ACK packet in the TCP handshake. A more light-weight variant of TARPIT is DELUDE, which only sends this SYN-ACK in response to SYN, and otherwise behaves like REJECT, sending RSTs out. Of course it lacks the feature to keep the client in a busy loop (which would be handy against any spammer who actually wants to send out data), but should not keep unnecessary entries in the conntrack tables.
Client | Router(s) | Server+DELUDE |
---|---|---|
SYN | conntrack: NEW | - |
- | ct: ESTABLISHED | SYN-ACK |
ACK | - | - |
- | remove ct entry | RST |

Client | Router(s) | Server+TARPIT |
---|---|---|
SYN | ct: NEW | - |
- | ct: ESTABLISHED | SYN-ACK |
ACK | - | - |
- | - | ACK window 0 |
(data) | - | - |
- | - | ACK window 0 |
FIN | - | - |
- | - | DROP |
timeout | - | - |
- | timeout, remove ct | timeout, remove ct |
To nmap, it looks like the port is open. Programs running through the full handshake will end up with "Connection reset by peer" instead, giving a bit more useful info than just to hang the connection like TARPIT does. This may be a feature, or something to watch out for.
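The difference between the two targets, as laid out in the table, can be summarized in a few lines. This is a Python sketch of the per-packet responses only, with illustrative names; it does not model conntrack or timeouts, and how TARPIT treats packets not shown in the table is not covered.

```python
def delude(flags):
    """DELUDE: answer a SYN with SYN-ACK, everything else with RST."""
    return "SYN-ACK" if flags == {"SYN"} else "RST"


def tarpit(flags):
    """TARPIT as in the table: SYN-ACK, then zero-window ACKs; FIN is dropped."""
    if flags == {"SYN"}:
        return "SYN-ACK"
    if "FIN" in flags:
        return None                    # no reply; the client has to time out
    return "ACK window 0"


# Client side of a connection attempt: SYN, handshake ACK, data, FIN.
client = [{"SYN"}, {"ACK"}, {"ACK", "PSH"}, {"FIN", "ACK"}]
print([delude(f) for f in client[:2]])   # the RST ends the exchange
print([tarpit(f) for f in client])
```

With DELUDE the exchange is over after two replies, so the routers in between can discard their conntrack entry right away. With TARPIT the entry lingers until it times out.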
9. Basic filters
This section contains a collection of other thoughts regarding firewalling. Even if you do not use them, add them to your pool of knowledge.
9.1. Kernel policies
Other parts of the Linux kernel besides Netfilter may also have switches to control packet flow. The routing layer has some of these, which can be changed in /proc/sys/net/ipv4/conf/*/. The important ones are "accept_redirects", "accept_source_route", "rp_filter", and, secondarily, "send_redirects". The first two define whether the routing code honors ICMP Redirect messages and source-routed packets, respectively. "rp_filter" checks if the packet can legitimately come from the interface it was received on (see [LXRfib]).
9.2. RFC1256 routing packets
Windows 98 likes to spew out multicast router-solicitation packets every now and then on LANs. Most ISPs should filter multicast, Linux's routing code is not configured to accept these by default, multicast packets do not seem to pass through the netfilter tables either, and above all, multicast packets typically behave as if they had a TTL of zero. Paranoid sysadmins may nevertheless block ICMP Redirect [RFC792], Router Advertisement and Router Solicitation [RFC1256] in FORWARD, in case someone crafts malicious unicast packets.
[RFC792] divides ICMP packets hierarchically into so-called types and codes. iptables may match all codes of a specific type, one code of one specific type, or any type.
-N icmp_drop;
-A icmp_drop -p icmp --icmp-type redirect -j DROP;
-A icmp_drop -p icmp --icmp-type router-advertisement -j DROP;
-A icmp_drop -p icmp --icmp-type router-solicitation -j DROP;
-A INPUT -j icmp_drop;
-A FORWARD -j icmp_drop;
The redirect type includes four codes, hence the rule blocks both network-redirect and host-redirect, plus the two TOS variants thereof. router-advertisement and router-solicitation are types without any subcodes.
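How matching by type name covers several codes can be sketched as follows. The type and code numbers are those of RFC 792 and RFC 1256; the lookup table and function names are a Python illustration, not the iptables implementation.

```python
# (type, code) per name; None as code means "any code of this type".
ICMP_NAMES = {
    "redirect": (5, None),
    "network-redirect": (5, 0),
    "host-redirect": (5, 1),
    "TOS-network-redirect": (5, 2),
    "TOS-host-redirect": (5, 3),
    "router-advertisement": (9, None),
    "router-solicitation": (10, None),
}


def icmp_match(name, pkt_type, pkt_code):
    """Model of `-p icmp --icmp-type name` for one packet."""
    want_type, want_code = ICMP_NAMES[name]
    return pkt_type == want_type and want_code in (None, pkt_code)


def icmp_drop(pkt_type, pkt_code):
    """Model of the icmp_drop chain: True if the packet would be dropped."""
    rules = ("redirect", "router-advertisement", "router-solicitation")
    return any(icmp_match(name, pkt_type, pkt_code) for name in rules)


# All four redirect codes are dropped, an echo request (type 8) is not.
print([icmp_drop(5, code) for code in range(4)], icmp_drop(8, 0))
```

Matching a single code instead, e.g. host-redirect, would leave the other three redirect variants untouched.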
9.3. Traceroute filtering
A remote host can send an ICMP Echo packet with the "IP Traceroute" [RFC1393] option. This cannot reliably be blocked, since the ipv4options module does not have code to look at it. However, we can block the return packets passing through the machine using:
-A FORWARD -p icmp --icmp-type 30 -j DROP;
Type 30 must be specified numerically (as shown above), since iptables does not know the names for such extensions that do not seem to be widely deployed. Linux does not seem to support IP Traceroute as of this writing, so it is not necessary to block it in the OUTPUT chain.
Filtering UDP Traceroute and TCP Traceroute is practically impossible without disrupting other traffic. A very low TTL may indicate an ongoing traceroute scan, but any TTL is legitimate.
10. Pitfalls to watch out for
10.1. Empty connections
Sometimes, "good" hosts send packets that get classified as a SYN scan or Connect Scan attempt. Windows XP is one such candidate (it also sets the URG/PSH flags in the TCP handshake of SMB connections). It is therefore advised to use the CHAOS target with its --tarpit option cautiously in internal networks. --delude should be fine, but may be confusing to the end user.