All day today, we’ve noticed some strange behavior on the wireless network at OS X con. Every so often, clients would get a “Connection Refused” when trying to go to a web page, or would be kicked off of iChat spontaneously, only to be reconnected a moment later. Hitting Reload would usually work, but would sometimes show another “Connection Refused”, and then just as mysteriously go away. Strangely, ssh connections seemed to work just fine, and never got dropped. Many other people (beyond the usual wireless ultra geeks) also reported similar problems.

We finally got tired of putting up with this state of affairs, and took a look at the network traffic directly with tcpdump. What we found was very interesting. (The lines below are wrapped for readability.)

root@jojo~# tcpdump -eni en1 host 192.168.2.234

22:51:09.343951 0:30:65:aa:bb:cc 0:a0:c9:00:11:22 0800 74:
  192.168.2.234.52774 > 216.92.122.183.80: S 4010194416:4010194416(0)
  win 32768  (DF)

Here we see a client with the MAC address 0:30:65:aa:bb:cc and IP address 192.168.2.234 make an initial SYN setup request to a web page at 216.92.122.183, via the router with the MAC address 0:a0:c9:00:11:22.

22:51:09.344882 0:30:65:42:23:ff 0:30:65:aa:bb:cc 0800 54:
  216.92.122.183.80 > 192.168.2.234.52774: R 0:0(0) ack
  4010194417 win 0 (DF)
22:51:09.345665 0:30:65:42:23:ff 0:30:65:aa:bb:cc 0800 54:
  216.92.122.183.80 > 192.168.2.234.52774: R 0:0(0) ack
  1 win 0 (DF)

What’s this? Almost immediately (about 0.9 ms later), an entirely different host (0:30:65:42:23:ff) sends two TCP RESET packets back to the client machine, using the web server’s IP address as the source. How rude! Of course, the client machine aborts the current connection attempt (reporting it as “Connection Refused”) and forgets about it.

22:51:09.417171 0:a0:c9:00:11:22 0:30:65:aa:bb:cc 0800 74:
  216.92.122.183.80 > 192.168.2.234.52774: S
  1491529510:1491529510(0) ack
  4010194417 win 65535 

Next we see the router (at 0:a0:c9:00:11:22) returning with the SYN ACK from the original web page…

22:51:09.418044 0:30:65:aa:bb:cc 0:a0:c9:00:11:22 0800 54:
  192.168.2.234.52774 > 216.92.122.183.80: R
  4010194417:4010194417(0) win 0

…which of course the client naturally refuses (sending a RESET back to the web site), since it has already received a RESET from the misbehaving peer!

22:51:09.863614 0:30:65:aa:bb:cc 0:a0:c9:00:11:22 0800 74:
  192.168.2.234.52775 > 216.92.122.183.80: S
  4246974705:4246974705(0) win 32768  (DF)

And finally, since the connection didn’t go through the first time, the client retries, and the cycle repeats again, and again, and again… Until the client’s browser finally gives up a few seconds later.
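The race in the trace above can be checked with a little arithmetic. The following is a minimal sketch in Python that only reuses the timestamps printed by tcpdump; the helper name `ms_between` is made up for illustration. It shows that the bogus RESET arrives in under a millisecond, while the genuine SYN ACK needs tens of milliseconds to come back through the router, so the client has long since given up on the connection.

```python
from datetime import datetime

def ms_between(earlier: str, later: str) -> float:
    """Return the gap between two tcpdump timestamps in milliseconds."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return delta.total_seconds() * 1000.0

# Timestamps copied from the trace above.
syn_sent = "22:51:09.343951"       # client sends its SYN
bogus_rst = "22:51:09.344882"      # first RESET from the misbehaving peer
real_syn_ack = "22:51:09.417171"   # SYN ACK relayed by the router

rst_delay = ms_between(syn_sent, bogus_rst)
syn_ack_delay = ms_between(syn_sent, real_syn_ack)

print(f"bogus RESET after {rst_delay:.3f} ms")
print(f"real SYN ACK after {syn_ack_delay:.3f} ms")
```

Any host on the local segment will always win this race against a server that sits across the Internet, which is why the failure looked so consistent whenever a misbehaving machine was in range.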

The rude machine that sent the gratuitous RESETs above (0:30:65:42:23:ff) was actually my own laptop. But by logging traffic for a while, we found that other hosts on the wireless segment were also sending RESETs. What was the common variable on all of these machines?

What we have been able to determine is that if any host on a wireless network is both running the Jaguar Firewall and running a program that throws the AirPort into promiscuous mode (like tcpdump, ngrep, etherpeg, or another network monitoring tool), then that machine will send spurious TCP RESETs for every packet that it sees on the wireless, even if the packet wasn’t destined for it. Likely, this is because something in the firewall code treats the packets as having a local destination (since the card is in promiscuous mode and hands them up the stack), even though they don’t. This also explains why ssh connections were unaffected: most people have an exception for ssh in their firewall rules, so packets destined for port 22 (the ssh port) never reach the rule that rejects them.
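For anyone who wants to sift a longer capture for the same pattern, here is a rough Python sketch. It assumes each packet from `tcpdump -en` has been joined onto one line, that the wireless segment is 192.168.2.0/24 (the netmask is a guess, it is not stated anywhere above), and that 0:a0:c9:00:11:22 is the only legitimate router. The names `PACKET` and `suspicious_resets` are made up for this example. The idea is simply that a RESET with an off-segment source IP should only ever arrive from the router's MAC address.

```python
import re
from ipaddress import ip_address, ip_network

# Assumptions: the segment's netmask and the single legitimate router.
LOCAL_NET = ip_network("192.168.2.0/24")
ROUTER_MAC = "0:a0:c9:00:11:22"

# Matches one joined line of `tcpdump -en` output as shown in this article.
PACKET = re.compile(
    r"^\S+\s+(?P<src_mac>[0-9a-f:]+)\s+(?P<dst_mac>[0-9a-f:]+)\s+\S+\s+\d+:\s+"
    r"(?P<src_ip>\d+\.\d+\.\d+\.\d+)\.\d+\s+>\s+"
    r"(?P<dst_ip>\d+\.\d+\.\d+\.\d+)\.\d+:\s+(?P<flags>\S+)"
)

def suspicious_resets(lines):
    """Return (src_mac, src_ip, dst_ip) for RESETs that claim an off-segment
    source IP but were not sent by the router."""
    hits = []
    for line in lines:
        m = PACKET.match(line)
        if not m or "R" not in m["flags"]:
            continue
        off_segment = ip_address(m["src_ip"]) not in LOCAL_NET
        if off_segment and m["src_mac"] != ROUTER_MAC:
            hits.append((m["src_mac"], m["src_ip"], m["dst_ip"]))
    return hits

# The first five packets of the trace above, one per line.
trace = [
    "22:51:09.343951 0:30:65:aa:bb:cc 0:a0:c9:00:11:22 0800 74: 192.168.2.234.52774 > 216.92.122.183.80: S 4010194416:4010194416(0) win 32768 (DF)",
    "22:51:09.344882 0:30:65:42:23:ff 0:30:65:aa:bb:cc 0800 54: 216.92.122.183.80 > 192.168.2.234.52774: R 0:0(0) ack 4010194417 win 0 (DF)",
    "22:51:09.345665 0:30:65:42:23:ff 0:30:65:aa:bb:cc 0800 54: 216.92.122.183.80 > 192.168.2.234.52774: R 0:0(0) ack 1 win 0 (DF)",
    "22:51:09.417171 0:a0:c9:00:11:22 0:30:65:aa:bb:cc 0800 74: 216.92.122.183.80 > 192.168.2.234.52774: S 1491529510:1491529510(0) ack 4010194417 win 65535",
    "22:51:09.418044 0:30:65:aa:bb:cc 0:a0:c9:00:11:22 0800 54: 192.168.2.234.52774 > 216.92.122.183.80: R 4010194417:4010194417(0) win 0",
]

offenders = suspicious_resets(trace)
for mac, src, dst in offenders:
    print(f"{mac} sent a RESET claiming to be {src} (to {dst})")
```

The client's own RESET to the web site is not flagged, because its source address is on the local segment. On a network with several routers or a different addressing scheme, the two constants at the top would need adjusting.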

This combination is exceedingly easy to end up with, especially at this conference (where people like me are working with firewalls, monitoring tools, and wireless networks, and there are also many active wireless clients!). It is possible that this behavior would manifest itself on a wired network as well, if all of the clients involved were connected to a network hub (but not a switch). As wireless APs necessarily act as a hub, every client can see the traffic of every other, and hence can send responses to packets that weren’t destined for it. Unfortunately, we don’t have a hub to test with here at the conference.

Turning off firewalling immediately eliminates the problem, and turning it back on recreates it reliably. This is very odd behavior, as filtered packets should normally be dropped on the floor (and a host should certainly not automatically send RESETs for connections that don’t involve one of its own locally bound addresses).

As Cliff Skolnick (who instigated tracking down the above strangeness) also points out, there is even more dementia when OS X 10.2 is set up as a router. As part of a nifty hack he’s presenting, which lets a Bluetooth-enabled Palm reach the Internet through his Titanium and its wireless connection, he needs to enable packet forwarding:

root@jojo~# sysctl -w net.inet.ip.forwarding=1

Now, since he’s using a Titanium, he isn’t using the internal AirPort card (as the wireless range is, well, not what it could be) but instead is using an external PCMCIA card for his network connection. As soon as routing is enabled and promiscuous mode is turned on (say, by simply running tcpdump), the machine suddenly attempts to send ICMP redirects for every ICMP packet it sees on the wireless segment, redirecting each one to the router it was destined for in the first place. This generates a huge amount of unintended traffic, with very little effort:

root@caligula:~# ping 192.168.2.234
PING 192.168.2.234 (192.168.2.234): 56 data bytes
64 bytes from 192.168.2.234: icmp_seq=0 ttl=64 time=4.916 ms
64 bytes from 192.168.2.234: icmp_seq=0 ttl=64 time=7.294 ms (DUP!)
64 bytes from 192.168.2.234: icmp_seq=1 ttl=64 time=12.44 ms
64 bytes from 192.168.2.234: icmp_seq=1 ttl=64 time=15.105 ms (DUP!)
64 bytes from 192.168.2.234: icmp_seq=2 ttl=64 time=9.754 ms
64 bytes from 192.168.2.234: icmp_seq=1 ttl=64 time=1020.15 ms (DUP!)
64 bytes from 192.168.2.234: icmp_seq=2 ttl=64 time=22.165 ms (DUP!)
64 bytes from 192.168.2.234: icmp_seq=1 ttl=64 time=1026.62 ms (DUP!)
64 bytes from 192.168.2.234: icmp_seq=1 ttl=64 time=1029.02 ms (DUP!)
64 bytes from 192.168.2.234: icmp_seq=1 ttl=64 time=1032.94 ms (DUP!)
64 bytes from 192.168.2.234: icmp_seq=3 ttl=64 time=6.755 ms
64 bytes from 192.168.2.234: icmp_seq=3 ttl=64 time=23.359 ms (DUP!)
64 bytes from 192.168.2.234: icmp_seq=2 ttl=64 time=1216.67 ms (DUP!)

Quitting tcpdump immediately makes this problem go away. This is very unexpected network behavior, and certainly bears more examination…
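To put a number on the duplication, the ping output above can be tallied per sequence number. This is a small Python sketch; the output is pasted in verbatim, and the helper name `replies_per_seq` is invented for the example.

```python
import re
from collections import Counter

# The reply lines from the ping run above, copied verbatim.
PING_OUTPUT = """\
64 bytes from 192.168.2.234: icmp_seq=0 ttl=64 time=4.916 ms
64 bytes from 192.168.2.234: icmp_seq=0 ttl=64 time=7.294 ms (DUP!)
64 bytes from 192.168.2.234: icmp_seq=1 ttl=64 time=12.44 ms
64 bytes from 192.168.2.234: icmp_seq=1 ttl=64 time=15.105 ms (DUP!)
64 bytes from 192.168.2.234: icmp_seq=2 ttl=64 time=9.754 ms
64 bytes from 192.168.2.234: icmp_seq=1 ttl=64 time=1020.15 ms (DUP!)
64 bytes from 192.168.2.234: icmp_seq=2 ttl=64 time=22.165 ms (DUP!)
64 bytes from 192.168.2.234: icmp_seq=1 ttl=64 time=1026.62 ms (DUP!)
64 bytes from 192.168.2.234: icmp_seq=1 ttl=64 time=1029.02 ms (DUP!)
64 bytes from 192.168.2.234: icmp_seq=1 ttl=64 time=1032.94 ms (DUP!)
64 bytes from 192.168.2.234: icmp_seq=3 ttl=64 time=6.755 ms
64 bytes from 192.168.2.234: icmp_seq=3 ttl=64 time=23.359 ms (DUP!)
64 bytes from 192.168.2.234: icmp_seq=2 ttl=64 time=1216.67 ms (DUP!)
"""

SEQ = re.compile(r"icmp_seq=(\d+)")

def replies_per_seq(text: str) -> Counter:
    """Count how many replies were received for each ICMP sequence number."""
    counts = Counter()
    for line in text.splitlines():
        m = SEQ.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts

counts = replies_per_seq(PING_OUTPUT)
# Every reply beyond the first for a given sequence number is a duplicate.
duplicates = sum(n - 1 for n in counts.values())

for seq in sorted(counts):
    print(f"icmp_seq={seq}: {counts[seq]} replies")
print(f"{duplicates} duplicates for {len(counts)} echo requests")
```

In the excerpt above, four echo requests produce thirteen replies, and a single request (icmp_seq=1) is answered six times.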

So, to sum up:

  • If you’re running promiscuous mode tools, do not run a firewall in OS X 10.2
  • If you’re acting as a router… DON’T run promiscuous mode utilities!
  • If you’re at OS X con and are running Etherpeg (or ethereal or ngrep or ettercap or ntop or any other utility that throws the card into promiscuous mode), and are running routing or firewalling, we will hunt you down and make you knock it off. ;)
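The rules above boil down to one condition: promiscuous mode combined with either firewalling or routing. Here is a hedged Python sketch of that check. It assumes BSD-style `ifconfig` output, where an interface in promiscuous mode lists PROMISC among its flags, and `sysctl` output that prints the forwarding value after a colon or equals sign. The sample strings are illustrative rather than captured from a real machine, the function names are made up, and the firewall state is passed in as a plain boolean because how to query it is not covered here.

```python
import re

def is_promiscuous(ifconfig_text: str) -> bool:
    """True if the interface flags list includes PROMISC (BSD-style ifconfig)."""
    m = re.search(r"flags=\w+<([^>]*)>", ifconfig_text)
    return bool(m) and "PROMISC" in m.group(1).split(",")

def forwarding_enabled(sysctl_text: str) -> bool:
    """True if net.inet.ip.forwarding is reported as non-zero."""
    m = re.search(r"net\.inet\.ip\.forwarding\s*[:=]\s*(\d+)", sysctl_text)
    return bool(m) and m.group(1) != "0"

def risky(ifconfig_text: str, sysctl_text: str, firewall_on: bool) -> bool:
    """Promiscuous mode plus firewalling or routing is the troublesome mix."""
    return is_promiscuous(ifconfig_text) and (
        firewall_on or forwarding_enabled(sysctl_text)
    )

# Illustrative samples only (assumed format, not taken from the conference).
sniffing = "en1: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500"
normal = "en1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500"

print(risky(sniffing, "net.inet.ip.forwarding: 1", firewall_on=False))
print(risky(normal, "net.inet.ip.forwarding: 1", firewall_on=True))
```

A machine that is only routing or only firewalling, without a sniffer running, is not flagged, which matches what we saw: quitting tcpdump or turning off the firewall each made the problem disappear.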

Note that the actual MAC and IP addresses have been changed to protect the innocent. Thanks to Cliff Skolnick and the bunch of people hanging out on the mezzanine that helped get to the bottom of this! If you read this, and you’re at the conference, help spread the word to any other curious Etherpeg aficionados who may have their Firewall turned on…

Have you seen this strange behavior on Jaguar?