Network traffic analysis
In the last 8 months, I’ve captured and analyzed 4.4 billion network packets from my home internet connection, representing 3833 GiB of data. Life basically occurs via the internet when you’re stuck at home. Facetime calls, streaming anime, browsing reddit, and even video conferencing for my job — all of it gets funneled into my custom network monitoring software for decoding, logging, and analysis. Needless to say, most internet communication uses encrypted TLS tunnels, which are opaque to passive network observers. However, there’s a ton of interesting detail exposed in the metadata — as network engineers and application programmers, we’ve repeated this adage to ourselves countless times. Still, few people have actually inspected their own internet traffic to confirm exactly what kind of data they’re exposing this way. Developing my own network traffic analysis software has helped me solidify my understanding of network protocols. It’s also given me a really convenient network “watchtower”, from which I can easily do things like take packet dumps to debug my IPv6 configuration. I think anyone interested in computer networking should have access to such a tool, so in this post, I’ll describe my approach and the problems I encountered in case you’re interested in trying it out yourself.
The tap
Before I begin, I should probably remind you only to capture network traffic when you’re authorized to do so1. Naturally, the world of network protocols is so exciting that it’s easy to forget, but some people are seriously creeped out by network surveillance. Personally, I probably do a lot more self-surveillance (including security cameras) than most people are comfortable with, but do keep that in mind in case you aren’t the only one using your home internet connection.
In any case, the first thing you need is a network tap. My router offers port mirroring, and I’m personally only interested in my WAN traffic, so I simply configured my router to mirror a copy of every inbound and outbound WAN packet to a special switch port dedicated to traffic analysis. If your router doesn’t support this, you can buy network tap devices that essentially do the same thing. On the receiving end, I plugged in my old Intel NUC — a small form factor PC equipped with a gigabit LAN port2 and some decent solid state storage. Any decently fast computer would probably suffice, but if you can find one with two ethernet ports, that might save you some trouble later on. My NUC only had a single port, so I used it both as the management interface and for receiving the mirrored traffic.
Capture frames
At this point, sniffed network frames are arriving on your analysis device’s network interface card (NIC). However, you’ll need to do a bit more work before they can be delivered properly to your application for analysis. When my router mirrors a switch port, it sends a replica of the exact ethernet frame to the mirror target. This means that the MAC address of the original packet is preserved. Normally, NICs will simply discard packets with an irrelevant destination MAC address, but we can force the NIC to accept these packets by enabling “promiscuous mode”. Additionally, the operating system’s networking stack will try to decode the IP and TCP/UDP headers on these sniffed packets to route them to the correct application. This won’t work, of course, since the sniffed packets are typically intended for other computers on the network.
We want our network analysis software to configure the NIC in promiscuous mode and to receive all incoming packets, regardless of their IP and layer 4 headers. To accomplish this, we can use Linux’s packet sockets3. You’ll find plenty of filtering options in the man pages, but I’m using SOCK_RAW and ETH_P_ALL, which includes the ethernet header and does not filter by layer 3 protocol. Additionally, I add a PACKET_MR_PROMISC membership to the socket for interface index 2 (eth0 on my NUC), which enables promiscuous mode. These options typically require root privileges or at least special network capabilities, so you may need to elevate your privileges for this to work. Now that you’ve set up your socket, you can start calling recvfrom to grab frames.
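Here’s a minimal sketch of what that setup looks like in Go, using the golang.org/x/sys/unix package (the interface index of 2 matches eth0 on my NUC; treat it as a placeholder and look up your own with ip link):

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// htons converts a value to the network byte order expected by AF_PACKET.
func htons(v uint16) uint16 {
	return v<<8 | v>>8
}

func main() {
	// SOCK_RAW + ETH_P_ALL: deliver every frame, ethernet header included,
	// without filtering by layer 3 protocol.
	fd, err := unix.Socket(unix.AF_PACKET, unix.SOCK_RAW, int(htons(unix.ETH_P_ALL)))
	if err != nil {
		panic(err) // needs root or CAP_NET_RAW
	}
	defer unix.Close(fd)

	// Bind the socket to the sniffing interface (index 2 is a placeholder).
	const ifindex = 2
	if err := unix.Bind(fd, &unix.SockaddrLinklayer{
		Protocol: htons(unix.ETH_P_ALL),
		Ifindex:  ifindex,
	}); err != nil {
		panic(err)
	}

	// Enable promiscuous mode so frames addressed to other MACs are kept.
	mreq := &unix.PacketMreq{Ifindex: ifindex, Type: unix.PACKET_MR_PROMISC}
	if err := unix.SetsockoptPacketMreq(fd, unix.SOL_PACKET, unix.PACKET_ADD_MEMBERSHIP, mreq); err != nil {
		panic(err)
	}

	// The simple receive path: one frame per recvfrom call.
	buf := make([]byte, 65536)
	for {
		n, _, err := unix.Recvfrom(fd, buf, 0)
		if err != nil {
			panic(err)
		}
		fmt.Printf("captured %d-byte frame\n", n)
	}
}
```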
Packet mmap
Fetching packets one at a time using recvfrom is perfectly fine, but it requires a context switch for each packet and may not be performant enough under heavy workloads. Linux provides a more efficient mechanism to use packet sockets called PACKET_MMAP, but it’s a bit tricky to set up, so feel free to skip this section and come back later. Essentially, packet mmap allows you to configure a ring buffer for sending and receiving packets on a packet socket. I’ll only focus on receiving packets, since we’re creating a network traffic analyzer. When packets arrive on the socket, the kernel will write them directly to userspace memory and set a status flag to indicate that each frame is ready. This allows us to receive multiple packets without a context switch.
To set up packet mmap, I allocated a 128 MiB ring buffer for receiving packets. This buffer was divided into frames of 2048 bytes each, to accommodate ethernet frames of approximately 1500 bytes. I then registered the packet socket with an epoll instance to notify me when new packets were available. A dedicated goroutine calls epoll_wait in a loop and reads frames from the ring buffer. It decodes the embedded timestamp, copies the bytes to a frame buffer, and resets the status flag. Once 128 frames are received, the batch of frames is sent to a Go channel to be decoded and analyzed by the pool of analysis goroutines.
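For reference, here’s a rough sketch of how the receive ring can be requested with golang.org/x/sys/unix, using the same numbers as above (this is not my exact code, and it assumes the packet socket from the previous section is already bound):

```go
// configureRxRing asks the kernel for a PACKET_MMAP receive ring on an
// already-bound AF_PACKET socket and maps it into userspace. The sizes
// mirror the ones above: 2048-byte frames, 128 MiB in total.
func configureRxRing(fd int) ([]byte, error) {
	const (
		frameSize = 2048
		blockSize = 1 << 22                           // 4 MiB per block (page-aligned, multiple of frameSize)
		blockNr   = 32                                // 32 * 4 MiB = 128 MiB
		frameNr   = blockNr * (blockSize / frameSize) // total frame slots in the ring
	)

	req := &unix.TpacketReq{
		Block_size: blockSize,
		Block_nr:   blockNr,
		Frame_size: frameSize,
		Frame_nr:   frameNr,
	}
	if err := unix.SetsockoptTpacketReq(fd, unix.SOL_PACKET, unix.PACKET_RX_RING, req); err != nil {
		return nil, err
	}

	// The kernel writes incoming frames directly into this mapping and flips
	// each slot's status from TP_STATUS_KERNEL to TP_STATUS_USER; the reader
	// resets the flag after copying the frame out.
	return unix.Mmap(fd, 0, blockSize*blockNr, unix.PROT_READ|unix.PROT_WRITE, unix.MAP_SHARED)
}
```

With the default TPACKET_V1 layout, each slot starts with a tpacket_hdr whose fields include the status flag, the original and captured lengths, a timestamp, and the offset of the frame bytes within the slot.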
I added a few more features to improve the robustness and performance of my sniffer goroutine:
- Packet timestamps are compared to the wall clock. If the drift is too high, then I ignore the packet timestamp.
- Frame buffers are recycled and reused.
- The loop sleeps for a few milliseconds after each iteration. The duration of the sleep is based on the number of non-empty frames received. The sleep duration is adjusted with AIMD (up to 100ms) and targets an optimal “emptiness” of 1 in 64 (see the sketch after this list). This helps reduce wasted CPU cycles while ensuring the buffer never becomes full4.
- Every hour, I use the PACKET_STATISTICS feature to collect statistics and emit a warning message if there were any dropped frames.
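To illustrate the AIMD sleep adjustment mentioned above, here’s a simplified sketch that only needs the standard time package (the step size and halving factor are assumptions, not my exact tuning):

```go
// adjustSleep tunes the sniffer's inter-iteration sleep with an AIMD rule.
// emptyFrames and totalFrames describe the ring slots examined in the last
// pass. Assumed parameters: +1ms additive step, halving on decrease, 100ms
// cap, and a target "emptiness" of 1 in 64.
func adjustSleep(current time.Duration, emptyFrames, totalFrames int) time.Duration {
	const (
		step     = time.Millisecond
		maxSleep = 100 * time.Millisecond
		target   = 64 // aim for roughly 1 empty slot per 64 examined
	)
	if totalFrames == 0 {
		return current
	}
	if emptyFrames*target > totalFrames {
		// Emptier than the target: we're polling too eagerly, so back off
		// gently (additive increase).
		current += step
		if current > maxSleep {
			current = maxSleep
		}
	} else {
		// Fuller than the target: traffic is picking up, so react quickly
		// (multiplicative decrease) to keep the ring from filling.
		current /= 2
	}
	return current
}
```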
One final note about packet sniffing: my NIC automatically performs TCP segmentation offload, which combines multiple TCP packets into a single frame based on their sequence numbers. Normally, this reduces CPU usage in the kernel with no drawback, but these combined frames easily exceed my 2048 byte frame buffers. It also interferes with the accurate counting of packets. So, I use ethtool to turn off offload features before my network analysis software starts.
Decode
At this point, you have a stream of ethernet frames along with their timestamps. To do anything useful, we need to decode them. There are lots of guides about decoding network protocols, so I won’t go into too much detail here. My own network analysis software supports only the protocols I care about: ethernet, IPv4, IPv6, TCP, UDP, and ICMPv6. From each frame, I build a Packet struct containing information like MAC addresses, ethertype, IP addresses, port numbers, TCP fields, and payload sizes at each layer. The trickiest part of this was decoding the ethernet header. Wikipedia will tell you about the different Ethernet header formats, but in practice, I observed only 802.2 LLC for the spanning tree protocol and Ethernet II for everything else. I also added support for skipping 802.1Q VLAN tags, even though I don’t currently use them on my network. Another lesson was that the IPv6 header doesn’t tell you where the layer 4 header starts, but instead only gives you the type of the next header. To properly find the start of the payload, you’ll need to decode each successive IPv6 extension header to extract its next header and header length fields.
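To make that last point concrete, here’s roughly what the extension header walk looks like (a simplified sketch that only needs the standard errors package; it handles the common hop-by-hop, routing, and destination-options headers and punts on fragments):

```go
// ipv6Payload walks the IPv6 extension header chain and returns the
// upper-layer protocol number plus the offset of its payload within pkt,
// where pkt starts at the IPv6 header. This sketch only handles hop-by-hop
// (0), routing (43), and destination options (60); fragment and ESP headers
// need extra care.
func ipv6Payload(pkt []byte) (nextHeader byte, offset int, err error) {
	if len(pkt) < 40 {
		return 0, 0, errors.New("truncated IPv6 header")
	}
	nextHeader = pkt[6] // "next header" field of the fixed IPv6 header
	offset = 40         // the fixed header is always 40 bytes

	for nextHeader == 0 || nextHeader == 43 || nextHeader == 60 {
		if len(pkt) < offset+2 {
			return 0, 0, errors.New("truncated extension header")
		}
		// Each of these extension headers starts with its own "next header"
		// byte and a length expressed in 8-byte units, excluding the first 8.
		next := pkt[offset]
		hdrLen := 8 * (int(pkt[offset+1]) + 1)
		offset += hdrLen
		nextHeader = next
	}
	return nextHeader, offset, nil
}
```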
Local addresses
Since I’m analyzing internet traffic, I want to be able to categorize inbound versus outbound WAN packets. Both transmitted and received frames are combined into a single stream when mirroring a switch port, so I built heuristics to categorize a packet as inbound or outbound based on its source and destination IP addresses. For IPv4 packets, all WAN traffic must have either a source or destination IP address equal to the public IPv4 address assigned by my ISP. There are protocols like UPnP that allow a local device to discover the public IPv4 address, but the simplest solution was to just look it up using my existing DDNS record, so I went with that. For IPv6, I use the global unicast addresses configured on the network interfaces of the network traffic analyzer machine itself. The interface uses a /64 mask, but I expand this automatically to /60 since I request extra subnets via the IPv6 Prefix Length Hint from my ISP for my guest Wi-Fi network. Classifying inbound versus outbound IPv6 packets is as simple as checking whether the source IP or destination IP falls within my local IPv6 subnets5.
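A sketch of that classification, using the standard library’s net/netip package (the addresses are documentation-range placeholders, not my real ones):

```go
// wanV4 is the public IPv4 address learned from my DDNS record, and localV6
// holds the expanded /60 prefixes derived from the analyzer's own global
// unicast addresses. The values below are placeholders.
var (
	wanV4   = netip.MustParseAddr("203.0.113.7")
	localV6 = []netip.Prefix{netip.MustParsePrefix("2001:db8:1234:50::/60")}
)

// direction classifies a packet as outbound (true) or inbound (false) WAN
// traffic; ok is false if the packet doesn't look like WAN traffic at all.
func direction(src, dst netip.Addr) (outbound bool, ok bool) {
	switch {
	case src.Is4() && src == wanV4:
		return true, true // sent from my public IPv4 address
	case dst.Is4() && dst == wanV4:
		return false, true // addressed to my public IPv4 address
	}
	for _, p := range localV6 {
		if p.Contains(src) {
			return true, true // source is inside one of my IPv6 subnets
		}
		if p.Contains(dst) {
			return false, true
		}
	}
	return false, false
}
```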
Duplicate frames
One disadvantage of using the same ethernet port for management and for traffic sniffing is that you end up with duplicate frames for traffic attributed to the analyzer machine itself. For outbound packets, my sniffer observes the frame once when it leaves the NIC and again when it gets mirrored as it’s sent on the WAN port. For inbound packets, my router forwards two copies of the packet that are nearly identical, except for their MAC addresses and the time-to-live field. To avoid double counting this traffic, I skip all IPv4 packets where the source or destination IP address is an address that’s assigned to the analysis machine itself. Since all of my WAN IPv4 traffic uses network address translation, I only need to count the copy of the packet that uses the public IPv4 address. For IPv6, I deduplicate traffic by keeping a short-term cache of eligible packets and discarding nearly identical copies. When evaluating possible duplicates, I simply ignore the ethernet header and zero out the TTL field. A 1-second expiration time prevents this deduplication cache from consuming too much memory.
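The cache key is essentially a hash of the packet with the ethernet header stripped and the hop limit zeroed. Here’s a rough sketch using crypto/sha256, time, and a plain map (my real implementation differs, and expired entries get purged separately):

```go
// dedupKey builds a fingerprint for an IPv6 packet that ignores the fields
// differing between the two captured copies: the ethernet header (already
// stripped by the caller) and the hop limit. ipv6 points at the IPv6 header.
func dedupKey(ipv6 []byte) [32]byte {
	cp := make([]byte, len(ipv6))
	copy(cp, ipv6)
	if len(cp) > 7 {
		cp[7] = 0 // zero the hop limit byte before hashing
	}
	return sha256.Sum256(cp)
}

// seenRecently reports whether an identical packet was observed within the
// last second, and records this one. seen maps fingerprints to capture times.
func seenRecently(seen map[[32]byte]time.Time, key [32]byte, now time.Time) bool {
	if t, ok := seen[key]; ok && now.Sub(t) < time.Second {
		return true
	}
	seen[key] = now // entries older than 1 second are purged elsewhere
	return false
}
```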
Neighbor discovery
My network traffic analyzer mostly ignores the source and destination MAC address on captured packets. Since it’s all WAN traffic, the MAC addresses basically just show my router’s WAN port communicating with my ISP’s cable modem termination system. As a result, you can’t tell which of my devices was responsible for the traffic by looking at only the ethernet header. However, you can use the Neighbor Discovery Protocol (NDP) to attribute a MAC address to an IPv6 address, which accomplishes almost the same thing.
For every local IPv6 address observed by my network traffic analysis software, I attempt to probe for its MAC address. I open a UDP socket to the address’s discard port (9) and send a dummy payload. Since the address is local, Linux sends out an NDP neighbor solicitation to obtain the MAC address. If the device responds, then the neighbor advertisement is captured by my packet sniffer and gets decoded. Over time, I built up a catalog of IPv6 and MAC address pairs, which allows me to annotate IPv6 packets with the MAC address of the corresponding device. This doesn’t work for IPv4, of course, but I’m able to achieve a similar effect using static DHCP leases and allocating NAT port ranges for those static leases. This allows me to infer a MAC address for IPv4 packets based on their source port number.
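The probe itself is tiny with the standard net and net/netip packages; the datagram doesn’t need to be answered, it just has to force the kernel to resolve the neighbor:

```go
// probeNeighbor sends a throwaway UDP datagram to the discard port (9) of a
// local IPv6 address. The payload never needs to arrive; the point is that
// the kernel must first resolve the address via an NDP neighbor solicitation,
// and the resulting neighbor advertisement is picked up by the sniffer.
func probeNeighbor(addr netip.Addr) error {
	conn, err := net.Dial("udp", netip.AddrPortFrom(addr, 9).String())
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write([]byte("ndp-probe")) // dummy payload
	return err
}
```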
Flow statistics
At this point, you have a stream of decoded packets ready for analysis. I classify each IP packet into a flow based on its IP version, IP protocol, IP addresses, and port numbers (commonly called a “5-tuple”, even though the IP version counts as a 6th member). For each flow, I track the number of transmitted and received packets, along with the cumulative size at layers 3, 4, and 7, the timestamps of the first and last packet, and the 5-tuple fields themselves. Each flow is assigned a UUID and persisted to a PostgreSQL database once a minute. To save memory, I purge flows from the flow tracker if they’re idle. The idle timeout depends on a few factors, including whether the flow is closed (at least one RST or FIN packet), whether the flow was unsolicited, and whether the flow used UDP. Regular bi-directional TCP flows are purged after 15 minutes of inactivity, but if the flow tracker starts approaching its maximum flow limit, it can use special “emergency” thresholds to purge more aggressively.
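For illustration, the flow key and per-flow counters look roughly like this (field names are mine, and the UUID type assumes the github.com/google/uuid package):

```go
// FlowKey identifies a flow: the classic 5-tuple plus the IP version.
// Because every field is comparable, it works directly as a Go map key.
type FlowKey struct {
	IPVersion  uint8 // 4 or 6
	IPProtocol uint8 // e.g. 6 for TCP, 17 for UDP
	SrcIP      netip.Addr
	DstIP      netip.Addr
	SrcPort    uint16
	DstPort    uint16
}

// FlowStats accumulates the per-flow counters that get flushed to PostgreSQL
// once a minute.
type FlowStats struct {
	ID                   uuid.UUID // assigned when the flow is first seen
	TxPackets, RxPackets uint64
	TxBytesL3, RxBytesL3 uint64 // similar counters exist for layers 4 and 7
	FirstSeen, LastSeen  time.Time
}
```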
I also track metrics about network throughput per minute. There’s not much worth saying about that, other than the fact that packets can arrive at the analyzer out of order due to my sniffer’s batch processing. Throughput metrics are also persisted to PostgreSQL once a minute, and they allow me to graph historical bandwidth usage and display realtime throughput on a web interface.
TCP reconstruction
The next step for traffic analysis is to reconstruct entire TCP flows. Since most traffic is encrypted, I’m only really interested in the preamble immediately following connection establishment, which I defined as the first 32 packets of any TCP flow. My TCP tracker keeps track of up to 32 transmitted and received TCP payloads per flow. If all the conditions are met (SYN observed, sequence number established, and non-empty payload in both directions), then I freeze that flow and begin merging the packets. I start by sorting the packets based on sequence number. In theory, TCP packets could specify overlapping ranges (even with conflicting data), but this almost never happens in practice. My TCP reconstruction assumes the best case and generates a copy offset and copy length for each packet, ignoring overlapping portions. Special consideration is required for sequence numbers that overflow the maximum value for a 32-bit integer. Instead of comparing numbers directly, I subtract them and compare their value with the 32-bit midpoint (0x80000000) to determine their relative order. If there are any gaps in the preamble, then I truncate the stream at the beginning of the gap to ensure I’m only analyzing a true prefix of the payload.
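That wraparound-safe comparison boils down to serial number arithmetic, something like this sketch:

```go
// seqBefore reports whether TCP sequence number a comes before b, treating
// the 32-bit sequence space as circular. Subtracting in uint32 arithmetic
// and comparing against the midpoint handles wraparound: if b-a is nonzero
// and less than 0x80000000, then a is at most half the space behind b.
func seqBefore(a, b uint32) bool {
	return b-a != 0 && b-a < 0x80000000
}
```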
Once the TCP flow is reconstructed, I pass the transmitted and received byte payloads to a layer 7 analyzer, chosen based on the port number. Currently, I’ve only implemented analyzers for HTTP (port 80) and TLS (port 443). The HTTP analyzer just decodes the client request and extracts the HTTP host header. The TLS analyzer is described in the next section.
TLS handshake
HTTPS represents a majority of my network traffic, so it seemed worthwhile to be able to decode TLS handshakes. A TLS stream consists of multiple TLS records, which are prefixed with their type, version, and length. My TLS decoder starts by decoding as many TLS records as permitted by the captured bytes, but it stops once it encounters a Change Cipher Spec record since all the subsequent records are encrypted. With any luck, the stream will begin with one or more TLS handshake records, which can be combined to form the TLS handshake messages. I’ve currently only implemented support for decoding the TLS ClientHello, which contains most of the things I care about, such as the protocol version, cipher suites, and most importantly the TLS extensions. From the TLS extensions, I can extract the hostname (server name indication extension) and the HTTP version (application-layer protocol negotiation extension).
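Record framing is simple: each record is a 1-byte content type, a 2-byte version, and a 2-byte big-endian length, followed by that many payload bytes. A sketch of the record loop using encoding/binary (simplified from what I actually run):

```go
// tlsRecords splits a reconstructed TCP preamble into TLS record payloads,
// keeping only handshake data. It stops at the first ChangeCipherSpec (20),
// since everything after it is encrypted, and at the first truncated record.
func tlsRecords(stream []byte) (handshake []byte) {
	for len(stream) >= 5 {
		contentType := stream[0]
		length := int(binary.BigEndian.Uint16(stream[3:5]))
		if len(stream) < 5+length {
			break // record extends past the captured preamble
		}
		if contentType == 20 {
			break // ChangeCipherSpec: subsequent records are encrypted
		}
		if contentType == 22 {
			// Handshake messages may be split across TLS records, so their
			// payloads are concatenated before parsing the ClientHello.
			handshake = append(handshake, stream[5:5+length]...)
		}
		stream = stream[5+length:]
	}
	return handshake
}
```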
I’m currently not doing very much with this information, other than annotating my flow statistics with the HTTP hostname (extracted either from the Host header or from TLS SNI). In the future, I’d like to support QUIC as well, but the protocol seems substantially harder to decode than TLS is.
Closing thoughts
I think I have a much stronger grasp on network protocols after finishing this project. There are a lot of subtleties with the protocols we rely on every day that aren’t obvious until you need to decode them from raw bytes. During development, I decoded a lot of packets by hand to confirm my understanding. That’s how I learned about GREASE, which I initially assumed was my own mistake. Now, I regularly check my network traffic analyzer’s database to debug network issues. I’ve also been able to double check my ISP’s monthly bandwidth cap dashboard using my own collected data. It’s also a really convenient place to run tcpdump as root, just to check my understanding or to test out experiments. If you’re interested in computer networking, I encourage you to try this out too.
- Not a lawyer. Not legal advice. ↩︎
- Since I’m only monitoring WAN traffic, a gigabit port is sufficient. If I monitored LAN traffic too, then the amount of mirrored traffic could easily exceed the line rate of a single gigabit port. ↩︎
- The raw socket API also provided this functionality, but it only supported IPv4 and not IPv6. However, you’ll still commonly hear the term “raw socket” used to describe this general technique. ↩︎
- At maximum speed, my internet connection takes about 3.6 seconds to fill a 128MiB buffer. ↩︎
- This typically works, but I occasionally spot packets with IP addresses owned by mobile broadband ISPs (T-Mobile and Verizon) from my smartphones being routed on my Wi-Fi network for some reason. ↩︎