Understanding subnet masks

I have a fairly broad understanding of networking and how the Internet works. However, for some reason, I had never fully understood exactly what a subnet mask is. Why are IP addresses sometimes written with a slash and another number (10.0.0.1/24) at the end of them? Why do you see something like “subnet mask: 255.255.255.0” in your network configuration settings? I knew that both had something to do with subnet masks, but not much beyond that. To learn more, I decided to jot down some notes on the subject. This post assumes some basic understanding of IP addressing and the Internet, although I’ll try to fill in as much detail as I can. It turns out that, to properly understand subnet masks and the context for their existence, you need to learn a surprising amount about the architecture and evolution of the Internet.

The IP addressing system

First, the IP (Internet Protocol) addressing system. IP addresses are a solution to a core design problem that arises when sending messages across any type of network: how do you efficiently ensure that every message arrives at the correct destination? To solve this problem in the real world, many countries use a system of physical street addresses. Everybody puts their street number above their door, and the government assigns and keeps track of street names (fortunately, those don’t change often). The organizational structure of the Post Office then ensures that every letter makes it to the right mailbox. Within that structure, each individual Post Office location independently applies a set of rules for sorting and passing on letters and packages. By independently, I mean that they don’t consult some central authority for instructions each time a letter is dropped off. Although there are similarities, the IP system looks a little different from the postal system, largely because solving the addressing problem on the Internet is harder. The number of connected devices is many orders of magnitude larger, intermediate routers are controlled by many different entities, computers connect to and disconnect from the Internet much more frequently than people change their street addresses, and the volume of messages, along with the speed at which they must be handled, is far greater.

IPv4, the first “production” version of IP addressing, was initially described together with TCP in a 1974 paper and deployed on the Internet predecessor ARPANET in 1983. Each IPv4 address is a series of 4 bytes (thus 32 bits). This means that, despite any appearances to the contrary, an IP address is just an integer with a value between 0 and 2^32 − 1 – the typical “dotted quad” notation’s purpose is to make an address easily human readable. But assigning a unique identifier to every device on the Internet isn’t sufficient to solve the message routing problem! The network needs some sort of overall structure or “topology”; otherwise, designing a system for efficiently routing messages becomes quite difficult. Going back to our mail analogy, imagine if the Post Office had to deliver mail by looking up every address in a giant atlas or table, rather than relying on the fact that each ZIP code covers a contiguous area. Maintaining that atlas would be an enormous amount of work, and looking up each address would take a long time. And of course, with the Internet, computers join and leave the network very frequently, so the “atlas” would be constantly out of date.
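
To make this concrete, here’s a minimal Python sketch (using the standard ipaddress module; the address is an arbitrary example) showing that the dotted quad is just a human-friendly rendering of a single 32-bit integer:

```python
# A dotted quad is just a 32-bit integer in disguise.
import ipaddress

addr = ipaddress.IPv4Address("10.0.0.1")
print(int(addr))                         # 167772161 -- the address as one integer
print(ipaddress.IPv4Address(167772161))  # back to 10.0.0.1

# Doing the conversion by hand: each byte contributes 8 bits of the integer.
octets = [10, 0, 0, 1]
value = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]
assert value == int(addr)
```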

To create this organization, IP addresses needed to be assigned to users in a systematic way that reflected the network architecture, rather than at random. Initially, contiguous ranges of IP addresses, referred to as “blocks”, were assigned to participating institutions such as universities. The institution associated with a given IP was identified by the first byte, known as the “network byte”. For example, MIT was given the block of addresses starting with decimal 18 (18.0.0.0 through 18.255.255.255). This arrangement made the logic for routing messages straightforward. Every packet with a destination address starting with “18” could be relayed in the direction of MIT’s network. MIT would assign each machine on their internal network its own unique address using the last three bytes, known as the “host bytes”, and route the packets accordingly. Unfortunately, this system only allowed for 256 blocks of addresses (one per possible value of the first byte), which was soon revealed to be insufficient. Since many institutions didn’t need a whole block (~16.8 million) of addresses, the system by which addresses were assigned was amended to allow three different sizes of blocks. Class A blocks were like the one given to MIT (1 network byte, 3 host bytes), Class B blocks had 2 network bytes and 2 host bytes, and Class C blocks had 3 network bytes and 1 host byte. Suddenly, there were enough blocks of addresses to accommodate many more institutions.
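
The arithmetic behind those class sizes is simple enough to sketch in a few lines of Python. The table below just encodes the byte splits described above, not any real allocation data:

```python
# Block sizes under the classful scheme: the class fixes how many bytes
# identify the network versus the host.
CLASSES = {
    "A": (1, 3),  # 1 network byte, 3 host bytes (e.g. MIT's 18.0.0.0 block)
    "B": (2, 2),
    "C": (3, 1),
}

for name, (net_bytes, host_bytes) in CLASSES.items():
    print(f"Class {name}: {net_bytes} network byte(s), "
          f"{256 ** host_bytes:,} addresses per block")
# Class A: 1 network byte(s), 16,777,216 addresses per block
# Class B: 2 network byte(s), 65,536 addresses per block
# Class C: 3 network byte(s), 256 addresses per block
```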

Classless Interdomain Routing

This arrangement worked for most of the 1980s, but problems began to emerge. One such issue: in order to know where to send packets, routers maintain what are known as routing tables. These tables are updated dynamically by routers communicating with each other using specialized routing protocols. Routing tables specify information like “if you see a packet destined for an address starting with 18, send it to this address next.” Presumably, this address, often referred to as the next “hop”, is part of the shortest path to the MIT network. With the introduction of classful block assignment, the number of sub-networks connected to the Internet grew rapidly, and the size of routing tables grew proportionally as a result. The storage requirements for these tables threatened to overwhelm the capacity of router hardware at the time.
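
As a toy illustration, a classful-era routing table could be as simple as a lookup from the network byte to a next hop. The Python below is only a sketch: the gateway names and the second entry are invented, and real routing tables are far more sophisticated:

```python
# A toy classful-era routing table, keyed on the network byte.
routing_table = {
    18: "gateway-towards-mit",  # MIT's block, as in the text
    99: "gateway-b",            # a made-up second network
}

def next_hop(dest_ip: str) -> str:
    network_byte = int(dest_ip.split(".")[0])  # first byte identifies the network
    return routing_table.get(network_byte, "default-gateway")

print(next_hop("18.165.1.1"))  # gateway-towards-mit
print(next_hop("99.0.0.7"))    # gateway-b
```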

A solution to this problem, known as CIDR (Classless Interdomain Routing), was released in 1993. It’s important to mention again that there is no inherent difference between the network and host portions of an IP address – as I mentioned, an IP address is just a group of 32 bits. The dividing line between the network and host sections is always “in the eye of the beholder”. With classful routing, this dividing line was fixed, determined solely by the class of the destination network. As a result, every router on the Internet had the same key for that network in its routing table. In contrast, CIDR allows routers to maintain a flexible distinction between the host and network bits. How does that work? Let’s say a router receives a packet destined for address 123.123.123.123, which happens to be somewhere “far away” (requiring many hops) on the network. The routing table can treat, say, the first 4 bits as the network section, and store the same next hop for all addresses starting with those four bits. Of course, this rests on the assumption that all IP addresses starting with those bits are roughly grouped together on the network. This arrangement saves a lot of space in the routing table by combining “redundant” entries. By redundant entries, I mean that, if packets going to 123.121.111.111 and 123.123.123.123 are heading in the same general direction, the router doesn’t need to keep a separate line in the routing table for each of them. Under the old classful system (assuming they were both class B networks), two entries would have been required – one for addresses starting with 123.121 and one for 123.123. As a packet is passed on from hop to hop and gets closer to its destination, routers can treat the network section of the address as being larger in their routing tables.
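
Here’s a small sketch of that aggregation, again using Python’s standard ipaddress module. The 123.120.0.0/14 prefix is chosen because it’s the shortest prefix covering both example addresses from this paragraph; it’s an illustration, not a real route:

```python
# One short CIDR prefix can stand in for multiple classful entries.
import ipaddress

coarse_route = ipaddress.ip_network("123.120.0.0/14")  # a single table entry

for dest in ["123.121.111.111", "123.123.123.123"]:
    print(dest, "covered:", ipaddress.ip_address(dest) in coarse_route)

# Both lines print True: the single /14 entry replaces the two separate
# classful entries (for 123.121 and 123.123) mentioned above.
```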

Subnet masks (finally)

So, what do subnet masks have to do with all of this? Subnet masks are an implementation detail of the architecture I’ve described above. First, what is a bitmask in the general sense? Bitmasks are a way of allowing a computer to efficiently check whether a portion of one set of bytes matches another set of bytes. Let’s say we have the single byte 1101 1001, and we want to check whether the first four bits are actually 1101. First, we apply the mask 1111 0000 by bitwise ANDing the byte with the mask. We get 1101 0000 as the result. Now, we compare that result with the byte 1101 0000, and we’ve verified the first four bits.

When using a netmask, the set of input bytes is an IP address. Let’s say we’re a router in the early days of the Internet, we’ve received a packet for IP 18.165.1.1, and we want to use our routing table to know where it should go next. We know that we only care about the first byte, so we can apply the mask 11111111 00000000 00000000 00000000 (255.0.0.0) to get 18.0.0.0. Now, looking that up in our routing table, we see the entry matching 18.0.0.0, and we know the next hop for the packet. So, a netmask is just a CPU-efficient way of checking whether a section of an IP address matches some other address. A subnet mask is a use of a netmask to check whether a packet is bound for a particular subsection of the network. Of course, I could have just told you that at the beginning, but you wouldn’t have appreciated its significance without a little understanding of why we need to do that sort of checking in practice.
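
Both of those examples are easy to reproduce with Python’s bitwise operators; this is just the arithmetic from the paragraphs above, written out:

```python
# The single-byte example: & is a bitwise AND, 0b prefixes binary literals.
byte = 0b1101_1001
mask = 0b1111_0000
print(byte & mask == 0b1101_0000)  # True: the first four bits are 1101

# The same trick on a whole IPv4 address, treated as a 32-bit integer.
import ipaddress

addr = int(ipaddress.IPv4Address("18.165.1.1"))
netmask = int(ipaddress.IPv4Address("255.0.0.0"))
print(ipaddress.IPv4Address(addr & netmask))  # 18.0.0.0 -- the routing table key
```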

Assorted other details

What about the alternative notation I mentioned earlier, using /24? This notation is known as CIDR notation. Remember, with CIDR, we can have the network portion of an IP address be any number of bits. Writing out a netmask for some arbitrary number of bits is a little awkward in decimal dotted-quad notation. It’s much easier to just specify the length of the mask, aka the number of bits that are part of the network section. So, if we only want to match on the first four bits, we’d use “/4”. When a computer pairs an IP address and a subnet mask together, like 123.123.123.100/24, it’s saying, “from my perspective, this address has 24 bits (123.123.123) corresponding to the subnetwork it is destined for, and 8 bits (“100”) corresponding to the host it is destined for on that subnet.” This is exactly the same as specifying a subnet mask of 255.255.255.0 – three bytes of 11111111 and one byte of 00000000.
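
The two notations are interchangeable, as a quick Python check shows. The network address below is just the example from this paragraph, with the host bits zeroed out so the ipaddress module accepts it:

```python
# CIDR prefix lengths and dotted-quad masks are two spellings of the same thing.
import ipaddress

net = ipaddress.ip_network("123.123.123.0/24")
print(net.netmask)    # 255.255.255.0
print(net.prefixlen)  # 24

# Building the same mask by hand: 24 one-bits followed by 8 zero-bits.
mask = (0xFFFFFFFF << (32 - 24)) & 0xFFFFFFFF
print(ipaddress.IPv4Address(mask))  # 255.255.255.0
```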

So far, in this post I’ve only discussed IPv4. It’s important to note that there is a whole other implementation of IP, known as version 6 (IPv6). Aside from the other challenges I’ve mentioned, there simply aren’t enough IPv4 addresses to go around. Having ~4.3 billion possible addresses seemed like an inexhaustible supply in the early days of the Internet, but that proved to be another false assumption. The most obvious solution to this problem is to increase the number of addresses – IPv6, first standardized in 1995, changes the address format to be 128 bits (16 bytes), which allows for an address space of 2^128, more than enough addresses (~3.4 × 10^38 of them) for the foreseeable future. Unfortunately, updating a protocol without backwards compatibility is quite a challenge. Plenty of devices connected to the Internet have legacy software that is difficult to patch with IPv6 compatibility, making switching completely to IPv6 almost impossible at present. If you tried to only use IPv6, you would not be guaranteed interoperability with the entire Internet. Thus, IPv4 will probably remain the dominant addressing system in the near term.

The main solution to IPv4 address exhaustion has involved creating smaller private subnetworks behind a public router (a setup commonly implemented with network address translation, or NAT). This is most likely the way that you connect to the Internet at home. If you assign a single public IPv4 address to your router, and have the router create an internal network blocked off from the outside world, you can connect many more devices than the global 32-bit address space would otherwise allow. In this system, the main Internet is referred to as the WAN, or wide area network, and the subnetwork is known as the LAN, or local area network. Configuring these local networks involves another use of subnet masks. Most home routers have an internal network with an address range that’s something like 192.168.0.0 to 192.168.0.255. Because these addresses aren’t connected to the wider Internet, every router can use this same range without risk of confusion. It’s the router’s job to handle the distribution of incoming packets to the correct device. In the case I describe, the router is using a subnet mask of 255.255.255.0: everything matching 192.168.0 over the first three bytes is destined for some computer on the local network. If you have too many people over for dinner (or working for your company), you might run out of addresses in this range. One thing you could do is change the subnet mask to 255.255.0.0, giving you many more addresses to work with.
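
Here’s a sketch of the membership check a router in that configuration performs; the destination addresses are arbitrary examples:

```python
# Is this destination on my LAN, or does it need to go out to the WAN?
# 192.168.0.0/24 matches the example configuration above.
import ipaddress

lan = ipaddress.ip_network("192.168.0.0/24")  # subnet mask 255.255.255.0

for dest in ["192.168.0.42", "8.8.8.8"]:
    where = "local network" if ipaddress.ip_address(dest) in lan else "WAN"
    print(dest, "->", where)

# Widening the mask to 255.255.0.0 (a /16) would make the whole
# 192.168.x.x range local instead.
```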

To summarize

In conclusion, here are the key takeaways from my original question. First, a bitmask is a way for a computer to efficiently check whether part of a chunk of bytes matches some pattern. Second, a netmask is a bitmask designed to check whether part of a network address matches a specific pattern. Third, subnet masks are a category of netmasks, used to check whether a message is destined for a particular subsection of a network. In practice, the difference between a netmask and a subnet mask is almost always nonexistent, as netmasks are usually used as subnet masks. And finally, one (of many) applications of subnet masks is related to the routing of packets on the Internet, a system which has an interesting history behind it.

Thanks to Or Mattatia and Ahmad Jarara for their comments and suggestions.