Performance tests for slow networks with tc
tl;dr: you can easily replicate slow network conditions on your own machine, which makes it easier to performance test against them.
Smartphones have made mobile websites, and apps that have to connect on the move, a lot more common. There’s also a lot of value to be had in serving these kinds of customers, and in recent years we’ve arrived at the stage where nearly every company has to have a mobile strategy. There are lots of challenges to overcome in this space, but the one I’m going to talk briefly about is performance testing.
Performance Methodology
Many people have different methodologies when it comes to identifying performance problems, but nearly all involve some concept of iterative improvement. You first set a performance target, such as handling a number of concurrent connections or a latency limit on a page load. Then you implement a test which tells you whether you’re meeting that requirement. Perhaps this uses something like ApacheBench or JMeter in order to load test the system. You can then optimise and tune your setup and code until you meet your target.
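As a rough illustration of the "set a target, then test against it" loop, here is a minimal sketch in shell. The helper name within_target, the 2000ms target and the localhost:8080 URL are all my own assumptions for the example, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a measured time in seconds (the format
# curl prints for %{time_total}) is within a target given in milliseconds.
within_target() {
    awk -v t="$1" -v limit_ms="$2" 'BEGIN { exit !(t * 1000 <= limit_ms) }'
}

# Taking a real measurement would look something like this
# (assumes a server is listening on localhost:8080):
#   t=$(curl -s -o /dev/null -w '%{time_total}' http://localhost:8080/)
#   within_target "$t" 2000 && echo "PASS" || echo "FAIL"

# Self-contained demonstration with a made-up measurement of 0.35 seconds:
within_target 0.35 2000 && echo "PASS"
```

A single curl timing is a very crude test; a load testing tool gives you far better numbers, but the pass/fail idea is the same.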
Of course some network connections can be incredibly poor – how realistic is testing a website page load time on a 1Gbit connection when your poor user is stuck in rural Wales on 2G? At the same time it can be quite time consuming to replicate slow connections on your phone. What you really want is a command to slow down a regular network connection so you can replicate the problem, but without too much hassle.
Replicating a Slow Network with tc
Whilst there are doubtless other ways, one that I’ve used successfully before is the Linux command-line program tc, which stands for traffic control. This lets you take a regular network interface and introduce additional latency or reduce the available bandwidth. My aim here isn’t to write a full tutorial on tc (its man page covers the details), but I’ll go through the basics of adding a couple of rules to your machine in order to simulate simple problems. In tc the basic building block for these rules is called a Queuing Discipline, or qdisc.
List Rules
If at any point you want to see all the qdiscs then you can use the ls command. Here is an example for the lo interface:
tc -s qdisc ls dev lo
It’s pretty cryptic I’ll admit, but you can read the arguments to tc as saying “list all qdiscs on device lo”, with -s adding statistics to the output. A common component of each of these examples is specifying the device. Before you add any rules there is little to see beyond the interface’s default qdisc, but you will see more entries once you’ve added some rules, which we’ll do next. I’m using lo, the loopback device, for all these examples, but you might want to use eth0, wlan0 or whatever is the appropriate device that you’re testing against.
Limiting bandwidth
First of all we substitute a new queuing discipline for the one on the loopback device.
tc qdisc add dev lo handle 1: root htb
Next I want to limit lo to only 100kbps, which I can do by adding a class with the rate parameter:
tc class add dev lo parent 1: classid 1:11 htb rate 100kbps
We can break down this command into different parts, which should hopefully make it a little less opaque. class add tells tc that we’re adding a new class. dev lo specifies that we’re using the loopback device. Because tc is designed to allow complex setups to be built up, classes form a tree and each one has an identifier; this is what parent 1: classid 1:11 refers to: the new class hangs off the root qdisc, which we gave the handle 1: in the previous command, and is itself identified as 1:11. htb is a Hierarchical Token Bucket, which is a simple method of controlling outbound traffic on a network device. Finally rate 100kbps sets the maximum rate that we want traffic to go at. You should take care with the units here: in tc, kbps means kilobytes per second, while kilobits per second is written kbit, so 100kbps is the same as 800kbit.
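Since the unit suffixes are easy to mix up, here is a tiny sketch of the arithmetic. The function name kbps_to_kbit is just something I made up for illustration; the only fact it relies on is that a byte is eight bits.

```shell
#!/bin/sh
# tc rate suffixes: "kbps" means kilobytes per second and "kbit" means
# kilobits per second. One byte is 8 bits, so multiply by 8.
# kbps_to_kbit is a hypothetical helper, not part of tc.
kbps_to_kbit() {
    echo "$(( $1 * 8 ))kbit"
}

kbps_to_kbit 100   # prints 800kbit
```

So if what you actually wanted was 100 kilobits per second, you would write rate 100kbit instead.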
Introducing Latency
If I want to add a qdisc that introduces 300ms of lag to lo then I can use the netem qdisc with its delay option:
tc qdisc add dev lo parent 1:11 netem delay 300ms
Breaking down this command we see that this time we’re adding a qdisc instead of a class, so we use qdisc add. We’re again using loopback as the device, so we use dev lo. We specify the previous class as our parent in the tree, by using parent 1:11. If you look back to the previous command you’ll see that we used classid 1:11 to set its identifier and this is what the 1:11 refers to. netem is the network emulator qdisc, and delay 300ms tells it to add 300ms of lag to each packet.
Limiting Ports
Of course you might have other connections talking over the same network device, and limiting them all to 3G speeds isn’t going to make for a reliable speed test. So we’re going to add a filter to make sure that these rules only apply to port 8080. A filter decides which packets are sent to which class, so the rules attached to that class only apply under certain conditions.
tc filter add dev lo protocol ip prio 1 u32 match ip dport 8080 0xffff flowid 1:11
As you should be familiar with by now, we’re using filter add to add a filter and dev lo to specify the device it applies to. protocol ip does what it says on the tin: specifies the ip protocol. prio 1 specifies the priority of this filter. Priorities determine the order in which filters are tried when several are attached, with lower numbers tried first.
u32 refers to the type of filter rule we’re using. match ip dport 8080 tells tc to match packets whose destination port is 8080. 0xffff is a bitmask applied to the port field before the comparison; a port is 16 bits wide, so 0xffff means every bit of it has to match. If you have multiple classes then tc needs to know which one the matching packets belong to, so flowid 1:11 specifies the classid we created in the “Limiting bandwidth” section.
It’s a bit of a mouthful I know, but being able to restrict by port can be pretty handy.
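If you find yourself typing these commands a lot, they can be wrapped up in a small script. This is only a sketch: the function name slow_port and the TC variable are my own inventions, and by default it just prints the commands rather than running them, so that you can check them first.

```shell
#!/bin/sh
# Sketch: the rules from this post in one place.
# TC defaults to "echo tc", so the commands are only printed.
# Set TC=tc and run as root to apply them for real.
TC="${TC:-echo tc}"

# slow_port DEVICE RATE DELAY PORT (hypothetical helper)
slow_port() {
    dev="$1"; rate="$2"; delay="$3"; port="$4"
    # Root qdisc, then a rate-limited class beneath it.
    $TC qdisc add dev "$dev" handle 1: root htb
    $TC class add dev "$dev" parent 1: classid 1:11 htb rate "$rate"
    # Add latency to anything that ends up in class 1:11.
    $TC qdisc add dev "$dev" parent 1:11 netem delay "$delay"
    # Send traffic with the given destination port to class 1:11.
    $TC filter add dev "$dev" protocol ip prio 1 u32 match ip dport "$port" 0xffff flowid 1:11
}

slow_port lo 100kbps 300ms 8080
```

Run as it stands, it prints the four tc commands used above. Run it with TC=tc as root and it applies them.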
Returning to normality
Once you’re finished benchmarking you obviously want to return things to normal. To remove the restrictions you’ve just imposed on the interface lo you can delete the root qdisc with del, which takes the class, the netem qdisc and the filter beneath it along too.
tc qdisc del dev lo root
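It’s easy to forget this step and then wonder why everything on port 8080 is slow the next day. One way to guard against that is a wrapper which runs your benchmark and then always removes the rules. Again this is a sketch under my own assumptions: run_then_reset and the TC variable are made-up names, and by default the tc command is only printed.

```shell
#!/bin/sh
# TC defaults to "echo tc" so nothing is changed; set TC=tc (as root)
# to really delete the root qdisc.
TC="${TC:-echo tc}"

# run_then_reset DEVICE COMMAND [ARGS...] (hypothetical helper)
# Runs the command, removes the root qdisc from the device, and returns
# the command's own exit status.
run_then_reset() {
    dev="$1"; shift
    "$@"
    status=$?
    $TC qdisc del dev "$dev" root
    return "$status"
}

run_then_reset lo echo "benchmark goes here"
```

In real use you would replace the echo with your load test command.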
Concluding Remarks
This approach can be pretty helpful in testing out whether a specific change you’ve made really works, without the time-consuming step of replicating it on real devices. Once you are confident that your optimisations have worked you should still test on real hardware to make sure. There are a variety of differences between a real slow network and the changes you’ve just made to your own network device, and you should validate that your changes work in the wild, not just in the lab.
tc also has loads of other useful features which are documented all over the internet.
Thanks to John Oliver for pointing out this technique and, with Perry Lorier, reviewing early drafts of this blog post.