In November 2015 we started to examine the possibility of our own Performance Enhancement and Response Team (PERT). The objective of PERT is to resolve, quickly and efficiently, network connection problems that involve multiple parties.
Controlled testing after a 20 Gbit/s DDoS attack
SURFcert, SURFnet's support team for security incidents, deals with external Distributed Denial of Service (DDoS) attacks on our institutions every day.
The University of Twente (UT) occasionally faces such incidents as well. Because the UT has a redundant 40 Gbit/s SURFinternet connection, built on a Cisco 6500 with the Supervisor 2T architecture, it can absorb a lot of attack traffic, up to 40 Gbit/s, without any action by SURFcert being required.
In early 2016, the UT faced an incoming DDoS attack of about 20 Gbit/s. This resulted in the loss of the BGP sessions and, briefly, of the internet connection as well. The UT was unable to explain this, since the platform should be able to handle 20 Gbit/s without any problems. The University of Twente therefore asked the PERT project team at SURF to help with controlled testing to locate any bottlenecks in the setup.
Sending traffic patterns of more than 20 Gbit/s is not something you can simply improvise. You need either a specialised device called a traffic generator, or a server with a sufficiently fast CPU and a 40GE or 100GE Network Interface Card (NIC).
The SURFnet testbed has OpenFlow-capable switches and a number of machines with 40GE and 100GE interfaces. Depending on the system architecture, it can be difficult to generate more than 10 Gbit/s of traffic with a standard operating system and regular drivers. Using the Intel DPDK libraries and drivers makes it possible to talk to the network interfaces directly and thus achieve a far higher transfer rate.
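To see why high-rate generation is hard with regular drivers, it helps to look at the per-packet budget. The sketch below (illustrative arithmetic only, not part of the test setup) computes how many packets per second a generator must produce at a given line rate and frame size:

```python
# Back-of-the-envelope packet rates for a traffic generator (illustrative).
# On the wire, every Ethernet frame carries 20 extra bytes of overhead
# (7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap).
WIRE_OVERHEAD = 20  # bytes

def packets_per_second(line_rate_gbps: float, frame_size: int) -> float:
    """Frames per second needed to saturate a link at a given frame size."""
    bits_per_frame = (frame_size + WIRE_OVERHEAD) * 8
    return line_rate_gbps * 1e9 / bits_per_frame

# 40 Gbit/s of 1,500-byte packets is about 3.3 Mpps ...
print(f"{packets_per_second(40, 1500) / 1e6:.1f} Mpps")

# ... but with minimum-size 64-byte frames it is about 59.5 Mpps,
# i.e. a per-packet time budget of well under 20 nanoseconds.
print(f"{packets_per_second(40, 64) / 1e6:.1f} Mpps")
```

At tens of millions of packets per second, the per-packet overhead of interrupts and kernel network-stack processing becomes the bottleneck, which is exactly what kernel-bypass frameworks such as DPDK avoid.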
As the UT has a redundant connection, it was possible to do the testing during the day. Regular traffic could go through its primary connection while the secondary connection was used for testing.
To do this, the subnet used for testing was withdrawn from the BGP announcements on the primary router to make sure all testing traffic went to the secondary router only. The UT configured this router in a way that ensured that all received traffic for that subnet was dropped immediately.
For this test, the PERT project team connected part of the SURFnet testbed to the production environment, so that traffic could be sent to the UT’s secondary connection.
A small IP series was made available for this as well, so that several source IP addresses could potentially be used to investigate the effects of hashing.
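The reason several source addresses matter is that switching hardware typically hashes packet header fields to pick one of its parallel internal channels, so that all packets of a single flow stay in order on one channel. The sketch below illustrates the principle only; the CRC32-based hash, the field selection and all addresses are assumptions for illustration, not the actual Cisco implementation:

```python
# Illustrative sketch of per-flow load balancing (NOT the actual Cisco hash):
# hardware hashes header fields and uses the result to pick one of the
# parallel internal channels, keeping each flow's packets in order.
import zlib

NUM_CHANNELS = 4  # e.g. a 40 Gbit/s port split over 4 internal channels

def channel(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> int:
    """Map a flow's 4-tuple to a channel with a deterministic hash."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % NUM_CHANNELS

# A single flow always hashes to the same channel, so one channel's capacity
# caps its throughput; many source IPs spread the load over the channels.
flows = [channel(f"192.0.2.{i}", "198.51.100.1", 4000 + i, 80)
         for i in range(64)]
print(sorted(set(flows)))
```

This is why a test with many source addresses can reach a much higher aggregate rate than a single-flow test, even on the same interface.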
Several tests were carried out. Some of the relevant tests are shown below:
1. The UT uses the standard SPAN functionality to copy the traffic on the SURFinternet port to another port in order to analyse traffic patterns. With this feature enabled, the system did not achieve more than 20 Gbit/s in any scenario.
2. With several (sender) IP addresses and the SPAN feature disabled, 38.5 Gbit/s of traffic posed no problems. IP packets of 1,500 bytes were sent in this case.
3. With only two different (sender) IP addresses and the SPAN feature disabled, the limit was about 32 Gbit/s. Sufficiently large packet sizes were used.
4. A test similar to item 3, but with the SPAN feature enabled, immediately resulted in overruns, causing the BGP sessions to be lost again.
5. With only a single flow, a sufficiently large packet size and the SPAN feature disabled, the limit was about 16 Gbit/s.
6. With a single flow, small packets and the SPAN feature disabled, we also saw overruns at 16 Mpps, at a rate of less than 16 Gbit/s.
The tests were carried out by me, Jeroen van Ingen (University of Twente) and Ronald van der Pol (SURFnet).
The main conclusion we can draw from this is that the SPAN feature halves the maximum bandwidth of a 40 Gbit/s interface to about 20 Gbit/s. Contrary to the UT's expectations, this turned out to apply even to traffic that is dropped in hardware by an ACL.
A single flow is also clearly limited to 16 Gbit/s by the module's hardware design, in which a 40 Gbit/s interface (in performance mode) has to load-balance packets over four 16 Gbit/s channels. This was already known from the architecture.
The number of packets per second also turned out to be less than what we expected. Although the hardware specs indicate that up to 60 Mpps is possible per module, a single flow at 16 Mpps (under the 16 Gbit/s limit of a single input flow) was still too much to process. The limit may be about 15 Mpps per replication engine, with the maximum rate of 60 Mpps only feasible if the flows are distributed perfectly over a module.
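The observed numbers can be sanity-checked with some simple arithmetic (an illustrative sketch, using the standard 20-byte Ethernet wire overhead; the exact frame sizes in the test traffic are an assumption):

```python
# Sanity check on the observed numbers (illustrative arithmetic only).
def gbps(pps: float, frame_size: int, overhead: int = 20) -> float:
    """Wire-level throughput in Gbit/s for a given packet rate and frame size."""
    return pps * (frame_size + overhead) * 8 / 1e9

# 16 Mpps of minimum-size 64-byte frames is only ~10.8 Gbit/s on the wire,
# well under the 16 Gbit/s single-channel limit, so overruns at that rate
# point to a packets-per-second bottleneck rather than a bandwidth one.
print(f"{gbps(16e6, 64):.1f} Gbit/s")

# Conversely, a 16 Gbit/s flow of 1,500-byte packets is only ~1.3 Mpps,
# which is why large-packet tests never hit the pps limit.
print(f"{16e9 / ((1500 + 20) * 8) / 1e6:.1f} Mpps")
```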
In the end, most of the observations can be explained based on the design of the line card.
To be able, in theory, to process up to 40 Gbit/s of input without losing the current analysis capabilities, an obvious solution is to replace the SPAN feature with passive optical taps.
Contact the PERT project team
Read more about PERT in the blog post ‘Performance Enhancement: network detectives’