How to fine-tune pfSense 2.4.4 for 1Gbit throughput on APU2

This is an update to the article we wrote in 2017, where we showed that pfSense 2.3 is not able to route full gigabit on APU2. With some more testing and tinkering we are now able to get full gigabit on APU2! 

As stated in the previous article, the APU2 has four 1GHz CPU cores, and pfSense by design can use only one core per connection. This limitation still exists, but single-core performance has improved considerably. With the new BIOS and the settings described below, pfSense can route about 750-800Mbit/s on a single connection.

The APU2C4 and APU2D4 have very performant Intel I210-AT network interfaces. These NICs have four transmit and four receive queues, so they can work on four connections simultaneously. With some fine-tuning, pfSense can take advantage of this and route at 1Gbit when more than one connection is in use.

The other APU boards (APU2C0, APU2C2, APU3, APU4) have the I211-AT network interface, with two transmit/receive queues. This is a less performant NIC, but it is still good enough to deliver 1Gbit on pfSense when more than one connection is used.

Routers rarely handle just one connection at a time, so single-connection speed is rarely a bottleneck in the real world. A web browser opens about 8 TCP connections per website, torrent clients open hundreds of connections, Netflix opens multiple TCP connections when streaming video, and so on.

The previous throughput test did not take advantage of multi-queue NICs because, by default, pfSense uses only one NIC queue on APU2. Here's how to change this.

Gigabit pfSense config

First, open the pfSense web panel, go to System -> Advanced -> Networking, and scroll to the bottom.

Make sure that the first three checkboxes under "Network Interfaces" are unchecked:

  • Hardware Checksum Offloading
  • Hardware TCP Segmentation Offloading
  • Hardware Large Receive Offloading

As shown in the screenshot:

 

Now we need to edit some settings from the shell. You can SSH to the box or connect with the serial cable.
To get the full gigabit, edit /boot/loader.conf.local (you may need to create it, if it doesn't exist) and insert the following settings:

# agree with Intel license terms
legal.intel_ipw.license_ack=1
legal.intel_iwi.license_ack=1

# this is the magic. If you don't set this, queues won't be utilized properly
# -1 removes the limit on how many packets are processed per receive/transmit pass
hw.igb.rx_process_limit="-1"
hw.igb.tx_process_limit="-1"

# more settings to play with below. Not strictly necessary.

# force NIC to use 1 queue (don't do it on APU)
# hw.igb.num_queues=1

# give enough RAM to network buffers (default is usually OK)
# 131072 is 256MB 
# kern.ipc.nmbclusters="131072"

#net.pf.states_hashsize=2097152
#hw.igb.rxd=4096
#hw.igb.txd=4096

#net.inet.tcp.syncache.hashsize="1024"
#net.inet.tcp.syncache.bucketlimit="100"
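
If you would rather script the change than edit the file by hand, the two required tunables can be written idempotently. The following is a minimal sketch, not an official pfSense tool: set_tunable and apply_igb_tunables are hypothetical helper names, and LOADER_CONF is an assumed override variable so the functions can be tried on a scratch file first.

```sh
#!/bin/sh
# Hypothetical helpers for illustration; not part of pfSense or FreeBSD.

# File to edit. Override LOADER_CONF to try this on a scratch file first.
LOADER_CONF=${LOADER_CONF:-/boot/loader.conf.local}

# set_tunable FILE KEY VALUE
# Replace any existing uncommented KEY= line in FILE, or append a new one.
set_tunable() {
    file=$1
    key=$2
    value=$3
    touch "$file"
    # Keep every line that does not start with "KEY=" (commented lines survive).
    awk -v k="$key=" 'index($0, k) != 1' "$file" > "$file.tmp"
    printf '%s="%s"\n' "$key" "$value" >> "$file.tmp"
    mv "$file.tmp" "$file"
}

# Write the two settings the article marks as required.
apply_igb_tunables() {
    set_tunable "$LOADER_CONF" hw.igb.rx_process_limit -1
    set_tunable "$LOADER_CONF" hw.igb.tx_process_limit -1
}
```

Calling apply_igb_tunables as root on the router writes both lines; running it a second time leaves a single copy of each.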

After saving this file, reboot your router to apply it.
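
Once the router is back up, one way to check that more than one queue is active is to count the per-queue interrupt lines that vmstat -i reports for a NIC. This is a rough sketch: count_queues is a hypothetical helper, and it assumes the igb driver in this pfSense release labels its queue interrupts as "igb0:que 0", "igb0:que 1", and so on. Other driver versions may use different names.

```sh
#!/bin/sh
# count_queues IFACE
# Read `vmstat -i` output on stdin and print the number of per-queue
# interrupt lines for IFACE (hypothetical helper, for illustration only).
# Assumes queue interrupts are labelled like "igb0:que 0".
count_queues() {
    grep -c "$1:que"
}
```

On the router this would be used as: vmstat -i | count_queues igb0. A result greater than 1 suggests the NIC is using multiple queues.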

Now you can run some tests to verify that your settings work properly. The easiest way is to use iperf3 with multiple connections, with one device on the LAN and the other on the internet.

iperf3 APU2 throughput test

We set up an iperf3 server on the internet and connected to it from a host on the LAN.

On the server (somewhere on the internet), run the following command:

iperf3 -s

On your LAN run this command:

iperf3 -c SERVER_IP_HERE -P 4

If everything went well, you should see about 940Mbit/s of total throughput, similar to the snippet below (captured from a run with two parallel streams):

- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  43.00-44.00  sec  56.1 MBytes   470 Mbits/sec    0    481 KBytes       
[  7]  43.00-44.00  sec  55.7 MBytes   468 Mbits/sec    0    438 KBytes       
[SUM]  43.00-44.00  sec   112 MBytes   938 Mbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  44.00-45.00  sec  56.4 MBytes   473 Mbits/sec    0    481 KBytes       
[  7]  44.00-45.00  sec  56.1 MBytes   470 Mbits/sec    0    438 KBytes       
[SUM]  44.00-45.00  sec   112 MBytes   943 Mbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  45.00-46.00  sec  56.1 MBytes   470 Mbits/sec    0    481 KBytes       
[  7]  45.00-46.00  sec  55.6 MBytes   466 Mbits/sec    0    438 KBytes       
[SUM]  45.00-46.00  sec   112 MBytes   936 Mbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  46.00-47.00  sec  57.7 MBytes   484 Mbits/sec    0    481 KBytes       
[  7]  46.00-47.00  sec  55.0 MBytes   461 Mbits/sec    0    438 KBytes       
[SUM]  46.00-47.00  sec   113 MBytes   945 Mbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  47.00-48.00  sec  55.2 MBytes   463 Mbits/sec    0    481 KBytes       
[  7]  47.00-48.00  sec  55.8 MBytes   468 Mbits/sec    0    438 KBytes       
[SUM]  47.00-48.00  sec   111 MBytes   931 Mbits/sec    0  
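
To reduce a long run to a single number, the per-interval [SUM] lines can be averaged. This is a small sketch that assumes the default human-readable iperf3 output format shown above; avg_sum_mbit is a hypothetical helper name.

```sh
#!/bin/sh
# avg_sum_mbit
# Read iperf3 text output on stdin and print the mean of all [SUM] lines
# reported in Mbits/sec (hypothetical helper, for illustration only).
avg_sum_mbit() {
    awk '
        $1 == "[SUM]" {
            # The rate is the field just before the "Mbits/sec" unit.
            for (i = 2; i <= NF; i++)
                if ($i == "Mbits/sec") { total += $(i - 1); n++ }
        }
        END { if (n > 0) printf "%.1f\n", total / n }
    '
}
```

Usage: iperf3 -c SERVER_IP_HERE -P 4 | avg_sum_mbit. For the five intervals in the snippet above this gives 938.6. Note that the final summary lines of a complete run also start with [SUM] and are included in the average.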

 

Here's a screenshot from the pfSense panel - take a look at the traffic graph.

 

 

I think this is quite neat. It turns out that the internet is wrong :-). One can get full gigabit on pfSense when utilizing multiple NIC queues and multiple CPU cores!
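
For context on why roughly 940Mbit/s counts as full gigabit: every full-size TCP segment carries Ethernet, IP and TCP overhead, so the payload rate iperf3 reports can never reach 1000Mbit/s. The sketch below works through that arithmetic, assuming a 1500-byte MTU, IPv4, and TCP timestamps enabled (none of which the measurements above state explicitly); tcp_goodput_mbit is a hypothetical helper name.

```sh
#!/bin/sh
# tcp_goodput_mbit [MTU] [LINK_MBIT]
# Print the theoretical TCP payload rate in Mbit/s, rounded down
# (hypothetical helper; assumes IPv4 and TCP timestamps).
tcp_goodput_mbit() {
    mtu=${1:-1500}
    link_mbit=${2:-1000}
    # Payload per packet: MTU minus IPv4 header (20), TCP header (20)
    # and the TCP timestamp option (12).
    payload=$(( mtu - 20 - 20 - 12 ))
    # Bytes on the wire per packet: MTU plus Ethernet header (14), FCS (4),
    # preamble and start-of-frame delimiter (8) and inter-frame gap (12).
    wire=$(( mtu + 14 + 4 + 8 + 12 ))
    echo $(( link_mbit * payload / wire ))
}

tcp_goodput_mbit    # prints 941
```

The totals measured above sit right around this figure, with individual one-second intervals fluctuating slightly above or below it.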

If you have any questions about this article, mail us at info@teklager.se