An important step before deploying a new piece of core infrastructure in production is making sure its performance meets your expectations. As explained in the post Storage Performance Baselining with Diskspd on Altaro’s blog, it’s crucial to make sure your network performance is top-notch, especially if you’re using SMB 3.x or iSCSI as the foundation to access or provide your storage infrastructure.
Historically, it’s been pretty straightforward to test the network: you whip out iperf or ntttcp and away you go. With protocols that take advantage of RDMA, such as SMB 3.x with SMB Direct, we have to revise our testing methodology a bit to make sure we reach the maximum performance our equipment can provide. The problem with iperf or ntttcp is that they don’t take advantage of RDMA. That means the performance numbers you get out of those tests are not representative of how the system will be used in real life, with Hyper-V hosts accessing storage hosted on a Scale-Out File Server through RDMA-capable NICs. The other problem is that, past a certain throughput, you simply can’t saturate the network because you start hitting a CPU bottleneck. And if you’re like me, the storage in your lab is far from exotic enough to match the performance of the production environment, so it can’t be used to push the network to its limits either.
With 100Gb Ethernet equipment in my hands to test, I had to find a better way to make sure I was getting my money’s worth out of the shiny new networking gear. When I talked with my go-to guys for performance testing at Microsoft, Ned Pyle (follow him on Twitter: funny and tons of good info) and Dan Lovinger (couldn’t find Dan on the Twittosphere), they mentioned I could use diskspd to perform network tests…
My first reaction:
But after letting that one sink in a bit, I rolled up my sleeves and gave it a shot. In my first attempt, I used the StarWind RAM Disk, which did the job well up to a point, but I wasn’t squeezing out as much performance as I expected, most likely because of the overhead of running the RAM drive software. In typical “Microsoft has all the goodies” fashion, Dan mentioned they had an internal version of diskspd that used an interesting trick to simulate much of what you get with a typical RAM drive. It essentially works by disabling the client-side cache while leaving the server-side cache enabled. It also uses the FILE_ATTRIBUTE_TEMPORARY flag so that the IO isn’t flushed to the backing media. Well, I wanted THAT version!
Not long after our discussions, diskspd 2.0.17 was released, and it contained the tricks I just described. Ever since Ned brought it to my attention, I’ve referred to this mode as the hel.ya mode because of the example he sent me showing how to use it:
diskspd.exe -b64K -c2M -t2 -r -o2 -d60 -Sr -ft -L -w0 \\srv\d$\hel.ya
I’m sure the folks at Mellanox are scratching their heads over that file name! 😉
So here, diskspd creates a very small 2MB file (-c2M) and then issues 64KB random reads (-b64K -r -w0) from 2 threads (-t2) with 2 outstanding IOs per thread (-o2) for 60 seconds (-d60). The -Sr switch disables the client-side cache while leaving the server-side cache enabled, -ft opens the file with FILE_ATTRIBUTE_TEMPORARY, and -L measures latency while the test is running.
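To keep track of what each switch does when tweaking the test, a minimal Python sketch can assemble the hel.ya command line from named parameters, with each switch commented next to its value. The function name hel_ya_command and its parameter names are hypothetical; only the diskspd switches themselves come from the command above.

```python
def hel_ya_command(target: str,
                   block_size: str = "64K",
                   file_size: str = "2M",
                   threads: int = 2,
                   outstanding: int = 2,
                   duration_s: int = 60,
                   write_pct: int = 0) -> str:
    """Build a diskspd command line for the -Sr "hel.ya" test (hypothetical helper)."""
    args = [
        "diskspd.exe",
        f"-b{block_size}",   # IO (block) size
        f"-c{file_size}",    # create the test file with this size
        f"-t{threads}",      # threads per target
        "-r",                # random IO
        f"-o{outstanding}",  # outstanding IOs per thread (per target)
        f"-d{duration_s}",   # test duration in seconds
        "-Sr",               # disable client-side cache, keep server-side cache
        "-ft",               # open the file with FILE_ATTRIBUTE_TEMPORARY
        "-L",                # collect latency statistics
        f"-w{write_pct}",    # percentage of writes (0 = read-only)
        target,
    ]
    return " ".join(args)


if __name__ == "__main__":
    # Reproduces Ned's example command.
    print(hel_ya_command(r"\\srv\d$\hel.ya"))
```

With block_size="1M", threads=8, outstanding=32 and duration_s=1800, the same helper produces the longer 1MB run used in the test below.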
So the next logical question is, does it actually work? Well, hel.ya!
To give a bit of background on that test:
- Using 2 single-port ConnectX-4 NICs per server (Firmware 220.127.116.11 and Driver 1.40)
- 1 NIC in an x16 slot
- 1 NIC in an x8 slot (those particular servers don’t have two x16 slots)
- An x16 slot can do around 126Gbps (so there’s no point for us in using dual-port cards)
- An x8 slot can do around 63Gbps
- Servers are connected back to back
- I currently have a case open with Mellanox engineering regarding an issue with the Mellanox SN2700 switches: when performing this particular test, SMB Direct/RDMA appears to die in one direction for one of the NICs.
- One server has an E5-2670 v1 and the other an E5-2670 v2
- diskspd was run like so:
- diskspd.exe -b1M -c2M -t8 -r -o32 -d1800 -Sr -ft -L -w0 \\server01\c$\hel.ya
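The ~126Gbps and ~63Gbps slot figures in the list above can be reproduced with a bit of arithmetic. This Python sketch assumes the slots are PCIe 3.0 (8 GT/s per lane with 128b/130b encoding) and ignores packet and protocol overhead, so the usable data rate will be somewhat lower than what it prints.

```python
# Assumption: PCIe 3.0 slots. TLP headers, flow control and other protocol
# overhead are ignored, so these are raw link rates, not achievable payload rates.
GT_PER_S_PER_LANE = 8.0          # PCIe 3.0 signalling rate per lane
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def pcie3_gbps(lanes: int) -> float:
    """Raw one-direction bandwidth in Gbit/s for a PCIe 3.0 link with `lanes` lanes."""
    return lanes * GT_PER_S_PER_LANE * ENCODING_EFFICIENCY


if __name__ == "__main__":
    for lanes in (8, 16):
        print(f"x{lanes}: {pcie3_gbps(lanes):.1f} Gbps")
```

By this arithmetic, an x16 slot can carry one 100GbE port at line rate but not two, while a 100GbE NIC in an x8 slot tops out around 63Gbps.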
So ~30GB/s is not too shabby! All of this while using only 18% CPU: very impressive work by the SMB Direct folks!
A couple of things worth mentioning when testing with diskspd’s -Sr switch:
- Increasing the test file size yields worse results. For example, I got the following in some specific tests:
- 2MB: 17558MB/s
- 4MB: 15003MB/s
- 16MB: 10055MB/s
- 32MB: 7682.99MB/s
- The location/path of the file is irrelevant
- SMB Multichannel kicks in by itself as usual; if you want to test a single NIC, an easy way is to disable the other NIC or set up SMB Multichannel constraints.
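To put the file-size results above in perspective, this Python sketch converts each MB/s figure to Gbps and expresses it as a percentage of the 2MB result. It assumes decimal units (1 MB = 10^6 bytes); if diskspd’s MB means 2^20 bytes, the Gbps values would be about 5% higher. The summarize helper is hypothetical, just for illustration.

```python
# File-size results reported above: -c value in MB -> throughput in MB/s.
RESULTS_MB_S = {2: 17558.0, 4: 15003.0, 16: 10055.0, 32: 7682.99}


def summarize(results_mb_s, baseline_size=2):
    """Return {size: (Gbps, percent of baseline)} for each file size.

    Assumes decimal units: 1 MB = 10**6 bytes and 1 Gb = 10**9 bits.
    """
    baseline = results_mb_s[baseline_size]
    return {
        size: (mb_s * 8 / 1000, 100 * mb_s / baseline)
        for size, mb_s in results_mb_s.items()
    }


if __name__ == "__main__":
    for size, (gbps, pct) in summarize(RESULTS_MB_S).items():
        print(f"{size:>2}MB file: ~{gbps:5.1f} Gbps, {pct:5.1f}% of the 2MB result")
```

Going from a 2MB to a 32MB file leaves you with roughly 44% of the 2MB throughput, so it’s worth keeping the test file as small as in the examples in this post.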
On another note, packet capture starts to get problematic at that speed. For instance, a 3-second diskspd test gave me a 700MB pcap file. I’m also working with Mellanox on their pcap files, which seem to be corrupted when captured with Mlx5cmd -sniffer and opened in Wireshark or Microsoft Message Analyzer.
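A rough sketch of why capture size becomes a problem: assuming the capture file grows linearly at the rate seen in the 3-second, 700MB example (an assumption, since the actual rate depends on snap length and traffic), the hypothetical helper below extrapolates the file size to the longer diskspd runs used in this post.

```python
def capture_size_gb(sample_mb: float, sample_s: float, test_s: float) -> float:
    """Linearly extrapolate a capture file's size (decimal GB) to a longer test.

    Assumption: the capture grows at a constant rate of sample_mb / sample_s.
    """
    rate_mb_s = sample_mb / sample_s
    return rate_mb_s * test_s / 1000


if __name__ == "__main__":
    for test_s in (60, 1800):  # durations of the -d60 and -d1800 runs
        print(f"{test_s}s test: ~{capture_size_gb(700, 3, test_s):.0f} GB of capture")
```

At roughly 233MB/s, the 60-second test would produce about 14GB of capture and the 1800-second run about 420GB.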
If you have any questions about this, I’ll be glad to give it a shot at answering them!