First Impressions of Storage Spaces Direct on a Nano Server Cluster

I finally found some time to experiment with two new features in Windows Server 2016 (TP2): Storage Spaces Direct and Nano Server. Overall, getting everything up and running was straightforward and worked as advertised. Performance was as good as expected, even though I’m testing on a virtualized setup with modest hardware. The resources consumed to deliver that service are pretty darn minimal: between 300 and 550MB of RAM, and between 1.5GB and 2GB of storage for the OS drives. Looking at the list of running processes gives you a good sense of how minimal that setup is, and of how secure and manageable it will be. One thing that takes some getting used to is how stripped down the OS is. For example, right now you have a cmdlet to change the IP configuration but not DNS. I’m sure that will be added in builds following TP2.

nano_process_list

Here are a few things I’ve noticed while running diskspd against the Storage Spaces Direct cluster:

Memory Consumption
Memory usage jumps significantly on only one of the nodes, most likely because it’s the one targeted by the SMB connection. When running on Nano Server, going from 300MB to 900MB is a big deal. 😉
commited_bytes_jump
SMB Connection
For some reason, my SMB client was not being redirected to the proper node (at least according to my current understanding). To test this, I moved all of the cluster resources (disk, pool, SOFS resource) to the “first” node of the cluster. Still, my client had a connection established with the “second” node.
Share Continuous Availability
Another thing I didn’t expect: when a server that is part of the cluster is stopped or restarted, IO pauses for a few seconds. The first time I noticed this, I thought it was because I had restarted the server that owned some of the cluster resources, which causes a failover, and that can take some time before everything comes back to normal. I then made sure every resource was running on the “first” cluster node and went ahead and restarted/stopped the “last” node. Every time I did this, an IO pause occurred. I suspect it’s because the node serving the share has backend connections to that specific node for block redirection, and those need to be re-established or renegotiated with another node that can serve those blocks. As those blocks are mirrored to other nodes in the cluster, I would have expected that process to be absolutely transparent.
When restarting
IO_pause
When stopping
server_stopped
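To put a number on pauses like these rather than eyeballing a graph, one option is to record a timestamp for each completed IO and look for gaps. The following is only a minimal Python sketch under that assumption: the `find_io_pauses` name, the one-second threshold, and the sample timestamps are hypothetical, and this is not a format that diskspd produces directly.

```python
from typing import List, Tuple


def find_io_pauses(completion_times: List[float],
                   threshold_s: float = 1.0) -> List[Tuple[float, float]]:
    """Return (start, duration) for each gap between consecutive
    IO completions that is longer than threshold_s seconds."""
    pauses = []
    ordered = sorted(completion_times)
    # Compare each completion time with the one that follows it.
    for prev, curr in zip(ordered, ordered[1:]):
        gap = curr - prev
        if gap > threshold_s:
            pauses.append((prev, gap))
    return pauses


if __name__ == "__main__":
    # Hypothetical completion timestamps, in seconds since the test started.
    sample = [0.0, 0.5, 1.0, 1.5, 5.5, 6.0]
    for start, duration in find_io_pauses(sample):
        print(f"IO paused after t={start:.1f}s for {duration:.1f}s")
```

Comparing the detected gaps against the time a node was stopped or restarted would show whether each pause lines up with the node event.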
This led me to an idea, perhaps an absurd one, but then again maybe not. In the case of a continuously available file share on an SOFS cluster running on top of Storage Spaces Direct, it might be beneficial to exchange a file allocation map when a file lock is acquired (perhaps a map covering only the intended file regions would work too). This way, the client performing IO against that file would be able to recover on its own, and perhaps more quickly, from IO errors on a particular node. Another benefit would be the ability to perform IO against all the Storage Spaces Direct nodes in parallel for a single file. Right now, you basically have to spread the VHDX files of a VM across multiple SOFS shares, each owned by a different cluster node, in order to achieve this effect, which is a cumbersome solution in my opinion.
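To make the idea a bit more concrete, here is a purely hypothetical Python sketch. Nothing in it reflects an actual SMB or Storage Spaces Direct API: the `Extent` type, the node names, and the `plan_reads` helper are all invented for illustration. It assumes the client has received a map of file regions along with the nodes holding a mirror copy of each region, and then picks a healthy node per region.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Extent:
    offset: int              # starting byte offset within the file
    length: int              # extent length in bytes
    nodes: Tuple[str, ...]   # nodes holding a mirror copy, in preference order


def plan_reads(allocation_map: List[Extent],
               down_nodes: Set[str]) -> Dict[str, List[Extent]]:
    """Assign each extent to the first healthy node that holds a copy of it."""
    plan: Dict[str, List[Extent]] = {}
    for extent in allocation_map:
        healthy = [n for n in extent.nodes if n not in down_nodes]
        if not healthy:
            raise RuntimeError(
                f"no healthy copy for extent at offset {extent.offset}")
        plan.setdefault(healthy[0], []).append(extent)
    return plan


if __name__ == "__main__":
    MB = 1024 * 1024
    # Hypothetical map for a 768MB file mirrored across three nodes.
    amap = [
        Extent(0 * MB, 256 * MB, ("node1", "node2")),
        Extent(256 * MB, 256 * MB, ("node2", "node3")),
        Extent(512 * MB, 256 * MB, ("node3", "node1")),
    ]
    for down in (set(), {"node3"}):
        plan = plan_reads(amap, down)
        summary = {n: [e.offset // MB for e in es] for n, es in plan.items()}
        print(f"down={sorted(down)} -> {summary}")
```

With all nodes up, the regions are spread across the three nodes and could be read in parallel. With one node marked as down, its regions fall back to the surviving mirror without the client waiting on a server-side failover, which is the behavior I was hoping to see.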
I’ll keep digging into Nano Server and Storage Spaces Direct. So far I’m pretty pleased with what I’m seeing!