I’ve had an interesting problem at work where I needed to capture high-resolution perfmon counter samples in order to better highlight an IO consumption issue. Sampling in something like Operations Manager or Hyper-V resource metering just didn’t cut it, as the data was too averaged out, which meant we were losing information about significant IO spikes. To get a better feel for the data, I turned to my old friend PowerShell.
To achieve this, I used a helper function called Get-PerformanceMonitoring that I wrote around the Get-Counter cmdlet. This wrapper makes it easy to capture a pre-determined set of performance counters for a list of computers, which saves you from manually picking each and every counter for each machine you want to monitor. There are also a few options for what happens once that setup is done: you can send the samples to a file, capture only one sample and create a file (in case you want to do real-time monitoring, you can reuse the file with all the counters set up), or you can simply output the samples to the standard PowerShell pipeline as a stream.
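I’m not reproducing the full function here, but a minimal sketch of the wrapper’s core idea might look like this (the parameter names and output types are assumptions based on how I call it later in this post, not the exact implementation):

```powershell
# Hypothetical sketch of a Get-Counter wrapper like Get-PerformanceMonitoring.
function Get-PerformanceMonitoring {
    param(
        [string]$computersListFilePath,
        [string[]]$counters,
        [ValidateSet('Stream','File','SetupOnly')]
        [string]$outputType = 'Stream'
    )
    # One computer name per line in the list file.
    $computers = Get-Content -Path $computersListFilePath

    switch ($outputType) {
        # Emit samples continuously to the pipeline.
        'Stream'    { Get-Counter -ComputerName $computers -Counter $counters -Continuous }
        # Persist samples to a binary log you can open in perfmon.
        'File'      { Get-Counter -ComputerName $computers -Counter $counters -Continuous |
                          Export-Counter -Path 'C:\Temp\samples.blg' -FileFormat BLG }
        # Capture a single sample so the resulting file holds the counter setup.
        'SetupOnly' { Get-Counter -ComputerName $computers -Counter $counters -MaxSamples 1 |
                          Export-Counter -Path 'C:\Temp\setup.blg' -FileFormat BLG }
    }
}
```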
After the first 24 hours of collecting that data on only 4 servers, I was left with a 7.7GB file. A file that size is quite painful to browse in perfmon, so I used perfmon to export the data as CSV and consume it in Excel. That made it a bit easier to manipulate and summarize, but it was still an unsustainable process in the long run, and it’s also difficult to scale to a large number of computers.
I then started to stream the output of the first function into a filter function called Filter-CounterValues, which I had previously used to capture CPU spikes on about 350 PCs. By excluding IO spikes below a specific value, I was able to considerably reduce the amount of data captured, but I still wanted to capture minimums and averages for those servers’ disks more accurately.
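A stripped-down version of that kind of filter, assuming the parameter names I use in the full command later in this post, could look like this (the real function has more options; this is just a sketch):

```powershell
# Hypothetical sketch of Filter-CounterValues.
filter Filter-CounterValues {
    param(
        [double]$minimumValue,
        [string]$counterPath,
        [string]$instanceNameExclusion
    )
    # Each pipeline object from Get-Counter is a PerformanceCounterSampleSet;
    # keep only the individual samples at or above the threshold, and drop
    # excluded instances such as "_total".
    foreach ($sample in $_.CounterSamples) {
        if ($sample.InstanceName -ne $instanceNameExclusion -and
            $sample.CookedValue -ge $minimumValue) {
            $sample
        }
    }
}
```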
Then I realized this might finally be a good excuse to kick the tires on Azure Stream Analytics. I had developed a POC in the past using SQL Server StreamInsight, so I was already familiar with the core concepts and capabilities. Here’s an overview diagram of what that looks like:
The first step in getting the data processed by a Stream Analytics job was to send it to an Azure Event Hub. Since I already had a library for Azure Service Bus, I figured it would be straightforward to get this POC going. I started by downloading the new Service Bus libraries from NuGet, then began writing the function to send messages to Event Hub. Unfortunately, I was never quite able to do it using the .NET API: when calling the method on the EventData object to add properties or data to it, PowerShell reported that it couldn’t find that specific method, even though it showed up in Get-Member. Someone might have a hint as to what’s going on here! After much head scratching and not getting anywhere, I decided to use the Event Hub REST API instead. Once I spent some time figuring out how to authenticate (not as obvious as you would think), I finally managed to send messages to Event Hub. Step 1, done!
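For reference, here’s roughly what the REST-based send looks like. The tricky part is authenticating: you have to generate a SharedAccessSignature token yourself. Everything below ($keyName, $key, the sample payload) is a placeholder sketch rather than my exact code:

```powershell
Add-Type -AssemblyName System.Web

# Placeholder values; substitute your own namespace, policy name and key.
$uri     = "https://<eventhubname>.servicebus.windows.net/perfmoncountersampleshub"
$keyName = "SendPolicy"
$key     = "<shared access key>"

# Build a SAS token: HMAC-SHA256 over "<url-encoded uri>`n<expiry>",
# where expiry is a Unix timestamp (here, valid for one hour).
$expiry     = [int](New-TimeSpan -Start (Get-Date "1970-01-01") -End (Get-Date).ToUniversalTime()).TotalSeconds + 3600
$encodedUri = [System.Web.HttpUtility]::UrlEncode($uri)
$hmac       = New-Object System.Security.Cryptography.HMACSHA256
$hmac.Key   = [Text.Encoding]::UTF8.GetBytes($key)
$signature  = [Convert]::ToBase64String($hmac.ComputeHash([Text.Encoding]::UTF8.GetBytes("$encodedUri`n$expiry")))
$token      = "SharedAccessSignature sr=$encodedUri&sig=$([System.Web.HttpUtility]::UrlEncode($signature))&se=$expiry&skn=$keyName"

# POST one event as JSON to the hub's /messages endpoint.
$body = @{ ComputerName = "SQLSRV01"; CounterValue = 312.4 } | ConvertTo-Json
Invoke-RestMethod -Uri "$uri/messages" -Method Post -Body $body `
    -ContentType "application/json" -Headers @{ Authorization = $token }
```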
Creating the Stream Analytics job was a simple process: define your input by picking the Event Hub queue, define your output by picking an Azure SQL Database and specifying a table to store your results, and finally write the actual Stream Analytics query that crunches the data. At first the job would fail with only a message saying that diagnostic data was not available at this time (it never became available). I spent some time rewriting the query and casting numeric values properly (coming from JSON, that apparently doesn’t happen automatically), and then everything started working as expected.
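The fix was essentially an explicit cast on the incoming value before aggregating. Something along these lines (CounterValue is the property name sent from PowerShell; the window here is just for illustration):

```sql
-- Values deserialized from JSON may not be typed as numbers automatically,
-- so cast them explicitly before feeding them to the aggregates.
SELECT AVG(CAST(CounterValue AS float)) AS AvgCounterValue
FROM EventHubInput
GROUP BY TumblingWindow(minute, 5)
```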
In summary, here’s what the process looks like at a high level:
1) Capture perfmon counter samples
2) Filter counter samples below a certain value
3) Convert the counter sample object to a hashtable
4) Send the event to Azure Event Hub
5) Azure Stream Analytics picks up the event data
6) The query runs according to the tumbling window size
7) Results are persisted
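To make step 3 concrete, the hashtable that eventually becomes the JSON event body looks roughly like this. The property names match what my Stream Analytics query expects; the exact shape produced by the real helper may differ:

```powershell
# Hypothetical shape of one event produced by Convert-PerformanceCounterToHashTable.
$message = @{
    ComputerName = "SQLSRV01"
    CounterName  = "Disk Transfers/sec"
    InstanceName = "d:"
    CounterValue = 312.4
    TimeStamp    = (Get-Date).ToUniversalTime().ToString("o")
}
# Serialized to JSON before being posted to Event Hub:
$message | ConvertTo-Json
```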
Here’s what the PowerShell command looks like:
Get-PerformanceMonitoring -computersListFilePath C:\Code\Windows\MonitoringComputersLists\Production_All_SQL_Servers.txt -counters "\LogicalDisk(*)\Disk Transfers/sec" -outputType Stream |
    Filter-CounterValues -minimumValue 250 -counterPath "\LogicalDisk(*)\Disk Transfers/sec" -instanceNameExclusion "_total" |
    Convert-PerformanceCounterToHashTable -thresholdValue 250 -counterPath "\LogicalDisk(*)\Disk Transfers/sec" |
    Send-SBEventHubMessage -url "https://<eventhubname>.servicebus.windows.net/perfmoncountersampleshub/messages"
Right now I have two jobs defined that consume the same stream of messages/events: one calculates the min/max/avg/count for each counter instance using a 5-minute tumbling window, and the other uses a 1-hour window. It’s now very straightforward to create data aggregates over various window sizes (5 min, 60 min, daily, etc.) without having to persist highly detailed intermediary data. Here’s what the Stream Analytics query looks like:
SELECT
    ComputerName,
    CounterName,
    InstanceName,
    System.TimeStamp AS SampleTime,
    MIN(CounterValue) AS MinCounterValue,
    MAX(CounterValue) AS MaxCounterValue,
    AVG(CounterValue) AS AvgCounterValue,
    COUNT(*) AS SampleCount
INTO SQLOutput
FROM EventHubInput
GROUP BY ComputerName, CounterName, InstanceName, TumblingWindow(hour, 1)
You can then monitor the jobs’ execution using the dashboard to keep track of inbound messages and output events:
You can also check Event Hub statistics to get an idea of the rate at which events are sent:
Once everything was running smoothly, I just built a simple Excel workbook to consume this new data. Here’s what that looks like with some data:
I will also investigate using Power BI to create a proper dashboard. Let me know if you have any questions about this; I’ll be glad to share more details about my experience.