GEM Automation 4.0.0.1 Released

A quick post to let you know that this week I did a couple of releases for GEM Automation. They include a large number of changes, covering all the commits made to the code repository since May.

At a high level, there were changes in the following areas:

  • Active Directory
  • Build Deployment
  • Elasticsearch
  • Infrastructure Testing
  • Kibana
  • Networking
  • Service Bus
  • SharePoint
  • SQL Server
  • Storage Spaces
  • Utilities
  • Windows

For a detailed list of changes, please consult the release notes for builds 4.0.0.0 and 4.0.0.1.

You can download release 4.0.0.1 here; it includes all the latest changes.

PowerShell Performance Tip – Use JavaScriptSerializer instead of ConvertTo-Json

While working on a bulk API function that sends a lot of JSON data to Elasticsearch, I noticed that using the native .NET Framework class System.Web.Script.Serialization.JavaScriptSerializer yields huge performance gains over the ConvertTo-Json cmdlet.

Here’s a quick test that compares both methods:

Add-Type -AssemblyName System.Web.Extensions
$jsonSerializer=New-Object -TypeName System.Web.Script.Serialization.JavaScriptSerializer
measure-command{for($i=0;$i -lt 10000;$i++){$null=$jsonSerializer.Serialize("test");}}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 127
Ticks             : 1272587
TotalDays         : 1.47290162037037E-06
TotalHours        : 3.53496388888889E-05
TotalMinutes      : 0.00212097833333333
TotalSeconds      : 0.1272587
TotalMilliseconds : 127.2587

measure-command {for($i=0;$i -lt 10000;$i++){$null="test"| ConvertTo-Json;}}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 3
Milliseconds      : 874
Ticks             : 38746129
TotalDays         : 4.4845056712963E-05
TotalHours        : 0.00107628136111111
TotalMinutes      : 0.0645768816666667
TotalSeconds      : 3.8746129
TotalMilliseconds : 3874.6129

As you can see, the performance gain is rather significant! When you need to send millions of JSON objects to a web service, that makes quite a difference.
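In practice you will usually be serializing objects rather than plain strings; the serializer handles hashtables (and dictionaries in general) just as well. A minimal sketch, with purely illustrative field names and values:

Add-Type -AssemblyName System.Web.Extensions
$jsonSerializer = New-Object -TypeName System.Web.Script.Serialization.JavaScriptSerializer

# Serialize a hashtable representing a hypothetical event
$payload = @{ servername = "SQL01"; database = "AdventureWorks"; duration = 42 }
$jsonSerializer.Serialize($payload)
# {"duration":42,"database":"AdventureWorks","servername":"SQL01"}   (key order may vary)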

Visualize SentryOne Data Using Elasticsearch and Kibana

I’ve recently worked on consolidating a lot of log data into Elasticsearch and visualizing it using Kibana. One of the sources I wanted to see in Kibana was SQL Sentry (now called SentryOne), to view SQL query performance issues and failures.

In order to achieve this, I built a few PowerShell functions in a new module called libElasticsearch.psm1 in the GEM Automation project. Here’s the process that’s involved at a high level:

  1. Run a query against the SQL Sentry database to collect SQL query events incrementally
  2. Create bulk indexing calls to Elasticsearch by batching the SQL result set using the bulk API format
  3. Persist the last collection time to disk for use in the next iteration
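To make step 2 concrete: the Elasticsearch bulk API takes newline-delimited JSON, where each document is preceded by an action line naming the target index and type. Here is a rough sketch of how such a payload could be assembled and sent; it assumes Elasticsearch on the default port 9200, that $resultSet holds the query rows as hashtables, and that a plain application/json content type is accepted (newer Elasticsearch versions expect application/x-ndjson). This is not the exact code in libElasticsearch.psm1.

Add-Type -AssemblyName System.Web.Extensions
$jsonSerializer = New-Object -TypeName System.Web.Script.Serialization.JavaScriptSerializer
$body = New-Object -TypeName System.Text.StringBuilder

foreach($row in $resultSet)
{
    # Action line: tells Elasticsearch which index/type the next document belongs to
    $null = $body.Append('{"index":{"_index":"sqlsentryevents","_type":"sqlsentryevent"}}' + "`n")

    # Document line: the row itself, serialized to JSON
    $null = $body.Append($jsonSerializer.Serialize($row) + "`n")
}

# Send the whole batch in a single call to the _bulk endpoint
Invoke-RestMethod -Method Post -Uri "http://<elasticsearch Host Name>:9200/_bulk" -Body $body.ToString() -ContentType "application/json"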

Before sending data to Elasticsearch, it’s typically better to define the index/object type first. This allows Elasticsearch to properly identify the content of each field and index it accordingly. For the SQL Sentry data I’m capturing, the index definition is the following:

{
      "mappings": {
         "sqlsentryevent": {
            "properties": {
              "eventype": {
                "type": "keyword"
              },
              "normalizedtextdata": {
                "type": "keyword"
              },
              "database": {
                "type": "keyword"
              },
              "clientcomputer": {
                "type": "keyword"
              },
              "application": {
                "type": "keyword"
              },
              "username": {
                "type": "keyword"
              },
              "duration": {
                "type": "integer"
              },
              "cpu": {
                "type": "integer"
              },
              "read": {
                "type": "integer"
              },
              "writes": {
                "type": "integer"
              },
              "runstatus": {
                "type": "integer"
              },
              "eventtime": {
                "type": "date",
                "format": "basic_date_time",
                "null_value": "19500101T000000.000Z"
              },
              "querytype": {
                "type": "keyword"
              },
              "servername": {
                "type": "keyword"
              }
            }
         }
      }
}

You can find the definition file in SQLSentryEvents.json.

To load this definition in elasticsearch, you can use the following PowerShell code:

Import-Module -Force .\libElasticsearch.psm1

Add-ElasticsearchIndex -serverName <elasticsearch Host Name> -indexName sqlsentryevents -indexDefinition (Get-Content -Path .\IndexDefinitions\SQLSentryEvents.json)
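For reference, if you’d rather not go through the module, loading the index definition boils down to a single PUT against the index endpoint. A rough equivalent using Invoke-RestMethod, assuming Elasticsearch listens on the default port 9200 without authentication:

# Create the sqlsentryevents index using the mapping from SQLSentryEvents.json
Invoke-RestMethod -Method Put -Uri "http://<elasticsearch Host Name>:9200/sqlsentryevents" -Body (Get-Content -Raw -Path .\IndexDefinitions\SQLSentryEvents.json) -ContentType "application/json"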

To capture the SQL Sentry SQL query event data, I wrote the following query:

SELECT tl.RemoteObjectID AS EventType,
th.NormalizedTextData,
tl.[Database],
tl.Computer AS ClientComputer,
tl.Application,
tl.Operator AS UserName,
(tl.Duration/10000000) AS Duration,
tl.CPU,
tl.Reads,
tl.Writes,
tl.RunStatus,
FORMAT(DATEADD(mi, DATEDIFF(mi,GETDATE() ,GETUTCDATE()), tl.StartTime),'yyyyMMdd')+'T'+FORMAT(DATEADD(mi, DATEDIFF(mi, GETDATE(),GETUTCDATE() ), tl.StartTime),'HHmmss')+'.000Z' AS EventTime,
CASE
WHEN th.NormalizedTextData LIKE '%trace_getdata%' THEN 'Monitoring'
WHEN th.NormalizedTextData LIKE '%BACKUP%' THEN 'Maintenance'
WHEN th.NormalizedTextData LIKE '%STATISTICS%' THEN 'Maintenance'
WHEN th.NormalizedTextData LIKE '%INDEX%' THEN 'Maintenance'
WHEN th.NormalizedTextData LIKE '%updatestats%' THEN 'Maintenance'
WHEN th.NormalizedTextData LIKE '%sys.%' THEN 'Monitoring'
WHEN th.NormalizedTextData LIKE '%repl%' THEN 'Replication'
WHEN th.NormalizedTextData LIKE '%sp_server_diagnostics%' THEN 'Monitoring'
WHEN th.NormalizedTextData LIKE '%sp_readrequest%' THEN 'Replication'
WHEN th.NormalizedTextData LIKE '%sp_MSdistribution%' THEN 'Replication'
WHEN th.NormalizedTextData LIKE '%syncobj_%' THEN 'Replication'
WHEN th.NormalizedTextData LIKE '%waitfor delay @waittime%' THEN 'CDC'
ELSE 'Application Query'
END AS QueryType,
esc.ObjectName AS ServerName
FROM dbo.vwMetaHistorySqlServerTraceLog (nolock) tl
INNER JOIN dbo.PerformanceAnalysisTraceHash (nolock) th ON tl.NormalizedTextMD5 = th.NormalizedTextMD5
INNER JOIN EventSource (nolock) es ON tl.EventSourceId = es.ObjectId
INNER JOIN EventSourceConnection (nolock) esc ON es.EventSourceConnectionID = esc.ObjectId
WHERE (esc.ObjectName LIKE 'SRV%') AND
tl.StartTime >= @EventStartTime AND
tl.StartTime < @EventEndTime

I can now call the function that executes the query and converts the results to the bulk API format. Note that you can use this function to send pretty much any SQL Server query result set to Elasticsearch. At this point the function is very crude but functional.

Import-Module -Force .\libElasticsearch.psm1

Add-SQLQueryResultAsDocumentBatch -sqlServerName <SQL Server instance with SQLSentry DB> -databaseName SQLSentry -query (Get-Content .\SQLQueries\SQLSentryEvents.sql) -queryParameters @{EventStartTime=(Get-Content .\Checkpoints\<SQL Server instance with SQLSentry DB>_sqlsentryevents_checkpoint.txt);EventEndTime=Get-Date -Format "yyyy-MM-dd HH:mm:ss"} -batchSize 1000 -elasticsearchServerName <elasticsearch Host Name> -indexName sqlsentryevents -documentType sqlsentryevent -trackCheckPoint $true -checkPointParameterName "EventEndTime"

Here are the parameters for the function:
– sqlServerName: The name of the SQL Server instance to connect to in order to execute the query
– databaseName: The name of the database against which the query will be run
– query: The text of the query you wish to run. As this can be quite a long string, I put it in a file (SQLSentryEvents.sql in this case) and load it using Get-Content
– queryParameters: A hashtable of the parameters required to execute the query. For instance, the above query contains two T-SQL parameters, @EventStartTime and @EventEndTime, which are used to apply the moving window that loads the events incrementally. Another trick here is that the last run time is read from a checkpoint text file that stores this information.
– batchSize: The maximum number of items that will be sent in one bulk API call to Elasticsearch
– indexName: The name of the index to which the documents/records will be sent
– documentType: The type of the documents that will be sent to the index. This is the type that was defined in the JSON mapping earlier.
– trackCheckPoint: A flag to make sure the last loading point is persisted, allowing incremental loads.
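The checkpoint mechanism itself is nothing fancy; conceptually it amounts to something like the following (illustrative only, not the module’s exact code):

$checkpointFile = ".\Checkpoints\<SQL Server instance with SQLSentry DB>_sqlsentryevents_checkpoint.txt"

# Read the end time of the previous run (fall back to a default on the very first run)
$eventStartTime = if(Test-Path -Path $checkpointFile){ Get-Content -Path $checkpointFile } else { "1950-01-01 00:00:00" }
$eventEndTime = Get-Date -Format "yyyy-MM-dd HH:mm:ss"

# ... query and send the events falling between $eventStartTime and $eventEndTime ...

# Persist the end of the window so the next run picks up where this one left off
Set-Content -Path $checkpointFile -Value $eventEndTime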

You can then schedule that call, using Windows Task Scheduler for instance, and have it run at regular intervals.
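As an example, here is one way to register such a task with the ScheduledTasks cmdlets (available on Windows 8/Server 2012 and later); the script path, task name and interval are placeholders to adapt, and on recent OS versions you may be able to omit -RepetitionDuration:

# Run the collection script every 15 minutes, indefinitely
$action = New-ScheduledTaskAction -Execute "powershell.exe" -Argument "-NoProfile -ExecutionPolicy Bypass -File C:\Scripts\Send-SQLSentryEvents.ps1"
$trigger = New-ScheduledTaskTrigger -Once -At (Get-Date) -RepetitionInterval (New-TimeSpan -Minutes 15) -RepetitionDuration ([TimeSpan]::MaxValue)
Register-ScheduledTask -TaskName "SQLSentryToElasticsearch" -Action $action -Trigger $trigger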

With your data now loaded, you can start visualizing it using Kibana.

In order to do this, you do the following:

– Create saved searches in the Discover section, using the filtering options you want.
– Build visualizations (the widgets that will compose your dashboard) based on the saved searches, or simply use the whole index as the target.
– Build your dashboard using the visualizations you just built.

Here’s an example of a dashboard I did:

sqlsentry_kibana_dashboard

Also note that you can still view the raw event data if you need to while exploring:

sqlsentry_kibana_event_details

That’s it! I hope this will be useful to you. Should you have any questions regarding this, feel free to drop me a comment below.

PowerShell Performance Tip – Concatenating Strings Using StringBuilder

While doing performance tuning on a script that did a lot of string concatenation, I switched from traditional string concatenation using += to the .NET class System.Text.StringBuilder.

Here’s a comparison of the speed of both methods:

measure-command {$body=New-Object -TypeName System.Text.StringBuilder;for($i=0;$i -lt 100000;$i++){$null=$body.Append("test")}}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 162
Ticks             : 1620291
TotalDays         : 1.87533680555556E-06
TotalHours        : 4.50080833333333E-05
TotalMinutes      : 0.002700485
TotalSeconds      : 0.1620291
TotalMilliseconds : 162.0291

measure-command {$body="";for($i=0;$i -lt 100000;$i++){$body+="test"}}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 48
Milliseconds      : 584
Ticks             : 485840018
TotalDays         : 0.000562314835648148
TotalHours        : 0.0134955560555556
TotalMinutes      : 0.809733363333333
TotalSeconds      : 48.5840018
TotalMilliseconds : 48584.0018
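One thing to keep in mind when switching to StringBuilder: the pieces accumulate inside the builder, so you call ToString() when you need the final string back. For example:

$body = New-Object -TypeName System.Text.StringBuilder
$null = $body.Append("Hello, ")
$null = $body.Append("world!")
$body.ToString()   # Hello, world!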

PowerShell Performance Tip – Sending to Null

While performance tuning a PowerShell function, I noticed that piping to Out-Null is much slower than assigning to the $null variable. Here’s a comparison of both options:

measure-command {for($i=0;$i -lt 100000;$i++){"Hello Null!" | Out-Null}}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 3
Milliseconds      : 931
Ticks             : 39314263
TotalDays         : 4.5502619212963E-05
TotalHours        : 0.00109206286111111
TotalMinutes      : 0.0655237716666667
TotalSeconds      : 3.9314263
TotalMilliseconds : 3931.4263

measure-command {for($i=0;$i -lt 100000;$i++){$null="Hello Null!"}}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 113
Ticks             : 1139364
TotalDays         : 1.31870833333333E-06
TotalHours        : 3.1649E-05
TotalMinutes      : 0.00189894
TotalSeconds      : 0.1139364
TotalMilliseconds : 113.9364

Troubleshooting PowerShell Memory Leaks with Visual Studio

I was recently investigating a couple of memory leak issues with long-running scripts that capture and process monitoring metrics.

Since those leaks built up over a fairly long period of time, it was a bit difficult to troubleshoot by simply running the script in PowerShell ISE. Another thing that made the diagnostic process difficult is that PowerShell jobs were used to collect and process the data, which makes it hard to set breakpoints in the code to see what gets stuck in memory; PowerShell jobs essentially spawn another PowerShell process under which your code runs.

I started looking at what I could do with Visual Studio. What I ended up doing is the following:

1) Let your PowerShell script run for a little while in order to get into a “bad enough” state

2) Use Process Explorer to create a full dump of the PowerShell process:

procexp_create_fulldump

You can also generate that dump file directly from Visual Studio by attaching the debugger to the running process. If the process is running on a remote host, you will need to run the Remote Debugger appropriate for your platform. It’s usually located in C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\Remote Debugger on the machine that has Visual Studio installed. For example, for an x64 remote host, you would simply run the following as Administrator:

\\<vs computer>\c$\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\Remote Debugger\x64\msvsmon.exe

Once Remote Debugger is running, you can simply attach to the desired process in Visual Studio:

vs_dbg_attach_to_process

and then pick the PID of the PowerShell instance you want to debug:

vs_dbg_attach_to_process_select

Once Visual Studio is attached, break the execution of the process:

vs_dbg_attach_to_process_break

Once the process execution has been suspended, you can then create the dump file using Save Dump As (make sure you select to save the heap):

vs_dbg_attach_to_process_savedump

3) Open the resulting dump file in Visual Studio and select Debug Managed Memory

vs_debug_manage_memory

4) You can then sort by Size (bytes) to figure out which object type ends up using the most memory

vs_debug_mm_sort

5) You can then view the instances of that object type by selecting View Instances

vs_debug_mm_view_instances

6) Once you are in the instances view, you can select one of the instances, right click on it and select Quick Watch

vs_debug_mm_quick_watch

7) You can now see the content of the object that’s causing the issue, giving you a better reference point in your code to fix the problem

vs_debug_mm_quick_watch_view

In my case, it turned out that the output of a Write-Output call at the end of a particular function used in my pipeline stayed in memory indefinitely, even though the object had finished being processed by the pipeline. For instance:

Write-Output -InputObject $body"sent"

After piping the output to Out-Null in the high-level script that runs as a non-stop scheduled task, the problem was resolved.

Something worth mentioning: it’s a bit hard to dive into the dump file for a PowerShell process, as it can be difficult to trace back to a particular function since the PowerShell script is instantiated as dynamic code in .NET.

Hopefully this post will help someone in the same situation as I was in!

Storage Spaces Performance Troubleshooting

Over the last few years, well, since Storage Spaces came to light, I have had to investigate performance issues from time to time. The goal of this post is to show you what you might typically look at while troubleshooting performance with Storage Spaces in a scale-out file server setup. I’ll refine this post over time, as there is quite a bit to look at, but it should already give you a good start.

Let’s use a top down approach to troubleshoot an issue by looking at the following statistics in order:

  1. SMB Share
  2. Cluster Shared Volume
  3. If you are using an SSD write back cache, Storage Spaces Write Cache
  4. If you are using a tiered volume, Storage Spaces Tier
  5. Individual SSD/HDD

So the logic is: find which share/CSV is not performing, then start diving into its components by looking at the write cache, the tiers and finally the physical disks composing the problematic volume.

Let’s now dive into each of the high-level categories.

SMB Share
  • Under the SMB Share Perfmon category:
    • Avg. Bytes/Read and Avg. Bytes/Write
      • What is the typical IO size on that share?
    • Avg. Sec/Read and Avg. Sec/Write
      • Is that share showing sign of high latency during IO?
    • Read Bytes/sec and Write Bytes/sec
      • How much data is flowing back and forth on that share?
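If you prefer sampling these counters from PowerShell rather than the Perfmon GUI, Get-Counter works well. The exact counter set name can vary by OS build (on recent servers it typically shows up as "SMB Server Shares"), so list the available sets first if in doubt:

# Confirm the exact SMB counter set names available on this host
Get-Counter -ListSet *SMB* | Select-Object CounterSetName

# Sample share read/write latency every 5 seconds for one minute (set and counter names assumed from above)
Get-Counter -Counter '\SMB Server Shares(*)\Avg. sec/Read','\SMB Server Shares(*)\Avg. sec/Write' -SampleInterval 5 -MaxSamples 12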
Cluster Shared Volume

In order to capture the physical disk statistics for the particular CSV you are after, you will need to look up the disk number of the CSV in the Failover Clustering console (a related PowerShell check is shown after the counter list below). Once you have that, you can look at the following:

  • Physical Disk category
    • Avg. Disk Bytes/Read and Avg. Disk Bytes/Write
      • What is the typical IO size on that disk?
    • Avg. Disk Sec/Read and Avg. Disk Sec/Write
      • Is that disk showing signs of high latency during IO?
    • Disk Read Bytes/sec and Disk Write Bytes/sec
      • How much is being read/written per second on that disk? Is the volume performing as per the baseline already established?
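As an aside on the lookup step above: if you want to confirm from PowerShell which node currently owns a given CSV and which C:\ClusterStorage path it maps to (the disk number itself still comes from the Failover Clustering console), a quick check along these lines can help:

Import-Module FailoverClusters

# List CSVs with their current owner node and mount path
Get-ClusterSharedVolume | Select-Object Name,OwnerNode,@{Name="Path";Expression={$_.SharedVolumeInfo.FriendlyVolumeName}}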

You can also look at CSV specific counters such as:

  • Cluster CSV Volume Manager
    • IO Read Bytes/sec – Redirected/IO Write Bytes/sec – Redirected
      • Is there a lot of redirected IO happening?
    • IO Read – Bytes/sec/IO Write – Bytes/sec
      • How much data is being read/written per second on that CSV?

Microsoft also published a good article about CSV performance monitoring that you should look at: Cluster Shared Volume Performance Counter

Write Back Cache
  • Storage Spaces Write Cache category
    • Cache Write Bytes/sec
      • How much data is being written to the cache?
    • Cache Overwrite Bytes/sec
      • Is data being overwritten prior to being flushed to the final HDD destination?
Storage Spaces Tier
  • Storage Spaces Tier category
    • Avg. Tier Bytes/Read and Avg. Tier Bytes/Write
      • Is the IO size on each tier aligned with the physical sector size of the drives composing it?
    • Avg. Tier Sec/Read and Avg. Tier Sec/Write
      • Are your HDDs thrashing? Are your SSDs experiencing high latency spikes because of background operations on the drive (e.g. garbage collection)?
    • Tier Read Bytes/sec and Tier Write Bytes/sec
      • Is each tier providing the expected throughput?
Individual SSD/HDD

To see some basic latency and reliability metrics for your physical disks, you can use the following PowerShell command:

Get-VirtualDisk -FriendlyName VirtualDisk01 | Get-PhysicalDisk | Get-StorageReliabilityCounter | Select DeviceId,FlushLatencyMax,ReadLatencyMax,WriteLatencyMax,ReadErrorsCorrected,WriteErrorsCorrected,ReadErrorsTotal,WriteErrorsTotal

Just by looking at the output of the command above, you can get a feel for whether a drive is misbehaving, either from the errors/errors corrected columns or simply from the read/write latency metrics. For the latency metrics, note that those are maximums since the last reset or reboot. If you want to get a feel for how the disks perform over time or under a specific load, I would recommend using the Physical Disk metrics in Perfmon. You can use the same Physical Disk counters as we used for the CSV, namely:

  • Physical Disk category
    • Avg. Disk Bytes/Read and Avg. Disk Bytes/Write
    • Avg. Disk Sec/Read and Avg. Disk Sec/Write
    • Disk Read Bytes/sec and Disk Write Bytes/sec

In order to match the disk counter instance found in Perfmon with the StorageReliability counters from the PowerShell command above, simply use the number in the DeviceId column of the output. Make sure you connect Perfmon to the same cluster node where you ran the PowerShell command, as the disk numbers/DeviceIds sometimes do not match exactly between nodes.
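For reference, the same Physical Disk counters can be sampled from PowerShell; the instance names start with the disk number, which is the value you match against DeviceId:

# Sample read/write latency for all physical disks; InstanceName begins with the disk number
Get-Counter -Counter '\PhysicalDisk(*)\Avg. Disk sec/Read','\PhysicalDisk(*)\Avg. Disk sec/Write' -SampleInterval 5 -MaxSamples 12 |
    Select-Object -ExpandProperty CounterSamples |
    Select-Object InstanceName,Path,CookedValue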

By using those Perfmon counters, you can see whether the latency is high simply because the Bytes/Read or Bytes/Write values are quite large, whether your drives are overwhelmed but performing as per their specifications, or whether there is another underlying issue with those drives.