Distributed Universal Memory for Windows

*Disclaimer* This is only an idea I’ve been toying with. It doesn’t represent in any way, shape or form future Microsoft plans in regards to memory/storage management. This page will evolve over time as the idea is being refined and fleshed out.

**Last Updated 2017-03-23**

The general ideal behind Distributed Universal Memory is to have a common memory management API that would achieve the following:
General
  • Abstract the application from the memory medium required to maintain application state, whether it’s volatile or permanent
  • Allow the application to express memory behavior requirements and not worry about the storage medium to achieve this
  • Support legacy constructs for backward compatibility
  • Enable new capabilities for legacy applications without code change
  • Give modern applications a simple surface to persist data
  • Enables scale out applications to use potentially a single address space
  • Could potentially move towards a more microservice based approach instead of the current monolithic code base
  • Could easily leverage advances in hardware development such as disaggregation of compute and memory, usage of specialized hardware such FPGAs or GPUs to accelerate certain memory handling operations
  • Could be ported/backported to further increase the reach/integration capabilities. This memory management subsystem to could be cleanly decoupled from the underlying operating system.
  • Allow the data to be optimally placed for performance, availability and ultimately cost

Availability Management

  • Process memory can be replicated either systematically or on demand
  • This would allow existing process memory to be migrated from one operating system instance to another transparently.
  • This could offer higher resiliency to process execution in the event of an host failure
  • This could also allow some OS components to be updated while higher level processes keep going. (i.e. redirected memory IO)
Performance Management
  • Required medium to achieve performance could be selected automatically using different mechanisms (MRU/LRU, machine learning, etc.)
  • Memory performance can be expressed explicitly by the application
    • By expressing its need, it would be easier to characterize/model/size the required system to support the application
    • Modern applications could easily specify how each piece of data it interacts with should be performing
  • Could provide multiple copies of the same data element for compute locality purposes. i.e. Distributed read cache
    • This distributed read-cache could be shared between client and server processes if desired. This would enable to have a single cache mechanism independently of the client/process accessing it.
Capacity Management
  • Can adjust capacity management techniques depending on performance and availability requirements
  • For instance, if data is rarely used by the application, several data reduction techniques could be applied such as deduplication, compression and/or erasure coding
  • If data access time doesn’t require redundancy/locality/tolerates time for RDMA, it could be spread evenly across the Distributed Universal Memory Fabric

High Level Cluster View

Components

Here’s an high level diagram of what it might look like:

Let’s go over some of the main components.

Data Access Manager

The Data Access Manager is the primary interface layer to access data. The legacy API would sit on top of this layer in order to properly abstract the underlying subsystems in play.

  • Transport Manager
    • This subsystem is responsible to push/pull the data on the remote host. All inter-node data transfers would occur over RDMA to minimize the overhead of copying data back and forth between nodes.
  • Addressing Manager
    • This would be responsible to give a universal memory address for the data that’s independent of storage medium and cluster nodes.

Data Availability Manager

This component would be responsible to ensure the proper level of data availability and resiliency are enforced as per defined policies in the system. It would be made of the following subsystems:

  • Availability Service Level Manager
    • The Availability Service Level Manager’s responsibility to to ensure the overall availability of data. For instance, it would act as the orchestrator responsible to trigger the replication manager to ensure the data is meeting its availability objective.
  • Replication Manager
    • The Replication Manager is responsible to enforce the right level of data redundancy across local and remote memory/storage devices. For instance, if 3 copies of the data must be maintained for the data of a particular process/service/file/etc. across 3 different failure domains, the Replication Manager is responsible of ensuring this is the case as per the policy defined for the application/data.
  • Data History Manager
    • This subsystem ensure that the appropriate point in time copies of the data are maintained. Those data copies could be maintained in the system itself by using the appropriate storage medium or they could be handed of to a thrid party process if necessary (i.e. standard backup solution). The API would provide a standard way for data recovery operations.

Data Capacity Manager

The Data Capacity Manager is responsible to ensure enough capacity of the appropriate memory/storage type is available for applciations and also for applying the right capacity optimization techniques to optimize the physical storage capacity available. The following methods could be used:

  • Compression
  • Deduplication
  • Erasure Coding

Data Performance Manager

The Data Performance Manager is responsible to ensure that each application can access each piece of data at the appropriate performance level. This is accomplished using the following subsystems:

  • Latency Manager
    • This is responsible to place the data on the right medium to ensure that each data element can be accessed at the right latency level. This can be determined either by pre-defined policy or by heuristic/machine learning to detect data access pattern beyond LRU/MRU methods.
    • The Latency Manager could also monitor if a local process tends to access data that’s mostly remote. If that’s the case, instead of generally incurring the network access penalty, the process could simply be moved to the remote host for better performance through data locality.
  • Service Level Manager
    • The Service Level Manager is responsible to manage the various applications expectations in regards to performance.
    • The Service Level Manager could optimize data persistence in order to meet its objective. For example, if the local non-volatile storage response time is unacceptable, it could choose to persist the data remotely and then trigger the Replication Manager to bring a copy of the data back locally.
  • Data Variation Manager
    • A subsystem could be conceived to persist a tranformed state of the data. For example, if there’s an aggregation on a dataset, it could be persisted and linked to the original data. If the original data changes the dependent aggregation variations could either be invalidated or updated as needed.

Data Security Manager

  • Access Control Manager
    • This would create hard security boundary between processes and ensure only authorized access is being granted, independently of the storage mechanism/medium.
  • Encryption Manager
    • This would be responsible for the encryption of the data if required as per a defined security policy.
  • Auditing Manager
    • This would audit data access as per a specific security policy. The events could be forwarded to a centralized logging solution for further analysis and event correlation.
    • Data accesses could be logged in an highly optimized graph database to allow:
      • Build a map of what data is accessed by processes
      • Build a temporal map of how the processes access data
  • Malware Prevention Manager
    • Data access patterns can be detected in-line by this subsystem. For instance, it could notice that a process is trying to access credit card number data based on things like regex for instance. Third-party anti-virus solutions would also be able to extend the functionality at that layer.

Legacy Construct Emulator

The goal of the Legacy Construct Emulator to is to provide to legacy/existing applications the same storage constructs they are using at this point in time to ensure backward compatibility. Here are a few examples of constructs that would be emulated under the Distributed Universal Memory model:

  • Block Emulator
    • To emulate the simplest construct to simulator the higher level construct of the disk emulator
  • Disk Emulator
    • Based on the on the block emulator, simulates the communication interface of a disk device
  • File Emulator
    • For the file emulator, it could work in a couple of ways.
      • If the application only needs to have a file handle to perform IO and is fairly agnostic of the underlying file system, the application could simply get a file handle it can perform IO on.
      • Otherwise, it could get that through the file system that’s layered on top of a volume that makes use of the disk emulator.
  • Volatile Memory Emulator
    • The goal would be to provide the necessary construct to the OS/application to store it’s state data that’s might be typically stored in RAM.

One of the key thing to note here is that even though all those legacy constructs are provided, the Distributed Universal memory model has the flexibility to persist the data as it sees fit. For instance, even though the application might think it’s persisting data to volatile memory, the data might be persisted to an NVMe device in practice. Same principle would apply for file data; a file block might actually be persisted to RAM (similar a block cache) that’s then being replicated to multiple nodes synchronously to ensure availability, all of this potentially without the application being aware of it.

Metrics Manager

The metrics manager is to capture/log/forward all data points in the system. Here’s an idea:

  • Availability Metrics
    • Replication latency for synchronous replication
    • Asynchronous backlog size
  • Capacity Metrics
    • Capacity used/free
    • Deduplication and compression ratios
    • Capacity optimization strategy overhead
  • Performance Metrics
    • Latency
    • Throughput (IOPS, Bytes/second, etc.)
    • Bandwidth consumed
    • IO Type Ratio (Read/Write)
    • Latency penalty due to SLA per application/process
  • Reliability Metrics
    • Device error rate
    • Operation error rate
  • Security Metrics
    • Encryption overhead

High Level Memory Allocation Process

More details coming soon.

Potential Applications

  • Application high availability
    • You could decide to synchronously replicate a process memory to another host and simply start the application binary on the failover host in the event where the primary host fails
  • Bring server cached data closer to the client
    • One could maintain a distributed coherent cache between servers and client computers
  • Move processes closer to data
    • Instead of having a process try to access data accross the network, why not move the process to where the data is?
  • User State Mobility
    • User State Migration
      • A user state could move freely between a laptop, a desktop and a server (VDI or session host) depending on what the user requires.
    • Remote Desktop Service Session Live Migration
      • As the user session state memory is essentially virtualized from the host executing the session, it can be freely moved from one host to another to allow zero impact RDS Session Host maintenance.
  • Decouple OS maintenance/upgrades from the application
    • For instance, when the OS needs to be patched, one could simply move the process memory and execution to another host. This would avoid penalties such as buffer cache rebuilds in SQL Server for instance which can trigger a high number of IOPS on a disk subsystem in order to repopulate the cache based on popular data. For systems with an large amount of memory, this can be fairly problematic.
  • Have memory/storage that spans to the cloud transparently
    • Under this model it would be fairly straightforward to implement a cloud tier for cold data
  • Option to preserve application state on application upgrades/patches
    • One could swap the binaries to run the process while maintaining process state in memory
  • Provide object storage
    • One could layer object storage service on top of this to support Amazon S3/Azure Storage semantics. This could be implemented on top of the native API if desired.
  • Provide distributed cache
    • One could layer distributed cache mechanisms such as Redis using the native Distributed Universal Memory API to facilitate porting of applications to this new mechanism
  • Facilitate application scale out
    • For instance, one could envision a SQL Server instance to be scaled out using this mechanism by spreading worker threads across multiple hosts that share a common coordinated address space.
  • More to come…
Advertisements

AMD Naples – More than an Intel challenger for Storage Spaces Direct?

With the recent announce of the new AMD “Napples” processor, a few things have changed in regards to options for Storage Spaces Direct. Let’s have a look to see what’s this new CPU is about.

AMD-Naples-Zen-CPU-14--pcgh.pngA few key points:

  • Between 16/32 threads and 32 cores/64 threads per socket or up to 64 cores/128 threads in a 2 socket server
    • Intel Skylake is “only” expected to have 28 cores per socket (** Update 2017-03-19 ** There are now rumors of 32 cores Skylake E5 v5 CPUs)
  • 2TB of RAM per socket
  • 8 channel DDR4
    • Bandwidth is expected to be in the 170GB/s range
    • Intel Skylake is expected to only have 6 channel memory
  • 128 PCIe 3.0 lanes PER socket
    • In 2 sockets configuration, it’s “only” 64 lanes that will be available as the other 64 are used for socket to socket transport
    • In other words for S2D, this means a single socket can properly support 2 x 100GbE ports AND 24 NVMe drives without any sorcery like PCIe switches in between
    • That’s roughly 126GB/s of PCIe bandwidth, not too shabby

Here’s an example of what it looks like in the flesh:

AMD-Naples-Speedway-Internal

With that kind of horse power, you might be able to start thinking about having a few million IOPS per S2D node if Microsoft can manage to scale up to that level. Scale that out to the supported 16 nodes in a cluster and now we have a party going! Personally, I think  going with a single socket configuration with 32 cores would be fine sizing/configuration for S2D. It would also give you a server failure domain that’s reasonable. Furthermore, from a licensing standpoint, a 64 cores Datacenter Edition server is rather pricey to say the least… You might want to go with a variant with less cores if your workload allows it. The IO balance being provided by this new AMD CPU is much better than what’s being provided by Intel at this point in time. That may change if Intel decides to go with PCIe 4.0 but it doesn’t look like we’ll see this any time soon.

If VDI/RDS SH is your thing, perhaps taking advantage of those extra PCIe lanes for GPUs will be a nice advantage. Top that with a crazy core/thread count and you would be able to drive some pretty demanding user workload without overcommitting too much your CPU and while also having access to tons of memory.

I’ll definitely take a look at AMD systems when Naples is coming out later this year. A little competition in the server CPU market is long overdue! Hopefully AMD will price this one right and reliability will be what we expect for a server. Since it’s a new CPU architecture, it might take a little while before software manufacturers support and optimize for this chip. With the right demand from customer, that might accelerate the process!

 

 

 

 

 

Parsing SharePoint ULS Logs and Visualize Them in Kibana

In an effort to consolidate our diagnostic data in Elasticsearch and Kibana, one thing that came on my plate was to figure out a way to load the relevant SharePoint ULS log data in Elasticsearch to search and visualize it in Kibana. The process at a high level is the following:

  1. Get a list of log files to load for each server based on the last incremental load.
  2. Read each log files and exclude lines based on a predefined list of events
  3. Partition the events based on the timestamp of the log event
    1. This process is fairly important if you have a large number of events to be imported in Elasticsearch.
    2. Partitioning the ULS log data by day will allow you simply drop the index in Elasticsearch for the data that is not relevant anymore. No need to query the index to find documents matching a specific retention period.
    3. For instance, if you want to cleanup by month, you can use the following function from libElasticsearch.psm1 to drop the data from December 2016:
      1. Remove-ElasticsearchIndex -serverName -indexName sharepointulslog-2016.12.*
  4. Create batches of events for each partition to facilitate the insertion in Elasticsearch
  5. Send each batch of events to Elasticsearch
  6. Once the load is finished, persist the date and time of the last file that was loaded as a checkpoint

In order to come up with a good list of events to be excluded, I iteratively loaded small batches of log files in Elasticsearch and visualized the number of event per event Id and Category. Looking at each of the events with the most occurences, I checked whether those events would be useful from a troubleshooting standpoint. SharePoint is quite chatty depending the the logging level that is set. There are a lot of events for things like process starting, running and completing. After a few hours of reviewing the events in our quality assurance environment, I ended up with a list of 271 events in the exclusion list and that’s still a work in progress as more data is coming in.

Now let’s get into the actual details of running this process.After you have downloaded and extracted the latest release of GEM Automation,the first thing that needs to be done is to populate a file named \SharePoint\SharePointServers.csv . The file is simply a comma delimited file that contains the list of servers for which you want to collect the logs along some additional information. Here’s what it looks like

ComputerName,Environment,ULSLogNetworkBasePath
SPT1001,Quality Assurance,\\SPT1001\c$\Program Files\Common files\Microsoft Shared\Web Server Extensions\14\LOGS\
SPT1002,Quality Assurance,\\SPT1002\c$\Program Files\Common files\Microsoft Shared\Web Server Extensions\14Program Files\Common files\Microsoft Shared\Web Server Extensions\14\LOGS\
SPT1003,Production,\\SPT1003\c$\Program Files\Common files\Microsoft Shared\Web Server Extensions\14Program Files\Common files\Microsoft Shared\Web Server Extensions\14\LOGS\
SPT1004,Production,\\SPT1004\c$\Program Files\Common files\Microsoft Shared\Web Server Extensions\14Program Files\Common files\Microsoft Shared\Web Server Extensions\14\LOGS\

Once that’s populated, you can change your current directory in PowerShell to .\Elasticsearch. You can now run Import-SharePointULS.ps1 as following:

.\Import-SharePointULS.ps1 -elasticsearchHostName <server name>

Kicking this off will make the following happen:

  1. For each of the servers, a list of the log files in the specified ULSLogNetworkBasePath from the SharePointServers.csv file is gathered
  2. Then, all the log files that have not been processed since the last checkpoint with the exception of the log file currently in use by SharePoint will be used for the current execution
  3. For each of the log file remaining, Parse-SharePointULS.ps1 will be called by passing the environment specified in the SharePointServers.csv file
    1. The environment name can be used to more easily segment your dev/test data from the production data in Kibana.
  4. Parse-SharePointULS.ps1 will then read the log file line by line and check if the line needs to be excluded base on the file SharePoint\SharePointULSLog_Exclusions.txt
    1. If the line should be imported, an hashtable will be built with the parse columns from the log currently parsed line
  5. Once all the relevant lines have been processed, they can now be sent to elasticsearch using the following function from libElasticsearch.psm1
    1. Add-ElasticSearchDocumentBatch -serverName $elasticsearchHostName -batchSize 1000 -indexName sharepointulslog -documentType ulslogentry -documents $logEntries -partitionKey “timestamp” -indexPartitionType “Daily” -indexDefinitionPath .\IndexDefinitions\SharePointULS.json
      1. A few key things to notice.
        1. The partition key and type are specified to make sure we spread all the event data in daily indexes to speed up querying in elasticsearch. Kibana is able to issue queries against multiple indexes at a time i.e. sharepointulslog-*.
        2. We need pass the definition of the index as it will be used to create each of the index that will partition the data.
        3. Add-ElasticsearchDocumentBatch will then call:
          1. Partition-ElasticsearchDocument to split the ULS events in daily buckets
          2. Create-ElasticsearchDocumentBatch to split the documents in the partition into batches that will be sent to Elasticsearch
          3. If the index for the partition doesn’t exist it will get created at the beginning of the processing of the partition
  6. The process is then repeated for each log files that requires processing

Before you can see the data in Kibana, you’ll need to configure a new index pattern:

sharepointuls_indexpattern

Once that’s completed, you will now be able to create custom searches over that index pattern. Those searches can then be used to build specific visualization which in turn will be used to build a dashboard like the following (I apologize from the cramped screenshot, will create a better one soon):

sharepointuls_dashboard

You can now slice and dice millions of ULS log events in seconds using Kibana. For example, you could filter out your dashboard based on the Database event category to find when SQL Server connectivity issues are happening.

Another interesting aspect of sending that data in Elasticsearch is that will facilitate finding errors associated with a specific correlation ID. You can simply put correlationid:<correlation id> in the search bar in Kibana and the results will be returned as quickly as a few milliseconds.

If you have any questions about this, let me know in the comments below!

GEM Automation 4.0.0.1 Released

A quick post to let you know that this week I did a couple of releases for GEM Automation that include a large number of changes as it includes all the commits that were done to the code repository since May.

At a high level there were changes to the following sections:

  • Active Directory
  • Build Deployment
  • Elasticsearch
  • Infrastructure Testing
  • Kibana
  • Networking
  • Service Bus
  • SharePoint
  • SQL Server
  • Storage Spaces
  • Utilities
  • Windows

For a detailed list of changes, please consult the release notes for build 4.0.0.0 and 4.0.0.1.

You can download the release 4.0.0.1 here which includes all the latest changes.

PowerShell Performance Tip – Use JavaScriptSerializer instead of ConvertTo-Json

While working on a bulk API function that sends a lot of  JSON data to elasticsearch, I noticed that using the native .NET Framework object System.Web.Script.Serialization.JavaScriptSerializer yields huge performance gains versus using the ConvertTo-Json cmdlet.

Here’s a quick test that compares both methods:

Add-Type -AssemblyName System.Web.Extensions
$jsonSerializer=New-Object -TypeName System.Web.Script.Serialization.JavaScriptSerializer
measure-command{for($i=0;$i -lt 10000;$i++){$null=$jsonSerializer.Serialize("test");}}

Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 127
Ticks : 1272587
TotalDays : 1.47290162037037E-06
TotalHours : 3.53496388888889E-05
TotalMinutes : 0.00212097833333333
TotalSeconds : 0.1272587
TotalMilliseconds : 127.2587

measure-command {for($i=0;$i -lt 10000;$i++){$null="test"| ConvertTo-Json;}}

Days : 0
Hours : 0
Minutes : 0
Seconds : 3
Milliseconds : 874
Ticks : 38746129
TotalDays : 4.4845056712963E-05
TotalHours : 0.00107628136111111
TotalMinutes : 0.0645768816666667
TotalSeconds : 3.8746129
TotalMilliseconds : 3874.6129

As you can see, the performance gain is rather significant! When you need to send millions of JSON object to a web service, that makes quite a difference.

Visualize SentryOne Data Using Elasticsearch and Kibana

I’ve recently worked on consolidating a lot of log data into Elasticsearch and visualize it using Kibana. One of the source I wanted to see in Kibana was SQL Sentry (now called SentryOne) to view SQL queries performance issues and failures.

In order to achieve this, I built a few PowerShell functions in a new module called libElasticsearch.psm1 in the GEM Automation project. Here’s the process that’s involved at a high level:

  1. Run query against the SQL Sentry database to collect SQL query events incrementally
  2. Create bulk indexing calls to elasticsearch by batching the SQL result set using the bulk API format
  3. Persist the last time the data was collected to disk for use in the next iteration

Before sending data to elasticsearch, it’s typically better to define the index/object type. This will allow elasticsearch to properly identify the content of each field and index it accordingly. For the SQL Sentry data that I’m capturing, the index definition is the following:

{
      "mappings": {
         "sqlsentryevent": {
            "properties": {
              "eventype": {
                "type": "keyword"
              },
              "normalizedtextdata": {
                "type": "keyword"
              },
              "database": {
                "type": "keyword"
              },
              "clientcomputer": {
                "type": "keyword"
              },
              "application": {
                "type": "keyword"
              },
              "username": {
                "type": "keyword"
              },
              "duration": {
                "type": "integer"
              },
              "cpu": {
                "type": "integer"
              },
              "read": {
                "type": "integer"
              },
              "writes": {
                "type": "integer"
              },
              "runstatus": {
                "type": "integer"
              },
              "eventtime": {
                "type": "date",
                "format": "basic_date_time",
                "null_value": "19500101T000000.000Z"
              },
              "querytype": {
                "type": "keyword"
              },
              "servername": {
                "type": "keyword"
              }
            }
         }
      }
}

You can find the definition file in SQLSentryEvents.json

To load this definition in elasticsearch, you can use the following PowerShell code:

Import-Module -Force .\libElasticsearch.psm1

Add-ElasticsearchIndex -serverName <elasticsearch Host Name> -indexName sqlsentryevents -indexDefinition (Get-Content -Path .\IndexDefinitions\SQLSentryEvents.json)

To capture the SQL Sentry SQL query event data, I wrote the following query:

SELECT tl.RemoteObjectID AS EventType,
th.NormalizedTextData,
tl.[Database],
tl.Computer AS ClientComputer,
tl.Application,
tl.Operator AS UserName,
(tl.Duration/10000000) AS Duration,
tl.CPU,
tl.Reads,
tl.Writes,
tl.RunStatus,
FORMAT(DATEADD(mi, DATEDIFF(mi,GETDATE() ,GETUTCDATE()), tl.StartTime),'yyyyMMdd')+'T'+FORMAT(DATEADD(mi, DATEDIFF(mi, GETDATE(),GETUTCDATE() ), tl.StartTime),'HHmmss')+'.000Z' AS EventTime,
CASE
WHEN th.NormalizedTextData LIKE '%trace_getdata%' THEN 'Monitoring'
WHEN th.NormalizedTextData LIKE '%BACKUP%' THEN 'Maintenance'
WHEN th.NormalizedTextData LIKE '%STATISTICS%' THEN 'Maintenance'
WHEN th.NormalizedTextData LIKE '%INDEX%' THEN 'Maintenance'
WHEN th.NormalizedTextData LIKE '%updatestats%' THEN 'Maintenance'
WHEN th.NormalizedTextData LIKE '%sys.%' THEN 'Monitoring'
WHEN th.NormalizedTextData LIKE '%repl%' THEN 'Replication'
WHEN th.NormalizedTextData LIKE '%sp_server_diagnostics%' THEN 'Monitoring'
WHEN th.NormalizedTextData LIKE '%sp_readrequest%' THEN 'Replication'
WHEN th.NormalizedTextData LIKE '%sp_MSdistribution%' THEN 'Replication'
WHEN th.NormalizedTextData LIKE '%syncobj_%' THEN 'Replication'
WHEN th.NormalizedTextData LIKE '%waitfor delay @waittime%' THEN 'CDC'
ELSE 'Application Query'
END AS QueryType,
esc.ObjectName AS ServerName
FROM dbo.vwMetaHistorySqlServerTraceLog (nolock) tl
INNER JOIN dbo.PerformanceAnalysisTraceHash (nolock) th ON tl.NormalizedTextMD5 = th.NormalizedTextMD5
INNER JOIN EventSource (nolock) es ON tl.EventSourceId = es.ObjectId
INNER JOIN EventSourceConnection (nolock) esc ON es.EventSourceConnectionID = esc.ObjectId
WHERE (esc.ObjectName LIKE 'SRV%') AND
tl.StartTime >= @EventStartTime AND
tl.StartTime < @EventEndTime

I can now call the function that does the query execution and converts it the bulk API format. Note that you can use this function to send pretty much any SQL Server query results to Elasticsearch. At this point in time the function is very crude but functional.

Import-Module -Force .\libElasticsearch.psm1

Add-SQLQueryResultAsDocumentBatch -sqlServerName <SQL Server instance with SQLSentry DB> -databaseName SQLSentry -query (Get-Content .\SQLQueries\SQLSentryEvents.sql) -queryParameters @{EventStartTime=(Get-Content .\Checkpoints\<SQL Server instance with SQLSentry DB>_sqlsentryevents_checkpoint.txt);EventEndTime=Get-Date -Format "yyyy-MM-dd HH:mm:ss"} -batchSize 1000 -elasticsearchServerName <elasticsearch Host Name> -indexName sqlsentryevents -documentType sqlsentryevent -trackCheckPoint $true -checkPointParameterName "EventEndTime"

Here are the parameters for the function:
– sqlServerName: This is the name of the SQL Server instance you wish to connect to execute the query
– databaseName: The name of the database against which the query will be run
– query: The query text of what you wish to run. As sometimes this can be quite a long string of text, I’m putting it in a file called SQLSentryEvents.sql in this case and I load it using Get-Content
– queryParameters: This contains a hashtable of parameters that are required to execute the query. For instance, the above query contains two T-SQL parameters, @EventStartTime and @EventEndTime. Those are used to apply our moving window to incrementally load the events. Another trick in this is that I’m loading the last time the job was run by loading a checkpointing text file that contains this information.
– batchSize: The maximum number of items that will be sent in one bulk API call to elasticsearch
– indexName: The name of the index in which the documents/records will be sent
– documentType: The type of documents that will be sent in the index. This is the type that was defined in JSON earlier.
– trackCheckPoint: This is just a flag to make sure you keep the last loading point to allow incremental loads.

You can then schedule that call using Windows Task Scheduler for instance and have this run at regular intervals.

With you data now loaded, you can then start visualizing this using Kibana.

In order to do this, you do the following:

– Create saved searches using the Discover section using the filtering options you want.
– Build visualizations (those are widgets that will compose your dashboard) based on the saved searches or you can also just use the whole index as the target
– Build your dashboard using the visualizations you just built

Here’s an example of a dashboard I did:

sqlsentry_kibana_dashboard

Also note that you can still view the raw data if you need to while you explore your data:

sqlsentry_kibana_event_details

That’s it! I hope this will be useful to you. Should you have any questions regarding this, feel free to drop me a comment below.

PowerShell Performance Tip – Concatenating Strings Using StringBuilder

While doing performance tuning on a script that did a lot of string concatenation, I switched from using a traditional string concatenation using += to using the .NET object System.Text.StringBuilder.

Here’s a comparison of the speed of both methods:

measure-command {$body=New-Object -TypeName System.Text.StringBuilder;for($i=0;$i -lt 100000;$i++){$null=$body.Append("test")}}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 162
Ticks             : 1620291
TotalDays         : 1.87533680555556E-06
TotalHours        : 4.50080833333333E-05
TotalMinutes      : 0.002700485
TotalSeconds      : 0.1620291
TotalMilliseconds : 162.0291

measure-command {$body="";for($i=0;$i -lt 100000;$i++){$body+="test"}}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 48
Milliseconds      : 584
Ticks             : 485840018
TotalDays         : 0.000562314835648148
TotalHours        : 0.0134955560555556
TotalMinutes      : 0.809733363333333
TotalSeconds      : 48.5840018
TotalMilliseconds : 48584.0018