Parsing SharePoint ULS Logs and Visualizing Them in Kibana

In an effort to consolidate our diagnostic data in Elasticsearch and Kibana, one thing that landed on my plate was to figure out a way to load the relevant SharePoint ULS log data into Elasticsearch so it can be searched and visualized in Kibana. The process at a high level is the following:

  1. Get a list of log files to load for each server based on the last incremental load.
  2. Read each log file and exclude lines based on a predefined list of events.
  3. Partition the events based on the timestamp of the log event.
    1. This step is fairly important if you have a large number of events to import into Elasticsearch.
    2. Partitioning the ULS log data by day allows you to simply drop the Elasticsearch index for data that is no longer relevant, with no need to query the index to find documents matching a specific retention period.
    3. For instance, if you want to clean up by month, you can use the following function from libElasticsearch.psm1 to drop the data from December 2016:
      1. Remove-ElasticsearchIndex -serverName <server name> -indexName sharepointulslog-2016.12.*
  4. Create batches of events for each partition to facilitate insertion into Elasticsearch.
  5. Send each batch of events to Elasticsearch.
  6. Once the load is finished, persist the date and time of the last file that was loaded as a checkpoint.
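
To give an idea of what the daily partitioning looks like, here is a minimal sketch (not the actual implementation in libElasticsearch.psm1) that groups parsed log entries into daily buckets matching the sharepointulslog-yyyy.MM.dd index naming:

# Assume $logEntries is an array of hashtables, each with a "timestamp" key
# holding the ULS event time (illustrative structure only).
$dailyPartitions = $logEntries | Group-Object -Property { ([datetime]$_["timestamp"]).ToString("yyyy.MM.dd") }

foreach ($partition in $dailyPartitions)
{
    # Each partition maps to its own daily index, e.g. sharepointulslog-2016.12.01
    $indexName = "sharepointulslog-$($partition.Name)"
    Write-Host "$($partition.Count) events would go to index $indexName"
}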

In order to come up with a good list of events to exclude, I iteratively loaded small batches of log files into Elasticsearch and visualized the number of events per event ID and category. Looking at the events with the most occurrences, I checked whether they would be useful from a troubleshooting standpoint. SharePoint is quite chatty depending on the logging level that is set; there are a lot of events for things like processes starting, running and completing. After a few hours of reviewing the events in our quality assurance environment, I ended up with a list of 271 events in the exclusion list, and that's still a work in progress as more data comes in.
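
If you want to build a similar exclusion list, a quick way to spot the noisiest events is to aggregate what you have already loaded by event ID and category. Here is a small sketch against parsed log entries (the eventid and category field names are assumptions about the parsed structure):

# Count occurrences per event ID/category pair and show the 25 noisiest ones.
$logEntries |
    Group-Object -Property { "$($_["eventid"]) - $($_["category"])" } |
    Sort-Object -Property Count -Descending |
    Select-Object -First 25 -Property Count, Name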

Now let's get into the actual details of running this process. After you have downloaded and extracted the latest release of GEM Automation, the first thing that needs to be done is to populate a file named \SharePoint\SharePointServers.csv. The file is simply a comma-delimited file that contains the list of servers for which you want to collect the logs, along with some additional information. Here's what it looks like:

ComputerName,Environment,ULSLogNetworkBasePath
SPT1001,Quality Assurance,\\SPT1001\c$\Program Files\Common files\Microsoft Shared\Web Server Extensions\14\LOGS\
SPT1002,Quality Assurance,\\SPT1002\c$\Program Files\Common files\Microsoft Shared\Web Server Extensions\14\LOGS\
SPT1003,Production,\\SPT1003\c$\Program Files\Common files\Microsoft Shared\Web Server Extensions\14\LOGS\
SPT1004,Production,\\SPT1004\c$\Program Files\Common files\Microsoft Shared\Web Server Extensions\14\LOGS\
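
For reference, a file like this can be consumed with a plain Import-Csv call. Here is a rough sketch of how the server list and log files can be enumerated (this is not necessarily how Import-SharePointULS.ps1 does it internally):

# Path is relative to the GEM Automation root folder.
$servers = Import-Csv -Path .\SharePoint\SharePointServers.csv

foreach ($server in $servers)
{
    # Enumerate the ULS log files over the administrative share defined in the CSV.
    $logFiles = Get-ChildItem -Path $server.ULSLogNetworkBasePath -Filter *.log
    Write-Host "$($server.ComputerName) ($($server.Environment)): $($logFiles.Count) log files"
}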

Once that's populated, you can change your current directory in PowerShell to .\Elasticsearch. You can now run Import-SharePointULS.ps1 as follows:

.\Import-SharePointULS.ps1 -elasticsearchHostName <server name>

Kicking this off will make the following happen:

  1. For each of the servers, a list of the log files in the ULSLogNetworkBasePath specified in the SharePointServers.csv file is gathered.
  2. All the log files that have not been processed since the last checkpoint, with the exception of the log file currently in use by SharePoint, are then selected for the current execution.
  3. For each of the remaining log files, Parse-SharePointULS.ps1 is called, passing in the environment specified in the SharePointServers.csv file.
    1. The environment name can be used to more easily segment your dev/test data from the production data in Kibana.
  4. Parse-SharePointULS.ps1 then reads the log file line by line and checks whether each line needs to be excluded based on the file SharePoint\SharePointULSLog_Exclusions.txt.
    1. If the line should be imported, a hashtable is built from the parsed columns of the current log line (see the sketch after this list).
  5. Once all the relevant lines have been processed, they are sent to Elasticsearch using the following function from libElasticsearch.psm1:
    1. Add-ElasticsearchDocumentBatch -serverName $elasticsearchHostName -batchSize 1000 -indexName sharepointulslog -documentType ulslogentry -documents $logEntries -partitionKey "timestamp" -indexPartitionType "Daily" -indexDefinitionPath .\IndexDefinitions\SharePointULS.json
      1. A few key things to notice:
        1. The partition key and type are specified to make sure we spread all the event data across daily indexes, which speeds up querying in Elasticsearch. Kibana is able to issue queries against multiple indexes at a time, i.e. sharepointulslog-*.
        2. We need to pass the definition of the index because it will be used to create each of the indexes that partition the data.
        3. Add-ElasticsearchDocumentBatch will then call:
          1. Partition-ElasticsearchDocument to split the ULS events in daily buckets
          2. Create-ElasticsearchDocumentBatch to split the documents in the partition into batches that will be sent to Elasticsearch
          3. If the index for the partition doesn’t exist it will get created at the beginning of the processing of the partition
  6. The process is then repeated for each log file that requires processing.
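
For context, ULS log files are tab-delimited with a fixed set of columns (Timestamp, Process, TID, Area, Category, EventID, Level, Message, Correlation). The hashtable building in step 4 boils down to something like the following sketch, where $line and $environment are assumed inputs and the field names are illustrative rather than the exact ones Parse-SharePointULS.ps1 uses:

# Split one ULS log line on tabs and trim the padding SharePoint adds to each column.
$columns = $line -split "`t" | ForEach-Object { $_.Trim() }

# Build a hashtable for the entry; keys are illustrative.
$logEntry = @{
    timestamp     = $columns[0]   # raw ULS timestamp string
    process       = $columns[1]
    tid           = $columns[2]
    area          = $columns[3]
    category      = $columns[4]
    eventid       = $columns[5]
    level         = $columns[6]
    message       = $columns[7]
    correlationid = $columns[8]
    environment   = $environment  # passed in from SharePointServers.csv
}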

Before you can see the data in Kibana, you’ll need to configure a new index pattern:

[Screenshot: sharepointuls_indexpattern]

Once that's completed, you will be able to create custom searches over that index pattern. Those searches can then be used to build specific visualizations, which in turn are used to build a dashboard like the following (I apologize for the cramped screenshot, I will create a better one soon):

[Screenshot: sharepointuls_dashboard]

You can now slice and dice millions of ULS log events in seconds using Kibana. For example, you could filter your dashboard on the Database event category to find out when SQL Server connectivity issues are happening.

Another interesting aspect of sending that data to Elasticsearch is that it makes it easy to find errors associated with a specific correlation ID. You can simply put correlationid:<correlation id> in the search bar in Kibana and the results will be returned in as little as a few milliseconds.
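
If you would rather hit Elasticsearch directly instead of going through Kibana, the same lookup can be done against the _search endpoint. A minimal sketch, assuming Elasticsearch on the default port 9200, a placeholder host name and an example correlation ID:

# URI search across all daily SharePoint ULS indexes for a given correlation ID.
$correlationId = "b66192a1-3411-46af-a14a-111e253a0d33"   # example value only
$uri = "http://elasticsearch01:9200/sharepointulslog-*/_search?q=correlationid:$correlationId"
Invoke-RestMethod -Method Get -Uri $uri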

If you have any questions about this, let me know in the comments below!

GEM Automation 4.0.0.1 Released

A quick post to let you know that this week I did a couple of releases of GEM Automation that contain a large number of changes, as they include all the commits made to the code repository since May.

At a high level there were changes to the following sections:

  • Active Directory
  • Build Deployment
  • Elasticsearch
  • Infrastructure Testing
  • Kibana
  • Networking
  • Service Bus
  • SharePoint
  • SQL Server
  • Storage Spaces
  • Utilities
  • Windows

For a detailed list of changes, please consult the release notes for builds 4.0.0.0 and 4.0.0.1.

You can download release 4.0.0.1 here; it includes all the latest changes.

Visualize SentryOne Data Using Elasticsearch and Kibana

I've recently worked on consolidating a lot of log data into Elasticsearch and visualizing it using Kibana. One of the sources I wanted to see in Kibana was SQL Sentry (now called SentryOne), to view SQL query performance issues and failures.

In order to achieve this, I built a few PowerShell functions in a new module called libElasticsearch.psm1 in the GEM Automation project. Here’s the process that’s involved at a high level:

  1. Run a query against the SQL Sentry database to collect SQL query events incrementally.
  2. Create bulk indexing calls to Elasticsearch by batching the SQL result set using the bulk API format.
  3. Persist the last time the data was collected to disk for use in the next iteration.
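
For reference, the Elasticsearch bulk API expects newline-delimited JSON where each document is preceded by an action line. A minimal sketch of turning query rows into a bulk payload (this is the general idea only, not the exact code in libElasticsearch.psm1; the host name is a placeholder and $rows is assumed to be a collection of hashtables or PSCustomObjects):

# Build a bulk API payload: one action line plus one document line per row.
$bulkBody = New-Object System.Text.StringBuilder

foreach ($row in $rows)
{
    # The action line tells Elasticsearch which index/type the next document goes to.
    [void]$bulkBody.AppendLine('{"index":{"_index":"sqlsentryevents","_type":"sqlsentryevent"}}')
    # The document line is the row itself serialized as compact JSON.
    [void]$bulkBody.AppendLine(($row | ConvertTo-Json -Compress))
}

# Send the payload to the _bulk endpoint.
Invoke-RestMethod -Method Post -Uri "http://elasticsearch01:9200/_bulk" -Body $bulkBody.ToString() -ContentType "application/x-ndjson"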

Before sending data to Elasticsearch, it's typically better to define the index/object type first. This allows Elasticsearch to properly identify the content of each field and index it accordingly. For the SQL Sentry data that I'm capturing, the index definition is the following:

{
      "mappings": {
         "sqlsentryevent": {
            "properties": {
              "eventype": {
                "type": "keyword"
              },
              "normalizedtextdata": {
                "type": "keyword"
              },
              "database": {
                "type": "keyword"
              },
              "clientcomputer": {
                "type": "keyword"
              },
              "application": {
                "type": "keyword"
              },
              "username": {
                "type": "keyword"
              },
              "duration": {
                "type": "integer"
              },
              "cpu": {
                "type": "integer"
              },
              "read": {
                "type": "integer"
              },
              "writes": {
                "type": "integer"
              },
              "runstatus": {
                "type": "integer"
              },
              "eventtime": {
                "type": "date",
                "format": "basic_date_time",
                "null_value": "19500101T000000.000Z"
              },
              "querytype": {
                "type": "keyword"
              },
              "servername": {
                "type": "keyword"
              }
            }
         }
      }
}

You can find the definition file in SQLSentryEvents.json

To load this definition into Elasticsearch, you can use the following PowerShell code:

Import-Module -Force .\libElasticsearch.psm1

Add-ElasticsearchIndex -serverName <elasticsearch Host Name> -indexName sqlsentryevents -indexDefinition (Get-Content -Path .\IndexDefinitions\SQLSentryEvents.json)
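
Under the hood this amounts to a create-index call with the mappings as the request body. If you wanted to do it without the module, the equivalent raw call would look roughly like this (the host name is a placeholder):

# Read the mapping file as a single string and PUT it to create the index.
$mapping = Get-Content -Raw -Path .\IndexDefinitions\SQLSentryEvents.json
Invoke-RestMethod -Method Put -Uri "http://elasticsearch01:9200/sqlsentryevents" -Body $mapping -ContentType "application/json"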

To capture the SQL Sentry SQL query event data, I wrote the following query:

SELECT tl.RemoteObjectID AS EventType,
th.NormalizedTextData,
tl.[Database],
tl.Computer AS ClientComputer,
tl.Application,
tl.Operator AS UserName,
(tl.Duration/10000000) AS Duration,
tl.CPU,
tl.Reads,
tl.Writes,
tl.RunStatus,
FORMAT(DATEADD(mi, DATEDIFF(mi,GETDATE() ,GETUTCDATE()), tl.StartTime),'yyyyMMdd')+'T'+FORMAT(DATEADD(mi, DATEDIFF(mi, GETDATE(),GETUTCDATE() ), tl.StartTime),'HHmmss')+'.000Z' AS EventTime,
CASE
WHEN th.NormalizedTextData LIKE '%trace_getdata%' THEN 'Monitoring'
WHEN th.NormalizedTextData LIKE '%BACKUP%' THEN 'Maintenance'
WHEN th.NormalizedTextData LIKE '%STATISTICS%' THEN 'Maintenance'
WHEN th.NormalizedTextData LIKE '%INDEX%' THEN 'Maintenance'
WHEN th.NormalizedTextData LIKE '%updatestats%' THEN 'Maintenance'
WHEN th.NormalizedTextData LIKE '%sys.%' THEN 'Monitoring'
WHEN th.NormalizedTextData LIKE '%repl%' THEN 'Replication'
WHEN th.NormalizedTextData LIKE '%sp_server_diagnostics%' THEN 'Monitoring'
WHEN th.NormalizedTextData LIKE '%sp_readrequest%' THEN 'Replication'
WHEN th.NormalizedTextData LIKE '%sp_MSdistribution%' THEN 'Replication'
WHEN th.NormalizedTextData LIKE '%syncobj_%' THEN 'Replication'
WHEN th.NormalizedTextData LIKE '%waitfor delay @waittime%' THEN 'CDC'
ELSE 'Application Query'
END AS QueryType,
esc.ObjectName AS ServerName
FROM dbo.vwMetaHistorySqlServerTraceLog (nolock) tl
INNER JOIN dbo.PerformanceAnalysisTraceHash (nolock) th ON tl.NormalizedTextMD5 = th.NormalizedTextMD5
INNER JOIN EventSource (nolock) es ON tl.EventSourceId = es.ObjectId
INNER JOIN EventSourceConnection (nolock) esc ON es.EventSourceConnectionID = esc.ObjectId
WHERE (esc.ObjectName LIKE 'SRV%') AND
tl.StartTime >= @EventStartTime AND
tl.StartTime < @EventEndTime

I can now call the function that executes the query and converts the results to the bulk API format. Note that you can use this function to send pretty much any SQL Server query result set to Elasticsearch. At this point in time the function is very crude but functional.

Import-Module -Force .\libElasticsearch.psm1

Add-SQLQueryResultAsDocumentBatch -sqlServerName <SQL Server instance with SQLSentry DB> -databaseName SQLSentry -query (Get-Content .\SQLQueries\SQLSentryEvents.sql) -queryParameters @{EventStartTime=(Get-Content .\Checkpoints\<SQL Server instance with SQLSentry DB>_sqlsentryevents_checkpoint.txt);EventEndTime=Get-Date -Format "yyyy-MM-dd HH:mm:ss"} -batchSize 1000 -elasticsearchServerName <elasticsearch Host Name> -indexName sqlsentryevents -documentType sqlsentryevent -trackCheckPoint $true -checkPointParameterName "EventEndTime"

Here are the parameters for the function:
– sqlServerName: The name of the SQL Server instance you wish to connect to in order to execute the query
– databaseName: The name of the database against which the query will be run
– query: The text of the query you wish to run. As this can sometimes be quite a long string of text, I'm putting it in a file, SQLSentryEvents.sql in this case, and loading it using Get-Content
– queryParameters: A hashtable of parameters required to execute the query. For instance, the above query contains two T-SQL parameters, @EventStartTime and @EventEndTime, which are used to apply a moving window to incrementally load the events. Another trick here is that I'm loading the last time the job was run from a checkpoint text file that contains this information.
– batchSize: The maximum number of items that will be sent in one bulk API call to Elasticsearch
– indexName: The name of the index to which the documents/records will be sent
– documentType: The type of documents that will be sent to the index. This is the type that was defined in JSON earlier.
– trackCheckPoint: A flag to make sure you keep the last loading point to allow incremental loads.
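
The checkpoint handling itself is conceptually just reading and rewriting a small text file between runs, along the lines of this sketch (the instance name in the file path is an example, following the naming used in the call above):

$checkpointPath = ".\Checkpoints\SQLPRD01_sqlsentryevents_checkpoint.txt"   # example instance name

# Read the previous end time, or fall back to a far-back default on the first run.
if (Test-Path $checkpointPath) {
    $eventStartTime = Get-Content -Path $checkpointPath
}
else {
    $eventStartTime = "1950-01-01 00:00:00"
}

$eventEndTime = Get-Date -Format "yyyy-MM-dd HH:mm:ss"

# ... run Add-SQLQueryResultAsDocumentBatch with @EventStartTime/@EventEndTime here ...

# Persist the new checkpoint once the load has succeeded.
Set-Content -Path $checkpointPath -Value $eventEndTime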

You can then schedule that call using Windows Task Scheduler for instance and have this run at regular intervals.
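
If you go the Task Scheduler route, the task can be registered from PowerShell with something like the following. This assumes you wrap the Add-SQLQueryResultAsDocumentBatch call above in a small script of your own; the script name, paths and schedule below are examples only:

# Register a daily scheduled task that runs the load script.
$action = New-ScheduledTaskAction -Execute "powershell.exe" `
    -Argument "-NoProfile -ExecutionPolicy Bypass -File C:\GEMAutomation\Elasticsearch\Load-SQLSentryEvents.ps1" `
    -WorkingDirectory "C:\GEMAutomation\Elasticsearch"

$trigger = New-ScheduledTaskTrigger -Daily -At "2:00 AM"

Register-ScheduledTask -TaskName "SQLSentry to Elasticsearch" -Action $action -Trigger $trigger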

With your data now loaded, you can start visualizing it using Kibana.

In order to do this, you do the following:

– Create saved searches in the Discover section using the filtering options you want.
– Build visualizations (the widgets that will compose your dashboard) based on the saved searches, or simply use the whole index as the target.
– Build your dashboard using the visualizations you just built.

Here’s an example of a dashboard I did:

[Screenshot: sqlsentry_kibana_dashboard]

Also note that you can still view the raw data if you need to while you explore your data:

[Screenshot: sqlsentry_kibana_event_details]

That’s it! I hope this will be useful to you. Should you have any questions regarding this, feel free to drop me a comment below.