Parsing SharePoint ULS Logs and Visualizing Them in Kibana

In an effort to consolidate our diagnostic data in Elasticsearch and Kibana, one thing that landed on my plate was to figure out a way to load the relevant SharePoint ULS log data into Elasticsearch so it can be searched and visualized in Kibana. At a high level, the process is the following:

  1. Get a list of log files to load for each server based on the last incremental load.
  2. Read each log file and exclude lines based on a predefined list of events.
  3. Partition the events based on the timestamp of each log event (see the sketch after this list).
    1. This step is fairly important if you have a large number of events to import into Elasticsearch.
    2. Partitioning the ULS log data by day lets you simply drop the Elasticsearch index for data that is no longer relevant. There is no need to query the index to find documents matching a specific retention period.
    3. For instance, if you want to clean up by month, you can use the following function from libElasticsearch.psm1 to drop the data from December 2016:
      1. Remove-ElasticsearchIndex -serverName <server name> -indexName sharepointulslog-2016.12.*
  4. Create batches of events for each partition to facilitate insertion into Elasticsearch.
  5. Send each batch of events to Elasticsearch.
  6. Once the load is finished, persist the date and time of the last file that was loaded as a checkpoint.
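
To illustrate the daily partitioning in step 3, here is a minimal sketch (not the actual GEM Automation code) that buckets parsed log entries by day and derives one index name per bucket; the shape of the entries (hashtables with a timestamp key) and the sharepointulslog naming pattern follow the examples in this post:

# Minimal sketch: group parsed log entries (hashtables with a 'timestamp' key)
# into daily buckets and derive one index name per day, e.g. sharepointulslog-2016.12.25.
$partitions = $logEntries | Group-Object { ([datetime]$_.timestamp).ToString("yyyy.MM.dd") }

foreach ($partition in $partitions)
{
    $indexName = "sharepointulslog-$($partition.Name)"
    # Each $partition.Group would then be split into batches and sent to that daily index.
    Write-Host "$indexName : $($partition.Group.Count) events"
}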

In order to come up with a good list of events to exclude, I iteratively loaded small batches of log files into Elasticsearch and visualized the number of events per Event ID and Category. Looking at the events with the most occurrences, I checked whether they would be useful from a troubleshooting standpoint. SharePoint is quite chatty depending on the logging level that is set; there are a lot of events for things like processes starting, running and completing. After a few hours of reviewing the events in our quality assurance environment, I ended up with 271 events in the exclusion list, and that's still a work in progress as more data comes in.
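
If you want to run the same kind of analysis, a terms aggregation returns those counts quickly. Here is a rough sketch using Invoke-RestMethod; the field names (eventid, category), the index pattern and the default port 9200 are assumptions that may differ from your mapping:

# Hypothetical sketch: count events per Event ID and Category with a terms aggregation.
# Field names, index pattern and port are assumptions; adjust them to your own mapping.
$body = @{
    size = 0
    aggs = @{
        byEventId = @{
            terms = @{ field = "eventid"; size = 50 }
            aggs  = @{ byCategory = @{ terms = @{ field = "category"; size = 10 } } }
        }
    }
} | ConvertTo-Json -Depth 10

Invoke-RestMethod -Method Post -Uri "http://$($elasticsearchHostName):9200/sharepointulslog-*/_search" -ContentType "application/json" -Body $body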

Now let's get into the actual details of running this process. After you have downloaded and extracted the latest release of GEM Automation, the first thing that needs to be done is to populate a file named \SharePoint\SharePointServers.csv. It is simply a comma-delimited file that contains the list of servers for which you want to collect the logs, along with some additional information. Here's what it looks like:

ComputerName,Environment,ULSLogNetworkBasePath
SPT1001,Quality Assurance,\\SPT1001\c$\Program Files\Common files\Microsoft Shared\Web Server Extensions\14\LOGS\
SPT1002,Quality Assurance,\\SPT1002\c$\Program Files\Common files\Microsoft Shared\Web Server Extensions\14\LOGS\
SPT1003,Production,\\SPT1003\c$\Program Files\Common files\Microsoft Shared\Web Server Extensions\14\LOGS\
SPT1004,Production,\\SPT1004\c$\Program Files\Common files\Microsoft Shared\Web Server Extensions\14\LOGS\

Once that's populated, you can change your current directory in PowerShell to .\Elasticsearch. You can now run Import-SharePointULS.ps1 as follows:

.\Import-SharePointULS.ps1 -elasticsearchHostName <server name>

Kicking this off will make the following happen:

  1. For each of the servers, a list of the log files in the ULSLogNetworkBasePath specified in the SharePointServers.csv file is gathered.
  2. Then, all log files that have not been processed since the last checkpoint, with the exception of the log file currently in use by SharePoint, are selected for the current execution.
  3. For each remaining log file, Parse-SharePointULS.ps1 is called, passing the environment specified in the SharePointServers.csv file.
    1. The environment name can be used to more easily segment your dev/test data from the production data in Kibana.
  4. Parse-SharePointULS.ps1 then reads the log file line by line and checks whether the line needs to be excluded based on the file SharePoint\SharePointULSLog_Exclusions.txt.
    1. If the line should be imported, a hashtable is built from the parsed columns of the current line (see the sketch after this list).
  5. Once all the relevant lines have been processed, they are sent to Elasticsearch using the following function from libElasticsearch.psm1:
    1. Add-ElasticsearchDocumentBatch -serverName $elasticsearchHostName -batchSize 1000 -indexName sharepointulslog -documentType ulslogentry -documents $logEntries -partitionKey "timestamp" -indexPartitionType "Daily" -indexDefinitionPath .\IndexDefinitions\SharePointULS.json
      1. A few key things to notice:
        1. The partition key and type are specified to make sure we spread all the event data across daily indexes to speed up querying in Elasticsearch. Kibana is able to issue queries against multiple indexes at a time, e.g. sharepointulslog-*.
        2. We need to pass the index definition, as it is used to create each of the indexes that partition the data.
        3. Add-ElasticsearchDocumentBatch will then call:
          1. Partition-ElasticsearchDocument to split the ULS events into daily buckets.
          2. Create-ElasticsearchDocumentBatch to split the documents in each partition into batches that will be sent to Elasticsearch.
          3. If the index for a partition doesn't exist, it is created at the beginning of the processing of that partition.
  6. The process is then repeated for each log file that requires processing.
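
As a rough illustration of step 4 (not the actual Parse-SharePointULS.ps1 code), here is how a tab-delimited ULS line could be turned into a hashtable while honoring the exclusion list. The column order shown is the usual ULS layout and should be validated against your own files; $ulsLogFile, $exclusionFilePath and $environment are placeholder variables, and the sketch assumes the exclusion file lists Event IDs, one per line:

# Minimal sketch: parse a ULS log file line by line into hashtables, skipping excluded Event IDs.
# Assumes the standard ULS column layout (Timestamp, Process, TID, Area, Category, EventID, Level, Message, Correlation).
$exclusions = Get-Content $exclusionFilePath   # e.g. SharePointULSLog_Exclusions.txt
$logEntries = @()

foreach ($line in (Get-Content $ulsLogFile | Select-Object -Skip 1))
{
    $columns = ($line -split "`t") | ForEach-Object { $_.Trim() }

    if ($exclusions -notcontains $columns[5])
    {
        $logEntries += @{
            timestamp     = $columns[0]
            process       = $columns[1]
            area          = $columns[3]
            category      = $columns[4]
            eventid       = $columns[5]
            level         = $columns[6]
            message       = $columns[7]
            correlationid = $columns[8]
            environment   = $environment
        }
    }
}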

Before you can see the data in Kibana, you’ll need to configure a new index pattern:

sharepointuls_indexpattern

Once that's completed, you will be able to create custom searches over that index pattern. Those searches can then be used to build specific visualizations, which in turn can be used to build a dashboard like the following (I apologize for the cramped screenshot; I will create a better one soon):

sharepointuls_dashboard

You can now slice and dice millions of ULS log events in seconds using Kibana. For example, you could filter your dashboard on the Database event category to find when SQL Server connectivity issues are happening.

Another interesting aspect of sending that data to Elasticsearch is that it facilitates finding errors associated with a specific correlation ID. You can simply put correlationid:<correlation id> in the Kibana search bar and the results come back in as little as a few milliseconds.

If you have any questions about this, let me know in the comments below!

GEM Automation 4.0.0.1 Released

A quick post to let you know that this week I did a couple of releases of GEM Automation that include a large number of changes, covering all the commits made to the code repository since May.

At a high level there were changes to the following sections:

  • Active Directory
  • Build Deployment
  • Elasticsearch
  • Infrastructure Testing
  • Kibana
  • Networking
  • Service Bus
  • SharePoint
  • SQL Server
  • Storage Spaces
  • Utilities
  • Windows

For a detailed list of changes, please consult the release notes for builds 4.0.0.0 and 4.0.0.1.

You can download release 4.0.0.1, which includes all the latest changes, here.

Fixing SharePoint Search Performance with a SQL Server Plan Guide

We’ve been confronted with a performance issue in our SharePoint Server 2010 environment for quite some time and today, I was finally able to implement a workaround to get that resolved.

The problem was happening with a particular type of search that involved querying more than one managed property in a customized search results web part in a search center. Every time that query ran, it took about 25-40 seconds, which was unacceptable. Other SharePoint searches without those managed properties ran just fine. We could never really reproduce the issue in our QA environment, so we were thinking it had to do with the volume of documents being indexed.

While looking at the performance of the SQL Server instance hosting SharePoint for an unrelated reason, using my favorite SQL Server tool, SQL Sentry, I noticed that something was off with the following search query. I then recognized it as our problematic query because of the parameters being passed.

WITH DocIds(DocId, Value) AS
( SELECT TOP(@joinRows) CAST(SUBSTRING(@joinData, ((ord.n-1)*8) + 1, 4) AS INT), CAST(SUBSTRING(@joinData, ((ord.n-1)*8) + 5, 4) AS INT)
FROM dbo.MSSOrdinal AS ord WITH(NOLOCK) WHERE ord.n <= @joinRows )
SELECT P.DocId, P.Pid, P.llVal, P.strVal, P.strVal2, CASE WHEN P.strVal2 IS NULL THEN 0 ELSE DATALENGTH(P.strVal2) END
FROM dbo.MSSDocProps AS P WITH(NOLOCK), DocIds AS T
WHERE P.DocId = T.DocId AND P.Pid IN ( 413, 414, 410, 411 )
ORDER BY T.Value ASC, P.Pid ASC OPTION (MAXDOP 1)

For some reason, the optimizer chose to do a Clustered Index Scan on MSSDocProps, a pretty large table of 5.3M rows in our production environment. This table is located in the PropertyStore database used by the SharePoint search engine. Here's what the execution plan looked like in our production environment:

prod_plan

While digging a bit further into the execution plan, I noticed that one of the steps was estimating 110M rows while the actual count was much lower, in the hundreds. I thought maybe the statistics on the tables involved were out of date, so I updated all of them. I tried the query again: still the same execution plan. I then took the query that ran in production and ran it directly against one of our QA databases. Boom, the query completed in less than a second. While the QA environment was a bit smaller (~4M rows), it was much faster than production, and this on less hardware (2 vCPU vs 5 vCPU in prod). I then compared the plan in QA with the plan from production. Here's what the QA execution plan looks like:

qa_plan

In this plan the optimizer favored an index seek on MSSDocProps, which is what I would have expected for that query. I then started to play around a bit with the query to see how I could change the execution plan. I discovered that by introducing parallelism (changing the MAXDOP 1 hint to MAXDOP 2), the query started to use an index seek, but the plan was still different between QA and production.

I then asked my SharePoint admin to restore the PropertyStore database from production on the QA server to see if the hardware difference could have been a factor. The query ran against the restored production copy with the proper execution plan.

I then started to look into how I could force a seek on the MSSDocProps table, and I remembered the FORCESEEK hint could be used. I introduced the hint in the original query that had the MAXDOP 1 hint, and the query was now using the proper plan in production. Unfortunately, the query itself could not be changed: it comes from SharePoint, I have no control over that code, and modifying the code and/or the schema would leave me in an unsupported state with Microsoft.

I then started to investigate plan guides, which allow you to inject hints (or a whole plan) into a statement that is part of a batch, which was exactly my case. I had to fiddle around with sp_create_plan_guide to figure out how to properly specify the statement, parameters and batch in order to match the incoming query from SharePoint. What I ended up doing was finding the existing plan in the plan cache and creating a plan guide from it using sp_create_plan_guide_from_handle. I was then able to see how the statement, batch and parameters had to be specified to get a proper match, which would ensure that my plan guide gets used. Knowing this, I was able to create the plan guide as follows (and dropped the auto-generated plan guide afterwards):
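
For reference, here is a rough sketch of that workflow; the LIKE pattern used to locate the cached plan is just an assumption based on the query above, and the generated guide is only kept long enough to copy how the batch and parameters are expressed:

-- Sketch: locate the cached plan for the SharePoint search batch (the LIKE pattern is an assumption).
DECLARE @plan_handle VARBINARY(64);

SELECT TOP (1) @plan_handle = cp.plan_handle
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%MSSDocProps%Pid IN ( 413, 414, 410, 411 )%';

-- Generate a plan guide from that handle to see how @stmt, @module_or_batch and @params must look.
EXEC sp_create_plan_guide_from_handle
    @name = N'TempGuideFromCache',
    @plan_handle = @plan_handle;

-- Inspect the generated guide, copy its text, then drop it.
SELECT name, scope_batch, parameters, query_text, hints FROM sys.plan_guides;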


EXEC sp_create_plan_guide
@name = N'SearchManagedPropertiesQuery',
@stmt = N'WITH DocIds(DocId, Value) AS ( SELECT TOP(@joinRows) CAST(SUBSTRING(@joinData, ((ord.n-1)*8) + 1, 4) AS INT), CAST(SUBSTRING(@joinData, ((ord.n-1)*8) + 5, 4) AS INT) FROM dbo.MSSOrdinal AS ord WITH(NOLOCK) WHERE ord.n <= @joinRows ) SELECT P.DocId, P.Pid, P.llVal, P.strVal, P.strVal2, CASE WHEN P.strVal2 IS NULL THEN 0 ELSE DATALENGTH(P.strVal2) END FROM dbo.MSSDocProps AS P WITH(NOLOCK), DocIds AS T WHERE P.DocId = T.DocId AND P.Pid IN ( 413, 414, 410, 411 ) ORDER BY T.Value ASC, P.Pid ASC OPTION (MAXDOP 1) ',
@type = N'SQL',
@module_or_batch = 'SET NOCOUNT ON ; DECLARE @joinRows INT ; SET @joinRows = DATALENGTH(@joinData) / 8 ;;WITH DocIds(DocId, Value) AS ( SELECT TOP(@joinRows) CAST(SUBSTRING(@joinData, ((ord.n-1)*8) + 1, 4) AS INT), CAST(SUBSTRING(@joinData, ((ord.n-1)*8) + 5, 4) AS INT) FROM dbo.MSSOrdinal AS ord WITH(NOLOCK) WHERE ord.n <= @joinRows ) SELECT P.DocId, P.SummaryBlobSize, P.Size, P.LastModified, P.IsDocument, P.IsPictureUrl, P.Author, P.Title, P.Url, P.PictureThumbnailUrl, P.ContentClass, P.FileExtension, P.Tags, P.PropertyBlob, CASE WHEN P.PropertyBlob IS NULL THEN 0 ELSE DATALENGTH(P.PropertyBlob) END, P.PopularSocialTags, P.SiteName, P.Description, P.ParentLink, P.NumberOfMembers, P.PictureHeightAndWidth, P.DisplayDate FROM dbo.MSSDocResults AS P WITH(NOLOCK), DocIds AS T WHERE P.DocId = T.DocId OPTION (MAXDOP 1)  ;WITH DocIds(DocId, Value) AS ( SELECT TOP(@joinRows) CAST(SUBSTRING(@joinData, ((ord.n-1)*8) + 1, 4) AS INT), CAST(SUBSTRING(@joinData, ((ord.n-1)*8) + 5, 4) AS INT) FROM dbo.MSSOrdinal AS ord WITH(NOLOCK) WHERE ord.n <= @joinRows ) SELECT P.DocId, P.Pid, P.llVal, P.strVal, P.strVal2, CASE WHEN P.strVal2 IS NULL THEN 0 ELSE DATALENGTH(P.strVal2) END FROM dbo.MSSDocProps AS P WITH(NOLOCK), DocIds AS T WHERE P.DocId = T.DocId AND P.Pid IN ( 413, 414, 410, 411 ) ORDER BY T.Value ASC, P.Pid ASC OPTION (MAXDOP 1) ',
@params = N'@joinData VARBINARY(MAX)',
@hints = N'OPTION (TABLE HINT( P, FORCESEEK,NOLOCK))';

GO

With that plan guide in place, the search query in SharePoint started to behave as expected. I’m still not sure why the query picked the suboptimal plan in the first place but at least my users should be happy for the time being while hopefully maintaining supportability from Microsoft (still pending a confirmation from Premier on this).
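
If you implement something similar, keep the verification and rollback steps handy; these use the plan guide name created above:

-- Confirm the plan guide exists and is enabled.
SELECT name, is_disabled FROM sys.plan_guides WHERE name = N'SearchManagedPropertiesQuery';

-- To roll back (for an upgrade or a support call), simply drop it.
EXEC sp_control_plan_guide N'DROP', N'SearchManagedPropertiesQuery';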

Let me know if you have questions or comments about this.

2015-12-10 – Update
One thing that I forgot to mention in the original post is that while researching the issue, I noticed that SharePoint 2013 has a property on SPWebApplication called ForceseekEnabled, which probably means there are some cases/conditions where Microsoft thought it might be necessary to force this hint. You can find it in the documentation here. It looks like that property is not there in SharePoint 2010 though.

2015-12-14 – Update
While exchanging a few tweets with @williambaer (Bill Baer), he brought to our attention that this modification is not supported as per KB 841057, which states that no changes whatsoever can be made to the SharePoint database schema (also a special mention to Trevor Seward, @NaupliusTrevor). After discussing with both our SQL Server DBA and SharePoint administrator, we chose to live with an unsupported configuration for the time being. Here are some of the points involved in the thought process:

  • The change can be easily rolled back by dropping the plan guide.
    • We will do this for upgrades and other support calls if needed
  • We feel comfortable with the change done
    • The execution plan that's ultimately used through the plan guide is one that can occur even without the plan guide
    • No change to the locking logic is done
    • Someone thought that the FORCESEEK hint is useful in SharePoint in other cases, which means our solution is not that crazy to begin with
  • Our users are really happy
  • We are not seeing any adverse effect so far
  • If we open a support call that spans more than one technology, we usually end up with one engineer who is a specialist in one (i.e. SharePoint) and quite inexperienced in the other (SQL Server). This means we have to “invest” quite a bit of time explaining how the other technology works to get our point across. We try not to generalize, but this has happened on a few occasions:
    • DPM and SQL Server
    • DPM and Windows Server
    • Hyper-V and Storage Spaces
    • SharePoint and .NET Framework
  • We are nearing the end of our Premier contract term.
    • When an incident spans two contract terms and is considered a bug at the end, only the hours in the current contract are credited. This means we lose quite a few precious Premier hours that could have been put toward better and more proactive work.
  • How much time and effort will that require on our part to prove to Microsoft that this change is required? Even though we get credited the hours on the call, we do not get those hours of our lives back! 😉 If we can skip that whole process and submit the RFC to the product group, we'll gladly do so.

I know it might not be the most politically correct answer to give, but I try to be as transparent as possible. We're all on board with collaborating to get this change into SharePoint through the official channel; it's just that the time (and money) commitment looks a bit too big from where we stand. We can definitely be convinced to go down the other/proper route if those hurdles are somewhat removed.

2015-12-15 – Update

Trevor kindly submitted an RFC to Microsoft on our behalf regarding this issue. Many thanks Trevor!

Thursday Gift: Copying SharePoint Content Databases

If you are like me and can't believe that Microsoft doesn't have a better answer when it comes to copying SharePoint content databases than:

– Build a new farm
– Do a site collection backup/restore (fun times when you have large site collections)
– Do an export/import of the webs (won’t get into that one!)

I now have a solution for you…

Bear in mind that this is NOT SUPPORTED BY MICROSOFT. Use only in dev/QA/lab environments and never in production.

Here’s the process:

– Backup the database using standard SQL Server backup tools (Built-in, DPM, Commvault, etc.)
– Restore the database on your target server using the desired name
– Run the following SQL script:

DECLARE @newSiteId UNIQUEIDENTIFIER,
@oldSiteId UNIQUEIDENTIFIER

DECLARE siteCollections CURSOR FOR
SELECT id
FROM AllSites

OPEN siteCollections

FETCH NEXT FROM siteCollections INTO @oldSiteId

WHILE @@FETCH_STATUS = 0
BEGIN
SET @newSiteId = NEWID()

PRINT 'Old SiteId:' + CAST(@oldSiteId AS VARCHAR(128))
PRINT 'New SiteId: ' + CAST(@newSiteId AS VARCHAR(128))

UPDATE AllSites SET Id = @newSiteId WHERE Id = @oldSiteId
UPDATE AllDocs SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE AllDocStreams SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE AllDocVersions SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE AllLinks SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE AllListUniqueFields SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE AllLookupRelationships SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE AllUserData SET tp_SiteId = @newSiteId WHERE tp_SiteId = @oldSiteId
UPDATE AllUserDataJunctions SET tp_SiteId = @newSiteId WHERE tp_SiteId = @oldSiteId
UPDATE AllWebParts SET tp_SiteId = @newSiteId WHERE tp_SiteId = @oldSiteId
UPDATE AllWebs SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE AuditData SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE BuildDependencies SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE ComMd SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE ContentTypes SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE ContentTypeUsage SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE CustomActions SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE Deps SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE EventCache SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE EventLog SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE EventReceivers SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE Features SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE GroupMembership SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE Groups SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE HT_Cache SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE ImmedSubscriptions SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Albanian_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Arabic_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Azeri_Cyrillic_90_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Azeri_Latin_90_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Chinese_Hong_Kong_Stroke_90_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Chinese_PRC_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Chinese_PRC_Stroke_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Chinese_Taiwan_Bopomofo_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Chinese_Taiwan_Stroke_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Croatian_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Cyrillic_General_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Czech_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Danish_Norwegian_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Divehi_90_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Estonian_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Finnish_Swedish_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_French_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Georgian_Modern_Sort_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_German_PhoneBook_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Greek_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Hebrew_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Hindi_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Hungarian_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Hungarian_Technical_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Icelandic_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Indic_General_90_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Japanese_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Japanese_Unicode_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Kazakh_90_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Korean_Wansung_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Korean_Wansung_Unicode_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Latin1_General_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Latvian_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Lithuanian_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Lithuanian_Classic_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Macedonian_FYROM_90_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Modern_Spanish_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Polish_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Romanian_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Slovak_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Slovenian_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Syriac_90_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Tatar_90_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Thai_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Traditional_Spanish_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Turkish_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Ukrainian_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Uzbek_Latin_90_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NameValuePair_Vietnamese_CI_AS SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE NavNodes SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE Perms SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE Personalization SET tp_SiteId = @newSiteId WHERE tp_SiteId = @oldSiteId
UPDATE RecycleBin SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE RoleAssignment SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE Roles SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE SchedSubscriptions SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE ScheduledWorkItems SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE SiteDeletion SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE SiteQuota SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE SiteVersions SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE SolutionResourceUsageDaily SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE SolutionResourceUsageLog SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE SolutionResourceUsageWindowed SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE Solutions SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE StorageMetrics SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE StorageMetricsChanges SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE UserInfo SET tp_SiteID = @newSiteId WHERE tp_SiteID = @oldSiteId
UPDATE WebPartLists SET tp_SiteId = @newSiteId WHERE tp_SiteId = @oldSiteId
UPDATE Workflow SET SiteId = @newSiteId WHERE SiteId = @oldSiteId
UPDATE WorkflowAssociation SET SiteId = @newSiteId WHERE SiteId = @oldSiteId

FETCH NEXT FROM siteCollections INTO @oldSiteId
END

CLOSE siteCollections
DEALLOCATE siteCollections

– Grant the proper access rights to the DB
– Attach the content database in SharePoint (see the example below)
– Done!
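
For the attach step, the SharePoint Management Shell cmdlet Mount-SPContentDatabase does the job; the database name, server and web application URL below are placeholders:

# Run from the SharePoint Management Shell (or after Add-PSSnapin Microsoft.SharePoint.PowerShell).
# The database name, server and web application URL are placeholders.
Mount-SPContentDatabase -Name "WSS_Content_Copy" -DatabaseServer "SQLQA01" -WebApplication "http://qa-portal"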

We have not performed full regression testing on this and some things may be broken. Use at your own risk.

Enjoy!

File duplicates finder

A month ago I built a small application, along with a set of Microsoft SQL Server Reporting Services reports, that inventories files on a specific path on a file system or in SharePoint and stores each file's hash and location in a table. With this you can identify duplicates between a file server and SharePoint sites, as well as get storage statistics.
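
The core idea is simply to hash every file and group by hash. As a quick illustration (not the actual application), the same kind of duplicate detection can be sketched in PowerShell with Get-FileHash (available in PowerShell 4.0 and later):

# Minimal sketch of the idea: hash files under a path and report groups of identical content.
# "D:\FileShare" is a placeholder path.
Get-ChildItem -Path "D:\FileShare" -Recurse -File |
    Get-FileHash -Algorithm SHA256 |
    Group-Object -Property Hash |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Group | Select-Object Hash, Path }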

I’ll be publishing the solution on CodePlex at the following URL:

 http://www.codeplex.com/fdf

The project was developed in .NET 3.5 using technologies such as Windows Communication Foundation and Parallel FX for multi-threaded file hashing.

Please note that this is an early version and has not been fully tested. I ran both SharePoint and file system scans on about 130,000 files without apparent issues.

 Enjoy!