Azure VM Sizing – An Automated Approach

In the following post, I will explain the approach I’ve used to estimate the costs of running a few hundred VMs in Azure IaaS. As manually sizing each individual VM would take quite some time, I preferred to go with an automated approach.

At a high level, here are the steps I’ve taken to achieve this.

  1. Collect performance metrics for the VM candidates over a significant period of time
  2. Capture Azure VM pricing information
  3. Capture Azure VM characteristics
  4. Select appropriate VM size
  5. Calculate VM operating hours
  6. Determine VM pricing strategy
  7. Generate VM storage configuration

Collect Performance Metrics

For this step, I’ve opted to use the Hyper-V metering data I was already collecting for all of the VMs running on premises. Alternatively, one could also use data coming from perfmon, but that would require some extra data preparation steps to be usable in the VM sizing exercise. I’ve covered the basics of the scripts in this other blog post if you’re interested: Hyper-V Resource Metering and Cloud Costing Benchmarking

In this data set, I’m collecting a few data elements that are critical for roughly determining the VM size:

  • CPU Utilization (actual MHz consumed)
  • Memory Utilization
  • Total Disk IOPS
  • Total Disk Capacity

I could have opted to include network utilization but decided to set it aside for the time being, as my workload is not network I/O bound. Here’s what the raw data used for the sizing exercise looks like:
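To make the structure clearer, here’s a rough sketch of the hourly statistics table I’m querying. The column names come from the query shown near the end of this post; the data types and units are assumptions on my part:

CREATE TABLE dbo.VirtualMachineNormalizedHourlyStatistics
(
    VMName                                 NVARCHAR(128) NOT NULL, -- VM identity
    ClusterName                            NVARCHAR(128) NOT NULL, -- Hyper-V cluster hosting the VM
    SampleTime                             DATETIME2     NOT NULL, -- hour covered by the sample
    MaximumProcessorUsage                  INT           NULL,     -- peak CPU consumed, in MHz
    MaximumMemoryUsage                     INT           NULL,     -- peak memory consumed (MB, assumed)
    MaximumAggregatedAverageNormalizedIOPS INT           NULL,     -- peak disk IOPS
    AverageAggregatedAverageNormalizedIOPS INT           NULL,     -- average disk IOPS
    MaximumTotalDiskAllocation             INT           NULL      -- total disk capacity allocated (GB, assumed)
);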

Capture Azure Pricing Information

For this part, I’m using a couple of price lists as the basis for the automated analysis. Here are the two main sources of information:

Based on that data, I’ve extracted pricing information for all the VM sizes in the particular region I was interested in. I took the CSV files and loaded them into a SQL Server table in the same database that contains all my Hyper-V metering data.
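To give an idea of that load step, here’s a minimal T-SQL sketch. The table name, columns and file path are placeholders I made up for illustration; they are not the exact structure I ended up with:

-- Hypothetical staging table for the extracted Azure VM price list
CREATE TABLE dbo.AzureVMPricing
(
    SkuName     NVARCHAR(128) NOT NULL, -- Azure VM size name
    Region      NVARCHAR(64)  NOT NULL,
    PriceType   NVARCHAR(32)  NOT NULL, -- e.g. Dev/Test vs Production pricing
    OSIncluded  NVARCHAR(32)  NOT NULL, -- flags SKUs that bundle a Windows license
    HourlyPrice DECIMAL(10,4) NOT NULL
);

-- Load the CSV extracted from the price list (path is a placeholder)
BULK INSERT dbo.AzureVMPricing
FROM 'C:\PriceLists\AzureVMPricing.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');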

Capture Azure VM Characteristics

Once you have the basic pricing information loaded and processed, the next step is to capture the actual sizing information for the Azure VM sizes. To do that, I used the page Sizes for Windows virtual machines in Azure to gather the following key pieces of information about each VM configuration:

  • vCPU count
  • RAM
  • Maximum IOPS
  • Maximum number of disks

Here’s what the data looks like in the table to give you an idea.

Select Appropriate VM Size

Now here comes the tricky bit of the process. At a high level, here’s how the Azure VM size selection logic works (a T-SQL sketch of the selection follows the list).

  1. Find a VM size with enough vCPUs to cover the number of MHz the VM is currently consuming.
    1. This logic is crude at the moment, as I’m strictly converting MHz to a number of cores independently of the actual CPUs in Azure. I will work on tweaking this aspect in the future.
  2. Find a VM size with as much RAM as what’s being used on premises.
    1. In this particular case, I put an artificial cap of ~448GB, as this is the largest VM size I can get in my targeted region.
  3. Find a VM size that can accommodate the maximum number of IOPS.
    1. In this particular case, I put an artificial cap of 80,000 IOPS.
  4. As there’s a mix of both Dev/Test and Production VMs, I filter to get either Dev/Test subscription pricing or production pricing.
  5. I also make sure I’m getting VM SKUs that don’t include Windows licenses in the price.
  6. Of all the options that match what’s needed, sort them in ascending order of price and pick the cheapest one.
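Here’s a simplified T-SQL sketch of that selection. It assumes a denormalized table joining the VM characteristics and pricing data (the table and column names are hypothetical); the real logic lives in the table-valued function mentioned later in this post:

-- Requirements derived from the Hyper-V metering data for one VM
DECLARE @RequiredCores INT = 4,        -- CEILING(consumed MHz / assumed MHz per core)
        @RequiredRAMGB INT = 16,
        @RequiredIOPS  INT = 5000,
        @PriceType     NVARCHAR(32) = 'Production';

SELECT TOP (1)
       SkuName, vCPU, RAMGB, MaxIOPS, MaxDataDisks, HourlyPrice
FROM dbo.AzureVMSizePricing
WHERE vCPU    >= @RequiredCores
  AND RAMGB   >= CASE WHEN @RequiredRAMGB > 448 THEN 448 ELSE @RequiredRAMGB END     -- cap at the largest size in my region
  AND MaxIOPS >= CASE WHEN @RequiredIOPS > 80000 THEN 80000 ELSE @RequiredIOPS END   -- cap at 80,000 IOPS
  AND PriceType  = @PriceType          -- Dev/Test vs Production pricing
  AND OSIncluded = 'None'              -- exclude SKUs that bundle a Windows license
ORDER BY HourlyPrice ASC;              -- cheapest matching option wins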

Calculate VM Operating Hours

An important step in sizing and optimizing the cost of your workload in Azure is determining the VM operating hours: some VMs don’t need to run 24/7, so why pay for those extra hours? In my case, I applied the following general assumptions in my logic (a T-SQL sketch follows the list). They can definitely be refined, but they give a good idea.

  1. If it’s a Dev/Test VM, set the operating hours as 10 hours per day, 5 days a week
  2. If it’s a production VM, set it to 24/7, unless it’s an RDS session host, in which case I adjust the operating hours based on our actual user demand (i.e. more hosts at peak during the day, fewer during the night)
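As a rough T-SQL sketch (the column values and the RDS schedule below are simplified assumptions), the yearly operating hours could be derived like this:

-- Rough yearly operating hours per VM; the RDS session host schedule is more granular in practice
SELECT VMName,
       CASE
           WHEN PrimaryEnvironment = 'DevTest'          THEN 10 * 5 * 52  -- 10 hours/day, 5 days/week = 2,600 hours/year
           WHEN PrimarySystemName  = 'RDS Session Host' THEN 16 * 365     -- placeholder for a demand-based schedule
           ELSE 24 * 365                                                  -- production default: 24/7 = 8,760 hours/year
       END AS VMYearlyOperatingHours
FROM dbo.VirtualMachines;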

Determine VM Pricing Strategy

Now that you have the operating hours and the actual VM size in hand, you can determine whether it’s better to pay hourly for the VM or to get a reserved instance. If your VM is running 24/7, you definitely want to leverage the pricing of Azure Reserved Instances (RI). For the other cases, you have to evaluate whether the discount level you get with an RI is worth it given the VM’s operating hours.
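As a simple illustration (the prices and variable names below are made up), the decision boils down to comparing the pay-as-you-go cost for the actual operating hours against the effective yearly cost of the reserved instance:

-- Pick the cheaper of pay-as-you-go vs. a reserved instance (illustrative numbers only)
DECLARE @YearlyOperatingHours INT           = 2600,     -- e.g. a Dev/Test VM running 10x5
        @PaygHourlyPrice      DECIMAL(10,4) = 0.20,
        @RIYearlyPrice        DECIMAL(10,2) = 1100.00;  -- billed for the full year regardless of usage

SELECT CASE
           WHEN @YearlyOperatingHours * @PaygHourlyPrice <= @RIYearlyPrice THEN 'Pay-as-you-go'
           ELSE 'Reserved Instance'
       END                                      AS CheapestPricingOption,
       @YearlyOperatingHours * @PaygHourlyPrice AS PaygYearlyCost,
       @RIYearlyPrice                           AS RIYearlyCost;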

Generate VM Storage Configuration

Another fun part of sizing your VM is determining what type of disks you will use and how many of them are required to support your workload. As there’s a wide range of options, finding the most cost effective option can be tricky/time consuming. As you can see below, there are quite a few permutations you need to consider!

Load the pricing of all the storage options (managed/unmanaged, standard/premium) into a table. In my case, here’s what I ended up doing (a T-SQL sketch follows the list). It’s not perfect and still needs some work, but it gives an idea!

  1. For each disk option
    1. Determine the number of disks required to reach the required capacity
    2. Determine the number of disks required to reach the required IOPS
  2. As you iterate through the disk options, keep the lowest priced option
  3. Discard options where the required disk count is higher than what the VM size selected in the previous steps can support
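Here’s a simplified T-SQL sketch of that logic (the disk options table and its columns are hypothetical). For each disk option, the disk count is driven by whichever of capacity or IOPS needs more disks, options exceeding the VM’s data disk limit are discarded, and the cheapest remaining combination wins:

-- Requirements for one VM, plus the data disk limit of the VM size picked earlier
DECLARE @RequiredCapacityGB INT = 2048,
        @RequiredIOPS       INT = 6000,
        @VMMaxDataDisks     INT = 16;

-- Hypothetical disk options table: DiskSku, DiskSizeGB, DiskIOPS, MonthlyPrice
SELECT TOP (1)
       d.DiskSku,
       calc.DisksRequired,
       calc.DisksRequired * d.MonthlyPrice AS TotalMonthlyPrice
FROM dbo.AzureDiskOptions d
CROSS APPLY (SELECT CASE WHEN CEILING(1.0 * @RequiredCapacityGB / d.DiskSizeGB)
                              > CEILING(1.0 * @RequiredIOPS / d.DiskIOPS)
                         THEN CEILING(1.0 * @RequiredCapacityGB / d.DiskSizeGB)
                         ELSE CEILING(1.0 * @RequiredIOPS / d.DiskIOPS)
                    END AS DisksRequired) calc
WHERE calc.DisksRequired <= @VMMaxDataDisks       -- discard options the VM size can't attach
ORDER BY calc.DisksRequired * d.MonthlyPrice ASC; -- keep the lowest priced option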

The Result

Now for the moment of truth (the “did it blend?” moment)! Here’s a sample output of that process, as returned by the SQL query:

Now that I can get that output from SQL Server, I can load it up in Excel to do all sorts of fancy charts and tables and PivotTable the heck out of that data for further analysis.


À la Apple, there’s one more thing I omitted to mention. In that exercise, I wanted to compare costs with AWS out of due diligence. To achieve this, I went through the list of Azure VM sizes and manually found the closest equivalent size in Amazon EC2. So when I’m picking the size of an Azure VM, I also pick an equivalent size at Amazon for that VM, along with its pricing information. The same type of pricing logic is applied to keep things as fair as possible between the two options. Right now I have yet to tackle the VM storage sizing piece; that’s one of my next steps.

I’m also attempting to compare costs with our on-premises infrastructure, but that involves a whole separate set of calculations that I will not cover in this version of the article. Just be aware that it’s feasible if you roll up your sleeves a bit. In the end you can have a nice looking chart comparing On-Premises/Azure/AWS/etc.!


Needless to say, this is by no means a perfect sizing methodology. It’s still very rough around the edges but should give a pricing ballpark. The goal is to have an iterative approach in order to appropriately size your workload for execution in Azure. You may find that some workloads are just not good candidates at all depending on your requirements. There are a LOT of variables to consider when sizing a VM, and not all of them were considered in the current iteration of the process I have so far. I’ll keep adding those to improve/optimize the costing model.

Right now my process works for VMs for which I have Hyper-V metering statistics but it wouldn’t be too difficult to extend it to include future/hypothetical VMs as well. One would simply have to throw the simulation data in another table and process it using the same logic, which in my case is a T-SQL Table Valued function. Here’s what the actual query I’m using in Excel looks like to give you a feel for this:

-- Roll up the normalized Hyper-V metrics per VM, then apply the sizing function to get the Azure/AWS pricing
select vmnhs.ClusterName,
       cvi.VMYearlyOperatingHours*AzureVMHourlyPrice AS AzureYearlyCost,
       cvi.VMYearlyOperatingHours*AzureVMHourlyPrice + cvi.AzureVMYearlyStoragePrice AS AzureTotalYearlyCost,
       cvi.VMYearlyOperatingHours*AWSVMHourlyPrice AS AWSYearlyCost
from (select vmnhs.ClusterName,
             vm.VMName,
             vm.PrimaryEnvironment,
             vm.PrimarySystemName,
             MAX(vmnhs.MaximumMemoryUsage) AS MaximumMemoryUsage,
             MAX(vmnhs.MaximumProcessorUsage) AS MaximumProcessorUsage,
             MAX(vmnhs.MaximumAggregatedAverageNormalizedIOPS) AS MaximumIOPS,
             AVG(vmnhs.AverageAggregatedAverageNormalizedIOPS) AS AverageIOPS,
             MAX(vmnhs.MaximumTotalDiskAllocation) AS MaximumTotalDiskAllocation,
             -- rough core count: peak MHz divided by an assumed 2600 MHz per core
             CEILING(MAX(CAST(vmnhs.MaximumProcessorUsage AS FLOAT))/2600) AS MaximumEstimatedCores,
             -- rough RAM estimate converted to GB
             CEILING(MAX(CAST(vmnhs.MaximumMemoryUsage AS FLOAT))/1024) AS MaximumEstimatedRAMGB,
             MAX(SampleTime) AS SampleTime
      from [dbo].[VirtualMachineNormalizedHourlyStatistics] vmnhs
      INNER JOIN VirtualMachines vm ON vmnhs.VMName = vm.VMName AND vmnhs.ClusterName = vm.ClusterName
      where SampleTime > '2017-11-01'
      AND vm.PrimarySystemName NOT IN ('Microsoft Remote Desktop Virtual Desktop Infrastructure')
      GROUP BY vmnhs.ClusterName, vm.VMName, vm.PrimaryEnvironment, vm.PrimarySystemName) AS vmnhs
-- the table-valued function returns the selected Azure size (and AWS equivalent) with its pricing
CROSS APPLY dbo.getCloudVMSizingInformation(vmnhs.VMName, 'Microsoft', vmnhs.ClusterName, vmnhs.ClusterName,
    vmnhs.MaximumMemoryUsage, vmnhs.MaximumProcessorUsage, vmnhs.SampleTime, vmnhs.PrimaryEnvironment,
    vmnhs.PrimarySystemName, vmnhs.AverageIOPS*1.25, vmnhs.MaximumTotalDiskAllocation) cvi

I’d like to package this better so that I can share the sizer with the rest of the community. When things stabilize a bit with the sizer, I’ll definitely work on that.

If you have questions/comments about this blog post, feel free to comment below!


Application Insights – Capture ASMX WebMethod Names Invoked

While piloting Application Insights to monitor our homegrown applications, one thing that was asked was to capture the names of the WebMethods invoked in our ASMX web services. While Application Insights will capture the call to the web service’s .asmx resource, the actual name of the method being invoked doesn’t get captured out of the box.

In order to achieve this, I had to write a custom Application Insights telemetry initializer library to capture that information. A special shout out goes to Sergey Kanzhelev for pointing me in the right direction. I highly recommend checking out his blog here and following him on Twitter using the handle @SergeyKanzhelev.

At a high level, it boils down to handling two scenarios. Depending on how the call to the method is made, the method name can be captured in a few ways. The most important one to address in my particular case was capturing SOAP requests over HTTP. To get the method name in that scenario, you have to inspect the HTTP headers of the POST request and grab the SOAPAction property. With some simple string manipulation, you can then extract the name of the method being invoked.

For more information about the ITelemetryInitializer interface from the Application Insights SDK, consult the following page from Microsoft: Add properties: ITelemetryInitializer

As I’m only an amateur developer, please forgive me if the code is not the prettiest! If you think I should change some things in the code below, feel free to comment. I’ll gladly improve it and update this post!

To build this extension, I built a simple C# class library project in Visual Studio.

I then added the Application Insights Nuget packages to my solution in order to add the references needed for this new extension.

  <package id="Microsoft.ApplicationInsights" version="2.4.0" targetFramework="net461" />
  <package id="Microsoft.ApplicationInsights.Agent.Intercept" version="2.4.0" targetFramework="net461" />
  <package id="Microsoft.ApplicationInsights.DependencyCollector" version="2.4.1" targetFramework="net461" />
  <package id="Microsoft.ApplicationInsights.PerfCounterCollector" version="2.4.1" targetFramework="net461" />
  <package id="Microsoft.ApplicationInsights.Web" version="2.4.1" targetFramework="net461" />
  <package id="Microsoft.ApplicationInsights.WindowsServer" version="2.4.1" targetFramework="net461" />
  <package id="Microsoft.ApplicationInsights.WindowsServer.TelemetryChannel" version="2.4.0" targetFramework="net461" />
  <package id="Microsoft.AspNet.TelemetryCorrelation" version="1.0.0" targetFramework="net461" />
  <package id="System.Diagnostics.DiagnosticSource" version="4.4.0" targetFramework="net461" />

I then proceeded to write the code below in a new class file named WebMethodInitializer.cs.

using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;
using System.Linq;

namespace GEM.ApplicationInsights.Web
{
    /// <summary>
    /// Telemetry initializer that captures the ASMX WebMethod name being invoked
    /// and appends it to the request's operation name.
    /// </summary>
    public class WebMethodInitializer : ITelemetryInitializer
    {
        public void Initialize(ITelemetry telemetry)
        {
            var requestTelemetry = telemetry as RequestTelemetry;
            string soapActionMethod = null;
            string requestMethodName = null;
            string webServiceMethod = null;

            // Only process request telemetry (TrackRequest) items.
            if (requestTelemetry == null) return;

            // Outside of a web request there is no HttpContext to inspect.
            if (System.Web.HttpContext.Current == null) return;

            // Plain HTTP GET/POST calls expose the method name through the "op" query string parameter.
            requestMethodName = System.Web.HttpContext.Current.Request.Params["op"];

            if (string.IsNullOrEmpty(requestMethodName))
            {
                // Fall back to the PathInfo (e.g. /Service.asmx/MethodName).
                if (System.Web.HttpContext.Current.Request.PathInfo != null)
                {
                    requestMethodName = System.Web.HttpContext.Current.Request.PathInfo;
                }

                if (!string.IsNullOrEmpty(requestMethodName))
                {
                    requestMethodName = requestMethodName.Replace("/", "");
                    // If we set the Success property, the SDK won't change it:
                    requestTelemetry.Success = true;
                    // Allow us to filter these requests in the portal:
                    requestTelemetry.Properties["WebMethodName"] = requestMethodName;
                    webServiceMethod = requestMethodName;
                }
            }

            // SOAP over HTTP: the method name is the last segment of the SOAPAction header.
            string soapAction = System.Web.HttpContext.Current.Request.Headers["SOAPAction"];

            if (soapAction != null)
            {
                soapAction = soapAction.Replace("\"", "");
                soapActionMethod = soapAction.Split('/').Last();
                requestTelemetry.Properties["SOAPAction"] = soapAction;
                webServiceMethod = soapActionMethod;
            }

            // Append the method name to the operation name so each WebMethod shows up
            // as its own operation in the Application Insights portal.
            if (webServiceMethod != null)
            {
                requestTelemetry.Context.Operation.Name =
                    requestTelemetry.Context.Operation.Name.Replace("/" + webServiceMethod, "") + "/" + webServiceMethod;
            }
        }
    }
}

The above code captures the method name for both SOAP and regular HTTP POST requests. It also appends the method name to the operation name (out of the box, the operation name is only the name of the .asmx being called). That way you will see trends for each method of a web service in the Application Insights Azure portal.

Once the solution is compiled, you can take the resulting DLL and drop it in the bin directory of the ASP .NET site where your ASMX files are located. Once that’s done, you’ll need to add a line like the following to the ApplicationInsights.config file of that particular web application:

   <!--other initializers above-->
    <Add Type="GEM.ApplicationInsights.Web.WebMethodInitializer, GEM.ApplicationInsights"/>

With that in place, your new extension should be capturing the method name of the web services during HTTP SOAP requests.

You can see what it looks like ultimately in the following screenshot:

Should you have questions about this post, don’t hesitate to reach out using the comments below!

Azure Resource Manager Template – From Snowflake to Infrastructure as Code

Automation has always been an area of debate when it comes to infrastructure. Through different conversations I had with various people, the following questions/concerns are often brought up:

  • I only need to deploy a couple of those servers, no need to sink time into automating the deployment
  • I’m a system administrator, not a programmer
  • Each deployment of application X is custom
  • I don’t have time to do automation, I need to deliver this next week
  • I have my trusty recipe and checklist to deploy infrastructure piece Y

Bottom line, automation is unfortunately often perceived as a significant challenge. I think the main reason it might be perceived as a difficult task is that we often look at automation as one complex piece of work instead of as a composite of simple things. If you start looking at the individual actions that will bring you closer to your end result, you often realize they are not complex in themselves.

One misconception I would like to debunk is the idea that all deployments are too different to automate. That statement really reminds me of one of the principles of the Toyota Way (principle 6):

You cannot improve a process before it is standardized and stable. In a process that is not stable and standardized there are many variations, and when you try to improve it, the result is just one more variation that is occasionally used and mostly ignored.

Infrastructure deployment is a process, and if you cannot standardize and rationalize that process, it will be very difficult to consistently improve the quality and speed of your deployments across the board. I think as an industry we need to sort that out. How many different automations do we need to build that achieve the exact same thing? We really need to do a better job of cultivating reusability and standardization; ironically, for an industry that promotes the automation of business processes, we’re doing a pretty horrible job at this. That might be the topic for a complete blog post by itself!

OK, so I was on board with the idea of standardization and continuous improvement. I did my fair share of programming in the past and have done quite a bit of PowerShell recently to solve a variety of problems. Those skills, while useful, are still quite different from what’s needed for Azure Resource Manager templates or AWS CloudFormation. Why is that? Declarative infrastructure as code is a fairly different beast from the imperative/procedural alternative.

Writing PowerShell bits to deploy a VM and then handle all the possible outcomes of that operation is quite different from describing how your VM should be configured and then simply shifting the responsibility of making that happen to the cloud platform. Don’t you like the idea of just describing what you want and not having to explain in excruciating detail HOW it should be done? That’s the essence of what Azure Resource Manager templates can do for you. Couple that with the inherent scalability and programmability of the cloud platform and you truly have a powerful set of tools in your hands.

Yes, there’s a learning curve to get going. I’m not going to lie, you will need to learn new things, but it’s not only worth it, I think it’s mandatory to keep progressing as an IT professional and to ensure you have a future in this industry. Here’s how you might want to approach this:

  1. First there will be JSON. ARM templates are written in that notation, so you need to understand the basics of the format.
  2. You will need to get acquainted with the Azure resource types you need for your deployment.
  3. When you have the basics down and did a few basic resource deployments, you most likely will want to make your templates more flexible so they can serve multiple purposes. That’s probably when you will start to use things like parameters and functions in your templates.
  4. Then you’ll get into more complex deployments and will want to split things up a bit just for your own sanity, here come the linked ARM templates!
  5. If you need to get a little bit fancier with your VM deployments, most likely you will need your old friend (or enemy) PowerShell to grease the wheels a bit with some Desired State Configuration love.

Along the way, you might want to start using some sort of versioning strategy. I would highly recommend you take the time to learn the basics of Git for your ARM template source control needs.

As you can see, you can go fairly progressively into the world of declarative infrastructure as code. You start by doing a basic template for a storage account or a VM. Then you start asking yourself: “Well, if I can do this, what else can I do?” Maybe you’ll start writing an ARM template to deploy the various infrastructure tiers of an application that’s not too complex by deploying a few VMs.

Then you’ll want to do another application and realize there are bits from the other application’s infrastructure definition/ARM template you could reuse for this new deployment. You might be at a crossroads there. Do I refactor the previous template and perhaps make it more modular to foster reusability? That would probably be a good idea! The more reusable/flexible/useful your ARM templates are, the more likely they are to get used, maintained and improved over time. That’s just the nature of the beast. Each time you improve your declarative infrastructure code, you’re sculpting the masterpiece that’s your company’s infrastructure.

I think you now get the general idea of how things will go, learning- and experimentation-wise, for ARM template deployment. So if you’re still asking why you would want to go through this, here’s my take. Having a good set of ARM templates will give you the following:

  • A repeatable way of deploying your infrastructure that lets you
    • Recover from errors/failures by simply and quickly redeploying your infrastructure
    • Quickly scale applications
    • Get consistent environments with no variations between individual deployments
  • Reusability
    • Template flexibility drives reusability, reusability drives productivity up!
    • The more you reuse code, the faster you can come up with solutions. Ultimately that means you deliver a working solution faster for your company.
    • Reusability makes you look like a champion, because you will deploy complete infrastructure at a speed and quality no human being clicking around can match
  • A detailed recipe of how your infrastructure is built
    • Want to know what it takes to spin up a particular app? Go check out its ARM template, everything should be there
    • Need to transfer ownership of deploying a particular system? Now you easily can as the recipe is self contained in the templates

In future posts, I’ll go through a few examples of ARM templates I created recently. While I’m still no master at ARM templates, I can definitely share some of my learning experiences. That might come in handy for you!

Until then, sleep on the glory of declarative infrastructure as code! OK, there will be some imperative in there as well, but you get the idea! Hopefully I was able to explain why you might want to spend some time getting acquainted with Azure Resource Manager templates.

Using Azure Application Insights with On-Premises Servers

For those who are not aware, Azure Application Insights can be a nice addition to your toolbox when it comes to troubleshooting application performance issues. It provides tons of functionality to drill into application metrics and diagnostic information. You can even use its core functionality free of charge for basic/low volume workloads. For more advanced scenarios, I would recommend going with the Enterprise edition, which provides additional capabilities such as data export to Log Analytics; that by itself provides tons of new ways to explore the telemetry data captured by Application Insights. For detailed pricing information, hit the following link.

A lot of people wrongly assume that Azure Application Insights is only useful for applications hosted in Microsoft Azure. Today’s blog post will show you how you can leverage it with on-premises applications.

There are various ways to send telemetry data to the Application Insights service. The easiest way to get going is to install a local agent called Application Insights Status Monitor on your IIS servers. You can install it using the Microsoft Web Platform Installer or by clicking the following link. The installation process is very straightforward.

Once Application Insights Status Monitor is installed, it will discover ASP .NET web applications running on your IIS server.

In order to enable Application Insights for a web application, select it from the left pane. You will then have to sign in to Azure. If you simply click the blue Add Application Insights button, a new Application Insights instance will get created in a resource group named ApplicationInsights and it will be named after the IIS web site. You probably want to name the objects yourself for clarity’s sake and to respect your corporate naming convention. To do this, you first need to select New Application Insights resource and then click the Configure settings link:

You will then be presented with the following dialog box:

You will now have the opportunity to pick the proper subscription and to name the resource group and the Application Insights resource as you wish. Once you have completed the dialog box, click OK. Now that the Application Insights resource has been created according to your standards, you can hit the Add Application Insights button.

This will then configure your web site/application by modifying the web.config of the application and adding the Application Insights DLLs in the bin directory. You will then be asked to restart IIS to complete the setup process.

With that process completed, the agent will start sending the telemetry data to the proper Azure Application Insights instance.

If you need more in-depth performance data, you can set up the Application Insights Profiler. It’s worth noting that while this works just fine, it is not a scenario that’s supported by Microsoft and it is provided as is.

In order to get this going, you will first need to enable profiling in your Application Insights instance using the Azure Portal by going to the Performance section and clicking the Profiler gear icon. In the blade that shows up, simply click Enable Profiler.

Then you will need to grab the following package from GitHub here and follow the instructions provided on the GitHub page to install and configure the profiler agent.

Once this two-step process is completed, ETL traces will get uploaded to the Application Insights instance. From there you will be able to see more detailed information as to which .NET method in your page is causing slowness.

In a future post, I’ll go over some of the core areas of interest in Application Insights that will help you find and prioritize your application issues. If you have any questions regarding this post, feel free to contact me through the comments section.

Email addresses showing up with onmicrosoft.com instead of the custom domain name in Office 365

My colleague wanted me to post this little bit of info, so there you go Fred, that one is for you…and also for everyone that encounters that problem! 🙂

While working on setting up a third-party SaaS application with single sign-on with Azure AD, we had an issue where the email address of the user was not showing up correctly. This in turn caused problems in the SAML that was exchanged for authentication/authorization purposes with the third-party SaaS application. While looking at the proxyAddresses attribute in our on premises Active Directory, everything looked good, the right email addresses were in there for the user. The Azure AD Connect synchronization was configured to push that attribute across as well, so that wasn’t the issue. Looking at the proxyAddresses attribute in Azure AD showed the email address we were expecting to see was still missing.

While reading a few forum posts, I saw a couple of people reporting that email addresses for a user can be filtered out if the domain of the email address has not been verified in Azure Active Directory. Well, it turned out to be the case for our issue, as we had just started setting up that particular tenant. As soon as the domain was verified, the primary email address of the user changed from address@<tenant id> to the proper address@<domain name>.

See the following links for more information as to how that works in more details:

Azure AD Connect sync service shadow attribute

How the proxyAddresses attribute is populated in Azure AD

Azure Privileged Identity Management – Activation Delays

While activating roles using Azure Privileged Identity Management for just-in-time escalation of privileges, we noticed issues where the rights were not being applied once the role was “activated”. We activated the required role and then navigated to the desired section in the Azure Portal, but we still didn’t have access. We often had to log out/log in or fiddle around with browser page refreshes for the new rights to kick in. As a couple of my colleagues and I were annoyed enough by this, I decided to open a Microsoft Premier support case to see what was going on. Here’s what I found out/clarified with Microsoft support.

First, while the role assignment is written to Azure Active Directory right away, it does take time before it gets to the Azure AD replica servers. That usually happens within 5 minutes but it can sometimes take a little while longer. Once that’s done, the Azure Portal will become aware of the change, but that generally requires a browser refresh for the change to kick in. It’s typically not required to re-login, but doing so might be needed to refresh any cached tokens. That token refresh also refreshes the user’s permissions in the Azure Portal.

Some other Azure/Office 365 services such as Exchange Online and Intune are not using Azure AD directly for authentication/authorization purposes. In those cases, it would depend on how fast their authentication/authorization systems are polling Azure AD to get the new changes. Typically that should happen within 15-40 minutes according to support but the announced SLA could be up to 24 hours.

Knowing this, you might need to adjust the duration of the activation for the role in order for it to make sense. You could also decide to go down the old route of permanently assigning those roles to the users. I’ve opened a feedback piece here if you would like to see this improved: Azure AD Privileged Identity Management – Display elevation propagation process. The idea is that if it takes time for the role elevation to be propagated, then at least display where it’s at in the propagation process in order to set users’ expectations accordingly.

Hopefully that sheds some additional light on the internals of Azure Privileged Identity Management.

Introduction to Azure Privileged Identity Management

As a general security best practice, it’s best to operate and manage IT infrastructure under the least privilege principle. Doing this on premises has often been problematic, as it involved either a manual escalation process (Run As) or a custom automated process to achieve it. The Run As approach is typically not ideal, as even those secondary accounts generally have way more privileges than required to perform administrative tasks on systems. PowerShell Just Enough Administration definitely helps in that regard, but today I will cover Azure’s take on this problem by covering the basics of Azure Privileged Identity Management (PIM).

With Azure PIM, you will have better visibility on the privileges required to manage your environment. It’s fairly easy to get started and to use so I highly encourage you to adopt this security practice in your environment, especially if you are just getting started with Azure in general.

Initially, Azure Privileged Identity Management (PIM) only covered privilege escalation for Azure Active Directory roles. This changed when Microsoft announced they are now covering Azure Resource Manager resources as well. This means you can now do just in time escalation of privileges to manage things like subscriptions, networking, VMs etc. In this post, I’ll cover the Azure AD roles portion of Azure PIM.

To quickly get started with Azure PIM for Azure AD roles, you can simply log in to the Azure Portal and start assigning users as eligible for specific Azure AD roles. To achieve this, you go to the Azure AD Directory Roles section.

Once there, you can go into the Roles section and start making users eligible for specific Azure AD roles by clicking the Add user button. One thing to note is that you can only assign roles to specific users, not to a group.

Once you have specified a user as eligible for a role, that user can activate it. To do this, they simply have to go to the Azure PIM section of the Azure Portal and pick My Roles. The user can then select the appropriate role to activate in order to perform the desired administrative task.

When you activate a role, you will be prompted to enter a reason as to why you need to elevate your privileges. This is generally good practice, as it allows the people reviewing the escalations to understand why certain high privileges had to be used to perform a task.

Now that we have covered the basics to quickly get you started with PIM, we can dive a bit into how that experience can be customized. Here are the configuration options for an Azure AD role:

  • Maximum Activation duration: When the user activates a role, how long should it remain activated? A shorter duration is desirable for security reasons.
  • Notifications: Should an email be sent to an administrator when a role is activated? This can also give the admin a feeling as to whether an admin role is being abused. i.e. Why use Global Admin when it’s not necessary to perform task X?
  • Incident/Request Ticket: You could enforce a support ticket number to be entered with each activation. This can be useful if you really need to close the loop as to why elevation is required. i.e. Need to change a setting to apply a change request or resolve an incident #####.
  • Multi-Factor Authentication: A user will need to be enrolled in Azure MFA in order to activate a role.
  • Require approval: When this is enabled, an admin will need to approve the activation for a user. This might be useful for high privilege roles such as Global Admin where you don’t want to have abuse of privileges. It also documents the full process better. i.e. User X asked for elevation and admin Y approved the request.

From an operational standpoint, you can also get alerts for the following things:

Out of those alerts, you can tune the thresholds in order to match your organization requirements:

  • For There are too many global administrators alerts, you can define the number of allowed global admins and the percentage of global admins versus the total number of administrators configured.
  • For Roles are being activated too frequently, you can specify the maximum duration between activations and the number of acceptable activations during that period. This can be useful to flag users that simply activate all roles for no good reason, just to make sure they have the required privileges to perform a task.

You can also configure the Access review functionality, which specifies how you want to review the user activation history in order to maintain a tight ship security-wise. You can configure the access review with the following settings:

  • Mail notifications: Send an email to advise an administrator to perform the access review
  • Reminders: Send an email to advise an administrator to complete an access review
  • Require reason for approval: Make sure the reviewer documents why an activation was approved/makes sense.
  • Access review duration: The number of days between each access review exercise (default is 30 days).

Once all this is configured, you can monitor the activation/usage of roles using the Directory Roles Audit History section:

I hope this quick introduction to Azure Privileged Identity Management was helpful. Should you have any questions about this, let me know!