In the following post, I will explain the approach I've used to estimate the cost of running a few hundred VMs in Azure IaaS. As manually sizing each individual VM would take quite some time, I preferred to go with an automated approach.
At a high level, here are the steps I’ve taken to achieve this.
- Collect performance metrics for the VM candidates over a significant period of time
- Capture Azure VMs pricing information
- Capture Azure VMs characteristics
- Select appropriate VM size
- Calculate VM operating hours
- Determine VM pricing strategy
- Generate VM storage configuration
Collect Performance Metrics
For this step, I’ve opted to use Hyper-V metering data I was already collecting for all of the VMs running on premises. Alternatively, one could also use data coming from perfmon but that would take some extra data preparation steps in order to be usable in the VM sizing exercise. I’ve covered the basics of the scripts in this other blog post if you’re interested: Hyper-V Resource Metering and Cloud Costing Benchmarking
In this data set, I'm collecting a few data elements that are critical for roughly determining the VM size:
- CPU Utilization (actual MHz consumed)
- Memory Utilization
- Total Disk IOPS
- Total Disk Capacity
I could have opted to include network utilization but decided to keep this aside for the time being, as my workload is not network IO bound. Here's what the raw data that will be used for the sizing exercise looks like:
Capture Azure Pricing Information
For this part I’m using a couple of price lists as the basis for the automated analysis. Here are the two main sources of information I’m using:
- Price list from https://ea.azure.com (If you’re a Microsoft Enterprise Agreement customer)
- Reserved Instance Price list (also found on the https://ea.azure.com site)
Based on that data, I've extracted pricing information for all the VM sizes in the particular region I was interested in. I took the CSV files and loaded them into a SQL Server table in the same database containing all my Hyper-V metering data.
Capture Azure VMs characteristics
Once you have the basic pricing information loaded and processed, the next step is to capture the actual sizing information for the Azure VM sizes. To do so, I used the page Sizes for Windows virtual machines in Azure to gather the following key pieces of information about each VM configuration:
- vCPU count
- Maximum IOPS
- Maximum number of disks
Here's what the data looks like in the table, to give you an idea.
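To make the shape of that table concrete, here's a rough sketch of one row as a Python record. The field names are illustrative (they mirror the characteristics listed above), and the sample values are only an approximation of one real size:

```python
from dataclasses import dataclass

# Illustrative structure -- field names mirror the characteristics
# captured from the Azure VM sizes documentation page.
@dataclass
class AzureVMSize:
    name: str
    vcpus: int
    memory_gb: float
    max_iops: int
    max_data_disks: int

# Approximate numbers for one size, for illustration only.
d4s_v3 = AzureVMSize("Standard_D4s_v3", 4, 16.0, 8000, 8)
```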
Select Appropriate VM Size
Now here comes the tricky bit of the process. At a high level, here’s how the Azure VM size logic works.
- Find a VM size with enough vCPUs to support the number of MHz the VM is currently consuming.
  - This logic is crude at the moment, as I'm strictly converting MHz to a number of cores independently of the actual CPU model in Azure. I will work on tweaking this aspect in the future.
- Find a VM size with as much RAM as what's being used on premises.
  - In this particular case, I put an artificial cap of ~448GB, as this is the largest VM size I can get in my targeted region.
- Find a VM size that can accommodate the maximum number of IOPS.
  - In this particular case, I put an artificial cap of 80,000 IOPS.
- As there's a mix of both Dev/Test and production VMs, I filter to get either Dev/Test subscription pricing or production pricing.
- I also make sure I'm getting VM SKUs that don't include Windows licenses in the price.
- Of all the options matching what's needed, sort them in ascending order of price and pick the cheapest one.
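The selection logic above can be sketched roughly as follows. This is a simplified Python illustration of the approach (the real logic lives in a T-SQL function); the dictionary keys and the 2600 MHz-per-core figure are assumptions, the latter matching the conversion used in the query at the end of this post.

```python
from math import ceil

MHZ_PER_CORE = 2600  # assumed clock speed of the on-premises hosts

def pick_vm_size(sizes, used_mhz, used_ram_gb, max_iops,
                 ram_cap_gb=448, iops_cap=80000):
    """Pick the cheapest size meeting CPU, RAM and IOPS requirements.

    `sizes` is a list of dicts with hypothetical keys: name, vcpus,
    ram_gb, max_iops, hourly_price (Windows license excluded).
    """
    needed_cores = ceil(used_mhz / MHZ_PER_CORE)     # crude MHz-to-core conversion
    needed_ram = min(used_ram_gb, ram_cap_gb)        # cap at largest regional size
    needed_iops = min(max_iops, iops_cap)
    candidates = [s for s in sizes
                  if s["vcpus"] >= needed_cores
                  and s["ram_gb"] >= needed_ram
                  and s["max_iops"] >= needed_iops]
    # Of all matching options, take the cheapest.
    return min(candidates, key=lambda s: s["hourly_price"]) if candidates else None

# Hypothetical size/price data for illustration.
sizes = [
    {"name": "Standard_D2_v3", "vcpus": 2, "ram_gb": 8, "max_iops": 3200, "hourly_price": 0.125},
    {"name": "Standard_D4_v3", "vcpus": 4, "ram_gb": 16, "max_iops": 6400, "hourly_price": 0.25},
]
choice = pick_vm_size(sizes, used_mhz=5000, used_ram_gb=6, max_iops=2000)
```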
Calculate VM Operating Hours
An important step in cost sizing and optimizing your workload in Azure is determining the VM operating hours: some VMs don't need to be running 24/7, so why pay for those extra hours? In my case, I applied the following general assumptions in my logic. They can definitely be refined, but they give a good idea.
- If it's a Dev/Test VM, set the operating hours to 10 hours per day, 5 days a week
- If it's a production VM, set it to 24/7, unless it's an RDS session host; in that case, I adjust the operating hours based on our actual user demand (i.e. more hosts at peak during the day, fewer during the night)
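Those assumptions translate into a trivial calculation of yearly operating hours (the RDS session host case is left out here since it depends on actual demand data):

```python
def yearly_operating_hours(environment, weeks_per_year=52):
    """Rough yearly hours under the assumptions described above."""
    if environment == "Dev/Test":
        return 10 * 5 * weeks_per_year   # 10 h/day, 5 days/week -> 2600 h
    return 24 * 365                      # production default: always on -> 8760 h

hours_dev = yearly_operating_hours("Dev/Test")      # 2600
hours_prod = yearly_operating_hours("Production")   # 8760
```

That ~70% reduction for Dev/Test VMs is exactly why operating hours matter so much in the final cost.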
Determine VM Pricing Strategy
Now that you have the operating hours in hand and the actual VM size, you can determine whether it's better to pay hourly or to get a reserved instance. If your VM is running 24/7, you definitely want to leverage the pricing of Azure Reserved Instances (RI). For the other cases, you have to evaluate whether the operating hours of the VM versus the discount level you get with an Azure RI make it worth it.
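The comparison boils down to a break-even check. Here's a minimal sketch of that decision, with made-up prices for illustration (real RI pricing comes from the ea.azure.com price list mentioned earlier):

```python
def cheapest_pricing(yearly_hours, payg_hourly, ri_yearly_price):
    """Compare pay-as-you-go against a reserved instance.

    An RI is billed whether the VM runs or not, so it only wins
    when utilization is high enough to beat the hourly total.
    """
    payg_cost = yearly_hours * payg_hourly
    if ri_yearly_price < payg_cost:
        return ("Reserved Instance", ri_yearly_price)
    return ("Pay-as-you-go", payg_cost)

# A 24/7 VM: 8760 h at $0.25/h = $2190/year, so a $1500 RI wins.
strategy, cost = cheapest_pricing(8760, 0.25, 1500.0)
```

For a Dev/Test VM running only 2600 hours a year, the same check would flip to pay-as-you-go.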
Generate VM Storage Configuration
Another fun part of sizing your VM is determining what type of disks you will use and how many of them are required to support your workload. As there’s a wide range of options, finding the most cost effective option can be tricky/time consuming. As you can see below, there are quite a few permutations you need to consider!
In my case, here's what I ended up doing. It's not perfect and still needs some work, but it gives an idea!
- Load the pricing of all the storage options (managed/unmanaged, standard/premium) in a table
- For each disk option
  - Determine the number of disks required to reach the capacity needed
  - Determine the number of disks required to reach the IOPS needed
- As you iterate through the disk options, keep the lowest priced option
- Discard options where the required disk count is higher than what the VM size selected in the previous steps supports
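The iteration above can be sketched as follows. The disk options here are hypothetical (loosely modeled on standard/premium tiers; check the current Azure disk pricing for real numbers), and the real logic runs in T-SQL:

```python
from math import ceil

def disks_needed(option, capacity_gb, iops):
    """Disks of one option needed to satisfy both capacity and IOPS."""
    return max(ceil(capacity_gb / option["size_gb"]),
               ceil(iops / option["iops"]))

def cheapest_storage(options, capacity_gb, iops, max_disks):
    """Iterate the disk options and keep the lowest priced feasible one."""
    best = None
    for opt in options:
        count = disks_needed(opt, capacity_gb, iops)
        if count > max_disks:   # more disks than the selected VM size supports
            continue
        cost = count * opt["monthly_price"]
        if best is None or cost < best[2]:
            best = (opt["name"], count, cost)
    return best

# Hypothetical options and prices, for illustration only.
options = [
    {"name": "S20", "size_gb": 512, "iops": 500,  "monthly_price": 21.76},
    {"name": "P20", "size_gb": 512, "iops": 2300, "monthly_price": 66.56},
]
best = cheapest_storage(options, capacity_gb=1000, iops=3000, max_disks=8)
```

Note how IOPS, not capacity, can drive the disk count: in this sample, the standard tier needs six disks to hit 3,000 IOPS but still comes out cheaper than two premium disks.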
Now for the moment of truth, the "did it blend?" moment! Here's what the output of the SQL query looks like:
Now that I can get that output from SQL Server, I can load it up in Excel to do all sorts of fancy charts and tables and PivotTable the heck out of that data for further analysis.
À la Apple, there's one more thing I omitted to mention. In this exercise, I wanted to compare costs with AWS out of due diligence. To achieve this, I went through the list of Azure VM sizes and manually found the closest equivalent size in Amazon EC2. So when I'm picking the size of an Azure VM, I also pick an equivalent size at Amazon for that VM, along with its pricing information. The same type of pricing logic is applied to keep things as fair as possible between the two options. Right now I have yet to tackle the VM storage sizing piece; that's one of my next steps.
I'm also attempting to compare costs with our on-premises infrastructure, but that involves a whole separate set of calculations that I will not cover in this version of the article. Just be aware that it's feasible if you roll up your sleeves a bit. In the end you can have a nice looking chart comparing On-Premises/Azure/AWS/etc.!
Needless to say, this is by no means a perfect sizing methodology. It's still very rough around the edges but should give a pricing ballpark. The goal is to have an iterative approach in order to appropriately size your workload for execution in Azure. You may find that some workloads are just not good candidates at all depending on your requirements. There are a LOT of variables to consider when sizing a VM, and not all of those variables were considered in the current iteration of the process I have so far. I'll keep adding those to improve/optimize the costing model.
Right now my process works for VMs for which I have Hyper-V metering statistics but it wouldn’t be too difficult to extend it to include future/hypothetical VMs as well. One would simply have to throw the simulation data in another table and process it using the same logic, which in my case is a T-SQL Table Valued function. Here’s what the actual query I’m using in Excel looks like to give you a feel for this:
SELECT vmnhs.ClusterName, vmnhs.VMName, vmnhs.PrimaryEnvironment, vmnhs.PrimarySystemName, cvi.*,
       cvi.VMYearlyOperatingHours * AzureVMHourlyPrice AS AzureYearlyCost,
       cvi.VMYearlyOperatingHours * AzureVMHourlyPrice + cvi.AzureVMYearlyStoragePrice AS AzureTotalYearlyCost,
       cvi.VMYearlyOperatingHours * AWSVMHourlyPrice AS AWSYearlyCost,
       vmnhs.AverageIOPS, vmnhs.MaximumIOPS, vmnhs.MaximumEstimatedCores,
       vmnhs.MaximumEstimatedRAMGB, vmnhs.MaximumTotalDiskAllocation
FROM (SELECT vmnhs.ClusterName, vm.VMName, vm.PrimaryEnvironment, vm.PrimarySystemName,
             MAX(vmnhs.MaximumMemoryUsage) AS MaximumMemoryUsage,
             MAX(vmnhs.MaximumProcessorUsage) AS MaximumProcessorUsage,
             MAX(vmnhs.MaximumAggregatedAverageNormalizedIOPS) AS MaximumIOPS,
             AVG(vmnhs.AverageAggregatedAverageNormalizedIOPS) AS AverageIOPS,
             MAX(vmnhs.MaximumTotalDiskAllocation) AS MaximumTotalDiskAllocation,
             CEILING(MAX(CAST(vmnhs.MaximumProcessorUsage AS FLOAT)) / 2600) AS MaximumEstimatedCores,
             CEILING(MAX(CAST(vmnhs.MaximumMemoryUsage AS FLOAT)) / 1024) AS MaximumEstimatedRAMGB,
             MAX(SampleTime) AS SampleTime
      FROM [dbo].[VirtualMachineNormalizedHourlyStatistics] vmnhs
      INNER JOIN VirtualMachines vm
              ON vmnhs.VMName = vm.VMName
             AND vmnhs.ClusterName = vm.ClusterName
      WHERE SampleTime > '2017-11-01'
        AND vm.PrimarySystemName NOT IN ('Microsoft Remote Desktop Virtual Desktop Infrastructure')
      GROUP BY vmnhs.ClusterName, vm.VMName, vm.PrimaryEnvironment, vm.PrimarySystemName) AS vmnhs
CROSS APPLY dbo.getCloudVMSizingInformation(vmnhs.VMName, 'Microsoft', vmnhs.ClusterName, vmnhs.ClusterName,
                                            vmnhs.MaximumMemoryUsage, vmnhs.MaximumProcessorUsage,
                                            vmnhs.SampleTime, vmnhs.PrimaryEnvironment, vmnhs.PrimarySystemName,
                                            vmnhs.AverageIOPS * 1.25, vmnhs.MaximumTotalDiskAllocation) cvi
I’d like to package this better so that I can share the sizer with the rest of the community. When things stabilize a bit with the sizer, I’ll definitely work on that.
If you have questions/comments about this blog post, feel free to comment below!