VMware DRS Possible Improvements
February 10, 2008
I’ve been using VMware ESX since version 2, and one of my favorite features is the combination of DRS (Distributed Resource Scheduler) and VMotion. After using it in production for a while, I noticed that DRS could be improved. One thing I saw is that the cluster was not balancing CPU and memory load properly across all nodes. Even with the most aggressive DRS settings, I could have nodes using 80% of their memory with a decent CPU load while others were sitting at 40% memory usage.

What I would like to see in DRS is smarter load balancing. For instance, ESX could detect that a certain VM has high CPU or I/O usage every night and issue a VMotion accordingly before the peak usage. It would also be nice to set up priorities for this type of load. One example that comes to mind is SharePoint. You usually schedule document indexing during the night, and it is typically not a high-priority job, but during the day you want to deliver good response times to users while they navigate and query the search engine. In that case you could define a time range that specifies the priority a certain VM has over others.

This way you could better balance VMs across the cluster. For example, VMs that are mostly idle could be regrouped on a limited set of nodes, and the ones performing intensive operations distributed appropriately across the remaining nodes during that period. It would also be interesting to see whether data mining of the data in Virtual Center could discover load patterns and correlations between VMs to enhance DRS functionality.
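To make the time-range idea a bit more concrete, here is a minimal Python sketch of what such a policy might look like. None of this is a VMware API: `PriorityWindow`, `VM`, `plan`, the host names, and the load figures are all hypothetical, and the placement logic is a simple greedy heuristic I picked for illustration, not how DRS works. It packs mostly idle VMs onto as few hosts as possible, then spreads the busy ones over the remaining hosts, highest priority first.

```python
from dataclasses import dataclass, field
from datetime import time


@dataclass
class PriorityWindow:
    """Hypothetical rule: between start and end, the VM gets this priority."""
    start: time
    end: time
    priority: int  # higher means more important

    def contains(self, t: time) -> bool:
        if self.start <= self.end:
            return self.start <= t < self.end
        return t >= self.start or t < self.end  # window wraps past midnight


@dataclass
class VM:
    name: str
    load: float  # expected load for the period, as a fraction of one host
    default_priority: int = 1
    windows: list = field(default_factory=list)

    def priority_at(self, t: time) -> int:
        for w in self.windows:
            if w.contains(t):
                return w.priority
        return self.default_priority


def plan(vms, hosts, now, idle_threshold=0.1, capacity=1.0):
    """Return {vm name: host}. Idle VMs are packed, busy VMs are spread."""
    load = {h: 0.0 for h in hosts}
    placement = {}

    def assign(vm, host):
        placement[vm.name] = host
        load[host] += vm.load

    idle = [v for v in vms if v.load < idle_threshold]
    busy = [v for v in vms if v.load >= idle_threshold]

    # First-fit: regroup mostly idle VMs on as few hosts as possible.
    for vm in idle:
        fits = [h for h in hosts if load[h] + vm.load <= capacity]
        assign(vm, fits[0] if fits else min(hosts, key=load.get))

    # Busy VMs go to the hosts left free, or to all hosts if none are free.
    used = set(placement.values())
    free = [h for h in hosts if h not in used] or list(hosts)
    # Highest priority first, then heaviest, each onto the least-loaded host.
    for vm in sorted(busy, key=lambda v: (-v.priority_at(now), -v.load)):
        assign(vm, min(free, key=load.get))
    return placement


# Made-up example: SharePoint matters most during office hours.
daytime = PriorityWindow(time(8, 0), time(18, 0), priority=10)
vms = [
    VM("sharepoint", load=0.6, windows=[daytime]),
    VM("db", load=0.5, default_priority=5),
    VM("test-a", load=0.05),
    VM("test-b", load=0.05),
]
placement = plan(vms, ["esx1", "esx2", "esx3"], now=time(10, 0))
print(placement)
```

In this sketch the priority only decides which busy VM gets first pick of the emptiest host. A real scheduler would also have to weigh the cost of each VMotion and track memory separately from CPU, which I have left out.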