This was the first day of the Annual Technical Conference, which is part of the USENIX FCW.
The first presentation of the day was from a researcher at VMware who presented their new method for resuming VMs from a snapshot. Their new technique, called Halite, reduced resume time from 26 seconds down to 1.5 seconds for a small Windows VM. They achieved this by writing the memory pages that were accessed during the checkpointing process together, which optimizes data reads while resuming and avoids random IO. I asked whether they had tested with SSDs and whether SSDs alone would give good results; they answered that it had not been tested with that configuration.
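To make the idea of grouping accessed pages concrete, here is a minimal Python sketch. It is not VMware's implementation: the page numbers, the access trace, and the helper names `count_seeks` and `locality_layout` are all made up for illustration. It only shows why writing the accessed pages next to each other turns scattered reads into one sequential run.

```python
def count_seeks(layout, access_order):
    """Count reads whose file offset does not directly follow the previous read."""
    offset_of = {page: i for i, page in enumerate(layout)}
    seeks = 0
    prev = None
    for page in access_order:
        off = offset_of[page]
        if prev is None or off != prev + 1:
            seeks += 1  # non-contiguous read, i.e. random IO
        prev = off
    return seeks


def locality_layout(all_pages, access_order):
    """Place accessed pages first, in first-access order, then all other pages."""
    seen = set()
    hot = []
    for page in access_order:
        if page not in seen:
            seen.add(page)
            hot.append(page)
    cold = [page for page in all_pages if page not in seen]
    return hot + cold


# Hypothetical snapshot of 16 pages and a hypothetical access trace.
all_pages = list(range(16))
access_order = [9, 2, 14, 5, 11, 0]

naive = count_seeks(all_pages, access_order)  # pages stored by page number
grouped = count_seeks(locality_layout(all_pages, access_order), access_order)
print(naive, grouped)
```

With this toy trace, the layout ordered by page number needs a seek for every read, while the grouped layout needs a single initial seek followed by contiguous reads.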
The second presentation was regarding a new model for virtual switches in hypervisors called Hyper Switch. They were able to achieve very good performance improvements, which in my opinion could be improved even more through direct memory access between VMs running on the same host instead of copying the data buffers around.
The third presentation was from Microsoft Research, which presented a method to migrate VMs over low-bandwidth links through a series of optimizations that basically revolve around a better understanding of memory content and structure. The result cuts migration time in half, mostly by reducing the amount of data transferred.
The fourth presentation was regarding a data protection scheme called copysets. This new approach to data redundancy in the cloud differs from the traditional approach where data is copied to a determined number of random hosts. The goal is to minimize data loss events while keeping rebuild time to a minimum.
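As a rough illustration of the difference, here is a simplified Python sketch. It is my own toy model, not the authors' algorithm: the cluster size, the replication factor, and the function names are assumptions, and it uses a single random permutation to form the groups. Data is lost when every node holding a given chunk fails at the same time, so the fewer distinct replica groups are in use, the fewer failure combinations can lose data.

```python
import random


def random_placement(nodes, r, rng):
    """Traditional approach: pick r random hosts for each chunk."""
    return frozenset(rng.sample(nodes, r))


def build_copysets(nodes, r, rng):
    """Shuffle the nodes once and cut the permutation into fixed groups of r."""
    shuffled = nodes[:]
    rng.shuffle(shuffled)
    return [frozenset(shuffled[i:i + r])
            for i in range(0, len(shuffled) - r + 1, r)]


def copyset_placement(copysets, rng):
    """Copyset-style approach: every chunk lands on one of the fixed groups."""
    return rng.choice(copysets)


rng = random.Random(0)   # fixed seed so the run is repeatable
nodes = list(range(9))   # hypothetical 9-node cluster
R = 3                    # replication factor (assumed)
chunks = 1000

random_sets = {random_placement(nodes, R, rng) for _ in range(chunks)}
copysets = build_copysets(nodes, R, rng)
copyset_sets = {copyset_placement(copysets, rng) for _ in range(chunks)}

print(len(random_sets), "distinct replica groups with random placement")
print(len(copyset_sets), "distinct replica groups with copysets")
```

In this toy setup, random placement ends up using far more distinct groups of three nodes than the three fixed copysets, so many more simultaneous three-node failures would lose at least one chunk. The trade-off is that with fewer groups, fewer nodes can help rebuild a failed node, which is why rebuild time is part of the goal.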
The fifth presentation was from Facebook on their graph service infrastructure called TAO. Basically it is built using a mix of memcache and MySQL database servers. The presentation covered the scale and resiliency of their system. The presenter mentioned they handle more than 1 billion queries per SECOND with that system. Impressive!
The sixth presentation of the day was a dynamic load adaptation method for Hadoop jobs to ensure that faster server nodes in the cluster were properly balanced versus slower nodes.
The afternoon sessions revolved around SSD usage and optimization. Microsoft Research presented a paper on how they maintain the write performance of SSDs through an understanding of their physical and firmware characteristics. Google presented Janus, the flash cache layer used in their Colossus distributed file system. The presenter described their approach to introducing SSDs in their environment through experimentation and careful planning.
The last presentations of the day covered topics such as a key-value store accessed over InfiniBand and RDMA; techniques to trace memory access in applications through binary translation and clever use of x64 extensions; usage of flash on the client side when accessing a file server; and, to conclude, a method to provide a secure and lightweight sandboxing mechanism through system call interception using seccomp/BPF and copy-on-write virtual file systems.
Really interesting day overall!