Cloud computing has become ubiquitous due to its resource flexibility and cost efficiency. Resource flexibility allows Cloud users to elastically scale their Cloud resources, for instance, by horizontally scaling the number of virtual machines allocated to each application as application demand changes. However, matching resources to application demands is non-trivial, and highly dynamic workloads make it considerably harder. Cost efficiency is primarily achieved through workload consolidation, i.e., by co-locating applications on the same physical host. Unfortunately, workload consolidation often comes at a performance penalty, as consolidated applications contend for shared resources, leading to interference and performance unpredictability. Interference is particularly destructive for latency-critical applications, which must meet strict quality of service (QoS) requirements. Another significant technological trend is the growing prevalence of multi-socket systems in contemporary data centers. However, to the best of our knowledge, existing proposals for QoS-aware resource allocation are, by design, not tailored to multi-socket systems. Specifically, existing proposals do not support cross-socket sharing of memory, which leads to suboptimal use of a multi-socket host’s aggregate memory resources. This thesis focuses on two aspects of Cloud resource management, namely QoS-aware elasticity and resource arbitration, at two levels: inter-node and intra-node resource management. At the inter-node level, we consider the number of virtual machines (VMs) as the main resource to allocate and deallocate for horizontal auto-scaling of an elastic service or application in the Cloud. At the intra-node level, we treat the memory bandwidth of multi-socket systems as the resource to arbitrate among co-located applications.
At both levels, the overall goal of this thesis is to provide resource management mechanisms that automatically adapt the resources allocated to data-intensive services to improve resource utilization while meeting service-level objectives (SLOs). In the context of inter-node resource management for auto-scaling of elastic Cloud services, this thesis improves the usefulness of elasticity controllers by addressing some of the challenges posed by current model-predictive control systems (such as training and tuning the controller and adapting it to different workload patterns). To enable elastic execution of Cloud-based services using model-predictive control, we propose, implement, and evaluate OnlineElastMan, a self-trained proactive elasticity manager for Cloud-based storage services. OnlineElastMan stands out from its peers in its practical aspects, including easily measurable and obtainable performance and QoS metrics, automatic online training, and an embedded generic workload prediction module. Our evaluation shows that, under various workload patterns, OnlineElastMan continuously improves its provisioning accuracy, minimizing provisioning cost and SLO violations. In the context of intra-node resource management, this thesis starts from the observation that, since state-of-the-art QoS-aware resource allocation systems disallow cross-socket sharing of memory among consolidated applications, the memory bandwidth resources of multi-socket hosts cannot be fully exploited. Therefore, this thesis aims to fill that gap by designing, implementing, and evaluating two novel techniques for memory bandwidth allocation on multi-socket Cloud nodes. First, we propose BWAP, a novel bandwidth-aware page placement tool for memory-intensive applications on non-uniform memory access (NUMA) systems. BWAP takes the asymmetric bandwidths of every NUMA node into account to determine and enforce an optimized application-specific weighted interleaving.
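The idea of weighted interleaving, spreading an application's pages across NUMA nodes in proportion to what each node can deliver, can be illustrated with a minimal sketch. It assumes, as a simplification, that the interleave weights are directly proportional to per-node bandwidth estimates supplied by the caller; BWAP itself determines optimized application-specific weights, which this sketch does not attempt. The helper names `interleave_weights` and `place_pages`, the `scale` parameter, and the bandwidth figures are hypothetical.

```python
from functools import reduce
from math import gcd


def interleave_weights(bandwidths_gbps, scale=100):
    """Turn per-node bandwidth estimates into small integer interleave weights.

    Hypothetical helper (simplified assumption): weights are proportional to
    bandwidth, rounded to integers, and reduced by their greatest common divisor.
    Every node keeps a weight of at least 1 so that it still receives some pages.
    """
    total = sum(bandwidths_gbps)
    raw = [max(1, round(scale * bw / total)) for bw in bandwidths_gbps]
    g = reduce(gcd, raw)
    return [w // g for w in raw]


def place_pages(num_pages, weights):
    """Assign page indices to NUMA nodes by weighted round-robin.

    Within each cycle of sum(weights) consecutive pages, node i receives
    weights[i] pages.
    """
    cycle = [node for node, w in enumerate(weights) for _ in range(w)]
    return [cycle[p % len(cycle)] for p in range(num_pages)]
```

For example, four nodes with illustrative bandwidths of 40, 20, 20 and 10 GB/s yield weights of 4:2:2:1, so within every nine consecutive pages node 0 receives four, nodes 1 and 2 receive two each, and node 3 receives one.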
Our evaluations on a diverse set of memory-intensive workloads show that BWAP achieves up to 4× speedups when compared to a first-touch baseline policy (Linux’s default). Second, we propose BALM, a QoS-aware memory bandwidth allocation technique for multi-socket architectures. The key insight of BALM is to combine commodity bandwidth allocation mechanisms, originally designed for single-socket systems, with a novel adaptive cross-socket page migration scheme. Our evaluation shows that BALM can safeguard the SLO of latency-critical applications, with marginal SLO violation windows, while delivering up to 87% throughput gains to bandwidth-intensive best-effort applications compared to state-of-the-art alternatives. All solutions proposed and presented in this thesis, namely OnlineElastMan, BWAP, and BALM, have been implemented and evaluated on real-world workloads. The results indicate the feasibility and effectiveness of our proposed approaches in improving inter-node and intra-node resource management through QoS-aware elastic execution and effective arbitration of resources among consolidated workloads in Cloud nodes.