No Time to Halt: In-Situ Analysis for Large-Scale Data Processing via Virtual Snapshotting

Abstract

Large-scale data processing applications often perform long-running computations or analytical queries over days or even weeks. While these are running, users typically receive no meaningful information about the current state and progress of the computation or the quality of the intermediate result. Consequently, it takes a long time until a user can assess whether a computation was useful. To shorten this feedback loop, in-situ analysis accesses the internal state of the application while it is still running. Unfortunately, state-of-the-art techniques come with two problems: they (a) require the application to halt in order to access a stable state and (b) require extensive code modification. To solve both problems, we advocate performing virtual snapshotting instead, which uses a copy-on-write based approach to create stable snapshots without halting or modifying the application in any way. While we already showcased the core technique as a proof of concept in previous work, it remains open how effectively our technique, coined in-situ-CoW, actually operates on different types of applications. To find out, we first perform an in-depth analysis of the memory access patterns of 16 large-scale data processing workloads, including representative applications from the scientific domain as well as from data management. Because this analysis reveals highly different needs across application classes, we adjust in-situ-CoW to dynamically adapt the snapshotting granularity to the workload. In an extensive experimental evaluation, we compare in-situ-CoW on all applications against (i) traditional physical snapshotting, (ii) MVCC, and (iii) two oracle policies. Compared to traditional physical snapshotting, in-situ-CoW reduces the performance overhead by up to 98% (66% on average) for scientific workloads and by up to 64% (45% on average) for YCSB. Compared to MVCC, in-situ-CoW reduces the performance overhead by up to 89% (25% on average) under write-intensive workloads.
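The general idea of a copy-on-write snapshot can be illustrated with a small sketch. This is not the in-situ-CoW implementation described in the abstract; it only assumes a Unix-like system where `os.fork()` gives the child process a copy-on-write view of the parent's memory. The helper names `snapshot_and_analyze` and `collect` are hypothetical and chosen for this illustration. The parent keeps mutating its state right after the fork, while the child analyzes the state as it was at the moment of the snapshot.

```python
import os


def snapshot_and_analyze(state, analyze):
    """Fork a child that runs `analyze` on a copy-on-write view of `state`.

    Hypothetical helper for illustration only. Returns the child's pid and
    the read end of a pipe that will carry the analysis result as text.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: its memory reflects the parent's state at fork time.
        os.close(read_fd)
        try:
            os.write(write_fd, str(analyze(state)).encode())
        finally:
            os.close(write_fd)
            os._exit(0)  # leave without running the parent's cleanup code
    # Parent: continues immediately, without waiting for the analysis.
    os.close(write_fd)
    return pid, read_fd


def collect(pid, read_fd):
    """Read the child's result until end-of-file and reap the child."""
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks).decode()


# The "application" state: a list that the parent keeps modifying.
state = list(range(1000))
pid, fd = snapshot_and_analyze(state, sum)

# The parent overwrites every entry while the analysis may still be running.
for i in range(len(state)):
    state[i] = 0

result = collect(pid, fd)
```

Here `result` holds the sum of the values as they were when the snapshot was taken, even though the parent has since zeroed the list. A process-level fork snapshots the entire address space at page granularity; the abstract's point about adapting the snapshotting granularity to the workload is not captured by this sketch.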

Publication
28th International Conference on Extending Database Technology (EDBT)
Reza Salkhordeh
Substitute Professor

My research interests include operating systems, solid-state drives, and data storage systems.