Big Data Benchmarking

Tilmann Rabl and Chaitan Baru.

IEEE International Big Data Conference, 2014.


This tutorial will introduce the audience to the broad set of issues involved in defining big data benchmarks, for creating auditable industry-standard benchmarks that consider performance as well as price/performance. Big data benchmarks must capture the essential characteristics of big data applications and systems, including heterogeneous data, e.g. structured, semi- structured, unstructured, graphs, and streams; large-scale and evolving system configurations; varying system loads; processing pipelines that progressively transform data; workloads that include queries as well as data mining and machine learning operations and algorithms. Different benchmarking approaches will be introduced, from micro-benchmarks to application- level benchmarking.

Since May 2012, five workshops have been held on Big Data Benchmarking including participation from industry and academia. One of the outcomes of these meetings has been the creation of industry’s first big data benchmark, viz., TPCx-HS, the Transaction Processing Performance Council’s benchmark for Hadoop Systems. During these workshops, a number of other proposals have been put forward for more comprehensive big data benchmarking. The tutorial will present and discuss salient points and essential features of such benchmarks that have been identified in these meetings, by experts in big data as well as benchmarking. Two key approaches are now being pursued—one, called BigBench, is based on extending the TPC- Decision Support (TPC-DS) benchmark with big data applications characteristics. The other called Deep Analytics Pipeline, is based on modeling processing that is routinely encountered in real-life big data applications. Both will be discussed.


Tags: big data, benchmarking, tutorial

Readers who enjoyed the above work, may also like the following:

  • Discussion of BigBench: A Proposed Industry Standard Performance Benchmark for Big Data.
    Chaitanya Baru, Milind Bhandarkar, Carlo Curino, Manuel Danisch, Michael Frank, Bhaskar Gowda, Hans-Arno Jacobsen, Huang Jie, Dileep Kumar, Raghunath Nambiar, Meikel Poess, Francois Raab, Tilmann Rabl, Nishkam Ravi, Kai Sachs, Saptak Sen, Lan Yi, and Choonhan Youn.
    In Sixth TPC Technology Conference on Performance Evaluation & Benchmarking, pages 44-63, 2014. Springer Berlin Heidelberg.
    Tags: bigbench, big data, benchmarking
  • BigBench Specification V0.1.
    Tilmann Rabl, Ahmad Ghazal, Minqing Hu, Alain Crolotte, Francois Raab, Meikel Poess, and Hans-Arno Jacobsen.
    In Proceedings of the 2012 Workshop on Big Data Benchmarking, pages 164-202, 2013.
    Tags: bigbench, big data, benchmarking
  • Big Data Generation.
    Tilmann Rabl and Hans-Arno Jacobsen.
    In Proceedings of the Workshop on Big Data Benchmarking, pages 20-27, 2013.
    Tags: pdgf, big data, benchmarking