Journal Article

Evaluating End-to-End Optimization for Data Analytics Applications in Weld

Abstract

Modern analytics applications use a diverse mix of libraries and functions. Unfortunately, there is no optimization across these libraries, resulting in performance penalties as high as an order of magnitude in many applications. To address this problem, we proposed Weld, a common runtime for existing data analytics libraries that performs key physical optimizations such as pipelining under existing, imperative library APIs. In this work, we further develop the Weld vision by designing an automatic adaptive optimizer for Weld applications, and evaluating its impact on realistic data science workloads. Our optimizer eliminates multiple forms of overhead that arise when composing imperative libraries like Pandas and NumPy, and uses lightweight measurements to make data-dependent decisions at run-time in ad-hoc workloads where no statistics are available, with sub-second overhead. We also evaluate which optimizations have the largest impact in practice and whether Weld can be integrated into libraries incrementally. Our results are promising: using our optimizer, Weld accelerates data science workloads by up to 23× on one thread and 80× on eight threads, and its adaptive optimizations provide up to a 3.75× speedup over rule-based optimization. Moreover, Weld provides benefits if even just 4–5 operators in a library are ported to use it. Our results show that common runtime designs like Weld may be a viable approach to accelerate analytics.
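The abstract says the optimizer "uses lightweight measurements to make data-dependent decisions at run-time" when no statistics are available. The sketch below illustrates that general idea and is not Weld's implementation. A small random sample estimates a filter's selectivity, and the estimate then picks between a branching loop and a branch-free (predicated) loop. All function names and the 0.1/0.9 thresholds are hypothetical choices made for illustration. Python does not expose branch-misprediction costs, so only the decision logic carries over; a compiled runtime is where the choice would affect speed.

```python
import random
from typing import Callable, Sequence, Tuple

Pred = Callable[[float], bool]


def estimate_selectivity(data: Sequence[float], pred: Pred,
                         sample_size: int = 1000, seed: int = 0) -> float:
    """Estimate the fraction of rows passing `pred` from a small random sample."""
    if not data:
        return 0.0
    rng = random.Random(seed)
    k = min(sample_size, len(data))
    # Sample indices rather than copying the data, to keep the measurement cheap.
    idx = rng.sample(range(len(data)), k)
    return sum(1 for i in idx if pred(data[i])) / k


def sum_branching(data: Sequence[float], pred: Pred) -> float:
    """Filtered sum with a data-dependent branch per row."""
    total = 0.0
    for x in data:
        if pred(x):
            total += x
    return total


def sum_predicated(data: Sequence[float], pred: Pred) -> float:
    """Filtered sum without a branch: multiply each row by 0 or 1."""
    total = 0.0
    for x in data:
        total += x * pred(x)
    return total


def adaptive_filtered_sum(data: Sequence[float], pred: Pred,
                          low: float = 0.1, high: float = 0.9) -> Tuple[float, str]:
    """Choose a strategy from a sampled selectivity (hypothetical thresholds)."""
    s = estimate_selectivity(data, pred)
    # Branches are predictable when almost no rows or almost all rows pass.
    if s <= low or s >= high:
        return sum_branching(data, pred), "branching"
    return sum_predicated(data, pred), "predicated"


if __name__ == "__main__":
    rows = [float(i) for i in range(10_000)]
    print(adaptive_filtered_sum(rows, lambda x: x % 2 == 0))  # about half the rows pass
    print(adaptive_filtered_sum(rows, lambda x: x < 10))      # very few rows pass
```

Both strategies return the same sum. Only the loop shape changes, so a wrong estimate costs time but never correctness.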

Project page

A runtime for improving the performance of data-intensive applications. It optimizes across libraries and functions by expressing the core computations in libraries using a small common intermediate representation, similar to CUDA and OpenCL.
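As an illustration of why a shared intermediate representation enables cross-library optimization, here is a toy sketch. It is not Weld's IR or API. Two hypothetical "libraries" (`scale` and `where`) do not compute results eagerly; instead, each appends a node to a common lazy graph. A hypothetical `evaluate` function then runs the whole chain in a single pass over the input, so no intermediate list is materialized between the library calls. The node kinds and all function names are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Node:
    """One operation in a tiny, hypothetical common IR."""
    kind: str                         # "source", "map", or "filter"
    fn: Optional[Callable] = None     # element function for map/filter
    parent: Optional["Node"] = None   # upstream node
    data: Optional[list] = None       # input rows (source nodes only)


def source(rows: Iterable) -> Node:
    return Node("source", data=list(rows))


# Hypothetical "library A": element-wise arithmetic that emits IR instead of computing.
def scale(node: Node, k: float) -> Node:
    return Node("map", fn=lambda x: x * k, parent=node)


# Hypothetical "library B": relational-style filtering, emitting into the same IR.
def where(node: Node, pred: Callable) -> Node:
    return Node("filter", fn=pred, parent=node)


def evaluate(node: Node) -> List:
    """Pipeline every operation into one loop over the source rows."""
    ops = []
    while node.kind != "source":
        ops.append(node)
        node = node.parent
    ops.reverse()                     # apply operations in program order
    out = []
    for x in node.data:
        keep = True
        for op in ops:
            if op.kind == "map":
                x = op.fn(x)
            elif not op.fn(x):        # filter rejected this row
                keep = False
                break
        if keep:
            out.append(x)
    return out


if __name__ == "__main__":
    plan = where(scale(source(range(10)), 3), lambda v: v % 2 == 0)
    print(evaluate(plan))
```

Because neither library runs until `evaluate` is called, the runtime sees both operations together and can fuse them. That is the kind of cross-library pipelining that eager, separately compiled library calls cannot perform.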
Author(s)
Shoumik Palkar
James Thomas
Deepak Narayanan
Pratiksha Thaker
Rahul Palamuttam
Parimarjan Negi
Anil Shanbhag
Malte Schwarzkopf
Holger Pirk
Saman Amarasinghe
Samuel Madden
Matei Zaharia
Journal Name
Proceedings of the VLDB Endowment
Publication Date
May 1, 2018
DOI
10.14778/3213880.3213890
Publisher
ACM