Journal Article

Weld: A Common Runtime for High Performance Data Analytics

Abstract

Modern analytics applications combine multiple functions from different libraries and frameworks to build increasingly complex workflows. Even though each function may achieve high performance in isolation, the performance of the combined workflow is often an order of magnitude below hardware limits due to extensive data movement across the functions. To address this problem, we propose Weld, a runtime for data-intensive applications that optimizes across disjoint libraries and functions. Weld uses a common intermediate representation to capture the structure of diverse data-parallel workloads, including SQL, machine learning and graph analytics. It then performs key data movement optimizations and generates efficient parallel code for the whole workflow. Weld can be integrated incrementally into existing frameworks like TensorFlow, Apache Spark, NumPy and Pandas without changing their user-facing APIs. We show that Weld can speed up these frameworks, as well as applications that combine them, by up to 30×.
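The data-movement cost described in the abstract can be illustrated with a small sketch. It is not Weld's API or its intermediate representation. `LazyPipeline`, `eager_sum`, and the sample data are hypothetical names made up for this illustration. The eager version mimics separate library calls that each materialize a full intermediate list, while the lazy version only records the operations and runs them together in a single pass over the input, which is the general kind of cross-function fusion a shared representation makes possible.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


def eager_sum(values: List[int]) -> int:
    """Three independent 'library calls', each scanning a full list."""
    doubled = [v * 2 for v in values]        # pass 1: materializes a list
    large = [v for v in doubled if v > 10]   # pass 2: materializes another list
    return sum(large)                        # pass 3: final reduction


@dataclass
class LazyPipeline:
    """Hypothetical stand-in for a shared IR: records ops instead of running them."""
    source: List[int]
    ops: List[Tuple[str, Callable]] = field(default_factory=list)

    def map(self, fn: Callable[[int], int]) -> "LazyPipeline":
        # Record the operation; no data is touched yet.
        return LazyPipeline(self.source, self.ops + [("map", fn)])

    def filter(self, pred: Callable[[int], bool]) -> "LazyPipeline":
        return LazyPipeline(self.source, self.ops + [("filter", pred)])

    def sum(self) -> int:
        # Evaluate every recorded op in one pass, with no intermediate lists.
        total = 0
        for x in self.source:
            keep = True
            for kind, fn in self.ops:
                if kind == "map":
                    x = fn(x)
                elif not fn(x):
                    keep = False
                    break
            if keep:
                total += x
        return total


data = [1, 4, 6, 9, 12]
fused_result = (
    LazyPipeline(data)
    .map(lambda v: v * 2)
    .filter(lambda v: v > 10)
    .sum()
)
```

Both versions compute the same value. The difference is only in how many times the data is scanned and how many temporary lists are allocated. A real runtime would additionally compile the fused loop to parallel native code, which this interpreter-level sketch does not attempt.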

Project page

A runtime for improving the performance of data-intensive applications. It optimizes across libraries and functions by expressing the core computations in libraries using a small common intermediate representation, similar to CUDA and OpenCL.
Author(s)
Shoumik Palkar
James J Thomas
Anil Shanbhag
Deepak Narayanan
Holger Pirk
Malte Schwarzkopf
Saman Amarasinghe
Matei Zaharia
Journal Name
8th Biennial Conference on Innovative Data Systems Research (CIDR ’17)
Publication Date
January 2017