
DAWNBench


DAWNBench is a benchmark suite for end-to-end deep learning training and inference. Computation time and cost are critical resources in building deep models, yet many existing benchmarks focus solely on model accuracy. DAWNBench provides a reference set of common deep learning workloads for quantifying training time, training cost, inference latency, and inference cost across different optimization strategies, model architectures, software frameworks, clouds, and hardware.

Building on our experience with DAWNBench, we helped create MLPerf as an industry standard for measuring machine learning system performance. Now that both the MLPerf Training and Inference benchmark suites have successfully launched, we ended rolling submissions to DAWNBench on March 27, 2020 to consolidate benchmarking efforts.

Citation

Please cite the following if you use results from the benchmark or competition in any way:

DAWNBench: An End-to-End Deep Learning Benchmark and Competition. Cody A. Coleman, Deepak Narayanan, Daniel Kang, Tian Zhao, Jian Zhang, Luigi Nardi, Peter Bailis, Kunle Olukotun, Chris Ré, and Matei Zaharia. NIPS ML Systems Workshop, 2017.

 

 


Image Classification on ImageNet

Training Time

Objective: Time taken to train an image classification model to a top-5 validation accuracy of 93% or greater on ImageNet.
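As a rough illustration of the time-to-accuracy metric, the sketch below scans a training log for the first checkpoint that meets the accuracy threshold. The log format, the function names, and the sample numbers are hypothetical and are not taken from the benchmark's own tooling.

```python
from typing import Iterable, Optional, Tuple


def time_to_accuracy(
    epoch_log: Iterable[Tuple[float, float]], threshold: float = 0.93
) -> Optional[float]:
    """Return the elapsed seconds at the first checkpoint meeting the threshold.

    epoch_log holds (elapsed_seconds, top5_accuracy) pairs in training order.
    Returns None if the threshold is never reached.
    """
    for elapsed_seconds, accuracy in epoch_log:
        if accuracy >= threshold:
            return elapsed_seconds
    return None


def format_hms(seconds: float) -> str:
    """Format seconds as H:MM:SS, the layout used in the leaderboard."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Hypothetical log, invented for illustration only.
log = [(100.0, 0.71), (200.0, 0.89), (300.0, 0.931), (400.0, 0.94)]
print(format_hms(time_to_accuracy(log)))  # prints 0:05:00
```

Only the first crossing counts here; later checkpoints with higher accuracy do not change the reported time.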

| Rank | Date | Time to 93% Accuracy | Model | Submitter | Hardware | Framework |
|---|---|---|---|---|---|---|
| 1 | Mar 2020 | 0:02:38 | ResNet50-v1.5 | Apsara AI Acceleration(AIACC) team in Alibaba Cloud | 16 ecs.gn6e-c12g1.24xlarge (AlibabaCloud) | AIACC-Training 1.3 + Tensorflow 2.1 |
| 2 | May 2019 | 0:02:43 | ResNet-50 | ModelArts Service of Huawei Cloud | 16 nodes with InfiniBand (8*V100 with NVLink for each node) | Moxing v1.13.0 + TensorFlow v1.13.1 |
| 3 | Dec 2018 | 0:09:22 | ResNet-50 | ModelArts Service of Huawei Cloud | 16 * 8 * Tesla-V100(ModelArts Service) | Huawei Optimized MXNet |
| 4 | Sep 2018 | 0:18:06 | ResNet-50 | fast.ai/DIUx (Yaroslav Bulatov, Andrew Shaw, Jeremy Howard) | 16 p3.16xlarge (AWS) | PyTorch 0.4.1 |
| 5 | Sep 2018 | 0:18:53 | Resnet 50 | Andrew Shaw, Yaroslav Bulatov, Jeremy Howard | 64 * V100 (8 machines - AWS p3.16xlarge) | ncluster / Pytorch 0.5.0a0+0e8088d |

 

Training Cost

Objective: Total cost of public cloud instances to train an image classification model to a top-5 validation accuracy of 93% or greater on ImageNet.
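The cost metric can be read as training time multiplied by the price of the rented instances. The sketch below assumes billing is proportional to elapsed time with no minimum charge; the benchmark's exact billing rules are not stated here, and the price and duration used are hypothetical.

```python
def training_cost_usd(
    elapsed_seconds: float, hourly_price_usd: float, instance_count: int = 1
) -> float:
    """Cost of running `instance_count` identical instances for the given time.

    Assumes cost scales linearly with time (no minimum billing increment).
    """
    hours = elapsed_seconds / 3600.0
    return hours * hourly_price_usd * instance_count


# Hypothetical example: 2 instances at $24.00/hour for 30 minutes.
cost = training_cost_usd(elapsed_seconds=1800, hourly_price_usd=24.0, instance_count=2)
print(f"${cost:.2f}")  # prints $24.00
```

Under this model, adding instances lowers cost only if training time shrinks by more than the instance count grows, which is one reason the time and cost rankings can differ.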

| Rank | Date | Cost (USD) | Model | Submitter | Hardware | Framework |
|---|---|---|---|---|---|---|
| 1 | Mar 2020 | $7.43 | ResNet50-v1.5 | Apsara AI Acceleration(AIACC) team in Alibaba Cloud | 1 ecs.gn6e-c12g1.24xlarge (AlibabaCloud) | AIACC-Training 1.3 + Tensorflow 2.1 |
| 2 | Sep 2018 | $12.60 | ResNet50 | Google Cloud TPU | GCP n1-standard-2, Cloud TPU | TensorFlow v1.11.0 |
| 3 | Mar 2020 | $14.42 | ResNet50-v1.5 | Apsara AI Acceleration(AIACC) team in Alibaba Cloud | 16 ecs.gn6e-c12g1.24xlarge (AlibabaCloud) | AIACC-Training 1.3 + Tensorflow 2.1 |
| 4 | Aug 2019 | $19.00 | Resnet 50 | Chuan Li | Lambda GPU Cloud - 4x GTX 1080 Ti | ncluster / Pytorch 1.0.0 |
| 5 | Apr 2019 | $20.89 | ResNet50 | Setu Chokshi (MS AI MVP \| PropertyGuru) | Azure ND40s_v2 | PyTorch 1.0 |

 

Inference Latency

Objective: Latency required to classify one ImageNet image using a model with a top-5 validation accuracy of 93% or greater.
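One way to measure single-example latency is to time each prediction individually (batch size 1) and average the results. The sketch below is a minimal illustration under that assumption; `dummy_predict` is a hypothetical stand-in for a real model, and the benchmark's own measurement procedure may differ.

```python
import time
from statistics import mean
from typing import Callable, Sequence


def one_example_latency_ms(
    predict: Callable, examples: Sequence, warmup: int = 3
) -> float:
    """Average wall-clock latency in milliseconds for one example at a time."""
    # Warm-up calls are excluded so one-time setup does not skew the average.
    for example in examples[:warmup]:
        predict(example)
    latencies = []
    for example in examples:
        start = time.perf_counter()
        predict(example)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return mean(latencies)


def dummy_predict(image):
    """Hypothetical stand-in for a model's forward pass."""
    return sum(image) % 1000


images = [[i, i + 1, i + 2] for i in range(100)]
latency = one_example_latency_ms(dummy_predict, images)
print(f"{latency:.4f} ms")
```

With a real accelerator-backed model, the timed call would also need to wait for the device to finish its work before the clock is read; otherwise the measurement only captures the time to enqueue the computation.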

| Rank | Date | 1-example Latency (milliseconds) | Model | Submitter | Hardware | Framework |
|---|---|---|---|---|---|---|
| 1 | Mar 2020 | 0.0739 | ResNet26d | Apsara AI Acceleration(AIACC) team in Alibaba Cloud && Alibaba T-Head | Alibaba Cloud [ecs.ebman1.26xlarge] | Pytorch+AIACC-Inference+HGAI |
| 2 | Feb 2020 | 0.3880 | ResNet101 | AI Cognitive Computing team in Alipay Group | Alibaba Cloud Npu | tensorflow+NpuInference |
| 3 | Mar 2020 | 0.3926 | MIVT-NET-v2 | Machine Intelligence in Alibaba Cloud | Alibaba Cloud [ecs.gn6i-c8g1.2xlarge] | HIE |
| 4 | Feb 2020 | 0.4662 | ResNet26 | PAI: Platform of A.I. in Alibaba Cloud | Alibaba Cloud [ecs.gn6i-c8g1.2xlarge] | PAI-Blade + TensorRT |
| 5 | Nov 2019 | 0.4945 | ResNet26 | ModelArts Service of Huawei Cloud | Huawei Cloud [pi2.2xlarge.4] | ModelArts-AIBOX + TensorRT |

 

Inference Cost

Objective: Average cost on public cloud instances to classify 10,000 validation images from ImageNet using an image classification model with a top-5 validation accuracy of 93% or greater.
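Inference cost can be approximated from the average per-example latency and the instance's hourly price. The sketch below assumes examples are processed one at a time and that cost scales linearly with time; both are simplifying assumptions, and the latency and price values are hypothetical.

```python
def inference_cost_usd(
    avg_latency_ms: float, hourly_price_usd: float, num_examples: int = 10_000
) -> float:
    """Estimated cost of classifying `num_examples` examples sequentially."""
    total_hours = (avg_latency_ms / 1000.0) * num_examples / 3600.0
    return total_hours * hourly_price_usd


# Hypothetical example: 0.5 ms per image on a $1.00/hour instance.
cost = inference_cost_usd(avg_latency_ms=0.5, hourly_price_usd=1.0)
print(f"${cost:.4f}")  # about $0.0014
print(f"${cost:.2f}")  # rounds to $0.00 at cent precision
```

A cost this small displays as $0.00 once rounded to cents, so entries that show the same rounded value can still differ in their unrounded cost.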

| Rank | Date | Cost (USD) | Model | Submitter | Framework | Hardware |
|---|---|---|---|---|---|---|
| 1 | Oct 2019 | $0.00 | ResNet26d | Apsara AI Acceleration(AIACC) team in Alibaba Cloud | Pytorch+AIACC-Inference | Alibaba Cloud [ecs.gn6i-c8g1.2xlarge] |
| 2 | Jun 2019 | $0.00 | ResNet50 | InferenceX Team of Didi Cloud | ifx | Didi Cloud [1 P4 / 16 GB / 8 vCPU] |
| 3 | May 2018 | $0.01 | ResNet50 | Perseus AI Cloud Acceleration team in Alibaba Cloud | TensorFlow 1.12.2 | Alibaba Cloud [ecs.gn5i-c8g1.2xlarge] |
| 4 | Dec 2018 | $0.02 | ResNet50 | Perseus AI Cloud Acceleration team in Alibaba Cloud | TensorFlow 1.10.0 | Alibaba Cloud [ecs.gn5i-c8g1.2xlarge] |
| 5 | Apr 2018 | $0.02 | ResNet50 | Intel(R) Corporation | Intel(R) Optimized Caffe | Amazon EC2 [c5.2xlarge] |

 

 

 


Image Classification on CIFAR10

Training Time

Objective: Time taken to train an image classification model to a test accuracy of 94% or greater on CIFAR10.

| Rank | Date | Time to 94% Accuracy | Model | Submitter | Framework | Hardware |
|---|---|---|---|---|---|---|
| 1 | Dec 2019 | 0:00:10 | Custom Resnet 9 | Santiago Akle Serrano, Hadi Pour Ansari, Vipul Gupta, Dennis DeCoste | Pytorch 1.1.0 | Tesla V100 * 8 GPU / 32 GB / 40 CPU |
| 2 | Jan 2020 | 0:00:11 | Custom ResNet 9 | Ajay Uppili Arasanipalai | PyTorch 1.1.0 | IBM AC922 + 4 * Nvidia Tesla V100 (NCSA HAL) |
| 3 | Oct 2019 | 0:00:28 | Kakao Brain Custom ResNet9 | clint@KakaoBrain | PyTorch 1.1.0 | Tesla V100 * 4 GPU / 488 GB / 56 CPU (Kakao Brain BrainCloud) |
| 4 | May 2019 | 0:00:45 | BaiduNet9P | Baidu USA GAIT LEOPARD team: Baopu Li, Zhiyu Cheng, Yingze Bao | PyTorch v1.0.1 and PaddlePaddle | Baidu Cloud Tesla 8*V100-16GB/448 GB/96 CPU |
| 5 | Oct 2019 | 0:00:58 | Kakao Brain Custom ResNet9 | clint@KakaoBrain | PyTorch 1.1.0 | Tesla V100 * 1 GPU / 488 GB / 56 CPU (Kakao Brain BrainCloud) |

Training Cost

Objective: Total cost for public cloud instances to train an image classification model to a test accuracy of 94% or greater on CIFAR10.

| Rank | Date | Cost (USD) | Model | Submitter | Framework | Hardware |
|---|---|---|---|---|---|---|
| 1 | May 2019 | $0.02 | BaiduNet9 | Baidu USA GAIT LEOPARD team: Baopu Li, Zhiyu Cheng, Yingze Bao | PyTorch v1.0.1 and PaddlePaddle | Baidu Cloud Tesla V100*1-16GB/56 GB/12 CPU |
| 2 | Aug 2019 | $0.04 | BaiduNet9 | Chuan Li | fastai / Pytorch 1.0.0 | Lambda GPU Cloud - 4x GTX 1080 Ti |
| 3 | Nov 2018 | $0.06 | Custom ResNet 9 | David Page, myrtle.ai | pytorch 0.4.0 | V100 (AWS p3.2xlarge) |
| 4 | May 2019 | $0.11 | BaiduNet9P | Baidu USA GAIT LEOPARD team: Baopu Li, Zhiyu Cheng, Yingze Bao | PyTorch v1.0.1 and PaddlePaddle | Baidu Cloud Tesla 8*V100-16GB/448 GB/96 CPU |
| 5 | Apr 2018 | $0.26 | Custom Wide Resnet | fast.ai + students team: Jeremy Howard, Andrew Shaw, Brett Koonce, Sylvain Gugger | fastai / pytorch | Paperspace Volta (V100) |

Inference Latency

Objective: Latency required to classify one CIFAR10 image using a model with a test accuracy of 94% or greater.

| Rank | Date | 1-example Latency (milliseconds) | Model | Submitter | Framework | Hardware |
|---|---|---|---|---|---|---|
| 1 | Nov 2019 | 0.1345 | ResNet8 | ModelArts Service of Huawei Cloud | ModelArts-AIBOX + TensorRT | Huawei Cloud [pi2.2xlarge.4] |
| 2 | Apr 2019 | 0.6830 | BaiduNet8 using PyTorch JIT in C++ | Baidu USA GAIT LEOPARD team: Baopu Li, Zhiyu Cheng, Jiazhuo Wang, Haofeng Kou, Yingze Bao | PyTorch v1.0.1 and PaddlePaddle | Baidu Cloud Tesla V100*1/60 GB/12 CPU |
| 3 | Nov 2018 | 0.8280 | Custom ResNet 9 using PyTorch JIT in C++ | Laurent Mazare | PyTorch v1.0.0.dev20181116 | 1 P100 / 128 GB / 16 CPU |
| 4 | Oct 2019 | 0.8570 | Kakao Brain Custom ResNet9 using PyTorch JIT in python | clint@KakaoBrain | PyTorch 1.1.0 | Tesla V100 * 1 GPU / 488 GB / 56 CPU (Kakao Brain BrainCloud) |
| 5 | Oct 2017 | 9.7843 | ResNet 56 | Stanford DAWN | PyTorch v0.1.12 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |

Inference Cost

Objective: Average cost on public cloud instances to classify 10,000 test images from CIFAR10 using an image classification model with a test accuracy of 94% or greater.

| Rank | Date | Cost (USD) | Model | Submitter | Framework | Hardware |
|---|---|---|---|---|---|---|
| 1 | Apr 2019 | $0.00 | BaiduNet8 using PyTorch JIT in C++ | Baidu USA GAIT LEOPARD team: Baopu Li, Zhiyu Cheng, Jiazhuo Wang, Haofeng Kou, Yingze Bao | PyTorch v1.0.1 and PaddlePaddle | Baidu Cloud Tesla V100*1/60 GB/12 CPU |
| 2 | Oct 2017 | $0.02 | ResNet 56 | Stanford DAWN | PyTorch v0.1.12 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
| 3 | Oct 2017 | $0.04 | ResNet 164 (without bottleneck) | Stanford DAWN | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
| 4 | Oct 2017 | $0.05 | ResNet 164 (with bottleneck) | Stanford DAWN | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
| 5 | Oct 2017 | $0.07 | ResNet 164 (without bottleneck) | Stanford DAWN | PyTorch v0.1.12 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |

 

 

 


Question Answering on SQuAD

Training Time

Objective: Time taken to train a question answering model to an F1 score of 0.75 or greater on the SQuAD development dataset.
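For readers unfamiliar with the F1 metric used for SQuAD, it measures token overlap between a predicted answer and a reference answer. The sketch below is a simplified version: the official SQuAD evaluation also strips punctuation and articles and takes the best score over several reference answers, which is omitted here. The function name and example strings are illustrative only.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between one predicted answer and one reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Precision is 2/3 and recall is 1, so F1 is about 0.8.
score = token_f1("the Eiffel Tower", "Eiffel Tower")
print(round(score, 2))
```

The dataset-level score is the average of this per-question value, so a threshold of 0.75 rewards partially correct answers rather than exact matches only.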

| Rank | Date | Time to 0.75 F1 | Model | Submitter | Framework | Hardware |
|---|---|---|---|---|---|---|
| 1 | Mar 2019 | 0:18:46 | FastFusionNet | Wu et al. (Cornell, SayMosaic, Google) | Pytorch v0.3.1 | 1 NVidia GTX-1080 Ti |
| 2 | Dec 2018 | 0:27:07 | DrQA | Runqi Yang, Facebook ParlAI, Brett Koonce | Pytorch 1.0.0 | 1 NVidia 2080 RTX (dev box) |
| 3 | Apr 2018 | 0:45:56 | QANet | Google | TensorFlow v1.8 | 1 TPUv2 |
| 4 | Dec 2018 | 0:50:21 | DrQA | Runqi Yang, Facebook ParlAI, Brett Koonce | Pytorch 1.0.0 | 1 T4 / GCP |
| 5 | Dec 2018 | 0:56:43 | DrQA | Runqi Yang, Facebook ParlAI, Brett Koonce | Pytorch 1.0.0 | 1 P4 / GCP |

Training Cost

Objective: Total cost for public cloud instances to train a question answering model to an F1 score of 0.75 or greater on the SQuAD development dataset.

| Rank | Date | Cost (USD) | Model | Submitter | Framework | Hardware |
|---|---|---|---|---|---|---|
| 1 | Dec 2018 | $0.57 | DrQA | Runqi Yang, Facebook ParlAI, Brett Koonce | Pytorch 1.0.0 | 1 P4 / GCP |
| 2 | Dec 2018 | $0.76 | DrQA | Runqi Yang, Facebook ParlAI, Brett Koonce | Pytorch 1.0.0 | 1 T4 / GCP |
| 3 | Sep 2018 | $1.23 | DrQA | Runqi Yang, Facebook ParlAI, Brett Koonce | Pytorch 0.4.1 | 1 K80 / AWS p2.xlarge |
| 4 | Sep 2018 | $3.09 | DrQA | Runqi Yang, Facebook ParlAI, Brett Koonce | Pytorch 0.4.1 | 1 V100 / AWS p3.2xlarge |
| 5 | Oct 2017 | $5.78 | BiDAF | Stanford DAWN | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |

Inference Latency

Objective: Latency required to answer one SQuAD question using a model with an F1 score of at least 0.75 on the development dataset.

| Rank | Date | 1-example Latency (milliseconds) | Model | Submitter | Framework | Hardware |
|---|---|---|---|---|---|---|
| 1 | Jul 2019 | 7.5790 | PA-Occam-Bert | Ping An Technology Occam Platform | Tensorflow 1.13.0 | 1 NVidia Tesla V100 |
| 2 | Feb 2019 | 7.9000 | FastFusionNet | Wu et al. (Cornell, SayMosaic, Google) | Pytorch v0.3.1 | 1 NVidia GTX-1080 Ti |
| 3 | Oct 2017 | 100.0000 | BiDAF | Stanford DAWN | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
| 4 | Oct 2017 | 590.0000 | BiDAF | Stanford DAWN | TensorFlow v1.2 | 1 K80 / 30 GB / 8 CPU (Google Cloud) |
| 5 | Oct 2017 | 638.1000 | BiDAF | Stanford DAWN | TensorFlow v1.2 | 1 P100 / 512 GB / 56 CPU (DAWN Internal Cluster) |

Inference Cost

Objective: Average cost on public cloud instances to answer 10,000 questions from the SQuAD development dataset using a question answering model with a development F1 score of 0.75 or greater.

| Rank | Date | Cost (USD) | Model | Submitter | Framework | Hardware |
|---|---|---|---|---|---|---|
| 1 | Oct 2017 | $0.15 | BiDAF | Stanford DAWN | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
| 2 | Oct 2017 | $1.58 | BiDAF | Stanford DAWN | TensorFlow v1.2 | 1 K80 / 30 GB / 8 CPU (Google Cloud) |
| 3 | Oct 2017 | $1.76 | BiDAF | Stanford DAWN | TensorFlow v1.2 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
 

People