A great aspect of deep learning (and machine learning in general) is that there are many well-established datasets and tasks against which researchers measure the performance of their approaches. This post is an attempt to gather some of them in one place. Of course, the state of the art (SOTA) on any task changes, sometimes quite quickly, so this post is bound to become obsolete soon. The numbers below are current as of April 2017. Please comment below if some information is missing and/or outdated.

Machine Vision

| Task | Dataset | Best result | Publication |
|------|---------|-------------|-------------|
| Classification | MNIST | Top-1 accuracy of 99.79% | Regularization of Neural Networks using DropConnect |
| Classification | CIFAR-10 | Top-1 accuracy of 97.14% | Shake-Shake regularization of 3-branch residual networks |
| Classification | CIFAR-100 | Top-1 accuracy of 81.7% | Wide Residual Networks |
| Classification | SVHN | Top-1 accuracy of 98.46% | Wide Residual Networks |
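To make the metric concrete: top-1 accuracy is simply the fraction of examples for which the model's single highest-scoring class matches the true label. Here is a minimal sketch; the `top1_accuracy` function and the toy logits are illustrative, not taken from any of the papers above.

```python
import numpy as np

def top1_accuracy(logits, labels):
    """Fraction of examples whose highest-scoring class equals the label."""
    predictions = np.argmax(logits, axis=1)
    return float(np.mean(predictions == labels))

# Toy example: 4 examples, 3 classes.
logits = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.4, 0.3, 0.2],   # wrong: argmax is class 0, label is 2
                   [0.1, 0.2, 2.2]])
labels = np.array([0, 1, 2, 2])
print(top1_accuracy(logits, labels))  # 0.75
```

On benchmarks like ImageNet a top-5 variant is also common (the label only needs to appear among the five highest-scoring classes), but the datasets in this table are conventionally reported as top-1.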

Natural Language Processing

| Task | Dataset | Best result | Publication |
|------|---------|-------------|-------------|
| Language modeling | One Billion Word Benchmark | Single-model perplexity of 24.29 | Factorization tricks for LSTM networks |
| Machine translation | WMT newstest 2014 En->Fr | BLEU score of 40.56 | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer |
| Machine translation | WMT newstest 2014 En->De | BLEU score of 26.03 | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer |
| Speech recognition | NIST 2000 Switchboard task | Word error rate of 6.2% | The Microsoft 2016 Conversational Speech Recognition System |
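For reference, two of the metrics in this table fit in a few lines of code. Perplexity is the exponential of the average per-word negative log-probability the model assigns to the test text, and word error rate is the word-level edit distance between the recognizer's output and the reference transcript, divided by the reference length. (BLEU is more involved, combining modified n-gram precisions with a brevity penalty, so it is omitted here.) The functions below are illustrative sketches under those definitions, not the evaluation scripts used in the cited papers.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-probability over the test tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length."""
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(h) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            substitution = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

# A model that always spreads probability uniformly over 4 words
# has perplexity 4 -- "as confused as choosing among 4 options".
print(perplexity([math.log(0.25)] * 8))                     # 4.0
# One inserted word against a 3-word reference: WER = 1/3.
print(word_error_rate("the cat sat", "the cat sat down"))   # 0.333...
```

Note that a lower perplexity and a lower word error rate are better, whereas higher BLEU and accuracy are better, so the "best result" column mixes both directions.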