A great aspect of deep learning (and machine learning in general) is that there are many well-established datasets and tasks on which researchers measure the performance of their approaches. This post is an attempt to gather some of those in one place. Of course, the state of the art (SOTA) on any task changes, sometimes quite quickly, so this post is bound to become obsolete before long. The numbers below are current as of April 2017. Please comment below if any info is missing and/or outdated.
Machine Vision
Task | Dataset | Best result | Publication |
---|---|---|---|
Classification | MNIST | Top-1 accuracy of 99.79% | Regularization of Neural Networks using DropConnect |
Classification | CIFAR-10 | Top-1 accuracy of 97.14% | Shake-Shake regularization of 3-branch residual networks |
Classification | CIFAR-100 | Top-1 accuracy of 81.7% | Wide Residual Networks |
Classification | SVHN | Top-1 accuracy of 98.46% | Wide Residual Networks |
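
All four vision results above use the same metric, top-1 accuracy: the model's single highest-scoring class must match the ground-truth label. Here is a minimal sketch of how it is computed (using NumPy; the arrays are toy values for illustration, not real model outputs):

```python
import numpy as np

def top1_accuracy(logits, labels):
    """Fraction of examples whose highest-scoring class equals the true label."""
    predictions = np.argmax(logits, axis=1)  # index of the top class per example
    return float(np.mean(predictions == labels))

# Toy example: 4 examples, 3 classes; 3 of the 4 argmax predictions are correct
logits = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.9, 0.4, 3.2],
                   [1.1, 0.8, 0.2]])
labels = np.array([0, 1, 2, 1])
print(top1_accuracy(logits, labels))  # 0.75
```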
Natural Language Processing
Task | Dataset | Best result | Publication |
---|---|---|---|
Language Modeling | One Billion Word Benchmark | Single-model perplexity of 24.29 | Factorization tricks for LSTM networks |
Machine Translation | WMT newstest 2014 En->Fr | BLEU score of 40.56 | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer |
Machine Translation | WMT newstest 2014 En->De | BLEU score of 26.03 | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer |
Speech Recognition | NIST 2000 Switchboard Task | Word error rate 6.2% | The Microsoft 2016 Conversational Speech Recognition System |
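
Two of the metrics above are straightforward to compute once a model's outputs are at hand: perplexity is the exponential of the average negative log-likelihood per token, and word error rate is the word-level edit distance between the recognizer's output and the reference transcript, divided by the reference length. Below is a rough sketch of both (the inputs are toy values, not outputs of the systems in the table); BLEU, being a modified n-gram precision with a brevity penalty, is more involved and is omitted here.

```python
import numpy as np

def perplexity(token_log_probs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    return float(np.exp(-np.mean(token_log_probs)))

def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance (substitutions + insertions + deletions)
    between hypothesis and reference, divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)
    d[0, :] = np.arange(len(hyp) + 1)
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,         # deletion
                          d[i, j - 1] + 1,         # insertion
                          d[i - 1, j - 1] + cost)  # substitution or match
    return d[len(ref), len(hyp)] / len(ref)

# A model assigning these probabilities to 4 tokens has perplexity of about 7.3
print(perplexity(np.log([0.2, 0.1, 0.15, 0.12])))
# One substitution out of five reference words -> WER = 0.2
print(word_error_rate("the cat sat on the", "the cat sat in the"))
```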