In the dark corners of the academic world there is a running feud between practitioners of deep learning and researchers of Bayesian methods. This polemical article testifies to it, though it firmly establishes itself as anti-Bayesian.

There is not much you can have against Bayes’ rule itself, so the hate must run deeper than that. I think it stems from the habit Bayesian researchers have of recasting existing methods as approximations to Bayesian ones.

Ferenc Huszár, a machine learning researcher at Twitter, describes some of these approximations:

  • L1 regularization is just Maximum A Posteriori (MAP) estimation with sparsity-inducing priors (see the sketch after this list);
  • Support vector machines are just the wrong way to train Gaussian processes;
  • Herding is just Bayesian quadrature done slightly wrong;
  • Dropout is just variational inference done slightly wrong;
  • Stochastic gradient descent (SGD) is just variational inference (variational EM) done slightly wrong.
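To make the first item concrete, here is a minimal sketch (with a generic likelihood and a Laplace prior scale \lambda, my notation rather than the article's) of why MAP estimation with a sparsity-inducing prior recovers the L1 penalty:

  \hat{w}_{\mathrm{MAP}} = \arg\max_{w} \left[ \log p(\mathcal{D} \mid w) + \log p(w) \right]
                         = \arg\min_{w} \left[ -\log p(\mathcal{D} \mid w) + \lambda \lVert w \rVert_1 \right] + \text{const},

using the Laplace prior p(w_i) \propto \exp(-\lambda |w_i|), so that \log p(w) = -\lambda \lVert w \rVert_1 + \text{const}. The negative log-likelihood is the usual training loss, so the MAP objective is exactly L1-regularized risk minimization.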

Can you think of other approximations?