Machine learning done Bayesian
In the dark corners of the academic world, a fierce fight rages between practitioners of deep learning and researchers of Bayesian methods. This polemic article testifies to it, although it firmly establishes itself as anti-Bayesian.
There is not much to object to in Bayes’ rule itself, so the hate must run deeper than that. I think it stems from the habit Bayesian researchers have of rewriting existing methods as approximations to Bayesian methods.
Ferenc Huszár, a machine learning researcher at Twitter, describes some of these approximations:
- L1 regularization is just Maximum A Posteriori (MAP) estimation with sparsity inducing priors;
- Support vector machines are just the wrong way to train Gaussian processes;
- Herding is just Bayesian quadrature done slightly wrong;
- Dropout is just variational inference done slightly wrong;
- Stochastic gradient descent (SGD) is just variational inference (variational EM) done slightly wrong.
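
To make the first item in the list concrete, here is a minimal sketch of why L1 regularization can be read as MAP estimation. It assumes a one-dimensional linear model y = w * x with Gaussian noise of standard deviation `sigma` and a Laplace prior on `w` with scale `b`; these modelling choices, the toy data, and all function names are illustrative assumptions, not something taken from Huszár's post. Under those assumptions the negative log posterior equals the lasso objective divided by `sigma ** 2`, plus a constant, when the penalty is `lam = sigma ** 2 / b`, so both should have the same minimizer.

```python
import math


def neg_log_posterior(w, xs, ys, sigma, b):
    """Negative log posterior of w, up to the (w-independent) evidence term."""
    # Gaussian likelihood: y_i ~ Normal(w * x_i, sigma^2)
    log_lik = sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2)
        - (y - w * x) ** 2 / (2 * sigma ** 2)
        for x, y in zip(xs, ys)
    )
    # Laplace prior: p(w) = exp(-|w| / b) / (2 * b)
    log_prior = -math.log(2 * b) - abs(w) / b
    return -(log_lik + log_prior)


def lasso_objective(w, xs, ys, lam):
    """Squared error with an L1 penalty on the single weight w."""
    return 0.5 * sum((y - w * x) ** 2 for x, y in zip(xs, ys)) + lam * abs(w)


def lasso_solution(xs, ys, lam):
    """Closed-form 1D lasso minimizer (soft thresholding)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(sxy) - lam, 0.0)
    return math.copysign(shrunk, sxy) / sxx


# Hypothetical toy data and hyperparameters, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]
sigma, b = 0.5, 0.1
lam = sigma ** 2 / b  # the L1 penalty implied by the prior and noise level

# Lasso estimate from the closed form.
w_lasso = lasso_solution(xs, ys, lam)

# MAP estimate from a brute-force grid search over the posterior.
grid = [-2.0 + 0.001 * i for i in range(4001)]
w_map = min(grid, key=lambda w: neg_log_posterior(w, xs, ys, sigma, b))

print("lasso estimate:", w_lasso)
print("MAP estimate:  ", w_map)
```

The two printed estimates should agree up to the grid resolution. A narrower prior (smaller `b`) gives a larger `lam` and therefore stronger shrinkage towards zero, which is the sense in which the prior is "sparsity inducing".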
Can you think of other approximations?