Reduce the learning rate exponentially and increase the momentum linearly over the course of training, e.g. learning rate from 0.5 to 0.0001 and momentum from 0.7 to 0.995. I've seen variations on this, like adjusting along a sigmoid curve instead.
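A minimal sketch of those two schedules (the endpoint values are just the ones mentioned above; the function name and epoch count are my own):

    def lr_momentum_schedule(epoch, n_epochs, lr_start=0.5, lr_end=0.0001,
                             mom_start=0.7, mom_end=0.995):
        frac = epoch / float(n_epochs - 1)
        # exponential decay: geometric interpolation from lr_start down to lr_end
        lr = lr_start * (lr_end / lr_start) ** frac
        # linear ramp from mom_start up to mom_end
        momentum = mom_start + frac * (mom_end - mom_start)
        return lr, momentum

    # e.g. over 100 epochs:
    for epoch in range(100):
        lr, momentum = lr_momentum_schedule(epoch, 100)
        # ... run one epoch of SGD with this lr and momentum ...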
Dropout may or may not help, and tuning the dropout rate (the fraction of activations that are randomly discarded during training) may or may not help either.
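For reference, dropout itself is just a random mask over the activations; here's a minimal numpy sketch of the "inverted dropout" variant, where the surviving activations are rescaled at training time so nothing needs to change at test time (the function name and defaults are my own):

    import numpy as np

    def dropout(activations, rate=0.5, rng=np.random):
        # keep each activation with probability (1 - rate), zero out the rest,
        # and rescale so the expected value of each activation is unchanged
        mask = rng.rand(*activations.shape) >= rate
        return activations * mask / (1.0 - rate)

    h = np.random.randn(10, 100)        # some hidden-layer activations
    h_dropped = dropout(h, rate=0.5)    # apply only during training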
Mini-batch size can make a difference; somewhere between 2 and 200 seems to be the range worth searching.
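In case it's useful, a minimal sketch of the mini-batch loop where that batch size is the knob being tuned (the function name and data are my own toy placeholders):

    import numpy as np

    def iterate_minibatches(X, y, batch_size, rng=np.random):
        # shuffle once per epoch, then yield consecutive slices of size batch_size
        idx = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            yield X[batch], y[batch]

    X, y = np.random.randn(1000, 20), np.random.randint(0, 2, 1000)
    for X_batch, y_batch in iterate_minibatches(X, y, batch_size=50):
        pass  # one gradient step per batch goes here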
rmsprop is a great technique that I don't hear talked about as much; example implementation here: https://github.com/BRML/climin/blob/master/climin/rmsprop.py
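The core update is only a few lines. Here's a sketch of the textbook version of the rule, not necessarily identical to the climin implementation linked above:

    import numpy as np

    def rmsprop_update(params, grad, cache, lr=0.001, decay=0.9, eps=1e-8):
        # keep a running average of the squared gradient per parameter, and
        # divide the step by its square root so each parameter gets its own
        # effective learning rate
        cache = decay * cache + (1.0 - decay) * grad ** 2
        params = params - lr * grad / (np.sqrt(cache) + eps)
        return params, cache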
Using Nesterov momentum and a "sparse" weight initialization scheme rather than uniform initialization: https://www.cs.toronto.edu/~hinton/absps/momentum.pdf
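From my reading of that paper, the sparse initialization gives each unit a small fixed number of nonzero incoming weights (15, if I recall correctly) drawn from a Gaussian, and Nesterov momentum evaluates the gradient at the "looked-ahead" point. A rough sketch of both, with function names and defaults of my own choosing:

    import numpy as np

    def sparse_init(n_in, n_out, n_nonzero=15, scale=1.0, rng=np.random):
        # each output unit gets n_nonzero Gaussian incoming weights; the rest are zero
        W = np.zeros((n_in, n_out))
        for j in range(n_out):
            idx = rng.choice(n_in, size=n_nonzero, replace=False)
            W[idx, j] = rng.randn(n_nonzero) * scale
        return W

    def nesterov_step(params, velocity, grad_fn, lr, momentum):
        # evaluate the gradient at the looked-ahead point, then update
        grad = grad_fn(params + momentum * velocity)
        velocity = momentum * velocity - lr * grad
        return params + velocity, velocity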
You can use Bayesian optimization to intelligently search the hyperparameter space: https://github.com/JasperSnoek/spearmint
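Spearmint is linked above; just to illustrate the pattern, here's a toy sketch using scikit-optimize's gp_minimize as a stand-in Bayesian optimizer. The objective here is a dummy function; in practice it would train a network with the proposed hyperparameters and return the validation error:

    from skopt import gp_minimize
    from skopt.space import Real, Integer

    # toy stand-in for "train with these hyperparameters, return validation error"
    def objective(params):
        lr, momentum, batch_size = params
        return (lr - 0.01) ** 2 + (momentum - 0.9) ** 2 + abs(batch_size - 64) / 1000.0

    space = [Real(1e-4, 0.5, prior="log-uniform"),  # learning rate
             Real(0.5, 0.999),                      # momentum
             Integer(2, 200)]                       # mini-batch size

    result = gp_minimize(objective, space, n_calls=30, random_state=0)
    print(result.x, result.fun)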