Training Deep Neural Networks

Published: 09 Oct 2015 Category: deep_learning

Tutorials

Popular Training Approaches of DNNs — A Quick Overview

https://medium.com/@asjad/popular-training-approaches-of-dnns-a-quick-overview-26ee37ad7e96#.pqyo039bb

Activation Functions

Rectified linear units improve restricted Boltzmann machines (ReLU)

Rectifier Nonlinearities Improve Neural Network Acoustic Models (leaky-ReLU, aka LReLU)

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (PReLU)

Empirical Evaluation of Rectified Activations in Convolutional Network (ReLU/LReLU/PReLU/RReLU)

Deep Learning with S-shaped Rectified Linear Activation Units (SReLU)

Parametric Activation Pools greatly increase performance and consistency in ConvNets

Noisy Activation Functions
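
The rectifier papers above differ mainly in how the negative part of the input is handled. Below is a minimal NumPy sketch of the forward rules for ReLU, leaky ReLU, PReLU, and RReLU; the slope values and sampling range are illustrative defaults, not the papers' tuned settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # ReLU: zero out negative activations
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # Leaky ReLU: small fixed negative slope
    return np.where(x > 0, x, slope * x)

def prelu(x, a):
    # PReLU: the negative slope `a` is a learned parameter
    return np.where(x > 0, x, a * x)

def rrelu(x, lower=1 / 8, upper=1 / 3, training=True):
    # RReLU: negative slope drawn at random during training,
    # fixed to the midpoint of the range at test time
    a = rng.uniform(lower, upper, size=x.shape) if training else (lower + upper) / 2
    return np.where(x > 0, x, a * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x), leaky_relu(x), prelu(x, a=0.25), rrelu(x), sep="\n")
```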

Weight Initialization

An Explanation of Xavier Initialization

Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?

All you need is a good init

What are good initial weights in a neural network?

RandomOut: Using a convolutional gradient norm to win The Filter Lottery
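
For a concrete sense of the Xavier scheme explained in the first link, here is a minimal sketch: weights are drawn with variance 2 / (fan_in + fan_out), while the He variant introduced in the PReLU paper uses 2 / fan_in. The layer sizes below are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(fan_in, fan_out):
    # Glorot/Xavier: keeps activation variance roughly constant across layers
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def he_init(fan_in, fan_out):
    # He et al. variant, suited to ReLU-family activations
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W1 = xavier_init(784, 256)
W2 = he_init(256, 10)
print(W1.std(), W2.std())
```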

Batch Normalization

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (ImageNet top-5 error: 4.82%)

Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks
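
The papers above revolve around two related reparameterizations: batch normalization standardizes each feature over the mini-batch before applying a learned scale and shift, while weight normalization rescales the weight vector itself. A simplified training-mode sketch (running statistics and backpropagation omitted):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: (batch, features); normalize each feature over the mini-batch,
    # then apply the learned scale (gamma) and shift (beta)
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

def weight_norm(v, g):
    # Weight normalization: reparameterize w = g * v / ||v||
    return g * v / np.linalg.norm(v)

x = np.random.default_rng(0).normal(1.0, 3.0, size=(32, 4))
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))
```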

Loss Function

The Loss Surfaces of Multilayer Networks

Optimization Methods

On Optimization Methods for Deep Learning

On the importance of initialization and momentum in deep learning

Invariant backpropagation: how to train a transformation-invariant neural network

A practical theory for designing very deep convolutional neural networks
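
As a pointer to what the momentum paper above is about, here is a minimal sketch of classical momentum SGD on a toy quadratic; the learning rate and momentum coefficient are illustrative values, not recommendations from the paper.

```python
import numpy as np

def sgd_momentum(grad_fn, w, lr=0.1, momentum=0.9, steps=100):
    # Classical momentum: accumulate a velocity vector and step along it
    v = np.zeros_like(w)
    for _ in range(steps):
        v = momentum * v - lr * grad_fn(w)
        w = w + v
    return w

# Toy objective f(w) = 0.5 * ||w||^2, whose gradient is w
print(sgd_momentum(lambda w: w, w=np.array([5.0, -3.0])))  # approaches the origin
```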

Stochastic Optimization Techniques

Alec Radford’s animations for optimization algorithms

http://www.denizyuret.com/2015/03/alec-radfords-animations-for.html

Faster Asynchronous SGD (FASGD)

An overview of gradient descent optimization algorithms (★★★★★)

Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters

Writing fast asynchronous SGD/AdaGrad with RcppParallel
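
AdaGrad, used in the RcppParallel post above and covered in the gradient-descent overview, adapts each parameter's step size by its accumulated squared gradients. A minimal serial sketch follows (not the asynchronous/parallel versions those links describe); the hyperparameters are illustrative.

```python
import numpy as np

def adagrad(grad_fn, w, lr=0.5, eps=1e-8, steps=200):
    # AdaGrad: per-parameter learning rates shrink as squared gradients accumulate
    cache = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        cache += g ** 2
        w = w - lr * g / (np.sqrt(cache) + eps)
    return w

# Same toy quadratic: the gradient of 0.5 * ||w||^2 is w
print(adagrad(lambda w: w, w=np.array([5.0, -3.0])))
```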

Regularization

DisturbLabel: Regularizing CNN on the Loss Layer [University of California & MSR] (2016)
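
The idea behind DisturbLabel is easy to state: at each training iteration, a small fraction of the ground-truth labels in the mini-batch is replaced with labels drawn at random, which acts as a regularizer at the loss layer. A rough sketch, with a made-up noise rate `alpha`:

```python
import numpy as np

rng = np.random.default_rng(0)

def disturb_labels(labels, num_classes, alpha=0.1):
    # Replace a fraction `alpha` of the labels with uniformly random classes
    labels = labels.copy()
    mask = rng.random(len(labels)) < alpha
    labels[mask] = rng.integers(0, num_classes, size=int(mask.sum()))
    return labels

y = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(disturb_labels(y, num_classes=10))
```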

Dropout

Improving neural networks by preventing co-adaptation of feature detectors (Dropout)

Regularization of Neural Networks using DropConnect

Regularizing neural networks with dropout and with DropConnect

Fast dropout training

Dropout as data augmentation

A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

Improved Dropout for Shallow and Deep Learning
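
Most of the dropout papers above build on the same basic operation: randomly zero a unit's activation during training and compensate so test-time behaviour matches the expectation. The sketch below uses the common "inverted" scaling (dividing by the keep probability at training time), whereas the original paper rescales the weights at test time; DropConnect applies the same kind of mask to weights rather than activations.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, p_drop=0.5, training=True):
    # Inverted dropout: zero units at random during training and rescale the
    # survivors so the expected activation matches test-time behaviour
    if not training:
        return x
    mask = (rng.random(x.shape) >= p_drop) / (1.0 - p_drop)
    return x * mask

x = np.ones((2, 8))
print(dropout_forward(x))                  # roughly half the units zeroed, rest scaled up
print(dropout_forward(x, training=False))  # unchanged at test time
```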

Gradient Descent

Fitting a model via closed-form equations vs. Gradient Descent vs. Stochastic Gradient Descent vs. Mini-Batch Learning. What is the difference? (Normal Equations vs. GD vs. SGD vs. MB-GD)

http://sebastianraschka.com/faq/docs/closed-form-vs-gd.html

An Introduction to Gradient Descent in Python

A Variational Analysis of Stochastic Gradient Algorithms

The vanishing gradient problem: Oh no — an obstacle to deep learning!

Gradient Descent For Machine Learning

http://machinelearningmastery.com/gradient-descent-for-machine-learning/
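
The distinction drawn in the first link (normal equations vs. batch, stochastic, and mini-batch gradient descent) comes down to how much data is used per update. In the toy linear-regression sketch below, `batch_size` equal to the dataset size gives batch GD, `batch_size=1` gives SGD, and anything in between is mini-batch GD; the data and hyperparameters are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

# Closed form (normal equations): solve X^T X w = X^T y directly
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

def minibatch_gd(X, y, lr=0.1, batch_size=32, epochs=50):
    # Mini-batch gradient descent on the mean squared error
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            b = order[start:start + batch_size]
            grad = 2.0 / len(b) * X[b].T @ (X[b] @ w - y[b])
            w -= lr * grad
    return w

print(w_closed.round(3))
print(minibatch_gd(X, y).round(3))
```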

Accelerate Training

Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices

Image Data Augmentation

DataAugmentation ver1.0: an image data augmentation tool for training image recognition algorithms

Caffe-Data-Augmentation: a Caffe branch with data augmentation support, using a configurable stochastic combination of 7 data augmentation techniques
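
Neither tool above is shown here, but the underlying idea is simple: generate label-preserving variants of each training image on the fly. A minimal NumPy sketch of two standard techniques, random horizontal flip and random crop from a zero-padded copy (the pad size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_augment(img, pad=4):
    # img: (H, W, C) array; random horizontal flip, then a random crop
    # of the original size taken from a zero-padded copy of the image
    if rng.random() < 0.5:
        img = img[:, ::-1, :]
    h, w, c = img.shape
    padded = np.zeros((h + 2 * pad, w + 2 * pad, c), dtype=img.dtype)
    padded[pad:pad + h, pad:pad + w] = img
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]

img = rng.random((32, 32, 3))
print(random_augment(img).shape)  # (32, 32, 3)
```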

Papers

Scalable and Sustainable Deep Learning via Randomized Hashing

Tools

pastalog: Simple, realtime visualization of neural network training performance