deeplearning.ai: Improving Deep Neural Networks

tagged: courses


Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization

Course Resources

Week 1: Practical aspects of Deep Learning

Train / Dev / Test sets

Bias / Variance

Basic recipe for machine learning


Regularization

L2 regularization relies on the assumption that a model with small weights is simpler than a model with large weights. Penalizing the squared values of the weights in the cost function drives all the weights toward smaller values, because large weights now make the cost too expensive. This leads to a smoother model, whose output changes more slowly as the input changes.
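Here is a minimal sketch of the L2 penalty and the matching gradient update. It assumes the course's form of the penalty, (lambda / (2m)) times the sum of squared weights over all layers. Weight matrices are stored as plain Python lists of rows so the example needs no libraries. The names `l2_penalty` and `l2_update` are hypothetical.

```python
def l2_penalty(weights, lambd, m):
    """Hypothetical helper: the L2 term added to the cost.

    Computes (lambd / (2 * m)) * (sum of squared entries of every weight matrix).
    weights: list of weight matrices, each a list of rows of floats.
    lambd: regularization strength (spelled this way because `lambda` is a Python keyword).
    m: number of training examples.
    """
    total = sum(w * w for W in weights for row in W for w in row)
    return (lambd / (2 * m)) * total


def l2_update(W, dW, lambd, m, learning_rate):
    """Hypothetical helper: one gradient-descent step for a single weight matrix.

    dW is the gradient of the unregularized cost. The L2 term adds
    (lambd / m) * w to each weight's gradient, which shrinks every weight
    toward zero on each step ("weight decay").
    """
    return [
        [w - learning_rate * (dw + (lambd / m) * w) for w, dw in zip(row, drow)]
        for row, drow in zip(W, dW)
    ]
```

In use, the penalty is simply added to whatever unregularized cost you already compute, and `l2_update` replaces the plain update `w - learning_rate * dw`.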

Dropout regularization

Data augmentation

Early stopping

Deep neural networks suffer from vanishing and exploding gradients

Weight Initialization for Deep Networks

Numerical approximation of gradients

Gradient Checking

- Check our gradient computation functions against the numerical approximation; this helps debug backprop.
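The check can be sketched as follows. This assumes the parameters are flattened into one list `theta` and the cost `J` is an ordinary Python function of that list. Each partial derivative is estimated with the two-sided difference (J(theta + eps) - J(theta - eps)) / (2 * eps). The result is then compared with the backprop gradient through a relative difference. The function names are hypothetical.

```python
import math


def numerical_gradient(J, theta, eps=1e-7):
    """Two-sided (centered) difference estimate of dJ/dtheta_i for each i."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps
        minus[i] -= eps
        grad.append((J(plus) - J(minus)) / (2 * eps))
    return grad


def gradient_check(J, grad_J, theta, eps=1e-7):
    """Relative difference ||approx - analytic|| / (||approx|| + ||analytic||).

    grad_J is the analytic (e.g. backprop) gradient function being checked.
    """
    approx = numerical_gradient(J, theta, eps)
    analytic = grad_J(theta)
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(approx, analytic)))
    norm = math.sqrt(sum(a * a for a in approx)) + math.sqrt(sum(b * b for b in analytic))
    return diff / norm
```

As a rough rule of thumb, a ratio around 1e-7 suggests the analytic gradient is right. A ratio near 1e-3 or larger usually points to a bug. For example, checking J(t) = t[0]^2 + 3 * t[1] against the gradient [2 * t[0], 3] gives a tiny ratio, while the wrong gradient [t[0], 3] gives one far above 1e-3.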

Week 2: Optimization algorithms

Mini-batch gradient descent

Exponentially weighted averages

v(t) = beta * v(t-1) + (1 - beta) * theta(t)
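Here is a minimal sketch of this recurrence on a plain list of observations. It assumes v(0) = 0, as in the course. With beta = 0.9 the average roughly covers the last 1 / (1 - beta) = 10 values. The function name is hypothetical.

```python
def exponentially_weighted_average(thetas, beta=0.9):
    """Return [v(1), ..., v(T)] for v(t) = beta * v(t-1) + (1 - beta) * theta(t), v(0) = 0."""
    v = 0.0
    averages = []
    for theta in thetas:
        v = beta * v + (1 - beta) * theta
        averages.append(v)
    return averages
```

Starting from v(0) = 0 makes the first averages too small. For example, `exponentially_weighted_average([10.0, 10.0])` gives about 1.0 and 1.9 rather than values near 10. This is the start-up bias that bias correction addresses.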

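Bias correction in exponentially weighted averages:

v_corrected(t) = v(t) / (1 - beta^t), where v(t) comes from the uncorrected recurrence above

The correction is applied only to the value that is read out, not fed back into the recurrence. Because v(0) = 0, the early averages are biased toward zero. Dividing by (1 - beta^t) removes that bias, and the correction fades as t grows because beta^t approaches 0.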
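Here is a sketch of the bias-corrected version. It again assumes v(0) = 0 and a plain list of observations, and the function name is hypothetical. The recurrence itself stays uncorrected. Only the value that is returned at each step is divided by (1 - beta^t).

```python
def bias_corrected_averages(thetas, beta=0.9):
    """Return [v(t) / (1 - beta**t) for t = 1..T], with v(t) from the plain recurrence."""
    v = 0.0
    corrected = []
    for t, theta in enumerate(thetas, start=1):
        v = beta * v + (1 - beta) * theta      # uncorrected running average
        corrected.append(v / (1 - beta ** t))  # correction applied only to the output
    return corrected
```

For a constant input, the corrected values equal that constant from the first step onward (up to rounding). For example, every entry is 10 for `[10.0, 10.0, 10.0]`.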

Gradient descent with momentum

With stochastic (mini-batch) gradient descent we don’t compute the exact derivative of our loss function; instead, we estimate it on a small batch. This means we’re not always moving in the optimal direction, because our derivatives are ‘noisy’. An exponentially weighted average of the gradients gives a better estimate, closer to the actual derivative than any single noisy calculation. Gradient descent with momentum steps along that average instead of the raw gradient.
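Here is a minimal sketch of gradient descent with momentum. It assumes the update form used in the course: v = beta * v + (1 - beta) * dW, then W = W - learning_rate * v. Parameters and gradients are flat lists of floats. The function name and default values are illustrative and do not come from a particular library.

```python
def momentum_update(params, grads, velocity, learning_rate=0.01, beta=0.9):
    """One step of gradient descent with momentum.

    velocity holds the exponentially weighted average of past gradients
    (initialized to zeros) and is updated in place; the new parameters are returned.
    """
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        velocity[i] = beta * velocity[i] + (1 - beta) * g
        new_params.append(p - learning_rate * velocity[i])
    return new_params
```

As a usage example, consider minimizing f(x) = x^2 (gradient 2x) from x = 5 with learning_rate = 0.1 and beta = 0.9. Calling `momentum_update` repeatedly drives x very close to 0 within a few hundred steps. Keeping the velocity outside the function makes the caller responsible for the optimizer state, which mirrors how v_dW is stored per parameter in the course.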

RMSprop or Root mean square prop

Adam optimization algorithm

Learning rate decay

The problem of local optima

Week 3: Hyperparameter tuning, Batch Normalization and Programming Frameworks

Hyperparameter tuning

The Tuning Process

Using an appropriate scale to pick hyperparameters

Hyperparameter tuning in practice: Pandas vs. Caviar

Batch Norm

Normalizing activations in a network

Fitting Batch Normalization into a neural network

Why does Batch normalization work?

Three main reasons:

Batch normalization at test time

Multi-class Classification

Softmax regression

Training a Softmax classifier

Intro to programming frameworks

Deep learning frameworks


Yuanqing Lin interview