Advanced Machine Learning Lecture - Chapter 6: Artificial Neural Network - Trịnh Tấn Đạt




Trịnh Tấn Đạt
Faculty of Information Technology, Saigon University
Email: trinhtandat@sgu.edu.vn
Website: https://sites.google.com/site/ttdat88/

Contents
• Introduction
• Perceptron
• Neural Network
• Backpropagation Algorithm

Introduction
❖ What are artificial neural networks?
• A neuron receives a signal, processes it, and propagates the signal (or not)
• The brain is comprised of around 100 billion neurons, each connected to ~10k other neurons: 10^15 synaptic connections
• ANNs are a simplistic imitation of a brain, comprised of a dense net of simple structures
• Origins: algorithms that try to mimic the brain
• Very widely used in the 80s and early 90s; popularity diminished in the late 90s
• Recent resurgence: state-of-the-art technique for many applications

Comparison of computing power
• Neural networks are designed to be massively parallel
• The brain is effectively a billion times faster

Applications of neural networks
• Medical Imaging
• Fake Videos

Conceptual mathematical model
• Receives input from sources
• Computes weighted sum
• Passes through an activation function
• Sends the signal to m succeeding neurons

Artificial Neural Network
• Organized into layers of neurons
• Typically three or more layers: input, hidden and output
• Neural networks are made up of nodes or units, connected by links
• Each link has an associated weight, and each node applies an activation function

Perceptron
• Simplified (binary) artificial neuron, shown first with no weights and then with weights added

Introducing Bias
• Perceptron needs to take into account the bias
o Bias is just like an intercept added in a linear equation.
o It is an additional parameter in the neural network which is used to adjust the output along with the weighted sum of the inputs to the neuron.
o Bias acts like a constant which helps the model to fit the given data.

Sigmoid Neuron
• The more common artificial neuron
• In effect, a bias value allows you to shift the activation function to the left or right, which may be critical for successful learning.
• Consider this 1-input, 1-output network that has no bias:
[Figure: 1-input, 1-output network without bias]
• Here is the function that this network computes, for various values of w0:
[Figure: output curves for various values of w0]
• If we add a bias to that network, like so:
[Figure: the same network with a bias input]
• Having a weight of -5 for w1 shifts the curve to the right, which allows us to have a network that outputs approximately 0 when x is 2.

Simplified Two-Layer ANN
• One hidden layer

Optimization Primer
• Cost function
• Calculate its derivative

Gradient Descent

Gradient Descent Optimization

Backpropagation

Activation functions
• Bias (threshold) activation function was proposed first
• Sigmoid and tanh introduce non-linearity with different codomains
• ReLU is one of the more popular ones because it is simple to compute and very robust to noisy inputs

Sigmoid function
• Sigmoid non-linearity squashes real numbers into the range [0, 1]
• Historically a nice interpretation of neuron firing rate (i.e. from not firing at all to fully saturated firing)
• Currently not used as much: inputs of large magnitude push the output too close to 0 or 1, where the gradient is too close to 0, which stops backpropagation

Tanh function
• Tanh function squashes real numbers into the range [-1, 1]
• Same problem as sigmoid: its activations saturate, thus killing gradients
• But it is zero-centered, minimizing the zig-zagging dynamics during gradient descent
• Currently preferred over the sigmoid nonlinearity

ReLU: Rectified Linear Unit
• ReLU's activation is thresholded at zero
• Quite popular over the last few years
• Speeds up Stochastic Gradient Descent (SGD) convergence
• It is easier to implement due to simpler mathematical functions
• Sensitive to a high learning rate during training, resulting in "dead" neurons (i.e. neurons that will not activate across the entire dataset)

Neuron Modeling: Logistic Unit

ANNs
• 1 hidden layer

Modeling

Other Network Architectures

Example
• Image Recognition: 4 classes (one-hot encoding)

Neural Network Classification

Example: Perceptron - Representing Boolean Functions
• Combining Representations to Create Non-Linear Functions

Example: MNIST data

Neural Network Learning

Perceptron Learning Rule

Batch Perceptron

Learning in NN: Backpropagation

Cost Function

Optimizing the Neural Network

Forward Propagation

Backpropagation Intuition

Backpropagation: Gradient Computation

Backpropagation

Training
• Training a Neural Network via Gradient Descent with Backpropagation

Training a Neural Network

Homework
1) Implement iris flower classification using a neural network
• Hint:
- Use MLPClassifier from the sklearn module: https://www.python-course.eu/neural_networks_with_scikit.php
- Or use a Keras model: https://gist.github.com/NiharG15/cd8272c9639941cf8f481a7c4478d525
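
The "Perceptron - Representing Boolean Functions" slides can be illustrated with a minimal sketch. The slide figures are not available here, so the weights and biases below are hand-picked, illustrative values, and the function names (`perceptron`, `and_gate`, `or_gate`, `not_gate`, `xor_gate`) are hypothetical. The last function shows one possible way of combining representations to obtain a non-linear function (XOR), which a single perceptron cannot represent.

```python
def perceptron(inputs, weights, bias):
    """Binary threshold unit: fires (1) when the weighted sum plus bias is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


# Illustrative, hand-picked parameters (assumptions, not taken from the slides).
def and_gate(a, b):
    return perceptron((a, b), (1.0, 1.0), -1.5)


def or_gate(a, b):
    return perceptron((a, b), (1.0, 1.0), -0.5)


def not_gate(a):
    return perceptron((a,), (-1.0,), 0.5)


def xor_gate(a, b):
    # XOR is not linearly separable, so perceptrons are stacked in two layers:
    # (a OR b) AND NOT (a AND b)
    return and_gate(or_gate(a, b), not_gate(and_gate(a, b)))


if __name__ == "__main__":
    print("a b AND OR XOR")
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, and_gate(a, b), or_gate(a, b), xor_gate(a, b))
```

The bias plays the role of the threshold here: with both weights fixed at 1, changing the bias from -0.5 to -1.5 is all that separates OR from AND.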
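
The bias discussion in the "Sigmoid Neuron" slides can be checked numerically. This is a sketch under assumptions: the neuron is taken to compute sigmoid(w0 * x + w1 * 1.0), and w0 = 1 is an assumed value, since the slides only state w1 = -5. The name `sigmoid_neuron` is hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_neuron(x, w0, w1=0.0):
    """1-input neuron; w1 is the weight on a constant bias input of 1.0."""
    return sigmoid(w0 * x + w1 * 1.0)


if __name__ == "__main__":
    x = 2.0
    # Without a bias, changing w0 only changes the steepness of the curve;
    # the output at x = 0 stays at 0.5 and the curve is not shifted.
    for w0 in (0.5, 1.0, 2.0):
        print(f"no bias, w0={w0}: output at x=2 is {sigmoid_neuron(x, w0):.3f}")
    # With w1 = -5 (and the assumed w0 = 1) the curve shifts to the right,
    # so the output at x = 2 is close to 0.
    print(f"bias w1=-5, w0=1: output at x=2 is {sigmoid_neuron(x, 1.0, -5.0):.3f}")
```

Under these assumptions the midpoint of the curve moves from x = 0 to x = 5, which is why the output at x = 2 drops towards 0.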
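
The saturation behaviour described in the "Sigmoid function", "Tanh function" and "ReLU" slides can be seen by evaluating each activation and its derivative at a few points. This is a small sketch using only the Python standard library; the function names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # largest at z = 0, vanishes for large |z|


def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2  # also vanishes for large |z|


def relu(z):
    return max(0.0, z)


def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # constant for positive inputs, zero otherwise


if __name__ == "__main__":
    print("     z  sigmoid  d_sigmoid     tanh     d_tanh   relu  d_relu")
    for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
        print(f"{z:6.1f} {sigmoid(z):8.4f} {sigmoid_grad(z):10.2e} "
              f"{math.tanh(z):8.4f} {tanh_grad(z):10.2e} "
              f"{relu(z):6.1f} {relu_grad(z):7.1f}")
```

At z = 10 the sigmoid and tanh derivatives are close to zero, which is the "killed gradient" effect; the ReLU derivative stays at 1 for positive inputs but is exactly 0 for negative ones, which is how a unit that only ever sees negative pre-activations can become "dead".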
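
The training slides ("Forward Propagation", "Backpropagation: Gradient Computation", "Training a Neural Network via Gradient Descent with Backpropagation") can be sketched in plain Python. Several choices here are assumptions, since the slide formulas are not available: a single hidden layer of 3 sigmoid units, a mean squared-error cost, full-batch gradient descent, XOR as toy data, and a learning rate of 0.5. All names (`init_params`, `forward`, `gradients`, `train`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_in, n_hidden, rng):
    """Random initial weights: w1[j][i] and b1[j] (hidden), w2[j] and b2 (output)."""
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": rng.uniform(-1, 1),
    }


def forward(p, x):
    """Forward propagation: returns the hidden activations and the output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["w1"], p["b1"])]
    y_hat = sigmoid(sum(w * hj for w, hj in zip(p["w2"], h)) + p["b2"])
    return h, y_hat


def loss(p, data):
    """Assumed cost: mean of 0.5 * (prediction - target)^2."""
    return sum(0.5 * (forward(p, x)[1] - y) ** 2 for x, y in data) / len(data)


def gradients(p, data):
    """Backpropagation: gradient of the cost with respect to every parameter."""
    n_hidden, n_in, n = len(p["w1"]), len(p["w1"][0]), len(data)
    g = {"w1": [[0.0] * n_in for _ in range(n_hidden)],
         "b1": [0.0] * n_hidden, "w2": [0.0] * n_hidden, "b2": 0.0}
    for x, y in data:
        h, y_hat = forward(p, x)
        # Error signal at the output unit (chain rule through the sigmoid).
        delta_out = (y_hat - y) * y_hat * (1.0 - y_hat) / n
        g["b2"] += delta_out
        for j in range(n_hidden):
            g["w2"][j] += delta_out * h[j]
            # Propagate the error back through w2 and the hidden sigmoid.
            delta_h = delta_out * p["w2"][j] * h[j] * (1.0 - h[j])
            g["b1"][j] += delta_h
            for i in range(n_in):
                g["w1"][j][i] += delta_h * x[i]
    return g


def train(p, data, lr=0.5, epochs=5000):
    """Full-batch gradient descent; returns the cost after every epoch."""
    history = [loss(p, data)]
    for _ in range(epochs):
        g = gradients(p, data)
        p["b2"] -= lr * g["b2"]
        for j in range(len(p["w2"])):
            p["w2"][j] -= lr * g["w2"][j]
            p["b1"][j] -= lr * g["b1"][j]
            for i in range(len(p["w1"][j])):
                p["w1"][j][i] -= lr * g["w1"][j][i]
        history.append(loss(p, data))
    return history


XOR_DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
            ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
params = init_params(n_in=2, n_hidden=3, rng=random.Random(0))
history = train(params, XOR_DATA)
print(f"cost before training: {history[0]:.4f}, after: {history[-1]:.4f}")
```

Whether such a small network fits XOR well depends on the random initialisation and the learning rate, so the printed costs should be read as an illustration of the procedure rather than a guaranteed result. A common sanity check for a backpropagation implementation is to compare `gradients` against a finite-difference estimate of the cost.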