Ml Cheatsheet



The linear activation is a straight-line function whose output is proportional to its input (the weighted sum computed by the neuron).

Function	Derivative
\[\begin{split}R(z,m) = \begin{Bmatrix} z \cdot m \end{Bmatrix}\end{split}\]	\[\begin{split}R'(z,m) = \begin{Bmatrix} m \end{Bmatrix}\end{split}\]

Pros

It gives a range of activations, so the output is not binary.
Several neurons can be connected together, and if more than one fires, we can take the max (or softmax) and decide based on that.

Cons

The derivative of this function is a constant, so the gradient has no relationship with the input.
Gradient descent therefore always proceeds along the same constant gradient.
If a prediction is wrong, the change made by backpropagation is constant and does not depend on the change in input, delta(x).
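
The constant-gradient point above can be checked with a few lines of Python. This is a minimal sketch, not a reference implementation; the function names `linear` and `linear_prime` and the default slope `m=1.0` are assumptions made for illustration.

```python
def linear(z, m=1.0):
    """Linear activation R(z, m) = z * m (names are illustrative)."""
    return z * m


def linear_prime(z, m=1.0):
    """Derivative R'(z, m) = m: a constant that ignores z entirely."""
    return m


# The gradient is identical for very different inputs.
gradients = [linear_prime(z, m=0.5) for z in (-10.0, 0.0, 10.0)]
print(gradients)
```

Because `linear_prime` never looks at `z`, every input receives the same gradient, which is the drawback listed in the cons.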

The Exponential Linear Unit, widely known as ELU, is a function that tends to converge the cost to zero faster and produce more accurate results. Unlike most other activation functions, ELU has an extra constant, alpha (α), which should be a positive number.

ELU is very similar to ReLU except for negative inputs. Both are the identity function for non-negative inputs. For negative inputs, ELU bends smoothly and levels off as its output approaches -α, whereas ReLU has a sharp corner at zero.

Function	Derivative
\[\begin{split}R(z) = \begin{Bmatrix} z & z > 0 \\ \alpha (e^{z} - 1) & z \le 0 \end{Bmatrix}\end{split}\]	\[\begin{split}R'(z) = \begin{Bmatrix} 1 & z > 0 \\ \alpha e^{z} & z < 0 \end{Bmatrix}\end{split}\]

Pros

For negative inputs, ELU levels off smoothly as its output approaches -α, whereas ReLU has a sharp corner at zero.
ELU is a strong alternative to ReLU.
Unlike ReLU, ELU can produce negative outputs.

Cons

For z > 0, the output is unbounded (it grows without limit toward inf), so the activation can blow up.
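
The ELU definition and its derivative can be sketched in plain Python as follows. The names `elu` and `elu_prime` and the default `alpha=1.0` are assumptions for illustration. The derivative at exactly z = 0 is not given in the table above; returning `alpha * e^0 = alpha` there is a convention chosen for this sketch.

```python
import math


def elu(z, alpha=1.0):
    """ELU: identity for z > 0, alpha * (e^z - 1) otherwise."""
    return z if z > 0 else alpha * (math.exp(z) - 1.0)


def elu_prime(z, alpha=1.0):
    """Derivative: 1 for z > 0, alpha * e^z otherwise.

    Using the second branch at z == 0 is an assumption of this sketch.
    """
    return 1.0 if z > 0 else alpha * math.exp(z)


# Large negative inputs saturate near -alpha instead of being cut to 0.
for z in (-50.0, -1.0, 0.0, 2.0):
    print(z, elu(z, alpha=2.0), elu_prime(z, alpha=2.0))
```

For large negative `z` the output approaches `-alpha`, while for positive `z` the function is the identity and grows without bound, matching the pros and cons above.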

ReLU, a relatively recent invention, stands for Rectified Linear Unit. The formula is deceptively simple: \(max(0,z)\). Despite its name and appearance, it is not linear, and it provides the same benefits as Sigmoid but with better performance.

Function	Derivative
\[\begin{split}R(z) = \begin{Bmatrix} z & z > 0 \\ 0 & z \le 0 \end{Bmatrix}\end{split}\]	\[\begin{split}R'(z) = \begin{Bmatrix} 1 & z > 0 \\ 0 & z < 0 \end{Bmatrix}\end{split}\]

Pros

It mitigates the vanishing gradient problem: for z > 0 the gradient is 1 and does not shrink.
ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations.

Cons

One limitation is that it should only be used within the hidden layers of a neural network model.
Some gradients can be fragile during training and can die: a weight update can leave a neuron in a state where it never activates on any data point again. Put simply, ReLU can result in dead neurons.
In other words, for activations in the region z < 0, the gradient is 0, so the weights are not adjusted during descent. Neurons that enter this state stop responding to variations in error or input, simply because the gradient is 0 and nothing changes. This is called the dying ReLU problem.
The range of ReLU is [0, inf), which means it can blow up the activation.
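
The dying ReLU problem described above can be illustrated with a single hypothetical neuron. This is a rough sketch under stated assumptions: the weight `w`, bias `b`, inputs, learning rate, and the upstream gradient of 1.0 are all made-up values, and the derivative at z = 0 is taken to be 0 by convention.

```python
def relu(z):
    """ReLU: max(0, z)."""
    return max(0.0, z)


def relu_prime(z):
    """Derivative: 1 for z > 0, else 0 (0 at z == 0 by convention here)."""
    return 1.0 if z > 0 else 0.0


# Hypothetical neuron whose pre-activation w * x + b is negative
# for every input in this small data set.
w, b = -1.0, -0.5
inputs = [0.5, 1.0, 2.0]
learning_rate = 0.1
upstream = 1.0  # assumed gradient arriving from the layer above

# Chain rule: dL/dw = upstream * relu'(w * x + b) * x, summed over inputs.
grad_w = sum(upstream * relu_prime(w * x + b) * x for x in inputs)
w_new = w - learning_rate * grad_w
print(grad_w, w_new)
```

Every pre-activation is negative, so `relu_prime` returns 0 for each input, `grad_w` is 0, and the update leaves `w` unchanged. The neuron stays in the same state no matter how many such steps are taken.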

Further reading: Yes You Should Understand Backprop, Karpathy (2016)

Tanh squashes a real-valued number to the range [-1, 1]. It is non-linear, but unlike Sigmoid, its output is zero-centered. Therefore, in practice the tanh non-linearity is always preferred to the sigmoid non-linearity. [1]

Function	Derivative
\[\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}\]	\[\tanh'(z) = 1 - \tanh(z)^{2}\]

Pros

The gradient is stronger for tanh than for sigmoid (its derivative is steeper).

Cons

Tanh also has the vanishing gradient problem.
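
Both points in the pros and cons can be checked numerically. The sketch below assumes the standard identities tanh'(z) = 1 - tanh(z)^2 and S'(z) = S(z) · (1 - S(z)); the function names are illustrative.

```python
import math


def tanh_prime(z):
    """Derivative of tanh, using the identity 1 - tanh(z)^2."""
    return 1.0 - math.tanh(z) ** 2


def sigmoid_prime(z):
    """Derivative of the sigmoid S(z) = 1 / (1 + e^-z)."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


# At z = 0 the tanh gradient is four times the sigmoid gradient.
print(tanh_prime(0.0), sigmoid_prime(0.0))

# Far from zero the tanh gradient is tiny: the vanishing gradient problem.
print(tanh_prime(10.0))
```

At z = 0 the tanh derivative peaks at 1, while the sigmoid derivative peaks at 0.25. For large |z| both shrink toward 0, which is why tanh still suffers from vanishing gradients.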

The softmax function calculates a probability distribution over n different events. In other words, it calculates the probability of each target class over all possible target classes. The calculated probabilities are then used to determine the target class for the given inputs.
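
A minimal sketch of softmax in plain Python is shown below, using the usual definition: exponentiate each score and divide by the sum of the exponentials. Subtracting the maximum score before exponentiating is a common numerical-stability trick and is an addition here, not something taken from the text above. The example scores are made up.

```python
import math


def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    shift = max(scores)  # stability trick: does not change the result
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three target classes.
probs = softmax([2.0, 1.0, 0.1])
predicted = probs.index(max(probs))  # class with the highest probability
print(probs, predicted)
```

The class with the largest score receives the largest probability, so taking the index of the maximum probability gives the predicted target class.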

References

[1]	http://cs231n.github.io/neural-networks-1/

Function	Derivative
\[S(z) = \frac{1}{1 + e^{-z}}\]	\[S'(z) = S(z) \cdot (1 - S(z))\]
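
The derivative identity S'(z) = S(z) · (1 - S(z)) can be sanity-checked against a finite-difference estimate. This sketch assumes the standard sigmoid S(z) = 1 / (1 + e^-z); the test point `z = 0.7` and step `h = 1e-6` are arbitrary choices, and overflow for very large negative inputs is not handled.

```python
import math


def sigmoid(z):
    """Sigmoid S(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    """Derivative via the identity S(z) * (1 - S(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


# Central-difference estimate of the slope at an arbitrary point.
z, h = 0.7, 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
print(numeric, sigmoid_prime(z))
```

The two printed values should agree to many decimal places, which supports the closed-form derivative in the table.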