Machine Learning
Artificial Intelligence
Statistical tools for learning from data in order to derive predictive insights. Pattern recognition from examples.
Terminology
Label - a correct output for an input - a fact or true answer
Input - a variable used to predict the label
Example - a set of inputs and a corresponding label
Model - a mathematical function that takes input variables and tries to approximate the label
Training - adjusting the model's weights to minimise the error between the predicted label and the actual label; includes gradient descent and periodic evaluation
Prediction - the output of the Model on unlabelled data
Hyper-parameter tuning - adjusting the settings chosen before training (e.g. learning rate, batch size, number of layers) rather than learned from the data
Back-propagation - computing the gradient of the error with respect to each weight by working backwards through the network, layer by layer
Epoch - one traversal through the entire training set
Gradient Descent - optimisation - iteratively adjusting the weights in the direction that reduces the error
Batch size - the number of examples the error is computed on before each weight update
Weights - the parameters of a function that are optimised
Evaluation - periodically determining whether the model is good enough, based on a set of metrics
Softmax - turns raw scores into probabilities over multiple classes - all values are normalised to sum to one (see the sketch after this list)
Over-fitting - the model fits the training data too closely and does not generalise well to unseen examples
Under-fitting - the model is too simple to capture the pattern, so it is inaccurate even on the training data
Feature Engineering - using insights to calculate or engineer extra features/inputs
Neuron - one unit that combines weighted inputs
Activation Function - a non-linear function (e.g. sigmoid, ReLU) applied to a neuron's combined inputs
Hidden Layer - a set of neurons that operate on the same set of inputs
Features - transformations of inputs, typically using an Activation Function
Ground Truth - the actual, observed value that the label records
Error = ground truth value - prediction value
Root Mean Squared Error (RMSE) - for Regression - square the errors (so they become positive), take the mean, then take the square root
Cross-Entropy - a differentiable error value for Classification - the log loss
Confusion Matrix - for evaluation of a model - True Positives (TP), False Positives (FP), False Negatives (FN), True Negatives (TN)
Accuracy - intuitive measure of skill for classifiers - the fraction of predictions that is correct (misleading if the dataset is unbalanced)
Precision - use when what you are trying to find is common; the accuracy of the classifier's positive predictions; positive predictive value = TP / (TP + FP) (good if the dataset is unbalanced)
Recall - use when what you are trying to find is rare; the accuracy on examples whose truth is positive; true positive rate = TP / (TP + FN) (good if the dataset is unbalanced)
Training Dataset - the examples used to optimise the weights
Validation Dataset - held-out examples used during training for evaluation and hyper-parameter tuning
Test Dataset - held-out examples used only for the final measure of model quality
Cross-validation - if data is too scarce to spare a separate Test Dataset, average results over different splits of the training and validation datasets
Dense features - continuous numbers; Neural Networks are good for these
Sparse/Wide features - independent, discrete, categorical values and feature-cross pairs; Linear models are good for these
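A minimal NumPy sketch of the metric definitions above (Softmax, Cross-Entropy, RMSE, Precision and Recall); the function names are illustrative, not from any particular library:

    import numpy as np

    def softmax(logits):
        # Subtract the max for numerical stability; output is non-negative and sums to 1.
        exps = np.exp(logits - np.max(logits))
        return exps / exps.sum()

    def cross_entropy(probs, true_class):
        # Log loss: the negative log-probability the model assigned to the true class.
        return -np.log(probs[true_class])

    def rmse(truth, predictions):
        # Square the errors (making them positive), take the mean, then the square root.
        return np.sqrt(np.mean((truth - predictions) ** 2))

    def precision_recall(tp, fp, fn):
        # Precision = TP / (TP + FP); Recall (true positive rate) = TP / (TP + FN).
        return tp / (tp + fp), tp / (tp + fn)

    probs = softmax(np.array([2.0, 1.0, 0.1]))                # ~[0.66, 0.24, 0.10]
    print(cross_entropy(probs, true_class=0))                 # ~0.42 - low loss, class 0 favoured
    print(rmse(np.array([3.0, 5.0]), np.array([2.5, 5.5])))   # 0.5
    print(precision_recall(tp=8, fp=2, fn=4))                 # (0.8, ~0.67)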
ML Steps
1. Explore the data
2. Split data into train/validation/test datasets (see the sketch after this list)
3. Establish a benchmark for the performance to beat
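A minimal sketch of step 2 in pure NumPy; the 70/15/15 ratios and the seed are arbitrary, illustrative choices:

    import numpy as np

    def split(X, y, train=0.7, val=0.15, seed=42):
        # Shuffle the indices once, then carve out three non-overlapping slices.
        idx = np.random.default_rng(seed).permutation(len(X))
        n_train, n_val = int(train * len(X)), int(val * len(X))
        i_train = idx[:n_train]
        i_val = idx[n_train:n_train + n_val]
        i_test = idx[n_train + n_val:]          # whatever remains is the test set
        return (X[i_train], y[i_train]), (X[i_val], y[i_val]), (X[i_test], y[i_test])

    X, y = np.arange(100).reshape(100, 1), np.arange(100)
    (train_X, train_y), (val_X, val_y), (test_X, test_y) = split(X, y)
    print(len(train_X), len(val_X), len(test_X))  # 70 15 15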
Classes of Machine Learning
Supervised Learning
Learning from past examples to predict future values
Model Types
Regression
Label has a continuous real value
Classification
Label has a discrete set of values or classes
Datasets
What makes a good Dataset?
Positive examples
Negative examples
Negative examples that are near misses
Exhaustive coverage of examples
Examples of outliers - so they can be learned and handled gracefully
Neural Networks
Single neuron linear function - fires when w1*x1 + w2*x2 > bias
Optimisation - Gradient Descent - iteratively reducing the error between the output and the label (see the sketch below)
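A minimal sketch of both ideas - a single sigmoid neuron trained by gradient descent on a toy AND problem; pure NumPy, with an arbitrary learning rate and epoch count:

    import numpy as np

    # Toy data: the label is 1 only when both inputs are 1 (linearly separable).
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 0, 0, 1], dtype=float)

    w = np.random.default_rng(0).normal(size=2)  # weights, randomly initialised
    b = 0.0                                      # bias

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for epoch in range(5000):             # one epoch = one pass over the training set
        z = X @ w + b                     # weighted sum: w1*x1 + w2*x2 + b
        p = sigmoid(z)                    # activation function squashes it into (0, 1)
        grad = p - y                      # gradient of the cross-entropy loss w.r.t. z
        w -= 0.5 * (X.T @ grad) / len(X)  # gradient descent step on the weights...
        b -= 0.5 * grad.mean()            # ...and on the bias

    print((sigmoid(X @ w + b) > 0.5).astype(int))  # [0 0 0 1] once trained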
Unsupervised Learning
Using unlabelled data to discover relationships within the data
Clustering - grouping similar examples together without labels (see the sketch below)
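A minimal k-means sketch in pure NumPy as one concrete clustering example; k and the iteration count are arbitrary choices:

    import numpy as np

    def kmeans(X, k=2, iters=10, seed=0):
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)]  # start from k random points
        for _ in range(iters):
            # Assign every point to its nearest centroid...
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # ...then move each centroid to the mean of its assigned points.
            centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        return labels, centroids

    # Two obvious blobs of unlabelled points:
    X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])
    labels, _ = kmeans(X)
    print(labels)  # e.g. [0 0 0 1 1 1] - the blobs are found without any labels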
Semi-supervised Learning
Applications of Machine Learning
Natural Language Processing (NLP)
The processing of any natural language in order to understand both its grammatical syntax and semantics
Computer Vision
Methods for acquiring, processing, analysing and reasoning about images or video sequences in order to extract meaningful information that can be interpreted and acted upon
Robotics
Deep Learning
Deep Neural Networks
Code Libraries
Python Libraries
PyTorch
FastAI
TensorFlow
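As a taster for these libraries, a minimal PyTorch sketch that fits a one-weight linear model with gradient descent; the data and the hyper-parameter values are illustrative:

    import torch

    # Toy regression data: y = 2x + 1 plus a little noise.
    X = torch.linspace(0, 1, 50).unsqueeze(1)
    y = 2 * X + 1 + 0.05 * torch.randn(X.shape)

    model = torch.nn.Linear(1, 1)                            # one weight and one bias
    optimiser = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(500):           # one epoch = one pass over this (tiny) training set
        optimiser.zero_grad()          # clear the gradients from the previous step
        loss = loss_fn(model(X), y)    # error between predictions and labels
        loss.backward()                # back-propagation computes the gradients
        optimiser.step()               # gradient descent updates the weights

    print(model.weight.item(), model.bias.item())  # approximately 2 and 1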
TODO
Classical Machine Learning
TODO
AI Adoption Strategy
Preference 1 - Use pre-built AI services/models
Preference 2 - Customise pre-built AI services/models
Preference 3 - Create new models => rule of thumb: only when you have > 100k high-quality examples
Workflow Options
Kubeflow Pipelines
TODO
Feature Engineering
Tips
Have a reasonable hypothesis for why a specific feature may be relevant for the problem, otherwise discard it
The feature value must be known at the time a prediction is needed - don't train on values that were only determined later (a data leak) - be careful when training on data from a Data Warehouse!
Ensure the feature data is legal and ethical to use
Feature values need to be numerical WITH a meaningful magnitude; or at least representable in a numeric form with a vector representation...
Must have enough examples of each feature input value - e.g. at least 5 examples or samples; for real values you may need to group/bin them together
Discard values that are too specific - like a transaction id
One-hot encode categorical values - a vector/list representing each input category; only one item in the list has a value of 1 and the others are zero (see the sketch after this list)
Build a vocabulary of keys during training pre-processing, mapping each categorical value to an index
Don't mix magic numbers representing missing data (e.g. null or -1) with real data - instead use 2 values: one for whether the value was provided and one for the actual value (or zero if it was not provided)
Use feature crosses - e.g. using intuition like a yellow car in New York is likely a taxi, cross the colour and city features so that a yellow car in another city is not misrepresented because of the training data from New York.
e.g. Bucketise Latitude/Longitude into 0.1 degrees and do a feature cross - essentially same as putting lat/long points onto grid cells
Use a wide and deep network if you have both dense and sparse features
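A minimal sketch of several of these tips in plain Python/NumPy - a vocabulary, one-hot encoding, a missing-value flag, and a bucketised lat/long feature cross; the colour vocabulary and the example values are illustrative:

    import math
    import numpy as np

    # Vocabulary built during training pre-processing: categorical key -> index.
    vocab = {key: i for i, key in enumerate(["yellow", "black", "white"])}

    def one_hot(key):
        # Exactly one item is 1 and the others are zero.
        vec = np.zeros(len(vocab))
        vec[vocab[key]] = 1.0
        return vec

    def with_missing_flag(value):
        # Two features instead of a magic number: (was it provided?, value or zero).
        return (0.0, 0.0) if value is None else (1.0, float(value))

    def lat_long_cross(lat, lon, bucket=0.1):
        # Bucketise each coordinate into 0.1-degree bins and cross them:
        # the pair behaves like a grid-cell id.
        return (math.floor(lat / bucket), math.floor(lon / bucket))

    print(one_hot("yellow"))              # [1. 0. 0.]
    print(with_missing_flag(None))        # (0.0, 0.0) - missing, without a magic -1
    print(with_missing_flag(37))          # (1.0, 37.0)
    print(lat_long_cross(40.75, -73.99))  # (407, -740) - one grid cell in Manhattan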