June 30 2015 / Day 7

Machine learning is arguably the most important aspect of Artificial Intelligence. It is about learning predictors from data, typically by minimizing a loss over training examples. Prediction tasks include classification (binary, multiclass, and multilabel), regression, ranking, and structured prediction. A predictor is a function f that maps an input x to an output y.

An example of binary classification is predicting whether an email is spam or not spam. In the context of classification tasks, f is called a classifier and y is called a label. Binary classification can even be viewed as a regression problem if the labels are taken to be −1 and +1.

The starting point of machine learning is the data, which is the main resource for addressing the information complexity of the prediction task at hand. The training data Dtrain is a set of examples that forms a partial specification of the desired behavior of the predictor. The framework works as follows: training data –> feature extraction –> parameter tuning –> f. Learning takes the training data Dtrain and produces a predictor f, a function that maps inputs x to outputs y = f(x).

Given an input x, the feature extractor φ outputs a set of (feature name, feature value) pairs, which is then treated as a feature vector φ(x). The general principle is that features should represent properties of x that might be relevant for predicting y. Feature engineering applies domain knowledge, for example in natural language (words, parts of speech, capitalization patterns) or computer vision (HOG, SIFT, image transformations, smoothing, histograms); later, the features are learned automatically.
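To make the feature-extraction step concrete, here is a toy sketch for the spam example; the feature names and thresholds are invented for illustration, not taken from the notes:

```python
def extract_features(email_text):
    """Map an input x (an email string) to (feature name, feature value) pairs."""
    words = email_text.lower().split()
    return {
        "contains:free": 1.0 if "free" in words else 0.0,
        "contains:meeting": 1.0 if "meeting" in words else 0.0,
        # Fraction of uppercase characters, a crude capitalization-pattern feature.
        "fraction_caps": sum(c.isupper() for c in email_text) / max(len(email_text), 1),
    }

phi = extract_features("FREE money, click now")
```

The dict plays the role of a sparse feature vector φ(x): absent names have value 0.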

A weight vector specifies the contribution of each feature to the prediction.

Linear predictors are based on a feature extractor φ and a weight vector w; the score is the dot product w · φ(x), and for binary classification the prediction is the sign of the score.
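A minimal sketch of a linear predictor over sparse feature vectors; the weights below are made-up values, not learned ones:

```python
def score(w, phi):
    """Score of input with features phi: the dot product w . phi(x)."""
    return sum(w.get(name, 0.0) * value for name, value in phi.items())

def predict(w, phi):
    """Binary linear classifier: the sign of the score, as +1 or -1."""
    return 1 if score(w, phi) >= 0 else -1

# Hypothetical weights: "free" pushes toward spam (+1), "meeting" away from it.
w = {"contains:free": 2.0, "contains:meeting": -1.5}
phi = {"contains:free": 1.0, "fraction_caps": 0.2}
label = predict(w, phi)
```

Features absent from w contribute nothing to the score, which keeps the representation sparse.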

The perceptron algorithm was originally developed by Frank Rosenblatt in the late 1950s. Training examples are presented one at a time; the output is computed, and the weight vector is modified only when the prediction is wrong. The key idea is mistake-driven learning.
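The mistake-driven idea can be sketched as follows; the toy data set at the bottom is invented for illustration:

```python
def perceptron_train(examples, num_epochs=10):
    """Mistake-driven perceptron.

    examples: list of (phi, y) pairs, where phi is a dict feature vector
    and y is a label in {-1, +1}. The weight vector w is updated only
    when the current prediction is wrong (or has zero margin).
    """
    w = {}
    for _ in range(num_epochs):
        for phi, y in examples:
            score = sum(w.get(f, 0.0) * v for f, v in phi.items())
            if y * score <= 0:  # mistake: move w toward y * phi
                for f, v in phi.items():
                    w[f] = w.get(f, 0.0) + y * v
    return w

# Toy data: feature "a" indicates label +1, feature "b" indicates label -1.
w = perceptron_train([({"a": 1.0}, 1), ({"b": 1.0}, -1)])
```

On correctly classified examples the weights are left untouched, which is exactly what "mistake-driven" means.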

The score w · φ(x) measures how confident we are in predicting +1. The margin (w · φ(x))y measures how correct we are: it is positive when the prediction agrees with the label y.
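A small sketch of both quantities; the weights and features are invented for illustration:

```python
def score(w, phi):
    """Confidence: the dot product w . phi(x)."""
    return sum(w.get(f, 0.0) * v for f, v in phi.items())

def margin(w, phi, y):
    """Correctness: score times the true label y in {-1, +1}.
    Positive margin means the prediction agrees with the label."""
    return score(w, phi) * y

w = {"contains:free": 2.0}
phi = {"contains:free": 1.0}
```

The same score yields opposite margins depending on the true label, which is why margin, not score, measures correctness.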

In the perceptron, the learned weight vector is a linear combination of the feature vectors of the examples on which mistakes were made; those mistaken examples play the role of support vectors.
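One way to see this, under the assumption that we record each mistaken example during training:

```python
def weight_from_mistakes(mistakes):
    """Rebuild the perceptron weight vector as sum of y * phi(x)
    over the (phi, y) pairs on which the algorithm made a mistake."""
    w = {}
    for phi, y in mistakes:
        for f, v in phi.items():
            w[f] = w.get(f, 0.0) + y * v
    return w

# Two mistakes on the same positive example each add its feature vector once.
w = weight_from_mistakes([({"a": 1.0}, 1), ({"a": 1.0}, 1)])
```

This representation is what makes kernelized versions possible: the predictor only needs dot products with the mistaken examples.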
