Random Forests are one of the simplest yet most powerful Machine Learning techniques available today. Random Forests are an ensemble method, meaning that they combine information from multiple independent models to generate a consensus.
Random Forests are also an example of a supervised learning approach, meaning that they are trained with labelled datasets. These datasets train (“supervise”) the algorithm to predict outcomes correctly. Because the target is labelled, the model's predictions can be compared against the known answers, so its accuracy can be measured as it "learns" and improves.
A Random Forest will produce many (perhaps tens of thousands of) separate decision trees. To ensure that these decision trees are as uncorrelated as possible, a bespoke set of training data is created for each tree. This is achieved via a technique called 'Bagging' (short for bootstrap aggregating). Bagging builds each tree's dataset by drawing samples from the original training data at random, with replacement, so some training samples are left out completely while others are included multiple times. Bagging results in trees that are specialised at classifying certain types of data. Further specialisation is achieved by randomly eliminating some features from consideration at each decision split in each tree.
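The two sources of randomness described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not how any particular library implements it: the helper names bootstrap_sample and random_feature_subset, the toy rows, and the choice of 2 features out of 6 are all assumptions made for the example.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random WITH replacement (one 'bag' for one tree)."""
    n = len(rows)
    indices = [rng.randrange(n) for _ in range(n)]
    return [rows[i] for i in indices], indices

def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices to consider at a single split."""
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical training set: 10 rows, identified here only by their index.
rows = [("sample", i) for i in range(10)]

bag, indices = bootstrap_sample(rows, rng)

# Rows that were never drawn are missing from this tree's training data.
left_out = sorted(set(range(len(rows))) - set(indices))

# Features this tree may look at for one particular split (2 of 6, assumed).
features = random_feature_subset(n_features=6, k=2, rng=rng)

print("drawn row indices:", indices)
print("rows left out of this bag:", left_out)
print("features considered at this split:", features)
```

Running this repeatedly with different seeds gives each tree a different bag and each split a different feature subset, which is what keeps the trees from all learning the same thing.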
The resulting set of trees can then collectively classify new input data. Each tree in the forest has a vote, and the final classification will be the one with the most votes. The diversification and specialisation mean that Random Forests typically offer greater classification accuracy than single Decision Trees. This enhanced accuracy stems from their ability to generalise better and from reduced overfitting relative to single Decision Trees.
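The voting step can be sketched as follows. The three "trees" here are hypothetical hand-written rules standing in for trained decision trees, and the feature names (links, caps, length) are invented for illustration; only the majority-vote logic is the point.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most trees.

    Ties go to whichever tied label was seen first, which is simply
    how Counter.most_common orders equal counts.
    """
    return Counter(predictions).most_common(1)[0][0]

def forest_predict(trees, sample):
    """Ask every tree for its vote, then take the majority."""
    votes = [tree(sample) for tree in trees]
    return majority_vote(votes)

# Hypothetical stand-ins for trained trees: each looks at one feature only.
def tree_a(x):
    return "spam" if x["links"] > 3 else "ham"

def tree_b(x):
    return "spam" if x["caps"] > 0.5 else "ham"

def tree_c(x):
    return "spam" if x["length"] < 50 else "ham"

trees = [tree_a, tree_b, tree_c]
sample = {"links": 5, "caps": 0.1, "length": 40}

prediction = forest_predict(trees, sample)
print(prediction)
```

In this example one tree disagrees with the other two, and the forest still returns the majority label: an individual tree's mistake is outvoted as long as most trees get it right.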
Random Forests are akin to 'The Wisdom of Crowds'. This concept was popularised by James Surowiecki in his 2004 book, The Wisdom of Crowds. The book illustrates how large, diverse groups can make better decisions than individual experts.
According to Surowiecki, wise crowds have several key characteristics:
The crowd should hold a diversity of opinions.
One person’s opinion should remain independent of those around them (and should not be influenced by anyone else).
Anyone taking part in the crowd should be able to form their own opinion based on their individual knowledge.
The crowd should be able to aggregate individual opinions into one collective decision.