In this video we look at optimising our AI using the tools we developed in the last video. We host a competition between neural networks of different configurations: networks with varying numbers of layers and different numbers of neurons in each layer. All the networks are trained on the same training data, and we see which architectures perform better. This gives us an intuition for optimum network topologies. Unsurprisingly, networks with more layers and more neurons per layer tend to perform better. But perhaps surprisingly, the very best performing networks aren't the very largest. This leads us to explore the concepts of overfitting and the 'Curse of Dimensionality'. We observe that, at some point, adding more layers or more neurons makes performance worse, not better.
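As a rough sketch of how such a competition might be set up (the video's own code isn't reproduced here, so the PyTorch helpers below, make_mlp and train_and_score, are illustrative assumptions): build fully connected networks of varying depth and width, train each one on the same data, and compare validation accuracy.

```python
# Illustrative sketch only; not the code from the video.
import torch
import torch.nn as nn

def make_mlp(n_inputs, n_outputs, hidden_layers, neurons_per_layer):
    # Stack hidden_layers fully connected layers of equal width.
    layers, width_in = [], n_inputs
    for _ in range(hidden_layers):
        layers += [nn.Linear(width_in, neurons_per_layer), nn.ReLU()]
        width_in = neurons_per_layer
    layers.append(nn.Linear(width_in, n_outputs))
    return nn.Sequential(*layers)

def train_and_score(model, train_x, train_y, val_x, val_y, epochs=50, lr=1e-3):
    # Full-batch training, then validation accuracy as the "score".
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(train_x), train_y).backward()
        opt.step()
    with torch.no_grad():
        preds = model(val_x).argmax(dim=1)
        return (preds == val_y).float().mean().item()

# The "competition": same data, different topologies.
# results = {
#     (depth, width): train_and_score(make_mlp(784, 10, depth, width),
#                                     train_x, train_y, val_x, val_y)
#     for depth in (1, 2, 3, 4)
#     for width in (16, 64, 256, 1024)
# }
```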
Key takeaways here are:
1) The importance of lots of diverse training data. Without plenty of training examples it is hard to tune a network with even a modest number of parameters.
2) The power of "Ensembles". Combining multiple, diverse networks all trained to perform the same task is a cheap way to gain performance. As in nature, diversity improves performance (see the first sketch after this list).
3) The need to perform lots of experimentation. AI, and neural networks in particular, are not "One and Done". Train lots of networks with a wide variety of hyperparameters (such as learning rate, number of layers, number of neurons, dropout, etc.), then fine-tune the most successful architectures to improve performance (see the sweep sketch after this list).
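A minimal sketch of point 2, assuming a handful of trained PyTorch models like those from the earlier sketch: a simple ensemble that averages the output probabilities of several diverse networks and takes the highest averaged score.

```python
# Illustrative ensemble sketch; model names are hypothetical.
import torch

def ensemble_predict(models, x):
    # Average the softmax probabilities of each member network,
    # then pick the class with the highest averaged score.
    with torch.no_grad():
        probs = [torch.softmax(m(x), dim=1) for m in models]
        return torch.stack(probs).mean(dim=0).argmax(dim=1)

# e.g. combine the three strongest networks from the competition:
# top_models = [model_a, model_b, model_c]   # hypothetical trained models
# predictions = ensemble_predict(top_models, val_x)
```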
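And a minimal sketch of point 3: a coarse grid sweep reusing the hypothetical make_mlp and train_and_score helpers above. The grid values are illustrative (dropout is omitted for brevity), and the idea is to fine-tune around whichever setting scores best.

```python
# Illustrative hyperparameter sweep; values are not from the video.
import itertools

grid = itertools.product(
    [1e-2, 1e-3, 1e-4],   # learning rate
    [1, 2, 3, 4],         # number of hidden layers
    [32, 128, 512],       # neurons per layer
)

# results = {}
# for lr, depth, width in grid:
#     model = make_mlp(784, 10, depth, width)
#     results[(lr, depth, width)] = train_and_score(
#         model, train_x, train_y, val_x, val_y, lr=lr)
# best = max(results, key=results.get)   # then fine-tune around this setting
```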