A relatively recent paper by Baidu (Dec 2017), covered in The Morning Paper, empirically demonstrates something fascinating that could have major implications for the management of deep learning projects. Deep learning output errors decrease predictably as a “power-law” of the training set size:
error(m) ∝ m^β
(m is the number of samples in the training set, and the exponent β is usually between -1 and 0)
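To make the power law concrete, here is a minimal sketch assuming the simplified form error(m) = alpha * m**beta. The constants ALPHA and BETA and both function names are made up for illustration and are not taken from the paper. It shows that, under this form, every doubling of the training set multiplies the error by the same factor, 2**beta.

```python
# Hypothetical constants chosen only for illustration; not values from the paper.
ALPHA = 2.0
BETA = -0.3


def predicted_error(m, alpha=ALPHA, beta=BETA):
    """Error predicted by the simplified power law alpha * m**beta."""
    return alpha * m ** beta


def doubling_factor(beta=BETA):
    """Multiplicative change in error each time the training set doubles."""
    return 2.0 ** beta


if __name__ == "__main__":
    for m in (1_000, 2_000, 4_000):
        print(m, round(predicted_error(m), 4))
    print("each doubling multiplies the error by", round(doubling_factor(), 3))
```

With the assumed beta of -0.3, each doubling of the data cuts the error by roughly 19%. A beta closer to 0 means the same doubling buys much less.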
Insight 1: While there are some specific caveats about the way this particular research and its analyses were conducted (some of which I mention later*), the studies find that you can use relatively small amounts of data to accurately extrapolate the improvement in performance gained from adding X more data, without any extra research (except for hyper-parameter sweeps to increase model capacity). This can help companies prioritize and execute only the data efforts necessary to reach the desired performance on time, and quantify how valuable it is to acquire and annotate X more data in each project.
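The extrapolation described in Insight 1 could be sketched roughly as follows. This is not the paper's procedure: it assumes the simplified form error(m) = alpha * m**beta, fits it with an ordinary straight-line fit in log-log space, and ignores any error floor that a real model would eventually hit. The function names and the synthetic measurements are hypothetical.

```python
import numpy as np


def fit_power_law(sizes, errors):
    """Fit error ~= alpha * m**beta by least squares on log-log values."""
    log_m = np.log(np.asarray(sizes, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    # A power law is a straight line in log-log space: log e = beta * log m + log alpha.
    beta, log_alpha = np.polyfit(log_m, log_e, 1)
    return float(np.exp(log_alpha)), float(beta)


def extrapolate_error(m, alpha, beta):
    """Predicted error at training set size m under the fitted law."""
    return alpha * m ** beta


def samples_needed(target_error, alpha, beta):
    """Invert the fitted law: m = (target_error / alpha) ** (1 / beta)."""
    return (target_error / alpha) ** (1.0 / beta)


if __name__ == "__main__":
    # Synthetic measurements generated from a known law (alpha=3, beta=-0.25),
    # standing in for validation errors measured on small training subsets.
    sizes = [500, 1_000, 2_000, 4_000, 8_000]
    errors = [3.0 * m ** -0.25 for m in sizes]

    alpha, beta = fit_power_law(sizes, errors)
    print(f"alpha={alpha:.3f}, beta={beta:.3f}")
    print("predicted error at 64k samples:", extrapolate_error(64_000, alpha, beta))
    print("samples needed for error 0.15:", samples_needed(0.15, alpha, beta))
```

In practice the inputs would be validation errors from models trained on nested subsets of the real data, each with its own hyper-parameter sweep. The fit is only trustworthy inside the range where the points actually fall on a straight line in log-log space.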
Insight 2: The studies demonstrated Insight 1 in 4 different domains using “real-world” datasets, such as ImageNet classification, which is the closest to our application.
An example of the results can be seen in the images below.
ImageNet classification (*I find it suspicious that they didn’t show the graph beyond 2^9 samples per class):
Character Based Language Models:
Insight 3: Research achievements could potentially affect the following two domains
I would love to hear your thoughts about this, especially from those of you who have time to go deeper into this paper and gain additional insights.