A few days ago, I discovered two Twitter threads, by Simon DeDeo and Dagmar Monett, quoting a recent publication by Dacrema et al. that points to several problems in Deep Learning research:

In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in the last years. Only 7 of them could be reproduced with reasonable effort. For these methods, it however turned out that 6 of them can often be outperformed with comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method. Overall, our work sheds light on a number of potential problems in today's machine learning scholarship and calls for improved scientific practices in this area.

Dacrema et al. point out two major problems in today's research practice in Applied Machine Learning: the reproducibility of results and the choice of baselines. As Dagmar Monett noted in her Twitter thread, both issues are neither new nor limited to the Machine Learning community.

I may address the reproducibility issue in a separate blog post, since I suspect a deeper problem in academia is responsible for it. For now, I will just note that some doctoral students have told me that their supervisors expect them to produce a certain number of publications, which certainly does not improve the quality of their work.

Regarding the choice of baselines, Dacrema et al. suspect that this issue is not limited to Machine Learning for recommender systems (or Machine Learning for behavior in general). Much of the creativity in modern Deep Learning research goes into finding fancy names for tiny improvements to long-established ideas. Furthermore, when you hear about the big AI milestones of the past decade, the progress is usually due to advances in processing power (see Moore's law) or the availability of data. From a theoretical point of view, the most interesting achievements of Deep Learning research in recent years are Generative Adversarial Networks (proposed by Goodfellow et al.) and Adversarial Examples. (Note: My view may be biased since I am not a Machine Learning researcher myself; however, I have been studying this topic for quite some time.)
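To get a feeling for how simple the "comparably simple heuristic methods" from the abstract can be, here is a minimal sketch of an item-based nearest-neighbor top-n recommender. The function name and the toy interaction matrix are my own illustration, not code from Dacrema et al.; it just shows the kind of cosine-similarity baseline the paper refers to.

```python
import numpy as np

def topn_recommend(interactions, user, n=3):
    """Item-based nearest-neighbor top-n recommendation (a common baseline).

    interactions: binary user x item matrix (1 = user consumed item).
    Scores each unseen item by summing its cosine similarity to the
    items the user has already consumed, then returns the top n.
    """
    # Cosine similarity between item columns.
    norms = np.linalg.norm(interactions, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for unused items
    normalized = interactions / norms
    item_sim = normalized.T @ normalized

    seen = interactions[user].astype(bool)
    scores = item_sim[:, seen].sum(axis=1)
    scores[seen] = -np.inf  # never recommend already-seen items
    return np.argsort(scores)[::-1][:n]

# Toy example: 4 users x 5 items.
R = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [1, 1, 1, 0, 0],
], dtype=float)

print(topn_recommend(R, user=0, n=2))  # → [3 2]
```

That this handful of lines can compete with elaborate neural architectures, once those are not compared against well-tuned baselines, is exactly the point of the paper.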