Jan Wieczorek

Computing Science

Year of study: 5th

Gradient Descent – theory and practice: investigating the operation of an algorithm critical to the development of modern AI systems, such as ChatGPT, under certain modelling assumptions.

Abstract

In the last few years, we have seen many remarkable developments in AI technology. For many of us, this has been the first chance to experience for ourselves the kinds of AI capabilities that were previously found only in science fiction.
However, before ChatGPT can write your party invitation in the style of a medieval troubadour, or suggest a recipe based on the contents of your fridge in mere seconds, the Large Language Model that powers it must be trained on an enormous amount of data. During this training, the responses the AI model produces are improved step by step in an optimisation process called Gradient Descent. As you read this abstract, huge numbers of software engineers and researchers around the world are continuously tweaking various parameters of Gradient Descent. Their aim is to train the models faster and to make the resulting AI systems generate better responses. Fairly often, these adjustments are made simply by trial and error. Consequently, the best known settings for Gradient Descent do not always have a good theoretical justification.
My research attempts to bridge some of this gap between theory and practice by examining how the empirical performance of Gradient Descent is affected by various ways of choosing its step sizes, when the error is modelled by a certain class of functions. The results will be useful for future research aiming to develop new theoretical models of Gradient Descent’s performance. They could therefore contribute towards improving an optimisation method used in a wide variety of applications, from Machine Learning and AI to the analysis of seismographic data.

Bio

Though I am currently in my 5th year of a Computing Science degree, my interests have always spanned a wide range of disciplines: from History, through Psychology, to Mathematics. This has naturally led me to choose a research project whose subject is related to current advances in the field of Artificial Intelligence. I hope that my presentation, as well as presenting my own findings, will make some technical aspects of the fascinating research currently being conducted on AI systems easier to understand.