At our event on 22nd March, speaker Lars Malmqvist discussed how large language models, such as OpenAI's GPT, may disrupt the world in the next few years.
Lars works in commercial technology and has a PhD in natural language processing. He argues that these models are transformative and have the potential to disrupt various industries.
This article is based on what was discussed at our event, so if you missed it, here is a summary of what we covered with Lars.
What is a Large Language Model?
First, it’s important to understand large language models and how they are made.
A large language model is a type of artificial intelligence (AI) that is trained on vast amounts of data to generate human-like language.
These models use deep learning algorithms to analyse and understand patterns in natural language, such as speech and text, and then generate their own language based on what they've learned.
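Large language models are built on the Transformer architecture, a neural network design that uses an attention mechanism to weigh how the words in its input relate to one another. The original Transformer encodes input using one neural network and decodes it into a response using a separate one, whereas GPT-style models use only the decoder part.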
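The central operation inside a Transformer is attention, in which each position in the input builds its output as a weighted mix of all the other positions. The following is a minimal sketch of scaled dot-product attention in plain Python. The function names and the toy vectors are assumptions chosen for illustration, and real models run this on large matrices with learned weights rather than hand-written lists.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    max_score = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

# Toy example: one query attending over three positions (made-up numbers).
queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(attention(queries, keys, values))
```

Positions whose keys are most similar to the query receive the largest weights, so they contribute most to the output.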
Training Large Language Models
To train large language models like GPT, billions of sentences from the internet are used to create a dataset where the model learns to predict the next word in a sentence.
This process is like a massive fill-in-the-blank exercise.
By following this process at scale, the models become capable of generating fluent, coherent text from any starting prompt.
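To make the idea of next-word prediction concrete, here is a deliberately tiny sketch that counts which word follows which in a handful of sentences and then predicts the most frequent follower. It is a simple word-pair (bigram) counter and not a neural network, so it only illustrates the training objective, not how GPT works internally. The corpus and function names are made up for this example.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count, for each word, which words follow it in the training text."""
    follower_counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            follower_counts[current_word][next_word] += 1
    return follower_counts

def predict_next_word(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Tiny, made-up corpus used purely for illustration.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_bigram_model(corpus)
print(predict_next_word(model, "sat"))  # on
```

A large language model replaces the counting table with a neural network and the three sentences with billions, but the task it is trained on has the same shape.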
Given a few examples in the prompt, the model can also quickly pick up a task it has not been explicitly trained for, an ability often called few-shot learning.
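Supplying examples in this way usually just means placing them in the prompt ahead of the new input. The sketch below assembles such a prompt as plain text. The "Input:"/"Output:" layout, the function name and the sample phrases are assumptions for illustration, not a format required by any particular model.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples and a new input into one prompt."""
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The final "Output:" is left blank for the model to complete.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Rewrite each phrase in a formal tone.",
    [
        ("gonna be late", "I will be arriving late."),
        ("thanks a lot", "Thank you very much."),
    ],
    "see you soon",
)
print(prompt)
```

The resulting text would then be sent to a model, which is expected to continue the pattern set by the examples.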
The Creation of ChatGPT
To create ChatGPT, two steps were taken.
- First, the model was fine-tuned using supervised learning with human-written instructions. This resulted in a model called InstructGPT, which was better at following instructions but still not conversational.
- Then, reinforcement learning from human feedback (RLHF) was used to train the model to converse, resulting in ChatGPT.
This represents a technology shift that has been developing over the past few years, with the core technology having been developed around two and a half years ago. The potential impact of these models has become more apparent since the release of ChatGPT.
How Capable are Large Language Models?
So just how capable are these models?
A recent study assessed ChatGPT by evaluating the performance of different versions of GPT on university exams.
The study revealed that the model does well in subjects where writing is the main focus, such as art, history and psychology. Earlier versions performed worse in science and maths, although the newest version improves on its predecessor in both.
The real-world test performance of the model is roughly equivalent to that of a bright first-year undergraduate, which is a respectable level of performance.
As many people have discovered, the model can give very good responses in some cases if it is asked in the right way.
There are various other new models that have been released by Google, Facebook, and other companies, including some open-source alternatives that offer similar capabilities to GPT, but with smaller hardware footprints and better performance in some areas.
Applications of Large Language Models
Large language models have a range of applications across various industries and fields including productivity boosts in knowledge work, code generation, and education.
Large language models can be used to generate content such as news articles, social media posts, and product descriptions.
They can also be used for language translation, for powering chatbots and virtual assistants that respond to natural language queries, for analysing the sentiment behind a piece of text, for classifying text (for example, sorting emails into spam and non-spam, or news articles by topic), and even for creating personalised product or content recommendations.
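As an illustration of the classification use case, the sketch below builds a prompt asking for a spam or non-spam label and then maps the reply onto a known label. The prompt wording and function names are assumptions, and `stand_in_model` is a hypothetical placeholder using a keyword rule so that the example runs without any external service. In practice it would be replaced by a call to a real language model.

```python
LABELS = ("spam", "not spam")

def build_classification_prompt(email_text):
    """Wrap an email in an instruction that asks for a single label."""
    return (
        "Classify the following email as 'spam' or 'not spam'. "
        "Reply with the label only.\n\n"
        f"Email: {email_text}\n"
        "Label:"
    )

def parse_label(model_reply):
    """Map a free-text reply onto a known label, or None if unrecognised."""
    cleaned = model_reply.strip().lower().rstrip(".")
    return cleaned if cleaned in LABELS else None

def stand_in_model(prompt):
    """Hypothetical placeholder for a real language model call."""
    return " Spam." if "free prize" in prompt.lower() else " Not spam."

for email in ["Claim your FREE PRIZE now", "Meeting moved to 3pm"]:
    reply = stand_in_model(build_classification_prompt(email))
    print(email, "->", parse_label(reply))
```

Checking the reply against a fixed set of labels is a simple guard against the model answering with something unexpected.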
Problems and Restrictions of Large Language Models
There are, however, several problems that large language models like ChatGPT face. One is that they tend to make information up, so their output cannot be fully trusted without fact-checking.
They can also struggle with logical and mathematical reasoning, can be subverted into doing things that go against their instructions, and can be repetitive and biased.
There are also ethical and legal issues around liability and copyright that need to be considered when using these models.
There is also a limit to how much these language models can be scaled, and there may be diminishing returns beyond a certain point.
Large language models can, for example, classify emails into different categories with almost perfect accuracy, based only on the prompt they are given.
Whilst this core ability has been around for a while, it was only recently demonstrated to the world in an easy-to-use way with ChatGPT. Until then, these models could only be used effectively by structuring prompts in a particular way, a practice known as prompt engineering.
However, the core technology remains the same. Lars therefore believes that progress may taper off rather than lead to a complete transformation.
Emergent Abilities of Large Language Models
Despite these issues, large language models have the potential to be transformative as their capabilities continue to improve with each new version.
The development of large language models has been under way for about five years, starting with a paper called "Attention Is All You Need", which introduced the Transformer architecture.
The focus has been on scaling up these models with more data and parameters, which has led to emergent capabilities like the ability to summarise text, write poems, and make coherent arguments. The release of GPT was a game-changer that showed the potential of these models.
Since then, there has been an explosion of generative models from various companies. The latest foundation models include GPT-4, which is more capable than the previous version.
In conclusion, large language models such as OpenAI’s GPT may disrupt the world in the next few years.
There are many applications for these models, and new abilities emerge as the technology develops.
However, there are also many limitations to bear in mind, as well as diminishing returns in how much extra capability each further increase in scale provides.