by Shreya Johri
figures by Cynthia Moncada-Reid
What if AI could design personalized workout plans, craft tailored travel itineraries, or even compose cover letters for job applications? With the advent of ChatGPT, it can. ChatGPT is an AI-powered chatbot that uses a cutting-edge machine learning architecture called GPT (Generative Pre-trained Transformer) to generate responses that closely resemble those of a human. Developed by OpenAI, ChatGPT is the latest iteration of a series of large language models that have garnered significant attention since the introduction of the first GPT model in 2018.
Due to its exceptional ability to understand language and generate contextually relevant responses, ChatGPT has generated buzz not only in the scientific community but also among businesses, academic institutions, and the general public. Companies are increasingly implementing ChatGPT in their customer service and marketing, while researchers are exploring its potential for applications in natural language processing, psychology, and linguistics. Despite the tremendous enthusiasm, ChatGPT has some serious limitations: it has been known to generate factually incorrect responses and to perpetuate societal biases, which has raised concerns internationally. So how can we embrace the ChatGPT revolution with appropriate caution? While the model improves every few weeks, the computer science and engineering principles used to train it remain constant. In this article, we describe the origins and evolution of ChatGPT.
How is the ChatGPT model trained?
A machine learning model is a computer algorithm that learns rules and patterns from data. It consists of many ‘parameters’, the nuts and bolts of the model, which are adjusted during training to store a multi-dimensional representation of the patterns the model has learned. To develop a model that can understand language and generate coherent, grammatically correct, contextually relevant sentences, ChatGPT’s training (i.e., providing the model with data so it can learn the underlying rules and patterns within that data) was divided into multiple stages. The training dataset consisted of text collected from multiple sources on the internet, including Wikipedia articles, books, and other public webpages.
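The idea of ‘parameters adjusted during training’ can be made concrete with a toy model far simpler than ChatGPT. The sketch below is our own illustration, not OpenAI’s code: it trains a one-parameter model by repeatedly nudging that parameter to reduce its prediction error, which is the same principle (at a vastly smaller scale) by which a language model’s billions of parameters are tuned.

```python
# A toy "model" with a single parameter, trained by gradient descent.
# Real language models work the same way in principle, but with
# billions of parameters instead of one.

def train(pairs, steps=200, lr=0.05):
    """Learn w so that the prediction w * x matches each target y."""
    w = 0.0  # the model's only parameter, before training
    for _ in range(steps):
        for x, y in pairs:
            error = w * x - y      # how wrong the current prediction is
            w -= lr * error * x    # nudge the parameter to reduce the error
    return w

# The hidden pattern in this data is y = 2x; training should recover w close to 2.
data = [(1, 2), (2, 4), (3, 6)]
w = train(data)
```

After training, the parameter has captured the pattern hidden in the data, even though the algorithm was never told the rule explicitly.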
Predicting the next word
In the first stage, a machine learning model was developed to generate the next word in a partially complete sentence or paragraph. This next word had to make sense not only within the sentence but also in the context of the surrounding paragraph. When humans read a piece of text, they pay attention to certain key words in the sentence and complete it based on those key words. Similarly, the model had to learn how to pay “attention” to the right words.
The cat was sitting on the ___.
Here, the key words for completing the sentence are ‘cat’ and ‘sitting’. There are many possible correct completions (e.g., mat, rooftop, pole). Once the model has identified the key words, it outputs one of the correct completions. This means the model can give different answers when asked the same question multiple times, but each answer will make sense in the context of the sentence. If more context is provided for the above sentence, the model becomes more consistent in completing it.
Mark diligently searched high and low for his missing cat, meticulously checking the basement, interior of the house, and even the backyard. Just as he was making his way to the street, he heard a meow and looked up. The cat was sitting on the rooftop.
Here, the model will eliminate the option ‘mat’ (which would have been perfectly suitable without the extra context), and could instead output either ‘pole’ or ‘rooftop’.
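Under the hood, a language model assigns a probability to each candidate next word and then samples from that distribution, which is why repeated queries can yield different but sensible completions, and why extra context shifts the answer. A minimal sketch, with hand-invented probabilities standing in for what a real model would compute from its parameters:

```python
import random

# A toy next-word predictor for the cat example above. The probabilities
# are invented for illustration; a real model computes them from billions
# of learned parameters.

def next_word_distribution(context):
    if "looked up" in context:
        # The extra context rules out "mat" and favors elevated places.
        return {"rooftop": 0.7, "pole": 0.3, "mat": 0.0}
    # Without context, several completions are plausible.
    return {"mat": 0.5, "rooftop": 0.3, "pole": 0.2}

def complete(context):
    dist = next_word_distribution(context)
    words = list(dist)
    weights = [dist[w] for w in words]
    # Sampling (rather than always taking the top word) is why the model
    # can answer the same prompt differently on different tries.
    return random.choices(words, weights=weights)[0]

print(complete("The cat was sitting on the"))
print(complete("...he heard a meow and looked up. The cat was sitting on the"))
```

Running the last two lines repeatedly gives varying answers for the short prompt, but never ‘mat’ once the longer context is included.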
In 2018, researchers at OpenAI trained a model to perform this task using the Transformer architecture (which is built around the concept of ‘attention’), coupled with a large model size and training on a very large dataset. A large model size (i.e., a large number of parameters) allowed the model to learn complex patterns in the data that it could not capture with fewer parameters. They called this model GPT, and it was capable of completing sentences and paragraphs. Over the next two years, they improved the model by training it on even larger datasets and further increasing its size, and they made engineering changes to the architecture that enabled it to learn faster and more effectively than before. The result was a 175-billion-parameter model they called GPT-3, which was even better at completing paragraphs, predicting the next word, choosing between possible completions of text, and translating paragraphs, among many other things.
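The ‘attention’ at the heart of the Transformer can be sketched in a few lines: each word carries a query, key, and value vector; a word’s query is compared against every key, and the resulting weights decide how much each value contributes to the output. The tiny two-dimensional vectors below are made up purely for illustration (real models use vectors with thousands of dimensions):

```python
import math

# A minimal sketch of scaled dot-product attention, the core computation
# of the Transformer. All vectors here are invented for illustration.

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention(query, keys, values):
    """Weight each value by how well its key matches the query."""
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]  # scaled dot products
    weights = softmax(scores)                              # attention weights
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]                 # weighted mix of values
    return weights, out

# Three "words", each with a 2-dimensional key and value vector.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]  # this query matches the first and third keys best
weights, out = attention(query, keys, values)
```

The weights show the model ‘paying attention’: the words whose keys align with the query contribute most to the output, exactly as the key words ‘cat’ and ‘sitting’ dominated the completion example earlier.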
It is noteworthy that GPT-3 was not trained for any specific task (such as translating languages or summarizing text); it was only trained to predict the next word. But because it was trained on a massive dataset containing many examples of specific tasks (such as answering questions or translating languages), it acquired the ability to perform a wide variety of tasks when prompted appropriately. Researchers found that when a user supplied a few task-specific examples to the trained GPT-3 model, the model could better infer what the user expected and consequently make a better choice between possible answers.
Provide antonyms in Spanish –
Slow → Rápido
Bad → Bueno
Small → Grande
Full → ?
Young → ?
The model was able to perform better when it was given some examples of Spanish antonyms, as compared to when it wasn’t. This is analogous to students taking exams – general preparation (i.e., model training) helps them perform well, but taking a practice exam consisting of sample questions and answers (i.e., providing task-specific examples) boosts their performance. The more task-specific examples provided to the trained model, the better the model performed. However, for many tasks, creating handcrafted examples was either very laborious or not feasible. In the example below, GPT-3 did not generate a useful response when asked to write a short story, and creating examples for many types of longer-form writing would have been very laborious.
User: Write a short story about the moon and the stars.
Model: Does the moon like the stars? Does the moon like the stars? Does the moon like the stars? Does the moon like the stars? Does the moon like the stars? Does the moon like the stars? Does the moon like the stars? Does the moon like the stars? Does the moon like the stars?
The model behaved this way because its training data did not contain many conversations or examples of how to follow instructions.
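Mechanically, few-shot prompting of the kind shown in the Spanish-antonym example is nothing more than writing solved examples into the text the model is asked to continue; the model itself is not retrained. A minimal sketch of assembling such a prompt (the helper function is our own illustration, not part of any OpenAI tool):

```python
# A sketch of how a few-shot prompt is assembled: worked examples are
# simply written into the text the model will complete. The examples give
# the model context; no parameters are changed.

def build_prompt(instruction, examples, queries):
    lines = [instruction]
    for source, target in examples:
        lines.append(f"{source} → {target}")   # solved examples
    for source in queries:
        lines.append(f"{source} → ?")          # what the model should fill in
    return "\n".join(lines)

prompt = build_prompt(
    "Provide antonyms in Spanish –",
    [("Slow", "Rápido"), ("Bad", "Bueno"), ("Small", "Grande")],
    ["Full", "Young"],
)
print(prompt)
```

The assembled string reproduces the antonym prompt shown above; feeding more solved pairs into `examples` is exactly what “providing more task-specific examples” means.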
Learning how to follow instructions
In the next phase, the GPT-3 model was taught how to follow instructions. To do this, a dataset was curated containing high-quality, human-written examples of desirable responses to a wide variety of instructions. First, the model was trained on this dataset so it could learn which responses are desirable. It was then further fine-tuned with active human feedback: the model was asked to generate multiple outputs, and a human ranked them from least desirable to most desirable. Every time the model generated desirable content, it was rewarded with a positive score; every time it produced undesirable content, it was penalized with a negative score. The model learned to generate content that earned higher scores, thereby slowly learning to produce output aligned with this desirability scale. This process of teaching the model desirable behavior through real human feedback and rewards/penalties is called Reinforcement Learning from Human Feedback (RLHF). The resulting model was called InstructGPT, and was released in 2022.
User: Write a short story about the moon and the stars.
Model: Once upon a time, the moon and the stars lived together in the sky. They were the best of friends, and they loved spending their time together. The moon would light up the sky at night, and the stars would twinkle and shine. They were always happy and ….
The model could now produce more relevant single responses, but it could not have conversations yet.
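The reward-and-penalty loop described above can be caricatured in a few lines of code. In this toy sketch, the candidate responses and their human scores are invented, and the ‘model’ is just one preference number per response; in the real system, human rankings steer updates to the language model’s billions of parameters.

```python
import random

# A toy sketch of reinforcement learning from human feedback: the model
# "generates" outputs, a human score rewards or penalizes each one, and
# the model's preferences drift toward the well-rated behavior.
# Responses and scores are invented for illustration.

random.seed(0)  # fixed seed so the toy run is reproducible

responses = ["tells a story", "repeats a question", "refuses rudely"]
human_scores = {"tells a story": 1.0,
                "repeats a question": -0.5,
                "refuses rudely": -1.0}

def train_preferences(steps=500, lr=0.1):
    prefs = {r: 0.0 for r in responses}  # the model starts out indifferent
    for _ in range(steps):
        r = random.choice(responses)     # model "generates" an output
        reward = human_scores[r]         # human rates it
        prefs[r] += lr * reward          # reinforce or penalize that behavior
    return prefs

prefs = train_preferences()
best = max(prefs, key=prefs.get)
```

After many rounds of feedback, the behavior humans rated highest ends up with the strongest preference, which is the essence of how InstructGPT learned to favor helpful responses over the repetitive output shown earlier.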
Learning to converse
In the next phase, which culminated in ChatGPT, OpenAI trained the model to converse effectively. The initial training data consisted of conversations in which humans played both sides, writing the messages of the user as well as the replies of the AI chatbot. The model was then again fine-tuned using Reinforcement Learning from Human Feedback. The dialogue format enabled the model to answer follow-up questions, admit its mistakes, and challenge incorrect premises.
Using ChatGPT with care
While ChatGPT has shown incredible abilities, the model is still far from perfect: it sometimes fails to reject inappropriate requests, generates violent content, or spreads misinformation. Many of these problems stem from insufficient model training and have been exposed by users incentivizing ChatGPT (for example, by threatening it with its own ‘death’) to generate content that OpenAI tried to safeguard against. While ChatGPT at baseline will typically not generate this sort of worrisome content, some users have identified loopholes that can lead it to do so.
It is also important to note that the desirable behavior that the model has learned is based on what a subset of humans find desirable. Furthermore, because of the vastness of information on the internet (and therefore ChatGPT’s training data), many fields have potentially not been optimized for acceptable behavior yet. This leads ChatGPT to perpetuate biases – one version of ChatGPT generated code to identify a “good scientist” based on gender and race. Despite active efforts by the research community and OpenAI to ensure safety, it will take time to overcome such limitations.
Furthermore, since the model has been trained on data from all over the internet, it produces grammatically correct, plausible-sounding content that can create a false sense of confidence in its answers, making it difficult to distinguish what is factually correct from what is not. It is also not good at arithmetic reasoning or following logic in complex questions, so it should be used with caution for these purposes as well. Research is ongoing to understand the reasons for such shortcomings.
Will the model ever become completely reliable? It is difficult to say, though it is becoming safer every day. Researchers at OpenAI are working to address the above limitations and have made commendable progress in a short period of time toward resolving many of these serious issues in newer versions. However, many limitations remain, and new ones are being identified as more and more people use the model. Therefore, while the development of ChatGPT has been impressive on multiple fronts and will keep improving as more research is done, it is necessary to keep searching for new shortcomings and to invest resources in actively resolving them.
Shreya Johri is a first year PhD student in the Biological and Biomedical Sciences PhD program at Harvard University. You can find her on Twitter as @sjohri20.
Cynthia Moncada-Reid is a second year PhD student in the Speech and Hearing Biosciences and Technology Program at Harvard Medical School. Her Twitter is @cynmoncadareid.