OpenAI’s ChatGPT is the world’s most popular AI chatbot, and it runs on a specific type of artificial intelligence called an LLM. Assuming you don’t have your LLM degree yet, you may not know what a large language model is, which is completely understandable. What is an LLM, and what does language model mean?
What does LLM stand for?
LLM stands for large language model. Popular LLMs include GPT-3, GPT-3.5 Turbo, GPT-4, PaLM 2, and Llama 2. These neural networks are what power AI chatbots like OpenAI’s ChatGPT, Microsoft’s Bing Chat, and Google Bard.
Company | AI chatbot | LLM |
OpenAI | ChatGPT | GPT-4 or GPT-3.5* |
OpenAI | ChatGPT API | GPT-4 or GPT-3.5 Turbo |
Google | Bard | PaLM 2 |
Microsoft | Bing Chat | GPT-4 |
Meta | No chatbot** | Llama 2 |
Anthropic | Claude | Claude 2 |
*OpenAI’s GPT-3.5 model is the default for free users of the ChatGPT AI chatbot. Using GPT-4 requires a paid subscription to ChatGPT Plus or ChatGPT Enterprise.
**Meta does not have its own AI chatbot running on Llama 2. Instead, the language model is open-source, and third-party developers are encouraged to create their own interfaces (such as chatbots) for it. Chatbots are not the only use case or type of interface for an LLM. However, HuggingChat and llama2.ai are clear public favourites.
What does LLM mean?
LLM (large language model) refers to the model itself, which includes the parameters (the learned weightings that encode contextual understanding) and the algorithm used for NLP (natural language processing). The training data set is not strictly part of the LLM, and training can be a one-time or iterative process. ChatGPT itself was fine-tuned through a process called Reinforcement Learning from Human Feedback (RLHF), but the training process, whether pre-training or fine-tuning, is also not strictly part of the model. It is how the model was arrived at.
Language models use a deep learning architecture called transformer architecture. This is where the GPT (Generative Pre-trained Transformer) series gets its name.
A transformer model is a neural network built around a self-attention mechanism and word embeddings (numeric representations of tokens). Some transformer models read text bidirectionally through an encoder, while others are auto-regressive, as in the case of GPT-3 and GPT-4, generating text one token at a time based on the tokens that came before.
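To make "auto-regressive" a little more concrete, here is a minimal Python sketch. It is not how GPT works internally: the `NEXT_TOKEN_SCORES` table is made-up illustrative data, and this toy looks only at the single previous token, whereas a real LLM conditions on the whole preceding context. What it does show is the loop itself, where each generated token is fed back in to choose the next one.

```python
# Toy auto-regressive generation: every new token is chosen from what came before.
# NEXT_TOKEN_SCORES is hypothetical, hand-written data, not a trained model.
NEXT_TOKEN_SCORES = {
    "<start>": {"the": 0.9, "a": 0.1},
    "the": {"model": 0.6, "token": 0.4},
    "model": {"predicts": 0.7, "<end>": 0.3},
    "predicts": {"the": 0.2, "text": 0.8},
    "text": {"<end>": 1.0},
}


def generate(max_tokens=10):
    """Greedily pick the highest-scoring next token until <end> or the limit."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        # Unknown tokens fall back to ending the sequence.
        candidates = NEXT_TOKEN_SCORES.get(tokens[-1], {"<end>": 1.0})
        next_token = max(candidates, key=candidates.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)  # the output becomes part of the next input
    return tokens[1:]  # drop the <start> marker


print(" ".join(generate()))
```

Real chatbots usually sample from the scores rather than always taking the top one, which is part of why the same prompt can produce different answers.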
Transformer models are not exclusively used to create LLMs, but they are well suited for the task. One example of an earlier transformer model is BERT (Bidirectional Encoder Representations from Transformers), released by Google in 2018 with up to 340 million parameters. This is already considered very small by today’s standards.
Rick Merritt of NVIDIA, the hardware manufacturer enabling the AI industry, explains that “Transformer models apply an evolving set of mathematical techniques, called attention or self-attention, to detect subtle ways even distant data elements in a series influence and depend on each other.”
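The idea of self-attention can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a production implementation: the function names are our own, the input vectors are invented, and the learned query, key, and value projections that a real transformer applies are left out, so each vector plays all three roles.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = vectors."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score this position against every position, near or far.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Each output is a weighted blend of all the input vectors.
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Three made-up 2-dimensional "token" vectors; the first and last are alike.
outputs, weights = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print(weights[0])
```

In this example the first vector gives as much weight to the third vector as to itself, and less to its immediate neighbour, because attention depends on similarity rather than distance.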
Transformer architecture succeeds a long line of recurrent neural network (RNN) technologies. Transformer-based LLMs supersede LSTM (Long Short-Term Memory) models, which OpenAI chief scientist Ilya Sutskever was working with prior to co-founding OpenAI in 2015.
What is an LLM?
An LLM is a type of AI for text generation. It takes an input text prompt and generates an output (also of text). So what’s the point?
Well, the input could be just a sentence or two, with an output of hundreds or thousands of words. Not only can it speed up your writing process, it can also summarize other people’s text, reformat data, help produce Word, PowerPoint, and Excel documents, and write in languages that you don’t speak yourself, like code!
Large language models are an example of natural language processing. The neural network of an LLM is trained on billions of words – broken down into tokens – and, through a process of machine learning, an artificial understanding of these words (and the relationship between them) is built up.
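As a rough picture of what "broken down into tokens" means, the Python sketch below splits a sentence into tokens and gives each distinct token a numeric ID. It is a deliberate simplification: the `tokenize` and `build_vocab` helpers are hypothetical names of our own, and real LLMs typically use subword tokenizers (such as byte-pair encoding) with vocabularies fixed before training, rather than splitting on whole words.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def build_vocab(tokens):
    """Assign each distinct token the next free integer ID, in order of appearance."""
    vocab = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab


tokens = tokenize("The cat saw the other cat.")
vocab = build_vocab(tokens)
ids = [vocab[token] for token in tokens]

print(tokens)
print(ids)
```

Repeated words map to the same ID, so the model only ever sees a sequence of numbers, and it is the relationships between those numbers that training has to learn.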
The degree of relational understanding is captured in the model’s parameters, the numeric weights learned during training. Where a token can be thought of as the neuron in a human brain, the parameters are the synapses, the connections in between. Without connections, you have a static database of information. With connections, you have a contextual understanding of that information: a neural network that can understand why you’re asking your questions, and even educate you about what you don’t know that you don’t know!
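To see why parameter counts climb so quickly, it helps to count the connections in a tiny network. The Python sketch below assumes a plain fully connected network, which is far simpler than a transformer, and the layer sizes are invented for illustration. Each layer contributes one weight per connection between its inputs and outputs, plus one bias per output.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of units per layer, input first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection
        total += n_out         # one bias per output unit
    return total


# Hypothetical tiny network: 4 inputs, 8 hidden units, 2 outputs.
print(count_parameters([4, 8, 2]))  # (4*8 + 8) + (8*2 + 2) = 58
```

Even this toy has 58 parameters for 14 units, because parameters scale with the connections between units rather than with the units themselves.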
What can GPT do?
An LLM is capable of language translation, fluency in any programming language (if included in the training data), writing and text summarization, human-like conversation, classification, sentiment analysis, and inference beyond that of a search engine.
They do, however, require far more computational resources to run than a search engine. This is because an AI is not simply an IR (Information Retrieval) system, but a generative one. AI creates new data. There is no guarantee that this exact data was not previously produced by someone, somewhere, sometime before. Confirming that would require checking against all human literature, including material that is not digitally archived and material on the internet outside of the World Wide Web. Neither AI nor you have access to all that! All generative AI really means is that the responses are not pre-written. The datasets were used as training wheels, but now the LLM program is self-reliant.
Do large language models work?
Yes, large language models work! These artificial intelligence systems have already passed exams designed for humans: OpenAI reports that GPT-4 passed a simulated bar exam, and LLMs have reportedly passed engineering and doctoral-level exams as well. While the bachelor of laws title is still exclusive to humans, technically speaking, current AI could hold a law degree.
LLMs have a great many use cases, and new specific tasks and applications are discovered seemingly by the day!