What are Large Language Models (LLMs) and How Do They Work?

Generative AI has changed the game, and now with advances in large language models (LLMs), AI models can have conversations, create scripts, and translate between languages. Let’s dive in!

We’ve all agonized over and overanalyzed a text message or email, trying to figure out whether the sender is upset. What if AI tools could help you figure it out? I asked ChatGPT:

Is this sentence happy or angry: “I don’t want to go outside today because it’s too hot.”

It responded: “The sentence conveys a negative sentiment, suggesting reluctance or dissatisfaction with the idea of going outside due to the heat. While it doesn’t explicitly express anger, it does express a negative emotion, likely leaning towards a feeling of annoyance or discomfort rather than happiness.”

This is called sentiment analysis, and it’s just one of the many abilities of large language models (LLMs). While it might still be hard for these models to pick up on nuance or the passive-aggressive undertones we sometimes add to our messages, they’re pretty skilled at analyzing and understanding the emotions in our text.
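If you’d like to see what sentiment analysis looks like in code, here’s a minimal sketch using the open-source Hugging Face transformers library. This is an assumption on our part, since the example above used ChatGPT, but any pre-trained sentiment model demonstrates the same idea:

```python
# A minimal sentiment analysis sketch with the Hugging Face
# transformers library (assumed here for illustration).
from transformers import pipeline

# Downloads a small pre-trained sentiment model on first run
classifier = pipeline("sentiment-analysis")

result = classifier("I don't want to go outside today because it's too hot.")
print(result)  # e.g. [{'label': 'NEGATIVE', 'score': 0.99...}]
```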

So…what exactly are large language models, and how can we use them to understand ourselves better?

What is a Large Language Model (LLM)?

Large language models (LLMs), a kind of generative AI, are a type of foundation model that focuses on generating language. Foundation models are machine learning models trained to receive natural language inputs (or prompts) and then generate an output. Generative AI tools utilize different types of foundation models for content creation. For example, if you give a large language model the prompt “Create a TV script about a group of six friends living in New York City,” it wouldn’t regurgitate an episode of Friends (or its predecessor, Living Single). Instead, it would create a new show, based on the data it was trained on.

Foundation models are based on neural networks and can be trained for different purposes, like generating text, images, and audio. In the case of large language models, they’re typically trained on massive amounts of data so they can specifically generate text and focus on language tasks.

You’ve completed language tasks many times already, even if you didn’t realize you were doing it. In the early 2000s, you might’ve used Ask Jeeves, a search engine that relied on asking natural-language questions and getting an answer from its well-dressed mascot, Jeeves. If you wanted to translate a book from one language to another, you could find the original text and copy and paste it into Google Translate. With advancements in AI, machine learning, and deep learning, large language models can do all of this and more, with increased accuracy.

LLMs like OpenAI’s ChatGPT answer your questions within the platform. Because of the way the AI is trained on existing data available online, you can prompt ChatGPT with a statement like “Translate the first sentence of the book The Little Prince into Spanish,” and the LLM will find the content and generate the translation without you having to provide the original content to be translated. LLMs can perform tons of different language tasks, including helping developers with writing and debugging code.
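For the curious, here’s a rough sketch of what that same translation prompt looks like when sent through OpenAI’s Python SDK instead of the ChatGPT website. It assumes the openai package (version 1 or later) is installed and an API key is set; the model name is just one option among several:

```python
# A hedged sketch of prompting an LLM programmatically with
# OpenAI's Python SDK (v1+). Assumes OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",  # any available chat model works here
    messages=[{
        "role": "user",
        "content": "Translate the first sentence of the book "
                   "The Little Prince into Spanish.",
    }],
)
print(response.choices[0].message.content)
```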

How Do Large Language Models Work?

Large language models are trained on massive datasets. They work by using deep learning techniques to process, understand, and generate natural-sounding language. To understand how LLMs do this, we can examine a few key terms: natural language processing (NLP), tokens, embeddings, and transformers.

Natural Language Processing

Natural language processing (NLP) is a branch of AI and a machine learning technology that gives computers the ability to “understand” natural human languages. With NLP, computers can process, analyze, and interpret written and spoken language. Simply put, systems wouldn’t be able to understand any input text or generate any output data if they weren’t able to process the language requesting the data in the first place.

Natural language processing doesn’t start and end with commonly spoken languages like English, Spanish, or French. It also works with programming languages like JavaScript and Python, making it possible for AI systems, like GitHub Copilot, to generate and interpret code as well.

Tokens

For LLMs to handle text data, they first have to make sense of natural language. A question like, “When’s the first day of summer?” makes sense to you, but it might as well be gibberish to an AI model without NLP. To make sense of this sentence, AI models break the data down into smaller units called tokens.

Depending on the use case, tokens can represent a single character, a few characters, or an entire word. No matter what language you speak, there are pre-existing building blocks for processing language. We use letters, characters, pictograms, sounds, words, and phrases that we then build into larger thoughts and sentiments. To approximate the way our brains work, AI models rely on tokens to process language.

Tokens are common sequences of characters found in a set of text, and they bridge the gap between what humans understand and input as natural language and a format that the model can process. By breaking words into tokens, AI models can recognize patterns not only in the letters and words themselves, but also start to make associations for their context and relationships to one another.
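To make this concrete, here’s a small sketch using tiktoken, the open-source tokenizer OpenAI publishes for its GPT models (other LLMs use their own tokenizers, but the idea is the same):

```python
# Tokenizing a sentence with tiktoken, OpenAI's open-source tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models

tokens = enc.encode("When's the first day of summer?")
print(tokens)                             # a list of integer token IDs
print([enc.decode([t]) for t in tokens])  # the text chunk behind each ID
```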

Embeddings

In large language models, embeddings are the representations of tokens in a vector space (a mathematical space where complex ideas or objects are represented as lists of numbers). In this vector space, each dimension corresponds to a feature or attribute of the language in question. Microsoft describes embeddings as “the way that the model captures and stores the meaning and the relationships of the language, and the way that the model compares and contrasts different tokens…” Simply put, embeddings help models understand both the semantics (meaning) of tokens and the syntax (arrangement) between them.

Words that have similar meanings are represented by vectors that are closer together. For example, in the embedding space, vectors for “cat” and “dog” would be closer to each other than to the vector for “car”, despite the spellings for “cat” and “car” being similar. For syntax, embeddings might capture relationships between different forms of a word:

  • Verb tenses (look, looked, looking)
  • Singular-plural pairs (dog, dogs)
  • Comparative-superlative pairs (good, better, best)
  • Relationships between different parts of speech (run, runner)

Because embeddings group tokens that have similar semantics and syntax, they’re especially useful in natural language processing.
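Here’s a toy illustration of the “cat”/“dog”/“car” example in Python. The three-dimensional vectors below are made up purely for demonstration; real embeddings have hundreds or thousands of dimensions learned from data:

```python
# Toy embedding vectors, invented for illustration only.
import numpy as np

embeddings = {
    "cat": np.array([0.9, 0.8, 0.1]),
    "dog": np.array([0.85, 0.75, 0.2]),
    "car": np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    # 1.0 means pointing the same direction; near 0 means unrelated
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # high
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # lower
```

Despite “cat” and “car” looking similar as spellings, their vectors sit far apart, while “cat” and “dog” sit close together, which is exactly the behavior described above.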

With natural language processing, large language models can take huge steps toward understanding our words and generating new, human-like text in the blink of an eye, with a boost from transformer models.

Transformer Models

Transformers are the neural network architecture used in modern LLMs. A neural network is a machine learning model that makes decisions by trying to copy the complex way human brains process information. Transformer networks deal specifically with sequential data, or data that comes in order (like a sentence!), which makes them especially useful for LLMs. While older generative AI models would process each word in a sequence separately and in order, transformer models can process the entire sequence (or sentence) at the same time.

Neural networks work in a layered way and set up an intricate system of algorithms (essentially a set of rules) so they can process data, learn, and then improve. This system — plus the transformers’ ability to process entire sequences at once — makes it possible for transformers to track relationships within a sequence and learn its context and meaning.
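For readers who want to peek under the hood, here’s a heavily simplified sketch of the self-attention step at the heart of a transformer. Real models learn the query, key, and value projections during training; here they’re random matrices just to show the mechanics:

```python
# Simplified self-attention: every token scores every other token,
# so the whole sequence is processed at once instead of word by word.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                    # a 4-token "sentence", 8-dim embeddings

x = rng.normal(size=(seq_len, d_model))    # token embeddings
W_q = rng.normal(size=(d_model, d_model))  # learned in a real model
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = x @ W_q, x @ W_k, x @ W_v

scores = Q @ K.T / np.sqrt(d_model)               # token-to-token relevance
scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
output = weights @ V                              # context-aware representations

print(weights.round(2))  # how much each token "attends" to the others
```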

Transformers have revolutionized natural language processing, and give LLMs like ChatGPT and Gemini the ability to respond to prompts and generate text faster and more efficiently than ever before.

The TL;DR: Tokens are the chunks of text that computers use to process our natural human language. Embeddings help AI models understand the meaning of and relationships between these tokens. Together, tokens and embeddings give models natural language processing abilities, making it possible for LLMs to use transformer models to process, understand, and generate text.

Types of Large Language Models

Types of large language models are usually dictated by their training process. While some models can handle tasks without task-specific training data, others are fine-tuned on specialized data to perform particular tasks or functions. The common types of LLMs are zero-shot, fine-tuned (domain-specific), and multimodal.

Zero-Shot

A zero-shot model is an LLM that can perform tasks that it hasn’t been explicitly trained to do. Instead of using supervised learning or labeled examples, these models use the relationships and patterns they have learned to make their predictions or perform their tasks. An example of a zero-shot model is GPT-3. The accuracy of the outputs can vary, but these models can generate text and perform tasks that they haven’t seen before.

Imagine you’re a trained golfer, and you know your job is to use clubs to get the golf ball into the hole. Now, someone plops you on a basketball court, and you’ve never seen basketball played a day in your life. But, because of what you’ve learned about golf, you could make the connection that this (basket)ball is also supposed to go in the hole (hoop). This is how zero-shot models work: they can perform new tasks without needing a specific training example for each one.
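In code, zero-shot behavior is easy to demo with the Hugging Face transformers library. The model below was never trained on these specific labels; it generalizes from patterns it already knows (the model name is one common choice, not the only option):

```python
# Zero-shot classification: the labels are supplied at inference time,
# not during training.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

result = classifier(
    "The ball is supposed to go through the hoop.",
    candidate_labels=["basketball", "golf", "cooking"],
)
print(result["labels"][0])  # most likely label, e.g. "basketball"
```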

Fine-Tuned or Domain-Specific

In contrast, a fine-tuned or domain-specific LLM is exactly that – a model that has gone through extra training on specific data to complete specific tasks. These models first go through pre-training on large, general datasets, then they’re trained on domain-specific data so the model can learn more nuanced representations for performing domain-specific tasks. For example, consider OpenAI’s Codex, a code generation tool. Based on GPT-3 — the zero-shot model — Codex is a domain-specific model that’s been fine-tuned for use in programming applications.
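As a hedged sketch (not OpenAI’s exact Codex recipe, which isn’t public), here’s roughly what kicking off a fine-tuning job looks like with OpenAI’s Python SDK. The training file name is hypothetical; it would contain prompt-and-response examples from your domain:

```python
# A hedged sketch of starting a fine-tuning job via OpenAI's
# Python SDK (v1+). File name is hypothetical.
from openai import OpenAI

client = OpenAI()

# Upload domain-specific training examples (hypothetical file name)
training_file = client.files.create(
    file=open("legal_contracts_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Continue training a general pre-trained model on that data
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",  # a base model available for fine-tuning
)
print(job.id)
```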

Multimodal

LLMs were originally developed for language tasks, but they’ve since evolved into multimodal models that work across different data types. Multimodal models are trained on more than one kind of data, such as text and images. These models, like GPT-4, can process both text and image inputs and output text. They’re especially useful in image captioning and in text-image retrieval, where the model retrieves relevant images or text based on a prompt that includes both.
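Here’s a hedged sketch of a multimodal request: text plus an image in a single prompt, with text coming back out. The model name and image URL below are assumptions, so check what your account actually offers:

```python
# A multimodal (text + image in, text out) request via OpenAI's
# Python SDK. Model name and image URL are placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # a vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Write a caption for this image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```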

What are Large Language Models Used For?

Everything! Okay, that could be a bit of a stretch, but the reason LLMs are becoming so popular is their wide range of use cases in the real world. Large language models are used for:

  • Text generation: The ability to generate language or text in response to prompts, for example, writing short stories or guessing the next word to complete a sentence.
  • Code generation: The ability to generate code from natural language prompts. LLMs can code in programming languages like Python and JavaScript while also generating code for web design projects.
  • Translation: When LLMs are multilingual — trained in multiple languages — they can be used for translations from one language to another.
  • Sentiment analysis: The process of analyzing and understanding the sentiment (or emotion) in a text to determine whether it expresses positive, neutral, or negative feelings toward a particular subject.
  • Conversational AI assistants: AI chatbots are being deployed to enhance user experience across various tasks and activities, often answering questions through natural language conversations.
  • Content summarization: LLMs can summarize long pieces of text — articles, documents, books, etc. — and highlight their main points and relevant information (see the sketch after this list).
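As one example from the list above, here’s a quick summarization sketch using the Hugging Face transformers library (the default model it downloads is an assumption; you can specify another):

```python
# Content summarization with a pre-trained model; real inputs would
# be much longer than this snippet.
from transformers import pipeline

summarizer = pipeline("summarization")  # downloads a default model

article = """Large language models are trained on massive datasets.
They use deep learning techniques to process, understand, and
generate natural-sounding language, and they power chatbots,
translation tools, and coding assistants across many industries."""

print(summarizer(article, max_length=30, min_length=10)[0]["summary_text"])
```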

Almost any industry could benefit from utilizing an LLM in at least one way, if not more. Whether it’s healthcare, finance, entertainment, or retail, LLMs help streamline and automate tasks, provide customer support and assistance, personalize recommendations and suggestions, and support data-driven decision-making, freeing up human time to focus on other tasks or just making life easier.

Examples of LLMs

You’d probably heard of GPT before you clicked on this article. OpenAI’s GPT is probably the most popular LLM on the market, if not the most ubiquitous in the cultural zeitgeist. Just two months after launch, OpenAI’s generative AI tool ChatGPT already had 100 million monthly active users! But just because GPT is the most popular LLM doesn’t mean it’s the only one. There are dozens of available LLMs, and countless more in the making. Here’s a quick look at what’s available right now:

GPT

In 2018, OpenAI introduced its first version of GPT. It stands for “Generative Pre-trained Transformer” and refers to a family of LLMs. They’re trained on large amounts of data and, as you may have guessed, are based on the transformer neural network architecture. There have since been multiple versions of GPT, and each version gets faster and more powerful. As of early 2024, the latest model is GPT-4. This multimodal model has been integrated with DALL-E, OpenAI’s text-to-image generative tool, making ChatGPT capable of both text and image generation (a paid upgrade at the time of publishing).

Gemini

In early 2024, Google renamed its LLM chatbot from Bard to Gemini. Gemini is a family of large language models available in different sizes: Nano, Pro, and Ultra. While Google’s other LLMs, LaMDA and PaLM 2, used to power the Bard chatbot, they have since been replaced by the multimodal Gemini LLM. Gemini models focus on text and language tasks, but they can also handle images, audio, video, and code.

LLaMa 2

LLaMa 2 is Meta’s (formerly Facebook) transformer-based LLM, released in 2023. This family of LLMs is partially “open source”: while its code and model weights are accessible to researchers and developers, there are licensing restrictions. Compared to its LLaMa 1 predecessor, LLaMa 2 has been pre-trained on 40% more data and offers smoother conversational ability.

BERT

The AI researchers at Google developed BERT, or “Bidirectional Encoder Representations from Transformers.” This LLM is also based on the transformer architecture and can be fine-tuned for specific tasks, like text classification, question answering, and sentiment analysis. Unlike GPT, BERT is truly open source, meaning that researchers and developers can access its transformer architecture, pre-trained weights, and code implementations.

LLMs are Generating the Words of the Future

For a long time, artificial intelligence was smart, but not responsive. AI models were trained to do a particular job with defined rules. Generative AI has changed the game: with advances in LLMs, AI models can now have conversations, create scripts, and translate between languages. They can use sentiment analysis to try to figure out whether a customer’s words are happy or upset.

LLMs are revolutionizing business operations, but they’re doing more than that. They’re changing how people approach different tasks in their everyday lives. And as a developer, it’s not too late to jump into the world of generative AI and start taking advantage of its tools. Start with our Break Into Tech program to learn some of the most in-demand skills for AI careers. And now we even have a dedicated course on building complex web applications with Generative AI!

Jouviane Alexandre

After spending her formative years in the height of the Internet Age, Jouviane has had her fair share of experience in adapting to the inner workings of the fast-paced technology industry. Note: She wasn't the only 11-year-old who learned how to code when building and customizing her MySpace profile page. Jouviane is a professional freelance writer who has spent her career covering technology, business, entrepreneurship, and more. She combines nearly a decade’s worth of experience, hours of research, and her own web-building projects to help guide women toward a career in web development. When she's not working, you'll find Jouviane binge-watching a series on Netflix, planning her next travel adventure, or creating digital art on Procreate.