Navigating the LLM Evolution: From Fine-Tuning to 'Context Jukeboxes'
Discover how companies amplify the intelligence of LLMs with their own data
The Quest to Boost LLM Intelligence
Generative AI, which has entered the public consciousness thanks to products like ChatGPT and Midjourney, is a new frontier in computing as well as in B2B and B2C solutions. Now that people know their software can speak their language, it has become almost a necessity for every tool and app to have an AI component attached.
As a result, companies are working feverishly to amplify the capabilities of Large Language Models (LLMs) like GPT-4. Imagine an LLM as a talented musician, perfecting their craft. With the right techniques (and direction), they can master any tune, new or old. They can reproduce, improvise, and adapt.
Let’s explore the primary methods you can use to get more out of an LLM. I’ll use the popular Llama 2 model as a super clear graphical aid for each method.
Fine-Tuning: Tailoring the Maestro's Repertoire
What it is: Fine-tuning is like introducing a classical musician to the world of jazz. You're refining their skills, guiding them to navigate the unique rhythms and beats of a new genre. They already have a huge wealth of knowledge of music theory and technique, but you're adding specific new content to that knowledge.
Real-life Business Case: A bank wishes to offer a chatbot service that not only answers generic financial questions but specifically caters to its unique financial products and services. They take a base LLM that understands finance in general and fine-tune it on their services, products, techniques, and other proprietary data, ensuring the bot understands and promotes their specific offerings the right way, when the right questions are asked.
The Power: Fine-tuning provides businesses with a proprietary, laser-focused LLM, capable of addressing very specific needs. It's the difference between a generic solution and a bespoke one.

Data is ingested into Llama for Fine Tuning
How It’s Done:
Step 1: Evaluate the Base LLM - Start with a comprehensive assessment of your existing LLM to understand its capabilities and weaknesses.
Step 2: Identify Specific Needs - Pinpoint the unique requirements of your business that the current LLM doesn't cater to.
Step 3: Gather Relevant Data - Assemble a robust dataset that captures these specific needs. This might include unique business terminologies, product specifics, and other proprietary information.
Step 4: Retrain the LLM - Feed this data into the LLM, allowing it to adjust its weights and learn from the new information (see the sketch after these steps).
Step 5: Test & Refine - Deploy the fine-tuned LLM in controlled environments to ensure it's addressing the specific needs identified. Refine as needed based on feedback.
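To make Step 4 concrete, here's a minimal sketch of parameter-efficient fine-tuning using the Hugging Face transformers, datasets, and peft libraries. The Llama 2 checkpoint, the bank_products.jsonl file, and every hyperparameter here are illustrative assumptions, not a production recipe.

```python
# A minimal fine-tuning sketch with LoRA adapters (peft) on a causal LM.
# Model name, dataset file, and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

base_model = "meta-llama/Llama-2-7b-hf"   # assumes you have Llama 2 access
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # Llama has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base_model)

# Train small LoRA adapters instead of all weights -- cheaper, and the
# base model stays untouched.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16,
                                         task_type="CAUSAL_LM"))

# Step 3's output: proprietary examples, one {"text": ...} record per line.
data = load_dataset("json", data_files="bank_products.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                     max_length=512))

trainer = Trainer(
    model=model,
    train_dataset=data,
    args=TrainingArguments(output_dir="finetuned-bank-bot",
                           num_train_epochs=3,
                           per_device_train_batch_size=4),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()                               # Step 4: retrain
model.save_pretrained("finetuned-bank-bot")   # then evaluate per Step 5
```

LoRA trains a small set of added weights rather than the whole model, which keeps domain fine-tuning affordable and leaves the base model intact.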
Retrieval Augmented Generation (RAG): The Maestro's Sheet Music Collection
What it is: RAG is like handing the musician a collection of sheet music to pick and choose from. Even if they haven't practiced a particular piece recently, or even at all, they can quickly reference it to deliver a stellar performance. They don’t need to learn all the details, just reference notes and words as needed.
Real-life Business Case: An IT company provides support to clients for a wide range of products. Instead of retraining their chatbot every time a new product is launched, they store detailed product information in a retrievable format. When a customer asks about the latest product, the LLM uses RAG to quickly fetch the right data and answer the query.
Likewise, in the same scenario, the company may not want Customer A to see the information that Customer B sees. Fine-tuning would bake all of that data into a single model shared by every user, but RAG allows specific context to be made available only for specific purposes, so different customers get different responses.
The Power: With RAG, businesses can ensure their LLMs are always equipped with the most updated and relevant information, making their responses timely and accurate.

Selected data is referenced alongside Llama without ingesting
How It’s Done:
Step 1: Understand Your LLM's Limitations - Recognize what current knowledge the LLM holds and where gaps exist.
Step 2: Prepare the New Data - Gather the new information you wish to make available for the LLM to retrieve. Organize it in a structured, easily searchable manner.
Step 3: Calculate Embeddings - Use an embedding model to create embeddings for this new data, essentially translating it into a numerical format that can be searched quickly and accurately.
Step 4: Store Vectors - Save these embeddings (or vectors) in a specialized database designed for rapid retrieval.
Step 5: Augment Responses - When the LLM needs to answer a query related to the new data, the system fetches the closest-matching vectors and feeds the associated text into the prompt so the LLM can deliver a well-informed response (see the sketch below).
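Here's a minimal end-to-end sketch of those steps, using the sentence-transformers library for embeddings and plain numpy for retrieval. The product docs and the query are made up, and a real deployment would swap the in-memory array for a proper vector database (more on those below).

```python
# A minimal RAG sketch: embed docs, retrieve the closest match, build a prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Steps 2-4: embed the product docs and keep the vectors for retrieval.
docs = [
    "WidgetPro 3000: enterprise firewall, launched Q3, supports IPv6.",
    "WidgetLite: entry-level router for home offices.",
    "CloudSync: managed backup service with hourly snapshots.",
]
doc_vectors = embedder.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    """Step 5: fetch the k docs whose vectors sit closest to the query."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q          # cosine similarity (vectors normalized)
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

query = "Does your newest firewall support IPv6?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)   # this augmented prompt is what gets sent to the LLM
```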
Few-Shot Prompts: The Maestro's Improvisation
What it is: Few-shot prompts give the musician a starting note or rhythm, encouraging them to weave an entire composition around it, drawing from their extensive training.
Real-life Business Case: A marketing agency needs an LLM to generate creative taglines for different products. Instead of exhaustive retraining, they provide a few example taglines (the 'shots'). From there, the LLM crafts a variety of fresh, innovative taglines based on the given inspiration. The marketer can further improvise and massage the output via other prompts, and even via other LLMs as needed.
The Power: Few-shot prompts offer flexibility, allowing businesses to get diverse outputs from their LLM (or multiple LLMs) without needing extensive retraining. It's about getting maximum creativity with minimal input.

Llama is provided with specific prompts providing context and structure
How It’s Done:
Step 1: Determine the Objective - Clearly identify what you want the LLM to achieve. For example, generating creative taglines for a product.
Step 2: Prepare Example Prompts - Craft a few high-quality examples (the 'shots') that are representative of the desired output.
Step 3: Feed the Prompts to the LLM - Provide these examples to the LLM, ensuring it understands the context and desired outcome.
Step 4: Generate Responses - Allow the LLM to use the examples as a foundation, building upon them to produce varied outputs.
Step 5: Refinement - Based on the results, refine the quality or specificity of your example prompts, or use further prompts, to get closer to your desired outputs on subsequent attempts.
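In code, a few-shot prompt is mostly careful string assembly. In this sketch the example taglines are the 'shots', the product names are invented, and `complete` stands in for whichever LLM client you actually call:

```python
# A minimal few-shot prompt sketch: the examples set the pattern, and the
# trailing line asks the model to continue it for a new product.
examples = [
    ("SolarBrew coffee maker", "Wake up with the sun in your cup."),
    ("TrailBlaze hiking boots", "Every path is yours to own."),
    ("AquaPure filter", "Clean water, zero compromise."),
]

def few_shot_prompt(product: str) -> str:
    shots = "\n".join(f"Product: {p}\nTagline: {t}" for p, t in examples)
    return f"{shots}\nProduct: {product}\nTagline:"

prompt = few_shot_prompt("NightOwl desk lamp")
print(prompt)
# tagline = complete(prompt)   # Step 4: send to your LLM of choice
```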
Where It Gets More Fun
Obviously the actual process of doing this, and the science behind it, is far more complex than I could explain here. That said, new tools and innovations arrive every day to expand on these principles. For example, check out the 'finetune-embedding' library from LlamaIndex, which lets you use synthetic data to fine-tune the embedding model itself so your RAG retrieval improves later!
Vector Databases — Where Embeddings Get Stored
Vector databases act like specialized storage systems for high-dimensional data. In the context of making LLMs smarter with our own data, think of these databases as vast libraries. Instead of books or articles, they store unique mathematical representations of information, known as vectors. When LLMs need to access specific chunks of data or knowledge, they query these vector databases, seeking the closest matching vectors, which correspond to the most relevant pieces of information. It's like the Dewey Decimal system, except it catalogs every word and sentence rather than whole books.

God I’m old.
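To make the 'library' analogy concrete, here's a tiny sketch of the lookup a vector database performs under the hood, using FAISS as one example of such an index. The vectors here are random stand-ins for real embeddings:

```python
# A tiny illustration of vector-database lookup using a FAISS index.
import faiss
import numpy as np

dim = 384                        # dimensionality of the embeddings
index = faiss.IndexFlatL2(dim)   # exact nearest-neighbor index

# Random stand-ins for real embeddings -- each row is one "book" on the shelf.
vectors = np.random.rand(1000, dim).astype("float32")
index.add(vectors)

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 3)   # find the 3 closest "books"
print(ids[0])   # row numbers of the most relevant stored vectors
```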
Crafting Your LLM Symphony
As the maestro of your LLM journey, understanding and leveraging these techniques can set your business apart. Whether you're refining the repertoire, expanding the sheet music collection, or honing the art of improvisation, your LLM can be fine-tuned to deliver optimal results. Ready to elevate your business with LLMs? I'm here to guide the way.