Learning RAG with LlamaIndex

Deep Gan Team

Intro

With the advent of powerful foundational LLMs, developers can harness this technology in new and novel applications. However, current LLMs have limitations, such as a hard maximum on input size. In addition, some problems may require the LLM to be queried multiple times or in different ways to obtain the desired information. Finally, once users obtain the information they want, they may want to format it in a specific way before presenting it to their customers. Software tools and frameworks like LlamaIndex were developed to help developers address these types of limitations and create better applications for the end user.

Background

While the hard limit on the number of input tokens that LLMs accept has been increasing, LLMs are still restricted by it. Oftentimes, users want to provide an LLM with more information than it can receive. For example, if an LLM can only process 1,000 tokens, a user cannot ask it to answer a question about a 2,000-token article out-of-the-box. In that case, the LLM may simply ignore the first 1,000 tokens of the article and only use the last 1,000 tokens to answer the provided query. One possible solution for documents longer than 1,000 tokens is to split the document into two separate 1,000-token chunks, as in the sketch below. Frameworks like LlamaIndex make it easy to process long documents in this manner and open the door to more complex tasks.
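
A minimal sketch of that chunking step (import paths follow the LlamaIndex 0.10+ layout; older releases expose the same classes directly under llama_index):

from llama_index.core import Document
from llama_index.core.node_parser import TokenTextSplitter

# A stand-in for an article that exceeds the model's context window.
long_article = Document(text="... a 2,000-token article ... " * 200)

# Split into chunks of at most 1,000 tokens, with a small overlap so
# sentences cut at a chunk boundary still appear whole in one chunk.
splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents([long_article])
print(len(nodes))  # each node now fits within the token limit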

In addition to the token limit, model retraining or fine-tuning may not be an option. As LLMs grow larger, the cost of fine-tuning a model often increases correspondingly. To combat this scaling issue, methods that avoid retraining become desirable.

LLMs are typically trained on a large corpus of data collected from a variety of data sources. Rather than general questions, the tasks expected of applications are more likely targeted questions about specific input documents. One such task is summarization of a document, as in the sketch below; another may be revision of a paragraph or article.
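
For instance, a whole-document summary can be produced with a SummaryIndex. A minimal sketch, assuming a hypothetical ./data folder of documents and an OPENAI_API_KEY for the default models:

from llama_index.core import SimpleDirectoryReader, SummaryIndex

documents = SimpleDirectoryReader("./data").load_data()
index = SummaryIndex.from_documents(documents)

# tree_summarize folds chunk-level summaries into one final answer.
query_engine = index.as_query_engine(response_mode="tree_summarize")
print(query_engine.query("Summarize this document in three sentences."))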

What’s LlamaIndex?

LlamaIndex, short for Large Language Model Index, is an open-source project designed to simplify and enhance the integration of large language models (LLMs) with data sources.

Figure 1: Image from LlamaIndex

It prepares data sources for LLMs. One application is Retrieval-Augmented Generation (RAG), which combines information retrieval and text generation to produce more accurate and contextually relevant responses. Current LLMs are quite powerful; LlamaIndex allows developers to plug their various forms of data into that power and cater to more specific use cases. In this blog post, we will learn about and demo some examples of LlamaIndex in real-world applications.

What Problems Does LlamaIndex Solve?

Here are some of the problems that users may face:

  • Data types and their integration. A user’s data is often scattered across various forms, for example, databases, APIs, and file systems. LlamaIndex can prepare and process these data into tokens for an LLM’s input. Because the data are user specific, they can then be used for training and inference. LlamaIndex is also able to store the data as vector embeddings within an index for the purpose of querying it.
  • When looking through a large corpus, the information relevant to your specific problem is often a small section within a large piece of text; the vast majority of the data is irrelevant to the problem at hand. A straightforward approach is to ignore the irrelevant data and focus only on the relevant data. Tools like LlamaIndex can help developers identify and rank the most relevant data for the task.
  • Once the most relevant documents have been retrieved from the index, LlamaIndex can send them to an LLM to synthesize a response for the user, as in the end-to-end sketch below.
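
Putting those three steps together, a minimal RAG pipeline might look like the following sketch (a hypothetical ./data folder of documents and an OPENAI_API_KEY for the default embedding model and LLM are assumed):

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Ingest: load scattered files into Document objects.
documents = SimpleDirectoryReader("./data").load_data()

# Index: embed the documents and store the vectors for retrieval.
index = VectorStoreIndex.from_documents(documents)

# Retrieve + synthesize: fetch the 3 most relevant chunks and have
# the LLM answer from them.
query_engine = index.as_query_engine(similarity_top_k=3)
print(query_engine.query("What does the report conclude?"))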

Vector Data Indexing and Storage

Structured, unstructured, and programmatic data can all be loaded into the LlamaIndex system. There, the data is used to create numerical representations, called vector embeddings, that both the LLM and LlamaIndex understand. LlamaIndex also converts queries into embeddings. The database indexes the data on these embeddings, so related information can be retrieved efficiently.
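
A small sketch of that embedding step, using the default OpenAI embedding model (an OPENAI_API_KEY is assumed), plus persisting the resulting index so documents are not re-embedded on every run:

from llama_index.core import (
    Document,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)
from llama_index.embeddings.openai import OpenAIEmbedding

# Text (and, at query time, the query itself) becomes a vector.
embed_model = OpenAIEmbedding()
vector = embed_model.get_text_embedding("LlamaIndex turns text into vectors.")
print(len(vector))  # e.g. 1536 dimensions for the default model

# The index stores these embeddings and can persist them to disk.
index = VectorStoreIndex.from_documents([Document(text="Hello, vectors.")])
index.storage_context.persist(persist_dir="./storage")
index = load_index_from_storage(
    StorageContext.from_defaults(persist_dir="./storage")
)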

The Concept of Agents

In LlamaIndex, an “agent” is the engine doing the work: an automated reasoning and decision engine. An agent interprets a user’s input or query, works through a series of internal decisions, and returns results. Here are examples of what an agent can do (a runnable sketch follows the list):

  • Breaking down a complex question into smaller ones
  • Choosing an external Tool to use and coming up with parameters for calling the Tool
  • Planning out a set of tasks
  • Adding memory modules to RAG
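
As a minimal sketch of the second bullet, an agent choosing a Tool and filling in its parameters (ReActAgent and FunctionTool come from the LlamaIndex core, import paths again following the 0.10+ layout; an OPENAI_API_KEY is assumed):

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI

def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the result."""
    return a * b

# Wrap a plain Python function as a Tool the agent can decide to call.
multiply_tool = FunctionTool.from_defaults(fn=multiply)

# The ReAct loop reasons about the question, picks the tool, and
# supplies its arguments before producing an answer.
agent = ReActAgent.from_tools(
    [multiply_tool], llm=OpenAI(model="gpt-3.5-turbo"), verbose=True
)
print(agent.chat("What is 1234 times 4567?"))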

LlamaHub provides over 40 agent tools. You can also build your own customized agents. Here are some examples:

Figure 2: Image from LlamaIndex

Fine-Tuning the Model

Fine-tuning serves the need for customizable models built around specific use cases and data requirements. It updates the large language model itself by training it over a set of data to improve the model in more specific ways. This can improve the quality of outputs, reduce hallucinations, and reduce latency and cost.

LlamaIndex has a set of toolkits for using an LLM directly in inference mode, but it can also train the model on external data to fine-tune it.

Fine tuning can show the following benefits:

  • Adapting to the style of a specific dataset
  • Learning more domain-specific topics through additional training data, for example, a DSL such as SQL
  • Reducing hallucinations and errors
  • Distilling larger models into smaller models (e.g., GPT-4 → gpt-3.5-turbo)

Fine-tuning through LlamaIndex’s OpenAIFinetuneEngine requires an OpenAI API key:

from llama_index.finetuning import OpenAIFinetuneEngine

# Assumes OPENAI_API_KEY is set and finetuning_events.jsonl holds the
# prompt/completion training examples.
finetune_engine = OpenAIFinetuneEngine(
    "gpt-3.5-turbo",            # base model to fine-tune
    "finetuning_events.jsonl",  # training data file
    # start_job_id="<start-job-id>"  # if you have an existing job, can specify id here
)
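
Once the engine is constructed, the job can be started and the tuned model retrieved; per the LlamaIndex fine-tuning docs:

finetune_engine.finetune()  # kicks off the OpenAI fine-tuning job
ft_llm = finetune_engine.get_finetuned_model(temperature=0.3)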

You can view the fine-tuning tutorial in this Google Colab.

LlamaIndex Example Applications

You can use LlamaIndex for a number of user-facing applications such as chatbots, question answering, and structured data extraction. For these applications, users can provide either structured or unstructured data. Tools like LlamaIndex ingest the data and generate answers based on it. LlamaIndex uses the LLM to transform the provided data, across multiple steps, into a more compact representation for other tasks. You can also use tools like LlamaIndex to build agents, objects that can take some sort of action; for example, an agent can ingest a prompt and act in response. A chatbot over your own documents, for instance, takes only a few lines (see the sketch below). Overall, tools like LlamaIndex can help users more fully utilize LLMs to accomplish their goals.
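
As a closing sketch, a small chatbot over a hypothetical ./data folder; the condense_question chat mode rewrites each follow-up into a standalone query before retrieval:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

index = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("./data").load_data()
)

# Each user turn is condensed into a standalone question, matched
# against the index, and answered with the retrieved context.
chat_engine = index.as_chat_engine(chat_mode="condense_question")
print(chat_engine.chat("What topics do these documents cover?"))
print(chat_engine.chat("Can you expand on the second one?"))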


Deep Gan Team

We’re a team of Machine Learning Engineers exploring and researching deep learning technologies.