Retrieval Augmented Generation (RAG) is an important technique in generative AI: it allows relevant context to be included with the prompt that is sent to the LLM.
In this blog post, I am going to build a simple RAG application using Oracle Database 23ai Vector Search and an OpenAI LLM, with Oracle Database 23ai serving as the vector store.
I will show you all the steps one by one in a Jupyter Notebook.
1. First, import the libraries and modules that we need for this application.
2. Next, define a helper function that attaches metadata to each chunk.
3. Load the environment variables and connect to Oracle Database 23ai with the credentials and connection string. I have Oracle Database 23ai installed locally on my laptop.
Sketches of these three setup cells follow.
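Here is a minimal sketch of the first cell (the imports). The exact module paths depend on the LangChain version installed; I am assuming recent langchain-core, langchain-community (which provides the OracleVS integration), langchain-openai, langchain-text-splitters, python-oracledb, python-dotenv, and pypdf packages.

```python
# Libraries and modules used throughout the notebook
import os

import oracledb                                   # python-oracledb driver for Oracle Database 23ai
from dotenv import load_dotenv                    # loads credentials from a .env file
from pypdf import PdfReader                       # reads the PDF and extracts its text

from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_community.vectorstores.oraclevs import OracleVS
from langchain_community.vectorstores.utils import DistanceStrategy
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
```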
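For step 2, the helper can be as small as wrapping each chunk in a LangChain Document and carrying an id in the metadata. The function name and the metadata fields below are illustrative, not the exact code from my notebook.

```python
def chunk_to_document(chunk_id: int, text: str) -> Document:
    """Wrap a text chunk as a Document and attach metadata (here just an id)
    so every row written to the database table can be traced back to its chunk."""
    return Document(page_content=text, metadata={"id": str(chunk_id)})
```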
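And for step 3, load the credentials from a .env file and open a connection with python-oracledb. The environment variable names and the DSN below are placeholders for my local installation; adjust them to your own setup.

```python
load_dotenv()  # expects OPENAI_API_KEY plus the Oracle credentials below in a .env file

# Hypothetical variable names -- use whatever matches your environment
username = os.environ["ORACLE_USERNAME"]
password = os.environ["ORACLE_PASSWORD"]
dsn      = os.environ["ORACLE_DSN"]       # e.g. "localhost:1521/FREEPDB1" for a local 23ai instance

connection = oracledb.connect(user=username, password=password, dsn=dsn)
print("Connected to Oracle Database 23ai")
```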
1. Load the document. (I downloaded oracle-database-23c-new-features-guide.pdf into the same directory to use it for this application.)
2. Transform the document to text.
3. Split the text into chunks.
Add metadata, such as an id, to each chunk for the database table. (Steps 1 to 3 are sketched together in the first code block after this list.)
4. Set up Oracle AI Vector Search and insert the embedding vectors. I am using OpenAI embeddings here to embed the chunks as vectors and store them in Oracle Database 23ai (second sketch after this list).
5. Build the prompt to query the document (third sketch after this list):
Take the user question.
Set up the OpenAI LLM to generate the response. I used the gpt-3.5-turbo model, but you can use any LLM.
Build the prompt template to include both the question and the context, and instantiate the retriever over the knowledge base to fetch context from Oracle Database 23ai.
6. and 7. The last two steps are to build and invoke the chain.
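First, steps 1 to 3 might look like the following, building on the helper and imports defined earlier. The chunk size and overlap are arbitrary values chosen for illustration; tune them for your document.

```python
# 1. Load the document (the PDF sits in the same directory as the notebook)
pdf = PdfReader("oracle-database-23c-new-features-guide.pdf")

# 2. Transform the document to text
text = "".join(page.extract_text() for page in pdf.pages)

# 3. Split the text into chunks, then wrap each chunk as a Document carrying an id
splitter = CharacterTextSplitter(separator="\n", chunk_size=800, chunk_overlap=100)
chunks = splitter.split_text(text)
docs = [chunk_to_document(i, chunk) for i, chunk in enumerate(chunks)]
print(f"Created {len(docs)} chunks")
```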
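Next, for step 4, the OracleVS integration in langchain_community can create the table, embed the chunks with OpenAI embeddings, and insert the vectors in one call. The table name and distance strategy here are my choices for the sketch; check the OracleVS documentation for the options available in your version.

```python
# 4. Embed the chunks with OpenAI and store them in Oracle Database 23ai
embeddings = OpenAIEmbeddings()  # reads OPENAI_API_KEY from the environment

vector_store = OracleVS.from_documents(
    docs,
    embeddings,
    client=connection,               # the oracledb connection opened earlier
    table_name="RAG_DEMO_VECTORS",   # hypothetical table name
    distance_strategy=DistanceStrategy.DOT_PRODUCT,
)
```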
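Finally, step 5 sets up the question, the LLM, the prompt template, and the retriever. The template wording is mine; any prompt that exposes {context} and {question} placeholders will work.

```python
# 5. Build the pieces of the prompt
user_question = "Tell me more about AI vector search"

# The LLM that will generate the answer (any chat model can be swapped in)
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Prompt template that combines the retrieved context with the user question
template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# Retriever over the knowledge base stored in Oracle Database 23ai
retriever = vector_store.as_retriever()
```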
The chain is the key part of the RAG application: it is the LangChain pipeline that ties all the components together to produce an LLM response grounded in the retrieved context.
The chain embeds the question as a vector and uses that vector to search for similar vectors in the database; the most similar ones are returned as text chunks (the context). Together, the question and the context form the prompt that the LLM processes to generate the response. See the code below.
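Here is a sketch of the chain, consistent with the step-by-step breakdown that follows; it is the standard LangChain Expression Language (LCEL) pattern.

```python
# 6. Build the chain: retrieve context, fill the prompt, call the LLM, parse the output
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# 7. Invoke the chain with the user question
response = chain.invoke(user_question)
print(response)
```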
The code above sets up a processing pipeline in which user_question is processed sequentially by the retriever, the prompt, the llm, and StrOutputParser, with each step performing some transformation on its input. The final result is stored in the variable response.
1. {"context": retriever, "question": RunnablePassthrough()}:
This is the first step in the chain. It is a dictionary that maps the key "context" to the retriever, which fetches the most similar chunks from the vector store, and the key "question" to RunnablePassthrough(), which forwards the user question unchanged.
2. | prompt:
The | operator chains the output of the previous step into the prompt object: the retrieved context and the question are substituted into the prompt template.
3. | llm:
Similarly, the output of the previous step (the completed prompt) is passed as input to the llm, which generates the answer.
4. | StrOutputParser():
Finally, the output of the llm is passed through StrOutputParser, which converts the model output into a plain string.
5. response = chain.invoke(user_question):
This line invokes the entire chain with the user question as input and assigns the output to the variable response.
We are done here. As you can see in the output, the chain answers the question "Tell me more about AI vector search" using content from the PDF.
You can ask any other question related to this PDF, or load a different PDF and ask questions about it.