Retrieval Augmented Generation (RAG) is an important technique in generative AI: it allows relevant context to be included with the prompt that is sent to the LLM.
In this blog post, I am going to build a simple RAG application using Oracle Database 23ai Vector Search and an OpenAI LLM, with Oracle Database 23ai serving as the vector store.
I will show you all the steps one by one in a Jupyter Notebook.
1. First, import the libraries and modules that we need for this application.
2. Next, define a helper function that attaches metadata to each chunk.
3. Load the environment variables and connect to Oracle Database 23ai with the credentials and connection string. I have Oracle Database 23ai installed locally on my laptop.
Sketches of these three setup cells follow.
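Here is a minimal sketch of the first cell (the imports). The exact module paths depend on the LangChain version installed; I am assuming recent langchain-core, langchain-community (which provides the OracleVS integration), langchain-openai, langchain-text-splitters, python-oracledb, python-dotenv, and pypdf packages.

```python
# Libraries and modules used throughout the notebook
import os

import oracledb                                   # python-oracledb driver for Oracle Database 23ai
from dotenv import load_dotenv                    # loads credentials from a .env file
from pypdf import PdfReader                       # reads the PDF and extracts its text

from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_community.vectorstores.oraclevs import OracleVS
from langchain_community.vectorstores.utils import DistanceStrategy
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
```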
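For step 2, the helper can be as small as wrapping each chunk in a LangChain Document and carrying an id in the metadata. The function name and the metadata fields below are illustrative, not the exact code from my notebook.

```python
def chunk_to_document(chunk_id: int, text: str) -> Document:
    """Wrap a text chunk as a Document and attach metadata (here just an id)
    so every row written to the database table can be traced back to its chunk."""
    return Document(page_content=text, metadata={"id": str(chunk_id)})
```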
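And for step 3, load the credentials from a .env file and open a connection with python-oracledb. The environment variable names and the DSN below are placeholders for my local installation; adjust them to your own setup.

```python
load_dotenv()  # expects OPENAI_API_KEY plus the Oracle credentials below in a .env file

# Hypothetical variable names -- use whatever matches your environment
username = os.environ["ORACLE_USERNAME"]
password = os.environ["ORACLE_PASSWORD"]
dsn      = os.environ["ORACLE_DSN"]       # e.g. "localhost:1521/FREEPDB1" for a local 23ai instance

connection = oracledb.connect(user=username, password=password, dsn=dsn)
print("Connected to Oracle Database 23ai")
```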
1. Load the document. (I downloaded oracle-database-23c-new-features-guide.pdf into the same directory to use it for this application.)
2. Transform the document to text.
3. Split the text into chunks.
Add metadata, such as an id, to each chunk for the database table. (Steps 1 to 3 are sketched together in the first code block after this list.)
4. Set up Oracle AI Vector Search and insert the embedding vectors. I am using OpenAI embeddings here to embed the chunks as vectors and store them in Oracle Database 23ai (second sketch after this list).
5. Build the prompt to query the document (third sketch after this list):
Take the user question.
Set up the OpenAI LLM to generate the response. I used the gpt-3.5-turbo model, but you can use any LLM.
Build the prompt template to include both the question and the context, and instantiate the retriever over the knowledge base to fetch context from Oracle Database 23ai.
6. and 7. The last two steps are to build and invoke the chain.
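First, steps 1 to 3 might look like the following, building on the helper and imports defined earlier. The chunk size and overlap are arbitrary values chosen for illustration; tune them for your document.

```python
# 1. Load the document (the PDF sits in the same directory as the notebook)
pdf = PdfReader("oracle-database-23c-new-features-guide.pdf")

# 2. Transform the document to text
text = "".join(page.extract_text() for page in pdf.pages)

# 3. Split the text into chunks, then wrap each chunk as a Document carrying an id
splitter = CharacterTextSplitter(separator="\n", chunk_size=800, chunk_overlap=100)
chunks = splitter.split_text(text)
docs = [chunk_to_document(i, chunk) for i, chunk in enumerate(chunks)]
print(f"Created {len(docs)} chunks")
```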
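Next, for step 4, the OracleVS integration in langchain_community can create the table, embed the chunks with OpenAI embeddings, and insert the vectors in one call. The table name and distance strategy here are my choices for the sketch; check the OracleVS documentation for the options available in your version.

```python
# 4. Embed the chunks with OpenAI and store them in Oracle Database 23ai
embeddings = OpenAIEmbeddings()  # reads OPENAI_API_KEY from the environment

vector_store = OracleVS.from_documents(
    docs,
    embeddings,
    client=connection,               # the oracledb connection opened earlier
    table_name="RAG_DEMO_VECTORS",   # hypothetical table name
    distance_strategy=DistanceStrategy.DOT_PRODUCT,
)
```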
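Finally, step 5 sets up the question, the LLM, the prompt template, and the retriever. The template wording is mine; any prompt that exposes {context} and {question} placeholders will work.

```python
# 5. Build the pieces of the prompt
user_question = "Tell me more about AI vector search"

# The LLM that will generate the answer (any chat model can be swapped in)
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Prompt template that combines the retrieved context with the user question
template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# Retriever over the knowledge base stored in Oracle Database 23ai
retriever = vector_store.as_retriever()
```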
The chain is the key part of the RAG application: it is the LangChain pipeline that ties all the components together to produce an LLM response grounded in the retrieved context.
The chain embeds the question as a vector and uses that vector to search for similar vectors in the database; the most similar ones are returned as text chunks (the context). Together, the question and the context form the prompt that the LLM processes to generate the response. See the code below.
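Here is a sketch of the chain, consistent with the step-by-step breakdown that follows; it is the standard LangChain Expression Language (LCEL) pattern.

```python
# 6. Build the chain: retrieve context, fill the prompt, call the LLM, parse the output
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# 7. Invoke the chain with the user question
response = chain.invoke(user_question)
print(response)
```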
The code above sets up a processing pipeline in which user_question is processed sequentially by the retriever, the prompt, the llm, and StrOutputParser, with each step performing some transformation on its input. The final result is stored in the variable response.
1. {"context": retriever, "question": RunnablePassthrough()}:
This is the first step in the chain. It is a dictionary that maps the key "context" to the retriever, which fetches the most similar chunks from the vector store, and the key "question" to RunnablePassthrough(), which forwards the user question unchanged.
2. | prompt:
The | operator chains the output of the previous step into the prompt object: the retrieved context and the question are substituted into the prompt template.
3. | llm:
Similarly, the output of the previous step (the completed prompt) is passed as input to the llm, which generates the answer.
4. | StrOutputParser():
Finally, the output of the llm is passed through StrOutputParser, which converts the model output into a plain string.
5. response = chain.invoke(user_question):
This line invokes the entire chain with the user question as input and assigns the output to the variable response.
We are done here. As you can see in the output, the chain answers the question "Tell me more about AI vector search" using content from the PDF.
You can ask any other question related to this PDF, or load a different PDF and ask questions about it.