Creating a Simple RAG in Python with AzureOpenAI and LlamaIndex. Part 2
Following my previous post, where I successfully embedded text from PDF documents and made queries based on that data, I’m now going to save that embedded and vectorised data into a database.
I’ll be using Azure Cosmos DB for MongoDB, but you can also use a regular MongoDB from Atlas, since under the hood it’s essentially the same MongoDB.
PS: I LOVE Azure for this. They didn’t try to reinvent MongoDB like AWS did with their DocumentDB, which unfortunately has some significant missing features.
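By the way, if you do go the Atlas route, the only piece that really changes is the vector store class. Roughly (assuming the llama-index-vector-stores-mongodb package, and with ATLAS_MONGODB_URI being a name I just made up), it would look something like this:

import os

import pymongo
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch

# ATLAS_MONGODB_URI is a placeholder name, not something used elsewhere in this post.
mongodb_client = pymongo.MongoClient(os.environ.get("ATLAS_MONGODB_URI"))

# Same database/collection idea as the Cosmos DB store below. Depending on your
# package version, you may also need to pass the name of the Atlas Vector Search
# index you created on the collection.
store = MongoDBAtlasVectorSearch(
    mongodb_client=mongodb_client,
    db_name="gambitai",
    collection_name="uk",
)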
Refactoring
I’m going to split my scripts into two parts: ingestion and querying, so that we don’t re-ingest the data into our database every time. I’ll also move our setup into a separate file, including setting the LLM and embedding models, loading the .env file, and initialising the vector store, since we’ll need all of that in both parts.
As a result, we’ll have a setup file that looks like this:
import os
import pymongo
from dotenv import load_dotenv
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.core import Settings
from llama_index.vector_stores.azurecosmosmongo import (
    AzureCosmosDBMongoDBVectorSearch,
)

load_dotenv()

Settings.llm = AzureOpenAI(
    engine="gpt-4o-mini",
    api_key=os.environ.get('AZURE_OPENAI_API_KEY'),
    azure_endpoint=os.environ.get('AZURE_OPENAI_ENDPOINT'),
    api_version="2024-05-01-preview",
)

Settings.embed_model = AzureOpenAIEmbedding(
    model="text-embedding-3-small",
    deployment_name="text-embedding-3-small",
    api_key=os.environ.get('AZURE_OPENAI_API_KEY'),
    azure_endpoint=os.environ.get('AZURE_OPENAI_EMBEDDING_ENDPOINT'),
    api_version='2023-05-15',
)

mongodb_client = pymongo.MongoClient(os.environ.get("AZURE_COSMOSDB_URI"))

store = AzureCosmosDBMongoDBVectorSearch(
    mongodb_client=mongodb_client,
    db_name="gambitai",
    collection_name="uk",
)
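Everything here hangs off four environment variables, so a tiny sanity check, entirely optional, can save some head-scratching when one of them is missing:

import os

from dotenv import load_dotenv

load_dotenv()

# The four variables the setup file above relies on.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_EMBEDDING_ENDPOINT",
    "AZURE_COSMOSDB_URI",
)

missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")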
Our previous main.py will be renamed to ingest.py, and we’ll keep only the embedding functionality:
import setup
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import StorageContext


def ingest():
    documents = SimpleDirectoryReader("data").load_data()
    storage_context = StorageContext.from_defaults(vector_store=setup.store)
    VectorStoreIndex.from_documents(
        documents,
        transformations=[SentenceSplitter(chunk_size=1024, chunk_overlap=20)],
        storage_context=storage_context,
    )


ingest()
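Before moving on, it’s worth double-checking that the chunks actually landed in Cosmos DB. A quick way to do that with the pymongo client from setup.py:

import setup

# Each chunk produced by the SentenceSplitter ends up as one document in the
# "uk" collection, so expect roughly one document per chunk, not per PDF.
collection = setup.mongodb_client["gambitai"]["uk"]
print(collection.count_documents({}))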
That’s it for the ingestion part.
Querying
Previously, for querying, we had three lines:
- Index from the documents
- Query Engine, and
- Response, which was the result of our query.
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("Query me this...")
Since we’ve already ingested the data and it now lives in our database, we no longer need to build a VectorStoreIndex from documents. Instead, we’ll point it at our Cosmos DB:
index = VectorStoreIndex.from_vector_store(vector_store=setup.store)
query_engine = index.as_query_engine()
The QueryEngine initialisation remains the same.
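For completeness, the querying script (I’m calling it query.py, the name doesn’t really matter) starts off as little more than:

import setup
from llama_index.core import VectorStoreIndex

# The index is rebuilt from the vectors already sitting in Cosmos DB,
# so nothing is read from disk or re-embedded at this point.
index = VectorStoreIndex.from_vector_store(vector_store=setup.store)
query_engine = index.as_query_engine()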
And that’s it. Now when we query, we get exactly the same result, except we go straight to the database and don’t process our PDF documents again. I’ve put the content I want to analyse into a text file and read it from there, so we end up with something like this:
file = open("test/test.txt", "r")content = file.read()
response = query_engine.query( """ You are acting on behalf of the Gambling Commission of the United Kingdom. You need to go through that content and analyse if any of the sections, words, images or more break any rules or laws. Go through the content find what should not be there and report why. You can also suggest what should be there instead.
Here is the content: {} """.format(content))
file.close()
print(response)
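One small aside: the open/close pair works, but a with block closes the file even if the query throws, so the same read could also be written as:

# Equivalent to the open()/close() pair above, but the file is closed
# automatically even if query_engine.query() raises.
with open("test/test.txt", "r") as file:
    content = file.read()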
As always, see the GitHub repo for the full code.