
Understanding Vector Stores

Before proceeding, it is important to understand some initial concepts surrounding vector storage:

  1. What is a Vector Database?
  2. Vector Embeddings

A Vector Database, or Vector Store, is a collection of data stored as mathematical representations (vectors), which enables semantic search over that data.

[Figure: content and queries pass through an embedding model, which produces vector embeddings [x, y, z] that are stored and compared in the vector store]

Fig. 1. Representation of a Vector Database and vector embeddings based on provided content

A vector is represented as an array of numbers (or positions) in a specific space. For instance, [x, y, z] represents an object in a three-dimensional space.

The aim of such storage is to organize objects based on their similarity. For example, a dictionary arranges words alphabetically, meaning you won’t find words starting with “Z” placed near those beginning with “A”.
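
To make "similarity" concrete, here is a minimal sketch (in TypeScript, the language used later in this article) of cosine similarity, a common way to compare two embedding vectors; the vectors and their values are purely illustrative:

// Cosine similarity: 1 means the vectors point in the same direction,
// 0 means they are unrelated, -1 means they are opposite.
function cosineSimilarity(a: number[], b: number[]): number {
	const dot = a.reduce((sum, value, i) => sum + value * b[i], 0)
	const normA = Math.sqrt(a.reduce((sum, value) => sum + value * value, 0))
	const normB = Math.sqrt(b.reduce((sum, value) => sum + value * value, 0))
	return dot / (normA * normB)
}

// Illustrative three-dimensional vectors [x, y, z]
console.log(cosineSimilarity([1, 2, 3], [1, 2, 2.5])) // close to 1 (similar)
console.log(cosineSimilarity([1, 2, 3], [-3, 0, 1])) // 0 (dissimilar)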

Unlike traditional databases, where we query for specific strings and expect rows with corresponding columns (as in the SQL example below):

SELECT * FROM TABLE_NAME

In a vector store, searches are conducted using an embedding. This allows the system to find the vector most similar to the query, which is itself converted into a vector. This approach is particularly valuable for applications utilizing large language models (LLMs) because it facilitates extremely fast searches. Furthermore, similar vectors are returned, providing valuable context to the model. For this reason, vector stores are commonly integrated with the concept of RAG (Retrieval Augmented Generation).
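
As a minimal sketch of that flow, the snippet below uses LangChain's in-memory vector store and OpenAI embeddings as stand-ins (the article's application uses its own vector database and adapter); the example texts are purely illustrative:

import { MemoryVectorStore } from 'langchain/vectorstores/memory'
import { OpenAIEmbeddings } from '@langchain/openai'

// Stand-in example: an in-memory vector store instead of a real vector database.
const vectorStore = await MemoryVectorStore.fromTexts(
	['Cats sleep most of the day', 'The stock market closed higher today'],
	[{ id: 1 }, { id: 2 }],
	new OpenAIEmbeddings(),
)

// The query is embedded and compared against the stored vectors;
// the most similar chunks are returned as context for the LLM.
const results = await vectorStore.similaritySearch('How long do cats sleep?', 1)
console.log(results[0].pageContent)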

What is Vector Embedding?

By leveraging LLMs, documents and other files can be transformed into vectors using an embedding strategy. These vectors encapsulate a wealth of attributes and metadata, enabling semantic search and easy identification of relevant information — see Figure 1 for details.
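
As a rough sketch of what a single embedding call looks like with LangChain's JavaScript/TypeScript API (assuming an OpenAI embedding model; any embedding provider works the same way):

import { OpenAIEmbeddings } from '@langchain/openai'

const embeddings = new OpenAIEmbeddings()

// Turns a piece of text into an array of numbers (the vector).
// Real embedding models return hundreds or thousands of dimensions, not three.
const vector = await embeddings.embedQuery('What is a vector store?')
console.log(vector.length, vector.slice(0, 3))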

What is a Question Answering System?

Question-answering is a subdomain of Semantic Search that enables searching based on concepts and ideas within a vast set of data. The system is provided with a natural language question, and it returns the most semantically similar pieces of information.

There are three types of question-answering (QA) systems:

  1. Open-domain
  2. Closed-domain
  3. Extractive QA

The open-domain system, as the name implies, is implemented with the assistance of external resources. This could involve using information retrieval (IR) methods along with vector databases.

In a closed-domain system, external data is not supplied; instead, it relies on knowledge from past interactions and the data it was trained on to generate answers.

Finally, extractive QA is somewhat different; it allows users to ask questions and extract answers from a specific text. This approach necessitates utilizing the reading comprehension (RC) concept and can be combined with open-domain systems for improved results.

[Figure: the question is embedded, a retriever fetches relevant context from the vector DB, and a reader produces the answer]

Fig. 2. Open-domain question-answering system using IR and RC

Talking with documents

In this section, we will focus exclusively on the AI-related aspects of the application. Features such as Role-Based Access Control (RBAC) and authentication were incorporated, but we will not cover those here. You can clone the repository if you wish to explore the implementation details.

To implement our question-answering system using PDF documents, we first need to outline how the application will function using the concepts discussed above:

[Figure: the user uploads a document via /documents; a PDF loader generates splits, a text embedding is created for each one, and a database adapter saves the splits in the vector DB. On /search-in-documents, the query is embedded, the relevant splits are retrieved, and the question plus context are sent to the LLM agent, while chat history is read from and saved to an in-memory Redis DB]

Fig. 3. Application's /search-in-documents endpoint diagram

To search in documents and ask questions about them, the application exposes two endpoints:

  • POST /resources/docs
  • POST /genai/search-in-documents

On the /resources/docs endpoint, documents are uploaded to the local system, which then breaks them down into smaller chunks for storage in a vector database.
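
For the storage step, the split chunks can be embedded and written to a vector store in a single call. The following is only a sketch of that idea, using LangChain's in-memory store as a stand-in for whichever vector database the project's Database Adapter wraps; the saveDocumentSplits helper is hypothetical:

import { MemoryVectorStore } from 'langchain/vectorstores/memory'
import { OpenAIEmbeddings } from '@langchain/openai'
import type { Document } from '@langchain/core/documents'

// Hypothetical helper: splits are the chunks produced by the text splitter.
async function saveDocumentSplits(splits: Document[]) {
	// Embeds every chunk and stores the resulting vectors alongside the text.
	return MemoryVectorStore.fromDocuments(splits, new OpenAIEmbeddings())
}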

The /genai/search-in-documents endpoint utilizes the RAG concept. It is responsible for creating an embedding of the user's question and retrieving the most relevant chunks from the vector database to provide context for follow-up questions. Additionally, Redis is used to store and retrieve chat history during a user's session.

LangChain is a framework that helps build scalable applications using diverse LLMs and strategies, simplifying the handling of AI tools and dependencies.

As illustrated in Figure 3, the first step involves breaking down uploaded documents into smaller chunks and storing them in a vector database. This can be accomplished with the following code (some parts are omitted for brevity):

// Import paths may vary slightly between LangChain versions.
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter'
import { PDFLoader } from '@langchain/community/document_loaders/fs/pdf'
import { Document } from '@langchain/core/documents'

class DocumentsService {
	// Splits text into ~1536-character chunks that overlap by 128 characters,
	// so a sentence cut at a boundary still appears in full in one of the chunks.
	private _textSplitter = new RecursiveCharacterTextSplitter({
		chunkSize: 1536,
		chunkOverlap: 128,
	})

	...

	async loadMultipleDocuments(filePaths: string[]): Promise<Document[]> {
		const systemPath = process.cwd()
		// Creates one PDFLoader per uploaded file (construction omitted here).
		const pdfLoaders = filePaths.map((file) => {
			...
		})

		// Loads every PDF in parallel; each loader returns an array of Documents.
		const documents: Document[][] = await Promise.all(
			pdfLoaders.map((loader: PDFLoader) => loader.load()),
		)

		return documents.flat()
	}

	async splitDocuments(docs: Document[]) {
		return await this._textSplitter.splitDocuments(docs)
	}
}

In this section, we utilize the PDFLoader from the @langchain/community/document_loaders library to extract all necessary information from our documents. After that, we employ a TextSplitter to divide the documents into manageable segments.
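
The loader construction elided above might look something like this; it is only an illustration, and the file path handling and the splitPages option are assumptions rather than the project's actual code:

import path from 'node:path'
import { PDFLoader } from '@langchain/community/document_loaders/fs/pdf'

// Hypothetical loader construction for one uploaded file path.
const loader = new PDFLoader(path.join(process.cwd(), 'uploads', 'example.pdf'), {
	splitPages: true, // one Document per page
})
const pages = await loader.load()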

With our documents now stored in the vector store, we can implement the Information Retrieval (IR) and Reading Comprehension (RC) processes using LangChain's chains.

// Import paths may vary slightly between LangChain versions.
import { ChatPromptTemplate, MessagesPlaceholder } from '@langchain/core/prompts'
import { RunnableSequence } from '@langchain/core/runnables'
import { StringOutputParser } from '@langchain/core/output_parsers'

class SearchInDocumentUseCase {
	// Method wrapper shown for readability; the name is illustrative.
	async execute(query: string) {
		// Rephrases the user's question using the chat history, so follow-up
		// questions like "and what about the second one?" become standalone.
		const contextualizedPrompt = ChatPromptTemplate.fromMessages([
			['system', CONTEXTUALIZED_SYSTEM_PROMPT],
			new MessagesPlaceholder('chat_history'),
			['human', '{question}'],
		])

		const contextualizedQuestionChain = RunnableSequence.from([
			contextualizedPrompt,
			llmModel,
			new StringOutputParser(),
		])

		// Answers the (contextualized) question using the retrieved documents.
		const questionAnsweringPrompt = ChatPromptTemplate.fromMessages([
			['system', SEARCH_DOC_SYSTEM_PROMPT],
			new MessagesPlaceholder('chat_history'),
			['human', '{question}'],
		])

		...

		const history = await chatMemory.retrieveMemoryHistory()
		const result = await retrievalChain.invoke({
			question: query,
			chat_history: history,
		})

		return result
	}
}

At first glance this can look confusing, but it is fairly simple. First, we define a system prompt to guide the model and explain the task: in this case, answering user questions based on the provided context and the past conversation history.
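
The actual prompts live in the repository; purely as an illustration, a question-answering system prompt along these lines would do the job (the {context} placeholder is filled with the retrieved chunks):

// Illustrative only; not the prompt used in the real application.
const SEARCH_DOC_SYSTEM_PROMPT = `You are an assistant that answers questions
about the user's documents. Use only the following retrieved context to answer.
If the answer is not in the context, say you don't know.

{context}`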

That's why we use Redis to store chat messages: after each interaction with the app, the system and user messages are saved in a temporary collection to be reused later, and LangChain replaces the chat_history placeholder with an array representing the conversation.
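
As a sketch of that idea (assuming LangChain's RedisChatMessageHistory; the project's own chatMemory adapter may be implemented differently), the history could be kept per session like this:

import { RedisChatMessageHistory } from '@langchain/redis'
import { HumanMessage, AIMessage } from '@langchain/core/messages'

// One history per user session, kept in Redis with a TTL so it expires
// after the conversation ends. The sessionId and URL are illustrative.
const chatHistory = new RedisChatMessageHistory({
	sessionId: 'user-123',
	sessionTTL: 3600, // seconds
	config: { url: 'redis://localhost:6379' },
})

await chatHistory.addMessage(new HumanMessage('What does chapter 2 cover?'))
await chatHistory.addMessage(new AIMessage('Chapter 2 covers vector stores.'))

// This array is what fills the chat_history placeholder in the prompts.
const history = await chatHistory.getMessages()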

Conclusion

Establishing a robust AI workflow based on Retrieval-Augmented Generation (RAG) concepts is beneficial for a variety of applications. One useful implementation is the ability to read documents and answer questions about them.

I am excited to study and apply these concepts in my applications. If you’re interested in what’s coming in the next few months, keep an eye on my application, Astronomy.

Thank you for joining me. I hope this article has been helpful and has inspired you to delve deeper into the fascinating world of AI. See you next time!

References