RAG
What is RAG?
RAG (Retrieval-Augmented Generation) is an innovative approach that combines the strengths of information retrieval and generative models.
Pre-trained generative models (such as ChatGPT, DeepSeek, etc.) are built for general-purpose queries: their responses are not grounded in an up-to-date knowledge base or in a specific domain.

RAG fills this gap. It integrates an external, up-to-date knowledge base, retrieves relevant information from it dynamically, and then uses that information to generate a response.
How does RAG work?

1. Data Collection & Preprocessing
- Gather content from different sources (such as external databases and PDFs).
- Preprocess the source files into machine-friendly, model-friendly formats.
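As a minimal sketch of the preprocessing step, assume the text has already been extracted from its original file (PDF or database access is out of scope here). The helper name `clean_text` is hypothetical and only illustrates the kind of normalization that is typically applied before chunking:

```python
import re
import unicodedata

# Zero-width characters that often survive PDF/HTML extraction.
_ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))

def clean_text(raw: str) -> str:
    """Hypothetical helper: normalize extracted text before chunking."""
    text = unicodedata.normalize("NFKC", raw)   # unify compatibility characters
    text = text.translate(_ZERO_WIDTH)          # delete zero-width characters
    text = re.sub(r"[ \t]+", " ", text)         # collapse runs of spaces and tabs
    text = re.sub(r"\n\s*\n+", "\n\n", text)    # keep at most one blank line
    return text.strip()
```

Real pipelines usually add format-specific steps (table handling, header and footer removal), which depend on the sources involved.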
2. Embed Sources
- Embedding the sources involves two main steps: chunking and embedding.
- Chunking splits large documents into smaller, manageable chunks to enable efficient retrieval.
- An embedding model then transforms each text chunk into a fixed-length numeric vector that captures its semantic meaning.
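The two steps above can be sketched as follows. The chunk size and overlap are arbitrary assumptions, and `toy_embed` is only a stand-in for a real embedding model: it hashes words into a fixed-length vector so the output has the right shape, but it does not capture semantic meaning the way a trained model would.

```python
import hashlib
import math

def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `chunk_size` words that overlap by `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks

def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, at the cost of some duplicated storage.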
3. Vector Storage
- The embedding vectors are stored in a vector database or similarity-search index (such as ChromaDB, FAISS, or Milvus) for later retrieval.
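To illustrate what such a store keeps track of, here is a small in-memory sketch. `InMemoryVectorStore` is a hypothetical class and does not mirror the API of ChromaDB, FAISS, or Milvus; it only shows that each vector is stored alongside an identifier and the original text so the text can be returned at retrieval time.

```python
from dataclasses import dataclass, field

@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database."""
    dim: int
    ids: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)
    texts: list[str] = field(default_factory=list)

    def add(self, chunk_id: str, vector: list[float], text: str) -> None:
        # Reject vectors produced by a different embedding model or setting.
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self.ids.append(chunk_id)
        self.vectors.append(list(vector))
        self.texts.append(text)

    def __len__(self) -> int:
        return len(self.ids)

store = InMemoryVectorStore(dim=3)
store.add("doc1#0", [0.1, 0.9, 0.0], "RAG combines retrieval and generation.")
store.add("doc1#1", [0.8, 0.1, 0.1], "Vectors are stored for later lookup.")
```

A production vector database adds persistence and approximate nearest-neighbour indexing on top of this basic bookkeeping.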
4. User's Prompt
- A user's prompt can either be standalone or incorporate prior conversation history. If conversation history is included, the memory components of frameworks such as LangChain or LlamaIndex help augment the current prompt with the relevant context.
- The user's prompt is embedded using the same embedding model as the source documents, so that query vectors and document vectors are comparable.
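One simple way to fold conversation history into the prompt before embedding it is sketched below. The function `build_retrieval_query` and the `max_turns` cutoff are assumptions made for illustration; frameworks such as LangChain or LlamaIndex offer their own memory components, which may summarize or filter the history instead.

```python
def build_retrieval_query(
    prompt: str,
    history: list[tuple[str, str]],
    max_turns: int = 2,
) -> str:
    """Hypothetical helper: prepend the last `max_turns` (role, text) turns to the prompt."""
    # history[-0:] would return the whole list, so handle max_turns <= 0 explicitly.
    recent = history[-max_turns:] if max_turns > 0 else []
    lines = [f"{role}: {text}" for role, text in recent]
    lines.append(f"user: {prompt}")
    return "\n".join(lines)

history = [
    ("user", "What is RAG?"),
    ("assistant", "Retrieval-Augmented Generation."),
]
query_text = build_retrieval_query("How does it store documents?", history)
```

The resulting `query_text` would then be passed to the same embedding function that was used for the source chunks.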
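5. Relevant Data Retrieval
The query vector is compared against the stored vectors, and retrieval can return results at different granularities:
- Retrieving the most relevant text chunk.
- Fetching the entire document containing the relevant chunk.
- Retrieving multiple relevant documents.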
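The three retrieval options can be sketched with a brute-force cosine-similarity search. The toy vectors, the `doc_id#n` chunk-id convention, and the helper names are all assumptions for illustration; a real vector database would typically use an approximate nearest-neighbour index rather than scoring every vector.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=1):
    """index: list of (chunk_id, vector) pairs; returns the k best (chunk_id, score) pairs."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

def parent_docs(hits):
    """Map chunk ids of the assumed form 'doc_id#n' to unique document ids, best first."""
    seen = []
    for chunk_id, _score in hits:
        doc_id = chunk_id.split("#", 1)[0]
        if doc_id not in seen:
            seen.append(doc_id)
    return seen

index = [
    ("faq#0", [1.0, 0.0, 0.0]),
    ("faq#1", [0.7, 0.7, 0.0]),
    ("manual#0", [0.0, 1.0, 0.0]),
]
query = [0.9, 0.1, 0.0]

best_chunk = top_k(query, index, k=1)   # the single most relevant chunk
several = top_k(query, index, k=3)      # several candidate chunks
documents = parent_docs(several)        # the documents those chunks belong to
```

Returning whole documents gives the model more context per hit, while returning single chunks keeps the augmented prompt short.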
6. Augmented Prompt
- An augmented prompt merges the user's original query with the retrieved data. This can be a simple concatenation of the user's prompt and the retrieved text, or a structured template in which the different components are placed alongside instructions for the LLM to follow.
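Both variants can be sketched in a few lines. The template wording and the function names are assumptions chosen for illustration, not a prescribed format:

```python
PROMPT_TEMPLATE = """\
You are a helpful assistant. Answer using only the context below.
If the context does not contain the answer, say you do not know.

Context:
{context}

Question: {question}
Answer:"""

def concat_prompt(question: str, chunks: list[str]) -> str:
    """Simple concatenation: retrieved text first, then the user's question."""
    return "\n\n".join([*chunks, question])

def templated_prompt(question: str, chunks: list[str]) -> str:
    """Structured template: numbered context, explicit instructions, then the question."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return PROMPT_TEMPLATE.format(context=context, question=question)

chunks = ["RAG retrieves text before generating.", "Chunks are stored as vectors."]
prompt = templated_prompt("What does RAG do first?", chunks)
```

Numbering the context entries makes it possible to ask the model to cite which entry supports its answer.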
7. Response
- Responses can be further refined using predefined templates, ensuring a consistent presentation style tailored to the use case.
