Retrieval-Augmented Generation with Ollama, ChromaDB, and Python

October 24, 2024 · by alex

In artificial intelligence, and especially in natural language processing (NLP), relying on a large language model (LLM) alone is often not enough to produce relevant and accurate content. Models like llama3.2 are very capable, but they typically cannot access the specific contextual knowledge stored in large datasets or external databases. This is where Retrieval-Augmented Generation (RAG) comes in.


RAG is a hybrid approach that combines the retrieval of specific information from a data store (such as ChromaDB) with the generative capabilities of an LLM (such as Ollama's llama3.2). This combination yields output that is more accurate, more contextual, and better grounded in facts. In this post, we will walk through a RAG implementation that uses Ollama's llama3.2 for text generation and ChromaDB for efficient information retrieval.


What Is Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation (RAG) is a method that improves a model's ability to produce relevant, well-informed responses by adding a retrieval step. Rather than relying entirely on a pre-trained language model, RAG retrieves relevant documents or data from an external source (such as a database or knowledge base) and augments generation by feeding the retrieved data to the model. This lets the model answer based on real-time, up-to-date, domain-specific information.


For example, when a question is asked, RAG does not generate a response from the model's built-in knowledge alone. It first queries an external database (in our case, ChromaDB) to retrieve the most relevant information, and then uses that information as context when generating the response.


Why Combine Ollama's llama3.2 with ChromaDB?

Ollama's llama3.2 is a capable, well-performing LLM that can handle a wide range of text-generation tasks, but like most LLMs it is limited to its training data, so it may not have access to specific or more recent information. By integrating ChromaDB, a high-performance embedding database built for efficient text retrieval, we can ground llama3.2's answers in relevant, retrievable data. This approach improves both the accuracy and the relevance of the generated answers.


The RAG Workflow with Ollama and ChromaDB

  1. Data ingestion: documents are stored in ChromaDB as chunks, each annotated with metadata (such as a page number or document ID).
  2. Querying ChromaDB: when a question is asked, ChromaDB is queried to retrieve the chunks most relevant to the query.
  3. Retrieval: ChromaDB returns the most relevant text chunks together with their metadata (such as source information).
  4. Generation: the retrieved chunks are passed as context to Ollama's llama3.2 model, which generates a response based on the retrieved information.
  5. Response: the final response includes the answer, the quoted text serving as evidence, and the associated metadata (such as the page number or a source identifier).


Implementing RAG with Ollama and ChromaDB

Let's walk through the code for this RAG setup. We will use ChromaDB as the document store and Ollama's llama3.2 as the generation model.


Installing the Required Packages

Before we begin, install the packages the project needs, namely ChromaDB and Ollama.


# requirements.txt
chromadb
ollama
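
If you are starting from scratch, commands along these lines should get the environment ready; the ollama pull step assumes the Ollama runtime itself is already installed and running locally.

# Install the Python dependencies
pip install -r requirements.txt
# Pull the llama3.2 model so the local Ollama server can serve it
ollama pull llama3.2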


1. Setting Up ChromaDB

ChromaDB is used to store and retrieve the document chunks. We also store relevant metadata, such as page numbers, with each document. At query time, we retrieve the most relevant chunks from ChromaDB.


import chromadb
# Initialize ChromaDB client, persisted in the current directory
chroma_client = chromadb.PersistentClient(path="./")
# Create or get a collection. By default ChromaDB collections use L2 distance;
# you can change the collection metadata to use cosine similarity instead (see the sketch below).
collection = chroma_client.get_or_create_collection(name="my_collection")
# Function to upsert text chunks into ChromaDB with metadata
def upsert_into_chromadb(chunks, metadata):
    # Upsert documents into ChromaDB with metadata and unique IDs
    # (upsert, rather than add, keeps re-runs against the persistent store idempotent)
    collection.upsert(
        documents=chunks,
        metadatas=metadata,
        ids=[f"chunk_{i+1}" for i in range(len(chunks))]  # Unique ids like "chunk_1", "chunk_2", etc.
    )
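
As the comment above notes, collections default to L2 distance. ChromaDB also accepts a distance function through the collection metadata; here is a minimal sketch, using a hypothetical collection name:

# Hypothetical example: create a collection that ranks by cosine distance instead of L2
cosine_collection = chroma_client.get_or_create_collection(
    name="my_collection_cosine",
    metadata={"hnsw:space": "cosine"}  # valid values include "l2" (default), "cosine", and "ip"
)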


2. Querying ChromaDB

When the user asks a question, ChromaDB is queried for the document chunks most relevant to that question. The retrieved data is then passed to the LLM to generate the answer.


# Query ChromaDB
def query_chromadb(prompt, n_results=3):
    # ChromaDB will generate the embedding for the query and find the most similar chunks
    results = collection.query(
        query_texts=[prompt],  # Chroma will embed this for you
        n_results=n_results,
        include=["documents", "metadatas"]  # Include both documents and metadata in the results
    )
    return results
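
collection.query returns a dictionary of nested lists, one inner list per query text. Assuming the collection has already been populated with upsert_into_chromadb, a quick sanity check looks roughly like this:

results = query_chromadb("Who is the main character?")
print(results["documents"][0])   # the top-matching chunks for the first (and only) query text
print(results["metadatas"][0])   # metadata aligned with those chunks, e.g. [{'page': 'Page 1'}, ...]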


3. Flattening the Retrieved Data

The data returned by ChromaDB comes back as nested lists (one inner list per query text). We flatten the document chunks and their associated metadata to simplify processing and keep the chunks aligned with their metadata.


# Flatten the list of documents
def flatten_documents(documents):
    return [sentence for doc in documents for sentence in doc]
# Flatten the list of metadatas
def flatten_metadatas(metadatas):
    return [meta for meta_list in metadatas for meta in meta_list]
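
For example, the nested documents list from a single query flattens like this:

nested_docs = [["chunk A", "chunk B", "chunk C"]]  # one inner list per query text
print(flatten_documents(nested_docs))              # ['chunk A', 'chunk B', 'chunk C']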


4. The RAG Process: Retrieve and Generate

The RAG process retrieves relevant information from ChromaDB and then uses the llama3.2 model to generate a response.


import ollama
def rag_process(prompt, SYSTEM_PROMPT, filename):  # filename is currently unused
    # Define the documents and metadata used to simulate our data.
    chunks = [
        "Peter Pan is a fictional character created by Scottish novelist and playwright J.M. Barrie...",
        "Peter Pan is adventurous and fearless, often engaging in battles with Captain Hook...",
        "Peter brings Wendy Darling and her brothers to Neverland. They experience thrilling adventures...",
        "Peter Pan represents the complexities of eternal youth. Wendy eventually returns to the real world...",
        "Other key characters in Peter Pan’s story include Tiger Lily, a Native American princess...",
        "The story of Peter Pan has been adapted into numerous films, plays, and books...",
        "Peter Pan is an handsome young man."
    ]
    
    metadata = [
        {"page": "Page 1"}, {"page": "Page 2"}, {"page": "Page 3"}, 
        {"page": "Page 4"}, {"page": "Page 5"}, {"page": "Page 6"}, 
        {"page": "Page 7"}
    ]
    # 1. Upsert chunks into ChromaDB with metadata
    upsert_into_chromadb(chunks, metadata)
    # 2. Query ChromaDB to find the most relevant chunks
    chromadb_results = query_chromadb(prompt, n_results=3)
    # Flatten the nested lists of documents and metadata
    flat_chunks = flatten_documents(chromadb_results["documents"])
    flat_metadata = flatten_metadatas(chromadb_results["metadatas"])
    # Concatenate the most relevant flattened chunks
    retrieved_chunks = []
    for i, chunk in enumerate(flat_chunks):
        if i < len(flat_metadata):
            metadata_entry = flat_metadata[i]
            if metadata_entry and isinstance(metadata_entry, dict):
                retrieved_chunks.append(f"{chunk} (Source: {metadata_entry.get('page', 'No page available')})")
            else:
                retrieved_chunks.append(f"{chunk} (Source: Metadata missing)")
    # 3. Use the chat method to generate a response with evidence and source
    full_retrieved_chunks = " ".join(retrieved_chunks)
    response = ollama.chat(
        model="llama3.2",
        messages=[
            {
                "role": "system",
                "content": f"{SYSTEM_PROMPT}\nContext:\n{full_retrieved_chunks}"
            },
            {
                "role": "user",
                "content": f"Answer this question: {prompt} Include the source in your answer."
            }
        ]
    )["message"]["content"]
    return response


5. Defining the System Prompt

The system prompt instructs the LLM to generate its response from the retrieved context, quoting the evidence and including the associated metadata.


SYSTEM_PROMPT = """
You are a helpful AI assistant that answers questions using only the provided context. 
For each answer, include the following:
1. Directly quote the relevant text as evidence for your answer.
2. Provide the source of the quoted evidence, including metadata such as the page number or document ID.
3. Be as concise as possible in your response.
If you're unsure or if the context doesn't provide enough information, just say "I don't know."
Context:
"""


Conclusion

Retrieval-Augmented Generation (RAG) is a powerful approach that improves the accuracy and relevance of AI-generated content by grounding responses in retrievable data. By combining Ollama's llama3.2 with ChromaDB, we can ensure that generated responses are not only accurate but also backed by verifiable evidence. This setup is particularly useful for applications that demand factual accuracy, such as question-answering systems, research tools, and educational platforms.

With this hybrid approach, AI-generated content can be both intelligent and well-informed, drawing not only on pre-trained knowledge but also on real-time, query-specific data retrieval.


Complete Python Code


# rag_chroma.py
import chromadb
import ollama
# Initialize ChromaDB client
chroma_client = chromadb.PersistentClient(path="./")
# Create or get collection
collection = chroma_client.get_or_create_collection(name="my_collection_2")
# Function to upsert text chunks into ChromaDB with metadata
def upsert_into_chromadb(chunks, metadata):
    # Upsert documents into ChromaDB with metadata and unique IDs
    # (upsert, rather than add, keeps re-runs against the persistent store idempotent)
    collection.upsert(
        documents=chunks,
        metadatas=metadata,
        ids=[f"chunk_{i+1}" for i in range(len(chunks))]  # Unique ids like "chunk_1", "chunk_2", etc.
    )
# Query ChromaDB
def query_chromadb(prompt, n_results=3):
    # ChromaDB will generate the embedding for the query and find the most similar chunks
    results = collection.query(
        query_texts=[prompt],  # Chroma will embed this for you
        n_results=n_results,
        include=["documents", "metadatas"]  # Include both documents and metadata in the results
    )
    return results
# Flatten the list of documents
def flatten_documents(documents):
    return [sentence for doc in documents for sentence in doc]
# Flatten the list of metadatas
def flatten_metadatas(metadatas):
    return [meta for meta_list in metadatas for meta in meta_list]
# The RAG process: Retrieve + Generate
def rag_process(prompt, SYSTEM_PROMPT, filename):
    # Define the documents and metadata
    chunks = [
        "Peter Pan is a fictional character created by Scottish novelist and playwright J.M. Barrie...",
        "Peter Pan is adventurous and fearless, often engaging in battles with Captain Hook...",
        "Peter brings Wendy Darling and her brothers to Neverland. They experience thrilling adventures...",
        "Peter Pan represents the complexities of eternal youth. Wendy eventually returns to the real world...",
        "Other key characters in Peter Pan’s story include Tiger Lily, a Native American princess...",
        "The story of Peter Pan has been adapted into numerous films, plays, and books...",
        "Peter Pan is an handsome young man."
    ]
    
    metadata = [
        {"page": "Page 1"}, {"page": "Page 2"}, {"page": "Page 3"}, 
        {"page": "Page 4"}, {"page": "Page 5"}, {"page": "Page 6"}, 
        {"page": "Page 7"}
    ]
    # 1. Upsert chunks into ChromaDB with metadata (page numbers)
    upsert_into_chromadb(chunks, metadata)
    # 2. Query ChromaDB to find the most relevant chunks to the prompt
    chromadb_results = query_chromadb(prompt, n_results=3)
    # Flatten the nested lists of documents and metadata
    flat_chunks = flatten_documents(chromadb_results["documents"])
    flat_metadata = flatten_metadatas(chromadb_results["metadatas"])
    # Concatenate the most relevant flattened chunks
    retrieved_chunks = []
    for i, chunk in enumerate(flat_chunks):
        if i < len(flat_metadata):
            metadata_entry = flat_metadata[i]
            # Append the chunk and its source (if metadata is valid)
            if metadata_entry and isinstance(metadata_entry, dict):
                retrieved_chunks.append(f"{chunk} (Source: {metadata_entry.get('page', 'No page available')})")
            else:
                retrieved_chunks.append(f"{chunk} (Source: Metadata missing)")
        else:
            retrieved_chunks.append(f"{chunk} (Source: No metadata available)")
    # Join all the retrieved chunks with their sources
    full_retrieved_chunks = " ".join(retrieved_chunks)
    # 3. Use the chat method to create a synthesized answer (generation)
    # Provide the retrieved text along with its sources explicitly
    response = ollama.chat(
        model="llama3.2",
        messages=[
            {
                "role": "system",
                # ollama.chat applies the model's chat template itself, so the prompt is
                # passed as plain text rather than wrapped in Llama 3 special tokens
                "content": f"{SYSTEM_PROMPT}\nContext:\n{full_retrieved_chunks}"
            },
            {
                "role": "user",
                "content": f"Answer this question: {prompt} Include the source in your answer."
            }
        ]
    )["message"]["content"]
    return response
def main():
    SYSTEM_PROMPT = """
    You are a helpful AI assistant that answers questions using only the provided context. 
    For each answer, include the following:
    1. Directly quote the relevant text as evidence for your answer.
    2. Provide the source of the quoted evidence, including metadata such as the page number, document ID, or any available metadata.
    3. Be as concise as possible in your response.
    If you're unsure or if the context doesn't provide enough information, just say "I don't know."
    Context:
    """
    prompt = "Who is the main character?"
     
    # Perform the RAG process: retrieve relevant chunks and generate a response
    response = rag_process(prompt, SYSTEM_PROMPT, 'peterpan')
    print("Generated Response:")
    print(response)
if __name__ == '__main__':
    main()
# Output when running the Python code
# Generated Response:
# Yes, Peter Pan is a fictional character.
# Peter Pan is "a boy who refuses to grow up" and is "a fictional character created by Scottish novelist and playwright J.M. Barrie." (Source: Page 1)
Source: https://medium.com/@jonathantan12/retrieval-augmented-generation-rag-with-ollama-llama3-2-and-chromadb-with-python-code-7a401335c069