Using Neo4j and OpenAI: Building a Knowledge Graph for RAG and Retrieving Data Efficiently

Published by alex on September 11, 2024

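In this article, I will walk you step by step through using Neo4j to build a knowledge graph for retrieval-augmented generation (RAG), and retrieving data from it through the OpenAI API.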


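The project proceeds in these steps: set up Neo4j AuraDB and OpenAI, collect the data, clean it and organize it into sections, convert it to JSON, split it into chunks, load the chunks into the knowledge graph, create entity nodes and relationships, and finally retrieve answers with GraphRAG or VectorRAG.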


Setting Up Your Environment


Setting Up Neo4j AuraDB

First, you need to create an account on Neo4j.


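Once the account is created, the next step is to set up a new Neo4j instance:

  1. Log in to your Neo4j dashboard.
  2. Follow the on-screen instructions to create a new instance. This instance will serve as the database for your knowledge graph.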




After the instance is created, configure it to suit your needs:




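  1. Click "Open" at the top right of the new instance.
  2. Set a username and password of your choice. These credentials are used to access the database.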


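To connect your application to the Neo4j database, update the .env file with your instance details:

  1. Open the .env file in your project directory.
  2. Add the following lines, replacing the placeholders with the values from your Neo4j instance: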


NEO4J_URI=Your Connection URI
NEO4J_USERNAME=Your Username
NEO4J_PASSWORD=Your Password
# The default database is named neo4j
NEO4J_DATABASE=neo4j


OpenAI

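In this project, we use OpenAI both for embeddings and for generating responses. Once you have an OpenAI API key, open the .env file and add it there. OPENAI_BASE_URL is the API base URL used when calling the embeddings endpoint.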


OPENAI_API_KEY=
OPENAI_BASE_URL='https://api.openai.com/v1'
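Later, the notebooks pass OPENAI_API_KEY and an OPENAI_ENDPOINT value to genai.vector.encode, but the article does not show how OPENAI_ENDPOINT is derived. A minimal sketch, assuming the .env file has already been loaded into the environment (for example with python-dotenv's load_dotenv()) and assuming the embeddings endpoint is OPENAI_BASE_URL followed by /embeddings; load_settings is a hypothetical helper:

```python
import os
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the Neo4j and OpenAI settings from environment variables.

    Hypothetical helper: assumes the .env values are already in `env`.
    """
    base_url = env.get("OPENAI_BASE_URL", "https://api.openai.com/v1").rstrip("/")
    return {
        "NEO4J_URI": env["NEO4J_URI"],
        "NEO4J_USERNAME": env["NEO4J_USERNAME"],
        "NEO4J_PASSWORD": env["NEO4J_PASSWORD"],
        # Fall back to the default database name when it is not set.
        "NEO4J_DATABASE": env.get("NEO4J_DATABASE", "neo4j"),
        "OPENAI_API_KEY": env["OPENAI_API_KEY"],
        # Assumption: the embeddings endpoint is the base URL plus /embeddings.
        "OPENAI_ENDPOINT": base_url + "/embeddings",
    }
```

Reading required keys with env[...] makes a missing setting fail immediately with a KeyError instead of surfacing later as an authentication error from Neo4j or OpenAI.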


1. Collect the Data (Data/Raw Data)

I saved three HTML pages from Wikipedia. You can use any pages or files, including PDFs or plain text files.


2. Clean the Data and Organize It into Sections (preprocessing.py)

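In this step, I clean the data and group it into sections such as General Information, Career and Death. But why prepare the data manually? When designing a knowledge graph, it is essential to understand all the relationships in the data. The process is similar to designing a SQL database: creating effective relationships between tables requires a thorough understanding of the data.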
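The article does not show preprocessing.py. As a hedged sketch of the mechanical part of the cleaning (the manual sectioning described above is not automated here), the code below uses Python's standard html.parser to keep only the paragraph text of a saved Wikipedia page and strip citation markers such as [12] or [b]. ParagraphExtractor, CITATION and clean_html are hypothetical names.

```python
import re
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> elements, ignoring everything else."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: list[str] = []
        self._depth = 0  # > 0 while the parser is inside a <p> element
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                text = "".join(self._buffer).strip()
                if text:
                    self.paragraphs.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


# Wikipedia-style citation markers: [1], [23], [b], [note 4]
CITATION = re.compile(r"\[(?:\d+|[a-z]|note \d+)\]")


def clean_html(html: str) -> str:
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    cleaned = (CITATION.sub("", p) for p in parser.paragraphs)
    # Collapse runs of whitespace left behind by tags and removed markers.
    return "\n".join(re.sub(r"\s+", " ", p).strip() for p in cleaned)
```

Headings, navigation and infobox tables are dropped because only <p> content is kept; the cleaned text still has to be split into sections by hand, as the author describes.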


3. Prepare the JSON Data (txt2json.py)

In this step, I take the preprocessed data and create a JSON file from it, which the chunking step reads.
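The chunking code reads each top-level JSON key as a section name and also reads a 'Source' key. A minimal sketch of writing a file in that shape, assuming the sections are already available as a dictionary; write_sections_json is a hypothetical stand-in for txt2json.py, not the author's script:

```python
import json
from pathlib import Path


def write_sections_json(sections: dict[str, str], source: str, path: str) -> None:
    """Write one JSON object: section name -> section text, plus a 'Source' key.

    Mirrors the layout that split_data_from_file expects, e.g. keys such as
    'General Information', 'Career' and 'Death' alongside 'Source'.
    """
    payload = dict(sections)
    payload["Source"] = source
    # ensure_ascii=False keeps accented names such as "Vendémiaire" readable.
    Path(path).write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
```

For example, write_sections_json({"General Information": ..., "Career": ..., "Death": ...}, "Napoleon History", "Napoleon.json") would produce a file whose chunks get source 'Napoleon History', matching the output shown later.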


4. Chunk the Data (chunking.py)

Here, I split the data with LangChain's RecursiveCharacterTextSplitter class:


import json
import os

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,      # maximum characters per chunk
    chunk_overlap=200,    # characters shared by consecutive chunks
    length_function=len,
    is_separator_regex=False,
)

def split_data_from_file(file):
    # Accumulate one record (chunk text + metadata) per chunk
    chunks_with_metadata = []
    # Load the JSON file produced by txt2json.py
    with open(file, encoding='utf-8') as f:
        file_as_object = json.load(f)
    # Section names are the keys; 'Source' is metadata, not a section to chunk
    keys = [key for key in file_as_object if key != 'Source']
    print(keys)
    # File name without directory or extension, e.g. "Napoleon"
    form_name = os.path.splitext(os.path.basename(file))[0]
    for item in keys:
        print(f'Processing {item} from {file}')
        # Grab the text of the section and split it into chunks
        item_text = file_as_object[item]
        item_text_chunks = text_splitter.split_text(item_text)
        chunk_seq_id = 0
        for chunk in item_text_chunks:
            # Create a record with metadata and the chunk text
            chunks_with_metadata.append({
                # metadata from looping...
                'text': chunk,
                'formItem': item,
                'chunkSeqId': chunk_seq_id,
                # constructed metadata...
                'chunkId': f'{form_name}-{item}-chunk{chunk_seq_id:04d}',
                'source': file_as_object['Source']
            })
            chunk_seq_id += 1
        print(f'\tSplit into {chunk_seq_id} chunks')
    return chunks_with_metadata
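RecursiveCharacterTextSplitter first tries to break on paragraphs, then lines, then words, before falling back to single characters, so its exact boundaries are hard to predict by hand. The simplified sliding-window chunker below is not LangChain's algorithm; it is a hypothetical illustration of what chunk_size and chunk_overlap mean: every chunk has at most chunk_size characters, and consecutive chunks share chunk_overlap characters.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: unlike RecursiveCharacterTextSplitter it
    ignores paragraph, line and word boundaries.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the settings above (2000 and 200), each window starts 1,800 characters after the previous one, so a 5,000-character section yields three overlapping chunks. The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks.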


What does split_data_from_file return?

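It returns a list of chunk records: each record holds one chunk's text plus metadata that later steps use. The fields are:

  • text: the actual text of the chunk.
  • formItem: the section name (the key in the JSON file).
  • chunkSeqId: the sequence ID of the chunk, starting at 0 and incremented for each chunk within a section.
  • chunkId: a unique ID for the chunk, built from the file name, section name and chunk sequence number (for example, file-section1-chunk0001).
  • source: the source of the text, taken from the Source field of the JSON file.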


5. Create Chunk Nodes in the Knowledge Graph

First, we upload the chunks from each file into the knowledge graph. This is handled in Napoleon.ipynb, Talleyrand.ipynb and Battle of Waterloo.ipynb.


Creating a Node for Each Chunk

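Before anything else, all chunks from the three files should be uploaded to the knowledge graph, which starts with creating a node for each chunk. Here we create Napoleon_Chunk nodes: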


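# Assumes `kg` is a langchain_community Neo4jGraph connected to your instance,
# and file_chunks = split_data_from_file(...) for the Napoleon JSON file.
# Create Napoleon_Chunk node and its properties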
merge_chunk_node_query = """
MERGE(mergedChunk:Napoleon_Chunk {chunkId: $chunkParam.chunkId})
    ON CREATE SET
        mergedChunk.text = $chunkParam.text, 
        mergedChunk.source = $chunkParam.source, 
        mergedChunk.formItem = $chunkParam.formItem, 
        mergedChunk.chunkSeqId = $chunkParam.chunkSeqId
RETURN mergedChunk
"""
kg.query(merge_chunk_node_query, 
         params={'chunkParam':file_chunks[0]})


Output:


[{'mergedChunk': {'formItem': 'General Information',
   'text': "Napoleon Bonaparte (born Napoleone di Buonaparte;[b] 15 August 1769 – 5 May 1821), later known by his regnal name Napoleon\xa0I, was a French military and political leader who rose to prominence during the French Revolution and led a series of successful campaigns across Europe during the Revolutionary Wars and Napoleonic Wars from 1796 to 1815. He was the leader of the French Republic as First Consul from 1799 to 1804, then of the French Empire as Emperor of the French from 1804 to 1814, and briefly again in 1815.\nBorn on the island of Corsica to a family of Italian origin, Napoleon moved to mainland France in 1779 and was commissioned as an officer in the French Army in 1785. He supported the French Revolution in 1789, and promoted its cause in Corsica. He rose rapidly in the ranks after breaking the siege of Toulon in 1793 and firing on royalist insurgents in Paris on 13 Vendémiaire in 1795. In 1796, Napoleon commanded a military campaign against the Austrians and their Italian allies in the War of the First Coalition, scoring decisive victories and becoming a national hero. He led an expedition to Egypt and Syria in 1798 which served as a springboard to political power. In November 1799, Napoleon engineered the Coup of 18 Brumaire against the Directory, and became First Consul of the Republic. He won the Battle of Marengo in 1800, which secured French victory in the War of the Second Coalition, and in 1803 sold the territory of Louisiana to the United States, which doubled the latter's area. In December 1804, Napoleon crowned himself Emperor of the French, further expanding his power.",
   'source': 'Napoleon History',
   'chunkId': 'Napoleon-General Information-chunk0000',
   'chunkSeqId': 0}}]


Next, we add a uniqueness constraint so that no two chunk nodes can share the same chunkId:


avoid_duplicate_chunks = """
CREATE CONSTRAINT unique_chunk IF NOT EXISTS
    FOR (nc:Napoleon_Chunk) REQUIRE nc.chunkId IS UNIQUE
"""
kg.query(avoid_duplicate_chunks)


Then we upload all the chunks to Neo4j:


node_count = 0
for chunk in file_chunks:
    print(f"Creating `:Napoleon_Chunk` node for chunk ID {chunk['chunkId']}")
    kg.query(merge_chunk_node_query,
            params={
                'chunkParam': chunk
            })
    node_count += 1
print(f"Created {node_count} nodes")


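A vector index is a data structure for storing vector embeddings and searching them efficiently; to run similarity search over the chunks, we need one. The index below covers the textEmbeddingOpenAI property of Napoleon_Chunk nodes: each chunk node stores its text embedding in that property, and the index makes those vectors searchable by cosine similarity. The vector.dimensions setting must match the length of the embeddings the OpenAI model returns (OpenAI's text-embedding-ada-002, for example, produces 1536-dimensional vectors).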


VectorIndex = """
         CREATE VECTOR INDEX `NapoleonOpenAI` IF NOT EXISTS
          FOR (nc:Napoleon_Chunk) ON (nc.textEmbeddingOpenAI)
          OPTIONS { indexConfig: {
            `vector.dimensions`: 1536,
            `vector.similarity_function`: 'cosine'
         }}
"""
kg.query(VectorIndex)
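The index is configured with 'cosine' as its similarity function. For intuition only, here is a brute-force Python sketch of cosine similarity and top-k ranking over chunk embeddings; cosine_similarity and top_k are hypothetical names, and a real vector index avoids scanning every stored vector the way this loop does.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension (1536 for this index)")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Rank chunk IDs by similarity to the query embedding (brute force)."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in chunks.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common choice for text embeddings.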


Next, we embed the text of each chunk and store the vector in textEmbeddingOpenAI. This query does that:


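# OPENAI_API_KEY and OPENAI_ENDPOINT come from the .env settings; OPENAI_ENDPOINT
# is assumed to be OPENAI_BASE_URL + "/embeddings". Chunks that already have
# an embedding are skipped by the IS NULL filter.
kg.query("""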
    MATCH (Napoleon_Chunk:Napoleon_Chunk) WHERE Napoleon_Chunk.textEmbeddingOpenAI IS NULL
    WITH Napoleon_Chunk, genai.vector.encode(
      Napoleon_Chunk.text,
      "OpenAI",
      {
        token: $openAiApiKey,
        endpoint: $openAiEndpoint
      }) AS vector
    CALL db.create.setNodeVectorProperty(Napoleon_Chunk, "textEmbeddingOpenAI", vector)
    """,
    params={"openAiApiKey":OPENAI_API_KEY, "openAiEndpoint": OPENAI_ENDPOINT} )


After embedding, we create NEXT relationships between chunks based on their section (formItem) and sequence ID (chunkSeqId).


First, let's retrieve the first chunk of each section:


cypher = """
  MATCH (from_same_chunk_item:Napoleon_Chunk)
  WHERE from_same_chunk_item.formItem = $NapoleonParam
  RETURN from_same_chunk_item {.text, .formItem, .chunkId, .chunkSeqId } as chunkItemInfo
  ORDER BY from_same_chunk_item.chunkSeqId ASC
  LIMIT 1
"""
items = ['General Information', 'Career', 'Death']
for item in items:
    result = kg.query(cypher, params={'NapoleonParam': item})
    print(result)


Output:


[{'chunkItemInfo': {'text': "Napoleon Bonaparte (born Napoleone di Buonaparte;[b] 15 August 1769 – 5 May 1821), later known by his regnal name Napoleon\xa0I, was a French military and political leader who rose to prominence during the French Revolution and led a series of successful campaigns across Europe during the Revolutionary Wars and Napoleonic Wars from 1796 to 1815. He was the leader of the French Republic as First Consul from 1799 to 1804, then of the French Empire as Emperor of the French from 1804 to 1814, and briefly again in 1815.\nBorn on the island of Corsica to a family of Italian origin, Napoleon moved to mainland France in 1779 and was commissioned as an officer in the French Army in 1785. He supported the French Revolution in 1789, and promoted its cause in Corsica. He rose rapidly in the ranks after breaking the siege of Toulon in 1793 and firing on royalist insurgents in Paris on 13 Vendémiaire in 1795. In 1796, Napoleon commanded a military campaign against the Austrians and their Italian allies in the War of the First Coalition, scoring decisive victories and becoming a national hero. He led an expedition to Egypt and Syria in 1798 which served as a springboard to political power. In November 1799, Napoleon engineered the Coup of 18 Brumaire against the Directory, and became First Consul of the Republic. He won the Battle of Marengo in 1800, which secured French victory in the War of the Second Coalition, and in 1803 sold the territory of Louisiana to the United States, which doubled the latter's area. In December 1804, Napoleon crowned himself Emperor of the French, further expanding his power.", 'formItem': 'General Information', 'chunkId': 'Napoleon-General Information-chunk0000', 'chunkSeqId': 0}}]
[{'chunkItemInfo': {'text': "Upon graduating in September 1785, Bonaparte was commissioned a second lieutenant in La Fère artillery regiment. He served in Valence and Auxonne until after the outbreak of the French Revolution in 1789, but spent long periods of leave in Corsica which fed his Corsican nationalism. In September 1789, he returned to Corsica and promoted the French revolutionary cause. Paoli returned to the island in July 1790, but he had no sympathy for Bonaparte, as he deemed his father a traitor for having deserted the cause of Corsican independence.\nBonaparte plunged into a complex three-way struggle among royalists, revolutionaries, and Corsican nationalists. He became a supporter of the Jacobins and joined the pro-French Corsican Republicans who opposed Paoli's policy and his aspirations to secede. He was given command over a battalion of Corsican volunteers and promoted to captain in the regular army in 1792, despite exceeding his leave of absence and a dispute between his volunteers and the French garrison in Ajaccio.\nIn February 1793, Bonaparte took part in the failed French expedition to Sardinia. Following allegations that Paoli had sabotaged the expedition and that his regime was corrupt and incompetent, the French National Convention outlawed him. In early June, Bonaparte and 400 French troops failed to capture Ajaccio from Corsican volunteers and the island was now controlled by Paoli's supporters. When Bonaparte learned that the Corsican assembly had condemned him and his family, the Buonapartes fled to Toulon on the French mainland.", 'formItem': 'Career', 'chunkId': 'Napoleon-Career-chunk0000', 'chunkSeqId': 0}}]
[{'chunkItemInfo': {'text': 'Napoleon\'s health continued to worsen, and in March 1821 he was confined to bed. In April he wrote two wills declaring that he had been murdered by the British, that the Bourbons would fall and that his son would rule France. He left his fortune to 97 legatees and asked to be buried by the Seine.\nOn 3 May he was given the last rites but could not take communion due to his illness. He died on 5 May 1821 at age 51. His last words, variously recorded by those present, were either France, l\'armée, tête d\'armée, Joséphine ("France, the army, head of the army, Joséphine"), or qui recule...à la tête d\'armée ("who retreats... at the head of the army") or "France, my son, the Army."\nAntommarchi and the British wrote separate autopsy reports, each concluding that Napoleon had died of internal bleeding caused by stomach cancer, the disease that had killed his father. A later theory, based on high concentrations of arsenic found in samples of Napoleon\'s hair, held that Napoleon had died of arsenic poisoning. However, subsequent studies also found high concentrations of arsenic in hair samples from Napoleon\'s childhood and from his son and Joséphine. Arsenic was widely used in medicines and products such as hair creams in the 19th century. A 2021 study by an international team of gastrointestinal pathologists once again concluded that Napoleon died of stomach cancer.', 'formItem': 'Death', 'chunkId': 'Napoleon-Death-chunk0000', 'chunkSeqId': 0}}]


Then we collect the chunks of each section in chunkSeqId order and connect consecutive chunks with NEXT relationships:


cypher = """
  MATCH (from_same_chunk_item:Napoleon_Chunk)
  WHERE from_same_chunk_item.formItem = $NapoleonParam
  WITH from_same_chunk_item
    ORDER BY from_same_chunk_item.chunkSeqId ASC
  WITH collect(from_same_chunk_item) as section_chunk_list
    CALL apoc.nodes.link(
        section_chunk_list,
        "NEXT",
        {avoidDuplicates: true}
    )
  RETURN size(section_chunk_list)
"""
items = ['General Information', 'Career', 'Death']
for item in items:
    result = kg.query(cypher, params={'NapoleonParam': item})
    print(f"for {item}: {result}")


Output:


for General Information: [{'size(section_chunk_list)': 18}]
for Career: [{'size(section_chunk_list)': 34}]
for Death: [{'size(section_chunk_list)': 8}]
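Conceptually, apoc.nodes.link(list, "NEXT") creates a relationship from each node in the ordered list to the node after it. A pure-Python sketch of which pairs end up linked, assuming chunk records shaped like the split_data_from_file output; next_pairs is a hypothetical helper used only for illustration:

```python
from itertools import groupby
from operator import itemgetter


def next_pairs(chunks: list[dict]) -> list[tuple[str, str]]:
    """Return (from_chunkId, to_chunkId) pairs for NEXT relationships.

    Chunks are grouped by section (formItem), ordered by chunkSeqId, and each
    chunk is linked only to its successor within the same section.
    """
    ordered = sorted(chunks, key=itemgetter("formItem", "chunkSeqId"))
    pairs = []
    for _, section in groupby(ordered, key=itemgetter("formItem")):
        ids = [c["chunkId"] for c in section]
        pairs.extend(zip(ids, ids[1:]))  # n chunks -> n - 1 links
    return pairs
```

A section with n chunks therefore gets n - 1 NEXT relationships; for the sizes printed above (18, 34 and 8 chunks) that is 17, 33 and 7 relationships.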


The chunk layer is now in place. Next, we create the entity nodes and relationships in Neo4j.


6. Create Nodes and Relationships (KnowledgeGraph/Nodes_and_Relationships.ipynb)

Before writing any code, we should define the structure of the knowledge graph based on the data. In our case:

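We have three HTML files: two are about Napoleon and Talleyrand, who are people, and the third is about the Battle of Waterloo, which we treat as an event. Each main node is connected to child nodes (one per section), and each child node has one or more properties.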


For example:

We define a Person node for Napoleon and one for Talleyrand, and an Event node for Waterloo. Each node has its relevant properties.


CREATE (napoleon:Person {
    name: "Napoleon Bonaparte"
})
CREATE (talleyrand:Person {
    name: "Charles-Maurice de Talleyrand"
})
CREATE (waterloo:Event {
    name: "Battle of Waterloo"
})


Defining child nodes:

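Each section of the preprocessed files corresponds to a child node. For example, each person has three child nodes: General Information, Career and Death.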


CREATE (napoleonGeneral:General_info {
    chunk_info: "General Information",
    birthDate: "1769-08-15",
    deathDate: "1821-05-05",
    nationality: "French",
    knownFor: "Military and political leader"
})
CREATE (talleyrandGeneral:General_info {
    chunk_info: "General Information",
    birthDate: "1754-02-02",
    deathDate: "1838-05-17",
    nationality: "French",
    knownFor: "Diplomat and statesman"
})
CREATE (napoleonCareer:Career {
    position: "Emperor",
    period: "1804-1814",
    chunk_info: "Career"
})
CREATE (talleyrandCareer:Career {
    position: "Foreign Minister",
    period: "1799-1807",
    chunk_info: "Career"
})
CREATE (napoleonDeath:Death {
    date: "1821-05-05",
    location: "Longwood, Saint Helena",
    chunk_info: "Death"
})
CREATE (talleyrandDeath:Death {
    date: "1838-05-17",
    location: "Paris, France",
    chunk_info: "Death"
})


Creating relationships between nodes:

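We need to define the relationships between each main node and its child nodes, as well as between the main nodes themselves and the event node. These clauses reuse the variables (napoleon, napoleonCareer and so on) bound by the CREATE clauses above, so they must run in the same Cypher statement.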



// Create relationships from each person to their career, death and general information
CREATE (napoleon)-[:HAS_Career_INFO]->(napoleonCareer)
CREATE (napoleon)-[:HAS_Death_INFO]->(napoleonDeath)
CREATE (napoleon)-[:HAS_General_INFO]->(napoleonGeneral)
CREATE (talleyrand)-[:HAS_Career_INFO]->(talleyrandCareer)
CREATE (talleyrand)-[:HAS_Death_INFO]->(talleyrandDeath)
CREATE (talleyrand)-[:HAS_General_INFO]->(talleyrandGeneral)
// Create relationships between Person nodes
CREATE (napoleon)-[:RELATED_TO]->(talleyrand)
CREATE (talleyrand)-[:RELATED_TO]->(napoleon)
// Create relationships between Person nodes and Event
CREATE (napoleon)-[:RELATED_TO]->(waterloo)
CREATE (talleyrand)-[:RELATED_TO]->(waterloo)


Connecting nodes to chunk nodes:

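Finally, we connect the nodes to their related chunk nodes. For example, we match Napoleon's Career node against the chunk nodes and, wherever the node's chunk_info equals the chunk's formItem, create a HAS_Chunk_INFO relationship between them.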


MATCH (napoleonCareer:Career {position: "Emperor"}), (careerChunks:Napoleon_Chunk)
WHERE napoleonCareer.chunk_info = careerChunks.formItem
WITH napoleonCareer, careerChunks
MERGE (napoleonCareer)-[r:HAS_Chunk_INFO]->(careerChunks)
RETURN count(r)
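The query above handles one node (Napoleon's Career). The same rule has to be applied to every section node of every person, each against its own chunk label. Because Cypher does not accept labels as $parameters, one option is to format the labels into the query text and pass the person's name as a parameter. This is a hedged sketch: link_sections_to_chunks_query is a hypothetical helper, and it assumes each person's section nodes are reached through an outgoing relationship from their Person node, as in the CREATE statements above.

```python
import re

# Labels are formatted into the query text, so allow only plain identifiers
# to avoid Cypher injection through a label argument.
LABEL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def link_sections_to_chunks_query(section_label: str, chunk_label: str) -> str:
    """Build a query linking one person's section nodes to matching chunks."""
    for label in (section_label, chunk_label):
        if not LABEL.fullmatch(label):
            raise ValueError(f"unsafe label: {label!r}")
    return f"""
MATCH (p:Person {{name: $personName}})-->(section:{section_label})
MATCH (chunk:{chunk_label})
WHERE chunk.formItem = section.chunk_info
MERGE (section)-[r:HAS_Chunk_INFO]->(chunk)
RETURN count(r) AS links
"""
```

A possible call would be kg.query(link_sections_to_chunks_query("Career", "Napoleon_Chunk"), params={"personName": "Napoleon Bonaparte"}), repeated for General_info and Death and for each person's chunk label. Filtering through the Person node keeps Talleyrand's Career node from being linked to Napoleon's chunks.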


7. Retrieving Answers

Now that all the data is in place, it is time to generate responses to user queries. We can do this with two different approaches: GraphRAG and VectorRAG.


7.1 Retrieval with GraphRAG (graphRAG_generation.py)

The core of this step is prompt engineering:


retrieval_qa_chat_prompt = """
Task: Generate Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema. Do not use any other relationship types or properties that are not provided.
Remember the relationships are like Schema:
{schema}
If the question mentions 'Talleyrand', it refers to Charles-Maurice de Talleyrand. If it mentions 'Napoleon', it refers to Napoleon Bonaparte, and if it mentions 'Waterloo', it refers to the Battle of Waterloo.
Note: Do not include any explanations or apologies in your responses. Do not include any text except the generated Cypher statement. Remember to correct any typos in names.
Example 1: What was the story of Napoleon in the Battle of Waterloo?
MATCH (Napoleon:Person)-[:RELATED_TO]->(waterloo:Event)-[:HAS_General_INFO]->(info:General_info)-[:HAS_Chunk_INFO]->(ChunkInfo:Waterloo_Chunk)
RETURN Napoleon, waterloo, info, ChunkInfo.text
Example 2: What was the story of the Battle of Waterloo?
MATCH (waterloo:Event)-[:HAS_General_INFO]->(info:General_info)-[:HAS_Chunk_INFO]->(ChunkInfo:Waterloo_Chunk)
RETURN waterloo, info, ChunkInfo.text
Example 3: Tell me about Talleyrand and Napoleon in 5 lines.
MATCH (Talleyrand:Person)-[:RELATED_TO]->(Napoleon:Person)-[:HAS_Career_INFO]->(info:Career)-[:HAS_Chunk_INFO]->(ChunkInfo:Napoleon_Chunk)
RETURN Talleyrand, Napoleon
The question is:
{question}
"""


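Here I combine role prompting with few-shot examples. The examples show the model how the relationships between nodes are defined. With the schema alone, the model can see which relationships exist, but it struggled to write queries that traverse several nodes until we added the few-shot examples.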
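The {schema} and {question} placeholders are filled in by PromptTemplate, whose default "f-string" template format behaves like Python's str.format. One consequence: if you add a few-shot example that contains a Cypher map literal such as {name: ...}, its braces must be doubled or the template will treat them as a placeholder. The sketch below uses plain str.format and a shortened, hypothetical template to show this.

```python
# Abbreviated stand-in for retrieval_qa_chat_prompt. Literal Cypher braces are
# doubled ({{ }}) so that only {schema} and {question} are substituted.
template = """Task: Generate Cypher statement to query a graph database.
Schema:
{schema}
Example: What was Napoleon's career?
MATCH (p:Person {{name: "Napoleon Bonaparte"}})-[:HAS_Career_INFO]->(c:Career)
RETURN c
The question is:
{question}
"""

schema = "(:Person)-[:HAS_Career_INFO]->(:Career)"
prompt = template.format(schema=schema, question="Who was Talleyrand?")
print(prompt)
```

The few-shot examples in the prompt above avoid map literals, which is why they need no escaping.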


The answer is generated with GraphCypherQAChain:


import textwrap

from langchain.chains import GraphCypherQAChain
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# `graph` is assumed to be a Neo4jGraph connected to your instance, and
# retrieval_qa_chat_prompt is the prompt string defined above.

class GraphRAG:
    def __init__(self):
        # The prompt template guides Cypher generation; it takes the graph
        # schema and the user question as input variables.
        self.cypher_prompt = PromptTemplate(
            input_variables=["schema", "question"],
            template=retrieval_qa_chat_prompt
        )

        # GraphCypherQAChain asks the LLM for a Cypher query, runs it against
        # `graph`, then asks the LLM to answer from the query results.
        self.cypher_chain = GraphCypherQAChain.from_llm(
            ChatOpenAI(temperature=0),  # temperature 0 for (near-)deterministic output
            graph=graph,                # the Neo4j graph object to query
            verbose=True,               # log the generated Cypher and retrieved context
            cypher_prompt=self.cypher_prompt,
            # Recent LangChain releases also require allow_dangerous_requests=True here.
        )

    def generate_cypher_query(self, question: str) -> str:
        # Despite its name, this returns the chain's final natural-language
        # answer; the generated Cypher appears in the verbose log.
        response = self.cypher_chain.run(question)

        # Wrap the answer at 60 characters per line for readability.
        return textwrap.fill(response, 60)


7.2 Retrieval with VectorRAG (vectorRAG_generation.py)

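In this step, we build a chain that retrieves data. We initialize the vector store with Neo4jVector.from_existing_graph(), passing the environment variables and VECTOR_NODE_LABEL. Then we define the prompt and the LLM, and finally create the retrieval chain with create_retrieval_chain. VECTOR_NODE_LABEL is defined in neo4j_env.py and should be set to the label of the nodes whose vectors we want to search (for example, Napoleon_Chunk).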


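import textwrap

from langchain import hub
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.vectorstores import Neo4jVector
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Connection settings and vector configuration, assumed to live in neo4j_env.py
from neo4j_env import (NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD, VECTOR_INDEX_NAME,
                       VECTOR_NODE_LABEL, VECTOR_SOURCE_PROPERTY, VECTOR_EMBEDDING_PROPERTY)

class VectorRAG: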
    def __init__(self):
        self.vector_store = Neo4jVector.from_existing_graph(
            embedding=OpenAIEmbeddings(),  # Use OpenAI embeddings for vectorization
            url=NEO4J_URI,                 # URI for the Neo4j instance
            username=NEO4J_USERNAME,       # Username for authenticating with Neo4j
            password=NEO4J_PASSWORD,       # Password for authenticating with Neo4j
            index_name=VECTOR_INDEX_NAME,  # The name of the vector index in Neo4j
            node_label=VECTOR_NODE_LABEL,  # The label used for the nodes in the graph
            text_node_properties=[VECTOR_SOURCE_PROPERTY],  # Text properties of the nodes that will be embedded
            embedding_node_property=VECTOR_EMBEDDING_PROPERTY,  # Property that stores the embedding vectors on the nodes
        )
        self.retrieval_qa_chat_prompt = hub.pull("langchain-ai/retrieval-qa-chat")
        self.combine_docs_chain = create_stuff_documents_chain(ChatOpenAI(temperature=0), self.retrieval_qa_chat_prompt)
        self.retrieval_chain = create_retrieval_chain(
            retriever=self.vector_store.as_retriever(),  # Use the vector store as a retriever for similarity search
            combine_docs_chain=self.combine_docs_chain  # Combine the retrieved documents using the QA chat chain
        )
    def query(self, question: str) -> str:
        result = self.retrieval_chain.invoke(input={"input": question})
        return textwrap.fill(result['answer'], 60)
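create_stuff_documents_chain "stuffs" the retrieved documents into the prompt as context before calling the LLM. The plain-Python sketch below shows that idea; it is not LangChain's implementation, and stuff_context, its instruction text and the max_chars budget are hypothetical choices made for illustration.

```python
def stuff_context(chunk_texts: list[str], question: str, max_chars: int = 6000) -> str:
    """Concatenate retrieved chunk texts into one prompt ("stuffing").

    Chunks are added in ranked order until max_chars would be exceeded, a
    crude guard against overflowing the model's context window.
    """
    selected, used = [], 0
    for text in chunk_texts:
        if used + len(text) > max_chars:
            break
        selected.append(text)
        used += len(text)
    context = "\n\n".join(selected)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

With chunks of up to 2,000 characters, a 6,000-character budget fits about three chunks. Stuffing is simple and keeps every retrieved chunk intact, but it only works while the retrieved text fits in the model's context window.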


7.3 Generation (main.py)

This file brings the GraphRAG and VectorRAG functionality together:


from graphRAG_generation import GraphRAG
from vectorRAG_generation import VectorRAG

def query_graph_or_vector_rag(use_graph: bool, question: str) -> str:
    if use_graph:
        # Use GraphRAG: the chain generates Cypher, runs it, and answers from the results
        query_generator = GraphRAG()
        answer = query_generator.generate_cypher_query(question)
        return f"Answer (GraphRAG): {answer}"
    else:
        # Use VectorRAG: similarity search over the chunk embeddings
        retrieval_qa = VectorRAG()
        answer = retrieval_qa.query(question)
        return f"Answer (VectorRAG): {answer}"

# Example query
question = "Who was leading the Battle of Waterloo?"
# True -> GraphRAG, False -> VectorRAG
result_relationship = query_graph_or_vector_rag(True, question)
print(result_relationship)


Conclusion

In this article, I showed how to build a knowledge graph with Neo4j and retrieve information from it with OpenAI.


Source: https://medium.com/@homayoun.srp/building-a-knowledge-graph-for-rag-using-neo4j-e69d3441d843