Implementing Advanced RAG Using LlamaIndex Workflows and Groq

Published September 19, 2024 by alex

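Introduction

AI applications consist of many tasks carried out by different components. To simplify the work of AI engineers, open-source frameworks provide user-friendly abstractions for essential building blocks such as data loaders, large language models (LLMs), vector databases, and rerankers, and extend to external services. These frameworks are also exploring the most effective ways to orchestrate these components, focusing on what lets developers build cohesive AI systems most intuitively and efficiently.


The orchestration approaches under consideration include chains and pipelines, both of which are based on the directed acyclic graph (DAG) model.


Earlier this year (2024), LlamaIndex introduced Query Pipelines, a declarative API designed to orchestrate a variety of query workflows for applications such as question answering, structured data extraction, and process automation. However, when the LlamaIndex team tried to extend it with cycles to support more complex workflows, they ran into several challenges. This prompted them to reconsider how well the DAG model suits agentic settings and to explore alternatives within the framework.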


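What Are the Limitations of Graph-Based Frameworks?

Graph/DAG-based approaches such as LangGraph (and LlamaIndex's earlier Query Pipelines) have to do a lot of heavy lifting under the hood to determine "what runs next, what the inputs are," and so on. All of this logic brings a large number of edge cases, which became very apparent in the increasingly complex Query Pipelines codebase.


In a directed acyclic graph (DAG), acyclicity means there are no loops, which can be limiting in increasingly agentic AI applications. Developers need to be able to implement self-correction mechanisms when a component produces a poor result. Even without cycles, Query Pipelines faced several challenges:

  • Difficult debugging: Pinpointing problems becomes very cumbersome.
  • Opaque execution: The execution flow of components is not transparent.
  • Complex orchestration: The orchestrator becomes overly complex and has to manage a large number of edge cases.
  • Readability issues: Complex pipelines are hard to understand.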


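When cycles were added to Query Pipelines, these developer-experience problems became more severe. The main pain points were:

  • Cumbersome logic: Core orchestration logic, such as if-else and while loops, clutters the edges of the graph and makes it verbose.
  • Edge-case handling: Managing optional and default values becomes a problem, because it is unclear whether a parameter will be passed from an upstream node.
  • Unnatural flow: For developers building agent-based systems, defining a graph with cycles feels unnatural. Requiring explicit incoming and outgoing edges for an "agent" node leads to verbose communication patterns with other nodes.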
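
To make the contrast concrete, here is a minimal sketch of self-correction written as ordinary Python control flow. The `generate` and `is_acceptable` callables are hypothetical placeholders, not part of LlamaIndex or LangGraph. The point is only that a retry is a plain `for` loop in code, whereas a DAG has no back edge with which to express it.

```python
from typing import Callable, Optional


def self_correct(
    generate: Callable[[int], str],
    is_acceptable: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Call `generate` until its output passes `is_acceptable`.

    `generate` receives the attempt number (starting at 1). Returns the first
    acceptable output, or None if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        candidate = generate(attempt)
        if is_acceptable(candidate):
            return candidate
    return None


# Hypothetical component: the first two attempts produce an empty answer.
def flaky_generate(attempt: int) -> str:
    return "" if attempt < 3 else "final answer"


result = self_correct(flaky_generate, is_acceptable=lambda text: bool(text))
print(result)
```

Bounding the loop with `max_attempts` is a design choice of this sketch: it keeps a component that never recovers from looping forever.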


To overcome these problems, LlamaIndex proposed a new design paradigm: Workflows.


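What Is a Workflow?

A workflow is an event-driven, step-based way of controlling the execution flow of an application. The application is divided into parts called steps, which are triggered by events.


Steps are literally Python functions. They can be a single line of code or many lines of complex code.


An event is literally a change of state, or anything we can notice, observe, or record.


By combining steps and events, we can create arbitrarily complex flows that encapsulate logic and make the application easier to maintain and understand.


In short, workflows (and event-driven programming in general) offer a more robust solution. The code under the hood is much simpler, because it is now entirely up to the user to decide what runs next and how to debug it, and this removes much of the "black box" problem of other approaches (such as Query Pipelines).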
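
The idea of steps triggered by events can be illustrated without any framework. The following is a toy model using only the standard library. The `Event` classes, the `HANDLERS` table, and the `run` dispatcher are all hypothetical names invented for this sketch; it is not how LlamaIndex implements workflows internally.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, Optional, Type


@dataclass
class Event:
    """Base class for everything that flows between steps."""


@dataclass
class Start(Event):
    name: str


@dataclass
class Greeted(Event):
    text: str


@dataclass
class Stop(Event):
    result: str


# A step is an async function that takes one event and returns the next one.
Step = Callable[[Event], Awaitable[Optional[Event]]]


async def greet(ev: Start) -> Greeted:
    return Greeted(text=f"Hello, {ev.name}")


async def shout(ev: Greeted) -> Stop:
    return Stop(result=ev.text.upper() + "!")


# Which step reacts to which event type.
HANDLERS: Dict[Type[Event], Step] = {Start: greet, Greeted: shout}


async def run(first: Event) -> str:
    """Dispatch events to their steps until a Stop event is produced."""
    event: Optional[Event] = first
    while event is not None and not isinstance(event, Stop):
        event = await HANDLERS[type(event)](event)
    if event is None:
        raise RuntimeError("workflow ended without a Stop event")
    return event.result


result = asyncio.run(run(Start(name="workflow")))
print(result)
```

Each step only knows which event it consumes and which event it emits. Adding a branch or a loop means returning a different event type, not rewiring a graph.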


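What Is the Value Proposition of LlamaIndex Workflows Compared with LangGraph?

LlamaIndex Workflows come with a number of distinctive features and capabilities that set them apart from LangGraph:

1. Data connectors: Tools for ingesting data from its native sources and formats, such as APIs, PDFs, and SQL databases.

2. Data indexes: Structure data into intermediate representations that are easy for LLMs to consume and that perform well.

3. Engines: Different types of engines for natural-language access to data:

  • Query engines: For question-answering interfaces (e.g., RAG pipelines).
  • Chat engines: Conversational interfaces for multi-message interactions.

4. Agents: LLM-powered knowledge workers augmented by tools, ranging from simple helper functions to API integrations. The ReActAgent implementation lets you define a set of tools, which can be Python functions or LlamaIndex query engines, for automated reasoning over data.

5. Observability/evaluation: Integrations for rigorously experimenting with, evaluating, and monitoring applications.

6. LlamaCloud: Managed services for data parsing, ingestion, indexing, and retrieval, including LlamaParse, billed as a state-of-the-art document parsing solution.

7. Community and ecosystem: A strong community and related projects, such as LlamaHub for custom data connectors and create-llama for quickly scaffolding projects.

8. Integration flexibility: Supports both starter and custom builds. Users can start with the llama-index package for a quick setup, or use llama-index-core and add specific integrations from LlamaHub, which offers more than 300 integration packages.

9. Advanced retrieval/query interface: Provides a high-level interface that takes an LLM input prompt and returns retrieved context and a knowledge-augmented output.

10. Accessible to beginners and advanced users: High-level APIs let beginners ingest and query data in a few lines of code, while lower-level APIs let advanced users customize and extend modules.

11. Agentic components: Core modules are capable of automated reasoning over data for different use cases, which essentially makes them agents. Examples include SubQuestionQueryEngine for multi-document analysis, query transformations, routing, and LLM reranking.

12. Native OpenAIAgent: Includes an OpenAIAgent implementation built on the OpenAI API for function calling, which allows agents to be developed quickly.

13. Integration with other frameworks: Can be used as a tool within other agent frameworks such as LangChain and ChatGPT, providing deep integration and additional functionality.


Together, these features make LlamaIndex a comprehensive framework for building context-augmented generative AI applications with LLMs.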


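Designing the Workflow

Here we will implement an advanced RAG system using LlamaIndex Workflows:

  1. Index the data and create an index
  2. Use the index and the query to retrieve relevant text chunks
  3. Rerank the retrieved text chunks against the original query
  4. Synthesize the final response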
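
Before bringing in any libraries, the four stages above can be sketched as a toy pipeline. Everything here is an assumption made for illustration: token overlap stands in for embeddings, an overlap ratio stands in for the LLM reranker, and string joining stands in for LLM synthesis. None of it reflects how LlamaIndex scores or ranks nodes.

```python
import re
from typing import List, Set, Tuple

Index = List[Tuple[str, Set[str]]]


def tokenize(text: str) -> Set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def build_index(chunks: List[str]) -> Index:
    """Stage 1: 'index' each chunk by precomputing its token set."""
    return [(chunk, tokenize(chunk)) for chunk in chunks]


def retrieve(index: Index, query: str, top_k: int) -> List[str]:
    """Stage 2: keep the top_k chunks sharing the most tokens with the query."""
    q = tokenize(query)
    ranked = sorted(index, key=lambda item: len(q & item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def rerank(chunks: List[str], query: str, top_n: int) -> List[str]:
    """Stage 3: re-score with a different signal (overlap ratio); keep top_n."""
    q = tokenize(query)

    def score(chunk: str) -> float:
        tokens = tokenize(chunk)
        return len(q & tokens) / len(tokens) if tokens else 0.0

    return sorted(chunks, key=score, reverse=True)[:top_n]


def synthesize(chunks: List[str], query: str) -> str:
    """Stage 4: stand-in for the LLM call; just joins the supporting chunks."""
    return f"Q: {query} | context: " + " / ".join(chunks)


chunks = [
    "Pain is common and fibromyalgia is one of many causes of chronic fatigue in adults.",
    "Fibromyalgia is a chronic pain syndrome.",
    "Bananas are rich in potassium.",
]
query = "What is fibromyalgia pain?"
hits = retrieve(build_index(chunks), query, top_k=2)
reranked = rerank(hits, query, top_n=3)
print(len(hits), len(reranked))
print(synthesize(reranked, query))
```

Note that a reranker cannot return more items than it receives: with `top_k=2` and `top_n=3`, at most two chunks survive. This is consistent with the run later in this article, which retrieves two nodes and reports two reranked nodes even though `top_n=3`.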


[Figure: workflow design]


Technology Stack Used for the Implementation


[Figure: technology stack]


Code Implementation

1. Install the required dependencies


pip install -qU llama-index 
pip install -qU llama-index-llms-groq
pip install -qU llama-index-embeddings-huggingface
pip install -qU llama-index-utils-workflow
pip install python-dotenv
pip install pyvis


2. Create a .env file to store the API key


GROQ_API_KEY = <your api key>


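3. Load the GROQ_API_KEY


import os
from dotenv import load_dotenv

load_dotenv()  # load environment variables from .env
# The steps below create Groq(...) without an explicit key, so it must be in the environment
if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("GROQ_API_KEY is not set; add it to your .env file")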


4. Import the required dependencies


from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.schema import NodeWithScore
from llama_index.core.response_synthesizers import CompactAndRefine
from llama_index.core.postprocessor.llm_rerank import LLMRerank
from llama_index.core.workflow import (
    Context,
    Workflow,
    StartEvent,
    StopEvent,
    step,
    Event
)
from llama_index.core.workflow.utils import get_steps_from_class, get_steps_from_instance
from llama_index.llms.groq import Groq
from llama_index.embeddings.huggingface import HuggingFaceEmbedding


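5. Define the workflow events


To connect these steps, we need to define two events:

  1. An event that passes the retrieved nodes to the reranker
  2. An event that passes the reranked nodes to the synthesizer


The other steps use the built-in StartEvent and StopEvent.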


class RetrieverEvent(Event):
    """Result of running retrieval"""

    nodes: list[NodeWithScore]


class RerankEvent(Event):
    """Result of running reranking on retrieved nodes"""

    nodes: list[NodeWithScore]


6. Define the workflow


class RAGWorkflow(Workflow):
    @step
    async def ingest(self, ctx: Context, ev: StartEvent) -> StopEvent | None:
        """Entry point to ingest documents, triggered by a StartEvent with `dirname`."""
        dirname = ev.get("dirname")
        if not dirname:
            return None
        documents = SimpleDirectoryReader(dirname).load_data()
        index = VectorStoreIndex.from_documents(
            documents=documents,
            embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
        )
        return StopEvent(result=index)

    @step
    async def retrieve(
        self, ctx: Context, ev: StartEvent
    ) -> RetrieverEvent | None:
        """Entry point for RAG, triggered by a StartEvent with `query` and `index`."""
        query = ev.get("query")
        index = ev.get("index")
        if not query:
            return None
        print(f"Query the database with: {query}")
        # Store the query in the workflow context so later steps can read it
        await ctx.set("query", query)
        # The index is passed in on the StartEvent; bail out if it is missing
        if index is None:
            print("Index is empty, load some documents before querying!")
            return None
        retriever = index.as_retriever(similarity_top_k=2)
        nodes = await retriever.aretrieve(query)
        print(f"Retrieved {len(nodes)} nodes.")
        return RetrieverEvent(nodes=nodes)

    @step
    async def rerank(self, ctx: Context, ev: RetrieverEvent) -> RerankEvent:
        """Rerank the retrieved nodes against the original query with an LLM."""
        ranker = LLMRerank(
            choice_batch_size=5,
            top_n=3,
            llm=Groq(model="llama-3.1-70b-versatile"),
        )
        query = await ctx.get("query", default=None)
        print(query, flush=True)
        new_nodes = ranker.postprocess_nodes(ev.nodes, query_str=query)
        print(f"Reranked nodes to {len(new_nodes)}")
        print(new_nodes)
        return RerankEvent(nodes=new_nodes)

    @step
    async def synthesize(self, ctx: Context, ev: RerankEvent) -> StopEvent:
        """Return a streaming response using reranked nodes."""
        llm = Groq(model="llama-3.1-70b-versatile")
        summarizer = CompactAndRefine(llm=llm, streaming=True, verbose=True)
        query = await ctx.get("query", default=None)
        response = await summarizer.asynthesize(query, nodes=ev.nodes)
        return StopEvent(result=response)


7. Verify that the steps are registered correctly


# Check that each step has a __step_config attribute
workflow = RAGWorkflow()
steps = get_steps_from_class(RAGWorkflow)
if not steps:
    steps = get_steps_from_instance(workflow)
print(f"steps class :{steps}")
for step_name, step_func in steps.items():
    step_config = getattr(step_func, "__step_config", None)
    print(f"step config :{step_config}")
    if step_config is None:
        print(f"Step {step_name} is missing __step_config")


Output


steps class :{'_done': <function Workflow._done at 0x000001BD6E5F3880>, 'ingest': <function RAGWorkflow.ingest at 0x000001BD07DEAB60>, 'rerank': <function RAGWorkflow.rerank at 0x000001BD07DEA160>, 'retrieve': <function RAGWorkflow.retrieve at 0x000001BD07DEA5C0>, 'synthesize': <function RAGWorkflow.synthesize at 0x000001BD07DEA0C0>}
step config :accepted_events=[<class 'llama_index.core.workflow.events.StopEvent'>] event_name='ev' return_types=[<class 'NoneType'>] context_parameter='ctx' num_workers=1 requested_services=[]
step config :accepted_events=[<class 'llama_index.core.workflow.events.StartEvent'>] event_name='ev' return_types=[<class 'llama_index.core.workflow.events.StopEvent'>] context_parameter='ctx' num_workers=1 requested_services=[]
step config :accepted_events=[<class '__main__.RetrieverEvent'>] event_name='ev' return_types=[<class '__main__.RerankEvent'>] context_parameter='ctx' num_workers=1 requested_services=[]
step config :accepted_events=[<class 'llama_index.core.workflow.events.StartEvent'>] event_name='ev' return_types=[<class '__main__.RetrieverEvent'>] context_parameter='ctx' num_workers=1 requested_services=[]
step config :accepted_events=[<class '__main__.RerankEvent'>] event_name='ev' return_types=[<class 'llama_index.core.workflow.events.StopEvent'>] context_parameter='ctx' num_workers=1 requested_services=[]
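
The `accepted_events` and `return_types` fields in this output mirror each step's type annotations. As a rough illustration of how such metadata can be derived, the sketch below reads annotations with `typing.get_type_hints` and turns them into graph edges. `toy_step`, `ToyStepConfig`, and the `Toy*` event classes are hypothetical names for this example only; this is not the code behind LlamaIndex's `@step` decorator.

```python
import types
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Union, get_args, get_origin, get_type_hints

# typing.Union covers Optional[X]; types.UnionType covers `X | None` on Python 3.10+.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


@dataclass
class ToyStepConfig:
    accepted_events: List[type]
    return_types: List[type]


def _flatten(annotation: Any) -> List[type]:
    """Expand Optional[X] / Union[X, Y] into a flat list of classes."""
    if get_origin(annotation) in _UNION_ORIGINS:
        return list(get_args(annotation))
    return [annotation]


def toy_step(func: Callable) -> Callable:
    """Attach a ToyStepConfig built from the function's type annotations."""
    hints = get_type_hints(func)
    returns = _flatten(hints.pop("return", type(None)))
    # Every remaining annotated parameter is treated as an accepted event.
    accepted = [t for hint in hints.values() for t in _flatten(hint)]
    func.toy_step_config = ToyStepConfig(accepted, returns)
    return func


class ToyStart: ...
class ToyRetrieved: ...
class ToyStop: ...


@toy_step
def retrieve(ev: ToyStart) -> Optional[ToyRetrieved]:
    return None


@toy_step
def synthesize(ev: ToyRetrieved) -> ToyStop:
    return ToyStop()


# Build (source, target) edges: event -> step, and step -> returned event.
edges = []
for fn in (retrieve, synthesize):
    cfg = fn.toy_step_config
    edges += [(ev.__name__, fn.__name__) for ev in cfg.accepted_events]
    edges += [(fn.__name__, rt.__name__) for rt in cfg.return_types if rt is not type(None)]
print(edges)
```

The edge list is the same kind of information a graph drawing of the workflow needs: one edge from each accepted event into the step, and one from the step to each non-None return type.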


8. Run the workflow and visualize it


import nest_asyncio
from llama_index.utils.workflow import draw_all_possible_flows, draw_most_recent_execution

nest_asyncio.apply()  # allow nested event loops inside a notebook

# Draw all possible flows
draw_all_possible_flows(RAGWorkflow, filename="multi_step_workflow.html")

# The top-level `await` below assumes a notebook (or another async context)
w = RAGWorkflow()
# Ingest the documents and build the index
index = await w.run(dirname="Data")
# Query the index; the result is a streaming response
result = await w.run(query="What is Fibromyalgia?", index=index)
async for chunk in result.async_response_gen():
    print(chunk, end="", flush=True)

# Draw the most recent execution
draw_most_recent_execution(w, filename="rag_flow_recent.html")


Output


multi_step_workflow.html
Query the database with: What is Fibromyalgia?
Retrieved 2 nodes.
What is Fibromyalgia?
Reranked nodes to 2
[NodeWithScore(node=TextNode(id_='abde4b4c-b787-4003-acf5-2f5bd05d867c', embedding=None, metadata={'page_label': '137', 'file_name': 'fibromyalgia.pdf', 'file_path': 'c:\\Users\\PLNAYAK\\Documents\\workflow\\Data\\fibromyalgia.pdf', 'file_type': 'application/pdf', 'file_size': 632664, 'creation_date': '2024-09-09', 'last_modified_date': '2024-09-09'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='7b100860-f0b3-445b-b5d6-21f021d8c3c0', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '137', 'file_name': 'fibromyalgia.pdf', 'file_path': 'c:\\Users\\PLNAYAK\\Documents\\workflow\\Data\\fibromyalgia.pdf', 'file_type': 'application/pdf', 'file_size': 632664, 'creation_date': '2024-09-09', 'last_modified_date': '2024-09-09'}, hash='65d90d8fae093e6a574c784e808701ce0596d595e3e6136f7f0b4a70be8d2b57'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='7edbd742-55e9-4559-9e9f-55d8688c6e62', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='08366d434882c9ea475468bcff6e0fe183d4192c93617ec3cf3700cf03fd5a65')}, text='February 2023  ◆ Volume 107, Number 2  www.aafp.org/afp  American Family Physician  137Fibromyalgia is characterized by diffuse mus -\nculoskeletal pain, fatigue, poor sleep, and other \nsomatic symptoms.1 Chronic diffuse pain affects \n10% to 15% of adults in the general population worldwide, many of whom have fibromyalgia.\n2,3 \nApproximately 2% of people in the United States have fibromyalgia, although the prevalence var -\nies across populations and with the diagnostic criteria used.\n3 Fibromyalgia can occur in chil -\ndren and adults and is found worldwide and across cultures. 
Women are diagnosed more \nfrequently than men;   a Scot tish survey found \nthat women are diagnosed between two and 14 times as often as men depending on the crite -\nria used.\n3,4 Changes in the diagnostic criteria over the past decade, including the elimination of specific tender points, have resulted in more patients with chronic pain meeting the criteria for fibromyalgia.\n3-5\nPathophysiology\nFibromyalgia is likely caused by disordered cen -\ntral nociceptive signal processing that leads to sensitization expressed as hyperalgesia and allo -\ndynia, which is similar to chronic pain conditions such as irritable bowel syndrome, interstitial cys -\ntitis, chronic pelvic pain, and chronic low back pain.\n6,7 Functional brain imaging suggests that \nthis aberrant processing may be attributed to an imbalance between excitatory and inhibitory neu -\nrotransmitters, particularly within the insula.\n8 \nSuggested etiologies include dysfunction of the hypothalamic-pituitary-adrenal axis and the autonomic nervous system, diffuse inflammation, glial cell activation, small fiber neuropathy, and infections such as the Epstein-Barr virus, Lyme disease, and viral hepatitis.\n9 Twin studies suggest \na genetic component may also be a factor.10Fibromyalgia:   Diagn osis and Management\nBradford T. Winslow, MD, University of Colorado School of Medicine, Aurora, \nColorado;   Swedi sh Family Medicine Residency, Englewood, Colorado\nCarmen Vandal, MD, and Laurel Dang, MD, Swedish Family Medicine Residency, Englewood, Colorado\n CME  This clinical content conforms to AAFP criteria for \nCME. See CME Quiz on page 127.\nAuthor disclosure:   No relevant financial relationships.\nPatient information:   A handout on this topic, written by the \nauthors of this article, is available with the online version of \nthis article.Fibromyalgia is a chronic, centralized pain syndrome characterized by disordered processing of painful stimuli. 
Fibromyal -\ngia is diagnosed more frequently in women and occurs globally, affecting 2% of people in the United States. Patients with \nfibromyalgia have diffuse chronic pain, poor sleep, fatigue, cognitive dysfunc -\ntion, and mood disturbances. Comorbid conditions, such as functional somatic syndromes, psychiatric diagnoses, and rheumatologic conditions may be pres -\nent. The Fibromyalgia Rapid Screening Tool is a helpful screening method for patients with diffuse chronic pain. The American College of Rheumatology criteria or the Analgesic, Anesthetic, and Addiction Clinical Trial Translations Innovations Opportunities and Networks–American Pain Society Pain Taxonomy diagnostic criteria can diagnose fibromyalgia. Establishing the diagnosis and providing education can reassure patients and decrease unnecessary testing. A multidisciplinary approach that incorporates nonpharmacologic therapies and medications to address problematic symptoms is most effective. Patient educa -\ntion, exercise, and cognitive behavior therapy can improve pain and function. Duloxetine, milnacipran, pregabalin, and amitriptyline are potentially effective medications for fibromyalgia. Nonsteroi -\ndal anti-inflammatory drugs and opioids have not demonstrated benefits for fibromyalgia and have significant limitations.  \n(Am Fam Physician\n. 2023;  107(2):  137-1 44. Copyright © 2023 American Academy of Family Physicians.)\nIllustration by Jonathan Dimes\nDownloaded from the American Family Physician website at www.aafp.org/afp. Copyright © 2023  American Academy of Family Physicians. For the private, non -\ncommercial use of one individual user of the website. All other rights reserved. Contact copyrights@aafp.org for copyright questions and/or permission requests.Downloaded from the American Family Physician website at www.aafp.org/afp. 
Copyright © 2023  American Academy of Family Physicians.', mimetype='text/plain', start_char_idx=0, end_char_idx=4397, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=10.0), NodeWithScore(node=TextNode(id_='0a680a09-1f8d-409e-bbdc-b562b4879f6f', embedding=None, metadata={'page_label': '138', 'file_name': 'fibromyalgia.pdf', 'file_path': 'c:\\Users\\PLNAYAK\\Documents\\workflow\\Data\\fibromyalgia.pdf', 'file_type': 'application/pdf', 'file_size': 632664, 'creation_date': '2024-09-09', 'last_modified_date': '2024-09-09'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='f2591e71-4fd5-48ef-8c08-ab465fdabf88', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '138', 'file_name': 'fibromyalgia.pdf', 'file_path': 'c:\\Users\\PLNAYAK\\Documents\\workflow\\Data\\fibromyalgia.pdf', 'file_type': 'application/pdf', 'file_size': 632664, 'creation_date': '2024-09-09', 'last_modified_date': '2024-09-09'}, hash='e0f0653ead2af78ad773abacd139db824a7e40216476d9aa63077b83a7370686')}, text='138  American Family Physician  www.aafp.org/afp  Volume 107, Number 2  ◆ February 2023\nFIBROMYALGIAClinical Presentation\nChronic diffuse pain is the predominant \nsymptom in most patients with fibromyalgia. Patients may also experience muscle stiffness and tenderness. The physical examination in patients with fibromyalgia generally finds diffuse tenderness without other unusual findings. 
If joint swelling, inflammation, or deformities are present, an alternative or additional diagnosis should be investigated.\n5 \nFatigue and sleep disturbances are also com -\nmon.5,11 Sleep disturbances include difficulty \nfalling and staying asleep, frequent awaken -\nings, or feeling unrefreshed after sleeping. Comorbid mental health diagnoses are com -\nmon, as are cognitive symptoms such as poor concentration, forgetfulness, or altered think -\ning.\n5,6,12 This cognitive dysfunction has been \ntermed “fibrofog” and is described by patients as a mental slowing that adversely affects daily activities.\n13\nThe presence of another painful disorder \ndoes not exclude the diagnosis of fibromyal -\ngia. The Fibromyalgia Rapid Screening Tool can screen patients with diffuse chronic pain to help distinguish between fibromyalgia and other conditions (Table 1) .\n14 The tool may SORT:   KEY RECOMMENDATIONS FOR PRACTICE\nClinical recommendationEvidence \nrating Comments\nThe diagnosis of fibromyalgia should be considered in patients \nwith diffuse pain, fatigue, and sleep disturbances that have been present for at least three months.\n5,11C Diagnosis of fibromyalgia can be made using AAPT 2019 diagnostic criteria or the American College of Radiology 2011/2016 criteria\nPatients with fibromyalgia should be offered a multidisci -\nplinary treatment approach that includes education, exercise, and nonpharmacologic and pharmacologic options.\n27,28C Consensus guidelines and systematic reviews\nCognitive behavior therapy leads to improvement in pain and disability in patients with fibromyalgia in the short and medium term.\n32,34,35A Systematic reviews demonstrate improvement\nAmitriptyline, cyclobenzaprine, duloxetine (Cymbalta), mil -\nnacipran (Savella), and pregabalin (Lyrica) are effective for pain in fibromyalgia.\n43,46-48,50,52,54A Systematic reviews demonstrate effectiveness of these medications\nAAPT = Analgesic, Anesthetic, and Addiction Clinical Trial 
Translations Innovations Opportunities and Networks–American Pain Society Pain \nTaxonomy.\nA = consistent, good-quality patient-oriented evidence;   B = inconsistent or limited-quality patient-oriented evidence;   C = consensus, disease -\noriented evidence, usual practice, expert opinion, or case series. For information about the SORT evidence rating system, go to https://  www.aafp.\norg/afpsort.\nTABLE 1\nFibromyalgia Rapid Screening Tool (FiRST)\n Yes\nI have pain all over my body.  \nMy pain is accompanied by a continuous and very unpleas -\nant general fatigue. \nMy pain feels like burns, electric shocks, or cramps.  \nMy pain is accompanied by other unusual sensations \nthroughout my body, such as pins and needles, tingling, or numbness. \nMy pain is accompanied by other health problems such as digestive problems, urinary problems, headaches, or restless legs. \nMy pain has a significant impact on my life, particularly on my sleep and my ability to concentrate, making me feel slower in general. \nTotal*  \n*—One point for each yes answer. A score of 5 or greater suggests fibromyalgia.\nAdapted with permission from Perrot S, Bouhassira D, Fermanian J;   CEDR (C ercle \nd’Etude de la Douleur en Rhumatologie). Development and validation of the Fibro -\nmyalgia Rapid Screening Tool (FiRST). Pain. 2010;  150(2):  255.\nDescargado para Boletin -BINASSS (bolet-binas@binasss.sa.cr) en National Library of Health and Social Security de ClinicalKey.es por Elsevier en marzo 24, \n2023. Para uso personal exclusivamente. No se permiten otros usos sin autorización. Copyright ©2023. Elsevier Inc. Todos los derechos reservados.', mimetype='text/plain', start_char_idx=0, end_char_idx=3975, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=8.0)]
Fibromyalgia is a chronic, centralized pain syndrome characterized by disordered processing of painful stimuli. It is characterized by diffuse musculoskeletal pain, fatigue, poor sleep, and other somatic symptoms.
rag_flow_recent.html


Workflow Visualization


[Figure: all possible flows of the RAG workflow]


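Workflow observations:

  • There are two entry points (steps that accept a StartEvent)
  • The steps themselves decide when they can run
  • The workflow context is used to store the user query
  • Nodes are passed along, and a streaming response is returned at the end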
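
The first two observations describe a pattern worth isolating: the same start payload is offered to every entry step, and each step opts out by returning None when its input is missing. Below is a minimal standard-library sketch of that pattern. The functions and the dict-based payload are stand-ins invented for this example, not LlamaIndex classes.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional

# Hypothetical stand-in: a start event is modelled as a plain dict of kwargs.
StartPayload = Dict[str, Any]
EntryStep = Callable[[StartPayload], Awaitable[Optional[str]]]


async def ingest(ev: StartPayload) -> Optional[str]:
    dirname = ev.get("dirname")
    if not dirname:
        return None  # not my job: no directory was supplied
    return f"index built from {dirname}"


async def retrieve(ev: StartPayload) -> Optional[str]:
    query = ev.get("query")
    if not query:
        return None  # not my job: no query was supplied
    return f"nodes for {query!r}"


async def run(**kwargs: Any) -> List[str]:
    """Offer the start payload to every entry step; keep non-None results."""
    entry_steps: List[EntryStep] = [ingest, retrieve]
    results = await asyncio.gather(*(entry(kwargs) for entry in entry_steps))
    return [r for r in results if r is not None]


print(asyncio.run(run(dirname="Data")))
print(asyncio.run(run(query="What is Fibromyalgia?")))
```

This mirrors why the article calls the workflow twice: the first call carries only `dirname`, so only ingestion does work, and the second carries `query`, so only retrieval proceeds.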


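RAG workflow:

  • The workflow is triggered by a StartEvent.
  • The StartEvent triggers the ingest step, which chunks the documents, indexes them, and loads them into a VectorStore.
  • The ingest step returns the resulting index and is closed by a StopEvent.
  • A second StartEvent, carrying the query and the index as input, then triggers the retrieve step.
  • The retrieve step emits a RetrieverEvent containing the nodes that match the query.
  • The RetrieverEvent triggers the rerank step, which takes the matched nodes as input.
  • The rerank step reorders the matched nodes by how closely they match the query and emits a RerankEvent.
  • The RerankEvent then triggers the synthesize step, which generates the final response from the reranked nodes.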


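Visualizing the Most Recent Workflow Execution


[Figure: most recent execution of the RAG workflow]


Make sure all required packages are installed in Colab.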


!pip install pyvis llama-index-core llama-index-llms-openai llama-index-embeddings-openai llama-index-readers-file llama-index-utils-workflow
# Check HTML rendering: use the following approach to render the HTML content in Colab
from pyvis.network import Network
from IPython.display import display, HTML
def draw_all_possible_flows(
    workflow: Workflow,
    filename: str = "workflow_all_flows.html",
    notebook: bool = True,  # Set notebook to True
) -> None:
  
    # Pass `notebook` through so pyvis prepares its notebook template when needed
    net = Network(directed=True, height="750px", width="100%", notebook=notebook)
    # Add the nodes + edge for stop events
    net.add_node(
        StopEvent.__name__,
        label=StopEvent.__name__,
        color="#FFA07A",
        shape="ellipse",
    )
    net.add_node("_done", label="_done", color="#ADD8E6", shape="box")
    net.add_edge(StopEvent.__name__, "_done")
    # Add nodes from all steps
    steps = get_steps_from_class(workflow)
    if not steps:
        # If no steps are defined in the class, try to get them from the instance
        steps = get_steps_from_instance(workflow)
    for step_name, step_func in steps.items():
        step_config = getattr(step_func, "__step_config", None)
        if step_config is None:
            continue
        net.add_node(
            step_name, label=step_name, color="#ADD8E6", shape="box"
        )  # Light blue for steps
        for event_type in step_config.accepted_events:
            net.add_node(
                event_type.__name__,
                label=event_type.__name__,
                color="#90EE90" if event_type != StartEvent else "#E27AFF",
                shape="ellipse",
            )  # Light green for events
    # Add edges from all steps
    for step_name, step_func in steps.items():
        step_config = getattr(step_func, "__step_config", None)
        if step_config is None:
            continue
        for return_type in step_config.return_types:
            if return_type != type(None):
                net.add_edge(step_name, return_type.__name__)
        for event_type in step_config.accepted_events:
            net.add_edge(event_type.__name__, step_name)
    if notebook:
        net.show(filename, notebook=True)
        with open(filename, "r") as file:
            display(HTML(file.read()))
    else:
        net.show(filename)
# Example usage in Google Colab
draw_all_possible_flows(
    RAGWorkflow, filename="multi_step_workflow.html", notebook=True
)


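Conclusion

As AI applications move toward greater complexity and agentic behavior, the limitations of traditional directed acyclic graphs (DAGs) are becoming increasingly apparent. The inability to include cycles restricts developers' ability to implement self-correction mechanisms, which are essential for keeping AI systems robust and adaptable.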

Source: https://medium.com/the-ai-forum/implementing-advanced-rag-using-llamaindex-workflow-and-groq-bd6047299fa5