【指南】如何构建通用的LLM代理

2024年12月09日由 alex 发表 423 0

为什么要构建通用代理？因为它是一个极佳的工具，可用于原型化你的用例，并为设计你自己的定制代理架构奠定基础。

在我们深入探讨之前，先简要介绍一下LLM代理。

什么是LLM代理？

LLM代理是一个程序，其执行逻辑由其底层模型控制。

LLM代理与少样本提示或固定工作流程等方法的不同之处在于，它能够定义并调整执行用户查询所需的步骤。在获得一组工具（如代码执行或网络搜索）的访问权限后，代理可以决定使用哪个工具、如何使用它，并根据输出对结果进行迭代。这种适应性使系统能够以最小的配置处理多样化的用例。

代理架构存在于一个范围内，从固定工作流程的可靠性到自主代理的灵活性不等。例如，像检索增强生成（RAG）这样的固定流程可以通过加入自我反思循环来增强，使程序在初始响应不足时能够进行迭代。或者，也可以为ReAct代理配备固定流程作为工具，从而提供一种灵活但结构化的方法。架构的选择最终取决于用例以及可靠性和灵活性之间的期望权衡。

从零开始构建一个通用LLM代理

第一步：选择正确的LLM

选择正确的模型对于实现你期望的性能至关重要。有几个因素需要考虑，如许可、成本和语言支持。构建LLM代理时最重要的考虑因素是模型在编码、工具调用和推理等关键任务上的表现。评估基准包括：

大规模多任务语言理解（MMLU）（推理）
伯克利函数调用排行榜（工具选择和工具调用）
HumanEval和BigCodeBench（编码）

另一个关键因素是模型的上下文窗口。代理工作流程可能会消耗大量标记——有时达到10万或更多——因此，更大的上下文窗口非常有帮助。

一般来说，更大的模型往往提供更好的性能，但能够在本地运行的小型模型仍然是一个不错的选择。使用小型模型时，你将受限于更简单的用例，并且可能只能将你的代理连接到一个或两个基本工具。

第二步：定义代理的控制逻辑（即通信结构）

一个简单的LLM（大型语言模型）与代理之间的主要区别在于系统提示。

期望LLM表现出的代理行为可以在系统提示中进行编码。

以下是一些常见的代理模式，你可以根据自己的需求进行定制：

工具使用：代理决定何时将查询路由到适当的工具，或者依赖其自身的知识来回答。
反思：代理在回应用户之前会审查和纠正其答案。大多数LLM系统中也可以添加一个反思步骤。
先思考后行动（ReAct）：代理通过迭代思考来解决查询问题，执行一个动作，观察结果，并决定是否需要采取另一个动作或提供回应。
先计划后执行：代理通过（如果需要的话）将任务分解为子步骤来提前规划，然后执行每个步骤。

最后两种模式——ReAct和先计划后执行——通常是构建通用单代理的最佳起点。

为了有效地实现这些行为，你需要进行一些提示工程。你可能还想使用结构化生成技术。这基本上意味着将LLM（大型语言模型）的输出塑造成符合特定格式或模式的样子，以便代理的响应与你期望的通信风格保持一致。

示例：以下是来自Bee Agent Framework的ReAct风格代理的系统提示摘录。

# Communication structure
You communicate only in instruction lines. The format is: "Instruction: expected output". You must only use these instruction lines and must not enter empty lines or anything else between instruction lines.
You must skip the instruction lines Function Name, Function Input and Function Output if no function calling is required.
Message: User's message. You never use this instruction line.
Thought: A single-line plan of how to answer the user's message. It must be immediately followed by Final Answer.
Thought: A single-line step-by-step plan of how to answer the user's message. You can use the available functions defined above. This instruction line must be immediately followed by Function Name if one of the available functions defined above needs to be called, or by Final Answer. Do not provide the answer here.
Function Name: Name of the function. This instruction line must be immediately followed by Function Input.
Function Input: Function parameters. Empty object is a valid parameter.
Function Output: Output of the function in JSON format.
Thought: Continue your thinking process.
Final Answer: Answer the user or ask for more information or clarification. It must always be preceded by Thought.
## Examples
Message: Can you translate "How are you" into French?
Thought: The user wants to translate a text into French. I can do that.
Final Answer: Comment vas-tu?

第三步：定义代理的核心指令

我们通常认为LLM（大型语言模型）开箱即用就具备许多功能。其中一些功能很棒，但其他可能并不完全是我们需要的。为了获得你所期望的性能，重要的是在系统提示中明确列出你想要和不想要的所有功能。

这可能包括如下指令：

代理名称和角色：代理的称呼以及它的预期用途。
语气和简洁性：它应该听起来多正式或多随意，以及它应该多简洁。
何时使用工具：决定何时依赖外部工具，何时依赖模型自身的知识。
错误处理：当工具或流程出现问题时，代理应该怎么做。

示例：以下是Bee Agent Framework中指令部分的一个片段。

# Instructions
User can only see the Final Answer, all answers must be provided there.
You must always use the communication structure and instructions defined above. Do not forget that Thought must be a single-line immediately followed by Final Answer.
You must always use the communication structure and instructions defined above. Do not forget that Thought must be a single-line immediately followed by either Function Name or Final Answer.
Functions must be used to retrieve factual or historical information to answer the message.
If the user suggests using a function that is not available, answer that the function is not available. You can suggest alternatives if appropriate.
When the message is unclear or you need more information from the user, ask in Final Answer.
# Your capabilities
Prefer to use these capabilities over functions.
- You understand these languages: English, Spanish, French.
- You can translate and summarize, even long documents.
# Notes
- If you don't know the answer, say that you don't know.
- The current time and date in ISO format can be found in the last message.
- When answering the user, use friendly formats for time and date.
- Use markdown syntax for formatting code snippets, links, JSON, tables, images, files.
- Sometimes, things don't go as planned. Functions may not provide useful information on the first few tries. You should always try a few different approaches before declaring the problem unsolvable.
- When the function doesn't give you what you were asking for, you must either use another function or a different function input.
  - When using search engines, you try different formulations of the query, possibly even in a different language.
- You cannot do complex calculations, computations, or data manipulations without using functions.m

第四步：定义并优化你的核心工具

工具是赋予你的代理超能力的关键。通过一组定义明确的窄范围工具，你可以实现广泛的功能。需要包含的关键工具包括代码执行、网络搜索、文件读取和数据分析。

对于每个工具，你需要定义以下内容，并将其作为系统提示的一部分：

工具名称：为该功能提供一个独特且描述性的名称。
工具描述：清楚解释工具的功能以及何时使用它。这有助于代理确定何时选择正确的工具。
工具输入模式：一个概述必需和可选参数、它们的类型以及任何约束的模式。代理根据用户的查询使用此模式来填写所需的输入。
工具运行位置/方式的指示：指出在哪里/如何运行该工具。

示例：以下是来自Langchain Community的Arxiv工具实现的摘录。此实现需要ArxivAPIWrapper实现。

class ArxivInput(BaseModel):
    """Input for the Arxiv tool."""
    query: str = Field(description="search query to look up")

class ArxivQueryRun(BaseTool):  # type: ignore[override, override]
    """Tool that searches the Arxiv API."""
    name: str = "arxiv"
    description: str = (
        "A wrapper around Arxiv.org "
        "Useful for when you need to answer questions about Physics, Mathematics, "
        "Computer Science, Quantitative Biology, Quantitative Finance, Statistics, "
        "Electrical Engineering, and Economics "
        "from scientific articles on arxiv.org. "
        "Input should be a search query."
    )
    api_wrapper: ArxivAPIWrapper = Field(default_factory=ArxivAPIWrapper)  # type: ignore[arg-type]
    args_schema: Type[BaseModel] = ArxivInput
    def _run(
        self,
        query: str,
        run_manager: Optional[CallbackManagerForToolRun] = None,
    ) -> str:
        """Use the Arxiv tool."""
        return self.api_wrapper.run(query)p

在某些情况下，你需要优化工具以获得所需的性能。这可能涉及通过一些提示工程技术来调整工具名称或描述，设置高级配置以处理常见错误，或过滤工具的输出。

第五步：确定内存处理策略

LLM（大型语言模型）受其上下文窗口的限制，即它们一次能够“记住”的标记数量有限。在多轮对话中，过去的交互、冗长的工具输出或代理所依赖的额外上下文等信息可能会迅速填满这个内存。因此，拥有一个可靠的内存处理策略至关重要。

常见的内存处理策略包括：

滑动内存：保留最后k轮对话，并丢弃更旧的对话。
标记内存：保留最后n个标记，并忘记其余部分。
总结内存：在每轮对话中使用LLM来总结对话，并丢弃单个消息。

此外，你还可以让LLM检测要存储在长期记忆中的关键时刻。这允许代理“记住”关于用户的重要事实，从而使体验更加个性化。

到目前为止，我们所涵盖的五个步骤为设置代理奠定了基础。但如果在这个阶段我们通过LLM运行用户查询，会发生什么？

以下是一个示例：

User Message: Extract key insighs from this dataset
Files: bill-of-materials.csv
Thought: First, I need to inspect the columns of the dataset and provide basic data statistics.
Function Name: Python
Function Input: {"language":"python","code":"import pandas as pd\n\ndataset = pd.read_csv('bill-of-materials.csv')\n\nprint(dataset.columns)\nprint(dataset.describe())","inputFiles":["bill-of-materials.csv"]}
Function Output:

此时，代理会产生原始文本输出。那么，我们如何让它实际执行下一步呢？这就需要解析和编排的介入。

第六步：解析代理的原始输出

解析器是一个函数，用于将原始数据转换为你的应用程序能够理解和处理的格式（如带有属性的对象）。

对于我们正在构建的代理，解析器需要识别我们在第二步中定义的通信结构，并返回结构化输出，如JSON。这使得应用程序更容易处理和执行代理的下一步。

第七步：编排代理的下一步

最后一步是设置编排逻辑。这决定了在LLM输出结果后会发生什么。根据输出，你将执行以下操作之一：

执行工具调用，或
返回答案——对用户查询的最终响应或对更多信息的后续请求。

如果触发了工具调用，该工具的输出将被发送回LLM（作为其工作内存的一部分）。然后，LLM将确定如何处理这条新信息：是执行另一个工具调用，还是向用户返回答案。

以下是这段编排逻辑在代码中的示例：

def orchestrator(llm_agent, llm_output, tools, user_query):
    """
    Orchestrates the response based on LLM output and iterates if necessary.
    Parameters:
    - llm_agent (callable): The LLM agent function for processing tool outputs.
    - llm_output (dict): Initial output from the LLM, specifying the next action.
    - tools (dict): Dictionary of available tools with their execution methods.
    - user_query (str): The original user query.
    Returns:
    - str: The final response to the user.
    """
    while True:
        action = llm_output.get("action")
        if action == "tool_call":
            # Extract tool name and parameters
            tool_name = llm_output.get("tool_name")
            tool_params = llm_output.get("tool_params", {})
            if tool_name in tools:
                try:
                    # Execute the tool
                    tool_result = tools[tool_name](**tool_params)
                    # Send tool output back to the LLM agent for further processing
                    llm_output = llm_agent({"tool_output": tool_result})
                except Exception as e:
                    return f"Error executing tool '{tool_name}': {str(e)}"
            else:
                return f"Error: Tool '{tool_name}' not found."
        elif action == "return_answer":
            # Return the final answer to the user
            return llm_output.get("answer", "No answer provided.")
        else:
            return "Error: Unrecognized action type from LLM output."

你现在拥有了一个能够处理多种用例的系统，从竞争分析和高级研究到自动化复杂工作流程。

文章来源：https://medium.com/towards-data-science/build-a-general-purpose-ai-agent-c40be49e7400

标签：

LLM 人工智能

0 评论

欢迎关注ATYUN官方公众号

商务合作及内容投稿请联系邮箱:bd@atyun.com

上一篇通过小波变换实现高效时频分析

下一篇【指南】可视化套索和弹性网络回归

评论登录

要发表评论，您必须先登录。

jonatasgrosman/wav2vec2-large-xlsr-53-english facebook/dino-vitb16 bert-base-uncased xlm-roberta-large xlm-roberta-base gpt2 microsoft/resnet-50 facebook/dino-vits8

AGENTIC AI如何塑造未来