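In this article and the accompanying open-source repository, we demonstrate intelligent navigation of clauses in legal documents using a multi-agent system. The system uses a lexical graph (the document hierarchy) and chunk linking within a multi-graph, multi-agent workflow to navigate regulatory documents. The stack we use here includes Reducto.AI, WhyHow.AI, LangGraph, and LlamaIndex.
One very specific problem we run into with documents, especially legal documents, is the need to build a document hierarchy across the different clauses. This is because clauses sometimes refer to other clauses, which must be read to obtain the full meaning and context.
To get the full context, you must recursively navigate to and retrieve any referenced clause (even footnotes!): traverse the document hierarchy graph to find the referenced clause, check whether it references any further clauses, and repeat. Recursive retrieval can also be applied to a range of other document elements beyond legal clauses, including page numbers, multimodal data (such as images), and hyperlinks to other documents or external data.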
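As a rough illustration of this "retrieve, check for references, repeat" loop, here is a minimal sketch in plain Python. The `CLAUSES` dictionary, its placeholder texts, the `collect_context` name, and the `max_passes` cap are all hypothetical and stand in for a real document hierarchy graph; only the reference chain itself (6.3 to footnote 3 to 7.3/7.4 to 9.1) follows the example discussed later in this article.

```python
# Hypothetical toy graph: each entry holds placeholder text and the ids of
# the clauses or footnotes it mentions. Not taken from the real document.
CLAUSES = {
    "6.3": {"text": "(text of paragraph 6.3)", "refs": ["fn3"]},
    "7.2": {"text": "(text of paragraph 7.2)", "refs": []},
    "fn3": {"text": "(text of footnote 3)", "refs": ["7.3", "7.4"]},
    "7.3": {"text": "(text of paragraph 7.3)", "refs": []},
    "7.4": {"text": "(text of paragraph 7.4)", "refs": ["9.1"]},
    "9.1": {"text": "(text of paragraph 9.1)", "refs": []},
    "8.1": {"text": "(an unrelated paragraph)", "refs": []},
}


def collect_context(start_ids, clauses, max_passes=5):
    """Follow references pass by pass until nothing new is mentioned."""
    seen = []                    # clause ids in the order they were retrieved
    frontier = list(start_ids)   # ids to retrieve on the current pass
    passes = 0
    while frontier and passes < max_passes:
        next_frontier = []
        for clause_id in frontier:
            if clause_id in seen or clause_id not in clauses:
                continue         # already retrieved, or not in the graph
            seen.append(clause_id)
            next_frontier.extend(clauses[clause_id]["refs"])
        frontier = next_frontier
        passes += 1
    return seen


context_ids = collect_context(["6.3", "7.2"], CLAUSES)
print(context_ids)
```

The pass cap is a simple guard against cycles or very long reference chains; the unrelated paragraph is never visited because nothing retrieved refers to it.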
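In this notebook and example, we show you how to easily create a document hierarchy graph of the clauses in a legal document, a graph of the definitions in that document, and a multi-graph, multi-agent recursive retrieval process that answers questions by intelligently navigating both graphs to obtain the complete context for the question asked.
In this example, we use a simple regulatory document issued by the Central Bank of Malaysia, specifically this document governing compliance rules for various organizations.
As a quick benchmark, when we ran this through ChatGPT using GPT-4o (with built-in agentic reasoning), we got the following answer: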
The management of control functions by the Board and the Chief Compliance
Officer (CCO) involves several key responsibilities as outlined in the
compliance document:
Responsibilities of the Board:
1. Oversight of Compliance Risk Management: The Board is responsible for
overseeing the management of compliance risk within the financial institution. This includes approving the compliance policy, establishing the compliance function, and appointing the CCO.
2. Approval and Support for the Compliance Function: The Board must
ensure that the compliance function and the CCO have the appropriate standing,
authority, and independence to carry out their duties effectively. The Board
must also ensure that the compliance function is adequately resourced.
3. Engagement with the CCO: The Board should engage with the CCO regularly to
discuss issues faced by the compliance function and ensure the CCO has direct
access to the Board.
4. Annual Evaluation: The Board must evaluate the effectiveness of the
institution’s management of compliance risk at least annually, considering
assessments from senior management, internal audit, and the CCO.
Responsibilities of the Chief Compliance Officer (CCO):
1. Coordination and Management of Compliance Risk: The CCO is responsible for
coordinating the identification, management, and monitoring of compliance risk
across the institution. This includes ensuring compliance monitoring and
testing are consistent across the organization.
2. Reporting and Advising: The CCO must report regularly to senior management
on the findings and analysis of compliance risks and must ensure that reports
are readily available to internal audit and regulatory authorities. The CCO
also advises the Board and senior management on legal and regulatory
requirements, keeping them informed of developments and their implications.
3. Independence: The CCO must maintain independence from business lines to
effectively carry out the role of a control function. This includes ensuring
that there is no conflict of interest in their responsibilities, reporting
lines, or remuneration.
4. Resources and Training: The CCO must ensure that the compliance function
is sufficiently resourced with officers who have the necessary qualifications
and experience. The CCO is also responsible for ensuring that adequate training
is provided to officers on relevant legal and regulatory requirements.
Interaction with Other Control Functions:
The compliance function should be organized in a way that allows effective
management of compliance risk, considering the size, nature, and complexity of
the financial institution’s operations. Where responsibilities are shared
between a dedicated compliance unit and other control functions,
clear documentation and coordination are required to ensure that the CCO can
perform their duties effectively without impairing independence or focus.
By fulfilling these responsibilities, the Board and the CCO can ensure that
the institution's compliance risks are managed effectively, supporting the
overall safety and soundness of the financial institution.
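This answer is interesting but incomplete, because it:
This is understandable, because recursive retrieval of clauses, pages, and footers is not explicitly part of the typical semantic-similarity retrieval process that underlies RAG.
Multi-Graph Multi-Agent Workflow Summary
Creating the Graphs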
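In this notebook, we first extract the document structure as parsed by the Reducto document ingestion engine. The document structure breaks each page down into distinct elements, such as section headers, list items, or footers.
These elements are then assembled according to the order in which they appear and their implied hierarchy (for example, a "Section Header" is the parent of a "List Item"). We then analyze the links in the document to identify connections between the extracted elements that can be modeled in the lexical graph.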
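To make the ordering-plus-hierarchy idea concrete, the following is a minimal sketch that turns an ordered list of parsed elements into parent/child triples using a stack. The element dictionaries, the `RANK` table, the `contains` relation name, and the `build_triples` helper are assumptions for illustration; they are not Reducto's output schema or the notebook's actual code.

```python
# Hypothetical ranks: a smaller number sits higher in the hierarchy.
RANK = {"Section Header": 0, "List Item": 1}


def build_triples(elements):
    """Link each element to the nearest preceding element of higher rank."""
    triples = []
    stack = []  # (rank, element id) of the currently open ancestors
    for element in elements:
        rank = RANK[element["type"]]
        # Close every open element that is not an ancestor of this one.
        while stack and stack[-1][0] >= rank:
            stack.pop()
        if stack:
            triples.append((stack[-1][1], "contains", element["id"]))
        stack.append((rank, element["id"]))
    return triples


# Elements in reading order, as a parser might emit them.
elements = [
    {"id": "h1", "type": "Section Header"},
    {"id": "i1", "type": "List Item"},
    {"id": "i2", "type": "List Item"},
    {"id": "h2", "type": "Section Header"},
    {"id": "i3", "type": "List Item"},
]
for triple in build_triples(elements):
    print(triple)
```

Triples of this shape are one plausible input for a graph-building SDK; cross-references between clauses would be added as further triples with a different relation name.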
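We then import these chunks and triples into WhyHow's Knowledge Graph Studio, where we use our SDK to create the lexical graph.
We also create a graph of the document's legal definitions. Each legal document has its own definitions section that specifies how certain terms are to be interpreted. These definitions can vary by document, use case, and client. In this case, pages 4-5 of the document contain the definitions. The text is extracted and passed to GPT-4o, which is prompted to extract the legal terms and their definitions verbatim and return them as structured output. The output is converted to a CSV file and uploaded as a separate graph using the SDK and a predefined schema. The Definition Agent calls this definitions graph when needed to augment the context with the specific relevant definitions. In this workflow, the Definition Agent is called after the initial clauses have been retrieved.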
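The conversion from extracted term/definition pairs to CSV can be sketched with the standard library alone. The column names, the `definitions_to_csv` helper, and the placeholder definition strings below are assumptions; the schema the notebook actually uploads is not shown in this article.

```python
import csv
import io


def definitions_to_csv(definitions):
    """Serialize a {term: definition} mapping to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["term", "definition"])  # hypothetical column names
    for term, definition in sorted(definitions.items()):
        writer.writerow([term, definition])
    return buffer.getvalue()


# Placeholder definitions; the real ones come verbatim from the document.
sample = {
    "senior management": "(definition text, which may contain commas)",
    "CCO": "(definition text)",
}
print(definitions_to_csv(sample))
```

Using `csv.writer` rather than joining strings by hand matters here because legal definitions routinely contain commas and quotation marks, which the writer escapes correctly.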
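We then import the nodes from WhyHow into the notebook and index the node information locally with LlamaIndex, while retaining the embeddings generated by WhyHow. We use a combination of LlamaIndex's vector, BM25, and keyword retrievers. In legal use cases, querying and retrieval require exact terminology, and adding the BM25 and keyword retrievers helps achieve this. BM25 helps identify key terms in highly repetitive text, while the keyword retriever ensures that important terms are retrieved when needed, even if they occur infrequently.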
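This article does not show how the three result lists are merged. One common option, shown here purely as an illustration and not as the notebook's method, is reciprocal rank fusion, where each retriever contributes 1 / (k + rank) to a node's score. The node ids, the `reciprocal_rank_fusion` helper, and the value k = 60 are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked id lists; ids ranked well by many lists win."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, node_id in enumerate(ranking, start=1):
            scores[node_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by id for a stable order.
    return sorted(scores, key=lambda node_id: (-scores[node_id], node_id))


# Hypothetical results from three retrievers for the same query.
vector_hits = ["7.2", "6.3", "8.1"]
bm25_hits = ["6.3", "7.2", "9.1"]
keyword_hits = ["6.3"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits, keyword_hits])
print(fused)
```

Rank-based fusion avoids having to normalize cosine similarities against BM25 scores, which live on different scales.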
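LangGraph is used to build the multi-agent workflow around the lexical graph, using the WhyHow SDK and GPT-4o. In essence, when a query is passed in, the system first searches for relevant vector chunks via the Initial Search Agent. In this case, the vector chunks are clauses or sub-clauses. The Definition Agent is then called to augment these clauses with relevant definitions. Next, the Router Agent detects whether there are other linked sections or footnotes that need to be consulted and, if so, retrieves the appropriate sections and takes them into account. If the clauses retrieved subsequently refer to yet more clauses (as is the case here), the Recursive Retrieval Agent retrieves them recursively.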
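The routing decision at the heart of this loop can be sketched in plain Python. The state keys below mirror those used in the appendix snippets; the `MAX_PASSES` cap, the `route_after_router` name, the returned step names, and the use of plain strings in place of node objects are assumptions for illustration rather than the notebook's actual code.

```python
from typing import Dict, List, TypedDict


class AgentState(TypedDict):
    query: str
    previous_nodes: List[str]              # node objects in the notebook
    last_fetched_context_nodes: List[str]
    node_links_to_fetch: List[str]         # node ids chosen by the router
    node_footers_to_fetch: Dict[str, str]  # node id -> footer search query
    search_failures: List[str]
    pass_count: int


MAX_PASSES = 5  # hypothetical safety cap on retrieval passes


def route_after_router(state: AgentState) -> str:
    """Return the name of the next step once the router has run."""
    if state["pass_count"] >= MAX_PASSES:
        return "answering_agent"
    if state["node_links_to_fetch"] or state["node_footers_to_fetch"]:
        return "recursive_retrieval"
    return "answering_agent"


state: AgentState = {
    "query": "How can the Board and the CCO manage control functions?",
    "previous_nodes": [],
    "last_fetched_context_nodes": ["6.3", "7.2"],
    "node_links_to_fetch": [],
    "node_footers_to_fetch": {"6.3": "Return footnote 3"},
    "search_failures": [],
    "pass_count": 0,
}
print(route_after_router(state))
```

In a LangGraph graph, a function of this shape is the kind of callable that would typically be registered as a conditional edge, so that the graph loops back into retrieval until the router has nothing left to fetch.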
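In our example, the first clauses the system retrieves are paragraphs 6.3 and 7.2. The definitions graph is consulted to check whether any additional context from the definitions section should be included. This adds the definitions of "CCO" and "senior management".
Sub-paragraph 6.3(f) of paragraph 6.3 reads as follows:
Based on the first clauses retrieved, the Router Agent helps detect whether the material mentions any other clauses or footnotes. In this example, a footnote (footnote 3) is linked to the first clause. The Router Agent then triggers the Footnote Resolution Agent, which identifies the relevant footnote and returns the following:
Another traversal is needed here: the Recursive Retrieval Agent traverses the lexical graph and retrieves the chunks for paragraphs 7.3 and 7.4.
The new information from paragraphs 7.3 and 7.4 is then merged in. Paragraph 7.4(b) refers to paragraph 9.1:
Paragraph 7.4 contains a link to paragraph 9.1. Once the Router Agent detects this link, it instructs the Recursive Retrieval Agent to retrieve it on the next pass. The Answering Agent keeps track of all the information received and summarizes it into the final answer returned to the user.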
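In the notebook, spotting such references is delegated to the LLM-based Router Agent. As a contrast, a cheap rule-based pre-filter could be sketched with a regular expression. The pattern, the `find_paragraph_refs` helper, and the sample sentences below are hypothetical and are not quoted from the regulatory document.

```python
import re

# One paragraph number such as "9.1" or "7.4(b)".
NUMBER = r"\d+\.\d+(?:\([a-z]\))?"
# "paragraph 9.1", "paragraphs 7.3 and 7.4", "paragraphs 7.3, 7.4".
PARAGRAPH_REF = re.compile(
    rf"\bparagraphs?\s+({NUMBER}(?:\s*(?:,|and)\s*{NUMBER})*)",
    re.IGNORECASE,
)
NUMBER_RE = re.compile(NUMBER, re.IGNORECASE)


def find_paragraph_refs(text):
    """Return the paragraph numbers mentioned in text, in order, without repeats."""
    refs = []
    for match in PARAGRAPH_REF.finditer(text):
        for number in NUMBER_RE.findall(match.group(1)):
            if number not in refs:
                refs.append(number)
    return refs


print(find_paragraph_refs("See paragraphs 7.3 and 7.4."))
print(find_paragraph_refs("This is subject to paragraph 9.1."))
```

A pattern like this only catches explicit, regularly worded references; an LLM router can also judge whether a reference is relevant to the query, which a regex cannot.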
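The final answer reflects our ideal process: it draws on the definitions pages, paragraphs 6.3 and 7.2, footnote 3, and paragraphs 7.3, 7.4, and 9.1, gathering all the relevant information through intelligent traversal to give the user an accurate summary.
To make sure this was not down to luck, we ran the final query 3 times, and the results show that the relevant information was retrieved successfully.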
To manage control functions effectively, the Board and the Chief Compliance
Officer (CCO) have distinct responsibilities that they must exercise:
Responsibilities of the Board:
- Approve critical decisions regarding the CCO, including appointment,
remuneration, and termination (Section 6.3(a)).
- Ensure the CCO has sufficient stature to engage effectively with senior
management (Section 6.3(b)).
- Regularly engage with the CCO to discuss compliance issues and consider
interactions without senior management present (Section 6.3(c)).
- Provide the CCO with unimpeded access to communicate with the board directly
(Section 6.3(d)).
- Support the CCO with adequate resources to perform duties effectively,
including competent staff (Section 6.3(e)).
- Satisfy themselves that combined responsibilities, if any, do not compromise
the control environment (Section 6.3(f)).
Responsibilities of the CCO:
- Coordinate the identification and management of institution-wide compliance
risks (Section 7.2(b)).
- Ensure consistent conduct of compliance monitoring and testing across the
organization (Section 7.2(b)).
- Maintain independence and sufficient focus on compliance duties, even when
tasked with additional control functions (Section 7.3).
Shared Responsibilities & Coordination:
- The board must approve any sharing of compliance function responsibilities
between the compliance unit and other control functions (Section 7.4(a)).
- Function responsibilities, including timely communication of issues, should
be well-defined and documented (Section 7.2(a)).
- Effective arrangements for coordination among control functions should be in
place to facilitate the CCO’s responsibilities (Section 7.2(d)).
- Compliance responsibilities must not compromise the separation of the
internal audit function (Section 9.1).
The board should ensure comprehensive oversight, and the CCO should maintain
effective coordination and communication across the organization to manage
control functions efficiently.
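In summary, through this exercise we have built a system that demonstrates the following:
WhyHow.AI's Knowledge Graph Studio platform (currently in beta) is the easiest way to build modular, agentic knowledge graphs, combining workflows for LLMs, developers, and non-technical domain experts.
Appendix
Agent code snippets
Definition Agent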
from typing import Dict, Optional

from whyhow import WhyHow

# WHYHOW_API_KEY, WHYHOW_API_URL and definitions_graph are assumed to be
# defined earlier in the notebook.


def definitions_search(query_prompt: str, client: Optional[WhyHow] = None) -> Dict[str, str]:
    """
    Search for definitions of terms in a question prompt and return them as a dictionary.
    """
    if client is None:
        client = WhyHow(api_key=WHYHOW_API_KEY, base_url=WHYHOW_API_URL)
    definitions_response = client.graphs.query_unstructured(
        graph_id=definitions_graph.graph_id,
        query=query_prompt,
    )
    response_text = definitions_response.answer
    # The answer is expected to hold one "Term: definition" pair per line.
    term_def_pairs = response_text.split('\n')
    definitions_dict = {}
    for pair in term_def_pairs:
        if ':' in pair:
            # Split on the first colon only; definitions may contain colons.
            term, definition = pair.split(':', 1)
            definitions_dict[term.strip()] = definition.strip()
    return definitions_dict


query_prompt = """Return me definitions for the terms in this query: "How can the Board and the CCO manage control functions?" Ensure the term-definition pairs are separated by newlines, properly capitalised"""
definitions_dict = definitions_search(query_prompt)


def print_prompt_definitions_dict(definitions_dict):
    # Format the definitions as a block that can be prepended to a prompt.
    prompt = "Relevant Definitions:\n"
    for term, definition in definitions_dict.items():
        prompt += f"{term}: {definition}\n"
    return prompt


print(print_prompt_definitions_dict(definitions_dict))
Router Agent
def router_agent(state: AgentState) -> AgentState:
    # decide if the process should stop or continue
    starter_prompt_footer = f"""
    You are an intelligent agent overseeing a multi-agent retrieval process of graph nodes from a document. These nodes are to answer the query:
    ```{state['query']}```
    Below this request is a list of nodes that were automatically retrieved.
    You must determine if the list of nodes is enough to answer the query. If there isn't enough information, you must identify any relevant footer information in the nodes.
    A node can contain footer information asking to look in another section/part of the document, which will require a separate natural language search.
    Example: If the footer says "see paragraph x", a search query e.g. "Return paragraph x to answer the query '{state['query']}'" should be made.
    If there are no further nodes worth analyzing, return an empty response. ONLY RETURN QUERIES FOR FOOTERS THAT ARE RELEVANT TO ANSWERING THE QUERY
    Else, if any relevant nodes require a footer search, specify the node_id and the search query.
    Nodes are identified by node_id and must be quoted in backticks.
    """
    starter_prompt_link = f"""
    You are an intelligent agent overseeing a multi-agent retrieval process of graph nodes from a document. These nodes are to answer the query:
    ```{state['query']}```
    Below this request is a list of nodes that were automatically retrieved.
    You must determine if the list of nodes is enough to answer the query. If there isn't enough information, you must identify any linked nodes that could be worth exploring.
    If there are no further nodes worth analyzing, return an empty response.
    Return a list of node_ids. ONLY RETURN NODE_IDS for NODES THAT ARE RELEVANT TO ANSWERING THE QUERY. Nodes are identified by node_id and must be quoted in backticks.
    """
    # collect latest nodes, and all nodes
    last_fetched_nodes_flattened: Dict[str, MultiAgentSearchLocalNode] = {}
    all_nodes_flattened: Dict[str, MultiAgentSearchLocalNode] = {}
Supervisor Agent
def supervisor_agent(state: AgentState) -> AgentState:
    # Look for search failures. This might be an instance where multiple searches were made for certain parts of the document, but no relevant information was found.
    # This means that the search has to be ended prematurely to prevent infinite loops.
    printout = ""
    for node in state["previous_nodes"]:
        printout += node.print_node_prompt()
    for node in state["last_fetched_context_nodes"]:
        printout += node.print_node_prompt()
    prompt = f"""
    You are a supervisor agent overseeing the multi-agent retrieval process of graph nodes from a document. The nodes are to answer the query:
    ```{state['query']}```
    Below is a list of nodes that were automatically retrieved, followed by a list of errors. If there are many similar, repeated errors in the retrieval process, where no further linked or relevant nodes could be retrieved, return END to end the process. Else return CONTINUE.
    Return only a single word, either END or CONTINUE.
    """
    completion = openai_client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": printout},
            # search_failures is a list of strings; join it into one message
            {"role": "user", "content": "\n".join(state['search_failures'])},
        ],
    )
Recursive Retrieval Agent
def recursive_retrieval(state: AgentState) -> AgentState:
    # move the nodes fetched on the previous pass into the history
    current_nodes = state["last_fetched_context_nodes"]
    for current_node in current_nodes:
        state["previous_nodes"].append(current_node)
    new_current_nodes = []
    # look up the nodes to fetch by id
    for node_id in state["node_links_to_fetch"]:
        # sometimes GPT returns node ids with or without backticks
        node_id = node_id.strip("`")
        if node_id in local_nodes_map:
            new_current_nodes.append(local_nodes_map[node_id])
        else:
            state["search_failures"].append(f"Failed to fetch node with id: {node_id}")
    for node_id, search_query in state["node_footers_to_fetch"].items():
        # fetch nodes by keyword and bm25 search
        footer_retrieved_nodes = retrieve_with_keywords_bm25(search_query)
        # LLM prunes nodes that are not relevant
        footer_retrieved_nodes, _ = prune_nodes(search_query, footer_retrieved_nodes)
        for node in footer_retrieved_nodes:
            new_current_nodes.append(node)
        # if no nodes fetched, log failure
        if len(footer_retrieved_nodes) == 0:
            state["search_failures"].append(
                f"Failed to fetch nodes for query: {search_query}"
            )
    # reset the fetch queues for the next pass
    state["last_fetched_context_nodes"] = new_current_nodes
    state["pass_count"] += 1
    state["node_footers_to_fetch"] = {}
    state["node_links_to_fetch"] = []
    return state
Answering Agent
def answering_agent(state: AgentState) -> AgentState:
    # answer the query
    prompt = f"""
    You are an answering agent. You will be given a list of document nodes that were automatically retrieved by the system. These nodes are to answer the query:
    ```{state['query']}```
    Give references to sections/paragraphs if possible, but do not output full node ids with backticks and the hash.
    """
Variations of the answers retrieved by the multi-graph multi-agent system
Answer #2:
To manage control functions effectively, both the Board and the Chief
Compliance Officer (CCO) must collaborate and establish a structured
framework. Key measures include:
Board’s Responsibilities:
- The Board must approve the appointment, remuneration, and termination of the
CCO, ensuring they have the necessary stature to engage effectively with
senior management (S 6.3(a), (b)).
- The Board should engage regularly with the CCO to discuss compliance issues
and ensure the CCO has direct access to the Board (S 6.3(c), (d)).
- The Board must also assure that the CCO is granted adequate resources and
support, including competent officers, to fulfill their duties (S 6.3(e)).
CCO’s Role and Independence:
- The CCO should oversee the institution-wide compliance risk management and
ensure that responsibilities are clearly documented and communicated in the
compliance policy (S 7.2(a), (b)).
- If the CCO is involved in other control functions, they must ensure their
independence and ability to focus on compliance responsibilities remain
uncompromised (S 7.3).
Coordination with Other Functions:
- Proper coordination between the CCO and other control functions is essential.
This includes robust information flows and mechanisms for seeking compliance
advice (S 7.2(d)).
- The Board must approve the sharing of compliance responsibilities to ensure
there is no dilution of focus or effectiveness (S 7.4(a)).
Separation from Internal Audit:
- Compliance functions should remain distinct from the internal audit to
maintain the efficacy of independent reviews (S 7.4(b), S 9.1).
Training and Capacity Building:
- The compliance function is responsible for training and providing guidance
to other officers on legal and regulatory requirements to manage compliance
risk effectively (S 8.11).
By implementing these steps, the Board and the CCO can effectively manage
control functions, maintaining robust compliance and governance frameworks.
Answer #3:
The management of control functions by the Board and the Chief Compliance
Officer (CCO) involves several key responsibilities and structures:
Role of the Board in Managing CCO Functions:
- The board must approve the appointment, remuneration, and termination of the
CCO (Section 6.3(a)).
- The board should ensure that the CCO has sufficient stature to interact
effectively with the CEO and senior management (Section 6.3(b)).
- Regular engagement between the board and the CCO is important to discuss
compliance issues directly (Section 6.3(c)).
- The CCO must have direct, unimpeded access to the board (Section 6.3(d)).
- There must be adequate resources and support for the CCO to perform his
duties effectively (Section 6.3(e)).
Shared Responsibilities and Independence:
- Where compliance functions are shared, the board must approve this
arrangement, and responsibilities should be clearly defined and documented
in the compliance policy (Section 7.2).
- The CCO should not assume responsibilities for internal audit, as this can
compromise independent review processes (Sections 7.4, 9.1).
- The CCO must ensure that their independence and ability to focus on
compliance are not impaired by additional responsibilities (Section 7.3).
Responsibilities Within the Organization:
- Compliance is the responsibility of all officers within the institution.
Business lines manage compliance risk through their managerial controls,
while the compliance function ensures that these controls are adequate
(Section 1.2).
- The internal audit function provides independent assurance on the quality
and effectiveness of the institution’s controls, including those concerning
compliance (Section 1.2(c)).
Coordination Across Control Functions:
- Arrangements for coordination among control functions and the CCO must
promote a consistent approach to managing compliance risk, with adequate
information flows and avenues for advice (Section 7.2(d)).
By following these guidelines, the Board and the CCO can manage the
compliance control functions effectively, ensuring that compliance risks
are appropriately identified, managed, and mitigated across the organization.