Python Code Generation: Verification in Practice with AutoGen Conversable Agents

Published April 12, 2024 by alex

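In this article, I will quickly demonstrate how to use the conversable agents provided by AutoGen to address this lack of "verification" in generated code.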


What is AutoGen?


"AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks."


Introducing the LeetCode Problem Solver:


First, install autogen quietly:


!pip install pyautogen -q --progress-bar off


I am using Google Colab, so I entered my OPENAI_API_KEY in the "Secrets" tab and load it securely along with the other modules:


import os
import csv
import autogen
from autogen import Cache
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')  # expose the Colab secret as an environment variable


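I am using gpt-3.5-turbo only because it is cheaper than GPT-4. If you can afford more expensive experiments, and/or you are doing something more "serious", you should obviously use a stronger model.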


llm_config = {
    "config_list": [{"model": "gpt-3.5-turbo", "api_key": userdata.get('OPENAI_API_KEY')}],
    "cache_seed": 0,  # seed for reproducibility
    "temperature": 0,  # temperature to control randomness
}


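Now I will copy the problem statement of my favorite LeetCode problem, Two Sum. It is one of the most commonly asked questions in LeetCode-style interviews and covers basic concepts such as caching with a hash map and simple equation manipulation.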


LEETCODE_QUESTION = """
Title: Two Sum
Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target. You may assume that each input would have exactly one solution, and you may not use the same element twice. You can return the answer in any order.
Example 1:
Input: nums = [2,7,11,15], target = 9
Output: [0,1]
Explanation: Because nums[0] + nums[1] == 9, we return [0, 1].
Example 2:
Input: nums = [3,2,4], target = 6
Output: [1,2]
Example 3:
Input: nums = [3,3], target = 6
Output: [0,1]
Constraints:
2 <= nums.length <= 10^4
-10^9 <= nums[i] <= 10^9
-10^9 <= target <= 10^9
Only one valid answer exists.
Follow-up: Can you come up with an algorithm that is less than O(n^2) time complexity?
"""


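Now we can define the two agents. One acts as the "assistant" agent that proposes solutions. The other acts as a proxy for us (the user) and is also responsible for executing the suggested Python code.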


# create an AssistantAgent named "assistant"
SYSTEM_MESSAGE = """You are a helpful AI assistant.
Solve tasks using your coding and language skills.
In the following cases, suggest python code (in a python coding block) or shell script (in a sh coding block) for the user to execute.
1. When you need to collect info, use the code to output the info you need, for example, browse or search the web, download/read a file, print the content of a webpage or a file, get the current date/time, check the operating system. After sufficient info is printed and the task is ready to be solved based on your language skill, you can solve the task by yourself.
2. When you need to perform some task with code, use the code to perform the task and output the result. Finish the task smartly.
Solve the task step by step if you need to. If a plan is not provided, explain your plan first. Be clear which step uses code, and which step uses your language skill.
When using code, you must indicate the script type in the code block. The user cannot provide any other feedback or perform any other action beyond executing the code you suggest. The user can't modify your code. So do not suggest incomplete code which requires users to modify. Don't use a code block if it's not intended to be executed by the user.
If you want the user to save the code in a file before executing it, put # filename: <filename> inside the code block as the first line. Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try.
When you find an answer, verify the answer carefully. Include verifiable evidence in your response if possible.
Additional requirements:
1. Within the code, add functionality to measure the total run-time of the algorithm in python function using "time" library.
2. Only when the user proxy agent confirms that the Python script ran successfully and the total run-time (printed on stdout console) is less than 50 ms, only then return a concluding message with the word "TERMINATE". Otherwise, repeat the above process with a more optimal solution if it exists.
"""
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config=llm_config,
    system_message=SYSTEM_MESSAGE
)
# create a UserProxyAgent instance named "user_proxy"
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=4,
    is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False,
    },
)
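
To make the user proxy's execution role more concrete, here is a minimal sketch of what running a suggested code block locally could look like. It is only an illustration under my own assumptions, not AutoGen's implementation: the helper names extract_python_block and run_python, the file name suggested.py, and the 30-second timeout are all hypothetical.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# Build the fence marker programmatically so this example stays easy to embed.
FENCE = "`" * 3
# Matches the body of the first fenced block tagged as python.
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def extract_python_block(message: str) -> Optional[str]:
    """Return the body of the first python fenced block, or None if absent."""
    match = CODE_BLOCK.search(message)
    return match.group(1) if match else None


def run_python(code: str, work_dir: str, timeout: float = 30.0) -> Tuple[int, str]:
    """Write the code into work_dir, run it, and return (exit code, output)."""
    script = Path(work_dir) / "suggested.py"  # hypothetical file name
    script.write_text(code, encoding="utf-8")
    proc = subprocess.run(
        [sys.executable, script.name],
        cwd=work_dir,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Combine stdout and stderr so errors are visible to whoever reads the result.
    return proc.returncode, proc.stdout + proc.stderr


if __name__ == "__main__":
    reply = "Try this:\n" + FENCE + "python\nprint(1 + 1)\n" + FENCE
    with tempfile.TemporaryDirectory() as work_dir:
        exit_code, output = run_python(extract_python_block(reply), work_dir)
    print("exitcode:", exit_code)
    print("Code output:", output.strip())
```

The exit code and captured output are what the assistant gets to read on its next turn. Because the code comes from a model, a container is a safer place to run it than the host whenever that is an option.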


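I set human_input_mode to "NEVER" because I do not intend to provide any input myself, and I set max_consecutive_auto_reply to 4 to limit the back-and-forth turns in the conversation. The assistant agent is instructed to reply with the word "TERMINATE" to tell the user proxy agent when to end the conversation.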
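
The two stopping conditions can be tried out without calling any LLM. The sketch below reuses the same termination predicate as the agent definition, while count_auto_replies is a hypothetical helper of mine that models the reply budget in a simplified way. It is an assumption about the behavior, not AutoGen's actual control flow.

```python
from typing import Dict, List


def is_termination_msg(message: Dict[str, str]) -> bool:
    """Same predicate as the one passed to the UserProxyAgent."""
    return message.get("content", "").rstrip().endswith("TERMINATE")


def count_auto_replies(assistant_messages: List[str], max_consecutive_auto_reply: int) -> int:
    """Count the automatic replies a proxy would send before stopping.

    Simplified model: the proxy answers every assistant message unless the
    message is a termination message or the reply budget is exhausted.
    """
    replies = 0
    for content in assistant_messages:
        if is_termination_msg({"content": content}):
            break  # the assistant asked to stop
        if replies >= max_consecutive_auto_reply:
            break  # the reply budget is used up
        replies += 1
    return replies


if __name__ == "__main__":
    # Trailing whitespace is ignored, so this message ends the chat.
    print(is_termination_msg({"content": "All checks passed.\nTERMINATE\n"}))
    # Merely mentioning the word inside a sentence does not end the chat.
    print(is_termination_msg({"content": 'You can reply with "TERMINATE".'}))
    print(count_auto_replies(["code v1", "code v2", "Done. TERMINATE"], 4))
```

Note that the predicate only fires when the message ends with the word, so a reply that quotes "TERMINATE" mid-sentence keeps the conversation going until the reply budget runs out.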


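An added benefit of using AutoGen (even for non-agentic workflows) is that it provides explicit caching, which helps you save API costs during development. Here I cache the responses on disk, but you can also integrate Redis for this purpose.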


# Use DiskCache as cache
with Cache.disk(cache_seed=7) as cache:
    # The assistant receives a message from the user_proxy containing the task description.
    chat_res = user_proxy.initiate_chat(
        assistant,
        message="""Solve the following leetcode problem and also comment on it's time and space complexity:\n\n""" + LEETCODE_QUESTION,
        cache=cache,  # pass the disk cache explicitly so this chat uses it
    )
Output:
user_proxy (to assistant):
Solve the following leetcode problem and also comment on it's time and space complexity:

Title: Two Sum
...
Follow-up: Can you come up with an algorithm that is less than O(n^2) time complexity?

--------------------------------------------------------------------------------
assistant (to user_proxy):
To solve this problem efficiently, we can use a dictionary to store the elements we have seen so far along with their indices. While iterating through the array, we can check if the complement of the current element (target - current element) is already in the dictionary. If it is, we have found the pair that sums up to the target.
Here's the Python code to solve the Two Sum problem with a time complexity less than O(n^2):
```python
# filename: two_sum.py
def two_sum(nums, target):
    num_dict = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in num_dict:
            return [num_dict[complement], i]
        num_dict[num] = i
# Test the function with the given examples
nums1 = [2, 7, 11, 15]
target1 = 9
print(two_sum(nums1, target1))  # Output: [0, 1]
nums2 = [3, 2, 4]
target2 = 6
print(two_sum(nums2, target2))  # Output: [1, 2]
nums3 = [3, 3]
target3 = 6
print(two_sum(nums3, target3))  # Output: [0, 1]
```
The time complexity of this solution is O(n) because we iterate through the list of numbers only once. The space complexity is also O(n) because we store elements in a dictionary.
You can run this code and check if it provides the correct output. Let me know the total run-time, and if it's less than 50 ms, you can reply with "TERMINATE". If it's not, I can suggest a more optimal solution.
--------------------------------------------------------------------------------
>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
user_proxy (to assistant):
exitcode: 0 (execution succeeded)
Code output: 
[0, 1]
[1, 2]
[0, 1]

--------------------------------------------------------------------------------
assistant (to user_proxy):
The Python script ran successfully and provided the correct output for the given examples. Since the total run-time is not provided, I will suggest measuring the total run-time of the algorithm using the "time" library to determine if it's less than 50 ms. 
You can add the functionality to measure the total run-time by importing the "time" library and using it to calculate the elapsed time before and after calling the `two_sum` function. Here's how you can modify the code:
```python
# filename: two_sum.py
import time
def two_sum(nums, target):
    num_dict = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in num_dict:
            return [num_dict[complement], i]
        num_dict[num] = i
# Test the function with the given examples and measure the total run-time
nums1 = [2, 7, 11, 15]
target1 = 9
start_time = time.time()
print(two_sum(nums1, target1))  # Output: [0, 1]
end_time = time.time()
print("Total run-time:", (end_time - start_time) * 1000, "ms")
nums2 = [3, 2, 4]
target2 = 6
start_time = time.time()
print(two_sum(nums2, target2))  # Output: [1, 2]
end_time = time.time()
print("Total run-time:", (end_time - start_time) * 1000, "ms")
nums3 = [3, 3]
target3 = 6
start_time = time.time()
print(two_sum(nums3, target3))  # Output: [0, 1]
end_time = time.time()
print("Total run-time:", (end_time - start_time) * 1000, "ms")
```
Please run this modified code and provide the total run-time for each example. If the total run-time is less than 50 ms for all examples, you can reply with "TERMINATE".
--------------------------------------------------------------------------------
>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
user_proxy (to assistant):
exitcode: 0 (execution succeeded)
Code output: 
[0, 1]
Total run-time: 0.01239776611328125 ms
[1, 2]
Total run-time: 0.00286102294921875 ms
[0, 1]
Total run-time: 0.0016689300537109375 ms

--------------------------------------------------------------------------------
assistant (to user_proxy):
The Python script ran successfully, and the total run-time for each example is significantly less than 50 ms. Therefore, you can reply with "TERMINATE". The algorithm provided an efficient solution with a time complexity of O(n) and a space complexity of O(n).


Here is what happened:


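  1. The UserProxyAgent asks the assistant to solve the problem based on the task description.
  2. The assistant proposes a solution in a Python code block.
  3. The user proxy executes the Python code.
  4. The assistant reads the console output and responds with a revised solution that adds the timing measurement. (Honestly, I expected to get the timed solution right away, but this behavior can be tuned through prompt engineering or by using a stronger LLM.)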
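
One caveat about the timing in the transcript: each measurement wraps a print call around a three- or four-element input, so the 50 ms threshold is met trivially. A more telling check might look like the sketch below, which times only the function call with time.perf_counter on a 10,000-element input. I am assuming 10,000 is the intended upper bound on the array size, and best_runtime_ms and the choice of 5 repeats are my own hypothetical additions.

```python
import time
from typing import List, Optional


def two_sum(nums: List[int], target: int) -> Optional[List[int]]:
    """Hash-map solution, the same approach the assistant proposed."""
    seen = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
    return None  # no pair found


def best_runtime_ms(nums: List[int], target: int, repeats: int = 5) -> float:
    """Return the fastest of several timed runs, in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        two_sum(nums, target)  # time only the call, not any printing
        elapsed_ms = (time.perf_counter() - start) * 1000
        best = min(best, elapsed_ms)
    return best


# Worst-case style input: the only valid pair is the last two elements,
# so the loop has to scan the whole list before it can return.
n = 10_000
nums = list(range(n))
target = (n - 2) + (n - 1)

if __name__ == "__main__":
    print(two_sum(nums, target))
    print(f"Best run-time: {best_runtime_ms(nums, target):.3f} ms")
```

Taking the best of several runs reduces noise from the interpreter and the machine, which matters when the result is compared against a fixed threshold.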


With AutoGen, you can also display the cost of the agentic workflow.


chat_res.cost


({'total_cost': 0,
  'gpt-3.5-turbo-0125': {'cost': 0,
   'prompt_tokens': 14578,
   'completion_tokens': 3460,
   'total_tokens': 18038}}
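
The output reports a total_cost of 0, so the token counts are the more informative part. As a rough illustration of turning them into a dollar estimate, the sketch below multiplies the counts by per-token prices. The two prices are placeholder assumptions of mine, not values from the article or from AutoGen, so substitute the current pricing for your model.

```python
# Token counts copied from the chat_res.cost output above.
prompt_tokens = 14578
completion_tokens = 3460

# ASSUMED prices in USD per 1 million tokens (placeholders, check current pricing).
PROMPT_PRICE_PER_M = 0.50
COMPLETION_PRICE_PER_M = 1.50


def estimate_cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    prompt_price_per_m: float,
    completion_price_per_m: float,
) -> float:
    """Estimate the cost in USD from token counts and per-million-token prices."""
    return (
        prompt_tokens * prompt_price_per_m
        + completion_tokens * completion_price_per_m
    ) / 1_000_000


total_tokens = prompt_tokens + completion_tokens
cost = estimate_cost_usd(
    prompt_tokens, completion_tokens, PROMPT_PRICE_PER_M, COMPLETION_PRICE_PER_M
)

if __name__ == "__main__":
    print("total tokens:", total_tokens)
    print(f"estimated cost: ${cost:.4f}")
```

The sum of the two counts matches the total_tokens value in the output, which is a quick sanity check that the figures were read correctly.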



Source: https://towardsdatascience.com/generate-verified-python-code-using-autogen-conversable-agents-2102b4f706ba