Creating a Copilot Inside Your Notebooks

Published on January 18, 2024 by alex

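One of the Copilot features you will build is chatting with graphs. The %graph magic tells the Copilot that the question is about a graph. It accepts a cell reference such as --in16, which points to the cell whose output contains the graph to analyze, followed by a prompt stating what you want to know about the graph, and it prints the model's answer. It runs in Anaconda Jupyter Notebooks, VS Code Notebooks, Jupyter Lab, or any other local notebook environment.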

Setting the Stage

The first step in building the Copilot is to initialize the Gemini multimodal models. For that, you need to install a few libraries:

# Install necessary libraries
pip install -q -U google-generativeai grpcio grpcio-tools

Next, import the library that makes the Gemini LLM API calls, instantiate the models, and configure your API key.

# Import the Google Generative AI library
import google.generativeai as genai
# Initialize the GenerativeModel with 'gemini-pro' for chat and code
text_model = genai.GenerativeModel('gemini-pro')
# Initialize the GenerativeModel with 'gemini-pro-vision' for graphs
image_model = genai.GenerativeModel('gemini-pro-vision')
# Configure the library with your API key
genai.configure(api_key="Your-API-key")
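Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the key is stored in an environment variable; the variable name GOOGLE_API_KEY and the helper read_api_key are illustrative choices, not something this article prescribes:

```python
import os

def read_api_key(env=None, var_name="GOOGLE_API_KEY"):
    """Return the API key stored in an environment variable.

    `var_name` is an assumed name; use whatever variable you exported.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key

# In the notebook you would then call:
# genai.configure(api_key=read_api_key())
```

Failing early with a clear message is usually easier to debug than an authentication error raised by the first API call.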

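We have loaded two models: gemini-pro, the text model for generating code and for code-related conversation, and gemini-pro-vision, which handles the Copilot's image-related features. Next, we import the libraries used to build the Copilot features.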

# Regular expression for pattern matching
import re
# IPython for working with IPython environment
import IPython
# OS for interacting with the operating system
import os
# JSON for working with JSON data
import json
# Base64 for encoding and decoding base64 data
import base64
# Image class from IPython.display for displaying images
from IPython.display import Image
# register_line_magic for registering custom magic commands
from IPython.core.magic import register_line_magic

Let's start with the simplest Copilot feature, chat. Starting here makes the code for the more complex features easier to follow later.

Simple Chat Feature

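You are coding in a notebook and realize you need to ask ChatGPT a question. To avoid switching to a browser tab, we will create a chat feature that lets you chat right next to your code cells. The chat function takes one input, your prompt, and the Gemini text model returns an answer.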

# Registering a Jupyter Notebook magic command named 'chat'
@register_line_magic
def chat(contents):
    # Generating a response using the 'generate_content' method of the 'text_model' object
    # The method takes a formatted string containing the provided 'contents'
    response = text_model.generate_content(f'''
                                    Answer the question in a short quick readable paragraph, dont provide answer in any format or code
                                    {contents}
                                    ''').text
    # Printing the generated response to the output
    print(response)

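Two parts of the chat function matter. The first is the @register_line_magic decorator, which lets us call the function as %chat instead of chat(). This makes it feel more like an AI assistant, although it is not required. The second is the prompt template. Gemini tends to format chat replies as Markdown most of the time, so the template instructs it not to use Markdown or code formatting. You can adapt the template to your needs.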
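A line magic receives everything typed after its name as one raw string, so the function only has to wrap that string in the template. A small sketch of the template step on its own, using a hypothetical helper named build_chat_prompt, which makes the prompt easy to tweak and inspect without calling the API:

```python
import textwrap

# The instruction text is an illustrative rewording of the template used above.
CHAT_INSTRUCTION = (
    "Answer the question in a short, readable paragraph. "
    "Do not use Markdown formatting or code blocks."
)

def build_chat_prompt(question, instruction=CHAT_INSTRUCTION):
    """Combine the fixed instruction with the user's question."""
    question = textwrap.dedent(question).strip()
    if not question:
        raise ValueError("The question is empty.")
    return f"{instruction}\n\n{question}"

print(build_chat_prompt("  What is a line magic?  "))
```

Keeping the instruction in one constant means every feature that answers in prose can share it.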

You can use the chat feature in any code cell: pass %chat [your_question] and it prints the reply.

# Running Chat Feature
%chat What are some useful libraries for coding neural networks in Python

Chat with Code Feature

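This feature lets you chat with your code inside the notebook, so you no longer have to go to ChatGPT, paste your code, and ask the question there. The "Chat with Code" feature needs two things: your prompt and the code you want to ask about.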

# Define a function named 'chatn' that takes 'contents' as a parameter
@register_line_magic
def chatn(contents):
    try:
        # Use regular expression to find all occurrences of '--in' followed by digits in 'contents'
        numbers = [int(match.group().replace('--in', '')) for match in re.finditer(r'--in\d+', contents)]
        # Remove the found pattern '--in\d+' from 'contents'
        contents_filter = re.sub(r'--in\d+', '', contents)
        # Check if there are any references (numbers) found
        if numbers:
            # Retrieve the current cell contents for all references using the IPython 'In' variable
            current_cell_contents = [In[number] for number in numbers]
            # Combine the contents into a single string with line breaks
            combined_content = '\n'.join(current_cell_contents)
            # Execute the text_model to generate response
            response = text_model.generate_content(f'''
                                            {combined_content}
                                            Answer the question in a short readable paragraph, don't provide the answer in any format or code
                                            {contents_filter}
                                            ''').text
            # Print the generated response
            print(response)
        else:
            # Print an error message if no references are found
            print('Please provide a correct codeblock reference.')
    except Exception as e:
        # Report the actual error; a missing or invalid cell reference is the usual cause
        print(f'Could not answer ({e}). Please provide a correct codeblock reference.')

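Let's walk through the chatn function. The try-except block prevents a traceback when a cell reference is missing or invalid. The first step uses a regex to extract every --in pattern as a cell reference and to remove those patterns from the prompt so they are not sent to the Gemini API. I chose the --in format for cell references because it is easy to remember. In[number] fetches the code of each cell number mentioned in the prompt; the cells are joined and passed to the model together with the cleaned prompt. You can pass any number of cell references, in any order.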
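The parsing step can be tried on its own, without a notebook or an API key. The sketch below uses hypothetical helpers (parse_cell_refs, collect_cells) and a plain list standing in for IPython's In history:

```python
import re

CELL_REF = re.compile(r"--in(\d+)")

def parse_cell_refs(line):
    """Split a magic line into referenced cell numbers and the remaining prompt."""
    numbers = [int(n) for n in CELL_REF.findall(line)]
    # Replace each reference with a space, then collapse repeated whitespace.
    prompt = " ".join(CELL_REF.sub(" ", line).split())
    return numbers, prompt

def collect_cells(history, numbers):
    """Join the source of the referenced cells; `history` plays the role of `In`."""
    return "\n".join(history[n] for n in numbers)

# A stand-in for In; index 0 is unused padding in this example.
fake_history = ["", "x = [1, 2]", "y = [3, 4]", "x + y"]
numbers, prompt = parse_cell_refs("--in3 --in1 I sum element wise but it is not working")
print(numbers)
print(prompt)
print(collect_cells(fake_history, numbers))
```

The cells are joined in the order the references were typed, which is why the references do not need to be sorted.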

To use the "Chat with Code" feature, pass %chatn [cell_references] [your_question] and it prints the reply.

# Running Chat with Code Feature
%chatn --in17 --in11 I sum element wise but it is not working

You might think this is a very simple problem, but the feature works on more complex code as well.

Generate Code Feature

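Code generation is one of the most important features, and probably the one you will use most. We will build two versions: one generates code from a prompt alone, and the other generates relational code. The simple "Generate Code" feature takes a single input, your prompt, and generates the code in the next cell.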

# Register a custom line magic command
@register_line_magic
def code(contents):
    # Get the IPython shell instance
    from IPython.core.getipython import get_ipython
    shell = get_ipython()
    # Generate code content using a text model
    response = text_model.generate_content(f'''
                                    write a python code that and dont answer anything else
                                    {contents}
                                    ''').text
    # Strip the Markdown fence wrapped around the response (an opening ```python line and a closing ```),
    # without touching the word 'python' inside the code itself
    response = re.sub(r'^```(?:python)?[ \t]*\n|\n?```\s*$', '', response.strip(), flags=re.IGNORECASE)
    # Prepare payload for setting the next input
    payload = dict(
        source='set_next_input',
        text=response,
        replace=False,
    )
    # Write the payload to the IPython shell
    shell.payload_manager.write_payload(payload, single=False)

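In the code function, get_ipython returns the active IPython shell, which is what lets us place the generated code in a new cell right after the one containing your prompt. The cleanup is necessary because the generated Python code arrives wrapped in extra characters (a Markdown code fence) that need to be removed. The payload takes the Gemini response and creates a new cell containing it.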
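Cleaning model output is easy to get subtly wrong: deleting every occurrence of the word python, for example, would also rewrite identifiers such as python_version. A minimal sketch of a more targeted approach, assuming the model wraps its code in a Markdown fence and may add prose around it; extract_code is a hypothetical helper name:

```python
import re

# An opening fence with an optional language tag, the body, then a closing fence.
FENCE = re.compile(r"```[a-zA-Z0-9_+-]*[ \t]*\n(.*?)```", re.DOTALL)

def extract_code(response):
    """Return the contents of the first fenced block, or the stripped text if there is no fence."""
    match = FENCE.search(response)
    if match:
        return match.group(1).strip("\n")
    return response.strip()

reply = "```python\nimport platform\nprint(platform.python_version())\n```"
print(extract_code(reply))
```

Only the first fenced block is kept, so any explanation the model adds before or after the code is dropped rather than pasted into the new cell.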

To use the "Generate Code" feature, pass %code [your_prompt] and it generates the requested code in the next cell.

# Running Generate Code Feature
%code load my data.csv and take random sample of 100 rows

Relational Code Generation Feature

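Relational code generation matters because most of the time you are writing code on top of other code. Conveniently, it uses the same cell-reference mechanism as the chatn function. The "Relational Code" feature needs two things: your prompt and the code you want it to relate to.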

# Define a function named 'coden' that takes 'contents' as a parameter
@register_line_magic
def coden(contents):
    try:
        # Get the IPython shell instance
        from IPython.core.getipython import get_ipython
        shell = get_ipython()
        # Use regular expression to find all occurrences of '--in' followed by digits in 'contents'
        numbers = [int(match.group().replace('--in', '')) for match in re.finditer(r'--in\d+', contents)]
        # Remove the found pattern '--in\d+' from 'contents'
        contents_filter = re.sub(r'--in\d+', '', contents)
        # Check if there are any references (numbers) found
        if numbers:
            # Retrieve the current cell contents for all references using the IPython 'In' variable
            current_cell_contents = [In[number] for number in numbers]
            # Combine the contents into a single string with line breaks
            combined_content = '\n'.join(current_cell_contents)
            # Execute the text_model to generate code
            response = text_model.generate_content(f'''{combined_content}
                                                  {contents_filter}
                                                  please write Python code and don't answer anything else, dont provide output of the code
                                                  ''').text
            # Strip the Markdown fence wrapped around the response (an opening ```python line and a closing ```),
            # without touching the word 'python' inside the code itself
            response = re.sub(r'^```(?:python)?[ \t]*\n|\n?```\s*$', '', response.strip(), flags=re.IGNORECASE)
            # Prepare payload for setting the next input
            payload = dict(
                source='set_next_input',
                text=response,
                replace=False,
            )
            # Write the payload to the IPython shell
            shell.payload_manager.write_payload(payload, single=False)
        else:
            # Print an error message if no references are found
            print('Please provide a correct codeblock reference.')
    except Exception as e:
        # Report the actual error; a missing or invalid cell reference is the usual cause
        print(f'Could not generate code ({e}). Please provide a correct codeblock reference.')

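The payload and the response cleanup come from the code function, and the rest comes from the chatn function. To use the "Relational Code" feature, pass %coden [cell_references] [your_prompt] and it creates the requested code in the next cell. You can pass as many cell references as you need.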

# Running Relational Code Feature
%coden --in83 --in76 multiply y with each x item

Chat with Graph Feature

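This feature is more involved, so let's build it step by step. First, you have to obtain programmatically the name of the notebook file you are working in.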

# Import the IPython module
import IPython
# Import the os module for interacting with the operating system
import os
# Extract the local variables from the IPython environment
file_path = IPython.extract_module_locals()[1]['__vsc_ipynb_file__']
# Extract the base name (file name) from the file path
file_name = os.path.basename(file_path)
# Return the file name
print(file_name)

############### OUTPUT ###############
      myfile.ipynb
  
############### OUTPUT ###############

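This works only in VS Code, not in Jupyter Lab or Anaconda notebooks. If you do not use VS Code you can skip this step, because the final code lets you pass the file name manually in the prompt. Next, we load the notebook as JSON.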

# Import the json module for working with JSON data
import json
import base64
from IPython.display import Image
# Open the notebook file in read mode
with open(file_name, "r") as f:
    # Load the content of the notebook file as JSON
    notebook_json = json.load(f)

With the notebook file loaded, we can loop through its cells and pick out the output of the cell that contains the graph. Suppose the graph is in cell 65.

# Import the base64 module for encoding and decoding base64 data
import base64
# Import the Image class from the IPython.display module for displaying images in an IPython environment
from IPython.display import Image
####### Cell Number #######
cell_number = 65
# Find the cell in the notebook JSON with execution count equal to 65
element = next(cell for cell in notebook_json['cells'] if 'execution_count' in cell and cell['execution_count'] == cell_number)
# Extract the base64-encoded PNG image data from the cell's outputs
image_data = element['outputs'][0]['data']['image/png']
# Decode the base64-encoded image data
image_base64 = base64.b64decode(image_data)
# Save the decoded image data as a PNG file in the local directory (the notebook stores it as image/png)
with open('img_code.png', 'wb') as f:
    f.write(image_base64)
# Load the saved image with the Image class imported from IPython.display
image = Image(filename='img_code.png')
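The lookup above assumes the image is the first output of the cell, which fails when the cell also prints text before drawing. A sketch of a more tolerant lookup, using a hypothetical helper find_cell_png and a tiny hand-made notebook dictionary in place of a real file:

```python
import base64

def find_cell_png(notebook, execution_count):
    """Return the decoded PNG bytes of the first image output in the given cell."""
    for cell in notebook.get("cells", []):
        if cell.get("execution_count") != execution_count:
            continue
        # Scan every output, since text output may precede the figure.
        for output in cell.get("outputs", []):
            png = output.get("data", {}).get("image/png")
            if png is not None:
                return base64.b64decode(png)
        raise ValueError(f"Cell {execution_count} has no PNG output.")
    raise ValueError(f"No cell with execution count {execution_count}.")

# A hand-made structure for illustration only (the bytes are not a real plot).
demo = {"cells": [
    {"cell_type": "markdown", "source": ["# title"]},
    {"cell_type": "code", "execution_count": 65, "outputs": [
        {"output_type": "stream", "text": ["drawing...\n"]},
        {"output_type": "display_data",
         "data": {"image/png": base64.b64encode(b"fake-png-bytes").decode()}},
    ]},
]}
print(find_cell_png(demo, 65))
```

Raising a descriptive error for each failure case makes it clear whether the cell number or the cell's output is the problem.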

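In this approach the Gemini image model receives an image loaded from local disk, so you have to save the extracted graph and load it with the Image class. The graph function below takes two inputs: a prompt and a reference to the cell containing the graph.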

# Try to get the current notebook filename using IPython
try:
    file_name = IPython.extract_module_locals()[1]['__vsc_ipynb_file__']
    # Extract the base name (file name) from the file path
    file_name = os.path.basename(file_name)
except Exception:
    # If the name cannot be detected (for example, outside VS Code), fall back to --filename
    file_name = None
# Register a custom magic command for the Jupyter notebook
@register_line_magic
def graph(contents):
    # Search for the pattern --in<number>
    pattern = re.compile(r'--in\d+')
    # Find the first occurrence of the pattern in the contents
    match = pattern.search(contents)
    # Remove the pattern from the contents
    contents_filter = pattern.sub('', contents)
    # Define a new pattern for --filename=<word>
    pattern_f = re.compile(r'--filename=\w+')
    # Find the first occurrence of the new pattern in the contents
    match_f = pattern_f.search(contents)
    # Remove the new pattern from the filtered contents
    contents_filter = pattern_f.sub('', contents_filter)
    # If the --in<number> pattern is found
    if match:
        # Get the global variable file_name
        global file_name
        # Check if file_name is available from the IPython magic command
        if file_name:
            notebookName = file_name
            with open(notebookName, "r") as f:
                # Load the notebook JSON data
                notebook_json = json.load(f)
        elif match_f:
            # Extract the filename from the --filename=<word> pattern
            match_c = match_f.group().replace('--filename=', '')
            notebookName = match_c + '.ipynb'
            with open(notebookName, "r") as f:
                # Load the notebook JSON data
                notebook_json = json.load(f)
        else:
            # If neither file_name nor --filename=<name> is available, return an error message
            return 'Please provide the notebook name without its extension using --filename=<name>, e.g., --filename=mycode'
        # Extract the number from the --in<number> pattern
        number = int(match.group().replace('--in', ''))
        # Find the cell with the specified execution_count in the notebook JSON data
        element = next(cell for cell in notebook_json['cells'] if 'execution_count' in cell and cell['execution_count'] == number)
        # Extract image data from the cell's output
        image_data = element['outputs'][0]['data']['image/png']
        # Decode base64 image data
        image_base64 = base64.b64decode(image_data)
        # Save the image in the local directory as img_code.png (the notebook stores it as image/png)
        with open('img_code.png', 'wb') as f:
            f.write(image_base64)
        # Load the image using the Image() function
        image = Image(filename='img_code.png')
        # extract information using image model
        response = image_model.generate_content([contents_filter, image])
        print(response.text)
    else:
        # If --in<number> pattern is not found, print an error message
        print('Please provide a correct code block reference.')

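Chatting with a graph uses image_model, and the file name is extracted with a text pattern in the same way as the --in cell reference. To use the "Chat with Graph" feature, pass %graph [single_cell_reference] [your_prompt] [filename] and it prints the response.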

# Running Chat with Graph Feature
%graph --in143 how many outliers are there
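The file name handling can be isolated in a small function. The sketch below keeps the same precedence as the graph function (an automatically detected name wins over the flag) and, as an assumed convenience, accepts the name with or without the .ipynb extension; resolve_notebook_name is a hypothetical helper:

```python
import re

FILENAME_FLAG = re.compile(r"--filename=(\S+)")

def resolve_notebook_name(line, detected=None):
    """Pick the notebook file and return it with the flag removed from the prompt."""
    match = FILENAME_FLAG.search(line)
    prompt = " ".join(FILENAME_FLAG.sub(" ", line).split())
    if detected:
        return detected, prompt
    if match is None:
        raise ValueError("Pass --filename=<name> when the notebook name cannot be detected.")
    name = match.group(1)
    if not name.endswith(".ipynb"):
        name += ".ipynb"
    return name, prompt

print(resolve_notebook_name("--in143 --filename=mycode how many outliers are there"))
```

The --in reference is left in the returned prompt on purpose, so the cell-reference parsing can stay a separate step.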

Chat with Files Feature

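Small projects often depend on several Python files. This feature is useful when you want to chat with those .py files from your notebook instead of inspecting their code one by one. The "Chat with Files" feature needs two things: your prompt and the name of the folder containing the .py files.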

# Register a custom magic command for IPython
@register_line_magic
def chatf(contents):
    try:
        # Parse the folder name from the provided argument
        folder_match = re.search(r'--folder_name=(\S+)', contents)
        if not folder_match:
            # Print an error message if folder name is not provided in the correct format
            print("Please provide a valid folder name using the format '--folder_name=<folder_name>'.")
            return
        # Extract the folder name from the regex match
        folder_name = folder_match.group(1)
        # Get a list of Python files in the specified folder
        python_files = [file for file in os.listdir(folder_name) if file.endswith('.py')]
        # Check if any Python files were found
        if not python_files:
            print(f"No Python files found in the folder '{folder_name}'.")
            return
        # Initialize an empty string to store combined content
        combined_content = ""
        # Iterate through each Python file in the folder
        for file_name in python_files:
            with open(os.path.join(folder_name, file_name), 'r') as file:
                # Read the content of the file
                file_content = file.read()
                # Format the combined content with file name and its code
                combined_content += f"\nfile: {file_name}\n{file_content}\n{'_'*15}\n"
        # Remove the pattern of folder from the input contents
        contents_filter = re.sub(r'--folder_name=\S+', '', contents)
        # Generate content using a model and display the response
        response = text_model.generate_content(f'''
                                        {combined_content}
                                        Answer the question in a short readable paragraph, don't provide the answer in any format or code
                                        {contents_filter}
                                        ''').text
        print(response)
    except Exception as e:
        # Print an error message if an exception occurs
        print(f'An error occurred: {str(e)}')

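The chatf function takes a folder reference, much like the cell references above. It then combines every file name with its contents, and the rest of the code matches the chatn function. To use the "Chat with Files" feature, pass %chatf [single_folder_reference] [your_prompt] and it prints the response.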
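Combining whole folders can produce a very long prompt. A sketch of the file-gathering step with a size cap, using a hypothetical helper collect_python_sources; the default max_chars value is an arbitrary budget chosen for illustration, not a documented Gemini limit:

```python
from pathlib import Path

def collect_python_sources(folder, max_chars=20000):
    """Concatenate the .py files in `folder`, labelled by file name, up to `max_chars` characters."""
    parts = []
    total = 0
    # Sort the files so the prompt is the same on every run.
    for path in sorted(Path(folder).glob("*.py")):
        text = path.read_text(encoding="utf-8")
        block = f"\nfile: {path.name}\n{text}\n{'_' * 15}\n"
        if total + len(block) > max_chars:
            break  # stop before the prompt grows past the assumed budget
        parts.append(block)
        total += len(block)
    return "".join(parts)
```

The returned string can be placed in the prompt exactly where combined_content is used in chatf.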

# Running chat with files Feature
%chatf --folder_name=myfolder How to clean and format data

Compiling the Features

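You don't want to retype every feature function for each new project, which would be time-consuming. Instead, combine all the features into a single .py file. I named mine my_copilot.py; you can then import this module and use any of its features.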

# Importing all features of your copilot
from my_copilot import *
# using generate code feature
%code load my data.csv file using pandas
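One thing to watch when moving the functions into a module: chatn and coden read In, which IPython defines in the notebook's own namespace, so the name is not visible from inside my_copilot.py. A sketch of one way around this, assuming the functions fetch the history from the active shell's user namespace; get_input_history is a hypothetical helper name:

```python
def get_input_history(shell=None):
    """Return the list of executed cell sources (what a notebook exposes as `In`)."""
    if shell is None:
        # Imported lazily so the module can still be imported outside IPython.
        from IPython import get_ipython
        shell = get_ipython()
    if shell is None:
        raise RuntimeError("No active IPython shell; run this inside a notebook.")
    # user_ns is the dictionary holding the notebook's variables, including In.
    return shell.user_ns["In"]
```

Inside the module, In[number] would then become get_input_history()[number].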

Source: https://levelup.gitconnected.com/create-copilot-inside-your-notebooks-that-can-chat-with-graphs-write-code-and-more-e9390e2b9ed8
