Chat with CSV Files Using Google's Gemini Flash

Published June 25, 2024 by alex

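LangChain, LangGraph, LlamaIndex, CrewAI... just as you get comfortable with one tool, another pops up. If you actually want to build something, rather than learn these libraries just to keep up with the hype, LangChain is excellent, but it is also very dense. Take the documentation alone: there are Python docs, JavaScript docs, and a parent set of API docs. In Python, several functions do similar jobs. That is all well and good, but you don't get enough control over your code.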

GPT-4o and the Gemini updates were recently released within two days of each other.

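GPT-4o's ScarJo voice made a huge impact in the demo and sent a wave through the AI industry, overshadowing Gemini's showing at Google I/O. Honestly, GPT always seems to be one step ahead of every other LLM, but for building MVPs and personal projects, Gemini is a blessing for broke developers like me: its free tier gives access to the API and the latest models, with a rate limit of 15 RPM.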

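The free version of ChatGPT now lets you upload files with GPT-4o, but without Code Interpreter it still cannot interpret CSV or Excel data. So, to test what Gemini Flash can do, I decided to build a CSV interpreter myself. Plenty of tutorials show how to build one with LangChain, but we will build it from scratch. It is easier to understand than the LangChain version, especially for beginners.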

The Pipeline

The basic idea is to have the LLM generate code for you. For that, we have to give it some context so it knows what dataset it is dealing with.

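Stage 1 is code generation. In my experience, head(), describe(), columns and dtypes provide enough context. If you have worked with LLMs, you may wonder whether we could use RAG (retrieval-augmented generation). RAG uses a vector database and semantic (vector) search to retrieve data and injects it into the LLM along with the query. This works well for text, but it is a poor fit for analysing a dataset like this, because the whole dataset is split into chunks, which badly damages context and limits what you can do. For example, if you want to know the total number of rows in the dataset, RAG simply cannot tell you.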

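Stage 2 gives the LLM the output produced by running the Stage 1 code, bundled with the user query. This turns the output of the pandas command into natural language and makes for a better conversational experience.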
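The two stages can be sketched end to end without calling any model. In the rough sketch below, fake_stage1_llm and fake_stage2_llm are hypothetical stand-ins for the real Gemini calls, the sample dataframe is made up, and eval() is used only for brevity (the project itself runs commands with exec()).

```python
import json
import pandas as pd


def fake_stage1_llm(prompt: str) -> str:
    """Hypothetical stand-in for the code-generation model: always asks for the row count."""
    return json.dumps({"command": "len(df)"})


def fake_stage2_llm(prompt: str) -> str:
    """Hypothetical stand-in for the response model: echoes the prompt it was given."""
    return f"(natural-language answer based on) {prompt}"


def answer(df: pd.DataFrame, user_query: str) -> str:
    # Stage 1: describe the dataset to the model and ask for a pandas command.
    context = (
        f"Columns: {df.columns.to_list()}. "
        f"Dtypes: {df.dtypes.astype(str).to_dict()}. "
        f"Head: {df.head().to_dict()}."
    )
    reply = fake_stage1_llm(f"{context} Query: {user_query}")
    command = json.loads(reply)["command"]

    # Run the generated command against the dataframe (eval for brevity only).
    data = eval(command, {"df": df})

    # Stage 2: hand the raw output back to a model together with the user query.
    return fake_stage2_llm(f"Query: {user_query} Output: {data}.")


df = pd.DataFrame({"city": ["Pune", "Oslo", "Lima"], "temp_c": [31.0, 4.5, 19.2]})
reply = answer(df, "How many rows are there?")
print(reply)
```

Note that the row count comes from running code on the full dataframe, which is exactly the kind of question a chunk-based retrieval step could not answer.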

Prompts Used

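A rookie mistake most people make is spelling out every possible instruction in the prompt. LLMs are not intelligent, and they will not follow every instruction (at least not until we reach AGI). To work around this, people have developed a range of techniques and strategies, such as zero-shot prompting, chain of thought, few-shot prompting and so on. They sound cool, but they are actually quite basic. I won't go into them in depth, since plenty has already been written about them. Of all the techniques, role prompting has worked best for me. In role-based prompting, you assign the LLM a role before asking your question. This helps to get the desired result.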

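Four prompts are needed in total: two system prompts and two main prompts (a system prompt plus a main prompt for each stage). The system prompt acts as a framework: it sets the conditions for the model to operate within specific parameters and produce coherent, relevant responses that match the expected outcome.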

Stage 1

System prompt

You are an expert python developer who works with pandas. You have to generate simple pandas ‘command’ for the user queries in JSON format. No need to add ‘print’ function. Analyse the datatypes of the columns before generating the command. If unfeasible, return ‘None’.

Main prompt

The dataframe name is ‘df’. df has the columns {cols} and their datatypes are {dtype}. df is in the following format: {desc}. The head of df is: {head}. You cannot use df.info() or any command that cannot be printed. Write a pandas command for this query on the dataframe df: {user_query}

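We will dynamically inject the metadata into this prompt to generate the pandas command. I have told the model not to generate df.info() as a command because it prints its information straight to the console, without needing a print call, and returns None. The variable holding the result would therefore end up as None.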
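A quick check of that behaviour on a made-up dataframe: df.info() returns None, whereas df.describe() returns a DataFrame that can be captured. The buf parameter shown at the end is one way to capture the info() text if it is ever needed; the project does not use it, as far as the article shows.

```python
import io
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# df.info() writes its summary to stdout and returns None,
# so capturing its return value leaves nothing to pass on to Stage 2.
result = df.info()
print(result is None)

# df.describe() returns a DataFrame, so its result can be captured and forwarded.
summary = df.describe()
print(type(summary).__name__)

# If the info() text is really needed, it can be redirected into a buffer.
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()
```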

Response schema

import typing_extensions

class Command(typing_extensions.TypedDict):
    command: str

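The Gemini SDK lets you switch to JSON mode. To get consistent JSON output, define the response schema as a Python class and pass it as an argument to the response-generation function. With the Command class, the JSON will have a single key, command, with a string value.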

// For User Query: "Show me top 5 rows"
{
  "command":"df.head()"
}
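Once the reply arrives as JSON text, the command still has to be pulled out of it. The sketch below assumes the reply text looks like the example above; extract_command is a hypothetical helper, not part of the Gemini SDK, and raw_reply is a hard-coded stand-in for the model's output.

```python
import json
from typing import Optional

# Hypothetical raw text of a Stage 1 reply in JSON mode.
raw_reply = '{"command": "df.head()"}'


def extract_command(raw: str) -> Optional[str]:
    """Return the pandas command from a JSON reply, or None if the reply is unusable."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None
    command = payload.get("command") if isinstance(payload, dict) else None
    # The system prompt tells the model to answer 'None' for unfeasible queries.
    if not isinstance(command, str) or command.strip() in ("", "None"):
        return None
    return command.strip()


command = extract_command(raw_reply)
print(command)
```

Treating the string 'None' the same as a missing command keeps the unfeasible-query case from ever reaching the execution step.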

Stage 2

System prompt

Your task is to comprehend. You must analyse the user query and response data to generate a response data in natural language.

Main prompt

The user query is {final_query}. The output of the command is {str(data)}. If the data is ‘None’, you can say ‘Please ask a query to get started’. Do not mention the command used. Generate a response in natural language for the output.

Putting It All Together

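For convenience, we will use Streamlit in this project, which saves a lot of time. The aim of the project is to understand how roughly 80% of GenAI applications work under the hood.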

Dataframe metadata

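import pandas as pd

# uploaded_file is the CSV supplied by the user (presumably via the Streamlit app)
df = pd.read_csv(uploaded_file)
# Serialise the metadata as strings so it can be injected into the prompt
head = str(df.head().to_dict())
desc = str(df.describe().to_dict())
cols = str(df.columns.to_list())
dtype = str(df.dtypes.to_dict())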

System prompts

import google.generativeai as genai

# Stage 1
model_pandas = genai.GenerativeModel('gemini-1.5-flash-latest', system_instruction="You are an expert python developer who works with pandas. You make sure to generate simple pandas 'command' for the user queries in JSON format. No need to add 'print' function. Analyse the datatypes of the columns before generating the command. If unfeasible, return 'None'. ")
# Stage 2
model_response = genai.GenerativeModel('gemini-1.5-flash-latest', system_instruction="Your task is to comprehend. You must analyse the user query and response data to generate a response data in natural language.")

Main prompts

# Stage 1
final_query = f"The dataframe name is 'df'. df has the columns {cols} and their datatypes are {dtype}. df is in the following format: {desc}. The head of df is: {head}. You cannot use df.info() or any command that cannot be printed. Write a pandas command for this query on the dataframe df: {user_query}"
# Stage 2
natural_response = f"The user query is {final_query}. The output of the command is {str(data)}. If the data is 'None', you can say 'Please ask a query to get started'. Do not mention the command used. Generate a response in natural language for the output."

Generating responses

# Stage 1
response = model_pandas.generate_content(
    final_query,
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
        response_schema=Command,
        temperature=0.3,
    ),
)

# Stage 2
bot_response = model_response.generate_content(
    natural_response,
    generation_config=genai.GenerationConfig(temperature=0.7),
)

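temperature lets us control how random the generated responses are. A higher temperature gives more creative and varied responses, but they may drift from the actual user query. A lower temperature gives more consistent and focused responses, though they may be less creative. That is why I chose a lower temperature for Stage 1, to get accurate commands, and a higher one for Stage 2, so the responses do not sound monotonous and robotic. You can experiment to find the sweet spot.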
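To get a feel for what the setting does, here is an illustrative sketch that assumes the common formulation in which token scores (logits) are divided by the temperature before a softmax. The article does not describe Gemini's internal sampling, so treat this as a generic illustration; the logits are made up.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after scaling them by 1/temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

low = softmax_with_temperature(logits, 0.3)   # Stage 1 setting
high = softmax_with_temperature(logits, 0.7)  # Stage 2 setting

# At the lower temperature the top-scoring token takes a larger share of the
# probability mass, so sampling is more predictable.
print(low[0] > high[0])
```

Under this formulation, 0.3 concentrates probability on the most likely token, which suits command generation, while 0.7 leaves more room for the alternatives.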

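To execute the pandas command, we will use Python's exec() function, which lets us run Python code dynamically. I know exec() is a security risk, but in this case I saw no other way. You can certainly improve the architecture or add more validation to make the use of exec() safer. Since this is just a personal project, I have not taken those measures.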
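A minimal sketch of what the execution step with some basic validation might look like. run_command, BLOCKED and SAFE_BUILTINS are hypothetical names introduced here for illustration. The substring blocklist and the trimmed builtins only reduce the obvious risks; they are not a real sandbox, since the command can still reach anything available through df and pd.

```python
import pandas as pd

# Crude blocklist of substrings that a plain pandas query should never need.
BLOCKED = ("import", "__", "open(", "exec(", "eval(")

# Only a handful of harmless builtins are exposed to the generated command.
SAFE_BUILTINS = {"len": len, "min": min, "max": max, "sum": sum, "round": round}


def run_command(command, df: pd.DataFrame):
    """Execute one generated pandas command and return its result, or None if rejected."""
    if command is None or command.strip() in ("", "None"):
        return None
    if any(token in command for token in BLOCKED):
        return None
    namespace = {"df": df, "pd": pd}
    try:
        # Assign the result to 'data' so it can be read back after exec() finishes.
        exec(f"data = {command}", {"__builtins__": SAFE_BUILTINS}, namespace)
    except Exception:
        # Syntax errors, unknown columns, missing builtins and so on.
        return None
    return namespace.get("data")


df = pd.DataFrame({"a": [1, 2, 3]})
total = run_command("df['a'].sum()", df)
rows = run_command("len(df)", df)
rejected = run_command("__import__('os').system('ls')", df)
```

Returning None for rejected or failing commands lines up with the Stage 2 prompt, which already tells the model what to say when the data is 'None'.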

User interface

Source: https://levelup.gitconnected.com/chat-with-csv-files-using-googles-gemini-flash-no-langchain-0e8f79d63348
