Vanna.ai:让GPT-4o与你的SQL数据库实现智能对话

2024年05月22日 由 alex 发表 725 0

OpenAI 刚刚发布了其最新旗舰机型 GPT 4o。这款先进的模型具有多模态功能,能够处理图像、文本和视频输入,生成超越以往任何模型的响应。本文章重点介绍如何使用 GPT 4o 和 Vanna AI 构建文本到 SQL 管道,与数据库聊天。


3


4


入门


from vanna.openai import OpenAI_Chat
from vanna.vannadb import VannaDB_VectorStore
class MyVanna(VannaDB_VectorStore, OpenAI_Chat):
    def __init__(self, config=None):
        MY_VANNA_MODEL = # Your model name from https://vanna.ai/account/profile
        VannaDB_VectorStore.__init__(self, vanna_model=MY_VANNA_MODEL, vanna_api_key=MY_VANNA_API_KEY, config=config)
        OpenAI_Chat.__init__(self, config=config)
# Add your OpenAI api_key
vn = MyVanna(config={'api_key': 'sk-...', 'model': 'gpt-4o'})


连接数据库

Vanna 已为以下 8 个数据库内置了连接器(你只需额外编写几行代码即可连接到其他数据库):


  1. Postgres SQL
  2. 甲骨文
  3. DuckDB
  4. MySQL
  5. SQLite
  6. 大查询
  7. 雪花
  8. 微软 SQL


在本篇文章中,我将连接到 DuckDB StackOverFlow 数据库。该数据库可在此处找到!


#This is how you can connect to a DuckDB database
vn.connect_to_duckdb(url='motherduck:[<database_name>]?motherduck_token=<token>&saas_mode=true')


训练


5


6


7


计划培训(信息模式)


# The information schema query may need some tweaking depending on your database. This is a good starting point.
df_information_schema = vn.run_sql("SELECT * FROM INFORMATION_SCHEMA.COLUMNS")


# This will break up the information schema into bite-sized chunks that can be referenced by the LLM
plan = vn.get_training_plan_generic(df_information_schema)
plan


# If you like the plan, then uncomment this and run it to train
vn.train(plan=plan)


DDL 培训


# In duckDB the describe statement can fetch the DDL for any table
vn.train(ddl="DESCRIBE SELECT * FROM Stackoverflow.users;")


SQL 语句培训


# here is an example of training on SQL statements
vn.train(
question="What are the top 10 users with highest amount of Badges?"
,sql="""SELECT UserId, COUNT(*) AS badge_count
FROM stackoverflow.main.badges
GROUP BY UserId
ORDER BY badge_count DESC
LIMIT 10
""")
# Another example
vn.train(
question="What is the difference in total answers for the user who answered the most answers and the user who answered the least questions?", 
,sql="SELECT MAX(answer_count) - MIN(answer_count) AS difference
FROM (
    SELECT OwnerUserId, COUNT(*) AS answer_count
    FROM stackoverflow.main.posts
    WHERE PostTypeId = 2
    GROUP BY OwnerUserId
) AS answer_counts;
")


文档培训


# You can feed in contextual information using documentation
vn.train(documentation="We call the user with the highest answers in a year the Grand master")


你可以使用 vn.get_training_data() 查看培训数据。


# vn.ask is runs these following functions in sequence, which can be run individually
# 1. vn.generate_ql
# 2. vn.run_sql
# 3. vn.generate_plotly_code
# 4. vn.get_plotly_figure
# this is how you can ask Vanna question's post training
vn.ask('Find the top 10 users with the highest amount of Badges?')


8


9


使用 Flask 应用程序

Vanna 自带一个内置用户界面 Flask 应用程序。它可以在 jupyter 笔记本或 Python 脚本中启动。


from vanna.flask import VannaFlaskApp
app = VannaFlaskApp(vn)
app.run()


这就是如何使用 GPT 4o 进行文本到 SQL 转换。


基准测试

下面是 GPT 4o 与其他同类机型的比较,可以看出它的准确率最高,达到 61%,ChatGPT 4 turbo 为 59%,Claude Opus 为 56


11

文章来源:https://arslanshahid-1997.medium.com/chat-with-your-sql-database-using-gpt-4o-via-vanna-ai-b87e3296f8dc
欢迎关注ATYUN官方公众号
商务合作及内容投稿请联系邮箱:bd@atyun.com
评论 登录
写评论取消
回复取消