Speech-to-SQL with OpenAI Whisper and Ollama (Llama 3)

Published October 29, 2024 by alex

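I have been rereading the Harry Potter series lately, and one thing keeps nagging at me: the teachers and prefects are constantly docking points from the houses! But how do they track those changes across so many classes? How is data integrity guaranteed? What about scalability? How are write conflicts resolved? They must need some kind of scalable system, such as a publish-subscribe system for house point updates. And beyond scale, how good does the speech recognition have to be?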

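That got me thinking: could we recreate some of this with AI? What if we could go straight from speech to SQL? That is how I ended up in this fun little experiment: transcribing with OpenAI's Whisper, then using Meta's Llama 3 to turn the text into a SQL query, which gives us speech-to-SQL.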

Here is how I did it, and you can do the same in four simple steps:

Step 1: Record audio

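We start by capturing audio with a simple Python setup. Using the sounddevice library, we record directly from your microphone and save the audio temporarily as a .wav file so it can be transcribed later.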

import sounddevice as sd
import numpy as np  # needed for the int16 conversion below
import tempfile
import wave
# Function to record audio from the microphone and save it as a WAV file
def record_audio(duration, sample_rate=16000):
    print("Recording...")
    audio_data = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=1, dtype='float32')
    sd.wait()  # Wait for the recording to finish
    print("Recording finished.")
    
    # Save the audio to a temporary WAV file
    temp_wav = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
    with wave.open(temp_wav.name, 'wb') as wf:
        wf.setnchannels(1)  # Mono channel
        wf.setsampwidth(2)   # 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(np.int16(audio_data * 32767))  # Convert float32 to int16
    
    return temp_wav.name
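
To check the WAV-writing half of this step without a microphone, here is a minimal sketch that uses only the standard library. It substitutes a synthetic sine tone for the recording, and the helper names `float_to_pcm16` and `write_wav` are hypothetical, not part of the original script. It also clips samples to [-1, 1] before scaling, which the function above does not do.

```python
import math
import os
import struct
import tempfile
import wave

def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    # Clip first so out-of-range samples cannot overflow the int16 range
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    return struct.pack("<%dh" % len(clipped), *(int(s * 32767) for s in clipped))

def write_wav(samples, sample_rate=16000):
    """Write mono 16-bit audio to a temporary WAV file and return its path."""
    temp_wav = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
    temp_wav.close()  # release the handle so the path can be reopened by wave
    with wave.open(temp_wav.name, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(float_to_pcm16(samples))
    return temp_wav.name

# Stand-in for a recording: one second of a 440 Hz tone at half volume
sample_rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
wav_path = write_wav(tone, sample_rate)

# Read the header back to confirm channels, sample width, rate, and frame count
with wave.open(wav_path, "rb") as wf:
    wav_info = (wf.getnchannels(), wf.getsampwidth(), wf.getframerate(), wf.getnframes())
os.remove(wav_path)
print(wav_info)
```

Reading the header back is a cheap way to confirm the file matches the 16 kHz mono, 16-bit layout that the recording function writes.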

Step 2: Speech-to-text with Whisper

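Next, we transcribe the audio with OpenAI's Whisper model, which is very good at turning speech into text. Think of it as a personal assistant that listens to your commands and writes them down, only more reliable and more scalable.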

import whisper
import os
# Function to transcribe audio from the microphone using Whisper
def audio_to_text_from_mic(duration=5):
    # Record audio from the microphone
    audio_file = record_audio(duration)
    # Load Whisper model
    model = whisper.load_model("turbo")  # You can use "turbo", "small", etc.
    
    # Transcribe the recorded audio file
    result = model.transcribe(audio_file)
    # Delete the temporary audio file after transcription
    os.remove(audio_file)
    return result['text']
# Example usage
text = audio_to_text_from_mic(duration=3)  # Record for 3 seconds
print("Transcription:", text)

Transcription:  10 points to Gryffindor
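
As the output above shows, the transcribed text can carry a leading space. It may be worth tidying the text before it goes into a prompt. A minimal sketch, where `clean_transcript` is a hypothetical helper and not part of Whisper:

```python
def clean_transcript(text):
    """Collapse runs of whitespace and trim both ends of a transcript."""
    return " ".join(text.split())

print(clean_transcript("  10 points to Gryffindor "))
```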

Step 3: Text-to-SQL with Llama 3

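Now for the real magic: turning the transcribed text into a SQL statement. We give the Llama 3 model a natural language command (for example, "10 points to Gryffindor") and it outputs a valid SQL query.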

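We first build a prompt that gives context about the database schema. In our example, the house_points table has two columns: house_name (the name of the house) and points (the current total). The prompt explains this structure and instructs the model to return a well-formed SQL UPDATE query without unnecessary explanation.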

Here is the process step by step:

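Define the table schema: we provide the table's structure so the model knows exactly what it looks like. The schema specifies that the table contains house_name and points.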

 table_schemas = """
    house_points(house_name TEXT PRIMARY KEY, points INTEGER)
    """

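Create the prompt: we generate a prompt asking Llama 3 to convert the natural language command into a SQL UPDATE query. It explicitly asks for the response in JSON format, containing only the query, to keep the output clean and usable.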

  prompt = f"""
    You are a SQL expert.
    
    Please help to convert the following natural language command into a valid UPDATE SQL query. Your response should ONLY be based on the given context and follow the response guidelines and format instructions.
    ===Tables
    {table_schemas}
    ===Response Guidelines
    1. If the provided context is sufficient, please generate a valid query WITHOUT any explanations for the question.
    2. Please format the query before responding.
    3. Please always respond with a valid well-formed JSON object with the following format
    4. There are only UPDATE queries and points are either added or deducted from a house
    ===Response Format
    {{
        "query": "A valid UPDATE SQL query when context is sufficient."
    }}
    ===command
    {natural_language_text}
    """
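
The schema and prompt snippets above appear to live inside a function body. One way to package them, as a sketch: `build_prompt` and the module-level constant `TABLE_SCHEMAS` are hypothetical names, and the wording mirrors the prompt above. The response format here has no trailing comma after the "query" entry, since strict JSON does not allow one.

```python
TABLE_SCHEMAS = "house_points(house_name TEXT PRIMARY KEY, points INTEGER)"

def build_prompt(natural_language_text, table_schemas=TABLE_SCHEMAS):
    """Return the text-to-SQL prompt for a single spoken command."""
    # Doubled braces ({{ and }}) render as literal braces inside an f-string
    return f"""You are a SQL expert.

Please help to convert the following natural language command into a valid UPDATE SQL query. Your response should ONLY be based on the given context and follow the response guidelines and format instructions.
===Tables
{table_schemas}
===Response Guidelines
1. If the provided context is sufficient, please generate a valid query WITHOUT any explanations for the question.
2. Please format the query before responding.
3. Please always respond with a valid well-formed JSON object with the following format
4. There are only UPDATE queries and points are either added or deducted from a house
===Response Format
{{
    "query": "A valid UPDATE SQL query when context is sufficient."
}}
===command
{natural_language_text}
"""

prompt = build_prompt("10 points to Gryffindor")
print(prompt)
```

Keeping the prompt in one function makes it easy to print and inspect exactly what the model will see before any request is sent.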

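Send the request to Llama 3: the text is then sent to the large language model (LLM) through the Ollama API. The model processes the request and returns a JSON object containing the SQL query, which we parse to extract the query. If something goes wrong, such as the response failing to parse, an error string is returned instead, so a malformed reply does not crash the script.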

import ollama
import json
# The wrapper name get_sql_from_llm is illustrative; the listing did not show the enclosing function
def get_sql_from_llm(prompt):
    # Send the prompt to the local Llama 3 model through Ollama
    response = ollama.chat(
        model="llama3",
        messages=[{"role": "user", "content": prompt}]
    )
    # Parse the JSON response and return the SQL query if provided
    response_content = response['message']['content']

    try:
        response_json = json.loads(response_content)
        if "query" in response_json:
            return response_json["query"]
        else:
            return f"Error: {response_json.get('explanation', 'No explanation provided.')}"
    except json.JSONDecodeError:
        return "Error: Failed to parse response as JSON."
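
A model may surround the JSON object with extra text even when asked not to, in which case a plain `json.loads` on the whole reply fails. The sketch below shows one tolerant way to pull the object out. `extract_query` is a hypothetical helper, and slicing from the first `{` to the last `}` is a simple heuristic, not a general JSON extractor.

```python
import json

def extract_query(response_content):
    """Return the "query" value from a model reply, or None if none is found."""
    # Heuristic: take everything from the first "{" to the last "}"
    start = response_content.find("{")
    end = response_content.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        payload = json.loads(response_content[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    return payload.get("query")

reply = 'Sure, here it is: {"query": "UPDATE house_points SET points = points + 10 WHERE house_name = \'Gryffindor\';"} Hope that helps.'
print(extract_query(reply))
```

Returning None rather than an error string lets the caller tell a failed extraction apart from a real query.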

With that, your "10 points to Gryffindor" becomes the following SQL query:

UPDATE house_points SET points = points + 10 WHERE house_name = 'Gryffindor';
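
Because this query comes from a language model and will be run against a database, it may be worth checking its shape first. The sketch below accepts only a single UPDATE of the form shown above. `is_safe_update`, `UPDATE_PATTERN`, and `ALLOWED_HOUSES` are hypothetical names, and the pattern is deliberately strict, so a differently phrased but harmless query would also be rejected.

```python
import re

# The four houses tracked in the house_points table
ALLOWED_HOUSES = {"Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"}

# Matches: UPDATE house_points SET points = points +/- N WHERE house_name = 'X';
UPDATE_PATTERN = re.compile(
    r"^\s*UPDATE\s+house_points\s+SET\s+points\s*=\s*points\s*([+-])\s*(\d+)\s+"
    r"WHERE\s+house_name\s*=\s*'([A-Za-z]+)'\s*;?\s*$",
    re.IGNORECASE,
)

def is_safe_update(query):
    """Return True only for a single points UPDATE on a known house."""
    match = UPDATE_PATTERN.match(query)
    return bool(match) and match.group(3) in ALLOWED_HOUSES

print(is_safe_update("UPDATE house_points SET points = points + 10 WHERE house_name = 'Gryffindor';"))
print(is_safe_update("DROP TABLE house_points;"))
```

Anchoring the pattern at both ends means a second statement appended after the semicolon fails the check as well.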

Step 4: Execute the SQL query

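Finally, we take the generated SQL query and run it against the database to update the house points. Before diving into query execution, though, let's make sure the initial setup is in place.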

First, you need a table to track the points for each Hogwarts house. Here is a simple table structure that does the job:

CREATE TABLE house_points (
  house_name VARCHAR(50) PRIMARY KEY,
  points INT
);

Now populate the table with each house's starting points. Here is a quick SQL command that gives every house 100 points to start:

INSERT INTO house_points (house_name, points)
VALUES ('Gryffindor', 100), ('Hufflepuff', 100), ('Ravenclaw', 100), ('Slytherin', 100);

Once your database is ready, you need a connection to run queries. SQLAlchemy makes this very simple. Here is how to set up the connection:

from sqlalchemy import create_engine, text
engine = create_engine('postgresql://db_user:db_password@localhost/db_name')
def run_sql_query(query):
    with engine.connect() as conn:
        conn.execute(text(query))
        conn.commit()

Replace 'db_user', 'db_password', and 'db_name' with your actual PostgreSQL credentials and database name.

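This function takes the SQL query generated by our speech-to-SQL script and executes it on your database. Each time a new voice command updates the points, the function runs the corresponding SQL and commits the change, so the house points table is updated immediately.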
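
To try the execution step without a PostgreSQL server, here is a minimal sketch that swaps in Python's built-in sqlite3 module and an in-memory database. This is a substitution for illustration, not the setup used above, and `run_sql_query_sqlite` is a hypothetical name.

```python
import sqlite3

# In-memory database standing in for the PostgreSQL instance
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE house_points (house_name VARCHAR(50) PRIMARY KEY, points INT)")
conn.executemany(
    "INSERT INTO house_points (house_name, points) VALUES (?, ?)",
    [("Gryffindor", 100), ("Hufflepuff", 100), ("Ravenclaw", 100), ("Slytherin", 100)],
)
conn.commit()

def run_sql_query_sqlite(connection, query):
    """Execute one generated statement and commit it."""
    with connection:  # commits on success, rolls back if execute raises
        connection.execute(query)

run_sql_query_sqlite(
    conn,
    "UPDATE house_points SET points = points + 10 WHERE house_name = 'Gryffindor';",
)

# Read the table back to confirm the update landed
scores = dict(conn.execute("SELECT house_name, points FROM house_points ORDER BY house_name"))
conn.close()
print(scores)
```

Reading the scores back after each update is a quick way to confirm that a voice command changed only the intended row.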

Conclusion

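With just a few lines of code, you have built a voice-driven SQL executor. It is perfect for those moments when you channel your inner Professor McGonagall and gracefully deduct points. So whether you are managing Hogwarts or any other dataset, speech-to-SQL might be just the magic you need to save time.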

Source: https://medium.com/towards-artificial-intelligence/speech-to-sql-using-openais-whisper-and-ollama-llama3-0429675c7d68
