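What AI Can Do
AI brings significant improvements in the following ways:
However, one major challenge remains: how to handle the unstructured text that AI models return.
Traditionally, extracting structured data (such as JSON) from model output has required elaborate prompt engineering or regular expressions (RegEx). Given the unpredictability of AI models and the variability of user input, this approach is error-prone and inconsistent.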
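To make that fragility concrete, here is a minimal sketch of the regex approach. The helper name `extractJsonWithRegex` and both sample replies are hypothetical and exist only for illustration; real model output varies far more than this.

```typescript
// Hypothetical helper: grab the span from the first "{" to the last "}" and try to parse it.
function extractJsonWithRegex(text: string): unknown {
  const match = text.match(/\{[\s\S]*\}/);
  if (!match) return null;
  try {
    return JSON.parse(match[0]);
  } catch {
    return null; // the matched span was not valid JSON
  }
}

// A tidy reply parses fine...
const tidyReply = 'Sure! Here is the data: {"location": "Melbourne"}';
// ...but a stray pair of braces earlier in the text breaks the greedy match.
const messyReply = 'Sure, the farm {see note} is here: {"location": "Melbourne"}';

console.log(extractJsonWithRegex(tidyReply));
console.log(extractJsonWithRegex(messyReply));
```

The second call returns `null` even though the reply contains valid JSON, which is the kind of inconsistency that structured function calling is meant to avoid.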
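To address this, OpenAI introduced two features: JSON mode and function calling. This article takes a close look at function calling, explains how it simplifies extracting structured data from model output, and illustrates it with TypeScript code examples.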
The Current Challenge
How It Works
According to OpenAI's documentation, function calling follows this basic sequence of steps:
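The round trip can be sketched without touching the network. In the sketch below, `fakeModel` is a stub standing in for the chat completions endpoint, and the message types are simplified local stand-ins rather than the SDK's own. Both are assumptions made only to show the order of operations: call the model, receive a function call request, run the function, then call the model again with the result.

```typescript
// Simplified local types (assumptions); the real SDK types are richer.
type ToolCall = { id: string; name: string; arguments: string };
type Message =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string | null; toolCalls?: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };
type AssistantMessage = Extract<Message, { role: "assistant" }>;

// Stub model: requests get_farms first, then summarizes once it sees a tool result.
function fakeModel(messages: Message[]): AssistantMessage {
  const last = messages[messages.length - 1];
  if (last?.role === "tool") {
    return { role: "assistant", content: `Here is what I found: ${last.content}` };
  }
  return {
    role: "assistant",
    content: null,
    toolCalls: [{ id: "call_1", name: "get_farms", arguments: '{"location":"Melbourne"}' }],
  };
}

// Functions the application is willing to run on the model's behalf (mocked).
const availableFunctions: Record<string, (args: { location: string }) => string> = {
  get_farms: ({ location }) => JSON.stringify({ location, farms: ["Farm 1"] }),
};

function runRoundTrip(userInput: string): string | null {
  // 1. Call the model with the user's query.
  const messages: Message[] = [{ role: "user", content: userInput }];
  const first = fakeModel(messages);
  messages.push(first);
  // 2. The model either answers in text or asks for one or more function calls.
  if (!first.toolCalls || first.toolCalls.length === 0) return first.content;
  for (const call of first.toolCalls) {
    // 3. Parse the JSON-encoded arguments and run the function in our own code.
    const fn = availableFunctions[call.name];
    if (!fn) throw new Error(`Unknown function: ${call.name}`);
    const args = JSON.parse(call.arguments) as { location: string };
    messages.push({ role: "tool", toolCallId: call.id, content: fn(args) });
  }
  // 4. Call the model again so it can summarize the result for the user.
  return fakeModel(messages).content;
}

console.log(runRoundTrip("Find farms near Melbourne"));
```

Note that the model never runs anything itself: it only names a function and supplies arguments, and the application decides whether and how to execute it.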
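Four key concepts that can be confusing at first:
Step-by-Step Guide to Building an Agent
Business Case: Building a Farm Trip Assistant Agent
Our goal is to build a farm trip assistant agent that improves the experience of planning a farm visit. The digital assistant will provide comprehensive support by:
Application Architecture:
This flowchart shows the architecture of the application:
Prerequisites:
OpenAI API key: available from the OpenAI platform.
Step 1: Prepare to Call the Model
To start a conversation, you first need a system message and a user task prompt: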
const messages: ChatCompletionMessageParam[] = [];
console.log(StaticPrompts.welcome);
messages.push(SystemPrompts.context);
const userPrompt = await createUserMessage();
messages.push(userPrompt);
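As a matter of personal preference, I store all prompts in objects so they are easy to access and modify. The snippets below show every prompt used in the application. Feel free to adopt or adapt this approach.
StaticPrompts: static messages used throughout the conversation.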
export const StaticPrompts = {
welcome:
"Welcome to the farm assistant! What can I help you with today? You can ask me what I can do.",
fallback: "I'm sorry, I don't understand.",
end: "I hope I was able to help you. Goodbye!",
} as const;
UserPrompts: user messages generated from user input.
import OpenAI from "openai";
type ChatCompletionUserMessageParam = OpenAI.ChatCompletionUserMessageParam;
type UserPromptKey = "task";
type UserPromptValue = (userInput?: string) => ChatCompletionUserMessageParam;
export const UserPrompts: Record<UserPromptKey, UserPromptValue> = {
task: (userInput) => ({
role: "user",
content: userInput || "What can you do?",
}),
};
SystemPrompts: system messages generated from the system context.
import OpenAI from "openai";
type ChatCompletionSystemMessageParam = OpenAI.ChatCompletionSystemMessageParam;
type SystemPromptKey = "context";
export const SystemPrompts: Record<
SystemPromptKey,
ChatCompletionSystemMessageParam
> = {
context: {
role: "system",
content:
"You are a farm visit assistant. You are upbeat and friendly. You introduce yourself when first saying `Howdy!`. If you decide to call a function, you should retrieve the required fields for the function from the user. Your answer should be as precise as possible. If you have not yet retrieved the required fields of the function completely, you do not answer the question and inform the user you do not have enough information.",
},
};
FunctionPrompts: function messages, essentially the return values of the functions.
import OpenAI from "openai";
type ChatCompletionToolMessageParam = OpenAI.ChatCompletionToolMessageParam;
type FunctionPromptKey = "function_response";
type FunctionPromptValue = (
args: Omit<ChatCompletionToolMessageParam, "role">
) => ChatCompletionToolMessageParam;
export const FunctionPrompts: Record<FunctionPromptKey, FunctionPromptValue> = {
function_response: (options) => ({
role: "tool",
...options,
}),
};
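Step 2: Define the Tools
As mentioned earlier, a tool is essentially a description of a function the model can call. In this example, we define four tools to meet the requirements of the farm trip assistant agent:
The following snippet shows how these tools are defined: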
import OpenAI from "openai";
import {
ConvertTypeNameStringLiteralToType,
JsonAcceptable,
} from "../utils/type-utils.js";
type ChatCompletionTool = OpenAI.ChatCompletionTool;
type FunctionDefinition = OpenAI.FunctionDefinition;
// An enum to define the names of the functions. This will be shared between the function descriptions and the actual functions
export enum DescribedFunctionName {
FileComplaint = "file_complaint",
getFarms = "get_farms",
getActivitiesPerFarm = "get_activities_per_farm",
bookActivity = "book_activity",
}
// This is a utility type to narrow down the `parameters` type in the `FunctionDefinition`.
// It pairs with the keyword `satisfies` to ensure that the properties of parameters are correctly defined.
// This is a workaround as the default type of `parameters` in `FunctionDefinition` is `type FunctionParameters = Record<string, unknown>` which is overly broad.
type FunctionParametersNarrowed<
T extends Record<string, PropBase<JsonAcceptable>>
> = {
type: JsonAcceptable; // basically all the types that JSON can accept
properties: T;
required: (keyof T)[];
};
// This is a base type for each property of the parameters
type PropBase<T extends JsonAcceptable = "string"> = {
type: T;
description: string;
};
// This utility type transforms parameter property string literals into usable types for function parameters.
// Example: { email: { type: "string" } } -> { email: string }
export type ConvertedFunctionParamProps<
Props extends Record<string, PropBase<JsonAcceptable>>
> = {
[K in keyof Props]: ConvertTypeNameStringLiteralToType<Props[K]["type"]>;
};
// Define the parameters for each function
export type FileComplaintProps = {
name: PropBase;
email: PropBase;
text: PropBase;
};
export type GetFarmsProps = {
location: PropBase;
};
export type GetActivitiesPerFarmProps = {
farm_name: PropBase;
};
export type BookActivityProps = {
farm_name: PropBase;
activity_name: PropBase;
datetime: PropBase;
name: PropBase;
email: PropBase;
number_of_people: PropBase<"number">;
};
// Define the function descriptions
const FunctionDescriptions: Record<
DescribedFunctionName,
FunctionDefinition
> = {
[DescribedFunctionName.FileComplaint]: {
name: DescribedFunctionName.FileComplaint,
description: "File a complaint as a customer",
parameters: {
type: "object",
properties: {
name: {
type: "string",
description: "The name of the user, e.g. John Doe",
},
email: {
type: "string",
description: "The email address of the user, e.g. john@doe.com",
},
text: {
type: "string",
description: "Description of issue",
},
},
required: ["name", "email", "text"],
} satisfies FunctionParametersNarrowed<FileComplaintProps>,
},
[DescribedFunctionName.getFarms]: {
name: DescribedFunctionName.getFarms,
description: "Get the information of farms based on the location",
parameters: {
type: "object",
properties: {
location: {
type: "string",
description: "The location of the farm, e.g. Melbourne VIC",
},
},
required: ["location"],
} satisfies FunctionParametersNarrowed<GetFarmsProps>,
},
[DescribedFunctionName.getActivitiesPerFarm]: {
name: DescribedFunctionName.getActivitiesPerFarm,
description: "Get the activities available on a farm",
parameters: {
type: "object",
properties: {
farm_name: {
type: "string",
description: "The name of the farm, e.g. Collingwood Children's Farm",
},
},
required: ["farm_name"],
} satisfies FunctionParametersNarrowed<GetActivitiesPerFarmProps>,
},
[DescribedFunctionName.bookActivity]: {
name: DescribedFunctionName.bookActivity,
description: "Book an activity on a farm",
parameters: {
type: "object",
properties: {
farm_name: {
type: "string",
description: "The name of the farm, e.g. Collingwood Children's Farm",
},
activity_name: {
type: "string",
description: "The name of the activity, e.g. Goat Feeding",
},
datetime: {
type: "string",
description: "The date and time of the activity",
},
name: {
type: "string",
description: "The name of the user",
},
email: {
type: "string",
description: "The email address of the user",
},
number_of_people: {
type: "number",
description: "The number of people attending the activity",
},
},
required: [
"farm_name",
"activity_name",
"datetime",
"name",
"email",
"number_of_people",
],
} satisfies FunctionParametersNarrowed<BookActivityProps>,
},
};
// Format the function descriptions into tools and export them
export const tools = Object.values(
FunctionDescriptions
).map<ChatCompletionTool>((description) => ({
type: "function",
function: description,
}));
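The snippet above imports `JsonAcceptable` and `ConvertTypeNameStringLiteralToType` from a utility module that is not shown. The definitions below are one plausible shape for them, written as an assumption so the type plumbing can be followed end to end. They are not taken from the original project, and `BookingProps` is a hypothetical example type.

```typescript
// Assumed definition: the JSON Schema type names a parameter may declare.
type JsonAcceptable = "string" | "number" | "boolean" | "object" | "array" | "null";

// Assumed definition: map a type-name string literal to the matching TypeScript type.
type ConvertTypeNameStringLiteralToType<T extends JsonAcceptable> =
  T extends "string" ? string
  : T extends "number" ? number
  : T extends "boolean" ? boolean
  : T extends "array" ? unknown[]
  : T extends "null" ? null
  : Record<string, unknown>;

// Same shapes as in the article, repeated here so the sketch is self-contained.
type PropBase<T extends JsonAcceptable = "string"> = { type: T; description: string };
type ConvertedFunctionParamProps<Props extends Record<string, PropBase<JsonAcceptable>>> = {
  [K in keyof Props]: ConvertTypeNameStringLiteralToType<Props[K]["type"]>;
};

// Hypothetical props type: the compiler derives { farm_name: string; number_of_people: number }.
type BookingProps = { farm_name: PropBase; number_of_people: PropBase<"number"> };
const booking: ConvertedFunctionParamProps<BookingProps> = {
  farm_name: "Farm 1",
  number_of_people: 4,
};

console.log(booking);
```

With these in place, assigning a string to `number_of_people` becomes a compile-time error, which is the point of deriving the argument types from the function descriptions.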
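Understanding Function Descriptions
A function description requires the following keys:
Adding a New Function
To introduce a new function, follow these steps: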
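As a rough sketch of what adding a function involves, the example below wires up a hypothetical `get_opening_hours` function. The function name, its mocked return value, and the simplified local types are all assumptions for illustration. A plain `as const` object stands in for the article's enum so the sketch stays self-contained.

```typescript
// Simplified stand-ins for the article's types (assumptions).
type PropBase<T extends string = "string"> = { type: T; description: string };
type FunctionDefinition = {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, PropBase<string>>;
    required: string[];
  };
};

// 1. Register the new name alongside the existing ones.
const DescribedFunctionName = {
  getFarms: "get_farms",
  getOpeningHours: "get_opening_hours", // hypothetical new function
} as const;

// 2. Declare the parameter properties for the new function.
type GetOpeningHoursProps = { farm_name: PropBase };
const openingHoursProperties: GetOpeningHoursProps = {
  farm_name: { type: "string", description: "The name of the farm" },
};

// 3. Describe the function so the model knows when and how to request it.
const FunctionDescriptions: Record<string, FunctionDefinition> = {
  [DescribedFunctionName.getOpeningHours]: {
    name: DescribedFunctionName.getOpeningHours,
    description: "Get the opening hours of a farm",
    parameters: {
      type: "object",
      properties: openingHoursProperties,
      required: ["farm_name"],
    },
  },
};

// 4. Implement the function under the same name so the dispatcher can find it.
const AvailableFunctions = {
  [DescribedFunctionName.getOpeningHours]: async (args: { farm_name: string }) =>
    JSON.stringify({ farm_name: args.farm_name, hours: "9am-5pm" }), // mocked data
};

console.log(Object.keys(FunctionDescriptions), Object.keys(AvailableFunctions));
```

Keeping the name in one shared place means the description sent to the model and the implementation looked up at dispatch time cannot drift apart.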
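Step 3: Call the Model with the Messages and Tools
With the messages and tools in place, we can use them to call the model.
Note that not every model supports function calling. As of March 2024, the supported models included gpt-3.5-turbo-0125 and gpt-4-turbo-preview; consult OpenAI's documentation for the current list.
Code implementation: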
export const startChat = async (messages: ChatCompletionMessageParam[]) => {
const response = await openai.chat.completions.create({
model: "gpt-3.5-turbo",
top_p: 0.95,
temperature: 0.5,
max_tokens: 1024,
messages, // The messages array we created earlier
tools, // The function descriptions we defined earlier
tool_choice: "auto", // The model will decide whether to call a function and which function to call
});
const { message } = response.choices[0] ?? {};
if (!message) {
throw new Error("Error: No response from the API.");
}
messages.push(message);
return processMessage(message);
};
Tool Choice Options
The tool_choice option controls how the model approaches function calling:
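The values can be sketched with a small local type. `ToolChoice` below is a simplified stand-in for the SDK's own type and `forceFunction` is a hypothetical helper; both are assumptions for illustration.

```typescript
// Simplified stand-in for the SDK's tool_choice type (an assumption).
type ToolChoice =
  | "none" // never call a function; always reply with text
  | "auto" // let the model decide whether to call a function, and which one
  | { type: "function"; function: { name: string } }; // force one specific function

// Hypothetical helper that builds the "force this function" form.
function forceFunction(name: string): ToolChoice {
  return { type: "function", function: { name } };
}

const letModelDecide: ToolChoice = "auto";
const textOnly: ToolChoice = "none";
const mustBook: ToolChoice = forceFunction("book_activity");

console.log(letModelDecide, textOnly, JSON.stringify(mustBook));
```

Forcing a function is useful when the application already knows which action is needed and only wants the model to fill in the arguments. Newer API versions may accept additional values, so check the current API reference before relying on this list.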
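Step 4: Handle the Model's Response
The model's responses fall broadly into two categories. Because a call can also fail, a fallback message is needed as well:
1. Function call request: The model indicates that it wants to call a function. This is where the real potential of function calling lies. The model selects which function to execute based on the context and the user's query. For example, if the user asks for farm recommendations, the model may suggest calling the get_farms function.
It doesn't stop there: the model also analyzes the user input to determine whether it contains the information (arguments) the function call needs. If not, the model prompts the user for the missing details.
Once all the necessary information (arguments) has been collected, the model returns a JSON object specifying the function name and its arguments. The arguments arrive as a JSON string, which the application parses into a JavaScript object before calling the specified function.
In addition, the model may choose to call several functions, either at once or in sequence, each requiring its own details. Managing these details in the application is essential for smooth operation.
Example model response: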
{
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "call_JWoPQYmdxNXdNu1wQ1iDqz2z",
"type": "function",
"function": {
"name": "get_farms", // The function name to be called
"arguments": "{\"location\":\"Melbourne\"}" // The arguments required for the function
}
}
... // multiple function calls can be present
]
}
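2. Plain text reply: The model provides a direct text reply. This is the standard output we're used to from AI models, giving a direct answer to the user's query. For these replies, simply return the text content.
Example model response:
{
"role": "assistant",
"content": "I can help you with that. What is your location?"
}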
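The key distinction is the presence of the tool_calls key. If tool_calls is present, the model is requesting a function call; otherwise, it is providing a direct text response.
To handle these responses, consider the following approach based on response type: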
type ChatCompletionMessageWithToolCalls = RequiredAll<
Omit<ChatCompletionMessage, "function_call">
>;
// If the message contains tool_calls, it extracts the function arguments. Otherwise, it returns the content of the message.
export function processMessage(message: ChatCompletionMessage) {
if (isMessageHasToolCalls(message)) {
return extractFunctionArguments(message);
} else {
return message.content;
}
}
// Check if the message has `tool calls`
function isMessageHasToolCalls(
message: ChatCompletionMessage
): message is ChatCompletionMessageWithToolCalls {
return isDefined(message.tool_calls) && message.tool_calls.length !== 0;
}
// Extract function name and arguments from the message
function extractFunctionArguments(message: ChatCompletionMessageWithToolCalls) {
return message.tool_calls.map((toolCall) => {
if (!isDefined(toolCall.function)) {
throw new Error("No function found in the tool call");
}
try {
return {
tool_call_id: toolCall.id,
function_name: toolCall.function.name,
arguments: JSON.parse(toolCall.function.arguments),
};
} catch (error) {
throw new Error("Invalid JSON in function arguments");
}
});
}
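The arguments extracted from the function calls are then used to execute the actual functions in the application, while the text content carries the conversation forward.
The following if-else block shows how this plays out: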
const result = await startChat(messages);
if (!result) {
// Fallback message if response is empty (e.g., network error)
console.log(StaticPrompts.fallback);
} else if (isNonEmptyString(result)) {
// If the response is a string, log it and prompt the user for the next message
console.log(`Assistant: ${result}`);
const userPrompt = await createUserMessage();
messages.push(userPrompt);
} else {
// If the response contains function calls, execute the functions and call the model again with the updated messages
for (const item of result) {
const { tool_call_id, function_name, arguments: function_arguments } = item;
// Execute the function and get the function return
const functionReturn = await AvailableFunctions[
function_name as keyof typeof AvailableFunctions
](function_arguments);
// Add the function output back to the messages with a role of "tool", the id of the tool call, and the function return as the content
messages.push(
FunctionPrompts.function_response({
tool_call_id,
content: functionReturn,
})
);
}
}
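The snippets in this step rely on a few helpers (`isDefined`, `isNonEmptyString`, and the `RequiredAll` utility type) whose definitions are not shown. The versions below are plausible guesses, offered as assumptions rather than as the original implementations.

```typescript
// Assumed helper: narrows away null and undefined.
function isDefined<T>(value: T | null | undefined): value is T {
  return value !== null && value !== undefined;
}

// Assumed helper: true only for strings with at least one non-whitespace character.
function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

// Assumed utility type: every property becomes required and non-nullable.
type RequiredAll<T> = { [K in keyof T]-?: NonNullable<T[K]> };

// Hypothetical message shape, used only to show what RequiredAll does.
type LooseMessage = { content?: string | null; tool_calls?: { id: string }[] };
const strictMessage: RequiredAll<LooseMessage> = {
  content: "Howdy!",
  tool_calls: [{ id: "call_1" }],
};

console.log(isDefined(strictMessage.tool_calls), isNonEmptyString("   "));
```

One caveat with this guess: it also marks `content` as non-null, whereas a real tool call message usually carries `content: null`, so the original type may well be defined differently.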
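Step 5: Execute the Function and Call the Model Again
When the model requests a function call, we execute that function in the application and then update the model with the new information. This keeps the model informed of the function's result so it can give the user a relevant reply.
Keeping the functions in the right order of execution is crucial, especially when the model chooses to run several functions in sequence to complete a task. Using a for loop rather than Promise.all preserves the execution order, which is essential for the workflow to succeed. If the functions are independent and can run in parallel, however, consider a custom optimization to improve performance.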
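The difference between the two strategies can be seen in a small sketch. The mock functions `slow_lookup` and `fast_lookup`, their timings, and the helper names are all hypothetical. The sketch only shows that a for loop finishes calls in the listed order, while Promise.all keeps the result order but lets the calls finish in any order.

```typescript
type ToolCallItem = { tool_call_id: string; function_name: string };
type ToolResult = { tool_call_id: string; content: string };
type MockFn = () => Promise<string>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical mocks: each one records in `log` the moment it finishes.
function makeMockFunctions(log: string[]): Record<string, MockFn> {
  return {
    slow_lookup: async () => { await sleep(30); log.push("slow_lookup"); return "slow result"; },
    fast_lookup: async () => { await sleep(1); log.push("fast_lookup"); return "fast result"; },
  };
}

async function callOne(fns: Record<string, MockFn>, item: ToolCallItem): Promise<ToolResult> {
  const fn = fns[item.function_name];
  if (!fn) throw new Error(`Unknown function: ${item.function_name}`);
  return { tool_call_id: item.tool_call_id, content: await fn() };
}

// One at a time, in the order the model listed the calls.
async function runSequentially(fns: Record<string, MockFn>, items: ToolCallItem[]) {
  const results: ToolResult[] = [];
  for (const item of items) {
    results.push(await callOne(fns, item));
  }
  return results;
}

// All at once: the result array keeps the input order, but completion order may differ.
async function runInParallel(fns: Record<string, MockFn>, items: ToolCallItem[]) {
  return Promise.all(items.map((item) => callOne(fns, item)));
}

const demoItems: ToolCallItem[] = [
  { tool_call_id: "a", function_name: "slow_lookup" },
  { tool_call_id: "b", function_name: "fast_lookup" },
];
```

If a later call depends on a side effect of an earlier one (for example, booking only after availability has been checked), the sequential version is the safe default.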
Here is how to execute the functions:
for (const item of result) {
const { tool_call_id, function_name, arguments: function_arguments } = item;
console.log(
`Calling function "${function_name}" with ${JSON.stringify(
function_arguments
)}`
);
// Available functions are stored in an object for easy access
const functionReturn = await AvailableFunctions[
function_name as keyof typeof AvailableFunctions
](function_arguments);
}
Here is how to update the messages array with the function responses:
for (const item of result) {
const { tool_call_id, function_name, arguments: function_arguments } = item;
console.log(
`Calling function "${function_name}" with ${JSON.stringify(
function_arguments
)}`
);
const functionReturn = await AvailableFunctions[
function_name as keyof typeof AvailableFunctions
](function_arguments);
// Add the function output back to the messages with a role of "tool", the id of the tool call, and the function return as the content
messages.push(
FunctionPrompts.function_response({
tool_call_id,
content: functionReturn,
})
);
}
Example of a callable function:
// Mocking getting farms based on location from a database
export async function get_farms(
args: ConvertedFunctionParamProps<GetFarmsProps>
): Promise<string> {
const { location } = args;
return JSON.stringify({
location,
farms: [
{
name: "Farm 1",
location: "Location 1",
rating: 4.5,
products: ["product 1", "product 2"],
activities: ["activity 1", "activity 2"],
},
...
],
});
}
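Example of a tool message carrying the function response (the content is expanded here for readability; in the actual message it is the JSON string returned by the function):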
{
"role": "tool",
"tool_call_id": "call_JWoPQYmdxNXdNu1wQ1iDqz2z",
"content": {
// Function return value
"location": "Melbourne",
"farms": [
{
"name": "Farm 1",
"location": "Location 1",
"rating": 4.5,
"products": [
"product 1",
"product 2"
],
"activities": [
"activity 1",
"activity 2"
]
},
...
]
}
}
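Step 6: Summarize the Results Back to the User
After running the functions and updating the messages array, we call the model again with the updated messages so it can summarize the results for the user. This means calling the startChat function repeatedly in a loop.
To avoid an endless loop, it is essential to watch the user input for signals that the conversation is over, such as "Goodbye" or "End", so that the loop terminates properly.
Code implementation: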
const CHAT_END_SIGNALS = [
"end",
"goodbye",
...
];
export function isChatEnding(
message: ChatCompletionMessageParam | undefined | null
) {
// If the message is not defined, throw an error
if (!isDefined(message)) {
throw new Error("Cannot find the message!");
}
// Check if the message is from the user
if (!isUserMessage(message)) {
return false;
}
const { content } = message;
return CHAT_END_SIGNALS.some((signal) => {
if (typeof content === "string") {
return includeSignal(content, signal);
} else {
// Otherwise content is an array of ChatCompletionContentPart items, each either a ChatCompletionContentPartText or a ChatCompletionContentPartImage
// If user attaches an image to the current message first, we assume they are not ending the chat
const contentPart = content.at(0);
if (contentPart?.type !== "text") {
return false;
} else {
return includeSignal(contentPart.text, signal);
}
}
});
}
function isUserMessage(
message: ChatCompletionMessageParam
): message is ChatCompletionUserMessageParam {
return message.role === "user";
}
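function includeSignal(content: string, signal: string) {
// Match the signal as a whole word so that "end" does not also match "weekend"
const escaped = signal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
return new RegExp(`\\b${escaped}\\b`, "i").test(content);
}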
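Putting the pieces together, the outer loop might look like the sketch below. It is a simplified outline under stated assumptions, not the application's actual code: the dependencies are passed in as stubs, the message type is reduced to a role and a string, the loop (rather than startChat) appends the assistant message, and the `maxTurns` cap is an extra safeguard added for this sketch.

```typescript
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ToolRequest = { tool_call_id: string; function_name: string };
type ChatResult = string | ToolRequest[];

type LoopDeps = {
  startChat: (messages: Message[]) => Promise<ChatResult>; // stands in for the model call
  nextUserInput: () => Promise<string>; // stands in for createUserMessage
  runFunction: (request: ToolRequest) => Promise<string>; // stands in for AvailableFunctions
  isChatEnding: (message: Message | undefined) => boolean;
};

async function chatLoop(deps: LoopDeps, maxTurns = 20): Promise<Message[]> {
  const messages: Message[] = [{ role: "user", content: await deps.nextUserInput() }];
  for (let turn = 0; turn < maxTurns; turn++) {
    // Stop as soon as the latest user message signals the end of the chat.
    if (deps.isChatEnding(messages[messages.length - 1])) break;
    const result = await deps.startChat(messages);
    if (typeof result === "string") {
      // Text reply: record it, then wait for the user's next message.
      messages.push({ role: "assistant", content: result });
      messages.push({ role: "user", content: await deps.nextUserInput() });
    } else {
      // Function call request: run each function in order, then loop so the model can summarize.
      for (const request of result) {
        messages.push({ role: "tool", content: await deps.runFunction(request) });
      }
    }
  }
  return messages;
}

// Scripted stubs for a dry run without the API.
const scriptedInputs = ["Find farms near Melbourne", "goodbye"];
const demoDeps: LoopDeps = {
  startChat: async (messages) =>
    messages[messages.length - 1]?.role === "tool"
      ? "I found Farm 1."
      : [{ tool_call_id: "call_1", function_name: "get_farms" }],
  nextUserInput: async () => scriptedInputs.shift() ?? "goodbye",
  runFunction: async () => JSON.stringify({ farms: ["Farm 1"] }),
  isChatEnding: (message) =>
    message?.role === "user" && message.content.toLowerCase().includes("goodbye"),
};
```

Because a tool message is never a user message, the end check is skipped after a function call and the model always gets a chance to summarize before the user is prompted again.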
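Conclusion
OpenAI's function calling is a significant step forward for AI applications: it lets a model request custom functions, which the application then runs, in response to user queries. The feature simplifies obtaining structured data from model output, improves user interaction, and enables more sophisticated exchanges.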