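Building an Intelligent Query-Response System with LlamaIndex and OpenLLM

Published January 10, 2024 by alex

Over the past year, large language models (LLMs) such as GPT-4 have not only changed how we interact with machines but also redefined what is possible in natural language processing (NLP). A notable trend in this evolution is the growing popularity of open-source LLMs such as Llama 2, Falcon, OPT, and Yi. Some people may prefer them over their commercial counterparts for their advantages in accessibility, data security and privacy, customization potential, cost, and vendor dependency.

Among the tools gaining attention in the LLM space are OpenLLM and LlamaIndex, two powerful platforms that, combined, unlock new use cases for building AI-driven applications.

OpenLLM is an open-source platform for deploying and operating any open-source LLM in production. Its flexibility and ease of use make it a good choice for AI application developers who want to harness the power of LLMs. You can fine-tune, serve, deploy, and monitor LLMs across a range of creative and practical applications.

LlamaIndex provides a comprehensive framework for managing and retrieving private and domain-specific data. It acts as a bridge between the broad knowledge of LLMs and the unique contextual data needs of a specific application.

OpenLLM's support for a wide range of open-source LLMs and LlamaIndex's ability to integrate custom data sources seamlessly give developers in both communities a great deal of room for customization. This combination lets them build AI solutions that are both highly intelligent and properly tailored to a specific data context, which is very important for query-response systems.

In this article, I will explain how to combine the strengths of OpenLLM and LlamaIndex to build an intelligent query-response system. The system can understand, process, and respond to queries by drawing on a custom corpus.

Setting up the environment

The first step is to create a virtual environment on your machine, which helps avoid conflicts with other Python projects you may be working on. Let's call it llamaindex-openllm and activate it.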

python -m venv llamaindex-openllm
source llamaindex-openllm/bin/activate

Install the required packages. This command installs OpenLLM with the optional vllm component (I will explain it later).

pip install "openllm[vllm]" llama-index

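To handle requests, you need an LLM server. Here, I use the following command to start a local Llama 2 7B server at http://localhost:3000. Feel free to choose any model that fits your needs. If you already have a remote LLM server, you can skip this step.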

openllm start meta-llama/Llama-2-7b-chat-hf --backend vllm

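OpenLLM automatically selects the most suitable runtime implementation for the model. For models with vLLM support, OpenLLM uses vLLM by default. Otherwise, it falls back to PyTorch. vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. According to one report, vLLM can deliver 23x LLM inference throughput while reducing P50 latency.

Note: To use the vLLM backend, you need a GPU with at least the Ampere architecture (or newer) and CUDA version 11.8. This demo uses a machine with an Ampere A100-80G GPU. If your machine has a compatible GPU, you can also choose vLLM. Otherwise, simply install the standard OpenLLM package (pip install openllm) in the earlier install command.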
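The requirement in this note can be checked before you pick a backend. The following is only a minimal sketch and not part of OpenLLM: meets_vllm_requirements is a hypothetical helper that compares a GPU's compute capability and CUDA version against the thresholds stated above. It assumes that Ampere corresponds to compute capability 8.0 and that 11.8 is meant as a minimum CUDA version.

```python
from typing import Tuple

# Thresholds taken from the note above. Treating them as minimums is an assumption.
MIN_COMPUTE_CAPABILITY = (8, 0)  # Ampere GPUs report compute capability 8.x
MIN_CUDA_VERSION = (11, 8)


def meets_vllm_requirements(
    compute_capability: Tuple[int, int],
    cuda_version: Tuple[int, int],
) -> bool:
    """Hypothetical helper: True if both values meet the stated minimums."""
    return (
        compute_capability >= MIN_COMPUTE_CAPABILITY
        and cuda_version >= MIN_CUDA_VERSION
    )


# An A100 reports compute capability (8, 0).
print(meets_vllm_requirements((8, 0), (11, 8)))
# A Turing-generation GPU reports (7, 5), which is older than Ampere.
print(meets_vllm_requirements((7, 5), (12, 1)))
```

If PyTorch is installed, torch.cuda.get_device_capability() returns the (major, minor) tuple for the current GPU, and torch.version.cuda holds the CUDA version as a string such as "11.8", so both inputs can be read from there.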

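v1: Creating a simple completion service

Before building a query-response system, let's get familiar with the integration of OpenLLM and LlamaIndex and use it to create a simple completion service.

The integration offers two APIs for interacting with LLMs:

OpenLLM: This can be used to start a local LLM server directly, without launching a separate one with a command like openllm start. Here is how to use it: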

from llama_index.llms.openllm import OpenLLM
llm = OpenLLM('meta-llama/Llama-2-7b-chat-hf')

OpenLLMAPI: This can be used to interact with a server hosted elsewhere, such as the Llama 2 7B model I started earlier.
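Because OpenLLMAPI talks to a server over the network, it can help to confirm that the address accepts connections before sending a prompt. The sketch below uses only the Python standard library. server_accepts_connections is a hypothetical helper, not an OpenLLM or LlamaIndex function, and it only checks that a TCP connection can be opened, not that the model has finished loading.

```python
import socket
from urllib.parse import urlparse


def server_accepts_connections(address: str, timeout: float = 2.0) -> bool:
    """Hypothetical helper: True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(address)
    host = parsed.hostname or "localhost"
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

Calling server_accepts_connections("http://localhost:3000") before creating the OpenLLMAPI client gives a quick yes or no on whether the server started earlier is listening.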

Let's try the complete endpoint and see whether the Llama 2 7B model can tell us what OpenLLM is by completing the sentence "OpenLLM is an open source tool for".

from llama_index.llms import OpenLLMAPI
remote_llm = OpenLLMAPI(address="http://localhost:3000")
completion_response = remote_llm.complete("OpenLLM is an open source tool for", max_new_tokens=1024)
print(completion_response)

Run this script. Here is the output:

learning lifelong learning models. It is designed to be easy to use, even for those without extensive knowledge of machine learning. OpenLLM allows users to train, evaluate, and deploy lifelong learning models using a variety of datasets and algorithms.
OpenLLM provides a number of features that make it useful for learning lifelong learning models. Some of these features include:
1. Easy-to-use interface: OpenLLM provides an easy-to-use interface that makes it simple to train, evaluate, and deploy lifelong learning models.
2. Support for a variety of datasets: OpenLLM supports a variety of datasets, including images, text, and time-series data.
3. Support for a variety of algorithms: OpenLLM supports a variety of algorithms for lifelong learning, including neural networks, decision trees, and support vector machines.
4. Evaluation tools: OpenLLM provides a number of evaluation tools that allow users to assess the performance of their lifelong learning models.
5. Deployment tools: OpenLLM provides a number of deployment tools that allow users to deploy their lifelong learning models in a variety of environments.
OpenLLM is written in Python and is available under an open source license. It is designed to be used in a variety of settings, including research, education, and industry.
Some potential use cases for OpenLLM include:
1. Training lifelong learning models for image classification: OpenLLM could be used to train a lifelong learning model to classify images based on their content.
2. Training lifelong learning models for natural language processing: OpenLLM could be used to train a lifelong learning model to process and analyze natural language text.
3. Training lifelong learning models for time-series data: OpenLLM could be used to train a lifelong learning model to predict future values in a time-series dataset.
4. Deploying lifelong learning models in a production environment: OpenLLM could be used to deploy a lifelong learning model in a production environment, such as a recommendation system or a fraud detection system.
Overall, OpenLLM is a powerful tool for learning lifelong learning models. Its ease of use, flexibility, and support for a variety of datasets and algorithms make it a valuable resource for researchers and practitioners in a variety of fields.

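Obviously, the model hallucinates and cannot explain OpenLLM correctly. Nevertheless, the code works as intended, since the server responded to the request. This is a good start as we continue building the system.

v2: Enhancing the query-response system

The first version reveals a key limitation: the model lacks specific knowledge about OpenLLM. One solution is to give the model domain-specific information so that it can use it to respond to topic-specific queries. This is where LlamaIndex comes in: it lets you build a local knowledge base that contains the relevant information. Specifically, you create a directory (for example, data) and build an index over all the documents in that folder.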
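To make the idea of an index more concrete, here is a toy illustration of retrieval followed by prompt construction. It is not how LlamaIndex is implemented: a real setup uses a learned embedding model and a vector store, while this sketch substitutes plain word counts and cosine similarity. All function names and the two sample strings are made up for illustration.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts, a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], top_k: int = 1) -> List[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: List[str]) -> str:
    """Place the retrieved context ahead of the question."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"


# Made-up sample chunks, standing in for pieces of the indexed documents.
chunks = [
    "OpenLLM is an open platform for operating large language models in production.",
    "Bananas are rich in potassium.",
]
best = retrieve("What is OpenLLM?", chunks)
print(build_prompt("What is OpenLLM?", best))
```

The point of the sketch is the flow: the query is matched against stored chunks, and only the most relevant chunk is placed in the prompt, so the model answers from supplied text rather than from memory alone.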

Create a folder and download the OpenLLM GitHub README into it:

mkdir data
cd data
wget https://raw.githubusercontent.com/bentoml/OpenLLM/main/README.md

Go back to the previous directory and create a script named starter.py as follows:

from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.llms import OpenLLMAPI
from llama_index.text_splitter import SentenceSplitter
# Change the address to your OpenLLM server
llm = OpenLLMAPI(address="http://localhost:3000")
# Break down the document into manageable chunks (each up to 1024 tokens, with a 20-token overlap)
text_splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=20)
# Create a ServiceContext with the custom model and all the configurations
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local",
    text_splitter=text_splitter,
    context_window=8192,
    num_output=4096,
)
# Load documents from the data directory
documents = SimpleDirectoryReader("data").load_data()
# Build an index over the documents using the customized LLM in the ServiceContext
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
# Query your data using the built index
query_engine = index.as_query_engine()
response = query_engine.query("What is OpenLLM?")
print(response)

To improve response quality, I recommend defining a SentenceSplitter, which gives you finer control over how the input is processed and leads to better output.
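The effect of a chunk size and an overlap can be seen with a small toy splitter. This is a simplified sketch, not the SentenceSplitter implementation: chunk_words is a hypothetical function that counts whitespace-separated words as a stand-in for tokens and ignores sentence boundaries, which the real splitter tries to respect.

```python
from typing import List


def chunk_words(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of chunk_size words; neighbors share chunk_overlap words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = "one two three four five six seven eight nine ten"
for piece in chunk_words(sample, chunk_size=4, chunk_overlap=1):
    print(piece)
# one two three four
# four five six seven
# seven eight nine ten
```

Each chunk repeats the last word of the previous one. That shared material is what the overlap buys: context that straddles a chunk boundary still appears whole in at least one chunk.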

In addition, you can set streaming=True to stream the response.

query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("What is OpenLLM?")
response.print_response_stream()
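If you want to keep the streamed text as well as display it, the stream can be consumed piece by piece. The sketch below shows the general pattern with a made-up generator. print_stream and fake_stream are hypothetical names, and the sample pieces are illustrative rather than real server output.

```python
import sys
from typing import Iterable


def print_stream(tokens: Iterable[str], out=sys.stdout) -> str:
    """Write each piece of text as it arrives and return the full response."""
    parts = []
    for token in tokens:
        out.write(token)
        out.flush()  # show partial output immediately instead of buffering
        parts.append(token)
    out.write("\n")
    return "".join(parts)


def fake_stream():
    """Hypothetical stand-in for a streaming response from the server."""
    yield from ["OpenLLM ", "is ", "an ", "open ", "platform."]


full_text = print_stream(fake_stream())
```

To my knowledge, the streaming response object returned by the query engine exposes its pieces through a response_gen generator, which could be passed in place of fake_stream(). Treat that attribute name as an assumption and check it against the LlamaIndex version you have installed.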

Your directory structure should now look like this:

├── starter.py
└── data
    └── README.md

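Run starter.py to test the query-response system. The output should be consistent with the content of the OpenLLM README. Here is the response I received:

OpenLLM is an open-source platform for deploying and managing large language models (LLMs) in a variety of environments, including on-premises, cloud, and edge devices. It provides a comprehensive suite of tools and features for fine-tuning, serving, deploying, and monitoring LLMs, simplifying the end-to-end deployment workflow for LLMs.

Conclusion

The exploration in this article highlights the importance of customizing AI tools to fit specific needs. By using OpenLLM for flexible LLM deployment and LlamaIndex for data management, I have demonstrated how to create an AI-driven system. It can not only understand and process queries but also respond based on a unique knowledge base.

Source: https://medium.com/llamaindex-blog/building-an-intelligent-query-response-system-with-llamaindex-and-openllm-ff253a200bdf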
