BasicLingua: A Guide to an LLM-Based NLP Library

Published March 13, 2024 by alex

Installing BasicLingua

To install the library, use pip:

pip install basiclingua

Initialization

To initialize, import the library and create a client with your Gemini API key. You can get a Gemini API key for free from Google AI Studio.

# Import the library
from basiclingua import BasicLingua
# Initialize the client
client = BasicLingua("YOUR_GEMINI_API_KEY")
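Hard-coding the key is fine for a quick test. A common alternative is to read it from an environment variable so that it stays out of source files. The following is a minimal sketch. The variable name GEMINI_API_KEY and the helper load_api_key are assumptions for illustration and are not part of BasicLingua.

```python
import os


def load_api_key(var_name: str = "GEMINI_API_KEY") -> str:
    """Return the API key stored in the environment variable `var_name`.

    GEMINI_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Usage (requires basiclingua to be installed and a real key to be set):
# from basiclingua import BasicLingua
# client = BasicLingua(load_api_key())
```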

Usage

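The library provides a wide range of functions for language tasks such as tokenization, stemming, and lemmatization. In total, 31 functions are available. The examples below show how to use the library: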

Pattern Extraction

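In the example below, we use the extract_patterns method to pull patterns such as email addresses, phone numbers, and names out of the user input without writing any regex.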

# user_input is the text from which you want to extract patterns
user_input = '''The phone number of fareed khan and asad rizvi are 
                123-456-7890 and 523-456. Please call for
                assistance and email me at x123@gmail.com'''
# patterns is a comma-separated string naming the patterns to extract from user_input
patterns = "email, phone number, name"
# Extract patterns from the user_input
extracted_patterns = client.extract_patterns(user_input, patterns)
# Print the extracted patterns
print(extracted_patterns)
######## Output ########
['123-456-7890', '523-456', 
'fareed khan', 'asad rizvi', 'x123@gmail.com']

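You only need to list the pattern names, separated by commas. Our prompt template then instructs the LLM to extract those patterns, with no extra information or code required. In this example, the LLM picks out the requested patterns from the input and returns them as a list. Execution takes roughly 1 to 2 seconds.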
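For comparison, here is roughly what the regex approach that extract_patterns replaces looks like for the same input. This is a plain-Python sketch and is not part of BasicLingua. Each pattern needs its own hand-written expression, and the phone regex assumes the ddd-ddd-dddd format. As a result, it misses the incomplete '523-456' that extract_patterns returned above. Names such as 'fareed khan' have no reliable regex at all.

```python
import re

user_input = '''The phone number of fareed khan and asad rizvi are
                123-456-7890 and 523-456. Please call for
                assistance and email me at x123@gmail.com'''

# One hand-written expression per pattern type.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")  # assumes ddd-ddd-dddd only

emails = EMAIL_RE.findall(user_input)
phones = PHONE_RE.findall(user_input)
print(emails)  # ['x123@gmail.com']
print(phones)  # ['123-456-7890']
```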

Intent Recognition

In the example below, we use the text_intent method to identify the user's intents in the input.

# user_input is the text from which you want to find intent of the user
user_input = '''let's book a flight for our vacation and reserve 
                a table at a restaurant for dinner, Also going 
                to watch football match at 8 pm.'''
# Find the intent of the user
intent = client.text_intent(user_input)
# Print the intent
print(intent)
######## Output ########
['Book Flight', 'Reserve Restaurant', 'Watch Football Match']
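The model generates the returned labels as free-form text, so their exact wording may vary from run to run. One hypothetical way to act on them is to normalize each label and look it up in a routing table. The handler functions, the HANDLERS table, and the normalize and route helpers are illustrative assumptions and are not part of BasicLingua.

```python
from typing import Callable, Dict, List


def book_flight() -> str:
    return "opening flight search"


def reserve_restaurant() -> str:
    return "opening table booking"


# Hypothetical routing table; keys are normalized intent labels.
HANDLERS: Dict[str, Callable[[], str]] = {
    "book_flight": book_flight,
    "reserve_restaurant": reserve_restaurant,
}


def normalize(label: str) -> str:
    """Lowercase a label and join its words with underscores: 'Book Flight' -> 'book_flight'."""
    return "_".join(label.lower().split())


def route(intents: List[str]) -> List[str]:
    """Call the matching handler for each intent, or report that none exists."""
    results = []
    for label in intents:
        handler = HANDLERS.get(normalize(label))
        results.append(handler() if handler else f"no handler for {label!r}")
    return results


print(route(['Book Flight', 'Reserve Restaurant', 'Watch Football Match']))
```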

NER Detection

In the example below, we use the detect_ner method to find named-entity (NER) tags in the user input.

# user_input is the text from which you want to find NER tags
user_input = '''I love Lamborghini, but Bugatti is even better.
                Although, Mercedes is a class above all and I work in Google'''
# ner_tags is a comma-separated string of the NER tags you want to find in user_input
ner_tags = "cars, date, time"
# Find the NER tags from the user_input
answer = client.detect_ner(user_input, ner_tags)
# Print the NER tags
print(answer)
######## Output ########
[('Lamborghini', 'cars'), ('Bugatti', 'cars'), 
('Mercedes', 'cars'), ('Google', 'organization')]
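In the output above, 'Google' is tagged 'organization', which is not one of the requested ner_tags. If you want to keep only the requested tags, a small post-filter does the job. This is a plain-Python sketch; keep_requested is a hypothetical helper and not a BasicLingua option.

```python
from typing import List, Tuple


def keep_requested(entities: List[Tuple[str, str]], ner_tags: str) -> List[Tuple[str, str]]:
    """Drop (entity, tag) pairs whose tag is not in the comma-separated ner_tags."""
    wanted = {t.strip().lower() for t in ner_tags.split(",") if t.strip()}
    return [(text, tag) for text, tag in entities if tag.strip().lower() in wanted]


answer = [('Lamborghini', 'cars'), ('Bugatti', 'cars'),
          ('Mercedes', 'cars'), ('Google', 'organization')]
print(keep_requested(answer, "cars, date, time"))
# [('Lamborghini', 'cars'), ('Bugatti', 'cars'), ('Mercedes', 'cars')]
```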

Spell Checker

In the example below, we use the text_spellcheck method to correct spelling mistakes in the user input.

# user_input is the text whose spelling you want to correct
user_input = '''we wlli oderr pzzia adn buregsr at nghti'''
# calling spellcheck method
corrected_text = client.text_spellcheck(user_input)
# printing the result
print(corrected_text)
######## Output ########
we will order pizza and burgers at night
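Because an LLM rewrites the whole sentence, you may want to review exactly which words it changed. Below is a minimal sketch with a hypothetical changed_words helper. It assumes the corrected text has the same number of words as the input, which holds for this example but not in general, since a correction may split or merge words.

```python
def changed_words(original: str, corrected: str) -> list:
    """Pair up words position by position and keep only the pairs that differ.

    Assumes the correction keeps the word count unchanged.
    """
    src, dst = original.split(), corrected.split()
    if len(src) != len(dst):
        raise ValueError("word counts differ; positional comparison is not meaningful")
    return [(a, b) for a, b in zip(src, dst) if a != b]


print(changed_words('we wlli oderr pzzia adn buregsr at nghti',
                    'we will order pizza and burgers at night'))
```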

Text Clustering

In the example below, we use the text_cluster method to group the sentences in the user_input variable into clusters.

# User input on which clustering has to be performed
user_input = '''
"The company reported record profits for the third quarter.", "The latest fashion trends for spring and summer are unveiled.",
"Profits soared in the third quarter, reaching unprecedented levels.", "Tips for improving productivity in the workplace."
'''
# calling clustering method
clusters = client.text_cluster(user_input)
# printing clusters
print(clusters)
######## Output ########
{0: ['"The company ..."', '"Profits soared in the ... levels."'], 
1: ['"The latest fashion trends for spring and summer are unveiled."'], 
2: ['"Tips for improving productivity in the workplace."']
}
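To see what the LLM adds, here is a purely lexical baseline for the same input. It groups two sentences when their word sets overlap enough, using Jaccard similarity of at least 0.25 (an arbitrary threshold). This is not how text_cluster works. It happens to match here only because the two profit sentences share several words, and it would miss paraphrases that share no vocabulary. The helper names are hypothetical.

```python
import re
from typing import Dict, List

user_input = '''
"The company reported record profits for the third quarter.", "The latest fashion trends for spring and summer are unveiled.",
"Profits soared in the third quarter, reaching unprecedented levels.", "Tips for improving productivity in the workplace."
'''


def split_sentences(text: str) -> List[str]:
    """Pull out each double-quoted sentence from text_cluster-style input."""
    return re.findall(r'"([^"]*)"', text)


def words(sentence: str) -> set:
    """Lowercased set of alphabetic words in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(sentences: List[str], threshold: float = 0.25) -> Dict[int, List[str]]:
    """Greedy single-link clustering on word overlap (a lexical baseline only)."""
    clusters: Dict[int, List[str]] = {}
    for s in sentences:
        for members in clusters.values():
            if any(jaccard(words(s), words(m)) >= threshold for m in members):
                members.append(s)
                break
        else:
            clusters[len(clusters)] = [s]
    return clusters


for cid, members in cluster(split_sentences(user_input)).items():
    print(cid, members)
```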

Topic Classification

In the example below, we use the text_topic method to classify the topic of the given text.

# User input on which classification has to be performed
user_input = '''a ghost is chasing me in the dark forest. 
                I am scared and running for my life. 
                I hope I can make it out alive.'''
# topics: the candidate topic labels, as a comma-separated string
topics = "story, horror, comedy"
# calling the method; explanation=True also returns the model's reasoning
answer = client.text_topic(user_input, topics, explanation=True)
# printing the output (a dict with 'prediction' and 'explanation' keys)
print(answer)

######## Output ########
{
  'prediction': 'horror', 
  'explanation': 'The text is about a ghost chasing the speaker ...'
}
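Because the prediction comes from a model, it is worth checking that it is one of the labels you offered. Below is a minimal sketch that parses a reply and validates the prediction. The JSON reply format and the parse_topic_reply helper are assumptions, since the article shows only the resulting dict and not the raw model output. The sample reply is a stand-in, not real model output.

```python
import json


def parse_topic_reply(reply: str, topics: str) -> dict:
    """Parse a JSON reply and check that the prediction is one of the candidate topics."""
    allowed = {t.strip().lower() for t in topics.split(",") if t.strip()}
    data = json.loads(reply)
    prediction = str(data.get("prediction", "")).strip().lower()
    if prediction not in allowed:
        raise ValueError(f"unexpected topic {prediction!r}; expected one of {sorted(allowed)}")
    return {"prediction": prediction, "explanation": data.get("explanation", "")}


# Stand-in reply used only to exercise the parser.
reply = '{"prediction": "Horror", "explanation": "A ghost chases the narrator."}'
print(parse_topic_reply(reply, "story, horror, comedy"))
```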

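The library offers many more functions, from OCR extraction to text anomaly detection, including some you might never have expected to accomplish in code. See my GitHub repository for the rest.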

The Core Idea Behind BasicLingua

The original article includes a diagram of how the library works; the image is not reproduced here.

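You can use the same idea to build your own NLP library and work more efficiently. This is just one example of how LLMs are reshaping NLP tasks and simplifying text-data processing. Feel free to adapt the library to your own domain.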
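The pattern-extraction section describes the core mechanism: a prompt template per task, sent to the LLM, with the reply returned as a list. The pattern can be sketched as below. Everything here is hypothetical: the TEMPLATES wording, the TaskClient class, and the llm callable, which stands in for a Gemini call so the sketch runs offline. BasicLingua's actual prompts and internals are not shown in this article.

```python
import ast
from typing import Callable, Dict

# Hypothetical prompt templates, one per task; BasicLingua's real prompts differ.
TEMPLATES: Dict[str, str] = {
    "extract_patterns": (
        "Extract all {patterns} from the text below. "
        "Answer with a Python list of strings only.\nText: {text}"
    ),
    "text_intent": (
        "List the user's intents in the text below. "
        "Answer with a Python list of short labels only.\nText: {text}"
    ),
}


class TaskClient:
    """Turns each NLP task into: fill a template, call the LLM, parse the reply."""

    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm  # any function mapping a prompt string to a reply string

    def run(self, task: str, **fields: str) -> list:
        prompt = TEMPLATES[task].format(**fields)
        reply = self.llm(prompt)
        value = ast.literal_eval(reply.strip())  # accepts Python literals only
        if not isinstance(value, list):
            raise ValueError(f"{task}: expected a list, got {type(value).__name__}")
        return value


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real model call, so the sketch runs without an API key."""
    return "['x123@gmail.com']" if "Extract" in prompt else "['Book Flight']"


tasks = TaskClient(fake_llm)
print(tasks.run("extract_patterns", patterns="email", text="mail x123@gmail.com"))  # ['x123@gmail.com']
print(tasks.run("text_intent", text="let's book a flight"))  # ['Book Flight']
```

Keeping the model behind a plain callable means you can swap in a real API client, or a stub for tests, without touching the task logic.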

Source: https://levelup.gitconnected.com/llm-based-nlp-library-1596c267e54c
