Installing BasicLingua
To install the library, use pip:
pip install basiclingua
Initialization
To get started, import the library and initialize the client with your Gemini API key. You can get a Gemini API key for free.
# Import the library
from basiclingua import BasicLingua
# Initialize the client
client = BasicLingua("YOUR_GEMINI_API_KEY")
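Hard-coding the key is fine for a quick test, but a common alternative is to read it from an environment variable. The sketch below assumes a variable named GEMINI_API_KEY and a helper called load_api_key; both names are chosen here for illustration and are not something the library requires.

```python
import os


def load_api_key(var_name: str = "GEMINI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Usage (requires the basiclingua package and a valid key):
# from basiclingua import BasicLingua
# client = BasicLingua(load_api_key())
```

This keeps the key out of source control and lets each environment supply its own value.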
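Usage
The library provides a wide range of functions for language tasks such as tokenization, stemming, and lemmatization. There are 31 functions available in total. The examples below show how to use the library:
Pattern Extraction
In the example below, we use the extract_patterns method to extract patterns such as email addresses, phone numbers, and names from user input, instead of using regex.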
# user_input is the text from which you want to extract patterns
user_input = '''The phone number of fareed khan and asad rizvi are
123-456-7890 and 523-456. Please call for
assistance and email me at x123@gmail.com'''
# patterns is the list of patterns you want to extract from the user_input
patterns = '''email, phone number, name'''
# Extract patterns from the user_input
extracted_patterns = client.extract_patterns(user_input, patterns)
# Print the extracted patterns
print(extracted_patterns)
######## Output ########
['123-456-7890', '523-456',
'fareed khan', 'asad rizvi', 'x123@gmail.com']
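You only need to list the pattern names, separated by commas; our prompt template then instructs the LLM to extract them, with no additional information or coding required. The model correctly identifies the requested patterns in the input and returns them as a list. Execution takes roughly 1 to 2 seconds.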
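The prompt-template idea can be sketched roughly as follows. This is not BasicLingua's actual template or parsing code: the template wording and the names build_extraction_prompt and parse_list_reply are hypothetical, and the model reply is a hard-coded string so the example runs without an API key.

```python
import ast

# Hypothetical template; the library's real prompt may differ.
PROMPT_TEMPLATE = (
    "Extract the following patterns from the text: {patterns}.\n"
    "Return only a Python list of strings.\n"
    "Text: {text}"
)


def build_extraction_prompt(text: str, patterns: str) -> str:
    """Fill the template with the text and the comma-separated pattern names."""
    names = ", ".join(p.strip() for p in patterns.split(",") if p.strip())
    return PROMPT_TEMPLATE.format(patterns=names, text=text)


def parse_list_reply(reply: str) -> list[str]:
    """Parse a model reply that should contain a Python list literal."""
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        parsed = ast.literal_eval(reply[start:end + 1])
    except (ValueError, SyntaxError):
        return []
    return [str(item) for item in parsed] if isinstance(parsed, list) else []


prompt = build_extraction_prompt("Email me at x123@gmail.com", "email, phone number")
# Stand-in for a model reply; a real call would send `prompt` to the LLM.
fake_reply = "Here you go: ['x123@gmail.com']"
print(parse_list_reply(fake_reply))
```

Parsing with ast.literal_eval rather than eval means a malformed or unexpected reply yields an empty list instead of executing arbitrary code.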
Intent Recognition
In the example below, we use the text_intent method to identify the user's intents from their input.
# user_input is the text from which you want to find intent of the user
user_input = '''let's book a flight for our vacation and reserve
a table at a restaurant for dinner, Also going
to watch football match at 8 pm.'''
# Find the intent of the user
intent = client.text_intent(user_input)
# Print the intent
print(intent)
######## Output ########
['Book Flight', 'Reserve Restaurant', 'Watch Football Match']
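One way to act on such a list is to route each intent to a handler. The sketch below assumes the method returns a plain list of strings as printed above; the HANDLERS table and the route function are hypothetical and only illustrate the idea.

```python
# Intent labels as they appear in the sample output above.
intents = ["Book Flight", "Reserve Restaurant", "Watch Football Match"]

# Hypothetical handlers; a real application would call its own services here.
HANDLERS = {
    "book flight": lambda: "opening flight search",
    "reserve restaurant": lambda: "opening table reservations",
}


def route(intent_labels):
    """Map each intent label to a handler result, or flag it as unhandled."""
    results = {}
    for label in intent_labels:
        handler = HANDLERS.get(label.strip().lower())
        results[label] = handler() if handler else "no handler registered"
    return results


for label, action in route(intents).items():
    print(f"{label}: {action}")
```

Because the labels come from an LLM, their wording may vary between runs, so exact-match routing like this is fragile; constraining the model to a fixed label set would be more robust.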
NER Detection
In the example below, we use the detect_ner method to find NER tags in user input.
# user_input is the text from which you want to find NER tags
user_input = '''I love Lamborghini, but Bugatti is even better.
Although, Mercedes is a class above all and I work in Google'''
# ner_tags is the list of NER tags you want to find from the user_input
ner_tags = "cars, date, time"
# Find the NER tags from the user_input
answer = client.detect_ner(user_input, ner_tags)
# Print the NER tags
print(answer)
######## Output ########
[('Lamborghini', 'cars'), ('Bugatti', 'cars'),
('Mercedes', 'cars'), ('Google', 'organization')]
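The result can be post-processed with ordinary Python. The sketch below assumes the return value is a list of (entity, tag) tuples, as the printed output suggests; group_by_tag is a hypothetical helper, not part of the library. Note that the sample output contains an 'organization' tag that was not in ner_tags, so the helper can optionally keep only the requested tags.

```python
from collections import defaultdict

# Sample result copied from the output above.
answer = [("Lamborghini", "cars"), ("Bugatti", "cars"),
          ("Mercedes", "cars"), ("Google", "organization")]


def group_by_tag(pairs, allowed=None):
    """Group (entity, tag) pairs by tag, optionally keeping only allowed tags."""
    grouped = defaultdict(list)
    for entity, tag in pairs:
        if allowed is None or tag in allowed:
            grouped[tag].append(entity)
    return dict(grouped)


print(group_by_tag(answer))
print(group_by_tag(answer, allowed={"cars", "date", "time"}))
```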
Spell Checking
In the example below, we use the text_spellcheck method to correct spelling mistakes in user input.
# user_input is the text whose spelling you want to correct
user_input = '''we wlli oderr pzzia adn buregsr at nghti'''
# calling spellcheck method
corrected_text = client.text_spellcheck(user_input)
# printing the result
print(corrected_text)
######## Output ########
we will order pizza and burgers at night
Text Clustering
In the example below, we use the text_cluster method to cluster the sentences provided in the user_input variable.
# User input on which clustering has to be performed
user_input = '''
"The company reported record profits for the third quarter.", "The latest fashion trends for spring and summer are unveiled.",
"Profits soared in the third quarter, reaching unprecedented levels.", "Tips for improving productivity in the workplace."
'''
# calling clustering method
clusters = client.text_cluster(user_input)
# printing clusters
print(clusters)
######## Output ########
{0: ['"The company ..."', '"Profits soared in the ... levels."'],
1: ['"The latest fashion trends for spring and summer are unveiled."'],
2: ['"Tips for improving productivity in the workplace."']
}
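If you need to look up which cluster a given sentence landed in, the mapping can be inverted. The sketch below assumes the return value is a dict of cluster id to sentence list, shaped like the printed output; the sample dict uses the full sentences from user_input, and sentence_to_cluster is a hypothetical helper.

```python
# A dict shaped like the sample output above (cluster id -> sentences),
# filled with the sentences from user_input.
clusters = {
    0: ['"The company reported record profits for the third quarter."',
        '"Profits soared in the third quarter, reaching unprecedented levels."'],
    1: ['"The latest fashion trends for spring and summer are unveiled."'],
    2: ['"Tips for improving productivity in the workplace."'],
}


def sentence_to_cluster(cluster_map):
    """Invert the mapping so each sentence points to its cluster id."""
    lookup = {}
    for cluster_id, sentences in cluster_map.items():
        for sentence in sentences:
            # Strip the literal double quotes carried over from the input.
            lookup[sentence.strip().strip('"')] = cluster_id
    return lookup


lookup = sentence_to_cluster(clusters)
for sentence, cluster_id in lookup.items():
    print(cluster_id, sentence)
```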
Topic Classification
In the example below, we use the text_topic method to classify the topic of a given text.
# User input on which classification has to be performed
user_input = '''a ghost is chasing me in the dark forest.
I am scared and running for my life.
I hope I can make it out alive.'''
# Candidate topics, as a comma-separated string
topics = "story, horror, comedy"
# Call the method and request an explanation along with the prediction
answer = client.text_topic(user_input, topics, explanation=True)
# Print the prediction together with its explanation
print(answer)
######## Output ########
{
'prediction': 'horror',
'explanation': 'The text is about a ghost chasing the speaker ...'
}
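The library offers many more functions, ranging from OCR extraction to text anomaly detection, including some you may never have thought could be done in code. Check out my GitHub repository for the rest.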
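The Core Idea Behind BasicLingua
Here is a visual illustration of how our library works:
You can use this concept to build your own NLP library and boost your productivity. This is just a glimpse of how LLMs are reshaping NLP tasks and simplifying text data processing. Feel free to adapt the library to your own domain.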
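As a rough illustration of the concept, the sketch below wraps an LLM behind a small class whose methods are just prompt templates. It is not BasicLingua's implementation: MiniLingua, its templates, and the llm callable are all hypothetical, and a fake model stands in for a real API so the code runs offline.

```python
from typing import Callable


class MiniLingua:
    """Toy NLP helper: each method is a prompt template sent to an LLM callable."""

    def __init__(self, llm: Callable[[str], str]):
        # `llm` takes a prompt string and returns the model's text reply.
        self.llm = llm

    def _ask(self, template: str, **fields: str) -> str:
        """Fill the template, send it to the model, and trim the reply."""
        return self.llm(template.format(**fields)).strip()

    def spellcheck(self, text: str) -> str:
        return self._ask(
            "Correct the spelling in this text and return only the result:\n{text}",
            text=text,
        )

    def topic(self, text: str, topics: str) -> str:
        return self._ask(
            "Pick the single best topic from [{topics}] for this text "
            "and return only the topic:\n{text}",
            text=text,
            topics=topics,
        )


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns canned replies for the demo."""
    if prompt.startswith("Correct the spelling"):
        return " we will order pizza \n"
    return "horror"


nlp = MiniLingua(fake_llm)
print(nlp.spellcheck("we wlli oderr pzzia"))
print(nlp.topic("a ghost is chasing me", "story, horror, comedy"))
```

Injecting the model as a callable keeps the templates independent of any one provider, so swapping in a real client only means passing a different function.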