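Model:
pszemraj/e5-small-LinkedCringe-setfit-skl-20it-2e
Fine-tuned on the LinkedCringe v0.2 dataset, starting from the base model intfloat/e5-small.
This is an initial test and a work in progress, but the results are not bad so far.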
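This is a SetFit model for text classification. It was trained with an efficient few-shot learning technique that involves two steps:
1. Fine-tuning a Sentence Transformer with contrastive learning.
2. Training a classification head on features from the fine-tuned Sentence Transformer.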
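The contrastive stage of SetFit works on pairs of texts built from the few labeled examples: two texts with the same label form a positive pair, two texts with different labels form a negative pair. The sketch below only illustrates that pairing idea in plain Python. The function name `make_contrastive_pairs` and the toy texts are hypothetical, and the SetFit library samples pairs with its own logic rather than enumerating all of them as done here.

```python
from itertools import combinations
from typing import List, Tuple


def make_contrastive_pairs(
    texts: List[str], labels: List[int]
) -> List[Tuple[str, str, float]]:
    """Pair every two examples; the target is 1.0 for equal labels, else 0.0."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        pairs.append((text_a, text_b, 1.0 if label_a == label_b else 0.0))
    return pairs


# Hypothetical toy examples (not taken from LinkedCringe)
toy_texts = ["Synergy unlocked!", "Disrupting the paradigm!", "Office closed on Monday."]
toy_labels = [1, 1, 3]
pairs = make_contrastive_pairs(toy_texts, toy_labels)
for text_a, text_b, target in pairs:
    print(target, "|", text_a, "|", text_b)
```

With a handful of labeled examples this already yields many training pairs, which is part of why the approach suits few-shot settings.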
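Trained with the method above, the model predicts a single class label from the following mapping:
# numeric id: text label
{
    1: 'cringe',
    2: 'relevant',
    3: 'info',
    4: 'noise'
}
To use this model for inference, first install the SetFit library: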
python -m pip install setfit
You can then run inference as follows:
from setfit import SetFitModel
# Download from Hub and run inference
model = SetFitModel.from_pretrained("pszemraj/e5-small-LinkedCringe-setfit-skl-20it-2e")
# Run inference
preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
# manually refer to labels above
preds
Create a "custom" wrapper class that includes the labels:
from typing import Dict, List, Optional

from setfit import SetFitModel


class PostClassifier:
    """Wrap a SetFit model so that predictions come back as text labels."""

    # Label ids are 1-indexed
    DEFAULT_ID2LABEL = {1: "cringe", 2: "relevant", 3: "info", 4: "noise"}

    def __init__(
        self,
        model_id: str = "pszemraj/e5-small-LinkedCringe-setfit-skl-20it-2e",
        id2label: Optional[Dict[int, str]] = None,
    ):
        """Initialize PostClassifier with model name and/or label mapping."""
        self.model = SetFitModel.from_pretrained(model_id)
        self.id2label = id2label if id2label else self.DEFAULT_ID2LABEL

    def classify(self, texts: List[str]) -> List[str]:
        """Classify list of texts, return list of corresponding labels."""
        preds = self.model(texts)
        return [self.id2label[int(pred)] for pred in preds]

    def predict_proba(self, texts: List[str]) -> List[Dict[str, float]]:
        """Predict label probabilities for a list of texts, return a list of probability dictionaries."""
        proba = self.model.predict_proba(texts)
        # Column i of each row corresponds to label id i + 1
        return [
            {self.id2label.get(i + 1, "Unknown"): float(p) for i, p in enumerate(row)}
            for row in proba
        ]

    def __call__(self, texts: List[str]) -> List[str]:
        """Enable class instance to act as a function for text classification."""
        return self.classify(texts)
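The dictionaries returned by `predict_proba` can be reduced to a single most likely label plus its probability, which is handy for filtering out low-confidence predictions. The following is a minimal sketch; the helper name `top_labels` is hypothetical and the probabilities are made-up values, not model output.

```python
from typing import Dict, List, Tuple


def top_labels(proba_dicts: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return the (label, probability) pair with the highest probability per text."""
    return [max(d.items(), key=lambda item: item[1]) for d in proba_dicts]


# Made-up probabilities in the shape returned by PostClassifier.predict_proba
example_proba = [
    {"cringe": 0.7, "relevant": 0.1, "info": 0.15, "noise": 0.05},
    {"cringe": 0.2, "relevant": 0.1, "info": 0.6, "noise": 0.1},
]
best = top_labels(example_proba)
print(best)
```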
Instantiate and classify:
# import PostClassifier if you defined it in another script etc
model_name = "pszemraj/e5-small-LinkedCringe-setfit-skl-20it-2e"
classifier = PostClassifier(model_name)
# classify some posts (these should all be cringe maaaaybe noise)
posts = [
"🚀 Innovation is our middle name! We're taking synergy to new heights and disrupting the market with our game-changing solutions. Stay tuned for the next paradigm shift! 💥 #CorporateRevolution #SynergisticSolutions",
"🌟 Attention all trailblazers! Our cutting-edge product is the epitome of excellence. It's time to elevate your success and ride the wave of unparalleled achievements. Join us on this journey towards greatness! 🚀 #UnleashYourPotential #SuccessRevolution",
"🌍 We're not just a company, we're a global force for change! Our world-class team is committed to revolutionizing industries and making a lasting impact. Together, let's reshape the future and leave a legacy that will be remembered for ages! 💪 #GlobalTrailblazers #LegacyMakers",
"🔥 Harness the power of synergy and unlock your true potential with our transformative solutions. Together, we'll ignite a fire of success that will radiate across industries. Join the league of winners and conquer new frontiers! 🚀 #SynergyChampions #UnleashThePowerWithin",
"💡 Innovation alert! Our visionary team has cracked the code to redefine excellence. Get ready to be blown away by our mind-boggling breakthroughs that will leave your competitors in the dust. It's time to disrupt the status quo and embrace the future! 🌟 #InnovationRevolution #ExcellenceUnleashed",
"🌐 Welcome to the era of limitless possibilities! Our revolutionary platform will empower you to transcend boundaries and achieve unprecedented success. Together, let's shape a future where dreams become realities and ordinary becomes extraordinary! ✨ #LimitlessSuccess #DreamBig",
"💥 Brace yourselves for a seismic shift in the industry! Our game-changing product is set to revolutionize the way you work, think, and succeed. Say goodbye to mediocrity and join the league of pioneers leading the charge towards a brighter tomorrow! 🚀 #IndustryDisruptors #PioneeringSuccess",
"🚀 Attention all innovators and disruptors! It's time to break free from the chains of convention and rewrite the rulebook of success. Join us on this exhilarating journey as we create a new chapter in the annals of greatness. The sky's not the limit—it's just the beginning! 💫 #BreakingBarriers #UnleashGreatness",
"🌟 Unlock the secret to unprecedented achievements with our exclusive formula for success. Our team of experts has distilled years of wisdom into a powerful elixir that will propel you to the zenith of greatness. It's time to embrace the extraordinary and become a legend in your own right! 💥 #FormulaForSuccess #RiseToGreatness",
"🔑 Step into the realm of infinite possibilities and seize the keys to your success. Our groundbreaking solutions will unlock doors you never knew existed, propelling you towards a future filled with limitless growth and prosperity. Dare to dream big and let us be your catalyst for greatness! 🚀 #UnlockYourPotential #LimitlessSuccess"
]
post_preds = classifier(posts)
print(post_preds)
***** Running evaluation *****
{'accuracy': 0.8,
'based_model_id': 'intfloat/e5-small',
'tuned_model_id': 'e5-small-LinkedCringe-setfit-skl-20it-2e'}
# 10-post results
['cringe',
'cringe',
'info',
'cringe',
'cringe',
'cringe',
'cringe',
'cringe',
'cringe',
'cringe']
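To summarize a batch of predictions like the one above, the labels can be tallied into per-label shares. This is a small sketch using only the standard library; the helper name `label_shares` is hypothetical, and the list is copied from the 10-post results shown above.

```python
from collections import Counter
from typing import Dict, List


def label_shares(preds: List[str]) -> Dict[str, float]:
    """Return the fraction of predictions assigned to each label."""
    counts = Counter(preds)
    return {label: n / len(preds) for label, n in counts.items()}


# Predictions copied from the 10-post results
post_preds = [
    "cringe", "cringe", "info", "cringe", "cringe",
    "cringe", "cringe", "cringe", "cringe", "cringe",
]
shares = label_shares(post_preds)
print(shares)
```

Here nine of the ten posts are labeled cringe and one is labeled info.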
Note: this citation is for SetFit, not for this checkpoint.
@article{https://doi.org/10.48550/arxiv.2209.11055,
  doi = {10.48550/ARXIV.2209.11055},
  url = {https://arxiv.org/abs/2209.11055},
  author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
  title = {Efficient Few-Shot Learning Without Prompts},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}