Model:
pszemraj/e5-small-LinkedCringe-setfit-skl-20it-2e
intfloat/e5-small fine-tuned on the train split of LinkedCringe v0.2.
This is an initial test and still a work in progress, but so far the results look good.
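This is a SetFit model for text classification. It was trained with an efficient few-shot learning technique that involves two steps:

1. Fine-tuning a Sentence Transformer with contrastive learning.
2. Training a classification head on features from the fine-tuned Sentence Transformer.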
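SetFit's first stage fine-tunes the Sentence Transformer contrastively on pairs of labeled examples: pairs that share a label are treated as similar, and pairs with different labels as dissimilar. The sketch below only illustrates that pair construction in plain Python. `make_contrastive_pairs` is a hypothetical helper that enumerates every pair; the actual `setfit` library samples pairs according to its own configurable strategy, so this is an illustration of the idea, not the library's implementation. The second stage (training the classification head) is not shown.

```python
from itertools import combinations
from typing import List, Tuple


def make_contrastive_pairs(
    texts: List[str], labels: List[int]
) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: enumerate every unordered pair of examples.

    Pairs sharing a label get target similarity 1.0 (positive pair);
    pairs with different labels get 0.0 (negative pair).
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Toy examples (made up) using label ids 1 (cringe) and 2 (relevant)
texts = ["synergy!!", "disrupt everything!", "new paper on retrieval"]
labels = [1, 1, 2]
for a, b, target in make_contrastive_pairs(texts, labels):
    print(f"{target:.1f}  {a!r} <-> {b!r}")
```

With three examples this yields three pairs: one positive (the two cringe posts) and two negatives.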
Trained this way, the model predicts a single class label out of `{1: 'cringe', 2: 'relevant', 3: 'info', 4: 'noise'}` (numeric id: text label).
To use this model for inference, first install the SetFit library:
```bash
python -m pip install setfit
```
You can then run inference as follows:
```python
from setfit import SetFitModel

# Download from Hub and run inference
model = SetFitModel.from_pretrained("pszemraj/e5-small-LinkedCringe-setfit-skl-20it-2e")

# Run inference
preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst ?"])

# preds are numeric ids: manually refer to the labels above
preds
```
To get text labels instead of numeric ids, create a custom wrapper class:
```python
from typing import Dict, List, Optional

from setfit import SetFitModel


class PostClassifier:
    DEFAULT_ID2LABEL = {1: "cringe", 2: "relevant", 3: "info", 4: "noise"}

    def __init__(
        self,
        model_id: str = "pszemraj/e5-small-LinkedCringe-setfit-skl-20it-2e",
        id2label: Optional[Dict[int, str]] = None,
    ):
        """Initialize PostClassifier with model name and/or label mapping."""
        self.model = SetFitModel.from_pretrained(model_id)
        self.id2label = id2label if id2label else self.DEFAULT_ID2LABEL

    def classify(self, texts: List[str]) -> List[str]:
        """Classify list of texts, return list of corresponding labels."""
        preds = self.model(texts)
        # preds are numeric label ids; map each one to its text label
        return [self.id2label[int(pred)] for pred in preds]

    def predict_proba(self, texts: List[str]) -> List[Dict[str, float]]:
        """Predict label probabilities for a list of texts, return a list of probability dictionaries."""
        proba = self.model.predict_proba(texts)
        # column i of each row is taken to be the probability of label id i + 1
        return [
            {self.id2label.get(i + 1, "Unknown"): float(p) for i, p in enumerate(row)}
            for row in proba
        ]

    def __call__(self, texts: List[str]) -> List[str]:
        """Enable class instance to act as a function for text classification."""
        return self.classify(texts)
```
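`predict_proba` in the wrapper assumes that column `i` of each probability row belongs to label id `i + 1`. That holds if the classification head orders its classes by ascending label id, as scikit-learn classifiers do through their `classes_` attribute; treat this as an assumption about this checkpoint rather than something stated here. The plain-Python sketch below uses made-up probability rows (no model required) to show that mapping plus picking the most likely label. `rows_to_label_dicts` and `top_label` are hypothetical helpers.

```python
from typing import Dict, List, Sequence

ID2LABEL = {1: "cringe", 2: "relevant", 3: "info", 4: "noise"}


def rows_to_label_dicts(
    proba: Sequence[Sequence[float]], id2label: Dict[int, str] = ID2LABEL
) -> List[Dict[str, float]]:
    """Map each probability row to {label: probability}.

    Assumes column i holds the probability of label id i + 1,
    mirroring PostClassifier.predict_proba.
    """
    return [
        {id2label.get(i + 1, "Unknown"): float(p) for i, p in enumerate(row)}
        for row in proba
    ]


def top_label(label_probs: Dict[str, float]) -> str:
    """Return the label with the highest probability."""
    return max(label_probs, key=label_probs.get)


# Made-up probabilities for two posts (illustrative only, not model output)
fake_proba = [
    [0.70, 0.10, 0.15, 0.05],
    [0.05, 0.20, 0.60, 0.15],
]
for probs in rows_to_label_dicts(fake_proba):
    print(top_label(probs), probs)
```

If a head ever exposed more columns than the mapping covers, the extra columns would show up under the `"Unknown"` key instead of raising an error.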
Instantiate the wrapper and classify some posts:
```python
# import PostClassifier if you defined it in another script etc
model_name = "pszemraj/e5-small-LinkedCringe-setfit-skl-20it-2e"
classifier = PostClassifier(model_name)

# classify some posts (these should all be cringe maaaaybe noise)
posts = [
    "? Innovation is our middle name! We're taking synergy to new heights and disrupting the market with our game-changing solutions. Stay tuned for the next paradigm shift! ? #CorporateRevolution #SynergisticSolutions",
    "? Attention all trailblazers! Our cutting-edge product is the epitome of excellence. It's time to elevate your success and ride the wave of unparalleled achievements. Join us on this journey towards greatness! ? #UnleashYourPotential #SuccessRevolution",
    "? We're not just a company, we're a global force for change! Our world-class team is committed to revolutionizing industries and making a lasting impact. Together, let's reshape the future and leave a legacy that will be remembered for ages! ? #GlobalTrailblazers #LegacyMakers",
    "? Harness the power of synergy and unlock your true potential with our transformative solutions. Together, we'll ignite a fire of success that will radiate across industries. Join the league of winners and conquer new frontiers! ? #SynergyChampions #UnleashThePowerWithin",
    "? Innovation alert! Our visionary team has cracked the code to redefine excellence. Get ready to be blown away by our mind-boggling breakthroughs that will leave your competitors in the dust. It's time to disrupt the status quo and embrace the future! ? #InnovationRevolution #ExcellenceUnleashed",
    "? Welcome to the era of limitless possibilities! Our revolutionary platform will empower you to transcend boundaries and achieve unprecedented success. Together, let's shape a future where dreams become realities and ordinary becomes extraordinary! ✨ #LimitlessSuccess #DreamBig",
    "? Brace yourselves for a seismic shift in the industry! Our game-changing product is set to revolutionize the way you work, think, and succeed. Say goodbye to mediocrity and join the league of pioneers leading the charge towards a brighter tomorrow! ? #IndustryDisruptors #PioneeringSuccess",
    "? Attention all innovators and disruptors! It's time to break free from the chains of convention and rewrite the rulebook of success. Join us on this exhilarating journey as we create a new chapter in the annals of greatness. The sky's not the limit—it's just the beginning! ? #BreakingBarriers #UnleashGreatness",
    "? Unlock the secret to unprecedented achievements with our exclusive formula for success. Our team of experts has distilled years of wisdom into a powerful elixir that will propel you to the zenith of greatness. It's time to embrace the extraordinary and become a legend in your own right! ? #FormulaForSuccess #RiseToGreatness",
    "? Step into the realm of infinite possibilities and seize the keys to your success. Our groundbreaking solutions will unlock doors you never knew existed, propelling you towards a future filled with limitless growth and prosperity. Dare to dream big and let us be your catalyst for greatness! ? #UnlockYourPotential #LimitlessSuccess",
]

post_preds = classifier(posts)
print(post_preds)
```
```text
***** Running evaluation *****
{'accuracy': 0.8, 'based_model_id': 'intfloat/e5-small', 'tuned_model_id': 'e5-small-LinkedCringe-setfit-skl-20it-2e'}

# 10-post results
['cringe', 'cringe', 'info', 'cringe', 'cringe', 'cringe', 'cringe', 'cringe', 'cringe', 'cringe']
```
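To summarize the 10-post run, you can tally the printed labels. The sketch below hard-codes the 10-post predictions listed above and, following the code comment that these posts "should all be cringe", computes the share labeled `cringe`. This is a check on these 10 posts only and is a separate number from the reported evaluation accuracy of 0.8.

```python
from collections import Counter

# Predictions copied from the 10-post results above
post_preds = ['cringe', 'cringe', 'info', 'cringe', 'cringe',
              'cringe', 'cringe', 'cringe', 'cringe', 'cringe']

counts = Counter(post_preds)
cringe_share = counts["cringe"] / len(post_preds)

print(counts)                                        # Counter({'cringe': 9, 'info': 1})
print(f"share labeled cringe: {cringe_share:.1f}")   # share labeled cringe: 0.9
```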
Note: the following citation is for SetFit, not for this checkpoint.
```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
  doi = {10.48550/ARXIV.2209.11055},
  url = {https://arxiv.org/abs/2209.11055},
  author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Efficient Few-Shot Learning Without Prompts},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}
```