Model:

pszemraj/e5-small-LinkedCringe-setfit-skl-20it-2e

English

LinkedCringe v0.2: e5-small

A fine-tune of intfloat/e5-small on the LinkedCringe v0.2 dataset.

This is an initial test and a work in progress, but the results so far are not bad.

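Model

This is a SetFit model for text classification. It was trained with an efficient few-shot learning technique that involves two steps:

  • Fine-tuning a Sentence Transformer with contrastive learning.
  • Training a classification head on features from the fine-tuned Sentence Transformer.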
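
The contrastive step works on sentence pairs rather than single examples. The sketch below shows one way such pairs could be built from a handful of labeled texts: pairs with the same label get a similarity target of 1.0, pairs with different labels get 0.0. It uses only the standard library, `make_contrastive_pairs` is a hypothetical helper, and the example texts are made up; the setfit library's own pair sampling may differ.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Return (text_a, text_b, target) triples: 1.0 for same label, 0.0 otherwise."""
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        pairs.append((text_a, text_b, 1.0 if label_a == label_b else 0.0))
    return pairs


# made-up toy examples, labeled with numeric ids (1 = cringe, 3 = info)
texts = [
    "Thrilled to announce my synergy journey!",
    "Humbled and blessed to disrupt the market!",
    "Release notes for version 2.1 are available.",
    "The office is closed on Monday.",
]
labels = [1, 1, 3, 3]

pairs = make_contrastive_pairs(texts, labels)
for text_a, text_b, target in pairs:
    print(target, "|", text_a, "|", text_b)
```

Even four labeled texts yield six training pairs, which is part of why this approach suits few-shot settings.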
Labels

After training as described above, the model predicts a single class label from the following mapping:

    # numeric id: text label
    {
        1: 'cringe',
        2: 'relevant',
        3: 'info',
        4: 'noise'
    }

Usage

To use this model for inference, first install the SetFit library:

    python -m pip install setfit
    

Basic inference

You can then run inference as follows:

    from setfit import SetFitModel
    
    # Download from Hub and run inference
    model = SetFitModel.from_pretrained("pszemraj/e5-small-LinkedCringe-setfit-skl-20it-2e")
    # Run inference
    preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
    
    # predictions are numeric ids; look them up in the label mapping above
    print(preds)
    

Class object with utils

Create a "custom" wrapper class that includes the labels:

    from setfit import SetFitModel
    from typing import Dict, List, Optional
    
    
    class PostClassifier:
        DEFAULT_ID2LABEL = {1: "cringe", 2: "relevant", 3: "info", 4: "noise"}
    
        def __init__(
            self,
            model_id: str = "pszemraj/e5-small-LinkedCringe-setfit-skl-20it-2e",
            id2label: Optional[Dict[int, str]] = None,
        ):
            """Initialize PostClassifier with model name and/or label mapping."""
            self.model = SetFitModel.from_pretrained(model_id)
            self.id2label = id2label if id2label else self.DEFAULT_ID2LABEL
    
        def classify(self, texts: List[str]) -> List[str]:
            """Classify list of texts, return list of corresponding labels."""
            preds = self.model(texts)
            return [self.id2label[int(pred)] for pred in preds]
    
        def predict_proba(self, texts: List[str]) -> List[Dict[str, float]]:
            """Predict label probabilities for a list of texts, return a list of probability dictionaries."""
            proba = self.model.predict_proba(texts)
            # column i of each row corresponds to label id i + 1
            return [
                {self.id2label.get(i + 1, "Unknown"): float(p) for i, p in enumerate(row)}
                for row in proba
            ]
    
        def __call__(self, texts: List[str]) -> List[str]:
            """Enable class instance to act as a function for text classification."""
            return self.classify(texts)
    

Instantiate and classify:

    # import PostClassifier if you defined it in another script etc
    model_name = "pszemraj/e5-small-LinkedCringe-setfit-skl-20it-2e"
    classifier = PostClassifier(model_name)
    
    # classify some posts (these should all be cringe maaaaybe noise)
    posts = [
        "? Innovation is our middle name! We're taking synergy to new heights and disrupting the market with our game-changing solutions. Stay tuned for the next paradigm shift! ? #CorporateRevolution #SynergisticSolutions",
        "? Attention all trailblazers! Our cutting-edge product is the epitome of excellence. It's time to elevate your success and ride the wave of unparalleled achievements. Join us on this journey towards greatness! ? #UnleashYourPotential #SuccessRevolution",
        "? We're not just a company, we're a global force for change! Our world-class team is committed to revolutionizing industries and making a lasting impact. Together, let's reshape the future and leave a legacy that will be remembered for ages! ? #GlobalTrailblazers #LegacyMakers",
        "? Harness the power of synergy and unlock your true potential with our transformative solutions. Together, we'll ignite a fire of success that will radiate across industries. Join the league of winners and conquer new frontiers! ? #SynergyChampions #UnleashThePowerWithin",
        "? Innovation alert! Our visionary team has cracked the code to redefine excellence. Get ready to be blown away by our mind-boggling breakthroughs that will leave your competitors in the dust. It's time to disrupt the status quo and embrace the future! ? #InnovationRevolution #ExcellenceUnleashed",
        "? Welcome to the era of limitless possibilities! Our revolutionary platform will empower you to transcend boundaries and achieve unprecedented success. Together, let's shape a future where dreams become realities and ordinary becomes extraordinary! ✨ #LimitlessSuccess #DreamBig",
        "? Brace yourselves for a seismic shift in the industry! Our game-changing product is set to revolutionize the way you work, think, and succeed. Say goodbye to mediocrity and join the league of pioneers leading the charge towards a brighter tomorrow! ? #IndustryDisruptors #PioneeringSuccess",
        "? Attention all innovators and disruptors! It's time to break free from the chains of convention and rewrite the rulebook of success. Join us on this exhilarating journey as we create a new chapter in the annals of greatness. The sky's not the limit—it's just the beginning! ? #BreakingBarriers #UnleashGreatness",
        "? Unlock the secret to unprecedented achievements with our exclusive formula for success. Our team of experts has distilled years of wisdom into a powerful elixir that will propel you to the zenith of greatness. It's time to embrace the extraordinary and become a legend in your own right! ? #FormulaForSuccess #RiseToGreatness",
        "? Step into the realm of infinite possibilities and seize the keys to your success. Our groundbreaking solutions will unlock doors you never knew existed, propelling you towards a future filled with limitless growth and prosperity. Dare to dream big and let us be your catalyst for greatness! ? #UnlockYourPotential #LimitlessSuccess"
    ]
    
    
    post_preds = classifier(posts)
    print(post_preds)
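
The probability dictionaries returned by `PostClassifier.predict_proba` can be reduced to the single most likely label together with its score. A minimal sketch, assuming each dictionary maps label names to floats as in the wrapper above; `top_label` is a hypothetical helper, and the probabilities below are made-up illustration values, not model output.

```python
from typing import Dict, Tuple


def top_label(probs: Dict[str, float]) -> Tuple[str, float]:
    """Return the (label, probability) pair with the highest probability."""
    label = max(probs, key=probs.get)
    return label, probs[label]


# made-up probabilities for illustration only, not real model output
example = {"cringe": 0.7, "relevant": 0.1, "info": 0.15, "noise": 0.05}
best, score = top_label(example)
print(best, score)
```

With a real classifier instance, the same helper could be applied to each entry of `classifier.predict_proba(posts)`.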
    

eval - detailed

    ***** Running evaluation *****
    {'accuracy': 0.8,
     'based_model_id': 'intfloat/e5-small',
     'tuned_model_id': 'e5-small-LinkedCringe-setfit-skl-20it-2e'}
    
    
    # 10-post results
    
    ['cringe',
     'cringe',
     'info',
     'cringe',
     'cringe',
     'cringe',
     'cringe',
     'cringe',
     'cringe',
     'cringe']
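
To summarize the 10-post output, the predicted labels can be tallied with the standard library. A small sketch using the list printed above:

```python
from collections import Counter

# predictions copied from the 10-post results above
post_preds = ["cringe", "cringe", "info", "cringe", "cringe",
              "cringe", "cringe", "cringe", "cringe", "cringe"]

counts = Counter(post_preds)
print(counts.most_common())  # [('cringe', 9), ('info', 1)]
```

Nine of the ten posts are labeled cringe, in line with the expectation stated in the comment above the posts list.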
    

BibTeX entry and citation info

Note: this is the citation for setfit, not for this checkpoint.

    @article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
    }