Model:
argilla/roberta-base-reward-model-falcon-dolly
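This is an experimental reward model trained with TRL on comparison data built from the Dolly v2 dataset and responses generated by Falcon.
For testing purposes, we assume that the human-written responses (written by Databricks employees) are preferred over the Falcon-generated ones. This will not always hold in practice, but you can use Argilla to set up a comparison data collection effort and gather real feedback on which response is preferred.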
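The assumption above amounts to labeling each human-written response as "chosen" and each Falcon response as "rejected". A minimal sketch of that pairing step follows. The record field names (`instruction`, `context`, `human_response`, `falcon_response`), the helper `build_comparison_pairs`, and the choice to join instruction and context with a single space are illustrative assumptions, not the authors' actual preprocessing.

```python
from typing import Dict, List


def build_comparison_pairs(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Turn raw records into (prompt, chosen, rejected) comparison pairs.

    Hypothetical helper: the human-written answer is always treated as
    "chosen" and the Falcon answer as "rejected".
    """
    pairs = []
    for rec in records:
        prompt = rec["instruction"]
        # Assumption: optional context is appended to the instruction.
        if rec.get("context"):
            prompt = f"{prompt} {rec['context']}"
        pairs.append(
            {
                "prompt": prompt,
                "chosen": rec["human_response"],
                "rejected": rec["falcon_response"],
            }
        )
    return pairs


# One record assembled from an example in this card.
records = [
    {
        "instruction": "When did Peloton IPO?",
        "context": (
            "On September 26, 2019, Peloton became a public company via an "
            "initial public offering, raising $1.16 billion and valuing the "
            "company at $8.1 billion."
        ),
        "human_response": (
            "Peloton became a public company via an initial public offering "
            "on September 26, 2019."
        ),
        "falcon_response": (
            "When did Peloton IPO? Peloton launched its initial public "
            "offering on November 12, 2019."
        ),
    }
]

pairs = build_comparison_pairs(records)
print(pairs[0]["chosen"])
```

If you later collect real preference labels (for example with Argilla), only the assignment of `chosen` and `rejected` would need to change.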
To use this model for scoring:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("argilla/roberta-base-reward-model-falcon-dolly")
    model = AutoModelForSequenceClassification.from_pretrained("argilla/roberta-base-reward-model-falcon-dolly")

    def get_score(model, tokenizer, prompt, response):
        # Tokenize the prompt and response as a single sequence pair
        inputs = tokenizer.encode_plus(prompt, response, truncation=True, padding="max_length", max_length=512, return_tensors="pt")

        # Perform the forward pass without tracking gradients
        with torch.no_grad():
            outputs = model(**inputs)

        # Extract the single logit, which serves as the reward score
        logits = outputs.logits

        return logits.item()

    # Example usage
    prompt = "What is Depreciation"
    example_less_pref_response = "What is Depreciation – 10 Important Facts to Know? When a business buys a new asset, the purchase price of that asset is depreciated over time to reflect its usage and eventual obsolescence. Depreciation expense can be a tax deductible expense and is usually a non-cash expense reported on a company’s income statement and balance sheet. The amount of depreciation expense a company reports each year is the difference between the original purchase price of the asset and what the current value of that asset might be. Here are 10 important facts to know about depreciation: 1. Depreciation is a non-cash expense. It is an expense that is reported in a business’s income statement and balance sheet and not a cash flow expense. 2. Depreciation is an accounting standard and it is required to be disclosed in a business’s financial statements. 3. The amount of depreciation is usually a tax expense and not a cash expense reported on a company’s income statement"
    example_preferred_response = "Depreciation is the drop in value of an asset due to wear and tear, age and obsolescence (going out of date) as recorded in an organization's financial records."

    score = get_score(model, tokenizer, prompt, example_less_pref_response)
    print(score)
    # >> -3.915163993835449

    score = get_score(model, tokenizer, prompt, example_preferred_response)
    print(score)
    # >> 7.460323333740234

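
A raw score has no absolute scale, so it is most useful when two responses to the same prompt are compared. One possible reading, assuming the model was trained with a pairwise logistic (Bradley-Terry style) objective, is to pass the score difference through a sigmoid to get a rough preference probability. The helper name `preference_probability` is hypothetical, and the two numbers are the scores printed in the example above.

```python
import math


def preference_probability(score_a: float, score_b: float) -> float:
    """Rough probability that response A is preferred over response B.

    Assumes a Bradley-Terry style interpretation of the reward scores;
    this is an illustration, not a calibrated estimate.
    """
    diff = score_a - score_b
    # Numerically stable sigmoid for both signs of the difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


preferred_score = 7.460323333740234
less_preferred_score = -3.915163993835449

print(preference_probability(preferred_score, less_preferred_score))
```

A value near 1 means the model strongly favors the first response; a value near 0.5 means it barely distinguishes the two.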
Scores for a few examples are shown below (keep in mind that this reward model was trained to prefer the original response):
Prompt | Original Response | Response-1 | Score Original | Score Response-1 |
---|---|---|---|---|
What is Depreciation | Depreciation is the drop in value of an asset due to wear and tear, age and obsolescence (going out of date) as recorded in an organization's financial records. | What is Depreciation – 10 Important Facts to Know? When a business buys a new asset, the purchase price of that asset is depreciated over time to reflect its usage and eventual obsolescence. Depreciation expense can be a tax deductible expense and is usually a non-cash expense reported on a company’s income statement and balance sheet. The amount of depreciation expense a company reports each year is the difference between the original purchase price of the asset and what the current value of that asset might be. Here are 10 important facts to know about depreciation: 1. Depreciation is a non-cash expense. It is an expense that is reported in a business’s income statement and balance sheet and not a cash flow expense. 2. Depreciation is an accounting standard and it is required to be disclosed in a business’s financial statements. 3. The amount of depreciation is usually a tax expense and not a cash expense reported on a company’s income statement | 1.9764 | -9.01173 |
What do you know about the city of Aberdeen in Scotland? | Aberdeen is a city located in the North East of Scotland. As of 2020 it had a population of 200,000 people and is the third largest city in Scotland, behind Edinburgh (435,000) and Glasgow (610,000). Aberdeen has been referred to as "The Silver City" due to the high number of buildings in the city constructed of Granite from the local quarry, which may sparkle on the rare occasion that the sunlight hits the buildings. Since the 1960's the city's economy has largely revolved around Oil & Gas due to its proximity to the North Sea Oil discovery. Aberdeen has two universities; the University of Aberdeen and Robert Gordon University. Notable people who have been associated with Aberdeen include singer Annie Lennox, golfer Paul Lawrie and architect Archibald "Archie" Simpson. | Aberdeen, the city in which I've lived for the past 4 years, is the 3rd largest city in Scotland with a population of around 700,000. It has a long history, with evidence of human occupation dating back around 5000 years. The city is home to the famous "Rock" of Aberdeen (which is in fact the city itself) as well as several universities and is one of the biggest financial centers in the country. | 7.03729 | -6.4728 |
Describe thunderstorm season in the United States and Canada. Summer is traditionally associated with hot or warm weather. In Mediterranean climates, it is also associated with dry weather, while in other places (particularly in Eastern Asia because of the monsoon) it is associated with rainy weather. The wet season is the main period of vegetation growth within the savanna climate regime. Where the wet season is associated with a seasonal shift in the prevailing winds, it is known as a monsoon. In the northern Atlantic Ocean, a distinct tropical cyclone season occurs from 1 June to 30 November. The statistical peak of the Atlantic hurricane season is 10 September. The Northeast Pacific Ocean has a broader period of activity, but in a similar timeframe to the Atlantic. The Northwest Pacific sees tropical cyclones year-round, with a minimum in February and March and a peak in early September. In the North Indian basin, storms are most common from April to December, with peaks in May and November. In the Southern Hemisphere, the tropical cyclone season runs from the start of November until the end of April with peaks in mid-February to early March. Thunderstorm season in the United States and Canada runs in the spring through summer but sometimes can run as late as October or even November in the fall. These storms can produce hail, strong winds and tornadoes, usually during the afternoon and evening. | Thunderstorm season in the United States and Canada runs in the spring through summer but sometimes can run as late as October or even November in the fall. These storms can produce hail, strong winds and tornadoes, usually during the afternoon and evening. | Describe thunderstorm season in the United States and Canada. Thunderstorm season typically occurs in the summer months in the United States and Canada. It is characterized by the convective development of large, powerful thunderstorms. These storms often bring heavy rainfall, strong winds, hail, and lightning. Due to their intensity, thunderstorms can sometimes lead to severe weather and property damage. However, they are often a natural and necessary part of the ecosystem, helping to maintain the region's water supply and providing essential nutrients to plants through the release of precipitation. | -0.197739 | -9.12217 |
When did Peloton IPO? On September 26, 2019, Peloton became a public company via an initial public offering, raising $1.16 billion and valuing the company at $8.1 billion. After reaching a market valuation of $50 billion in January 2021, the company's valuation was back to around $8 billion by April 2022. | Peloton became a public company via an initial public offering on September 26, 2019. | When did Peloton IPO? Peloton launched its initial public offering on November 12, 2019. | -5.36242 | -8.97019 |
What is the best way to answer an interview question? | The first recommended step is to ask clarifying questions to ensure you understand the question correctly and gather the requirements needed to answer effectively. If it is a mathematical or computer science question, go ahead and work your way through the problem while you verbalize your thought process along the way. If it is a situation question, you should use the STAR method. 1. Situation - set the scene and provide context, 2. Task - describe what was needed to be done and why, 3. Action - explain what action you took, and 4. Result - describe the result of your actions and what you learned from the experience | The best way to answer an interview question may depend on the specific question being asked and the job you are applying for. However, some tips include staying calm, maintaining eye contact, being honest and confident about your abilities, and using specific examples or anecdotes to answer questions. It's also important to be aware of your nonverbal communication and how that may impact your answer. | -7.57853 | -8.82935 |
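
The scores in the table can be summarized by the margin between the two responses and by the fraction of rows where the original response scores higher. The sketch below hard-codes the numbers from the table; the variable names are illustrative, and five rows are far too few to say anything about the model's general accuracy.

```python
# (prompt, score of original response, score of Response-1), copied from the table.
rows = [
    ("What is Depreciation", 1.9764, -9.01173),
    ("What do you know about the city of Aberdeen in Scotland?", 7.03729, -6.4728),
    ("Describe thunderstorm season in the United States and Canada.", -0.197739, -9.12217),
    ("When did Peloton IPO?", -5.36242, -8.97019),
    ("What is the best way to answer an interview question?", -7.57853, -8.82935),
]

# Margin = original score minus Response-1 score; positive means the
# model ranks the original (human-written) response higher.
margins = [(prompt, original - response_1) for prompt, original, response_1 in rows]

# Fraction of rows where the original response wins.
pairwise_accuracy = sum(margin > 0 for _, margin in margins) / len(margins)

# Row with the smallest margin, i.e. the least confident ranking.
closest_prompt, closest_margin = min(margins, key=lambda item: item[1])

for prompt, margin in margins:
    print(f"{margin:8.3f}  {prompt}")
print("pairwise accuracy:", pairwise_accuracy)
print("smallest margin:", closest_prompt)
```

Note that the absolute scores vary widely between prompts (the last two originals score below zero), which is why the margin within a row is more informative than any single score.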