Model:
togethercomputer/GPT-JT-Moderation-6B
This model card introduces a moderation model, a GPT-JT model fine-tuned on Ontocord.ai's OIG-moderation dataset v0.1.
This model can be used to moderate other chatbot models, including GPT-NeoXT-Chat-Base-20B.
In chat applications, the moderation model runs in tandem with the main chatbot, checking both the user's question and the bot's answer for inappropriate content. If needed, the moderation model intervenes, overriding the main chatbot's response and telling the user that the request could not be answered.
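The flow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `moderated_reply`, `chat_bot`, `classify`, and `REFUSAL` are hypothetical names, the two models are passed in as plain callables, and treating only "needs intervention" as a blocking label is an assumption. Checking the question before calling the chatbot is also a design choice made here for simplicity.

```python
from typing import Callable, FrozenSet

# Hypothetical refusal text; the model card does not specify the wording.
REFUSAL = "Sorry, this request could not be answered."


def moderated_reply(
    user_message: str,
    chat_bot: Callable[[str], str],
    classify: Callable[[str], str],
    blocked_labels: FrozenSet[str] = frozenset({"needs intervention"}),
) -> str:
    """Return the chatbot's answer unless moderation flags either side.

    `chat_bot` maps a user message to an answer; `classify` maps any text
    to one of the moderation labels. Both are stand-ins for real model calls.
    Which labels block a reply is an assumption, configurable by the caller.
    """
    # Check the user's question first.
    if classify(user_message) in blocked_labels:
        return REFUSAL

    answer = chat_bot(user_message)

    # Check the bot's answer before it is shown to the user.
    if classify(answer) in blocked_labels:
        return REFUSAL

    return answer
```

A stricter deployment could pass a larger `blocked_labels` set, for example one that also includes "needs caution".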
An example prompt and its expected result are as follows:
Possible labels:
1. casual
2. needs caution
3. needs intervention
4. possibly needs caution
5. probably needs caution

Input: I want to call in sick
Output: casual

Input: I hate my job and I don't want to go there.
Output: possibly needs caution

Input: It's so hard to deal with this. I hate this job and I want to quit.
Output: possibly needs caution

Input: Something tells me that you're sitting in the wrong section of the Stormfront bus.
Output: needs caution

Input: Yes I am. They're doing terrible things to people and they're getting away with it, so we should do something about these people.
Output:
-----
> needs intervention
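A prompt in this shape can be assembled and its completion parsed with a few lines of Python. The sketch below is illustrative only: `build_prompt` and `parse_label` are hypothetical helpers, and the exact whitespace (one item per line, a blank line between examples) is an assumption about the layout the model expects. The generation step itself is left out; the completion string would come from whatever inference call is used.

```python
from typing import List, Optional, Tuple

# The five labels listed in the example prompt.
LABELS = [
    "casual",
    "needs caution",
    "needs intervention",
    "possibly needs caution",
    "probably needs caution",
]


def build_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Build a few-shot prompt that ends with an open 'Output:' for the query."""
    lines = ["Possible labels:"]
    lines += [f"{i}. {label}" for i, label in enumerate(LABELS, start=1)]
    lines.append("")
    # Each labeled example becomes an Input/Output pair followed by a blank line.
    for text, label in examples:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    # The final input is left unlabeled so the model completes the label.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


def parse_label(completion: str) -> Optional[str]:
    """Return the label on the first line of the completion, or None if unknown."""
    stripped = completion.strip()
    if not stripped:
        return None
    first_line = stripped.splitlines()[0].strip().lower()
    return first_line if first_line in LABELS else None
```

Returning `None` for an unrecognized completion lets the caller decide how to handle it, for instance by falling back to the most cautious label.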
Training Data
Training Procedure
Join us on Together Discord