Dataset:
tasksource/bigbench
BIG-Bench, but without the nasty dependencies of the official version (tensorflow, pypi-bigbench, protobuf).
from datasets import load_dataset
dataset = load_dataset("tasksource/bigbench", "movie_recommendation")
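Code to reproduce: https://colab.research.google.com/drive/1MKdLdF7oqrSQCeavAcsEnPdI85kD0LzU?usp=sharing
Datasets are limited to 50k examples to keep things light. I also removed the default split whenever a train split is available, because default = train + validation, which saves space.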
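The pruning described above can be sketched in plain Python. This is only an illustration, not the code from the linked notebook: the helper name `prune_splits` is hypothetical, the cap is assumed to apply to each split separately, and keeping the first 50k rows (rather than sampling) is also an assumption.

```python
from typing import Dict, List

MAX_EXAMPLES = 50_000  # cap stated in the dataset card


def prune_splits(splits: Dict[str, List], max_examples: int = MAX_EXAMPLES) -> Dict[str, List]:
    """Hypothetical helper: drop a redundant 'default' split and cap split sizes.

    Assumptions (not confirmed by the card): the cap is applied per split,
    and truncation keeps the first `max_examples` rows.
    """
    kept = {}
    for name, rows in splits.items():
        # 'default' equals train + validation, so it is redundant when 'train' exists.
        if name == "default" and "train" in splits:
            continue
        kept[name] = rows[:max_examples]
    return kept


if __name__ == "__main__":
    toy = {"default": list(range(10)), "train": list(range(8)), "validation": [8, 9]}
    print(sorted(prune_splits(toy, max_examples=5)))
```

A task that only ships a default split keeps it, since there is no train split to fall back on.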
@article{srivastava2022beyond,
  title={Beyond the imitation game: Quantifying and extrapolating the capabilities of language models},
  author={Srivastava, Aarohi and Rastogi, Abhinav and Rao, Abhishek and Shoeb, Abu Awal Md and Abid, Abubakar and Fisch, Adam and Brown, Adam R and Santoro, Adam and Gupta, Aditya and Garriga-Alonso, Adri{\`a} and others},
  journal={arXiv preprint arXiv:2206.04615},
  year={2022}
}