Dataset:

bigcode/programming-languages-keywords

English

Dataset Card for "programming-languages-keywords"

A structured version of https://github.com/e3b0c442/keywords

Generated using:

import re
import pandas as pd
import requests
from datasets import Dataset

# Fetch the README and split it into one chunk per "### " heading
r = requests.get("https://raw.githubusercontent.com/e3b0c442/keywords/main/README.md")
keywords = r.text.split("### ")[1:]
keywords = [i for i in keywords if not i.startswith("Sources")]
# Map each heading line to the alphabetic tokens found in the rest of its chunk
keywords = {i.split("\n")[0]:[j for j in re.findall("[a-zA-Z]*", i.split("\n",1)[1]) if j] for i in keywords}
keywords = pd.DataFrame(pd.Series(keywords)).reset_index().rename(columns={"index":"language", 0:"keywords"})
keywords['language'] = keywords['language'].str.split(r"\) ").str[0]
keywords['keywords'] = keywords['keywords'].apply(lambda x: sorted(list(set(x))))
ds = Dataset.from_pandas(keywords)
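
To make the parsing step easier to follow, here is a minimal sketch that applies the same split-and-regex logic to a small inline string, with no network access. The `SAMPLE` text, its "Lang A" / "Lang B" headings, and the `parse_keywords` helper are hypothetical names introduced for illustration; the layout of the real README may differ.

```python
import re

# Hypothetical README excerpt (an assumption, not the real file's content).
SAMPLE = """# Keywords

### Lang A (3 keywords)

| | | |
|-|-|-|
| if | else | while |

### Lang B (2 keywords)

| | |
|-|-|
| fn | let |

### Sources

Some links
"""


def parse_keywords(markdown):
    """Return {heading: sorted unique alphabetic tokens} per '### ' section."""
    # Everything before the first "### " heading is preamble, so drop it.
    sections = markdown.split("### ")[1:]
    # The "Sources" section lists references, not keywords.
    sections = [s for s in sections if not s.startswith("Sources")]
    result = {}
    for section in sections:
        # First line is the heading; the remainder holds the keyword table.
        heading, body = section.split("\n", 1)
        # "[a-zA-Z]+" yields the same tokens as "[a-zA-Z]*" with empties dropped.
        tokens = re.findall("[a-zA-Z]+", body)
        result[heading] = sorted(set(tokens))
    return result


parsed = parse_keywords(SAMPLE)
print(parsed)
```

Because the pattern matches ASCII letters only, any keyword containing an underscore or digit is split or trimmed: `static_assert` would come out as two tokens, `static` and `assert`. Keep this in mind when comparing the `keywords` column against a language's official keyword list.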