## Model

You can download the ZH-CLIP model from 🤗 [thu-ml/zh-clip-vit-roberta-large-patch14](https://huggingface.co/thu-ml/zh-clip-vit-roberta-large-patch14). The model structure is shown below:

*(Model structure figure: a ViT image encoder paired with a Chinese RoBERTa-large text encoder, as the checkpoint name suggests.)*

Evaluation results:
Image-text retrieval (T2I = text-to-image, I2T = image-to-text):

| Model | T2I R@1 | T2I R@5 | T2I R@10 | T2I Mean | I2T R@1 | I2T R@5 | I2T R@10 | I2T Mean |
|---|---|---|---|---|---|---|---|---|
| Clip-Chinese | 22.60 | 50.04 | 65.24 | 45.96 | 22.8 | 49.8 | 64.1 | 45.57 |
| mclip | 56.51 | 83.57 | 90.79 | 76.95 | 59.9 | 87.3 | 94.1 | 80.43 |
| Taiyi-CLIP | 52.52 | 81.10 | 89.93 | 74.52 | 45.80 | 75.80 | 88.10 | 69.90 |
| CN-CLIP | 64.10 | 88.79 | 94.40 | 82.43 | 61.00 | 84.40 | 93.10 | 79.5 |
| altclip-xlmr-l | 62.87 | 87.18 | 94.01 | 81.35 | 63.3 | 88.3 | 95.3 | 82.3 |
| ZH-CLIP | 68.00 | 89.46 | 95.44 | 84.30 | 68.50 | 90.10 | 96.50 | 85.03 |
| Model | T2I R@1 | T2I R@5 | T2I R@10 | T2I Mean | I2T R@1 | I2T R@5 | I2T R@10 | I2T Mean |
|---|---|---|---|---|---|---|---|---|
| Clip-Chinese | 17.76 | 40.34 | 51.88 | 36.66 | 30.4 | 55.30 | 67.10 | 50.93 |
| mclip | 62.3 | 86.42 | 92.58 | 80.43 | 84.4 | 97.3 | 98.9 | 93.53 |
| Taiyi-CLIP | 53.5 | 80.5 | 87.24 | 73.75 | 65.4 | 90.6 | 95.7 | 83.9 |
| CN-CLIP | 67.98 | 89.54 | 94.46 | 83.99 | 81.2 | 96.6 | 98.2 | 92.0 |
| altclip-xlmr-l | 69.16 | 89.94 | 94.5 | 84.53 | 85.1 | 97.7 | 99.2 | 94.0 |
| ZH-CLIP | 69.64 | 90.14 | 94.3 | 84.69 | 86.6 | 97.6 | 98.8 | 94.33 |
Text-to-image retrieval:

| Model | R@1 | R@5 | R@10 | Mean |
|---|---|---|---|---|
| Clip-Chinese | 15.06 | 34.96 | 46.21 | 32.08 |
| mclip | 22.34 | 41.15 | 50.26 | 37.92 |
| Taiyi-CLIP | 42.09 | 67.75 | 77.21 | 62.35 |
| CN-CLIP | 56.25 | 79.87 | 86.50 | 74.21 |
| altclip-xlmr-l | 29.69 | 49.92 | 58.87 | 46.16 |
| ZH-CLIP | 56.75 | 79.75 | 86.66 | 74.38 |
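In these tables, R@K is the fraction of queries whose gold match appears among the top-K retrieved candidates, and Mean averages R@1, R@5, and R@10. A minimal sketch of the metric, assuming a precomputed query-candidate similarity matrix in which the gold match of query i is candidate i:

```python
import torch

def recall_at_k(sim: torch.Tensor, k: int) -> float:
    """sim[i, j] = similarity of query i to candidate j; gold match is j == i."""
    topk = sim.topk(k, dim=-1).indices              # (num_queries, k)
    gold = torch.arange(sim.size(0)).unsqueeze(-1)  # (num_queries, 1)
    return (topk == gold).any(dim=-1).float().mean().item()

sim = torch.randn(100, 100)  # placeholder scores; real ones come from the model
r1, r5, r10 = (recall_at_k(sim, k) for k in (1, 5, 10))
mean = (r1 + r5 + r10) / 3
```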
Zero-shot classification (ACC1):

| Model | CIFAR10 | CIFAR100 | DTD | EuroSAT | FER | FGVC | KITTI | MNIST | PC | VOC | ImageNet |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Clip-Chinese | 86.85 | 44.21 | 18.40 | 34.86 | 14.21 | 3.87 | 32.63 | 14.37 | 52.49 | 67.73 | 22.22 |
| mclip | 92.88 | 65.54 | 29.57 | 46.76 | 41.18 | 7.20 | 23.21 | 52.80 | 51.64 | 77.56 | 42.99 |
| Taiyi-CLIP | 95.62 | 73.30 | 40.69 | 61.62 | 36.22 | 13.98 | 41.21 | 73.91 | 50.02 | 75.28 | 49.82 |
| CN-CLIP | 94.75 | 75.04 | 44.73 | 52.34 | 48.57 | 20.55 | 20.11 | 61.99 | 62.59 | 79.12 | 53.40 |
| altclip-xlmr-l | 95.49 | 77.29 | 42.07 | 56.96 | 51.52 | 26.85 | 24.89 | 65.68 | 50.02 | 77.99 | 59.21 |
| ZH-CLIP | 97.08 | 80.73 | 47.66 | 51.58 | 48.48 | 20.73 | 20.11 | 61.94 | 62.31 | 78.07 | 56.87 |
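ACC1 counts a prediction as correct when the top-1 label matches the gold label. A minimal sketch of zero-shot classification using the same ZhCLIPModel API as the inference example below; the Chinese CIFAR10-style label prompts and the image path are illustrative assumptions, not the evaluation setup behind the reported numbers:

```python
import torch
from PIL import Image
from models.zhclip import ZhCLIPProcessor, ZhCLIPModel

version = 'thu-ml/zh-clip-vit-roberta-large-patch14'
model = ZhCLIPModel.from_pretrained(version)
processor = ZhCLIPProcessor.from_pretrained(version)

# Illustrative CIFAR10-style class prompts in Chinese; the actual prompts and
# templates used for the reported scores are not specified here.
labels = ["飞机", "汽车", "鸟", "猫", "鹿", "狗", "青蛙", "马", "船", "卡车"]
image = Image.open("example.jpg")  # hypothetical test image

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Pick the label whose text embedding is most similar to the image embedding.
probs = (outputs.image_features @ outputs.text_features.T).softmax(dim=-1)
pred = labels[probs.argmax(dim=-1).item()]  # ACC1: compare pred to the gold label
```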
You can clone the inference code from https://github.com/thu-ml/zh-clip.
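The pretrained weights can also be fetched programmatically. A minimal sketch, assuming the `huggingface_hub` package is installed (any standard way of downloading the Hugging Face repo works just as well):

```python
# Fetch the ZH-CLIP weights into the local Hugging Face cache.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("thu-ml/zh-clip-vit-roberta-large-patch14")
print(local_dir)  # local path containing the downloaded model files
```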
```python
from PIL import Image
import requests
# ZhCLIPProcessor / ZhCLIPModel live in https://github.com/thu-ml/zh-clip
from models.zhclip import ZhCLIPProcessor, ZhCLIPModel

version = 'thu-ml/zh-clip-vit-roberta-large-patch14'
model = ZhCLIPModel.from_pretrained(version)
processor = ZhCLIPProcessor.from_pretrained(version)

# Example image from the COCO val2017 set.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Encode the image together with two candidate Chinese captions
# ("a cat", "a dog").
inputs = processor(text=["一只猫", "一只狗"], images=image,
                   return_tensors="pt", padding=True)
outputs = model(**inputs)
image_features = outputs.image_features
text_features = outputs.text_features

# Similarity of the image to each caption, as a probability distribution.
text_probs = (image_features @ text_features.T).softmax(dim=-1)
```
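Here `text_probs` is a 1×2 tensor of probabilities over the two captions. Note that the snippet applies the softmax to raw dot products; if the returned features are not already L2-normalized, CLIP-style pipelines usually normalize them and scale by a temperature first. A hedged variant reusing the `outputs` object above (the temperature 100.0 is an assumption borrowed from OpenAI CLIP, not this repo's setting):

```python
# Variant with explicit L2 normalization: cosine similarities scaled by a
# CLIP-style temperature before the softmax. Assumption: ZhCLIPModel does not
# already normalize its output features.
img = outputs.image_features / outputs.image_features.norm(dim=-1, keepdim=True)
txt = outputs.text_features / outputs.text_features.norm(dim=-1, keepdim=True)
text_probs = (100.0 * img @ txt.T).softmax(dim=-1)
```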
In addition, to compare the effectiveness of different methods, inference code for other Chinese CLIP models has been integrated into this repository. The inference code is released for convenience; if any of it infringes your rights, please contact us. Currently only models at the same scale as clip-vit-large-patch14 are implemented, but more model variants may be supported in the future.
| # | Model | Alias |
|---|---|---|
| 0 | ZH-CLIP | zhclip |
| 1 | altclip-xlmr-l | altclip |
| 2 | CN-CLIP | cnclip |
| 3 | Taiyi-CLIP | taiyiclip |
| 4 | mclip | mclip |
| 5 | Clip-Chinese | clip-chinese |
Usage in inference.py:
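The actual interface is defined by inference.py in the repository. As a purely illustrative sketch of alias-based dispatch, assuming the cloned repo is on the import path (the `LOADERS` table and `load_by_alias` helper below are assumptions, not the repo's API):

```python
# Hypothetical sketch: map each alias to a loader returning (model, processor).
from models.zhclip import ZhCLIPProcessor, ZhCLIPModel

LOADERS = {
    "zhclip": lambda: (
        ZhCLIPModel.from_pretrained('thu-ml/zh-clip-vit-roberta-large-patch14'),
        ZhCLIPProcessor.from_pretrained('thu-ml/zh-clip-vit-roberta-large-patch14'),
    ),
    # "altclip", "cnclip", "taiyiclip", "mclip", "clip-chinese" would map to
    # their own loaders in the same way.
}

def load_by_alias(alias: str):
    try:
        return LOADERS[alias]()
    except KeyError:
        raise ValueError(f"unknown model alias: {alias!r}")

model, processor = load_by_alias("zhclip")
```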