Building a Facial Recognition Application with Python, OpenCV, Transformers and Qdrant

Published by alex on December 20, 2023

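Method 1: Facial recognition with Python, OpenCV and Qdrant

Facial recognition technology has become ubiquitous, reshaping fields such as security, social media and smartphone authentication. In this article we explore it with a compact toolset: Python, OpenCV, image embeddings and Qdrant.

Part 1: Introduction to facial recognition

Part 1 lays the groundwork by covering the basics of facial recognition: the underlying principles, typical applications, and the role Python and OpenCV play in our development stack.

Part 2: Setting up the environment

A key step in any project is preparing the development environment. We show how to bring Python, OpenCV and Qdrant together so that the components of our facial recognition system work with one another.

Part 3: Implementing the facial recognition algorithm

With the foundation in place, we move to the core of the project: implementing the recognition pipeline in Python and OpenCV, and looking at how face detection, feature extraction and model training work internally.

Part 4: Integrating a database with Qdrant

No facial recognition system is complete without a robust database to store and manage face data efficiently. In the final part we walk through integrating Qdrant to give the system its storage and retrieval capabilities.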

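Step-by-step implementation

Download all the images of interest to a local folder.
Detect and extract the faces from the images.
Compute a face embedding for each extracted face.
Store the face embeddings in a Qdrant database.
Obtain a photo of a colleague to be identified.
Locate the face in the provided photo.
Compute the embedding of the face found in the provided photo.
Using Qdrant's distance function, retrieve the closest matching faces and their corresponding photos from the database.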

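Technology stack

Qdrant: the vector store used to hold the image embeddings.
OpenCV: detects faces in the images. To "extract" the faces we use Python, the OpenCV computer vision library and a pretrained Haar Cascade model.
imgbeddings: a Python package that generates embedding vectors from images using OpenAI's CLIP model via Hugging Face transformers.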

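OpenCV overview

OpenCV, the Open Source Computer Vision Library, is an open-source software library for computer vision and machine learning. It was originally developed by Intel and is now maintained by a community of developers. It provides tools and functions for image and video analysis, covering a wide range of image processing, computer vision and machine learning algorithms.

Key features of OpenCV include:

Image processing: OpenCV offers a large set of functions for basic and advanced image processing tasks such as filtering, transformation and colour manipulation.
Computer vision algorithms: the library implements many computer vision algorithms, including feature detection, object recognition and image stitching.
Machine learning: OpenCV integrates with machine learning frameworks and provides tools for training and deploying machine learning models, which is particularly useful for tasks such as object detection and facial recognition.
Camera calibration: OpenCV includes functions for camera calibration, which is essential in computer vision applications for correcting the distortion introduced by camera lenses.
Real-time computer vision: it supports real-time applications, making it suitable for tasks such as video analysis, motion tracking and augmented reality.
Cross-platform support: OpenCV runs on multiple operating systems, including Windows, Linux, macOS, Android and iOS, which makes it usable across a wide range of applications.
Community support: OpenCV has a large and active community and continues to evolve with contributions from researchers, developers and engineers around the world.

OpenCV is widely used in academia, industry and research, for everything from simple image manipulation to complex computer vision and machine learning applications. Its versatility and comprehensive toolset make it a go-to library for developers working in computer vision.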

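imgbeddings overview

imgbeddings is a Python package that generates embedding vectors from images using OpenAI's CLIP model via Hugging Face transformers. The embeddings come from an image model that has seen a very large share of the internet's images up to mid-2020, and they can be used for many things: unsupervised clustering (for example via umap), embedding search (for example via faiss), and as input to other framework-agnostic ML/AI tasks such as building a classifier or computing image similarity.

The embedding models are ONNX INT8-quantized, meaning they are 20-30% faster on CPU, take less space on disk, and do not require PyTorch or TensorFlow as a dependency.
It works across many different image domains thanks to CLIP's zero-shot performance.
It includes utilities that use principal component analysis (PCA) to reduce the dimensionality of the generated embeddings without losing much information.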
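
To make the INT8 point above more concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It only illustrates the general idea of storing weights as small integers plus one scale factor; the function names quantize_int8 and dequantize_int8 are hypothetical, and the actual ONNX quantization behind imgbeddings may differ in its details.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale (illustrative only)."""
    max_abs = max(abs(v) for v in values)
    # avoid division by zero for an all-zero input
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate floats from the quantized integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q)
print(restored)
```

Each restored value differs from the original by at most half a quantization step, which is the trade-off that buys the smaller size on disk.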

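Vector stores explained

Definition

A vector store is a specialised database designed to store and retrieve vector embeddings efficiently. This specialisation matters because conventional databases such as SQL databases are not tuned for handling large volumes of vector data.

The role of embeddings

Embeddings represent data, usually unstructured data such as text or images, as numeric vectors in a high-dimensional space. Traditional relational databases are poorly suited to storing and retrieving these vector representations.

Key features of vector stores

Efficient indexing: a vector store indexes vectors so that similar ones can be found quickly using similarity algorithms.
Enhanced retrieval: an application can supply a target vector as the query and get back the most relevant stored vectors.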
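
The retrieval idea described above can be sketched with a brute-force cosine similarity search in plain Python. This is a toy illustration under simple assumptions (three-dimensional vectors, a dict as the "store", hypothetical names cosine_similarity and nearest); a real vector store such as Qdrant uses an index rather than comparing the query against every stored vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, stored, top_k=2):
    """stored maps name -> vector; returns (name, score) pairs, best match first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


stored_vectors = {
    "face_a": [1.0, 0.0, 0.0],
    "face_b": [0.0, 1.0, 0.0],
    "face_c": [0.9, 0.1, 0.0],
}
matches = nearest([1.0, 0.05, 0.0], stored_vectors, top_k=2)
print(matches)
```

A score close to 1 means the two vectors point in nearly the same direction, which is how the similarity scores later in this article should be read.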

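Qdrant overview

Qdrant is a dedicated vector similarity search engine, designed to provide a production-ready service through a user-friendly API. It stores, searches and manages points (vectors) together with an additional payload. The payload carries supplementary information that can be used to make searches more precise and to return useful data to the user.

Getting started with Qdrant is straightforward. Using the Python qdrant-client, you can pull the latest Qdrant Docker image and connect to it locally, or try the free tier of Qdrant Cloud until you are ready to make the full switch.

Qdrant high-level architecture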

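Code implementation

Install the required dependencies

pip install qdrant-client imgbeddings pillow opencv-python

Create a folder to store the images

mkdir photos

Download the model parameter file

Download the pretrained Haar Cascade model haarcascade_frontalface_default.xml from the OpenCV GitHub repository and store it locally.

Sample code

Import the required dependencies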

# import the required libraries
import os

import cv2
import numpy as np
from imgbeddings import imgbeddings
from PIL import Image
from qdrant_client import QdrantClient
from qdrant_client.http import models

Helper function to extract faces from an image

def detect_face(image_path, target_path):
    # path to the pretrained Haar Cascade model file
    alg = "haarcascade_frontalface_default.xml"
    # load the cascade into an OpenCV classifier
    haar_cascade = cv2.CascadeClassifier(alg)
    # read the image directly as grayscale; the cascade works on single-channel images
    gray_img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # detect the faces
    faces = haar_cascade.detectMultiScale(gray_img, scaleFactor=1.05, minNeighbors=2, minSize=(100, 100))
    # for each detected face
    for x, y, w, h in faces:
        # crop the image to keep only the face
        cropped_image = gray_img[y : y + h, x : x + w]
        # write the crop to target_path; every face in the image goes to this
        # same file, so with several faces only the last one is kept
        cv2.imwrite(target_path, cropped_image)

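The following line performs the face detection:

faces = haar_cascade.detectMultiScale(gray_img, scaleFactor=1.05, minNeighbors=2, minSize=(100, 100))

where:

gray_img: the source image in which to look for faces.
scaleFactor: how much the image is shrunk at each step of the multi-scale search. A value close to 1 (such as 1.05) scans more scales, which is slower but less likely to miss a face; a larger value is faster but coarser.
minNeighbors: how many overlapping neighbouring detections a candidate region needs in order to be kept. Higher values give fewer false positives but may miss some faces.
minSize: the minimum size of a detected face, here a 100-pixel square.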
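
As a rough illustration of how scaleFactor and minSize interact, the sketch below lists the window sizes a multi-scale detector would step through if it started at minSize and grew by scaleFactor until the window no longer fit the image. This is a simplified model written for this explanation, not OpenCV's actual implementation; window_sizes is a hypothetical helper and the 400-pixel image side is an assumed value.

```python
def window_sizes(image_side, min_size, scale_factor):
    """Window side lengths from min_size up to image_side, growing by scale_factor."""
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    sizes = []
    size = float(min_size)
    while size <= image_side:
        sizes.append(round(size))
        size *= scale_factor
    return sizes


fine = window_sizes(image_side=400, min_size=100, scale_factor=1.05)
coarse = window_sizes(image_side=400, min_size=100, scale_factor=1.3)
print(len(fine), len(coarse))
```

The smaller factor produces several times more scales to scan, which is why values near 1 are slower but less likely to skip over a face of an in-between size.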

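The for loop iterates over all detected faces and writes each one to a file. As written, every face goes to the same target file, so you may want to build a distinct file name per face (perhaps from the x and y values) to store different faces in different files.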
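
One possible way to do that is sketched below: a small helper that inserts the face's x and y coordinates into the target file name. The helper name face_file_name, the naming pattern and the sample coordinates are assumptions made for illustration, not part of the original code.

```python
import os


def face_file_name(target_path, x, y):
    """Insert the face's top-left corner into the file name, e.g. a.jpeg -> a_x10_y20.jpeg."""
    root, ext = os.path.splitext(target_path)
    return f"{root}_x{x}_y{y}{ext}"


# pretend the detector returned two faces as (x, y, w, h) tuples
detected = [(10, 20, 120, 120), (300, 40, 110, 110)]
names = [face_file_name("target/group.jpeg", x, y) for x, y, w, h in detected]
print(names)
```

Inside the detection loop, the result of face_file_name(target_path, x, y) would then be passed to cv2.imwrite in place of the fixed target path.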

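The face detection stage is not perfect: in our test it found three of the four visible faces, which is good enough for our purposes. You can tune the algorithm parameters to get a better fit for your use case.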

Helper function to compute embeddings

def generate_embeddings(image_path):
    # open the image passed in (a hardcoded path here would give every face the same embedding)
    img = Image.open(image_path)
    # load imgbeddings
    ibed = imgbeddings()
    # compute the embedding
    embedding = ibed.to_embeddings(img)[0]
    # reshape to a single row: shape (1, embedding_dim)
    emb_array = np.array(embedding).reshape(1, -1)
    return emb_array

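Detect the faces in the photos and save them as grayscale images in the target folder

# create the target folder (no error if it already exists)
os.makedirs("/content/target", exist_ok=True)
# loop through the images in the photos folder and extract faces
file_path = "/content/photos"
for item in os.listdir(file_path):
    if item.endswith(".jpeg"):
        detect_face(os.path.join(file_path, item), os.path.join("/content/target", item))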

Loop over the extracted faces in the target folder and generate embeddings

img_embeddings = [generate_embeddings(os.path.join("/content/target", item)) for item in os.listdir("/content/target")]
# number of embeddings generated
print(len(img_embeddings))
# shape of a single embedding
print(img_embeddings[0].shape)
# save the embeddings as a NumPy array so that we don't have to compute them again later
np.save("vectors_cv2", np.array(img_embeddings), allow_pickle=False)

Set up a vector store to hold the image embeddings

# create a local Qdrant vector store
client = QdrantClient(path="qdrant_db_cv2")
my_collection = "image_collection_cv2"
client.recreate_collection(
    collection_name=my_collection,
    vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE)
)
# generate metadata: one payload per face file
files_list = os.listdir("/content/target")
payload = []
for i in range(len(files_list)):
    payload.append({"image_id": i,
                    "name": files_list[i].split(".")[0]})
print(payload[:3])
ids = list(range(len(files_list)))
# load the embeddings from the saved .npy file
embeddings = np.load("vectors_cv2.npy").tolist()
# upload the image embeddings, one point per call;
# embeddings[i] has shape (1, 768), i.e. a list holding a single vector
for i in range(len(files_list)):
    client.upsert(
        collection_name=my_collection,
        points=models.Batch(
            ids=[ids[i]],
            vectors=embeddings[i],
            payloads=[payload[i]]
        )
    )
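
The loop above sends one point per upsert call. For larger collections it may be preferable to upload several points per call; the sketch below shows one way to slice the ids, vectors and payloads into batches in plain Python. The helper name batched, the batch size and the toy data are assumptions for illustration, and each vector is assumed to be a flat list of floats.

```python
def batched(ids, vectors, payloads, batch_size):
    """Yield (ids, vectors, payloads) slices of at most batch_size items each."""
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        yield ids[start:end], vectors[start:end], payloads[start:end]


# toy data standing in for the real ids, embeddings and payloads
ids = list(range(5))
vectors = [[float(i), 0.0] for i in ids]
payloads = [{"image_id": i} for i in ids]

batches = list(batched(ids, vectors, payloads, batch_size=2))
for batch_ids, batch_vectors, batch_payloads in batches:
    # each triple could be passed to models.Batch(ids=..., vectors=..., payloads=...)
    print(batch_ids)
```

With six face images the difference is negligible, so the one-point-per-call loop is fine for this walkthrough.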

Verify that the vectors were uploaded by counting them

client.count(
    collection_name=my_collection,
    exact=True,
)
##Response
CountResult(count=6)

Visually inspect the collection we created

client.scroll(
    collection_name=my_collection,
    limit=10
)

Image search

Load a new image and extract the face

load_image_path = '/content/target/Aishw.jpeg'
target_image_path = '/content/black.jpeg'
detect_face(load_image_path, target_image_path)

Inspect the saved image

Image.open("/content/black.jpeg")

Generate the image embedding

query_embedding = generate_embeddings("/content/black.jpeg")
print(type(query_embedding))
#
print(query_embedding.shape)
##Response
numpy.ndarray
(1, 768)

Search the collection to identify the provided input image

results = client.search(
    collection_name=my_collection,
    query_vector=query_embedding[0],
    limit=5,
    with_payload=True
)
print(results)
files_list= [ os.path.join("/content/target",f) for f in os.listdir("/content/target")]
print(files_list)
##Response
[ScoredPoint(id=3, version=0, score=0.9999998807907104, payload={'image_id': 3, 'name': 'Aishw'}, vector=None, shard_key=None),
 ScoredPoint(id=2, version=0, score=0.9999998807907104, payload={'image_id': 2, 'name': 'deepika'}, vector=None, shard_key=None),
 ScoredPoint(id=1, version=0, score=0.9999998807907104, payload={'image_id': 1, 'name': 'nohra'}, vector=None, shard_key=None),
 ScoredPoint(id=0, version=0, score=0.9999998807907104, payload={'image_id': 0, 'name': 'kajol'}, vector=None, shard_key=None),
 ScoredPoint(id=5, version=0, score=0.9999998211860657, payload={'image_id': 5, 'name': 'kareena'}, vector=None, shard_key=None)]

['/content/target/kajol.jpeg',
 '/content/target/nohra.jpeg',
 '/content/target/deepika.jpeg',
 '/content/target/Aishw.jpeg',
 '/content/target/aish.jpeg',
 '/content/target/kareena.jpeg']

Helper function to display the results

def see_images(results, top_k=2):
    for i in range(top_k):
        image_id = results[i].payload['image_id']
        name    = results[i].payload['name']
        score = results[i].score
        image = Image.open(files_list[image_id])
        print(f"Result #{i+1}: {name} was diagnosed with {score * 100} confidence")
        print(f"This image score was {score}")
        display(image)
        print("-" * 50)
        print()

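Show the search results: display the top five matching images

see_images(results, top_k=5)

Image search results

Result #1: Aishw was diagnosed with 99.99998807907104 confidence

This image score was 0.9999998807907104

Result #2: deepika was diagnosed with 99.99998807907104 confidence

This image score was 0.9999998807907104

Result #3: nohra was diagnosed with 99.99998807907104 confidence

This image score was 0.9999998807907104

Result #4: kajol was diagnosed with 99.99998807907104 confidence

This image score was 0.9999998807907104

Result #5: kareena was diagnosed with 99.99998211860657 confidence

This image score was 0.9999998211860657

As you can see, we queried with an existing image and got back the original image along with other images. The similarity score indicates how close the query image is to each image in the database.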

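Method 2: Image recognition with Transformers and Qdrant

Besides OpenCV, we can also use Vision Transformers to perform the same task. Refer to the sample code below:

Code implementation

Install the required dependencies

pip install -qU qdrant-client transformers datasets

Import the required libraries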

from transformers import ViTImageProcessor, ViTModel
from qdrant_client import QdrantClient
from qdrant_client.http import models
from datasets import load_dataset
from PIL import Image
import numpy as np
import torch

Set up the vector store

# Create a local Qdrant vector store
client =QdrantClient(path="qdrant_db")
my_collection = "image_collection"
client.recreate_collection(
    collection_name=my_collection,
    vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE)
)

Load the model

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
processor = ViTImageProcessor.from_pretrained('facebook/dino-vits16')
model = ViTModel.from_pretrained('facebook/dino-vits16').to(device)

Preprocess the images in the photos folder and load them into a DataFrame

import pandas as pd
import os
image_file = []
image_name =[]
#
for file in os.listdir("/content/photos"):
    if file.endswith(".jpeg"):
        image_name.append(file.split(".")[0])
        image_file.append(Image.open(os.path.join("/content/photos",file)))
#
df = pd.DataFrame({"Image":image_file,"Name":image_name})
descriptions = df['Name'].tolist()
print(descriptions)

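Generate embeddings with ViT

In computer vision systems, vector databases are used to store image features. These features are vector representations of images that capture their visual content, and they are used to improve performance on computer vision tasks such as object detection, image classification and image retrieval.

To extract these feature representations from our images we use a Vision Transformer (ViT). ViTs are advanced models that let computers "see" and interpret visual information in a way loosely similar to humans. They use the transformer architecture to process an image and extract meaningful features from it.

To understand how ViTs work, imagine a large jigsaw puzzle with many pieces. To solve it, you look at the individual pieces, their shapes, and how they fit together into the complete picture. ViTs work in a similar way: rather than looking at the whole image at once, they split it into smaller parts called "patches". Each patch is like one puzzle piece that captures a specific part of the image, and the ViT then analyses and processes these pieces.

By analysing the patches, a ViT identifies important patterns such as edges, colours and textures, and combines them into a coherent understanding of the image.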
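
The patch idea can be sketched in a few lines of plain Python. The example below splits a toy single-channel grid into non-overlapping square patches; the 224x224 input size and the 16x16 patch size are assumptions chosen to mirror a typical ViT-S/16 setup, and split_into_patches is a hypothetical helper rather than anything from the transformers library.

```python
def split_into_patches(image, patch_size):
    """Split a 2D grid (list of rows) into non-overlapping patch_size x patch_size patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patch = [row[left:left + patch_size] for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches


# a toy 224x224 single-channel "image" cut into 16x16 patches
image = [[0] * 224 for _ in range(224)]
patches = split_into_patches(image, 16)
print(len(patches))
```

Under these assumptions the image yields a 14 by 14 grid of patches, and the model turns each patch into one token vector.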

final_embeddings = []
for item in df['Image'].values.tolist():
    inputs = processor(images=item, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model(**inputs).last_hidden_state.mean(dim=1).cpu().numpy()
    final_embeddings.append(outputs)
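
The call to last_hidden_state.mean(dim=1) in the loop above averages the per-token vectors into a single embedding per image. A minimal plain-Python sketch of that mean pooling, using toy four-dimensional vectors and a hypothetical helper named mean_pool, looks like this:

```python
def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one vector."""
    count = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[d] for vec in token_vectors) / count for d in range(dim)]


# three toy "token" vectors of dimension 4 (the collection above uses 384 dimensions)
tokens = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
]
pooled = mean_pool(tokens)
print(pooled)
```

Pooling gives every image a vector of the same length regardless of how many tokens the model produced, which is what the vector store needs.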

Save the embeddings

np.save("vectors", np.array(final_embeddings), allow_pickle=False)

Generate metadata

payload = []
for i in range(df.shape[0]):
    payload.append({"image_id" :i,
                    "name":df.iloc[i]['Name']})
ids = list(range(df.shape[0]))
embeddings = np.load("vectors.npy").tolist()

Load the embeddings into the vector store

for i in range(0, df.shape[0]):
    client.upsert(
        collection_name=my_collection,
        points=models.Batch(
            ids=[ids[i]],
            vectors=embeddings[i],
            payloads=[payload[i]]
        )
    )
#check if the update is successful
client.count(
    collection_name=my_collection,
    exact=True,
)
#To visually inspect the collection we just created, we can scroll through our vectors with the client.scroll() method.
client.scroll(
    collection_name=my_collection,
    limit=10
)

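Search the vector store for an image

img = Image.open("YOUR IMAGE PATH")
inputs = processor(images=img, return_tensors="pt").to(device)
# inference only, so no gradients are needed
with torch.no_grad():
    one_embedding = model(**inputs).last_hidden_state
# mean-pool the token embeddings into one query vector, as was done when indexing
results = client.search(
    collection_name=my_collection,
    query_vector=one_embedding.mean(dim=1)[0].tolist(),
    limit=5,
    with_payload=True
)
see_images(results, top_k=5)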

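Search results

Original image to be searched

Result #1: Aishw was diagnosed with 100.00000144622251 confidence

This image score was 1.0000000144622252

Result #2: deepika was diagnosed with 90.48531271076924 confidence

This image score was 0.9048531271076924

Result #3: nohra was diagnosed with 88.62201422801974 confidence

This image score was 0.8862201422801974

Result #4: aish was diagnosed with 87.71421890846095 confidence

This image score was 0.8771421890846095

Result #5: kareena was diagnosed with 86.80090570447916 confidence

This image score was 0.8680090570447916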

Source: https://medium.com/@nayakpplaban/building-an-application-for-facial-recognition-using-python-opencv-transformers-and-qdrant-a144871f40d9
