Hello everyone, welcome back! In this blog, we will learn how to use one-shot learning to recognize someone from a single image. Normally, we need multiple images of a person to recognize their face using a Deep Learning model. But in real-world applications, we can't collect tons of data for every user of our application. How do we deal with those scenarios? Let's check it out.
One-Shot Learning
In one-shot learning, we use only a few images, or even a single image, to recognize a user's face. But, as we all know, Deep Learning models require large amounts of data to learn anything useful. So we will reuse the pretrained weights of a popular Deep Learning network called FaceNet, along with its architecture, to get the embeddings of our new images.
FaceNet takes an image as input and converts it into a vector embedding of length 128. We will make use of the pretrained weights of the FaceNet model from its Keras implementation, along with the utility functions and the architecture definition from this github repo.
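Once faces are mapped to embeddings, recognition reduces to a distance check: embeddings of the same person should lie close together. Here's a minimal sketch of that idea (the helper name and the 0.75 threshold are our own choices for illustration; 0.75 is also the threshold we use later in this post):
import numpy as np

def is_same_person(embedding_a, embedding_b, threshold=0.75):
    # Both inputs are L2-normalized 128-d FaceNet embeddings;
    # a small Euclidean distance means the faces likely match
    return np.linalg.norm(embedding_a - embedding_b) < threshold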
Let’s implement the FaceNet architecture and load weights into our model.
Import modules
from keras.models import Model
from keras.layers import Conv2D, ZeroPadding2D, Activation, Input, concatenate
from keras.layers.normalization import BatchNormalization
from keras.layers.pooling import MaxPooling2D, AveragePooling2D
from keras.layers.core import Lambda, Flatten, Dense
from keras import backend as K
import cv2
import os
import numpy as np
from utils import LRN2D
import utils
Defining the Keras Model
myInput = Input(shape=(96, 96, 3))
x = ZeroPadding2D(padding=(3, 3), input_shape=(96, 96, 3))(myInput)
x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(x)
x = BatchNormalization(axis=3, epsilon=0.00001, name='bn1')(x)
x = Activation('relu')(x)
x = ZeroPadding2D(padding=(1, 1))(x)
x = MaxPooling2D(pool_size=3, strides=2)(x)
x = Lambda(LRN2D, name='lrn_1')(x)
x = Conv2D(64, (1, 1), name='conv2')(x)
x = BatchNormalization(axis=3, epsilon=0.00001, name='bn2')(x)
x = Activation('relu')(x)
x = ZeroPadding2D(padding=(1, 1))(x)
x = Conv2D(192, (3, 3), name='conv3')(x)
x = BatchNormalization(axis=3, epsilon=0.00001, name='bn3')(x)
x = Activation('relu')(x)
x = Lambda(LRN2D, name='lrn_2')(x)
x = ZeroPadding2D(padding=(1, 1))(x)
x = MaxPooling2D(pool_size=3, strides=2)(x)
# Inception3a
inception_3a_3x3 = Conv2D(96, (1, 1), name='inception_3a_3x3_conv1')(x)
inception_3a_3x3 = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3a_3x3_bn1')(inception_3a_3x3)
inception_3a_3x3 = Activation('relu')(inception_3a_3x3)
inception_3a_3x3 = ZeroPadding2D(padding=(1, 1))(inception_3a_3x3)
inception_3a_3x3 = Conv2D(128, (3, 3), name='inception_3a_3x3_conv2')(inception_3a_3x3)
inception_3a_3x3 = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3a_3x3_bn2')(inception_3a_3x3)
inception_3a_3x3 = Activation('relu')(inception_3a_3x3)
inception_3a_5x5 = Conv2D(16, (1, 1), name='inception_3a_5x5_conv1')(x)
inception_3a_5x5 = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3a_5x5_bn1')(inception_3a_5x5)
inception_3a_5x5 = Activation('relu')(inception_3a_5x5)
inception_3a_5x5 = ZeroPadding2D(padding=(2, 2))(inception_3a_5x5)
inception_3a_5x5 = Conv2D(32, (5, 5), name='inception_3a_5x5_conv2')(inception_3a_5x5)
inception_3a_5x5 = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3a_5x5_bn2')(inception_3a_5x5)
inception_3a_5x5 = Activation('relu')(inception_3a_5x5)
inception_3a_pool = MaxPooling2D(pool_size=3, strides=2)(x)
inception_3a_pool = Conv2D(32, (1, 1), name='inception_3a_pool_conv')(inception_3a_pool)
inception_3a_pool = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3a_pool_bn')(inception_3a_pool)
inception_3a_pool = Activation('relu')(inception_3a_pool)
inception_3a_pool = ZeroPadding2D(padding=((3, 4), (3, 4)))(inception_3a_pool)
inception_3a_1x1 = Conv2D(64, (1, 1), name='inception_3a_1x1_conv')(x)
inception_3a_1x1 = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3a_1x1_bn')(inception_3a_1x1)
inception_3a_1x1 = Activation('relu')(inception_3a_1x1)
inception_3a = concatenate([inception_3a_3x3, inception_3a_5x5, inception_3a_pool, inception_3a_1x1], axis=3)
# Inception3b
inception_3b_3x3 = Conv2D(96, (1, 1), name='inception_3b_3x3_conv1')(inception_3a)
inception_3b_3x3 = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3b_3x3_bn1')(inception_3b_3x3)
inception_3b_3x3 = Activation('relu')(inception_3b_3x3)
inception_3b_3x3 = ZeroPadding2D(padding=(1, 1))(inception_3b_3x3)
inception_3b_3x3 = Conv2D(128, (3, 3), name='inception_3b_3x3_conv2')(inception_3b_3x3)
inception_3b_3x3 = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3b_3x3_bn2')(inception_3b_3x3)
inception_3b_3x3 = Activation('relu')(inception_3b_3x3)
inception_3b_5x5 = Conv2D(32, (1, 1), name='inception_3b_5x5_conv1')(inception_3a)
inception_3b_5x5 = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3b_5x5_bn1')(inception_3b_5x5)
inception_3b_5x5 = Activation('relu')(inception_3b_5x5)
inception_3b_5x5 = ZeroPadding2D(padding=(2, 2))(inception_3b_5x5)
inception_3b_5x5 = Conv2D(64, (5, 5), name='inception_3b_5x5_conv2')(inception_3b_5x5)
inception_3b_5x5 = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3b_5x5_bn2')(inception_3b_5x5)
inception_3b_5x5 = Activation('relu')(inception_3b_5x5)
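# The next four layers (square, 3x3 average pool, scale by 9, square root)
# together implement L2 pooling: the L2 norm over each 3x3 window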
inception_3b_pool = Lambda(lambda x: x**2, name='power2_3b')(inception_3a)
inception_3b_pool = AveragePooling2D(pool_size=(3, 3), strides=(3, 3))(inception_3b_pool)
inception_3b_pool = Lambda(lambda x: x*9, name='mult9_3b')(inception_3b_pool)
inception_3b_pool = Lambda(lambda x: K.sqrt(x), name='sqrt_3b')(inception_3b_pool)
inception_3b_pool = Conv2D(64, (1, 1), name='inception_3b_pool_conv')(inception_3b_pool)
inception_3b_pool = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3b_pool_bn')(inception_3b_pool)
inception_3b_pool = Activation('relu')(inception_3b_pool)
inception_3b_pool = ZeroPadding2D(padding=(4, 4))(inception_3b_pool)
inception_3b_1x1 = Conv2D(64, (1, 1), name='inception_3b_1x1_conv')(inception_3a)
inception_3b_1x1 = BatchNormalization(axis=3, epsilon=0.00001, name='inception_3b_1x1_bn')(inception_3b_1x1)
inception_3b_1x1 = Activation('relu')(inception_3b_1x1)
inception_3b = concatenate([inception_3b_3x3, inception_3b_5x5, inception_3b_pool, inception_3b_1x1], axis=3)
# Inception3c
inception_3c_3x3 = utils.conv2d_bn(inception_3b,
                                   layer='inception_3c_3x3',
                                   cv1_out=128,
                                   cv1_filter=(1, 1),
                                   cv2_out=256,
                                   cv2_filter=(3, 3),
                                   cv2_strides=(2, 2),
                                   padding=(1, 1))
inception_3c_5x5 = utils.conv2d_bn(inception_3b,
                                   layer='inception_3c_5x5',
                                   cv1_out=32,
                                   cv1_filter=(1, 1),
                                   cv2_out=64,
                                   cv2_filter=(5, 5),
                                   cv2_strides=(2, 2),
                                   padding=(2, 2))
inception_3c_pool = MaxPooling2D(pool_size=3, strides=2)(inception_3b)
inception_3c_pool = ZeroPadding2D(padding=((0, 1), (0, 1)))(inception_3c_pool)
inception_3c = concatenate([inception_3c_3x3, inception_3c_5x5, inception_3c_pool], axis=3)
# Inception4a
inception_4a_3x3 = utils.conv2d_bn(inception_3c,
                                   layer='inception_4a_3x3',
                                   cv1_out=96,
                                   cv1_filter=(1, 1),
                                   cv2_out=192,
                                   cv2_filter=(3, 3),
                                   cv2_strides=(1, 1),
                                   padding=(1, 1))
inception_4a_5x5 = utils.conv2d_bn(inception_3c,
                                   layer='inception_4a_5x5',
                                   cv1_out=32,
                                   cv1_filter=(1, 1),
                                   cv2_out=64,
                                   cv2_filter=(5, 5),
                                   cv2_strides=(1, 1),
                                   padding=(2, 2))
inception_4a_pool = Lambda(lambda x: x**2, name='power2_4a')(inception_3c)
inception_4a_pool = AveragePooling2D(pool_size=(3, 3), strides=(3, 3))(inception_4a_pool)
inception_4a_pool = Lambda(lambda x: x*9, name='mult9_4a')(inception_4a_pool)
inception_4a_pool = Lambda(lambda x: K.sqrt(x), name='sqrt_4a')(inception_4a_pool)
inception_4a_pool = utils.conv2d_bn(inception_4a_pool,
                                    layer='inception_4a_pool',
                                    cv1_out=128,
                                    cv1_filter=(1, 1),
                                    padding=(2, 2))
inception_4a_1x1 = utils.conv2d_bn(inception_3c,
                                   layer='inception_4a_1x1',
                                   cv1_out=256,
                                   cv1_filter=(1, 1))
inception_4a = concatenate([inception_4a_3x3, inception_4a_5x5, inception_4a_pool, inception_4a_1x1], axis=3)
# Inception4e
inception_4e_3x3 = utils.conv2d_bn(inception_4a,
                                   layer='inception_4e_3x3',
                                   cv1_out=160,
                                   cv1_filter=(1, 1),
                                   cv2_out=256,
                                   cv2_filter=(3, 3),
                                   cv2_strides=(2, 2),
                                   padding=(1, 1))
inception_4e_5x5 = utils.conv2d_bn(inception_4a,
                                   layer='inception_4e_5x5',
                                   cv1_out=64,
                                   cv1_filter=(1, 1),
                                   cv2_out=128,
                                   cv2_filter=(5, 5),
                                   cv2_strides=(2, 2),
                                   padding=(2, 2))
inception_4e_pool = MaxPooling2D(pool_size=3, strides=2)(inception_4a)
inception_4e_pool = ZeroPadding2D(padding=((0, 1), (0, 1)))(inception_4e_pool)
inception_4e = concatenate([inception_4e_3x3, inception_4e_5x5, inception_4e_pool], axis=3)
# Inception5a
inception_5a_3x3 = utils.conv2d_bn(inception_4e,
                                   layer='inception_5a_3x3',
                                   cv1_out=96,
                                   cv1_filter=(1, 1),
                                   cv2_out=384,
                                   cv2_filter=(3, 3),
                                   cv2_strides=(1, 1),
                                   padding=(1, 1))
inception_5a_pool = Lambda(lambda x: x**2, name='power2_5a')(inception_4e)
inception_5a_pool = AveragePooling2D(pool_size=(3, 3), strides=(3, 3))(inception_5a_pool)
inception_5a_pool = Lambda(lambda x: x*9, name='mult9_5a')(inception_5a_pool)
inception_5a_pool = Lambda(lambda x: K.sqrt(x), name='sqrt_5a')(inception_5a_pool)
inception_5a_pool = utils.conv2d_bn(inception_5a_pool,
                                    layer='inception_5a_pool',
                                    cv1_out=96,
                                    cv1_filter=(1, 1),
                                    padding=(1, 1))
inception_5a_1x1 = utils.conv2d_bn(inception_4e,
                                   layer='inception_5a_1x1',
                                   cv1_out=256,
                                   cv1_filter=(1, 1))
inception_5a = concatenate([inception_5a_3x3, inception_5a_pool, inception_5a_1x1], axis=3)
# Inception5b
inception_5b_3x3 = utils.conv2d_bn(inception_5a,
                                   layer='inception_5b_3x3',
                                   cv1_out=96,
                                   cv1_filter=(1, 1),
                                   cv2_out=384,
                                   cv2_filter=(3, 3),
                                   cv2_strides=(1, 1),
                                   padding=(1, 1))
inception_5b_pool = MaxPooling2D(pool_size=3, strides=2)(inception_5a)
inception_5b_pool = utils.conv2d_bn(inception_5b_pool,
                                    layer='inception_5b_pool',
                                    cv1_out=96,
                                    cv1_filter=(1, 1))
inception_5b_pool = ZeroPadding2D(padding=(1, 1))(inception_5b_pool)
inception_5b_1x1 = utils.conv2d_bn(inception_5a,
                                   layer='inception_5b_1x1',
                                   cv1_out=256,
                                   cv1_filter=(1, 1))
inception_5b = concatenate([inception_5b_3x3, inception_5b_pool, inception_5b_1x1], axis=3)
av_pool = AveragePooling2D(pool_size=(3, 3), strides=(1, 1))(inception_5b)
reshape_layer = Flatten()(av_pool)
dense_layer = Dense(128, name='dense_layer')(reshape_layer)
norm_layer = Lambda(lambda x: K.l2_normalize(x, axis=1), name='norm_layer')(dense_layer)
# Final Model
model = Model(inputs=[myInput], outputs=norm_layer)
Loading Weights
weights = utils.weights
weights_dict = utils.load_weights()
# Set layer weights of the model
for name in weights:
    if model.get_layer(name) is not None:
        model.get_layer(name).set_weights(weights_dict[name])
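With the weights loaded, a quick sanity check (an optional snippet, not part of the original code) confirms that the model maps a 96x96 RGB input to a 128-dimensional unit vector:
dummy = np.random.rand(1, 96, 96, 3).astype(np.float32)
emb = model.predict(dummy)
print(emb.shape)                    # (1, 128)
print(np.linalg.norm(emb, axis=1))  # ~[1.], thanks to the l2_normalize layer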
Recognizing Input Faces
Now that we have our pretrained model ready, we need to capture each user's face at least once. For better results, we first detect the face, crop it, and only then pass the crop to our FaceNet model to compute the corresponding embedding.
To detect and crop faces, we will make use of a Haar cascade with OpenCV's CascadeClassifier. The haarcascade_frontalface_default.xml file is what lets us detect faces in a given image.
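As a quick illustration, detecting and cropping a single face could look like this (the file names test.jpg and test_face.jpg are just placeholders for this sketch):
import cv2

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
img = cv2.imread('test.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, 1.3, 5)
if len(faces) > 0:
    # Keep the largest detection and crop it out of the frame
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    cv2.imwrite('test_face.jpg', img[y:y + h, x:x + w])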
Here's the Python code that takes a detected face all the way to an identity.
Image to Embedding using the pretrained model
def image_to_embedding(image, model):
    # FaceNet expects 96x96 RGB inputs
    image = cv2.resize(image, (96, 96))
    img = image[..., ::-1]                      # BGR (OpenCV) -> RGB
    img = np.around(img / 255.0, decimals=12)   # scale pixel values to [0, 1]
    x_train = np.array([img])                   # add a batch dimension
    embedding = model.predict_on_batch(x_train)
    return embedding
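For example, assuming a cropped face saved as alice.jpg (the file name is just a placeholder):
face = cv2.imread('alice.jpg', 1)
embedding = image_to_embedding(face, model)
print(embedding.shape)  # (1, 128)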
Euclidean Similarity
We recognize faces in our test frames by comparing their embeddings with the stored train embeddings, using Euclidean distance as the similarity measure. The closest match is accepted only if its distance falls below a threshold (0.75 here).
def recognize_face(face_image, input_embeddings, model):
    embedding = image_to_embedding(face_image, model)
    minimum_distance = 100
    name = None
    # Find the stored embedding closest to the input face
    for (input_name, input_embedding) in input_embeddings.items():
        euclidean_distance = np.linalg.norm(embedding - input_embedding)
        if euclidean_distance < minimum_distance:
            minimum_distance = euclidean_distance
            name = input_name
    # Accept the match only if it is close enough
    if minimum_distance < 0.75:
        return str(name)
    else:
        return None
Utility functions
We use these functions to convert our training image set into input embeddings. We also detect faces in the test frames of our video using cv2's CascadeClassifier, then convert those faces into embeddings to perform a similarity search against the training images.
import glob

def create_input_image_embeddings():
    input_embeddings = {}
    # Each image in images/ becomes one identity, keyed by its file name
    for file in glob.glob("images/*"):
        person_name = os.path.splitext(os.path.basename(file))[0]
        image_file = cv2.imread(file, 1)
        input_embeddings[person_name] = image_to_embedding(image_file, model)
    return input_embeddings

def recognize_faces_in_cam(input_embeddings):
    cv2.namedWindow("Face Recognizer")
    vc = cv2.VideoCapture(0)
    font = cv2.FONT_HERSHEY_SIMPLEX
    face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
    while vc.isOpened():
        _, frame = vc.read()
        img = frame
        height, width, channels = frame.shape
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.3, 5)
        # Loop through all the faces detected and label the ones we recognize
        for (x, y, w, h) in faces:
            x1, y1, x2, y2 = x, y, x + w, y + h
            face_image = frame[max(0, y1):min(height, y2), max(0, x1):min(width, x2)]
            identity = recognize_face(face_image, input_embeddings, model)
            if identity is not None:
                img = cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 255, 255), 2)
                cv2.putText(img, str(identity), (x1 + 5, y1 - 5), font, 1, (255, 255, 255), 2)
        cv2.imshow("Face Recognizer", img)
        key = cv2.waitKey(100)
        if key == 27:  # exit on ESC
            break
    vc.release()
    cv2.destroyAllWindows()
Let's create our training set using the webcam
If you run this, a window pops up and keeps detecting faces until 10 face crops have been captured. Each crop is saved as images/User_<n>.jpg, and its file name becomes the label shown during recognition.
cam = cv2.VideoCapture(0)
face_detector = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
count = 0
while True:
    ret, img = cam.read()
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(gray, 1.3, 5)
    for (x, y, w, h) in faces:
        x1, y1, x2, y2 = x, y, x + w, y + h
        count += 1
        # Save the face crop before drawing on the frame,
        # so the rectangle doesn't end up in the training image
        cv2.imwrite("images/User_" + str(count) + ".jpg", img[y1:y2, x1:x2])
        cv2.rectangle(img, (x1, y1), (x2, y2), (255, 255, 255), 2)
    cv2.imshow('image', img)
    k = cv2.waitKey(200) & 0xff  # press 'ESC' to exit the video
    if k == 27:
        break
    elif count >= 10:  # take 10 face samples and stop
        break
cam.release()
cv2.destroyAllWindows()
Let’s Test
input_embeddings = create_input_image_embeddings()
recognize_faces_in_cam(input_embeddings)
Good Job!
Oooh, that worked well with just a few images in the train set! To summarize, one-shot learning boils down to computing embeddings for both the training and test images, then performing a similarity search between them to recognize faces. You can find the whole code in our github repo.
That's all for this blog. Post your comments if you have tried this out. We will meet again with another interesting blog. Until then, stay safe. Cheers!