Detecting a stranger through a CCTV camera and alerting the owner




Presented by Madhan Kumar Selvaraj

In this blog, we are going to see some interesting stuff related to detection. Globally, there are around 25 million CCTV cameras installed in homes. Sadly, a robbery, theft, or burglary happens every 3 minutes.

Real-world issue

Even though we have access to live-streaming CCTV footage through the internet, most of the time we fail to check the footage at the exact time of a robbery. So, there is a need for a smart alert system.




One-liner of the project

Detecting any person other than the family members in the CCTV footage and alerting the house owner by email.

Technologies used here

  • OpenCV computer vision library
  • FaceNet CNN model
  • MTCNN (Multi-task Cascaded Convolutional Neural Networks)
  • Support Vector Machine algorithm
  • Matplotlib library

The workflow of the project

  1. Focus only on the human subjects in the CCTV footage, ignoring animals, birds, etc.
  2. Take a screenshot of the video once the eyes, nose, and mouth of a person are detected
  3. Extract the face of each person from the screenshot image
  4. Train a model on the images of the family members
  5. Compare the images of each detected person with those of the family members
  6. Separate the strangers from the family members
  7. Mail the details of the stranger to the house owner

Classifying humans

Before getting into the technique of detecting a human, you should understand how a machine processes an image using deep learning (Artificial Intelligence) from my previous blog. Here we are not going to train images as we did previously; instead, we are going to use the HOGDescriptor() from the OpenCV library.
We are going to do this project in Google Colab, and you don't need to install anything because everything is already available there; check this YouTube video to get familiar with it.
I added the complete code in the later part of this blog, and here I am going to include only the important parts of the project, because many people don't like wading through code. So, download the complete code and play with it.
OpenCV is a powerful Python library for image processing, and I used a warm-up video from YouTube to detect the human subjects with OpenCV's HOGDescriptor().
Note - I uploaded the video to my Google Drive under the path '/content/drive/My Drive/ColoabDataset/Video/dancetrim.mp4'.
import cv2 
import imutils 
from google.colab.patches import cv2_imshow
# Initializing the HOG person detector 
hog = cv2.HOGDescriptor() 
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector()) 
cap = cv2.VideoCapture('/content/drive/My Drive/ColoabDataset/Video/dancetrim.mp4')
while cap.isOpened(): 
    # Reading the video stream 
    ret, image = cap.read() 
    if ret: 
        image = imutils.resize(image,width=min(400, image.shape[1])) 
        # Detecting all the people in the frame
        # winStride – window stride; it must be a multiple of the block stride
        # padding – padding added around the image before running detection
        # scale – coefficient by which the detection window is scaled between pyramid levels
        (regions, _) = hog.detectMultiScale(image, winStride=(4, 4), padding=(4, 4), scale=1.05) 
   
        # Drawing the regions in the Image 
        for (x, y, w, h) in regions: 
            # cv2.rectangle(image, start_point, end_point, color, thickness) 
            cv2.rectangle(image, (x, y), (x + w, y + h),  (0, 0, 255), 2) 
        # Showing the output image 
        cv2_imshow(image) 
        if cv2.waitKey(25) & 0xFF == ord('q'): 
            break
    else: 
        break
cap.release() 
cv2.destroyAllWindows()
Using the above script, we can detect humans and mark them with rectangular boxes. The output of the script is shown as a GIF image.
A screenshot is taken whenever a human is recognized in the video, and that file is sent to our classification model to check whether the person is a family member or not. This code is on GitHub, so I am not going to repeat it here.

Face detection

Instead of using sample photos, I am going to use my own photo. It contains many trees, and most of the people in it are wearing sunglasses, which makes detecting their eyes and faces more challenging.
I am using the MTCNN (Multi-task Cascaded Convolutional Neural Networks) model, which was trained on millions of human images to detect the human face, particularly the eyes, nose, and mouth. I am also marking the detections with a box and dots.
# face detection with mtcnn on a photograph
from matplotlib import pyplot
from matplotlib.patches import Rectangle, Circle
from mtcnn.mtcnn import MTCNN

# draw an image with the detected faces marked
def draw_image_with_boxes(filename, result_list):
    # load the image
    data = pyplot.imread(filename)
    # plot the image
    pyplot.imshow(data)
    # get the context for drawing boxes
    ax = pyplot.gca()
    # plot each box
    for result in result_list:
        # get coordinates
        x, y, width, height = result['box']
        # create the shape
        rect = Rectangle((x, y), width, height, fill=False, color='red')
        # draw the box
        ax.add_patch(rect)
        # draw a dot for each keypoint (eyes, nose, mouth corners)
        for key, value in result['keypoints'].items():
            # create and draw the dot
            dot = Circle(value, radius=2, color='red')
            ax.add_patch(dot)
    # show the plot
    pyplot.show()
filename = '/content/drive/My Drive/ColoabDataset/image/frndgang.jpg'
# load image from file
pixels = pyplot.imread(filename)
# create the detector, using default weights
detector = MTCNN()
# detect faces in the image
faces = detector.detect_faces(pixels)
# display faces on the original image
draw_image_with_boxes(filename, faces)
It marks each detected face with a square box and marks the eyes, nose, and mouth corners with red dots.

Extracting the detected photos

Now we are going to extract the faces detected by our model and save them to a Drive folder. Later we'll use them to identify each person with the Support Vector Machine algorithm.
Create folders in Google Drive following the structure below and upload the photos into the respective folders. Faces extracted from the screenshots can be saved in any sub-folder of the "val" folder, which is used for testing.

StrangerDetection (folder name)
    |__data (folder name)
        |_train (folder name)
        |    |_madhan (folder name)
        |    |    |_madhanphoto1.jpg
        |    |    |_madhanphoto2.jpg
        |    |
        |    |_kishore (folder name)
        |        |_kishorephoto1.jpg
        |        |_kishorephoto2.jpg
        |_val (folder name)
            |_madhan (folder name)
            |    |_madhanphoto1.jpg
            |    |_madhanphoto2.jpg
            |
            |_kishore (folder name)
                |_kishorephoto1.jpg
                |_kishorephoto2.jpg


# extract and plot each detected face in a photograph
from matplotlib import pyplot
from mtcnn.mtcnn import MTCNN

# draw each face separately
def draw_faces(filename, result_list):
    # load the image
    data = pyplot.imread(filename)
    # plot each face as a subplot
    for i in range(len(result_list)):
        # get coordinates
        x1, y1, width, height = result_list[i]['box']
        x2, y2 = x1 + width, y1 + height
        # define subplot
        pyplot.subplot(1, len(result_list), i+1)
        pyplot.axis('off')
        # uncomment the next line to save each cropped face to Drive as well
        # pyplot.imsave('/content/drive/My Drive/ColoabDataset/StrangerDetection/data/val/%s.jpg' % (i), data[y1:y2, x1:x2])
        # plot the cropped face
        pyplot.imshow(data[y1:y2, x1:x2])
    # show the plot
    pyplot.show()
# filename = '/content/drive/My Drive/ColoabDataset/image/uday.jpg'
filename = '/content/drive/My Drive/ColoabDataset/image/frndgang.jpg'
# load image from file
pixels = pyplot.imread(filename)
# create the detector, using default weights
detector = MTCNN()
# detect faces in the image
faces = detector.detect_faces(pixels)
# display faces on the original image
draw_faces(filename, faces)
You can see here that our model extracted all the human faces from the image; MTCNN holds up very well compared to many alternative face detectors.

Train the images of family members

I treat myself (Madhan) and one of my friends (Kishore) as the family members and everyone else as strangers. So, I trained on a dozen images each of myself and Kishore. Check the previous blog for that style of image training; instead of training images that way, I am using FaceNet, a face embedding technique developed by Google and trained on millions of images. The face embedding model analyzes an image and returns a numerical vector that represents each detected face as a point in a 128-dimensional space.
Later I use the Support Vector Machine (SVM) algorithm to classify the embeddings.
# classify the face embeddings with a linear SVM
from numpy import load, expand_dims
from sklearn.preprocessing import LabelEncoder, Normalizer
from sklearn.svm import SVC
from matplotlib import pyplot as plt

# load faces
data = load('stranger-faces-dataset.npz')
testX_faces = data['arr_2']
# load face embeddings
data = load('stranger-faces-embeddings.npz')
trainX, trainy, testX, testy = data['arr_0'], data['arr_1'], data['arr_2'], data['arr_3']
# normalize input vectors
in_encoder = Normalizer(norm='l2')
trainX = in_encoder.transform(trainX)
testX = in_encoder.transform(testX)
# label encode targets
out_encoder = LabelEncoder()
out_encoder.fit(trainy)
trainy = out_encoder.transform(trainy)
testy = out_encoder.transform(testy)
# fit model
model = SVC(kernel='linear', probability=True)
model.fit(trainX, trainy)
# test the model on every example in the test dataset
for selection in range(testX.shape[0]):
  random_face_pixels = testX_faces[selection]
  random_face_emb = testX[selection]
  random_face_class = testy[selection]
  random_face_name = out_encoder.inverse_transform([random_face_class])
  # prediction for the face
  samples = expand_dims(random_face_emb, axis=0)
  yhat_class = model.predict(samples)
  yhat_prob = model.predict_proba(samples)
  # get name
  class_index = yhat_class[0]
  class_probability = yhat_prob[0,class_index] * 100
  predict_names = out_encoder.inverse_transform(yhat_class)
  if class_probability > 90:
    title = '%s (%.3f)' % (predict_names[0], class_probability)
  else:
    title = 'Stranger (%.3f)' % (100-class_probability)
    # save the stranger's face so that it can be mailed to the owner later
    plt.imsave('/content/drive/My Drive/ColoabDataset/StrangerDetection/mail/stranger%s.jpg'%(selection), random_face_pixels)
  # plot the face with the predicted name (or 'Stranger') as the title
  plt.figure(figsize=(4, 4))
  plt.title(title)
  plt.axis('off')
  plt.imshow(random_face_pixels)
plt.show()
I set the condition that a prediction above 90 percent confidence is a family member and everyone else is a stranger, since the SVM model was trained on images of only two people.

Mailing the strangers image

I am using Gmail to send the mail; it lets us attach multiple images and send to multiple recipients. Use your own email ID and password (note that for SMTP access Gmail typically requires an app password rather than your normal account password), and add the receiver's email ID.
# mailing the stranger images to the owner
import os
import pytz
import smtplib
from datetime import datetime
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

IST_time = pytz.timezone('Asia/Kolkata')
date_time = datetime.now(IST_time)
today_date = date_time.strftime('%Y-%m-%d')
time_now = date_time.strftime('%H:%M:%S')
print(date_time)
# Set up the attachments
files = "/content/drive/My Drive/ColoabDataset/StrangerDetection/mail"
filenames = [os.path.join(files, f) for f in os.listdir(files)]
number_of_person = len(filenames)
# Set up the users for the email
gmail_user = "your email ID"
gmail_pwd = "Email ID password"
recipients = ["Receiver email ID"]
# Build the HTML body of the mail (the original post leaves 'html' to the full code)
html = "<html><body><p>%d stranger(s) detected at your house on %s at %s. The captured faces are attached.</p></body></html>" % (number_of_person, today_date, time_now)
# Record the MIME type - text/html
part1 = MIMEText(html, 'html')
# Create the mail function
def mail(to, subject, attach):
   msg = MIMEMultipart()
   msg['From'] = gmail_user
   msg['To'] = ", ".join(recipients)
   msg['Subject'] = subject
   msg.attach(part1)
   #get all the attachments
   for file in filenames:
      part = MIMEBase('application', 'octet-stream')
      part.set_payload(open(file, 'rb').read())
      encoders.encode_base64(part)
      part.add_header('Content-Disposition', 'attachment; filename="%s"' % os.path.basename(file))
      msg.attach(part)
   mailServer = smtplib.SMTP("smtp.gmail.com", 587)
   mailServer.ehlo()
   mailServer.starttls()
   mailServer.ehlo()
   mailServer.login(gmail_user, gmail_pwd)
   mailServer.sendmail(gmail_user, to, msg.as_string())
   # Should be mailServer.quit(), but that crashes...
   mailServer.close()
#send it
mail(recipients, "Alert! Someone entered your house", filenames)

Coding part

The complete code for this project is available in my GitHub repository and on Colab.

Finally, we learned how to detect human movement in a video, extract the human faces from the images, and classify them. You can use this project as a sample model and improve it as you wish.

P.S - Around 85% of our problems come from worrying, as mentioned in the book "How to Stop Worrying and Start Living" by Dale Carnegie. Negativity is the root cause of most of our mental problems; it decreases self-confidence and leads to insecurity. Always stay busy, because negativity hijacks our mind when we are alone. Stay happy and healthy. This too shall pass (இதுவும் கடந்து போகும்).
