Recently, a client stopped by and asked if Armedia had, or knew of, any open source face matching software that could scan a video and determine whether a suspect’s face appeared anywhere in it.
Our team was intrigued.
Just how available, and how accurate, is this type of software today? The request launched a line of investigation that not only solved our client’s problem but proved insightful and rewarding.
A quick Internet search for open source products that could match a probe face in a video produced limited results, so we expanded our search parameters slightly to include low-cost cloud services. The products we decided to investigate were:

- Amazon Rekognition
- Microsoft’s Face Cognitive Service (Azure)
- pyAnnote
- Python Face Recognition
We assigned a developer to each product to prototype a solution, given only the following minimum requirements from our client:
- The solution shall identify (i.e., indicate time off-sets) all occurrences of a suspect’s face in a video.
- The solution shall output a CSV file for each suspect indicating in which videos they were detected, and at what time off-sets.
- The solution shall output a CSV file for each video containing the name and time off-set at which each suspect was identified in the video.
- The solution shall accept all probe images as a group.
- The solution shall accept all videos as a group.
- The solution shall perform better than real-time (i.e., a 5-minute video should be processed in less than 5 minutes).
- The solution shall have adjustable operating parameters to affect the sensitivity of detection and computer resource usage.
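The two CSV requirements above are essentially a pivot of the same detection data: grouped once by suspect and once by video. A minimal sketch of that reporting step is shown below; the record layout and column names are our own hypothetical choices, not part of any of the evaluated products.

```python
import csv
import io
from collections import defaultdict

# Hypothetical detection records: (suspect, video, offset in seconds).
# In a real pipeline these would come from the face-matching engine.
detections = [
    ("suspect_a", "video_01.mp4", 12.5),
    ("suspect_a", "video_02.mp4", 3.0),
    ("suspect_b", "video_01.mp4", 47.2),
]

def per_suspect_reports(detections):
    """One CSV per suspect: which videos they appeared in, and when."""
    by_suspect = defaultdict(list)
    for suspect, video, offset in detections:
        by_suspect[suspect].append((video, offset))
    reports = {}
    for suspect, rows in by_suspect.items():
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(["video", "offset_seconds"])
        writer.writerows(sorted(rows))
        reports[suspect] = buf.getvalue()
    return reports

def per_video_reports(detections):
    """One CSV per video: which suspects were identified, and when."""
    by_video = defaultdict(list)
    for suspect, video, offset in detections:
        by_video[video].append((suspect, offset))
    reports = {}
    for video, rows in by_video.items():
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(["suspect", "offset_seconds"])
        writer.writerows(sorted(rows))
        reports[video] = buf.getvalue()
    return reports
```

Writing both views from one detection list keeps the two reports consistent with each other by construction.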
Early in the project, the pyAnnote developer determined that although the accuracy and output format of the pyAnnote software were impressive, it was intolerably slow: processing took roughly 2-3 times the video’s duration. She therefore abandoned pyAnnote as a possible solution and moved on to investigate the Python Face Recognition software, taking a different tack than the developer already investigating that product.
With these broad requirements, developers were free to architect creative solutions. Therefore, each project took on its own character and individual goals.
- Face to Face – a desktop, Java-based UI, which leveraged Amazon’s Rekognition API in addition to their S3 storage infrastructure. This project could scale as demand required.
- Azure Multi-Imaged Face Identifier (AMIFI) – a desktop, C#-based UI, which leveraged Microsoft’s Face Cognitive Service. This project could accept multiple samples of a face to build a more comprehensive model of the suspect.
- Distributed Face Recognition (DFaR) – a massive, enterprise-level solution utilizing multiple servers, GPUs, load-balanced queueing, etc. This project leveraged Python Face Recognition, had no GUI, and was designed to be integrated into a larger forensic process.
- Face Recognizer – the other project using Python Face Recognition took the opposite approach and built a solution that ran on a laptop with no special hardware or network connections.
The developers and the project managers met weekly for 3 months to demonstrate progress, discuss problems, and solicit advice from one another. These meetings were dubbed “face offs” as they took on a tone of friendly competition. The developers went all out to produce the most accurate and fastest-performing solution.
Two interesting findings resulted from these collaborative/competitive meetings.
- None of these software products could match face images in videos directly. They were designed to compare and match faces in still images. As a result, all the developers converted the videos into sequential series of still images (i.e., frames) using FFmpeg and compared the frames to the probe image.
- The conversion of videos to frames resulted in thousands (sometimes tens of thousands) of JPG files. Often, the majority of these images contained no faces and incurred processing and resource usage penalties unnecessarily. This was especially the case with cloud solutions where storage and processing incurred real costs. To address this problem, all of the developers filtered the images using Dlib. Only the frames in which Dlib indicated a face was present were allowed to pass through the identification process.
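The two-stage pipeline the developers converged on can be sketched as follows. This is a minimal illustration, not code from any of the four prototypes: the FFmpeg command uses the standard `fps` video filter to sample frames, and the face check is injected as a callable so the filtering logic can be shown without Dlib installed. In the real pipelines, `detect_faces` would wrap `dlib.get_frontal_face_detector()`.

```python
def ffmpeg_frame_command(video_path, out_pattern, fps=1):
    """Build the FFmpeg command that explodes a video into JPG frames.

    The fps argument controls the sampling rate; a higher rate catches
    brief appearances but multiplies storage and processing costs.
    The caller would run this via subprocess.run().
    """
    return ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}", out_pattern]

def frames_with_faces(frame_paths, detect_faces):
    """Keep only the frames in which the detector reports a face.

    detect_faces(path) returns a list of face locations; frames with an
    empty result are dropped before the (costly) identification step.
    """
    return [path for path in frame_paths if len(detect_faces(path)) > 0]
```

Filtering with a cheap local detector before identification is what kept the cloud-based prototypes affordable: only face-bearing frames were uploaded and billed.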
Perhaps the most interesting result was that all of these prototypes and underlying technologies achieved nearly the same level of accuracy (see Precision-Recall chart in Figure 1). Open source, community-driven technology held its own against the technologies developed by the global R&D powerhouse companies.
Although the results were nearly identical, that doesn’t mean they were sound. Looking at the Precision-Recall chart in Figure 1, you see that the data points cluster around 0.6 precision and 0.45 recall. This means that when the software identified a suspect, it was correct 60% of the time, while it found only 45% of the suspect appearances it was presented with. Ideally, both precision and recall should be high for an accurate, well-performing solution.
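For readers less familiar with these metrics, the arithmetic is straightforward. The counts below are hypothetical, chosen only to reproduce the ~0.6/0.45 clustering reported in Figure 1; they are not the actual tallies from our evaluation.

```python
def precision_recall(tp, fp, fn):
    """Precision: fraction of reported identifications that were correct.
    Recall: fraction of true suspect appearances that were found."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Illustrative counts: 9 correct matches, 6 false alarms, 11 misses.
p, r = precision_recall(tp=9, fp=6, fn=11)
# p = 9 / 15 = 0.6, r = 9 / 20 = 0.45
```

Note the trade-off: lowering the match threshold raises recall (fewer misses) at the cost of precision (more false alarms), which is why adjustable sensitivity was one of the client’s requirements.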
It is important to note that our test videos and real-life videos were low quality, grainy, surveillance videos. We would expect better results using higher quality videos.
Armedia satisfactorily identified 19 suspects in nearly 6,000 surveillance videos, with confidence values and time off-sets so that each match could be manually confirmed.
The Face Match Face Off is a good example of the type of research and development Armedia brings to client engagements. Our team is well versed and well equipped to research, develop, prototype, and deploy technologies in the areas of computer vision, machine learning, big data, signals processing, and computer forensics. The Face Match Face Off not only solved a real-world problem for our client, it provided them with information about four leading face recognition technologies, enabling more informed decision making in future technology investments.