Recently, a client stopped by and asked if Armedia had, or knew of any open source, face matching software that could scan a video and determine if a suspect’s face appeared anywhere in the video.
Our team was intrigued.
In today’s world, how available and how accurate was this type of software? This request launched a line of investigation that not only solved our client’s problem but proved to be insightful and rewarding.
A quick Internet search for open source products that could match a probe face in a video produced limited results, so we expanded our search parameters slightly to include low-cost cloud services. The products we decided to investigate were pyAnnote, the Python Face Recognition library, Amazon Rekognition, and Microsoft's Azure Face Cognitive Service.
We assigned a developer to each product to prototype a solution, given only the following minimum requirements from our client:
The solution shall identify (i.e., indicate the time offsets of) all occurrences of a suspect’s face in a video.
The solution shall output a CSV file for each suspect indicating in which videos they were detected, and at what time offsets.
The solution shall output a CSV file for each video containing the name and time offset at which each suspect was identified in the video (a minimal sketch of this output follows the list).
The solution shall accept all probe images as a group.
The solution shall accept all videos as a group.
The solution shall perform better than real-time (i.e., a 5-minute video should be processed in less than 5 minutes).
The solution shall have adjustable operating parameters to affect the sensitivity of detection and computer resource usage.
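To make the CSV requirements concrete, here is a minimal sketch of what a per-video report might look like. The file name, column headings, and sample rows are hypothetical, chosen only for illustration; they are not taken from any of the actual prototypes.

```python
import csv

# Hypothetical matches for one video: (suspect name, time offset in seconds, match confidence)
matches = [
    ("suspect_01", 12.5, 0.87),
    ("suspect_04", 73.0, 0.64),
]

# One CSV per video, as described in the requirements above.
with open("video_0001_matches.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["suspect", "time_offset_seconds", "confidence"])
    writer.writerows(matches)
```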
Early in the project, the pyAnnote developer determined that although the accuracy and output format of the pyAnnote software were impressive, it was intolerably slow (roughly 2-3 times real time). She therefore abandoned pyAnnote as a possible solution and moved on to investigate the Python Face Recognition software, taking a different tack than the developer already investigating the same software.
With these broad requirements, developers were free to architect creative solutions. Therefore, each project took on its own character and individual goals.
Face to Face – a desktop, Java-based UI that leveraged Amazon’s Rekognition API in addition to its S3 storage infrastructure. This project could scale as demand required.
Azure Multi-Imaged Face Identifier (AMIFI) – a desktop, C#-based UI, which leveraged Microsoft’s Face Cognitive Service. This project could accept multiple samples of a face to build a more comprehensive model of the suspect.
Distributed Face Recognition (DFaR) – a massive, enterprise-level solution utilizing multiple servers, GPUs, load-balanced queueing, etc. This project leveraged Python Face Recognition, had no GUI, and was designed to be integrated into a larger forensic process.
Face Recognizer – the other project using Python Face Recognition took the opposite approach and built a solution that ran on a laptop with no special hardware or network connections.
The developers and the project managers met weekly for 3 months to demonstrate progress, discuss problems, and to solicit advice from one another. These meetings were dubbed “face offs” as they took on a tone of friendly competition. The developers went all out to produce the most accurate and fastest performing solution.
Two interesting findings resulted from these collaborative/competitive meetings.
None of these software products could match face images in videos directly. They were designed to compare and match faces in still images. As a result, all the developers converted each video into a sequential series of still images (i.e., frames) using FFmpeg and compared the frames to the probe image.
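As a rough illustration of that first step, the sketch below shells out to FFmpeg to dump a video into numbered JPG frames. The two-frames-per-second sampling rate and the file paths are assumptions for the example, not settings any of the teams necessarily used.

```python
import pathlib
import subprocess

video = "surveillance_0001.mp4"          # hypothetical input video
out_dir = pathlib.Path("frames")
out_dir.mkdir(exist_ok=True)

# Extract two frames per second; each frame's index maps back to a time offset in the video.
subprocess.run(
    ["ffmpeg", "-i", video, "-vf", "fps=2", str(out_dir / "frame_%06d.jpg")],
    check=True,
)
```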
The conversion of videos to frames resulted in thousands (sometimes tens of thousands) of JPG files. Often, the majority of these images contained no faces and incurred processing and resource usage penalties unnecessarily. This was especially the case with cloud solutions where storage and processing incurred real costs. To address this problem, all of the developers filtered the images using Dlib. Only the frames in which Dlib indicated a face was present were allowed to pass through the identification process.
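A minimal version of that Dlib pre-filter might look like the sketch below, which keeps only the frames in which Dlib's frontal face detector finds at least one face. The directory names are hypothetical.

```python
import pathlib
import shutil

import dlib

detector = dlib.get_frontal_face_detector()
keep_dir = pathlib.Path("frames_with_faces")
keep_dir.mkdir(exist_ok=True)

for frame in sorted(pathlib.Path("frames").glob("frame_*.jpg")):
    image = dlib.load_rgb_image(str(frame))
    faces = detector(image, 1)      # upsample once to catch smaller faces
    if len(faces) > 0:              # only face-bearing frames move on to identification
        shutil.copy(frame, keep_dir / frame.name)
```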
Perhaps the most interesting result was that all of these prototypes and underlying technologies achieved nearly the same level of accuracy (see Precision-Recall chart in Figure 1). Open source, community-driven technology held its own against the technologies developed by the global R&D powerhouse companies.
Although the results were nearly identical, that doesn’t mean they were sound. Looking at the Precision-Recall chart in Figure 1, you see that the data points cluster around 0.6 precision and 0.45 recall. This means that when the software identified a suspect, it was correct about 60% of the time, while it found only about 45% of the suspect appearances it was presented with. Ideally, both precision and recall should be high for an accurate, well-performing solution.
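For readers who want to tie those percentages back to the underlying counts, here is a small worked example. The counts are made up, chosen only so that they reproduce the 0.6 / 0.45 figures above.

```python
# Hypothetical counts for one prototype:
true_positives = 45    # suspect correctly identified
false_positives = 30   # someone else flagged as the suspect
false_negatives = 55   # suspect appearances that were missed

precision = true_positives / (true_positives + false_positives)  # 45 / 75 = 0.60
recall = true_positives / (true_positives + false_negatives)     # 45 / 100 = 0.45
print(f"precision={precision:.2f}, recall={recall:.2f}")
```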
It is important to note that our test videos and real-life videos were low quality, grainy, surveillance videos. We would expect better results using higher quality videos.
Armedia satisfactorily identified 19 suspects in nearly 6,000 surveillance videos, with confidence values and time offsets, so that each match could be manually confirmed.
The Face Match Face Off is a good example of the type of research and development Armedia brings to client engagements. Our team is well versed and well equipped to research, develop, prototype, and deploy technologies in the areas of computer vision, machine learning, big data, signals processing, and computer forensics. The Face Match Face Off not only solved a real-world problem for our client, it provided them with information about four leading face recognition technologies, enabling more informed decision making in future technology investments.
It turns out I was wrong… which happens at an alarmingly increasing rate these days, though I chalk that up to a thirst to challenge myself… errr, my story!
So, for a while now, I had convinced myself that I knew what the most important thing was about successfully doing predictive analytics: accuracy of data and the model (separating the noise). Veracity, as they say. In working with a few clients lately though, I no longer think that’s the case. Seems the most important thing is actually the first thing: What is the thing you want to know? The Question.
As technologists, we often tend to over-complicate and possibly over-engineer. And it’s easy to make predictive analytics focus on the how: the myriad ways to integrate large volumes and exotic varieties of data, the many statistical models to evaluate for fit, the integration of the technology components, the visualization techniques used to best surface results, and so on. All of that has its place. But ultimately, first, and most importantly, we need to articulate the business problem and the question we want answered.
What do we want to achieve from the analytics? How will the results help us make a decision?
As easy as that sounds, in practice it is not simple to articulate the business question. It requires a real understanding of the business, its underlying operations, its data and analytics, and what would really move the meter. There is a need to marry the subject matter expert (say, the line-of-business owner) with a quant or a data scientist and facilitate the conversation. This is where we figure out the general shape and size of the result and why it would matter, as well as what data (internal and external) feeds into it.
Articulating The Question engages the rest of the machinery. Answers are the outcome we care about. The process and the machinery (see below for how we do it) give us repeatability and ways to experiment with both asking questions and getting answers.
Armedia Predictive Analytics process for getting from The Question to answers
Big data is a main talking point for most companies dealing with the vast amounts of data out there, from sales figures to trend analysis. Most of the time, the task assigned to the ‘Big Data’ team is to find a way to get information that already exists somewhere into a format that can be used to derive insight – data science. But what do you do when you don’t have enough data to perform a meaningful analysis or some other data-intensive task? You have to create data that meets your needs. This may sound like a trivial task, but it is in fact quite involved: you must first define what you need, then determine what you can use to create the new data, and finally determine the best way to create and store it for your purpose.
I will not cover the basics of what big data is, or the tools for aggregation and analysis, as much of that is already covered in the myriad literature out there. What I will cover is a sample of what we had to go through to meet some of our customer’s mission goals, with other implementations to follow. The problem that our customer, and many other customers, face is that they use tools which need to be trained to perform a task or to establish a statistical model on which to base decisions. These so-called ‘expert’ systems need to be ‘trained’ with known artifacts, essentially a ground-truth training corpus. The decisions that these systems then make on real-world data are based entirely on what they have learned from the training data.
How do we make sure that enough variety or similarity is provided for a tool to reach a certain level of accuracy, precision, and recall? As before, the answer varies, so we must be prepared to provide whatever is needed until we reach the accuracy, precision, and recall set points that meet requirements. We had two specific tasks that required creating very large amounts of data; one is presented here for discussion, with the other to follow:
A requirement to train an algorithm to recognize two- and three-dimensional objects from any perspective
A requirement to train a software product to recognize a variety of text and fonts, in various languages, in various file types
For the first task, the product/algorithm being developed and tested needed to understand what an object was, and all of its specifics, so that it could identify the object uniquely. The information used both for training and subsequently for testing consists of two-dimensional images of a three-dimensional object. Essentially, we want to be able to find a specific object in a two-dimensional image regardless of its orientation or occlusion (partially hidden behind another object). So, where is the big data aspect of this? Let’s see:
A simple image (JPEG, NEF, PNG, or similar) can, depending on its details, take up anywhere from about a thousand bytes for a low-resolution image to multiple megabytes for a high-resolution image
For training, at least a thousand images of an object from various camera angles are required
For training, lighting effects may be significant, so some images must be darker and some brighter (a small generation sketch follows this list)
For training, color, black & white, and grey-scale images may all be required
We need the ability to track metadata about each image so we know what the baseline ground-truth training corpus is
We need to be able to search on the metadata and analyze the results continually to make sure that we have all possible training criteria defined
This has to be done for many objects
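As a rough sketch of how the lighting and color variants listed above can be generated automatically (rotation handling is discussed below), the snippet uses Pillow to produce darker, brighter, grey-scale, and black & white copies of a base image. The file names and brightness factors are arbitrary choices for the example, not the parameters we actually used.

```python
from PIL import Image, ImageEnhance

base = Image.open("stop_sign_0001.jpg")   # hypothetical base image

# Lighting variants: factors below/above 1.0 darken/brighten the image.
for factor in (0.5, 0.75, 1.25, 1.5):
    ImageEnhance.Brightness(base).enhance(factor).save(
        f"stop_sign_0001_bright_{factor}.jpg"
    )

# Color variants: grey-scale ("L") and bilevel black & white ("1").
base.convert("L").save("stop_sign_0001_gray.jpg")
base.convert("1").save("stop_sign_0001_bw.png")
```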
At this point, let’s see how much data we have for a simple image of a U.S. stop sign before we explore a more complex object. This is essentially a two-dimensional object which can be rotated around the x and y axes with minimal z-axis attributes. What we are not covering here is how we handle the rotation, which we can briefly state is fully automated through a variety of open source tools.
As you can see from this table, the space requirements alone for a ‘single’ two-dimensional object already come to 19 MB. Though the metadata size is minimal and inconsequential when doing big data analysis, it is still critical, as it defines an index for us to track what we have and determine whether scenarios are missing.
Now let’s take a more complex, three-dimensional object, such as a vehicle or a plane, and see how the storage requirements change. Hmm, lots of things differentiate this plane from other planes.
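To see how the numbers balloon for a three-dimensional object, here is a back-of-the-envelope sketch. Every count and file size in it is an assumption chosen for illustration, not the actual parameters behind the figure quoted below.

```python
# Hypothetical capture plan for one 3-D object (e.g., a specific aircraft).
camera_angles = 5000          # views sampled around all three axes
lighting_variants = 3         # dark, neutral, bright
color_modes = 3               # color, grey-scale, black & white
avg_image_size_mb = 2.0       # high-resolution image

images = camera_angles * lighting_variants * color_modes
total_gb = images * avg_image_size_mb / 1024

print(f"{images:,} images, roughly {total_gb:,.0f} GB for a single object")
# -> 45,000 images, roughly 88 GB for a single object
```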
That comes to 85 GB of images for a single object, and how many objects are out there? This is Big Data of a different type. Even with petabyte storage available, you would soon use much of it up. So what exactly was our solution to this Big Data problem? We implemented a hybrid solution with the capability to generate data as needed, while maintaining the requirement of a ground-truth corpus as well as the ability to search on metadata describing what all the data would be once created. This facilitates the main tasks users have to be able to do:
Determine, by searching, whether applicable test data already exists (a minimal search sketch follows this list)
Determine whether enough applicable data already exists
Request the creation of additional test data within the specification of the corpus, with minimal long-term impact on the storage footprint
Baseline test cases to a specific corpus version
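A stripped-down version of that metadata search might look like the following, using a pandas DataFrame as a stand-in for the real index. The column names and values are hypothetical, not our actual schema.

```python
import pandas as pd

# Toy metadata index; in practice this is populated as images are generated.
index = pd.DataFrame(
    [
        {"object": "stop_sign", "lighting": "dark", "color_mode": "gray", "corpus_version": "1.2"},
        {"object": "stop_sign", "lighting": "bright", "color_mode": "color", "corpus_version": "1.2"},
        {"object": "aircraft_a", "lighting": "dark", "color_mode": "color", "corpus_version": "1.1"},
    ]
)

# Does applicable test data already exist for this scenario?
hits = index[(index["object"] == "stop_sign") & (index["lighting"] == "dark")]
print(f"{len(hits)} matching scenario(s) in corpus version(s): {sorted(hits['corpus_version'].unique())}")
```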
Needless to say, the project is continually evolving as more types of data requests come in to quickly and accurately evaluate products, systems, and algorithms. The main efficiencies gained to date are in the preservation of storage space and in the time to complete a specific task. On the former point, though storage is cheaper now, it can still be a procurement nightmare, especially in the Federal sector.
The next blog will cover how we handled the issue of text-based files, which was more of a traditional big data task, albeit without the MapReduce interface.
In the first article in this 3-part series, The New Data Visualization – This is Not Your Father’s Pie Chart, we discussed the three main categories of data and the standard methods used to best portray each of them. If you haven’t read that one yet, I’d suggest you CLICK HERE and do that first, as it will give you a good foundation for understanding this article.
When beginning to craft a data visualization solution, it’s important to understand the scope of your task. Oftentimes, a very large amount of data will be the starting point. What we want to do first, then, is use this data to give us clues about where we need to focus our time. We aren’t going to query this initial dump; we’re going to put it into a tool that will show us what we have.
A good way to illustrate this tactic is via a network diagram. Also referred to as node/link diagrams, these are charts that show us how entities – the nodes – are “linked” to other entities in our data set.
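As a hedged illustration of how such a node/link view can be produced, the sketch below builds a tiny graph with NetworkX and writes it out as GEXF, a format Gephi can open directly. The entities and relationships are invented for the example.

```python
import networkx as nx

# Toy data set: each tuple links two entities that appear together in the data.
links = [
    ("Account A", "Account B"),
    ("Account A", "Account C"),
    ("Account C", "Vendor X"),
    ("Account B", "Vendor X"),
]

G = nx.Graph()
G.add_edges_from(links)

# The GEXF file can be opened in Gephi to explore the node/link structure visually.
nx.write_gexf(G, "entities.gexf")
```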
Take a look at this screenshot from an open source tool called Gephi:
In a statement at an April 2011 Senate subcommittee hearing, Dr. David McClure, Associate Administrator of the Office of Citizen Services and Innovative Technology, identified data management as one of the biggest challenges federal agencies face in migrating to the cloud. Data management in cloud computing is something that needs to be critically analyzed and strategized before solutions can be implemented. So let’s take a look at some of the data management challenges that exist in federal cloud computing solutions:
First of all, it is important to understand that the IT needs of global organizations pale in comparison with those of the US federal government. Quite simply, the US federal government is enormous: it comprises more than 2.1 million full-time federal employees, each of whom uses at least one IT system, and 2,094 federal government data centers composed of thousands of servers.