Qualcomm Technologies, Inc. hosts an annual hackathon for our interns, and every year they push the boundaries of mobile innovation. This year was no exception and we were, once again, dazzled by their creations! After just 16 hours, Team FaceBlock, consisting of Oles Andrienko, Zhiyu Liang, and Alexander Li, was named one of the top three finalists. The team creatively used the Qualcomm® Neural Processing SDK to develop an application that blocks out unwanted people in videos for use as a privacy function for social media. After their final presentation to our executive team, the interns behind FaceBlock were named the 2018 Hack Mobile winners!
The hackathon was also instrumental in answering some important questions: Is Artificial Intelligence (AI) ready for commercialization on smartphones? Is it conceivable to run custom neural networks in an everyday app? Can it be done in 16 hours? And is it fun? Team FaceBlock answered all these questions with a resounding yes. Read along to learn how to mix the latest on-device technologies, skills, and talent to provide a solution to a real-world use case.
Tell us a little about yourselves and how your team got interested in the hack?
Oles: I’m a junior at the University of Waterloo, majoring in robotics with a minor in computer science. During my internship I’ve been working as a software engineer on the Machine Learning Software Architecture team, to develop new use cases for hardware accelerated on-device machine learning.
Zhiyu: I’m a junior at the University of Toronto, majoring in computer science with a focus on AI and computer vision. I’ve been interning as a computer vision researcher on the Edge AI team, to develop efficient deep learning algorithms for mobile.
Alexander: I’m also a junior at the University of Toronto, majoring in electrical and computer engineering. I’ve been working as a software engineer on the Machine Learning Software team, to develop solutions for our machine learning platform.
Since all three of us are working in separate groups at QTI, we saw the hackathon as a great opportunity to collaborate and were drawn in by the flexibility to work on any project that we chose. We also knew that we’d get the chance to work with QTI’s latest hardware and software, which was a big plus.
When you initially started working with the Qualcomm Neural Processing SDK, what was the first idea that came to mind? What type of project did you want to work on?
All of our internships are focused on computer vision and deep learning, so we knew right away we wanted to do something related. We started off with the idea of removing unwanted people from photos using a Generative Adversarial Network (GAN). However, we quickly realized that this might not be feasible since training GANs is not always straightforward. Nevertheless, the idea of removing unwanted people from photos was interesting.
With the popularity of live-streaming and the recent concerns with privacy, we settled on the idea of creating an app that blocks out unwanted people in videos. This meant being able to accurately detect faces while tracking a main user's face on a device. Not an easy task when dealing with mobile hardware without something like the Neural Processing SDK.
How did you go about building your project? Describe the process.
We started by building a simple prototype with OpenCV in Python. Although the tracking worked relatively well, the default Haar Cascade-based face detection in OpenCV was very unreliable. After testing the concept, we started working on the actual production app. We began by kicking off a training process using the TensorFlow Object Detection API. Starting from pre-trained COCO weights, we fine-tuned a MobileNet SSD object detection network on the WIDER FACE dataset. This allowed us to get the face detection accuracy and speed we wanted.
While the training was running, we got to work on the actual Android app. To save time, we downloaded a sample app that ran a general detection network on top of the Qualcomm accelerated neural network runtime, the Neural Processing SDK. This was a great starting point that allowed us to quickly swap out the default network once we were done training our own.
The next step was tracking. We wanted to release the app on Android, so we went through the process of compiling OpenCV for Android from scratch just to have access to the native tracking API. After a successful compilation, we integrated a MIL-based tracker into the detection logic so that the tracked face, the main user's, could be filtered out of the detections.
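As a rough illustration of the filtering step described above (not the team's actual code), the sketch below matches the tracker's box against the detector's boxes using intersection-over-union (IoU) and returns every face except the best match. The function names, the (x, y, width, height) box layout, and the 0.5 threshold are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Assumed box layout for this sketch: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap along each axis, clamped at zero when the boxes are disjoint.
    overlap_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    overlap_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = overlap_w * overlap_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def faces_to_block(
    detections: List[Box],
    tracked: Optional[Box],
    threshold: float = 0.5,  # hypothetical cut-off, chosen for illustration
) -> List[Box]:
    """Return the detected faces that do not belong to the tracked user."""
    if tracked is None or not detections:
        return list(detections)
    # Treat only the single best-overlapping detection as the main user.
    best = max(detections, key=lambda box: iou(box, tracked))
    if iou(best, tracked) < threshold:
        # No detection agrees with the tracker, so every face is blocked.
        return list(detections)
    return [box for box in detections if box is not best]
```

For example, with detections at (10, 10, 50, 50) and (200, 40, 60, 60) and a tracker box at (12, 11, 50, 50), only the second face is returned for blocking. When the tracker and detector disagree, this sketch errs on the side of blocking everyone, which is one possible privacy-first default rather than a known behavior of FaceBlock.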
The final steps involved switching in our custom detection model after eight hours of training, adding some logic to overlay the emojis on the faces which were not being tracked, and performing some minor touch-ups.
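The overlay step can be pictured as pasting a small image patch onto each frame at a face's position. The sketch below is a simplified, pure-Python stand-in: it models a frame as a list of pixel rows and clips the patch at the frame edges, which matters when a face is partly out of view. The function name and data layout are assumptions for illustration; an Android app would typically draw onto a canvas or bitmap instead.

```python
from typing import List

# Assumed frame layout for this sketch: a list of rows, each a list of pixels.
Image = List[List[int]]


def overlay_patch(frame: Image, patch: Image, x: int, y: int) -> Image:
    """Paste `patch` onto `frame` in place, top-left corner at (x, y).

    Pixels that would land outside the frame are skipped, so a patch that
    hangs over an edge is clipped rather than raising an IndexError.
    """
    frame_height = len(frame)
    frame_width = len(frame[0]) if frame else 0
    for row_offset, patch_row in enumerate(patch):
        frame_y = y + row_offset
        if not 0 <= frame_y < frame_height:
            continue  # this patch row is above or below the frame
        for col_offset, pixel in enumerate(patch_row):
            frame_x = x + col_offset
            if 0 <= frame_x < frame_width:
                frame[frame_y][frame_x] = pixel
    return frame
```

Calling overlay_patch once per untracked face box, with the patch resized to that box, would cover every face except the main user's.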
Why did you choose to run your model on mobile instead of hosting it in the cloud?
We decided to run inference on the device because we were processing videos. We wanted the experience to be smooth, so it was vital to have the model run as fast as possible. Had we relied on the cloud or external servers, we would not have been able to process video in real time: uploading and downloading video adds significant overhead and depends on a network connection that is not always available. Privacy was also a major consideration in our application. Making sure images of unwanted faces don’t leave the device is critical to maintaining the security of those images.
How much AI preparation did you do in advance of the hack? How much background research did you do before getting started?
Before starting our internships, we had all taken various machine learning courses, which was a major help coming into the hack. Plus, all three of us were focusing on machine learning-related work in our internships, which made preparing for our project a lot easier.
Once we had thought of an idea, most of the preparation we did came in the form of reading documentation. We made sure we went into the project fully understanding the tools we were using such as the TensorFlow Object Detection API, OpenCV, and the Neural Processing Engine Java API. We also read about the most recent object detection models available.
What surprised you the most about the features/functionality of the Neural Processing SDK?
The most surprising thing was the performance we were getting when we deployed our model. Since we wanted to run everything on the device, we initially thought there would be quite a bit of latency when doing inference on videos. However, with the help of the Neural Processing SDK, we were able to run our model at over 25 frames per second. The SDK allowed us to take our model and deploy it straight to the Adreno GPU without worrying about the details in hardware acceleration.
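To put the 25 frames per second figure in perspective, it leaves a budget of 40 ms per frame for everything the app does: detection, tracking, and drawing. The short sketch below works through that arithmetic. The per-stage latencies used in the usage note are hypothetical placeholders, not measurements from FaceBlock.

```python
from typing import Iterable


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a target frame rate."""
    return 1000.0 / fps


def fits_budget(stage_latencies_ms: Iterable[float], fps: float) -> bool:
    """True if the per-frame stages, run back to back, meet the target rate."""
    return sum(stage_latencies_ms) <= frame_budget_ms(fps)


# Hypothetical per-frame costs (NOT measured values): detection, tracking, overlay.
example_stages_ms = [25.0, 10.0, 3.0]
```

Here frame_budget_ms(25) is 40.0, and the made-up stages above total 38 ms, so fits_budget(example_stages_ms, 25) holds. Adding a network round trip of even a few tens of milliseconds per frame would break that budget, which is consistent with the team's reasoning for keeping inference on the device.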
What advice would you give someone who has never used the Neural Processing SDK? What tips would you give them for getting started on their own projects?
The great thing about the Neural Processing SDK is that it makes deploying your model to specialized hardware much easier and much more accessible to the developer. The most helpful way to get started is to read the documentation online. It goes through all the steps needed to deploy your network on QTI hardware from any of the major machine learning frameworks. Also, make sure the SDK supports the layers you are using; this will save you a lot of time.
What do you think your next AI project will be?
There are definitely a lot of avenues we could pursue given the growth of the field. We’re all very interested in topics related to mobile inference, such as quantization, compression, and hardware acceleration, so most likely something in that context.
Congratulations again to Team FaceBlock on a great entry to the 2018 Hack Mobile event at Qualcomm Technologies, Inc. If you are a developer new to AI, check out our Developers Guide to AI eBook. Additional information on the Neural Processing SDK can be found on our site.