In Your Face! Figuring Out Apple’s Face Detection API
I am building a native iOS app that uses face detection. Apple has an excellent image detection API that can find faces, barcodes, and even rectangular shapes in images or video frames. The API first shipped with iOS 5.0, but I thought an updated example using Swift 2.2 and Xcode 7.3 would help people out.
The code, located at https://github.com/dcandre/face-it, will allow you to view a video feed from your iOS device’s camera and superimpose a view loaded from a XIB file on top of the preview at the left and right eye positions.
I am going to assume you can create a Single View Application in Xcode. My code restricts the device orientation to portrait mode. I have created a group in the Project Navigator called VideoCapture. In that group, create the file VideoCaptureController.swift.
VideoCaptureController Class
import Foundation
import UIKit

class VideoCaptureController: UIViewController {
    var videoCapture: VideoCapture?

    override func viewDidLoad() {
        super.viewDidLoad()
        videoCapture = VideoCapture()
    }

    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
        // Stop capturing when the system reports memory pressure.
        stopCapturing()
    }

    func startCapturing() {
        do {
            try videoCapture!.startCapturing(self.view)
        } catch {
            // Error
        }
    }

    func stopCapturing() {
        videoCapture!.stopCapturing()
    }

    // Capture runs only while the button is held down.
    @IBAction func touchDown(sender: AnyObject) {
        let button = sender as! UIButton
        button.setTitle("Stop", forState: UIControlState.Normal)
        startCapturing()
    }

    @IBAction func touchUp(sender: AnyObject) {
        let button = sender as! UIButton
        button.setTitle("Start", forState: UIControlState.Normal)
        stopCapturing()
    }
}
The magic that captures the video and performs the face detection will be encapsulated in the VideoCapture class, which we will create next. For now, we will assume the interface for the VideoCapture class has two methods, startCapturing and stopCapturing. Notice the two action methods. When a user presses the button, the video capture starts, and when they lift their finger, it stops, much like Snapchat, Instagram, Vine, and other video capturing apps. You can check out the storyboard in my code, but feel free to create your own interface to start and stop the video capture.
The viewDidLoad and didReceiveMemoryWarning methods from the UIViewController class are overridden. They are used to instantiate our video capture object and to stop it from capturing if we receive a memory warning.
Go into your storyboard and select your view controller. In the Identity Inspector, change the custom class to your VideoCaptureController class. I attached the Touch Down event to the touchDown action, and the Touch Up Inside and Touch Up Outside events to the touchUp action.
Before I talk about the VideoCapture class, I want to summarize Apple’s video capture process. To capture images or video from your iOS device’s camera you use the AVFoundation framework. The AVCaptureSession class couples inputs, like a camera, and outputs, like saving to an image file. We will use an output called AVCaptureVideoDataOutput. This will capture frames from a video and allow us to see what the camera sees.
Go ahead and create a file in the VideoCapture group called VideoCapture.swift. The completed VideoCapture class can be found on GitHub. Here is the class declaration:
VideoCapture Class
import Foundation
import AVFoundation
import UIKit
import CoreMotion
import ImageIO

class VideoCapture: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    var isCapturing: Bool = false
    var session: AVCaptureSession?
    var device: AVCaptureDevice?
    var input: AVCaptureInput?
    var preview: CALayer?
    var faceDetector: FaceDetector?
    var dataOutput: AVCaptureVideoDataOutput?
    var dataOutputQueue: dispatch_queue_t?
    var previewView: UIView?

    enum VideoCaptureError: ErrorType {
        case SessionPresetNotAvailable
        case InputDeviceNotAvailable
        case InputCouldNotBeAddedToSession
        case DataOutputCouldNotBeAddedToSession
    }

    override init() {
        super.init()
        device = VideoCaptureDevice.create()
        faceDetector = FaceDetector()
    }

    func startCapturing(previewView: UIView) throws {
        isCapturing = true
        self.previewView = previewView
        self.session = AVCaptureSession()
        try setSessionPreset()
        try setDeviceInput()
        try addInputToSession()
        setDataOutput()
        try addDataOutputToSession()
        addPreviewToView(self.previewView!)
        session!.startRunning()
    }

    func stopCapturing() {
        isCapturing = false
        stopSession()
        removePreviewFromView()
        removeFeatureViews()
        preview = nil
        dataOutput = nil
        dataOutputQueue = nil
        session = nil
        previewView = nil
    }
}
This class needs to inherit from NSObject because the AVCaptureVideoDataOutputSampleBufferDelegate protocol inherits from NSObjectProtocol, and NSObject takes care of implementing NSObjectProtocol. We will talk about implementing captureOutput for AVCaptureVideoDataOutputSampleBufferDelegate later.
I have created an enumeration so users of this class can catch specific errors. The device and faceDetector objects are created in the overridden init method, and I will talk about them later. I have created the startCapturing and stopCapturing methods, but we have not yet implemented all of the methods that they call. We will go through all of them and then implement the VideoCaptureDevice and FaceDetector classes.
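To illustrate why a dedicated error enumeration is useful, here is a minimal, framework-free sketch of a caller that reacts to individual cases. The case names come from the VideoCaptureError enumeration above. Everything else is an assumption for illustration: startCapturing(presetSupported:) and describeFailure are hypothetical stand-ins, and the sketch uses current Swift syntax (Error and argument labels) rather than the ErrorType spelling of Swift 2.2.

```swift
// Mirrors the article's enumeration; current Swift spells the protocol `Error`.
enum VideoCaptureError: Error {
    case SessionPresetNotAvailable
    case InputDeviceNotAvailable
    case InputCouldNotBeAddedToSession
    case DataOutputCouldNotBeAddedToSession
}

// Hypothetical stand-in for VideoCapture.startCapturing: it only simulates
// the session preset check so the sketch runs without a camera.
func startCapturing(presetSupported: Bool) throws {
    guard presetSupported else {
        throw VideoCaptureError.SessionPresetNotAvailable
    }
}

// Hypothetical helper: turns a failure into a message for the user.
func describeFailure(presetSupported: Bool) -> String {
    do {
        try startCapturing(presetSupported: presetSupported)
        return "Capturing"
    } catch VideoCaptureError.SessionPresetNotAvailable {
        return "This camera cannot capture at the requested resolution."
    } catch VideoCaptureError.InputDeviceNotAvailable {
        return "No camera is available."
    } catch {
        return "Video capture could not be started."
    }
}
```

In the app, the same catch clauses would replace the empty catch block in VideoCaptureController.startCapturing.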
Session
When you want to capture video or an image from an iOS device camera, you need to instantiate an AVCaptureSession. We assign it to the session variable in the startCapturing method. Then we call the setSessionPreset method, which should be added to your VideoCapture class.
private func setSessionPreset() throws {
    if (session!.canSetSessionPreset(AVCaptureSessionPreset640x480)) {
        session!.sessionPreset = AVCaptureSessionPreset640x480
    } else {
        throw VideoCaptureError.SessionPresetNotAvailable
    }
}
This checks whether the session can capture video at a 640×480 resolution. If not, it throws an error. I am using 640×480, but you can use other resolutions; Apple’s AVCaptureSession documentation lists the available presets. Now that we have an AVCaptureSession object, we can start adding input and output classes.
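If you would rather fall back to a lower resolution than throw, the preset check can become a search over a preference list. The sketch below is framework-free and written in current Swift syntax: choosePreset is a hypothetical helper, the strings only stand in for the AVCaptureSessionPreset constants, and the isSupported closure stands in for canSetSessionPreset.

```swift
// Hypothetical helper: returns the first preset that is reported as supported.
// `isSupported` stands in for AVCaptureSession.canSetSessionPreset.
func choosePreset(from preferred: [String], isSupported: (String) -> Bool) -> String? {
    return preferred.first(where: isSupported)
}

// Illustrative labels only; a real app would pass the AVFoundation constants.
let preferredPresets = ["1280x720", "640x480", "352x288"]

// Pretend the session supports only the two lower resolutions.
let supportedPresets: Set<String> = ["640x480", "352x288"]
let chosenPreset = choosePreset(from: preferredPresets) { supportedPresets.contains($0) }
```

A nil result would correspond to throwing VideoCaptureError.SessionPresetNotAvailable.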
Input Device
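We are going to add two functions to our VideoCapture class. The setDeviceInput method instantiates an AVCaptureDeviceInput. This will handle the input device’s ports and allow you to use the camera on your iOS device. The addInputToSession method then adds that input to the session, or throws an error if the session cannot accept it.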
private func setDeviceInput() throws {
    do {
        self.input = try AVCaptureDeviceInput(device: self.device)
    } catch {
        throw VideoCaptureError.InputDeviceNotAvailable
    }
}

private func addInputToSession() throws {
    if (session!.canAddInput(self.input)) {
        session!.addInput(self.input)
    } else {
        throw VideoCaptureError.InputCouldNotBeAddedToSession
    }
}
Data Output
We have session and input classes, but now we want to capture the video frames for face detection. We will add another method to the VideoCapture class.
private func setDataOutput() {
    self.dataOutput = AVCaptureVideoDataOutput()

    // Ask for uncompressed 32-bit BGRA frames.
    var videoSettings = [NSObject : AnyObject]()
    videoSettings[kCVPixelBufferPixelFormatTypeKey] = Int(CInt(kCVPixelFormatType_32BGRA))
    self.dataOutput!.videoSettings = videoSettings
    self.dataOutput!.alwaysDiscardsLateVideoFrames = true

    // Frames are delivered to the delegate on this serial queue.
    self.dataOutputQueue = dispatch_queue_create("VideoDataOutputQueue", DISPATCH_QUEUE_SERIAL)
    self.dataOutput!.setSampleBufferDelegate(self, queue: self.dataOutputQueue!)
}
The AVCaptureVideoDataOutput class will allow us to process the uncompressed frames from our video feed. The videoSettings property is a dictionary with one key/value pair. kCVPixelBufferPixelFormatTypeKey specifies the pixel format the video frames should be returned in. The value is a four-character code that is converted to an integer for the AVCaptureVideoDataOutput class.
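To make the four-character code less mysterious, here is a small sketch of how four ASCII characters pack into one integer. Core Video defines kCVPixelFormatType_32BGRA as the code 'BGRA'; the fourCharCode helper below is hypothetical (it is not part of Core Video) and is written in current Swift syntax.

```swift
// Packs four ASCII characters into one 32-bit integer, most significant byte first.
// `fourCharCode` is a hypothetical helper, not a Core Video function.
func fourCharCode(_ code: String) -> UInt32 {
    let bytes = Array(code.utf8)
    precondition(bytes.count == 4, "A four-character code needs exactly four ASCII characters")
    return bytes.reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
}

// 'B' = 0x42, 'G' = 0x47, 'R' = 0x52, 'A' = 0x41
let bgra = fourCharCode("BGRA")   // 0x42475241
```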
As you can imagine, the camera produces a lot of video frames, maybe even 60 per second. That is why we use the dispatch_queue_create function to create a serial queue with Grand Central Dispatch. This kind of queue processes one request at a time, in the order in which they were added to the queue.
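The ordering guarantee of a serial queue can be seen in a small sketch that runs without a camera. It uses the current Swift spelling DispatchQueue(label:) in place of dispatch_queue_create, and FrameLog and processFrames are hypothetical names introduced only for this illustration.

```swift
import Dispatch

// Collects results. @unchecked Sendable is acceptable here only because every
// access happens on the same serial queue.
final class FrameLog: @unchecked Sendable {
    var frames: [Int] = []
}

// Hypothetical helper: "processes" frame numbers on a serial queue and
// returns the order in which they were handled.
func processFrames(_ frameNumbers: [Int]) -> [Int] {
    // A DispatchQueue is serial unless it is created with the .concurrent attribute.
    let queue = DispatchQueue(label: "VideoDataOutputQueue")
    let log = FrameLog()
    for number in frameNumbers {
        queue.async { log.frames.append(number) }
    }
    // sync runs after everything submitted earlier has finished.
    return queue.sync { log.frames }
}

let order = processFrames([1, 2, 3, 4, 5])
```

Because the queue is serial, order always matches the submission order, which is what lets the delegate treat frames as a sequence.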
Finally, we register this class as the sample buffer delegate, so that captureOutput is called on that queue for each frame. If we take too long processing a frame, the frames that arrive in the meantime are dropped rather than queued, since we set the alwaysDiscardsLateVideoFrames property to true.
Next, add this method to your VideoCapture class.
private func addDataOutputToSession() throws {
    if (self.session!.canAddOutput(self.dataOutput!)) {
        self.session!.addOutput(self.dataOutput!)
    } else {
        throw VideoCaptureError.DataOutputCouldNotBeAddedToSession
    }
}
This will add the AVCaptureVideoDataOutput class to the AVCaptureSession class.
Seeing is Believing
Add the following method to your VideoCapture class.
private func addPreviewToView(view: UIView) {
    self.preview = AVCaptureVideoPreviewLayer(session: session!)
    self.preview!.frame = view.bounds
    view.layer.addSublayer(self.preview!)
}
We are going to instantiate an AVCaptureVideoPreviewLayer. This class will allow you to see the video frames from the input device. Then we add it as a sublayer of the view we pass in from the VideoCaptureController. In our example it is the main UIView associated with that controller. You will notice that we set the frame of the layer to the bounds of the enclosing view, so the preview fills the whole view.
If you check out the startCapturing method in the VideoCapture class, you will see that all of the methods are in place. We are definitely not done yet, though. How do we receive the video frames from the queue, and how do we actually detect faces in those frames?
AVCaptureVideoDataOutputSampleBufferDelegate Protocol
The only method we need to implement from the AVCaptureVideoDataOutputSampleBufferDelegate protocol is captureOutput.
func captureOutput(captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, fromConnection connection: AVCaptureConnection!) {
    let image = getImageFromBuffer(sampleBuffer)
    let features = getFacialFeaturesFromImage(image)

    let formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer)
    let cleanAperture = CMVideoFormatDescriptionGetCleanAperture(formatDescription!, false)

    // UI work must happen on the main queue.
    dispatch_async(dispatch_get_main_queue()) {
        self.alterPreview(features, cleanAperture: cleanAperture)
    }
}
We grab the video frame from the CMSampleBuffer that is passed to this function, run face detection on it, and read the clean aperture (the rectangle of the frame that is valid for display) from the format description. We then dispatch an asynchronous request to the main queue, dispatch_get_main_queue, through Grand Central Dispatch. UIKit is not thread-safe, so updates to the app’s UI must be made on the main thread, whereas captureOutput itself runs on our background data output queue.
Let’s add the function getImageFromBuffer to your VideoCapture class.
private func getImageFromBuffer(buffer: CMSampleBuffer) -> CIImage {
    let pixelBuffer = CMSampleBufferGetImageBuffer(buffer)
    let attachments = CMCopyDictionaryOfAttachments(kCFAllocatorDefault, buffer, kCMAttachmentMode_ShouldPropagate)
    let image = CIImage(CVPixelBuffer: pixelBuffer!, options: attachments as? [String : AnyObject])
    return image
}
CMSampleBufferGetImageBuffer returns the image buffer that holds the frame’s pixels. The attachments dictionary is populated by CMCopyDictionaryOfAttachments, which copies the attachments of the sampleBuffer object that are marked to propagate. The function returns a Core Image CIImage built from both.
Go ahead and add the getFacialFeaturesFromImage method to your VideoCapture class. The face detection code is encapsulated in the FaceDetector class. We will get to that later.
private func getFacialFeaturesFromImage(image: CIImage) -> [CIFeature] {
    let imageOptions = [CIDetectorImageOrientation : 6]
    return self.faceDetector!.getFacialFeaturesFromImage(image, options: imageOptions)
}
I have set the detection orientation to 6, the EXIF orientation value I use for portrait, since this app is locked in portrait mode. The getFacialFeaturesFromImage method on the FaceDetector class returns an array of CIFeature objects. In our case they will be instances of the subclass CIFaceFeature. This object can tell you whether eyes and a mouth are visible and where they are positioned. It can even tell you if it detects a smile, provided you request smile detection with the CIDetectorSmile option. Before we create the FaceDetector class, let’s look at the alterPreview method that we asynchronously dispatch to interact with the UI.
private func alterPreview(features: [CIFeature], cleanAperture: CGRect) {
    // Views are repositioned on every frame, so start from a clean slate.
    removeFeatureViews()

    if (features.count == 0 || cleanAperture == CGRect.zero || !isCapturing) {
        return
    }

    for feature in features {
        // Skip anything that is not a face instead of force-unwrapping the cast.
        guard let faceFeature = feature as? CIFaceFeature else {
            continue
        }

        if (faceFeature.hasLeftEyePosition) {
            addEyeViewToPreview(faceFeature.leftEyePosition.x, yPosition: faceFeature.leftEyePosition.y, cleanAperture: cleanAperture)
        }

        if (faceFeature.hasRightEyePosition) {
            addEyeViewToPreview(faceFeature.rightEyePosition.x, yPosition: faceFeature.rightEyePosition.y, cleanAperture: cleanAperture)
        }
    }
}

private func removeFeatureViews() {
    if let pv = previewView {
        for view in pv.subviews {
            // 1001 is the tag assigned in getFeatureView.
            if (view.tag == 1001) {
                view.removeFromSuperview()
            }
        }
    }
}

private func addEyeViewToPreview(xPosition: CGFloat, yPosition: CGFloat, cleanAperture: CGRect) {
    let eyeView = getFeatureView()
    let isMirrored = preview!.contentsAreFlipped()
    let previewBox = preview!.frame

    previewView!.addSubview(eyeView)

    var eyeFrame = transformFacialFeaturePosition(xPosition, yPosition: yPosition, videoRect: cleanAperture, previewRect: previewBox, isMirrored: isMirrored)
    eyeFrame.origin.x -= 37
    eyeFrame.origin.y -= 37
    eyeView.frame = eyeFrame
}
In the alterPreview method we first remove the views that were positioned over the eyes, because we reposition them on each frame. If no facial features were found, we bail without doing anything to the frame. If a left or right eye is found, we call the addEyeViewToPreview method. This method calls a couple of methods that we also need to add to our VideoCapture class. The getFeatureView method loads a XIB file, which I have named HeartView in my project.
private func getFeatureView() -> UIView {
    let heartView = NSBundle.mainBundle().loadNibNamed("HeartView", owner: self, options: nil)[0] as? UIView
    heartView!.backgroundColor = UIColor.clearColor()
    heartView!.layer.removeAllAnimations()
    heartView!.tag = 1001
    return heartView!
}

private func transformFacialFeaturePosition(xPosition: CGFloat, yPosition: CGFloat, videoRect: CGRect, previewRect: CGRect, isMirrored: Bool) -> CGRect {
    var featureRect = CGRect(origin: CGPoint(x: xPosition, y: yPosition), size: CGSize(width: 0, height: 0))

    let widthScale = previewRect.size.width / videoRect.size.height
    let heightScale = previewRect.size.height / videoRect.size.width

    let transform = isMirrored
        ? CGAffineTransformMake(0, heightScale, -widthScale, 0, previewRect.size.width, 0)
        : CGAffineTransformMake(0, heightScale, widthScale, 0, 0, 0)

    featureRect = CGRectApplyAffineTransform(featureRect, transform)
    featureRect = CGRectOffset(featureRect, previewRect.origin.x, previewRect.origin.y)

    return featureRect
}
The getFeatureView method loads the XIB file and tags the view with an integer, 1001, so that we can go back later and easily remove it with the removeFeatureViews method. The transformFacialFeaturePosition method uses CGRectApplyAffineTransform to transform coordinates from one coordinate system to another. Why do we have to do that, you ask? The video is being captured at 640×480, but our preview view is in portrait mode with a different width and height, depending on how the view is rendered to fit within the window. That is also why the transform swaps the x and y axes while it scales. The two coordinate systems are represented by the CGRect objects videoRect and previewRect. Once we have a CGRect that represents an eye position in the preview view coordinate system, we can use it as the frame of the eye view that we attach to the previewView as a subview.
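A worked example may help with the transform. The sketch below applies the same mapping as the two affine transforms in transformFacialFeaturePosition, written out with plain Doubles so it runs without Core Graphics. The names Point, Size, and mapVideoPointToPreview are hypothetical, the 320×568 preview size is an assumed example, and the preview origin is taken to be zero, so the final CGRectOffset step is omitted.

```swift
// Plain Double pairs so the sketch needs no frameworks.
struct Point { var x: Double; var y: Double }
struct Size { var width: Double; var height: Double }

// Hypothetical helper: swaps the video axes, scales them to the portrait
// preview, and flips x when the preview is mirrored.
func mapVideoPointToPreview(_ point: Point, video: Size, preview: Size, isMirrored: Bool) -> Point {
    let widthScale = preview.width / video.height
    let heightScale = preview.height / video.width
    let x = isMirrored ? preview.width - widthScale * point.y : widthScale * point.y
    let y = heightScale * point.x
    return Point(x: x, y: y)
}

let video = Size(width: 640, height: 480)
let preview = Size(width: 320, height: 568)   // assumed preview size in points

// The center of the video frame lands at the center of the preview.
let center = mapVideoPointToPreview(Point(x: 320, y: 240), video: video, preview: preview, isMirrored: false)

// With mirroring, the video origin lands on the right edge of the preview.
let mirroredOrigin = mapVideoPointToPreview(Point(x: 0, y: 0), video: video, preview: preview, isMirrored: true)
```

Here center works out to roughly (160, 284) and mirroredOrigin to (320, 0).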
Our VideoCapture class is looking pretty good. We can finish up by creating our two remaining classes: the VideoCaptureDevice class and the FaceDetector class.
VideoCaptureDevice Class
Create a new file called VideoCaptureDevice.swift in the VideoCapture group. Here is the complete class on GitHub. It provides the device object used by the setDeviceInput method in the VideoCapture class.
import Foundation
import AVFoundation

class VideoCaptureDevice {
    static func create() -> AVCaptureDevice {
        var device: AVCaptureDevice?

        // Prefer the front camera.
        AVCaptureDevice.devicesWithMediaType(AVMediaTypeVideo).forEach { videoDevice in
            if (videoDevice.position == AVCaptureDevicePosition.Front) {
                device = videoDevice as? AVCaptureDevice
            }
        }

        // Fall back to the default video device.
        if (nil == device) {
            device = AVCaptureDevice.defaultDeviceWithMediaType(AVMediaTypeVideo)
        }

        return device!
    }
}
This class contains a static factory method that creates an instance of AVCaptureDevice. This app uses video, so we use the devicesWithMediaType method to find an array of devices of type AVMediaTypeVideo. Since we are doing face detection, I thought the front camera would be ideal. If the front camera is not found, then the defaultDeviceWithMediaType method is used to return a camera that can shoot video, which will most likely be the rear camera.
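The selection rule (front camera first, default device otherwise) can be sketched without AVFoundation. CameraPosition, Camera, and chooseCamera below are hypothetical stand-ins for AVCaptureDevicePosition, AVCaptureDevice, and the create factory, written in current Swift syntax. Unlike the factory above, this version returns an optional instead of force-unwrapping, so a device with no camera does not crash.

```swift
// Stand-ins for AVCaptureDevicePosition and AVCaptureDevice so the selection
// logic can run without a camera; both types are hypothetical.
enum CameraPosition { case front, back }

struct Camera: Equatable {
    let name: String
    let position: CameraPosition
}

// Prefers a front camera and otherwise falls back to the supplied default,
// which plays the role of defaultDeviceWithMediaType.
func chooseCamera(from cameras: [Camera], fallback: Camera?) -> Camera? {
    return cameras.first(where: { $0.position == .front }) ?? fallback
}

let backCamera = Camera(name: "Back Camera", position: .back)
let frontCamera = Camera(name: "Front Camera", position: .front)
let picked = chooseCamera(from: [backCamera, frontCamera], fallback: backCamera)
```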
FaceDetector Class
Add a file called FaceDetector.swift to the VideoCapture group. Here is the complete class on GitHub.
import Foundation
import CoreImage
import AVFoundation

class FaceDetector {
    var detector: CIDetector?
    var options: [String : AnyObject]?
    var context: CIContext?

    init() {
        context = CIContext()

        // Low accuracy keeps per-frame detection fast.
        options = [String : AnyObject]()
        options![CIDetectorAccuracy] = CIDetectorAccuracyLow

        detector = CIDetector(ofType: CIDetectorTypeFace, context: context!, options: options!)
    }

    func getFacialFeaturesFromImage(image: CIImage, options: [String : AnyObject]) -> [CIFeature] {
        return self.detector!.featuresInImage(image, options: options)
    }
}
The CIDetector class is the interface we will use to detect faces in the CIImage object we created from the sampleBuffer. The CIDetectorTypeFace parameter is a string that tells the CIDetector class to search for faces. One of the options for the CIDetector is CIDetectorAccuracy. We have set it to low so that we don’t see performance issues with the number of frames that we are going to be processing.
Final Thoughts
Remember that the iOS simulator doesn’t have a camera, so you need to test the app on a real device. When you run the app and start capturing, you should see the heart view superimposed over your eyes. In the future I would like to improve this code so that it doesn’t have to create new UIView objects for every frame. It would just move the existing ones and remove them when the video capture is complete.
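One possible shape for that improvement is a small reuse pool. This is only a sketch under assumptions, not code from the project: ReusePool and Marker are hypothetical names, Marker stands in for the heart view, and the sketch is written in current Swift syntax without UIKit so that it can run anywhere.

```swift
// Hypothetical pool that hands out previously created items before making new ones.
// `Item` would be a UIView in the app; any class type works for the sketch.
final class ReusePool<Item: AnyObject> {
    private var idle: [Item] = []
    private var active: [Item] = []
    private let make: () -> Item
    private(set) var createdCount = 0

    init(make: @escaping () -> Item) {
        self.make = make
    }

    // Returns an idle item if one exists, otherwise creates one.
    func take() -> Item {
        let item: Item
        if let reused = idle.popLast() {
            item = reused
        } else {
            item = make()
            createdCount += 1
        }
        active.append(item)
        return item
    }

    // Call at the start of each frame: everything in use becomes reusable.
    func recycleAll() {
        idle.append(contentsOf: active)
        active.removeAll()
    }
}

final class Marker {}   // stand-in for the heart view

let pool = ReusePool<Marker>(make: { Marker() })
for _ in 1...3 {        // three frames, two eyes each
    pool.recycleAll()
    _ = pool.take()
    _ = pool.take()
}
```

After three frames the pool has created only two markers. In the app, items left idle after a frame would also need to be hidden or removed from the preview view.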
And that’s it! Let me know if you have any questions or comments!
Reference: In Your Face! Figuring Out Apple’s Face Detection API from our JCG partner Derek Andre at the Keyhole Software blog.