
Automatic Seamless Face Replacement in Videos

Yiren Lu, Dongni Wang
University of Pennsylvania
December 2016

1 Introduction

In this project, the goal is automatic face detection and replacement in videos. Given a video, we first detect faces and extract critical facial feature and contour coordinates from every frame, using a third-party library, dlib. We then run each frame through a processing pipeline to achieve a seamless face replacement, and finally generate a video from the processed frames, in which all chosen faces are replaced by our pre-generated source faces of choice. Our team chose Option 2, aiming for satisfactory overall face replacement results with the help of standalone third-party libraries.

2 Pipeline Overview

Our pipeline takes two video inputs: the target video, in which target faces will be replaced, and a source video clip, which provides the source faces. The output is one video with the same number of frames as the target video, with one or all faces replaced by a similar face detected in the source video. Source faces are normally generated from still images. However, to choose the source faces most similar to our target faces in shape and pose, and to avoid the face jitter that results from per-frame warping, we chose to generate source faces directly from videos.

The first step in the pipeline is to pass the videos to dlib for face detection; the results include the critical facial feature and contour coordinates for all faces in every frame. If faces are detected, we start or update face tracking with KLT; otherwise, we use the tracking results to estimate the facial feature and contour coordinates.
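The per-frame detect-then-track fallback described above can be sketched as follows. This is a minimal illustration, not the authors' code: the names `process_frames`, `detect`, and `track` are hypothetical stand-ins, and the cutoff after which tracking is abandoned is an assumed parameter.

```python
import numpy as np

def process_frames(frames, detect, track, max_track_frames=10):
    """Per-frame landmark estimation: use the detector when it fires,
    otherwise fall back to tracking from the last known landmarks.
    `detect` returns an (N, 2) landmark array or None on failure;
    `track` predicts landmarks for the current frame from the previous
    ones (KLT-style)."""
    landmarks, last, misses = [], None, 0
    for frame in frames:
        pts = detect(frame)
        if pts is not None:
            last, misses = pts, 0          # fresh detection resets tracking
        elif last is not None and misses < max_track_frames:
            last = track(frame, last)      # fill failure frames by tracking
            misses += 1
        else:
            last = None                    # tracking drifted: wait for redetection
        landmarks.append(last)
    return landmarks
```

The key design point is the miss counter: after too many consecutive detector failures the tracker's drift makes its output unusable, so the sketch simply emits nothing until the detector fires again.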
After this step, we smooth the face-region images and perform shape and episodic matching to find the most similar 20-frame slices in the source and target videos; the matching method is detailed in the next section. Once the source faces are found, we warp them using TPS and replace the target faces using masks generated either from the convex hull or from seam carving in log-polar space. Finally, we apply Poisson blending to adjust the color of the warped faces and generate the resulting video.

Figure 1: Pipeline

3 Method

3.1 Face Detection

For face detection, our team first experimented with Matlab's built-in CascadeObjectDetector and tuned its parameters to get benchmarks on video frames. However, the precision of this detector was not promising, and tuning the thresholds did not improve overall detection performance, so we turned to third-party libraries for more reliable detection results.

The first third-party detection method we tried was Face++. Compared with the built-in detector, this method significantly improves precision and achieves better overall detection accuracy. Moreover, it also detects facial landmarks, which help delimit the face region for carving. One drawback of this method is that it is slow: we are not provided with the source code, but instead submit detection jobs online, and each 500x500 frame takes at least a few seconds. We initially relied on this method to proceed with our pipeline implementation, until we later found an open-source library that provides comparable detection accuracy while being significantly faster.

Figure 2: Face detection results by Face++

Our final face detector is the dlib library. As stated earlier, this method performs well in both speed and accuracy. dlib is a library written mostly in C++ that comes with a Python API; we wrote a Python wrapper to further process the results and export them in the desired .mat format.

3.2 Face Tracking & Trajectory Smoothing

The dlib detection results are quite good. However, in some challenging cases (lighting changes, extreme head angles), dlib fails to detect faces. We fill in such detection-failure frames with a KLT tracker. If the detector fails for many consecutive frames (e.g. 10 frames), the tracking results go wild; in this case, we stop tracking and wait for the next successful detection.

The problem with using frame-by-frame detection directly for replacement is that the detection results in consecutive frames are completely independent, so they tend to jitter a lot, which results in unstable face warping and blending. To obtain stable tracking trajectories, after tracking we apply a 1-D temporal Gaussian convolution to each landmark point's trajectory:

t'(i) = Σ_k t(i − k) · g(k)

where t is the trajectory of one landmark, g is a 1-D Gaussian kernel, and the range of k controls the span of the convolution. This smoothing scheme yields natural and realistic landmark trajectories.

3.3 Face Matching

In video, the pose of the target face (the face to be replaced) rarely agrees exactly with that of the source face (the face used for replacement), which produces unnatural replacement results. To address this problem, we implemented landmark shape matching to pair faces with similar poses for replacement.

3.3.1 Face Shape Features

Feature extraction of the target face follows [2].
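The 1-D temporal Gaussian smoothing of landmark trajectories described in Section 3.2 above can be sketched as follows. This is a minimal illustration, not the authors' code: the kernel width, the 3-sigma radius, and the reflect padding at the trajectory endpoints are all assumed choices.

```python
import numpy as np

def smooth_trajectory(t, sigma=2.0, radius=None):
    """Smooth one landmark coordinate trajectory t(i) with a 1-D Gaussian:
    t'(i) = sum_k t(i - k) * g(k)."""
    if radius is None:
        radius = int(3 * sigma)           # assumed: cover +/- 3 sigma
    k = np.arange(-radius, radius + 1)
    g = np.exp(-k**2 / (2 * sigma**2))
    g /= g.sum()                          # normalize: constant input stays constant
    # reflect-pad so the endpoints are not pulled toward zero
    padded = np.pad(t, radius, mode="reflect")
    return np.convolve(padded, g, mode="valid")
```

Applying this independently to the x and y coordinate of each landmark gives the stabilized trajectories used for warping.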

1. Convert the landmark points into log-polar coordinates.
2. For each landmark point, compute the relative coordinates of the remaining points.
3. Generate a histogram of all the relative coordinates.

A visualization of one face feature is shown in the figure. This feature has a few good properties:

Translation invariant: it is computed from relative coordinates.

Rotation invariant: we subtract the offset angle of the vector between the two eye centers.

Scale invariant: we normalize the radius feature by dividing by the largest radius (the paper uses the median radius, but we found it to be 0).

To match the features, we can use the Euclidean distance directly, or the χ² distance:

χ²(g, h) = (1/2) · Σ_{k=1}^{K} [g(k) − h(k)]² / (g(k) + h(k))

In practice, we found that the Euclidean distance actually worked better for our task.

3.3.2 RANSAC Episodic Matching

The matching algorithm above looks for the best match for each frame independently. However, as the head angle changes, the lighting conditions of the matched source faces change a lot, which causes flashing in the final replacement result. To address this problem, we adopted episodic matching: the basic idea is to match a sequence of consecutive frames (e.g. 20 frames) instead of a single frame. To implement this, we employ a RANSAC framework. For each episode:

1. Loop for N iterations.
2. Randomly pick a consecutive sequence of the same length in the source video.
3. Compute the error for each pair of corresponding frames.
4. Count the inliers under a threshold e.
5. Keep the match with the largest number of inliers.

Here we simply used the Euclidean distance, but other metrics are also worth trying.

3.4 Face Replacement

The matching algorithm provides us with the source faces most similar to their associated target faces. We can now proceed to the next step in the pipeline: face replacement.

3.4.1 TPS Face Warping

Our team decided to use thin plate spline (TPS) warping for this task. Compared to an affine transformation, TPS usually gives smoother curves around the carving edges, and the resulting warped images confirm this. Specifically, since TPS is applied as a coordinate transformation, we use two splines: one for the displacement in the x direction and one for the displacement in the y direction. We then transform all pixels of the target face through the TPS model and read back the pixel values from the source face directly.

3.4.2 Log-Polar Carving

To select the face region for replacement automatically, we implement the seam carving algorithm in log-polar coordinates. We transform the image from Cartesian to log-polar coordinates because we want to find a carving seam around the face that also forms a closed cycle in Cartesian coordinates, which yields a valid mask. The algorithm is as follows:

1. Set the origin of the log-polar coordinates to the detected position of the nose (the center of the face). Set the radius of the region under consideration to the distance between the face center and the contour point furthest from it. Set the number of wedges to 360 and the number of rings to twice the radius.
2. Reorganize the approximate face region in log-polar coordinates using the parameters set earlier.
3. Find the minimum-energy seam in the reorganized image (we also tried an edge map in place of the energy map). To force the seam to form a cycle in Cartesian space, the first and last wedge must have the same carving radius; in other words, for a vertical seam running from the first to the last row (wedge), the starting and ending column (ring) must be the same. We enforce this by making all other options very expensive (high energy cost).
4. We could also force the seam to pass through other contour points returned by the detector. However, care is needed: the dynamic programming algorithm we implemented for computing the minimum seam only considers three possible positions at the next level, so if the contour points to be crossed are too close together, they can affect the returned seam negatively.
5. Finally, after obtaining the seam in log-polar space, we transfer its coordinates back to the image grid, giving a complete cycle that delimits the face region to be replaced. [1]

Figure 3: Minimum-energy seam returned by the log-polar carving algorithm

3.4.3 Poisson Blending

Once the warped face and the carved face region are ready, we want to blend the warped face seamlessly into the target image. Poisson blending is a natural fit for this task. The key idea of Poisson (gradient-domain) blending is to apply the gradient of the source image (the warped face) to the replacement pixels of the target image (the carved region) while keeping the boundary pixels unchanged. [3]

4 Implementation

All our code is fully vectorized. To save runtime, we resized the test video from 1280x720 to 640x360. At this resolution, our algorithm runs at 0.8-1.5 s per face, depending on the size of the face (detection was done offline by dlib).

5 Results & Discussion

Below are some face replacement results. Our algorithm yields quite desirable results both on static images and in video (video results are attached in the submission). Here are the main factors that we think contributed to the final results:

Good face landmark detection. We have to admit that our pipeline is built on the good landmark detection provided by dlib.

Trajectory smoothing made the face replacement results very stable.

Episodic face matching selects the source sequence most similar to the target episode, which compensates for head pose changes.

Seamless Poisson blending helps blend the source face into the target face.

There are two aspects in which the current pipeline could be improved:

Refined face carving with a GMM. We could use a GMM to classify face pixels: crop the detected face region, train a GMM online, and use it to generate a face-likelihood map. With a proper threshold, this gives another face mask which, combined with the previous convex hull and the log-polar carving, yields a refined mask. This could be effective because including even a small amount of background pixels in the carved region damages the final blending, and a GMM can rule out the background pixels.

Head pose estimation by PnP. There are mature techniques for estimating head pose from 2-D images; these could improve the episodic matching.

Figure 4: Left: original face; right: replaced face

Figure 5: Side angle. Left: original face; right: replaced face

Figure 6: Side angle. Left: original face; right: replaced face

References

[1] Avidan, S., and Shamir, A. Seam carving for content-aware image resizing. In ACM Transactions on Graphics (TOG) (2007), vol. 26, ACM, p. 10.
[2] Belongie, S., Mori, G., and Malik, J. Matching with shape contexts. 81-105.
[3] Pérez, P., Gangnet, M., and Blake, A. Poisson image editing. In ACM Transactions on Graphics (TOG) (2003), vol. 22, ACM, pp. 313-318.
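As a closing illustration, the RANSAC-style episodic matching loop of Section 3.3.2 can be sketched as follows. This is a hypothetical sketch, not the authors' implementation: the function name, the feature representation (one vector per frame), and the parameter defaults are assumptions; the per-frame error is the plain Euclidean distance the report says was used.

```python
import numpy as np

def match_episode(target_feats, source_feats, ep_len=20, n_iter=200,
                  thresh=1.0, rng=None):
    """RANSAC-style episodic matching: repeatedly sample a candidate
    start in the source video, score each pair of corresponding frames
    by Euclidean feature distance, count inliers under `thresh`, and
    keep the candidate episode with the most inliers."""
    rng = np.random.default_rng(rng)
    target = np.asarray(target_feats[:ep_len])        # (ep_len, D) features
    best_start, best_inliers = 0, -1
    for _ in range(n_iter):
        # randomly pick a consecutive sequence of the same length
        start = int(rng.integers(0, len(source_feats) - ep_len + 1))
        cand = np.asarray(source_feats[start:start + ep_len])
        errors = np.linalg.norm(cand - target, axis=1)  # per-frame error
        inliers = int((errors < thresh).sum())
        if inliers > best_inliers:                      # keep best candidate
            best_start, best_inliers = start, inliers
    return best_start, best_inliers
```

Matching a whole 20-frame slice this way, rather than each frame independently, is what suppresses the frame-to-frame "flashing" described in the text.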