Trajectory extraction for automatic face sketching
(Linje extraktion för automatisk ritning av ansikten)

RADU-MIHAI PANA-TALPEANU
Master's Thesis at NADA
Supervisor: Josephine Sullivan
Examiner: Stefan Carlsson

Abstract

This project consists of a series of algorithms employed to obtain a simplistic but realistic representation of a human face. The final goal is for the sketch to be drawn onto paper by a robotic arm with a gripper holding a drawing instrument. The end application is mostly geared towards entertainment and combines the fields of human-machine interaction, machine learning and image processing. The first part focuses on manipulating an input digital image in order to obtain trajectories in a format suitable for the robot to process. Different techniques are presented, tested and compared, such as edge extraction, landmark detection, spline generation and principal component analysis. Results showed that an edge detector yields too many lines, while the generative spline method leads to overly simplistic faces. The best facial depiction was obtained by combining landmark localization with edge detection. The trajectories output by the different techniques are passed to the arm through the high-level interface provided by ROS, the Robot Operating System, and then drawn on paper.

Referat (translated from Swedish)

This project consists of a series of algorithms used to obtain a simplified but realistic rendering of a human face. The final goal is for the sketch to be drawn on paper by a robotic arm with a gripper holding a drawing instrument. The application is entertainment-oriented and combines the fields of human-machine interaction, machine learning and image processing. The first part focuses on manipulating a received digital image so that trajectories in a format suitable for the robot are obtained. Different techniques are presented, tested and compared, such as edge detection, landmark recognition, spline generation and principal component analysis.
The results showed that an edge detector yields far too many lines and that the spline-generation method leads to overly simplified faces. The best facial depiction was obtained by combining landmark localization with edge detection. The trajectories obtained through the different techniques are transferred to the arm through a high-level interface to ROS, the Robot Operating System, and are then drawn on paper.

Contents

1 Introduction
2 Background and related work
  2.1 Canny edge extractor
  2.2 Robotic drawing of faces
  2.3 Non-photorealistic rendering
    2.3.1 Kyprianidis et al.
    2.3.2 Winnemöller et al.
    2.3.3 Hertzmann
3 Obtaining line drawings
  3.1 Edge extraction
    3.1.1 Tresset & Leymarie edges
    3.1.2 Canny edge detection
    3.1.3 Results and discussion
  3.2 Detecting facial landmarks
    The landmark extraction model: FMP, flexible mixture of parts
    Improving landmark localization
      Better jawline landmarks through the gradient magnitude image
      Better eyebrow landmarks through the intensity image
      Better jawline landmarks with outlier detection
      Better eye landmarks through corner detection
    Regularized splines for the jawline
    Principal component analysis
    Results and discussion
  3.3 Edge extraction with landmark localization
  3.4 Conclusions on obtaining line drawings
4 Sketching the trajectories
  4.1 Dumbo
  4.2 ROS
  4.3 Drawing in a simulated environment using ROS
  4.4 Point trimming
  4.5 Drawing on paper
5 Conclusions and future work
Bibliography

Chapter 1
Introduction

The number of existing digital images has increased dramatically over recent years, and people have found a myriad of ways of manipulating them. Many of the image processing techniques available today are combined with machine learning algorithms and applied to parts of this massive database of photos. The result is a classifier which can identify the presence of faces, buildings, trees, cars, or whatever object it was trained to recognize.
The field of image processing therefore plays a pivotal role in the evolution of artificial intelligence, and of machine learning in particular, as it acts as the sight with which machines see the world. For an autonomous robot to move, it must be able to gather information about its surroundings, which comes primarily in the form of digital images.

This project also focuses on image processing as a way to interact with machines. The main goal is to manipulate a given digital image of a face in such a way that a sequence of contour trajectories, referred to as a line drawing, is obtained. These trajectories are then communicated to a robotic arm holding a pen, which will attempt to draw a representation of the human face captured in the photo. The principal focus of the work is the image processing part, while the robotics plays a smaller role in the research, acting mainly as a limitation on the possible line drawings that can be used.

Tresset and Leymarie's work with Paul the robot [17] was an important inspiration for this project. They explore the possibilities of sketching portraits through the use of various techniques, ranging from edge extraction to complex shading behaviour. Although the hardware used by the authors differs from the one this project relied on, the essence remains the same: finding an efficient way to automatically draw human faces.

An initial approach is to use edge detection algorithms such as the Canny method or simpler edge filters. The resulting binary image is manipulated to obtain a line drawing that still captures the essential human facial features while eliminating unnecessary clutter. The results were poor for all types of edge extraction algorithms tested, due to the difficulty of emphasizing salient regions of the face.
Because of the lack of information about the importance of certain areas such as the eyes, nose or mouth, all edges are treated equally, and any attempt to trim the clutter edges will also affect the correct detections. The line drawing will either be overrun by unneeded artefacts or it will lack important areas.

An alternative to the edge extraction techniques is the generative approach of producing lines. If coordinates of visual importance were available, they could be used in a connect-the-dots fashion to draw curves corresponding to the jawline, eyes, eyebrows, nose and mouth. These points are called facial landmarks, and there exist very accurate and efficient ways of obtaining them [21]. Once they are available, splines or regular polynomials can be fitted to them and line drawings can be created generatively. The landmark detections will not be perfect, so improvements can also be made here in several ways, such as gradient magnitude correction or outlier detection.

By combining an edge detector with the knowledge of landmark locations, the problem of clutter edges can be diminished. The landmarks hold information about salient areas, which was lacking when using only edge detection. Further edge trimming, by means of imposing a minimum connected component length and applying a skeleton extraction, led to visually pleasing results and to the final line drawings sent to the robotic arm.

Figure 1.1. Overview of the drawing process. Top: original image; bottom: edge extraction, edge trimming, drawn sketch.

The robotics part consisted of simulating the arm movement in the framework offered by the Robot Operating System, followed by the actual interaction with Dumbo, the dual-arm robot at KTH's CVAP laboratory. The drawing process commences by moving the arm into an initial joint position, from which the gripper can hold a pen and move it across the paper.
Then, movement begins through a series of target coordinates on the drawing plane, during which the pen is lifted and lowered as needed. An overview of the entire process is presented in figure 1.1.

Chapter 2
Background and related work

The problem tackled is part of the image processing domain concerned with the manipulation of human faces, employing techniques that span from detection to the line drawing transformation. The literature of interest therefore ranges from papers on edge extraction techniques to articles on face detection methods. There is also the aspect of the type of representation desired, which can be either photorealistic or non-photorealistic. After a brief exploration of the latter, it was decided to pursue the realistic option, as it is simpler and faster to draw, since it outputs fewer trajectories.

2.1 Canny edge extractor

The first paper of interest is J. F. Canny's work on edge detection [1]. His algorithm, described in 1986, is considered to give the best overall results and is the most widely used. The steps he takes include an initial smoothing of the image with a low-pass Gaussian filter, followed by computation of the edge gradient and direction for every point in the image by applying an edge detection operator such as a Sobel mask, which returns the value of the first derivative in the horizontal and vertical directions, $G_x$ and $G_y$. The gradient magnitude is therefore

$$G = \sqrt{G_x^2 + G_y^2}$$

and the angle is

$$\Theta = \arctan\left(\frac{G_y}{G_x}\right)$$

$\Theta$ is rounded to one of four possible directions: 0, 45, 90 and 135 degrees. Next, a non-maximum suppression is carried out in order to remove thick edges and to obtain a binary representation of the edge points: a point is kept as an edge point only if its gradient magnitude is larger than the magnitudes of its two neighbours along the gradient direction $\Theta$. The final step is the hysteresis thresholding.
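The gradient computation and direction quantization can be sketched in a few lines of numpy (a minimal illustration rather than the thesis's MATLAB code; the smoothing and later Canny stages are omitted):

```python
import numpy as np

def gradient_and_direction(img):
    """Sobel gradient magnitude and direction quantized to 0/45/90/135 deg."""
    p = np.pad(img.astype(float), 1, mode="edge")
    # Sobel responses G_x and G_y, written out as shifted differences.
    gx = 2 * (p[1:-1, 2:] - p[1:-1, :-2]) \
        + (p[:-2, 2:] - p[:-2, :-2]) + (p[2:, 2:] - p[2:, :-2])
    gy = 2 * (p[2:, 1:-1] - p[:-2, 1:-1]) \
        + (p[2:, :-2] - p[:-2, :-2]) + (p[2:, 2:] - p[:-2, 2:])
    g = np.hypot(gx, gy)                          # G = sqrt(Gx^2 + Gy^2)
    theta = np.degrees(np.arctan2(gy, gx)) % 180.0
    theta_q = (np.round(theta / 45.0).astype(int) % 4) * 45
    return g, theta_q
```

Non-maximum suppression then compares each pixel's magnitude against its two neighbours along the quantized direction, and hysteresis thresholding yields the final binary edge map.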
A high threshold is applied in order to be more certain that the retained edge points are actually correct and not caused by noise (false edges would have low gradient magnitude and are therefore removed by the threshold). A lower threshold is then used, in combination with the directional information, to permit the drawing of faint edges, as long as they are connected to a strong starting point. Thus, the final binary representation of edges is obtained.

2.2 Robotic drawing of faces

In "Sketches by Paul the robot" [17], the functioning of the drawing robot Paul is described and some sketches are presented. The set-up contains a robotic arm holding a black Biro pen and a web-cam bolted to the table, and can be seen in figure 2.1.

Figure 2.1. Paul sketching a human subject's face. Reproduced from [17].

The authors use Gabor filters to extract contours, which, in the spatial domain, are Gaussian kernels modulated by a sinusoidal plane wave [10]:

$$g_{\lambda,\theta,\phi,\sigma,\gamma}(x, y) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \cos\left(2\pi \frac{x'}{\lambda} + \phi\right)$$

where

$$x' = x\cos\theta + y\sin\theta, \qquad y' = -x\sin\theta + y\cos\theta$$

They suggest that Gabor filters are biologically motivated and contribute to a stylising effect similar to that of human-made drawings. Afterwards, edge blobs are extracted, to which a medial axis transform (skeleton extraction) is applied in order to make them thinner. The authors also mention a shading behaviour, meant to bring more realism to the drawn figure. Though simplistic, Paul's sketches are aesthetically appealing, and the authors credit this to how the basic algorithms are combined.

2.3 Non-photorealistic rendering

One of the domains of computer-generated drawings is the one where imagination is more important than realism. This is where the idea of non-photorealistic rendering (NPR) comes in.
It has been a growing field since the early 90s and provides a wide variety of techniques that can be used to accomplish the line extraction task by combining human-computer interaction, computer vision and graphics. All the following techniques output some form of artistic image from input received as digital photos, which are manipulated through various algorithms. The problem, however, is that the obtained images are very difficult to bring out of the computer screen and onto a piece of paper. A robotic arm has a limited number of degrees of freedom and can only hold one pen. All the colours and shading obtained through NPR could not be captured in this way; neither could the precise curves, which are a key feature. Therefore, non-photorealistic rendering remains only an interesting theoretical aspect and is included for completeness.

Figure 2.2. Taxonomy of SBR techniques. Reproduced from [11].

2.3.1 Kyprianidis et al.

A very good presentation of the current state of NPR is given by Kyprianidis et al. in "State of the Art: A taxonomy of image stylization techniques for images and video" [11]. The part of interest is image-based artistic rendering (IB-AR), as the initial input is a 2-D image. The first NPR techniques used stroke-based rendering, where the canvas would be painted with primitives called strokes. These could be brush strokes consisting of either short dabs or long curves, placed according to local or global criteria. The content of an image could also be approximated by using other primitives such as stipples (small points distributed over a region with the purpose of creating a tonal depiction) or mosaics (tiles patched together). After the year 2000, region-based techniques came into the picture. They used mid-level computer vision algorithms instead of lower-level ones, and the primitives used were whole regions. Example-based techniques imply a texture or colour transfer.
Texture transfer was done by filling holes in an image with similar patches, and colour transfer was achieved by mapping the colour histograms of the example image and the image being rendered. The classical way of determining stroke placement was with a function of the Sobel gradient. This led to constant-size rectangular strokes that were too regular to give an artistic impression. Hertzmann's idea of curved strokes at different coarse-to-fine scales [8] greatly improved renderings and served as a base for many other techniques. DeCarlo and Santella proposed a segmentation technique using a variant of mean-shift at multiple down-sampled resolutions [4, 11]. It was semi-automatic, as it relied on interaction with a user to observe the location and duration of the viewer's gaze. The important areas were then painted in finer detail. In the case of human faces, this step can be replaced by an automatic detection of regions of interest, such as the eyes and mouth, which are then painted in finer detail than the rest of the image. This was a step in the direction of artistic rendering, as the representations were no longer based on gradient magnitudes but on perceptually important aspects.

Another view of NPR is from a local/global standpoint. Most of the algorithms discussed so far approach the problem in a local manner. Global techniques place the rendering elements so as to minimize a global objective function. This function, however, turned out to be tied to the realistic world most of the time and led to a return to photorealism.

Figure 2.3. Automatic cartoon stylization pipeline. Reproduced from [20].

Machine learning and IB-AR come together in the form of rendering by example [9]. Hertzmann et al.
considered that if a computer were presented with a pair of images, A and A′, and extracted the function that connects them, then, when presented with a new image B, it could create B′ that is related to B in the same way as A′ is related to A. This can be done either by colour matching or by texture transfer, the latter being the most common approach.

2.3.2 Winnemöller et al.

In the paper "Real-time video abstraction", the authors discuss a highly parallel automatic pipeline for cartoon stylization of images and video [20]. The algorithm is depicted in figure 2.3. First, the RGB coordinates are transformed into CIELab coordinates, which better approximate human vision [19]. Bilateral filtering is applied twice, followed by luminance quantization done in parallel with the extraction of edges using a difference of Gaussians. The two results are combined and the outcome is converted back to RGB coordinates, thus providing a cartooned version of the original image.

2.3.3 Hertzmann

Aaron Hertzmann describes a method for obtaining an image with a hand-painted appearance from a photograph by using a series of spline brush strokes [8]. The procedure follows a pyramid scheme: the bottom of the pyramid is a coarse rendering of the photograph, and the higher levels are tied to finer details reserved for important patches of the image. This coarse-to-fine approach is obtained by using different sizes of brush strokes. Hertzmann also implemented curved brush strokes to obtain a more artistic impression. They are placed according to the image gradient and connect points via a cubic spline. Different styles are also discussed, such as impressionist and expressionist, and methods of obtaining them by modifying the parameters of the algorithm are presented.

Figure 2.4. Two expressionist paintings. Reproduced from [8].
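The difference-of-Gaussians edge step in Winnemöller's pipeline can be illustrated with a short numpy sketch (the parameters and the hard threshold are assumptions for illustration; the actual paper uses bilateral filtering and a smooth thresholding function, omitted here):

```python
import numpy as np

def gaussian_blur(img, sigma):
    # Separable Gaussian blur, numpy only, with edge padding.
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1, dtype=float)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    p = np.pad(img.astype(float), r, mode="edge")
    out = np.apply_along_axis(np.convolve, 0, p, k, mode="valid")
    out = np.apply_along_axis(np.convolve, 1, out, k, mode="valid")
    return out

def dog_edges(img, sigma=1.0, k=1.6, eps=1e-3):
    """Binary edge map from a difference of two Gaussian blurs."""
    d = gaussian_blur(img, sigma) - gaussian_blur(img, k * sigma)
    return np.abs(d) > eps
```

In flat regions the two blurred images agree and the difference vanishes, so only pixels near intensity transitions survive the threshold.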
Results obtained by Hertzmann using this method and the impressionist style are presented in figure 2.4.

Chapter 3
Obtaining line drawings

The project's principal focus is on finding ways to obtain simplistic but discriminative representations of human faces, referred to as line drawings. This chapter discusses and compares various methods of getting from an input image to a line drawing. All the methods discussed in the following sections are implemented in MATLAB, and many of them use predefined functions for obtaining some initial starting points. The line drawings are represented as binary images (0 marking the absence of a line pixel and 1 marking the presence of one). They should preferably contain thin, continuous components that can easily be drawn by linearly connecting the pixels of value 1 which make up each component.

3.1 Edge extraction

The fundamental way of moving from an ordinary photo to a line drawing is to perform an edge extraction on the grayscale image. Two algorithms are explored further and used throughout the project: the Canny method [1] and the Gabor filters method employed by the authors of [17].

3.1.1 Tresset & Leymarie edges

In their work on Paul the robot [17], the authors explore the use of Gabor filters with up to 8 orientations in order to obtain what they call salient lines at different resolutions, starting from coarse and going up to the original scale. The grayscale image is convolved with the filters, leading to 4 or 8 representations of the different edge orientations (the lowest resolution level uses only 4 orientations, while the other 3 use 8). After thresholding and thinning, these binary images are added together and the salient-line figure at that particular scale is produced. The final result does indeed resemble an artist's sketch, as seen in figure 3.1, but it may not be a good candidate for drawing by the robotic arm, as it is quite detailed.
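The per-orientation filtering described above can be sketched as follows (a numpy illustration of the Gabor kernel from section 2.2; the kernel size and parameter values are assumptions, not the settings used in [17]):

```python
import numpy as np

def gabor_kernel(size, lam, theta, phi=0.0, sigma=2.0, gamma=0.5):
    """Spatial Gabor kernel g_{lambda,theta,phi,sigma,gamma}(x, y)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xp = x * np.cos(theta) + y * np.sin(theta)     # x' = x cos t + y sin t
    yp = -x * np.sin(theta) + y * np.cos(theta)    # y' = -x sin t + y cos t
    return np.exp(-(xp**2 + gamma**2 * yp**2) / (2 * sigma**2)) \
        * np.cos(2 * np.pi * xp / lam + phi)

# Eight orientations, as in the salient-line extraction described above.
thetas = [i * np.pi / 8 for i in range(8)]
kernels = [gabor_kernel(size=15, lam=4.0, theta=t) for t in thetas]
```

Each kernel would then be convolved with the grayscale image, and the eight thresholded and thinned responses summed to form the salient-line figure at a given scale.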
An alternative approach could be to use fewer resolution levels or fewer orientations, thereby yielding fewer edges.

3.1.2 Canny edge detection

Another option for the task at hand is the one proposed by Canny in 1986 [1]. Unfortunately, the automatic setting of the hysteresis thresholds led to far too many edges. Through empirical observations, a new value for the high threshold was obtained. Even with this modification, some clutter edges remained, a problem that was tackled by combining edge and landmark detection, described in the next section. Figure 3.2 shows how much the extracted edge images can vary when modifying the Gaussian blurring variance and/or the hysteresis threshold. Although the Canny method captures edges with high fidelity, there are times when Gabor filters produce better results in terms of human face depiction.

Figure 3.1. Results after applying Gabor filters. Each black-and-white image shows the output of the convolution between the original picture and a filter with a specific orientation. The final image is the sum of the 8 prior ones.

Since there is no obvious winner between the two methods, the decision on which one to use must be made at runtime, by t