Segmentation of Cast Shadows from Moving Objects. Søren Gylling Erbou.


Segmentation of Cast Shadows from Moving Objects

Master of Science Thesis in Electrical and Electronic Engineering (M.Sc.E.E.)

Søren Gylling Erbou
September 2004

Section for Electronics and Signal Processing, ØrstedDTU
Technical University of Denmark (DTU)
DK-2800 Kgs. Lyngby

In cooperation with: The Danish Defence Research Establishment (DDRE)

Supervisors:
Helge B.D. Sørensen, ØrstedDTU
Bjarne Stage, DDRE

F-15/2004

Preface

This thesis is the result of work carried out at the section for Electronics and Signal Processing, ØrstedDTU, Technical University of Denmark (DTU). The thesis accounts for 30 ECTS units and is a partial requirement for obtaining the degree of Master of Science in Electrical and Electronic Engineering (M.Sc.E.E.). The work has been carried out over a period of six months, in cooperation with the Danish Defence Research Establishment (DDRE) (Forsvarets Forskningstjeneste, FOFT).

The thesis is intended as a contribution to reducing the problems introduced by cast shadows when detecting moving objects in systems for automated video surveillance. It is assumed that the reader has a basic knowledge of image analysis and statistics. Key flowcharts, which are referred to throughout the thesis, are additionally placed in the final appendix F, page 191, for the convenience of the reader.

Svanemøllen Kaserne, September 16,
Søren Gylling Erbou

Acknowledgements

Several people have contributed to this thesis with encouragement and support through many fruitful discussions. My supervisor at the DDRE, Bjarne Stage, had the ability to always ask the right questions in times of despair. I owe a great debt to Christian Birkemark, who never lacked interest in discussing minor or major aspects of methods, or in reflecting on my writing. Thomas Sams showed great interest in discussing many of the more physics-based aspects of the thesis. Erik Thiesen was an invaluable help during the data acquisition. Torben Christensen and Anders F. Johnsen were always available for discussion of statistical considerations. Additionally, I would like to thank the whole Institut for Sensorsystemer at the DDRE for providing a pleasant and inspiring atmosphere during my stay.

My supervisor at DTU, Helge Sørensen, was always supportive and constructive in suggestions for improvements. Furthermore, Omar Javed, University of Central Florida, clarified some of the more subtle aspects of his work.

Finally I would like to thank my family and friends for always being supportive and for proofreading. Jane, in particular, has been ever patient and encouraging during the whole period.

Abstract

This thesis describes and implements methods for segmentation of cast shadows from moving objects, detected in an outdoor surveillance application. Cast shadows reduce the general ability to classify and track moving objects robustly in such applications. A data set, consisting of 90 different foreground objects including cast shadows, is obtained using a high resolution digital video camera in a typical surveillance scenario. 18 of the foreground objects constitute a training set used for manually optimizing central parameters. 72 foreground objects constitute the test set, used for validation.

A state-of-the-art statistical-based method for handling cast shadows, suggested by Javed et al. [21], is implemented as a reference, and its central parameters are optimized using the training set. A physics-based method for shadow removal in still images, suggested by Finlayson et al. [15] and not previously applied in a surveillance application, is examined for use in such an application, but found to be too sensitive when used with a standard dynamic range of 8 bits. Instead an enhanced method for segmentation of cast shadows is suggested, combining an improved color segmentation of regions with the introduction of an enhanced similarity feature for classification of regions.
None of the methods is, in practice, limited by spatial assumptions. Based on the 72 examples of the test set, the enhanced method for shadow removal significantly improves the mean absolute accuracy (69.2 %) and the mean relative accuracy (14.9 %), at a 5 % significance level, compared to the reference method, whose mean absolute accuracy is 64.9 %. The enhanced method tends to improve examples substantially where the reference method fails completely. Therefore the enhanced method is also more robust than the reference method.

Resumé

This thesis describes and implements methods for segmentation of cast shadows from moving objects, in a system for automated outdoor video surveillance. Cast shadows are a general problem in surveillance systems, since they have a negative influence on the subsequent classification and tracking of objects. A data set consisting of 90 different foreground objects with cast shadows has been recorded with a digital video camera in a typical surveillance scenario. 18 foreground objects constitute a training set, used to optimize central parameters, and 72 foreground objects constitute a test set, used for validation.

A state-of-the-art statistics-based method, suggested by Javed et al. [21], is implemented as a reference model, and its performance is optimized with respect to central parameters using the training set. A physics-based method for removing shadows from single images, suggested by Finlayson et al. [15], is also examined for use in video surveillance. It is judged to be too sensitive when used with a video camera with a standard dynamic range of 8 bits. Instead, an improved method for segmentation of cast shadows is proposed, which combines a better color segmentation with the introduction of a new feature for classification. None of the methods is in practice limited by spatial assumptions about the composition of the foreground objects.

Based on the test set, the improved method shows a significant improvement in mean absolute accuracy (69.2 %) and in mean relative accuracy (14.9 %), compared to the reference method, whose mean absolute accuracy is 64.9 %. The improved method gives a very large improvement in cases where the reference method fails completely, which is why the improved method is also more robust than the reference method.

Contents

Preface
Acknowledgements
Abstract
Resumé
Contents

1 Introduction
    Motivation
    Objectives
    System Specifications
    Thesis Overview
2 Related Work
    Computer Vision in Video Surveillance
    W4 - A System For Automated Video Surveillance
    Shadow Removal in General
    Statistical-Based Shadow Removal
    Hsieh et al.
    Javed et al.
    Physics-Based Shadow Removal
    Nadimi et al.
    Finlayson et al.
    Comparison
    Summary
3 Data Acquisition
    Camera
    Data Sets
    Summary
4 Finlayson's Approach Using a Video Camera
    Spectral Sensor Functions
    Color Calibration
    Summary
5 Implementation and Optimization
    Background Modelling and Noise Reduction
    Measuring Performance
    Javed's Method
    Improving Javed's Method
    Finlayson's Shadow Removal Applied for Surveillance
    Detecting Edges due to Shadows
    Reconstructing the RGB-image without Shadows
    Enhanced Shadow Removal
    Enhanced Similarity Feature
    Applying the Enhanced Similarity Feature
    Summary
6 Validation and Comparison
    Absolute Performance
    Relative Performance
    Comparison Per-Example (Binomial)
    Comparison of Means (Paired t-test)
    Absolute Means
    Relative Means
    Summary
7 Discussion
    Results
    Limitations
    Data Acquisition
    Javed's method
    Finlayson's method
    Enhanced method
    Future Work
    Perspectives
8 Conclusion
    Implementation of State-of-the-Art Reference Method
    Improving Reference Method
    Applying Physics-Based Method
    Final Results
    Contributions
List of Figures
List of Tables
Bibliography
Appendices
A Macbeth Color Chart
B Data Sets - Foreground Objects to Classify
    B.1 Training Set
    B.2 Test Set
C Additional Figures
    C.1 Detecting Shadow Edges from Illumination-Invariants
D Additional Results
    D.1 Performance of Training Set
    D.2 Performance of Test Set
E Matlab Routines
    E.1 MakeFiles.m
    E.2 BayerGR_fast.m
    E.3 main01_sge.m
    E.4 ChooseFrames.m
    E.5 Noise_Reduction_SGE.m
    E.6 Do_Shadow.m
    E.7 DataSets.m
    E.8 Javed.m
    E.9 JavedImproved.m
    E.10 MergeRegions.m
    E.11 Finlayson_Ill_Inv.m
    E.12 Finlayson_FGmask.m
    E.13 SolvePoisson.m
    E.14 EnhancedSegmentation.m
    E.15 DetectVariance.m
    E.16 Performance.m
    E.17 Compare.m
    E.18 PlotComparison.m
    E.19 PlotPerformance.m
    E.20 Calibration.m
    E.21 CropPNG.m
    E.22 OptimizeJaved.m
    E.23 OptimizeJavedImproved.m
    E.24 OptimizeEnhanced.m
F Flowcharts

Chapter 1 Introduction

For several decades video cameras have been a popular means of solving crime through surveillance. Conventional surveillance applications require an operator to determine when action is needed. A single operator can only monitor a limited number of scenes simultaneously, for a limited amount of time, because the process of manual surveillance becomes tedious. The introduction of digital video cameras, and recent advances in computer technology, make it possible to apply (semi-)automated processing steps to reduce the amount of data presented to the operator. This way the number of trivial tasks is reduced, and the operator can focus on a correct and immediate interpretation of the activities in a scene.

In recent years the main attention has been on surveillance applications where it is necessary to take immediate action because human lives or installations of vital interest are at stake. Examples are scenarios where a terrorist leaves a bag containing a bomb in a scene, or perimeter surveillance where it is crucial to detect unwanted intrusion. In such surveillance applications it is vital to ensure a consistent way of monitoring and registering objects of interest.
Automated and semi-automated video surveillance are steps in this direction, since they are capable of monitoring larger scenes over a longer period of time.

The Danish Defence Research Establishment (DDRE) is currently focusing part of its research on implementing a system for automated video surveillance. The main objectives of the DDRE are to gain general knowledge in this area, and eventually to implement an automated surveillance application that is capable of detecting, tracking and classifying moving objects of interest. At this point the DDRE has carried out some initial studies [28, 18] in testing and implementing parts of the W4 system [19] for automated video surveillance. The W4 system effectively detects moving objects, tracks them through simple occlusions (blocking of the view), classifies them and performs an analysis of their behavior. This procedure corresponds well to the system that the DDRE would like to implement, and therefore W4 has been chosen as a primary reference.

One limitation of W4 is that the tracking, classification and analysis of objects fail when large parts of the moving objects are actually cast shadows. Distinguishing between cast shadows and self shadows is crucial for the further analysis of moving objects in a surveillance application. Self shadows occur when parts of an object are not illuminated directly, but only by diffuse lighting. Cast shadows occur when the shadow of an object is cast onto background areas, cf. figure 1.1. The latter are a major concern in today's automated surveillance systems because they make shape-based classification of objects very difficult. Furthermore, cast shadows can make objects that interact difficult to track.

Figure 1.1: Types of shadows. Self shadow is shadow on the object itself, a person in this case. Cast shadow is the shadow cast onto the background.
1.1 Motivation

Cast shadows are very likely to occur in outdoor scenarios, and the problem of cast shadows in surveillance applications is yet to be solved in general. Several approaches have been tried, but they are all limited by context-dependent thresholds optimized for specific applications and data sets. The DDRE surveillance application also lacks robust shadow handling for the moving objects detected.

In [18], Hansen implements and improves upon a method for cast shadow removal based on work by Hsieh et al. [20]. The use of the method is limited to people in standing posture, because of some initial spatial assumptions about the composition of objects. For instance, it often fails to segment cast shadows from vehicles. This makes the method less useful if the outdoor environment to be monitored contains roads or parking lots, as required by the DDRE.

Javed et al. [21] use a statistical approach for segmenting foreground pixels darker than a reference image into cast shadow, self shadow and object pixels darker than the background. This method is considered state-of-the-art in surveillance applications but still faces fundamental problems concerning some very context-dependent parameters.

Finlayson et al. [15] use a physics-based approach to derive an illumination-invariant (and therefore shadow-free) gray-scale image from an RGB image. From this image the original RGB image, without shadows, is derived. Finlayson's approach is aimed at shadow elimination in general, in images obtained with a standard digital still camera. Due to assumptions in the model, and in the derivation of the shadow-free RGB image, the method is far from perfect, but shadows are attenuated significantly. The method has not been applied in a surveillance application yet.

The topic of the present thesis is therefore based on the need for a more robust way of dealing with cast shadows in surveillance applications.
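The starting point of the statistical approach described above, restricting attention to foreground pixels that are darker than a reference image, can be sketched as follows. This is a minimal illustration and not the implementation by Javed et al. or the thesis routines: the function name `candidate_shadow_mask`, the use of single gray-level intensities, and the example values are all assumptions made for the sketch.

```python
def candidate_shadow_mask(frame, background, foreground):
    """Mark foreground pixels that are darker than the background reference.

    frame, background: 2-D lists of gray-level intensities (same shape).
    foreground: 2-D list of booleans from a prior motion-detection step.
    Gray-level input is an assumption made to keep the sketch short.
    """
    return [
        [is_fg and pixel < reference
         for pixel, reference, is_fg in zip(frame_row, bg_row, fg_row)]
        for frame_row, bg_row, fg_row in zip(frame, background, foreground)
    ]


# Hypothetical 2x3 example; all values are invented for illustration.
background = [[100, 100, 100],
              [100, 100, 100]]
frame = [[100, 60, 140],
         [55, 100, 30]]
foreground = [[False, True, True],
              [True, False, True]]

mask = candidate_shadow_mask(frame, background, foreground)
```

The pixels marked in `mask` are only candidates. Deciding which of them are cast shadow, self shadow or dark object pixels is the actual segmentation problem the text describes.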
1.2 Objectives

The main objective is to contribute to the design of an overall system for automated outdoor video surveillance. More specifically, the focus is on methods for robust segmentation of cast shadows from moving objects. An overview of recent methods for shadow removal is given, with emphasis on two fundamentally different approaches: a statistical approach suggested by Javed et al. [21] and a physics-based approach suggested by Finlayson et al. [15]. Both methods are studied in detail and are implemented in Matlab [23]. In order to evaluate and compare methods, a data set consisting of images typical of the environment that the DDRE wishes to monitor is acquired.

Finlayson's approach has not previously been applied in a surveillance application or with a digital video camera. Using such a setup, Finlayson's approach is examined to determine its applicability. Javed's statistical approach is considered state-of-the-art; it is optimized with respect to a training set and chosen as a reference (J). Then an improved version of Javed's approach (I) is suggested, based on the results from the training set. Finally, Finlayson's ideas are combined with Javed's improved approach in an enhanced algorithm for shadow removal (E). The three methods (J, I and E) are then compared to each other using a test set, to determine whether there are any statistically significant improvements in performance and where such improvements might originate.

1.3 System Specifications

Several specifications for a system for shadow removal are outlined by the DDRE and the author, to define the scope of a suitable master's thesis. The focus of the thesis is on applications using a single camera, for which reason a single digital video camera should be used to obtain the data set used to train and test the methods. The data set should represent objects that are relevant to the present DDRE application, i.e. vehicles, people and bicycles.
The input to the shadow removal algorithm is the set of moving foreground pixels detected by the algorithm implemented by Hansen [18] for the DDRE. These consist of both object pixels and cast shadow pixels (cf. figure 1.1). The segmentation of pixels should not be limited by any spatial assumptions about the object, since this would limit the object types that the method can handle. The implementation of Javed's method is used as a reference, since it is considered a state-of-the-art method for shadow removal. Based on an analysis of the reference method and Finlayson's ideas for shadow removal, the enhanced method for shadow removal should result in increased performance. Finally, the data set used for comparing the methods should be of an appropriate size to ensure statistical significance at a 5 % level when interpreting the results.

1.4 Thesis Overview

Chapter 2 gives an introduction to computer vision in automated video surveillance, with the W4 system as the key reference. Several approaches for shadow removal are compared, with emphasis on the statistical approach by Javed et al. and the physics-based approach by Finlayson et al. Chapter 3 describes the equipment used for data acquisition and the data sets used for training and validation of the methods. In chapter 4 Finlayson's ideas for shadow removal are examined for the digital video camera at hand, i.e. the illumination-invariant image is estimated, including a color calibration of the camera. Chapter 5 gives a brief overview of the implementation of the background model and the detection of foreground objects. Then performance measures are introduced, leading to the implementation and optimization of Javed's shadow removal (J), which is used as a reference. Using the training set, an improved version of Javed's method is suggested (I). Finally, the latter is combined with some of Finlayson's ideas for shadow removal in an enhanced algorithm (E).
In chapter 6 the three methods are applied to a test set and compared to each other. Chapter 7 discusses the results obtained and how they should be interpreted. It also summarizes the work of the thesis and gives proposals for future work. Chapter 8 is the final conclusion. The appendices contain supplementary figures, results and Matlab routines. The final appendix F, page 191, contains the key flowcharts, for the convenience of the reader.

Chapter 2 Related Work

In order to make appropriate decisions on how to design a robust system for shadow handling that corresponds to the intentions of the DDRE, a detailed study of important previous work is presented in this chapter. Work related to video surveillance in general, and to shadow removal in particular, is described.

2.1 Computer Vision in Video Surveillance

Computer vision is a broad term covering a range of applications. When applied in surveillance tasks, it usually consists of one or more of the following parts: object detection, tracking, classification and/or analysis. The use varies from traffic monitoring through video conference applications to security systems. In the relevant literature several approaches have been tried in order to obtain robust and good results in the highly complex task of interpreting video sequences. Outdoor surveillance in particular is a difficult task because of non-stationary conditions imposed by various types of weather, time of day, season, etc. Therefore the best performance is achieved by specialized systems that use a lot of a priori information and assumptions. The drawback is that such methods cannot be applied in other situations.

Moeslund et al. [25] make a comprehensive survey (2001) of computer vision systems for human motion capture. Systems are divided into three application areas: surveillance, control and analysis. Surveillance tasks usually take place in uncontrolled outdoor environments, requiring a high degree of robustness.
As a state-of-the-art example, the W4 system [19] is emphasized, cf. section 2.2. It uses a robust monocular 2D approach, and deals with all of the previously mentioned aspects of surveillance. Control applications are characterized by an increasing number of assumptions, typically in indoor scenes, concerning e.g. gesture recognition. Complex 2D or 3D human models are often introduced, e.g. Pfinder/SPfinder by Wren et al. [39]. Analysis applications are even more specialized, typically for clinical use, and therefore of no relevance within the present framework. Other interesting work within the area is perfo