
Automatic recognition of license plates

Study group IN6-621 Henrik Hansen, Anders Wang Kristensen, Morten Porsborg Køhler, Allan Weber Mikkelsen, Jens Mejdahl Pedersen and Michael Trangeled May 26, 2002






Faculty of Engineering and Science Aalborg University

Institute of Electronic Systems

TITLE: Automatic recognition of license plates
PROJECT PERIOD: February 5 - May 31, 2002

PROJECT GROUP: IN6-621

GROUP MEMBERS: Henrik Hansen Anders Wang Kristensen Morten Porsborg Køhler Allan Weber Mikkelsen Jens Mejdahl Pedersen Michael Trangeled

SUPERVISOR: Thomas Moeslund

SYNOPSIS: This report describes the analysis, design and implementation of a system for automatic recognition of license plates, intended as a subsystem of an automatic speed control system. The input to the system is a series of color images of moving vehicles, and the output consists of the registration number of the license plate. Extraction of the desired information is done in three steps. First, the license plate is extracted from the original image, then the seven characters are isolated, and finally each character is identified using statistical pattern recognition and correlation. The algorithms were developed using a set of training images, and tested on images taken under varying conditions. The final program is capable of extracting the desired information in a high percentage of the test images.

NUMBER OF COPIES: 9 REPORT PAGES: 123 APPENDIX PAGES: 13 TOTAL PAGES: 136



Preface

This report has been written as a 6th semester project at the Institute of Electronic Systems at Aalborg University. The main theme of the semester is the gathering and description of information, and the goal is to collect physical data, represent these symbolically, and demonstrate techniques for processing these data. The report is mainly intended for the external examiner and the supervisor, along with future 6th semester students in Informatics. This report includes the analysis, design and test of a system designed to automatically recognize license plates from color images. Source code and the corresponding executable program are included on the attached CD (see Appendix C for the full contents of the CD, as well as instructions for use). We would like to thank the Aalborg Police Department for information used in the report, and for their guided tour of the present speed control system. Also, we would like to thank our supervisor on this project, Thomas Moeslund.

——————————— Henrik Hansen

——————————— Anders Wang Kristensen

——————————— Morten Porsborg Køhler

——————————— Allan Weber Mikkelsen

——————————— Jens Mejdahl Pedersen

——————————— Michael Trangeled



Contents

Introduction  13

I  Analysis  15

1  Traffic control  17
   1.1  Current system  17
        1.1.1  Disadvantages of the current system  18
   1.2  Improved system  19
   1.3  The components of the system  21
        1.3.1  Differences from the existing system  22
        1.3.2  Project focus  23
   1.4  License plates  23
        1.4.1  Dimensions  23
        1.4.2  Layout  24
        1.4.3  Mounting and material  24
   1.5  System definition  25
   1.6  Delimitation  25
        1.6.1  License plate formats  25
        1.6.2  Video processing  25
        1.6.3  Identification of driver  26
        1.6.4  Transportation of data  26
        1.6.5  Quality of decisions  26

II  Design  27

2  License plate extraction  29
   2.1  Introduction  29
   2.2  Hough transform  30
        2.2.1  Edge detection  31
        2.2.2  Line detection using Hough transform  33
        2.2.3  Line segment extraction  35
        2.2.4  Candidate region extraction  36
        2.2.5  Strengths and weaknesses  36
   2.3  Template matching  37
        2.3.1  Template construction  37
        2.3.2  Cross correlation  39
        2.3.3  Normalized cross correlation  41
        2.3.4  Strengths and weaknesses  41
   2.4  Region growing  42
        2.4.1  License plate features  42
        2.4.2  Preprocessing  43
        2.4.3  Extracting the regions  44
        2.4.4  Strengths and weaknesses  46
   2.5  Combining the method  46
        2.5.1  Scale invariance  47
        2.5.2  Rotation invariance  47
        2.5.3  Lighting invariance  47
        2.5.4  Summarizing the differences  48
        2.5.5  Candidate selection  48
   2.6  Summary  50

3  Character isolation  51
   3.1  Introduction  51
   3.2  Isolating the characters  52
        3.2.1  Static bounds  52
        3.2.2  Pixel count  53
        3.2.3  Connected components  53
        3.2.4  Improving image quality  55
        3.2.5  Combined strategy  56
   3.3  Summary  58

4  Character identification  59
   4.1  Introduction  59
   4.2  Template matching  60
   4.3  Statistical pattern recognition  60
   4.4  Features  61
        4.4.1  Area  62
        4.4.2  End points  62
        4.4.3  Circumference  65
        4.4.4  Compounds  65
        4.4.5  The value of each pixel  67
   4.5  Feature space dimensionality  67
        4.5.1  Result using SEPCOR  68
   4.6  Decision theory  68
        4.6.1  Bayes decision rule  69
        4.6.2  Discriminant functions  70
        4.6.3  Test of normal distribution  75
        4.6.4  Parameter estimation  79
   4.7  Comparing the identification strategies  81
   4.8  Summary  82

III  Test  83

5  Extraction test  85
   5.1  Introduction  85
   5.2  Region growing and Hough transform  85
        5.2.1  Criteria of success  85
        5.2.2  Test data  86
        5.2.3  Test description  86
        5.2.4  Results  87
   5.3  Correlation  88
        5.3.1  Criteria of success  88
        5.3.2  Test data  88
        5.3.3  Test description  88
        5.3.4  Results  89
   5.4  Combined method  90
        5.4.1  Criteria of success  90
        5.4.2  Test data  90
        5.4.3  Test description  90
        5.4.4  Result of test  90
   5.5  Summary  91

6  Isolation test  93
   6.1  Introduction  93
   6.2  Criteria of success  93
   6.3  Test description  93
   6.4  Test data  94
   6.5  Connected components  94
   6.6  Pixel count  95
   6.7  Static Bounds  96
   6.8  Combined method  97
        6.8.1  Summary  97

7  Identification test  99
   7.1  Introduction  99
   7.2  Criteria of success  99
   7.3  Test data  99
   7.4  Feature based identification  100
        7.4.1  Test description  100
        7.4.2  Result of test using Euclidian distance  100
        7.4.3  Result of test using Mahalanobis distance  102
   7.5  Identification through correlation  104
        7.5.1  Test description  104
        7.5.2  Result of test  104
   7.6  Summary  106

8  System test  107
   8.1  Introduction  107
   8.2  Criteria of success  107
   8.3  Test data  107
   8.4  Test description  108
   8.5  Results  108
   8.6  Summary  109

IV  Conclusion  111

9  Conclusion  113

V  Appendix  115

A  Videocamera  117
   A.1  Physical components  118

B  Visiting the police force  121
   B.1  The system at the precinct  121
   B.2  The system in the field  122
   B.3  Evaluation of the current system  123

C  Contents of CD  125
   C.1  The program  125

VI  Literature  127



List of Figures

1.1  Equipment inside the mobile control station  18
1.2  Workflow of current system  19
1.3  Workflow of the improved system  20
1.4  The tasks of the system  21
1.5  Data abstraction level  22
1.6  A standard rectangular Danish license plate  24

2.1  Image used for examples in Chapter 2  31
2.2  Overview of the Hough transform method  32
2.3  Horizontal and vertical edges  33
2.4  Example image and corresponding Hough transform  34
2.5  Example license plate and reconstructed line segments  35
2.6  Images with different lighting  38
2.7  Template created through opacity-merging of the probabilities  39
2.8  The layout of the cross correlation  40
2.9  The steps of region growing  43
2.10 Binary image  44
2.11 Largest contained rectangle  45
2.12 Rectangle from maximum and minimum values  45
2.13 Selection of the most probable candidate  49

3.1  Isolating characters  52
3.2  Example showing statically set character bounds  52
3.3  Horizontal projections of the actual characters  53
3.4  An iteration in the process of finding connected components  54
3.5  Horizontal and vertical projections  55
3.6  Combination of character isolation procedures  57

4.1  Projections of all the digits  62
4.2  The neighbors of p1  63
4.3  Example where the number of 0-1 transitions is 2, Z(p1) = 2  64
4.4  The steps of the thinning algorithm  64
4.5  Result of the thinning algorithm  65
4.6  Two digits with different circumferences  65
4.7  Compound example  66
4.8  Result of applying a Laplacian filter  66
4.9  Number of features used based on maximum correlation  69
4.10 Decision regions  71
4.11 The covariance matrix equals the identity  73
4.12 The features are statistically independent  73
4.13 Positive and negative correlation  74
4.14 Histogram plot showing approximately a normal distribution  76
4.15 Normal plot produced in Matlab  76
4.16 χ²-test  77

5.1  Examples of falsely accepted regions  89

6.1  Example of unsuccessful and successful isolations  94
6.2  Plate that fails, using connected components  95

7.1  Identification percentage plotted against number of features  100
7.2  Error in character identification  101
7.3  Result of identification using training set as training  103
7.4  Result of identification using test set as training  104
7.5  The size problem illustrated  105
7.6  The brightness problem exaggerated  106

8.1  Good quality license plates wrongly identified  109
8.2  Poor quality license plates correctly identified  109

A.1  Color aliasing using the Bayer pattern  118
A.2  CCD-chip with registers  119


List of Tables

1.1  Differences between the systems  23
1.2  Areas of the improved system addressed in this project  23

2.1  Strengths of the Hough transform method  37
2.2  Weaknesses of the Hough transform method  37
2.3  Strengths of the template method  42
2.4  Weaknesses of the template matching method  42
2.5  Strengths of the region growing method  46
2.6  Weaknesses of the region growing method  46
2.7  Demands for license plate extraction method  48

3.1  Strengths of different methods  56

4.1  Class distribution  78

5.1  Results for region growing and Hough transform  87
5.2  Region selection by correlation  89
5.3  Test of selecting most probable region  90

6.1  Result from the connected component test  95
6.2  Result from the pixel count test  96
6.3  Result from the static bounds test  96
6.4  Result from the combined method  97

7.1  Result of the test on the test set  101
7.2  Result of the test on the training set  101
7.3  Result of the tests on the training sets  102
7.4  Result of the tests  103
7.5  Identifying the characters using templates  105

8.1  Overall result for the system  108

9.1  Main results obtained  114


Introduction

Many traffic accidents in Denmark are caused by high speed, and reports show that speed control reduces speeding because it has a deterrent effect on drivers [8]. Therefore a new and more efficient form of speed control has been introduced: a partially automatic control where a photo of the offender is taken, instead of the traditional method where the police had to pull over each driver. Although the current system is far more effective than the old one, it still has some shortcomings. The goal of this project is to investigate the possibility of creating an improved system which alleviates some of these shortcomings. We want to examine the possibility of automating the workflow of the system; the major disadvantage of the current system is the use of manual labor to accomplish tasks that we believe could nowadays be handled just as efficiently by computers. A secondary goal of this project is to explore the possibility of replacing parts of the costly equipment currently used with more reasonably priced equipment.



Part I Analysis




Chapter 1

Traffic control

The goal of this chapter is to create a system definition which describes the system we want to develop. To identify the shortcomings of the current partially automatic speed control system, and to identify the parts of the system which arguably can be further automated, both the current system and a proposal for an improved system are described in this chapter.

1.1

Current system

In the current system, one police officer is required to operate an automatic speed control station. The equipment is placed in anonymous vehicles and can easily be moved. This mobility makes it possible to monitor traffic in larger areas with the same equipment. When setting up the equipment at a new location, several routines must be performed before the automatic control can take place: the camera must be manually calibrated, and the system needs input such as the allowed speed on the monitored road. Figure 1.1 shows the equipment inside the control station. The picture was taken during an excursion with the Aalborg Police Department [7], which is described in Appendix B.

Figure 1.1: Equipment inside the mobile control station

When a vehicle approaches the control station, its speed is measured by radar, and if the vehicle is speeding, a photo is automatically taken. The photo shows a front view of the vehicle, so that the driver and license plate can be identified. For each picture taken, the speed of the vehicle is stored on a portable PC for later reference. The operator only takes action when special vehicles such as buses pass; when this occurs, the system must be told that another speed limit applies for the given vehicle. The film with the offenders is developed at a central laboratory, after which a manual registration of the license plates is made and the driver of the car receives a fine. The registration is performed in the police district where the pictures were originally taken; there, a digitized version of the processed film is received. The entire procedure is illustrated in Figure 1.2. Two persons are required for the manual registration of the license plates. The first person uses a software tool for optimizing the pictures, so that driver and plate appear clearly. Then the license plate is registered along with the speed of the vehicle for each picture. If the vehicle was not speeding, or if the face of the driver does not appear clearly, the picture is discarded. The second person verifies that the characters of the license plate have been correctly identified. If the vehicle is identifiable, a fine is issued to the offender. The entire process takes about 3-4 days from the time the picture is taken until the fine is received by the offender.

1.1.1

Disadvantages of the current system

Although the current system provides efficiency far superior to that of manually pulling over vehicles, there are certain aspects that prevent the system from being optimal. For one, the pictures are initially taken on regular film and digitized later. This means transport to and from the central laboratory, as well as the work required to digitize the pictures. Besides, the camera and speed measurement equipment is rather expensive: a fully equipped vehicle costs approximately 1.000.000 DKR [7], and compared to the initial cost of the vehicle, this is an extra 700.000 DKR for the speed control equipment itself. Apart from the cost of materials, the manual registration of license plates requires that the personnel are trained in the use of the previously mentioned software tool.

Figure 1.2: Workflow of current system

1.2

Improved system

This section introduces the solution we propose, describes its architecture, and shows how it differs from the existing system. Figure 1.3 shows the steps of the improved system. Basically, the system can be divided into two parts. The first involves monitoring the scene by recording a video sequence of the road. The second part is the automated analysis of the gathered information. The right-hand side of Figure 1.3 shows the first part, with a camera placed near the side of the road recording the cars passing by. This camera could either be placed temporarily as part of a time-limited surveillance post, or it could be permanently installed. The second part of the mode of operation is an off-line analysis of the video sequence, where potential traffic offenders are identified. This is done using image analysis on a computer, with the sequence as input.

Figure 1.3: Workflow of the improved system

The left-hand side of Figure 1.3 shows this part. The purpose of the image analysis is two-fold. First, it should be determined for each car in the sequence whether an actual traffic violation took place; this means determining the speed of the vehicles based on the video sequence. Secondly, any offenders should be identified by reading the license plates of their cars, so that they can later be held responsible. Developing reliable systems for extracting the information needed for speed estimation and vehicle identification is very important. In the case of estimating the speed of vehicles, the consequence of concluding that a vehicle drove too fast when in reality it did not is much greater than letting a real offender slip by. In the case of identifying license plates, a wrong conclusion is even worse, since it would mean that an innocent person is fined for the actions of someone else. Therefore, the algorithms must be designed to refuse to make such a decision if the estimated probability of being correct is below some predetermined threshold. An alternative would be a manual decision in cases of doubt. Since the current system requires a manual verification, it is reasonable to assume that the same person could perform this task, but verifying the computer's output instead of that of a co-worker.

1.3

The components of the system

Figure 1.4 shows the details of the latter of the two tasks described in Section 1.2. The first task, the recording of the video sequence, is not shown; the recorded video sequence is considered input to the system. For a brief introduction to digital cameras and some basic optics, Appendix A gives the reader an understanding of image-related concepts such as resolution, as well as some insight into the physical components of the camera.

Figure 1.4: The tasks of the system

Basically, the system consists of a sequence of six tasks that are ideally carried out once for each vehicle in the film. The first task is to extract the next subsequence in the film containing the next vehicle. This subsequence is then analyzed and the speed of the vehicle is determined. In the case where no speed violation took place, no further processing is necessary, and the next subsequence is fetched for processing. If a violation took place, the license plate of the vehicle must be identified. The part of the system responsible for this is shown in the right-hand side of Figure 1.4, in the dashed box. The input to this subsystem is a video sequence with a vehicle exceeding the speed limit. The subsystem consists of four tasks: the first task selects the optimal frame of the incoming video sequence, the second extracts the region believed to contain the license plate, the third isolates the seven characters, and the last identifies the individual characters.

Figure 1.5: Data abstraction level

Figure 1.5 shows the four tasks of the system. The right half of the figure shows the input and output of these steps, all the way from the input video sequence to the final identified characters of the license plate. Alternatively, this progression can be viewed as the reduction or suppression of unwanted information from the information-carrying signal: from a video sequence containing vast amounts of irrelevant information to abstract symbols in the form of the characters of a license plate.

1.3.1

Differences from the existing system

The architecture introduced above differs from the existing system in several aspects; the two most important differences are mentioned here. The first is that the manual identification of the vehicle's license plate has been eliminated and is now carried out automatically by a computer. The second is that the expensive still-frame camera and radar distance measurer have been replaced by a video camera, and therefore a later video processing step must be introduced for determining the speed of the vehicles. Table 1.1 shows a summary of the differences between the existing system and the system that will be investigated in this report.


Topic                  Current system   Improved system
Speed determination    Radar            Postprocessing step
Plate identification   Manually         Automatically

Table 1.1: Differences between the systems

1.3.2

Project focus

In this project we exclude the steps prior to extracting the license plate from the best-suited frame. The entire issue of whether or not the vehicles are speeding will not be addressed in this project. The remainder of this report is only concerned with developing algorithms for extracting the information needed to identify the characters. The input is an image of a vehicle assumed to be speeding.

Component                          In project
Speed
  - Recording video
  - Speed determination
Recognition
  - Selecting optimal frame
  - Extracting license plate       ✓
  - Isolating characters in plate  ✓
  - Identifying characters         ✓

Table 1.2: Areas of the improved system addressed in this project

Before summing up the requirements for the system, an essential component of the system, the Danish license plate, will be described in order to establish the characteristics that will later form the basis for important design choices.

1.4

License plates

The system for catching speed violators depends primarily on recognizing the front-side license plates. This section describes the characteristics of a standard Danish license plate, based on the reference guide for Danish vehicle inspection [2].

1.4.1

Dimensions

All license plates must be either rectangular or square. The dimensions of the plate can vary depending on the vehicle type, but the most common license plate is the rectangular one shown in Figure 1.6. The plate has the dimensions (w × h) 504 × 120 mm and has all seven characters written in a single line.

Figure 1.6: A standard rectangular Danish license plate

Some vehicles use 'square' plates with characters written in two lines, but according to the guideline for vehicle inspection [2], all front-side plates must be rectangular, and therefore this is the only license plate shape this project will focus on. Other license plate dimensions are allowed for other vehicle types such as tractors and motorbikes.

1.4.2

Layout

The front-side license plate on all Danish vehicles is either white enclosed by a red frame, or yellow enclosed by a black frame. All standard license plates have two capital letters ranging from A to Z, followed by five digits ranging from 0 to 9. Besides the standard license plate, there is an option to buy a so-called wish-plate, a customized license plate where the vehicle owner can freely choose from 1 to 7 letters or digits to fill the plate. All letters and digits are written in black, on both yellow and white license plates.
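As a concrete illustration of the standard format, the combination of two capital letters followed by five digits can be checked with a simple pattern. This is only an illustrative sketch (the regular expression and function name are not part of the report); customized wish-plates would not satisfy it.

```python
import re

# Standard Danish plate: two capital letters (A-Z) followed by five digits.
STANDARD_PLATE = re.compile(r"[A-Z]{2}[0-9]{5}")

def is_standard_plate(text: str) -> bool:
    """Return True for strings such as 'RL39312', False for wish-plates."""
    return STANDARD_PLATE.fullmatch(text) is not None
```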

1.4.3

Mounting and material

The front-side license plate must be mounted horizontally and in an upright position seen from the side of the car; in other words, the textual part must face directly forward. The plate may not be changed in shape, decorated or embellished, and it may not be covered by any means. Bolts used for mounting may not affect the readability of the plate and must be painted in the same color as the mounting point on the plate. The reference guide says nothing about the material, but generally the plates are made of a reflective material for increased readability.


1.5

System definition

The primary goal of this project is to investigate how the existing system for catching speed violators can be automated, using a video sequence from an inexpensive video camera. Once the video sequence has been recorded, the analysis can be divided into two major parts. The first part is to have the system determine the speed of the vehicle in question. If it is determined that a speed violation took place, the second part of the analysis is carried out: the recognition of the license plate. The extraction of the license plate, the isolation of the individual characters and the recognition of these characters are the topics of the remainder of this report. Given that the previous steps were carried out successfully, the system can then extract the name and address of the offender and issue a fine, completely without human interaction.

This project will focus on the design of algorithms used for extracting the license plate from a single image, isolating the characters of the plate and identifying the individual characters.

1.6

Delimitation

This section concludes the chapter by summing up the delimitations for the system. The delimitations are grouped by subject.

1.6.1

License plate formats

This project does not consider square or customized plates. The vast majority of license plates currently in use are of the standard rectangular shape and are not customized. In a full-blown implementation of the system all license plate types should of course be handled.

1.6.2

Video processing

The license plate recognition developed in this report operates on single images, whereas a full implementation would also need to extract a suitable frame from the input video sequence. No investigation is performed into either techniques for determining vehicle speed from a video sequence or the extraction of a suitable frame from the sequence.


1.6.3

Identification of driver

As mentioned in Section 1.1 the driver must be clearly identifiable before a fine can be issued. This report does not describe methods for automatically determining whether this condition holds, but only how to provide an output of the characters on the license plate.

1.6.4

Transportation of data

The question of how the film sequences are transferred to the processing computer before the analysis is beyond the scope of this project. It is however reasonable to imagine that permanent surveillance posts would have some sort of static network connection, whereas temporary posts would periodically require manual changes of the recording media. Alternatively some kind of wireless LAN could be employed.

1.6.5

Quality of decisions

As described in Section 1.2, a key element in making the proposed system useful is that in cases of doubt when identifying the characters, that is, when the probability of the best guess being correct is below some threshold, the system should refuse to make a decision. This issue is not addressed in this project; instead, the best guess is always used as the final result.



Part II Design




Chapter 2

License plate extraction

2.1

Introduction

Before isolating the characters of the license plate in the image, it is advantageous to extract the license plate. This chapter presents three different extraction strategies. First the theory behind each method is developed, then the strengths and weaknesses are summarized, and finally, in the last section of this chapter, the three strategies are combined. The strategies are Hough transform, template matching and region growing. The common goal of all these strategies is, given an input image, to produce a number of candidate regions which with high probability contain a license plate.

The purpose of this chapter is not to develop three competing methods, since it is unlikely that one method will prove better in all cases. Instead the goal is to have the strengths of each method complement the weaknesses of the other methods, and in this way improve the overall efficiency of the system. The methods presented in this chapter are all based on several assumptions concerning the shape and appearance of the license plate. The assumptions are listed in the following:

1. The license plate is a rectangular region of an easily discernible color.
2. The width-height relationship of the license plate is known in advance.
3. The orientation of the license plate is approximately aligned with the axes.
4. Orthogonality is assumed, meaning that a straight line is also straight in the image and not optically distorted.

The first two assumptions are trivially fulfilled, since the license plate is bright white or yellow and the size is known (see Section 1.4). The last two assumptions are based on the fact that the camera should always be aligned with the road. This is necessary in order to reduce perspective distortion. Several other objects commonly found in the streets of urban areas fit the above description as well, primarily signs of various kinds. This should be taken into account when choosing locations for placing the camera in the first place. Even if some of these objects are mistakenly identified, this is not a great problem, since later processing steps will cause them to be discarded. Throughout the chapter the same source image is used in all examples. This image is shown in Figure 2.1. The methods have been developed using 36 training images, and they can all be found on the attached CD-ROM along with 72 images used for testing.

2.2

Hough transform

This section presents a method for extracting license plates based on the Hough transform. As shown in Figure 2.2 the algorithm behind the method consists of five steps.


Figure 2.1: Source image

The first step is to threshold the gray-scale source image. Then the resulting image is passed through two parallel sequences in order to extract horizontal and vertical line segments, respectively. The first step in both of these sequences is to extract edges; the result is a binary image with edges highlighted. This image is then used as input to the Hough transform, which produces a list of lines in the form of accumulator cells. These cells are then analyzed and line segments are computed. Finally, the lists of horizontal and vertical line segments are combined, and any rectangular regions matching the dimensions of a license plate are kept as candidate regions. These regions are also the output of the algorithm.

2.2.1

Edge detection

Figure 2.2: Overview of the Hough transform method

Before computing the edges, the source image is thresholded. The choice of optimal threshold value is highly dependent on the lighting conditions present when the source images were taken; the value chosen in this project was based on the training images. The next step is to detect edges in the thresholded image. This operation in effect reduces the amount of information contained in the source image by removing everything but edges in either the horizontal or the vertical direction. This is highly desirable, since it also reduces the number of points the Hough transform has to consider. The edges are detected using spatial filtering. The convolution kernels used are shown in Equation (2.1). Since horizontal and vertical edges must be detected separately, two kernels are needed:

R_horizontal = [−1 1]ᵀ,   R_vertical = [−1 1]   (2.1)

The choice of kernels was partly based on experiments, and partly on the fact that they produce edges with the thickness of a single pixel, which is desirable input to the Hough transform. Figure 2.3 shows the two kernels applied to the inverted thresholded source image (the image is inverted to make it easier to show in this report). Since the two filters are high-pass filters which approximate the partial derivatives in either the horizontal or the vertical direction, a value of positive one in the resulting image corresponds to a transition from black to white, and a value of minus one corresponds to the opposite situation. Since it is of no importance which transition it is, the absolute value of each pixel is taken before proceeding.
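A minimal numpy sketch of this filtering step is shown below, assuming the thresholded input is a 2-D 0/1 array (the function name and array conventions are illustrative, not taken from the report). Convolving with the two one-pixel-wide kernels of Equation (2.1) reduces to differencing neighboring rows or columns, and the absolute value folds both transition directions together.

```python
import numpy as np

def edge_maps(binary: np.ndarray):
    """Apply the kernels of Equation (2.1) to a binary image and
    take absolute values, yielding one-pixel-thick edge maps."""
    img = binary.astype(np.int8)
    horizontal = np.zeros_like(img)  # horizontal edges: row-to-row transitions
    vertical = np.zeros_like(img)    # vertical edges: column-to-column transitions
    horizontal[1:, :] = np.abs(img[1:, :] - img[:-1, :])
    vertical[:, 1:] = np.abs(img[:, 1:] - img[:, :-1])
    return horizontal, vertical
```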


Figure 2.3: Horizontal and vertical edges

2.2.2

Line detection using Hough transform

The Hough transform [4] is a method for detecting lines in binary images. The method was developed as an alternative to the brute-force approach of finding lines, which is computationally expensive: to find all lines in an image with n points, one would form all lines between pairs of points and then check whether the remaining points were located near each line. Since there are roughly n(n − 1)/2 of these lines and roughly n comparisons per line, this is an O(n³) operation [4]. In contrast, the Hough transform performs in linear time. The Hough transform works by rewriting the general equation for a line through (xᵢ, yᵢ) as:

yᵢ = a xᵢ + b   (2.2)

⇔   b = −xᵢ a + yᵢ   (2.3)

For a fixed (xᵢ, yᵢ), Equation (2.3) yields a line in parameter space, and a single point on this line corresponds to a line through (xᵢ, yᵢ) in the original image. Finding lines in an image now simply corresponds to finding intersections between lines in parameter space. In practice, Equation (2.3) is never used, since the parameter a approaches ∞ as the line becomes vertical. Instead the following form is used:

x cos θ + y sin θ = ρ   (2.4)

In Equation (2.4) the θ parameter is the angle between the normal to the line and the x-axis and the ρ parameter is the perpendicular distance between the line and the origin. This is also illustrated in Figure 2.4. Also in contrast to the previous method, where points in the image corresponded to lines in

33


Chapter 2. License plate extraction

parameter space, in the form shown in Equation (2.4) points correspond to sinusoidal curves in the ρθ plane. (0, 0)

−π/2 y

θ1

0◦

π/2

θ

ρ1

(x2 , y2 ) Hough transform

(ρ1 , θ1 )

(x1 , y1 )

ρ

x

Figure 2.4: Example image and corresponding Hough transform

As shown in Figure 2.4, the range of θ is ± π2 and for an image with a √ resolution of w × h the range of ρ is 0 to w2 + h2 . Figure 2.4 also shows two points, (x1 , y1 ) and (x2 , y2 ), and their corresponding curves in parameter space. As expected, the parameters of their point of intersection in parameter space corresponds to the parameters of the dashed line between the two points in the original image. In accordance with the preceding paragraph, the goal of the Hough transform is to identify points in parameter space, where a high number of curves intersect. Together these curves then correspond to an equal amount of collinear points in the original image. A simple way to solve this problem is to quantize the parameter space. The resulting rectangular regions are called accumulator cells and each cell corresponds to a single line in the image. The algorithm behind the Hough transform is now straight forward to derive. First the accumulator array is cleared to zero. Then for each point in the edge image iterate over all possible values of θ and compute ρ using Equation (2.4). Finally for each computed (ρ, θ) value, increment the corresponding accumulator cell by one. Since the algorithm iterates over all points it is clear that it performs in O(n) time. For a general purpose implementation of the algorithm as outlined above, choosing a higher resolution for the accumulator array yields more precise results, but at the cost of increased processing time and higher memory requirements. This is not the case for the implementation used in this project, since a full sized accumulator array is never needed in practice. The reason

34


2.2 Hough transform

is that only horizontal and vertical lines are considered. This corresponds to computing the Hough transform only in a small interval around either 0 or π/2. Since only a single ‘column’ of the accumulator array is ever computed, this vastly decreases both the memory requirements and processing time of the algorithm. After the algorithm has iterated over all points, the accumulator cells contain the number of points which contributed to that particular line. Finding lines is then a matter of searching the accumulator array for local maxima.
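The following sketch illustrates this restricted transform under the assumptions above (names are hypothetical). Because only θ = 0 and θ = π/2 are evaluated, ρ reduces to the x- or y-coordinate of each edge point, and a dictionary of cells doubles as both the accumulator and the point bookkeeping needed for segment extraction in the next section.

```python
import math
from collections import defaultdict

def restricted_hough(edge_points, theta):
    """Hough transform evaluated at a single angle (0 for vertical
    lines, pi/2 for horizontal ones).  Maps rho -> contributing points;
    the cell count is simply the length of each list."""
    cells = defaultdict(list)
    for x, y in edge_points:
        rho = round(x * math.cos(theta) + y * math.sin(theta))
        cells[rho].append((x, y))
    return cells

# Lines correspond to local maxima of len(cells[rho]) over rho.
```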

2.2.3

Line segment extraction

The Hough transform in its original form, as proposed in [4], supports only line detection, as opposed to line segment detection. A slight modification of the algorithm makes it possible to extract line segments. The key is simply to remember which points in the original image contributed to which lines: instead of only storing the number of contributing points in the accumulator cells, references to the points are also kept. The downside to this procedure is that, in general, for an image with n points and a resolution in θ of r_θ, exactly n × r_θ references must be stored. As mentioned above, the transform is only computed for a single value of θ, so r_θ is always 1 in this project, which makes this fact less of a problem. Finding line segments is now a matter of iterating over each cell and searching for line segments. This is done by first sorting all the points according to their position along the line in question and then grouping points into line segments. Since it is known that the points are in order, grouping is most easily done by iterating over all points and, for each point, checking whether the next point in line is within some maximum distance. If it is, the line segment is extended to include the new point; otherwise a new line segment is started. In this project the maximum distance, or maximum gap length, was chosen as the value which gave the best results for the training images. Also, as some of the line segments are potentially very short, specifying a minimum line segment length is advantageous and helps reduce computation in later steps.

Figure 2.5: Example license plate and reconstructed line segments

Figure 2.5 shows an ideal case, where the longest line segments from the image have been extracted; the short line segments in the letters and in the top part have been eliminated. The first step is sorting the points along the line. This is done by first finding the parameterized equation of the line. A cell with parameters (ρᵢ, θᵢ) corresponds to a line with Equation (2.5):

(x, y)ᵀ = (cos(θᵢ + π/2), sin(θᵢ + π/2))ᵀ t + (ρᵢ cos θᵢ, ρᵢ sin θᵢ)ᵀ   (2.5)

Computing t for each point and sorting after t is now trivial. The result of this step is a list of line segments.
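Below is a sketch of the gap-based grouping, assuming the points of one cell have already been reduced to their scalar parameter t along the line (Equation (2.5)). The threshold names are illustrative; in the report both values were tuned on the training images.

```python
def extract_segments(t_values, max_gap, min_length):
    """Group sorted positions along a line into segments, starting a
    new segment whenever the gap to the next point exceeds max_gap,
    and discarding segments shorter than min_length."""
    ts = sorted(t_values)
    if not ts:
        return []
    segments, current = [], [ts[0]]
    for prev, nxt in zip(ts, ts[1:]):
        if nxt - prev <= max_gap:
            current.append(nxt)       # extend the current segment
        else:
            segments.append(current)  # gap too large: start a new segment
            current = [nxt]
    segments.append(current)
    return [(s[0], s[-1]) for s in segments if s[-1] - s[0] >= min_length]
```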

2.2.4

Candidate region extraction

The final step is to find the candidate regions, that is, the regions believed to contain license plates. The input to this step is a list of horizontal line segments and a list of vertical line segments. The procedure used is first to scan the lists for pairs of segments that meet the following requirements (a sketch of this pairing is given below):

1. They should start and end at approximately the same position.
2. The ratio between their average length and distance should equal that of a standard license plate.

The resulting two lists of regions, one with horizontal and one with vertical pairs, are then compared, and any region not contained in both lists is discarded. The remaining regions are the final candidate regions.
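The sketch below shows the pairing test for one of the two lists, here horizontal segments, where the length/distance ratio of a matching pair should approximate the 504/120 width-height ratio of a standard plate. The tolerances and the segment representation are assumptions made for the example.

```python
PLATE_RATIO = 504 / 120  # width/height of a standard Danish plate

def candidate_pairs(segments, align_tol=5, ratio_tol=0.2):
    """Find pairs of parallel horizontal segments that start and end at
    roughly the same position and are spaced like the top and bottom
    edges of a plate.  Each segment is given as (start, end, rho)."""
    pairs = []
    for i, (s1, e1, r1) in enumerate(segments):
        for s2, e2, r2 in segments[i + 1:]:
            distance = abs(r1 - r2)
            if distance == 0:
                continue
            if abs(s1 - s2) > align_tol or abs(e1 - e2) > align_tol:
                continue  # requirement 1: aligned start and end
            avg_len = ((e1 - s1) + (e2 - s2)) / 2
            if abs(avg_len / distance - PLATE_RATIO) <= ratio_tol * PLATE_RATIO:
                pairs.append(((s1, e1, r1), (s2, e2, r2)))  # requirement 2
    return pairs
```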

2.2.5

Strengths and weaknesses

This section discusses the advantages and disadvantages of the Hough transform method. The strengths are listed in Table 2.1 and the weaknesses in Table 2.2.

Scaling invariant: Since the algorithm does not look for regions of a particular size, it is invariant to scaling of the license plate.
Relatively independent of license plate color: As long as the license plate is brighter than the surroundings, the plate is usually correctly extracted.

Table 2.1: Strengths of the Hough transform method

Trouble detecting vertical lines: Vertical lines in the license plate are typically more than a factor of four shorter than the horizontal lines, and thus more susceptible to noise.
Finds more than just the license plate: All rectangular regions with dimensions equal to those of a license plate are identified, which is sometimes many regions. This makes it difficult to choose the correct candidate later.

Table 2.2: Weaknesses of the Hough transform method

2.3

Template matching

The main philosophy behind extraction through template matching is that, by comparing each portion of the investigated image to a template license plate, the actual license plate in the image is found as the region bearing the most resemblance to the template. A common way to perform this task is to use a cross correlation scheme.

2.3.1

Template construction

The way the template is constructed plays a significant role in the success of template matching. There are no fixed rules for template construction, but the template must be constructed in such a way that it has all the characteristics of a license plate as it is presented in the source image.

Image preprocessing

When examining the source images in the training set, it is quite obvious that there are two major differences between the plates. The plates vary in size, because the cars are at different distances, and they vary in light intensity, due to different lighting conditions when the images are taken. Both are issues that cannot be handled by the template construction. The size issue cannot be helped at all. This observation might very well turn out to be critical for this approach, but for now that fact will be ignored and the attention turned to the lighting issue. This can be helped by proper image preprocessing. A bit simplified, the goal is to make the plates look similar. This can be done by thresholding the image. The threshold value is not calculated dynamically, because the image lighting is unpredictable, and the different locations and colors of cars make it impossible to predetermine the black (or white) pixel percentage needed for dynamic thresholding. Therefore the value giving the best result in a number of test images is chosen. Figure 2.6 shows two images with very similar license plates but different overall lighting, to demonstrate that a dynamic threshold is hard to accomplish.

Figure 2.6: Images with different lighting

Also, a license plate has an edge, which in the processed image will appear solid black, and a bit thicker at the bottom. It is, however, not a good idea to add this feature to the template: even a small rotation of the license plate would make the correlation yield a big error.

Designing the template

A subtemplate for the letters can be constructed by adding layer upon layer containing the letters of the alphabet, and then merging the layers, giving each layer an opacity equal to the probability of the letter. In the same way a subtemplate can be made for the digits. Ignoring the diversity of the license plates on the road today and combination restrictions, each letter is given the same probability, as is each digit; assuming equal probabilities does not reduce the efficiency of the system. Doing so, we end up with a gray-scale template, as seen in Figure 2.7. A minimal sketch of this merge is shown below.
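The sketch assumes a hypothetical helper glyph() that renders one character as a binary image; with equal probabilities the weighted opacity merge is just the per-pixel mean of the layers.

```python
import numpy as np
import string

def merge_templates(glyph_images):
    """Merge equally probable binary glyph images (arrays of identical
    shape) into one gray-scale subtemplate by averaging."""
    stack = np.stack([np.asarray(g, dtype=np.float64) for g in glyph_images])
    return stack.mean(axis=0)

# Hypothetical usage, with glyph(c) rendering character c:
# letter_template = merge_templates([glyph(c) for c in string.ascii_uppercase])
# digit_template  = merge_templates([glyph(c) for c in string.digits])
```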


Figure 2.7: Template created through opacity-merging of the probabilities. The left hand side shows a letter template, and the right hand side a digit template

2.3.2

Cross correlation

One of the more common ways to calculate the similarity between two images is to use cross correlation. Cross correlation is based on a squared Euclidean distance measure [5] of the form:

d²_{f,t}(u, v) = Σ_x Σ_y [f(x, y) − t(x − u, y − v)]²   (2.6)

Equation (2.6) is an expression for the sum of squared distances between the pixels in the template and the pixels in the image covered by the template. The value calculated represents that of pixel (u, v) in the correlated image, as shown in Figure 2.8. The pixels near the edges are ignored, as the correlation is only carried out when the entire template can be correlated. A correlation near the edges would mean that a pseudo-pixel value would have to be assigned to the area under the template that exceeds the image; this could for instance be the average value of the rest of the region. This has not been done, because a license plate exceeding the image is useless in terms of recognition, so there is no point in spending computing power on these regions.

Figure 2.8: The layout of the cross correlation

Expanding the expression for the Euclidean distance, d², produces:

d²_{f,t}(u, v) = Σ_x Σ_y [f²(x, y) − 2 f(x, y) t(x − u, y − v) + t²(x − u, y − v)]   (2.7)

Examining the expansion, it is noted that the term Σ_x Σ_y t²(x − u, y − v) is constant, since it is the sum of the squared pixel values of the entire template. Assuming that the term Σ_x Σ_y f²(x, y) can be regarded as constant as well means assuming that the light intensity of the image does not vary in regions the size of the template over the entire image. This is useful, because based on this assumption, the remaining term, as expressed in Equation (2.8), becomes a measure of the similarity between the image and the template. This measure of similarity is called the cross correlation.

c(u, v) = Σ_x Σ_y f(x, y) t(x − u, y − v)   (2.8)

The assumption on which the validity of the measure is based is, however, somewhat frail. The term Σ_x Σ_y f²(x, y) is only approximately constant for images in which the image energy varies only slightly, and in most images this is not the case. The effect of this is that the correlation value might be higher in bright areas than in areas where the template is actually matched. Also, the range of the measure is totally dependent on the size of the template. These issues are addressed in the normalized cross correlation.
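A direct sketch of Equation (2.8) is shown below (valid positions only, matching the border handling described above). Gray-scale images are assumed to be 2-D numpy arrays; the double loop costs roughly N²(M − N + 1)² multiplications for an M × M image and N × N template, as counted in the next section.

```python
import numpy as np

def cross_correlation(f: np.ndarray, t: np.ndarray) -> np.ndarray:
    """c(u, v) from Equation (2.8): slide the template over the image
    and sum the products, skipping positions where the template would
    stick out of the image."""
    th, tw = t.shape
    out = np.zeros((f.shape[0] - th + 1, f.shape[1] - tw + 1))
    for v in range(out.shape[0]):
        for u in range(out.shape[1]):
            out[v, u] = np.sum(f[v:v + th, u:u + tw] * t)
    return out
```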


2.3.3

Normalized cross correlation

The expression for calculating the normalized cross correlation coefficient, γ(u, v), is shown in Equation (2.9). It handles the issue of bright areas by normalizing the measure, dividing by the square root of the product of the summed squared mean deviations [5][6]. Without this normalization the range is dependent on the size of the template, which seriously restricts the areas in which template matching can be used. Also, the mean values of the image regions are subtracted, corresponding to the cross covariance of the images [6]. Often the template will need to be scaled to a specific input image, and without normalization the output can take on any size depending on the scaling factor.

γ(u, v) = Σ_x Σ_y [f(x, y) − f̄_{u,v}] [t(x − u, y − v) − t̄] / √( Σ_x Σ_y [f(x, y) − f̄_{u,v}]² · Σ_x Σ_y [t(x − u, y − v) − t̄]² )   (2.9)

Here f̄_{u,v} is the mean value of the image pixels in the region covered by the template, and t̄ is the mean value of the template. The value of γ lies between −1 and 1, where −1 is the value of a reversed match and 1 occurs at a perfect match; the value approaches 0 when there is no match. As can be seen from the expression for the cross correlation coefficient (Equation (2.9)), it is a computationally expensive task: the coefficient has to be calculated for each pixel in the output image. Assuming an image of size M², a template of size N², and not including the normalization (only the numerator of Equation (2.9)), the calculation involves approximately N²(M − N + 1)² multiplications and the same number of additions [5].¹

¹ For all the complexity evaluations it should be noted that these are estimates, which vary depending on the method of implementation.
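The normalized coefficient can be sketched in the same style (again an illustrative implementation, not the report's own code). Note how both the template and each image window have their means subtracted before the normalization by the two root-sum-of-squares terms of Equation (2.9).

```python
import numpy as np

def normalized_cross_correlation(f: np.ndarray, t: np.ndarray) -> np.ndarray:
    """gamma(u, v) from Equation (2.9), with values in [-1, 1]."""
    th, tw = t.shape
    t0 = t - t.mean()
    t_norm = np.sqrt(np.sum(t0 ** 2))
    out = np.zeros((f.shape[0] - th + 1, f.shape[1] - tw + 1))
    for v in range(out.shape[0]):
        for u in range(out.shape[1]):
            win = f[v:v + th, u:u + tw]
            w0 = win - win.mean()
            denom = np.sqrt(np.sum(w0 ** 2)) * t_norm
            out[v, u] = np.sum(w0 * t0) / denom if denom > 0 else 0.0
    return out
```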

2.3.4 Strengths and weaknesses

Strength: Single and simple similarity measure
Explanation: Template matching is a strong approach for finding a single similarity measure between two images. Through a simple off-the-shelf algorithm this measure can be easily calculated with good results, without an investigation of the specifics of the region sought after. Often, using a sample region will be sufficient to identify similar regions.

Table 2.3: Strengths of the template method

¹ For all the complexity evaluations it should be noted that these are estimates, which vary depending on the method of implementation.

Weakness: Slow algorithm
Explanation: A large input image and a smaller template will make performing the simple calculations in the many nested summations a demanding task.

Weakness: Not invariant to rotation and perspective distortion
Explanation: If the region sought after is rotated or distorted in the input image, the region may very well bear little or no resemblance to the template on a pixel by pixel basis. This means the similarity measurement will fail.

Weakness: Not invariant to scaling
Explanation: Scaling of input images proves to be an unsurpassable problem. It is an impossible task to examine the input image using all possible sizes for the template, and even the smallest variation in size will often lead to a wrong result.

Weakness: Static threshold
Explanation: The images vary a great deal in overall brightness, depending on the surroundings.

Table 2.4: Weaknesses of the template matching method

2.4 Region growing

This section describes another method for finding uniform regions (such as a license plate) in an image. The basic idea behind region growing is to identify one or more criteria that are characteristic for the desired region. Once the criteria have been established, the image is searched for any pixels that fulfill the requirements. Whenever such a pixel is encountered, its neighbors are checked, and if any of the neighbors also match the criteria, both pixels are considered as belonging to the same region. Figure 2.9 visualizes these steps. The criteria can be static pixel values, or depend on the region that is being expanded.

Figure 2.9: The steps of region growing (get the next pixel in the image; if it fulfils the requirements, grow to include matching neighbors for as long as expansion is possible; finally mark the found pixels as a region)

2.4.1 License plate features

When looking at a license plate, the most obvious feature that sets it apart from its surroundings is its brightness. Since nearly all Danish license plates

are made from a reflecting material (see Section 1.4), they will usually appear brighter than the rest of the vehicle. Exceptions occur with white cars, but in general, looking at the brightness is an effective way of finding possible candidates for license plates. Of course, it has to be considered that the characters inside the plate are black. This is helpful when a license plate has to be distinguished from e.g. other parts of a white vehicle. License plates also have well defined dimensions. Combining the two features above, all bright rectangular regions with a certain height-width ratio should be considered candidates for license plates.

2.4.2 Preprocessing

Before searching through the image for any pixels bright enough to be part of a license plate, the image is prepared by converting it to binary. This is done since color plays no role when looking for bright pixels. In performing this conversion, the threshold value is critical as to whether or not the license plate can be distinguished from its surroundings. On one hand, the threshold




should be low enough to convert all pixels from the license plate background into white; on the other hand, it should be chosen high enough to convert as much of the other parts of the image as possible into black. Figure 2.10 shows an example of such a binary image. The license plate appears as a bright rectangle, but there are several other white regions.

Figure 2.10: Binary image

A problem arises since the overall brightness of the images is not known in advance, and it is therefore reasonable to select a relatively low threshold to guarantee that the background of the license plate is always converted into white. Other criteria, such as the ratio mentioned above, will then have to help in selecting the most probable candidate for a license plate from among the list of white regions.

2.4.3 Extracting the regions

When using the region growing concept to extract regions with similar features, the most intuitive approach is to use recursion. As described in Section 2.4, each pixel is compared to the criteria (in this case only brightness), and whenever a pixel fulfills the requirements, all of its neighbors are also compared to the criteria. This method requires that all pixels that have been examined are marked, to prevent the same region from being extracted multiple times. This can be achieved merely by setting the value of all previously found pixels to black. The nature of the region growing algorithm does not guarantee that a region satisfies the condition of being a rectangle with certain dimensions. There are several solutions to this problem:




- A region is transformed into the largest contained rectangle within its area. This method is susceptible to noise, since a single black pixel along an otherwise white line of pixels could prevent the rectangle from reaching the optimal size (see Figure 2.11).

Figure 2.11: Largest contained rectangle

- A region is transformed into a rectangle, based on the maximum and minimum values of the pixels it contains. This method reduces the effect of noise, but introduces the possibility of mistaking arbitrary white forms for a white rectangle. It guarantees, however, that the entire license plate will be contained in a single region, even though the plate is not completely horizontal (see Figure 2.12).

Figure 2.12: Rectangle from maximum and minimum values

Since the method of transforming a region into the largest contained rectangle is susceptible to noise, it is reasonable to assume that the second method will give the best results. An enhancement to the algorithm can be achieved by setting a dynamic criterion for when a neighboring pixel belongs to the same region. Instead of a




static threshold dividing the picture into black and white pixels, the criterion could be that the neighboring pixel must not differ by more than a given margin in brightness. This would mean that license plates partly covered in shadow could be seen as a coherent region, but it also introduces the risk of letting the license plate region expand beyond the real license plate, if the border is not abrupt.
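A minimal Python sketch of the non-recursive variant, using an explicit queue and the rectangle-from-extremes strategy; the height-width ratio limits are illustrative placeholders, not the values used in the report:

    from collections import deque

    def grow_regions(binary, min_ratio=3.0, max_ratio=6.0):
        # binary: 2-D list of 0/1 pixels (1 = bright). An explicit queue
        # replaces recursion to keep memory use bounded.
        h, w = len(binary), len(binary[0])
        seen = [[False] * w for _ in range(h)]
        boxes = []
        for y in range(h):
            for x in range(w):
                if not binary[y][x] or seen[y][x]:
                    continue
                q = deque([(x, y)])
                seen[y][x] = True
                xmin = xmax = x
                ymin = ymax = y
                while q:
                    cx, cy = q.popleft()
                    xmin, xmax = min(xmin, cx), max(xmax, cx)
                    ymin, ymax = min(ymin, cy), max(ymax, cy)
                    for nx, ny in ((cx+1,cy),(cx-1,cy),(cx,cy+1),(cx,cy-1)):
                        if 0 <= nx < w and 0 <= ny < h \
                           and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((nx, ny))
                # keep candidates whose width/height ratio fits a plate
                rw, rh = xmax - xmin + 1, ymax - ymin + 1
                if min_ratio <= rw / rh <= max_ratio:
                    boxes.append((xmin, ymin, xmax, ymax))
        return boxes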

2.4.4 Strengths and weaknesses

Tables 2.5 and 2.6 summarize the most important characteristics of the region growing method.

Strength: Fast algorithm
Explanation: Each pixel is examined no more than once for each neighbor. This implies an O(n) algorithm.

Strength: Invariant to distance between camera and vehicle
Explanation: The method extracts candidates with the correct shape; it does not depend on the size of regions.

Strength: Resistant to noise
Explanation: The region is expanded to the largest possible rectangle based on maximum and minimum values.

Table 2.5: Strengths of the region growing method

Weakness: High demands for memory
Explanation: The recursive nature of the algorithm stores temporary results for each call to the recursive function.

Weakness: Static threshold
Explanation: The images vary a great deal in overall brightness, depending on the surroundings.

Table 2.6: Weaknesses of the region growing method

2.5 Combining the methods

In order to select which methods to use and how to combine them, each method must be evaluated against the demands set by the system.




2.5.1 Scale invariance

The size of the license plate in the image varies, and the extraction method must therefore be invariant to those changes. The Hough transformation is only concerned with the horizontal and vertical lines in the image, and the ratio between them. Region growing finds a bright region, and again it is basically the ratio between width and height that is evaluated. Therefore both the Hough transformation and the region growing method are, as they should be, invariant to changes in license plate size. Template matching is not scale invariant: since it is based on a pixel by pixel comparison, any change in size makes the result unpredictable.

2.5.2 Rotation invariance

To reduce the number of needed calculations, the Hough transform was designed to find horizontal and vertical lines, based on the assumption that the license plate is aligned with the axes of the image. Because of this, a slightly rotated license plate will not be extracted. Region growing finds the bright areas regardless of orientation, and a small rotation does not change the height-width ratio noticeably. This means that the method is capable of extracting rotated plates. Template matching is also seriously affected by rotation: if the plate is rotated even slightly, the outcome of the template matching is, as was the case with scaling, unpredictable.

2.5.3 Lighting invariance

The surroundings and weather conditions make it difficult to use any form of dynamic thresholding when converting the image into binary. The success rates of both template matching and region growing suffer because of it. The Hough transform is, however, much less sensitive to changes in the lighting conditions under which the picture was taken. The reason is that while both template matching and region growing threshold the image with a predetermined value, the Hough transformation is applied to an image on which an edge detection has been performed. As long as the lines in the image are detectable by the edge detection algorithm, the Hough transformation does not care how bright or dark the image is.




2.5.4 Summarizing the differences

Area                  Hough transform   Region growing   Template matching
Scale invariance      ✓                 ✓                ✗
Lighting invariance   ✓                 ✗                ✗
Rotation invariance   ✗                 ✓                ✗

Table 2.7: Demands for license plate extraction method

It cannot be guaranteed that the images are taken at the same distance from each vehicle. Thus the criteria must be invariant with respect to the distance between camera and vehicle; in other words, it is crucial that the method is scale invariant. Due to the scaling problem, template matching will not be able to extract license plates single-handedly. It might, however, be very useful as an extension to the other methods examined in this chapter, because it can aid in the evaluation of the intermediate results. In real life, the assumption that license plates are aligned with the axes might not hold. It might not be possible to place the camera under optimal conditions, resulting in perspective and/or rotation distortion. Therefore a more robust system, with some invariance to these factors, would be desirable. Here the region growing method seems like a good choice. But since the system will be expected to function in all weather conditions, it would be preferable that the method be as insensitive to changes in lighting conditions as possible. The Hough transform has that quality to some extent. Combining the two methods ideally makes sure the system finds the plate under all conditions. Both methods will produce a set of possible regions, which serves as input to a method that is to determine the most probable candidate.

2.5.5 Candidate selection

As mentioned, the output from region growing and the Hough transform consists of a list of regions, of which one hopefully contains the license plate. In the system, three steps are performed in order to select the best candidate.

Correlation: First, correlation is used to sort out regions that bear no resemblance to license plates. This step removes regions such as bright areas of the sky.




Peak-and-valley: This method is designed to sort out any uniform regions, such as pieces of the road. It works by examining a horizontal projection of the candidate region. In this projection, the number of sudden transitions from a high number of black pixels to a low number, and vice versa, is counted. If this number is below an experimentally determined threshold, the region cannot contain a license plate.

Height-width ratio: Although sometimes a region with the license plate will contain some edges around the actual plate, the ratio provides a means of sorting out a lot of regions that could not possibly contain a license plate. If more than one region passes all three criteria, the one with the best ratio is selected.
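A Python sketch of the peak-and-valley test; the projection threshold and the minimum transition count are illustrative values, since the report only states that the threshold was determined experimentally:

    def peak_and_valley(binary_region, min_transitions=10, threshold=0.5):
        # binary_region: 2-D list of 0/1 pixels (1 = black). A region with
        # fewer high/low transitions than min_transitions in its column
        # projection is too uniform to contain a license plate.
        proj = [sum(col) for col in zip(*binary_region)]  # black per column
        peak = max(proj) * threshold if proj else 0
        high = [c > peak for c in proj]
        transitions = sum(1 for a, b in zip(high, high[1:]) if a != b)
        return transitions >= min_transitions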

Figure 2.13: Selection of the most probable candidate

These methods have been designed to complement each other, so that various types of regions can be sorted out from true license plates. Figure 2.13 illustrates how the three steps each sort out a certain type of region. The first region from the left will not pass the correlation test, since the black pixels do not cover the correct areas. When resized, both regions 2 and 3 pass this test, but region 2 is too uniform to pass the peak-and-valley test. Now only region 3 and the real license plate are left, and the height-width ratio easily determines which is the better alternative. The performance of the combined methods, as well as the order in which they should be applied, will be examined in Section 5.4.

2.6 Summary

This chapter introduced three methods for finding candidate regions for the license plate. While region growing and the Hough transform seem to be viable algorithms, the correlation scheme suffers from a scaling problem, which prevents it from being a real alternative. Also, it was demonstrated how the selection of the most probable candidate takes place, with three different methods that complement each other's weaknesses. One of these was correlation, which proved to be a good discriminator when sorting out regions that do not contain a license plate.



Chapter 3

Character isolation

(Processing overview: Single image → Extract license plate → Subimage containing only the license plate → Isolate characters → 7 images with the characters → Identify characters → The seven characters of the license plate)

3.1 Introduction

To ease the process of identifying the characters, it is preferable to divide the extracted plate into seven images, each containing one isolated character. This chapter describes several methods for the task of isolating the characters. Since no color information is relevant, the image is converted to binary colors before any further processing takes place. Figure 3.1 shows the ideal process of dividing the plate into seven images, each containing a character.




Figure 3.1: Isolating characters

3.2 Isolating the characters

There are several approaches for finding the character bounds, all with different advantages and disadvantages. The following sections describe three different approaches, and how these methods are combined to provide a very robust isolation.

3.2.1 Static bounds

The simplest approach is to use static bounds, assuming that all characters are placed approximately at the same place on every license plate as shown in Figure 3.2. The license plate is simply divided into seven parts, based upon statistical information on the average character positions.

Figure 3.2: Example showing statically set character bounds

The advantage of this method is its simplicity, and that its success does not depend on the image quality (assuming that the license plate extraction is performed satisfactorily). Its weakness is the fairly high risk of choosing wrong bounds, thereby making the identification of the characters difficult. This risk grows with any inaccuracy in the license plate extraction output.




The first character on the lowest plate in Figure 3.2 shows an example output from the plate isolation, where a portion of the mounting frame was included. Another weakness is that the method only separates the characters instead of finding the exact character bounds.

3.2.2 Pixel count

Instead of using static bounds, a horizontal projection of a thresholded plate often reveals the exact character positions, as demonstrated in Figure 3.3. The approach is to search for changes from valleys to peaks, simply by counting the number of black pixels per column in the projection. A change from a valley to a peak indicates the beginning of a character, and vice versa.

Figure 3.3: Horizontal projections of the actual characters

When the license plate extraction succeeds and the frame has been removed, this method is very useful, since it is independent of the character positions. The downside is that it is very dependent upon image quality and the result of the license plate extraction.
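A Python sketch of the valley-to-peak search on the column projection; min_black is an illustrative noise floor, not a value from the report:

    def character_bounds(binary_plate, min_black=1):
        # binary_plate: 2-D list of 0/1 pixels (1 = black). Returns the
        # (start, end) column of each run of columns containing characters.
        proj = [sum(col) for col in zip(*binary_plate)]
        bounds, start = [], None
        for x, count in enumerate(proj):
            if count >= min_black and start is None:
                start = x                      # valley -> peak: character begins
            elif count < min_black and start is not None:
                bounds.append((start, x - 1))  # peak -> valley: character ends
                start = None
        if start is not None:
            bounds.append((start, len(proj) - 1))
        return bounds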

3.2.3 Connected components

Another method is to search for connected components in the image. The easiest way to perform this task is on a binary image, using simple morphology functions. The method for extracting connected components in a binary image is to search for a black pixel. When such a pixel is found, it is assumed to be part of a component, and it therefore forms the basis for an iterative process based on Equation (3.1) [4]:

X_k = (X_{k-1} \oplus B) \cap A, \quad k = 1, 2, 3, \ldots    (3.1)

where Xk represents the extracted component, A is the source image and B is a structuring element of size 3×3 indicating 8-connectivity neighboring.




X_0 is the first black pixel, from which the iteration starts. The algorithm begins by creating a new image X_0 containing only this pixel, illustrated as the black structure in Figure 3.4. In the first step of the iterative algorithm, X_0 is dilated using the structuring element B. This means that the area of interest is expanded from the first black pixel to all of its neighbors. The outcome X_1 of the first iteration is then the overlap between the dilation of X_0 and the original image A, found by an AND operation. In other words, all the neighbors of the first black pixel that are also black belong to the component searched for. In the next iteration X_1 is dilated, and the result is again the overlap between the dilation and the original image. This iterative process continues until the resulting component equals the previous one, X_k = X_{k-1}. The first two iterations of the process are illustrated in Figure 3.4.

Figure 3.4: An iteration in the process of finding connected components (legend: pixels in the image with value '1'; connected pixels found (X_k); result of dilation; structuring element B)
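Equation (3.1) maps directly onto standard morphology routines. A sketch using SciPy's binary_dilation; the use of SciPy and the seed/image layout are our assumptions, not the report's implementation:

    import numpy as np
    from scipy.ndimage import binary_dilation

    def extract_component(A, seed):
        # A: boolean image (True = character pixel); seed: (row, col) of
        # the first black pixel found. B is the 3x3 structuring element,
        # giving 8-connectivity.
        B = np.ones((3, 3), dtype=bool)
        X = np.zeros_like(A)
        X[seed] = True
        while True:
            X_next = binary_dilation(X, structure=B) & A   # (X ⊕ B) ∩ A
            if (X_next == X).all():                        # X_k == X_{k-1}
                return X
            X = X_next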

When a connected component has been found, it is tested whether it fulfills the size specifications for a character, thereby sorting out unwanted parts, for example bolts or dirt. The advantages of this method are that it is independent of the image rotation and that the bounds found are very precise. Its disadvantages are that it requires good image quality and a good conversion from the original image to the binary image, to avoid making two or more characters appear as one connected region. Small gaps in the characters can also cause the method




to fail.

3.2.4 Improving image quality

Variations in image quality and license plate conditions can affect all of the methods mentioned previously. The following sections describe how the image quality can be improved.

Isolating the plate

The image received from the extraction often contains more than just the license plate, for example the mounting frame as in Figure 3.1. This frame can be removed by further isolation of the actual license plate. The isolation of the license plate from any superfluous background is performed using a histogram that reflects the number of black pixels in each row and in each column. In most cases, projecting the amount of black pixels both vertically and horizontally reveals the actual position of the plate. An example of the projections is shown in Figure 3.5. A simple but effective method for removing the frame is based on the assumption that the vertical projection has exactly one wide peak, created by the rows of the characters. Therefore the start of the widest peak in the vertical projection must be the top of the characters, and the end of the peak the bottom of the characters. It is expected that the horizontal projection has exactly seven wide peaks and eight valleys (one before each character, and one after the last character).

Figure 3.5: Horizontal and vertical projections

The success of this method depends on the assumption that the plate is horizontal. In some cases, for example if the angle of the plate is too large, the method will leave a small part of the frame behind in parts of the image; depending on the thickness of the remaining frame, it may still be possible to separate the characters.




Dynamic threshold

If the original image is dark or the license plate is dirty, the binary image created from the standard threshold value can be very dark and filled with unwanted noise. There are various methods to eliminate this problem; here, the threshold used when the original image is converted from color to binary is calculated dynamically, based on the assumption that an ideal license plate image on average contains approximately 69% white/yellow pixels and 31% black pixels, including the frame¹. The idea is first to make the image binary. Second, the ratio between black and white pixels is calculated and compared with the expected value. Then a new threshold value is selected and the original image is converted again, until a satisfactory ratio has been achieved. Although this method is affected when the edges of the plate have not been cut off, it is far more reliable than setting a static threshold.

Removing small objects

A different approach is to remove the unwanted pixels from the binary image. Knowing that we are looking for seven fairly large regions of black pixels, it can be useful to perform a number of erosions on the image before searching for the bounds, thereby removing unwanted objects, for example bolts.
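A Python sketch of the dynamic threshold search; the 69% target comes from the report, while the tolerance, step count, and the binary search are our own illustrative choices:

    def dynamic_threshold(gray, target_white=0.69, tol=0.05, steps=20):
        # gray: 2-D list of intensities in [0, 255]. Searches for a
        # threshold giving roughly the expected white/black pixel ratio.
        pixels = [p for row in gray for p in row]
        lo, hi = 0, 255
        t = 128
        for _ in range(steps):
            t = (lo + hi) // 2
            white = sum(p >= t for p in pixels) / len(pixels)
            if abs(white - target_white) <= tol:
                break
            if white > target_white:
                lo = t + 1        # too much white -> raise the threshold
            else:
                hi = t - 1        # too little white -> lower it
        return t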

3.2.5 Combined strategy

                 Static bounds   Pixel count   Connected components
Extra edges      ✗               ✓             ✓
Bad threshold    ✓               ✗             ✗
Noise            ✓               ✓             ✗

Table 3.1: Strengths of different methods

To achieve the best possible isolation, a combined strategy can be used to improve the probability of success. Table 3.1 shows how the three methods complement each other. While static bounds are immune to threshold values and image noise, they are very dependent upon whether all edges have been eliminated. On the other hand, when using pixel count the edges can easily be distinguished from characters, but the threshold value is crucial to its success. Finally, finding the connected components is susceptible to noise as well as a bad threshold, although the method is less likely to be disturbed, since its weaknesses towards threshold and noise are rather small.

¹ Based on the mean value of black pixels, calculated from the training set images.




The table should demonstrate that it is possible to combine the three methods so that none of the three potential difficulties remains a problem. Figure 3.6 shows how the different methods are combined, and the method is summarized below:

- First, convert the image into binary colors using a dynamic threshold to achieve the best result.
- Try the connected components method to see if seven good components can be found.
- If the first attempt is unsuccessful, try to isolate the plate by cutting away edges, and try the connected components method again.
- If still unsuccessful, try the pixel count method to search for the character bounds on the horizontal projection.
- If this method also fails, use static bounds as the final option.

Figure 3.6: Combination of character isolation procedures (input from license plate isolation → dynamic threshold → segment using connected components; on failure, cut away edges, isolate and segment using connected components; on failure, segment using pixel count; on failure, segment using static bounds → output from character isolation)

Connected components is always performed as the first attempt, since it has a very good probability of success if the input image does not contain too much noise in the form of edges or similar black regions. If the connected components method fails, even after an attempt to cut away the edges, pixel count is performed as the first alternative. The static bounds are not utilized until all other options have been exhausted. The reason is that the static bounds rely heavily on there being no vertical edges, and on the plate being totally aligned with the axes.
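The cascade of Figure 3.6 can be expressed compactly in Python; every segmenter named here is a hypothetical parameter assumed to return seven character images, or None on failure:

    def isolate_characters(plate_binary, by_components, cut_edges,
                           by_pixel_count, by_static_bounds):
        # Fallback cascade: connected components first, then pixel count,
        # and static bounds only as the last resort.
        chars = by_components(plate_binary)
        if chars is None:
            trimmed = cut_edges(plate_binary)   # isolate the plate itself
            chars = by_components(trimmed)
            if chars is None:
                chars = by_pixel_count(trimmed)
                if chars is None:
                    chars = by_static_bounds(trimmed)   # final option
        return chars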




3.3 Summary

In this chapter, several approaches to isolating the characters from an input image containing the entire license plate were described. None of the methods is capable of providing reliable results on its own, due to the varying input image quality, but a combination of the methods ensures a very robust isolation scheme. The results of the combination can be seen in Section 6.8.



Chapter 4

Character identification

4.1 Introduction

(Processing overview: Single image → Extract license plate → Subimage containing only the license plate → Isolate characters → 7 images with the characters → Identify characters → The seven characters of the license plate)

After splitting the extracted license plate into seven images, the character in each image can be identified. Identifying the character can be done in a number of ways, and this chapter addresses methods for doing so. First, a solution based on the previously discussed template matching (Section 2.3) will be presented. Thereafter, a method based on statistical pattern recognition will be introduced, and a number of features used by this method will be described. Then an algorithm for choosing the best features, called SEPCOR, will be presented. After finding a proper set of features, a means of selecting the most probable class is necessary. For this purpose, Bayes decision




rule will be introduced, along with theory on discriminant functions and parameter estimation. Finally, the methods will be compared against each other, and it will be examined whether there is anything to be gained from combining the two.

4.2 Template matching

This section presents how the isolated characters can be identified by calculating the normalized correlation coefficient. It was previously investigated how template matching could help in the extraction of the license plates from the input images, and the theory behind template matching can therefore be reused. The conclusion then was that it was unable to perform the task by itself, but was useful in combination with other methods. Searching through a large image with a small template could mean running the algorithm hundreds of thousands of times, whereas if a single similarity measurement of two same-sized images is needed, the algorithm only has to run once. The idea behind an implementation of a correlation based identification scheme is simple. Two template pools are constructed: one consisting of all the possible values for the letters, and one of all the values of the digits. Once the license plate has been cut into the seven characters, each image containing a single character is evaluated in turn in the following way. The normalized correlation coefficient between the image of the character and each template in the appropriate pool is computed. The template that yields the highest coefficient indicates which character is depicted in the input image.
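A minimal Python sketch of this selection step; templates is assumed to be a dict mapping each candidate character to a template image of the same size as the input, and similarity could be the ncc sketch from Section 2.3.3 evaluated at offset (0, 0):

    def identify(char_img, templates, similarity):
        # Return the character whose template correlates best with the
        # input image; e.g. similarity = lambda a, b: ncc(a, b, 0, 0).
        return max(templates, key=lambda c: similarity(char_img, templates[c]))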

4.3 Statistical pattern recognition

Having looked at identifying characters using template matching, the focus is now turned to statistical pattern recognition. Several methods for pattern recognition exist, but one of the more classical approaches is statistical pattern recognition, and this approach has therefore been chosen for identifying the characters. For simplicity, the focus will only be on the images containing digits, but the methods apply equally to the letters. When using statistical pattern recognition, the first task is to perform feature extraction, where the purpose is to find certain features that distinguish the different digits from each other. Some of the features that are found might be correlated, and an algorithm called SEPCOR can therefore be used to minimize the number of redundant features. When an appropriate number of




uncorrelated features have been found, a way of deciding which class an observation belongs to is needed. For this purpose Bayes decision rule will be introduced, and finally an expression for the classification will be proposed.

4.4 Features

All of the features are extracted from a binary image, because most of the features require this. The area and the circumference of the digits are the simplest features to extract. These features are not sufficient, because different digits can have approximately the same area and circumference, e.g. the digits '6' and '9'. To distinguish in such cases, the number of endpoints in the upper half and lower half of the image is taken into account. Here it is assumed that the digit '6' has one endpoint in the upper half and zero endpoints in the lower half, and that the endpoints of the digit '9' are the reverse of a '6'. To distinguish between the digits '0' and '8', it is not possible to use the endpoint feature, because they both have zero endpoints. Here the number of connected compounds is an appropriate feature: the number of compounds in the digit '8' is three, whereas a '0' has only two compounds. Furthermore, the value of each pixel is chosen as a feature. A final feature is the area of each row and column, meaning the number of black pixels in each of the horizontal or vertical lines in the image. The features are listed below:

- Area
- Area for each row
- Area for each column
- Circumference
- End points in upper/lower half
- Compounds
- The value of each pixel

Most of the features require that the characters have the same size in pixels, and therefore the images with the characters are initially normalized to the same height. The height is chosen because there is no difference between the heights of the characters, whereas the widths of the different digits differ; e.g. a '1' and an '8' have the same height but not the same width. The following sections describe the methods for extracting each feature.




4.4.1 Area

When calculating the area of a character, assuming background pixels are white and character pixels are black, the number of black pixels is counted. In the area for each row, the number of black pixels in every horizontal line is counted. As seen in Figure 4.1, these projections are distinct for many of the digits. The horizontal projections are more similar in structure, with either a single wide peak, or a peak at the beginning and end of the digit. Although more similar, they will still be used to distinguish between the digits.

Figure 4.1: Projections of all the digits
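These projections reduce to simple pixel counts; a minimal Python sketch, assuming a binary character image where 1 means black:

    def projection_features(char_img):
        # char_img: 2-D list of 0/1 pixels (1 = black).
        rows = [sum(r) for r in char_img]           # area for each row
        cols = [sum(c) for c in zip(*char_img)]     # area for each column
        return [sum(rows)] + rows + cols            # total area first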

4.4.2 End points

The number of end points is also a feature that distinguishes some of the digits from each other. It is also clear that the locations of these end points vary, and this can help in distinguishing between the digits. A '6', for instance, has only a single end point in the upper half, and none in the lower. Although the positions also vary horizontally, there would be problems when deciding whether the end points of a '1' are in the left or the right half of the image. Therefore, only the vertical position of the end points is considered in the implementation of the system. The number of end points is found from a skeleton of the character. To obtain the skeleton a thinning algorithm is used, and the result is a one pixel wide skeleton. The end points of the character are the pixels that have only one connected 8-neighbor. The thinning algorithm [4] iteratively deletes¹ the edges of the character until a one pixel wide skeleton is reached. The algorithm must not delete end points and must not break connections. Breaking connections would result in additional end points, and this is therefore quite an important demand on the algorithm. In order to obtain the skeleton of the character, the algorithm is divided into two parts. The first part deletes edges in the east, the south, or the northwest corner. The second part deletes edges in the west, the north, or the southeast corner. During the processing of the two parts, pixels that satisfy the conditions listed below are flagged for deletion. The deletion is not applied until the entire image has been processed, so that it does not affect the analysis of the remaining pixels. The first part of the algorithm deletes pixels if all of the following conditions are satisfied:

(a) 2 ≤ N(p1) ≤ 6
(b) Z(p1) = 1
(c) p2 · p4 · p6 = 0
(d) p4 · p6 · p8 = 0

N(p1) is the number of neighbors of p1 (depicted in Figure 4.2) with the value one; this condition ensures that end points (N(p1) = 1) and non-border pixels (N(p1) > 6) are not deleted. Z(p1) is the number of 0-1 transitions in the ordered sequence p2, p3, p4, . . . , p8, p9, p2. An example of this can be seen in Figure 4.3, where Z(p1) = 2: this is a one pixel wide connection, and p1 should not be deleted, because doing so would break the connection; because of condition (b), pixel p1 is indeed not deleted. Conditions (c) and (d) ensure that the pixel is placed in the east, the south, or the northwest corner, see the gray pixels in Figure 4.4b.

¹ In this situation deleting should be understood as setting the pixel to the background value.

Figure 4.2: The neighbors of p1 (3×3 layout, rows top to bottom: p9 p2 p3 / p8 p1 p4 / p7 p6 p5)

The second part of the algorithm deletes pixels if conditions (a) and (b), combined with (c') and (d'), are satisfied:

(c') p2 · p4 · p8 = 0
(d') p2 · p6 · p8 = 0




Figure 4.3: Example where the number of 0-1 transitions is 2, Z(p1) = 2 (neighborhood rows: 0 1 0 / 0 p1 0 / 0 1 0)

Condition (c’) and (d’) satisfy that the pixel is placed in west, north or the southeast corner, see the gray pixels in Figure 4.4c. An example of using the thinning algorithm is depicted in Figure 4.4. The gray pixels in are those marked for deletion. a) shows the original images, b) is the result of applying part 1 one time and c) is the output of using part 2 once, d) is part 1 once more, and e) is the final result of the thinning algorithm.

Figure 4.4: The steps of the thinning algorithm, panels (a)-(e) (legend: pixels in the image with value '1'; pixels marked for deletion)

One iteration of the algorithm consists of:

1. Applying part one, flagging pixels for deletion.
2. Deleting the flagged pixels.
3. Applying part two, flagging pixels for deletion.
4. Deleting the flagged pixels.

The algorithm terminates when no further pixels are flagged for deletion. When using the thinning algorithm, even small structures are present in the final skeleton of the image; an example of this can be seen in Figure 4.5.
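One pass of the two-part procedure as a Python sketch; the neighbor layout follows Figure 4.2, and skipping the outer one-pixel border is our implementation assumption:

    def thinning_pass(img, first_part=True):
        # img: mutable 2-D list of 0/1 (1 = character pixel). Flags pixels
        # satisfying conditions (a)-(d) (or (a),(b),(c'),(d')), then
        # deletes them all at once. Returns True if anything was deleted.
        h, w = len(img), len(img[0])
        flagged = []
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                if not img[y][x]:
                    continue
                p2, p3, p4 = img[y-1][x], img[y-1][x+1], img[y][x+1]
                p5, p6, p7 = img[y+1][x+1], img[y+1][x], img[y+1][x-1]
                p8, p9 = img[y][x-1], img[y-1][x-1]
                seq = [p2, p3, p4, p5, p6, p7, p8, p9, p2]
                N = sum(seq[:8])                                  # (a)
                Z = sum(a == 0 and b == 1
                        for a, b in zip(seq, seq[1:]))            # (b)
                if first_part:
                    corner = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0  # (c),(d)
                else:
                    corner = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0  # (c'),(d')
                if 2 <= N <= 6 and Z == 1 and corner:
                    flagged.append((y, x))
        for y, x in flagged:
            img[y][x] = 0    # deletion = set to the background value
        return bool(flagged)

The full algorithm alternates the two parts until neither flags a pixel, e.g. while thinning_pass(img, True) | thinning_pass(img, False): pass.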




Figure 4.5: Result of the thinning algorithm (with a connection point marked)

Normally a '0' does not have any end points, but in this case it gets one. The way of finding end points is therefore slightly modified: structures that are less than 3 pixels long before reaching a connection point are not taken into account. With this modification, the digit '0' in Figure 4.5 has no end points, as expected.

4.4.3 Circumference

Another possible feature to distinguish between characters is their circumference. As can be seen in Figure 4.6, a '2' has a somewhat larger circumference than a '1'.

Figure 4.6: Two digits with different circumferences; the circumference shown is 100 for the '1' and 174 for the '2'

To find the circumference of a character, the pixels on the outer edge are counted. This is done by traversing the outer edge of the character and counting the number of pixels until the start position has been reached. The circumference is very dependent upon the selected threshold value, but relatively resistant to noise.

4.4.4 Compounds

Each character consists of one or more connected compounds. A compound is defined as an area that contains pixels with similar values. Since the image is binary, a compound consists of either ones or zeros. The background is only




considered a compound if it appears as a lake within the digit. Figure 4.7 shows that a ‘0’ has two compounds, while a ‘1’ has only one.

Figure 4.7: (a) shows the two compounds in a '0', and (b) the one in a '1'

To find the number of compounds, a Laplacian filter is applied to the character. The Laplacian operator is defined as [4]:

L[f(x,y)] = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}    (4.1)

This leads to the following digital mask:

0  1  0
1 -4  1
0  1  0

Convolving the original binary image of the character with this mask, the result is an image with a black background and the edges represented by a one pixel thick white line, see Figure 4.8.

Figure 4.8: Result of applying a Laplacian filter to a '3'; (a) is the original and (b) is the resulting image

The number of compounds is then found by counting the number of connected white lines. Since the Laplacian filter is very sensitive to noise[4], white lines with lengths below a certain threshold are discarded. As an alternative way of finding connected compounds, the method used when isolating the characters, described in Section 3.2.3, could also be used.
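Applying the digital mask is a plain convolution; a Python sketch using SciPy (an assumption on our part, the report does not name a library):

    import numpy as np
    from scipy.ndimage import convolve

    def laplacian_edges(char_img):
        # char_img: 2-D 0/1 array. Nonzero response marks the one-pixel
        # edges; counting the connected white lines gives the compounds.
        mask = np.array([[0,  1, 0],
                         [1, -4, 1],
                         [0,  1, 0]])
        return convolve(char_img.astype(int), mask, mode='constant') != 0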



4.5 Feature space dimensionality

4.4.5 The value of each pixel

The value of each pixel is used as a feature. In this way, the feature based identification resembles correlation based identification, in that the Euclidean distance between a sample and a class is calculated. The difference is that in the template based identification the correlation measure is normalized, which it is not in the feature based identification. Additionally, the feature based identification uses more features, such as the number of end points. In both cases, the distance is measured from a mean calculated from the training set.

4.5 Feature space dimensionality

Having introduced a number of features, it is interesting to see how they complement each other. If two features are redundant, one of them may be removed without decreasing the efficiency of the system. For this purpose, the SEPCOR algorithm is presented. SEPCOR is an abbreviation of SEParability and CORrelation, and its purpose is to reduce the dimensionality of the feature space with the least possible loss of information. Before explaining the steps of the SEPCOR algorithm, a measure of variability is defined:

V = \frac{\text{variance of class mean values}}{\text{mean value of class variances}}    (4.2)

The V-value characterizes how distinct a feature is: a higher V-value means a better, more distinct feature. This is because the variance of the class mean values tells how far the mean values of the different classes are from each other. The further they are from each other, the easier the classes are to distinguish, and thus the numerator is large. The mean value of the class variances is a measure of the spread within a given class. If this is a small value, the class is not spread over a wide area, and thus it is easier to distinguish, hence the denominator would be small. In short, the bigger the V-value, the better the feature. Knowing the V-value of all the features, the SEPCOR algorithm can be carried out. The steps of the SEPCOR algorithm are:

1. Create a list of all features, sorted by their V-value.
2. Repeat:
   (a) Remove and save the feature with the largest V-value.




   (b) Calculate the correlation coefficient between the removed feature and the remaining features in the list.
   (c) Remove all features with a correlation greater than some threshold.

The algorithm runs until a desired number of features have been removed from the list, or the list is empty. The correlation coefficient mentioned in step 2(b) is calculated as shown in Equation (4.3):

c = \left| \frac{\sigma_{ij}}{\sqrt{\sigma_{ii} \cdot \sigma_{jj}}} \right|    (4.3)

The correlation coefficient is actually the normalized correlation coefficient as presented in Section 2.3.2. Here σ_ij simply corresponds to the (i, j) entry in the covariance matrix, which is the covariance between features i and j, and σ_ii is the variance of feature i (see Section 4.6).
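A Python sketch of the V-value of Equation (4.2) and the SEPCOR loop; the array layout (samples by features) is our assumption, and zero-variance features (such as the compound count) would need special handling before np.corrcoef:

    import numpy as np

    def v_value(samples_by_class):
        # One feature's V: variance of class means over mean class variance;
        # samples_by_class holds one 1-D array of feature values per class.
        class_means = [s.mean() for s in samples_by_class]
        class_vars = [s.var(ddof=1) for s in samples_by_class]
        return np.var(class_means, ddof=1) / np.mean(class_vars)

    def sepcor(features, v_values, max_corr):
        # features: (n_samples, n_features) matrix pooled over all classes.
        corr = np.abs(np.corrcoef(features, rowvar=False))
        remaining = sorted(range(len(v_values)), key=lambda i: -v_values[i])
        selected = []
        while remaining:
            best = remaining.pop(0)          # step 2(a): largest V-value
            selected.append(best)
            # steps 2(b)-(c): drop features too correlated with the pick
            remaining = [i for i in remaining if corr[best, i] <= max_corr]
        return selected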

4.5.1 Result using SEPCOR

The number of features picked by the SEPCOR algorithm depends on how the threshold for the correlation coefficient, c, is set. This threshold is hereafter referred to as the maximum correlation. Its range is from 0 to 1, where a threshold of 1 would yield the maximum number of features, since even features that correlate completely will be used. Figure 4.9 shows how the number of selected features rises as the maximum correlation approaches 1. When a maximum correlation of 1 is used, 306 features are picked. This is the maximum number of features that can be picked, and it includes all the features mentioned in Section 4.4. The number of features to be used to identify the characters will be discussed in Chapter 7.

4.6 Decision theory

After having acquired a sample in the form of a feature vector with values for all the previously described features, the sample must be classified as belonging to one of the ten different classes. To do this, a statistical classifier is used. In this section, the Bayes classifier will be presented and evaluated as a discriminant function used to divide the feature space. The use of the Mahalanobis distance measure as an alternative to the Euclidean distance is based on the assumption that the data set is normally distributed. Why this is a necessary assumption will be discussed, and whether or



4.6 Decision theory

Figure 4.9: Number of features selected as a function of the maximum correlation coefficient (shown for both the training set and the test set)

not the data set used in this project is normally distributed will be investigated through various tests. When using the Bayes classifier it is necessary to know the class conditional probability density functions. Since the distribution of the data is assumed normal, this means that the parameters of this particular distribution have to be estimated. A method for doing this is also introduced.

4.6.1 Bayes decision rule

First, the Bayes classifier is described:

P(\omega_j \mid \bar{x}) = \frac{p(\bar{x} \mid \omega_j) \cdot P(\omega_j)}{p(\bar{x})}    (4.4)

in which

p(\bar{x}) = \sum_{j=1}^{s} p(\bar{x} \mid \omega_j) \cdot P(\omega_j)    (4.5)

x̄ is the feature vector of the sample being investigated. A finite set of s classes is defined as Ω = {ω_1, ω_2, . . . , ω_s}. In this case, the classes represent the different characters that can occur on a license plate. Similarly, a set of actions is declared as A = {α_1, α_2, . . . , α_s}. These actions can be described as the




event that occurs when a character is determined to be, e.g., of class ω_2; this implies assigning that character value to the given sample. The loss incurred by performing the wrong action, meaning performing α_i when the correct class is ω_j, is denoted λ(α_i | ω_j). The total loss from taking the wrong action can then be described as:

R(\alpha_i \mid \bar{x}) = \sum_{j=1}^{s} \lambda(\alpha_i \mid \omega_j) \cdot P(\omega_j \mid \bar{x})    (4.6)

where P(ω_j | x̄) is the probability that ω_j is the true class given x̄. This can be used when having to make decisions: one is always interested in choosing the action that minimizes the loss. R(α_i | x̄) is called the conditional risk. What is wanted is a decision rule, α(x̄), that determines the action to be taken on a given data sample. This means that for a given x̄, α(x̄) evaluates to one of the actions in A. In doing so, the action that leads to the smallest risk should be chosen, in accordance with Equation (4.6).

Minimum-error-rate classification

As already stated, it is desirable to choose the action that reduces the risk the most. It can also be said that if action α_i corresponds to the correct class ω_j, then the decision is correct for i = j, and incorrect for i ≠ j. The symmetrical loss function is built on this principle:

\lambda(\alpha_i \mid \omega_j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases} \qquad i, j = 1, \ldots, s    (4.7)

If the decision made is correct, there is no loss. If it is not, a unit loss is incurred; thus all errors cost the same. The average probability of error is the same as the risk of the loss function just mentioned, because the conditional risk evaluates to:

R(\alpha_i \mid \bar{x}) = 1 - P(\omega_i \mid \bar{x})    (4.8)

where P(ω_i | x̄) is the conditional probability that α_i is the correct action[9]. So, in order to minimize the probability of error, the class i that maximizes P(ω_i | x̄) should be selected, meaning that ω_i should be selected if P(ω_i | x̄) > P(ω_j | x̄) for i ≠ j. This is the minimum-error-rate classification.

If the decision made is correct, there is no loss. If it is not, a unit loss is achieved, thus all errors cost the same. The average probability of error is the same as the risk of the loss function just mentioned, because the conditional risk evaluates to: R(αi | x̄) = 1 − P (ωi | x̄) (4.8) where P (ωi | x̄) is the conditional probability that αi is the correct action[9]. So, in order to minimize the probability of error, the class i that maximizes P (ωi | x̄) should be selected, meaning that ωi should be selected if P (ωi | x̄) > P (ωj | x̄), when i 6= j. This is the minimum-error-rate classification.

4.6.2 Discriminant functions

Classifiers can be used to divide the feature space into decision regions. One way of partitioning is the use of discriminant functions, where a discriminant




function g_i(x̄) for each class is defined. The classifier then assigns vector x̄ to class ω_i if g_i(x̄) > g_j(x̄) for all j ≠ i. An example of dividing the feature space into decision regions, which accomplishes the minimum error-rate, can be seen in Figure 4.10.

Figure 4.10: Decision regions

The feature vector x̄ is assigned to the class corresponding to the region in which the vector is placed. A Bayes classifier can also be represented in this way, where g_i(x̄) = P(ω_i | x̄). The maximum discriminant function will then correspond to the a posteriori² probability, see Equation (4.4). When looking at the a posteriori probability, the discriminant function can also be expressed as g_i(x̄) = p(x̄ | ω_i)P(ω_i), because the denominator is a normalizing factor and therefore not important in this context. The Bayes classifier is therefore simply determined by p(x̄ | ω_i)P(ω_i). The most commonly used density function is the multivariate normal density function, because of its analytical qualities [9]. The univariate normal density function is given by:

p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right]    (4.9)

It is specified by two parameters, the mean µ and the variance σ², written p(x) ∼ N(µ, σ²). Normally distributed samples tend to cluster about the mean within 2σ. In the multivariate normal density function, the mean is instead represented as a vector, and the variance is represented as a matrix. The multivariate normal density function is given by:

p(\bar{x}) = \frac{1}{(2\pi)^{d/2} |\bar{\bar{\Sigma}}|^{1/2}} \exp\left[-\frac{1}{2}(\bar{x} - \bar{\mu})^t \bar{\bar{\Sigma}}^{-1} (\bar{x} - \bar{\mu})\right]    (4.10)

where x̄ is a column vector with d entries, µ̄ is the mean vector, also with d entries, and Σ is a d × d matrix called the covariance matrix.

² Knowledge not known in advance; knowledge based on observations.




Equation (4.10) can be abbreviated as p(x̄) ∼ N(µ̄, Σ), where the mean and the covariance are calculated by:

\bar{\mu} = E[\bar{x}]    (4.11)

\bar{\bar{\Sigma}} = E[(\bar{x} - \bar{\mu})(\bar{x} - \bar{\mu})^t]    (4.12)

The covariance is calculated for each component in the vectors; x_i corresponds to the i'th feature of x̄, µ_i is the i'th component of µ̄, and σ_ij is the i-j'th component of Σ. The entries in the mean and in the covariance matrix are calculated from:

\mu_i = E[x_i]    (4.13)

\sigma_{ij} = E[(x_i - \mu_i)(x_j - \mu_j)]    (4.14)

With n samples in the observation, these become:

\mu_i = \frac{1}{n} \sum_{k=1}^{n} x_{ik}    (4.15)

\sigma_{ij} = \frac{1}{n-1} \sum_{k=1}^{n} (x_{ik} - \mu_i)(x_{jk} - \mu_j)    (4.16)
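Equations (4.15) and (4.16) are the standard sample estimates; a Python sketch for one class, assuming a (samples × features) array:

    import numpy as np

    def estimate_parameters(X):
        # X: (n, d) array of n feature vectors with d features.
        mu = X.mean(axis=0)                 # Equation (4.15), per feature
        Sigma = np.cov(X, rowvar=False)     # Equation (4.16), divides by n-1
        return mu, Sigma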

In the covariance matrix, the diagonal entries σ_ii are the variances of the features, and the off-diagonal entries σ_ij are the covariances of features i and j. Samples drawn from a normally distributed population tend to gather in a single cloud. The center of such a cloud is determined by the mean vector, and the shape depends on the covariance matrix. The interpretation of the covariance matrix and the shape of the clouds can be divided into five different cases. For simplicity, the examples have only two features.

Case 1

\bar{\bar{\Sigma}} = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}    (4.17)

If the covariance matrix equals the identity, the cloud is circular, as depicted in Figure 4.11, where the axes represent the two features.




Figure 4.11: The covariance matrix equals the identity.

Figure 4.12: The features are statistically independent.

Case 2 and 3

\bar{\bar{\Sigma}} = \begin{bmatrix} \sigma_{11} & 0 \\ 0 & \sigma_{22} \end{bmatrix}    (4.18)

If all the off-diagonal entries are 0, then the features are uncorrelated and thus statistically independent, and the cloud is formed as an ellipse whose axes are the eigenvectors of the covariance matrix, with lengths given by the corresponding eigenvalues³. In these cases the ellipse is aligned with the feature axes. If σ_11 > σ_22 the major axis is parallel with the axis of feature one, and if σ_11 < σ_22 the minor axis is parallel with the axis of feature one. Figure 4.12 shows the two cases.

Case 4 and 5

\bar{\bar{\Sigma}} = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix}    (4.19)

If σ_12 = σ_21 is positive, it is called positive correlation. An example of this could be the features area and circumference: if the area increases, the circumference usually also increases, and therefore there is a positive correlation between the two features. Negative correlation is the opposite, where σ_12 = σ_21 is negative. The orientation of the clouds in the two situations is depicted in Figure 4.13.

³ In an n × n matrix there are n eigenvectors and n eigenvalues.

Figure 4.13: Positive and negative correlation.

In the case of more than two features, the clouds are hyper-ellipsoids instead, where the principal axes again are given by the eigenvectors, and the eigenvalues determine the lengths of these axes. On the hyper-ellipsoids the quadratic form

r^2 = (\bar{x} - \bar{\mu})^t \bar{\bar{\Sigma}}^{-1} (\bar{x} - \bar{\mu})    (4.20)

is constant and is sometimes called the squared Mahalanobis distance from x̄ to µ̄ [9]. To achieve the minimum error-rate, the classification can be performed using discriminant functions for the normal density. Since the discriminant function equals g_i(x̄) = p(x̄ | ω_i)P(ω_i), it can be rewritten as:

g_i(\bar{x}) = \log p(\bar{x} \mid \omega_i) + \log P(\omega_i)    (4.21)

When combining this with the normal density function, under the assumption that p(x̄ | ω_i) ∼ N(µ̄_i, Σ_i), then [9]:

g_i(\bar{x}) = -\frac{1}{2}(\bar{x} - \bar{\mu}_i)^t \bar{\bar{\Sigma}}_i^{-1} (\bar{x} - \bar{\mu}_i) - \frac{d}{2}\log 2\pi - \frac{1}{2}\log|\bar{\bar{\Sigma}}_i| + \log P(\omega_i)    (4.22)

Here (d/2) log 2π is a class independent normalizing factor, log P(ω_i) is the class probability, and |Σ_i| is the determinant of the covariance matrix. The digits appearing on the license plate are assumed to have the same probability, meaning P(ω_i) is the same for all classes, and the term log P(ω_i) can therefore be omitted. In this project, when calculating the determinant of the covariance matrix, it often nearly equaled zero because the matrix was ill-conditioned, meaning that the condition number is too large. This can be seen by calculating the



4.6 Decision theory

condition number for the matrices, which for all matrices was greater than 10²⁰. The condition number of a matrix A is defined as the product of the norm of A and the norm of A⁻¹. Under the assumption that the classes have the same probability, and since the determinant of the covariance matrix nearly equals zero, the classification is instead done simply on the basis of the squared Mahalanobis distance, as expressed in Equation (4.20). As part of computing the squared Mahalanobis distance, the inverse covariance matrix must be computed. As stated above, this was made difficult by the fact that these matrices were often ill-conditioned, and traditional methods for computing the inverse matrix fail in these special cases. An alternative to computing the traditional inverse is to compute the pseudo-inverse. The pseudo-inverse is defined in terms of singular value decomposition and extends the notion of inverse matrices to singular matrices[10]. As an alternative distance measure, the Euclidean distance can be used. The squared Euclidean distance is defined as[4]:

d^2 = (\bar{x} - \bar{\mu})^t (\bar{x} - \bar{\mu})    (4.23)

When using the Euclidean distance, the inverse covariance matrix is replaced with the identity matrix. This means that, unlike the Mahalanobis distance, the Euclidean distance is not a weighted distance.
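A Python sketch of the resulting classifier; np.linalg.pinv computes the pseudo-inverse mentioned above, while the class labels and data layout are our assumptions:

    import numpy as np

    def classify(x, means, covariances):
        # Pick the class with the smallest squared Mahalanobis distance,
        # Equation (4.20); means/covariances are per-class estimates from
        # the training set, in class order.
        best, best_d2 = None, np.inf
        for label, (mu, Sigma) in enumerate(zip(means, covariances)):
            diff = x - mu
            d2 = diff @ np.linalg.pinv(Sigma) @ diff
            if d2 < best_d2:
                best, best_d2 = label, d2
        return best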

4.6.3 Test of normal distribution

In order to use the discriminant function method presented in the previous section, it is necessary to test whether or not the observations are normally distributed, because the method requires data to be so. Three methods have been used to determine if the data is normally distributed. First, the observations are plotted in a histogram, and it is visually determined whether they are approximately normally distributed. Second, Matlab can be used to display a normal probability plot of the data, again with visual inspection used to determine the similarity with the normal distribution. The third method is a goodness of fit test, where a value is calculated to determine if the data is normally distributed. In this section the circumference of the number '0' is used as an example for the three methods. Ideally all features should be investigated, and those that are not approximately normally distributed should be ignored. The histogram plot is depicted in Figure 4.14, and it is seen that the graph approximately resembles a normal distribution.




Figure 4.14: Histogram plot showing approximately a normal distribution

The Matlab plot displays true normal distributions as linear curves, and so it is expected that the data from the circumference is approximately linear. Figure 4.15 shows that this is almost the case.

Figure 4.15: Normal probability plot produced in Matlab

The goodness of fit test can be based on the χ² (chi-square) distribution [1]. The observations made for a single class, e.g. the number '0', are grouped into k categories. The observed frequencies of sample data in the k categories are then compared to the expected frequencies of the k categories, under the assumption that the population has a normal distribution. The measure of the difference between the observed and the expected frequencies in all categories is calculated




by:

\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}    (4.24)

where o_i is the observed frequency for category i, e_i is the expected frequency for category i, and k is the number of categories. The χ²-test requires the expected frequencies to be five or more for all categories. When using the χ²-test to check whether or not the data is normally distributed, the data is assumed to be defined by the mean µ and the variance σ². The null and alternative hypotheses are based on these assumptions:

H_0: The population has a normal probability distribution, N(µ, σ²).
H_a: The population does not have a normal probability distribution.

The rejection rule is: reject H_0 if χ² > χ²_α, where α is the level of significance, with k − 3 degrees of freedom. When defining the categories, the starting point is the standard normal distribution, N(0, 1), and the strategy is then to define approximately equally sized intervals. Here k = 6 categories are created, with the limits for the intervals defined as ]-∞;-1], ]-1;-0.44], ]-0.44;0], ]0;0.44], ]0.44;1] and ]1;∞[. The intervals and the areas of probability are depicted in Figure 4.16. The degrees of freedom can then be calculated as D_freedom = k − 3 = 3.

Figure 4.16: Standard normal distribution, with the 6 categories and the corresponding areas of probability

The observations made from the training set are used to estimate the mean circumference, µ, and the standard deviation, s, of the normal distribution:

\mu = \frac{\sum x_i}{n} = \frac{11360}{38} = 298.95    (4.25)




s=

rP

(xi − µ)2 = n−1

r

3977.87 = 10.37 37

(4.26)

The limits for the categories can then be calculated on the basis of the defined intervals, the mean and the standard deviation:

x_1 = 298.95 + (−1) · 10.37 = 288.58
x_2 = 298.95 + (−0.44) · 10.37 = 294.39
x_3 = 298.95 + (0) · 10.37 = 298.95
x_4 = 298.95 + (0.44) · 10.37 = 303.51
x_5 = 298.95 + (1) · 10.37 = 309.32

The observed frequencies of the data, o_i, are summed, and the expected frequencies, e_i, for each interval are computed by multiplying the sample size with the corresponding area of probability, in this case (0.1587, 0.1713, 0.17, 0.17, 0.1713, 0.1587). The observed and expected frequencies can be seen in Table 4.1. The χ² test statistic can then be computed as stated in Equation (4.24).

Interval        0→288   289→294   295→298   299→303   304→309   ≥310
Observed (o)    4       6         10        6         6         6
Expected (e)    6.03    6.51      6.46      6.46      6.51      6.03
(o − e)         −2.03   −0.51     3.54      −0.46     −0.51     −0.03

Table 4.1: Class distribution

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i} = 2.74 < \chi^2_{0.10} = 6.251$$
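For reference, the same goodness of fit computation expressed as a small Python sketch (SciPy assumed available; the frequencies are those of Table 4.1):

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([4, 6, 10, 6, 6, 6])                   # Table 4.1
areas = np.array([0.1587, 0.1713, 0.17, 0.17, 0.1713, 0.1587])
expected = 38 * areas                                      # sample size n = 38

chi_sq = np.sum((observed - expected) ** 2 / expected)     # ~2.74
critical = chi2.ppf(1 - 0.10, df=len(observed) - 3)        # ~6.251 for 3 d.o.f.

print("Reject H0" if chi_sq > critical else "Do not reject H0")
```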

The null hypothesis, H0, is therefore not rejected at a 10 % level of significance with 3 degrees of freedom. The results for the other features were not always as close to a normal distribution as desired. A feature such as the number of compounds is very stable, and does not have any variance whatsoever. Other features have seemingly random distributions, which to a high extent could be caused by the relatively small amount of training data. Although some of the features are not normally distributed, the majority are. Therefore, in this project it is assumed that all features are normally distributed, and hence we can apply Bayes decision rule. Section 4.6.4 will discuss methods for estimating the parameters that fit best, under the assumption that the data is normally distributed.


4.6.4 Parameter estimation

In the general case of using Bayes classification it is necessary to know both the a priori probabilities and the class conditional probability density functions. The problem is not the a priori probabilities, since these are usually easy to obtain, but the class conditional probability density functions. In practice, exact specifications of these functions are never directly available. Instead, what is available is a number of design samples which should be used for training the system, plus evidence that these samples have some known parametric distribution. It is assumed that the a priori probabilities of all the classes are equal. This reduces the problem of finding the class conditional probability density functions to a matter of estimating the parameters of a multivariate normal distribution, that is the mean µ̄ and the square covariance matrix Σ̄.

One method for parameter estimation is investigated. This method is called maximum likelihood estimation and takes a number of preclassified sets of samples X0 . . . X9 as input. These samples are images of each of the numbers from the ten different classes. Since the sets are preclassified, this procedure is a kind of supervised learning.

Maximum likelihood estimation

Given the design samples described above, maximum likelihood estimation works by estimating the ten sets of parameters which maximize the probability of obtaining the design samples. If the ten sets of parameters are the vectors θ̄0 . . . θ̄9, each with d = f + f² elements for f features, corresponding to a mean vector and the covariance matrix, then the class conditional probability density function, which is to be maximized, can be written as being explicitly dependent on the parameter vectors as p(x̄ | ω_j; θ̄_j). Considering each class separately reduces the problem to estimating the parameters of ten independent distributions. If, for instance, the sample set X consists of the n samples X = {x̄1, x̄2, . . . , x̄n}, then, since the samples were drawn independently, the probability of the sample set given a fixed parameter vector θ̄ can be expressed as the product [9]:

$$p(X \mid \bar\theta) = \prod_{k=1}^{n} p(\bar x_k \mid \bar\theta) \tag{4.27}$$

Equation (4.27), viewed as a function of θ̄, is called the likelihood of θ̄. The global maximum of this function, say θ̂, is the value which maximizes Equation (4.27) and which makes p(x̄ | ω_j; θ̄_j) the best fit for the given sample set. Instead of the likelihood itself, its logarithm is maximized:

$$l(\bar\theta) = \log p(X \mid \bar\theta) = \sum_{k=1}^{n} \log p(\bar x_k \mid \bar\theta) \tag{4.28}$$

If θ̄ = {θ1, θ2, . . . , θd}ᵗ, then Equation (4.28) is a function of exactly d variables.⁴ Finding the critical points analytically can now be done by locating the points where all the partial derivatives are zero. One of these points is guaranteed to be the global maximum. Equation (4.29) shows the gradient operator for θ̄:

$$\nabla_{\bar\theta} = \begin{bmatrix} \partial/\partial\theta_1 \\ \partial/\partial\theta_2 \\ \vdots \\ \partial/\partial\theta_d \end{bmatrix} \tag{4.29}$$

All critical points of Equation (4.28), including θ̂, are then solutions to:

$$\nabla_{\bar\theta}\, l = \nabla_{\bar\theta} \sum_{k=1}^{n} \log p(\bar x_k \mid \bar\theta) = 0 \tag{4.30}$$

This is in general a set of d non-linear equations. As an example, Equation (4.30) is solved in its simplest case below for d = 2, which corresponds to a single feature. In this case the parameter vector θ̄ has θ1 = µ and θ2 = σ². The first task is to substitute θ1 for µ and θ2 for σ² in Equation (4.9) and take the logarithm as in Equation (4.28):

$$l(\bar\theta) = \log p(X \mid \bar\theta) \tag{4.31}$$

$$= \sum_{k=1}^{n} \log p(x_k \mid \bar\theta) \tag{4.32}$$

$$= \sum_{k=1}^{n} \log\left[ \frac{1}{\sqrt{2\pi}\sqrt{\theta_2}} \exp\left( -\frac{1}{2}\left(\frac{x_k - \theta_1}{\sqrt{\theta_2}}\right)^{2} \right) \right] \tag{4.33}$$

$$= \sum_{k=1}^{n} \left[ -\frac{1}{2}\log 2\pi\theta_2 - \frac{(x_k - \theta_1)^2}{2\theta_2} \right] \tag{4.34}$$

⁴ The reason for taking the logarithm is that it eases some algebraic steps later on (this is clearly legal, since the logarithm is a monotonically increasing function).


The next step is to insert the last equation into Equation (4.30):

$$\nabla_{\bar\theta}\, l = \begin{bmatrix} \partial/\partial\theta_1 \\ \partial/\partial\theta_2 \end{bmatrix} \sum_{k=1}^{n} \left[ -\frac{1}{2}\log 2\pi\theta_2 - \frac{(x_k - \theta_1)^2}{2\theta_2} \right] \tag{4.35}$$

$$= \sum_{k=1}^{n} \begin{bmatrix} \dfrac{x_k - \theta_1}{\theta_2} \\[4pt] -\dfrac{1}{2\theta_2} + \dfrac{(x_k - \theta_1)^2}{2\theta_2^{2}} \end{bmatrix} = 0 \tag{4.36}$$

Finally, µ and σ² are back-substituted and solved for. This yields:

$$\mu = \frac{1}{n}\sum_{k=1}^{n} x_k \tag{4.37}$$

$$\sigma^2 = \frac{1}{n}\sum_{k=1}^{n} (x_k - \mu)^2 \tag{4.38}$$

The results for the general multivariate version are similar:

$$\bar\mu = \frac{1}{n}\sum_{k=1}^{n} \bar x_k \tag{4.39}$$

$$\bar{\bar\Sigma} = \frac{1}{n}\sum_{k=1}^{n} (\bar x_k - \bar\mu)(\bar x_k - \bar\mu)^{t} \tag{4.40}$$

These results correspond well to Equations (4.15) and (4.16). Thus the maximum likelihood estimates are as expected the sample mean and the sample covariance matrix.
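As a concrete illustration of Equations (4.39) and (4.40), here is a minimal Python sketch; the function name and the example data are hypothetical, not taken from the report's implementation:

```python
import numpy as np

def ml_estimates(samples):
    """Maximum likelihood estimates of Equations (4.39) and (4.40).

    samples: (n, f) array with one design sample per row.
    Returns the sample mean vector and the (1/n) sample covariance matrix.
    """
    n = samples.shape[0]
    mu = samples.mean(axis=0)
    centered = samples - mu
    sigma = centered.T @ centered / n   # note the 1/n factor, not 1/(n - 1)
    return mu, sigma

# Hypothetical usage: X0 holds the feature vectors observed for the digit '0'
X0 = np.random.default_rng(1).normal(size=(38, 5))
mu0, sigma0 = ml_estimates(X0)
```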

4.7 Comparing the identification strategies

In order to decide which method to use, it must be clear in which situations each method is preferable. This is done by identifying their individual strengths and weaknesses. The strengths and weaknesses of template matching were listed in Tables 2.3 and 2.4, and although directed towards license plate extraction, most of them also apply in more general terms. Template matching is a strong similarity measure when the size and rotation of the object to be compared are known. The algorithm is simple and fast when it only has to run once on small images such as the isolated characters.

Feature based identification places certain demands on the system in which it is to be used. The use of the involved classifiers demands that the a priori probabilities are accessible, and it has to be possible to estimate the class conditional probability density functions. Once these demands are met, the strengths and weaknesses are totally dependent on the features selected. If good features are available, the classification can be performed with only a few features, and depending on the speed of extracting these features, the method can be very fast.

Feature based identification is in many cases very sensitive to input quality. Noisy images may hinder the correct extraction of certain features, which can be very disruptive for systems that base the identification on a small number of features. The number of features in this system ensures that although one or two features fail, in the sense that they are wrongly extracted, the overall identification is still possible. Template matching is also sensitive to noise, and of course the accuracy decreases along with input quality, but not nearly as much as can happen with feature based identification.

4.8 Summary

In this chapter the two methods for identifying characters, template matching and statistical pattern recognition, were described. In connection with the statistical recognition, a method for reducing the number of features, called SEPCOR, was introduced. The Mahalanobis distance, derived from Bayes decision rule, was stated as a measure for identifying the characters. As an alternative measure, the Euclidean distance was used. A means of testing whether data is normally distributed was presented, and parameter estimation for the Bayes classifier was described. Finally, the two identification strategies were compared.



Part III
Test




Chapter 5
Extraction test

5.1 Introduction

This chapter describes the tests performed to verify that the license plate extraction performs adequately. Since region growing and the Hough transform have the same objective, they will be described together, whereas correlation is described separately, since it is used for verifying whether or not a region could potentially contain a license plate. Finally, the combined method will be tested in order to give an estimate of the actual performance of the subsystem.

5.2 Region growing and Hough transform

Both region growing and the Hough transform were designed to go through the entire input image, searching for regions that correspond to some of the characteristics of a license plate. Although the approaches are very different, they have the same input and the same success criteria for the output regions. This section describes how the tests were performed and what the results were, and provides an evaluation of those results.

5.2.1 Criteria of success

When testing the license plate extraction of region growing and the Hough transform, it is important to realize that the goal is not to provide a single image, but rather a limited list of regions, where at least one of the regions should contain the license plate. The criteria for when a region is useful are relatively straightforward. It must encompass the entire license plate, so that all seven characters are fully visible. Also, it may not contain too much noise in the form of edges. If the region contains more than just the license plate, it will be harder to distinguish the correct region from the entire list of regions. As a secondary goal, the methods should not provide an excessive number of regions. A very large number of regions makes it harder to select the most probable license plate candidate.

5.2.2 Test data

Both methods were tested using two sets of images. One consisted of 36 images that were used as a training set to develop the algorithms. These images were all taken under the same lighting conditions, but shadows from nearby trees gave some variance in the brightness of the license plates. Also, the images were taken at various distances from the vehicles, to ensure that the algorithms were constructed to be invariant to the size of the license plate. Apart from the training set, the tests were run on a collection of 72 test images. These were all taken on the same road, but facing in two different directions, to provide some challenges regarding the lighting situation. As with the training images, they were taken at different distances. The goal is, of course, that the hit percentage of the test images is as high as that of the training set. If the training set results are much better than those for the test set, it means that the system has been optimized towards a certain situation, which is undesirable.

5.2.3 Test description

As mentioned, the test was performed on all images. For each image, a manual inspection was made to see if the license plate had been successfully extracted in at least one of the regions. This rather overwhelming task was eased by first trying to automatically identify the correct license plate, so that in many of the cases it was not necessary to go through all of the regions. In the case of region growing, no suitable way of finding a dynamic threshold was found. Instead, the algorithm was run five times with different values for the threshold. This caused a larger number of regions to be found, but also ensured that the algorithm had a better chance of finding the license plates under the varying lighting conditions. The Hough transform test was performed in a very similar way, with manual inspection of the regions if automatic determination of the correct region did not succeed. For both methods, it was also registered how many candidate regions were extracted. Generally, a lower number will make the later task of finding the most probable candidate easier.
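The fixed multi-threshold workaround is easy to picture in code. A minimal sketch, assuming a hypothetical `region_grow(image, threshold)` routine and purely illustrative threshold values (the actual values are not given in this section):

```python
# Sketch of the fixed multi-threshold strategy. region_grow(image, threshold)
# is a hypothetical stand-in for the report's region growing routine, and the
# threshold values below are illustrative, not taken from the report.
THRESHOLDS = [60, 80, 100, 120, 140]

def find_candidate_regions(image, region_grow):
    candidates = []
    for t in THRESHOLDS:
        # Re-running with several fixed thresholds compensates for the
        # missing dynamic threshold under varying lighting conditions.
        candidates.extend(region_grow(image, t))
    return candidates   # duplicates across thresholds are possible
```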

5.2.4 Results

The region growing method turned out to be a very robust way of finding the license plate. As Table 5.1 indicates, it almost always found the plate. In only 2 of the 72 test images was it not capable of locating a satisfactory region. The total number of regions lies in the lower hundreds, which may at first glance seem like a lot. This reflects the fact that several regions are duplicated because of the iterations with different threshold values. There were some common characteristics of the two plates that were not found. They were both rather dark images, taken so that the shadow was in front of the vehicle. Also, they were both yellow plates, further decreasing the brightness of the plates in contrast to the surroundings. A further scaling of the threshold value might have solved this issue, but for lower thresholds the total number of regions rises quickly, so this might not provide better overall results anyway.

Data set       Method            Successful         Regions
Training set   Region growing    36/36 (100.0 %)    200 - 400
               Hough transform   23/36 (63.9 %)     100 - 200
Test set       Region growing    70/72 (97.2 %)     150 - 300
               Hough transform   43/72 (59.7 %)     300 - 600

Table 5.1: Results for region growing and Hough transform.

For the Hough transform, the percentage of found plates was somewhat lower. Again, the yellow plates were more troublesome, with a hit percentage of 43 % compared to a rate of 64 % for white plates. Also, the number of candidate regions was approximately a factor of 2 higher than for region growing in the test set, but with very large differences from image to image. In the training set the number was lower, again with large differences between the individual images. As with region growing, the results were slightly better for the training set, but the difference is so minuscule that it does not imply that the algorithm had been artificially designed for the specific set of training images. As is also seen in Table 5.1, the number of regions found by the Hough transform is higher for the test set than for the training set. The reason is that a parking lot in the background of some of the images causes a very high number of edges to be detected. Region growing actually finds fewer regions in the test set, but the numbers represent an average, with very high deviations between the individual images.


The plates that were not found by region growing were not found by the Hough transform either. Therefore, the conclusion must be that the two methods do not complement each other. In conclusion, the region growing method will guarantee a very high success rate, but also a rather large number of regions to be sorted out before finding the most probable candidate. A very large percentage of the regions can easily be discerned as random regions without much similarity to a license plate. The algorithms proved to be almost as efficient when applied to images that were not part of the training set and were taken in different lighting situations and at different distances.

5.3 Correlation

Correlation was ruled out as an extraction method because it is not scale invariant. Another use was found for it in the selection among the candidate regions identified by the other two methods. It is for this use that the correlation method will be tested.

5.3.1 Criteria of success

If the region containing the license plate is selected each time, the method passes the test, even if some regions that are not actually license plates slip through. The reason for this is that in the implementation it will not function alone, but in combination with other methods, namely the peak-to-valley and height-width ratio methods. This cooperation of techniques will be tested in Section 5.4. Therefore the conclusion reached in this section is not a final conclusion on the selection among regions, but merely a conclusion on the correlation method.

5.3.2 Test data

The output regions from region growing and the Hough transform are used as input. All the regions actually containing license plates (106) have been mixed with 700 random regions, to test whether the plates can be distinguished.

5.3.3 Test description

The correlation coefficient between the candidate regions and the template is calculated, and if the value exceeds an experimentally set threshold, the region is marked as a possible plate. All regions that passed the test were then manually examined, to see if all plates were contained and which types of other regions passed.
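A minimal sketch of this screening step, assuming the candidate region has already been rescaled to the template's size; the threshold value below is illustrative, since the report only states that it was set experimentally:

```python
import numpy as np

def correlation_coefficient(region, template):
    """Normalized correlation coefficient between two equally sized
    gray-scale images; values near 1 indicate a strong linear match."""
    r = region.astype(float).ravel()
    t = template.astype(float).ravel()
    return np.corrcoef(r, t)[0, 1]

def is_possible_plate(region, template, threshold=0.4):
    # The threshold value is illustrative; the report sets it experimentally.
    return correlation_coefficient(region, template) > threshold
```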

5.3.4 Results

As can be seen in Table 5.2, all of the plates have been found.

Region type   Regions accepted   Percentage
Plates        106/106            100.0 %
Non-plates    137/700            19.6 %

Table 5.2: Region selection by correlation.

The table also shows that regions that are not license plates are not always ruled out. This does not matter, as the method will be used in connection with other methods. The only demand is that the combination rules them out. Looking at the non-plate regions that were not ruled out, there are some common characteristics (see Figure 5.1). Many of the regions are, as in a), bright regions with dark vertical lines, very similar to a license plate. The same goes for b), where bright regions with darker ends are mistaken for plates. It should be noted that the correlation values for these types are much lower than a typical license plate would be expected to produce. Still, the threshold value is set relatively low, so that even dark or slightly rotated license plates are not omitted. Case c) is a different matter altogether. It might very well be difficult to distinguish between a whole plate and a plate with the end cut off; at least it is expected that such a region is not discarded as a plate. Still, the template has been designed so that the actual plate, where the characters are placed as on the template, would yield a higher correlation coefficient.

Figure 5.1: Examples of falsely accepted regions

This aside, the test shows that if the method were to be used alone, more effort would have to be put into determining threshold values and perhaps into template construction.


5.4 Combined method

The actual algorithm for selecting the correct region, a combination of the three methods described in Section 3.2.5, was tested as a whole.

5.4.1 Criteria of success

The algorithm for selecting the most probable candidate region must be capable of sorting out many different regions that potentially bear close resemblance to a license plate. Such regions could include bright areas of the sky or the road. Even traffic signs pose a potential difficulty, since they obviously share some of the characteristics of a license plate.

5.4.2 Test data

The input to the algorithm is the same set of regions used to test the correlation method. The optimal result therefore consists of finding all 106 plates in a mix with 700 random regions.

5.4.3 Test description

Since correlation proved to be an effective discriminator when sorting out random regions, it is the first algorithm to be used. As shown in Section 5.3.4, 137 non-plate regions pass this test. Second, peak-to-valley is used to sort out any uniform regions. After this step, only 45 non-plates remain, which have to be sorted out by looking at the height-width ratio of the region. In this last step, the region with the best ratio is chosen as the final candidate.
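The cascade itself can be sketched as follows; the two predicates and the ratio-error function are stand-ins for the report's correlation, peak-to-valley and height-width ratio tests, whose details are given in the design chapters:

```python
def select_most_probable(regions, correlates, peak_to_valley_ok, hw_ratio_error):
    """Sketch of the three-step cascade. The two predicates and the error
    function stand in for the report's correlation, peak-to-valley and
    height-width ratio tests."""
    # Step 1: discard regions with a low correlation against the template
    survivors = [r for r in regions if correlates(r)]
    # Step 2: discard uniform regions via the peak-to-valley measure
    survivors = [r for r in survivors if peak_to_valley_ok(r)]
    if not survivors:
        return None
    # Step 3: the region with the best (lowest-error) height-width ratio wins
    return min(survivors, key=hw_ratio_error)
```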

5.4.4 Result of test

The results of the test are summarized in Table 5.3, where it is seen that 105 of the 106 license plates were found, and that none of the random regions were mistaken for a license plate.

Method applied       Regions sorted out   Remaining
Correlation          563 of 806           243
Peak-to-valley       92 of 243            151
Height-width ratio   46 of 151            105

Total accuracy: 105 of 106 (99.1 %)

Table 5.3: Test of selecting the most probable region


The license plate that was not found was a blurred, small yellow plate, and by tweaking the threshold values in the different steps, this plate could also be recognized as a plate. The number of non-plates accepted as plates could be significantly reduced by tweaking the threshold values. For instance, a proper selection of the threshold value for the correlation coefficient makes the correlation validation more accurate. But it is pointless to tune the individual threshold values out of context with the other methods: if the threshold for the coefficient is increased, it is more likely that an actual plate is discarded.

5.5 Summary

The test showed that the region growing method was capable of finding nearly all license plates. The Hough transform found a smaller share of the plates, and it could not be established that a combination would provide better results than region growing on its own. The combined method for selecting the most probable license plate candidate is very effective. The three steps are each capable of sorting out a different type of region, and the combination makes it possible to find the correct license plate in a large collection of non-plate regions. The number of successfully extracted license plates is not as high as the results above would indicate, however. This is due to the fact that the number of found regions differs greatly from one image to another. Therefore, some plates are not found when searching for the best region.


Chapter 6
Isolation test

6.1 Introduction

This chapter describes the test of the preprocessing step of isolating the characters in the license plate. The purpose of the chapter is to verify that the implementation of the isolation method described in Chapter 3 performs efficiently.

6.2 Criteria of success

The purpose of the method is to divide the extracted license plate into exactly seven subimages, each containing one of the seven characters. A successful isolation fulfills all of the following criteria:

- The plate must be divided into exactly seven subimages.
- None of the seven characters may be diminished in any way.
- The sequence of the subimages has to be in the correct order, meaning that the first character of the plate is the first of the subimages.

An example of an unsuccessful and a successful isolation is seen in Figure 6.1.

6.3 Test description

All the methods for isolating the characters in the license plate described in Chapter 3 have been tested. The test is a simple black box test: a series of input images is given to the algorithm, and the success of the test simply depends on the resulting output images. The test has been performed using the connected component method, the pixel count method and the method using static bounds. All methods are tested independently. Finally, the combined method, also described in Chapter 3, is tested as well.

Figure 6.1: Example of unsuccessful and successful isolations

6.4 Test data

The test is performed on images received from a successful extraction process, meaning that each image does contain a readable license plate. As in the test of the extraction process described in Chapter 5, the isolation process was tested using two sets of input images. The first set consisted of the 36 images that were used as a training set throughout the project. The second set was the collection of 72 test images, of which 70 images contain readable license plates. The isolation methods were therefore tested on 106 (36 + 70) images. All methods are tested both on images with no quality improvement (no preprocessing) and on images which have undergone a quality improvement, such as cutting away irrelevant information or the use of a dynamic threshold, as described in Chapter 3.

6.5 Connected components

The first test was performed using the connected component method for the isolation of the characters, and the result is listed in Table 6.1. As described in the design chapter, the method finds connected regions in a binary image. First, the method was tested on images without any form of quality improvement. The results proved almost identical for the two data sets: for both sets, approximately 73 % of the images were successfully divided into acceptable subimages. Next the border was removed, which did not affect the results at all.

Data set       Method                  Successful   Percentage
Training set   Without preprocessing   26 of 36     72.2 %
               Border removed          26 of 36     72.2 %
               Dynamic threshold       33 of 36     91.7 %
               Both improvements       33 of 36     91.7 %
Test set       Without preprocessing   52 of 70     74.3 %
               Border removed          52 of 70     74.3 %
               Dynamic threshold       68 of 70     97.1 %
               Both improvements       68 of 70     97.1 %

Table 6.1: Result from the connected component test

To improve the image quality further, a method for setting the threshold value dynamically was applied. Thereby the quality of the binary images improved drastically, and this is clearly reflected in the result of the test. Where a total of 78 images succeeded before, the improved method was able to divide another 23 images; for both sets combined, a total of 101 of the 106 images resulted in successful isolation. In general, the resulting images from the method are of good quality, meaning that the bounds found by the method are accurate. From the test it is also clear that there is no remarkable difference between the results of the two data sets. The results are slightly better for the test set, but for both sets a success rate above 90 % is achieved. Figure 6.2 shows an example of a plate that could not be successfully divided into 7 subimages using the connected component method. The reason is that the third and fourth digits are part of the same component after the image has been thresholded, and therefore they cannot be isolated.

Figure 6.2: Plate that fails, using connected components
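For illustration, a small Python sketch of the connected component idea, using SciPy's labeling; the selection heuristic is an illustrative assumption, not the report's implementation:

```python
import numpy as np
from scipy import ndimage

def isolate_characters(binary_plate, expected=7):
    """Character isolation via connected components (illustrative heuristic).

    binary_plate: 2-D boolean array, True where a pixel belongs to a character.
    Returns bounding-box slices of the `expected` largest components,
    sorted left to right so the character order is preserved.
    """
    labels, n = ndimage.label(binary_plate)
    boxes = ndimage.find_objects(labels)
    sizes = ndimage.sum(binary_plate, labels, index=range(1, n + 1))
    largest = np.argsort(sizes)[::-1][:expected]
    chosen = [boxes[i] for i in largest]
    chosen.sort(key=lambda box: box[1].start)   # sort by leftmost column
    return chosen
```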

6.6 Pixel count

The next method tested was the pixel count method, which simply searches for the characters by finding peaks and valleys in a horizontal projection. The method was tested on the same images as the previous method, and the result of the test is shown in Table 6.2. The test performed on images with no improvement proved insufficient: only 25 of the 106 images succeeded. Removing the border did not have any effect whatsoever.
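A sketch of the projection idea (a hypothetical helper; the report's actual implementation is described in Chapter 3):

```python
import numpy as np

def character_bounds_by_projection(binary_plate):
    """Pixel count sketch: sum character pixels per column and split the
    plate at the valleys (columns containing no character pixels)."""
    projection = binary_plate.sum(axis=0)    # character-pixel count per column
    in_char = projection > 0
    bounds, start = [], None
    for x, filled in enumerate(in_char):
        if filled and start is None:
            start = x                        # a peak (character) begins
        elif not filled and start is not None:
            bounds.append((start, x))        # a valley ends the character
            start = None
    if start is not None:
        bounds.append((start, len(in_char)))
    return bounds
```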


Data set       Method                  Successful   Percentage
Training set   Without preprocessing   8 of 36      22.2 %
               Border removed          8 of 36      22.2 %
               Dynamic threshold       26 of 36     72.2 %
               Both improvements       27 of 36     75.0 %
Test set       Without preprocessing   17 of 70     24.3 %
               Border removed          17 of 70     24.3 %
               Dynamic threshold       56 of 70     80.0 %
               Both improvements       56 of 70     80.0 %

Table 6.2: Result from the pixel count test

The use of a dynamic threshold gave a significantly better result. One image which succeeded before the dynamic threshold was applied now failed, but a total of 72 images succeeded. This means that, using a combination of the output without improvement and the output when using dynamic thresholding, a total of 73 of 106 images resulted in a successful isolation. As in the previous test, there was no significant difference between the success percentages of the two data sets when isolating without preprocessing. As with the connected components, the results were slightly better for the test set when using dynamic thresholding. However, the 5 % difference is small enough to be caused by statistical uncertainty.

6.7 Static bounds

Next, the method for setting the character bounds statically was tested. Since the method is based on always dividing the plate at the same coordinates, it is independent of the quality of the image received from the plate extraction process. Therefore only one test was performed, using images with no quality improvement. The result of the test can be seen in Table 6.3.

Data set       Successful   Percentage
Training set   32 of 36     88.9 %
Test set       66 of 70     94.3 %

Table 6.3: Result from the static bounds test

Naturally, the method always succeeds in dividing the image into seven subimages, and thereby the first and third criteria of success are always fulfilled. The success of the method depends merely on the quality of the isolated characters, that is, on whether the second criterion is fulfilled.


The method succeeds in 98 of the 106 images. The quality of the subimages is not as good as that of subimages from the connected component method, although in most cases they are useful. The image on the left-hand side of Figure 6.1 is an example of an unsuccessful subimage produced by the static bounds method. Once again, the test set provides the better results. This implies that the training set was very diverse, so that the development of the algorithms was performed using ‘worse’ images than those of the test set.

6.8 Combined method

Finally, the combined method described in Section 3.2.5 was tested. The method is a combination of all the methods that were tested above. As seen in Table 6.4, the result of the test is that all 106 images succeed, which indicates that the methods complement each other.

Data set       Successful   Percentage
Training set   36 of 36     100.0 %
Test set       70 of 70     100.0 %

Table 6.4: Result from the combined method

The images that failed both the connected component and the pixel count method are all images that can be adequately divided using static bounds. The connected component method produces the best output images, since its bounds are very accurate, but when the method fails, both the pixel count method and static bounds proved useful.

6.8.1 Summary

The success criteria for the isolation of the characters in the license plate were that all seven characters should be isolated, in the correct order, and without being diminished in any way. The test shows that although the individual methods for isolating the characters described in Chapter 3 cannot perform the task by themselves, the combination proved able to do so in all of the images used for testing.


Chapter 7
Identification test

7.1 Introduction

The final step in recognizing a license plate is identifying the individual characters extracted in the previous steps. Two methods for doing this were presented earlier: the first to be tested is the method based on statistical pattern recognition, and the second is the normalized cross-correlation coefficient.

7.2 Criteria of success

For this test to be a success, as high a percentage of the characters as possible must be identified correctly. A few errors are acceptable, since a full implementation should be able to notify the operators when it is unable to identify a character, allowing for manual identification. Since as many characters as possible should be correctly identified, the success criterion for the character identification test is a hit rate approaching 100 %.

7.3 Test data

The test data originates from the 106 test pictures that were extracted and isolated during the previous tests. The actual test data are the single characters extracted from the license plates, as mentioned in Section 6.2. No attempt is made to identify the letters in the plates using statistical pattern recognition, since this method has only been designed with the numbers in mind. The letters are identified using template based identification, because here only a single template is needed, in contrast to statistical pattern recognition, where several observations are required.

7.4 Feature based identification

In the testing of the statistical pattern recognition, the numbers are identified using both the Euclidean and the Mahalanobis distance. Otherwise the tests are performed in exactly the same way, as described in the following section.
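Both classifiers reduce to a nearest-mean decision under different metrics. A minimal Python sketch (the function and its data layout are illustrative assumptions; the class means and covariance matrices would be the maximum likelihood estimates from Section 4.6.4):

```python
import numpy as np

def classify_digit(x, means, inv_covs=None):
    """Assign feature vector x to the class with the smallest distance.

    means: one mean vector per digit class 0-9 (from Section 4.6.4).
    inv_covs: optional inverted class covariance matrices; if given, the
    Mahalanobis distance is used, otherwise the Euclidean distance.
    """
    best, best_dist = None, np.inf
    for j, mu in enumerate(means):
        d = x - mu
        dist = d @ d if inv_covs is None else d @ inv_covs[j] @ d
        if dist < best_dist:
            best, best_dist = j, dist
    return best
```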

7.4.1 Test description

The test of feature based identification is performed by comparing what the different characters are identified as with what they actually are. Furthermore, different settings for the maximum correlation threshold are used when the SEPCOR algorithm is performed, thereby altering the number of features used.

Figure 7.1: Identification percentage plotted against number of features.

7.4.2 Result of test using Euclidean distance

The result of the test with the Euclidean distance is summarized in Table 7.1, where different values of the maximum correlation have been tested. Table 7.2 shows the result of the test on the training set, with a maximum correlation of 1. When testing on the training set, an identification rate very close to 100 % of the digits is expected, since this is also the data the features were derived from.


The test was performed using a maximum correlation of 1, and this yields an identification rate of 100 %. This was expected, since the system should be able to identify the characters it was trained with.

Max. corr.   Features used   Successful    Percentage
0.1          5               130 of 350    37.1 %
0.2          13              247 of 350    70.6 %
0.3          25              267 of 350    76.3 %
0.4          42              316 of 350    90.3 %
0.5          69              337 of 350    96.3 %
0.6          96              338 of 350    96.6 %
0.7          142             344 of 350    98.3 %
0.8          210             344 of 350    98.3 %
0.9          262             345 of 350    98.6 %
1.0          306             346 of 350    98.9 %

Table 7.1: Result of the test on the test set

The best result obtained on the test set is an identification percentage of 98.9 %. As the maximum correlation rises, the recognition percentage rises too. This is plotted in Figure 7.1.

Correct             180
Training set size   180
Percentage (%)      100.0

Table 7.2: Result of the test on the training set

As can be seen in both Table 7.1 and Figure 7.1, the recognition percentage rises quickly while the maximum correlation is below 0.5, and then settles at about 98 %. The gain achieved from using a maximum correlation of 1 compared to 0.7 is only 0.57 %. But since there are no hard timing restrictions, the higher identification rate, although the gain is small, is preferable in this system.

Figure 7.2: (a) shows the unidentifiable character, and (b) the optimal result.

101


Chapter 7. Identification test

The errors made using the value 0.7 are spread over three different license plates, and over two when using 1 as the maximum correlation. It should be noted that, using the value 0.7, one license plate was responsible for four of the six errors. Using a maximum correlation of 1, the same plate accounts for three out of four errors. An example of an error originating from this plate can be seen in Figure 7.2. This image actually produces some very nice subimages of the digits, but the digits in these images are a bit small. The digit in the image should fill the image horizontally, but fails to do so, probably because of the use of static bounds. This is believed to be the cause of the inability to identify the digits in this plate.

7.4.3 Result of test using Mahalanobis distance

The test using the Mahalanobis distance is similar to the test described in the previous section. The only difference is that two tests are performed. In the first test, the training set was used for training, and in the second, the test set was used for training. Both sets were tested in each test. Table 7.3 shows the results of the tests when run on the same set that was used for training, using all features. As expected, the identification percentage is very high.

                    Test 1   Test 2
Correct             175      333
Training set size   180      350
Percentage (%)      97.2     95.1

Table 7.3: Result of the tests on the training sets

Table 7.4 shows the results of identifying the set not used for training. Not surprisingly, the identification percentage is lower when identifying the set not used for training. Also, contrary to the results with the Euclidean distance, adding more features does not always increase the identification percentage. The results of the tests are also shown in Figure 7.3 and Figure 7.4 as a function of the number of features used. Here it becomes even more apparent that a higher number of features does not always produce a better result. The reason seems to be that the SEPCOR algorithm sometimes removes good features which, although correlated with other selected features, had a positive impact on the identification percentage. These features are then replaced with other features, which provide an inferior result. Or, to put it in other words: when removing features, the SEPCOR algorithm does not look at how good a feature is, that is, how high its V-value is, before removing it. This means that any correlated feature, no matter how good, can potentially be removed.
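A sketch of a greedy SEPCOR-style selection loop that exhibits exactly this behavior; the V-values are computed as described earlier in the report and are passed in here as given, and the function itself is an illustrative reconstruction, not the report's code:

```python
import numpy as np

def sepcor_select(features, v_values, max_corr):
    """Greedy SEPCOR-style selection sketch (illustrative reconstruction).

    features: (n_samples, n_features) matrix of training observations.
    v_values: separability score per feature (higher is better).
    max_corr: the maximum correlation threshold varied in these tests.
    """
    corr = np.abs(np.corrcoef(features, rowvar=False))
    order = np.argsort(v_values)[::-1]        # best separability first
    selected = []
    for f in order:
        # A feature too correlated with an already accepted one is dropped,
        # regardless of how high its own V-value is.
        if all(corr[f, s] <= max_corr for s in selected):
            selected.append(f)
    return selected
```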


Max. corr.   Features (Test 1)   Correct (Test 1)      Features (Test 2)   Successful (Test 2)
0.1          5                   165 of 350 (47.1 %)   7                   75 of 180 (41.7 %)
0.2          12                  133 of 350 (38.0 %)   13                  163 of 180 (90.6 %)
0.3          27                  181 of 350 (51.7 %)   31                  116 of 180 (64.4 %)
0.4          50                  229 of 350 (65.4 %)   47                  146 of 180 (81.1 %)
0.5          85                  236 of 350 (67.4 %)   73                  115 of 180 (63.9 %)
0.6          123                 249 of 350 (71.1 %)   106                 125 of 180 (69.4 %)
0.7          173                 268 of 350 (76.6 %)   160                 150 of 180 (83.3 %)
0.8          233                 270 of 350 (77.1 %)   208                 166 of 180 (92.2 %)
0.9          275                 299 of 350 (85.4 %)   258                 161 of 180 (89.4 %)
1.0          306                 301 of 350 (86.0 %)   306                 162 of 180 (90.0 %)

Table 7.4: Result of the tests

Figure 7.3: Result of identification using training set as training

Comparing the identification percentages with those from the Euclidean distance, it is clear that identification using the Euclidean distance produces better results. There can be several reasons why this is the case. First, the amount of data used for training the system in the two tests might have been too small, although the tests actually show that identification using the smaller training set performs better. This might be a coincidence, though. Secondly, the assumption that the distribution of the data is normal might be wrong; as mentioned in Section 4.6.3, not all features can be considered normally distributed.

Figure 7.4: Result of identification using test set as training

7.5 Identification through correlation

Unlike the previous tests, this time the identification is performed on both the digits and letters in the license plates.

7.5.1 Test description

Each character is checked against the appropriate template pool, one containing the templates for the letters and the other the templates for the digits, and by looking at the calculated coefficients it is determined which character is in the image.
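As a sketch (a hypothetical helper, with the normalized correlation coefficient approximated via NumPy's corrcoef):

```python
import numpy as np

def identify_character(char_image, template_pool):
    """Return the template character with the highest correlation coefficient.

    template_pool: dict mapping a character (digit or letter) to a template
    image of the same size as char_image.
    """
    x = char_image.astype(float).ravel()
    scores = {c: np.corrcoef(x, t.astype(float).ravel())[0, 1]
              for c, t in template_pool.items()}
    return max(scores, key=scores.get)
```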

7.5.2 Result of test

The result of the test can be seen in Table 7.5. In the case of the letters, a success rate lower than that of the digits was expected, as more work has been put into the construction of the digit templates. Also, there are more different letters than digits.

Data set       Characters               Successful    Percentage
Test set       Letters                  126 of 140    90.0 %
               Digits                   344 of 350    98.3 %
               Total for test set       470 of 490    95.9 %
Training set   Letters                  54 of 72      75.0 %
               Digits                   170 of 180    94.4 %
               Total for training set   224 of 252    88.9 %

Table 7.5: Identifying the characters using templates.

Many of the errors were due to poor quality of the input images. The major problems are connected to the size of the input images and the overall brightness of the images. Using the plate in Figure 7.5 as an example, it is noted that in spite of the small size of the plate in the image, the plate is extracted and the characters are isolated without problems. The character images are, however, of such poor quality that the correlation identification fails.

Figure 7.5: The size problem illustrated.

Many of the extracted license plates were so dark that no proper threshold value could be found to remove the background of the plate and make the characters stand out in a satisfactory way. The plate in Figure 7.6 serves as an example of such a plate. The problem is exaggerated in that image, as a better isolation of the characters can be achieved using dynamic thresholding, but it illustrates the problem with this type of plate. Almost half of the errors in the training set occur in plates with more than one error, indicating that the input images from those plates are not of sufficient quality. Many of the other errors can also be attributed to these two factors, as the plates are borderline examples of the situations.


Figure 7.6: The brightness problem exaggerated.

Also, there is no pattern in the types of errors: no letter or digit is represented noticeably more than the others, and all characters involved in errors have at some point been correctly identified. This also indicates that the input plates are not of sufficient quality. In the test set, the errors were distributed so that only one error occurred in each plate. It can be concluded that template identification of the license plate letters cannot fulfill the purpose it was intended for without a serious improvement of the templates. The identification of the digits is satisfactory.

7.6 Summary

Although the hit rates are almost the same in two of the tests, Euclidean distance and template matching, three out of four errors in feature based identification are found in the same plate, whereas the errors in the template method are found in different plates. This supports what was stated in Section 4.7, where the sensitivity to noise of the feature based method was discussed. In Section 4.7 it was also stated that template matching was less susceptible to noise-prone errors than feature based identification. This is hard to see from these results, as the hit rate of the template method is lower. This indicates that it is in fact the chosen features that are less susceptible to noise than the template strategy. All in all, the conclusion of the tests must be that feature based identification suits our purpose best, but that identification using the Mahalanobis distance either needs a larger training set, or that the assumption that the data is normally distributed is not entirely correct.

Chapter 8
System test

8.1 Introduction

In the previous test chapters, the components of the system were examined separately in terms of performance. It is also relevant to test the combination of the components. When testing the individual components, only useful input was used. In real life, an error made in the license plate isolation will ripple through the system and affect the overall performance. This chapter describes the tests performed on the final system.

8.2 Criteria of success

The ultimate goal of the program is to take an image containing a vehicle as input, and output the registration number of the vehicle's license plate. The identification of numbers was implemented using both feature based identification and template matching. The identification of letters was only implemented for template matching, and therefore the letters are disregarded when deciding whether or not a license plate has been correctly identified. A successful processing of an image is defined as one that extracts the license plate, isolates all seven characters, and identifies all the digits.

8.3 Test data

The test was performed using all of the test images. The training images were not included in the test, since it is more interesting to see how well the system performs when confronted with previously unknown input images.

107


Chapter 8. System test

8.4 Test description

The test was run using region growing for license plate extraction, since it proved to be the most robust method (Section 5.2.4). For isolating the characters, the combined method described in Section 3.2.5 was used, which should prove sufficient, since the results demonstrated in Section 6.8 were quite convincing. For identifying the characters, all three possibilities were tested, since their results were much closer than those of region growing and the Hough transform.

8.5 Results

The results obtained for this test are seen in Table 8.1. Region growing, combined with the method for finding the most probable region, found a total of 57 license plates. When using both region growing and the Hough transform, the number of license plates found was actually slightly lower, due to the extra amount of random regions to be sorted out. Therefore, the remaining algorithms automatically got these 57 license plates, and 15 random regions, as input. Of course, it is not very interesting to see what the further results will be for the 15 random regions. The isolation of the individual characters worked in all of the 57 license plate regions.

Data set       Method                          Successful   Percentage
Test set       License plate extraction        57 of 72     79.2 %
               Character isolation             57 of 57     100.0 %
               Character identification
                 - Features - Euclidean        55 of 57     96.5 %
                 - Features - Mahalanobis      33 of 57     57.9 %
                 - Correlation                 50 of 57     87.7 %
Training set   Not performed
Maximum overall successful plates              55 of 72     76.4 %

Table 8.1: Overall result for the system

As mentioned, all three methods were tested for the identification of the characters. Here the Mahalanobis distance turned out to provide the least impressive results. It turned out that in many of the plates only a single character was misidentified, but in this test such a plate is considered a failure.


The correlation scheme failed in seven of the successfully extracted plates, bringing its total of correctly identified plates to 50. The feature based method using the Euclidean distance proved to be the superior method, given the features that were selected in the system. Here only two plates failed, raising the overall success rate to 76.4 %. The two plates that were wrongly identified were not identified by the other two methods either, and the reason is probably that they contain black spots due to the bolts (see Figure 8.1).

Figure 8.1: Good quality license plates wrongly identified

On the other hand, several poor license plate images were correctly identified. Two examples of such plates are shown in Figure 8.2. These also contain marks from the bolts, but the classifier is still capable of providing the correct number for each character.

Figure 8.2: Poor quality license plates correctly identified

8.6 Summary

The test of the overall system established that, by using region growing for license plate extraction and a Euclidean distance measure for the feature based identification, a total of 76.4 % of the test images were successfully identified. The main cause of the failures was the license plate extraction, which caused 15 of the 17 errors. Since it was shown in Section 5.2.4 that 70 license plates were actually found, the algorithm for extracting the most probable region throws away 13 license plates in favor of regions containing random information. The results could be further enhanced if some restrictions were imposed on the input images. If, for instance, the images were taken at approximately the same distance from the vehicle (as in the currently used system), the extraction of the license plate would be significantly easier to accomplish.




Part IV
Conclusion




Chapter 9
Conclusion

The purpose of this project has been to investigate the possibility of making a system for automatic recognition of license plates, to be used by the police force to catch speed violators. In the current system, manual labor is needed to register the license plate of the vehicle violating the speed limit. The majority of the involved tasks are trivial, and the extensive and expensive police training is not used in any way. Therefore an automatic traffic control system, eliminating or at least reducing these tasks, has been proposed.

We wanted to investigate the possibility of making such an automatic system, namely the part dealing with automatically recognizing the license plates. Given an input image, it should be able to first extract the license plate, then isolate the characters contained in the plate, and finally identify those characters. For each task, a set of methods was developed and tested.

For the extraction part, the Hough transform and, in particular, the region growing method proved capable of extracting the plate from an image. Both of them locate a large number of candidate regions, and to select between them, template matching was utilized. Template matching, combined with the height-width ratio and peak-to-valley methods, provided a successful way of selecting the correct region.

The methods developed for isolating the characters proved very reliable. The method for finding character bounds using an algorithm that searches for connected components proved to be the most useful, and combined with pixel count and static bounds, the method proved to be extremely successful.

For the actual identification process, two related methods were developed; one based on statistical pattern recognition and the other on template matching. Both methods proved to be highly successful, with feature based identification slightly better than template matching. This is because more features are included in the statistical pattern recognition, and therefore this method was chosen to identify the digits on the plate. No attempt was made to identify the letters using feature based identification, and this is the only reason why template matching was used to identify the letters.

In order for the system to be useful, it should be able to combine the three different tasks and to recognize a high percentage of the license plates, so that the use of manual labor is reduced as much as possible. This implies that the success rate for the individual parts should be close to 100 %. The results obtained are summarized in Table 9.1.

Mission                    Success rate   Successful
Plate extraction           98.1 %         ✓
Character isolation        100 %          ✓
Character identification   98.9 %         ✓
Overall performance        76.4 %         ✓

Table 9.1: Main results obtained.

As can be seen, the individual parts perform very satisfactorily, all with a success rate close to 100 %. The plate extraction succeeds in 98.1 % of the test images, which is a very high success rate; the extraction fails in only two of the images. This is acceptable, since the extraction works in more than 98 % of the images, thereby fulfilling the criteria of this task. The part isolating the characters contained in the license plate succeeds in 100 % of the cases, and is thus very successful, achieving the goal set for this task. Of the isolated digits, 98.9 % were correctly identified. This is also a very high success rate, and it must be taken into account that the wrongly identified digits originate from only two plates. The overall performance is not as high as for the individual tasks, but still a large share of the license plates, namely 76.4 %, is correctly identified.

In general, the conclusion of this report is that a system for automatic license plate recognition can be constructed. We have successfully designed and implemented the key element of the system, the actual recognition of the license plate. The main theme of the semester was gathering and description of information, and the goal was to collect physical data, represent these symbolically, and demonstrate processing techniques for this data. This goal has been accomplished in this project.



Part V
Appendix




Appendix A
Videocamera

As described in Section 1.2, a standard video camera is to be used for obtaining the data required to recognize the license plates of the vehicles. There are certain requirements for this camera regarding resolution, color, and light sensitivity.

Resolution
A certain resolution is required in order to distinguish between the individual characters of a license plate. Generally, raising the resolution will provide better images. This improvement in quality comes at the cost of larger amounts of data and more expensive equipment.

Color
Since Danish license plates are either yellow or white with black characters, no color is needed to separate characters from the background. However, most modern equipment defaults to color images. This does not pose a problem, as a software conversion to gray scale images is possible. As with the resolution, an improvement in color depth will result in more data to be processed.

Light sensitivity
Since the equipment will be placed outdoors, weather will have a large impact on the images. In particular, the camera should be able to automatically adjust the light sensitivity to the conditions. If the light sensitivity is either too high or too low, the images will tend to become blurry, so that even high-resolution images are useless. A camera with a good ability to obtain sharp images under varying lighting conditions is said to have a large dynamic range.


A.1 Physical components

All of the factors described in the previous section depend upon the physical components of the camera. These components include the lens, the shutter, and the CCD-chip (Charge-Coupled Device).

Lens
The camera lens sharpens images by focusing the light from the source at a particular point, where the CCD-chip converts the light to an electrical signal. Usually a camera contains several different lenses, to compensate for, e.g., the fact that light of different wavelengths refracts differently. Lenses are also used for zooming.

Shutter
The shutter is simply a device which lets different amounts of light pass into the camera, depending upon the overall brightness of the surroundings. Usually the shutter consists of six ‘plates’ that synchronously move away from, or closer to, each other.

CCD-chip
The CCD-chip converts an amount of light to an electrical signal. In its most simple design, no concern is given to the color of the absorbed light. The more sophisticated color CCD-chips work in one of two ways: either the light is split into its three base colors by lenses and absorbed by a CCD-chip for each color, or the CCD-chip uses a color filter array such as the Bayer pattern (see Figure A.1) to distinguish between colors.

Figure A.1: Color aliasing using the Bayer pattern

Chips using such techniques will experience a phenomenon called color aliasing, also seen in Figure A.1.

The image is constructed by using shifting registers, as seen in Figure A.2. The bottom line is ‘dropped’ into a buffering register, and the value of each pixel in this line is sequentially written to an output channel.


When an entire line has been written to output, a new line is shifted downwards and so forth.

Figure A.2: CCD-chip with registers




Appendix B
Visiting the police force

In order to get an understanding of the case, we visited the local police department, so that we could see first hand how the currently used system operates. The presentation of the system was split into two parts. First, we saw how the pictures are processed at the local police office, and the different stages were explained. Second, we saw the recording of the images, as we drove out to one of the surveillance vans. Here we got a feel for the circumstances under which the images are recorded.

B.1 The system at the precinct

The first part of the visit was to the police office, where we saw a presentation of how the pictures are processed. In the office of the division responsible for processing the pictures, two people worked with registering the license plates. When the system is expanded to all of northern Jutland, Aalborg will still be the only place where the registration takes place. The job of the people working in Aalborg begins when they receive a disk containing the pictures taken during a surveillance session. Each picture is processed in turn in the same way. In the top of the picture, information concerning the location of the surveillance is stored, and this information is registered alongside the license plate. This is information about the specific location, the type of road monitored, and any out-of-the-ordinary speed limits. The speed of the recorded vehicle is also checked; if it is below the speed limit, the picture is naturally discarded.

After this information has been logged, the license plate is registered. The images are often blurred, and it is necessary to perform some image processing in order for the plate to be readable. The license plate is selected by setting a rectangular region of interest; the selection is enlarged, and the enlarged plate is sharpened through histogram equalization. A default setting is applied automatically, but if the result is unsatisfactory, the settings can be changed. The same is done for the operator of the vehicle, because the driver has to be clearly identifiable according to Danish traffic legislation. In addition, the faces of any passengers must be concealed. If the driver of the vehicle does not have a driver's license, he will be fined accordingly, but any other misdemeanors cannot be addressed. This means that an offense that might be finable if the car were pulled over is ignored. This includes violations often easily observed in the pictures, such as driving while talking on a cell phone or driving without using the seat belt.

What happens to foreign traffic offenders depends on where they are from. Ideally, if the offender is from one of the Nordic countries or any other country the Danish police cooperates with, the case is turned over to them. Many of the other countries could not be bothered with a Danish traffic violation, and in any case, if no such international arrangement exists, the offender is not fined. Once the information needed to issue the fine has been extracted from the image, it is sent on to a second registrant, where the information is double-checked, and the plate is looked up in the Danish Motorized Vehicles registry. If for obvious reasons the driver can be ruled out as the owner of the vehicle, the owner must inform the police of who drove the vehicle at the time of the offense, or be fined himself.

B.2 The system in the field

The system is fitted into a van, which can be parked alongside the road. Since the speed estimator is radar based, it needs to overlook a certain length of straight road in order to determine the speed of a passing vehicle; behind the van, the road needs to be straight for approximately 40 to 60 meters. Parking the van is no trivial matter. It has to be positioned very accurately: parallel to the road and at most 3.5 meters from the curb. To achieve this, a ring hangs from the ceiling of the van and a small marker is painted on the windshield; looking through the ring, the driver aims the marker at a marking stick held by another officer. Parked this way, the van can monitor and capture four lanes at the same time, though of course only a single vehicle at a time.

Once the van has been parked, the system must be calibrated. The camera is adjusted to give the best images under the given conditions. The speed limit that triggers the camera is also set, and the address of the position is registered. When setting the trigger speed, a 3 km/h uncertainty of the radar is taken into account, and only offenders driving more than 10% faster than the speed limit of the road are fined. For a road with a speed limit of 50 km/h, the first finable speed is therefore 56 km/h (just above 50 km/h plus 10%), and adding the 3 km/h radar margin gives a trigger speed of 59 km/h.

Once the system has been initialized, no further human involvement is needed for ordinary passenger cars. An officer must still be present, however, because different vehicle types have different speed limits: if buses and lorries are to be fined, he has to trigger the camera manually when they pass, which in practice means switching the system to a different trigger speed limit.

The radar is a sensitive piece of equipment. In addition to the 3 km/h margin of error, it is also affected by acceleration and deceleration. The use of radar also restricts the locations available for surveillance, since large solid objects such as billboards can confuse it.
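
The trigger-speed rule lends itself to a small worked computation. The C++ sketch below encodes it; treating "more than 10% faster" as the next whole km/h above the 10% line is our reading of the rule, not something stated explicitly by the police.

    #include <cmath>
    #include <cstdio>

    // Sketch of the trigger-speed rule described above: offenders are
    // fined when driving more than 10% above the limit, and the radar's
    // 3 km/h uncertainty is added on top. Rounding up to the next whole
    // km/h is our assumption.
    int triggerSpeed(int speedLimitKmh) {
        const double finableFactor  = 1.10; // more than 10% over the limit
        const int    radarMarginKmh = 3;    // stated radar uncertainty
        int firstFinable =
            static_cast<int>(std::floor(speedLimitKmh * finableFactor)) + 1;
        return firstFinable + radarMarginKmh;
    }

    int main() {
        // Prints 59, matching the 50 km/h example in the text.
        std::printf("Trigger speed for a 50 km/h zone: %d km/h\n", triggerSpeed(50));
        return 0;
    }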

B.3 Evaluation of the current system

The system was introduced quite recently and has so far only been deployed in a few districts, for testing purposes. Up until now, it has proved a success: with over 100 fines issued a day in Aalborg and a national average fine of 798 DKR, the system is quite profitable. The equipment, however, is very expensive; a fully equipped van costs about 1,000,000 DKR, of which the van itself accounts for only about 30%.

As mentioned, the system is still in the testing phase, and only a limited number of vans have been equipped. This, combined with the limited number of places suited for surveillance, means that people are starting to recognize the vans. The system is also not usable on freeways, simply because it has no option for setting the speed limit to 110 km/h and therefore cannot issue the appropriate fine.

The effect of the system is unquestionable. After it has been used in an area for a while, the number of speeding offenders drops, and not only because the van becomes recognizable: cables buried under the asphalt in the cities to monitor traffic flow show that the overall number of speeding vehicles drops.




Appendix C

Contents of CD

The purpose of this appendix is to give an overview of the attached CD. A quick overview of its contents:

CD/
    Acrobat Reader 5.05
    Referenced web pages
    Images/
        Test
        Training
    Source code/
        Final
        Image
        Misc
    Installation/
        Required DLL's

All of the instructions required to install and use the program are included on the CD. If a browser window does not appear automatically when the CD is inserted, manually start your browser and open the file index.html in the root of the CD. The report is also available for viewing via a link in the index.html file.

C.1 The program

The program was written in a Windows environment, since several well-documented image processing libraries are available for Windows; Windows also eased the process of creating a graphical user interface for testing purposes. The code was written in Microsoft Visual C++, and the source code folder on the CD also contains the project files used when building the executable. The libraries imported were the Intel Image Processing Library (IPL) and the Intel JPEG Library (IJL). In addition, the Open Source Computer Vision Library (OpenCV) was utilized. All of these libraries contain routines for manipulating images, including morphology operations, per-pixel operations and color manipulation.
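
To give a flavour of these routines, the following minimal sketch chains a color conversion, a per-pixel threshold and a morphological opening using the OpenCV 1.x C API of the period. It is an illustration of the library style, not an excerpt from the project's source; the file names are placeholders.

    #include <cv.h>
    #include <highgui.h>

    // Illustration of the per-pixel, color and morphology routines the
    // text mentions, in the OpenCV 1.x C style of the period.
    int main() {
        IplImage* color = cvLoadImage("car.jpg", CV_LOAD_IMAGE_COLOR);
        if (!color) return 1;

        // Color manipulation: convert to a single-channel gray image.
        IplImage* gray = cvCreateImage(cvGetSize(color), IPL_DEPTH_8U, 1);
        cvCvtColor(color, gray, CV_BGR2GRAY);

        // Per-pixel operation: fixed threshold to a binary image.
        cvThreshold(gray, gray, 128, 255, CV_THRESH_BINARY);

        // Morphology: one erosion followed by one dilation (an opening).
        cvErode(gray, gray, NULL, 1);
        cvDilate(gray, gray, NULL, 1);

        cvSaveImage("car_processed.jpg", gray);
        cvReleaseImage(&gray);
        cvReleaseImage(&color);
        return 0;
    }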



Part VI

Literature




Bibliography

[1] David R. Anderson, Dennis J. Sweeney and Thomas A. Williams. Statistics for Business and Economics, 8th ed. South-Western, 2002. ISBN 0-324-06672-4.

[2] Vejledning for statens bilinspektion (guidelines for the national vehicle inspection). http://www.fstyr.dk/fagomraader/krav/pdf sbi/afsnit16.pdf, November 2001.

[3] Bernhard Jähne. Digital Image Processing. Springer-Verlag, 1993. ISBN 3-540-53782-1.

[4] Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing. Addison-Wesley, 1993. ISBN 0-201-60078-1.

[5] J.P. Lewis, Industrial Light & Magic. http://www.idiom.com/~zilla/Work/nvisionInterface/nip.html.

[6] Moritz Störing and Thomas B. Moeslund. An Introduction to Template Matching. Technical report, CVMT, 2001. ISSN 0906-6233.

[7] Gorm Jespersen. Visit at Aalborg Police Department, 13/03-2002.


[8] Trafikudvalgets evaluering af mål om nedbringelse af antal ulykker (the Traffic Committee's evaluation of the target for reducing the number of accidents). http://www.folketinget.dk/Samling/21001/udvbilag/TRU/Almdel bilag611.htm.

[9] Richard O. Duda and Peter E. Hart. Pattern Classification and Scene Analysis. John Wiley & Sons, 1973.

[10] David C. Lay. Linear Algebra and Its Applications, 2nd ed. Addison-Wesley, 2000. ISBN 0-201-82478-7.
