Scott Krig, Krig Research, USA
ISBN 978-3-319-33761-6    ISBN 978-3-319-33762-3 (eBook)
DOI 10.1007/978-3-319-33762-3
Library of Congress Control Number: 2016938637
© Springer International Publishing Switzerland 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
Printed on acid-free paper
This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG Switzerland
Foreword to the Second Edition
The goal of this second edition is to add new material on deep learning, neuroscience applied to computer vision, historical developments in neural networks, and feature learning architectures, particularly neural network methods. In addition, this second edition cleans up some typos and other items from the first edition. In total, three new chapters are added to survey the latest feature learning and hierarchical deep learning methods and architectures. Overall, this book provides a wide survey of computer vision methods including local feature descriptors, regional and global features, and feature learning methods, with a taxonomy for organizational purposes. Analysis is distributed throughout the book to provide intuition behind the various approaches, encouraging the reader to think for themselves about the motivations for each approach, why different methods are created, how each method is designed and architected, and why it works. Nearly 1000 references to the literature and other materials are provided, making computer vision and imaging resources accessible at many levels.
My expectation for the reader is this: if you want to learn about 90% of computer vision, read this book. To learn the other 10%, read the references provided and spend at least 20 years creating real systems. Reading this book will take a matter of hours, and reading the references and creating real systems will take a lifetime to only scratch the surface. We follow the axiom of the eminent Dr. Jack Sparrow, who has no time for extraneous details, and here we endeavor to present computer vision materials in a fashion that makes the fundamentals accessible to many outside the inner circles of academia:
“I like it. Simple, easy to remember.”
—Jack Sparrow, Pirates of the Caribbean
This book is suitable for independent study, reference, or coursework at the university level and beyond for experienced engineers and scientists. The chapters are divided in such a way that various courses can be devised to incorporate a subset of chapters to accommodate course requirements. For example, typical course titles include “Image Sensors and Image Processing,” “Computer Vision and Image Processing,” “Applied Computer Vision and Imaging Optimizations,” “Feature Learning, Deep Learning, and Neural Network Architectures,” “Computer Vision Architectures,” and “Computer Vision Survey.” Questions are available for coursework at the end of each chapter. It is recommended that this book be used as a complement to other fine books, open source code, and hands-on materials for study in computer vision and related scientific disciplines, or possibly used by itself for a higher-level survey course.
This book may be used as required reading to provide a survey component to academic coursework for science and engineering disciplines, to complement other texts that contain hands-on and how-to materials.
This book DOES NOT PROVIDE extensive how-to coding examples, worked-out examples, mathematical proofs, experimental results and comparisons, or detailed performance data, which are already very well covered in the bibliography references. The goal is to provide an analysis across a representative survey of methods, rather than repeating what is already found in the references. This is not a workbook with open source code (only a little source code is provided), since there are many fine open source materials available already, which are referenced for the interested reader.
Instead, this book DOES PROVIDE an extensive survey, taxonomy, and analysis of computer vision methods. The goal is to find the intuition behind the methods surveyed. The book is meant to be read, rather than worked through. This is not a workbook, but is intended to provide sufficient background for the reader to find pathways forward into basic or applied research for a variety of scientific and engineering applications. In some respects, this work is a museum of computer vision, containing concepts, observations, oddments, and relics which fascinate me.
The book is designed to complement existing texts and fill a niche in the literature. The book takes a complete path through computer vision, beginning with image sensors, image processing, global-regional-local feature descriptor methods, feature learning and deep learning, neural networks for computer vision, ground truth data and training, and applied engineering optimizations across CPU, GPU, and software optimization methods. I could not find a similar book; otherwise I would not have begun this work.
This book aims at a survey, taxonomy, and analysis of computer vision methods from the perspective of the features used—the feature descriptors themselves, how they are designed and how they are organized. Learning methods and architectures are necessary and supporting factors, and are included here for completeness. However, I am personally fascinated by the feature descriptor methods themselves, and I regard them as an art form for mathematically arranging pixel patterns, shapes, and spectra to reveal how images are created. I regard each feature descriptor as a work of art, like a painting or mathematical sculpture prepared by an artist, and thus the perspective of this work is to survey feature descriptor and feature learning methods and appreciate each one.
As shown in this book over and over again, researchers are finding that a wide range of feature descriptors are effective, and that one of the keys to best results seems to be the sheer number of features used in feature hierarchies, rather than the choice of SIFT vs. pixel patches vs. CNN features. In the surveys herein we see that many methods for learning and training are used, many architectures are used, and the consensus seems to be that hierarchical feature learning is now the mainstay of computer vision, following on from the pioneering work in convolutional neural networks and deep learning methods applied to computer vision, which has accelerated since the new millennium. The older computer vision methods are being combined with the newer ones, and applications are now beginning to appear in consumer devices, rather than the exotic military and intelligence systems of the past.
Special thanks to Courtney Clarke at Springer for commissioning this second edition, and for providing support and guidance to make it better.
Special thanks for all the wonderful feedback on the first edition, which helped to shape this second edition. Vin Ratford and Jeff Bier of the Embedded Vision Alliance (EVA) arranged to provide copies of the first edition to all EVA members, both hard copy and e-book versions, and maintained a feedback web page for review comments—much appreciated. Thanks to Mike Schmidt and Vadim Pisarevsky for excellent review comments over the entire book. Juergen Schmidhuber provided links to historical information on neural networks and other useful information, Kunihiko Fukushima provided copies of some of his early neural network research papers, Rahul Suthankar provided updates on key trends in computer vision, Hugo LaRochelle provided information and references on CNN topics, and Patrick Cox on HMAX topics. Interesting information was also provided by Robert Gens and Andrej Karpathy. And I would like to re-thank those who contributed to the first edition, including Paul Rosin regarding synthetic interest points, Yann LeCun for providing key references into deep learning and convolutional networks, Shree Nayar for permission to use a few images, Luciano Oviedo for blue-sky discussions, and many others who have influenced my thinking, including Alexandre Alahi, Steve Seitz, Bryan Russel, Liefeng Bo, Xiaofeng Ren, Gutemberg Guerra-filho, Harsha Viswana, Dale Hitt, Joshua Gleason, Noah Snavely, Daniel Scharstein, Thomas Salmon, Richard Baraniuk, Carl Vodrick, Hervé Jégou, Andrew Richardson, Ofri Weschler, Hong Jiang, Andy Kuzma, Michael Jeronimo, Eli Turiel, and many others whom I have failed to mention.
As usual, thanks to my wife for patience with my research, and also for providing the “governor” switch to pace my work, without which I would likely burn out more completely. And most of all, special thanks to the great inventor who inspires us all, Anno Domini 2016.
Scott Krig
Contents

1  Image Capture and Representation
   Image Sensor Technology
      Sensor Materials
      Sensor Photodiode Cells
      Sensor Configurations: Mosaic, Foveon, BSI
      Dynamic Range, Noise, Super Resolution
      Sensor Processing
      De-Mosaicking
      Dead Pixel Correction
      Color and Lighting Corrections
      Geometric Corrections
   Cameras and Computational Imaging
      Overview of Computational Imaging
      Single-Pixel Computational Cameras
      2D Computational Cameras
      3D Depth Camera Systems
   3D Depth Processing
      Overview of Methods
      Problems in Depth Sensing and Processing
      Monocular Depth Processing
   3D Representations: Voxels, Depth Maps, Meshes, and Point Clouds
   Summary
   Chapter 1: Learning Assignments

2  Image Pre-Processing
   Perspectives on Image Processing
   Problems to Solve During Image Preprocessing
      Vision Pipelines and Image Preprocessing
      Corrections
      Enhancements
      Preparing Images for Feature Extraction
   The Taxonomy of Image Processing Methods
      Point
      Line
      Area
      Algorithmic
      Data Conversions
   Colorimetry
      Overview of Color Management Systems
      Illuminants, White Point, Black Point, and Neutral Axis
      Device Color Models
      Color Spaces and Color Perception
      Gamut Mapping and Rendering Intent
      Practical Considerations for Color Enhancements
      Color Accuracy and Precision
   Spatial Filtering
      Convolutional Filtering and Detection
      Kernel Filtering and Shape Selection
      Point Filtering
      Noise and Artifact Filtering
      Integral Images and Box Filters
   Edge Detectors
      Kernel Sets: Sobel, Scharr, Prewitt, Roberts, Kirsch, Robinson, and Frei–Chen
      Canny Detector
   Transform Filtering, Fourier, and Others
      Fourier Transform Family
   Morphology and Segmentation
      Binary Morphology
      Gray Scale and Color Morphology
      Morphology Optimizations and Refinements
      Euclidean Distance Maps
      Super-pixel Segmentation
      Depth Segmentation
      Color Segmentation
   Thresholding
      Global Thresholding
      Local Thresholding
   Summary
   Chapter 2: Learning Assignments

3  Global and Regional Features
   Historical Survey of Features
      Key Ideas: Global, Regional, and Local Metrics
      Textural Analysis
      Statistical Methods
   Texture Region Metrics
      Edge Metrics
      Cross-Correlation and Auto-correlation
      Fourier Spectrum, Wavelets, and Basis Signatures
      Co-occurrence Matrix, Haralick Features
      Laws Texture Metrics
      LBP Local Binary Patterns
      Dynamic Textures
   Statistical Region Metrics
      Image Moment Features
      Point Metric Features
      Global Histograms
      Local Region Histograms
      Scatter Diagrams, 3D Histograms
      Multi-resolution, Multi-scale Histograms
      Radial Histograms
      Contour or Edge Histograms
   Basis Space Metrics
      Fourier Description
      Walsh–Hadamard Transform
      HAAR Transform
      Slant Transform
      Zernike Polynomials
      Steerable Filters
      Karhunen–Loeve Transform and Hotelling Transform
      Wavelet Transform and Gabor Filters
      Hough Transform and Radon Transform
   Summary
   Chapter 3: Learning Assignments

4  Local Feature Design Concepts
   Local Features
      Detectors, Interest Points, Keypoints, Anchor Points, Landmarks
      Descriptors, Feature Description, Feature Extraction
      Sparse Local Pattern Methods
   Local Feature Attributes
      Choosing Feature Descriptors and Interest Points
      Feature Descriptors and Feature Matching
      Criteria for Goodness
      Repeatability, Easy vs. Hard to Find
      Distinctive vs. Indistinctive
      Relative and Absolute Position
      Matching Cost and Correspondence
   Distance Functions
      Early Work on Distance Functions
      Euclidean or Cartesian Distance Metrics
      Grid Distance Metrics
      Statistical Difference Metrics
      Binary or Boolean Distance Metrics
   Descriptor Representation
      Coordinate Spaces, Complex Spaces
      Cartesian Coordinates
      Polar and Log Polar Coordinates
      Radial Coordinates
      Spherical Coordinates
      Gauge Coordinates
      Multivariate Spaces, Multimodal Data
      Feature Pyramids
   Descriptor Density
      Interest Point and Descriptor Culling
      Dense vs. Sparse Feature Description
   Descriptor Shape Topologies
      Correlation Templates
      Patches and Shape
      Object Polygon Shapes
   Local Binary Descriptor Point-Pair Patterns
      FREAK Retinal Patterns
      Brisk Patterns
      ORB and BRIEF Patterns
   Descriptor Discrimination
      Spectra Discrimination
      Region, Shapes, and Pattern Discrimination
      Geometric Discrimination Factors
      Feature Visualization to Evaluate Discrimination
      Accuracy, Trackability
      Accuracy Optimizations, Subregion Overlap, Gaussian Weighting, and Pooling
      Sub-pixel Accuracy
   Search Strategies and Optimizations
      Dense Search
      Grid Search
      Multi-scale Pyramid Search
      Scale Space and Image Pyramids
      Feature Pyramids
      Sparse Predictive Search and Tracking
      Tracking Region-Limited Search
      Segmentation Limited Search
      Depth or Z Limited Search
   Computer Vision, Models, Organization
      Feature Space
      Object Models
      Constraints
      Selection of Detectors and Features
      Overview of Training
      Classification of Features and Objects
      Feature Learning, Sparse Coding, Convolutional Networks
   Summary
   Chapter 4: Learning Assignments

5  Taxonomy of Feature Description Attributes
   General Robustness Taxonomy
   General Vision Metrics Taxonomy
   Feature Metric Evaluation
      SIFT Example
      LBP Example
      Shape Factors Example
   Summary
   Chapter 5: Learning Assignments

6  Interest Point Detector and Feature Descriptor Survey
   Interest Point Tuning
   Interest Point Concepts
   Interest Point Method Survey
      Laplacian and Laplacian of Gaussian
      Moravac Corner Detector
      Harris Methods, Harris–Stephens, Shi–Tomasi, and Hessian Type Detectors
      Hessian Matrix Detector and Hessian–Laplace
      Difference of Gaussians
      Salient Regions
      SUSAN, and Trajkovic and Hedly
      Fast, Faster, AGHAST
      Local Curvature Methods
      Morphological Interest Regions
   Feature Descriptor Survey
      Local Binary Descriptors
      Census
      Modified Census Transform
      BRIEF
      ORB
      BRISK
      FREAK
   Spectra Descriptors
      SIFT
      SIFT-PCA
      SIFT-GLOH
      SIFT-SIFER Retrofit
      SIFT CS-LBP Retrofit
      RootSIFT Retrofit
      CenSurE and STAR
      Correlation Templates
      HAAR Features
      Viola–Jones with HAAR-Like Features
      SURF
      Variations on SURF
      Histogram of Gradients (HOG) and Variants
      PHOG and Related Methods
      Daisy and O-Daisy
      CARD
      Robust Fast Feature Matching
      RIFF, CHOG
      Chain Code Histograms
      D-NETS
      Local Gradient Pattern
      Local Phase Quantization
   Basis Space Descriptors
      Fourier Descriptors
      Other Basis Functions for Descriptor Building
      Sparse Coding Methods
   Polygon Shape Descriptors
      MSER Method
      Object Shape Metrics for Blobs and Polygons
      Shape Context
   3D, 4D, Volumetric and Multimodal Descriptors
      3D HOG
      HON 4D
      3D SIFT
   Summary
   Chapter 6: Learning Assignments

7  Ground Truth Data, Content, Metrics, and Analysis
   What Is Ground Truth Data?
   Previous Work on Ground Truth Data: Art vs. Science
      General Measures of Quality Performance
      Measures of Algorithm Performance
      Rosin's Work on Corners
   Key Questions for Constructing Ground Truth Data
      Content: Adopt, Modify, or Create
      Survey of Available Ground Truth Data
      Fitting Ground Truth Data to Algorithms
      Scene Composition and Labeling
   Defining the Goals and Expectations
      Mikolajczyk and Schmid Methodology
      Open Rating Systems
      Corner Cases and Limits
      Interest Points and Features
   Robustness Criteria for Ground Truth Data
      Illustrated Robustness Criteria
      Using Robustness Criteria for Real Applications
   Pairing Metrics with Ground Truth
      Pairing and Tuning Interest Points, Features, and Ground Truth
      Examples Using the General Vision Taxonomy
   Synthetic Feature Alphabets
      Goals for the Synthetic Dataset
      Synthetic Interest Point Alphabet
      Hybrid Synthetic Overlays on Real Images
   Summary
   Chapter 7: Learning Assignments

8  Vision Pipelines and Optimizations
   Stages, Operations, and Resources
   Compute Resource Budgets
      Compute Units, ALUs, and Accelerators
      Power Use
      Memory Use
      I/O Performance
   The Vision Pipeline Examples
      Automobile Recognition
      Face, Emotion, and Age Recognition
      Image Classification
      Augmented Reality
   Acceleration Alternatives
      Memory Optimizations
      Coarse-Grain Parallelism
      Fine-Grain Data Parallelism
      Advanced Instruction Sets and Accelerators
   Vision Algorithm Optimizations and Tuning
      Compiler and Manual Optimizations
      Tuning
      Feature Descriptor Retrofit, Detectors, Distance Functions
      Boxlets and Convolution Acceleration
      Data-Type Optimizations, Integer vs. Float
   Optimization Resources
   Summary
   Chapter 8: Learning Assignments

9  Feature Learning Architecture Taxonomy and Neuroscience Background
   Neuroscience Inspirations for Computer Vision
   Feature Generation vs. Feature Learning
   Terminology of Neuroscience Applied to Computer Vision
   Classes of Feature Learning
      Convolutional Feature Weight Learning
      Local Feature Descriptor Learning
      Basis Feature Composition and Dictionary Learning
      Summary Perspective on Feature Learning Methods
   Machine Learning Models for Computer Vision
      Expert Systems
      Statistical and Mathematical Analysis Methods
      Neural Science Inspired Methods
      Deep Learning
         DNN Hacking and Misclassification
   History of Machine Learning (ML) and Feature Learning
      Historical Survey, 1940s–2010s
      Artificial Neural Network (ANN) Taxonomy Overview
   Feature Learning Overview
      Learned Feature Descriptor Types
      Hierarchical Feature Learning
      How Many Features to Learn?
      The Power of DNNs
      Encoding Efficiency
      Handcrafted Features vs. Handcrafted Deep Learning
      Invariance and Robustness Attributes for Feature Learning
      What Are the Best Features and Learning Architectures?
      Merger of Big Data, Analytics, and Computer Vision
      Key Technology Enablers
   Neuroscience Concepts
      Biology and Blueprint
      The Elusive Unified Learning Theory
      Human Visual System Architecture
   Taxonomy of Feature Learning Architectures
      Architecture Topologies
         ANNs (Artificial Neural Networks)
         FNN (Feed Forward Neural Network)
         RNN (Recurrent Neural Network)
         BFN (Basis Function Network)
         Ensembles, Hybrids
      Architecture Components and Layers
         Layer Totals
         Layer Connection Topology
         Memory Model
         Training Protocols
         Input Sampling Methods
         Dropout, Reconfiguration, Regularization
         Preprocessing, Numeric Conditioning
         Feature Set Dimensions
         Feature Initialization
         Features, Filters
         Activation, Transfer Functions
         Post-processing, Numeric Conditioning
         Pooling, Subsampling, Downsampling, Upsampling
         Classifiers
   Summary
   Chapter 9: Learning Assignments

10 Feature Learning and Deep Learning Architecture Survey
   Architecture Survey
      FNN Architecture Survey
         P—Perceptron
         MLP, Multilayer Perceptron, Cognitron, Neocognitron
         Concepts for CNNs, Convnets, Deep MLPs
         LeNet
         AlexNet, ZFNet
         VGGNet and Variants MSRA-22, Baidu Deep Image, Deep Residual Learning
         Half-CNN
         NiN, Maxout
         GoogLeNet, InceptionNet
         MSRA-22, SPP-Net, R-CNN, MSSNN, Fast-R-CNN
         Baidu, Deep Image, MINWA
         SYMNETS—Deep Symmetry Networks
      RNN Architecture Survey
         Concepts for Recurrent Neural Networks
         LSTM, GRU
         NTM, RNN-NTM, RL-NTM
         Multidimensional RNNs, MDRNN
         C-RNN, QDRNN
         RCL-RCNN
         dasNET
         NAP—Neural Abstraction Pyramid
         Deep Reinforcement Learning (DRL)
      BFN Architecture Survey
         Concepts for Machine Learning and Basis Feature Networks
         PNN—Polynomial Neural Network, GMDH
         HKD—Kernel Descriptor Learning
         HMP—Sparse Feature Learning
         HMAX and Neurological Models
         HMO—Hierarchical Model Optimization
      Ensemble Methods
         Deep Neural Network Futures
         Increasing Depth to the Max—Deep Residual Networks
         Approximating Complex Models Using a Simpler MLP (Model Compression)
         Classifier Decomposition and Recombination
   Summary
   Chapter 10: Learning Assignments

Appendix A: Synthetic Feature Analysis
Appendix B: Survey of Ground Truth Datasets
Appendix C: Imaging and Computer Vision Resources
Appendix D: Extended SDM Metrics
Appendix E: The Visual Genome Model (VGM)
References
Index
1  Image Capture and Representation
“The changing of bodies into light, and light into bodies, is very conformable to the course of Nature, which seems delighted with transmutations.”
—Isaac Newton
Computer vision starts with images. This chapter surveys a range of topics dealing with capturing, processing, and representing images, including computational imaging, 2D imaging, and 3D depth imaging methods, sensor processing, depth-field processing for stereo and monocular multi-view stereo, and surface reconstruction. A high-level overview of selected topics is provided, with references for the interested reader to dig deeper. Readers with a strong background in the area of 2D and 3D imaging may benefit from a light reading of this chapter.
Image Sensor Technology
This section provides a basic overview of image sensor technology as a basis for understanding how images are formed and for developing effective strategies for image sensor processing to optimize the image quality for computer vision.
Typical image sensors are created from either CCD cells (charge-coupled device) or standard CMOS cells (complementary metal-oxide semiconductor). The CCD and CMOS sensors share similar characteristics and both are widely used in commercial cameras. The majority of sensors today use CMOS cells, though, mostly due to manufacturing considerations. Sensors and optics are often integrated to create wafer-scale cameras for applications like biology or microscopy, as shown in Fig. 1.1.
Image sensors are designed to reach specific design goals with different applications in mind, providing varying levels of sensitivity and quality. Consult the manufacturer's information to get familiar with each sensor. For example, the size and material composition of each photodiode sensor cell element is optimized for a given semiconductor manufacturing process so as to achieve the best trade-off between silicon die area and dynamic response for light intensity and color detection.
For computer vision, the effects of sampling theory are relevant—for example, the Nyquist frequency applied to pixel coverage of the target scene. The sensor resolution and optics together must provide adequate resolution for each pixel to image the features of interest, so it follows that a feature of interest should be imaged or sampled at least two times greater than the minimum size of the smallest pixels of importance to the feature. Of course, 2× oversampling is just a minimum target for accuracy; in practice, single pixel wide features are not easily resolved.
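As a rough illustration of the 2× oversampling rule (a sketch with invented numbers, not a calibration recipe from the text), the sensor resolution needed along one axis can be estimated from the scene width and the smallest feature that must be resolved:

```python
import math

def min_sensor_pixels(fov_mm: float, smallest_feature_mm: float,
                      oversampling: float = 2.0) -> int:
    """Pixels needed across one axis so the smallest feature of
    interest is sampled at least `oversampling` times."""
    return math.ceil(oversampling * fov_mm / smallest_feature_mm)

# Example: a 100 mm wide scene in which 0.5 mm features must be resolved.
pixels = min_sensor_pixels(fov_mm=100.0, smallest_feature_mm=0.5)
print(pixels)  # 400 pixels across, so each feature covers >= 2 pixels
```

As the text notes, this is only a lower bound; real systems usually need more than 2× oversampling to resolve features reliably.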
© Springer International Publishing Switzerland 2016
S. Krig, Computer Vision Metrics, DOI 10.1007/978-3-319-33762-3_1
Figure 1.1 Common integrated image sensor arrangement with optics and color filters (micro-lenses, RGB color filters, CMOS imager)
For best results, the camera system should be calibrated for a given application to determine the sensor noise and dynamic range for pixel bit depth under different lighting and distance situations. Appropriate sensor processing methods should be developed to deal with the noise and nonlinear response of the sensor for any color channel, to detect and correct dead pixels, and to handle modeling of geometric distortion. A simple calibration method using a test pattern with fine and coarse gradations of gray scale, color, and different scales of pixel features makes it possible to devise the appropriate sensor processing methods. In Chap. 2, we survey a range of image processing methods applicable to sensor processing. But let us begin by surveying the sensor materials.
Sensor Materials
Silicon-based image sensors are most common, although other materials such as gallium (Ga) are used in industrial and military applications to cover longer IR wavelengths than silicon can reach. Image sensors range in resolution, depending upon the camera used, from a single-pixel phototransistor camera, through 1D line scan arrays for industrial applications, to 2D rectangular arrays for common cameras, all the way to spherical arrays for high-resolution imaging. (Sensor configurations and camera configurations are covered later in this chapter.)
Common imaging sensors are made using silicon as CCD, CMOS, BSI, and Foveon methods, as discussed a bit later in this chapter. Silicon image sensors have a nonlinear spectral response curve; the near-infrared part of the spectrum is sensed well, while blue, violet, and near UV are sensed less well, as shown in Fig. 1.2. Note that the silicon spectral response must be accounted for when reading the raw sensor data and quantizing the data into a digital pixel. Sensor manufacturers make design compensations in this area; however, sensor color response should also be considered when calibrating your camera system and devising the sensor processing methods for your application.
Sensor Photodiode Cells
One key consideration for image sensors is the photodiode size or cell size. A sensor cell using small photodiodes will not be able to capture as many photons as a large photodiode. If the cell size is near the wavelength of the visible light to be captured, such as blue light at 400 nm, then additional problems must be overcome in the sensor design to correct the image color. Sensor manufacturers take great care to design cells at the optimal size to image all colors equally well (Fig. 1.3). In the extreme, small sensors may be more sensitive to noise, owing to a lack of accumulated photons and sensor readout noise. If the photodiode sensor cells are too large, there is no benefit either, and the die size and cost for silicon go up, providing no advantage. Common commercial sensor devices may have sensor cell sizes of around 1 square micron and larger; each manufacturer is different, however, and trade-offs are made to reach specific requirements.
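The noise sensitivity of small cells follows directly from photon counting statistics: shot noise grows as the square root of the photon count, so the shot-noise-limited SNR also scales as the square root. A minimal sketch, assuming a purely hypothetical photon flux figure:

```python
import math

# Sketch: photon shot noise follows Poisson statistics, so SNR ~ sqrt(N).
# A cell with 4x the area collects ~4x the photons and gains only ~2x in
# shot-noise-limited SNR. The flux figure below is an assumed example.

def shot_noise_snr(photons):
    """SNR limited by photon shot noise alone: mean / sqrt(mean)."""
    return math.sqrt(photons)

flux = 10_000                  # photons per square micron per exposure (assumed)
small_cell = 1.0 * flux        # ~1 um^2 cell
large_cell = 4.0 * flux        # ~4 um^2 cell

snr_small = shot_noise_snr(small_cell)   # 100.0
snr_large = shot_noise_snr(large_cell)   # 200.0
```

This is one reason shrinking cell size yields diminishing returns: quadrupling the collection area only doubles the shot-noise SNR, and readout noise erodes the advantage further.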
Figure 1.2 Typical spectral response of a few types of silicon photodiodes. Note the highest sensitivity in the near-infrared range around 900 nm and nonlinear sensitivity across the visible spectrum of 400–700 nm. Removing the IR filter from a camera increases the near-infrared sensitivity due to the normal silicon response. (Spectral data image © OSI Optoelectronics Inc. and used by permission)
Figure 1.3 Primary color assignment to wavelengths (RGB sensitivity vs. wavelength, 390–740 nm, showing the color spectral overlap). Note that the primary color regions overlap, with green being a good monochrome proxy for all colors
Sensor Configurations: Mosaic, Foveon, BSI
There are various on-chip configurations for multi-spectral sensor design, including mosaics and stacked methods, as shown in Fig. 1.4. In a mosaic method, the color filters are arranged in a mosaic pattern above each cell. The Foveon¹ sensor stacking method relies on the physics of depth penetration of the color wavelengths into the semiconductor material, where each color penetrates the silicon to a different depth, thereby imaging the separate colors. The overall cell size accommodates all colors, and so separate cells are not needed for each color.
Figure 1.4 (Left) The Foveon method of stacking RGB cells to absorb different wavelengths at different depths, with all RGB colors at each cell location. (Right) A standard mosaic cell placement with RGB filters above each photodiode, with filters only allowing the specific wavelengths to pass into each photodiode
Back-side-illuminated (BSI) sensor configurations rearrange the sensor wiring on the die to allow for a larger cell area and more photons to be accumulated in each cell. See the Aptina [392] white paper for a comparison of front-side and back-side die circuit arrangement.
The arrangement of sensor cells also affects the color response. For example, Fig. 1.5 shows various arrangements of primary color (R, G, B) sensors as well as white (W) sensors together, where W sensors have a clear or neutral color filter. The sensor cell arrangements allow for a range of pixel processing options, for example combining selected pixels in various configurations of neighboring cells during sensor processing for a pixel formation that optimizes color response or spatial color resolution. In fact, some applications just use the raw sensor data and perform custom processing to increase the resolution or develop alternative color mixes.
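One of the simplest forms of combining neighboring cells is 2×2 binning, which trades spatial resolution for improved signal. A minimal sketch (binning logic only; real sensor pipelines bin per color plane and often in analog, before readout):

```python
import numpy as np

# Sketch: 2x2 binning averages each non-overlapping 2x2 block of a
# single-channel sensor array, halving resolution in each dimension.

def bin2x2(raw):
    """Average each non-overlapping 2x2 block; raw must have even dims."""
    h, w = raw.shape
    assert h % 2 == 0 and w % 2 == 0
    return raw.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

raw = np.arange(16, dtype=np.float64).reshape(4, 4)
binned = bin2x2(raw)   # shape (2, 2)
```

Each output value is the mean of four input cells, so uncorrelated per-cell noise is roughly halved while the pixel pitch doubles.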
The overall sensor size and format determines the lens size as well. In general, a larger lens lets in more light, so larger sensors are typically better suited to digital cameras for photography applications. In addition, the cell placement aspect ratio on the die determines pixel geometry; for example, a 4:3 aspect ratio is common for digital cameras while 3:2 is standard for 35 mm film. The sensor configuration details are worth understanding in order to devise the best sensor processing and image preprocessing pipelines.
¹ Foveon is a registered trademark of Foveon Inc.
Figure 1.5 Several different mosaic configurations of cell colors, including white, primary RGB colors, and secondary CYM cells. Each configuration provides different options for sensor processing to optimize for color or spatial resolution. (Image used by permission, © Intel Press, from Building Intelligent Systems)
Dynamic Range, Noise, Super Resolution
Current state-of-the-art sensors provide at least 8 bits per color cell, and usually are 12–14 bits. Sensor cells require area and time to accumulate photons, so smaller cells must be designed carefully to avoid problems. Noise may come from optics, color filters, sensor cells, gain and A/D converters, post-processing, or the compression methods, if used. Sensor readout noise also affects effective resolution, as each pixel cell is read out of the sensor, sent to an A/D converter, and formed into digital lines and columns for conversion into pixels. Better sensors will provide less noise and higher effective bit resolution. However, effective resolution can also be increased using super resolution methods: several images taken in rapid succession are averaged together to reduce noise [886], or alternatively the sensor position can be micro-MEMS-dithered to create image sequences to average together to increase resolution. A good survey of de-noising is found in the work by Ibenthal [391].
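The noise-reduction step of the frame-averaging approach can be sketched in a few lines: averaging N frames reduces zero-mean, uncorrelated noise by roughly a factor of √N. The flat synthetic scene and noise level below are assumed for illustration.

```python
import numpy as np

# Sketch: averaging several rapid-succession frames reduces zero-mean sensor
# noise by roughly sqrt(number of frames) -- one ingredient of super
# resolution. The scene and noise sigma here are synthetic examples.

rng = np.random.default_rng(0)
clean = np.full((64, 64), 128.0)                      # hypothetical flat scene
frames = [clean + rng.normal(0.0, 8.0, clean.shape)   # sigma = 8 noise
          for _ in range(16)]

averaged = np.mean(frames, axis=0)

noise_single = np.std(frames[0] - clean)   # close to 8
noise_avg = np.std(averaged - clean)       # close to 8 / sqrt(16) = 2
```

Full super resolution methods add sub-pixel registration of the frames before combining them; plain averaging as above only addresses the noise component.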
In addition, sensor photon absorption is different for each color, and may be problematic for blue, which can be the hardest color for smaller sensors to image. In some cases, the manufacturer may attempt to provide a simple gamma-curve correction method built into the sensor for each color, which is not recommended. For demanding color applications, consider colorimetric device models and color management (as will be discussed in Chap. 2), or even characterizing the nonlinearity for each color channel of the sensor and developing a set of simple corrective LUT transforms. (Noise-filtering methods applicable to depth sensing are also covered in Chap. 2.)
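A per-channel corrective LUT of the kind mentioned above can be sketched as follows. The gamma-like response curve here is an assumed stand-in for a measured channel characterization; a real LUT would be built from calibration data.

```python
import numpy as np

# Sketch: build a per-channel lookup table (LUT) that inverts a measured,
# monotonic nonlinear sensor response, so lut[response[i]] ~= i.
# The response model below is assumed for illustration only.

def build_correction_lut(response, bits=8):
    """response[i] = measured output code for linear input code i.
    Returns the inverse LUT as uint8."""
    levels = 2 ** bits
    x = np.arange(levels)
    # np.interp inverts the monotonic response curve by swapping axes.
    return np.interp(x, response, x).astype(np.uint8)

levels = 256
linear = np.arange(levels)
measured = 255.0 * (linear / 255.0) ** 0.6   # assumed channel nonlinearity
lut = build_correction_lut(measured)

# Applying the LUT to a measured code recovers (approximately) the linear code:
corrected = lut[np.uint8(measured[100])]     # lands near the linear value 100
```

One LUT per color channel, applied after de-mosaicking (or per raw plane before it), is a cheap way to linearize the sensor without full colorimetric modeling.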
Sensor Processing
Sensor processing is required to de-mosaic and assemble the pixels from the sensor array, and also to correct sensing defects. We discuss the basics of sensor processing in this section.
Typically, a dedicated sensor processor is provided in each imaging system, including a fast HW sensor interface, optimized VLIW and SIMD instructions, and dedicated fixed-function hardware blocks to deal with the massively parallel pixel-processing workloads for sensor processing. Usually, sensor processing is transparent, automatic, and set up by the manufacturer of the imaging system, and all images from the sensor are processed the same way. A bypass may exist to provide the raw data that can allow custom sensor processing for applications like digital photography.
De-Mosaicking
Depending on the sensor cell configuration, as shown in Fig. 1.5, various de-mosaicking algorithms are employed to create a final RGB pixel from the raw sensor data. A good survey by Losson et al. [388] and another by Li et al. [389] provide some background on the challenges involved and the various methods employed.
One of the central challenges of de-mosaicking is pixel interpolation to combine the color channels from nearby cells into a single pixel. Given the geometry of sensor cell placement and the aspect ratio of the cell layout, this is not a trivial problem. A related issue is color cell weighting: for example, how much of each color should be integrated into each RGB pixel. Since the spatial cell resolution in a mosaicked sensor is greater than the final combined RGB pixel resolution, some applications require the raw sensor data to take advantage of all the accuracy and resolution possible, or to perform special processing to either increase the effective pixel resolution or do a better job of spatially accurate color processing and de-mosaicking.
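To make the interpolation problem concrete, here is a minimal bilinear de-mosaic sketch for an RGGB Bayer mosaic, using normalized convolution (interpolating each color plane from its known sample sites). This is the simplest possible baseline; production pipelines use the far more sophisticated, edge-aware methods surveyed by Losson et al. [388].

```python
import numpy as np

# Sketch: naive bilinear de-mosaicking of an RGGB Bayer mosaic via
# normalized convolution. Illustrative baseline only, not edge-aware.

def convolve3x3(img, k):
    """3x3 cross-correlation with edge-replicated borders."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(k[i, j] * p[i:i + h, j:j + w]
               for i in range(3) for j in range(3))

def bilinear_demosaic(raw):
    """raw: (H, W) RGGB Bayer mosaic. Returns (H, W, 3) float RGB."""
    h, w = raw.shape
    masks = np.zeros((h, w, 3), dtype=bool)
    masks[0::2, 0::2, 0] = True                    # R sites
    masks[0::2, 1::2, 1] = True                    # G sites on R rows
    masks[1::2, 0::2, 1] = True                    # G sites on B rows
    masks[1::2, 1::2, 2] = True                    # B sites
    kernel = np.array([[1., 2., 1.],
                       [2., 4., 2.],
                       [1., 2., 1.]]) / 4.0        # bilinear weights
    out = np.zeros((h, w, 3))
    for c in range(3):
        samples = np.where(masks[..., c], raw, 0.0)
        # Divide interpolated samples by interpolated mask so each output
        # pixel is a weighted average of only the known cells of channel c.
        out[..., c] = convolve3x3(samples, kernel) / \
                      convolve3x3(masks[..., c].astype(float), kernel)
    return out
```

On a uniformly colored scene this reconstructs the color exactly; on real images, the bilinear baseline blurs edges and produces the color-fringing artifacts that more advanced de-mosaicking methods are designed to suppress.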
Dead Pixel Correction
A sensor, like an LCD display, may have dead pixels. A vendor may calibrate the sensor at the factory and provide a sensor defect map for the known defects, providing coordinates of those dead pixels for use in corrections in the camera module or driver software. In some cases, adaptive defect correction methods [390] are used on the sensor to monitor the adjacent pixels to actively look for defects and then to correct a range of defect types, such as single-pixel defects, column or line defects, and defects such as 2 × 2 or 3 × 3 clusters. A camera driver can also provide adaptive defect analysis to look for flaws in real time, and perhaps provide special compensation controls in a camera setup menu.
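Correction from a factory defect map can be sketched simply: replace each listed dead pixel with the median of its valid (non-defective) neighbors. This handles single-pixel defects; line and cluster defects need wider neighborhoods, which this sketch omits.

```python
import numpy as np

# Sketch: correct known dead pixels from a factory defect map by replacing
# each with the median of its valid 8-connected neighbors.

def correct_dead_pixels(img, defect_coords):
    """img: (H, W) array; defect_coords: iterable of (row, col) dead pixels."""
    out = img.astype(float).copy()
    dead = set(defect_coords)
    h, w = img.shape
    for r, c in dead:
        neighbors = [img[rr, cc]
                     for rr in range(max(0, r - 1), min(h, r + 2))
                     for cc in range(max(0, c - 1), min(w, c + 2))
                     if (rr, cc) != (r, c) and (rr, cc) not in dead]
        out[r, c] = np.median(neighbors)
    return out

img = np.full((5, 5), 100.0)
img[2, 2] = 0.0                          # a stuck-low dead pixel
fixed = correct_dead_pixels(img, [(2, 2)])
```

Excluding other dead pixels from the neighbor set matters for clustered defects, so one defect does not contaminate the correction of another.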
Color and Lighting Corrections
Color corrections are required to balance the overall color accuracy as well as the white balance. As shown in Fig. 1.2, color sensitivity is usually very good in silicon sensors for red and green, but less good for blue, so the opportunity for providing the most accurate color starts with understanding and calibrating the sensor.
Most image sensor processors contain a geometric processor for vignette correction, which manifests as darker illumination at the edges of the image, as discussed in Chap. 7, Table 7.1, on robustness criteria. The corrections are based on a geometric warp function, which is calibrated at the factory to match the optics vignette pattern, allowing for a programmable illumination function to increase illumination toward the edges. For a discussion of image warping methods applicable to vignetting, see Ref. [472].
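A programmable illumination function of this kind can be sketched as a radial gain map. The quadratic falloff model and its coefficient below are hypothetical; a factory calibration would fit the gain profile to the measured vignette pattern of the actual optics.

```python
import numpy as np

# Sketch: a simple radial gain map for vignette correction, brightening
# pixels toward the edges. The quadratic model and k are assumed examples.

def vignette_gain(h, w, k=0.5):
    """Gain = 1 + k * r2 / 2, where r2 is the squared distance from the
    image center, normalized so the corners have r2 = 2 (gain = 1 + k)."""
    y, x = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r2 = ((y - cy) / cy) ** 2 + ((x - cx) / cx) ** 2
    return 1.0 + k * r2 / 2.0

gain = vignette_gain(101, 101)
# corrected = image * gain  -- unity gain at the center, 1.5x at the corners
```

Multiplying the captured image by this map compensates the darker edges; real implementations also clamp the result to the valid pixel range.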
Geometric Corrections
A lens may have geometric aberrations or may warp toward the edges, producing images with radial distortion, a problem that is related to the vignetting discussed above and shown in Chap. 7 (Fig. 7.6). To deal with lens distortion, most imaging systems have a dedicated sensor processor with a hardware-accelerated digital warp unit similar to the texture sampler in a GPU. The geometric corrections are calibrated and programmed in the factory for the optics. See Ref. [472] for a discussion of image warping methods.
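The inverse-warp operation such a unit performs can be sketched in software using a one-term radial distortion model. The model (x_d = x_u(1 + k1·r²)), the coefficient k1, and the nearest-neighbor sampling are all simplifications for illustration; hardware warp units use calibrated multi-term models and filtered (bilinear or better) sampling.

```python
import numpy as np

# Sketch: inverse warping for simple radial lens distortion using a
# one-term model x_d = x_u * (1 + k1 * r^2). Nearest-neighbor sampling
# and the single coefficient are simplifications for illustration.

def undistort(img, k1):
    """For each output (undistorted) pixel, sample the input (distorted)
    image at the location that pixel maps to under the distortion model."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            xn, yn = (x - cx) / cx, (y - cy) / cy      # normalized coords
            scale = 1.0 + k1 * (xn * xn + yn * yn)
            sx = int(round(xn * scale * cx + cx))      # source column
            sy = int(round(yn * scale * cy + cy))      # source row
            if 0 <= sx < w and 0 <= sy < h:
                out[y, x] = img[sy, sx]
    return out
```

With k1 = 0 the warp is the identity; positive k1 pulls samples from farther out, straightening the barrel distortion. The texture-sampler analogy in the text is apt: per-output-pixel source lookups like this are exactly what GPU samplers accelerate.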
Cameras and Computational Imaging
Many novel camera configurations are making their way into commercial applications using computational imaging methods to synthesize new images from raw sensor data, for example depth cameras and high dynamic range cameras. As shown in Fig. 1.6, a conventional camera system uses a single sensor, lens, and illuminator to create 2D images. However, a computational imaging camera may provide multiple optics, multiple programmable illumination patterns, and multiple sensors, enabling novel applications such as 3D depth sensing and image relighting: taking advantage of the depth information, mapping the image as a texture onto the depth map, introducing new light sources, and then re-rendering the image in a graphics pipeline. Since computational cameras are beginning to emerge in consumer devices and will become the front end of computer vision pipelines, we survey some of the methods used.
Figure 1.6 Comparison of computational imaging systems with conventional cameras. (Top) Simple camera model with single flash, single lens, and 2D sensor, followed by image enhancements such as color enhancements, filtering, and contrast. (Bottom) Computational imaging using programmable flash (pattern projectors, multi-flash), multi-lens optics arrays (plenoptic lens arrays, sphere/ball lenses), and 2D sensor arrays, followed by computational imaging applications: high dynamic range (HDR), high frame rates, 3D depth maps, focal plane refocusing, focal sweep, rolling shutter, panorama stitching, and image relighting. NOT SHOWN: super resolution [886], discussed earlier
Overview of Computational Imaging
Computational imaging [396, 429] provides options for synthesizing new images from the raw image data. A computational camera may control a programmable flash pattern projector, a lens array, and multiple image sensors, as well as synthesize new images from the raw data, as illustrated in Fig. 1.6. To dig deeper into computational imaging and explore the current research, see the CAVE Computer Vision Laboratory at Columbia University and the Rochester Institute of Technology Imaging Research. Here are some of the methods and applications in use.