Automated Cricket Highlight Generation using Advanced Video Processing Techniques by IRJET Journal

Automated Cricket Highlight Generation using Advanced Video Processing Techniques

Dr. Shraddha Khonde1, Shraddha Lalbegee2, Rida Shaikh3, Afifa Shaikh4 and Saniya Sayyed5

Professor, Dept. of computer Engineering, Modern Education Society’s Wadia College of Engineering Pune, Maharashtra, India

Student, Dept. of computer Engineering, Modern Education Society’s Wadia College of Engineering Pune, Maharashtra, India

Student, Dept. of computer Engineering, Modern Education Society’s Wadia College of Engineering Pune, Maharashtra, India ***

Abstract - This project focuses on automating the generationofcrickethighlightsusinga unique integrationof advanced computer vision and machine learning techniques, including Support Vector Machines (SVM), Artificial Neural Networks (ANN), Optical Character Recognition (OCR), and pose recognition. Cricket, a sport with intricate rules and lengthy matches, often demands labor-intensive efforts to create concise and engaging highlights. Our approach emphasizes innovation by combining event-driven and excitement-based features to identify and trim key moments such as boundaries, wickets, and significant milestones.

By utilizing crowd reactions, player celebrations, scoreboardchanges,andaudiointensityvariations,thesystem offers an accurate and dynamic way to detect critical match events. To ensure reliability, the generated highlights are cross-validated with official highlight reels and subjected to manual error-checking, emphasizing precision and completeness. This automation not only saves time and resources but also provides cricket fans with high-quality, accessible content tailored to their busy schedules, thereby revolutionizing the traditional approach to sports video summarization.

Key Words: Machine learning, Computer vision, Support Vector Machines, Artificial Neural Networks, Optical Character Recognition, Pose recognition, Event detection, Excitement-based analysis, Video summarization, Sports highlights

1.INTRODUCTION

1.1 Cricket Highlight Generation

Creatingcrickethighlightsinvolvestheintricatetask of transforming hours-long match footage into concise

videosthatcapturetheexcitementandessenceofthegame. With matches often extending beyond three hours, the processrequiresidentifyingkeyeventssuchasboundaries, wickets,andgame-definingmomentstocraftacompelling 7–8minutehighlightreel.Traditionalmethodsrelyheavily onmanualeffortsfromskilledanalysts,makingtheprocess time-consumingandpronetoinconsistencies.

This project introduces a novel framework that integratescutting-edgetechnologiestoautomatehighlight generation. OCR is employed to analyze and interpret scoreboarddata,SVMdetectschangesinaudiointensityand crowd reactions,whileANN leveragesposerecognition to identify player and umpire actions. By combining these advanced methodologies, the system ensures a comprehensiveapproachtoeventdetection,addressingthe challengesofaccuracy,scalability,andefficiencyinhighlight creation.

Unlike conventional methods, this automated solution dynamically analyzes multiple data streams visual, auditory, and statistical simultaneously, ensuring that the most exciting moments are captured without manualintervention.Theresultisatransformativewayto engage cricket fans, offering them immersive and highquality content in a fraction of the time required by traditionalprocesses.

1.2 Existing System

The conventional cricket highlight production systemdependssignificantlyonmanualvideoediting,which is time-consuming, labor-intensive, and susceptible to human error. Human editors normally watch hours of footage, flag important events, and manually cut highlight clips.Theseoperationsarehumanintuition-basedandare oftensubjecttosubjectivejudgment.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 06 | Jun 2025 www.irjet.net p-ISSN: 2395-0072

Modern systems, by virtue of literature, seek to bring in automation to this pipeline. For example: OCR systemslikethoseofSuksai&Ratanaworabhan(2016)and Anjumetal.(2013)havedemonstratedthattextextraction fromthescoreboardthroughtext-basedcuesiscapableof deliveringprecisetimingforeventssuchasboundariesand wickets.

Action recognition systems (Saha et al., 2022) combineOCRwithposeestimationandCNN/LSTMmodels toidentifydistinctmatchactionssuchasbowlingorwicket celebration.

Audio-driven excitement detection (Banjar et al., 2024) employs SVM with ternary acoustic codes to detect phases of crowd excitement, which enables flagging prospectivehighlightswithlowvisualinput.

Such systems are still subject to a number of limitations:

 Theytendtorelyonasolitarymodality(e.g.,audio or text), making them susceptible to that cue's absenceornoise.

 They can lack broadcast style, language, or video quality.

 Pose recognition in the majority of conventional systems is not exploited, losing an important cue fromumpiregestures.

 TraditionalandDevelopingSystemComponents:

 User Interface: Enables analysts or users to label framesorsegments.

 Event Detection Module: Either employs manual indicationorminimalautomation(e.g.,simpleOCR).

 VideoEditingTool:Synthesizeslabeledeventsinto ahighlightreel.

 StorageandPlaybackSystem:Storeshighlightsfor laterviewingorbroadcast.

Data Flow:

RawVideo→EventMarker(Auto/Manual)→Editor →

HighlightCompiler→OutputMedia

Stakeholder Interactions:

Analysts/Editors:Checkandlabelfootage. Developers: Createautomationpipelinesor dashboards.

Fans/Viewers: Watchhighlightpackageson appsorsocialmedia.

Adversary Model in Existing Systems: Legacysystemsarenotresilientto:

 Missedhighlightsduetohumanerrororfatigue.

 Replay confusion (multiple marks on the same event).

 Frame skipping or inaccuracies due to uncertain visualdata.

Current systems thwart these problems through aggregatingseveralcues(text,sound,motion)andreducing subjectivityinselectinghighlights.

The introduced system of automated highlight creationtakesadvantageofandimprovesuponthecurrent systemsbybringingtogetherallthreemajormodalities:text recognition,acousticpatternanalysis,andposeestimation, resulting in robust, efficient, and scalable real-time summarizationofsports.

2. System Implementation

The system is deployed as an extensible and modularFlaskwebapplicationthatauto-generatescricket highlights through a multi-modal process consisting of Optical Character Recognition (OCR), audio-visual feature extraction,andmachinelearningclassifiers.Thesystemcan eitheracceptYouTubelinksoruploadedvideos,identifykey events, create highlight clips, and show statistics visualizations.

KeyComponents:

1.OCR(OpticalCharacterRecognition)Module:

Uses Tesseract to extract text-based information from the scoreboardarea.

Preprocessingoperationslikegrayscaleconversion, Gaussian blurring, and adaptive thresholding enhance the accuracyoftextdetection.

Text is scanned for event-related keywords such as "SIX", "FOUR", or "WICKET" to recognize potential highlight moments.

Region of Interest (ROI)for OCR isdefined asthe lowerregionoftheframewherescoreoverlaystendtobe, configurablethroughratios.

2.FeatureExtractionModule:

 Picksouttext-basedbinaryfeatures(e.g.,keyword presence).

 ComputescolorhistogramdescriptorsinHSVcolor space(24values).

 Performs edge density computation using Canny edgedetection.

 Measures the intensity of motion between successiveframestocapturechanges.

The final feature vector comprises 6 text features + 24 histogramvalues+2imagemeasures=32features.

3.MachineLearningClassifiers:

 Trained on synthetic and real data with featurelabelcorrespondence.

 SVM (Support Vector Machine): RBF kernel with normalizedinputforexcitementclassification.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 06 | Jun 2025 www.irjet.net p-ISSN: 2395-0072

 ANN(ArtificialNeuralNetwork):MLPClassifierwith two hidden layers (100, 50), early stopping, and validationfraction.

 HybridModel:Asoft-votingclassifierthatcombines predictions from both SVM and ANN for better accuracy.

 Modelsaretrainedwithtrain_test_splitandstored withjoblib.

4.VideoAnalyzer&EventDetection:

 Eachhalf-secondframeisanalyzedtodetectevents.

 OCRtextisutilizedtoclassifyframecategory(sixes, wickets,etc.).

 If the same event doesn't happen within a fivesecondwindow,theeventiscaptured.

 Detectedframesarelabeledandtimestampedand saved.

 Final highlights are created using moviepy by cropping 5-second segments around the event times.

5.Visualization&Charts:

 Eventfrequenciesarebeingsummarizedinbarand piecharts.

 Charts are being created using matplotlib and seabornandbeingsavedasstaticfilestodisplayon frontend.

6.FlaskWebInterface:

 /processendpointtakesYouTubeURL,eventtype, andmodelchoice.

 ReturnsJSONwithanalysisimages,highlights,video link,accuracyscores,andprocessingtime.

 /routereturnstheindex.htmlinterface.

7.SystemConfigurationandUtilities:

 Utilizes Flask configuration dictionary for folder setting,ROIratios,andmodelpaths.

 Createsdirectoriesformodels,datasets,frames,and charts.

 Allows fallback models in the event of training failure.

 Random synthetic data generation is applied for developmentandtestingpurposes.

2.1. System Evaluation

A.EvaluationMethod(8%)

 Synthetic and real cricket match videos were processed.

 Events were manually annotated and compared withsystem-producedevents.

 Eachframedetectedwastaggedwithaconfidence scorederivedfromclassifierpredictions.

 Accuracy metrics (precision, recall, F1-score and overallaccuracy)werecalculated.

 Evaluationinvolvedqualitative(visualcorrectness) aswellasquantitative(metric-based)analysis.

B.ResultsoftheEvaluation(6%)

Table -1:MetricsandValues

ThemodelevaluationresultsaretabulatedinTableI,which indicatesthatthesystemattains85%precision,80%recall, 82%F1-score,and83%overallaccuracy.

Chart -1: ModelPerformanceComparison

This is Chart -1, which compares SVM, ANN, and Hybrid models based on important metrics Precision, Recall, F1 Score, and Accuracy. The Hybrid model shows higher recall (~0.95), which means more accurate identification of actual events. SVM demonstrates stable performance, particularly robust in accuracy. ANN marginallyperformsbetter thanSVMonF1andaccuracy, but trails on recall. In total, the Hybrid method has the optimumsensitivity-correctnessbalance.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 06 | Jun 2025 www.irjet.net p-ISSN: 2395-0072

Chart -2: EventDistribution

This pie chart (Figure 2) shows the relative percentage of each type of detected event. 'Sixes' predominate at 42.9%, suggesting either boundaryabundantgameorincreasedOCRrecognisability.Milestones compose 28.6%, which captures consistent score change monitoring. 'Run_rate' and 'won' both constitute 14.3%, underlining their lower occurrence in match footage. The chart provides an instant visual impression of highlight compositionandemphasis.

Chart -3: EventFrequency

Chart-3 illustrates the number of events detected percategoryfromthevideodataset.'Sixes'occurredmost often, which were followed by 'milestones' such as halfcenturies. Events like 'run_rate' and 'won' occur less frequently, maybe due to infrequency or lower OCR detection. This bar graph substantiates that great-scoring plays rule cricket highlights. It also demonstrates the efficiencyofeventclassificationlogicinthedeployedsystem.

Chart -4: Precision

Chart-4isaradarplotcomparisonofSVMandANN modelsintermsofperformancemetrics.Bothmodelsarein good balance, creating closely matched contours on the graph.SVMisnarrowlyaheadinrecallandprecision,though ANNiscomparableintermsofaccuracyandF1.Theradar formateffectivelydisplaysmultidimensionalperformancein asingleview.Thisjustifiesensemblingbothmodelstogether intheendhybridclassifier.

Hybrid model performed better than standalone SVM and ANNclassifiers.

OCRtextinaccuracywasmorecommoninreplaysor low-qualityvideos.

Pose recognition was integrated into the ANN featuredesign,aidingclassificationingesture-richsituations (e.g.,umpiregestures).

2.2 Discussion of the Evaluation Results

 Results confirm that a hybrid model improves detectionrobustness.

 Thereweresomemisclassificationswhenaudioand visual features disagreed (e.g., crowd cheering duringareplay).

 Highedgedensityoccasionallyspuriouslyreports actionduetocrowdbannersoroverlays.

 OCRfailureswereprimarilycausedbylowcontrast ormotionblurinvideo.

Theoverallevaluationaffirmsthepotentialofthissystemfor automated,accuratehighlightgenerationforcricket.Further developments may involve dynamic OCR ROI adjustment, ANNarchitecturerefinementforrealgestureclassification, andreplayfilteringintegrationthroughopticalfloworframe hashing.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 06 | Jun 2025 www.irjet.net p-ISSN: 2395-0072

3. Literature Survey

3.1 OCR:

Accordingto[1],OpticalCharacterRecognitionhas been used as one of the most efficient ways of extracting meaningful text information from video frames for generatingsportshighlights.Forexample,PichetSuksaiand ParujRatanaworabhan recommended the use of OCR that detectstimeindicatorsandthemarkersofkeyeventswith termslike"goal"or"penalty,"whichcanbededucedfrom online sources through analyzing the narratives. Unlike othertraditionalimageprocessingmethods,whichusually producefalsepositives,theirOCR-basedapproachfocuses solely on text recognition, which is more accurate and reliable. By aligning the extracted keywords with the corresponding time markers, this methodology achieves precise highlight generation without relying on complex heuristics,makingOCRatractableandefficientsolutionfor automatingsportshighlightproduction.

The paper "Video Summarization and Sports HighlightsGeneration" byMuhammadEhsanAnjum,Syed FarooqAli,MalikTahirHassan,andMuhammadAdnan[2], explores an innovative approach for generating cricket highlights using Optical Character Recognition (OCR) techniques which is a new algorithm to generate cricket highlightsbyextractingtextualinformationfromthevideo frames.Here,thealgorithmusesOCRtoextractandanalyze thescoreinformationfrom thescore barsavailableinthe frames.Themethodologyistodetectthescorebarinevery frame,extractthescore,andthencalculatethedifferences between consecutive frames' scores. The events like boundaries(4or6)andwicketfallsareidentifiedthrough thechangesinthescores,andtheseframesareputtogether tomakethehighlights.ThealgorithmwastestedontwoODI cricket videos with different resolutions (640x480 and 698x480)andframerates(25fpsand15fps).Aframegapof 25 was used to evaluate score differences iteratively, ensuring that significant events were captured efficiently. Keyframescoveringtheeventduration(approximately200 frames)wereincludedtomaintainthecontextualintegrityof thehighlightsFurtherusedtwoOCRtechniqueswereused: brute-forcematchingandtemplatematching.Whilebruteforce matching was easy but expensive computationally, templatematchingismoreefficient.Itrecognizedconnected componentsinthescorebarandmatchedthemtoalibrary of predeveloped score templates. It showed that the algorithmhascapturedmostfoursandsixes,andsometimes failedincapturingitduetolowvideoqualityandalimited OCRlibrary.Thesizeofthehighlightsdependsontheevent frequencyandframeinclusionperevent,calculatedas:

TotalFramesinHighlights=FramesperEvent×Numberof Events.For example,if200framesareincludedper event and 60 events occur, the highlight size equals 12,000 frames.[2]

Thepaper[3]proposesthedesignofanautomated system for cricket highlight generation by key event detection, combining score extraction with action recognition.Thedevelopedsystemextractsthescorecaption fromvideoframesusingtechniqueslikeEASTandOCRfor the purpose of action recognition using CNN and LSTMbasedmodelstoclassifybowlingasakeyevent.Two-phase processingcanbeanefficientwayforprocessingvideosof cricketmatchesandprovidingvaluablehighlights.

Themethodisefficientandyieldspromisingresults bydemonstratingaproductivesolutioningeneratingcricket highlights.(Sahaetal.,2022)[3].

Table -2: ScoreExtractionPerformance

Table2aboveshowstheperformanceofthesystem intermsofextractingscoresforimportantcricketingevents, includingfours,sixes,andwickets.Itcanbeclearlyseenthat thesystemwashighlyaccurateindetectingalltheseevents, with the accuracy percentage being 92.30% for fours, 93.33%forsixes,and93.02%forwickets.

The research is divided into two main parts: keyframedetectionandbowlingactionrecognition.In the first phase, the EAST model is used for text detection followed by Tesseract for text recognition, to identify keyframesandeventssuchasfours,sixes,andwickets.The performanceofthescoreextractionpartisevaluatedusing theformula:

Performance=DetectedEvents/TotalEvents×100

where"DetectedEvents"isthecorrectlydetectedeventsand "TotalEvents"isthetotaleventsinavideoclip.Theresults forthissectionareremarkable:Fours92.30%,Sixes93.33%, and Wickets 93.02% (Saha et al., 2022) [3]. The second phaseinwhich,videoclipof20-25secondsinlengththrough the model of LRCN model, which reduces all such information,whereonlythenecessarypartshavebeentaken such as the bowling action. Its model was tested on some different datasets and shows such great performances: Cricket dataset 93.59%, Bowling dataset 98.10%, and combinedBowlingandNon-Bowlingdataset100.00%.

The final accuracy of the proposed system is evaluated for a clip with a declaration of correct when it detects a significant event namely fours, sixes or wickets, andthencorrectlyrecognizesthebowlingactionafterithas

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 06 | Jun 2025 www.irjet.net p-ISSN: 2395-0072

detectedthekeyframes.Total34TruePositives,100True Negatives, 2 False Positives and 11 False Negatives are reported with an average accuracy of 92.41% and a precision94.44%.Thisdemonstratestheproposedmethod effectivenessindetectingkeyeventsforautomaticcricket highlightgeneration(Sahaetal.,2022)[3].

3.2 Pose Recognition

Rahul Mili, Nayan Ranjan Das, Arjun Tandon, Saquelain Mokhtar, Imon Mukherjee, and Goutam Paul introducethenovelapproachofrecognizingumpiregestures incricketthroughposeestimationinthepapertitled"Pose Recognition in Cricket using Keypoints". The research is centeredondetectingkeyumpiresignalslikeNoBall,Six, Wide,Out,andNoActionbyextractingkeypointsfromvideo frames with a pose estimation model. The system uses MobileNet to detect the pose, which is a relatively computation-efficientdeeplearningmodel followedby an ANNclassifierfortheidentificationofthegesture.Itisthis hybrid technique, which makes it much accurate, and the accuracy is as high as 87%, proving that this is a robust systemandrobustinrecognizingallgesturesrelatedtothe cricketumpire.

This method shows significant potential in automatingtheprocessofvideosummarizationincricketby accuratelydetectingspecificeventsanddecisionsmadeby the umpire. By automating the recognition of umpire gestures, the system can contribute to the efficient generation of highlights, reducing manual effort and ensuring that important moments are captured correctly. Thisisanareaofemergingworkontheapplicationofpose estimationforgesturerecognition,andthispaperpointsout howthesamecanbeappliedforcricket,whereinvisualcues such as gestures of an umpire would decide whether the eventturnsinfavouroragainstit.

Contrary to most previous works on cricket highlight generation based on rule-based or manually detectedevents,thetechniqueproposedherewouldpresent a sizeable improvement as it resorts to machine learning techniques. The integration of pose recognition and ANN classifiersallowsthesystemtoidentifysubtledifferencesin gestures, which would be challenging to capture using traditionalmethods.Additionally,thedeeplearning-based model has the advantage of being more adaptable and scalable to other sports and event detection applications, makingitaversatiletoolforreal-timeeventrecognition.

Chart-5 : FlowchartofUmpirePoseClassificationUsing KeypointsDetectionandNeuralNetworks

TheChart5illustrateshowtheposesofumpiresare classifiedbytheiractions.AnimageoftheSnowDatasetis passedthroughakeypointsdetectionnetworklikeTFPose Estimator.Thekeypointsextractedfromanimagearethen mappedtoastatisticaldatasetthatrepresentsposture.This statisticaldatafeedintoanArtificialNeuralNetwork(ANN) for classification. The network identifies the action of the umpire,whichcouldbeaNoAction,Wide,Out,Sixes,orNo Ball bythepose.

Thecontributionofthisstudyissignificantnotonly forcricketbutforbroaderapplicationsinothersportsand domainsthatrequiregesturerecognition.Theauthorshave effectivelyshownthepotentialofcombiningposeestimation andmachinelearningtowardimprovingautomatedsystems in the analysis of sports. In this regard, their study contributes to previous work conducted in video summarization,poserecognition,andeventdetection,which paves the way for further improvements on real-time automatedhighlightsgeneration.Asidefromtechnicalmerit, the paper will address the need to make the production process of video highlights devoid of human intervention, hencebecomingvery importantin a contemporary sports broadcast. Applications in pose recognition may automatically smoothen the whole process of postproductionwhilegeneratinghighlightsfasterand,ofcourse, morepreciselyasdemandedbytheworldforinstantaccess to those dramatic sporting moments. By incorporating machinelearning,thisworkcouldeventuallyextendtoother domains, such as the analysis of crowd behavior, action recognitioninteamsports,andevenassistivetechnologies for individuals with disabilities. Thus, the contribution of thisworknotonlyaddstothespecificdomainofcricketbut

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 06 | Jun 2025 www.irjet.net p-ISSN: 2395-0072

alsoformsaprecedentforfutureworkinautomatedevent detectionforvariousdomains.

Inconclusion,thepaperbyMilietal.representsan important step forward in the realm of automated sports video summarization, particularly for cricket. Using pose recognitioncoupledwithmachinelearningtechniques,the authorshavedemonstratedarobustsystemthatenhances theaccuracyandefficiency ofgeneratingvideohighlights. This will open the door to further studies on how such systemsmaybespedupandscaled,impactingbothsports and entertainment fields. This review, being expanded, captures some key aspects of what the paper contributes, settingamuchbroadercontexttounderstandwhattherole of pose recognition is in the automated cricket highlight generationanditsfutureapplicationsbeyondthesport.[5]

3.3 SVM:

AsperAmeenBanjar,HussainDawood,AliJaved,andBushra Zeb's paper, "Sports Video Summarization Using Acoustic SymmetricTernaryCodesandSVM,"providesanautomated approachforautomaticallycreatingsportsvideohighlights based on cricket and soccer video audio streams. The approachusessymmetricternarycodes(STC)asa feature descriptortodetectexcitementintheaudio,whichisfurther used to identify key frames in the corresponding video. A Support Vector Machine (SVM) classifier is used for excitementdetection,andthevideoframesrelatedtoexcited audiosegmentsaremarkedaskeyevents.Thesekeyframes form video skims, aligned according to the user-defined length of the summary, resulting in a concise video summary.Themethodwastestedagainstdiversedatasets and obtained results with high accuracy at 97.7% and 91.23%forcricketandsoccervideos,respectively.Itdepicts informationaboutthedatasetforwhichperformanceistobe calculated and the experiments made with the proposed method to gauge their performance. The videos under consideration include cricket and soccer videos selected fromamyriadofbroadcasterslikePTVSports,StarSports, TenSports,SkySports,andSuperSports.Allvideoswithin this dataset have a frame rate of 30 fps with spatial resolutionat640×480.Thedatasetiscreatedtodiversify different video lengths, broadcasters, sports genres, tournaments,andcommentators.

Table -3: ConfusionMatrixAnalysisforCricketandSoccer Classification

Table3aboveshowsaconfusionmatrixanalysisfor aclassificationtaskthatdifferentiatesbetweenCricketand Soccer.Thisdemonstratesthemodel'sperformanceasfaras correctlyandincorrectlyclassifiedinstancesareconcerned. ForCricket,themodelcorrectlypredicted20,399aspositive while30,002asnegative,withminimalmisclassificationof 241 and 718, respectively. Conversely, for Soccer, it misclassified 15,249 as Cricket, but correctly predicted 17,197 as negative. The table shows that the model has a strong performance in correctly classifying Cricket but revealssignificantmisclassificationissuesindistinguishing SoccerfromCricket,whichindicatesareasforimprovement.

The SoccerNet dataset was used for performance evaluation.Thisdatasetconsistsof500games,totaling764 hours, with audio segments for key-event detection. The dataset was divided into training, validation, and testing collections, each comprising 300, 100, and 100 games, respectively,withtwoseparatevideosforeachhalf,which amounts to 600 videos for training, and 200 each for validationandtesting.

The proposed framework concentrates on excitement-basedkeyaudioframedetectionforthepurpose ofperformanceevaluation.Thereare3564audioclipsinthe datasetand1782fromthesoccervideosandtherestfrom cricketvideos.Theclipsareannotatedintoexcitedandnonexcitedcategories baseduponthe magnitude of theaudio (e.g.,aloudaudiencecheeringandloudcommentary).80% oftheclipsfromeachsportwereusedfortraining,withan equalsplitbetweenexcitedandnon-excitedaudio,while

20%oftheclipswereusedfortesting.TheSTCfeaturesof theaudioclipswereextractedandusedwithSupportVector Machines (SVM) for key-event detection and video summarization.[4]

4. Comparison of Literature Survey

Chart -6: Technologydistributionchart

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 06 | Jun 2025 www.irjet.net p-ISSN: 2395-0072

Chart 6 captures the ranking of the most preferred technologies;here,DNNisinthemajority,whileOCRstands inthesecondpositionwithareasonableshare.Audio-Visual Extraction and YOLOv4 are estimated with moderate portions, while SVM, Video Summarization, and ORB BRS occupysmallersegments.

Chart -7: AccuracyComparisonofVariousTechniques

Chart 7 represents video summarization at 80%, whereas NN(CNN,ANN)brillianceisat97%accuracy.Theothersare mid-tier - OCR and SVM at 95%96%, and Audio-Video Extractionisinthemidrangeat95%-96%.Thegapshows that Video Summarization requires additional work to be developedfurtherfromthisaspect.

Chart -8: Researchpaperpublicationtrendchart

Chart8,therefore,has2022asthehighlightyearwithall8 publications,whichaccountsforthehighestproductionof papersrecordedinanyotheryear.Evenotheryearsof2024 and 2023 had only one or two or three papers and even earliersuchas2020hadaslowas1or2.Thissurelyshows thatresearchoutputwasstrongestin2022.

5.Cricket Highlight Generation System Architecture

Fig5.SystemArchitecture

1. User Interface(Website)

1. Itallbeginswiththeuserinterfaceitisaweboran application.

2. Thisinterfaceisusedbytheusersasatooltoenter their preferences or to select particular cricket matches.

2. Application Server

1. TheUIiscommunicativewiththeapplicationserver thatmanagesrequestsandorchestratestheoverall process.

2. It takes care of data flow and coordinates interactionsamongvariousmodules.

3. Video Processing Module

Thismoduleprocessesrawcricketvideofootage.Thesubmodulesinclude:

1. Audio Cues-Based Segmentation: It uses audio cues(likecrowdnoiseorcommentary)tosegment thevideointomeaningfulclips.

2. Replay Detection: Identifies replay moments, which can be boundaries, milestones, wickets, or sixes.

3. Scoreboard Extraction: Important Information Scoreboardextraction-thecurrentscore,overs,and batsmendetails.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 06 | Jun 2025 www.irjet.net p-ISSN: 2395-0072

4. Text Processing (OCR) Module

1. Optical CharacterRecognitionisapplied todetect thetextinsideavideo.

2. It follows run changes, wicket changes, and other textualinformation.

5. Audio Classification (SVM) Module

1. TheSVMsareappliedtotheclassificationofaudio.

2. In this module, it tries the audio intensity that catchestheexcitementrelatedtothekeyevents.

6. Pose Recognition (ANN) Module

1. ArtificialNeuralNetworks(ANN)scansforthepose ofplayersandrecognizesspecificcricketingactions such as batting, bowling, and fielding based on playermovements.

7. Database

1. Stores the relevant data like extracted features, timestamping,andmetadataforuselaterinsome othermodule.

8. Highlight Generation Module

1. Combine all the information from the above modulestomakecrickethighlights.

2. Clip all the important events such as boundary, wicket,andmilestonesinauniformmanner.

3. Itbasedsuggestionsonsequences,includingthose primarily and fundamentally dependent on interactionsbetweenplayers,audiointensity,and otherfeaturesindevelopingthehighlightpackage

6. Challenges

i. Key moment detection: Huge video data has massive wicketsand boundaries.Thekeymoment identification is therefore much more challenging and requires robust modelsofmachinelearning.

ii. OCR Limitations: Applying OCR in extracting text from dynamic scoreboards and graphics can be hampered by variabilityinfonts,motionblur,andbackgroundnoise.

iii. Data Quality and Consistency: The difference in video quality, angles, and camera movement may affect the performanceofalgorithmsimplementedtoextracthighlight features.

iv.CopyrightandLicensingIssues:Copyrightandlicensing issuesarequitetoughtonavigatewithvideocontentfrom cricketmatches,especiallywhenusingthemanddistributing them.

7. Results

.9.HomePage

Fig.10.Frames

Fig.11.Highlights Video

Fig.12.Accuracy

Fig.13.Accuracy

Figures9–13arederivedfromtheexperimentalresultsof thesystemproposedhereandrepresentmodelperformance measures,eventfrequency,comparativestudies,andevent distributionaccordingtoprocessedcricketmatchdata.

Fig

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 06 | Jun 2025 www.irjet.net p-ISSN: 2395-0072

8. Conclusion

Thisprojecthasshownthatitwassuccessful,with thebuildingofasystemtogeneratecricketmatchhighlights, integrating features from multiple machine learning and deep learning techniques, including SVM, ANN, OCR, and poserecognition.Itmovedfromaudio-basedapproachesat the initial stages to more advanced image processing and pose detection strategies, significantly enhancing the generation in the highlight streams. While the system achievedhighimprovementsoverthetraditionalapproaches withaudio-basedmethodsbutbroughtaboutmorerelevant and accurate spots than traditional systems, there is still roomforimprovement.FutureimprovementswillFocuson refining the accuracy of the model and incorporation of features toward more precise and more automatized highlightgeneration.

9. References

[1]ANewApproachtoExtractingSportHighlightP.Suksai andP.Ratanaworabhan,"Anewapproachtoextractingsport highlight," 2016 International Computer Science and EngineeringConference(ICSEC),ChiangMai,Thailand,2016, pp.1-6,doi:10.1109/ICSEC.2016.7859876.

[2]VideoSummarizationSportsHighlightsGenerationM.E. Anjum, S. F. Ali, M. T. Hassan and M. Adnan, "Video summarization: Sports highlights generation," INMIC, Lahore, Pakistan, 2013, pp. 142-147, doi: 10.1109/INMIC.2013.6731340.

[3] Automatic Generation Framework Comprising Score Extraction and Action Recognition A. Saha, N. Kothari, P. Shah, R. Parekh and N. Shekokar, "Cricket Highlight Generation:AutomaticGenerationFrameworkComprising Score Extraction and Action Recognition," 2022 IEEE 3rd GlobalConferenceforAdvancementinTechnology(GCAT), Bangalore, India, 2022, pp. 1-7, doi: 10.1109/GCAT55367.2022.9972198.

[4]Sports video summarization using acoustic symmetric ternarycodesandSVMAmeenBanjar,HussainDawood,Ali Javed, Bushra Zeb, Sports video summarization using acoustic symmetric ternary codes and SVM, Applied Acoustics, Volume 216, 2024, 109795, ISSN 0003-682X, https://doi.org/10.1016/j.apacoust.2023.109795.

[5]PoseRecognitioninCricketusingKeypointsR.Mili,N.R. Das,A.Tandon,S.Mokhtar,I.MukherjeeandG.Paul,"Pose RecognitioninCricketusingKeypoints,"2022IEEE9thUttar Pradesh Section International Conference on Electrical, ElectronicsandComputerEngineering(UPCON),Prayagraj, India, 2022, pp. 1-5, doi: 10.1109/UPCON56432.2022.9986481.