
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 11 | Nov 2025 www.irjet.net p-ISSN: 2395-0072
![]()

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 11 | Nov 2025 www.irjet.net p-ISSN: 2395-0072
Prof.
Shobha S Biradar1 , Sunil2
1Professor, Master of Computer Application, VTU, Kalaburagi , Karnataka ,India
2Student , Master of Computer Application ,VTU, Kalaburagi , Karnataka ,India
ABSTRACT- Real-timeobjectdetectionisarapidlyevolving fieldwithinArtificialIntelligence(AI)andMachineLearning (ML) that focuses on the automatic identification and localization of objects in images or video streams with minimaldelay.Unliketraditionalimageclassification,which onlydeterminesthepresenceofanobject,objectdetection provides both classification and spatial information using bounding boxes. Recent advancements in deep learning, particularly Convolution Neural Networks(CNNs) and architectures such as YOLO (You Only Look Once), SSD (Single Shot Multibox Detector), and Faster R-CNN, have significantlyimprovedtheaccuracyandspeedofdetection.
Keywords: Real time object detection, Deep learning, Real-time object detection, Computer vision, object detection.
Thisunifieddetectionapproachframesthetaskasasingle networkthatdirectlyregressesobjectboundingboxesand classprobabilitiesfromfullimages,enablingextremelyfast end-to-end inference suitable for real-time video. Ittrades somelocalizationfinesseforspeedbutsetthestandardfor practical,deployabledetectorsonlivestreams.Becausethe modeliscompactandtrainsend-to-end,itbecameapopular baseline for edge and camera-side applications where throughput is critical. Its simplicity also made export and engineering tooling easier, accelerating adoption. Use this family as the first production candidate when latency and simplicityaretoppriorities.
Thissingle-shotdetectorpredictsobjectboxesandclasses frommulti-scalefeaturemaps,handlingdifferentobjectsizes in one pass and offering an attractive speed/accuracy tradeoff. It demonstrated that proposal-free detectors can rivaltwo-stagesystemswhileremainingfaster usefulwhen youneedmoreaccuracythanthesmallestreal-timemodels but still require high FPS. The multi-scale default-box mechanism is widely reused in mobile and embedded detectors.Formid-tierhardware,SSDisapragmaticoption balancingprecisionandthroughput.
The primary challenge lies in designing a system that can accurately detect and classify multiple objects in a video streamwhilemaintainingreal-timeperformance.Achieving thisrequiresbalancingspeed,accuracy,andcomputational
efficiency Furthermore, real-world environments often presentadditionaldifficultiessuchasvariationsinlighting, object occlusion, background clutter, and camera noise, whichmakedetectiontasksmorecomplex. Therefore,the problemaddressedinthisprojectis thedevelopmentof a reliable real-time object detection system that leverages advancedAIandMLtechniques,particularlydeeplearning models, to overcome the shortcomings of traditional approachesandprovideaccurate,fast,andefficient object detectionforreal-worldapplications.
Themainobjectiveofthisstudyistodesignandimplementa real-timeobjectdetectionsystemthatcanaccuratelyidentify and classify multiple objects in live video streams using Artificial Intelligence (AI) and Machine Learning (ML) techniques. To achieve this, the study is guided by the followingspecificobjectives:
The methodology of this study outlines the systematic approach followed to design and implement the real-time objectdetectionsystem
1) Model Selection: Choose a suitable pre-trained deep learning model such as YOLOv8 or SSD for real-time performance.
2) System Design: Developapipelinewherevideoframes arecapturedfromawebcamorvideosource.
3) Implementation: Integrate the chosen model into the systemusingPythonandOpenCV.
4) Testing and Evaluation: Testthesystemunderdifferent conditionssuchaslightingvariations,movingobjects,and occlusions.
Mobile / lightweight detectors (various, 2016–2023) MobileNet-SSD, Tiny-YOLO, andEfficient Det-litevariants showhowarchitecturechoices(depthwiseseparableconvs, smallerfeaturemaps)minimizecomputeandmemorywhile preserving usable accuracy.Thesemodelsare tailored for CPU/mobile/NPU inference where full-size detectors are impractical. Use them for on-device, battery-sensitive applications.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 11 | Nov 2025 www.irjet.net p-ISSN: 2395-0072
Bewley et al. (2016) IntroducedSORT,aminimal-latency trackerusing KalmanfiltersandHungarianassignment to linkdetectionsacrossframes,achievinghighthroughputfor multi-objecttracking.SORT’ssimplicitymakesitapopular baselineinreal-timesystemsthatneedtrajectorieswithout heavy compute. It’s commonly used with fast detectors to createscalablecamerapipelines.
Wojke et al. (2017) Extended SORT with learned appearance embeddings (DeepSORT) to reduce identity switches during occlusion and reappearance, combining motion and appearance cues. Deep SORT is widely used where ID continuity matters (counting, re-id across cameras);itbalancesrobustnesswithreal-timeconstraints byoffloadingembeddingcomputationtoalightweightCNN.
Lin et al. (2014) The COCO dataset provided a largescale, richly-annotated benchmark with instance segmentationanddensescenes,catalyzingmoderndetector evaluation and architecture development. COCO’s standardizedmetrics(mAPatmultipleIoUthresholds)and diverse images are the de-facto evaluation baseline for detectionresearch.TrainandevaluateonCOCOfor broad generalizationandstandardizedcomparisons.
The real-time object detection system is designed as a standalone application that integrates computer vision techniqueswithdeeplearningmodelstodetectandclassify objects in live video streams. The system fits into the domain of AI-powered intelligent monitoring solutions, whereitactsasacoremodulethatcanlaterbeextended intolargerapplicationssuchassmartsurveillancesystems, autonomousvehicles,orindustrialautomationplatforms
System Architecture Overview:
Input Subsystem:Captureslivevideofeedfromawebcam, CCTV,orpre-recordedvideofile.
Processing Subsystem: Uses a deeplearning model (e.g., YOLO,SSD,FasterR-CNN)todetectandclassifyobjects.
Output Subsystem: Displays processed frames with boundingboxes,labels,andconfidencescoresinreal-time.
User Interaction Subsystem: Allows users to start, stop, andexitthedetectionprocesswithminimaleffort.
System Context:
Thesysteminteractswithhardwarecomponents(camera, GPU/CPU,displaymonitor)andsoftwarelibraries(OpenCV, PyTorch/TensorFlow,YOLOv8).
Itreliesonpre-traineddeeplearningmodels,whichcanbe replacedorretrainedfordomain-specificapplications.The
output is presented visually, but may also be extended to generatelogsoralertsinadvancedversions.

Collaboration Diagrams:


International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 11 | Nov 2025 www.irjet.net p-ISSN: 2395-0072




Softwaretestingisacrucialphaseinsystemdevelopment thatensuresthereliability,accuracy,andefficiencyofthe application.The Real-Time Object detection System was tested at multiple levels to identify errors, verify functionality, and validate results against user requirements.
Testing Strategies: The system followed a bottom-up testing strategy, starting from small units (functions, modules) and moving towards the complete system. Both manual testing (for usability and output verification) and automated testing (for functional modules) were applied. Thefollowingstrategieswereused:
BlackBoxTesting–Focusedoninputsandoutputs withoutcheckinginternalcode.
White Box Testing – Focused on the flow of logic andcorrectnessofalgorithms.
Integration Testing: Combined modules such as video input + preprocessing + detection
Checkediftheintegrationallowedsmoothflowof databetweencomponents.
Example: Verified that preprocessed frames were correctlypassedtothedetectionmodel.
The Real-Time Object detection System successfully demonstrates the application of Artificial Intelligence and Machine Learning in solving real-world problems. By integratingstate-of-the-artobjectdetectionmodelswithlive video streams, the system is capable of identifying and classifyingmultipleobjectsaccuratelyandefficiently.
Through systematic design, development, and testing, the systemhasachieveditsobjectivesofreal-timeperformance,

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 11 | Nov 2025 www.irjet.net p-ISSN: 2395-0072
reliability, and user-friendliness. The use of deep learning models such as YOLO/SSD ensures high accuracy, while OpenCV enables smooth video frame handling and visualization.
Althoughthesystemsuccessfullyachievesitsobjectivesof detecting and classifying objects in real time, there are severalpotentialareaswhereitcanbefurtherimprovedand expanded.
Improved Accuracy: Integrating advanced deep learning models (e.g., YOLOv8, EfficientDet, Transformer-based models)canenhancedetectionaccuracyandrobustness.for specificdomains(trafficmonitoring,medicalimaging,retail, etc.)canmakethesystemmoreapplication-specific.
Faster Processing: Implementing GPU acceleration or deploying models on edge devices (NVIDIA Jetson, Google Coral,TPUs)canreducelatencyandimproveperformancein real-time environments. Optimizing models through quantizationandpruningtorunefficientlyonlow-resource devices.
[1]YOLO YouOnlyLookOnce:unified,real-timeobject detection.CVFoundation
[2]SSD SingleShotMultiBoxDetector:single-stagemultiscaledetector.ComputerScience
[3]FasterR-CNN RegionProposalNetwork+detection backbone.arXiv
[4] YOLO family progress & practical repos (YOLOv4 / YOLOv5/YOLOv8).ResearchGateACMDigitalLibrary
[5]EfficientDet scalableandefficientdetectionviaBiFPN andcompoundscaling.CVFOpenAccess
[6] SSD / MobileNet-SSD and mobile/lightweight detector variants.
[7]SORT SimpleOnlineandRealtimeTracking(trackingby-detectionbaseline).ACMDigitalLibrary
[8]DeepSORT SORT+deepappearanceembeddingsfor robustIDassociation.arXiv
[9]COCOdataset CommonObjectsinContext(standard detection/segmentationbenchmark).arXiv
[10]MOTChallenge/multi-objecttrackingbenchmarksand metrics.