
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 12 | Dec 2025 www.irjet.net p-ISSN: 2395-0072

Prof. M. M. Thakur¹, Sahil Dhanvij², Siddharth Jawade³, Akanksha Satdeve⁴, Pranav Khaire⁵

¹Assistant Professor, AI & DS Department, K. D. K. College of Engineering, Nagpur
²,³,⁴,⁵K. D. K. College of Engineering, Nagpur
ABSTRACT — Real-time AI-driven virtual hand gesture control has emerged as a powerful interaction technique for next-generation human–computer interfaces. With advancements in computer vision, deep learning, and sensor technologies, gesture-based systems now offer touchless, intuitive, and immersive control for virtual environments. This paper presents an in-depth review of AI-based hand gesture recognition methods used for real-time applications such as virtual reality (VR), augmented reality (AR), gaming, robotics, smart homes, and assistive technologies.

The study highlights state-of-the-art approaches including convolutional neural networks (CNNs), MediaPipe-based landmark detection, transformer models, and hybrid deep learning frameworks that enable accurate hand tracking and gesture classification. Real-time performance challenges such as varying lighting conditions, occlusion, background noise, depth estimation, and computational efficiency are discussed along with existing solutions.

Furthermore, this review explores the role of optimization techniques, lightweight neural networks, and edge-AI deployment for improving speed and performance on low-power devices. The paper concludes by identifying research gaps and presenting future directions such as multimodal gesture sensing, 3D skeleton modelling, adaptive learning, and integration with wearable devices for enhanced virtual interaction.

Overall, this work aims to provide a comprehensive understanding of real-time AI-driven virtual hand gesture control systems, their technological components, challenges, and potential applications in modern interactive systems.
Keywords: Real-time gesture recognition, Hand tracking, Deep learning, Computer vision, AI-driven control, Gesture classification, Virtual interaction, Human–computer interaction, 3D hand pose estimation.
1. INTRODUCTION

Hand gesture recognition has become one of the most intuitive and natural ways to interact with digital systems. As modern technology moves toward touchless and seamless communication, gesture-based interfaces let users control devices through simple hand movements, eliminating the need for physical controllers. This transformation is driven mainly by advancements in artificial intelligence (AI), machine learning, and computer vision, which together enable computers to understand human gestures with high precision.
Traditional gesture recognition methods relied heavily on specialized hardware such as sensor gloves, infrared cameras, or depth-sensing devices. Although these systems provided accurate results, they were expensive, complex, and not suitable for everyday applications. With the rise of deep learning and lightweight AI models, however, gesture recognition has become more accessible: modern systems can recognize gestures using the standard RGB cameras found in laptops and smartphones. Technologies such as convolutional neural networks (CNNs), recurrent networks, transformer-based models, and real-time landmark detection frameworks like MediaPipe have made gesture tracking faster, more accurate, and easier to deploy.
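To make the landmark idea concrete: frameworks such as MediaPipe Hands emit 21 (x, y) keypoints per hand, and even simple geometric rules over those points can yield gesture labels. The sketch below is a hypothetical illustration (not from any of the reviewed systems); the landmark indexing follows the MediaPipe convention (0 = wrist, 4/8/12/16/20 = fingertips), and the thresholds are illustrative.

```python
# Rule-based static gesture classification from 21 hand landmarks.
# Indices follow the MediaPipe Hands convention; tip - 2 is each
# finger's PIP joint. Image y grows downward.

FINGER_TIPS = [8, 12, 16, 20]          # index, middle, ring, pinky

def fingers_extended(landmarks):
    """Count non-thumb fingers whose tip lies above its PIP joint."""
    return sum(1 for tip in FINGER_TIPS
               if landmarks[tip][1] < landmarks[tip - 2][1])

def classify_static_gesture(landmarks):
    """Map the extended-finger count to a coarse gesture label."""
    count = fingers_extended(landmarks)
    if count == 0:
        return "fist"
    if count == 4:
        return "open_palm"
    if count == 1 and landmarks[8][1] < landmarks[6][1]:
        return "point"                 # only the index finger is raised
    return "unknown"

# Synthetic landmarks for an open palm: every fingertip above its PIP.
open_palm = [(0.5, 0.9)] * 21
for tip in FINGER_TIPS:
    open_palm[tip] = (0.5, 0.1)        # fingertip high in the frame

print(classify_static_gesture(open_palm))   # expected: open_palm
```

Real systems replace such hand-crafted rules with learned classifiers, but the input representation (a small set of 2D/3D keypoints rather than raw pixels) is what makes them light enough for real-time use.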
Real-time virtual hand gesture control is especially important for applications in virtual reality (VR), augmented reality (AR), robotics, smart homes, automotive interfaces, assistive devices, and gaming. In VR/AR systems, gesture recognition improves immersion by allowing users to interact naturally with virtual objects. In robotics, gesture-based commands allow safe, remote human–robot collaboration. Similarly, touchless interfaces have become increasingly important in healthcare and public environments where physical contact must be minimized.

Despite rapid progress, several challenges still limit the full potential of gesture control. Real-time gesture recognition requires low latency and robust performance under different lighting conditions, complex backgrounds, occlusion of fingers, and variations in hand size, shape, and skin tone. Fast and dynamic gestures also introduce difficulties, because the system must track rapid motion while maintaining accuracy.

To address these issues, researchers have introduced innovations such as 3D hand pose estimation, hybrid deep-learning models, multimodal sensing (RGB + depth), and edge-AI optimization. These advancements aim to make gesture recognition systems more reliable, efficient, and adaptive to real-world environments. As AI continues to evolve, gesture-based interaction is expected to become a core component of future human–computer interaction systems.

This review paper focuses on the current developments, methodologies, challenges, and real-time performance considerations involved in AI-driven virtual hand gesture control. It provides a detailed understanding of the techniques used for gesture detection, their applications, limitations, and future research directions that can enhance the usability of gesture-based systems across various domains.
3. LITERATURE REVIEW

| Author / Year | Technique Used | Dataset / Input | Key Contribution | Limitations |
|---|---|---|---|---|
| Zhang et al. (2018) | CNN-based static gesture recognition | RGB images | Improved accuracy in static hand gesture classification using deep CNN features | Works only for static gestures |
| Google Research (2019) | MediaPipe Hand Landmark Model | Real-time camera feed | Lightweight 21-point hand landmark tracking suitable for real-time interaction | Performance decreases in low light and under occlusion |
| Molchanov et al. (2020) | 3D CNN for dynamic gestures | Video sequences | Captures temporal motion patterns for dynamic gesture recognition | High computational requirement |
| Li & Lee (2021) | Transformer-based recognition | RGB + depth data | High accuracy through attention-based gesture variation modelling | Higher latency for real-time deployment |
| Kim et al. (2021) | CNN-LSTM hybrid model | Continuous gesture sequences | Better recognition of dynamic gestures using spatial + temporal features | Complex training and tuning |
| Gupta et al. (2022) | YOLO-based hand detection | Live video feed | Fast detection and gesture segmentation for real-time systems | Accuracy drops with cluttered/complex backgrounds |
| Huang et al. (2022) | 3D hand pose estimation | 3D skeleton data | Accurate fingertip and joint tracking for real-time applications | Requires a strong GPU for smooth execution |
| Singh & Sharma (2023) | Edge-AI optimized model | Mobile camera stream | Reduced latency and improved performance on mobile/edge devices | Struggles with fast dynamic gestures |
| Wang et al. (2023) | RGB-D fusion model | RGB + depth sensor | Reduces ambiguity through dual-sensor fusion | Not suitable for single-RGB-camera systems |

4. METHODOLOGY

The methodology of this review paper focuses on systematically analysing existing research on real-time AI-driven virtual hand gesture control. A structured approach was followed to identify relevant studies, compare their techniques, and extract key findings. The methodology consists of four major steps: literature selection, analysis of AI techniques, comparative evaluation, and identification of research gaps.
4.1 Literature Selection
Relevant research papers published between 2018 and 2024 were collected from IEEE Xplore, Springer, ScienceDirect, ACM Digital Library, and Google Scholar. The search keywords included "hand gesture recognition," "real-time gesture control," "AI gesture tracking," "deep learning for gestures," and "virtual hand interaction."

A total of 40+ papers were examined, out of which the 9 most relevant studies were selected based on the following criteria:
- Focus on real-time or near-real-time gesture recognition
- Use of AI, computer vision, or deep learning
- Contribution toward virtual interaction or HCI applications
- Clear documentation of methodology and results
4.2 Analysis of AI Techniques
Each selected study was examined to understand:

- The type of gesture recognition model used (CNN, 3D CNN, CNN-LSTM, Transformer, MediaPipe, YOLO, etc.)
- Input modality (RGB images, video sequences, depth camera, 3D landmarks)
- Training process, dataset used, and performance metrics
- Real-time capabilities such as latency, frame rate, and accuracy

Special attention was given to approaches using:

- Real-time landmark detection
- Hybrid models (CNN + LSTM)
- 3D pose estimation
- Edge-AI optimization techniques
4.3 Comparative Evaluation

The selected studies were compared based on:

- Computational efficiency
- Accuracy on static vs. dynamic gestures
- Real-time performance (FPS, latency)
- Hardware requirements (GPU, CPU, mobile devices)
- Limitations such as occlusion, lighting variations, and background clutter

A comparative table (Section 3) summarizes the strengths and weaknesses of each model. This evaluation helps identify which techniques are most suitable for real-world, real-time gesture control.
4.4 Identification of Research Gaps

Based on the findings from the literature review and comparative analysis, the methodology includes identifying:

- Unsolved challenges
- Limitations of existing systems
- Areas needing further research, such as multimodal sensing, lightweight AI models, robust 3D tracking, and real-time optimization

These gaps form the basis for future work and help define the direction for upcoming research in gesture-based human–computer interaction.

PROPOSED WORK

The proposed work focuses on developing an improved real-time AI-driven virtual hand gesture control system that addresses the limitations identified in the existing studies. To overcome these challenges, the proposed framework combines fast detection, lightweight classification, and robust preprocessing techniques to enable accurate and stable hand gesture recognition suitable for real-time applications such as VR/AR, robotics, gaming, and smart environments. The key components of the proposed system are summarized below:
Hybrid Hand Detection Module:
A two-stage detection approach using a YOLO-based hand detector for quick localization, followed by MediaPipe-style 21-point landmark extraction for precise finger and joint tracking. This enhances stability in cluttered or complex backgrounds.
Lightweight Gesture Classification Model:
A deep-learning model combining MobileNetV2 or EfficientNet-Lite with LSTM/GRU layers to efficiently recognize both static and dynamic gestures. The lightweight architecture ensures low latency and high accuracy.
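A CNN + recurrent pipeline of this kind consumes fixed-length windows of per-frame features. The sketch below illustrates only the windowing logic; the frame encoder and LSTM/GRU head are replaced by a toy rule (net horizontal motion of the hand centroid), and all names and the 16-frame window size are hypothetical, not part of the proposed system.

```python
from collections import deque

WINDOW = 16  # frames per gesture window (an assumed value)

class GestureWindow:
    """Fixed-length frame buffer feeding a sequence classifier."""
    def __init__(self, size=WINDOW):
        self.frames = deque(maxlen=size)   # oldest frames drop automatically

    def push(self, centroid_x):
        """Add one frame's hand-centroid x position; classify when full."""
        self.frames.append(centroid_x)
        if len(self.frames) < self.frames.maxlen:
            return None                    # not enough temporal context yet
        return self._classify()

    def _classify(self):
        # Stand-in for the LSTM/GRU head: sign of net horizontal motion.
        dx = self.frames[-1] - self.frames[0]
        if dx > 0.2:
            return "swipe_right"
        if dx < -0.2:
            return "swipe_left"
        return "static"

win = GestureWindow()
result = None
for i in range(WINDOW):                    # hand moving left to right
    result = win.push(0.1 + i * 0.05)
print(result)                              # expected: swipe_right
```

In the real pipeline each `push` would receive a per-frame feature vector from the CNN backbone, and `_classify` would run the recurrent head over the whole buffer; the sliding window is what lets a frame-by-frame system recognize dynamic gestures continuously.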
Enhanced Robustness for Real-World Conditions:
Techniques such as adaptive brightness control, temporal occlusion recovery, Kalman filter–based motion smoothing, and region-of-interest (ROI) refinement will be used to handle fast movements, low lighting, and partial occlusion.
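As a minimal sketch of the motion-smoothing idea, a 1-D Kalman filter with a constant-position model can damp frame-to-frame jitter in a landmark coordinate. The noise variances `q` and `r` below are illustrative values, not tuned for any real camera, and the filter here is far simpler than what a production tracker would use.

```python
class Kalman1D:
    """Minimal 1-D constant-position Kalman filter."""
    def __init__(self, q=1e-4, r=1e-2):
        self.q, self.r = q, r   # process / measurement noise variances
        self.x = None           # current state estimate
        self.p = 1.0            # estimate variance

    def update(self, z):
        if self.x is None:      # initialize on the first measurement
            self.x = z
            return self.x
        self.p += self.q                    # predict: variance grows
        k = self.p / (self.p + self.r)      # Kalman gain
        self.x += k * (z - self.x)          # correct toward measurement
        self.p *= (1 - k)                   # variance shrinks
        return self.x

kf = Kalman1D()
noisy = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50]   # jittery x-coordinate
smooth = [kf.update(z) for z in noisy]

# The smoothed track varies less than the raw measurements.
print(max(smooth) - min(smooth) < max(noisy) - min(noisy))   # expected: True
```

Applied per landmark coordinate, this kind of filter trades a small amount of lag for visibly steadier fingertip positions, which matters when gestures drive cursor-like interactions.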
Real-Time Virtual Interaction Layer:
Recognized gestures will be mapped to virtual commands such as clicking, scrolling, dragging, zooming, and 3D object manipulation. APIs will enable easy integration with VR/AR systems, robots, smart homes, and gaming interfaces.
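One common way to structure such an interaction layer is a registry that maps gesture labels to commands, so the recognizer and the target environment stay decoupled. Everything below is a hypothetical sketch: the gesture names, the `VirtualUI` stub, and the bindings are illustrative, and a real system would call OS or VR-runtime APIs instead of logging strings.

```python
class VirtualUI:
    """Stand-in for the target environment; records issued commands."""
    def __init__(self):
        self.log = []

    def click(self):        self.log.append("click")
    def scroll(self, dy):   self.log.append(f"scroll:{dy}")
    def zoom(self, factor): self.log.append(f"zoom:{factor}")

ui = VirtualUI()

# gesture label -> zero-argument action; arguments are bound here so the
# recognizer only needs to emit labels.
COMMANDS = {
    "pinch":      ui.click,
    "swipe_up":   lambda: ui.scroll(3),
    "swipe_down": lambda: ui.scroll(-3),
    "spread":     lambda: ui.zoom(1.25),
}

def dispatch(label):
    action = COMMANDS.get(label)
    if action is not None:          # unknown labels are ignored safely
        action()

for g in ["pinch", "swipe_up", "wave"]:   # "wave" has no binding
    dispatch(g)
print(ui.log)                             # expected: ['click', 'scroll:3']
```

Because new gestures are added by extending the dictionary rather than editing control flow, the same recognizer can drive VR, robotics, or desktop targets by swapping the `ui` object.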
Edge-AI Optimization:
Model pruning, quantization, and reduced frame-size processing will be used to ensure smooth performance on low-power devices such as mobile phones, Raspberry Pi, and embedded systems.
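A back-of-envelope calculation shows why int8 quantization matters on edge hardware: model size scales with bytes per weight, so float32 to int8 is roughly a 4x reduction (ignoring metadata and any layers left unquantized). The parameter count below is an assumed figure for a MobileNetV2-class model, not a measurement from the proposed system.

```python
def model_size_mb(params, bytes_per_weight):
    """Approximate serialized weight size in mebibytes."""
    return params * bytes_per_weight / (1024 ** 2)

params = 3_500_000                      # ~3.5 M parameters (assumed)
fp32 = model_size_mb(params, 4)         # float32: 4 bytes per weight
int8 = model_size_mb(params, 1)         # int8:    1 byte per weight

print(f"{fp32:.1f} MB -> {int8:.1f} MB ({fp32 / int8:.0f}x smaller)")
```

The smaller model also tends to run faster, since int8 arithmetic and the reduced memory traffic suit mobile NPUs and microcontroller accelerators, which is the latency win the edge-AI step targets.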
This proposed system is expected to deliver high accuracy, low latency, and improved robustness, making it suitable for next-generation touchless interfaces and enhancing overall human–computer interaction.


PROBLEM STATEMENT AND PROPOSED OUTCOME
PROBLEM STATEMENT:
- Traditional input devices like the keyboard, mouse, and touchscreen limit natural interaction and require physical contact, which is not suitable for modern hands-free applications.
- The growing use of AR/VR, robotics, smart automation, and assistive technology demands more intuitive and contactless interaction methods.
- Existing gesture recognition systems struggle with accuracy due to variations in lighting, background clutter, camera quality, and differences in hand shape, size, and skin tone.
- Dynamic gesture recognition is especially challenging because of motion blur, occlusion, and inconsistent gesture speed.
- Many current systems suffer from high latency, making them unsuitable for real-time applications like gaming or robotic control.
- Most available datasets are limited or not diverse enough, reducing model generalization across different real-world conditions.
PROPOSED OUTCOME:
- Development of a real-time AI-driven hand gesture control system capable of detecting static and dynamic gestures with high accuracy.
- Low-latency performance for smooth, responsive, and natural interaction in real-time applications.
- Robustness across different environments, including varying light conditions, backgrounds, camera angles, and user hand variations.
- A touchless and user-friendly interface that eliminates the need for physical contact or wearable sensors.
- A scalable and modular framework that integrates easily with AR/VR systems, robotics, smart home automation, gaming, and assistive devices.
- Improved accessibility through intuitive gesture-based control for people with physical limitations or mobility issues.
FUTURE SCOPE

Integration with AR/VR and the Metaverse:
Future systems can enable fully gesture-controlled immersive environments for gaming, virtual classrooms, simulations, and remote meetings.

Multimodal Interaction Systems:
Gesture control can be combined with voice commands, eye tracking, and facial expressions to create more natural and intelligent human–machine interaction.

Edge-AI Deployment:
Gesture recognition models can be optimized to run on smartphones, microcontrollers, and wearable devices, enabling portable, real-time applications.

Assistive & Healthcare Applications:
Touchless control for disabled users, gesture-based prosthetic control, rehabilitation systems, and hygienic hospital interfaces can be developed.

Smart Home & IoT Automation:
Future homes can use gestures to control lights, appliances, security systems, and entertainment devices without physical touch.

3D Depth & Sensor Fusion Technology:
Combining 2D vision with depth cameras, LiDAR, or IMU sensors will improve 3D gesture tracking, reduce occlusion errors, and increase accuracy.

Personalized Gesture Learning:
Future systems can automatically learn and adapt to each user's unique gesture style, speed, and motion patterns, improving accuracy and user experience over time.
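A toy sketch of such per-user adaptation: a recognizer threshold nudged toward each user's observed gesture speed with an exponential moving average. The 0.1 learning rate, the initial threshold, and the class itself are all illustrative assumptions, far simpler than the adaptive learning envisioned above.

```python
class AdaptiveThreshold:
    """Threshold that drifts toward a user's observed gesture speed."""
    def __init__(self, initial=0.5, lr=0.1):
        self.value = initial
        self.lr = lr

    def observe(self, measured_speed):
        """Blend the stored threshold toward this user's measurement."""
        self.value += self.lr * (measured_speed - self.value)
        return self.value

t = AdaptiveThreshold()
for speed in [0.8, 0.8, 0.8, 0.8]:   # a consistently fast-gesturing user
    t.observe(speed)
print(round(t.value, 3))              # drifts upward from 0.5 toward 0.8
```

The same moving-average idea generalizes to per-user gesture templates: each confirmed recognition slightly updates the stored prototype, so the system tracks an individual's style without retraining the whole model.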
REFERENCES

1. S. Molchanov, X. Yang, S. Gupta, K. Kim, and K. Pulli, "Online Detection and Classification of Dynamic Hand Gestures Using Recurrent Neural Networks," CVPR Workshops, 2016.
2. A. T. Nagi, S. A. Suhail, and M. Hanmandlu, "A Vision-Based Hand Gesture Recognition System Using Deep Learning," Procedia Computer Science, vol. 171, pp. 1361–1369, 2020.
3. M. Asadi-Aghbolaghi et al., "Deep Learning for Action and Gesture Recognition: A Survey," Pattern Recognition, vol. 99, 2020.
4. X. Chen, H. Wang, and P. Li, "Real-Time Hand Gesture Recognition Using CNN-Based Feature Extraction," IEEE Access, vol. 7, pp. 143190–143200, 2019.
5. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016.
6. S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE TPAMI, 2017.
7. M. Zhang and H. Wu, "Human–Computer Interaction Using Hand Gestures Based on Computer Vision," International Journal of Computer Applications, 2019.
8. Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, "Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields," CVPR, 2017.
9. Leap Motion Inc., "Hand Tracking Technology Overview," Technical Report, 2018.
10. H. J. Lee and J. H. Kim, "An HMM-Based Threshold Model Approach for Gesture Recognition," IEEE PAMI, vol. 21, no. 10, 1999.
11. D. Wu and L. Shao, "Deep Dynamic Hand Gesture Recognition: A Review," Neurocomputing, vol. 340, 2019.
12. M. Mittal, G. Goyal, and D. K. Vishwakarma, "Hand Gesture Recognition Using Deep Convolutional Neural Network," Procedia Computer Science, vol. 218, 2023.
13. J. Shotton et al., "Real-Time Human Pose Recognition in Parts from a Single Depth Image," CVPR, 2011.
14. Google MediaPipe, "MediaPipe Hands: Real-Time Hand Tracking," Google AI Blog, 2020.
15. S. Zhang, W. Wu, and Y. Li, "Gesture-Based Human–Computer Interaction Using 3D Vision Sensors," Sensors, vol. 19, no. 18, 2019.
16. Y. Li and S. Lee, "Vision-Based Dynamic Hand Gesture Recognition Using Deep Learning," IEEE Sensors Journal, 2021.
17. N. Olszewski, G. France, and R. Salakhutdinov, "3D Hand Shape and Pose Estimation Using Deep Neural Networks," ACM SIGGRAPH, 2020.

© 2025, IRJET | Impact Factor value: 8.315 | ISO 9001:2008 Certified Journal | Page 203