International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 12 | Dec 2025 www.irjet.net p-ISSN: 2395-0072

A Review on Real Time AI-Driven Virtual Hand Gesture Control

1Assistant Professor, AI & DS Department, K. D. K. College of Engineering, Nagpur

2,3,4,5K. D. K. College of Engineering, Nagpur

ABSTRACT- Real-time AI-driven virtual hand gesture control has emerged as a powerful interaction technique for next-generation human–computer interfaces. With advancements in computer vision, deep learning, and sensor technologies, gesture-based systems now offer touchless, intuitive, and immersive control for virtual environments. This paper presents an in-depth review of AI-based hand gesture recognition methods used for real-time applications such as virtual reality (VR), augmented reality (AR), gaming, robotics, smart homes, and assistive technologies.

The study highlights state-of-the-art approaches including convolutional neural networks (CNNs), MediaPipe-based landmark detection, transformer models, and hybrid deep learning frameworks that enable accurate hand tracking and gesture classification. Real-time performance challenges such as varying lighting conditions, occlusion, background noise, depth estimation, and computational efficiency are discussed along with existing solutions.

Furthermore, this review explores the role of optimization techniques, lightweight neural networks, and edge-AI deployment for improving speed and performance on low-power devices. The paper concludes by identifying research gaps and presenting future directions such as multimodal gesture sensing, 3D skeleton modelling, adaptive learning, and integration with wearable devices for enhanced virtual interaction.

Overall, this work aims to provide a comprehensive understanding of real-time AI-driven virtual hand gesture control systems, their technological components, challenges, and potential applications in modern interactive systems.

Keywords: Real-time gesture recognition, Hand tracking, Deep learning, Computer vision, AI-driven control, Gesture classification, Virtual interaction, Human–computer interaction, 3D hand pose estimation.

INTRODUCTION

Hand gesture recognition has become one of the most intuitive and natural ways to interact with digital systems. As modern technology moves toward touchless and seamless communication, gesture-based interfaces offer users the ability to control devices through simple hand movements, eliminating the need for physical controllers. This transformation is mainly driven by advancements in artificial intelligence (AI), machine learning, and computer vision, which together enable computers to understand human gestures with high precision.

Traditional gesture recognition methods relied heavily on specialized hardware such as sensor gloves, infrared cameras, or depth-sensing devices. Although these systems provided accurate results, they were expensive, complex, and not suitable for everyday applications. However, with the rise of deep learning and lightweight AI models, gesture recognition has become more accessible. Modern systems can recognize gestures using standard RGB cameras found in laptops and smartphones. Technologies such as Convolutional Neural Networks (CNNs), recurrent networks, transformer-based models, and real-time landmark detection frameworks like MediaPipe have made gesture tracking faster, more accurate, and easier to deploy.
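To illustrate how landmark-based recognition of this kind works in practice, the sketch below classifies a few coarse static gestures from a MediaPipe-style list of 21 (x, y) hand landmarks. The landmark indexing (0 = wrist, fingertips at 4, 8, 12, 16, 20) follows MediaPipe's published hand model, but the tip-above-joint heuristic, thresholds, and function names here are illustrative assumptions, not taken from any of the reviewed systems.

```python
# Minimal rule-based classifier over MediaPipe-style hand landmarks.
# Landmarks: 21 (x, y) pairs in image coordinates (y grows downward),
# index 0 = wrist; fingertips at 4, 8, 12, 16, 20; PIP joints at 3, 6, 10, 14, 18.

FINGER_TIPS = [8, 12, 16, 20]   # index, middle, ring, pinky
FINGER_PIPS = [6, 10, 14, 18]

def fingers_extended(landmarks):
    """Return a boolean per non-thumb finger: is it extended?

    A finger counts as extended when its tip is above (smaller y than)
    its PIP joint -- a common heuristic for an upright hand.
    """
    return [landmarks[tip][1] < landmarks[pip][1]
            for tip, pip in zip(FINGER_TIPS, FINGER_PIPS)]

def classify(landmarks):
    """Map finger states to a coarse static gesture label."""
    up = fingers_extended(landmarks)
    if all(up):
        return "open_palm"
    if not any(up):
        return "fist"
    if up == [True, False, False, False]:
        return "point"
    return "unknown"

# Synthetic example: index finger up, other fingers curled.
lm = [(0.5, 0.9)] * 21                 # start with everything at the wrist
lm[6], lm[8] = (0.5, 0.5), (0.5, 0.3)  # index tip above its PIP joint
for pip, tip in [(10, 12), (14, 16), (18, 20)]:
    lm[pip], lm[tip] = (0.5, 0.5), (0.5, 0.7)  # curled: tip below PIP
print(classify(lm))  # -> point
```

Production systems replace these hand-written rules with a trained classifier, but the input representation (a short vector of joint coordinates rather than raw pixels) is exactly what makes landmark-based pipelines cheap enough for real-time use.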

Real-time virtual hand gesture control is especially important for applications in virtual reality (VR), augmented reality (AR), robotics, smart homes, automotive interfaces, assistive devices, and gaming. In VR/AR systems, gesture recognition improves immersion by allowing users to interact naturally with virtual objects. In robotics, gesture-based commands allow safe and remote human–robot collaboration. Similarly, touchless interfaces have become increasingly important in healthcare and public environments where physical contact must be minimized.


Despite rapid progress, several challenges still limit the full potential of gesture control. Real-time gesture recognition requires low latency and robust performance under different lighting conditions, complex backgrounds, occlusion of fingers, and variations in hand size, shape, and skin tone. Fast and dynamic gestures also introduce difficulties because the system must track rapid motion while maintaining accuracy.

To address these issues, researchers have introduced several innovations such as 3D hand pose estimation, hybrid deep-learning models, multimodal sensing (RGB + depth), and edge-AI optimization. These advancements aim to make gesture recognition systems more reliable, efficient, and adaptive to real-world environments. As AI continues to evolve, gesture-based interaction is expected to become a core component of future human–computer interaction systems.

This review paper focuses on the current developments, methodologies, challenges, and real-time performance considerations involved in AI-driven virtual hand gesture control. It provides a detailed understanding of the techniques used for gesture detection, their applications, limitations, and future research directions that can enhance the usability of gesture-based systems across various domains.

LITERATURE REVIEW

Author / Year | Technique Used | Dataset / Input | Key Contribution | Limitations
--- | --- | --- | --- | ---
Zhang et al. (2018) | CNN-based static gesture recognition | RGB images | Improved accuracy in static hand gesture classification using deep CNN features | Works only for static gestures
Google Research (2019) | MediaPipe Hand Landmark Model | Real-time camera feed | Lightweight 21-point hand landmark tracking suitable for real-time interaction | Performance decreases in low light and under occlusion
Molchanov et al. (2020) | 3D CNN for dynamic gestures | Video sequences | Captures temporal motion patterns for dynamic gesture recognition | High computational requirement
Li & Lee (2021) | Transformer-based recognition | RGB + depth data | High accuracy through attention-based gesture variation modelling | Higher latency for real-time deployment
Kim et al. (2021) | CNN-LSTM hybrid model | Continuous gesture sequences | Better recognition of dynamic gestures using spatial + temporal features | Complex training and tuning
Gupta et al. (2022) | YOLO-based hand detection | Live video feed | Fast detection and gesture segmentation for real-time systems | Accuracy drops with cluttered/complex backgrounds
Huang et al. (2022) | 3D Hand Pose Estimation | 3D skeleton data | Accurate fingertip and joint tracking for real-time applications | Requires a strong GPU for smooth execution
Singh & Sharma (2023) | Edge-AI optimized model | Mobile camera stream | Reduced latency and improved performance on mobile/edge devices | Struggles with fast dynamic gestures
Wang et al. (2023) | RGB-D Fusion Model | RGB + depth sensor | Reduces ambiguity through dual-sensor fusion | Not suitable for single-RGB-camera systems


METHODOLOGY

The methodology of this review paper focuses on systematically analysing existing research on real-time AI-driven virtual hand gesture control. A structured approach was followed to identify relevant studies, compare their techniques, and extract key findings. The methodology consists of four major steps: literature selection, analysis of AI techniques, comparative evaluation, and identification of research gaps.

4.1 Literature Selection

Relevant research papers published between 2018 and 2024 were collected from IEEE Xplore, Springer, ScienceDirect, ACM Digital Library, and Google Scholar. The keywords used for searching included "hand gesture recognition," "real-time gesture control," "AI gesture tracking," "deep learning for gestures," and "virtual hand interaction."

A total of 40+ papers were examined, out of which the most relevant 9 studies were selected based on the following criteria:

• Focus on real-time or near real-time gesture recognition
• Use of AI, computer vision, or deep learning
• Contribution toward virtual interaction or HCI applications
• Clear documentation of methodology and results

4.2 Analysis of AI Techniques

Each selected study was examined to understand:

• The type of gesture recognition model used (CNN, 3D CNN, CNN-LSTM, Transformer, MediaPipe, YOLO, etc.)
• Input modality (RGB images, video sequences, depth camera, 3D landmarks)
• Training process, dataset used, and performance metrics
• Real-time capabilities such as latency, frame rate, and accuracy

Special attention was given to approaches using:

• Real-time landmark detection
• Hybrid models (CNN + LSTM)
• 3D pose estimation
• Edge-AI optimization techniques

4.3 Comparative Evaluation

The selected studies were compared based on:

• Computational efficiency
• Accuracy in static vs. dynamic gestures
• Real-time performance (FPS, latency)
• Hardware requirements (GPU, CPU, mobile devices)
• Limitations such as occlusion, lighting variations, and background clutter

A comparative table (Section 3) summarizes the strengths and weaknesses of each model. This evaluation helps identify which techniques are most suitable for real-world, real-time gesture control.
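Since the comparison above leans on FPS and latency figures, it is worth noting how such numbers are usually derived from per-frame timestamps. The helper below is an illustrative sketch (not taken from any reviewed paper); in a live pipeline the timestamps would come from `time.perf_counter()` calls wrapped around each processed frame.

```python
# Derive throughput (FPS) and latency statistics from per-frame timestamps.
# Timestamps here are synthetic; a real pipeline records them per frame.

def fps_and_latency(timestamps):
    """Given monotonically increasing frame timestamps (seconds),
    return (mean_fps, mean_latency_ms, worst_latency_ms)."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return (1.0 / mean_gap,          # frames per second
            mean_gap * 1000.0,       # mean inter-frame latency
            max(gaps) * 1000.0)      # worst-case stall

# 0.02 s between frames -> 50 FPS, 20 ms mean latency.
ts = [i * 0.02 for i in range(101)]
fps, mean_ms, worst_ms = fps_and_latency(ts)
print(round(fps), round(mean_ms))  # -> 50 20
```

Reporting the worst-case gap alongside the mean matters for interaction quality: a system averaging 30 FPS with occasional 200 ms stalls feels worse than a steady 25 FPS.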

4.4 Identification of Research Gaps

Based on the findings from the literature review and comparative analysis, the methodology includes identifying:

• Unsolved challenges
• Limitations of existing systems
• Areas needing further research, such as multimodal sensing, lightweight AI models, robust 3D tracking, and real-time optimization

These gaps form the basis for future work and help define the direction for upcoming research in gesture-based human–computer interaction.


PROPOSED WORK

The proposed work focuses on developing an improved real-time AI-driven virtual hand gesture control system that addresses the limitations identified in the existing studies. To overcome these challenges, the proposed framework aims to combine fast detection, lightweight classification, and robust preprocessing techniques to enable accurate and stable hand gesture recognition suitable for real-time applications such as VR/AR, robotics, gaming, and smart environments. The key components of the proposed system are summarized below:

• Hybrid Hand Detection Module: A two-stage detection approach using a YOLO-based hand detector for quick localization, followed by MediaPipe-style 21-point landmark extraction for precise finger and joint tracking. This enhances stability in cluttered or complex backgrounds.

• Lightweight Gesture Classification Model: A deep-learning model combining MobileNetV2 or EfficientNet-Lite with LSTM/GRU layers to efficiently recognize both static and dynamic gestures. The lightweight architecture ensures low latency and high accuracy.

• Enhanced Robustness for Real-World Conditions: Techniques such as adaptive brightness control, temporal occlusion recovery, Kalman filter–based motion smoothing, and region-of-interest (ROI) refinement will be used to handle fast movements, low lighting, and partial occlusion.

• Real-Time Virtual Interaction Layer: Recognized gestures will be mapped to virtual commands such as clicking, scrolling, dragging, zooming, and 3D object manipulation. APIs will enable easy integration with VR/AR systems, robots, smart homes, and gaming interfaces.

• Edge-AI Optimization: Model pruning, quantization, and reduced frame-size processing will be used to ensure smooth performance on low-power devices such as mobile phones, Raspberry Pi, and embedded systems.
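One of the robustness techniques listed above, Kalman filter–based motion smoothing, can be sketched for a single fingertip coordinate as follows. This is a one-dimensional constant-velocity filter with hand-picked noise constants, shown purely as an illustration of the idea; a deployed tracker would tune these values and filter both axes.

```python
# 1-D constant-velocity Kalman filter for smoothing a noisy fingertip
# coordinate stream. Noise parameters q (process) and r (measurement)
# are illustrative constants, not tuned on real tracking data.

class Kalman1D:
    def __init__(self, q=1e-3, r=1e-2):
        self.x = [0.0, 0.0]                # state: [position, velocity]
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.q, self.r = q, r

    def step(self, z, dt=1.0):
        # Predict with the constant-velocity motion model x' = F x.
        x0 = self.x[0] + dt * self.x[1]
        x1 = self.x[1]
        P = self.P
        p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + self.q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + self.q
        # Update with the position-only measurement z.
        s = p00 + self.r
        k0, k1 = p00 / s, p10 / s          # Kalman gain
        y = z - x0                         # innovation
        self.x = [x0 + k0 * y, x1 + k1 * y]
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]
        return self.x[0]

kf = Kalman1D()
noisy = [0.50, 0.51, 0.49, 0.90, 0.50, 0.51]  # one outlier spike at 0.90
smoothed = [kf.step(z) for z in noisy]
# smoothed[3] lies between the running track (~0.50) and the spike (0.90),
# so a single bad detection no longer jerks the virtual cursor.
```

The same update, applied independently to x and y, is what gives cursor-style gesture interfaces their stable, non-jittery feel.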

This proposed system is expected to deliver high accuracy, low latency, and improved robustness, making it suitable for next-generation touchless interfaces and enhancing overall human–computer interaction.
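The interaction layer described above ultimately reduces to a mapping from gesture labels to virtual commands, plus some debouncing so classifier jitter does not fire spurious actions. The sketch below shows one way this could look; the gesture names, command strings, and `hold` threshold are hypothetical placeholders, and real handlers would call OS or VR-engine input APIs.

```python
# Map recognized gesture labels to virtual commands, with a small
# debouncer so one misclassified frame cannot trigger an action.
# Labels and handler actions are illustrative placeholders.

COMMAND_MAP = {
    "pinch":     lambda: "click",
    "fist_drag": lambda: "drag",
    "two_up":    lambda: "scroll",
}

def dispatch(gesture, command_map=COMMAND_MAP):
    """Run the handler for a gesture; ignore unknown labels so a
    jittery classifier cannot crash the interaction loop."""
    handler = command_map.get(gesture)
    return handler() if handler else None

class GestureDebouncer:
    """Fire a command only after the same label is seen on `hold`
    consecutive frames, suppressing frame-level jitter."""
    def __init__(self, hold=3):
        self.hold, self.last, self.count = hold, None, 0

    def update(self, gesture):
        self.count = self.count + 1 if gesture == self.last else 1
        self.last = gesture
        return dispatch(gesture) if self.count == self.hold else None

deb = GestureDebouncer(hold=3)
stream = ["pinch", "pinch", "two_up", "pinch", "pinch", "pinch"]
fired = [deb.update(g) for g in stream]
print(fired)  # -> [None, None, None, None, None, 'click']
```

Keeping the mapping in a plain dictionary is what makes the layer "modular": swapping the VR back end for a smart-home controller only means registering different handlers.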

Fig. Flow Chart of Virtual AI Mouse


PROBLEM STATEMENT AND PROPOSED OUTCOME

PROBLEM STATEMENT:

• Traditional input devices like the keyboard, mouse, and touchscreen limit natural interaction and require physical contact, which is not suitable for modern hands-free applications.

• The growing use of AR/VR, robotics, smart automation, and assistive technology demands more intuitive and contactless interaction methods.

• Existing gesture recognition systems struggle with accuracy due to variations in lighting, background clutter, camera quality, and differences in hand shape, size, and skin tone.

• Dynamic gesture recognition is especially challenging because of motion blur, occlusion, and inconsistent gesture speed.

• Many current systems suffer from high latency, making them unsuitable for real-time applications like gaming or robotic control.

• Most available datasets are limited or not diverse enough, reducing model generalization across different real-world conditions.

PROPOSED OUTCOME:

• Development of a real-time AI-driven hand gesture control system capable of detecting static and dynamic gestures with high accuracy.

• Achieve low-latency performance for smooth, responsive, and natural interaction in real-time applications.

• Ensure robustness across different environments, including varying light conditions, backgrounds, camera angles, and user hand variations.

• Provide a touchless and user-friendly interface that eliminates the need for physical contact or wearable sensors.

• Create a scalable and modular framework that can integrate easily with AR/VR systems, robotics, smart home automation, gaming, and assistive devices.

• Improve user accessibility by offering intuitive gesture-based control for people with physical limitations or mobility issues.

FUTURE SCOPE

• Integration with AR/VR and the Metaverse: Future systems can enable fully gesture-controlled immersive environments for gaming, virtual classrooms, simulations, and remote meetings.

• Multimodal Interaction Systems: Gesture control can be combined with voice commands, eye-tracking, and facial expressions to create more natural and intelligent human–machine interaction.

• Edge-AI Deployment: Gesture recognition models can be optimized to run on smartphones, microcontrollers, and wearable devices, enabling portable and real-time applications.

• Assistive & Healthcare Applications: Touchless control for disabled users, gesture-based prosthetic control, rehabilitation systems, and hygienic hospital interfaces can be developed.

• Smart Home & IoT Automation: Future homes can use gestures to control lights, appliances, security systems, and entertainment devices without physical touch.

• 3D Depth & Sensor Fusion Technology: Combining 2D vision with depth cameras, LiDAR, or IMU sensors will improve 3D gesture tracking, reduce occlusion errors, and increase accuracy.


• Personalized Gesture Learning: Future systems can automatically learn and adapt to each user's unique gesture style, speed, and motion patterns, improving accuracy and user experience over time.
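As a concrete illustration of the quantization step mentioned under edge-AI deployment, the sketch below performs symmetric 8-bit post-training quantization of a weight vector in pure Python. Frameworks such as TensorFlow Lite and PyTorch provide production implementations of this; the code here only demonstrates the underlying arithmetic and assumes a nonzero weight vector.

```python
# Symmetric per-tensor 8-bit quantization of float weights:
#   scale = max|w| / 127,  w_q = round(w / scale),  w ≈ w_q * scale.
# Assumes at least one nonzero weight (otherwise scale would be 0).

def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.42, -1.27, 0.05, 0.9999]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q)  # -> [42, -127, 5, 100]
# Reconstruction error is bounded by half a quantization step (scale / 2),
# while each weight now needs 1 byte instead of 4 -- the storage and
# bandwidth saving that makes edge deployment practical.
```

Pruning is complementary: it removes weights entirely, while quantization shrinks the ones that remain.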

REFERENCES

1. S. Molchanov, X. Yang, S. Gupta, K. Kim, and K. Pulli, "Online Detection and Classification of Dynamic Hand Gestures Using Recurrent Neural Networks," CVPR Workshops, 2016.

2. A. T. Nagi, S. A. Suhail, and M. Hanmandlu, "A Vision-Based Hand Gesture Recognition System Using Deep Learning," Procedia Computer Science, vol. 171, pp. 1361–1369, 2020.

3. M. Asadi-Aghbolaghi et al., "Deep Learning for Action and Gesture Recognition: A Survey," Pattern Recognition, vol. 99, 2020.

4. X. Chen, H. Wang, and P. Li, "Real-Time Hand Gesture Recognition Using CNN-Based Feature Extraction," IEEE Access, vol. 7, pp. 143190–143200, 2019.

5. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016.

6. S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE TPAMI, 2017.

7. M. Zhang and H. Wu, "Human–Computer Interaction Using Hand Gestures Based on Computer Vision," International Journal of Computer Applications, 2019.

8. Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, "Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields," CVPR, 2017.

9. Leap Motion Inc., "Hand Tracking Technology Overview," Technical Report, 2018.

10. H. J. Lee and J. H. Kim, "An HMM-Based Threshold Model Approach for Gesture Recognition," IEEE PAMI, vol. 21, no. 10, 1999.

11. D. Wu and L. Shao, "Deep Dynamic Hand Gesture Recognition: A Review," Neurocomputing, vol. 340, 2019.

12. M. Mittal, G. Goyal, and D. K. Vishwakarma, "Hand Gesture Recognition Using Deep Convolutional Neural Network," Procedia Computer Science, vol. 218, 2023.

13. J. Shotton et al., "Real-Time Human Pose Recognition in Parts from a Single Depth Image," CVPR, 2011.

14. Google MediaPipe, "MediaPipe Hands: Real-Time Hand Tracking," Google AI Blog, 2020.

15. S. Zhang, W. Wu, and Y. Li, "Gesture-Based Human–Computer Interaction Using 3D Vision Sensors," Sensors, vol. 19, no. 18, 2019.

16. Y. Li and S. Lee, "Vision-Based Dynamic Hand Gesture Recognition Using Deep Learning," IEEE Sensors Journal, 2021.

17. N. Olszewski, G. France, and R. Salakhutdinov, "3D Hand Shape and Pose Estimation Using Deep Neural Networks," ACM SIGGRAPH, 2020.

© 2025, IRJET | Impact Factor value: 8.315 | ISO 9001:2008 Certified Journal | Page 203
