International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 12 Issue: 07 | Jul 2025
p-ISSN: 2395-0072
www.irjet.net
A Survey on FPGA-Based Object Detection Using Various YOLO Techniques
๐. ๐. ๐. ๐๐ก๐๐ง๐ข ๐๐ฎ๐ฆ๐๐ซ ๐, ๐๐ซ. ๐. ๐๐ก๐๐ง๐ฌ๐ข ๐๐๐ง๐ข๐
1 M. Tech, VLSI & EMBEDDED SYSTEM, ECE, UCEK, JNTU Kakinada, Andhra Pradesh, India
2 Assistant Professor, Dept. of ECE, UCEK, JNTU Kakinada, Andhra Pradesh, India
------------------------------------------------------------------------***-------------------------------------------------------------------------
Abstract- In recent years, object detection has gained significant attention in computer vision, with applications in autonomous vehicle navigation, pedestrian detection, surveillance systems, and IoT monitoring. Many studies have aimed to improve the speed and accuracy of object detection systems to meet real-time requirements. You Only Look Once (YOLO) takes a distinct approach by treating object detection as a regression problem. YOLOv8, a recent version, predicts multiple objects in an image in a single pass, achieving both high accuracy and fast inference. This paper surveys FPGA-based object detection methods that use YOLO and focuses on improving the efficiency of current systems for embedded environments. It reviews various modifications to the YOLO framework, including quantization strategies (binary, multi-bit, fixed-point), hardware optimizations such as pipelining, and Verilog-based FPGA implementations, and analyzes their trade-offs in accuracy, throughput, and power consumption.
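As a rough illustration of the three quantization strategies named above, the following minimal Python sketch applies each one to a small list of example weights and compares the resulting approximation error. All function names, the Q8.8 format, the 4-bit width, and the weight values are assumptions chosen for illustration; they are not taken from any of the surveyed designs.

```python
# Illustrative sketch only: names, bit widths, and weights are assumptions,
# not parameters of any surveyed FPGA implementation.

def quantize_fixed_point(x, frac_bits=8, total_bits=16):
    """Round x to a signed fixed-point value with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))  # saturate to the representable range
    return q / scale                         # real value the integer code stands for

def quantize_uniform(weights, bits=4):
    """Symmetric multi-bit quantization with a single scale for the whole list."""
    max_abs = max(abs(w) for w in weights) or 1.0
    levels = (1 << (bits - 1)) - 1           # e.g. 7 positive levels for 4 bits
    step = max_abs / levels
    return [round(w / step) * step for w in weights]

def binarize(weights):
    """Binary weights: keep only the sign, scaled by the mean absolute value."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]

def mean_abs_error(a, b):
    """Average absolute difference between two equally long lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

weights = [0.82, -0.41, 0.07, -0.93, 0.30, -0.15]
fixed = [quantize_fixed_point(w) for w in weights]
multi = quantize_uniform(weights, bits=4)
binary = binarize(weights)

for name, q in [("fixed-point Q8.8", fixed), ("4-bit uniform", multi), ("binary", binary)]:
    print(f"{name:18s} mean abs error = {mean_abs_error(weights, q):.4f}")
```

For these example weights the approximation error grows as the representation shrinks from 16-bit fixed-point to 4-bit to binary, which is the accuracy side of the trade-off against throughput and power.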
Keywords- Object Detection, YOLO, Binary Neural Networks (BNN), Hardware Acceleration, Embedded Vision.
1. Introduction
Real-time object detection has become essential in modern computer vision. It supports technologies such as autonomous vehicles, smart surveillance systems, advanced robotics, and vision applications in the Internet of Things (IoT). These systems must quickly classify and localize multiple objects in images or video streams, and they need both high accuracy and low latency to work well in changing environments. Traditional computer vision methods, which relied on hand-crafted features, have largely been replaced by deep learning models because these adapt better to different scenes and datasets. Among these models, the You Only Look Once (YOLO) family is widely used for its unified detection approach, which treats object detection as a regression problem. By removing the need for complex region proposal networks, YOLO achieves real-time performance on both CPUs and GPUs. This makes it a standard for applications that need quick inference [1]. A recent iteration, YOLOv8, released by Ultralytics in 2023, introduces significant architectural advancements that enhance its performance on standard benchmarks like COCO and PASCAL VOC. YOLOv8 incorporates an improved CSPDarknet backbone with C2f modules for efficient feature extraction, a PANet-inspired neck for robust multi-scale feature fusion, and a decoupled, anchor-free detection head that simplifies the prediction process while improving inference speed. These enhancements enable YOLOv8 to achieve state-of-the-art mean Average Precision (mAP) and fast processing, making it well suited to real-time applications. However, deploying such a computationally intensive model on resource-constrained edge devices, such as those used in drones, mobile robots, or IoT nodes, poses significant challenges. The deep layers and multi-scale processing of YOLOv8 demand substantial computational resources, which are often unavailable on edge platforms with limited memory, power budgets, and processing capabilities. Specialized hardware solutions are therefore needed to maintain performance in embedded environments [2].
Field-Programmable Gate Arrays (FPGAs) offer a promising platform for deploying YOLOv8 in such resource-constrained settings due to their low power consumption and reconfigurable nature. Unlike GPUs, which benefit from high-bandwidth memory and massive parallelism, FPGAs have limited resources, including Look-Up Tables (LUTs), Block RAMs (BRAMs), and Digital Signal Processing (DSP) blocks, so YOLOv8's complex operations require careful optimization. Translating the model's floating-point-intensive computations into FPGA-compatible designs involves transforming layers such as convolution, batch normalization, activation functions, pooling, and fully connected layers into efficient hardware modules. Hardware description languages like Verilog are employed to design these digital logic circuits, enabling precise control over dataflow, timing, and resource allocation. However, achieving real-time performance (typically 30+ frames per second) while minimizing power consumption and external memory access remains a significant challenge, as FPGA designs must
© 2025, IRJET | Impact Factor value: 8.315 | ISO 9001:2008 Certified Journal | Page 734