International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 12 Issue: 10 | Oct 2025

p-ISSN: 2395-0072

www.irjet.net

A Comparative Analysis of K-Nearest Neighbors and Random Forest Regression for Predicting Solar Panel Power Generation

Prof. Chetankumar B. Parmar¹

¹Assistant Professor, BCA Department, Narmada College of Science & Commerce, Bharuch, Gujarat, India,

Research Scholar – Veer Narmad South Gujarat University (Computer Science Dept.), Surat

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract - The accurate forecasting of solar power generation is critical for the efficient integration of renewable energy into the smart grid and energy markets [6]. This study presents a comparative analysis of two machine learning models, K-Nearest Neighbors (KNN) and Random Forest (RF) Regression, for predicting the power output of a solar panel system based on meteorological and positional data. A dataset comprising 4,213 instances with 20 feature variables, including temperature, humidity, cloud cover, solar radiation, and wind parameters, was utilized [20]. The data was preprocessed, and the models were trained and evaluated using standard metrics. The Random Forest Regressor demonstrated superior performance, achieving an R² score of 0.8201 and a Root Mean Squared Error (RMSE) of 399.41 kW, compared to the KNN model's R² of 0.7834 and a higher MSE. The results underscore the efficacy of ensemble methods like Random Forest in handling complex, non-linear relationships in solar energy prediction tasks [7, 13], providing a robust tool for real-time energy management and planning.

This research investigates the application of two prominent ML algorithms for regression: K-Nearest Neighbors (KNN) and Random Forest (RF). The study utilizes a real-world dataset containing various environmental and system-oriented features to predict generated power in kilowatts (kW). The performance of both models is rigorously evaluated and compared, providing insights into their respective strengths and applicability in the domain of solar energy forecasting, an area of active research, as seen in recent literature [1, 3, 5].

Key Words: Solar Power Forecasting, Machine Learning, Random Forest, K-Nearest Neighbors, Renewable Energy, Predictive Modeling, Smart Grid.

Random Forest, an ensemble learning method, operates by constructing a multitude of decision trees and outputting the mean of their predictions. It remains highly effective for regression tasks due to its robustness against overfitting and its ability to model complex interactions [7, 8]. Recent studies continue to report strong results; for instance, Chatterjee et al. (2023) found RF to be a top performer for PV output prediction due to its ability to handle non-linear relationships between weather variables and power generation [7]. Furthermore, its utility in feature importance analysis provides valuable insights for domain experts [8, 18].
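The averaging idea behind Random Forest regression can be illustrated with a minimal sketch. The code below is a toy bagged ensemble of one-split regression trees ("stumps") written in plain Python. It is not the model evaluated in this study, and it omits the random feature subsetting that a full Random Forest applies at each split. The function names, the toy data, and the values `n_trees=25` and `seed=0` are all assumptions made for illustration.

```python
import random
from statistics import mean


def fit_stump(X, y):
    """Fit a one-split regression tree by minimizing the summed squared error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [t for row, t in zip(X, y) if row[j] <= threshold]
            right = [t for row, t in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue  # not a real split
            l_mean, r_mean = mean(left), mean(right)
            sse = (sum((t - l_mean) ** 2 for t in left)
                   + sum((t - r_mean) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, l_mean, r_mean)
    if best is None:
        # Every row in the sample is identical: fall back to a constant prediction.
        constant = mean(y)
        return (0, float("inf"), constant, constant)
    return best[1:]


def predict_stump(stump, row):
    j, threshold, l_mean, r_mean = stump
    return l_mean if row[j] <= threshold else r_mean


def fit_bagged_stumps(X, y, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return trees


def predict_ensemble(trees, row):
    """The ensemble output is the mean of the individual tree predictions."""
    return mean(predict_stump(tree, row) for tree in trees)


# Hypothetical toy data: (irradiance in W/m^2, temperature in deg C) -> power in kW
train_X = [(100.0, 15.0), (300.0, 18.0), (500.0, 22.0), (700.0, 25.0), (900.0, 28.0)]
train_y = [200.0, 800.0, 1500.0, 2300.0, 3000.0]

trees = fit_bagged_stumps(train_X, train_y, n_trees=25, seed=0)
low = predict_ensemble(trees, (100.0, 15.0))
high = predict_ensemble(trees, (900.0, 28.0))
```

Each stump is a weak, high-variance predictor on its own; averaging many of them, each trained on a different bootstrap sample, is what smooths the final prediction.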

2. LITERATURE REVIEW

The use of machine learning in solar energy forecasting has evolved from simple statistical models to sophisticated ensemble and deep learning techniques. Recent reviews and studies highlight the dominance of tree-based models and neural networks in achieving state-of-the-art accuracy [1, 5, 7].

1. INTRODUCTION

The global transition towards sustainable energy sources has positioned solar power at the forefront of renewable energy technologies [1]. However, the inherent intermittency and variability of solar irradiance pose significant challenges to grid stability, energy management, and market operations [2, 12]. Accurate models for predicting solar power output are, therefore, essential for optimizing grid operations, facilitating energy trading, and ensuring a reliable power supply [3, 16].

The K-Nearest Neighbors algorithm, a simple instance-based method, has seen application in various energy prediction contexts. Prediction can be computationally intensive on large datasets, because each query is compared against the stored training instances, but the method's simplicity is advantageous. Its performance, however, is often surpassed by ensemble methods on larger, more complex datasets [12, 13]. Comparative analyses, such as the one by Rahman et al. (2021), often place KNN behind more advanced ensemble methods in terms of predictive accuracy for solar power [12].
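A minimal sketch of instance-based regression with K-Nearest Neighbors is given below: the prediction for a query point is the mean target of its k closest training instances. The function name `knn_predict`, the toy data, and the choice of `k=3` are assumptions made for illustration and are not taken from the study's dataset or configuration.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict a target as the mean of the k nearest training targets."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training rows")
    # Euclidean distance from the query to every stored training instance.
    distances = [(math.dist(row, query), target)
                 for row, target in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(target for _, target in nearest) / k


# Hypothetical toy data: (irradiance in W/m^2, temperature in deg C) -> power in kW
train_X = [(100.0, 15.0), (300.0, 18.0), (500.0, 22.0), (700.0, 25.0), (900.0, 28.0)]
train_y = [200.0, 800.0, 1500.0, 2300.0, 3000.0]

prediction = knn_predict(train_X, train_y, query=(520.0, 21.0), k=3)
```

The sketch scans the whole training set for every query, which is the source of the computational cost noted above. The features are also left unscaled here, so irradiance dominates the distance; in practice, features would typically be standardized before computing distances.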

Traditional physical models for solar forecasting often require detailed system parameters and can be computationally intensive [4]. In contrast, data-driven machine learning (ML) approaches have gained prominence for their ability to learn complex, non-linear patterns directly from historical data without explicit physical equations [5, 15]. These models can incorporate a multitude of meteorological variables to generate highly accurate short-term forecasts, which are crucial for grid integration [6, 14].

Current research frontiers involve not only model comparison but also the development of advanced deep learning architectures [3] and the integration of new data sources, such as air quality information [2]. The exploration of models that can generalize across different geographical

