Machine Learning and the Internet of Things in Education
Models and Applications
Studies in Computational Intelligence
Volume 1115
Series Editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output.
Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
John Bush Idoko · Rahib Abiyev
Editors
Machine Learning and the Internet of Things in Education
Models and Applications
Editors
John Bush Idoko Department of Computer Engineering
Near East University
Nicosia, Cyprus
Rahib Abiyev Department of Computer Engineering
Near East University
Nicosia, Cyprus
ISSN 1860-949X
ISSN 1860-9503 (electronic)
Studies in Computational Intelligence
ISBN 978-3-031-42923-1
ISBN 978-3-031-42924-8 (eBook) https://doi.org/10.1007/978-3-031-42924-8
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Paper in this product is recyclable.
Preface
This book showcases several machine learning techniques and Internet of Things technologies, particularly for learning purposes. The techniques and ideas demonstrated here can be explored by researchers, teachers, and students to ease their learning and research in the fields of Artificial Intelligence (AI) and the Internet of Things (IoT).
The AI and IoT technologies enumerated and demonstrated in this book will enable systems to be simulative, predictive, prescriptive, and autonomous; moreover, integrating these technologies can further advance emerging applications from assisted, to augmented, and ultimately to self-operating smart systems. The book focuses on the design and implementation of algorithmic applications in AI and IoT, with pertinent applications, and further examines the challenges involved and the role of these technologies in teaching, learning, and research.
The theoretical and practical applications of AI techniques and IoT technologies featured in the book include, but are not limited to: algorithmic and practical aspects of AI and IoT for scientific problem diagnosis and recognition, medical diagnosis, e-health, e-learning, e-governance, blockchain technologies, optimization and prediction, industrial and smart office/home automation, and supervised and unsupervised machine learning for IoT data and devices.
Nicosia, Cyprus
John Bush Idoko
Rahib Abiyev
Mansur Mohammed, and Abubakar Usman
John Bush Idoko
John Bush Idoko and Joseph Palmer
A Semantic Portal to Improve Search on Rivers State’s Independent National Electoral Commission
John Bush Idoko and David Tumuni Ogolo
Implementation of Semantic Web Service and Integration of e-Government Based Linked Data
John Bush Idoko and Bashir Abdinur Ahmed
Application of Zero-Trust Networks in e-Health Internet of Things (IoT) Deployments
Morgan Morgak Gofwen, Bartholomew Idoko, and John Bush Idoko
IoT Security Based Vulnerability Assessment of E-learning Systems
Bartholomew Idoko and John Bush Idoko
Blockchain Technology, Artificial Intelligence, and Big Data in Education
Ramiz Salama and Fadi Al-Turjman
Sustainable Education Systems with IoT Paradigms
Ramiz Salama and Fadi Al-Turjman
Kamil Dimililer, Ezekiel Tijesunimi Ogidan, and Oluwaseun Priscilla Olawale
About the Editors
John Bush Idoko graduated from Benue State University, Makurdi, Nigeria, where he obtained a B.Sc. degree in Computer Science in 2010. He then began an M.Sc. program in Computer Engineering at Near East University, North Cyprus. After receiving his M.Sc. degree in 2017, he started a Ph.D. program in the same department. During his postgraduate studies at Near East University, he worked as a Research Assistant in the Applied Artificial Intelligence Research Centre. He obtained his Ph.D. in 2020 and is currently an Assistant Professor in the Department of Computer Engineering, Near East University, Cyprus. His research interests include, but are not limited to: AI, machine learning, deep learning, computer vision, data analysis, soft computing, advanced image processing, and bioinformatics.
Rahib Abiyev received the B.Sc. and M.Sc. degrees (First Class Hons.) in Electrical and Electronic Engineering from Azerbaijan State Oil Academy, Baku, in 1989, and the Ph.D. degree in Electrical and Electronic Engineering from the Computer-Aided Control System Department of the same university, in 1997. He was a Senior Researcher with the research laboratory “Industrial intelligent control systems” of the Computer-Aided Control System Department. In 1999, he joined the Department of Computer Engineering, Near East University, Nicosia, North Cyprus, where he is currently a Full Professor and the Chair of the Computer Engineering Department. In 2001, he founded the Applied Artificial Intelligence Research Centre, and in 2008 he created the “Robotics” research group. He is currently the Director of the Research Centre. He has published over 300 papers on related subjects. His current research interests include soft computing, control systems, robotics, and signal processing.
Introduction to Machine Learning and IoT
John Bush Idoko and Rahib Abiyev
Abstract Smart systems built on machine learning and Internet of Things technologies are systems that have the ability to reason, calculate, learn from experience, perceive relationships and analogies, store and retrieve data from memory, understand complicated concepts, solve problems, speak fluently in plain language, generalize, classify, and adapt to changing circumstances. To make decisions and build smart environments, smart systems combine sensing, actuation, signal processing, and control. The Internet of Things (IoT) is being developed significantly as a result of the real-time networked information and control that smart systems provide. Smart systems are the next generation of computing and information systems, combining artificial intelligence (AI), machine learning, edge/cloud computing, cyber-physical systems, big data analytics, pervasive/ubiquitous computing, and IoT technologies. Recent years have brought some significant hurdles for smart systems due to the wide variety of AI applications, IoT devices, and technology. A few of these challenges are the development and deployment of integrated smart systems and the effective and efficient use of computing technologies.
Keywords Artificial intelligence · Machine learning · Deep learning · Neural networks · Internet of things
1 Artificial Intelligence (AI)
The goal of artificial intelligence (AI), a subfield of computer science, is to build machines or computers that are as intelligent as people. Artificial intelligence is not as new as we might imagine: its history dates back at least to 1950, when Alan Turing devised the Turing test. Later, in the 1960s, ELIZA, the first chatbot computer program, was developed [1]. A world chess champion was
J. B. Idoko (B) · R. Abiyev
Applied Artificial Intelligence Research Centre, Department of Computer Engineering, Near East University, Nicosia 99138, Turkey
J. B. Idoko and R. Abiyev (eds.), Machine Learning and the Internet of Things in Education, Studies in Computational Intelligence 1115, https://doi.org/10.1007/978-3-031-42924-8_1
defeated by the chess computer IBM Deep Blue in 1997, which won two of the six games; the champion won one, and the other three ended in draws [2]. Apple unveiled Siri as a digital assistant in 2011 [2]. OpenAI was launched in 2015 by Elon Musk and associates [3, 4].
According to John McCarthy, one of the founding fathers of AI, artificial intelligence is the science and engineering of making intelligent machines, especially intelligent computer programs. Artificial intelligence is a way of making a computer, a computer-controlled robot, or a piece of software think critically, much as an intelligent person might. To develop intelligent software and systems, it is essential first to understand how the human brain functions and how individuals learn, make decisions, and collaborate to solve problems [5].
Some of the objectives of AI are: (1) to develop expert systems that behave intelligently, learn, demonstrate, explain, and provide their users with guidance; and (2) to add human intelligence to machines so that they comprehend, think, learn, and act like people. Artificial intelligence is a science and technology based on fields such as computer science, engineering, mathematics, biology, linguistics, and psychology. The development of computer abilities akin to human intelligence, such as problem-solving, learning, and reasoning, is one of its major focuses.
Machine learning, deep learning, and other areas are among the many subfields of AI, which is a broad and expanding field [6–24]. Figure 1 illustrates a transitive subset of artificial intelligence.
In a nutshell, machine learning is the idea that computers can use algorithms to improve their creativity and predictions so that they more closely mimic human thought processes [7]. Figure 2 shows the typical learning process of a machine learning model.
Machine learning involves a number of learning processes such as:
a. Supervised learning: Machines are made to learn through supervised learning, which involves feeding them labelled data. By providing machines with access to a vast amount of data and training them to interpret it, machines are trained in this process [8–14]. For example, the computer is presented with a variety of images of dogs shot from numerous perspectives, with various color variations, breeds, and many other varieties. In order for the machine to learn to analyze the data from these various dog images, the “insight” of the machine must grow. Eventually, the machine will be able to predict whether a given image is a dog, even from a completely different image that was not included in the labelled dataset of dog images it was fed earlier.

Fig. 1 Subset of AI
b. Unsupervised learning: Unsupervised learning algorithms, in contrast to supervised learning, evaluate data that has not been assigned a label. This means that in this scenario, we are teaching the computer to interpret and learn from a series of data whose meaning is not apparent to the human eye. The computer searches for patterns in the data and makes its own decisions based on those patterns. It is important to note that the conclusions reached here are generated by the computer from an unlabeled dataset.
c. Reinforcement learning: Reinforcement learning is a machine learning approach that depends on feedback. In this method, the machine is fed a set of data and asked to predict what it might be. If it draws an incorrect conclusion from the incoming data, it receives feedback about its error. For example, if you give it an image of a basketball and it erroneously identifies the ball as a tennis ball, the feedback corrects it; when the machine later encounters a completely different image of a basketball, it recognizes it correctly.
d. On the other hand, deep learning is the idea that computers can mimic the steps a human brain takes to reason, evaluate, and learn. A neural network is used in the deep learning process as a component of an AI’s thought process. Deep learning requires a significant amount of data to be trained, as well as a very powerful processing system.
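The supervised case (a) can be sketched in a few lines of plain Python. This is an illustrative toy, not code from the book: a 1-nearest-neighbour classifier stands in for the dog-image example, and all names and data values are made up.

```python
import math

# Toy labelled dataset: 2-D feature vectors with class labels (stand-ins for
# the image features in the dog example; the numbers are invented).
train = [((1.0, 1.2), "dog"), ((0.9, 1.1), "dog"),
         ((8.0, 8.2), "cat"), ((7.9, 8.1), "cat")]

def predict(x):
    """1-nearest-neighbour: return the label of the closest training example."""
    return min(train, key=lambda example: math.dist(x, example[0]))[1]

# An unseen input close to the labelled "dog" examples is classified as a dog.
print(predict((1.1, 1.0)))  # dog
print(predict((7.8, 8.0)))  # cat
```

The point mirrors the text: the model never saw `(1.1, 1.0)` during training, yet the labelled examples let it generalize to the new input.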
Application areas of AI:
Fig. 2 Learning process of a machine learning model
a. Expert Systems: These applications combine hardware, software, and specialized data to convey reasoning and advice. They offer users explanations and recommendations.
b. Speech Recognition: Some intelligent systems are able to hear and understand spoken language, including the meanings of sentences. They can handle a variety of accents, background noise, slang, changes in the human voice caused by a cold, etc.
c. Gaming: In strategic games such as chess, tic-tac-toe, poker, etc., where machines may consider numerous probable locations based on heuristic knowledge, AI plays a key role.
d. Natural Language Processing: Makes it possible to communicate with a computer that can understand human natural language.
e. Handwriting Recognition: The text written with a pen on paper or a stylus on a screen is read by the handwriting recognition software. It can change it into editable text and recognize the letter shapes.
f. Intelligent Robots: Robots can complete the jobs that humans assign to them. They are equipped with sensors that can detect physical data in real time, including light, heat, temperature, movement, sound, bumps, and pressure. To demonstrate intelligence, they have powerful processors, numerous sensors, and a large amount of memory. They also have the capacity to learn from their mistakes and adapt to new surroundings.
g. Vision Systems: These systems can recognize, decipher, and comprehend visual input on a computer. Examples include the use of a spy plane’s images to create a map or spatial information, the use of clinical expert systems by doctors to diagnose patients, and the use of computer software by law enforcement to identify criminals based on stored portraits created by forensic artists.
2 Internet of Things
In the Internet of Things, computing can be done whenever and wherever you want. In other terms, the Internet of Things (IoT) is a network of interconnected objects (“things”) embedded with sensors, actuators, software, and other technologies in order to connect and exchange data with other things over the internet [15]. IoT, as seen in Fig. 3, is the nexus of the internet, devices, and data. In 2020, there were 16.5 billion connected things globally, excluding computers and portable electronic devices (such as smartphones and tablets). IoT gathers information from the numerous sensors embedded in vehicles, refrigerators, spacecraft, etc. There is enormous potential for creative IoT applications across a wide range of sectors as sensors become more ubiquitous.

Components of an IoT system:
a. Sensor: a linked device that enables the sensing of the scenario’s or controlled environment’s physical properties, whose values are converted to digital data.
b. Actuator: a linked gadget that makes it possible to take action within a given environment.
c. Controller: a connected device implementing an algorithm that transforms input data into actions.
d. Smart things: Sensors, actuators, and controllers work together to create digital devices that provide service functions (potentially implemented by local/ distributed execution platforms and M2M/Internet communications).
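To make these roles concrete, here is a toy sense–decide–act loop in Python. All class and variable names are hypothetical; this is a sketch of the component model above (a thermostat-style smart thing), not a real IoT stack.

```python
class Sensor:
    """Senses a physical property (here, room temperature) as digital data."""
    def __init__(self, readings):
        self.readings = iter(readings)

    def read(self):
        return next(self.readings)

class Actuator:
    """Takes action in the environment (here, switching a heater)."""
    def __init__(self):
        self.heater_on = False

    def set_heater(self, on):
        self.heater_on = on

class Controller:
    """Implements the algorithm that transforms sensor input into actions."""
    def __init__(self, sensor, actuator, setpoint=21.0):
        self.sensor, self.actuator, self.setpoint = sensor, actuator, setpoint

    def step(self):
        temp = self.sensor.read()
        self.actuator.set_heater(temp < self.setpoint)  # heat while below setpoint
        return temp, self.actuator.heater_on

# Sensor + controller + actuator together form a "smart thing": a thermostat.
thing = Controller(Sensor([19.5, 20.8, 22.1]), Actuator())
for _ in range(3):
    print(thing.step())  # (19.5, True), (20.8, True), (22.1, False)
```

In a real deployment the sensor readings would arrive over M2M/Internet communications and the controller might run on a local or distributed execution platform, as item (d) notes.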
Application areas of IoT include: automated transport systems, smart security cameras, smart farming, thermostats, smart televisions, baby monitors, children’s toys, refrigerators, automatic light bulbs, and many more.
3 Conclusion and the Future of AI and IoT
IoT and AI have recently experienced exponential growth. These fields are going to be so significant and influential that they will substantially alter and improve the society we live in; we cannot even begin to fathom how enormous and influential they will be in the near future. With AI and its rapidly expanding applications in our daily lives, there is still a lot to learn. It would be wise to adjust to this rapidly changing world and acquire AI- and IoT-related skills. To improve this world, we should learn and grow in the same ways that AI does.
The use of AI and IoT in education can be very beneficial. They could be used to analyze data on individuals’ perspectives, capabilities, preferences, and shortcomings in order to create curricula, tactics, and schedules that are appealing, well suited, and inclusive of most, if not all, adults and children. Future modes of transportation will also change as a result of AI. In addition to self-driving automobiles, self-flying planes and drones that conveniently deliver your meals faster and better are being developed. The fear of automation replacing jobs is one of the main AI-related worries. But it is possible that AI will create more employment opportunities than it replaces; by creating new job categories, it will alter how people work.

Fig. 3 IoT
References
1. Retrieved February 27, 2023, from https://news.harvard.edu/gazette/story/2012/09/alan-turing-at-100/
2. Retrieved March 01, 2023, from https://www.ibm.com/ibm/history/ibm100/us/en/icons/deepblue/
3. Retrieved March 03, 2023, from https://openai.com/blog/introducing-openai/
4. Retrieved March 03, 2023, from https://www.bbc.com/news/technology-35082344
5. Abiyev, R., Arslan, M., Bush Idoko, J., Sekeroglu, B., & Ilhan, A. (2020). Identification of epileptic EEG signals using convolutional neural networks. Applied Sciences, 10 (12), 4089.
6. Abiyev, R. H., Arslan, M., & Idoko, J. B. (2020). Sign language translation using deep convolutional neural networks. KSII Transactions on Internet & Information Systems, 14(2).
7. Helwan, A., Idoko, J. B., & Abiyev, R. H. (2017). Machine learning techniques for classification of breast tissue. Procedia Computer Science, 120, 402–410.
8. Sekeroglu, B., Abiyev, R., Ilhan, A., Arslan, M., & Idoko, J. B. (2021). Systematic literature review on machine learning and student performance prediction: Critical gaps and possible remedies. Applied Sciences, 11(22), 10907.
9. Idoko, J. B., Arslan, M., & Abiyev, R. (2018). Fuzzy neural system application to differential diagnosis of erythemato-squamous diseases. Cyprus Journal of Medical Sciences, 3(2), 90–97.
10. Ma’aitah, M. K. S., Abiyev, R., & Bush, I. J. (2017). Intelligent classification of liver disorder using fuzzy neural system. International Journal of Advanced Computer Science and Applications, 8 (12).
11. Bush, I. J., Abiyev, R., Ma’aitah, M. K. S., & Altıparmak, H. (2018). Integrated artificial intelligence algorithm for skin detection. In ITM Web of conferences (Vol. 16, p. 02004). EDP Sciences.
12. Bush, I. J., Abiyev, R., & Arslan, M. (2019). Impact of machine learning techniques on hand gesture recognition. Journal of Intelligent & Fuzzy Systems, 37 (3), 4241–4252.
13. Uwanuakwa, I. D., Idoko, J. B., Mbadike, E., Reşatoğlu, R., & Alaneme, G. (2022, May). Application of deep learning in structural health management of concrete structures. In Proceedings of the Institution of Civil Engineers-Bridge Engineering (pp. 1–8). Thomas Telford Ltd.
14. Helwan, A., Dilber, U. O., Abiyev, R., & Bush, J. (2017). One-year survival prediction of myocardial infarction. International Journal of Advanced Computer Science and Applications, 8 (6). https://doi.org/10.14569/IJACSA.2017.080622
15. Bush, I. J., Abiyev, R. H., & Mohammad, K. M. (2017). Intelligent machine learning algorithms for colour segmentation. WSEAS Transactions on Signal Processing, 13, 232–240.
16. Dimililer, K., & Bush, I. J. (2017, September). Automated classification of fruits: pawpaw fruit as a case study. In Man-machine interactions 5: 5th international conference on man-machine interactions, ICMMI 2017 Held at Kraków, Poland, October 3–6, 2017 (pp. 365–374). Cham: Springer International Publishing.
17. Bush, I. J., & Dimililer, K. (2017). Static and dynamic pedestrian detection algorithm for visual based driver assistive system. In ITM Web of conferences (Vol. 9, p. 03002). EDP Sciences.
18. Abiyev, R., Idoko, J. B., Arslan, M. (2020, June). Reconstruction of convolutional neural network for sign language recognition. In 2020 International conference on electrical, communication, and computer engineering (ICECCE) (pp. 1–5). IEEE.
19. Abiyev, R., Idoko, J. B., Altıparmak, H., & Tüzünkan, M. (2023). Fetal health state detection using interval type-2 fuzzy neural networks. Diagnostics, 13(10), 1690.
20. Arslan, M., Bush, I. J., & Abiyev, R. H. (2019). Head movement mouse control using convolutional neural network for people with disabilities. In 13th international conference on theory and application of fuzzy systems and soft computing—ICAFS-2018 13 (pp. 239–248). Springer International Publishing.
21. Abiyev, R. H., Idoko, J. B., & Dara, R. (2022). Fuzzy neural networks for detection kidney diseases. In Intelligent and Fuzzy Techniques for Emerging Conditions and Digital Transformation: Proceedings of the INFUS 2021 Conference, held August 24–26, 2021 (Vol. 2, pp. 273–280). Springer International Publishing.
22. Uwanuakwa, I. D., Isienyi, U. G., Bush Idoko, J., & Ismael Albrka, S. (2020, August). Traffic warning system for wildlife road crossing accidents using artificial intelligence. In International Conference on Transportation and Development 2020 (pp. 194–203). Reston, VA: American Society of Civil Engineers.
23. Idoko, B., Idoko, J. B., Kazaure, Y. Z. M., Ibrahim, Y. M., Akinsola, F. A., & Raji, A. R. (2022, November). IoT based motion detector using raspberry Pi gadgetry. In 2022 5th information technology for education and development (ITED) (pp. 1–5). IEEE.
24. Idoko, J. B., Arslan, M., & Abiyev, R. H. (2019). Intensive investigation in differential diagnosis of erythemato-squamous diseases. In Proceedings of the 13th International Conference on Theory and Application of Fuzzy Systems and Soft Computing (ICAFS-2018) (Vol. 10, pp. 978–3).
Deep Convolutional Network for Food Image Identification
Rahib Abiyev and Joseph Adepoju
Abstract Food plays an integral role in human survival, and it is crucial to monitor our food intake to maintain good health and well-being. As mobile applications for tracking food consumption become increasingly popular, having a precise and efficient food classification system is more important than ever. This study presents an optimized food image recognition model known as FRCNN, which employs a convolutional neural network implemented in Python’s Keras library without relying on a transfer learning architecture. The FRCNN model was trained on the Food-101 dataset, comprising 101,000 images of 101 food classes, with a 75:25 training–validation split. The results indicate that the model achieved a testing accuracy of 92.33% and a training accuracy of 96.40%, outperforming the baseline model that used transfer learning on the same dataset by 8.12%. To further evaluate the model’s performance, we randomly selected 15 images from 15 different food classes in the Food-101 dataset and achieved an overall accuracy of 94.11% on these previously unseen images. Additionally, we tested the model on the MA Food dataset, consisting of 121 food classes, and obtained a training accuracy of 95.11%. These findings demonstrate that the FRCNN model is highly precise and generalizes well to unseen images, making it a promising tool for food image classification.
Keywords Deep convolutional network · Food image recognition · Transfer learning
1 Introduction
Food is a vital component of our daily lives as it provides the body with essential nutrients and energy to perform basic functions, such as maintaining a healthy immune system and repairing cells and tissues. Given its significance in health-related
R. Abiyev (B) · J. Adepoju
Department of Computer Engineering, Applied Artificial Intelligence Research Centre, Near East University, Lefkosa, North Cyprus, Turkey
issues, food monitoring has become increasingly important [5]. Unhealthy eating habits may lead to the development of chronic diseases such as obesity, diabetes, and hypercholesterolemia. According to the World Health Organization (WHO), the global prevalence of obesity more than doubled between 1980 and 2014, with 13% of individuals being obese and 39% of adults overweight. Obesity may also contribute to other conditions, such as osteoarthritis, asthma, cancer, diabetes mellitus type 2, obstructive sleep apnea, and cardiovascular disorders [4]. This is why experts have stressed the importance of accurately assessing food intake in reducing the risks associated with developing chronic illnesses. Hence, there is a need for a highly accurate and optimized food image recognition system. This system involves training a computer to recognize and classify food items using one or more combinations of machine learning algorithms.
Food image recognition is a complex problem that has attracted much interest from the scientific community, prompting researchers to devise various models and methods to tackle it. Although food recognition is still considered challenging due to the need for models that can handle visual data and higher-level semantics, researchers have made progress in developing effective techniques for food image classification. One of the earliest methods used for this task was Fisher Vector, which employs the Fisher kernel to analyse the visual characteristics of food images at a local level. The Fisher kernel uses a generative model, such as the Gaussian Mixture Model, to encode the deviation of a sample from the model into a unique Fisher Vector that can be used for classification. Another technique is the bag of visual words (BOW) representation, which uses vector quantization of affine invariant descriptors of image patches. Additionally, Matsuda et al. [15] proposed a comprehensive approach for identifying and classifying food items in an image that involves using multiple techniques to identify potential food regions, extract features, and apply multiple-kernel learning with non-linear kernels for image classification. Bossard et al. [6] introduced a new benchmark dataset called Food-101 and proposed a method called random forest mining that learns across multiple food classes. Their approach outperformed other methods such as BOW, IFV, RF, and RCF, except for CNN, according to their experimentation results.
Over the years, these techniques have been successful in food image classification and identification tasks. However, with the progress in computer vision, machine learning, and enhanced processing speed, image recognition has undergone a transformation [7, 14, 18]. In current literature, deep learning algorithms, especially CNN, have been extensively used for this task due to their unique properties, such as sparse interaction, parameter sharing, and equivariant representation. As a result, CNN has become a popular method for analysing large image datasets, including food images, and has demonstrated exceptional accuracy [9, 10, 13, 16, 17].
The use of CNN in food image classification has shown significant progress in recent years [1–3, 18, 20]. Researchers have achieved high accuracy rates using pretrained models, such as AlexNet and EfficientNetB0, as well as through the development of novel deep CNN algorithms. These approaches have been tested on various datasets, including the UEC-Food100, UEC-Food256, and Food-101 datasets. DeepFood, developed by Liu et al., achieved a 76.30% accuracy rate on the UEC-Food100
dataset, while Hassannejad et al. outperformed this with an accuracy of 81.45% on the same dataset using Google’s Inception V3 architecture. Mezgec and Koroušić Seljak modified the well-known AlexNet structure to create NutriNet, which achieved a classification performance of 86.72% on over 520 food and beverage categories. Similarly, Kawano and Yanai [11] used a pre-trained model similar to the AlexNet architecture and achieved an accuracy of 72.26%, while Christodoulidis et al. [8] introduced a novel deep CNN algorithm that obtained 84.90% accuracy on a custom dataset. Finally, VijayaKumari et al. [19] achieved the best accuracy of 80.16% using the pre-trained EfficientNetB0 model, trained on the Food-101 dataset.
The studies mentioned above have shown that CNN has immense potential for accurately classifying food images, which could have numerous practical applications, such as dietary monitoring and meal tracking. However, while these findings are promising, there is still ample room for improvement, and the primary aim of this research is to propose a highly accurate and optimized model for food image recognition and classification. In this paper, we introduce a new CNN architecture called FRCNN that is specifically designed for food recognition. Our proposed system boasts high precision and greater robustness for different food databases, making it a valuable tool for real-world applications in the field of food image recognition. Here is how this paper is structured: in Sect. 2, we describe the methodology we used to develop our food recognition system. In Sect. 3, we provide the details of the FRCNN design and architecture, including the dataset and proposed model structure. We also provide an overview of the simulation results. Finally, in Sect. 4, we present the conclusion of our work.
2 CNN Architecture
Convolutional neural networks (CNNs) are a type of deep artificial neural network used for tasks like object detection and identification in grid-patterned input such as images. CNNs have a structure similar to ANNs, with a feedforward architecture that splits nodes into layers, each layer passing its output on to the next. They use backpropagation to learn and update weights, which reduces the loss function and error margin. CNNs see images as a grid-like layout of pixels, and their early layers detect basic patterns like lines and curves before advancing to more complex ones. CNNs are commonly used in computer vision research due to features like sparse interaction, parameter sharing, and equivariant representation. Most CNNs are made up of convolution, pooling, and fully connected layers, with feature extraction typically taking place in the first two, and the outcome mapped into the fully connected layer (Fig. 1).
One of the most essential layers in a CNN is the convolution layer, which applies filters to the input image to extract features such as edges and corners. Its output is a feature map that is passed to the next layer for further processing. The pooling layer reduces the spatial dimensions of the feature maps generated by the convolution layer, thereby reducing the computational complexity of the network. Pooling can be performed using different techniques such as max pooling, sum pooling, or average pooling. The final layer in a typical CNN is the fully connected (dense) layer, which takes the output of the convolution and pooling layers and performs classification using an activation function such as softmax to generate a probability distribution over the different classes. The dense layer connects every node in the previous layer to every node in the current layer, making it a computationally intensive part of the network. By combining these layers, CNNs can extract complex features from images and achieve high accuracy in tasks such as object detection and classification [2].
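The two feature-extraction operations described above can be sketched in plain numpy. The toy 6 × 6 image, the Sobel-style vertical-edge kernel, and the 2 × 2 pooling window are illustrative choices, not taken from the chapter:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most DL libraries)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# A vertical-edge detector applied to a toy 6x6 "image":
# the left half is bright, so there is a vertical edge at column 3.
image = np.zeros((6, 6))
image[:, :3] = 1.0
sobel_x = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]], dtype=float)

features = conv2d(image, sobel_x)   # 4x4 feature map highlighting the edge
pooled = max_pool(features)         # 2x2 map after spatial downsampling
```

The pooled map keeps the strong edge response while quartering the number of values, which is exactly the dimensionality reduction the pooling layer is for.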
After determining the CNN's output signals, the learning of the network parameters θ starts. A loss function is applied to train the CNN; for classification with a softmax output, the cross-entropy loss can be represented as

E(θ) = −Σ_i y_i log(o_i),

where o_i and y_i are the current output and target output signals, respectively. Using the loss function, the unknown parameters θ are determined: given training examples consisting of input–output pairs {(x^(i), y^(i)); i ∈ [1, …, N]}, the parameters θ are learned so as to minimize the value of the loss function. For this purpose, the Adam optimizer (Kingma & Ba, 2015) is used in this paper. Efficient training of a CNN requires a large volume of training pairs; in this paper, food image datasets are used for training the CNN.
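The training loop described above can be sketched in numpy: a softmax output, a cross-entropy loss over (o_i, y_i), and the Adam update of Kingma & Ba with its standard default hyperparameters. The three-class toy problem is illustrative only:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(o, y):
    """E = -sum_i y_i * log(o_i) for one-hot target y and predicted output o."""
    return -np.sum(y * np.log(o + 1e-12))

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba, 2015) on the parameter vector theta."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)           # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)           # bias-corrected second moment
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy problem: logits theta classified by softmax, true class 0.
theta = np.zeros(3)
y = np.array([1.0, 0.0, 0.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)

loss0 = cross_entropy(softmax(theta), y)
for t in range(1, 201):
    o = softmax(theta)
    grad = o - y                        # gradient of cross-entropy w.r.t. logits
    theta, m, v = adam_step(theta, grad, m, v, t)
loss1 = cross_entropy(softmax(theta), y)
```

After 200 Adam steps the loss has fallen from log 3 and the predicted class matches the target, mirroring on a small scale what the full FRCNN training does over the food image dataset.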
Fig. 1 CNN architecture. Source [12]
3 Design of FRCNN System
3.1 Food-101 Dataset
Bossard et al. [6] developed the Food-101 dataset, comprising pictures gathered from foodspotting.com, an online platform that allows users to share photos of their food along with its location and description. To create the dataset, the authors selected the top 101 foods that were regularly labelled and popular on the website, choosing 750 photos per class for training and 250 images per class for testing (Fig. 2).
3.2 Model Architecture
The FRCNN model's architecture follows the standard CNN design, with each block consisting of a convolution layer, batch normalization, another convolution layer, batch normalization, and max pooling. The remaining five layers of the FRCNN model have a similar structure, as depicted in Fig. 6. After the fourth layer's max pooling, the model goes through flattening, a dense layer, batch normalization, a dropout layer, two fully connected layers, and finally the classification layer. The architecture of the FRCNN model is illustrated in Fig. 3.
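The spatial effect of stacking such blocks can be traced without building the network: 'same'-padded convolutions and batch normalization preserve height and width, while each 2 × 2 max pool halves them. The 224 × 224 input resolution below is an assumption for illustration; the chapter does not state the exact input size.

```python
def block_output_size(h, w, n_blocks, pool=2):
    """Feature-map H x W after n_blocks of [conv-BN-conv-BN-maxpool].

    'Same'-padded convolutions keep H x W unchanged; each 2x2 max pool
    halves both dimensions (integer division).
    """
    for _ in range(n_blocks):
        h, w = h // pool, w // pool
    return h, w

# Assumed 224x224 input, four pooled blocks before flattening.
h, w = block_output_size(224, 224, n_blocks=4)   # 224 -> 112 -> 56 -> 28 -> 14
```

This shows why the flattening step after the fourth max pool operates on a far smaller spatial grid than the input image.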
The proposed FRCNN model is presented in Table 1. During its development, various factors were considered. Initially, the focus was on extracting relevant features from the input data, which was achieved using convolutional and pooling layers. This approach allowed the model to analyse images at varying scales, which reduced dimensionality and helped to identify significant patterns. Secondly, the FRCNN model was designed with efficiency in mind: computational resources and memory usage were optimized by applying techniques such as weight sharing and data compression, along with fine-tuning the number of layers and filters for optimal performance. Lastly, to ensure that the model generalizes well and avoids overfitting, a high-quality training dataset was used and regularization methods were applied. This approach enabled the FRCNN model to achieve strong performance even on unseen data, making it well suited to object detection and recognition tasks.
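One regularizer the architecture includes is the dropout layer mentioned above. A minimal sketch of inverted dropout, the variant used by common frameworks (the specific rate and array sizes here are illustrative):

```python
import numpy as np

def dropout(x, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p and rescale the
    survivors by 1/(1-p), so the expected activation is unchanged.
    At inference time the layer is the identity."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones((4, 8))
y = dropout(x, p=0.5, rng=rng)
# Dropped units are exactly 0; kept units are rescaled to 2.0 when p = 0.5.
```

Randomly silencing units during training prevents co-adaptation of features, which is one way the model avoids overfitting the 75,750 training images.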
The FRCNN model was trained on the Food-101 dataset: training was performed on the training subset, and performance was subsequently evaluated on the test subset. For this evaluation, the Food-101 dataset was partitioned into 75% for training and 25% for testing. The FRCNN model's performance was assessed using accuracy, precision, recall, and F1 score. Accuracy is computed from the numbers of true positive, true negative, false positive, and false negative predictions made by the model. Precision and recall measure the model's ability to correctly identify positive instances, while the F1 score combines both measures to evaluate overall performance.
Fig. 2 Food-101 dataset preview
Fig. 3 Proposed FRCNN model architecture
A higher F1 score indicates better performance, with the model striking a balance between precision and recall. The formulas for accuracy, precision, recall, and F1 score are given below:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × (Precision × Recall) / (Precision + Recall)
3.3 Simulations
The FRCNN model was trained on a high-performance computer with an Intel® Core™ i9-9900K processor, 32 GB of RAM, and an Nvidia GeForce RTX 2080 Ti GPU with 11 GB of GDDR6 memory and 4352 CUDA cores. Training was carried out in the Anaconda development environment, with minimum and maximum training times of 942 and 966 s, respectively. This environment enabled fast training of the FRCNN model.
The Food-101 dataset, which contains 101,000 images, was used to train and validate the FRCNN model. The dataset was divided into training and validation sets, with 75% used for training and 25% used for validation, resulting in 75,750 training images and 25,250 validation images.
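The split figures above are consistent with the per-class counts from the Food-101 dataset (750 training and 250 test images for each of the 101 classes), as a quick sanity check shows:

```python
# Sanity-check the Food-101 split figures quoted in the text.
classes = 101
train_per_class, test_per_class = 750, 250

total = classes * (train_per_class + test_per_class)   # 101,000 images
train_total = classes * train_per_class                # 75,750 images (75%)
test_total = classes * test_per_class                  # 25,250 images (25%)
```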
After training, Fig. 4 shows the training and validation accuracy plots of the FRCNN model, while Fig. 5 shows its training and validation loss.
Table 2 compares FRCNN with other models that use CNN or transfer-learning methodologies on the Food-101 dataset. Below are the results of testing the FRCNN model's ability to perform well on new data, using random food images downloaded from the internet within the Food-101 dataset classes (Fig. 6).
Table 2 Comparing the FRCNN model with other models