IDENTIFYING OF SECURITY THREAT IN THE NETWORK USING ML TECHNIQUES by IRJET Journal

IDENTIFYING OF SECURITY THREAT IN THE NETWORK USING ML TECHNIQUES

Nelluri Raja Sekhar2, Bandlamudi Sarath3, Konakanchi Sai teja4, Gokarla Syam Sundhar5, Mr. B. Kalyan Chakravarthy6, M.Tech

2,3,4,5 UG Students, Department of IT; 6 Associate Professor

Vasireddy Venkatadri Institute of Technology, Nambur, Guntur Dt., Andhra Pradesh.

Abstract

Effective methods for identifying malicious activity in computer networks are in greater demand as cyberattacks become more complex and diverse. In this paper, a unique machine learning approach to network intrusion detection is presented. We provide a multi-phase system that includes the steps of feature selection, extraction, and classification. The suggested framework analyses network traffic data and looks for patterns of suspicious behaviour using a variety of statistical and machine learning algorithms. Experiments carried out on a real-world dataset show how effective the suggested method is. The findings demonstrate that a variety of network assaults, such as Denial of Service (DoS), Remote to Local (R2L), User to Root (U2R), and probing attacks, may be reliably identified by our method. Additionally, our method performs better than some cutting-edge methods.

Keywords: XGBoost, LSTM, SMOTE, NSL-KDD, Network Attack, Machine Learning.

1.Introduction

The proliferation of cyber threats and harmful actions can be attributed to the fast expansion of computer networks and the growing dependence on technology. In order to protect their networks and sensitive data, businesses are now very concerned with identifying and stopping these operations. Intrusion detection systems (IDS) and firewalls, two common forms of network security, are not very good at detecting and neutralizing such threats. As a result, more sophisticated and effective methods of identifying and stopping harmful activity are required. A promising method for identifying and stopping harmful activity in computer networks is machine learning. Large amounts of network traffic data can be analysed by machine learning algorithms, which can then be used to spot trends and abnormalities that might point to possible malicious activity.

Next, the model classifies network traffic into two types: malicious and regular. The efficacy of the suggested methodology is assessed using a dataset comprising diverse network assaults, showcasing the capability of machine learning to identify and avert malevolent actions within computer networks.

2.Objective

This paper's primary goal is to suggest a machine learning-based method for identifying harmful activity in computer networks. The strategy analyses network traffic data using machine learning techniques in order to spot trends and abnormalities that might point to possible hostile activity. This research specifically aims to develop a machine learning model capable of reliably classifying network data into two categories: harmful and normal. Further objectives are:

1. Assessing the suggested method's efficacy using a dataset made up of different network attacks.

2. Evaluating how well the suggested method performs in comparison to more established methods of network security, like intrusion detection systems (IDS) and firewalls.

3. Providing information about how machine learning may be used to identify and stop dangerous activity in computer networks.

3.Related Work

Numerous investigations have been carried out to identify malevolent actions within computer networks using machine learning methods and the NSL-KDD dataset. In this section, we review a few recent studies that have employed the NSL-KDD dataset and the XGBoost and LSTM algorithms to detect harmful activity in computer networks.

One study suggested utilizing the XGBoost algorithm in conjunction with machine learning to identify network assaults. The study trained the model using a variety of features taken from network traffic data. The XGBoost model was then applied to categorize network traffic as harmful or legitimate.

Another study suggested utilizing the LSTM algorithm in conjunction with deep learning to identify network assaults. In order to train the LSTM model and model the network traffic data, the study used a time series-based methodology.

International Research Journal of Engineering and Technology (IRJET)

Volume: 11 Issue: 03 | Mar 2024 www.irjet.net

Overall, by utilizing the NSL-KDD dataset, these studies show how machine learning and deep learning methods can be used to identify harmful activity in computer networks. Future studies can investigate the application of additional machine learning and deep learning algorithms to further enhance the performance of network intrusion detection systems. Both the XGBoost and LSTM algorithms have demonstrated promising results in detecting various forms of network intrusions.

4.Dataset Description

A benchmark dataset that is frequently used in studies assessing intrusion detection systems is the NSL-KDD dataset. It was developed in response to the shortcomings of the initial KDD Cup 1999 dataset, which had a number of problems such as duplicate entries, an unbalanced class distribution, and erroneous assumptions.

A modified version of the KDD Cup 1999 dataset, the NSL-KDD dataset contains a variety of network attack techniques, including DoS, probing, and user-to-root attacks. There are 41 features in all in the dataset: 7 nominal characteristics and 34 numerical features. With 125,973 instances in the training set and 22,544 instances in the testing set, the dataset is split into training and testing sets.

TABLE I:List of NSL-KDD dataset files and their descriptions

The NSL-KDD dataset consists of several files, including the following:

S.No. | File name | Description
1 | KDDTrain+.txt | The training data, comprising 42 columns with 41 characteristics and one class label, and 125,973 instances overall, are contained in this file.
2 | KDDTest+.txt | The testing data, comprising 42 columns with 41 characteristics and one class label, and 22,544 instances overall, are contained in this file.
3 | KDDTrain+_20Percent.txt | For quicker experimentation, a 20% randomly sampled subset of the KDDTrain+.txt file with a total of 25,294 instances and 42 columns is included in this file.
4 | KDDTest21.txt | The testing data for 21 different attack types, including DoS, U2R, R2L, and probing attacks, is contained in this file. This file is used to assess how well intrusion detection algorithms work against different kinds of attacks.
5 | KDDTest10Percent.txt | For quicker testing, this file includes a 10% randomly picked portion of the KDDTest+.txt file, which has 2,255 instances and 42 columns overall.
6 | KDDTest21Percent.txt | This file includes a randomly selected 21% subset of the KDDTest+.txt file with 21 different attack types, such as probing, DoS, U2R, and R2L assaults.
7 | KDDTest10Percent21.txt | For quicker testing, this file includes a 10% randomly picked portion of the KDDTest-21.txt file, which has 2,226 instances and 42 columns overall.

p-ISSN:2395-0072
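The file layout described in Table I can be illustrated with a minimal parsing sketch. It assumes comma-separated records with 41 feature fields followed by the class label, and that the first four fields are duration, protocol type, service, and flag (the usual NSL-KDD ordering, stated here as an assumption). The function name `parse_records` and the two sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io

N_FEATURES = 41  # per Table I: 41 characteristics plus one class label

def parse_records(text):
    """Split comma-separated NSL-KDD-style lines into (features, label) pairs."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if len(row) < N_FEATURES + 1:
            raise ValueError(
                f"expected at least {N_FEATURES + 1} fields, got {len(row)}"
            )
        features = row[:N_FEATURES]
        label = row[N_FEATURES]  # any extra trailing fields are ignored
        records.append((features, label))
    return records

# Two made-up rows (not real dataset records), built programmatically so the
# field count is exact: duration, protocol_type, service, flag, then 37 zeros.
normal_row = ",".join(["0", "tcp", "http", "SF"] + ["0"] * 37 + ["normal"])
attack_row = ",".join(["0", "icmp", "ecr_i", "SF"] + ["0"] * 37 + ["smurf"])

records = parse_records(normal_row + "\n" + attack_row + "\n")
print(len(records), records[1][0][1], records[1][1])
```

In practice the same function could be pointed at the contents of KDDTrain+.txt or KDDTest+.txt; rejecting short rows early makes a mismatched column count visible before any model is trained.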

TABLE II:Mapping of Attack Class with Attack Type

The NSL-KDD dataset contains several types of attacks. According to the attack targets, they can be divided into four categories:

Attack Class | Description
Denial-of-Service (DoS) attacks | By flooding networks with traffic or other kinds of demands, these attacks seek to interfere with the availability of network resources. The NSL-KDD dataset contains a variety of DoS attack types, including ICMP, UDP, and SYN floods.
User-to-Root (U2R) attacks | These exploits target user account vulnerabilities in order to obtain unauthorized access to a system. The NSL-KDD dataset contains many types of U2R attacks, such as buffer overflow, loadmodule, and perl.
Remote-to-Local (R2L) attacks | These attacks are designed to take advantage of weaknesses in the remote user's account in order to obtain unauthorized access to a system. The R2L attacks in the NSL-KDD dataset include ftp_write, guess_passwd, and imap.
Probing attacks | By sending packets to different ports and protocols, these attacks seek to learn as much as possible about a system in order to find flaws. Numerous probing attack types, including portsweep, nmap, and satan, are included in the NSL-KDD dataset.

International Research Journal of Engineering and Technology (IRJET) e-ISSN:2395-0056

There are twenty-three different types of assaults in the NSL-KDD dataset: four types of R2L attacks, three types of U2R attacks, two types of probing attacks, and fourteen types of DoS attacks. For the purpose of representing actual network activity, the collection also contains examples of normal traffic.
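The mapping from attack type to attack class in Table II can be sketched as a simple lookup. The U2R, R2L, and probing labels below are the ones named in the table. The three DoS labels (neptune, smurf, teardrop) are commonly used NSL-KDD label names added here as an assumption, since the table only describes DoS attacks by flood type. The function names are hypothetical.

```python
# Attack-type labels named in Table II, plus three DoS labels
# (neptune, smurf, teardrop) that are an assumption of this sketch.
ATTACK_CLASS = {
    "neptune": "DoS", "smurf": "DoS", "teardrop": "DoS",
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R",
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L",
    "portsweep": "Probe", "nmap": "Probe", "satan": "Probe",
}

def to_attack_class(label):
    """Map a raw label to one of the four attack classes, or 'normal'."""
    label = label.strip().lower()
    if label == "normal":
        return "normal"
    # Labels missing from the lookup are flagged rather than silently guessed.
    return ATTACK_CLASS.get(label, "unknown")

def to_binary(label):
    """Collapse a raw label to the two-way target: 0 = normal, 1 = harmful."""
    return 0 if to_attack_class(label) == "normal" else 1

labels = ["normal", "neptune", "guess_passwd", "satan", "perl"]
print([to_attack_class(name) for name in labels])
print([to_binary(name) for name in labels])
```

A full lookup would need an entry for every attack type present in the files; returning "unknown" for missing entries keeps gaps in the table visible.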

5.System Implementation

The system implementation for malicious activity detection in networks typically involves the following steps:

1. Data preprocessing: In order to prepare the NSL-KDD dataset for machine learning methods, this stage entails cleaning and preparing it. This could entail turning categorical features into numerical representations, eliminating superfluous features, and balancing the class distribution.

2. Feature Selection: In order to train the machine learning models, this phase entails picking the most pertinent features from the preprocessed dataset. This enhances the model's performance and lowers the dataset's dimensionality.

3. Model training: Using the preprocessed and chosen features, the machine learning models are trained in this step. XGBoost and LSTM are the two models employed for this system. While LSTM is a kind of recurrent neural network that can handle sequential input, XGBoost is a gradient boosting approach that makes use of decision trees.


4. Model evaluation: In this step, the testing dataset is used to assess how well the trained models perform. The models' effectiveness in detecting malicious activities is assessed using a variety of performance indicators, including accuracy, precision, recall, and F1 score.

5. Model tuning: In order to maximize the models' performance, this stage entails adjusting their hyperparameters. Hyperparameters, such as the learning rate, number of trees, and number of hidden layers, are those that are not learned during training. Grid search or other optimization algorithms can be used to determine the optimal hyperparameters.

6. System integration: In this step, the trained models are integrated into a wider intrusion detection system. Real-time network traffic data analysis is possible with the models, which can also be used to notify security staff of any questionable activity.
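As an illustration of the preprocessing step (step 1), the sketch below turns a categorical feature such as the protocol type into a numerical one-hot representation. It is a minimal standard-library version written for this explanation, not the encoder used in the experiments; the helper names `fit_one_hot` and `one_hot` are hypothetical.

```python
def fit_one_hot(values):
    """Learn a stable category -> column index mapping from training values."""
    return {cat: idx for idx, cat in enumerate(sorted(set(values)))}

def one_hot(value, mapping):
    """Encode one value; categories unseen during fitting become all zeros."""
    vec = [0] * len(mapping)
    if value in mapping:
        vec[mapping[value]] = 1
    return vec

train_protocols = ["tcp", "udp", "icmp", "tcp"]
mapping = fit_one_hot(train_protocols)
encoded = [one_hot(p, mapping) for p in train_protocols]

print(mapping)     # {'icmp': 0, 'tcp': 1, 'udp': 2}
print(encoded[0])  # [0, 1, 0]
```

Fitting the mapping on the training data only, and reusing it for the test data, keeps the column layout identical across both sets.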

The overall goal of this system implementation is to use machine learning techniques, XGBoost and LSTM, trained on the NSL-KDD dataset, to increase the precision and effectiveness of hostile activity identification in network traffic.
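The evaluation and tuning steps can be sketched together. The first function computes accuracy, precision, recall, and F1 score from binary predictions; the second performs an exhaustive grid search over a parameter grid. The labels, scores, and the tuned parameter (a decision threshold standing in for hyperparameters such as the learning rate or number of trees) are made up for illustration, and the function names are hypothetical.

```python
from itertools import product

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1} (1 = malicious)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def grid_search(param_grid, score_fn):
    """Try every combination in param_grid and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Made-up ground truth and model scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

def f1_at(params):
    """Score one parameter setting by the F1 it yields on the toy data."""
    y_pred = [1 if s >= params["threshold"] else 0 for s in scores]
    return binary_metrics(y_true, y_pred)["f1"]

print(binary_metrics(y_true, [1 if s >= 0.5 else 0 for s in scores]))
print(grid_search({"threshold": [0.25, 0.5, 0.75]}, f1_at))
```

With a real model, `score_fn` would train the model with the given hyperparameters and return a validation metric, so the tuning never touches the test set.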

6.Prerequisites

The following are the prerequisites for implementing malicious activity detection in networks:

1. Python Programming Language: Python is a well-liked machine learning programming language. It offers a number of frameworks and tools, including Keras, pandas, NumPy, scikit-learn, and TensorFlow, which are essential for implementing machine learning algorithms.

2. Google Colab Notebook: A hosted Jupyter notebook service, Google Colab allows users to create and share documents with live code, mathematics, graphics, and narrative text. It offers an interactive platform for machine learning and data analysis, which facilitates model implementation and NSL-KDD dataset exploration.

3. Scikit-learn Library: A Python machine learning toolkit called Scikit-learn offers a number of methods for dimensionality reduction, regression, clustering, and classification. It also contains tools for feature selection, data preprocessing, and model evaluation, which are needed to implement harmful activity detection using the NSL-KDD dataset.

4. XGBoost Library: The gradient boosting algorithm is efficiently implemented by the open-source software library XGBoost. It can handle big datasets with millions of samples and thousands of attributes and is made to be extremely scalable.

5. LSTM Architecture: In order to use the NSL-KDD dataset to develop the LSTM model for malicious activity detection, it is necessary to comprehend the architecture of LSTM and its application in sequence modeling.

6. Familiarity with Machine Learning Concepts: To perform malicious activity detection utilizing the NSL-KDD dataset and machine learning methods, one must have a fundamental understanding of machine learning principles such as supervised and unsupervised learning, feature engineering, model selection, and evaluation.

Fig 1. NSL-KDD Dataset
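To make the LSTM architecture mentioned in item 5 more concrete, the sketch below implements one time step of a standard LSTM cell with a single hidden unit: input, forget, and output gates plus a candidate value that update the cell state and hidden state. It is a teaching sketch using the standard library only, not the Keras or TensorFlow implementation, and all weights and inputs are made-up values.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step for a single hidden unit.

    x: list of input features at this time step
    h_prev, c_prev: previous hidden state and cell state (floats)
    W: gate name -> list of input weights, U: gate name -> recurrent weight,
    b: gate name -> bias, for the gates "i", "f", "o" and the candidate "g".
    """
    def pre(gate):
        return (sum(w * xi for w, xi in zip(W[gate], x))
                + U[gate] * h_prev + b[gate])

    i = sigmoid(pre("i"))    # input gate: how much new information to store
    f = sigmoid(pre("f"))    # forget gate: how much old cell state to keep
    o = sigmoid(pre("o"))    # output gate: how much of the cell to expose
    g = math.tanh(pre("g"))  # candidate cell value
    c = f * c_prev + i * g   # updated cell state
    h = o * math.tanh(c)     # updated hidden state
    return h, c

# Made-up weights for two input features.
W = {"i": [0.5, -0.2], "f": [0.1, 0.3], "o": [0.4, 0.4], "g": [0.7, -0.5]}
U = {"i": 0.1, "f": 0.2, "o": 0.3, "g": 0.4}
b = {"i": 0.0, "f": 1.0, "o": 0.0, "g": 0.0}

# Run the cell over a short made-up sequence, carrying the state forward.
sequence = [[0.1, 0.0], [0.4, 1.0], [0.9, 1.0]]
h, c = 0.0, 0.0
for x in sequence:
    h, c = lstm_step(x, h, c, W, U, b)
print(round(h, 4), round(c, 4))
```

Because the cell state is carried from one step to the next, the output at the last step depends on the whole sequence, which is what lets an LSTM pick up temporal dependencies in traffic data.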

7.Results

All tests were carried out with Google Colab, and no further preprocessing is needed because the data have already been cleansed. An eighty percent split of the data is used, and the SMOTE, XGBoost, and LSTM algorithms are applied.
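SMOTE balances the classes by creating synthetic minority-class samples on the line segments between existing minority samples and their nearest minority neighbours. The sketch below shows that interpolation idea in simplified form. It is written for this explanation with the standard library and is not the implementation used in the experiments; the function name `smote_like`, the parameter defaults, and the sample points are all hypothetical.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by neighbour interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])  # one of the k nearest neighbours
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Made-up two-feature minority samples (for example, rare U2R records).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(minority, n_new=5, k=2)
print(len(new_points))
for point in new_points:
    print([round(v, 3) for v in point])
```

Oversampling of this kind is normally applied to the training portion only, so that the test set still reflects the original class distribution.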

Fig.7.1.Number of records per protocol type in the NSL-KDD dataset. The dataset contains three protocol types: udp, tcp, and icmp.


Fig.7.2.Plot of service types

Fig.7.3.attack plot


Confusion Matrix and Classification Report for XGBoost classifier:

Fig.7.4.Confusion matrix of XG-Boost model

Fig.7.5.Classification report of XG-Boost model

Confusion Matrix and Classification Report for LSTM Classifier:

Fig.7.6.Confusion matrix of LSTM model

Fig.7.7.Classification report of LSTM model

Accuracy comparison of the models:

The model accuracy of XG-Boost and LSTM using the whole test and training set of the NSL-KDD dataset is compared in Fig.7.8.

Fig.7.8.Accuracy comparison of models

8.Conclusion

In conclusion, the suggested approach to identify harmful activity in networks through the use of the XGBoost and LSTM machine learning algorithms exhibits encouraging outcomes. Through the use of feature selection methods and two different types of models, high accuracy, precision, recall, and F1-score are achieved in detecting many forms of assaults, such as DoS, U2R, R2L, and probing. When it comes to AUC-ROC score and computational efficiency, the XGBoost model performs better than the LSTM model; however, when it comes to identifying temporal dependencies in the data, the LSTM model performs better than the XGBoost model. The outcomes demonstrate how the two models' complementary qualities can be used to further enhance detection performance.

In order to identify and stop harmful activity and enhance network security overall, the suggested strategy can be implemented in a real-time network environment. With an accuracy rate higher than that of the LSTM method, the XGBoost algorithm has a superior classification effect.

9.Future Enhancements

Some potential future enhancements for malicious activity detection in networks using the NSL-KDD dataset and the XGBoost and LSTM machine learning algorithms include:


1. Incorporating real-time data streaming: The dynamic nature of network traffic is not reflected in the static NSL-KDD dataset. Real-time data streaming integration can assist in identifying and addressing attacks in real time, lowering the potential damage caused by malicious activities.

2. Investigating other datasets: Although the NSL-KDD dataset is frequently used for intrusion detection, other datasets, such as UNSW-NB15 and CICIDS2017, can also be utilized to assess how well the suggested method works. Examining additional datasets can assist in verifying the detection system's robustness and generalizability.
