Objectives: To develop a highly accurate intrusion detection model that classifies both network-based and host-based intrusions without complexity issues. Method: An optimized Deep Learning (DL) based IDS model is presented in the form of a Hyper-Heuristic Firefly Algorithm based Convolutional Neural Network (HHFA-CNN). The proposed HHFA-CNN reduces false alarms and improves accuracy without increasing complexity. Findings: The proposed HHFA-CNN system is evaluated on two network traffic datasets: NSL-KDD and ISCX-IDS. The outcomes demonstrate that the proposed HHFA-CNN model gives superior performance compared to the other existing models. Novelty: The proposed model employs a novel Hyper-Heuristic Firefly Algorithm for optimizing the hyperparameters of the CNN. The model maintains the standard guidelines of the firefly algorithm and applies a high-level technique for controlling the exploration and selection of low-level heuristics.
With the introduction of advanced technologies in recent years, big data analytics has attained significant interest in various application domains such as medicine, healthcare, education, smart cities, environment analytics, business analytics, data processing and cyber security.
The most common duty of Big Data Cyber Security Analytics (BDCA) is to monitor network and internet traffic to analyze intrusions. Intrusion detection is considered a fundamental security solution, as intrusions pave the way for other malicious events. Malicious cyberattacks lead to serious security degradation, and hence the research community has insisted on the requirement for a novel, adaptive and reliable IDS. Depending upon the detected intrusion behaviors, IDS are classified as network-based IDS (NIDS) and host-based IDS (HIDS).
This paper suggests the use of an optimized deep learning algorithm for accurately identifying attacks in network flow data with a low false positive rate and low complexity. Previously, Hyper-Heuristic Improved Particle Swarm Optimization based Support Vector Machines (HHIPSO-SVM) was proposed for this purpose.
The contributions of this paper are summarized as follows:
Natural Language Processing (NLP) text representation methods are used to process the log files to determine the host-level events. As NLP-based text representation methods identify contextual and semantic similarity from a large amount of unstructured and fragmented text, they enhance the detection accuracy of the IDS model.
A scalable IDS framework has been developed using an effective deep learning approach, HHFA-CNN, to handle the deep characteristics of network-level and host-level events. The collaborative combination of NIDS and HIDS increases complexity, and hence the proposed deep learning HHFA-CNN is introduced in this paper.
The proposed HHFA-CNN based IDS model is applied to benchmark datasets of NIDS and HIDS for conducting the experimental comparisons.
Recent studies have employed different types of deep learning algorithms and ensemble approaches for big data analytics-based intrusion detection. To compete with such IDS approaches, machine learning algorithms combined with optimization algorithms were predominantly employed. Sabar et al.
Due to the limitations of machine learning approaches, including the ELM, researchers have started employing deep learning algorithms for big data cyber security models. Lopez-Martin et al.
Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) neural networks have achieved maximum exposure and increased classification accuracy in IDS models. Xiao et al.
Almiani et al.
Irrespective of their advantages, CNN and LSTM have limitations in jointly learning the spatial and temporal features. Hence, some studies have combined them to increase their effectiveness. Khan et al.
Although the combination of CNN and LSTM has provided better classification accuracy, its main drawback is computational complexity. In some studies, the class imbalance problem is also cited as a limitation. From the literature, it has been found that the optimized CNN provides significantly better accuracy with less complexity. Hence this study focuses on exploring the optimized CNN and suggests the use of advanced optimization algorithms to overcome the limitations of the Genetic Algorithm (GA) based search process.
The proposed HHFA-CNN methodology includes hyper-heuristic modelling of the firefly algorithm for tuning the hyperparameters of the CNN to attain the best structural design of the CNN.
A CNN comprises four main operators, namely the convolution layer (CL), the pooling layer (PL), the fully connected layer (FCL) and the non-linear activation (NLA) function.
The convolution layer forms the major core of the CNN that analyses and extracts the desired features. The convolution task preserves the spatial connections within the input data by extracting features through the kernel function. The outcome of the CL is the convolved feature map. The kernel weights are updated automatically based on the optimal structure configuration. The size of the feature map depends on the depth of the layers.
After the convolution operation, an additional non-linear function is applied before the creation of the feature maps. The NLA can be tanh, sigmoid or the Rectified Linear Unit (ReLU). The NLA acts as an element-wise operation that suppresses the negative values in the features. In most cases, sigmoid or ReLU provides better performance.
Spatial pooling is the sub-sampling or down-sampling process in the CNN, performed to reduce the dimensionality of the feature maps. It is similar to a feature reduction process that removes the less important data while retaining the vital information. The kinds of pooling are average, max, stochastic and sum pooling, denoted by the pooling numbers 1-4. In most cases, max pooling retains the most important features.
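As an illustration (not the paper's implementation), a non-overlapping 2*2 max-pooling step over a feature map can be sketched in a few lines of NumPy:

```python
import numpy as np

def max_pool2d(fmap, size=2):
    """Non-overlapping size x size max pooling over a 2-D feature map."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size               # crop to a multiple of the window
    blocks = fmap[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))                  # max within each pooling window

fmap = np.array([[1., 3., 2., 0.],
                 [4., 2., 1., 5.],
                 [0., 1., 3., 2.],
                 [2., 6., 1., 1.]])
pooled = max_pool2d(fmap)
# pooled -> [[4., 5.], [6., 3.]]
```

Average or sum pooling follow the same pattern with `mean` or `sum` in place of `max`.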
The fully connected layer is a conventional multi-level neural layer employing a SoftMax activation function in the output layer. In the FCL, the nodes of the preceding layer are interlinked with the nodes of the succeeding layer. The complex features yielded by the CL and PL are used by the FCL for labelling the data into classes using the previously learned knowledge.
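The FCL stage can be sketched as a dense layer feeding a SoftMax; this toy NumPy version (with made-up sizes, not the paper's network) shows how the flattened features are mapped to class probabilities:

```python
import numpy as np

def fully_connected_softmax(features, W, b):
    """Dense layer followed by a numerically stable SoftMax over class scores."""
    logits = features @ W + b
    logits -= logits.max()                 # stabilise the exponentials
    exp = np.exp(logits)
    return exp / exp.sum()

rng = np.random.default_rng(0)
features = rng.normal(size=4)              # flattened CL/PL features (toy size)
W = rng.normal(size=(4, 2))                # two classes: normal vs. attack
b = np.zeros(2)
probs = fully_connected_softmax(features, W, b)
# probs is a valid probability distribution over the two classes
```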
Combining all these operators forms the CNN. The hyperparameters used in the CNN are listed in the table below.
Hyperparameter        Range        Difference
Number of CL          1-4          1
Number of PL          1            1
Number of FCL         1-5          1
Hidden units/layer    256-1024     256
Pooling type          1-4          1
Kernel size           1-8          1
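For illustration, the bounded integer search space of the table above can be encoded as a small dictionary and sampled at random; the field names here are ours, not from the paper:

```python
import random

# Search space from the hyperparameter table: (lower bound, upper bound, step).
SEARCH_SPACE = {
    "num_cl":       (1, 4, 1),        # number of convolution layers
    "num_pl":       (1, 1, 1),        # number of pooling layers
    "num_fcl":      (1, 5, 1),        # number of fully connected layers
    "hidden_units": (256, 1024, 256), # hidden units per layer
    "pooling_type": (1, 4, 1),        # 1=average, 2=max, 3=stochastic, 4=sum
    "kernel_size":  (1, 8, 1),
}

def random_configuration(rng=random):
    """Sample one candidate CNN configuration from the bounded integer space."""
    return {name: rng.randrange(lo, hi + 1, step)
            for name, (lo, hi, step) in SEARCH_SPACE.items()}

cfg = random_configuration()
```

Each firefly in the HHFA would carry one such configuration as its position.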
The HHFA is developed by fusing hyper-heuristics with the multi-objective firefly algorithm. The hyper-heuristic framework consists of two strategies, namely the high-level and low-level strategies, for enhancing the optimization function of the firefly algorithm. The low-level strategy explores the problem and forms the rules to select the solutions. Then one or more solutions are considered, combined or modified to form a new set of solutions and generate better options. The high-level strategy initiates the heuristic search process to select solutions from the set of possible solutions based on the rules generated by the low-level strategy.
The low-level heuristics contain the set of problem-related rules generated to provide solutions for each selected problem instance. They form a new set of solutions by considering one or more solutions and transforming or combining them using different search processes. In this study, the FA-based search process is used as one of the search processes to generate new solutions. Once the new solutions are formed, the high-level strategy initiates the selection process. The high-level strategy automatically performs heuristic selection by choosing the heuristics one by one and applying them to the solutions. From the existing set of heuristics formed by the rules generated by the low-level strategy, the heuristics are selected through an online heuristic selection mechanism. The empirical reward and the confidence level are the main variables for measuring the efficiency of the heuristics. The rewards obtained from past performance are called the empirical reward, while the frequency of utilization of the heuristic denotes the confidence level. Using these two variables, a heuristic is deemed fit or unfit for the current state of operation. The selected heuristics are then applied to the solutions through the firefly foraging process.
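The text does not spell out the exact online selection formula, but one common way to combine an empirical reward with a confidence level derived from usage frequency is a UCB-style score; the following sketch is an assumption along those lines, not the paper's mechanism:

```python
import math

def select_heuristic(rewards, uses, total_uses, c=1.4):
    """Pick the low-level heuristic with the best reward/confidence trade-off.

    rewards[i]: cumulative empirical reward of heuristic i (past performance)
    uses[i]:    how often heuristic i was applied (its confidence level)
    """
    def score(i):
        if uses[i] == 0:
            return float("inf")            # always try an unused heuristic once
        exploit = rewards[i] / uses[i]     # average empirical reward
        explore = c * math.sqrt(math.log(total_uses) / uses[i])
        return exploit + explore
    return max(range(len(rewards)), key=score)

# Heuristic 1 has the best average reward, so it is selected.
idx = select_heuristic(rewards=[3.0, 9.0, 2.0], uses=[4, 5, 4], total_uses=13)
# idx -> 1
```

Rarely used heuristics get an exploration bonus, so the selector keeps re-testing heuristics whose confidence level is still low.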
The heuristics are initialized as the population of fireflies xᵢ (i = 1, 2, …, n). The light intensity I of a firefly decreases with the distance r from its source and is modelled as

I(r) = I₀ e^(−γr²)

Here I₀ is the light intensity at the source and γ is the light absorption coefficient.
The attractiveness β of a firefly is proportional to the light intensity seen by adjacent fireflies and is defined as

β(r) = β₀ e^(−γr²)

Where β₀ is the attractiveness at distance r = 0.
In CNN optimization, the computational complexity must be reduced, which means the resource utilization must be low. So, the attractiveness expression is modified into a computationally cheaper form for the practical application, as given below:

β(r) = β₀ / (1 + γr²)
The distance between any two fireflies (nodes) i and j positioned at xᵢ and xⱼ is the Cartesian distance

rᵢⱼ = ‖xᵢ − xⱼ‖ = √( Σₖ (xᵢₖ − xⱼₖ)² )

Where xᵢₖ is the k-th component of the position xᵢ and the sum runs over the d dimensions of the problem.
The firefly moves towards the brighter firefly, and its location is updated after each iteration using the following equation:

xᵢ = xᵢ + β₀ e^(−γrᵢⱼ²) (xⱼ − xᵢ) + α εᵢ

Here α is the randomization parameter and εᵢ is a vector of random numbers drawn from a Gaussian or uniform distribution.
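A single movement step following the standard firefly update can be sketched in NumPy (the parameter values are illustrative, not the paper's settings):

```python
import numpy as np

def move_firefly(x_i, x_j, beta0=1.0, gamma=1.0, alpha=0.2, rng=None):
    """One firefly movement step: firefly i is pulled towards brighter firefly j.

    Implements x_i <- x_i + beta0*exp(-gamma*r^2)*(x_j - x_i) + alpha*eps,
    the standard firefly update.
    """
    rng = rng or np.random.default_rng(0)
    r = np.linalg.norm(x_i - x_j)                  # Cartesian distance r_ij
    beta = beta0 * np.exp(-gamma * r ** 2)         # attractiveness at distance r
    eps = rng.uniform(-0.5, 0.5, size=x_i.shape)   # randomisation term
    return x_i + beta * (x_j - x_i) + alpha * eps

x_i = np.array([0.0, 0.0])
x_j = np.array([0.5, 0.5])                         # the brighter firefly
x_new = move_firefly(x_i, x_j)
# x_new lies closer to x_j than x_i did
```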
The heuristic is applied to each of the solutions obtained by the firefly based on the light intensity and the attractiveness of the firefly algorithm. The firefly that is returned as the global best solution contains the solution to be applied. The heuristic is applied to the selected solution to form a new set of solutions. In this stage, serial scheduling and double justification are used. Serial scheduling is used to select the solutions without interleaving the feasible solutions. Likewise, double justification is a simple local search technique that searches the solutions with exact shifting to control the search quality. The new solutions are compared and then analysed by their properties. This analysis of the configurations determines whether to include them in the existing set of solutions or terminate them to accommodate newer solutions in the next iterations.
After the formation of new solutions by the low-level heuristics and the selection by the high-level strategy, the solutions are saved in the non-dominated set of solutions in the archive. A non-dominated sorting procedure is used to classify the archive into several levels for saving the newer solutions. The first level is given to the solutions with the highest priority, the next level to the second-best priority, and so on. The HHFA selects the solutions from this archive based on the Pareto front and returns the best configuration as the final solution. Algorithm 1 summarizes the steps involved in HHFA.
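A minimal sketch of non-dominated sorting for levelling the archive, assuming two minimised objectives (error rate plus a hypothetical complexity measure; the paper does not name the second objective):

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and better in at least
    one (all objectives are minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(solutions):
    """Split solutions into priority levels: level 0 is the Pareto front."""
    levels, remaining = [], list(solutions)
    while remaining:
        front = [s for s in remaining
                 if not any(dominates(o, s) for o in remaining if o is not s)]
        levels.append(front)
        remaining = [s for s in remaining if s not in front]
    return levels

# (error rate, relative complexity) for four candidate configurations
archive = [(16.3, 0.4), (16.7, 0.3), (17.1, 0.9), (16.6, 0.8)]
levels = non_dominated_sort(archive)
# levels[0] -> [(16.3, 0.4), (16.7, 0.3)]  (the Pareto front)
```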
Algorithm 1. HHFA

Begin
  Initialize the population of fireflies
  Assign heuristics as fireflies
  The light intensity Iᵢ at xᵢ is determined by f(xᵢ)
  Set the light absorption coefficient γ
  Evaluate the fireflies to determine the fitness
  While (m < Max_Generation)
    For i = 1 : n (all n fireflies)
      For j = 1 : i (all n fireflies)
        Call the j-th low-level heuristic of the firefly search space
        Apply serial scheduling and double justification
        If (Iⱼ > Iᵢ)
          Move firefly i towards j in d dimensions
        End if
        Estimate new solutions and update the light intensity
        Update the locations of the fireflies
      End for j
    End for i
    Check the stopping criteria
    Update the firefly ranking list to determine the current best
  End while
  Return the best firefly
End
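Stripping away the hyper-heuristic layer (no serial scheduling or double justification), the core firefly loop of Algorithm 1 can be sketched on a toy objective; this is an illustration only, with assumed parameter values:

```python
import numpy as np

def firefly_minimise(f, dim=2, n=15, generations=40,
                     beta0=1.0, gamma=1.0, alpha=0.1, seed=1):
    """Plain firefly algorithm on a toy objective (lower f = brighter firefly)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-2, 2, size=(n, dim))          # firefly positions
    fitness = np.array([f(x) for x in X])          # brightness ~ inverse fitness
    for _ in range(generations):
        for i in range(n):
            for j in range(n):
                if fitness[j] < fitness[i]:        # j is brighter: move i towards j
                    r2 = np.sum((X[i] - X[j]) ** 2)
                    beta = beta0 * np.exp(-gamma * r2)
                    X[i] += beta * (X[j] - X[i]) + alpha * rng.uniform(-0.5, 0.5, dim)
                    fitness[i] = f(X[i])
        alpha *= 0.97                              # gradually damp the random walk
    best = int(np.argmin(fitness))
    return X[best], fitness[best]

x_best, f_best = firefly_minimise(lambda x: np.sum(x ** 2))   # sphere function
# f_best should be far below a typical random guess in the domain
```

Note that the current best firefly never moves, so the best fitness found is non-increasing across generations.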
The CNN architecture is represented as a vector of its structural hyperparameters: the numbers of CL, PL and FCL layers, the hidden units per layer, the pooling type and the kernel size.
For the optimal selection of the CNN hyperparameters, each solution is made up of the problem parameters subject to optimization by the firefly search process of exploitation (intensification) and exploration (diversification). The exploitation in HHFA is controlled by the values assigned to the attractiveness parameter β₀ and the absorption coefficient γ, while the randomization parameter α drives the exploration.
The hyperparameters problem is encoded as a solution vector

x = (x₁, x₂, x₃, x₄, x₅, x₆)

Here x₁ to x₆ denote the number of CLs, the number of PLs, the number of FCLs, the hidden units per layer, the pooling type and the kernel size, respectively, each bounded by the ranges in the hyperparameter table.
In this study, hyperparameters like the dropout rate, the learning rate, etc. are not optimized, as they mostly take real values. Only the hyperparameters that take integer values are optimized using HHFA. As the upper and lower bounds for each parameter are set high, i.e. greater than 1, the equations (1) to (7) depicted in HHFA can be adaptively used for the CNN optimization problem. The classification error rate is used as the fitness function. The objective is to minimize the error rate while calculating the fitness for the i-th solution, which can be expressed as

f(xᵢ) = (number of misclassified samples / total number of samples) × 100

Here f(xᵢ) denotes the fitness (classification error rate) of the i-th candidate CNN configuration.
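The error-rate fitness can be sketched directly:

```python
def classification_error_rate(y_true, y_pred):
    """Fitness of a candidate CNN: percentage of misclassified samples.
    HHFA minimises this value when ranking configurations."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return 100.0 * wrong / len(y_true)

# 2 wrong predictions out of 8 -> 25.0% error rate
err = classification_error_rate([0, 1, 1, 0, 1, 0, 0, 1],
                                [0, 1, 0, 0, 1, 1, 0, 1])
# err -> 25.0
```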
Layer type            Configuration                              Kernel size   Error rate
CNN configuration 1   CL: 2 layers; PL: max pooling;             1*1           16.3
                      FCL: 3 layers, 512 units                   2*2           16.7
                                                                 3*3           16.6
CNN configuration 2   CL: 2 layers; PL: max pooling;             1*1           16.7
                      FCL: 3 layers, 256 units                   2*2           17.3
                                                                 3*3           17.2
CNN configuration 3   CL: 2 layers; PL: max pooling;             1*1           16.8
                      FCL: 3 layers, 512 units                   2*2           17.1
                                                                 3*3           16.9
CNN configuration 4   CL: 3 layers; PL: max pooling;             1*1           17.1
                      FCL: 2 layers, 1024 units                  2*2           17.8
                                                                 3*3           17.5
The configurations are obtained such that the CL, PL and FCL settings are fixed while the kernel size is varied to obtain three different error rates. The CNN can extract the spatial features by setting many kernels of varying sizes. The most common kernels are the 1*1, 2*2 and 3*3 convolution kernels, among which the 2*2 and 3*3 kernels learn the features accurately while the 1*1 kernel helps in increasing the learning rate. Considering the configurations in the above table, the CNN configuration with the lowest classification error is chosen by the HHFA. The best performance was obtained after only 13 to 18 iterations in all conducted HHFA runs. In this case, CNN configuration 1 has the lowest classification error of 16.3 when the 1*1 kernel is used, and hence it becomes the optimal CNN architecture. This optimal CNN improves the classification accuracy on the intrusion datasets.
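Selecting the optimal architecture then reduces to taking the minimum over the table entries; a sketch using the error rates above:

```python
# Error rates from the table above: configuration -> {kernel size: error rate}
ERROR_RATES = {
    "config 1": {"1*1": 16.3, "2*2": 16.7, "3*3": 16.6},
    "config 2": {"1*1": 16.7, "2*2": 17.3, "3*3": 17.2},
    "config 3": {"1*1": 16.8, "2*2": 17.1, "3*3": 16.9},
    "config 4": {"1*1": 17.1, "2*2": 17.8, "3*3": 17.5},
}

def best_configuration(table):
    """Return the (configuration, kernel, error) triple with the lowest error."""
    return min(((cfg, kernel, err)
                for cfg, kernels in table.items()
                for kernel, err in kernels.items()),
               key=lambda t: t[2])

best_cfg, best_kernel, best_err = best_configuration(ERROR_RATES)
# -> ("config 1", "1*1", 16.3), matching the architecture chosen by HHFA
```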
The assessment of the suggested HHFA-CNN prototype is carried out using two benchmark cases of cyber security problems, the NSL-KDD and ISCX-IDS datasets. The tests are performed in MATLAB R2016b on a 64-bit Windows machine with an Intel Core i5-3470 3.2 GHz processor, 4 GB DDR3 RAM and 500 GB Intel SSD storage. The two benchmark instances are collected from https://www.unb.ca/cic/datasets/index.html.
The NSL-KDD dataset consists of training, testing, 20% training and 20% testing data files. It also contains a subset file with difficulty levels. NSL-KDD is an improved version of the popular KDD CUP 99 dataset. The NSL-KDD problem instance consists of 311,027 training samples and 77,289 testing samples, which are classified as either normal or malicious. ISCX-IDS was created by monitoring network activity for 7 days, from Friday 11/6/2010 to Thursday 17/6/2010. It consists of records of normal traffic, HTTP Denial of Service attacks, brute force attacks and infiltration activities. Around 208,667 training samples and 78,400 testing samples, classified as either normal or attack activities, are used for this evaluation.
The proposed HHFA-CNN is implemented along with the existing HHIPSO-SVM, HHSVM and other baseline models for comparison on the NSL-KDD dataset.
Algorithm      Accuracy (%)   Precision (%)   Recall (%)   F-measure (%)   Time (seconds)
HHSVM          89.76          67.10           62.81        62.22           4.65
HHIPSO-SVM     93.33          73.99           64.29        68.37           2.55
HHFA-CNN       96.6667        93.9394         74           82.7860         1.38
DT             80.14          72.33           61.25        85.12           5.62
FC             82.98          74              60.28        61.35           6.58
GNBT           80             69              70.23        76.52           5.35
It can be seen that the performance values of HHFA-CNN are higher than those of HHIPSO-SVM and HHSVM. HHFA-CNN has 96.6667% accuracy, which is 3.3% and 6.9% higher than HHIPSO-SVM and HHSVM, respectively. Likewise, HHFA-CNN has outperformed both HHIPSO-SVM and HHSVM in terms of precision, recall and F-measure. HHFA-CNN has 20% and 26.9% higher precision, 9.7% and 11.2% higher recall, and 14.5% and 20.5% higher F-measure than the HHIPSO-SVM and HHSVM models, respectively. The execution time taken by HHFA-CNN is also less than that of HHIPSO-SVM and HHSVM.
Algorithm      Accuracy (%)   Precision (%)   Recall (%)   F-measure (%)   Time (seconds)
HHSVM          86.6           63.3            60.0         56.19           126
HHIPSO-SVM     92.4           69.65           61.1         59.82           49.5
HHFA-CNN       93.33          99.7            93.33        96.55           48.2
Similar to NSL-KDD, the performance obtained on ISCX-IDS shows that HHFA-CNN has outperformed the HHIPSO-SVM and HHSVM models. HHFA-CNN has 0.97% and 6.7% higher accuracy, 30% and 36.3% higher precision, 32.2% and 33.3% higher recall, and 36.7% and 40.4% higher F-measure than the HHIPSO-SVM and HHSVM models, respectively. HHFA-CNN also consumes 1.3 seconds and 77.8 seconds less time than the HHIPSO-SVM and HHSVM models, respectively, for executing the ISCX-IDS data.
The performance of the proposed HHFA-CNN is also compared with other popular algorithms from the literature that were tested on the NSL-KDD dataset, namely HHSVM, SVM-IBGWO, DRL, Multi-CNN, GA-CNN, DRNN, D-LSTM, CNN-LSTM and HHIPSO-SVM. Their accuracy values are compared in the table below.
Algorithm      Accuracy (%)
HHSVM          89.76
SVM-IBGWO      96
DRL            89.78
Multi-CNN      86.95
GA-CNN         98.2
DRNN           92.18
D-LSTM         86.99
CNN-LSTM       96.47
HHIPSO-SVM     93.33
HHFA-CNN       96.6667
From the above comparison, it can be observed that the proposed HHFA-CNN achieves accuracy competitive with the best of the compared models.
In this study, a hyper-heuristic firefly optimization is presented for improving the CNN design to solve big data intrusion detection problems. In the first part, the CNN design problem is modelled as a multi-objective optimization problem based on the hyperparameters. This problem is addressed by adopting the proposed HHFA structure, which applies the high-level strategy and low-level heuristics of the hyper-heuristic methodology to the standard firefly optimization. The proposed HHFA-CNN system was assessed on two network traffic datasets: NSL-KDD and ISCX-IDS. The outcomes demonstrated that the proposed HHFA-CNN model gives superior performance compared to the other existing models. In the future, the proposed hyper-heuristic system can be used for multi-class attack detection. Also, other cyber security instances such as UNSW-NB15 will be tested. Moreover, the impact of feature dimension reduction techniques will also be investigated.