ML algorithms such as Support Vector Machines (SVM), Decision Trees, Random Forests, and Convolutional Neural Networks for image classification tasks are a few examples of supervised classification approaches. These algorithms are used to automate a variety of security-critical tasks, including malware detection, traffic forecasting for unmanned vehicles, real-time object detection, online fraud detection, and many more. In supervised learning, the training dataset provides the model with patterns of input data and matching class labels. The quality of the training dataset determines how well the machine learning model performs. However, traditional machine learning architectures are susceptible to potential vulnerabilities and lack security measures.
Data poisoning attacks have become prevalent in recent years, specifically targeting machine learning algorithms used for supervised classification problems. Fahri Anıl Yerlikaya et al. ^{1} have demonstrated, by testing their resilience, that several machine learning and deep learning models used on spam, malware, and cancer detection datasets are susceptible to such attacks. ML attackers deploy a variety of attack methods to undermine the effectiveness of the model, breach security measures aimed at the availability and integrity of the dataset, and ultimately compromise the end-to-end dependability of the ML model. According to a recent survey by the authors of ^{2}, data poisoning attacks rank first among ML threats that significantly impact industrial machine learning applications. Additionally, identifying these attacks is challenging.
Training datasets are the targets of data poisoning attacks. Such an attack puts the integrity of the dataset at risk by inserting false or attack samples into the training data. Neither a machine learning algorithm nor a human can distinguish these samples from clean ones. When such misleading instances exist in the training data, the model learns erroneous decision boundaries, which reduces the efficiency of the machine learning process and its outcomes. Poisoning attack samples can be constructed using a number of methods and incorporated into a training set for offensive purposes. This work focuses exclusively on gradient-based techniques for contaminating the training dataset. Gradient-based techniques include, for instance, Madry et al.'s attack (MAD), the Basic Iterative Method (BIM), the Momentum Iterative Method (MIM), the Carlini and Wagner attack (CW), Projected Gradient Descent (PGD), and the Fast Gradient Sign Method (FGSM) ^{3}. These techniques take the original training samples from the dataset and use a specific method to artificially alter the selected samples into a poisoned dataset. Poisoned datasets are those that contain contaminated samples; non-poisoned or clean datasets are those that do not.
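As an illustration of how a gradient-based technique perturbs a training sample, the following is a minimal FGSM-style sketch. The function name and toy values are hypothetical, not from the paper; a real attack would use the loss gradient computed against the victim model.

```python
import numpy as np

def fgsm_poison(x, grad, epsilon):
    """FGSM-style perturbation: take a step of size epsilon in the
    direction of the sign of the loss gradient, then clip to [0, 1]."""
    return np.clip(x + epsilon * np.sign(grad), 0.0, 1.0)

# Toy example: a 4-pixel "image" and a mock loss gradient
x = np.array([0.2, 0.5, 0.7, 0.9])
grad = np.array([0.3, -0.1, 0.0, -0.8])
x_poisoned = fgsm_poison(x, grad, epsilon=0.1)  # -> [0.3, 0.4, 0.7, 0.8]
```

The perturbation is imperceptibly small for realistic epsilon values, which is why neither a model nor a human can easily tell poisoned samples from clean ones.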
There is a plethora of data-poisoning attack detection methods developed by researchers in this field. Anisie Uwimana et al., in their study, assess the reliability of a Mahalanobis distance-based confidence score for identifying malaria-parasitized and uninfected cells. The classification prediction confidence score for every class is determined by computing the Mahalanobis distance to detect anomalies in the class distributions. The authors then improve the performance of deep learning models on adversarial and out-of-distribution samples by training the neural networks with plausible additional noise added to the input data. While the detection accuracy for attack samples on text data exceeds 95%, performance in detecting attack samples on image data is substandard ^{4}. Yubo Hou and his co-authors trained a neural network using a Generative Adversarial Network (MDAN), which computes the Mahalanobis distance score for a given input; a high score indicates an anomaly. This approach obtained only a 63% detection accuracy on image data, which is undesirable for security-critical applications ^{5}. In another method, which thresholds the distance from a Gaussian distribution fitted to the target-class representations, the author of ^{6} investigates a technique for identifying adversarial samples and states that the Mahalanobis distance detection technique is the most vulnerable to attack. Fabio Carrara and fellow authors have proposed ENAD, an ensemble approach for adversarial detection that integrates layer-specific scores from three independent detectors (LID, Mahalanobis, and OC-SVM), achieving significantly enhanced performance on benchmark datasets, methods, and attacks, but requiring training ^{7}.
Ibrahim Aliyu et al., in their study, performed statistical analysis on attack samples with the help of distance-based statistical tests to understand the statistical deviations of the attack samples from the original ones and detect their presence in the training set. Then, the ML model is trained on these attack samples to learn their patterns and identify them as malicious. This entire procedure is called adversarial training (AT) ^{8}. Although these techniques perform well on known attacks, they are still susceptible to unknown attacks and lack generality. The current detection strategies either protect the model by improving its robustness or rely on an additional ML model trained on the attack patterns. Either strategy is ineffective when the adversary is aware of the detection methods and employs new attacks to exploit them ^{9}. Scaling to large models also becomes difficult due to increased cost and resource requirements. Hence, a generalized approach to identify adversarial situations and a secured ML architecture are sought.
The main contributions of this paper are:
A Secured Machine Learning Architecture to safeguard against gradientbased data poisoning attacks.
A novel detection method, MLFilter Detection Algorithm (MLFDA), to identify data poisoning attacks in lieu of adversarial training.
A Statistical Perturbation Bounds Identification Algorithm (SPBIA) to determine the perturbation bounds of the attack dataset.
The MLFilter efficiently detects known and unknown attacks with high detection rates. Thus, the proposed method achieves generalized detection, which was a limitation of the earlier methods.
The remaining paper is organized into the following sections. Section 2 provides the design and implementation of the secured ML architecture and the MLFilter. Section 3 presents the results and discussion. Section 4 concludes with some final thoughts.
The design and implementation of the proposed Secured Machine Learning Architecture (SMLA) is presented in this section. The SMLA integrates the MLFilter (MLF) as a security feature into the traditional ML architecture to safeguard against gradient-based data poisoning attacks. An MLFilter Detection Algorithm (MLFDA), along with the Statistical Perturbation Bounds Identification Algorithm (SPBIA), contributes to identifying the presence of poisoned data in the training dataset.
The proposed machine learning architecture introduces a Machine Learning Filter (MLFilter) between the data input and the ML model. When the adversary tries to input malicious data (i.e., poisoned data samples) into the system, this data is redirected to the MLFilter rather than directly to the ML model, as shown in
The design overview of MLFilter  Detection Algorithm model is shown in
The symbols used to explain the methodology are
MLFilter Detection Algorithm (MLFDA) receives the input dataset and determines whether it is a poisoned dataset or not according to Algorithm 1. It requires a dataset to perform statistical operations on it, a parameter
Let ID(S)new denote the input sample dataset fed to the MLFilter.
Input: new dataset ID(S)new
Divide the input dataset into clusters using DBSCAN
n = number of clusters output by DBSCAN
Loop
i = 1
Compute the LDM for each unique pair of clusters # [LDM (C1, C2), LDM (C2, C3), LDM (C1, C3)]
Until n
End Loop
Res = SPBIA(
if Res = True then Determine as Poisonous
else Pass the sample ID(S)new to the ML model.
First, the DBSCAN algorithm divides the input dataset into several subsets based on their similarity scores. Then, unique pairs of subsets are chosen and given as inputs to the Laplace pairwise Deviation Metric (LDM) function to analyze the statistical characteristics of the dataset. For example, assume there are 'n' clusters in the dataset, C_{1}, C_{2}, C_{3}, …, C_{n}. The elements of C_{1}, C_{2}, …, C_{n} could be groups of data with similar probability distributions. All unique pairs, (C_{1}, C_{2}), (C_{2}, C_{3}), …, (C_{n-1}, C_{n}), are passed through the statistical process to examine anomalies in the subsets, as shown in
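A minimal sketch of this clustering-plus-pairwise-deviation step, assuming scikit-learn's DBSCAN implementation. The paper does not specify how each cluster is reduced to a vector before the pairwise test, so summarizing each cluster by its centroid is an assumption made here for illustration.

```python
import numpy as np
from itertools import combinations
from sklearn.cluster import DBSCAN

def laplace_deviation(x, y, gamma):
    # LDM(x, y) = 1 - exp(-gamma * ||x - y||_1)
    return 1.0 - np.exp(-gamma * np.linalg.norm(x - y, ord=1))

def pairwise_cluster_deviations(data, eps, min_samples, gamma):
    """Cluster the data with DBSCAN, summarize each cluster by its
    centroid (an assumption), and compute the LDM for every unique
    pair of clusters."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(data)
    centroids = {k: data[labels == k].mean(axis=0)
                 for k in set(labels) if k != -1}  # skip DBSCAN noise (-1)
    return [laplace_deviation(centroids[i], centroids[j], gamma)
            for i, j in combinations(sorted(centroids), 2)]

# Two well-separated toy groups -> two clusters -> one pairwise deviation
data = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                 [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
devs = pairwise_cluster_deviations(data, eps=1.0, min_samples=2, gamma=0.1)
```

With n clusters this produces n(n-1)/2 deviation values, matching the pair enumeration described above.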
The Laplace pairwise Deviation Metric (LDM), based on the Laplacian kernel, is a statistical distance metric that finds the probability-distribution distance between a pair of vectors (in this case, the clean and poisoned samples) according to

LDM(x, x') = 1 - exp(-γ ||x - x'||_1)

where ||x - x'||_1 denotes the L1-norm between x and x', and γ is the kernel width parameter.
# Modified Laplace function to calculate the deviation
import numpy as np

def Laplace_deviation(x, y, gamma):
    dist = np.linalg.norm(x - y, ord=1)  # L1-norm between the two samples
    deviation = 1 - np.exp(-gamma * dist)
    return deviation
# End of the function
This code snippet calculates the deviation of the poisoned dataset from the non-poisoned dataset. The resulting deviation measures of the LDM test are stored in the vector space
Now, the Detection algorithm calls the SPBIA function (detailed in Algorithm 3) to determine if any of the values of the vector space
The SPBIA algorithm derives the deviation’s threshold interval
The probability distribution of a poisoned dataset differs from that of the original dataset. These differences are captured as prior knowledge/Experimental Data (ED) with the help of the statistical pairwise Laplace deviation metric (LDM) defined in
Let
Loop until LDM is computed ∀ the poisoned sets
#Populate deviation measures according to
# append
# i = f, g, and c where f = fgsm, g=pgd, and c = CW attack samples
return
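A hypothetical sketch of how the prior-knowledge table ED might be populated, with one list of LDM deviations per attack. The function names and data layout are illustrative, not taken from the paper.

```python
import numpy as np

def laplace_deviation(x, y, gamma):
    # LDM(x, y) = 1 - exp(-gamma * ||x - y||_1)
    return 1.0 - np.exp(-gamma * np.linalg.norm(x - y, ord=1))

def build_ED(clean, poisoned_sets, gamma):
    """poisoned_sets: dict mapping an attack name ('fgsm', 'pgd', 'cw')
    to a list of poisoned sample arrays. Returns ED, the per-attack
    deviation measures against the clean reference set."""
    ED = {}
    for attack, samples in poisoned_sets.items():
        ED[attack] = [laplace_deviation(clean, s, gamma) for s in samples]
    return ED

# Toy example: one poisoned set per attack, 4-dimensional samples
clean = np.zeros(4)
ED = build_ED(clean, {"fgsm": [np.ones(4)]}, gamma=0.25)
```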
The prior-knowledge data is utilized by the SPBIA algorithm to derive the statistical perturbation bounds of the poisoning attacks according to Algorithm 3. The algorithm takes the ED values as input. First, the lower and upper bounds of the deviation measures of each poisoned set ED(S(
Let
The maximum deviation between the poisonous and clean sample datasets is the upper bound
Maximum likelihood estimation (MLE) is a statistical technique for determining the parameters of a conjectured probability distribution based on observed data. By maximizing a likelihood function, the observed data is made as probable as possible under the assumed statistical model. The likelihood function's maximum point in the parameter space (
In this study, the MLE function is computed to pick the point estimates of the lower and upper bounds for
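As a sketch of the MLE step: if the per-attack bound values are modeled as normally distributed (an assumption for illustration; the paper does not name the distribution), the MLE point estimates have the familiar closed form of the sample mean and the biased sample standard deviation.

```python
import numpy as np

def mle_normal(values):
    """MLE for a Gaussian: the location estimate is the sample mean
    (argmax of the likelihood in mu) and the scale estimate is the
    biased standard deviation (1/n, not 1/(n-1))."""
    values = np.asarray(values, dtype=float)
    mu_hat = values.mean()
    sigma_hat = values.std(ddof=0)
    return mu_hat, sigma_hat

# Hypothetical per-attack lower-bound deviations
mu, sigma = mle_normal([1.0, 2.0, 3.0])
```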
Find
Find
if
Res = True #boolean value (Poisoned dataset)
else
Res = False #boolean value (Non-poisoned dataset)
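The decision step of Algorithm 3 can be sketched as a simple bounds-membership test; the function name is illustrative. Per the hypotheses stated later, a deviation falling inside the statistical perturbation bounds signals poisoning.

```python
def spbia_decision(deviations, lower, upper):
    """Flag the dataset as poisoned (Res = True) when any observed LDM
    deviation falls inside the statistical perturbation bounds."""
    return any(lower <= d <= upper for d in deviations)

flagged = spbia_decision([0.1, 0.5], lower=0.4, upper=0.6)  # True
clean = spbia_decision([0.1, 0.2], lower=0.4, upper=0.6)    # False
```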
In this section, we briefly describe the experimental setup used to implement the MLFilter and the metrics used to evaluate the MLFDA performance. We then report the findings of the experiments and discuss the outcomes.
As discussed earlier in the methodology section, the MLFDA decision is based on the SPBIA output, which in turn relies on the threshold interval
To analyze the statistical deviations between the non-poisoned and poisoned MNIST datasets, we require a poisoned MNIST dataset. Sixteen poisoned MNIST sample sets were synthetically generated using the Carlini & Wagner et al. attack algorithms available from a GitHub source ^{13}.
We evaluated the proposed MLFDA's detection accuracy on three benchmark datasets, namely CIFAR10, FashionMNIST, and CIFAR100.
The MLFilter employs the unsupervised density-based spatial clustering of applications with noise (DBSCAN) algorithm, which divides the input dataset into smaller subsets, as discussed in Algorithm 1. It uses statistical methods such as Principal Component Analysis (PCA) / Histogram of Oriented Gradients (HOG) for feature extraction, and distance-based clustering to divide the dataset into smaller groups.
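A minimal sketch of this feature-extraction-plus-clustering pipeline, assuming scikit-learn's PCA and DBSCAN implementations; the parameter values below are illustrative, not the paper's settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

def cluster_images(images, n_components, eps, min_samples):
    """Flatten the images, extract features with PCA, then split the
    dataset into groups with DBSCAN (noise points get label -1)."""
    flat = images.reshape(len(images), -1)
    feats = PCA(n_components=n_components).fit_transform(flat)
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)

# Toy input: five all-zero and five all-one 4x4 "images"
images = np.concatenate([np.zeros((5, 4, 4)), np.ones((5, 4, 4))])
labels = cluster_images(images, n_components=2, eps=1.0, min_samples=2)
```

HOG features could be swapped in for PCA without changing the clustering step.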
Two evaluation metrics used to check the performance of the proposed statistical method and the MLFilter are discussed here.
The true positive metric calculates the percentage of poisoned image sets detected with respect to the total number of image samples in the training set (clean + poisonous) for each sample set tested,
where P_S denotes the poisonous samples detected as poisonous, C_S the clean samples detected as clean, and T_S the total number of samples (i.e., clean + poisonous; 7,000 in our case).
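Based on the variable definitions above, the metric can be sketched as follows; the exact formula is an assumption, since the equation itself is not reproduced here.

```python
def true_positive_rate(P_S, C_S, T_S):
    """Assumed form of the TPR metric: correctly handled samples
    (poisoned flagged as poisoned plus clean passed as clean) as a
    percentage of all samples in the set."""
    return 100.0 * (P_S + C_S) / T_S

# Example with the paper's set size of 7,000 samples
tpr = true_positive_rate(P_S=3465, C_S=3465, T_S=7000)  # -> 99.0
```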
Null hypothesis (H0): The Laplacian deviation measure between C_{i} and C_{j}, denoted by θ, does not belong to the SPBs if the image samples are non-poisonous.
Alternative hypothesis (Ha): The Laplacian deviation measure between C_{i} and C_{j}, denoted by θ, belongs to the SPBs if the image samples are poisonous.
The percentage accuracy with which the SPBM detected the poisonous samples is calculated according to the
where TP, FN, FP, and TN refer to the true positive, false negative, false positive, and true negative rates, respectively.
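A sketch of the standard accuracy computation over these four confusion-matrix counts:

```python
def detection_accuracy(TP, FN, FP, TN):
    """Standard accuracy: correct decisions (TP + TN) as a percentage
    of all decisions (TP + FN + FP + TN)."""
    return 100.0 * (TP + TN) / (TP + FN + FP + TN)

acc = detection_accuracy(TP=50, FN=0, FP=1, TN=49)  # -> 99.0
```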
A high rate of acceptance of the null hypothesis indicates that the input dataset is non-poisonous. A high rate of acceptance of the alternative hypothesis indicates that the input data is poisonous.
The proposed MLFilter Detection Algorithm is tested for its detection accuracy on 16 test (poisoned) datasets with different parameter settings, as shown in the following table.

Test set | Dataset | Attack types | Epsilon (ε) | Known | Unknown
Set1 | MNIST | FGSM, PGD, C&W | 0.2e-6 | ✔ |
Set2 | MNIST | FGSM, PGD, C&W | 0.8e-6 | ✔ |
Set3 | MNIST | FGSM, PGD, C&W | 1.5e-6 | ✔ |
Set4 | MNIST | FGSM, PGD, C&W | 2.5e-6 | ✔ |
Set5 | CIFAR10 | FGSM, PGD, C&W | 0.1e-6, 0.5e-6, 1e-6, 2e-6, 3e-6 | | ✔
Set6 | FashionMNIST | FGSM, PGD, C&W | 0.1e-6, 0.5e-6, 1e-6, 2e-6, 3e-6 | | ✔
Set7 | CIFAR100 | FGSM, PGD, C&W | 0.1e-6, 0.5e-6, 1e-6, 2e-6, 3e-6 | | ✔
Set8 | MNIST | BIM, MIM, MAD | 0.1e-6, 0.5e-6, 1e-6, 2e-6, 3e-6 | | ✔
Set9 | CIFAR10 | BIM, MIM, MAD | 0.1e-6, 0.5e-6, 1e-6, 2e-6, 3e-6 | | ✔
Set10 | FashionMNIST | BIM, MIM, MAD | 0.1e-6, 0.5e-6, 1e-6, 2e-6, 3e-6 | | ✔
Set11 | CIFAR100 | BIM, MIM, MAD | 0.1e-6, 0.5e-6, 1e-6, 2e-6, 3e-6 | | ✔
Set12 | MNIST | FGSM, PGD, C&W, MIM, BIM, MAD | 1 | | ✔
Set13 | MNIST | FGSM, PGD, C&W, MIM, BIM, MAD | 2 | | ✔
Set15 | FashionMNIST | FGSM, PGD, C&W, MIM, BIM, MAD | 1 | | ✔
Set16 | FashionMNIST | FGSM, PGD, C&W, MIM, BIM, MAD | 2 | | ✔
This section reports the deviation measures obtained from the LDM test computed between the original MNIST dataset and the synthetically generated FGSM, PGD, and CW poisoned datasets, as presented in
To estimate the range of deviations resulting from the FGSM, PGD, and CW attacks, the lower bounds (minimum) and upper bounds (maximum) are determined by SPBIA from the acquired ED values for the MNIST original and poisoned datasets. The minimum deviation observed for the FGSM, PGD, and CW attacks
The MLE function is applied to the SPBIA outcomes to find the point estimates of the lower and upper bounds. The outcomes of the MLE function are given as follows:
MLE (Lower Bound) (
MLE (Upper Bound) (
The
The cluster formation of the input (non-poisoned and poisoned) dataset to the MLFilter is shown in
The significance test mentioned in 3.1.2 is conducted for the detection of the known and unknown attack categories of the datasets listed in the
The
The TPR accuracy results obtained for BIM, MIM, MAD, FGSM, PGD, and CW on the MNIST, FashionMNIST, CIFAR10, and CIFAR100 poisoned sets are shown in
In this section, the results of our proposed method are compared with the existing works in the literature.

Method | Attack types | Epsilon (ε) | Detects known attacks | Detects unknown attacks
ED | FGSM, PGD | 2.41, 3 | ✔ | X
ED | FGSM, PGD | 0.1e-6 to 3.0e-6 | X | X
MMD | FGSM, PGD | 2.41, 3 | ✔ | X
MMD | FGSM, PGD | 0.1e-6 to 3.0e-6 | X | X
ECMF | FGSM, PGD | 2.11 | ✔ | X
ECMF | FGSM, PGD | 0.1e-6 to 3.0e-6 | X | X
Threshold | FGSM, PGD | 2.0, 3.0 | ✔ | X
Threshold | FGSM, PGD | 0.1e-6 to 3.0e-6 | X | X
Mahalanobis | FGSM, CW | 2, 3 | ✔ | X
Mahalanobis | FGSM, CW | 0.1e-6 to 3.0e-6 | X | X
MLFilter (proposed) | FGSM, PGD, CW, BIM, MIM, MAD | 1, 2, 0.1e-6 to 3.0e-6, 1, 2 | ✔ | ✔



Method | Training datasets | Test datasets | Attacks | Detection accuracy | Adversarial training | Generalized detection
Mahalanobis + GAN | MNIST, CIFAR10, ImageNet | MNIST, CIFAR10, ImageNet | FGSM, C&W | 75.68% | Yes | No
Mahalanobis + ResNet | SVHN, MNIST, CIFAR10 | SVHN, MNIST, CIFAR10 | FGSM | 99.32% | Yes | No
MLFilter (proposed) | * | MNIST, CIFAR10, FashionMNIST, CIFAR100 | | | |

*The datasets are not used for training. Instead, only the MNIST dataset is used for deriving the perturbation bounds.
The earlier Mahalanobis distance-based methods cannot determine whether or not the dataset is poisonous without adversarial training. This is because data poisoning attacks create poisoned images with a specific perturbation value called the epsilon size. The resulting poisoned image features vary with the epsilon size, the attack type, the target dataset (in this case, images), and the model used for training. For this reason, when the epsilon size, attack type, or target dataset varies, the resulting image patterns vary accordingly. Hence, the earlier Mahalanobis detection methods are ineffective against malicious instances with distinct characteristics adopted by new attacks and lack generalized detection. Our MLFilter is based on the statistical deviations caused by gradient-based poisoning attacks over a wide range of epsilon perturbations. This is why our method is independent of the probability distributions of the datasets and adaptable to new attacks and ML models as well. Thus, the proposed MLFilter achieves generalized detection of known and unknown attacks, which substantiates the claim made earlier in this paper. Also, the proposed Secured ML Architecture integrates and leverages the capabilities of the MLFilter to safeguard against gradient-based data poisoning attacks effectively.
Securing machine learning models is necessary in the context of adversarial machine learning. The Mahalanobis distance-based methods depend on adversarial training and lack generality in detecting new attacks. The proposed MLFilter method is independent of the dataset's probability distribution, the type of attack, and the ML model. The secured architecture, leveraging the attack-agnostic detection capabilities of the proposed MLFilter method, successfully identifies data poisoning attacks with a 99% TPR for known attacks and 98.96% for unknown attacks. Thus, it achieves generalized detection of unknown attacks without the need for adversarial training, substantiating the claims made earlier in this paper.
In this study, only image datasets and CNN architectures were considered. The decision factor of the MLFilter is based on the range of epsilon deviations derived by SPBIA, and its bounds need to be refined to accommodate more detection features in the MLFilter. Also, using DBSCAN for splitting the dataset is a time-consuming task. In future work, a more time-efficient method for anomaly detection in the dataset is needed.