Statistical learning theory provides a framework for machine learning in statistics-related fields. It emerged when first-generation computers became capable of carrying out multidimensional analyses of real-life problems.
The support vector machine (SVM) is currently a popular topic in the statistical learning area. SVM was first introduced at the Conference on Computational Learning Theory (COLT), and it is one of the most successful algorithms for classification and regression without any distributional assumption.
Balasundaram, S. and Yogendra Meena (2019) proposed robust support vector regression in the primal with an asymmetric Huber loss ^{4}. Bibhuti Bhusan Sahoo et al. (2019) applied support vector regression for modeling low-flow time series ^{5}. Khalandar Basha, D. and Venkateswarlu, T. (2020) studied linear regression, support vector machine and hybrid LOG filter-based image restoration ^{6}. Significance SVR for image denoising was given by Bing Sun and Xiaofeng Liu (2021) ^{7}. Hanifah Muthiah et al. (2021) built an SVR model for seasonal time series data ^{8}. Mostafa Sabzekar (2021) studied robust regression using support vector regressions ^{9}. A new support vector regression model for equipment health diagnosis was studied by Qinming Liu et al. (2021) ^{10}. Wang et al. (2021) proposed projection wavelet weighted twin support vector regression ^{11}. Raquel Rodriguez-Perez (2022) reviewed the evolution of support vector machine and regression modeling ^{12}. An overview of twin support vector regression was discussed by Huajuan Huang et al. (2022) ^{13}. Claris Shoko and Caston Sigauke (2023) applied support vector regression to short-term forecasting of COVID-19 ^{14}. Adaptive weighted least squares support vector regression based on a genetic algorithm was studied by Maosheng Wei et al. (2023) ^{15}.
One class of SVM methods is Support Vector Regression (SVR), which is established as a robust technique for constructing data-driven, nonlinear empirical regression models. The concept of SVR has been applied to various fields by researchers. This paper is organized as follows. Section 2 gives a brief introduction to the methodology of the regression procedures. Section 3 presents empirical comparisons demonstrating the advantages of the RWSVR procedure on real datasets and in a simulation study. Finally, Section 4 provides the conclusion.
Regression analysis is one of the most extensively used statistical techniques, applied in almost all fields such as engineering, medical science, commerce, marketing research and social science.
The least squares (LS) estimator of the regression parameters is obtained by minimizing the sum of squared deviations between the actual and predicted values of the response variable. The simple linear regression model is defined as

y = \beta_0 + \beta_1 x + \varepsilon

where y is the dependent variable, x is an independent variable, \beta_0 and \beta_1 are the intercept and slope parameters, and \varepsilon is the random error term.
The standard multiple regression model in matrix notation is given as

Y = X\beta + \varepsilon

where Y is the n \times 1 vector of responses, X is the n \times (p + 1) design matrix, \beta is the (p + 1) \times 1 vector of regression coefficients and \varepsilon is the n \times 1 vector of random errors.
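As a quick illustration, the matrix-form model can be fit by ordinary least squares with NumPy; the toy data below are hypothetical:

```python
import numpy as np

# Hypothetical toy data: n = 5 observations, intercept column plus one predictor
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least squares estimate: solve the normal equations (X'X) beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ beta_hat
```

With an intercept column present, the residuals of the least squares fit sum to zero.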
The ordinary least squares estimates for linear regression are optimal when all of the regression assumptions hold. Least squares regression may perform poorly when some of these assumptions are violated. Robust regression techniques offer an alternative by requiring less strict assumptions than least squares regression. These techniques attempt to dampen the influence of extreme cases in order to fit the majority of the data better. Robust regression diminishes the influence of outliers, so their residuals grow large and become easier to detect.
Consider a dataset of size n such that

\{(x_i, y_i),\; i = 1, 2, \ldots, n\}

where x_i is the vector of predictor values and y_i is the response for the i-th observation.
The M-estimator class of estimators is the robust regression technique used in this study. M-estimators aim to minimize the sum of a chosen function \rho acting on the residuals:

\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \rho(y_i - x_i^{T}\beta).

Since \rho is differentiable, setting the derivative with respect to \beta to zero gives the estimating equations \sum_{i=1}^{n} \psi(y_i - x_i^{T}\beta)\, x_i = 0, where \psi = \rho' is the score function.
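A minimal sketch of how an M-estimate can be computed by iteratively reweighted least squares (IRLS); the Huber ψ and the MAD scale estimate used here are illustrative choices, not the paper's:

```python
import numpy as np

def irls_m_estimate(X, y, k=1.345, n_iter=30):
    """Solve sum(psi(r_i) x_i) = 0 by iteratively reweighted least squares,
    using Huber weights psi(u)/u = min(1, k/|u|) as an illustration."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)               # start from the OLS fit
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745   # robust scale (MAD)
        u = np.abs(r) / max(s, 1e-8)                       # standardized residuals
        w = np.minimum(1.0, k / np.maximum(u, 1e-12))      # Huber weights
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta

# Line y = 2x with one gross outlier in the response
x = np.arange(10.0)
X = np.column_stack([np.ones(10), x])
y = 2.0 * x
y[-1] += 50.0
beta_robust = irls_m_estimate(X, y)
```

The outlier is progressively downweighted, so the robust slope stays close to the true value of 2, whereas the OLS slope on the same data is pulled far above it.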
Support Vector Regression (SVR) is a supervised learning technique that, given one dependent variable and one or more independent variables, finds a function whose deviation from the observed responses is at most a maximum value \epsilon for all training data, while remaining as flat as possible. The SVR can be formulated as a quadratic programming problem; in particular, the dual formulation of the problem is preferred because it reduces the number of constraints and allows the application of the kernel trick to solve nonlinear problems.
Suppose the training data points given are

\{(x_1, y_1), \ldots, (x_n, y_n)\} \subset \mathcal{X} \times \mathbb{R}.

The function to be estimated takes the linear form

f(x) = \langle w, x \rangle + b

where w \in \mathcal{X} and b \in \mathbb{R}. In SV machines the soft margin loss function is adopted by introducing slack variables \xi_i, \xi_i^{*} to cope with otherwise infeasible constraints, which Vapnik formulated as

minimize \quad \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^{*})
subject to \quad y_i - \langle w, x_i \rangle - b \le \epsilon + \xi_i,
\quad\quad\quad\;\; \langle w, x_i \rangle + b - y_i \le \epsilon + \xi_i^{*},
\quad\quad\quad\;\; \xi_i, \xi_i^{*} \ge 0.

The constant C > 0 determines the trade-off between the flatness of f and the amount up to which deviations larger than \epsilon are tolerated. The main aim is to construct a Lagrange function from the objective function and the corresponding constraints by introducing a dual set of variables \alpha_i, \alpha_i^{*} \ge 0. Using the optimality conditions of this Lagrangian, the primal variables are eliminated and the problem can be reconstructed as the dual

maximize \quad -\frac{1}{2}\sum_{i,j=1}^{n}(\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\langle x_i, x_j \rangle - \epsilon \sum_{i=1}^{n}(\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{n} y_i(\alpha_i - \alpha_i^{*})
subject to \quad \sum_{i=1}^{n}(\alpha_i - \alpha_i^{*}) = 0, \quad \alpha_i, \alpha_i^{*} \in [0, C].

Thus w = \sum_{i=1}^{n}(\alpha_i - \alpha_i^{*})\, x_i and

f(x) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^{*}) \langle x_i, x \rangle + b.

This function depicts the so-called Support Vector (SV) expansion.
Support vector machines and other models employing the kernel trick do not scale well to large numbers of training samples or large numbers of features in the input space, so several approximations to the RBF kernel have been introduced. The most commonly used kernel function is the Radial Basis Function (RBF) kernel, expressed as

K(x_i, x_j) = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right).
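The RBF kernel can be sketched directly in a few lines; σ is the bandwidth parameter:

```python
import numpy as np

def rbf_kernel(x1, x2, sigma=1.0):
    """Radial Basis Function kernel: K(x1, x2) = exp(-||x1 - x2||^2 / (2 sigma^2))."""
    d2 = np.sum((np.asarray(x1, float) - np.asarray(x2, float)) ** 2)
    return float(np.exp(-d2 / (2.0 * sigma ** 2)))

k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # identical points give K = 1
k_unit = rbf_kernel([0.0], [1.0])             # distance 1 with sigma = 1 gives exp(-1/2)
```

The kernel equals 1 for identical inputs and decays monotonically toward 0 as the distance between the inputs grows.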
In this paper, a robust weight-based support vector regression is proposed to address the problem of outliers in Support Vector Regression (SVR). A weight function is a mathematical tool used, when computing a sum, integral or average, to give some observations in a set more "weight" or influence on the outcome than others. Hampel proposed the three-part redescending M-estimator. Hampel's local method entails building an estimator with a predetermined influence function, which in turn determines the estimation procedure's sensitivity to gross outliers, rounding-off and other qualitative robustness characteristics.
M-estimators are a broad class of extremum estimators whose objective function is a sample average. Both nonlinear least squares and maximum likelihood estimation are special cases of M-estimation; the definition of M-estimators was motivated by robust statistics, which contributed new types of M-estimators. The statistical procedure of evaluating an M-estimator on a data set is called M-estimation, and with suitable rescaling, M-estimators are special cases of extremum estimators.
The function ρ, or its derivative ψ, can be chosen so as to give the estimator desirable properties (in terms of bias and efficiency) when the data truly come from the assumed distribution, and 'not bad' behaviour when the data are generated from a model that is, in some sense, close to the assumed distribution.
Hampel's three-part redescending M-estimator has three tuning constants a, b and c with 0 < a \le b < c. Its score function \psi(r) is given by

\psi(r) = r, \quad |r| \le a
\psi(r) = a\,\mathrm{sgn}(r), \quad a < |r| \le b
\psi(r) = a\,\mathrm{sgn}(r)\,\dfrac{c - |r|}{c - b}, \quad b < |r| \le c
\psi(r) = 0, \quad |r| > c.

Its weight function w(r) = \psi(r)/r is

w(r) = 1, \quad |r| \le a
w(r) = a/|r|, \quad a < |r| \le b
w(r) = \dfrac{a}{|r|}\,\dfrac{c - |r|}{c - b}, \quad b < |r| \le c
w(r) = 0, \quad |r| > c

where r denotes the residual.
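The weight function above can be sketched as follows; the tuning constants a = 1.7, b = 3.4, c = 8.5 are common illustrative choices, not necessarily the values used in this study:

```python
import numpy as np

def hampel_weight(r, a=1.7, b=3.4, c=8.5):
    """Hampel three-part redescending weight w(r) = psi(r)/r.
    Constants a, b, c are illustrative defaults."""
    r = np.abs(np.atleast_1d(np.asarray(r, dtype=float)))
    w = np.zeros_like(r)                                 # |r| > c  -> weight 0 (rejected)
    w[r <= a] = 1.0                                      # |r| <= a -> full weight
    mid = (r > a) & (r <= b)
    w[mid] = a / r[mid]                                  # bounded-influence part
    tail = (r > b) & (r <= c)
    w[tail] = (a / r[tail]) * (c - r[tail]) / (c - b)    # redescending part
    return w

w = hampel_weight([0.0, 1.0, 2.0, 5.0, 10.0])
```

Small residuals keep full weight, moderate residuals are downweighted, and residuals beyond c are rejected entirely.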
Steps of the proposed RWSVR are as follows: fit the standard SVR to the data, compute the residuals, obtain a weight for each observation by applying Hampel's weight function to the residuals, and refit the SVR with these sample weights. The model can thus be further modified by weighting each observation's slack variables, so that the objective becomes

minimize \quad \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} w_i(\xi_i + \xi_i^{*}).

This proposed model is called Robust Weighted Support Vector Regression (RWSVR), and it is applied in a real-data and simulation study.
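Under the assumption that the weighting enters SVR as per-sample weights (as scikit-learn's `sample_weight` does), the procedure might be sketched as below; the Hampel constants and the MAD standardization of residuals are illustrative choices, not necessarily the paper's exact setup:

```python
import numpy as np
from sklearn.svm import SVR

def hampel_weight(u, a=1.7, b=3.4, c=8.5):
    """Hampel three-part redescending weight (illustrative constants)."""
    u = np.abs(u)
    w = np.zeros_like(u)
    w[u <= a] = 1.0
    mid = (u > a) & (u <= b)
    w[mid] = a / u[mid]
    tail = (u > b) & (u <= c)
    w[tail] = (a / u[tail]) * (c - u[tail]) / (c - b)
    return w

def rwsvr_fit(X, y, **svr_kwargs):
    """Sketch of the RWSVR idea: fit a standard SVR, standardize its residuals
    robustly, convert them to Hampel weights, and refit with those weights."""
    base = SVR(kernel="rbf", **svr_kwargs).fit(X, y)
    r = y - base.predict(X)
    s = np.median(np.abs(r - np.median(r))) / 0.6745   # robust scale (MAD)
    w = hampel_weight(r / max(s, 1e-8))
    return SVR(kernel="rbf", **svr_kwargs).fit(X, y, sample_weight=w), w

# Toy data: noisy sine curve with one gross outlier
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 60)[:, None]
y = np.sin(X).ravel() + 0.05 * rng.normal(size=60)
y[30] += 5.0
model, w = rwsvr_fit(X, y, C=10.0, epsilon=0.05)
```

The outlying observation receives a small (possibly zero) weight in the refit, so it contributes little to the final regression surface.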
To illustrate the performance of Robust Weighted Support Vector Regression over linear, SVR and robust regression models, real-data and simulation studies have been performed. The most commonly used classical procedure is least squares, which is inefficient and very sensitive when the data contain outliers. This work focuses on increasing accuracy; the proposed method overcomes the drawback by attaching a weight to each sample observation. The efficiency of the proposed method has been compared with the existing regression methods, namely Least Squares (LS), the Robust Linear Model (RLM) and Support Vector Regression (SVR), by computing various error measures: Mean Absolute Error (MAE), Median Absolute Error (MDAE), Mean Absolute Percent Error (MAPE) and Root Mean Square Error (RMSE). The proposed RWSVR method is recommended to researchers whenever the data contain outliers.
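The four error measures can be computed as follows (MAPE is expressed here as a proportion; multiply by 100 for a percentage):

```python
import numpy as np

def error_measures(y_true, y_pred):
    """MAE, MDAE, MAPE and RMSE between observed and predicted values."""
    y_true = np.asarray(y_true, float)
    e = y_true - np.asarray(y_pred, float)
    return {
        "MAE":  float(np.mean(np.abs(e))),           # mean absolute error
        "MDAE": float(np.median(np.abs(e))),         # median absolute error
        "MAPE": float(np.mean(np.abs(e / y_true))),  # mean absolute percent error
        "RMSE": float(np.sqrt(np.mean(e ** 2))),     # root mean square error
    }

m = error_measures([1.0, 2.0, 4.0], [1.0, 2.0, 3.0])
```

Note that MAPE is undefined when any observed value is zero, which is why it can blow up for responses that are close to zero.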
This section presents the experimental results carried out on three real datasets having one, two and more than two predictors, respectively.
Case 1: starsCYG dataset - For this study, the data are taken from the R package robustbase; the dataset contains 47 stars of the Hertzsprung-Russell diagram of the star cluster CYG OB1. It has one predictor variable, the logarithm of the effective temperature at the surface of the star (log.Te), and a response variable, the logarithm of its light intensity (log.light). In this dataset, 9 outliers are identified using Cook's distance.
Case 2: Carbonation dataset - It contains 12 observations. The first two variables, Temperature and Pressure, are predictor variables, and Carbonation is the response variable. In this dataset, one outlier is detected using Cook's distance.
Case 3: Prostate dataset - The dataset has 97 observations, each having 8 independent variables, namely lweight (log of prostate weight), age, lbph (log of benign prostatic hyperplasia amount), svi (seminal vesicle invasion), lcp (log of capsular penetration), gleason (Gleason score), pgg45 (percentage of Gleason scores 4 or 5) and lpsa (log of prostate specific antigen), and one dependent variable, lcavol (log of cancer volume). This dataset contains 9 outliers when checked using Cook's distance.
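Cook's distance, used above to flag outliers, can be sketched for an OLS fit as follows; the 4/n cutoff is one common rule of thumb, and the source does not state which cutoff was used:

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance D_i for each observation of an OLS fit.
    X must include an intercept column."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    h = np.diag(H)                          # leverages
    e = y - H @ y                           # OLS residuals
    mse = e @ e / (n - p)
    return (e ** 2 / (p * mse)) * h / (1.0 - h) ** 2

# Toy example: straight line with one gross outlier at index 5
x = np.arange(10.0)
X = np.column_stack([np.ones(10), x])
y = 2.0 * x
y[5] += 20.0
D = cooks_distance(X, y)
outliers = np.where(D > 4.0 / len(y))[0]    # common 4/n rule of thumb
```

The contaminated observation stands out with by far the largest distance.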
The error measures are computed for the datasets with and without outliers and summarized in the table below; values in parentheses are computed after removing the outliers.

Method  Measure  Case 1: starsCYG  Case 2: Carbonation  Case 3: Prostate
LS      MAE      0.478 (0.264)     0.682 (0.601)        0.542 (0.450)
LS      MAPE     0.098 (0.054)     0.152 (0.149)        2.314 (2.217)
LS      MDAE     0.478 (0.240)     0.571 (0.547)        0.517 (0.403)
LS      RMSE     0.552 (0.309)     0.823 (0.722)        0.667 (0.557)
RLM     MAE      0.477 (0.264)     0.677 (0.599)        0.541 (0.449)
RLM     MAPE     0.098 (0.054)     0.150 (0.148)        2.340 (2.229)
RLM     MDAE     0.474 (0.239)     0.573 (0.538)        0.509 (0.387)
RLM     RMSE     0.553 (0.309)     0.823 (0.722)        0.667 (0.557)
SVR     MAE      0.280 (0.222)     0.550 (0.498)        0.405 (0.370)
SVR     MAPE     0.057 (0.045)     0.102 (0.102)        1.877 (1.831)
SVR     MDAE     0.223 (0.176)     0.557 (0.494)        0.234 (0.194)
SVR     RMSE     0.354 (0.282)     0.684 (0.546)        0.563 (0.503)
RWSVR   MAE      0.230 (0.202)     0.518 (0.498)        0.390 (0.353)
RWSVR   MAPE     0.046 (0.041)     0.090 (0.098)        1.861 (1.734)
RWSVR   MDAE     0.167 (0.147)     0.555 (0.494)        0.217 (0.176)
RWSVR   RMSE     0.306 (0.273)     0.680 (0.531)        0.548 (0.486)

(.) without outliers
This section deals with the results of the simulation environment. The efficiency of the LS, RLM, SVR and RWSVR procedures has been studied by computing the error measures. The simulation study has been performed by considering three cases with samples of sizes 50, 100 and 500. The details are briefly described as follows:
Case 1: Let X ~ N(µ, σ), where µ = 30 and σ = 0.5. The regression model is the simple linear model y = β₀ + β₁x + ε. The error distribution is normal with µ = 0 and σ = 1. The simulated model is contaminated with N(µ, σ), where µ = 30 and σ = 1.01, at the 0%, 5%, 10% and 20% levels.
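A sketch of the Case 1 data-generating setup; the regression coefficients and the choice to contaminate the predictor (rather than the errors) are assumptions made for illustration:

```python
import numpy as np

def simulate_case1(n, level=0.05, seed=0):
    """Case 1 sketch: X ~ N(30, 0.5), N(0, 1) errors, with a `level` fraction of
    the sample replaced by draws from the contaminating N(30, 1.01) distribution."""
    rng = np.random.default_rng(seed)
    x = rng.normal(30.0, 0.5, n)
    k = int(round(level * n))
    if k > 0:
        idx = rng.choice(n, size=k, replace=False)
        x[idx] = rng.normal(30.0, 1.01, k)   # contaminated observations
    beta0, beta1 = 1.0, 2.0                  # illustrative coefficients only
    y = beta0 + beta1 * x + rng.normal(0.0, 1.0, n)
    return x, y

x, y = simulate_case1(500, level=0.10)
```

Varying `level` over 0, 0.05, 0.10 and 0.20 reproduces the contamination grid used in the study.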
Case 2: X ~ N(µ, Σ) with two predictor variables. The regression model is the multiple linear model y = β₀ + β₁x₁ + β₂x₂ + ε. The error distribution is normal with mean vector µ and covariance matrix Σ. The simulated model is contaminated with N(µ, Σ) at the 0%, 5%, 10% and 20% levels.
Case 3: X ~ N(µ, Σ) with more than two predictor variables. The regression model is the multiple linear model y = β₀ + β₁x₁ + ⋯ + β_p x_p + ε. The error distribution is normal with mean vector µ and covariance matrix Σ. The simulated model is contaminated with N(µ, Σ) at the 0%, 5%, 10% and 20% levels.
Here, contamination levels of 0%, 5%, 10% and 20% are considered for checking the efficiency of the proposed method over the existing methods, and the results obtained are summarized below.
A novel robust procedure, namely Robust Weighted Support Vector Regression (RWSVR), is proposed to deal with the outlier-sensitivity problem in Support Vector Regression (SVR). It improves robustness by utilizing the Hampel weight function in support vector regression. Traditional regression models perform badly in the presence of outliers; to overcome this drawback, the RWSVR concept has been proposed. The robust Hampel weight function is extended into the kernel space to gain more accuracy over the traditional SVR. The newly developed kernel function has a higher mapping power than the commonly used linear, polynomial and RBF kernel functions. Experimental results have shown that the proposed RWSVR can reduce the effect of outliers and yield higher accuracy than standard SVR does when the data set is contaminated by outliers. This study concludes that the proposed RWSVR is applicable in almost all areas of statistical learning, specifically when building prediction models with or without contamination in the data. A future study may consider a depth function as the kernel and apply it in the context of pattern recognition.
Simulation results for LS, RLM, SVR and RWSVR by sample size n and contamination level (each case reports MAE, MAPE, MDAE and RMSE, in that order).

Method  n    Level  Case 1 (MAE MAPE MDAE RMSE)  Case 2 (MAE MAPE MDAE RMSE)  Case 3 (MAE MAPE MDAE RMSE)
LS      50   0.00   1.197 0.017 1.028 1.492      3.819 1.772 3.240 4.764      3.665 0.041 3.132 4.582
LS      50   0.05   1.182 0.017 1.013 1.474      3.837 1.825 3.224 4.784      3.738 0.042 3.205 4.651
LS      50   0.10   1.182 0.017 1.013 1.474      3.837 1.825 3.224 4.784      3.738 0.042 3.205 4.651
LS      50   0.20   1.182 0.017 1.013 1.474      3.837 2.400 3.223 4.785      3.738 0.042 3.205 4.651
LS      100  0.00   1.193 0.017 1.016 1.490      3.920 1.628 3.323 4.895      3.885 0.044 3.286 4.859
LS      100  0.05   1.183 0.017 1.014 1.478      3.899 1.960 3.306 4.876      3.841 0.043 3.263 4.793
LS      100  0.10   1.183 0.017 1.014 1.478      3.899 1.944 3.306 4.876      3.841 0.043 3.263 4.793
LS      100  0.20   1.183 0.017 1.014 1.478      3.899 1.940 3.306 4.876      3.841 0.043 3.263 4.793
LS      500  0.00   1.196 0.017 1.017 1.496      3.983 2.822 3.377 4.985      3.971 0.045 3.362 4.972
LS      500  0.05   1.193 0.017 1.011 1.494      3.981 3.162 3.372 4.988      3.971 0.045 3.362 4.972
LS      500  0.10   1.193 0.017 1.011 1.494      3.981 3.164 3.372 4.988      3.966 0.045 3.336 4.980
LS      500  0.20   1.193 0.017 1.011 1.494      3.981 3.176 3.372 4.988      3.966 0.045 3.336 4.980
RLM     50   0.00   1.194 0.017 1.021 1.494      3.799 1.770 3.182 4.776      3.629 0.041 3.031 4.603
RLM     50   0.05   1.179 0.017 1.003 1.476      3.815 1.842 3.164 4.796      3.693 0.041 3.065 4.677
RLM     50   0.10   1.179 0.017 1.003 1.476      3.815 1.842 3.164 4.796      3.693 0.041 3.064 4.677
RLM     50   0.20   1.179 0.017 1.003 1.476      3.815 2.419 3.164 4.796      3.693 0.041 3.064 4.677
RLM     100  0.00   1.192 0.017 1.014 1.491      3.910 1.628 3.307 4.901      3.866 0.043 3.235 4.869
RLM     100  0.05   1.181 0.017 1.013 1.479      3.888 1.938 3.288 4.881      3.823 0.043 3.214 4.802
RLM     100  0.10   1.181 0.017 1.013 1.479      3.888 1.922 3.287 4.881      3.823 0.043 3.214 4.802
RLM     100  0.20   1.181 0.017 1.013 1.479      3.888 1.918 3.287 4.881      3.823 0.043 3.214 4.802
RLM     500  0.00   1.196 0.017 1.016 1.496      3.981 2.807 3.373 4.986      3.967 0.045 3.353 4.974
RLM     500  0.05   1.193 0.017 1.011 1.494      3.978 3.175 3.367 4.989      3.967 0.045 3.353 4.974
RLM     500  0.10   1.193 0.017 1.011 1.494      3.978 3.177 3.367 4.989      3.963 0.045 3.332 4.982
RLM     500  0.20   1.193 0.017 1.011 1.494      3.978 3.189 3.367 4.989      3.963 0.045 3.332 4.982
SVR     50   0.00   0.998 0.014 0.680 1.362      3.101 1.158 1.623 4.364      2.897 0.033 1.386 4.199
SVR     50   0.05   1.011 0.014 0.718 1.375      3.081 1.303 1.551 4.374      2.899 0.033 1.421 4.123
SVR     50   0.10   1.011 0.014 0.720 1.375      3.081 1.303 1.551 4.374      2.899 0.033 1.421 4.124
SVR     50   0.20   1.011 0.014 0.721 1.375      3.081 1.393 1.553 4.374      2.900 0.033 1.421 4.124
SVR     100  0.00   1.069 0.015 0.810 1.423      3.306 1.159 1.947 4.547      3.085 0.035 1.473 4.357
SVR     100  0.05   1.060 0.015 0.792 1.416      3.276 1.487 1.947 4.533      3.044 0.035 1.466 4.283
SVR     100  0.10   1.060 0.015 0.792 1.416      3.276 1.484 1.946 4.533      3.044 0.035 1.467 4.283
SVR     100  0.20   1.061 0.015 0.792 1.416      3.276 1.482 1.944 4.533      3.044 0.035 1.466 4.283
SVR     500  0.00   1.158 0.017 0.957 1.479      3.690 2.685 2.846 4.818      3.441 0.039 2.310 4.582
SVR     500  0.05   1.154 0.017 0.952 1.476      3.683 3.108 2.817 4.828      3.441 0.039 2.310 4.582
SVR     500  0.10   1.154 0.017 0.952 1.476      3.683 3.109 2.816 4.828      3.454 0.039 2.320 4.596
SVR     500  0.20   1.154 0.017 0.952 1.476      3.683 3.129 2.814 4.828      3.454 0.039 2.321 4.597
RWSVR   50   0.00   0.991 0.014 0.673 1.358      3.100 1.144 1.596 4.347      2.855 0.033 1.360 4.159
RWSVR   50   0.05   1.000 0.014 0.710 1.359      3.071 1.236 1.544 4.363      2.876 0.033 1.411 4.103
RWSVR   50   0.10   1.001 0.014 0.710 1.359      3.071 1.236 1.544 4.363      2.876 0.033 1.411 4.104
RWSVR   50   0.20   1.000 0.014 0.709 1.359      3.071 1.326 1.474 4.364      2.877 0.033 1.412 4.104
RWSVR   100  0.00   1.066 0.015 0.804 1.422      3.325 1.145 1.030 4.547      3.077 0.035 1.473 4.353
RWSVR   100  0.05   1.058 0.015 0.778 1.411      3.260 1.477 1.945 4.516      3.043 0.035 1.456 4.282
RWSVR   100  0.10   1.058 0.015 0.776 1.411      3.260 1.475 1.939 4.516      3.035 0.035 1.457 4.277
RWSVR   100  0.20   1.058 0.015 0.775 1.411      3.260 1.473 1.940 4.516      3.043 0.035 1.465 4.282
RWSVR   500  0.00   1.157 0.017 0.957 1.478      3.688 2.676 2.839 4.818      3.440 0.039 2.309 4.581
RWSVR   500  0.05   1.154 0.017 0.944 1.476      3.671 3.070 2.812 4.822      3.441 0.039 2.309 4.582
RWSVR   500  0.10   1.154 0.017 0.951 1.476      3.681 3.072 2.811 4.822      3.443 0.039 2.300 4.587
RWSVR   500  0.20   1.154 0.017 0.951 1.476      3.681 3.094 2.801 4.802      3.443 0.039 2.300 4.587