The current work proposes SMOTS, Spider monkey optimisation algorithm with tabu search to find number of cluster automatically. In this work to balance between exploration and exploitation we introduced neighbour search mechanism with memory component of tabu search to update the local leader stage of SMO. Further to improved fitness function, the Gaussian kernel distribution is implemented as punishment function to compact separated index (CSIndex). Finally to improve dynamic nature of fitness function with data distribution it is made function of intra to inter cluster distance.
In clustering N data points are grouped in to K_{max} clusters, so that intra cluster distance will be minimum and intercluster distance will be maximum.
It is derived as,
N Data points presented as
Individual with D attribute written as
Which to be grouped in K_{max} cluster as
Subjected to constraint,
Data points are assigned to respective group based on Euclidian distance.
Spider monkey optimization algorithm is designed by Jagdish Chand Bansal
The SMO algorithm is execute with six stages to complete the food searching task.
Initially the population is generated randomly.
Let,
Where, the lower and upper limit is written as
To update the position of each individual Eq.5 is used in which position values of local leader and individual is used. Further to check position fitness function is used.
Where, the
The position is updated with position value of global leader and other individual of the group as per Eq.6
The
Where,
The position value of global leader is change based on best fitness value derived from population. The counter of global limit is increment by one if position is not updated.
The position value of local leader is change based on best fitness value derived from particular group. The counter of local limit is increment by one if position is not updated.
After reaching to predefine number of count if local leader position is not change then each member of that group is change randomly or by Eq.8 with probability, Pi.
After reaching to the predefined number of count if global leader is not then, entire group is form subgroup with minimum two candidates and with maximum of group size by increment of one. Again same process is repeated, but if still no improvement then all subgroups are merged and form single group.
In SMO the position is updated randomly and its result in slow convergence, rapid breaking and merging of group
Tabu search is derived by Fred Glover
To improve local search we modified local leader search strategy by tabu search algorithm. The detail steps are given in following algorithm (
The SMOTS start by initialising input parameters and data set. The initial positions are generated randomly based on input data set which consist cluster centres and activation threshold. First fitness is calculated and then SMO update the initial position iteration by iteration. Further explanation is given in algorithm 7 (
The particle position is represented as,
In following example, two cluster centres with three attributes can be shown as
The activation threshold value decides whether current cluster is activated or not. If the activation threshold value is grater then cut off threshold then cluster is activated otherwise deactivated and it is fix to 0.5. It may be chances of zero cluster is activated in that cases two clusters with maximum threshold are selected by applying this step minimum two clusters are activated.
In this work compact separated index (CS Index) is applied as a fitness function. For better clustering result smaller value of CS Index is preferred.
The CS measure is written as,
Where,
Where, z_{i} and z_{j} are the centres of the clusters i and j.
For better clustering results we combine Gaussian kernel function in fitness function and it is presented as,
Where, σ is the standard deviation. After Gaussian kernel function, The CS measures modified as,
The automatic clustering has tendency to grouped in to two clusters, to remove this limitation Turi
Where, the k is used instead of x. The mean (µ) and standard deviation (σ) values are initialised during experiments.
Finally, the fitness function is made function of Intra and intercluster distance.
The proposed SMOTS algorithm is implemented in MATLAB. In this part SMOTS is validated with wellknown as well as seven previously published algorithms.
The proposed SMOTS algorithm is tasted on reallife data set taken from UCI repository




Vowel 
6 
3 
871 
Iris 
3 
4 
150 
Wine 
3 
13 
178 
Seed 
3 
7 
210 
Ecoli 
8 
7 
336 
Thyroid 
3 
5 
215 
In SMOTS and other comparison algorithms parameters are set as, maximum iterations equal to 100. The number of runs is 20. Population size is 30.




Iris 
10 
0 
0 
Ecoli 
30 
0.5 
0 
Vowel 
30 
0 
0 
Wine 
10 
1 
0.5 
Thyroid 
10 
0.5 
0 
Seed 
10 
0 
0 
As shown in








Differential Evolution 
Mean 
4.0500 
5.7500 
3.5000 
3.8500 
5.9000 
2.0000 
Std 
0.2236 
2.4252 
0.8885 
1.3485 
2.1740 


Genetic Algorithm 
Mean 
3.7000 
7.6500 
3.5500 
3.6500 
8.8000 
2.3000 
Std 
0.4702 
2.4979 
0.5104 
1.2587 
2.2850 
1.1286 

Grey wolf Optimisation 
Mean 
3.1500 
12.0000 
3.7500 
5.4500 
2.5500 
3.4500 
Std 
0.5871 
2.6754 
1.4096 
1.1910 
1.0501 
0.9445 

Whale Optimisation Algorithm 
Mean 
4.3500 
9.9500 
3.6000 
3.5400 
9.8000 
2.2000 
Std 
0.7452 
2.4810 
0.5525 
1.5832 
2.0417 
0.4104 

Particle Swarm Optimisation 
Mean 
3.9000 
8.7000 
3.4500 
4.1000 
8.7000 
2.1400 
Std 
0.7182 
2.7357 
0.6387 
1.5927 
1.9762 
1.0990 

SMOTS 
Mean 






Std 





0.3660 








AHPSOM 
Mean 
3.0250 



 
 
Std 
0.0196 
0.4640 


 
 

ACICA 
Mean 

5.7800 
 
5.7800 
 
 
Std 


 
0.2300 
 
 

ACDCSA 
Mean 

5.2000 
2.9000 
2.95 

 
Std 

0.951 
0.307 
0.223 

 

ACMeanABC 
Mean 

 
 
3.1000 
 
5.0240 
Std 

 
 
0.2510 
 
0.3880 

SMOTS 
Mean 

5.9000 

3.1500 
3.1500 

Std 

1.1400 

0.4300 
0.3660 









Differential Evolution (DE) 
Intra cluster (Mean) 
0.3982 
48.0902 
53.1425 
0.0860 
5.1778 
0.8341 
Std 
0.0913 
17.6914 
30.6873 
0.0268 
0.4289 


Inter cluster (Mean) 
4.7895 
1261.7495 
897.7026 
1.2461 
79.3529 
10.1480 

Std 
0.0858 
311.5889 
288.8303 
0.1126 
4.5882 


Genetic Algorithm (GA) 
Intra cluster (Mean) 
0.4224 
37.5000 
75.0906 
0.0689 
6.4918 
0.8231 
Std 
0.1360 
13.7171 

0.0248 
3.6653 
0.1825 

Inter cluster (Mean) 
4.5957 
1094.7462 
1097.3939 
1.1239 
81.8145 
10.3337 

Std 
0.5913 
157.3490 

0.1274 
1.9317 
1.3644 

Grey wolf Optimisation (GWO) 
Intra cluster (Mean) 
0.4616 
37.0000 

0.1447 
5.1608 

Std 
0.5170 
10.8094 
3.2551 
0.0413 
3.5234 
0.1202 

Inter cluster (Mean) 
4.3229 
946.5393 
462.5302 

76.9858 
8.8522 

Std 
0.8377 

142.9202 

9.4289 
1.5236 

Whale Optimisation Algorithm (WOA) 
Intra cluster (Mean) 


72.8941 
0.0569 
7.2322 
0.7700 
Std 
0.1317 
13.5336 
9.8233 
0.0157 
3.1553 
0.2119 

Inter cluster (Mean) 
4.0508 
1044.9741 
1102.6403 
1.0826 
81.3685 
9.1789 

Std 
0.8308 
181.5115 
23.4626 
0.1109 
2.1913 
1.2962 

Particle Swarm Optimisation (PSO) 
Intra cluster (Mean) 
0.4036 
47.5000 
63.6901 
0.0778 
5.4996 
0.6521 
Std 
0.0735 

23.4802 
0.0271 
0.9212 
0.2131 

Inter cluster (Mean) 
4.6951 
1064.0085 
947.5021 
1.1611 
80.9664 
10.3074 

Std 
0.3145 
175.5770 
317.1571 
0.1670 
2.7881 
1.6285 

SMOTS 
Intra cluster (Mean) 
1.6401 
51.8271 
73.2543 


0.9101 
Std 

28.8180 
24.5846 


0.2901 

Inter cluster (Mean) 



1.1792 



Std 

375.9623 
89.275 
0.0012 

0.5470 
TMKGSO produced better results but SMOTS give zero standard deviation in iris data set with exact number of clusters. For vowel data set SMOTS produce very less value compare to previously published algorithms. In seed data set proposed algorithm give better results compared to previously published algorithms. For thyroid data set SMOTS produced minimum of intra cluster distance compare to other algorithms. ACDCSA produced better results in wine data set. Proposed algorithm perform better for
SMOTS produced maximum of inter cluster distance compare to previously published algorithms with exact number of clusters in iris data set. In vowel data set AHPSOM produce better results. In seed data set TMKGSO give better results and SMOTS produced second best result. SMOTS produced significant results compare to other algorithm in thyroid data set. For wine data set TMKGSO produced better result. In








AHPSOM 
Intracluster(Mean) 
0.644 
171.6 
 
 
 
 
(Std) 
1.17E16 

 
 
 
 

Intercluster (Mean) 
5.648 

 
 
 
 

(Std) 
1.939 

 
 
 
 

ACDCSA 
Intracluster(Mean) 
1.1642 
251.592 
2.1115 
32.1426 

 
(Std) 
0.4166 
47.6612 
0.3557 
3.8762 
0.0802 
 

Intercluster (Mean) 
4.9521 
2358.521 
7.3545 
47.7413 
1.6298 
 

(Std) 
1.4926 
472.9768 
2.3824 
12.9715 
0.6031 
 

TMKGSO 
Intracluster (Mean) 

 
3.0182 
 
28956.4 
0.1330 
Std 

 

 



Intercluster (Mean) 
3.2299 
 

 

0.0516 

Std 

 

 

0.0040 

Black Hole kMeans 
Intracluster (Mean) 
6.998 
 
 
 
48.954 
 
EOAKmeans 
Intracluster (Mean) 
33.8800 
 
 
 
 
 
Std 
41.2310 
 
 
 
 
 

SMOTS 
Intracluster(Mean) 
1.6401 



73.2543 

(Std) 

28.8179 
0.2901 

24.5846 
0.0118 

Intercluster (Mean) 

1303.349 
11.4064 

1402.352 


(Std) 

375.9623 
0.5470 

89.275 








DE 
Iris 
0.0196 
53.3381 
1.01005 to 1.08975 
< 0.0001 
HSS 
DE 
Vowel 
112.02 
0.37135 
185.179 to 268.379 
> 0.10 
NSS 
WOA 
Wine 
21.1766 
14.15 
256.842 to 342.581 
< 0.0001 
HSS 
GWO 
Ecoli 
0.0170 
23.797 
0.370181 to 0.439019 
< 0.0001 
HSS 
GA 
Thyroid 
0.5105 
4.0498 
1.03413 to 3.10127 
< 0.001 
HSS 
GA 
Seed 
0.3372 
3.1808 
0.390007 to 1.75539 
< 0.01 
HSS 
To validate the performance of the SMOTS algorithm, an unpaired ttest is performed on the obtained value of the inter cluster distance. An unpaired ttest is applied to find the best algorithm based on the mean inter cluster distance between the best and secondbest algorithms. To calculate Confidence Interval (CI), 20 data sizes and 95% confidence level are considered for both algorithms. The significance level of SMOTS compared to the secondbest algorithm can be predicted by the confidence interval and twotailed pvalue of the ttest. If P<=0.01 It shows HSS: Highly Statistically Significant results, If P< = 0.05, It shows SS: Statistically Significant results. If P > 0.10, it shows NSS: Not Statistically Significant results. From
In the proposed work to improve the balance between exploration and exploitation the local leader phase of SMO is updated with tabu search algorithm. Further, the compact separated index with Gaussian punishment function is used as a fitness function and applied to automatic clustering. To validate the proposed method, it is compared with 5 wellknown and 7 recently published algorithms. Results show that SMOTS algorithm produced 100%, 33.33%, 83.33% accurate results on cluster optimality, intra and inter cluster distance respectively in comparison with wellknown algorithms. In comparison with recently published algorithms on six data set SMOTS produced 50%, 66.66%, 50% accurate results on cluster optimality, intra and inter cluster distance respectively. The hypothesis testing results shows that pvalue of the ttest is less than 1% except vowel data set means SMOTS is highly statistical significant compare to second best algorithms. In future proposed algorithm can be modified with new search strategies, parallel computing, and multiobjective function for more accurate results. It may be applied to image segmentation, optimisation problems, medical datasets, etc.