• P-ISSN 0974-6846 E-ISSN 0974-5645

Indian Journal of Science and Technology

Article

Indian Journal of Science and Technology

Year: 2016, Volume: 9, Issue: 46, Pages: 1-10

Original Article

Proposed Discriminative Lexical Features for Real-time Detection of Malware Uniform Resource Locator

Abstract

Objective: To identify discriminative lexical features of malware URL through manual examination, and to study prevalence of these features thereby leading to proposition of discriminative lexical feature for real-time detection of malware URL. Methods/Statistical Analysis: Manual examination of malware URL using existing blacklist of malware URLs and empirical analysis allowed the authors to identify discriminative lexical features and to determine whether there is consistency in the way the attackers craft malware URLs respectively. Empirical analysis was carried on both the existing blacklisted malware URLs and newly collected malware URLs. Empirical analysis revealed that there is consistency in the way malware URLs is crafted by the attackers. To evaluate performance of our proposed lexical features, two previously used machine learning models were applied on our trained dataset of malware URLS and benign URLs. The essence of using these models is to enable us compare performance of our proposed lexical features with previous studies proposed feature groups. Our comparison shows that our proposed lexical features outperform previously proposed feature groups. Findings: Our first step was to manually examine blacklisted malware URLs. This step led to the identification of 12 discriminative lexical features which was later reduced to 11. The second step was an empirical analysis of the identified features of existing blacklisted malware URLs and newly collected malware URLs. Empirical analysis was carried out to determine whether there was consistency in the way malware URLs are crafted by the attackers. The results of our empirical analysis revealed that there is indeed consistency in the way malware URLs are crafted by the attackers. This implies that our carefully identified lexical features are common features of malware URL. After experimentation, the evaluation results reveal that our proposed lexical features outperform previously proposed feature groups. Applications/Improvements: Discriminative features are required to build real-time malware URLs detection system with machine learning algorithm. The proposed lexical features are set of discriminative feature that rely on textual properties of malware URL.

Keywords: Attackers, Blacklist, Lexical Features, Malware URL, Rea-time Malware URL Detection

DON'T MISS OUT!

Subscribe now for latest articles and news.