Indian Journal of Science and Technology
DOI: 10.17485/ijst/2015/v8i32/92127
Year: 2015, Volume: 8, Issue: 32, Pages: 1-12
Original Article
Adekunle Isiaka Obasa1,2*, Naomie Salim1 and Atif Khan1
1 Faculty of Computing, Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia;
2 Department of Computer Science, College of Science and Technology, Kaduna Polytechnic, P.M.B 2021, Kaduna, Nigeria
Background/Objective: A web forum is an online problem-solving community. Web forum research has focused on answer mining, under the assumption that the starting post is a question post. This paper proposes methods for mining standard web forum questions. Methods/Statistical Analysis: Popular methods for web forum question post detection are question marks, question words, higher n-grams and sequential pattern mining. These methods suffer from low detection rates and implementation complexity. This paper implements a hybridization of a simple bag-of-words model with web forum metadata and the simple rules of question marks and question words. Dimensionality reduction was performed using chi-square and wrapper techniques. Findings: The quality of web forum question posts varies from excellent to mediocre or even spam. Detecting good question posts is non-trivial and requires the use of salient features. Combining the simple rules of question marks and question words with forum metadata performed better than either alone. Integrating the bag-of-words model with the simple rules of question marks and question words and with forum metadata enhances question post detection. Dimensionality reduction using chi-square was found to perform better than other popular filters such as information gain, gain ratio and symmetrical uncertainty. Applications/Improvements: Three publicly available datasets of varying technical degrees were used for the experiments. The experimental results revealed that an enhanced bag-of-words model can perform better than complex techniques that implement higher n-grams with part-of-speech tagging.
Keywords: Bag-of-words, Forum Metadata, Web Forum, Question Detection, Dimensionality Reduction, Web Forum Question
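The hybrid feature set and chi-square filtering described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The question-word list, the `BOW_`/`RULE_`/`META_` feature prefixes, the function names and the optional `metadata` argument are all assumptions made for illustration, and the abstract does not say which forum metadata fields were used. The chi-square score is the standard statistic for a 2x2 contingency table of feature presence against class label.

```python
import re

# Hypothetical question-word list; the paper's actual list is not given in the abstract.
QUESTION_WORDS = {"what", "why", "how", "when", "where", "which", "who"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def featurize(text, metadata=None):
    """Build binary features: bag-of-words plus the two simple rules.

    `metadata` is an optional dict of already-binarized (0/1) forum
    metadata values; its keys are hypothetical placeholders.
    """
    tokens = tokenize(text)
    feats = {f"BOW_{t}": 1 for t in tokens}
    feats["RULE_has_question_mark"] = int("?" in text)
    feats["RULE_has_question_word"] = int(any(t in QUESTION_WORDS for t in tokens))
    for key, value in (metadata or {}).items():
        feats[f"META_{key}"] = value
    return feats


def chi_square(feature, rows, labels):
    """Chi-square statistic of a binary feature against a binary label."""
    a = b = c = d = 0  # cells of the 2x2 contingency table
    for feats, y in zip(rows, labels):
        present = feats.get(feature, 0) == 1
        if present and y == 1:
            a += 1
        elif present:
            b += 1
        elif y == 1:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # feature or label is constant, so it carries no signal
    return n * (a * d - b * c) ** 2 / denom


def select_top_k(rows, labels, k):
    """Keep the k features with the highest chi-square score (ties by name)."""
    vocab = sorted({f for feats in rows for f in feats})
    ranked = sorted(vocab, key=lambda f: (-chi_square(f, rows, labels), f))
    return ranked[:k]


# Toy posts labeled 1 (question) or 0 (non-question); invented for illustration.
posts = [
    ("How do I reset my router password?", 1),
    ("Why does my laptop overheat?", 1),
    ("Thanks, that fixed it.", 0),
    ("Selling my old phone, good condition.", 0),
]
rows = [featurize(text) for text, _ in posts]
labels = [label for _, label in posts]
selected = select_top_k(rows, labels, k=2)
```

The selected features would then be passed to a classifier. A wrapper technique, which the abstract also mentions, would instead score candidate feature subsets by the accuracy of that classifier.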