The contemporary business environment has become highly competitive and volatile, making the staffing of the right people more critical than ever before. However, most business firms have a limited budget to invest in Human Resources (HR) strategies and functions such as staffing. Staffing is an important function of HR units that aims to select the most competent and qualified applicants for job vacancies. Furthermore, the shrinking job market has led to a higher volume of job seekers, which has increased the complexity of the staffing function. This function requires precise assessment and selection procedures to produce a shortlist of candidates by matching the job requirements (experience, skills, knowledge, qualifications, etc.) with each applicant's profile.
In the staffing process, shortlisting a small number of applicant profiles can be easy; the task becomes much harder when assessing a large number of profiles through their written Curricula Vitae (CVs). Filtering a huge number of CVs is thus a time and cost challenge; furthermore, it needs to be objective and free of prejudice.
This study attempts to enhance the performance of the HR staffing function by providing an intelligent approach that allows convenient assessment and selection procedures. This approach aims to handle the contradictions between panel members and the uncertainty involved in staffing decisions. The proposed approach mainly uses Data Mining (DM) and Machine Learning (ML) to develop and train an intelligent framework by learning the behavior of the staffing panel members in assessing and selecting applicants against specific job requirements. It utilizes fuzzy logic to mitigate decision uncertainty and provide an objective mechanism for ranking the applicants' profiles for the next selection phase.
According to Han et al. (2011), DM is a field of computer science involving AI and ML that refers to extracting knowledge from huge amounts of raw data, where interesting novel patterns can be extracted from rules found in the data. DM algorithms deal only with structured data (databases).
Text categorization is the process of classifying documents into predetermined labels (Ghosh, Roy, & Bandyopadhyay, 2012), while text clustering is the process of dividing a set of documents into similar groups. Unlike text categorization, text clustering analyzes the documents without knowing their labels; text categorization is therefore a supervised technique, while text clustering is an unsupervised one.
There is often confusion between the two fundamental concepts of Information Retrieval (IR) and Information Extraction (IE). IE is the process of automatically detecting and obtaining predefined information from unstructured data and converting it into structured data.
Jayaraj and Mahalakshmi (2015) proposed two new algorithms. The first, called Information Retrieval Configuration File (IRCF), was utilized to extract the required information from resumes. This algorithm has several steps: identifying the type of document (Word, PDF, Excel); creating the configuration file; and selecting the document reader to read the resume line by line. The problem with this algorithm is that for every condition the document reader must re-read the configuration file to extract the required information.
The second algorithm, called weighted ranking, was employed to rank the resumes based on the candidate's education level, technical skills, general skills, experience and age. Actual Resume Relevancy (ARR) was used to evaluate the performance of the IR system, where ARR gave results of 69.6%. This study did not provide any performance measurement for the information extraction process, such as precision or recall. Moreover, the approach assumes that the total years of experience are stated in the resume; in some resumes the candidate does not write this, so the candidate's total years of experience must be calculated.
Yu, Guan and Zhou (2005) proposed a hybrid model to extract information from Chinese resumes. The model has two steps: the first divides resumes into several sections (blocks) using a Hidden Markov Model (HMM). The second is the IE step, where the HMM was used to extract educational information such as graduate school, degree and major, while Support Vector Machines (SVM) were used to extract personal information such as name, birthday, address, phone, mobile and email. The experimental results, carried out on 1200 Chinese resumes, showed a precision of 86% and recall of 76% for personal information, and a precision of 70% and recall of 76% for educational information.
Kopparapu (2010) proposed a system to extract information from resumes automatically, along with a search engine for resumes. A regular expression technique was used to extract six fields: qualification, experience, skills, age, name and email. The experimental results, carried out on 100 resumes, showed a precision of 87% and recall of 71%.
Jiang, Zhang, Xiao and Lin (2009) proposed a model to extract 18 different pieces of information from Chinese resumes, where resumes were divided into two main blocks: basic information and complex information. Basic information includes name, gender, birthday, address, mobile phone and email, while complex information includes education, work experience, project experience, awards and skills. A regular expression technique was used to extract the basic information with an accuracy of 87%, while SVM combined with regular expressions was used to extract the complex information with an accuracy of 81%. The accuracy of the whole model was 84%.
Chuang, Ming, Guang, Bo, and Zhiqing (2009) proposed a system to extract information from Chinese resumes. The system has three modules: a segmentation module, an information extraction module, and a feedback control module. It divides resumes into small classes using a segmentation algorithm to prevent information overlapping; then SVM and regular expressions were used to extract the required information from each class. The experimental results, carried out on 5000 resumes (2000 used as training data and 3000 as a test sample), showed an accuracy of 84.65%. It should be noted that the system repeats the previous steps to extract the required information from each class, which takes considerable time.
The staffing process has become more important in recent decades as the business environment has become volatile and highly competitive, which has made HR more valuable to organizations. The processes of assessing and selecting candidates for a job position are the key elements of staffing decisions: they produce the shortlisted candidates by matching their profiles with the job requirements, such as experience, skills, knowledge and qualifications. However, performing this task with a huge number of CVs raises time, cost and objectivity issues that need to be solved by innovative solutions based on intelligent information systems. Unfortunately, HR units still lack such a system to manage the staffing process, save time and cost, and avoid the panel members' contradictions, uncertainty, and subjective decisions. Therefore, this paper fills this technological gap and provides an intelligent approach to enhance the performance of the HR staffing function.
This study aims to improve the performance of the HR staffing function by providing convenient assessment and selection procedures. It adopts a fuzzy-based adaptive intelligent framework for shortlisting candidates based on job specifications. This requires satisfying the following objectives:
Exploring the current methods, techniques, tools and issues in the assessment and selection process.
Providing a labeled dataset, created for this purpose, that could be used in the future by other researchers interested in the same field.
Proposing a fuzzy-based intelligent adaptive approach to improve the assessment and selection of the right candidates and the way of filtering the CVs in the organization.
Evaluating the accuracy of the proposed approach.
This research is carried out to answer the following questions:
What are the current solutions used for supporting the process of assessment and selection in staffing?
How can an intelligent fuzzy-based framework enhance the assessment and selection process?
What is the performance accuracy of the proposed framework?
The scope of the current research paper is limited to the following:
It uses open-source and freely available technologies.
It focuses on improving the assessment and selection processes for staffing by smartly filtering the CVs in an organization.
It handles CVs written in the English language only.
In order to achieve the aim and objectives of the research, a multi-step methodology has been applied. It exploits the ML approach, where the proposed model is iteratively developed and trained. The methodology involves five steps: problem identification and definition; proposing the approach to address the research problem; collecting data; training and validating the proposed approach; and evaluating its accuracy. The proposed framework is an adaptive intelligent recommendation approach consisting of two main levels, as shown in
A job description usually contains different job categories, which may be given different weights of importance by the panel members and for the job itself. The assessment panel members define the job categories and their weights out of the full mark for the job, as per the appendix. The unified weighting mechanism proposed in this framework resolves this potential contradiction by taking the average of the weights given for each job specification.
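The averaging mechanism can be sketched as follows; the category names and weights are hypothetical examples, not taken from the study's appendix:

```python
# Sketch of the unified weighting mechanism: the final weight of each
# job category is the average of the weights assigned by the panel
# members. Category names and weights below are hypothetical.

def unify_weights(panel_weights):
    """panel_weights: list of dicts, one per panel member,
    mapping job category -> weight out of the full mark."""
    categories = panel_weights[0].keys()
    return {
        cat: sum(w[cat] for w in panel_weights) / len(panel_weights)
        for cat in categories
    }

panel = [
    {"experience": 40, "skills": 30, "qualifications": 30},
    {"experience": 50, "skills": 25, "qualifications": 25},
    {"experience": 45, "skills": 35, "qualifications": 20},
]
print(unify_weights(panel))
# {'experience': 45.0, 'skills': 30.0, 'qualifications': 25.0}
```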
As the CVs are scored by different people, there is a potential contradiction between the members' scores for each CV. In order to handle this contradiction, a fuzzy-based contradiction-handling mechanism was developed as follows:
Each assessment panel member is asked to provide a score for each applicant for each job category.
For each panel member, the scores are linguistically labeled using fuzzy sets as shown in
The linguistic labels of all staffing committee members for each candidate are evaluated against a common-sense fuzzy rule set to identify the final label (L, M, H) for each candidate for each job category. The rule set is shown in
E1* | E2** | E3*** | → | Final decision
L | L | L | → | L
L | L | M | → | L
L | L | H | → | M/L
L | M | L | → | L
L | H | L | → | M/L
L | M | M | → | M
L | H | H | → | H
L | M | H | → | M
L | H | M | → | M
M | M | M | → | M
M | M | L | → | M
M | M | H | → | M
M | L | M | → | M
M | H | M | → | M/H
M | L | L | → | L
M | H | H | → | H
M | H | L | → | M/H
M | L | H | → | M/H
H | H | H | → | H
H | H | L | → | H
H | H | M | → | H
H | L | H | → | H
H | M | H | → | H
H | M | L | → | M/H
H | L | M | → | M/H
H | M | M | → | M
H | L | L | → | M/L
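The common-sense rule set above can be encoded directly as a lookup table keyed by the three panel members' labels; a minimal sketch (M/L and M/H are the borderline outcomes listed in the rules):

```python
# The 27 common-sense rules, keyed by the panel members' labels
# (E1, E2, E3) and mapping to the final linguistic label.

RULES = {
    ("L","L","L"): "L",   ("L","L","M"): "L",   ("L","L","H"): "M/L",
    ("L","M","L"): "L",   ("L","H","L"): "M/L", ("L","M","M"): "M",
    ("L","H","H"): "H",   ("L","M","H"): "M",   ("L","H","M"): "M",
    ("M","M","M"): "M",   ("M","M","L"): "M",   ("M","M","H"): "M",
    ("M","L","M"): "M",   ("M","H","M"): "M/H", ("M","L","L"): "L",
    ("M","H","H"): "H",   ("M","H","L"): "M/H", ("M","L","H"): "M/H",
    ("H","H","H"): "H",   ("H","H","L"): "H",   ("H","H","M"): "H",
    ("H","L","H"): "H",   ("H","M","H"): "H",   ("H","M","L"): "M/H",
    ("H","L","M"): "M/H", ("H","M","M"): "M",   ("H","L","L"): "M/L",
}

def final_label(e1, e2, e3):
    """Resolve three panel labels into the final decision label."""
    return RULES[(e1, e2, e3)]

print(final_label("L", "H", "H"))  # H
```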
These linguistic labels represent the candidates' scores and the final judgment of the panel members. The proposed framework filters the dataset of CVs to obtain the CVs nominated for an interview. In the dataset, each candidate is represented as a set of weighted terms that represent their strength compared with other candidates. More precisely, the following steps were followed to develop the dataset:
A set of job descriptions JR, which determine the job requirements, was selected.
A set of applicants A was selected, and the following steps were applied for each applicant An ∈ A.
After identifying the sets ΩAn in step 3, the CV information in the dataset was pre-processed and transformed through a weighting system that defines the strength of each candidate's CV.
The frequency measures of each candidate's terms were calculated based on each set ΩAn and used as inputs to a fuzzy system: first status (place of residence), second status (years of experience), third status (knowledge and competencies), and fourth status (skills and abilities). These frequency measures were used to calculate the term frequency across the CV collection.
The crisp values of the four input variables (first status, second status, third status and fourth status) were mapped to sets of predefined fuzzy sets with three linguistic labels (High, Moderate, Low), as per
The center-of-gravity membership method was used to fuzzify each value and determine its linguistic label.
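The fuzzification step above can be sketched with triangular membership functions; the breakpoints below are illustrative assumptions, not the study's actual fuzzy sets:

```python
# Minimal fuzzification sketch. The triangular breakpoints are
# illustrative assumptions; the study's actual fuzzy sets are defined
# in its referenced figure.

def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(x):
    """Map a normalised score in [0, 1] to its strongest linguistic label."""
    memberships = {
        "Low":      tri(x, -0.01, 0.0, 0.5),
        "Moderate": tri(x, 0.0, 0.5, 1.0),
        "High":     tri(x, 0.5, 1.0, 1.01),
    }
    return max(memberships.items(), key=lambda kv: kv[1])

print(fuzzify(0.8))  # the label with the highest membership degree
```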
Some job categories are built on numeric values, such as "years of experience", for which matching a score is easy. In other job categories, however, the accepted value is text, which requires retrieving and matching keywords.
In order to retrieve information from a candidate's CV, the framework proposes the following six steps, performed by the system, to retrieve the needed information and save it in a separate dataset:
Scan the database to check the presence of the skills and abilities of information fields.
Mark the answer according to the number of specified words appearing in the applicant's answer.
Store a set of keywords as defined by the recruiter on the database.
Retrieve the stored keywords from the database and set them in a list.
Find the common words between the two lists or arrays using the intersect function.
Save results in a database.
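The keyword-matching core of the six steps above can be sketched, in simplified form, as a set intersection; the stored keywords and the answer text are hypothetical:

```python
# Sketch of the keyword-matching step: the recruiter's stored keywords
# are intersected with the terms found in the applicant's answer.
# The keywords and answer below are hypothetical.

stored_keywords = {"python", "sql", "machine learning", "java"}

def match_keywords(answer_text, keywords):
    """Return the stored keywords that appear in the applicant's answer."""
    text = answer_text.lower()
    return {kw for kw in keywords if kw in text}

answer = "Experienced in Python and SQL with basic machine learning."
print(match_keywords(answer, stored_keywords))
```

In a full implementation the matched set would then be saved back to the database, as step six describes.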
Let’s say that
The input and output values were mapped to predefined fuzzy sets with the linguistic labels "Low", "Moderate" and "High", based on the Wang–Mendel method described in (Wu, Mendel, and Joo, 2010). As shown in
The outcome of this step was a set of antecedents and consequents, also called 'if-then' fuzzy rules, where each input is represented by its associated linguistic label {"Low", "Moderate", "High"} and each output by its associated linguistic label, as shown in
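The rule-extraction behaviour described here can be sketched with a compact Wang–Mendel-style procedure: each training sample generates one if-then rule from the max-membership label of every variable, and conflicting rules sharing an antecedent are resolved by keeping the one with the highest degree. The fuzzy sets and sample data below are illustrative assumptions:

```python
# Compact Wang-Mendel-style rule extraction sketch. The triangular
# fuzzy sets and the two training samples are illustrative assumptions.

def tri(x, a, b, c):
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

SETS = {"Low": (-0.01, 0.0, 0.5), "Moderate": (0.0, 0.5, 1.0), "High": (0.5, 1.0, 1.01)}

def best_label(x):
    """Label of the fuzzy set in which x has the highest membership."""
    return max(SETS, key=lambda lbl: tri(x, *SETS[lbl]))

def wang_mendel(samples):
    """samples: list of (inputs, output) with values in [0, 1]."""
    rules = {}  # antecedent labels -> (consequent label, degree)
    for inputs, output in samples:
        ante = tuple(best_label(x) for x in inputs)
        cons = best_label(output)
        degree = 1.0
        for x, lbl in zip(inputs, ante):
            degree *= tri(x, *SETS[lbl])
        degree *= tri(output, *SETS[cons])
        # keep only the strongest rule for each antecedent pattern
        if ante not in rules or degree > rules[ante][1]:
            rules[ante] = (cons, degree)
    return rules

data = [((0.9, 0.8, 0.7, 0.9), 0.85), ((0.1, 0.2, 0.1, 0.3), 0.2)]
for ante, (cons, deg) in wang_mendel(data).items():
    print(ante, "->", cons)
```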
A k-fold (k = 5) training and validation method was applied in order to select the rule set that achieves the highest accuracy in classifying the relevance between the categories of job requirements and the applicants' answers. This method includes the following substeps:
Dataset partitioning: the fuzzy rule set resulting from 3.3.2.1 "Fuzzy rule extraction" was partitioned into five equal-sized folds, and for each fold steps 2, 3 and 4 were carried out.
Training rules set selection: The training rules set was selected by holding out the current fold (
Compression of Fuzzy Rules: A rule compression was performed on the fuzzy rules in the training set
The confidence of a rule is a measure of its validity, representing the strength of a unique rule pattern against contradictory rule patterns.
Calculation of scaled rule weights: in this step, the unique rule patterns resulting from the previous step were weighted by calculating a scaled fuzzy weight for each pattern, computed as the product of the rule's scaled fuzzy support and its scaled confidence, as shown in equation 4.
The scaled fuzzy weight
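A sketch of this weighting computation, with hypothetical support and confidence values (the scaling itself is assumed to have been applied already):

```python
# Sketch of equation 4: a rule's scaled fuzzy weight is the product of
# its scaled fuzzy support and its scaled confidence. Values below are
# hypothetical.

def confidence(pattern_support, antecedent_support):
    """Strength of a rule pattern against contradictory patterns
    sharing the same antecedent."""
    return pattern_support / antecedent_support

def scaled_weight(scaled_support, scaled_confidence):
    """Equation 4: scaled fuzzy weight = scaled support * scaled confidence."""
    return scaled_support * scaled_confidence

print(scaled_weight(0.6, confidence(0.6, 0.8)))
```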
The developed fuzzy rule-based system was validated using a well-known validation method called k-fold cross-validation (
In order to train and validate the fuzzy rule-based system, a 5-fold cross-validation was applied. The dataset was partitioned into five folds, each representing 20% of the dataset. At each iteration, one of these folds was held out and the system was trained on the remaining folds (80% of the dataset) to extract a set of weighted fuzzy rules. The resulting fuzzy rules were used to build a fuzzy system that classifies the relevancy of each instance in the held-out data. The resulting relevancy classifications were compared with the associated linguistic labels of the relevancy values predicted by the linear predictive model.
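The 5-fold procedure can be sketched as follows; train() and classify() are hypothetical stand-ins for the fuzzy rule extraction and classification steps:

```python
# Sketch of 5-fold cross-validation: split the data into five folds,
# hold each fold out once, train on the rest, and average the per-fold
# accuracies. The demonstration data and constant classifier at the
# bottom are hypothetical.

import random

def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data, train, classify, k=5):
    """data: list of (features, label); returns the mean fold accuracy."""
    folds = k_fold_indices(len(data), k)
    accuracies = []
    for i in range(k):
        hold_out = [data[j] for j in folds[i]]
        training = [data[j] for f in folds[:i] + folds[i + 1:] for j in f]
        model = train(training)
        correct = sum(classify(model, x) == y for x, y in hold_out)
        accuracies.append(correct / len(hold_out))
    return sum(accuracies) / k

# trivial demonstration with a constant classifier (hypothetical data)
data = [(i, "H") for i in range(10)]
acc = cross_validate(data, lambda tr: "H", lambda model, x: model)
print(acc)
```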
In this method a dataset
The resulting relevancy classifications were also compared with the associated linguistic labels of the applicants' predicted priorities from the linear predictive model, as shown in
The resulting fuzzy-based approach includes 76 rules to be applied during the testing phase. The dataset contained 414 CVs, i.e. roughly 83 CVs per fold; the process was repeated five times for the five different folds, and the classifier's accuracy was calculated for each fold.
Fold K | Accuracy
1 | 71%
2 | 84%
3 | 79%
4 | 73%
5 | 81%
Average | 77.6%
Results in
The proposed framework is an adaptive fuzzy-based intelligent system that proves its ability to filter out the best-fit candidates using ML approaches. It improves the HR staffing task of assessing and selecting the best-fit candidates by using a dynamic weighting schema to enhance classification accuracy. It consists of five components: unifying the weights for each job category, extracting information from the dataset, applying a text mining technique, extracting fuzzy rules, and selecting the best fuzzy rule set. In comparison with approaches presented by other researchers, this work presents a fuzzy-based approach, while approaches such as those of Jayaraj and Mahalakshmi, Yu, Guan and Zhou, and Saxena adopted other techniques such as NLP, SVM, IRCF, ARR and LP. The proposed framework's accuracy reached 77.6%, while a significant number of previous frameworks reported accuracies between 71% and 87%. Those works generally addressed how to extract information from CVs using various techniques; this research uniquely extends to evaluating that information and its usefulness for HR, especially in assessment and selection for staffing.
For future work, this research opens up opportunities to conduct new studies covering more job descriptions, as this study was designed around the job descriptions of administration, accountant, and IT positions.