ABSTRACT
In today's globally connected and competitive marketplace, propelled by the explosion of digital information, Natural Language Processing can help the Human Resources department streamline the recruitment process and focus on the most promising candidates. HR or People Analytics applies sophisticated data science and machine learning to help organizations manage their people practically and efficiently. The flood of resumes can be onerous and time-consuming, and the task of sorting through the pile can be downright tedious for small-business owners and even for large organizations with robust, fully-fledged human resources departments.
Natural Language Processing (NLP) takes text analysis to a much higher level of detail, correctness, granularity, and reliability. It evolves from generic text analytics (via sentiment analysis) to advanced insights (via computational linguistics models) and can even include potential semi-automation. Once the information is extracted, a ranking score is calculated based on the matching skill sets. The score describes how well each candidate fits based on education, work experience, skills, and other requirements.
Natural Language Processing and Artificial Intelligence systems cannot replace HR; rather, they further empower HR personnel within their organization by enabling far better data-driven decisions. Existing tools that equip recruiters with artificial intelligence solutions include Censia (People Intelligence for the Modern Talent Acquisition World) and Mya (an AI recruiting assistant). However, these tools need intensive training with huge amounts of data. The proposed system tries to bridge the domain-knowledge gap that HR personnel have in the field for which they are recruiting.
INTRODUCTION
People analytics, also known as HR analytics or talent analytics, refers to the method of analytics that helps organizations make strategic decisions about their employees or workforce.
People analytics aims at collecting, managing, and leveraging data with predictive analytics to create actionable insights. It helps managers and executives make smarter, more strategic, and better-informed talent decisions about their workforce. With people analytics, organizations can find better applicants, make smarter hiring decisions, and increase employee performance and retention. People Analytics applies sophisticated data science and machine learning to help organizations manage their people more efficiently and effectively. HR teams have a lot of catching up to do in leveraging people analytics: deciding what data to track, analyze, manage, and protect, and how to gain insights that predict future behavior.
LITERATURE REVIEW
“When digital transformation is done right, it’s like a caterpillar turning into a butterfly, but when done wrong, all you have is a really fast caterpillar.” – George Westerman
Research in the fields of Natural Language Processing and People Analytics has been increasingly addressed in recent years. The literature survey culminates in the finding that HR underutilizes data analytics in workforce planning, which is a very active area of research and development. Organizations are sitting on enormous amounts of rich employee data, yet it remains heavily underutilized. Most companies lack good analytics at their disposal to make sense of it and draw insight from it. The absence of the right analytical resources and data is often cited as the reason for slow adoption of data-driven strategic techniques for workforce planning. Data analytics can help HR become more objective, prudent, tactical, and strategic.
A principal reason for the underutilization of these data-driven techniques is that most machine learning models cannot process raw text; doing so requires intensive use of Natural Language Processing techniques. Natural language processing is considered a difficult problem in computer science due to the ambiguous and imprecise nature of natural language.
NLP can reduce time-to-hire by automating pre-qualification, using text analytics to provide advanced insights on employees or prospective employees. It can also help organizations strategically allocate employees to a particular project.
It is not just a revolutionary change but also an evolutionary phenomenon. In addition to digitization, the transition to a business with technology analytics at its center also entails a shift in the culture and mindset of an organization toward a data-driven approach.
RESUME CLASSIFICATION USING TEXT CLASSIFICATION
HR has long used Boolean keyword searches to identify good resumes or job applications, which often yields imprecise, unpredictable, and sometimes humorous results. The level of detail, granularity, and accuracy in existing systems is low, and there has been little focus so far on applying advanced NLP to HR processes.
Like early machine translation, the sorting algorithms these platforms use are quite ineffective because the text processing needs to operate at a higher level than morphological analysis. Say an applicant uses "Supervised Machine Learning" in his resume while the HR personnel filters candidates on "Data Mining": his resume would not show up in the results, and the chances are high that a creative candidate with more specialized domain knowledge gets overlooked. Existing tools that equip recruiters with artificial intelligence solutions include Censia (People Intelligence for the Modern Talent Acquisition World) and Mya (an AI recruiting assistant). However, these tools require intensive training on an existing dataset and do not try to bridge the domain-knowledge gap that HR personnel have with respect to the field for which they are recruiting.
Natural Language Processing (NLP) takes text analysis to a much higher level of detail, granularity, scalability, and accuracy. Most HR business engagement generates high volumes of natural language and raw text; this unstructured data can be utilized to gain valuable insights.
With the help of resume filtering, recruiters can get deeper insights from candidate profiles without domain knowledge of the field in which they are recruiting. They can validate why and how the application ranked each candidate and can even alter the ranking as per their needs. Natural Language Processing and Artificial Intelligence systems cannot replace HR; rather, they further empower HR personnel within their organization by enabling better data-driven decisions.
METHODS AND APPROACHES
Today's systems move beyond exact keyword matching and simple search techniques. Extrapolating new terms keeps qualified candidates from being at a disadvantage. The proposed system tries to bridge the domain-knowledge gap that HR personnel have with respect to the field for which they are recruiting.
We focus on building a resume classifier that classifies or filters resumes suitable for a job profile using semantic Natural Language Processing. The main focus is the automated or semi-automated building of a knowledge base that searches the internet and collects the terminologies and skill sets related to the query given by the HR personnel.
The goal of the system is to enable HR managers to design recruitment plans accurately, so that the investment made is fully directed at recruiting the right person for the right position. If an unsuitable candidate is recruited, it ends up costing the organization more money, as the whole process must be repeated, wasting the organization's time and resources. Hence HR or People Analytics acts as a strategic and prudent element of recruitment and selection in any organization. It is essential for People Analytics to take a leap forward by combining precise data from the real and the virtual world. People Analytics is poised for transformation and revolution, with the exponential growth of data as the catalyst.
The proposed system uses People Analytics to leverage talent strategically to meet business objectives. Several market dynamics drive the need for such an automated, data-driven model, including increased competition for talent and decreased employee loyalty.
DATASET PREPARATION USING WEB SCRAPING
The resumes to be analysed and classified are taken from LinkedIn and other sources. The knowledge base is created via web scraping of websites such as Wikipedia. Web Scraping (also termed Screen Scraping, Web Data Extraction, or Web Harvesting) is a technique employed to load and extract large amounts of data from websites based on the requirements, whereby the data is extracted and saved to a local file or to a database in tabular (spreadsheet) format. A minimal scraping sketch follows the list of libraries below.
Modules/Libraries used
- Beautiful Soup 4: Beautiful Soup (BS4) is a parsing library that can use different parsers. A parser is simply a program that can extract data from HTML and XML documents.
- LXML: the most feature-rich, high-performance, production-quality library for parsing XML and HTML documents very quickly, even handling complex tags in the process.
- Requests and Selenium: for fetching HTML/XML from web pages with speed and readability. Beautiful Soup or LXML can then be used to parse the result into useful data.
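As an illustration, the sketch below fetches a single Wikipedia page with Requests and collects candidate terminology from its article links using Beautiful Soup; the URL and the "mw-content-text" selector are assumptions for demonstration, not a prescription of the system's actual crawl logic.

```python
# Minimal scraping sketch: Requests fetches the page, Beautiful Soup (with the
# LXML parser backend) extracts candidate skill terms from article links.
# The URL and the "mw-content-text" selector are illustrative assumptions.
import requests
from bs4 import BeautifulSoup

def scrape_related_terms(url):
    """Return the set of link texts found in a Wikipedia article body."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "lxml")
    body = soup.find("div", id="mw-content-text")
    return {a.get_text(strip=True) for a in body.find_all("a") if a.get_text(strip=True)}

if __name__ == "__main__":
    terms = scrape_related_terms("https://en.wikipedia.org/wiki/Machine_learning")
    print(sorted(terms)[:20])  # first few candidate terms for the knowledge base
```

In practice, one such crawl per HR query can seed the knowledge base with related terminology, which is then cleaned and stored locally.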
FEATURE EXTRACTION
The next step is feature engineering, wherein raw text data is transformed into feature vectors and new features are created from the existing dataset, since most machine learning algorithms cannot take in raw text. Instead, we need to perform feature extraction and pass numerical features to the machine learning algorithm. Several approaches are used to obtain relevant features from the dataset.
Count Vectors as features: Count Vector is a matrix notation of the dataset in which every row represents a document from the corpus, every column represents a term from the corpus, and every cell represents the frequency count of a particular term in a particular document.
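A minimal count-vector sketch using scikit-learn's CountVectorizer; the toy resume snippets are illustrative stand-ins for the actual dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus standing in for resume texts (illustrative data only).
corpus = [
    "supervised machine learning and data mining",
    "java developer with spring experience",
    "data mining and statistical analysis",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(corpus)   # rows: documents, columns: terms
print(vectorizer.get_feature_names_out())   # the corpus vocabulary
print(counts.toarray())                     # each cell: raw term frequency
```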
The TF-IDF score represents the relative significance of a particular term in a document and in the entire corpus. The score is composed of two terms: the first computes the normalized Term Frequency (TF), the raw count of term t in document d divided by the document length; the second is the Inverse Document Frequency (IDF), which reduces the weight of terms that occur very frequently in the document set and scales up the weight of terms that occur rarely. IDF is computed as the logarithm of the number of documents in the corpus divided by the number of documents in which the specific term appears; it is a logarithmically scaled inverse fraction of the documents that contain the word.
TF(t) = (Number of times the specific term t appears in a document) / (Total number of terms in the document)
IDF(t) = log_e((Total number of documents in the document corpus) / (Number of documents containing the term t))
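The two formulas can be computed directly; the following sketch works through them on a toy three-document corpus (the documents are illustrative).

```python
import math

# Toy corpus: each document is a list of tokens (illustrative data only).
docs = [
    ["data", "mining", "and", "machine", "learning"],
    ["machine", "learning", "engineer"],
    ["java", "developer"],
]

def tf(term, doc):
    # Raw count of the term divided by the total number of terms in the document.
    return doc.count(term) / len(doc)

def idf(term, docs):
    # Natural log of (corpus size / number of documents containing the term).
    containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / containing)

print(tf("machine", docs[0]))                         # 1/5 = 0.2
print(idf("machine", docs))                           # ln(3/2) ≈ 0.405
print(tf("machine", docs[0]) * idf("machine", docs))  # TF-IDF ≈ 0.081
```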
TF-IDF vectors can be generated at distinct levels of input tokens (words, characters, n-grams, sentences). TF-IDF enables us to understand the context of words across the entire corpus of documents, instead of just their relative importance in a single document. The sketch following the list below illustrates each level:
- a. Word Level TF-IDF: Matrix representing TF-IDF scores of every term in various documents.
- b. N-gram Level TF-IDF: N-grams are combinations of contiguous sequences of N terms. Matrix representing TF-IDF scores of N-grams in the documents.
- c. Character Level TF-IDF: Matrix representing TF-IDF scores of character level n-grams in the corpus.
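A minimal sketch of all three levels with scikit-learn's TfidfVectorizer; the corpus and n-gram ranges are illustrative choices.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "supervised machine learning on resume data",
    "natural language processing for recruitment",
]

vectorizers = {
    # a. Word level: one column per term.
    "word": TfidfVectorizer(analyzer="word"),
    # b. N-gram level: one column per contiguous 2- or 3-word sequence.
    "ngram": TfidfVectorizer(analyzer="word", ngram_range=(2, 3)),
    # c. Character level: one column per character n-gram.
    "char": TfidfVectorizer(analyzer="char", ngram_range=(2, 3)),
}

for level, vec in vectorizers.items():
    matrix = vec.fit_transform(corpus)
    print(level, matrix.shape)  # (documents, features) at each level
```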
WORD EMBEDDINGS
A word embedding is a language-modelling and feature-learning technique that represents words and documents with dense vectors. The position of a word or phrase within the vector space is learned from text and is based on the words that surround it when it is used; words are mapped to vectors of real numbers. Word embeddings can be trained on the input corpus itself or generated from pre-trained word embeddings such as GloVe, fastText, and Word2Vec.
There are four essential steps to using pre-trained word embeddings, as sketched in the code after this list:
- Loading the pretrained word embeddings.
- Creating a tokenizer object.
- Transforming text documents to sequences of tokens, the basic building blocks of a document object. Everything that helps us understand the meaning of the text is derived from the tokens and their relationships with one another.
- Creating a mapping of tokens/words/phrases to their respective embeddings.
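A minimal sketch of the four steps, assuming a local copy of the GloVe 100-dimensional vectors (glove.6B.100d.txt) and the Keras preprocessing utilities; the file path, embedding dimension, and sequence length are assumptions.

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = ["machine learning engineer", "data mining specialist"]  # illustrative

# 1. Load the pretrained word embeddings (GloVe's plain-text format).
embeddings_index = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.split()
        embeddings_index[parts[0]] = np.asarray(parts[1:], dtype="float32")

# 2. Create a tokenizer object and fit it on the corpus.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)

# 3. Transform the documents into padded sequences of token ids.
sequences = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=10)

# 4. Map each token id to its pretrained embedding in an embedding matrix.
embedding_matrix = np.zeros((len(tokenizer.word_index) + 1, 100))
for word, i in tokenizer.word_index.items():
    vector = embeddings_index.get(word)
    if vector is not None:
        embedding_matrix[i] = vector
```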
JACCARD SIMILARITY
The Jaccard similarity index (sometimes called the Jaccard similarity coefficient) compares the members of two sets to see which members are shared and which are distinct. It is a measure of similarity for two sets of data, with a range from 0% to 100%; the higher the percentage, the more similar the two sets.
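A minimal implementation over skill sets; the example sets are illustrative.

```python
def jaccard_similarity(set_a, set_b):
    """Jaccard index: |A ∩ B| / |A ∪ B|, ranging from 0 to 1 (0% to 100%)."""
    if not set_a and not set_b:
        return 1.0  # convention: two empty sets are identical
    return len(set_a & set_b) / len(set_a | set_b)

# Illustrative use: overlap between knowledge-base skills and resume skills.
kb_skills = {"python", "data mining", "machine learning"}
resume_skills = {"python", "machine learning", "java"}
print(jaccard_similarity(kb_skills, resume_skills))  # 2 shared / 4 total = 0.5
```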
MODEL BUILDING
The final step in the text classification framework is to train a classifier using the feature vectors and new features created in the feature engineering step. There are many different choices of machine learning model for training the final model. The following classifiers are implemented for this purpose; a minimal training sketch follows the list:
- Naive Bayes: a classification technique based on Bayes' Theorem with an assumption of independence among predictors.
- Linear Classifier: Logistic regression
- Support Vector Machine (SVM): a large-margin supervised machine learning algorithm that determines the best decision boundary separating vectors that belong to a given category from those that do not.
- Bagging Models: text classification via a "bootstrap" procedure for model generation, i.e., training several models on resampled subsets of the data and aggregating their predictions, typically over Bag of Words features.
- Boosting Models: ensemble methods (meta-algorithms that combine several machine learning algorithms). Ensemble learning methods decrease variance or bias and improve the predictions of a given learning algorithm.
- Neural Networks: sets of algorithms that try to mimic the human brain and recognize patterns. They interpret sensory data through a form of machine perception, labeling or clustering raw input.
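A minimal training-and-comparison sketch with scikit-learn, using TF-IDF features from the previous step; the labelled snippets and job profiles are illustrative stand-ins for the actual resume dataset, with one representative model per family above.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Illustrative labelled data: resume snippets and their target job profiles.
texts = [
    "python machine learning pandas scikit-learn",
    "java spring hibernate microservices",
    "deep learning tensorflow nlp transformers",
    "spring boot java rest api",
]
labels = ["data_science", "java_dev", "data_science", "java_dev"]

features = TfidfVectorizer().fit_transform(texts)

# One model per family from the list above; a neural model could be added
# similarly (e.g. sklearn's MLPClassifier).
classifiers = {
    "naive_bayes": MultinomialNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": LinearSVC(),
    "bagging": RandomForestClassifier(n_estimators=100),
    "boosting": GradientBoostingClassifier(),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, features, labels, cv=2)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```

Cross-validated accuracy gives a like-for-like comparison across classifiers before the best one is selected as the final model.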
CONCLUSION
NLP is an active area of research that has not yet been used effectively in the HR domain. NLP- and ML-based transformation will help HR departments manage operations such as recruiting and workforce planning in a more strategic and efficient manner. NLP simplifies the task of HR personnel researching the domain for which they are recruiting. With the help of NLP and semantic analysis, an efficient knowledge base can be created and then used to classify resumes and applicants. The organization can thus ensure that the right candidate has been hired for the right position or job profile.