Using NLP on Job Descriptions

in STEMGeeks, 3 years ago

Rewind

In the previous post, I explained how job descriptions were scraped from a certain website. After collecting and cleaning the data, several NLP models were developed to detect skills, knowledge, minimum experience and levels (degree) in job descriptions.

The models

Manual annotation model

After some research, my group and I decided to go with a NER (named entity recognition) model from the spaCy library. To train the model we had to annotate the data by hand, using a free annotation tool called doccano. We manually labelled around 200 job descriptions.
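The post does not show how the annotations were passed to spaCy, so the following is only a minimal sketch. It assumes the annotations were exported as JSONL records with a `text` field and a `label` field holding `[start, end, label]` triples (field names vary between doccano versions), and it converts them into the `(text, {"entities": [...]})` tuples commonly used to build spaCy NER training examples. The label names `SKILL` and `EXPERIENCE` and the function name are hypothetical.

```python
import json


def doccano_to_spacy(jsonl_lines):
    """Convert doccano-style JSONL records to (text, {"entities": [...]}) tuples.

    Assumption: each record has a "text" field and a "label" field containing
    [start, end, label] triples with character offsets.
    """
    examples = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        record = json.loads(line)
        entities = [
            (start, end, label) for start, end, label in record.get("label", [])
        ]
        examples.append((record["text"], {"entities": entities}))
    return examples


# Hypothetical record, not taken from the original dataset.
sample_lines = [
    '{"text": "3 years of experience with Python", '
    '"label": [[0, 7, "EXPERIENCE"], [27, 33, "SKILL"]]}'
]

for text, annotations in doccano_to_spacy(sample_lines):
    for start, end, label in annotations["entities"]:
        print(label, "->", text[start:end])
```

Keeping the character offsets intact matters here, because spaCy aligns them with token boundaries when the training examples are created.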

Automated labelling model

This model can't differentiate between the labels (skills, knowledge, minimum experience and levels), because it only takes skills from a dictionary and matches them against the job descriptions.
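As a rough illustration of dictionary-based labelling (not the code used in the project), the sketch below scans a job description for every term in a skill list and emits character-offset spans that could serve as automatically generated annotations. The skill list, the `SKILL` label and the function name `auto_label` are assumptions made for the example.

```python
import re


def auto_label(text, skills, label="SKILL"):
    """Return sorted (start, end, label) spans for each dictionary term in text."""
    spans = []
    for skill in skills:
        # Lookarounds act like word boundaries but also work for terms
        # ending in non-word characters, such as "C++".
        pattern = r"(?<!\w)" + re.escape(skill) + r"(?!\w)"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


# Hypothetical inputs for illustration only.
description = "Looking for a Python developer with SQL and python scripting skills."
skill_dictionary = ["Python", "SQL"]

for start, end, label in auto_label(description, skill_dictionary):
    print(label, "->", description[start:end])
```

Every span receives the same label, which is exactly why this approach cannot tell skills apart from knowledge, experience or levels.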

Entity Ruler

Very similar to the automated labelling model. The only difference is that nothing is trained: it simply looks for the dictionary words in the job descriptions, without taking into account the position of the word in the sentence.
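A minimal sketch of this idea with spaCy v3's `entity_ruler` component is shown here. It assumes a blank English pipeline (no trained components) and uses two hypothetical patterns, since the actual skill dictionary is not included in the post.

```python
import spacy

# Hypothetical patterns; the project's real dictionary is not shown in the post.
patterns = [
    {"label": "SKILL", "pattern": "Python"},  # exact phrase match
    {"label": "SKILL", "pattern": [{"LOWER": "machine"}, {"LOWER": "learning"}]},
]

nlp = spacy.blank("en")  # tokenizer only, nothing is trained
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(patterns)


def find_skills(text):
    """Return (entity text, label) pairs found by the rule-based matcher."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


print(find_skills("We need Python and Machine Learning experience."))
```

Because the ruler only matches token patterns, it finds a term wherever it appears, regardless of its role in the sentence.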

Examples

Let's focus on the manual annotation model, since it is the one that lets us see the different labels.

Blockchain Developer

image.png

Web Developer

image.png

Data Scientist

image.png

Final Thoughts

More annotated data would definitely improve the model: some entities are clearly still missing in the examples above.
These models can be applied to several use cases, such as helping HR find the right candidate or helping job seekers find their perfect match. It's up to your imagination!

You can find the code on my GitHub:

https://github.com/macrodrigues
