Everywhere you turn, the talent world is abuzz with the power of machine learning and artificial intelligence. From video interviews that read your facial gestures to recruiting platforms that scrape social information to determine candidate fit, the new technology is having a major impact on how companies source, hire, develop, manage, and engage their human capital.
Technology is amazing!
What is also amazing is how The Predictive Index (PI) has adapted and flourished through nearly every technological revolution over the past 60 years. Consider that when PI was founded, data was stored on punched paper, and computers were the size of houses (and, with their $120K price tags, cost about as much as one). For PI, that meant paper-and-pencil assessments, manual scoring, and scientific analyses limited to the statistical techniques that could be calculated by hand.
In recent years, PI has adapted to the internet, mobile, social, informal learning, and UX advances in technology, and now we’ve tackled machine learning. That’s right: this 60+ year company is still learning, adapting, and flourishing, and here’s how.
We started with a purpose
Our journey into machine learning started with the need to solve a practical problem. Specifically, PI has over 24 million Behavioral Assessment™ results, over 1 million Cognitive Assessment™ results, and over 640 thousand Job Assessment™ results in our database. Pretty impressive numbers! But one glaring challenge with all that data was that we lacked a clear organizing structure for it. We had no easy way to draw conclusions or tell stories about how the data differs by jobs and job functions.
Basically, the PI software had asked people to type in their job titles, which meant we had millions of cases of unstructured job-title text and no easy way to categorize it for the purposes of data analytics. Our favorite example was “Head of Thailand” which, despite our growing global presence, was not actually the assessment for the person who runs Thailand. And that was the problem—not only was the text unstructured, but people came up with some very distinctive ways to name their jobs.
“We have built the system to continuously learn based on the choices clients make.”
With this problem in hand, we set off on a mission to bring order to the job-naming chaos. First, we needed to identify a job taxonomy that would serve as the backbone of our categorization efforts. We considered building our own and researched various existing models, but we landed on O*NET because it is comprehensive and offers a lot of potential benefits, like critical skills, work activities, and even salary information linked to jobs.
The other major benefit of O*NET is that it is hierarchical: while there are over 800 specific jobs, each job rolls up to a broad group, which rolls up to a minor group, and finally to one of 23 major groups. This means that even when we don’t collect enough data to draw meaningful conclusions about a specific job, the broader categorizations still let us make informative generalizations. For example, we don’t expect to see many police officers, but across 6,000 clients we will likely see some police officers, some firefighters, and some corrections officers, so we will at least be able to say something about protective service jobs.
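To make that rollup concrete, here is a minimal sketch of aggregating sparse per-job counts up a hierarchy. The job codes and group names below are illustrative stand-ins, not the real O*NET tables:

```python
from collections import Counter

# Illustrative slice of a hierarchical taxonomy like O*NET's: each specific
# job code rolls up to a broad group, a minor group, and a major group.
# These codes and labels are examples for the sketch, not real O*NET data.
HIERARCHY = {
    "33-3051": {"broad": "33-3050", "minor": "33-3000", "major": "Protective Service"},
    "33-2011": {"broad": "33-2010", "minor": "33-2000", "major": "Protective Service"},
    "33-3012": {"broad": "33-3010", "minor": "33-3000", "major": "Protective Service"},
}

def rollup(assessment_counts, level="major"):
    """Aggregate sparse per-job counts to a broader level of the hierarchy."""
    counts = Counter()
    for job_code, n in assessment_counts.items():
        counts[HIERARCHY[job_code][level]] += n
    return counts

# A few police officers, firefighters, and corrections officers: too few to
# analyze individually, but enough when pooled into one major group.
sample = {"33-3051": 4, "33-2011": 3, "33-3012": 2}
print(rollup(sample))  # all nine results roll up to Protective Service
```

The same counts can be rolled up at the "broad" or "minor" level instead, trading specificity for sample size.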
“Our clients will continue to benefit from the toolset we provide to tackle the management challenges of today and tomorrow.”
Next, we hired a machine learning and natural language processing expert to guide the way. While we could articulate our ideal end results, we were absolutely clueless about how to get there. Our expert jumped in and started to guide the process. Some of the most important steps included:
- Identifying target criteria for matches: O*NET tables, including standard job titles and alternative titles, were combined to create a dictionary of target terms. These would become the criteria that would be used from the O*NET side for the match.
- Tokenizing clients’ job titles: Clients’ job titles were cleaned, expanded, trimmed, spell checked, lemmatized, stemmed, and checked against a stop list. This process standardized the words within the job titles as tokens—words or parts of words that could serve as relevant evidence for the client side of the match. It also dropped irrelevant information from a job title, such as a job’s location or a company’s internal job tracking number.
- Making the match: The tokens from the clients’ job titles were compared to the text associated with each O*NET job in the target criteria tables. Potential matches were generated based on the number of tokens that matched the data associated with each O*NET job. Parameters were set based on the desired minimum similarity and the desired number of matches to return to the user.
- Checking our work: Data validation work was done with empirical and subjective methods. Baseline accuracy rates were derived from other similar services. Models were adjusted, stop lists updated, and thousands of rows of output were reviewed to iteratively improve the algorithms until we met an acceptable threshold of accuracy for use in the field.
- Speeding it up: An additional process was built to respond to new job titles quickly through an API call, using a slimmed down version of the full matching process and leveraging prepped target criteria to find a match faster, thus allowing clients to see a list of matches immediately as they fill out job information in the software.
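The tokenize-and-match steps above can be sketched in miniature. This is a toy illustration, not PI’s actual implementation: the stop list, abbreviation table, and target dictionary are made up, and a real pipeline would add the spell checking, lemmatization, and stemming described above:

```python
import re

# Hypothetical cleanup tables for the sketch.
STOP_WORDS = {"of", "the", "sr", "jr", "ii", "iii", "req"}
ABBREVIATIONS = {"mgr": "manager", "eng": "engineer", "vp": "vice president"}

def tokenize(title):
    """Normalize a free-text job title into a set of comparable tokens."""
    words = re.findall(r"[a-z]+", title.lower())      # drops tracking numbers, punctuation
    words = [ABBREVIATIONS.get(w, w) for w in words]  # expands common abbreviations
    tokens = set()
    for w in words:
        if w not in STOP_WORDS:
            tokens.update(w.split())  # expansions like "vice president" yield two tokens
    return tokens

# Hypothetical target dictionary built from standard and alternative titles.
TARGETS = {
    "Sales Managers": tokenize("sales manager district sales manager regional sales manager"),
    "Software Developers": tokenize("software developer applications developer programmer"),
}

def match(raw_title, min_overlap=1, top_n=3):
    """Rank taxonomy jobs by how many tokens they share with the client's title."""
    tokens = tokenize(raw_title)
    scores = {job: len(tokens & target) for job, target in TARGETS.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(job, score) for job, score in ranked[:top_n] if score >= min_overlap]

print(match("Regional Sales Mgr - Req #20491"))  # [('Sales Managers', 3)]
```

Here “Regional Sales Mgr - Req #20491” loses its tracking number, expands “Mgr,” and matches three tokens against the sales-manager entry, so that job tops the candidate list returned to the user.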
We put it into action
Once the machine learning project was completed, we had a fancy new tool that analyzed unstructured job title text and provided clients with a list of the best matches out of the more than 800 jobs in the O*NET taxonomy. From there, clients choose the match that best fits their job, and this simple step allows us to build powerful benchmarks and data stories about the characteristics needed for certain jobs (using Job Assessment™ data), what typical applicants look like for different jobs (using Behavioral Assessment™ data), and the average Cognitive Assessment™ scores for different job functions.
Pumped for some more science?! Learn more about PI’s 60+ years of research and science behind what makes our assessments so powerful.
In a perfect world, we would have loved to build the tool to be even more accurate. In the current version, clients still need to take the step of choosing the job that is the best match, but this is still better than asking them to sort through 800+ jobs. We determined that we could improve the match by asking clients to upload job descriptions, but we also weighed that against the practicality of the request. How many clients would take this extra step? We decided that for our first run, we were close enough to our goals to go live.
“Whoever said you can’t teach an old dog new tricks is wrong.”
At this point, the machine learning and natural language processing solution is built into our software, and clients are now required to complete it, so that for every job-specific assessment (mostly hiring situations) we have job titles tied directly to the Job Assessment™, Behavioral Assessment™, and Cognitive Assessment™ data. As the year progresses, we are rapidly collecting benchmark data for all jobs that use our assessments, and it won’t be long before we have powerful stories to tell and, eventually, additional intelligence with which to arm our clients.
We learned a lot but have a lot further to go
Starting from scratch wasn’t easy, but the entire experience provided terrific learning for the science, product, and technology teams at PI. We had a problem that needed to be solved. We thoughtfully considered the end results that we needed and worked backward from there. Although we were responsible for deciding on the job taxonomy, the data-scientific mindset of our machine learning/natural language processing expert was critical, as he was able to help build processes to clean up the unstructured text, then experiment with different algorithmic models to get us closer to our desired results. And just because the system is live doesn’t mean we are done.
We have built the system to continuously learn based on the choices clients make. This means that as clients type in their unstructured titles and select from the list of recommendations, we will see what the most frequent choices are and start training the model to move them up on the list. We also know that the O*NET isn’t perfect, so we will also make small iterative improvements to the taxonomy in a way that makes the most sense to our clients.
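A continuous-learning loop like this can be sketched as frequency-based reranking: log which match clients actually pick for a given title, then promote the most frequently chosen candidates. The function names and data here are hypothetical; the production system is presumably more sophisticated:

```python
from collections import Counter

# Hypothetical selection log: (normalized raw title, chosen job) -> count.
selection_counts = Counter()

def record_choice(raw_title, chosen_job):
    """Log which taxonomy job a client picked for a raw title."""
    selection_counts[(raw_title.lower().strip(), chosen_job)] += 1

def rerank(raw_title, candidates):
    """Reorder candidate matches so frequently chosen jobs rise to the top."""
    key = raw_title.lower().strip()
    return sorted(candidates, key=lambda job: -selection_counts[(key, job)])

# Two clients map "Head of Thailand" to one job, a third to another.
record_choice("Head of Thailand", "Sales Managers")
record_choice("Head of Thailand", "Sales Managers")
record_choice("Head of Thailand", "General and Operations Managers")

print(rerank("Head of Thailand", ["General and Operations Managers", "Sales Managers"]))
# "Sales Managers" moves to the top: chosen twice vs. once
```

The same logged choices could later feed retraining of the underlying matching model, not just reordering of its output.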
We are going again
While we won’t claim to be machine learning or natural language processing experts after our first round, we can say that we have a much better understanding of the power of these technological tools. This understanding translates into the confidence to go forth and conquer another problem. And that is just what we are doing. Our next project is already underway and it is much more ambitious! We don’t want to give away our secrets, but let’s just say that we may very well change the nature of job ads forever…or at least have some fun trying.
Whoever said you can’t teach an old dog new tricks is wrong. We have always been a forward-thinking company and even after 60 years, PI is continuing to grow our services with the most modern technology and analyses. As we expand machine learning and natural language processing as a core competency of our business, our clients will continue to benefit from the toolset we provide to tackle the management challenges of today and tomorrow.