We chatted with our SVP of Science about the importance of scientific validity when it comes to workforce assessments.
In the quest for the right assessment for candidate selection (hiring, onboarding, and/or development), finding a solution that provides meaningful and accurate results is more difficult than it seems. While we assume many assessment companies have made great efforts to ensure their assessments follow general psychometric guidelines, this is also the area where many companies provide the least evidence. Validity is arguably the easiest way to provide this evidence. While there are different ways to determine and present validity, the most important component is the degree to which an assessment can be directly related to business impact through people’s performance.
Whether you’re a client of an assessment provider, working with a consultant, or still in the market, understanding the science behind a workforce assessment system can have a dramatic impact on the success of your company.
I recently sat down with my colleague, Greg Barnett, Ph.D., I/O psychologist, and SVP of Science at The Predictive Index (PI) to talk about the importance of scientific validity when it comes to workforce assessments.
Greg Barnett has spent 20 years wearing many hats across the workforce assessment and talent world. At PI, he’s responsible for overseeing the strategic direction of our science, the quality and design of our current and future assessments, and developing best practices that provide maximum impact for our consultants and clients. Before PI, Greg was a Senior Managing Consultant at IBM, the Director of Product Development at Hogan Assessment Systems, and the Director of Operations and Product Strategy at Kenexa (now part of IBM).
Q: Why is scientific validity so important when considering which assessment to use?
A: When talking about validity, I think it is important to explain what we mean by it. One of the key requirements of any assessment is that it does what it’s supposed to do. This may seem painfully obvious, but if the assessment you choose fails to predict meaningful work outcomes (like employee performance), your efforts and investment can be nearly meaningless. The market is flooded with thousands of assessments, many of which have a good cover story, so it’s imperative that you ensure the one you choose is going to help you predict or measure against the workplace outcomes you’re targeting.
Q: Are there any caution areas consumers should be wary of when evaluating assessment providers?
A: Often, people ask me to evaluate different assessment vendors. Even with 20 years of experience, I can find it very challenging to get beyond a well-designed website or carefully crafted marketing messages in order to tell how the assessment is really built. Without some digging, it’s hard to determine if the assessment will actually do what it claims to do. I often see assessment providers using the right words, but quickly find that they have little evidence to back up their claims. Also, big client rosters don’t always mean a tool is well-built. An extensive client roster is obviously one indicator of how well a provider is established in a market, but neither the size of the roster nor the big names on it is an indicator of how well-constructed the assessments are.
Q: What are validity studies and why are they so important?
A: Validity studies are research studies that uncover clear statistical relationships between an assessment and job performance or other critical workplace outcomes. They typically involve getting a sample of current employees, having them complete an assessment or assessments, and collecting performance metrics in the form of objective data (e.g., sales numbers) and subjective data (e.g., supervisor ratings). After cleaning the data and using inferential statistics such as correlation, you can determine the strength of the relationships between the factors measured by the assessment and the target work outcomes.
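As a rough illustration of the correlation step described above, here is a minimal Python sketch that computes a Pearson correlation between assessment scores and a performance metric. The function name `pearson_r` and all of the numbers are hypothetical, invented only for this example; they are not PI data, and this is not a description of how PI runs its own analyses.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return cov / sqrt(var_x * var_y)


# Hypothetical data for eight employees (made up for illustration only).
assessment_scores = [42, 55, 61, 48, 70, 66, 53, 59]
quarterly_sales = [110, 135, 150, 120, 175, 160, 128, 149]

r = pearson_r(assessment_scores, quarterly_sales)
print(f"validity coefficient r = {r:.2f}")
```

A real study would use a far larger sample than eight people and would typically report significance tests or confidence intervals alongside the coefficient.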
When making an investment in a new assessment, it can often be a good idea to identify workforce assessments that predict across a wide range of jobs, industries, roles, etc. This allows a client to scale the same assessment and its accompanying methodology such as training, how the platform works, how to interpret the reports, etc. across more types of workplace situations and roles. That said, it is also important to figure out if an assessment works for a specific role of interest. It’s all too easy for assessment providers to say they have experience with all job roles when in fact, they really don’t have evidence to support those claims.
Q: When it comes to validity studies, does size matter?
A: I’m not sure “size” is the correct term; I prefer to say that breadth and depth matter. Breadth refers to whether the assessment is likely to be valid and predict job performance across different jobs, performance criteria, etc. Depth is more about the ability to provide evidence about how the assessment works with specific roles, situations, or types of performance.
For instance, The Predictive Index can support validity breadth because we know our behavioral assessment consistently predicts performance across tons of different sales roles. However, we also know that not all sales jobs are created equal. Therefore, the depth side of validity is about showing that there are important differences between various types of sales jobs. For example, there are some jobs where you need hunters who can function with little detail vs. highly structured sales roles where you need a totally different set of characteristics.
Q: What if I want to run my own validity study for a role at my company?
A: Validity studies can be fairly easy to do and just require understanding some of the basics. I won’t get into all the nitty gritty here, but generally, you want to make sure you have a good sample with a minimum of 40-50 people with at least one year of tenure in the same role. Next, you need to have access to performance data. Some companies track everything an employee does, and there are some roles where performance data is very natural, like sales.
In some companies, there’s very little in terms of tracked performance data. So another form of performance metrics is supervisor ratings like those that come from performance appraisals.
Ideally, you want both the hard data and the supervisor ratings so you can really understand the performance side of the equation. Finally, if you are going to run a validity study, take the time to do the study before you formally start using the assessment.
By doing so, you can establish the right characteristics to hire for and have a great ROI story to share with key stakeholders about the quality of the tool. I see far too many clients wait until they’ve hired a lot of people using the assessment; then, when they go to run a validity study, there are no good findings because everybody’s the same. In other words, if your entire workforce is highly detail-oriented and you want to see if detail orientation predicts better accuracy, you don’t have any low-detail-orientation people to make a comparison with. PI continues to conduct quite a lot of custom validity studies across many industries.
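The "everybody’s the same" problem is usually called range restriction. The small simulation below is a sketch of the effect under stated assumptions: the data are randomly generated, and the trait name, score scale, and hiring cutoff of 60 are all hypothetical choices made for illustration.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Simulated (not real) applicant pool: accuracy depends partly on
# detail orientation and partly on unrelated noise.
detail = [rng.gauss(50, 10) for _ in range(2000)]
accuracy = [0.5 * d + rng.gauss(0, 8) for d in detail]

# Correlation across the full, unscreened pool.
r_full = pearson_r(detail, accuracy)

# Correlation among only those who would have passed a hypothetical
# hiring cutoff on the assessment (detail orientation of 60 or above).
hired = [(d, a) for d, a in zip(detail, accuracy) if d >= 60]
r_hired = pearson_r([d for d, _ in hired], [a for _, a in hired])

print(f"full pool:  n = {len(detail)}, r = {r_full:.2f}")
print(f"hired only: n = {len(hired)}, r = {r_hired:.2f}")
```

Because the hired group covers only a narrow band of scores, the correlation observed within it is weaker than in the full pool, even though the underlying relationship has not changed. That is the statistical reason for running the study before the assessment starts shaping who gets hired.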
Q: How much validity does an assessment actually need?
A: Different companies put a different emphasis on the importance of validity. The companies that set the gold standard treat their research as one of their most valuable assets. For example, Hogan Assessments (a company I’ve worked for) has excellent evidence of the validity of its tool. On the other hand, some companies don’t put a lot of value in the tool’s ability to predict meaningful things, so they tend to deemphasize validity or speak about it in very broad terms.
Here at PI, we’ve been quietly differentiating ourselves by conducting hundreds of validity studies over the last 60 years. As an industrial-organizational psychologist, I would say that PI is one of the best-kept secrets in terms of its validity experience. In fact, over the past 20 years, we’ve conducted over 350 studies including hundreds of different jobs in almost any industry across the globe.
Q: Your team has been putting together a database of all the data captured and analyzed in PI validity studies. What’s the goal of the project?
A: We’re really excited about this project. We call it the Validity Vault. The primary purpose of the Validity Vault is to organize decades of research into a searchable, referenceable database so we can easily find and report on the correlations we’ve found between real work outcomes and the data captured by assessments. We’ve literally reviewed every study, every job, every criterion collected, and every correlation, and organized it into one database. There are currently 7,566 correlations between PI Factors and performance criteria in the vault. This is incredibly powerful data we make accessible to our clients. I call it practical validity because it leaves no doubt at all that the PI Behavioral Assessment, for instance, works really well in almost any industry, for any job, anywhere in the world.
Validity allows us to craft a story that is highly relevant and high impact to our clients. It gives us the ability to tell a really compelling story across job performance in specific industries.
For instance, we did three different meta-analytic studies supporting the generalizability of the PI Behavioral Assessment across sales performance, overall performance, tenure, counter-productivity, management roles, call center roles, and sales service roles. This work supports the idea that PI predicts across different jobs, but also provides clear evidence that different performance requirements in different jobs absolutely require different PI behavioral patterns.
What’s cool is that this data is further validated because it’s generally aligned with previous meta-analyses, such as the ones conducted by Barrick and Mount (1991) and Tett, Jackson, and Rothstein (1991).
Outside this broad view, the Validity Vault allows us to work with our clients to find a similar industry, similar jobs, and similar performance so we can determine how our assessments best align with their needs. This can help them to feel comfortable that the tool they’re choosing works, or maybe aid in understanding how they might best use the PI assessments in their own specific situation.
The validity of workforce assessments is something any assessment consumer should pay attention to when doing their due diligence on talent assessments. And for us, numbers don’t lie: 328 unique studies, 111 job roles, and 7,566 correlations to performance, drawn from research covering 25,000 people and 11 industries.