Reliability and Validity of the PI Behavioral Assessment

The Predictive Index has an extensive scientific background going back over 60 years. We have conducted over 400 client validation studies that support the PI Behavioral Assessment’s ability to predict job performance across jobs, industries, and countries. We adhere to the standards set forth in the Standards for Educational and Psychological Testing. In addition, the PI Behavioral Assessment achieved certification with the European Federation of Psychologists’ Associations (EFPA) in 2018 and re-certification in 2021, which involved the analysis and presentation of data regarding the assessment’s reliability, validity, and fairness. The Predictive Index has also submitted region-specific evidence and achieved this certification for Sweden, Norway, and, most recently, South Africa.

What is reliability? What reliability evidence does PI have for the BA?

Reliability refers to the precision of the scores and their consistency across time. Reliability can be established in various ways, such as test-retest reliability and internal consistency. Strong test-retest reliability indicates that participants’ scores remain relatively stable over time. Internal consistency indicates that the questions on an assessment consistently capture the same construct, or “hang together.” Our Science team has conducted numerous test-retest reliability studies showing that the results of the PI Behavioral Assessment are stable enough over time to support the assessment’s use cases. In 2017 we conducted a comprehensive reliability study covering samples retested at intervals of up to eight years. This study demonstrated that the test-retest stability of the PI Behavioral Assessment generally outperforms that of Big Five personality assessments after a four-year interval (which happens to be the median job tenure in the United States).
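The two kinds of reliability described above can be illustrated with a minimal Python sketch. It assumes test-retest reliability is summarized as a Pearson correlation between two administrations and internal consistency as Cronbach's alpha, which are common conventions rather than a description of PI's own analyses. The function names and all of the data are hypothetical.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` holds one row of item scores per participant."""
    k = len(responses[0])  # number of items
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical factor scores for five people at two administrations.
time_1 = [12, 18, 25, 31, 40]
time_2 = [14, 17, 27, 30, 38]
print(f"test-retest r = {pearson_r(time_1, time_2):.2f}")

# Hypothetical item responses (rows: participants, columns: items).
items = [[1, 2, 2], [3, 3, 4], [4, 5, 5], [2, 2, 3], [5, 4, 5]]
print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")
```

Values closer to 1 indicate more stable scores (test-retest) or items that hang together more tightly (alpha).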

Does personality change over time?

Research shows that people’s personalities can change very slowly as they grow older, which makes sense: who we are at 18 is not exactly who we are at 40. This is a general trend that would affect any personality assessment. In general, results from the PI Behavioral Assessment are expected to be stable enough to support decisions that span multiple years. Specifically, our test-retest analyses show that the results remain reasonably stable for up to 6 to 8 years, so they can support decisions or inferences that pertain to the next 6 to 8 years (the median job tenure in the U.S. is about 4 years).

Generally speaking, we advise clients to stick with the Self results of the first assessment administration unless there are extenuating circumstances that affected the first result (e.g., the participant did not take the assessment in their preferred language). Self-Concept can be administered more frequently. Further, we recommend re-administering the BA after 6 to 8 years if a high-stakes decision needs to be made about an individual.

For more information on administering the BA multiple times click here.

Can times of stress affect BA results?

Distractions and extreme extenuating circumstances can affect someone’s BA results, as they could with any personality or behavioral assessment. That said, the BA has been extensively validated on samples taking the assessment in potentially distracting situations, so we remain confident in its accuracy.

As for how scores on the BA might be affected (i.e., in ways related to anxiety), there is very little scientific literature to inform an answer. It is reasonable to assume that anxiety can affect results, but it is unclear how. We hypothesize that Self-Concept scores might be subject to more rapid change, as they reflect more recent circumstances.

How do we minimize error in PI Assessments?

PI minimizes error, as measured by the Standard Error of Measurement (SEM), by ensuring that assessments perform reliably, that confusing, irrelevant, or biased items are removed, and that the assessments are accessible and easy to complete. We remove sources of potential error in the assessment and its administration that might affect a participant’s score. This is achieved by using simple assessment formats, carefully field-testing instruments before they are put into use, vigilantly monitoring for statistical bias, accurately translating assessment content, and frequently monitoring the statistics behind the assessment. By taking these precautions, PI creates assessments that can be trusted to report scores that are as accurate as possible for your workplace applications.
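As an illustration of how SEM relates to reliability, the sketch below uses the classical test theory formula, SEM = SD × sqrt(1 − reliability). This is the textbook definition, not necessarily the exact procedure PI uses. The function names and the numeric inputs are hypothetical and are not PI statistics.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


def score_band(observed, sem, z=1.96):
    """Approximate band around an observed score (z=1.96 for roughly 95%)."""
    return observed - z * sem, observed + z * sem


# Hypothetical values, not PI statistics: SD of 10 points, reliability 0.84.
sem = standard_error_of_measurement(10.0, 0.84)
low, high = score_band(50.0, sem)
print(f"SEM = {sem:.2f}, band = {low:.1f} to {high:.1f}")
```

Under this formula, higher reliability means a smaller SEM and therefore a narrower band around each reported score.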


Validity refers to evidence supporting interpretations and use cases for an assessment. Our Science Team continuously works to improve our validation evidence. Throughout the body of our validity research, we have found that the PI Behavioral Assessment is linked to multiple work outcomes, which has helped us to establish criterion-related validity. Further, we have researched how the BA is linked to theoretically related (and unrelated) dimensions, allowing us to establish convergent (and discriminant) validity.

Since 1992, we’ve conducted approximately 400 validity studies. In 94% of those studies, we’ve found that scores on the BA were significantly associated with various measurements of job performance. The average criterion validity coefficient in the significant tests was r=0.30. These studies were conducted across 111 unique job roles in 11 different industries using assessment scores and performance data from more than 25,000 working adults, demonstrating the flexibility of the instrument for a variety of roles.
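To make the coefficient above more concrete, the following sketch shows two standard ways of reading a correlation: the share of variance it explains (r squared) and the t statistic commonly used to test whether it differs from zero. The sample size is a hypothetical value chosen for illustration, and nothing here is meant to describe how the individual PI studies were analyzed.

```python
import math


def variance_explained(r):
    """Proportion of criterion variance associated with the predictor."""
    return r ** 2


def correlation_t_statistic(r, n):
    """t statistic for testing a zero correlation, with n - 2 degrees of freedom."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.30  # average coefficient reported in the text
n = 50    # hypothetical sample size for a single study
print(f"variance explained = {variance_explained(r):.2f}")
print(f"t = {correlation_t_statistic(r, n):.2f} with {n - 2} degrees of freedom")
```

The t statistic is then compared against the critical value of the t distribution for the chosen significance level. For a fixed r, larger samples yield larger t values.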

While tenure can be difficult to capture (because of insufficient sample sizes and the breadth of the construct), PI has conducted approximately 194 validity studies that found significant relationships between PI Behavioral Assessment Factor scores and tenure. Generally speaking, longer tenure is associated with lower Extraversion (B), higher Patience (C), and higher Formality (D), although this will differ by company and job role.

There are several commercially available assessments that consider “the dark side” of personality (i.e., risk factors and derailers). It is important to keep in mind that many of these traits are simply overused strengths; they are not necessarily “negative” in and of themselves, but can pose a problem when they appear very consistently or strongly (or in some cases, both). Although the BA does not directly measure negative aspects of personality, it can be interpreted similarly to “dark side” assessments from the overused strengths perspective. For example, consider an individual who is high Dominance in a role that calls for low Dominance. This individual might have difficulty adjusting to the role’s demands, depending on their ability to “stretch.” In addition, during times of stress or unrest, it is possible that individuals will rely more heavily on their behavioral strengths, leaning into them to the point that they might become problematic. While we don’t measure the dark side directly, it is straightforward to examine where there might be overused strengths in an individual’s BA factor pattern.

The relevance of the PI Behavioral Assessment to talent development is built into its design, with the selection of four work-related behavioral factors that are used to drive inferences about a person’s drives and motivations. The assessment reports provide interpretive links to specific workplace behaviors, such as communication style, risk tolerance, connecting with others, and more. To verify that our tools are having the intended benefits for clients, we track a variety of feedback measures, both from users and assessment takers. For example, of the clients using the PI Behavioral Assessment for employee development in a 2017 study, 559 clients (81% of those responding) agreed or strongly agreed that PI’s tools have helped them develop better employees. PI also tracks how well the reports resonate with assessment takers and users, with 2018 data showing that 87% of people agree or strongly agree with the interpretive text provided in their reports.

The BA is related to many other criteria that are not detailed here. For further detail, reach out to your PI consultant.

The BA and Other Assessments

Some competitors may purport to measure more factors than the BA. The BA likely doesn’t measure these factors because it was designed to measure factors representative of behavior one can observe on the job. For example, emotional stability (sometimes called neuroticism) is becoming an increasingly controversial trait to measure for workplace purposes. The BA factors have been established to be comprehensive and job-relevant, as well as correlated with job performance.

It might seem that assessments measuring more factors or traits are more comprehensive, more valid, and/or more predictive than the BA, but more factors do not necessarily result in a better assessment. Additional factors are often redundant with other factors, not work- or job-related, less predictive, or generally superfluous to the assessment overall. Furthermore, the BA’s streamlined approach to measuring workplace behavior results in a fast and easy testing experience for the candidate, with predictive power similar to that of assessments that are longer and measure more factors.
