Strength of Evidence in UX Research

“Evidence is the available body of facts or information indicating whether a belief or proposition is true or valid.” (Oxford Languages)

In other words, anything that helps us prove that something is true is called evidence.

In UX research, for example, evidence comprises all the findings and data we generate from our experiments and investigations. We use it to confirm or refute our assumptions and hypotheses, to make decisions, or to support our arguments. 

Strength of evidence

The strength of evidence describes how solidly the findings contribute to confirming or disproving a supposition. High strength means that the data goes a long way toward settling the question. One could also call the strength of evidence the significance or effectiveness of an experiment.

Different experiments and sources of information generate varying strengths of evidence. A rule of thumb is that the more valid and reliable the results of a research method are, the higher the strength of evidence. Valid means that an experiment actually measures what it is supposed to measure. Reliable means that other researchers replicating the experiment would reach the same conclusions.

In the scientific context, most disciplines have developed distinct methods for determining the strength of evidence – but they are not fundamentally different in terms of content. They usually assess how the data was collected and rank the results using scales or graphs.

In their book “Testing Business Ideas”, Bland and Osterwalder present the following table for evaluating evidence strength in a UX research context:

Weak evidence vs. strong(er) evidence

1. Opinions (beliefs) vs. facts (events)
  • Weak evidence: When people say things like “I would…,” “I think ___ is important,” “I believe…,” or “I like…”
  • Strong(er) evidence: When people say things like “Last week I ___,” “In that situation I usually ___,” or “I spent ___ on.”

2. What people say vs. what people do
  • Weak evidence: What people say in an interview or survey is not necessarily what they do in real life or will do in the future.
  • Strong(er) evidence: Observable behavior is generally a good predictor of how people act and what people might do in the future.

3. Lab settings vs. real world settings
  • Weak evidence: When people are aware that you are testing something, they may behave differently than in a real world setting.
  • Strong(er) evidence: The most reliable predictor of future behavior is what you observe people doing when they are not aware they are being tested.

4. Small investments vs. large investments
  • Weak evidence: Signing up by email to be informed about an upcoming product release is a small investment and relatively weak evidence of interest.
  • Strong(er) evidence: Pre-purchasing a product or putting one’s professional reputation on the line is an important investment and strong evidence of real interest.

Source: Bland & Osterwalder (2020, p. 52)

Strong evidence in UX research

When we observe subjects actually performing relevant tasks or activities, the collected data has a very high strength of evidence, provided it is valid and reliable. Examples of experiments that collect this kind of data are:

  • Contextual studies
  • Usability tests
  • Analytics and other automatically collected data

A/B and multivariate tests could be added to this list, as they also collect data automatically. However, in my opinion they involve so many pitfalls that they rarely produce valid data, so I have excluded them.

Medium strength of evidence in UX research

If we only listen to subjects talk about activities they have performed (without observing them carrying them out) or have them perform tasks that are not directly relevant to the problem to be solved, the evidence is weaker. We do this, for example, through:

  • Interviews
  • Diary studies

Low strength of evidence in UX research

Gathering data from people who are not part of the target group – even if they are experts – generates the weakest evidence. Collecting this kind of data is not wrong, but it is essential to classify its significance correctly. Examples are:

  • Stakeholder interviews
  • Team brainstorming sessions
  • Expert reviews

Let us face it: there is no precise classification in our discipline. Many experiments are missing from the lists above, and those listed could also be classified differently. I did this on purpose, because this article can only provide an overview. If in doubt, Bland and Osterwalder’s table is a good starting point for evaluating experiments in your given context.

Do we always need high strength of evidence?

Experiments with weaker evidence are often sufficient in a product discovery or design process when we are still learning about the problem space. Here, we want to determine whether the general direction is correct or gain initial insights and generate assumptions.

Nevertheless, as soon as we want to validate assumptions and hypotheses, we need stronger evidence and higher-quality data.

Before deciding on a research method, we should be clear about the strength of evidence we need for our task. It is crucial to note that an experiment that can provide a high strength of evidence can still be set up or carried out incorrectly, and this rarely happens intentionally. Data collected this way no longer has any evidential value. The focus of every experiment should thus be on collecting valid and reliable data.
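
To make the idea of matching a method to the required strength of evidence more concrete, here is a minimal Python sketch. It only encodes the three lists from this article as a lookup table; the names `Strength`, `METHOD_STRENGTH`, `is_strong_enough`, and `candidate_methods` are hypothetical and chosen for illustration, and the three-level scale is this article's rough classification, not an established standard.

```python
from enum import IntEnum


class Strength(IntEnum):
    """Rough three-level scale used in this article (an assumption, not a standard)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Classification copied from the article's three lists.
# Other classifications are defensible depending on context.
METHOD_STRENGTH = {
    "contextual study": Strength.HIGH,
    "usability test": Strength.HIGH,
    "analytics": Strength.HIGH,
    "interview": Strength.MEDIUM,
    "diary study": Strength.MEDIUM,
    "stakeholder interview": Strength.LOW,
    "team brainstorming": Strength.LOW,
    "expert review": Strength.LOW,
}


def is_strong_enough(method: str, required: Strength) -> bool:
    """Return True if the method's listed strength meets the required level."""
    return METHOD_STRENGTH[method.lower()] >= required


def candidate_methods(required: Strength) -> list[str]:
    """List all methods whose listed strength is at least the required level."""
    return sorted(m for m, s in METHOD_STRENGTH.items() if s >= required)


if __name__ == "__main__":
    # Early discovery: weaker evidence is often sufficient.
    print(candidate_methods(Strength.LOW))
    # Validating a hypothesis: stronger evidence is needed.
    print(candidate_methods(Strength.HIGH))
    print(is_strong_enough("interview", Strength.HIGH))
```

Such a lookup only reflects the potential strength of a method. It says nothing about whether a particular study was set up and carried out in a valid and reliable way, which still has to be judged case by case.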

Relevant resources

Bland, J. & Osterwalder, A. (2020). Testing Business Ideas. John Wiley & Sons.
