Strength of Evidence in UX Research
5 August 2022 | 2022-09-09 9:39
„Evidence is the available body of facts or information indicating whether a belief or proposition is true or valid.“ (Oxford Languages)
In other words, anything that helps us prove that something is true is called evidence.
In UX research, for example, evidence comprises all the findings and data we generate from our experiments and investigations. We use it to confirm or refute our assumptions and hypotheses, to make decisions, or to support our arguments.
Strength of evidence
The strength of evidence describes how solidly the findings contribute to confirming or disproving a supposition. High strength means that the data goes a long way toward settling the question. One could also call the strength of evidence the significance or effectiveness of an experiment.
Different experiments and sources of information generate varying strengths of evidence. A rule of thumb is that the more valid and reliable the results of a research method are, the higher the strength of evidence. Valid means that an experiment actually measures what it is supposed to measure. Reliable means that other researchers replicating the experiment would reach the same conclusions.
In the scientific context, most disciplines have developed distinct methods for determining the strength of evidence – but they are not fundamentally different in terms of content. They usually assess how the data was collected and rank the results using scales or graphs.
In their book „Testing Business Ideas“, Bland and Osterwalder present the following table for evaluating evidence strength in a UX research context:
|#|Weak Evidence|Strong(er) Evidence|
|---|---|---|
|1.|When people say things like „I would…,“ „I think ___ is important,“ „I believe…,“ or „I like…“|When people say things like „Last week I ___,“ „In that situation I usually ___,“ or „I spent ___ on.“|
|2.|What people say: What people say in an interview or survey is not necessarily what they do in real life or will do in the future.|What people do: Observable behavior is generally a good predictor of how people act and what people might do in the future.|
|3.|When people are aware that you are testing something, they may behave differently than in a real world setting.|Real world settings: The most reliable predictor of future behavior is what you observe people doing when they are not aware they are being tested.|
|4.|Signing up by email to be informed about an upcoming product release is a small investment and relatively weak evidence of interest.|Pre-purchasing a product or putting one’s professional reputation on the line is an important investment and strong evidence of real interest.|
Strong evidence in UX research
When we observe subjects actually performing relevant tasks or activities, the collected data has a very high strength of evidence, provided it is valid and reliable. Examples of experiments that collect this kind of data are:
- Contextual studies
- Usability tests
- Analytics and other automatically collected data
A/B and multivariate tests could be added to this list, as they also collect data automatically. However, in my opinion they involve so many pitfalls that they rarely produce valid data, so I have left them out.
Medium strength of evidence in UX research
If we only listen to subjects talk about activities they have performed (without observing them carrying them out) or have them perform tasks that are not directly relevant to the problem to be solved, the evidence is weaker. We do this, for example, through:
- Diary studies
Low strength of evidence in UX research
Gathering data from people who are not part of the target group, even if they are experts, yields the weakest evidence. Collecting this kind of data is not wrong, but it is essential to classify its significance correctly. Examples are:
- Stakeholder interviews
- Team brainstorming sessions
- Expert reviews
Let us face it: there is no precise classification in our discipline. Looking at the lists, you will notice that many experiments are missing and that those listed could also be classified differently. I did this on purpose, because this article can only provide an overview. If in doubt, Bland and Osterwalder’s table is a good starting point for evaluating experiments in your given context.
Do we always need high strength of evidence?
Experiments with weaker evidence are often sufficient in a product discovery or design process when we are still learning about the problem space. Here, we want to determine whether the general direction is correct or gain initial insights and generate assumptions.
Nevertheless, as soon as we want to validate assumptions and hypotheses, we need stronger evidence and higher-quality data.
Before deciding on a research method, we should be clear about the strength of evidence we need for our task. It is crucial to note that an experiment that could provide a high strength of evidence can still be set up or carried out incorrectly, and this rarely happens intentionally. Data collected this way loses its evidential value. The focus of every experiment should thus be on collecting valid and reliable data.
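To make the idea of matching a method to the required strength of evidence more tangible, here is a minimal Python sketch. The method names and their tiers are taken from the three lists in this article. Everything else is an assumption made for illustration: the names `Strength`, `METHOD_STRENGTH`, `REQUIRED_STRENGTH`, and `candidate_methods` are hypothetical, and so is the rule that validation requires the highest tier while discovery accepts any tier. The sketch also cannot tell whether an experiment was set up correctly, so it only narrows down candidates.

```python
from enum import IntEnum


class Strength(IntEnum):
    """Rough evidence tiers, ordered so that they can be compared."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Classification copied from the three lists in this article.
# It is deliberately incomplete and could be argued differently.
METHOD_STRENGTH = {
    "Contextual studies": Strength.HIGH,
    "Usability tests": Strength.HIGH,
    "Analytics and other automatically collected data": Strength.HIGH,
    "Diary studies": Strength.MEDIUM,
    "Stakeholder interviews": Strength.LOW,
    "Team brainstorming sessions": Strength.LOW,
    "Expert reviews": Strength.LOW,
}

# Assumption for illustration: discovery work accepts any tier,
# validation work only accepts the highest tier.
REQUIRED_STRENGTH = {
    "discovery": Strength.LOW,
    "validation": Strength.HIGH,
}


def candidate_methods(phase):
    """Return the listed methods that meet the minimum tier for a phase."""
    minimum = REQUIRED_STRENGTH[phase]
    return sorted(
        name for name, strength in METHOD_STRENGTH.items()
        if strength >= minimum
    )


if __name__ == "__main__":
    for phase in ("discovery", "validation"):
        print(phase, "->", candidate_methods(phase))
```

Calling `candidate_methods("validation")` returns only the three methods from the strong-evidence list, while `candidate_methods("discovery")` returns all seven. Adjust the thresholds to your own context.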
Bland, J. & Osterwalder, A. (2020). Testing Business Ideas. John Wiley & Sons.