Experimentation Works

https://aristidouandreas.com/book-review-experimentation-works-the-surprising-power-of-business-experiments-by-stefan-h-thomke/

https://www.wsj.com/articles/experimentation-works-and-the-power-of-experiments-review-test-test-and-test-again-11584313983

Does the experiment have a testable hypothesis? That is, are we asking a question that can and should be tested? Creativity, skill, and imagination play a role here: it’s about art and science.
Is there a commitment to abide by the results, whatever they may be? This is a critical issue: if a proposed initiative is a done deal, why go through the time and expense of conducting a test and risk discovering our assumptions are wrong?
Can our organization actually do the experiment? There are multiple reasons for why it is, or is not, possible to run certain kinds of experiments—and it is vitally important to know what these are ahead of time.
How can we ensure that the results are reliable? There are principles and methods that can improve experiments and even help with difficult conditions (such as small samples). Reliable experiments are needed to create trust in an organization.
Do we understand cause and effect? Is the experiment’s design (and hence the thinking behind the design) clear on what is the independent variable (the presumed cause) vis-à-vis the dependent variable (the observed effect)? Are correlations good enough for taking action or do we need to go deeper?
Have we gained the most value from the experiment? Is there more learning to be achieved from a single experiment? Have we left anything on the table? Can we use value engineering to maximize the ROI of an experiment?
Are we, organizationally, really having our decisions driven by the work of experimentation?

Whether it’s improving customer experiences, trying out new business models, or developing new products and services, even the most experienced business leaders are often wrong.

Examples:

Famous Predictions of Customer Behavior

“[The iPhone is] the most expensive phone in the world, and it doesn’t appeal to business customers because it doesn’t have a keyboard, which makes it not a very good e-mail machine.” (Microsoft CEO Steve Ballmer, 2007)

“People have told us over and over and over again, they don’t want to rent their music . . . they don’t want subscriptions.” (Apple CEO Steve Jobs, 2003)

“Television won’t be able to hold on to any market it captures after the first six months. People will soon get tired of staring at a plywood box every night.” (Attributed to 20th Century Fox studio head Darryl F. Zanuck, 1946)

Companies struggle with innovation for many reasons. A focus on predictable short-term results can drive out innovation activities that require long-term financial commitments with uncertain outcomes.

“Failure and invention,” notes Amazon’s CEO Jeff Bezos, “are inseparable twins. If you already know it’s going to work, it’s not an experiment.”

In 2016, Jeff Bezos gave shareholders a rare insight into Amazon’s innovation engine. In his annual letter, he explained: “One area where I think we are especially distinctive is failure. I believe we are the best place in the world to fail (we have plenty of practice!), and failure and invention are inseparable twins. To invent you have to experiment, and if you know in advance that it’s going to work, it’s not an experiment. Most large organizations embrace the idea of invention, but are not willing to suffer the string of failed experiments necessary to get there.”

"Outsized returns often come from betting against conventional wisdom, and conventional wisdom is usually right. Given a ten percent chance of a 100 times payoff, you should take that bet every time. But you’re still going to be wrong nine times out of ten. We all know that if you swing for the fences, you’re going to strike out a lot, but you’re also going to hit some home runs. The difference between baseball and business, however, is that baseball has a truncated outcome distribution. When you swing, no matter how well you connect with the ball, the most runs you can get is four. In business, every once in a while, when you step up to the plate, you can score 1,000 runs. This long-tailed distribution of returns is why it’s important to be bold."

Thomke, Stefan H. Experimentation Works (p. 65). Harvard Business Review Press. Kindle Edition.
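
The arithmetic behind the bet in the quote above can be sketched as follows. The unit stake, the assumption that a losing bet returns nothing, and the independence of successive bets are illustrative assumptions, not figures from the book.

```python
def expected_multiple(win_probability, payoff_multiple):
    """Expected return per unit staked, assuming a loss returns nothing."""
    return win_probability * payoff_multiple


def chance_of_at_least_one_win(win_probability, bets):
    """Probability that at least one of several independent bets pays off."""
    return 1.0 - (1.0 - win_probability) ** bets


if __name__ == "__main__":
    # A 10 percent chance of a 100x payoff, as in the quote.
    print("expected multiple per bet:", expected_multiple(0.10, 100))
    # Hypothetical portfolio of 10 such bets.
    print("chance of at least one win in 10 bets:",
          round(chance_of_at_least_one_win(0.10, 10), 3))
```

Under these assumptions each bet is worth about ten times its stake in expectation, even though nine bets in ten lose, which is the point of the long-tail argument.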

More subtly, the notion of “experimentation” has often been confined to verification of known outcomes; testing at the end of innovation programs is managed to find late-stage problems.

The Latin proverb Quod gratis asseritur, gratis negatur (“What is asserted gratuitously [with little evidence] may be denied gratuitously [with little evidence]”) may come in handy.

However, innovation processes with high variability behave quite differently. As utilization increases, delays lengthen dramatically. Add 5 percent more work, and completing the job may take 100 percent longer (figure 1–2). Conversely, add 5 percent more resources and feedback can come 50 percent faster. Few managers understand this relationship, and as a result, they significantly overcommit resources. High utilization creates queues; partially completed work sits idle, waiting for capacity to become available, and feedback gets delayed.
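
The utilization effect described above can be illustrated with a textbook single-server queue (M/M/1), where the average time a job spends in the system is 1 / (service rate - arrival rate). This model and the specific rates below are assumptions chosen for illustration; the book's figure 1–2 may rest on a different model.

```python
def mm1_time_in_system(arrival_rate, service_rate):
    """Average time a job spends waiting plus being served in an M/M/1 queue."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    # Hypothetical rates in jobs per day; capacity of 1.0 means one job per day.
    at_90_percent = mm1_time_in_system(0.90, 1.00)
    at_95_percent = mm1_time_in_system(0.95, 1.00)
    with_more_capacity = mm1_time_in_system(0.95, 1.05)

    print("time at 90% utilization:", round(at_90_percent, 2))
    print("time at 95% utilization:", round(at_95_percent, 2))
    print("time at 95% load with 5% more capacity:", round(with_more_capacity, 2))
```

In this sketch, moving from 90 to 95 percent utilization roughly doubles the time in the system, and adding 5 percent capacity at the higher load roughly halves it again, which matches the proportions quoted in the passage.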

In my discussions with managers, I’ve been very straightforward: installing an infrastructure with an abundance of testing capacity is a must for high-velocity experimentation.

It’s important here to note a tenet of the scientific method: experiments can refute, but not prove, a hypothesis. Albert Einstein worded it neatly: “No amount of experimentation can ever prove me right; a single experiment can prove me wrong.”

The team also discovered that testing customers’ actual behavior is more important than trusting what they say they will do. It’s not unusual to run into this saying-doing gap in customer focus groups.

Twyman's law is the principle that "the more unusual or interesting the data, the more likely they are to have been the result of an error of one kind or another".

Instead of viewing leaders primarily as decision makers, the model encompasses three important responsibilities. First, a senior executive’s job is to set a grand challenge that can be broken into testable hypotheses and key performance metrics (e.g., “Best customer experience in the industry”). Second, they need to put in place systems, resources, organizational designs, and standards (e.g., tools, program management, skills training) that allow for large-scale, trustworthy experimentation. And third, executives need to be a role model for all employees. That means living by the same rules as everyone else: having their own ideas subjected to tests and demanding that experiments, not just feature or product releases, are integrated into business roadmaps.

We now make people write down what problem they are trying to solve, and to formulate the hypothesis they want to test, in the form of a falsifiable statement that could logically be proven wrong. It forces everyone to think things through, to no longer just guess but to collect evidence and learn how to solve customer problems.

There was no perfect threshold since an experiment’s p-value also measured the chance of mistakenly accepting B as the winner (false positive). A stricter threshold would result in fewer test wins; by contrast, a more lenient threshold would yield more false positives. At Booking, a test’s p-value had to fall below 0.10 (90 percent confidence) for most tests to be considered statistically significant.
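
To make the threshold trade-off concrete, here is a minimal sketch of a two-sided, two-proportion z-test on conversion counts. The test choice, the function names, and the visitor and conversion numbers are all assumptions for illustration; the notes do not say which statistical test Booking used.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_error == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (rate_b - rate_a) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_significant(p_value, threshold=0.10):
    """Apply a significance threshold; 0.10 is the value quoted in the notes."""
    return p_value < threshold


if __name__ == "__main__":
    # Hypothetical data: 10.0% conversion for A versus 10.8% for B.
    p = two_proportion_p_value(1000, 10_000, 1080, 10_000)
    print("p-value:", round(p, 3))
    print("significant at 0.10:", is_significant(p, 0.10))
    print("significant at 0.05:", is_significant(p, 0.05))
```

With these made-up numbers the result clears the 0.10 threshold but not a stricter 0.05 one, which shows how the choice of threshold changes which tests count as wins.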

Vismans felt that A/B testing was no substitute for leadership when it came to strategic decisions.

Thomke, Stefan H. Experimentation Works (p. 199). Harvard Business Review Press. Kindle Edition.

Thomke, Stefan H. Experimentation Works (p. 193). Harvard Business Review Press. Kindle Edition.

Thomke, Stefan H. Experimentation Works (p. 191). Harvard Business Review Press. Kindle Edition.

Thomke, Stefan H. Experimentation Works (p. 163). Harvard Business Review Press. Kindle Edition.

Thomke, Stefan H. Experimentation Works (p. 72). Harvard Business Review Press. Kindle Edition.

Thomke, Stefan H. Experimentation Works (p. 70). Harvard Business Review Press. Kindle Edition.