Finding the Sweet Spot – Iterating What and How You Measure Product Metrics


Many founders I meet focus, and rightly so, on optimizing their core metrics – a set of units that, surprisingly, rarely changes after its initial inception. But metrics and the way you measure them should undergo constant iteration. Metrics are a way to measure and test your assumptions. Nine out of ten assumptions, if not all of them, are honed through the process of iteration. And by the transitive property, the metrics we measure, but more importantly, the way we measure them, are subject to no less.

Though I’m not as heavily involved on the operating side as I used to be (although I try to be), the bug that inspires me to build never left. So, let’s take it from the perspective of a project a couple of friends and I have been working on – hosting events that stretch people’s parameters of ‘possible’. Given that mission, everything we do is meant to actuate it. One such metric that, admittedly, sat two degrees of freedom away from our mission was our NPS.

The “NPS”

“How likely would you be to recommend that a friend come to the last event you joined us for?” Measured on a 1-10 scale, the vast majority (>85%) picked 7, which was unsurprising in hindsight. A few picked 9, and a negligible number picked 5, 6, or 8. 7 acted as the happy medium for our attendees, all friends, to tell us: “We don’t know how we feel about your event, but we don’t want to offend you as friends.”

We then made a slight tweak, hoping to push them toward a more binary stance. The question stayed the same, but this time, we didn’t allow them to pick 7. In forcing them to choose between 8 (a little better than average) and 6 (a little worse than average), we found all the answers shifted to 6s and 8s and nothing else. Even the ones who previously picked 9 regressed to 8, and the ones who picked 5 picked 6. Effectively, with just this small tweak, we had created a yes/no question.

There are three fallacies with this:

  1. Numbers are arbitrary. An 8 for you may not be an 8 for me. Unless we create a consolidated rubric that everyone follows when answering this question, we’re always going to see variability driven by semi-random expectations.
  2. It’s a lagging indicator. There’s no predictive value in measuring this. By the time attendees answer this question, they’ve already made their decision. Though the post-mortem is useful, the feedback cycle between events was too long. So, we had to start looking into iterating on the event live, while it was happening.
  3. Answers weren’t completely honest. All the attendees were our friends. So their answers were, in part, a reflection of the event, but also, in part, an attempt to help us ‘save face’.

In studying essentialism, Stoicism, and Rahul Vohra’s Superhuman, we found a solution that draws on the emotional spectrum and addressed fallacies #1 and #3 rather well. Instead of phrasing our question as “How much do you value this opportunity?”, we phrased it as “How much would you sacrifice to obtain this opportunity?” Humans are innately loss-averse. Losing your iPhone will affect you more negatively, and for longer, than winning a $1,000 lottery will affect you positively.

So, our question transformed into: “How distraught would you be if we no longer invited you to a future event?”, paired with the answers “Very”, “Somewhat”, and “Not at all”. Although I’m hesitant to say we got completely honest answers, the answers we did get led attendees to follow up and explain why they felt that way, without us prompting them.
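If it helps to make that concrete, below is a minimal sketch (in Python, purely illustrative) of how the three-answer responses could be tallied between events. The `responses` list and the `answer_shares` helper are hypothetical stand-ins, not our actual tooling; the spirit, borrowed from the Superhuman survey mentioned above, is simply to watch how the “Very” share moves from one event to the next.

```python
from collections import Counter

# Hypothetical responses to "How distraught would you be if we no longer
# invited you to a future event?" -- limited to the three allowed answers.
responses = ["Very", "Somewhat", "Very", "Not at all", "Somewhat", "Very"]

def answer_shares(responses):
    """Return each answer's share of all responses."""
    counts = Counter(responses)
    total = len(responses)
    return {answer: counts[answer] / total
            for answer in ("Very", "Somewhat", "Not at all")}

for answer, share in answer_shares(responses).items():
    print(f"{answer}: {share:.0%}")
```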

The one fallacy this question left unaddressed, point #2, was resolved by putting other methods in place to measure attention spans during the event, like the number of times people checked their phones per half hour, or the number of unique people who were left alone for longer than a minute per half hour (excluding bio breaks).
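As a rough illustration of what those live measures could look like, here is a sketch that rolls a hypothetical observation log into per-half-hour counts. The log format, field names, and `attention_by_half_hour` helper are assumptions made for the example, not a description of how we actually recorded things at the event.

```python
from collections import defaultdict

# Hypothetical observation log: (minute into the event, attendee, what we noticed).
observations = [
    (12, "A", "checked_phone"),
    (14, "B", "alone_over_1min"),
    (37, "A", "checked_phone"),
    (41, "C", "checked_phone"),
    (44, "B", "alone_over_1min"),
]

def attention_by_half_hour(observations):
    """Tally phone checks and unique people left alone, per 30-minute block."""
    phone_checks = defaultdict(int)
    left_alone = defaultdict(set)  # sets so the same person isn't double-counted
    for minute, attendee, event in observations:
        block = minute // 30
        if event == "checked_phone":
            phone_checks[block] += 1
        elif event == "alone_over_1min":
            left_alone[block].add(attendee)
    return phone_checks, left_alone

checks, alone = attention_by_half_hour(observations)
for block in sorted(set(checks) | set(alone)):
    print(f"Half hour {block + 1}: {checks[block]} phone checks, "
          f"{len(alone[block])} people left alone for over a minute")
```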

Feedback

“How can we improve our event?” We received mostly logistical answers, most of which we had already noticed either during the event or in our own post-mortem.

In rephrasing it to “How can we help you fall in love with our events?”, we nudged our attendees toward two things: (1) more creative responses and (2) the deep frustrations that ‘singlehandedly’ broke their experience at the event.

And to prioritize the different facets of feedback, we ranked them based on the answers to two questions:

  • “What was your favorite element of the event?”
  • And, “How distraught would you be if we no longer invited you to a future event?”

For the attendees who were excited about elements closely aligned with our mission, we put their feedback higher on the list. Many attendees enjoyed our event for the food or the venue, which, though pertinent to the event’s success, fall short of our ultimate mission. That said, once in a while, there’s gold in the feedback from the latter cohort.

On the flip side, it may seem intuitive to prioritize the feedback of those who were “Very” distraught or “Not at all” distraught. But they sit on the two extremes of the spectrum. One cohort consists of stalwart champions of our events; the other is emotionally detached from their success. In my opinion, neither cohort sees our product truly for both its pros and cons, but rather over-indexes on the pros or the cons, respectively. On a slight tangent, this is very similar to how I prioritize which restaurants to go to or which books to read. So, we find ourselves prioritizing the feedback of the group that lies on the tipping point just before they “fall in love” with our events.
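To make the ordering concrete, here is a minimal sketch of that prioritization, assuming each piece of feedback is stored alongside the attendee’s favorite element and their answer to the “distraught” question. The record shape, the `MISSION_ALIGNED` set, and the exact ranking below are illustrative assumptions rather than our real process.

```python
# Hypothetical feedback records pairing each attendee's favorite element with
# their answer to the "how distraught" question and their suggestion.
feedback = [
    {"favorite": "late-night debate", "distraught": "Somewhat",   "note": "..."},
    {"favorite": "the food",          "distraught": "Very",       "note": "..."},
    {"favorite": "the venue",         "distraught": "Not at all", "note": "..."},
]

MISSION_ALIGNED = {"late-night debate"}  # elements we consider core to the mission

def priority(item):
    """Lower tuples sort first: mission-aligned favorites, then the 'Somewhat'
    cohort sitting on the tipping point before they fall in love."""
    mission_rank = 0 if item["favorite"] in MISSION_ALIGNED else 1
    cohort_rank = {"Somewhat": 0, "Very": 1, "Not at all": 2}[item["distraught"]]
    return (mission_rank, cohort_rank)

for item in sorted(feedback, key=priority):
    print(item["favorite"], "|", item["distraught"])
```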

Unscalability and Scalability

We did all of our feedback sessions in person. No SurveyMonkey. No Google Forms, Qualtrics, or Typeform. Why?

  1. We could react to nuances in their answers, ask follow-up questions, and dig deeper.
  2. We wanted to make sure our attendees felt that their feedback was valued, inspired by Google’s Project Aristotle.
  3. And, we wanted a 100% response rate.

We got exactly what we expected. After our post-mortem, as well as during the preparation for our next event, I would DM, call, or catch up with our previous attendees and tell them which feedback we used and how much we appreciated them helping us grow. For the feedback we didn’t use, I would break down our rationale for opting for a different direction, but also how their feedback helped evolve the discourse around our strategic direction. Though their advice was on the back burner for now, I’d be the first to let them know when we implemented some element of it.

The flip side is that this looks extremely unscalable. If that’s what you’re thinking, you’re half-right. Our goal isn’t to scale right now, as we’re still searching for product-market fit. But as you might notice, there are elements of this strategy that can scale really well.

In closing

Of course, our whole endeavor is on hold during this time of social distancing, but the excitement of finding new and better ways to measure my assumptions never ceases. So, in the interim, I’ve personally carried some of these interactions online, in hopes of discovering something about virtual conversations.

Photo by Jennifer Burk on Unsplash


