I don’t like any of the funnel or cohort analysis tools available to me
Ten years ago, while starting the Data Engineering team at Harry’s, I wrote a suite of cohort analysis queries that looked more-or-less like the queries in this post.
The success of the razor and blades model hinges on high customer retention so we tracked it religiously from day one. Retention was so critical at Harry’s that I even wrote a haiku about it once.*
I wrote 15-20 permutations of cohort queries that we would analyze regularly. At Harry’s we cut our cohorts in a bunch of ways: the channel customers came from, the contents of their first order, age/gender, geography, etc. We analyzed account creation -> 1st order, 1st order -> 2nd order, 2nd order -> 3rd order separately.
There were a lot of charts. We wanted to compare them and see how they changed week-over-week and month-over-month.
I could have sent our data to Mixpanel at the time, but it didn’t allow us to explore our data this way. So I hand-rolled a lot of SQL queries and my colleague analyzed the data in Excel. It was cumbersome and time-consuming, but it was a critical analysis for the business.
This year I was retained as a fractional Head of Product for a long-time client and this summer I took a few days to dive into my client’s retention metrics. They have a loyal base of regular users and I wanted to figure out how they differed from folks who churned. So I reached for my trusty friend: the cohort analysis!
Imagine my surprise when my options were no different than they were a decade ago:
- I can send my data to tool like Amplitude, Mixpanel, or Heap and use its tools.
- I can write SQL and visualize data in Excel or a BI tool.
I didn’t like either of these options ten years ago! I still don’t.
While product analytics suites provide useful tools for exploratory analyses, I usually need to drop down to SQL to answer common follow-up questions. And ensuring accuracy is challenging: the data that ends up in these tools is frequently suspect because event instrumentation is so brittle. Last week I came across Olga Berezovsky’s feedback on Amplitude after its most recent conference and it resonates — Amplitude and its competitors don’t serve me well when I’m doing deeper analyses and I can’t rely on their accuracy.
Meanwhile, I love the flexibility I have when I’m writing SQL: I can query my data however I'd like to. I can clean it up, run spot checks, and write data quality tests to make sure my analysis is correct. But SQL queries for funnel and cohort analyses are challenging to write correctly and scale. Exploring funnels and cohorts this way — asking follow-up questions like, “Why is our March cohort so much more active than our February cohort?” — is time-consuming and I miss out on the interactivity I get from product analytics tools.
I want to be able to analyze behavioral data with the affordances that existing product analytics tools provide. But I don’t want to be shackled to those tools.
The tool I want sits in a happy middle:
- It’s interactive, like product analytics tools. I want to select parameters for my funnels and cohorts and get charts. I want to click around to answer follow-up questions.
- I can use the data I have. I don’t want to send the data to a third-party. I want to use the data I have in my application database, data warehouse, or on my local machine.
- It’s SQL-native and integrates with my workflows. I want to generate queries for 90+% of the charts I explore so I can use them in my BI tool, data pipeline, or notebook.
- I can go deep without having to write SQL. I want to be able to answer questions and explore data well beyond the point that product analytics tools fall short.
I want this to exist! So I'm working on it.
If you analyze behavioral data and the above resonates, I would be delighted to speak with you. Please say hello!
*You can read about my pursuit of poetry in the workplace on my secret sandwich blog. (Currently on hiatus.)