Baxter's free funnel analysis SQL generation tool is out!

Headshot of
                Vikram Oberoi

Vikram Oberoi

·

January 30, 2023

vikramoberoi.com

· 6 min read

Baxter's free funnel analysis SQL generation tool is out!

A cartoon of a person using a hand crank to deliver funnel parameters into a funnel.
Our funnel analysis SQL generator is significantly easier to use than this contraption. Credit: katiebcartoons.com

Today we’re releasing a free tool that generates multi-step funnel analysis queries in nine SQL dialects: BigQuery, Snowflake, Redshift, Spark, ClickHouse, DuckDB, PostgreSQL, MySQL, and SQLite.

It’s free, it’ll save you time and tears, and it’s available here (no longer available).

Give it a spin! Reach out to us if you have requests or run into issues!

Implementing funnel analysis queries across nine SQL dialects will lead you into some pretty goofy corners of SQL implementations. Stay tuned to the Baxter blog for posts about query strategies we’ve implemented and the SQL quirks that we made friends and enemies with along the way.

For now, let’s get you situated.

The rest of this post outlines the assumptions our funnel SQL generator makes about your event data and the kinds of funnel analyses it enables.

I’m also going to share a bit about the strategies we employ and how we test them so you can be confident in their results.


Table of contents


What data model do these queries require?

If you want these queries to work out of the box, you’ll need an events table with at least these three 3 columns:

If your table or column names differ, you can modify them in the SQL we generate. And if you have more columns, these queries will still work. In fact, that’s a pretty common scenario!

You might already have a wide table of events if:

Some companies do custom event tracking or build data pipelines that materialize event data this way, too.

For example, a row in your orders table might turn into a row in your events table. If a user with id 23 placed an order on January 22nd, 2023, you might create this row in events: 23, Order Placed, 2023-01-22 13:34:01, (various columns with order properties)

The are two primary benefits to modeling event data as a wide table with many columns:

If you don’t have a table of events, you’ll need to massage your data so that it’s structured this way.

Finally, you might have more complicated predicates in your funnel steps like event_name == 'Pageview' and url_path LIKE '...'. You can modify the queries we generate with your own predicates and remain confident that the query will yield accurate multi-step funnel conversion numbers.

What kinds of funnel analyses do these queries allow me to do?

You can analyze funnels…

Another way to state the above is that these queries support funnel analyses with loose ordering and a total sequence time constraint.

They do not support strict ordering or a stepwise time constraint.

Credit for this terminology goes to TJ Murphy — a certifiable funnel query expert — who shared it with me some months ago. I don’t know if this language is actually formalized anywhere.

Here’s the difference between loose and strict ordering.

A chart showing how to match funnels with loose vs. strict ordering, with four examples.
Loose ordering allows events to occur between each step in your funnel. Strict ordering doesn’t.

Here’s the difference between total sequence and stepwise time constraints.

An image showing the different between total sequence and stepwise time constraints when analyzing funnels.
Total sequence: all steps must complete within some duration. Stepwise: the time between steps must not exceed some duration.

Growth marketers, product managers, and analysts most commonly want loose ordering and a total sequence time constraint when they perform funnel analyses. That’s what our funnel query generator supports today.

Funnel queries that implement strict ordering and a stepwise time constraint are substantially different. We haven’t implemented them yet but we’ll work on it if folks are keen on it, so tell us if you are.

Nine SQL dialects, three funnel query strategies

At release we employ three funnel query strategies across nine SQL dialects.

If we don’t support your SQL dialect and you’d like for us to, we want to help!

These are the strategies we employ today:

Right now, our goal with Baxter’s funnel SQL query generator is to help users solve their immediate problem: writing a funnel analysis query that just works.

While we haven’t tested the performance behind these strategies, we know with certainty that one of them scales poorly: the self-join. We’ll benchmark and improve the strategies we use for each dialect over time.

If you have an approach that works particularly well for your dialect and data warehouse, we’d love to hear about it.

How do you know these are correct?

We’ve built a synthetic dataset that we run and check our queries against. We also have test harnesses that run generated queries against a data warehouse for every dialect we support.

We do a lot of automated testing!

If you come across a bug, please let us know and we’ll get it fixed promptly.