-
Moving to Substack
2024-02-08
These days I’m writing at philosophicalhacker.substack.com
-
For Data Notebooks, Functions Aren't Enough (and what to do about it)
2023-01-12
The Dark Knight has this great scene where Batman punches the Joker to intimidate him into giving up information, and the Joker just starts laughing and says: “You have nothing! Nothing to threaten me with. Nothing to do with all your strength.” Batman, shocked and powerless, just stands there like an idiot. While working in a data notebook a few months back, I felt the same way. The typical power that I had as a programmer felt worthless against the data work I was doing.…
-
The code that ChatGPT can't write
2022-12-07
ChatGPT is game-changing, and, more generally, language models may be the most important dev tool of our generation. (It takes some humility to admit this, as we’re working on a dev tool for data scientists.) But neither ChatGPT nor some larger descendent model will ever be able to write the most difficult pieces of our software given natural language descriptions of desired functionality. Here’s my argument for this claim, drawing on observations from Fred Brooks’ “No Silver Bullet” and Eric Evans’ Domain Driven Design.…
-
Postgres SQL Lessons From Advent of Code Challenges
2022-02-17
Note: This post was originally published on Heap’s blog and was co-written with Amanda Murphy. We did something odd for Advent of Code this year: we solved a few challenges in JavaScript and then in PostgreSQL. We learned a few interesting things about SQL that we’d like to share here. Disclaimer: We did not complete all 25 days in SQL (judging by the links from this HN thread, it looks like pretty much no one did), but we still think the things we learned about SQL are useful and worth sharing, especially for non-experts.…
-
Optimizing Postgres Queries at Scale
2021-12-28
Note: This post was originally published on Heap’s blog. Heap is a product analytics tool that automatically captures web and mobile behavior like page views, clicks, and taps. We’re operating at a scale of billions of events per day, which we store across a distributed Postgres cluster. Heap’s thousands of customers can build queries in the Heap UI to answer almost any question about how users are using their product.…
-
How Postgres Audit Tables Saved Us From Taking Down Production
2021-11-08
Note: This post was originally published on Heap’s blog. Audit tables record changes that occur to rows in another table. They’re like commit logs for database tables, and they’re typically used to figure out who made what changes when. But surprisingly, we’ve found them useful for keeping our distributed Postgres cluster stable. To convince you of the stability-related value of audit tables, we’ll cover how audit tables helped us avoid a serious incident.…
-
Lessons from Deploying 1 million Postgres Indexes
2021-10-13
Note: This post was originally published on Heap’s blog. Heap is a product analytics tool that automatically captures web and mobile behavior like page views, clicks, and taps. We’re operating at a scale of billions of events per day, which we store across a distributed Postgres cluster. Our cluster has over a million tables of events. Recently, we discovered an index that makes our new Effort Analysis feature faster, and we attempted to roll out that index across the cluster.…
-
Working Around a Case Where the Postgres Planner Is "Not Very Smart"
2021-08-10
Note: This post was originally published on Heap’s blog. Heap is a digital insights platform that automatically captures web and mobile behavior like page views, clicks, and taps. We recently shipped Effort Analysis, a way for Heap customers to see the median number of interactions and seconds engaged between each step within a funnel. Here’s what it looks like: To build this feature, we needed to write a query that could quickly scan more than a billion rows of event data.…
-
How and When to Control for Confounders During Product Usage Analyses
2021-06-12
Note: This post was originally published on Heap’s blog. We all know that correlation isn’t causation, but when we’re assessing the impact of a feature we’ve just shipped or searching for an “aha moment” that leads to better retention, it’s easy to forget this. It’s tempting to look at the increased conversion rates of users who did X versus users who didn’t, and conclude that our feature is working or that we’ve found the “aha moment.…
-
Maybe you should build a faster horse
2021-04-09
Building features no one uses is the cardinal sin of product work, and building features merely because users are asking for them is a well-known way of committing it. To keep us from falling into this trap, product leaders often quote Steve Jobs: I think Henry Ford once said, “If I’d asked customers what they wanted, they would have told me, ‘A faster horse!’” People don’t know what they want until you show it to them.…