Early-stage engineering

Early on you need to be fast. Your team, your stack, your infrastructure — they all need to be set up for that. To do that, you have to have the confidence to break with best practices. That confidence comes from knowing what risks actually matter in your context. The risks you care about when you’re building version 1.0 are very different from the risks a large organization cares about. The way you approach engineering has to reflect that.


When you start out building something new, everything is in flux. Your requirements aren’t fully understood yet, your data model will evolve, your API boundaries will shift, your interfaces haven’t firmed up, etc. That’s natural, and embracing that uncertainty is key when working on something new. At that point it’s all about optimizing feedback loops: make them as tight and fast as possible. What does that look like in practice?

  • Instrument everything — If you move fast, things break more often. You need to be able to figure out quickly and easily what’s wrong. People often interject here that adding instrumentation is extra work you can’t afford at this stage. The trick is to make it essentially free: it should take near-zero developer effort to get tracing, metrics and log aggregation in place. Tracing in particular is such an easy win because it doesn’t require any thinking. Adding a span to a method takes at most one or two lines of (templated) code (see the tracing sketch after this list). Over time you’ll want to capture more information on a span, and that takes more thinking, but just knowing the code path something took typically solves about 80% of the puzzle.

  • Make deployments automated and continuous — This one should be uncontroversial at this point. Every merge into main should trigger an image build, which gets deployed automatically. No action required. No release cycles. At Kappa we do on the order of 100s of “deployments” a day. A change is live in the dev cluster within 2 minutes of being merged. You get (near) immediate feedback.

  • Make running things locally easy and cheap — Even faster than deploying is running things locally. Make it as easy as possible to run services locally and to connect them to other services running remotely. Running a whole cluster locally can sometimes be hard given hardware constraints, but that’s almost never necessary (corollary: buy good machines for everyone, see below). One interesting development here is services like Modal, which try to abstract away the gap between local and cloud infra completely.

  • Make writing tests easy and cheap — The reason people don’t write more tests is that it’s hard and takes time. So it makes sense to invest in bringing that cost down — e.g. by auto-generating mocks for your services, or by writing sample data generators that give you representative data for your domain (see the data generator sketch after this list). It’s pretty clear at this point that LLMs have changed the game for unit tests. It takes all of two clicks / two copy-pastes now to generate a reasonable test suite. The model usually makes some mistakes, but they’re typically easy to fix. Net-net it can still be a big time saver.

  • Integration tests over unit tests — Integration tests that run on every build or multiple times a day give you fast, meaningful feedback. Modern systems are distributed, and it’s at the boundaries where most of the bugs sit. Unit tests are fine, but if you have to choose where to spend your time, write integration tests. The components of your software obviously need to work in isolation, but it’s really the interactions where things go wrong, especially if those interactions are asynchronous.

  • Minimize wait times — Waiting for CI to finish, waiting for a code review, waiting for something to build, etc. — these things are especially detrimental to productivity because they keep you from getting closure on one piece of work and discourage you from moving on to the next task. Even if the work itself is done, it still lingers until it’s deployed. This is one strong argument for choosing a language that compiles and builds quickly.

  • No branch protections — One way to eliminate PR approval wait times is to not require approvals. Sounds crazy, but you wouldn’t believe how much time is wasted waiting for a review on a trivial change (the true cost is even higher than the wall time, because waiting and checking back break your flow state). So trust your engineers. We’re all adults here. If your team is 5 experienced people, you can often coordinate your work well enough without PRs, just over Slack. You do end up with merge issues at times, but they’re typically infrequent and easy to resolve because early on people tend to work on fairly orthogonal things. The time spent resolving those is easily made up for by the increase in velocity.

  • Minimize task overhead — This one is almost tautological at this point. Maximize uninterrupted blocks of time for people to focus. Minimize meetings and process.

  • Automate stack upgrades — A lot of time can be wasted when you don’t update dependencies until you’re forced to for compatibility reasons. That’s when you have to deal with a potentially large number of issues all at once, usually at the worst possible time. This is easy to fix: Just set up Dependabot.

  • Buy good machines for everyone — The added cost of getting high-spec machines for everyone pays for itself in literally a day. Remove the constraint of local hardware as much as you can. For a team of 5, the extra cost is totally negligible compared to what you pay for cloud compute.

  • Hire owners and generalists — The person making a change is also responsible for ensuring that it actually works once deployed. Integration tests go a long way here, but sometimes you actually have to make an API call or open the app and check UX impact. If you wait for QA to catch issues, you’ve wasted 3 days to find out you had a bug somewhere. And because you’re often out of context at that point, it becomes harder to fix.

  • Understand your team’s strengths — While everyone agrees that hiring great ICs is important, far too little thought goes into team composition. In fact, it’s often completely absent from recruitment plans. This is strange, since in areas outside software engineering, like professional sports, it gets at least as much attention. Building a technical product from scratch is a high-performance team sport. You need great individual performers, but you also need them to complement each other, technically and personality-wise.
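
To make the instrumentation point concrete: here is roughly what a one-or-two-line span looks like, assuming OpenTelemetry in Python (a sketch only; the post doesn’t name a tracing stack, and the function and attribute names are made up):

    from opentelemetry import trace

    # One tracer per module. With no SDK configured this is a no-op,
    # so the instrumentation costs nothing until an exporter is wired up.
    tracer = trace.get_tracer(__name__)

    def process_order(order_id: str) -> None:
        # The templated part: one line that records which code path ran.
        with tracer.start_as_current_span("process_order") as span:
            # Add attributes later, once you know what is worth capturing.
            span.set_attribute("order.id", order_id)
            ...  # actual work

Even this no-frills version gives you the code path, the timing, and a place to hang attributes later.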

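Similarly, for the testing point: a sample data generator can be as small as the sketch below, which makes a representative domain object cheap enough that a new test costs a single call (the Order type and its fields are hypothetical, chosen purely for illustration):

    import random
    import uuid
    from dataclasses import dataclass

    @dataclass
    class Order:
        id: str
        customer: str
        amount_cents: int
        currency: str = "USD"

    def sample_order(rng: random.Random | None = None, **overrides) -> Order:
        # Representative-but-random defaults; a test overrides only the
        # fields it actually cares about.
        rng = rng or random.Random()
        fields = dict(
            id=str(uuid.UUID(int=rng.getrandbits(128))),
            customer=rng.choice(["acme", "globex", "initech"]),
            amount_cents=rng.randint(100, 100_000),
        )
        fields.update(overrides)
        return Order(**fields)

    def test_zero_amount_orders_are_rejected():
        order = sample_order(amount_cents=0)
        ...  # call into your domain logic and assert the order is rejected
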
Some of the points above may sound crazy to someone in a mature engineering org. And for good reason! Your approach has to evolve as your product matures. The key is to understand which risks you need to care about at the stage you’re at. Zero-risk deployments, well-managed sprints, carefully groomed tickets, etc. — these things all sound great in isolation, but the risk-adjusted return of doing them is just too low in the beginning.

The only risks you should care about early on are existential ones: (1) running out of cash before you launch, (2) launching too late to get enough proof points, (3) shipping too slowly to iterate meaningfully, (4) being too slow to incorporate feedback. The risks that are considered existential in a larger org are fundamentally different. Reputation, competitive threats, losing customers, losing market share, product stability, service uptime — those things matter when you have an existing product with good traction. But early on, you don’t have many customers yet, and those you do have are (hopefully) more forgiving. There’s typically also little to no competition to worry about. If that’s not the case, you may want to reconsider what you’re working on.

I believe a significant number of startups die because they cling to the best practices of later-stage engineering — doing what big companies do. What those companies do is solve their own problems, not yours. Blindly following their advice means you end up over-indexing on the wrong risks. The material in books and on blogs is heavily biased towards late-stage engineering; people simply have more time to write when there’s an existing product with stable cash flow and growth. That’s why it’s so important to think for yourself and understand your idiosyncratic risks.
