Our legacy Things Cloud service was built on Python 2 and Google App Engine. While it was stable, it suffered from a growing list of limitations. In particular, slow response times impacted the user experience, high memory usage drove up infrastructure costs, and Python’s lack of static typing made every change risky. For our push notification system to be fast, we even had to develop a custom C-based service. As these issues accumulated and several deprecations loomed, we realized we needed a change.
Adopting Wide Event-style instrumentation has been one of the highest-leverage changes I’ve made in my engineering career. The feedback loop on all my changes tightened and debugging systems became so much easier. Systems that were scary to work on suddenly seemed a lot more manageable.
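A minimal sketch of what this can look like in practice (the field names and the emit() helper are illustrative, not from the original post): one wide, structured event per request, carrying every field you might later want to slice by.

```typescript
// A minimal sketch of a "wide event" (field names and emit() are illustrative):
// one structured record per request, with every field you might later query.

type WideEvent = Record<string, string | number | boolean | null>;

function emit(event: WideEvent): void {
  // In practice this goes to your observability pipeline; a single JSON line
  // per request is the simplest possible version.
  console.log(JSON.stringify(event));
}

function handleRequest(userId: string, path: string): void {
  const start = Date.now();
  const event: WideEvent = {
    timestamp: new Date().toISOString(),
    "request.path": path,
    "user.id": userId,
  };
  try {
    // ... actual request handling would happen here ...
    event["response.status"] = 200;
  } catch (err) {
    event["response.status"] = 500;
    event["error.message"] = err instanceof Error ? err.message : String(err);
  } finally {
    event["duration.ms"] = Date.now() - start;
    emit(event); // exactly one wide event per request
  }
}

handleRequest("user-123", "/api/todos");
```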
Many things at Twitter were broken to the point that they could bring the entire site to a halt. Greg’s strategy, now distorted through multiple retellings and my own foggy memory, was to focus on short-term triage rather than long-term fixes.
You could say, then, that by the late 1960s, software development was facing three crises: a crying need for more programmers; an imperative to wrangle development into something more predictable; and, as businesses saw it, a managerial necessity to get developers to stop acting so weird.
There are two parts to software development: creating a design and expressing it as code. The code is tangible but the design is conceptual. Keeping a project healthy means doing both well. Here’s my concern: whenever you mix the conceptual with the tangible, it’s easier to neglect the conceptual. When you miss a tangible target, it’s obvious, but when you miss a conceptual target, you might not recognize it, or might rationalize that, because it’s impossible to measure, you were really quite close.
The word “scalable” has a mystical and stupefying power over the mind of the software engineer. Its mere utterance can whip them into a depraved frenzy. Grim actions have been justified using this word.
Alright folks, gather round and let me tell you the story of (almost) the biggest engineering disaster I’ve ever had the misfortune of being involved in. It’s a tale of politics, architecture, and the sunk cost fallacy.
Not everything in your application needs to be in a single state object. Keep things logically separated (user settings don’t necessarily have to live in the same context as notifications). You will have multiple providers with this approach.
Not all of your context needs to be globally accessible! Keep state as close to where it’s needed as possible.
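A small illustrative sketch of both points (the component and provider names are made up): two narrow contexts instead of one global state object, with each provider mounted only around the subtree that needs it.

```tsx
// Illustrative only: component and provider names are made up.
import React, { createContext, useContext, useState } from "react";

const SettingsContext = createContext<{ theme: string } | null>(null);
const NotificationsContext = createContext<{ unread: number } | null>(null);

function SettingsProvider({ children }: { children: React.ReactNode }) {
  const [theme] = useState("dark");
  return (
    <SettingsContext.Provider value={{ theme }}>
      {children}
    </SettingsContext.Provider>
  );
}

function NotificationsProvider({ children }: { children: React.ReactNode }) {
  const [unread] = useState(0);
  return (
    <NotificationsContext.Provider value={{ unread }}>
      {children}
    </NotificationsContext.Provider>
  );
}

function Header() {
  const settings = useContext(SettingsContext);
  return <header>{settings?.theme}</header>;
}

function Inbox() {
  const notifications = useContext(NotificationsContext);
  return <main>{notifications?.unread} unread</main>;
}

// Settings wrap the whole app; notifications wrap only the subtree that needs them.
export default function App() {
  return (
    <SettingsProvider>
      <Header />
      <NotificationsProvider>
        <Inbox />
      </NotificationsProvider>
    </SettingsProvider>
  );
}
```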
TL;DR “Microservices” was a good idea taken too far and applied too bluntly. The fix isn’t just to dial back the granularity knob but instead to 1) focus on the split-join criteria as opposed to size; and 2) differentiate between the project model and the deployment model when applying them.
Peter Naur’s classic 1985 essay “Programming as Theory Building” argues that a program is not its source code. A program is a shared mental construct (he uses the word theory) that lives in the minds of the people who work on it. If you lose the people, you lose the program. The code is merely a written representation of the program, and it’s lossy, so you can’t reconstruct a program from its code.
Below is a list of some lessons I’ve learned as a distributed systems engineer that are worth being told to a new engineer. Some are subtle, and some are surprising, but none are controversial.
As software engineers our job is not to produce code per se, but rather to solve problems. Unstructured text, such as a design doc, may be the better tool for solving problems early in a project lifecycle, as it may be more concise and easier to comprehend, and it communicates the problems and solutions at a higher level than code.
React has a sweet spot: moderately interactive interfaces, such as complex forms that require immediate feedback and UIs that need to move around and react instantly.
Deploys require a careful balance of speed and reliability. At Slack, we value quick iteration, fast feedback loops, and responsiveness to customer feedback.
Every line of code written comes at a price: maintenance. To avoid paying for a lot of code, we build reusable software. The problem with code re-use is that it gets in the way of changing your mind later on.
We look at how to handle writes to two independent backends without using two-phase commits. Instead, we can rely on at-least-once delivery guarantees and ask other systems to deduplicate our messages.
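A minimal sketch of that protocol, with all names hypothetical: the producer retries until delivery is acknowledged, and the consumer treats the message ID as a deduplication key so replays are harmless.

```typescript
// All names here are hypothetical; the shape of the protocol is the point.

interface Message {
  id: string;       // stable ID chosen by the producer, reused on every retry
  payload: unknown;
}

// At-least-once delivery: keep resending until the receiver acknowledges.
// Duplicates are fine because the consumer deduplicates by message.id.
async function sendWithRetry(
  message: Message,
  deliver: (m: Message) => Promise<boolean>,
): Promise<void> {
  while (!(await deliver(message))) {
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }
}

class DedupingConsumer {
  // In production this set lives in the consumer's own datastore and is
  // updated in the same transaction as the message's effect.
  private processed = new Set<string>();

  handle(message: Message, apply: (payload: unknown) => void): void {
    if (this.processed.has(message.id)) {
      return; // duplicate delivery: already applied, safely ignored
    }
    apply(message.payload);
    this.processed.add(message.id);
  }
}
```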
It’s actually an unqualified good for engineers to be interacting with production on a daily basis, observing the code they wrote as it interacts with infrastructure and users in ways they could never have predicted.
The first step is acknowledging that our relationship is more important than the design of the system. As long as we have a productive working relationship we can move the design in any direction. When our relationship breaks down we don’t get anywhere.
In order to succeed at production ownership, a team needs a roadmap for developing the necessary skills to run production systems. We don’t just need production ownership; we also need production excellence. Production excellence is a teachable set of skills that teams can use to adapt to changing circumstances with confidence. It requires changes to people, culture, and process rather than only tooling.
Standardizing technology is a powerful way to create leverage: improve tooling a bit and every engineer will get more productive. Adopting superior technology is, in the long run, an even more powerful force, with successes compounding over time. The tradeoffs and timing between standardizing on what works, exploring for superior technology, and supporting adoption of superior technology are at the core of engineering strategy.
Once you’ve learned enough that there’s a certain distance between the current version of your product and the best version of that product you can imagine, then the right approach is not to replace your software with a new version, but to build something new next to it — without throwing away what you have.
Dark is a holistic programming language, structured editor, and infrastructure for building backend web services. It’s aimed at frontend, backend, and mobile engineers.
A post that builds up from simple principles. It looks into React, its programming model, its goals, and the trade-offs it makes in solving its design challenges.
We should identify the error kernel. The error kernel of a system is the part that must be correct: if it’s incorrect, all bets are off. All the other code can be incorrect; it doesn’t matter.
Hashing syntax trees and storing them directly in a database is very interesting. I’ve long wondered what will come after the grab bag of text files approach we’ve been using to date.
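A toy sketch of the idea, not any particular system’s storage format: each syntax tree node is stored under a hash of its own contents plus its children’s hashes, so identical subtrees collapse into a single database entry.

```typescript
// A toy content-addressed store for syntax trees; not any real system's format.
import { createHash } from "crypto";

type SyntaxNode = { kind: string; value?: string; children: SyntaxNode[] };

const store = new Map<string, string>(); // hash -> serialized node

function put(node: SyntaxNode): string {
  const childHashes = node.children.map(put);
  const serialized = JSON.stringify({
    kind: node.kind,
    value: node.value ?? null,
    children: childHashes, // children referenced by hash, not by file position
  });
  const hash = createHash("sha256").update(serialized).digest("hex");
  store.set(hash, serialized); // idempotent: same subtree, same key
  return hash;
}

// `f(x) + f(x)`: the shared `f(x)` subtree is stored once.
const call: SyntaxNode = {
  kind: "call",
  children: [
    { kind: "ident", value: "f", children: [] },
    { kind: "ident", value: "x", children: [] },
  ],
};
put({ kind: "add", children: [call, call] });
console.log(store.size); // 4 entries for the 7 nodes of the expression
```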
Now, let’s try a much better, O(1) control plane. DUMB AS ROCKS. Imagine instead that the customer API pretty much edits a document directly, like a file on S3 or whatever. And imagine the servers just pull that file every few seconds, WHETHER IT CHANGED OR NOT.
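A minimal sketch of that design, assuming the document is reachable over plain HTTP (the URL and Config shape are made up): every server re-fetches the whole document on a timer and keeps the last good copy if a fetch fails.

```typescript
// The URL and Config shape are made up; the pattern is the point.

interface Config {
  version: number;
  routes: Record<string, string>;
}

const CONFIG_URL = "https://example-bucket.s3.amazonaws.com/config.json"; // hypothetical

let current: Config | null = null;

async function pollOnce(): Promise<void> {
  try {
    const response = await fetch(CONFIG_URL);
    if (!response.ok) return; // keep serving the last good config
    current = (await response.json()) as Config;
  } catch {
    // Network blip: do nothing, the previous config stays in effect.
  }
}

export function getConfig(): Config | null {
  return current;
}

// No change detection, no push, no per-server fan-out: the control plane does
// O(1) work no matter how many servers are pulling.
setInterval(pollOnce, 5_000);
void pollOnce();
```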
It’s important to build up a solid instinct for separating the technologies and products worth spending time on from the ones that will fade into obscurity after their 15 minutes of fame are over.
What follows is my personal, opinionated, idiosyncratic approach to React/Redux development, forged in half a dozen attempts to pursue test-driven development at a good pace.
Rich Hickey explained the design choices behind Clojure and made many statements about static typing along the way. I share an interesting perspective and some stories from my time as a Haskell programmer. I conclude with a design challenge for the statically typed world.
Versioning is always a compromise between improving developer experience and the additional burden of maintaining old versions. We strive to achieve the former while minimizing the cost of the latter, and have implemented a versioning system to help us with it. Let’s take a quick look at how it works.
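One common shape for such a system, sketched here as an assumption rather than a description of the post’s actual implementation: each breaking change ships with a transform that downgrades the latest response shape by one version, and a request pinned to an old version walks the chain.

```typescript
// Hypothetical sketch: each breaking change carries a transform that downgrades
// the latest response shape; old clients get every later change applied.

type Json = Record<string, unknown>;

// Ordered newest-to-oldest; each entry downgrades across one version boundary.
const downgrades: { introducedIn: string; transform: (r: Json) => Json }[] = [
  {
    // Example change: the latest API split full_name into first/last name.
    introducedIn: "2017-05-01",
    transform: (r) => ({
      ...r,
      full_name: `${String(r.first_name)} ${String(r.last_name)}`,
    }),
  },
  // ...older changes would follow here...
];

function render(latestResponse: Json, requestedVersion: string): Json {
  let response = latestResponse;
  for (const { introducedIn, transform } of downgrades) {
    // Apply every change introduced after the client's pinned version.
    if (introducedIn > requestedVersion) {
      response = transform(response);
    }
  }
  return response;
}

console.log(render({ first_name: "Ada", last_name: "Lovelace" }, "2017-01-01"));
```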
PumpkinDB is essentially a database programming environment, largely inspired by core ideas behind MUMPS. Instead of M, it has a Forth-inspired stack-based language, PumpkinScript. Instead of hierarchical keys, it has a flat key namespace and doesn’t allow overriding values once they are set.
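The write-once semantics are the distinctive part; here is a conceptual illustration in plain TypeScript (this is not PumpkinScript and not PumpkinDB’s actual API): a flat key namespace where a key, once set, can never be overwritten.

```typescript
// Conceptual illustration only; not PumpkinScript or PumpkinDB's API.
class WriteOnceStore {
  private data = new Map<string, Uint8Array>(); // flat key namespace

  set(key: string, value: Uint8Array): void {
    if (this.data.has(key)) {
      throw new Error(`key already set: ${key}`); // values can never be overridden
    }
    this.data.set(key, value);
  }

  get(key: string): Uint8Array | undefined {
    return this.data.get(key);
  }
}
```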
But there’s a deeper problem. I actually think ten lines of code is too big for an abstraction (Metz agrees). Ten lines of code that happen to be repeated are so unlikely to have any interesting properties, including being bug-free, reusable, and composable, that you should be very skeptical of being able to factor the whole thing out as common.
It’s not that new technology should never be introduced, but it should be done with rational defensiveness and with a critical eye toward how it’ll fit into an evolving (and hopefully ever-improving) architecture.
Building applications following domain-driven design and using CQRS feels really natural with the Elixir – and Erlang – actor model. Aggregate roots fit well within Elixir processes, which are driven by immutable messages through their own message mailboxes, allowing them to run concurrently and in isolation.
Clean and robust state management for React and React-like libs.
The library grew from the idea that state should be just as flexible as your React code; the state containers you build with freactal are just components, and you can compose them however you’d like. In this way, it attempts to address the often exponential relationship between application size and complexity in growing projects.
The single biggest bad thing that Young has seen during the last ten years is the common anti-pattern of building a whole system based on Event sourcing. That is a really big failure, effectively creating an event-sourced monolith. CQRS and Event sourcing are not top-level architectures, and normally they should be applied selectively, just in a few places.
Cap’n Proto RPC employs TIME TRAVEL! The results of an RPC call are returned to the client instantly, before the server even receives the initial request!
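The trick behind the joke is promise pipelining: the client gets a reference to a result that doesn’t exist yet and can immediately issue further calls against it, so the whole chain costs a single round trip. A conceptual sketch, not Cap’n Proto’s actual API:

```typescript
// Conceptual sketch of promise pipelining; not Cap'n Proto's actual API.

type Op = { method: string; args: unknown[] };

class PipelinedCall {
  constructor(private ops: Op[]) {}

  // Chain another call onto the future result without waiting for it.
  call(method: string, ...args: unknown[]): PipelinedCall {
    return new PipelinedCall([...this.ops, { method, args }]);
  }

  // One "round trip": the whole chain is shipped to the server at once.
  send(execute: (ops: Op[]) => Promise<unknown>): Promise<unknown> {
    return execute(this.ops);
  }
}

// getUser("alice").getProfile() travels as a single request; the fake server
// here just echoes the chain it would resolve.
new PipelinedCall([{ method: "getUser", args: ["alice"] }])
  .call("getProfile")
  .send(async (ops) => ops.map((op) => op.method).join(" -> "))
  .then(console.log); // "getUser -> getProfile"
```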
From my point of view, the CQRS-induced complexity is largely accidental, and thus can be avoided. To illustrate my point, I want to discuss the goal of CQRS, and then analyze 3 common sources of accidental complexity in CQRS-based systems.
The Mikado Method is a structured way to make major changes to complex code. It helps you visualize, plan, and perform business value-focused improvements over several iterations and increments of work, without ever having a broken codebase during the process.
A collection of information about the PagerDuty incident response process: not only how to prepare new employees for on-call responsibilities, but also how to handle major incidents, both in preparation beforehand and in the follow-up afterwards.
After fighting with Docker on OS X and the need for two-way syncs, fsevents, and so on, I developed a desire to get back to simple(r) development on a Linux-based VM. This project is a jumping-off point.
The idea of this series was to teach enough Haskell to be able to read Simon Marlow’s book of the same title. It turned out to be both a Haskell course and a selection of topics on parallelism and concurrency. There should be something interesting for both total beginners and experts.
Models are used to save money when designing: the maths came second, as a cheaper way to build stuff. When your end medium is so malleable, you can design in it directly.
Rackt is a GitHub organization that was created to ensure critical open source projects in the React community receive long-term support and maintenance.
At Grammarly, the foundation of our business, our core grammar engine, is written in Common Lisp. It currently processes more than a thousand sentences per second, is horizontally scalable, and has reliably served in production for almost 3 years.