Learn how to design large scale systems

21 Mar 2017

Learning how to design scalable systems will help you become a better engineer.

System design is a broad topic. There is a vast amount of resources scattered throughout the web on system design principles.

This repo is an organized collection of resources to help you learn how to build systems at scale.

https://github.com/donnemartin/system-design-primer/blob/master/README.md

Reducing work using pull request refs

16 Mar 2017

Refspecs are cool and you should not fear them. They are simple mappings from remote branches to local references; in other words, a straightforward way to tell git “this remote branch (or this set of remote branches) should be mapped to these names locally, in this namespace.”

http://blogs.atlassian.com/2014/08/how-to-fetch-pull-requests/

To check out a pull request locally:

git fetch origin +refs/pull-requests/your-pr-number/from:local-branch-name

Or add a refspec that maps remote pull request heads to a local pr namespace. You can do it with a config command:

git config --add remote.origin.fetch '+refs/pull-requests/*/from:refs/remotes/origin/pr/*'
git fetch origin
git checkout pr/1
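
To make the mapping concrete, here is a minimal Python sketch of how a wildcard refspec turns a remote ref name into a local one. It is a toy model and not git's actual implementation: the function name `map_refspec` is made up for illustration, and it assumes at most one `*` on each side of the refspec.

```python
from typing import Optional


def map_refspec(refspec: str, remote_ref: str) -> Optional[str]:
    """Return the local ref that remote_ref maps to, or None if it does not match.

    Toy model of a fetch refspec "<src>:<dst>" (hypothetical helper, not a git API).
    """
    # A leading '+' only means "update the local ref even if it is not a
    # fast-forward"; it does not affect the name mapping.
    spec = refspec[1:] if refspec.startswith("+") else refspec
    src, dst = spec.split(":", 1)

    if "*" not in src:
        # No wildcard: the remote ref must match the source side exactly.
        return dst if remote_ref == src else None

    prefix, suffix = src.split("*", 1)
    if len(remote_ref) < len(prefix) + len(suffix):
        return None
    if not (remote_ref.startswith(prefix) and remote_ref.endswith(suffix)):
        return None

    # Whatever the '*' matched on the source side is substituted into the
    # '*' on the destination side.
    matched = remote_ref[len(prefix):len(remote_ref) - len(suffix)]
    return dst.replace("*", matched, 1)


spec = "+refs/pull-requests/*/from:refs/remotes/origin/pr/*"
print(map_refspec(spec, "refs/pull-requests/1/from"))  # refs/remotes/origin/pr/1
print(map_refspec(spec, "refs/heads/master"))          # None
```

With the refspec from the config command above, pull request 1 ends up under refs/remotes/origin/pr/1, which is why `git checkout pr/1` can find it.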

From Transactions to Streams

13 Mar 2017

Martin Kleppmann explores using event streams and Kafka for keeping data in sync across heterogeneous systems, and compares this approach to distributed transactions, discussing what consistency guarantees it can offer and how it fares in the face of failure.

https://www.infoq.com/presentations/event-streams-kafka

The Hardest Part About Microservices is your Data

13 Mar 2017

Of the reasons we attempt a microservices architecture, chief among them is allowing your teams to […] be autonomous, capable of making decisions about how to best implement and operate their services, and free to make changes as quickly as the business may desire.

To gain this autonomy, […] don’t share a single database across services because then you run into conflicts like competing read/write patterns, data-model conflicts, coordination challenges, etc. But a single database does afford us a lot of safeties and conveniences: ACID transactions, single place to look, well understood (kinda?), one place to manage, etc. So when building microservices how do we reconcile these safeties with splitting up our database into multiple smaller databases?

http://blog.christianposta.com/microservices/the-hardest-part-about-microservices-data/

Kafka and Event Sourcing

11 Feb 2017

Very nice talk on using Kafka as an event store in a distributed architecture.

Reducing Microservice Complexity with Kafka and Reactive Streams - by Jim Riecken

start from a monolith
- single build pipeline
- good for small teams
- doesn’t scale well
strangler pattern
- tease out small services
- monolith becomes a facade that calls into microservices
microservices
- clear ownership
- fast build times
- independently scalable
- allows innovation and new technology
cons of microservices
- latency
- cascading failure
- uptime depends on the combined availability of every service on the critical path
non-essential calls should be asynchronous
- decoupling
- producers don’t need to be aware of consumers
- define delivery requirements
- buffering for slow consumers
Kafka
- append only
  - fast: O(1) appends
  - LinkedIn sent 800 billion messages/day (2015)
- broker data persisted to disk
- topics + partitions
  - messages are balanced across partitions or assigned by key hash
- persistence of hours to weeks
- consumers pull from the broker
  - each consumer saves its current index (offset) in the log
- immutable log files
- consumers can pick up where they left off after upgrades or downtime
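
The Kafka notes above (append-only log, consumers that pull and save their own position) can be sketched in a few lines of Python. This is an in-memory toy under loud assumptions and not the Kafka client API: `Log`, `Consumer`, `append`, `read`, and `poll` are hypothetical names, there is a single partition, and nothing is persisted to disk.

```python
from typing import Any, List


class Log:
    """Toy single-partition, append-only log (hypothetical, in memory only)."""

    def __init__(self) -> None:
        self._records: List[Any] = []

    def append(self, record: Any) -> int:
        # Records are only ever added at the end; existing entries are never
        # modified. Returns the offset assigned to the new record.
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Any]:
        # Reading does not remove anything, so many consumers can read the
        # same records independently.
        return self._records[offset:offset + max_records]


class Consumer:
    """Toy pulling consumer that tracks its own position in the log."""

    def __init__(self, log: Log, offset: int = 0) -> None:
        self.log = log
        self.offset = offset  # the saved position to resume from

    def poll(self, max_records: int = 10) -> List[Any]:
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)  # advance only past what was actually read
        return batch


log = Log()
for event in ["created", "paid", "shipped"]:
    log.append(event)

consumer = Consumer(log)
first = consumer.poll(max_records=2)
saved_offset = consumer.offset  # imagine this being stored before a restart

# The consumer is down while the producer keeps appending.
log.append("delivered")

# A restarted consumer resumes from the saved offset and misses nothing.
resumed = Consumer(log, offset=saved_offset)
rest = resumed.poll()
print(first, rest)
```

Because the producer only appends and each consumer owns its offset, the producer never needs to know who is reading or how far behind they are, which is the decoupling the notes describe.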