Coroutines and “yield” expressions in Swift

12 September, 2018

This article is a part of my series about concurrency and asynchronous programming in Swift. The articles are independent, but after reading this one you might want to check out the rest of the series.


We know that generators produce output by passing arguments to the yield statement. I also mentioned previously that it would be great if generators could also consume input. It turns out a generator is a special case of a coroutine. From a computer science perspective there are other subtle differences between generators and coroutines, but for this proposal let’s assume that coroutines are more powerful generators that can also consume values.

How would that work? Would that be useful at all? To answer these questions, we need to have a closer look at how we interact with generators and coroutines.

Review of the Generator type and syntax in detail

As a reminder, to define a generator, we use this syntax that looks similar to a closure definition:

func sequence(start: Int = 0,
              step: Int = 1,
              end: Int = .max) -> Generator<Int> {
  return Generator {
    print("generator starts")
    var counter = start
    while counter <= end {
      yield counter
      counter += step
    }
    print("generator ends")
  }
}

yield in the generator body marks suspension “checkpoints” and supplies the values that the generator produces. Let’s also have a look at the Generator type itself:

// this is only the interface, not the implementation
final class Generator<Output>: IteratorProtocol {
  // describes possible generator status
  enum Status {
    case suspended
    case failed(Error)
    case finished
  }

  private(set) var status: Status = .suspended

  // get output of a generator with these functions
  func next() -> Output?
  func next() throws -> Output

  // cancelling a generator
  func stop()
  func fail(error: Error)
}

The main points of interest are the next() functions. I mentioned previously that we can get values out of a generator with the for ... in ... { } syntax. We also know that this is syntax sugar built on top of IteratorProtocol, as clarified in its documentation. The shape of that protocol is trivial:

protocol IteratorProtocol {
  associatedtype Element
  mutating func next() -> Element?
}
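
Strictly speaking, the for ... in loop is driven by the Sequence protocol, which vends an iterator via makeIterator(). Since a Generator can act as its own iterator, an empty conformance would be enough; this is a sketch on top of the proposed type, not part of the proposal itself:

extension Generator: Sequence {
  // Sequence provides a default makeIterator() that returns `self`
  // for types that act as their own IteratorProtocol iterator
}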

This exactly matches the Generator type’s shape and allows us to use the iteration syntax on all generators:

// without IteratorProtocol
let s1 = sequence(start: 1, step: 1, end: 3)
while let i = s1.next() {
  print(i)
}
// prints lines of "generator starts" 1 2 3 "generator ends"

print(s1.status)
// prints `.finished`

print(s1.next())
// prints `nil`, generator is finished, no more output

// with IteratorProtocol
let s2 = sequence(start: 1, step: 1, end: 5)
// created a new generator `s2` as the previous `s1` is finished
for i in s2 {
  print(i)
}
// prints lines of "generator starts" 1 2 3 4 5 "generator ends"

Notice that we can’t use a generator after it’s finished. Allowing this would make no sense, because a finished generator has reached the end of its body: there is no more code to execute and no more output to yield. Interestingly, to fully execute a generator yielding 3 values, we had to make 4 next() calls. Without the 4th call, you wouldn’t get the "generator ends" line printed, but both the while let and for ... in versions automatically issue the terminating call for you.

Coroutine type and syntax

As next() is the main “entry point” that resumes a generator, it would make sense to give coroutines a similar shape, but to allow the next function to take input arguments:

// this is only the interface, not the implementation
final class Coroutine<Input, Output> {
  // describes possible coroutine status
  enum Status {
    case suspended
    case failed(Error)
    case finished
  }

  private(set) var status: Status = .suspended

  // get output of a coroutine with these functions
  func next(_ input: Input) -> Output?
  func next(_ input: Input) throws -> Output

  // cancelling a coroutine
  func stop()
  func `throw`(error: Error)
}

But how would a coroutine receive input in its body? We can already use yield as a statement; what if it could also be an expression evaluating to input values? My first naïve approach to implementing this in Swift looked like this:

// first approach, similar to how Python and JavaScript
// implement coroutines
let co = Coroutine<Int, Int> {
  print("coroutine starts")

  var input = yield 42
  print(input)

  input = yield 24
  print(input)

  print("coroutine ends")
}

print(co.next(5))
// coroutine body print: "coroutine starts"
// this print: `42`

print(co.next(10))
// coroutine body print: `10`
// previous `5` is lost :(
// this print: `24`

print(co.next(15))
// coroutine body prints: `15` and "coroutine ends"
// this print: `nil`

But the main problem here is the order of evaluation of a yield expression: it produces output first and consumes input second. Compare this with the evaluation order of the next(_ input: Input) function: it consumes input first and produces output second. And as with generators, we need to call next one more time to fully execute a coroutine: for 2 yield expressions executed, you need to call next 3 times. With this approach Python and JavaScript always ignore the first argument passed to next, which is fine because they don’t have a static type system.

But what do we do in Swift? Do we always ignore arguments to the first call of next on a coroutine, like we did with the argument 5 in the example above? We can’t make the input optional, because that would require excessive optional unwrapping in the coroutine body. But so far in this proposal, the generator and coroutine closures that we pass to the Generator and Coroutine initialisers haven’t had any arguments. What if we could utilise that syntax for passing the first next call’s input?

// proposed coroutine syntax: first `next` input is explicit
let co = Coroutine<Int, Int> { firstInput in
  print("coroutine starts")
  print("firstInput is \(firstInput)")

  var input = yield 42
  print(input)

  input = yield 24
  print(input)

  print("coroutine ends")
}

print(co.next(5))
// coroutine body prints: "coroutine starts", "firstInput is 5"
// this print: `42`

print(co.next(10))
// coroutine body print: `10`
// this print: `24`

print(co.next(15))
// coroutine body prints: `15` and "coroutine ends"
// this print: `nil`

Great, we don’t lose any input values anymore and can guarantee that at compile time, without any optionals. I like this version even more than the coroutine syntax in Python and JavaScript, that feeling when a remake is better than the original. 😆

Writer stream as a coroutine

To piece it all together, let’s proceed with a Big Data™ example from the previous article: you have hundreds of gigabytes of sensor data, which is stored as a JSON array of integers in a file. We already implemented a chunkReader stream that can read files of any size producing a sequence of chunks of specified length.

A counterpart would be a chunkWriter stream looking like this:

func chunkWriter(path: String,
                 start: UInt64 = 0) -> Coroutine<Data, ()> {
  return Coroutine { firstInput in
    guard let handle = FileHandle(forWritingAtPath: path) else {
      return
    }

    handle.seek(toFileOffset: start)

    var data = firstInput
    while !data.isEmpty {
      handle.write(data)
      // `yield` produces no output here (`Output` is `()`) and
      // evaluates to the next chunk passed to `next(_:)`
      data = yield
    }
  }
}
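
Under the proposed semantics, the reader and writer streams compose directly. A hypothetical usage sketch with placeholder file names:

// copy a file chunk by chunk using the two streams
let reader = chunkReader(path: "input.json", chunkSize: 4096)
let writer = chunkWriter(path: "output.json")

for chunk in reader {
  // the final empty chunk yielded by the reader also terminates
  // the writer's `while !data.isEmpty` loop
  _ = writer.next(chunk)
}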

Streaming parser of a JSON subset as a coroutine

But the full power of coroutines becomes apparent with something more advanced. Consider a streaming parser for a sequence of chunks read from a file. This implementation should be able to handle all valid JSON arrays of integers while consuming a nearly constant amount of memory during parsing. And even for invalid arrays or in case of corrupted data we’d like this parser to produce as many results as possible up to the failure point.

I look forward to seeing an implementation of this streaming parser in Swift 4.2. I bet it would be much longer and less readable than the coroutine version that follows. 😄 The essence of this version is pretty simple: it consumes characters one by one, passed in by evaluating next(char) at the call site. Those characters are available inside the coroutine body as values of yield expressions. At the same time the coroutine produces parsed values that are returned as results of next(char):

enum ParsingError: String, Error {
  case expectedOpenBracket = "expected \"[\""
  case expectedDigitOrSpace = "expected a digit or a space"
  case unexpectedInput = "unexpected input"
}

let whitespaceChars: [Character] = [" ", "\r", "\n", "\t"]

func arrayParser() -> Coroutine<Character, Int?> {
  return Coroutine { firstInput in
    // integer digits are accumulated in a buffer before they're parsed
    var acc = ""
    var input = firstInput

    // JSON arrays start with '['
    guard input == "[" else {
      throw ParsingError.expectedOpenBracket
    }

    // a value to yield
    var value: Int? = nil
    // spaces aren't permitted between digits, so we need
    // a special flag to handle it
    var previousIsSpace = false

    // flushes accumulated buffer
    func flush() {
      // integer digits parsed
      value = Int(acc)
      acc = ""
    }

    // iterate over all input characters
    repeat {
      input = yield value
      // resetting next value to yield
      value = nil

      // special handling for whitespace characters
      if !whitespaceChars.contains(input) {
        switch input {
        case ",":
          if !acc.isEmpty {
            flush()
          } else if !previousIsSpace {
            throw ParsingError.expectedDigitOrSpace
          }
        case "0"..."9" where !previousIsSpace || acc.isEmpty:
          acc.append(input)
        case "]":
          flush()
          break
        default:
          throw ParsingError.unexpectedInput
        }
        previousIsSpace = false
      } else {
        previousIsSpace = true
      }
    } while input != "]"

    yield value
  }
}

let parser = arrayParser()

for char in "[1, 2,3,4,5 , 42]" {
  if let result = parser.next(char) {
    print(result)
  }
}
// prints lines of 1 2 3 4 5 42

A great property of this parser is that it consumes a constant amount of memory and doesn’t need to load the whole input sequence at once. It’s highly performant and composable, while the definition is only ~70 lines of code including comments, the supporting error enum and the whitespace constants.
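
And to bring back the Big Data™ story, the parser could sit at the end of a chunked reading pipeline. A hypothetical sketch under the proposed syntax, where sensors.json and process are placeholders:

let parser = arrayParser()

for chunk in chunkReader(path: "sensors.json", chunkSize: 4096) {
  // assuming the file is UTF-8 encoded
  guard let text = String(data: chunk, encoding: .utf8) else { break }

  for char in text {
    if let value = parser.next(char) {
      process(value)
    }
  }
}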

Summary

Generators are good for expressing reader streams, but coroutines are even more powerful and can express writer streams and stream processing concisely. We had a look at the Generator and Coroutine types, how they differ, and implemented a highly performant streaming parser for arrays of JSON integers.

Whenever you need to read and process huge amounts of uniform data, generators and coroutines can simplify your code significantly. Instead of writing state machines manually, coroutines allow you to express the same logic with the usual control flow structures like loops and conditions.

The concept of code that can be suspended and resumed as needed is very powerful and allows building many more useful abstractions on top of that. Adding these foundational building blocks to Swift is as important as ever, especially as other programming languages already have them or plan to add them soon. As always, I appreciate comments and questions from you on Twitter or Mastodon, talk soon. 🙌



What are generators and why Swift needs them?

10 September, 2018

This article is a part of my series about concurrency and asynchronous programming in Swift. The articles are independent, but after reading this one you might want to check out the rest of the series.


When developing apps, I frequently need to switch between different programming languages. In addition to using Swift on iOS/macOS and Linux, a lot of the time JavaScript/TypeScript is a good fit for full-stack web development, especially with the popularity of React and Node.js. And when doing data engineering, Python’s ecosystem is pretty good and is widely supported.

But every time I come back to Swift, there is one big feature I miss: “generators”. Whether you’ve used generators and the yield keyword in other languages before, or you’re only curious to learn about them, you might find this short pitch interesting. Here I try to imagine how generators could look in Swift and what the implications are.

Generators? What’s that?

The best way to explain generators is to compare them to functions: a function has one entry point (function application), executes once and finishes; it returns values and/or returns early with a return statement. A generator is like a function with “holes”: after initialisation it executes up to the next yield statement, “generates” the value passed as the yield argument and suspends, saving its state. The generator state is the position of yield in the generator’s body plus the values of all variables in the generator’s scope. A caller can then resume the generator, and it continues from the yield position where it was suspended, preserving its scope.

I tried to capture the difference in this diagram:

Generators vs functions

Thanks to the fact that generators produce a series of values of the same type, usually they can be treated like sequences and iterated with for ... in syntax. This is a powerful feature that can be used where common sequences and collections don’t fit well or there is a need to describe an algorithm that suspends and resumes later.

How do different languages implement generators?

If you’re interested in programming languages history, here’s a quick overview how a few of them approached generators:

  • JavaScript: generators, introduced in ECMAScript 6 (~2015)
  • Python: generators and coroutines; generators in Python 2.3 (2003), coroutines in Python 2.5 (2006)
  • Ruby: fibers, introduced in Ruby 1.9 (2007)
  • C#: “iterators”, introduced in C# 2.0 (2006)

I’m sure there are plenty of other languages that have generators under a different name, with a different syntax, or via third-party libraries, but I listed only those with relatively wide adoption that implement it as a language feature.

Streams

Let’s get started with a simple use case for generators: reading huge amounts of Big Data™. Say you have hundreds of gigabytes of sensor data, which is stored as a JSON array of integers in a file. The only API you have at your disposal is a plain old synchronous blocking API to read a few bytes in a file at a given offset. We would like to have a nice and efficient way of reading this data and passing it to other code for post-processing. Reading all of the data up front and storing it in memory might be ok for small files, but do we really need servers with a hundred gigs of RAM to read a file with our sensor data? Maybe we could read the file only in small portions?

Meet streams:

a stream is a sequence of data elements made available over time. A stream can be thought of as items on a conveyor belt being processed one at a time rather than in large batches.

Wikipedia

Here’s how an implementation of a chunk reader could look without syntax support for generators:

final class ChunkReader {
  private let handle: FileHandle
  let chunkSize: Int

  init?(path: String, chunkSize: Int, start: UInt64 = 0) {
    guard let handle = FileHandle(forReadingAtPath: path) else {
      return nil
    }

    self.handle = handle
    self.chunkSize = chunkSize

    handle.seek(toFileOffset: start)
  }

  func next() -> Data {
    return handle.readData(ofLength: chunkSize)
  }
}

This ChunkReader class maintains state in the handle instance of FileHandle, which itself is a state machine.
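
Driving this class is straightforward; a short sketch, assuming test.txt exists:

if let reader = ChunkReader(path: "test.txt", chunkSize: 5) {
  var chunk = reader.next()
  // an empty chunk signals the end of the file
  while !chunk.isEmpty {
    print(chunk)
    chunk = reader.next()
  }
}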

Of all existing generator implementations, I find Python’s and JavaScript’s versions the most straightforward and easy to use. You may be surprised by my choice of dynamically typed languages for inspiration, but I think this generator design hasn’t appeared in statically typed languages yet only due to implementation details. In my experiments, I’ve reached the conclusion that it can be translated to a statically typed language, such as Swift, surprisingly well.

Without further ado, here’s a version with generator support in Swift syntax:

func chunkReader(path: String,
                 chunkSize: Int,
                 start: UInt64 = 0) -> Generator<Data> {
  // marker for a generator body
  return Generator {
    guard let handle = FileHandle(forReadingAtPath: path) else {
      return
    }

    handle.seek(toFileOffset: start)

    var data: Data

    repeat {
      data = handle.readData(ofLength: chunkSize)
      // suspends and resumes execution here at every iteration
      yield data
    } while !data.isEmpty
  }
}

let stream = chunkReader(path: "test.txt", chunkSize: 5)

for chunk in stream {
  // prints lines of every 5 characters in test.txt
  print(chunk)
}

Neat, we no longer need a dedicated class declaration and are able to express generators concisely, so the definition doesn’t differ much from a common function declaration. It’s quite important that the generator syntax version reads naturally with common control flow constructs such as loops. It’s easy to follow what happens when generator execution is resumed after the yield keyword. In the class version, you’re forced to introduce instance variables and initialise them explicitly.

Generators are state machines

An interesting observation is that the proposed language support for generators is a “syntax sugar” for state machines, in a similar way to closures being a “syntax sugar” for delegate objects. On a lower level closures are transformed to objects that capture external environment declared outside of the closure body. A reference to this environment is then stored in the closure object as an “instance variable”. Generators capture and maintain internal state declared inside the generator body, which is also stored as generator “instance variables”.

Another sensible addition would be to allow generators to capture variables in the parent scope, as closures do, which is the default behaviour in both Python and JavaScript.

Here’s an example with more complex state and more state transitions to illustrate the point: HTTP reader stream. Imagine you’d like to report a detailed stream state for logging purposes or to make it clear in the UI what’s going on. Let’s introduce an enum to represent it:

enum HTTPReaderState {
  case initialized
  case dnsResolved(ipAddress: String)
  case connectionEstablished
  case requestWritten
  case headersParsed(headers: [String: String])
  case redirectRequired(redirected: URL)
  case bodyChunkDownloaded(chunk: Data)
  case connectionClosed
}

And here’s a class implementation of this stream, assume that Socket implementation and all DNS resolution and HTTP parsing functions are already provided:

final class HTTPReader {
  private let url: URL
  private var statusCode: Int?
  private var state: HTTPReaderState = .initialized

  // not a concrete type, abstract `Socket` used as an example
  private var socket: Socket?

  init(url: URL) {
    self.url = url
  }

  func next() throws -> HTTPReaderState {
    switch state {
    case .initialized:
      state = .dnsResolved(ipAddress: try resolveDNS(url))
    case .redirectRequired(let redirected):
      state = .dnsResolved(ipAddress: try resolveDNS(redirected))
    case .dnsResolved(let ipAddress):
      socket = try establishConnection(ipAddress)
      state = .connectionEstablished
    case .connectionEstablished:
      guard let socket = socket else { break }
      try writeHTTPRequest(socket, url)
      state = .requestWritten
    case .requestWritten:
      guard let socket = socket else { break }
      statusCode = try parseHTTPStatusCode(socket)
      state = .headersParsed(headers: try parseHTTPHeaders(socket))
    case .headersParsed(let headers):
      if let statusCode = statusCode, isRedirectCode(statusCode),
        let location = headers["Location"],
        let redirected = URL(string: location) {
        state = .redirectRequired(redirected: redirected)
      } else {
        fallthrough
      }
    case .bodyChunkDownloaded:
      guard let socket = socket else { break }
      let data = try readHTTPBody(socket)
      if data.isEmpty {
        socket.close()
        state = .connectionClosed
      } else {
        state = .bodyChunkDownloaded(chunk: data)
      }
    case .connectionClosed:
      break
    }

    return state
  }
}

Disadvantages of this approach are pretty obvious: we need optionals and optional unwrapping for instance variables that have to outlive a single run of the next() function. All state transitions are described in a giant switch statement, which is error-prone and obscures the natural control flow. At a glance, it would be hard to say which of the state transitions could be executed more than once and what the expected order of transitions is.

Notice the guard let socket lines needed to unwrap the socket instance variable. Another approach would be to make the socket an associated value of HTTPReaderState, so that a state always carries the values required to proceed, but this would expose the private socket in the public API. To fix that we’d probably need yet another enum for private state, which illustrates how quickly the class version gets complicated. Thanks to Cory Benfield for highlighting the problems with optional unwrapping and the absence of the socket in associated values. It turns out this is a pattern that you can see used in SwiftNIO.

So check out a version with the new generator syntax:

func httpReader(url: URL) -> Generator<HTTPReaderState> {
  return Generator {
    var redirectURL: URL?
    var socket: Socket
    repeat {
      let ipAddress = try resolveDNS(redirectURL ?? url)
      yield .dnsResolved(ipAddress: ipAddress)

      socket = try establishConnection(ipAddress)
      yield .connectionEstablished

      try writeHTTPRequest(socket, url)
      yield .requestWritten

      let statusCode = try parseHTTPStatusCode(socket)
      let headers = try parseHTTPHeaders(socket)
      yield .headersParsed(headers: headers)

      if isRedirectCode(statusCode) {
        redirectURL = headers["Location"].flatMap { URL(string: $0) }
        if let u = redirectURL {
          yield .redirectRequired(redirected: u)
        }
      } else {
        redirectURL = nil
      }
    } while redirectURL != nil

    var data = try readHTTPBody(socket)
    while !data.isEmpty {
      yield .bodyChunkDownloaded(chunk: data)
      data = try readHTTPBody(socket)
    }

    socket.close()
    yield .connectionClosed
  }
}

It is significantly smaller and easier to read: there isn’t as much optional unwrapping, and variables are introduced with known values when needed. Control flow constructs like repeat ... while naturally express repeating state transitions, and you no longer need that giant switch at all.
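
For completeness, here’s how driving this stream could look under the proposed semantics, with a placeholder URL, and assuming that an error thrown in the generator body surfaces through the status property:

let reader = httpReader(url: URL(string: "https://example.com")!)

for state in reader {
  print("HTTP reader state: \(state)")
}

if case .failed(let error) = reader.status {
  print("request failed: \(error)")
}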

I don’t get it… 🤔 Wouldn’t the addition of generators make Swift more complicated?

I see a significant amount of complaints like this on Twitter. The narrative is that Swift has become unnecessarily complicated and everyone should focus on the quality of developer tools and frameworks instead of adding new things to the language itself. I disagree with the first point strongly: of all the programming languages I’ve worked with, Swift follows the progressive disclosure principle most consistently. You don’t have to use advanced features if you aren’t ready for them or don’t want to.

In addition, generators and the yield keyword implemented the way I propose are purely additive and don’t change anything in the Standard Library or in the way your old code works. If you don’t need this feature, you won’t even notice it exists if it’s introduced in some future version of Swift.

As for developer tools and frameworks, we need to stop seeing it as a zero-sum game. Making the language more powerful enables more powerful tools and frameworks. As I’m in no way a member of the Swift compiler team or affiliated with Apple whatsoever, I’d be very happy to work together with the open-source community on this proposal. This wouldn’t have any negative impact on other high priority developments in the Swift ecosystem.

I get it! 👍 Is this what Chris Lattner proposed a year ago?

Yes and no. 😄 My pitch is heavily inspired by that proposal and in no way conflicts with it. When I say I miss generators in Swift, I should mention that I miss async/await as described in that proposal even more. I’ve been obsessed with the possibility of getting it introduced for a while, and that’s why I’ve been actively working on my generators proposal for months now.

The await keyword and async functions can be directly expressed with yield and generators. In fact, this is how Python and JavaScript do it. From all my research, and based on how other languages implemented this, I’m convinced we could get yield and generators in Swift much earlier than async/await. Given that yield and generators are simpler and more “primitive”, it would be a good thing to introduce these concepts gradually with smaller proposals over time. Even better, these features work so well together that we could get async generators after that, which would make operating on non-blocking streams natural. Good examples are reading big amounts of uniform data from a database without blocking, streaming download/upload of huge numbers of files, streaming parsers etc.

Wait, I only saw generator output. Your diagram says generators also get input?

Yeah, there’s one more thing… With generators you get not only reader streams, but also writer and reader-writer (i.e. duplex) streams. In my examples yield was only a statement, with arguments treated as output. I’d propose to also allow yield expressions, which evaluate to generator input. This is a more general and complex case that deserves a separate article, which is coming soon! 🤓

Summary

We were able to implement a streams abstraction for reading huge amounts of data in small chunks. These chunks can be processed incrementally, even with blocking synchronous APIs as a foundation. While you can implement streams with plain classes, that approach has numerous disadvantages. Based on what other programming languages have done for a long time, I propose introducing generators to Swift with new generator syntax and a yield statement. This is also a foundational feature for adding more support for asynchronous programming to Swift in the future.

I won’t be able to proceed without your help, so I ask you to send me your comments and questions on Twitter or Mastodon. Whether you have interest in Swift and like or dislike this pitch, or you’re a compiler engineer who’d like to help, I look forward to hearing from you. Subscribe to this blog (multiple options available at the top) if you haven’t done that yet to be able to follow my progress with this proposal, stay tuned. 👋



Event loops, building smooth UIs and handling high server load

16 August, 2018

This article is a part of my series about concurrency and asynchronous programming in Swift. The articles are independent, but after reading this one you might want to check out the rest of the series.


Have you ever used an app that felt unresponsive? Not reacting to input on time, slow animations even for some simple tasks, even when there isn’t anything heavy running in the background? Most probably, an app like that is waiting for some blocking long-running computation to finish before consuming more user input or rendering animation frames. Scheduling all workloads correctly and in the right order might seem hard, especially as there are different approaches to choose from.

What does it mean to be “non-blocking”?

When developing native apps for iOS, we always hear this thesis: “avoid blocking the main thread”. Because UIKit and AppKit APIs are not thread-safe, most if not all UI rendering happens on the main thread. If you run a long-running blocking computation on that thread, your app can’t respond to user input or render UI updates.

For an app to feel smooth and interactive on a device with a 60 Hz display refresh rate, it has to render 60 frames a second and update those in response to user input as close as possible to real time. That means event handlers on the main thread have only about 16.6 milliseconds to complete. Want the app to feel smooth on a 120 Hz iPad Pro? That’s 8.3 milliseconds on the main thread at your disposal.
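
If you want to check whether a specific handler fits into that budget, a crude measurement is enough. A minimal sketch, where handleEvent is a placeholder for your own main-thread code:

let start = CFAbsoluteTimeGetCurrent()
handleEvent()
let elapsed = (CFAbsoluteTimeGetCurrent() - start) * 1000

if elapsed > 16.6 {
  print("handler took \(elapsed) ms, frames will be dropped")
}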

Here’s a conceptual diagram of CPU load on the main thread that refreshes UI at 60 Hz and runs a blocking operation on the main thread. Red solid spikes in CPU load are caused by UI rendering code, while blue ones are caused by HTTP client code.

UI rendering timeline example

As you can see, if an HTTP client operation is blocking and runs for more than 33 milliseconds, up to 3 UI frames would not be rendered. Imagine if the request runs on a slow network and takes seconds to complete, the app UI would freeze and be unresponsive during all that time.

The important thing to note on the diagram is that the CPU stays idle while waiting for the network operation to complete, so in principle it could have rendered the frames while waiting. This type of operation is called I/O-bound, which means that the bottleneck is I/O (filesystem access, networking, communicating with sensors and peripherals etc.) rather than intensive computations on the CPU. Conversely, operations where CPU load is the bottleneck are called CPU-bound. This distinction is quite important: it turns out to be quite easy to make I/O-bound code non-blocking.

Event loops

How do people avoid blocking the main thread with I/O-bound code? Here’s one approach: as soon as main app initialisation has finished, an event loop is started (also known as a “run loop” in the Cocoa/Foundation/GCD world). It is a loop that processes I/O events and invokes corresponding event handlers at each iteration. The loop is not truly infinite: certain exit events, say the user closing the app, clean up resources and stop it. But it is indefinite in the sense that there’s usually no predefined number of iterations when it starts.

Building smooth UIs

Most of the code an iOS/macOS developer writes is called from an event loop iteration. All gesture recognisers, sensors input and higher level event handlers such as button taps and navigation handlers are invoked on the main thread and are scheduled by an event loop. This is applicable to browser apps too: all your code (except web workers) runs on the same thread and blocking computations would make UI freeze. In the end, the browser is just a native application with its own event loop that forwards native events to your JavaScript code.

Here’s an example of a stacktrace of a native iOS app, which shows event loop functions at the bottom of the stack:

CFRunLoop stack example

There's always an event loop there near the bottom of the call stack for UIKit apps.

I bet that in 90+% of cases when debugging the main thread of iOS apps, you’d see these calls at the bottom of a stacktrace. It’s an implementation detail that’s hidden quite well from us, but a very important one.

How would an event loop fix our UI example with a blocking HTTP client? We saw from the CPU load on the diagram that this operation is composed of two parts: forming a request and parsing a response. Quite simply, we could run request formation as soon as the operation starts and then schedule response parsing on a subsequent event loop iteration, to be called only when the response is ready.

Here’s a diagram of the same UI example with an event loop:

UI rendering timeline example

A notable change here is the introduction of callbacks. Instead of performing an I/O-bound operation all at once, it is split into multiple parts, which are chained with a callback. For example, in Swift a callback could be created in the form of a closure:

let session = URLSession.shared
let task = session.dataTask(with: URL(string: "https://your.api/call")!) {
  data, response, error in
  // ...
}
task.resume()

or a delegate object:

class Delegate: NSObject, URLSessionTaskDelegate {
  func urlSession(_ session: URLSession, task: URLSessionTask,
                  didCompleteWithError error: Error?) {
    // ...
  }
}

let delegateSession = URLSession(configuration: .default,
                                 delegate: Delegate(),
                                 delegateQueue: nil)
let delegateTask = delegateSession.dataTask(with: URL(string: "https://your.api/call")!)
delegateTask.resume()

or even with target-action pattern (usually reserved for UI handlers):
class Controller: UIViewController {
  override func viewDidLoad() {
    super.viewDidLoad()
    let button = UIButton(type: .custom)
    button.addTarget(self, action: #selector(handler),
                     for: .touchUpInside)
    view.addSubview(button)
  }

  @objc func handler() {
    // ...
  }
}

All of these patterns are enabled by event loops. On a higher level, the existing event loop is able to invoke these callbacks at some later iteration without blocking other code.

Handling high HTTP server load

Even if you aren’t working on the full stack, as a frontend dev you might be interested in how servers handle the increased amount of requests from your smooth apps powered by event loops. And if you’re a backend dev, I hope the previous section was not too boring and you’ll like this one even more. 😄

Now imagine an HTTP server processing a high number of requests coming in at the same time. First, a request is parsed to get a few parameters, then the server issues a query to a database with those parameters. After the query finishes, a response is formed from the query result and sent to the client.

It’s easy to see with an abstract CPU load diagram how a blocking database query can kill the server’s throughput. On the following diagram there are three separate CPU load spikes for every request and response sequence:

Server-side blocking IO example

While HTTP servers don’t drop UI animation frames like in the UI example, a blocking operation makes a client wait in a queue for a response until requests are processed one by one. Of 6 requests shown, only 4 were handled by the end of the diagram timeline. Again, it’s obvious that a database query is I/O-bound and you could have processed many requests simultaneously. This would also saturate CPUs more efficiently, which is quite important in reducing the cost of running servers.

You’ve probably guessed by this point: event loops come to the rescue! 😄 What if the server started parsing the next request as soon as it’s received, not waiting for the database query of a previous request to complete? As I’ve marked the three steps of the operation in different colours, you can see that parts of asynchronous request handling are executed out of order on a linear CPU timeline. It doesn’t mean anything’s wrong: as HTTP is a stateless protocol, you can assume that requests are independent and can be processed simultaneously.

Server-side event loop

Overall, we can’t say that database queries became faster when you look at a given request and response pair. But throughput is much better thanks to the introduction of an event loop (assuming your database is able to handle the increase). For the 6 requests shown, 5 responses were issued by the end of the diagram’s timeline, and much earlier than they would have been in the blocking version. The CPU load is more evenly distributed within one process, which reduces context switching overhead and server cost.
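
In code, the difference boils down to the request handler returning immediately and letting the event loop call back when the query result arrives. A sketch, where HTTPRequest, HTTPResponse, parse, makeResponse and db.query are illustrative names rather than a real framework API:

func handle(request: HTTPRequest,
            respond: @escaping (HTTPResponse) -> ()) {
  // parsing is CPU-bound and quick, so we do it inline
  let params = parse(request)

  // `query` returns immediately; the event loop invokes the callback
  // on a later iteration, once the database delivers the result
  db.query(params) { result in
    respond(makeResponse(result))
  }
}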

Current status of event loops on the server side

Nowadays, if you write server-side code, quite probably you already use an event loop under the hood. Let’s have a quick overview of how major languages and frameworks actually do this.

Node.js provides an implicit event loop in the single thread that you work with. The setTimeout, setImmediate and setInterval functions schedule closures to be executed later by the default event loop. It turns out ASP.NET Core supports non-blocking I/O out of the box, which may not be so surprising, given that C# has supported async methods since 2012. If you’re into the JVM ecosystem, Netty is the way to go for event loops. And you’ve probably heard of it: Norman Maurer, one of the main contributors to Netty, now works for Apple and is a top committer to the new SwiftNIO framework.

Now that we’ve mentioned Swift: Vapor 3.0 has already migrated to SwiftNIO, and there’s also a “tech preview of Kitura on SwiftNIO” available. It’s interesting to note that SwiftNIO allows you to utilise event loops across multiple threads.

Scaling event loops across multiple threads is a very powerful concept and is something that’s also already available on the language level in Go and Elixir. While I have my share of scepticism for both Go1 and Elixir2, these are very interesting languages that I highly recommend to explore and wrap your head around.

In our day and age of diminishing returns from Moore’s law, being able to easily scale tasks across multiple CPU cores, GPUs and other devices is indispensable. Concurrency, parallelism and distributed computing is the new functional programming. 🤓 In some sense, these topics initially seem to be difficult and obscure, but can drastically improve how you approach different problems when you have at least some understanding of basic concepts.

How do you even deploy a backend app without an event loop?

For quite some time event loops weren’t as popular in backend software as they are now. Back in the 90s, spawning a new process for every HTTP request seemed normal; I refer you to the history of the Common Gateway Interface for more on this. A few optimisations followed, like SCGI and FastCGI, but they still maintained a clear separation between an HTTP server and the request handling itself. An HTTP server (e.g. Apache) could use an event loop on a lower level, but the application logic still lived in a separate process. It was possible to eke out a bit more performance by spawning multiple worker processes or threads, but this still caused context switching overhead and the same blocking I/O bottlenecks as before.

You might ask: “who even writes blocking HTTP code these days?“. Well, if you’re writing code for Python 3.2 or older, most probably you don’t have access to asyncio, which is Python’s standard implementation of event loops. The most popular Python web frameworks, like Flask and Django, are synchronous, not powered by an event loop, and are commonly deployed via WSGI. The most popular HTTP client library, requests, provides only a blocking API as well.3 When you do have access to newer Python versions and to asyncio, event loops there are a bit more explicit than in Node.js, but the underlying concept is still the same.

While I don’t have much exposure to Ruby and Ruby on Rails, in my research I haven’t found any event loop frameworks that have become mainstream. It does look like most common solutions for deployment utilise multithreading and multiprocessing to improve scalability.

Similarly with PHP, while ReactPHP is about 5 years old at this point, you can’t beat inertia of deploying WordPress on Apache. 😄

How do event loops work?

There is a ton of detail hidden under the hood that we don’t need to worry about on a daily basis. After all, event loops are meant to improve how we work with I/O-bound tasks, and most I/O stuff can get complicated real quick.

A good explanation is this pseudocode from SwiftNIO documentation. As SwiftNIO is primarily a server-side framework, a channel here means a server connection established by a client. But on a low level, this is applicable to UI event loops too. From a certain perspective, interactions with touch sensors, display drivers and GPUs can be considered I/O.

while eventLoop.isOpen {
  /// Block until there is something to process for 1...n Channels
  let readyChannels = blockUntilIoOrTasksAreReady()
  /// Loop through all the Channels
  for channel in readyChannels {
    /// Process IO and / or tasks for the Channel.
    /// This may include things like:
    ///    - accept new connection
    ///    - connect to a remote host
    ///    - read from socket
    ///    - write to socket
    ///    - tasks that were submitted via EventLoop methods
    /// and others.
    processIoAndTasks(channel)
  }
}

Here blockUntilIoOrTasksAreReady interacts with sockets or device drivers via system calls and provides events that are ready to process. In turn, processIoAndTasks dispatches those events to corresponding callbacks that were registered on previous event loop iterations.

Disadvantages of event loops

As I previously mentioned, the server-side event loop diagram is a bit harder to read when establishing the sequence of events. This especially might get in the way during debugging. In a function call stacktrace you probably would be able to see what kind of event led to a specific callback handler, but tracing it to the original code that scheduled the handler has to be done separately.

You might’ve stumbled upon a similar problem when debugging multithreaded code. A stacktrace of a particular thread can be retrieved, but by the time your point of interest executes, the original thread that scheduled the work has already moved on to completely different code.

In a situation with multiple threads, this can be further complicated by the use of synchronisation primitives and non-deterministic nature of thread scheduling. On the contrary, if your event loop code is single-threaded and tasks are scheduled and handled from the same thread, it’s not as bad.

A quite acceptable solution is to maintain a separate callback stack trace that can be examined later when needed. For example, SwiftNIO records the file names and line numbers of futures and promises when they are created. This information is then logged if a deallocated future was unfulfilled, indicating a potential leak of redundant future objects. In principle, a higher-level framework (say an HTTP server) could attach richer information to its own stack of requests, to be logged when something goes wrong.

Summary

It would be fair to say that an event loop is a great tool for building high-performance and scalable applications. We’ve looked into how it helps with both UI and server-side code. On the server side specifically, there are plenty of approaches: different languages and frameworks evolved in different ways, but remain fundamentally the same under the hood.

Non-blocking I/O on a low level can be relatively tricky, and it’s great that event loops can abstract this away. There are a few caveats with regards to debugging event loop based code, but it’s something that can be overcome with tooling, better support from frameworks and runtime of a programming language that you use.

As with any tool, it’s important to know the use cases and apply it where it fits best. With event loops, you can get substantial gains in throughput and responsiveness of your apps if applied well, while paying a small cost of recording more runtime information to ease debugging if needed.

I hope I provided some information on that in the article, but please feel free to shoot a message on Twitter or Mastodon with your comments and questions. Talk soon. 👋


  1. No generics in a statically typed language in 2018? Boilerplate error handling? Some say this all happened on purpose to keep things simple and minimalistic. Sorry, this doesn’t feel to me like minimalism, more like brutalism. 😄

  2. While Elixir is a great improvement on top of Erlang, limitations of dynamic virtual machines become too obvious when trying to write CPU-bound code. Calling into native binaries is quite difficult, but I love how Elixir makes distributed computing native to the language and its standard library.

  3. The situation with Python and blocking APIs is peculiar enough to be considered in more detail. I’m convinced that this is a great illustration of how important ergonomics of programming languages is. It was possible for a long time to build asynchronous non-blocking applications in JavaScript, especially with Node.js. New language features such as generators and async/await have given JavaScript mostly syntax sugar to help with patterns that have been available for a while. Compare this with Python, where even in 2018 with the availability of async/await and new libraries based on event loops, a lot of people still hesitate to make a transition to non-blocking I/O.

    Admittedly, the Python 2 to Python 3 migration could explain some of this conservatism, but it doesn’t explain the scarcity of callback-based APIs in Python 2. What is important to consider is the lack of a crucial but frequently overlooked language feature: anonymous closures/lambdas with multiple statements. Lambdas in Python allow only a single expression, and this prevents you from creating an ergonomic callback-based non-blocking API. While libraries such as Twisted existed at the time, their delegate-based API wasn’t very accessible.

    In turn, delegates are pretty good when you’re able to express a required delegate shape with protocols/interfaces. Python lacks those too, although something similar can be hacked with abstract base classes.



How do closures and callbacks work? It's turtles all the way down

26 June, 2018

This article is a part of my series about concurrency and asynchronous programming in Swift. The articles are independent, but after reading this one you might want to check out the rest of the series.


The saying holds that the world is supported by a chain of increasingly large turtles. Beneath each turtle is yet another: it is “turtles all the way down”.

Wikipedia

Turtles. All. The. Way. Down.

By Pelf at en.wikipedia, public domain, via Wikimedia Commons

In a few episodes of the great podcast Analog(ue), hosts Casey and Myke discuss the Crash Course Computer Science series and how it presents abstractions. In software engineering and computer science we constantly deal with layers of abstractions, but we don’t cross too many of them on a daily basis. We work with software libraries that are built on top of a few modules and functions, which are written in a programming language, which is executed as binary code, which runs on CPUs and GPUs, which use logic gates built with transistors, to understand which you might need to understand chemistry and physics. Quite possibly someday particle physicists will discover a few more turtles further down the stack. But as long as you use that software library, you don’t need to care about logic gates, transistors and quantum tunnelling.

So here’s an abstraction that developers work with a lot: asynchronous computations. These days it’s hard to write code that doesn’t touch something asynchronous: any kind of I/O, most frequently networking, or maybe long-running tasks that are scheduled on different threads/processes. For something more concrete, consider a database query, which could be local disk I/O or something that sits somewhere on the network.

I want to write better code and build more abstractions where needed and where they make the most sense. Because of that, I’ve recently spent a lot of time researching and trying to understand different ways to deal with asynchronous code. In a lot of programming languages callbacks are one way to write it, but they’re also a relatively low-level way to deal with this abstraction. Nevertheless, it’s important to understand how things work on this level and what can be built on top of it. I hope my explanation will be useful to you as well.

Using closures

In Swift, using callbacks is the main way to deal with asynchronicity. While Swift is taken as an example, I don’t use or discuss any advanced features here, and I hope it’s going to be interesting even if you don’t use Swift on a daily basis.

To pass a callback you could use a “closure”. The Swift Programming Language Guide introduces it this way:

Closures are self-contained blocks of functionality that can be passed around and used in your code. Closures in Swift are similar to blocks in C and Objective-C and to lambdas in other programming languages.

Closures can capture and store references to any constants and variables from the context in which they are defined. This is known as closing over those constants and variables.

[…]

Global and nested functions, as introduced in Functions, are actually special cases of closures. Closures take one of three forms:

  • Global functions are closures that have a name and do not capture any values.
  • Nested functions are closures that have a name and can capture values from their enclosing function.
  • Closure expressions are unnamed closures written in a lightweight syntax that can capture values from their surrounding context.

Copyright © 2018 Apple Inc. Excerpt provided for personal use.

Here’s a simple example of a closure that returns the sum of a captured variable and its own argument.

func example() {
  var a = 10

  func f(_ x: Int) -> Int {
    return x + a
  }
  f(5)
  // x = 5, a = 10, returns 15

  a = 15
  f(5)
  // x = 5
  // a in captured environment is 15
  // f returns 20

  let g = f
  // assigned closure f to a new constant g

  g(5)
  // x = 5
  // g now references same environment,
  // where a is set to 15
  // g returns 20

  a = 20
  f(5)
  // a in captured environment is 20
  // f returns 25

  g(5)
  // g shares environment with f,
  // where a is set to 20
  // g returns 25
}

At this point we can say that a closure is just function code with some captured environment. “Environment” means variables and constants defined outside of the closure body. “Capturing” means the closure has access to a variable’s value in the environment. If that variable is subsequently modified before closure execution, code within the closure gets the latest up-to-date value. Constants are captured as well, but by definition they can’t be changed.

The example code is wrapped in a function to demonstrate that the environment can be local to the current scope, as opposed to the global environment. If the variable a were declared globally, there wouldn’t be anything interesting about using its value, but capturing access to it in a local scope requires special treatment in the language implementation.

As seen in the example above, closures have reference semantics, not value semantics. Assigning a closure to a variable doesn’t create a copy of that closure and its environment, but only creates a reference to that. If you were to pass a closure as an argument, it would be passed by reference, not by value.

A more interesting example of closure and callback use:

let backend = Backend()

func example() {
  let indicator = UIActivityIndicatorView()
  let label = UILabel()
  // non-blocking code
  backend.login(email, password,
    onCompletion: { success, failure in
    // code called when `backend.login` completes.

    // indicator variable captured here
    indicator.stopAnimating()

    // UILabel instance is captured here
    label.text = "\(success ?? failure)"
  })

  // ...
  // some other code here that
  // won't be blocked by `backend.login`
  // e.g. UI updates
  indicator.startAnimating()
}

How would you implement closures on a lower level?

Because explaining closures in terms of closures doesn’t make much sense, let’s imagine a programming language that doesn’t have them. To tell a story about the closure turtle, we need to place it on top of other turtles.

What follows is an example of how this could work. It doesn’t require any knowledge of how compilers work, it’s just a lower level representation of closures for our studies.

Consider a subset of Swift, where functions (including anonymous functions) can’t capture the outside environment. That is a language where this:

func example() {
  var a = 5

  func f(_ x: Int) -> Int {
    return x + a
  }

  let g = { x in return x + a }
}

would cause two compiler errors about a being undefined in the body of functions f and g.

To implement closures in this imaginary language, we’d need to use classes to get the reference semantics mentioned above. It’s also useful to have a protocol defining a generic closure shape. It could look like this:

protocol Closure: class {
  associatedtype Environment
  associatedtype Input
  associatedtype Output
  var env: Environment { get set }

  func willRun(_: Input) -> Output
}

And here’s a class conforming to this protocol and implementing our closure with example usage code:

class F: Closure {
  class Environment {
    var a: Int
    init(_ a: Int) {
      self.a = a
    }
  }

  var env: Environment

  init(_ env: Environment) {
    self.env = env
  }

  func willRun(_ x: Int) -> Int {
    return x + env.a
  }
}

func example() {
  let e = F.Environment(10)
  let f = F(e)
  f.willRun(5)
  // returns 15

  e.a = 15
  f.willRun(5)
  // returns 20

  let g = f
  g.willRun(5)
  // returns 20

  e.a = 20
  f.willRun(5)
  // returns 25

  g.willRun(5)
  // returns 25
}

Well, this is verbose. But it’s a pattern you might consider when writing in C, in C++ before C++11 added support for lambdas, or in Objective-C before blocks were introduced. And here’s a point that I often don’t see mentioned when closures are explained: you can think of closures as syntactic sugar for anonymous delegates. If you have experience with Objective-C or Cocoa, you might have noticed the analogy already: I deliberately named the protocol function willRun after the Cocoa delegate naming pattern.

Under the hood, a closure-capable compiler generates a separate function body with a corresponding environment when it sees that variables are actually captured. In some implementations, like C++11, you have to specify the list of captured variables explicitly. In Objective-C, capture is automatic, and the __block modifier marks variables that a block can mutate and share with the enclosing scope. In Swift capture is automatic by default, with the possibility to override it in a capture list. The latter is quite handy when you want some variables to be captured as weak references to avoid reference cycles.
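
For example, in Swift a capture list overrides the default strong capture. A minimal sketch using URLSession, where the weak capture lets the object deallocate even if a request is still in flight:

import UIKit

final class ImageViewer {
  var image: UIImage?

  func load(from url: URL) {
    URLSession.shared.dataTask(with: url) { [weak self] data, _, _ in
      guard let data = data else { return }
      // hop back to the main thread for UI-related state
      DispatchQueue.main.async {
        // without `[weak self]` the closure would keep `self` alive
        // for the whole duration of the request
        self?.image = UIImage(data: data)
      }
    }.resume()
  }
}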

Callbacks and asynchronous code

In a lot of cases using the right abstractions saves us from verbosity in our code. Are callbacks a good abstraction for all asynchronous scenarios then?

Consider a more complex version of the code we’ve seen before:

let label = UILabel()
// non-blocking code
backend.login(email, password,
  onCompletion: { success, failure in
  // called when `backend.login` completes.
  // UILabel instance is captured here
  label.text = success

  guard !failure else {
    // ...
    // error handling here
    return
  }

  // notice `backend` is captured here as well
  backend.fetchUserInfo { userInfo, failure in
    guard !failure else {
      // ...
      // error handling here again
      return
    }

    if userInfo.isAdmin {
      backend.doAdminStuff { success, failure in
        guard !failure else {
          // tired of repeated manual
          // error handling by this point...
          return
        }

        // ...
        // more code here to
        // handle admin stuff results
      }
    }
  }
})

// ...
// some other code here that
// won't be blocked by `backend.login`

To handle a sequence of async actions we need to use nested callbacks. It turns out it’s quite easy to start building these “pyramids of doom”. Error handling is tedious, and overall this type of code becomes quite error-prone, with shadowed variables all around: notice the shadowed failure argument within all the completion handlers. With plain callbacks this isn’t an exaggerated scenario: you can often stumble upon it when building real apps that involve any kind of long-running non-blocking operation, e.g. networking.

Promises of future turtles

There is a way out: consider a concept called “futures”, also known as “promises”. Using the BrightFutures library as an example, the pyramid of doom can be converted to this flat chain of futures:

let label = UILabel()
// non-blocking code
backend.login(email, password)
.flatMap { success in
  // called when `backend.login` succeeds.
  // UILabel instance is captured here
  label.text = success

  return backend.fetchUserInfo()
}
.onSuccess { userInfo in
  if userInfo.isAdmin {
    backend.doAdminStuff()
    .onSuccess {
      // ...
      // more code here to handle
      // admin stuff results
    }
  }
}
.onFailure { error in
  // ...
  // unified error handling here,
  // that can be triggered by any
  // future in the chain
}

// ...
// some other code here that
// won't be blocked by `backend.login`

That’s much better: we have only one error handler that will be triggered regardless of where an error occurs. There is only one nested callback after backend.fetchUserInfo(), and only because we have other async code executed on the condition that our user is an admin. The code is more readable and much easier to maintain and refactor if you need to re-order the async operations.

This code assumes you've converted all callback-based functions like login, fetchUserInfo and doAdminStuff to return futures. What's a future? Think of it as a container that can only be opened at some later time (in the future) and that will eventually contain either an actual result or an error. You can attach callbacks to a future “container” to be triggered when the computation is ready, and you can chain these containers together to run a sequence (or sometimes a parallel batch) of asynchronous computations. A simplified declaration could look like this:

// interface only, implementation omitted
class Future<Result> {
  func onSuccess(callback: (Result) -> ())
    -> Future<Result>

  func onFailure(callback: (Error) -> ())
    -> Future<Result>

  func map<NewResult>(transformer: (Result) -> NewResult)
    -> Future<NewResult>

  func flatMap<NewResult>(transformer: (Result) -> Future<NewResult>)
    -> Future<NewResult>
}

There’s a peculiar flatMap function here. In a few other Swift libraries implementing futures, it’s called then, which is a more straightforward name if you don’t focus on the container aspect of futures. In the JavaScript standard library the Promise class uses the name then as well.

So why name it flatMap? Turns out there is already a flatMap function in the Swift standard library, defined on the Sequence protocol, similar to the map you've probably used if you're into functional programming. Standard map transforms a container of elements of type A into a container of elements of type B with a closure of type (A) -> B. flatMap does the same for closures of type (A) -> [B], if we assume the container is an array. With plain map the end result would be of type [[B]], which is a container of containers. That isn't always what you want, so to “flatten” these containers down to a single level [B], use flatMap. It works the same way for all types implementing the Sequence protocol.
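Here's the difference on plain Swift arrays:

let words = ["hey", "you"]

// map with a (String) -> Int closure: a flat container of Ints
let counts = words.map { $0.count } // [3, 3]

// map with a (String) -> [Character] closure: a container of containers
let nested = words.map { Array($0) } // [["h", "e", "y"], ["y", "o", "u"]]

// flatMap flattens the result back to a single level
let flat = words.flatMap { Array($0) } // ["h", "e", "y", "y", "o", "u"]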

To chain Future<A> with a Future<B> that depends on result A, our result transformer closure needs the shape (A) -> Future<B>. Taking the analogy further, if you passed this closure to plain map, you'd get an end result of type Future<Future<B>>, which is bananas and doesn't make any sense. Hence we use flatMap to get a “flat” result type.
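With the backend example from before (and assuming fetchUserInfo returns a Future<UserInfo>), the difference in result types looks like this:

// `map` wraps the transformer's future in another future
let nested: Future<Future<UserInfo>> = backend.login(email, password)
  .map { _ in backend.fetchUserInfo() }

// `flatMap` collapses the containers to a single level,
// ready for further chaining
let flat: Future<UserInfo> = backend.login(email, password)
  .flatMap { _ in backend.fetchUserInfo() }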

More turtles coming…

We’ve just reviewed closures and had a quick look at how they’re used for callbacks. We tried to write asynchronous code with callbacks and found that it’s not very maintanable. In the end we tried using futures, which leveraged closures as well, but allowed us to structure the example code in a better way.

Admittedly, this futures stuff is an ok abstraction, but it doesn't feel like writing plain old synchronous code. If only we could write non-blocking asynchronous code that looks almost like the usual synchronous code… Turns out, plenty of programming languages provide abstractions for asynchronous code and concurrency, like coroutines and async/await syntactic sugar in JavaScript, Python and C#, to name a few. Some take a different approach: Erlang/Elixir and Pony are built on the actor model, while Go introduces goroutines and channels.

I’ve got an idea of how coroutines could look in Swift and how one could build async/await on top of that. Going to elaborate on this in a future article, stay tuned. 👋

Acknowledgements

Special thanks to Neil Kimmett for feedback on draft versions of this article. Thanks to Thomas Visser for BrightFutures library, which is still the best helper for writing async code in Swift in my opinion.



Why I use GraphQL and avoid REST APIs

28 May, 2018

Server interactions take a significant amount of time and effort to develop and test in most mobile and web apps. In the apps with the most complex APIs I've worked on, the networking layer took up to 40% of the development time to design and maintain, specifically due to some of the edge cases I mention below in this article. After implementing this a few times, it's easy to see the patterns, tools and frameworks that can help. While we're lucky (well, most of us are, I hope) not to care about SOAP anymore, REST isn't the end of history either.

Recently I had a chance to develop and run in production a few mobile and web apps with GraphQL APIs, both for my own projects and for my clients. It has been a really good experience, not least thanks to the wonderful PostGraphile and Apollo libraries. At this point, it's quite hard for me to go back and enjoy working with REST.

But obviously, this needs a little explanation.

So what’s wrong with REST?

Every REST API is a snowflake

To be fair, REST isn’t even a standard. Wikipedia defines it as

an architectural style that defines a set of constraints and properties based on HTTP

While something like the JSON API spec does exist, in practice it's very rare to see a RESTful backend implementing it. In the best-case scenario, you might stumble upon something that uses OpenAPI/Swagger. Even then, OpenAPI doesn't specify anything about an API's shape or form: it's just a machine-readable spec that allows (but doesn't require) you to run automatic tests on your API, automatically generate documentation, etc.

The main problem is still there. You may say your API is RESTful, but there are no strict rules in general on how endpoints are arranged or on whether you should, for example, use the PATCH HTTP method for object updates.

There are also things that look RESTful at first glance, but not so much if you squint. Take the Dropbox HTTP API, whose documentation notes that some

endpoints accept file content in the request body, so their arguments are instead passed as JSON in the Dropbox-API-Arg request header or arg URL parameter.

JSON in a request header? (╯°□°)╯︵ ┻━┻

That’s right, there are Dropbox API endpoints that require you to leave request body empty and to serialise a payload as JSON and chuck it an a custom HTTP header. It’s fun to write client code for special cases like this. But we can’t complain, because there is no widely-used standard after all.

In fact, most of the caveats mentioned below are caused by the lack of a standard, but I'd like to highlight what I've seen most frequently in practice.

And yes, you can avoid most of these problems with a disciplined, experienced team, but wouldn't you want some of this stuff to be resolved on the software side already?

No static typing means caring about type validation

No matter how much you try to avoid it, sooner or later you stumble upon misspelt JSON properties, wrong data types sent or received, missing fields, etc. You're probably ok if your client and/or server programming language is statically typed and you just can't construct an object with a wrong field name or type. You're probably doing well if your API is versioned and you have the old version at the /api/v1 URL and a new version with a renamed field at /api/v2. Even better if you have an OpenAPI spec that generates client/server type declarations for you.
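As a Swift illustration of that static safety, a Codable model surfaces a misspelt field right at the decoding boundary instead of letting it propagate; a minimal sketch with a hypothetical TodoItem payload:

import Foundation

struct TodoItem: Codable {
  let id: Int
  let description: String
  let isCompleted: Bool
}

// note the misspelt "descripton" key
let json = Data("""
  {"id": 1, "descripton": "oops", "isCompleted": false}
  """.utf8)

do {
  _ = try JSONDecoder().decode(TodoItem.self, from: json)
} catch {
  // DecodingError.keyNotFound for `description` lands here,
  // so the typo is caught at the boundary, not deep in the app
  print(error)
}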

But can you really afford all of that in every project? Can you afford setting up an /api/v1.99 endpoint when your team decides mid-sprint to rename or rearrange object fields? Even if it's done, will the team not forget to update the spec and to ping the client devs about the update?

Are you sure you have all the validation logic right on either the client or the server? Ideally, you want it validated on both sides, right? Maintaining all of this custom code is a lot of fun. So is keeping your API JSON Schema up to date.

Pagination and filtering are not so simple

Most APIs work with collections of objects. In a todo-list app, the list itself is a collection. Most collections can contain more than 100 items, and for most servers returning all items of a collection in the same response is a heavy operation. Multiply that by the number of online users and it can add up to a hefty AWS bill. The obvious solution: return only a subset of the collection.

Pagination is comparatively straightforward. Pass something like offset and limit values in query parameters: /todos?limit=10&offset=20 to get only 10 objects starting at the 20th. Everyone names these parameters differently: some prefer count and skip, I like offset and limit because they directly correspond to the SQL modifiers.

Some backend databases expose cursors or tokens to be passed with the next page query. Check out the Elasticsearch API, which recommends scroll calls when you need to go through a huge list of resulting documents sequentially. There are also APIs that pass the relevant information in headers: see the GitHub REST API (at least that's not JSON passed in headers 😅).

When it comes to filtering, it's so much more interesting… Need filtering by one field? No problem: it could be /todos?filter=key%3Dvalue, or maybe the more human-readable /todos?filterKey=key&filterValue=value. How about filtering by two values? Hm, that should be easy, right? The query would look like /todos?filterKeys=key1%2Ckey2&filterValue=value with URL encoding. But often there is no way to stop the feature creep: maybe a requirement appears for advanced filtering with AND/OR operators, or for complex full-text search queries combined with complex filtering. Sooner or later you see APIs that invent their own filtering DSLs. URL query components are no longer sufficient, but a request body in GET requests isn't great either, which means you end up sending non-mutating queries in POST requests (which is what Elasticsearch does). Is the API still RESTful at this point?

Either way, both clients and servers need to take extra care parsing, formatting and validating all these parameters. So much fun! 🙃 For example, without proper validation and with uninitialised variables you can easily end up with something like /todos?offset=undefined.
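On a Swift client, the least you can do is let URLComponents handle the formatting and encoding; a small sketch with a hypothetical endpoint:

import Foundation

// building the query from typed components instead of
// concatenating strings, which is how values like
// `offset=undefined` sneak in
var components = URLComponents(string: "https://example.com/todos")!
components.queryItems = [
  URLQueryItem(name: "limit", value: String(10)),
  URLQueryItem(name: "offset", value: String(20)),
]
print(components.url!) // https://example.com/todos?limit=10&offset=20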

Not easy to document and test

Swagger, mentioned above, is probably the best tool for this at the moment, but it isn't used widely enough. Much more frequently I see APIs with documentation maintained separately from the code. That's not a big deal for a stable, widely used API, but it's much worse during development in an agile process: documentation stored separately frequently isn't updated at all, especially after a minor but client-breaking change.

If you don’t use Swagger, it probably means you have specialised test infrastructure to maintain. There’s also a much higher chance you need integration tests rather than unit-tests, means testing both client and server-side code.

Relations and batch queries make it even more frustrating

This becomes a problem with much larger APIs, where you might have a number of related collections. Let's go further with our todo-list example: suppose every todo item can also belong to a project. Would you always want to fetch all the related projects at once? Probably not, but then there are more query parameters to add. Maybe you don't want to fetch all the object fields at once either. What if the app needs projects to have owners, and there's a view with all this data aggregated, in addition to separate views displaying each collection on its own? It's either three separate HTTP requests or one complex request with all the data fetched at once for aggregation.

Either way, there are complexity and performance tradeoffs, and maintaining them in a growing application brings more headaches than one would like.

You need every endpoint implemented both on server and client

There is a ton of libraries that can automatically generate REST endpoints with some help from ORMs or direct database introspection. But even when those are used, they usually aren't very flexible or extensible. That means reimplementing an endpoint from scratch whenever you need custom parameters, advanced filtering behaviour or just some smarter handling of the request or response payload.

Yet another task is consuming those endpoints in client code. It's great to use code generation if you have it, but again, it tends not to be flexible enough. Even with helper libraries like Moya, you stumble upon the same barrier: there is a lot of custom behaviour to handle, caused by the edge cases mentioned above.

If a dev team isn’t full-stack, communication between server and client teams is crucial, even critical when there’s no machine-readable API spec.

And how’s GraphQL better?

With all the issues discussed above, I'm inclined to say that in CRUD apps it would be great to have a standard way to produce and consume APIs. Common tooling and patterns, together with integrated testing and documentation infrastructure, would help with both the technical and the organisational issues.

GraphQL has a draft RFC spec and a reference implementation. Also, check out the GraphQL tutorial, which describes most of the concepts you'll need to know. There are implementations for different platforms, and plenty of developer tools are available as well, most notably GraphiQL, which bundles a nice API explorer with auto-completion and a browser for documentation automatically generated from a GraphQL schema.

In fact, I find GraphiQL indispensable. It can help in solving the communication issues between client and server-side teams I mentioned earlier. As soon as any changes land in a GraphQL schema, you'll see them in the GraphiQL browser, together with the embedded API documentation. Client and server teams can now collaborate on API design with shorter iteration times and shared documentation that's automatically generated and visible to everyone on every API update. To get a feeling for how these tools work, check out the Star Wars API example that is available as a GraphiQL live demo.

Being able to specify the object fields requested from a server allows clients to fetch only the data they need, when they need it. No more multiple heavy queries issued to a rigid REST API and then stitched together on the client just to display it all at once in the app UI. You are no longer restricted to a set of endpoints: you have a schema of queries and mutations and can cherry-pick the fields and objects that a client specifically requires. And this way a server only needs to implement the top-level schema objects.

A quick example

A GraphQL schema defines the types that can be used in communication between servers and clients. Two of those types are special and are also core concepts in GraphQL: Query and Mutation. Most of the time, every request issued to a GraphQL API is either a Query instance that is free of side effects or a Mutation instance that modifies objects stored on the server.

Now, sticking with our todo app example, consider this GraphQL schema:

# `Date` isn't a built-in scalar, so the schema declares it
scalar Date

type Project {
  id: ID
  name: String!
}

type TodoItem {
  id: ID
  description: String!
  isCompleted: Boolean!
  dueDate: Date
  project: Project
}

# mutation arguments use a separate input type, as GraphQL
# doesn't allow object types to be passed as arguments
input TodoItemInput {
  description: String!
  isCompleted: Boolean!
  dueDate: Date
  projectID: ID
}

type TodoList {
  totalCount: Int!
  items: [TodoItem]!
}

type Query {
  allTodos(limit: Int, offset: Int): TodoList!
  todoByID(id: ID!): TodoItem
}

type Mutation {
  createTodo(item: TodoItemInput!): TodoItem
  deleteTodo(id: ID!): TodoItem
  updateTodo(id: ID!, newItem: TodoItemInput!): TodoItem
}

schema {
  query: Query
  mutation: Mutation
}

The schema block at the bottom is special and defines the root Query and Mutation types described previously. Otherwise, it's pretty straightforward: type blocks define new types, and each block contains field definitions with their own types. Types can be non-optional: for example, a String! field can't ever have a null value, while a String field can. Fields can also have named parameters, so the allTodos(limit: Int, offset: Int): TodoList! field of type TodoList! takes two optional parameters, while its own value is non-optional, meaning it will always return a TodoList instance that can't be null.
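If it helps to map those nullability rules onto Swift's type system, here's a rough, purely illustrative Swift analogue of the object types above:

import Foundation

struct Project {
  let id: String?          // ID
  let name: String         // String!
}

struct TodoItem {
  let id: String?          // ID
  let description: String  // String!
  let isCompleted: Bool    // Boolean!
  let dueDate: Date?       // Date
  let project: Project?    // Project
}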

Then, to query the first five todos with their ids, descriptions and completion status, you'd write a query like this:

query {
  allTodos(limit: 5) {
    totalCount
    items {
      id
      description
      isCompleted
    }
  }
}

A GraphQL client library automatically parses and validates the query against the schema and only then sends it to a GraphQL server. Note that the offset argument to the allTodos field is absent; being optional, its absence means it has a null value. If the server supplies this sort of schema, its documentation probably states that a null offset means the first page is returned by default. The response could look like this:

{
  "data": {
    "allTodos": {
      "totalCount": 42,
      "items": [
        {
          "id": 1,
          "description": "write a blogpost",
          "isCompleted": true
        },
        {
          "id": 2,
          "description": "edit until looks good",
          "isCompleted": true
        },
        {
          "id": 2,
          "description": "proofread",
          "isCompleted": false
        },
        {
          "id": 4,
          "description": "publish on the website",
          "isCompleted": false
        },
        {
          "id": 5,
          "description": "share",
          "isCompleted": false
        }
      ]
    }
  }
}

If you drop the isCompleted field from the query, it'll disappear from the result. Or you can add a project field with its id and name to traverse the relation. Add the offset parameter to the allTodos field to paginate, so allTodos(limit: 5, offset: 5) will return the second page. Helpfully enough, you've got the totalCount field in the result, so now you know you've got ceil(42 / 5) = 9 pages in total. But obviously, you can omit totalCount if you don't need it. The query is in full control of what information is actually received, while the underlying GraphQL infrastructure ensures that all required fields and parameters are there. If your GraphQL server is smart enough, it won't run database queries for fields you don't need, and some libraries are good enough to provide that for free. Same with the rest of the mutations and queries in this schema: input is type-checked and validated, and based on the query a GraphQL server knows what result shape is expected.

Under the hood, all communication runs through a predefined URL (usually /graphql) on a server, with a simple POST request that contains the query serialised as a JSON payload. You almost never need to work at an abstraction layer this low, though.
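Just to show how thin that layer is, here's a minimal Swift sketch of the transport, assuming a hypothetical https://example.com/graphql endpoint (a real app would use a client library instead):

import Foundation

var request = URLRequest(url: URL(string: "https://example.com/graphql")!)
request.httpMethod = "POST"
request.setValue("application/json", forHTTPHeaderField: "Content-Type")

// the whole query travels as a JSON payload in the POST body
let query = "query { allTodos(limit: 5) { totalCount items { id description } } }"
request.httpBody = try! JSONEncoder().encode(["query": query])

URLSession.shared.dataTask(with: request) { data, _, _ in
  // a client library would decode this into typed models
  guard let data = data else { return }
  print(String(data: data, encoding: .utf8) ?? "")
}.resume()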

Not too bad overall: type-level validation is taken care of, pagination is also looking good, and entity relations can easily be traversed when needed. If you use one of the available GraphQL-to-database query translation libraries, you won't even need to write most of the database queries on the server. Client-side libraries can unpack a GraphQL response into an object instance of the needed type quite easily, as the response shape is naturally known upfront from the schema and queries.
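As a sketch of what those client-side libraries automate, here's how the sample response above could map onto hand-written Codable types in Swift:

import Foundation

struct TodoItem: Codable {
  let id: Int
  let description: String
  let isCompleted: Bool
}

struct TodoList: Codable {
  let totalCount: Int
  let items: [TodoItem]
}

struct QueryResponse: Codable {
  struct QueryData: Codable { let allTodos: TodoList }
  let data: QueryData
}

// a trimmed version of the response shown earlier
let responseData = Data("""
  {"data": {"allTodos": {"totalCount": 42, "items": [
    {"id": 1, "description": "write a blogpost", "isCompleted": true}
  ]}}}
  """.utf8)

let todos = try! JSONDecoder()
  .decode(QueryResponse.self, from: responseData)
  .data.allTodos
print(todos.totalCount) // 42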

GraphQL is this new hipster thing, a fad, right?

While falcor by Netflix seemed to solve a similar problem, was published on GitHub a few months earlier than GraphQL and came up on my personal radar earlier, it clearly looks like GraphQL has won. Good tooling and strong industry support make it quite compelling. Aside from a few minor glitches in some client libraries (which have since been resolved), I can't recommend highly enough having a good look at what GraphQL could offer your tech stack. It has been out of technical preview for almost two years now and the ecosystem is growing even stronger. While Facebook designed GraphQL, more and more big companies use it in their products as well: GitHub, Shopify, Khan Academy, Coursera, and the list is growing.

There’s plenty of popular open-source projects that use GraphQL: this blog is powered by Gatsby static site generator, which translates results of GraphQL queries into data that are rendered into an HTML file. If you’re on WordPress, a GraphQL API is available for it as well. Reaction Commerce is an open-source alternative to Shopify that’s also powered by GraphQL.

A few GraphQL libraries worth mentioning again are PostGraphile and Apollo.

If you use PostgreSQL as your backend database, PostGraphile can scan a SQL schema and automatically generate a GraphQL schema with an implementation. You get all the common CRUD operations exposed as queries and mutations for all tables. It may look like an ORM, but it isn't: you're in full control of how your database schema is designed and what indices are used. The great thing is that PostGraphile also exposes views and functions as queries and mutations, so if there is a particularly complex SQL query that you'd like to map to a GraphQL field, just create a SQL view or function and it'll automatically appear in the GraphQL schema. With advanced Postgres features like row-level security, you can implement complex access control logic with only a few SQL policies. PostGraphile even has awesome things like schema documentation generated automatically from Postgres comments 🤩.

In turn, Apollo provides both client libraries for multiple platforms and code generators that produce type definitions in the most popular programming languages, including TypeScript and Swift. In general, I find Apollo much simpler and more manageable than, for example, Relay. Thanks to the simple architecture of the Apollo client library, I was able to slowly transition an app that used React.js with Redux to React Apollo, component by component, and only where it made sense to do so. Same with native iOS apps: Apollo iOS is a relatively lightweight library that's easy to use.

In a future article, I'd like to describe some of my experience with this tech stack. In the meantime, shoot me a message on Twitter about your experience with GraphQL, or if you're just curious how it could work in your app 👋.