Max Desiatov

Max DesiatovI'm Max Desiatov, a software consultant building mobile and backend apps. My interests include coding in Swift and TypeScript, product design, video games and music.

Blog

Hire Me

What are generators and why Swift needs them?

10 September, 2018

This article is a part of my series about concurrency and asynchronous programming in Swift. The articles are independent, but after reading this one you might want to check out the rest of the series:


When developing apps, I frequently need to switch between different programming languages. In addition to using Swift on iOS/macOS and Linux, a lot of the time JavaScript/TypeScript is a good fit for full-stack web development, especially with the popularity of React and Node.js. And when doing data engineering, Python’s ecosystem is pretty good and is widely supported.

But every time I come back to Swift, there is one big feature I miss called “generators”. Whether you’ve used generators and yield keyword before in other languages, or you are only interested to learn about it, you might find this short pitch interesting. Here I try to imagine how generators could look in Swift and what are the implications.

Generators? What’s that?

The best way to explain is to compare generators to functions: functions have one entry point (function application), execute once and finish. Functions return values and/or return early with a return statement. A generator is like a function with “holes”, after initialisation it executes up to a next yield statement, “generates” a value passed as yield argument and suspends, saving its state. The generator state is a position of yield in generator’s body and values of all variables in the generator’s scope. A caller of a generator can then resume the generator again and it would continue from the same yield position it was suspended preserving its scope.

I tried to capture the difference in this diagram:

Generators vs functions

Thanks to the fact that generators produce a series of values of the same type, usually they can be treated like sequences and iterated with for ... in syntax. This is a powerful feature that can be used where common sequences and collections don’t fit well or there is a need to describe an algorithm that suspends and resumes later.

How do different languages implement generators?

If you’re interested in programming languages history, here’s a quick overview how a few of them approached generators:

Naming Version and year introduced
JavaScript Generators EcmaScript 6 (~2015)
Python Generators and coroutines Generators in Python 2.3 (2003), coroutines in Python 2.5 (2006)
Ruby Fibers Ruby 1.9 (2007)
C# “iterators” C# 2.0 (2006)

I’m sure there is a plenty of other languages that have generators under different names with a different syntax or with 3rd-party libraries, but I listed only those with relatively wide adoption that have it implemented as a language feature.

Streams

Let’s get started with a simple use case for generators: reading huge amounts of Big Data™. Say you have hundreds of gigabytes of sensor data, which is stored as a JSON array of integers in a file. The only API you have at your disposal is a plain old synchronous blocking API to read a few bytes in a file at a given offset. We would like to have a nice and efficient way of reading this data and passing it to other code for post-processing. Reading all of the data up front and storing it in memory might be ok for small files, but do we really need servers with a hundred gigs of RAM to read a file with our sensor data? Maybe we could read the file only in small portions?

Meet streams:

a stream is a sequence of data elements made available over time. A stream can be thought of as items on a conveyor belt being processed one at a time rather than in large batches.

Wikipedia

Here’s how an implementation of a chunk reader could look like without syntax support for generators:

final class ChunkReader {
  private let handle: FileHandle
  let chunkSize: Int

  init?(path: String, chunkSize: Int, start: UInt64 = 0) {
    guard let handle = FileHandle(forReadingAtPath: path) else {
      return nil
    }

    self.handle = handle
    self.chunkSize = chunkSize

    handle.seek(toFileOffset: start)
  }

  func next() -> Data {
    return handle.readData(ofLength: chunkSize)
  }
}

This ChunkReader class maintains state in the handle instance of FileHandle, which itself is a state machine.

Of all existing generator implementations, I find Python’s and JavaScript’s version the most straightforward and easy to use. You may be surprised by my choice of languages with dynamic typing for inspiration, but I think this generator design hasn’t appeared yet in statically typed languages only due to implementation details. In my experiments, I’ve reached a conclusion that it can be translated to a statically typed language, such as Swift, surprisingly well.

Without further ado, here’s a version with generator support in Swift syntax:

func chunkReader(path: String,
                 chunkSize: Int,
                 start: UInt64 = 0) -> Generator<Data> {
  // marker for a generator body
  return Generator {
    guard let handle = FileHandle(forReadingAtPath: path) else {
      return
    }

    handle.seek(toFileOffset: start)

    var data: Data

    repeat {
      data = handle.readData(ofLength: chunkSize)
      // suspends and resumes execution here at every iteration
      yield data
    } while !data.isEmpty
  }
}

let stream = chunkReader(path: "test.txt", chunkSize: 5)

for chunk in stream {
  // prints lines of every 5 characters in test.txt
  print(chunk)
}

Neat, we no longer need a dedicated class declaration and are able to express generators concisely, so it doesn’t differ much from a common function declaration. It’s quite important that the generator syntax version reads naturally with common control flow constructs such as loops. It’s easy to follow what happens when generator execution is resumed after the yield keyword. In a class version, you’re forced to introduce instance variables and initialise those explicitly.

Generators are state machines

An interesting observation is that the proposed language support for generators is a “syntax sugar” for state machines, in a similar way to closures being a “syntax sugar” for delegate objects. On a lower level closures are transformed to objects that capture external environment declared outside of the closure body. A reference to this environment is then stored in the closure object as an “instance variable”. Generators capture and maintain internal state declared inside the generator body, which is also stored as generator “instance variables”.

Another sensible addition is to allow generator to capture variables in parent scope as closures do, which is the default behaviour in both Python in JavaScript.

Here’s an example with more complex state and more state transitions to illustrate the point: HTTP reader stream. Imagine you’d like to report a detailed stream state for logging purposes or to make it clear in the UI what’s going on. Let’s introduce an enum to represent it:

enum HTTPReaderState {
  case initialized
  case dnsResolved(ipAddress: String)
  case connectionEstablised
  case requestWritten
  case headersParsed(headers: [String: String])
  case redirectRequired(redirected: URL)
  case bodyChunkDownloaded(chunk: Data)
  case connectionClosed
}

And here’s a class implementation of this stream, assume that Socket implementation and all DNS resolution and HTTP parsing functions are already provided:

final class HTTPReader {
  private let url: URL
  private var statusCode: Int?
  private var state: HTTPReaderState = .initialized
  private var currentBodyLocation = 0

  // not a concrete type, abstract `Socket` used as an example
  private var socket: Socket?

  init(url: URL) {
    self.url = url
  }

  func next() throws -> HTTPReaderState {
    switch state {
    case .initialized, .redirectRequired(let redirectURL):
      let ipAddress = try resolveDNS(redirectURL ?? url)
      state = .dnsResolved(ipAddress)
    case .dnsResolved(let ipAddress):
      socket = try establishConnection(ipAddress)
      state = .connectionEstablished
    case .connectionEstablished where let socket = socket:
      try writeHTTPRequest(socket, url)
      state = .requestWritten
    case .requestWritten where let socket = socket:
      statusCode = try parseHTTPStatusCode(socket)
      let headers = try parseHTTPHeaders(socket)
      state = .headersParsed(headers)
    case .headersParsed(let headers) where let socket = socket:
      if isRedirectCode(statusCode) {
        state = .redirectRequired(headers["Location"])
      } else {
        fallthrough
      }
    case .bodyChunkDownloaded:
      let data = try readHTTPBody(socket)
      if data.isEmpty {
        socket.close()
        state = .connectionClosed
      } else {
        state = .bodyChunkDownloaded(data)
      }
    }

    return state
  }
}

Disadvantages of this approach are pretty obvious: we need optionals and optional unwrapping for instance variables that have to outlive a single run of the next() function. All state transitions are described as a giant switch statement, which is error-prone and obscures natural control flow. At a glance, it would be hard to say which of the state transitions could be executed more than once and what’s the expected order of transitions.

Notice where let socket parts to unwrap the socket instance variable. Another approach would be to make the socket an associated value of HTTPReaderState to make sure a state always has the required values to proceed, but this would expose private socket in the public API. To fix that we’d probably need yet another enum for private state, which illustrates the point that the class version gets complicated really quickly. Thanks to Cory Benfield for highlighting the problems with optional unwrapping and absence of socket in associated values. Turns out, this is a pattern that you can see used in SwiftNIO.

So check out a version with the new generator syntax:

func httpReader(url: URL) -> Generator<HTTPReaderState> {
  return Generator {
    var redirectURL: URL?
    let socket: Socket
    repeat {
      let ipAddress = try resolveDNS(redirectURL ?? url)
      yield .dnsResolved(ipAddress)

      socket = try establishConnection(ipAddress)
      yield .connectionEstablished

      try writeHTTPRequest(socket, url)
      yield .requestWritten

      let statusCode = try parseHTTPStatusCode(socket)
      let headers = try parseHTTPHeaders(socket)
      yield .headersParsed(headers)

      if isRedirectCode(statusCode) {
        redirectURL = headers["Location"]
        if let u = redirectURL {
          yield .redirectRequired(u)
        }
      } else {
        redirectURL = nil
      }
    } while redirectURL != nil

    var data = try readHTTPBody(socket)
    while !data.isEmpty {
      yield .bodyChunkDownloaded(data)
      data = try readHTTPBody(socket)
    }

    socket.close()
    yield .connectionClosed
  }
}

It is significantly smaller and easier to read: there aren’t as many optionals and optional unwrapping, variables are introduced with known values when needed. Control flow constructs like repeat ... while can naturally express repeating state transitions and you no longer need that giant switch altogether.

I don’t get it… 🤔 Wouldn’t the addition of generators make Swift more complicated?

I see a significant amount of complaints like this on Twitter. The narrative is that Swift has become unnecessarily complicated and everyone should focus on the quality of developer tools and frameworks instead of adding new stuff to the language itself. I disagree with the first point strongly: of all programming languages I worked with, Swift follows progressive disclosure principle most consistently. You don’t have to use advanced features if you aren’t ready or don’t want to.

In addition, generators and yield keyword implemented in the way I propose are purely additive and don’t change anything in the Standard Library or in the way your old code works. If you don’t need this feature, you won’t even notice it exists if it’s introduced in some future version of Swift.

As for developer tools and frameworks, we need to stop seeing it as a zero-sum game. Making the language more powerful enables more powerful tools and frameworks. As I’m in no way a member of the Swift compiler team or affiliated with Apple whatsoever, I’d be very happy to work together with the open-source community on this proposal. This wouldn’t have any negative impact on other high priority developments in the Swift ecosystem.

I get it! 👍 Is this what Chris Lattner proposed a year ago?

Yes and no. 😄 My pitch is heavily inspired by that proposal, but in no way conflicts with it. When I say I miss generators in Swift, I should mention I miss async/await as described in that proposal even more. I’ve been obsessed with the possibility to get it introduced for a while and been actively working on my generators proposal for months now because of this.

await keyword and async functions can be directly expressed with yield and generators. In fact, this is how Python and JavaScript do it. From all my research and based on how other languages implemented this, I’m convinced we could get yield and generators in Swift much earlier than async/await. Given that yield and generators are simpler and more “primitive”, it would be a good thing to introduce these concepts gradually with smaller proposals over time. Even better, these features work so well together that we could get async generators after that, which would make operating on non-blocking streams natural. Good examples of these are: reading big amounts of uniform data from a database without blocking, streaming download/upload of huge amount of files, streaming parsers etc.

Wait, I only saw generator output. Your diagram says generators also get input?

Yeah, this one more thing… With generators you get not only reader streams, but also writer and reader-writer (i.e. duplex) streams. In my examples yield was only a statement with arguments treated as output. I’d propose to allow yield expressions, which evaluate to generator input. This is a more general and complex case and deserves a separate article, which is coming soon! 🤓

Summary

We were able to implement streams abstraction for reading huge amounts of data in small chunks. These chunks can be processed incrementally even with blocking synchronous APIs as a foundation. While you can implement streams with plain classes, it has numerous disadvantages. Based on what other programming languages have done before for a long time, I propose introducing generators to Swift with new generator syntax and yield statement. This is also a foundational feature for adding more support for asynchronous programming to Swift in the future.

I won’t be able to proceed without your help, so I ask you to send me your comments and questions on Twitter or Mastodon. Whether you have interest in Swift and like or dislike this pitch, or you’re a compiler engineer who’d like to help, I look forward to hearing from you. Subscribe to this blog (multiple options available at the top) if you haven’t done that yet to be able to follow my progress with this proposal, stay tuned. 👋