Max Desiatov

Max DesiatovI'm Max Desiatov, a software consultant building mobile and backend apps. My interests include coding in Swift and TypeScript, product design, video games and music.

Blog

Hire Me

Coroutines and “yield” expressions in Swift

12 September, 2018

This article is a part of my series about concurrency and asynchronous programming in Swift. The articles are independent, but after reading this one you might want to check out the rest of the series:


We know that generators produce output by passing arguments to the yield statement. I also mentioned previously it would be great if generators could also consume input. Turns out, a generator is a special case of a coroutine. From computer science perspective there are other subtle differences between generators and coroutines, but in this proposal let’s assume that coroutines are more powerful generators that can consume values.

How would that work? Would that be useful at all? To answer these questions, we need to have a closer look at how we interact with generators and coroutines.

Review of Generator type and syntax in details

As a reminder, to define a generator, we use this syntax that looks similar to a closure definition:

func sequence(start: Int = 0,
              step: Int = 1,
              end: Int = .max) -> Generator<Output> {
  return Generator {
    var counter = start
    while counter <= end {
      yield counter
      counter += step
    }
  }
}

yield in the generator body marks suspension “checkpoints” and values that the generator produces. Let’s also have a look at the Generator type itself:

// this is only the interface, not the implementation
final class Generator<Output>: IteratorProtocol {
  // describes possible generator status
  enum Status {
    case suspended
    case failed(Error)
    case finished
  }

  private(set) var status: Status = .suspended

  // get output of a generator with these functions
  func next() -> Output?
  func try next() -> Output

  // cancelling a generator
  func stop()
  func fail(error: Error)
}

Main points of interest are next() functions. I mentioned previously that we can get values out of a generator with for ... in ... { } syntax. We also know that this is a syntax sugar available for all types implementing IteratorProtocol, as clarified in its documentation. The shape of that protocol is trivial:

protocol IteratorProtocol {
  associatedtype Element
  func next() -> Element?
}

Which exactly matches the Generator type’s shape and allows us to use the iteration syntax on all generators:

// without IteratorProtocol
let s1 = sequence(start: 0, step: 1, end: 3)
while let i = s1.next() {
  print(i)
}
// prints lines of "generator starts" 1 2 3 "generator ends"

print(s1.status)
// prints `.finished`

print(s1.next())
// prints `nil`, generator is finished, no more output

// with IteratorProtocol
let s2 = sequence(start: 0, step: 1, end: 5)
// created a new generator `s2` as the previous `s1` is finished
for i in s2 {
  print(i)
}
// prints lines of "generator starts" 1 2 3 "generator ends"

Notice that we can’t use a generator after it’s finished. Allowing this would make no sense because a finished generator reached the end of its body. There is no more code to execute and no more output to yield. What’s interesting, to fully execute a generator yielding 3 values, we had to make 4 next() calls. Without the 4th call, you wouldn’t get the "generator ends" line printed, but both while let and for ... in version automatically issue the terminating call for you.

Coroutine type and syntax

As next() is the main “entry point” that resumes a generator, it would make sense to give coroutines a similar shape, but to allow next function to take input arguments:

// this is only the interface, not the implementation
final class Coroutine<Input, Output> {
  // describes possible coroutine status
  enum Status {
    case suspended
    case failed(Error)
    case finished
  }

  private(set) var status: Status = .suspended

  // get output of a generator with these functions
  func next(_ input: Input) -> Output?
  func try next(_ input: Input) -> Output

  // cancelling a coroutine
  func stop()
  func throw(error: Error)
}

But how would a coroutine receive an input in its body? We already can use yield as a statement, what if it could be an expression evaluating to input values? My first naïve approach to implement this in Swift looked like this:

// first approach, similar to how Python and JavaScript
// implement coroutines
let co = Coroutine<Int, Int> {
  print("coroutine starts")

  var input = yield 42
  print(input)

  input = yield 24
  print(input)

  print("coroutine ends")
}

print(co.next(5))
// coroutine body print: "coroutine starts"
// this print: `42`

print(co.next(10))
// coroutine body print: `10`
// previous `5` is lost :(
// this print: `24`

print(co.next(15))
// coroutine body prints: `15` and "coroutine ends"
// this print: `nil`

But the main problem here is the order of evaluation of a yield expression: it produces output first and consumes an input second. Compare this with the evaluation order of next(_ input: Input) function: it consumes an input first and produces output second. And like with generators, we need to call next one more time to fully execute a coroutine, e.g. for 2 yield expressions executed, you need to call next 3 times. With this approach Python and JavaScript always ignore the first argument passed to next, which is fine because they don’t have a static type system.

But what do we do in Swift? Do we always ignore arguments to the first call of next on a coroutine, like we did with argument 5 in the example above? We can’t make it optional, because that would require excessive optional unwrapping in the generator body. But so far in this proposal, both generator and coroutine closures that we pass to Generator and Coroutine initialisers didn’t have any arguments. What if we could utilise that syntax for passing the first next call input?

// proposed coroutine syntax: first `next` input is explicit
let co = Coroutine<Int, Int> { firstInput in
  print("coroutine starts")
  print("firstInput is \(firstInput)")

  var input = yield 42
  print(input)

  input = yield 24
  print(input)

  print("coroutine ends")
}

print(co.next(5))
// coroutine body prints: "coroutine starts", "firstInput is 5"
// this print: `42`

print(co.next(10))
// coroutine body print: `10`
// this print: `24`

print(co.next(15))
// coroutine body prints: `15` and "coroutine ends"
// this print: `nil`

Great, we don’t lose any input values anymore and can guarantee it in compile time, and we also don’t need optionals for that. I like this version even more than coroutine syntax in Python and JavaScript, that feeling when a remake is better than the original. 😆

Writer stream as a coroutine

To piece it all together, let’s proceed with a Big Data™ example from the previous article: you have hundreds of gigabytes of sensor data, which is stored as a JSON array of integers in a file. We already implemented a chunkReader stream that can read files of any size producing a sequence of chunks of specified length.

A counterpart would be a chunkWriter stream looking like this:

func chunkWriter(path: String,
                 start: UInt64 = 0) -> Coroutine<Data, ()> {
  return Coroutine { firstInput in
    guard let handle = FileHandle(forReadingAtPath: path) else {
      return
    }

    handle.seek(toFileOffset: start)

    var data = firstInput
    while !data.isEmpty {
      handle.write(data)
      data = yield
    }
  }
}

Streaming parser of a JSON subset as a coroutine

But the full power of coroutines becomes apparent with something more advanced. Consider a streaming parser for a sequence of chunks read from a file. This implementation should be able to handle all valid JSON arrays of integers while consuming a nearly constant amount of memory during parsing. And even for invalid arrays or in case of corrupted data we’d like this parser to produce as many results as possible up to the failure point.

I look forward to seeing an implementation of this streaming parser in Swift 4.2. I bet it would be much longer and less readable than the generator version that follows. 😄 The essence of this version is pretty simple: it consumes characters one by one passed evaluating next(char) at the point of usage. Those characters are available inside of the coroutine body as values of yield expressions. At the same time the coroutine produces parsed values that are returned as results of next(char):

enum ParsingError: String, Error {
  case expectedOpenBracket = "expected \"[\""
  case expectedDigitOrSpace = "expected a digit or a space"
  case unexpectedInput = "unexpected input"
}

let whitespaceChars: [Character] = [" ", "\r", "\n", "\t"]

func arrayParser() -> Coroutine<Character, Int?> {
  return Coroutine { firstInput in
    // integer digits are accumulated in a buffer before they're parsed
    var acc = ""
    var input = firstInput

    // JSON arrays start with '['
    guard input == '[' else {
      throw ParsingError.expectedOpenBracket
    }

    // a value to yield
    var value: Int? = nil
    // spaces aren't permitted between digits, so we need
    // a special flag to handle it
    var previousIsSpace = false

    // flushes accumulated buffer
    func flush() {
      // integer digits parsed
      value = Int(acc)
      acc = ""
    }

    // iterate over all input characters
    repeat {
      input = yield value
      // resetting next value to yield
      value = nil

      // special handling for whitespace characters
      if !whitespaceChars.contains(input) {
        switch input {
        case ",":
          if !acc.isEmpty {
            flush()
          } else if !previousIsSpace {
            throw ParsingError.expectedDigitOrSpace
          }
        case "0"..."9" where !previousIsSpace || acc.isEmpty:
          acc.append(input)
        case "]":
          flush()
          break
        default:
          throw ParsingError.unexpectedInput
        }
        previousIsSpace = false
      } else {
        previousIsSpace = true
      }
    } while input != nil

    yield value
  }
}

let parser = arrayParser()

for char in '[1, 2,3,4,5 , 42]') {
  if let result = parser.next(char) {
    print(result)
  }
}
// prints lines of 1 2 3 4 5 42

A great property of this parser is that it consumes a constant amount of memory and it doesn’t need to load the whole input sequence at once. It’s highly performant and composable, while the definition is only ~70 lines of code with comments, supporting error enum and whitespace constants.

Summary

Generators are good for expressing reader streams, but coroutines are even more powerful and can express writer streams and stream processing concisely. We had a look at the Generator and Coroutine types and how they differ and implemented a highly-performant streaming parser for arrays of JSON integers.

Whenever you need to read and process huge amounts of uniform data, generators and coroutines would simplify your code in a quite significant way. Instead of writing state machines manually coroutines allow you to express that in a nice way with usual control flow structures like loops and conditions.

The concept of code that can be suspended and resumed as needed is very powerful and allows building many more useful abstractions on top of that. Adding these foundational building blocks to Swift is as important as ever, especially as other programming languages already have them or plan to add them soon. As always, I appreciate comments and questions from you on Twitter or Mastodon, talk soon. 🙌