Guido van Rossum <guido@python.org>
9th LASER summer school, Sept. 2012
- It took many steps to get to Py3k generators
- An interesting example of a "random walk"
- Origins go as far back as it gets
ABC has datatypes text, list, table. You can iterate over each of these:

    FOR ch IN 'abc': ...                              -- 'a', 'b', 'c' (order as specified in the string)
    FOR elem IN {'one'; 'two'; 'three'}: ...          -- 'one', 'three', 'two' (sorted)
    FOR val IN {1: 'one'; 3: 'three'; 2: 'two'}: ...  -- 'one', 'two', 'three' (values in key order)

Note that the FOR-loop is polymorphic: it is nearly the only thing strings and lists have in common.
The for-loop works with strings, lists, tuples -- but not with dicts! These are unified into "sequences". Sequences support (among others):
- length: len(xs)
- indexing, slicing: xs[i], xs[i:j]
- concatenation, repetition: xs + ys, xs * n
- optionally: assignment to item or slice
The type object has slots for each of these. The for-loop calls len(xs) and xs[i] each time.
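This length-then-index protocol can be sketched in Python itself (a sketch of the semantics, not the actual C implementation; the function name is made up for illustration):

```python
def old_style_for(xs, body):
    # Sketch of the original for-loop protocol:
    # ask for the length, then fetch xs[i] for i = 0 .. len(xs)-1.
    for i in range(len(xs)):
        body(xs[i])

out = []
old_style_for('abc', out.append)
print(out)   # ['a', 'b', 'c']
```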
Use case: a class wrapping e.g. a file, so you can write:

    for line in wrapped_file: print line

Steve Majewski invented a hack:
- implement __getitem__() to return the "next" item
- implement __len__() to lie until exhausted
Python 1.0.2 change: fetch xs[i] for increasing i until it raises IndexError
- avoids one (expensive) call per iteration
- implementing __len__() became optional
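A minimal sketch of this old-style protocol (class name is made up; this fallback still works in modern Python when a class defines __getitem__ but no __iter__):

```python
class LineWrapper:
    """Old-style iterable: the for-loop calls __getitem__ with 0, 1, 2, ...
    until IndexError is raised -- no __len__ needed after Python 1.0.2."""
    def __init__(self, lines):
        self._lines = lines

    def __getitem__(self, i):
        if i >= len(self._lines):
            raise IndexError   # signals "exhausted" to the for-loop
        return self._lines[i]

wrapped = LineWrapper(['first\n', 'second\n'])
for line in wrapped:
    print(line, end='')
```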
PEP 234 (Ka-Ping Yee, GvR, 2001):
- Separate "iterator" from "iterable"
- Downgrade "sequence" to backward compatibility
- All sequences become iterables
- Some new iterables added (e.g. dict) -- ironically, "for x in <dict>:" gives the keys
- The iterator object holds the iteration state -- e.g. for list iteration: the list object and an index
- One iterable may have many iterators
- Iterator operation: it.next(), returns the next value
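The iterable/iterator split can be demonstrated directly (using the Python 3 spelling next(it) rather than the it.next() of this era):

```python
xs = [10, 20, 30]
it1 = iter(xs)      # one iterable ...
it2 = iter(xs)      # ... many independent iterators
print(next(it1))    # 10
print(next(it1))    # 20
print(next(it2))    # 10 -- it2 holds its own iteration state

# Dict iteration yields the keys:
print(list({'a': 1, 'b': 2}))   # ['a', 'b']
```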
How does an iterator indicate it is done? We had many long discussions about this. Possibilities:
- a separate has_next() or is_done() method
- return a (value, done) tuple
- return a special sentinel value when done
- raise an exception when done
In the end we chose the exception, plus a special case in the C code: NULL without an exception set == exhausted.
Now, "for x in xs: <body>" translates to:

    it = xs.__iter__()
    while True:
        try:
            x = it.next()
        except StopIteration:
            break
        <body>

But we want to be able to pass an iterator too! E.g. "for x in xs.__iter__(): <body>". Solution: iter(it) returns it. Every iterator must support it.next() and iter(it).
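The expansion above can be run verbatim in Python 3, where the method is spelled __next__ rather than next:

```python
xs = ['a', 'b', 'c']
it = iter(xs)                # calls xs.__iter__()
result = []
while True:
    try:
        x = it.__next__()    # Python 3 spelling of the slide's it.next()
    except StopIteration:
        break
    result.append(x)         # the loop <body>
print(result)                # ['a', 'b', 'c']

assert iter(it) is it        # an iterator is its own iterable
```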
PEP 255 (Neil Schemenauer, Tim Peters, M.L. Hetland, 2001):

    def foobar(n):
        for i in range(n):
            for j in range(i, n):
                yield j

    >>> foobar(5)
    <generator object foobar at 0x104d93f00>
    >>> list(foobar(5))
    [0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4]

A generator is an iterator; not every iterator is a generator!
Umm... how does it work?
- The parser recognizes the 'yield' keyword (and then forbids "return <expr>" syntax)
- The function gets flagged "is generator" (actually, it's the code object that gets flagged)
- When called, the function constructs a generator object
- next(genobj) runs the bytecode until the next YIELD
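The "is generator" flag on the code object is visible from Python via the inspect module:

```python
import inspect

def gen():
    yield 1

# The flag lives on the code object, not the function:
print(bool(gen.__code__.co_flags & inspect.CO_GENERATOR))   # True
print(inspect.isgeneratorfunction(gen))                     # True

g = gen()        # calling it only builds the generator object;
                 # no body code has run yet
print(next(g))   # runs the bytecode until the first YIELD: 1
```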
PEP 342 (GvR, Phillip Eby, 2005): change yield from a statement to an expression.

    def f():
        a = yield 'a'
        print(a)
        yield 'b'

    g = f()
    print(next(g))      # prints: a
    print(g.send(1))    # prints: 1 (inside f), then: b
Other APIs added:

    g.throw(exc)   # causes the yield to raise exc
    g.close()      # complex protocol, uses GeneratorExit

Also: yield is now allowed inside try/finally.
Use cases:
- coroutines exchanging values
- async I/O using a "trampoline" or scheduler
Still not quite Knuth-style coroutines: can suspend only one frame. Still can't use "return <expr>".
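A minimal coroutine sketch in the PEP 342 style (a hypothetical running-average example, not taken from the talk), showing send(), close()/GeneratorExit, and yield inside try/finally:

```python
def averager():
    # Coroutine: receives values via send(), yields the running mean.
    total, count, avg = 0.0, 0, None
    try:
        while True:
            x = yield avg      # suspends here; send(x) resumes with x
            total += x
            count += 1
            avg = total / count
    finally:
        # Allowed since PEP 342; runs when close() raises GeneratorExit.
        print('coroutine closed')

avg = averager()
next(avg)              # "prime" the coroutine to the first yield
print(avg.send(10))    # 10.0
print(avg.send(30))    # 20.0
avg.close()            # raises GeneratorExit inside averager()
```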
The Python 2 protocol is it.next(), it.__iter__(). PEP 234 explains why (with regret!):
- explicit calls to it.next() were expected
- prev(), current(), reset() were considered as extensions
Python 3 changed it to it.__next__() -- PEP 3114 (Ka-Ping Yee, 2007), which also adds the next(it) built-in. (We had iter(it) from the start.) next(it) was added in Python 2.6 to help the transition.
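The next(it) built-in mirrors iter(it), and also accepts a default to return instead of letting StopIteration escape:

```python
it = iter([1])
print(next(it))           # 1
print(next(it, 'done'))   # exhausted: returns the default instead of raising
```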
PEP 380 (Greg Ewing, 2009): "yield from"
Approximate semantics:
- "yield from <expr>" is equivalent to "for v in <expr>: yield v"
- also allow "return <expr>", equivalent to "raise StopIteration(<expr>)"
The true semantics are more complex: send() and throw() are passed down to the subgenerator.
Use case: refactoring generators/coroutines.
Sadly, we have no experience using this yet.
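A small sketch of the delegation and return-value plumbing (runs on Python 3.3+, where "yield from" landed; function names are made up for illustration):

```python
def inner():
    yield 1
    yield 2
    return 'inner done'     # becomes StopIteration('inner done')

def outer():
    result = yield from inner()   # delegates; captures inner's return value
    yield result

print(list(outer()))   # [1, 2, 'inner done']
```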
The itertools module has many useful ops, e.g. itertools.chain(it1, it2), itertools.islice(it, i, j). Can't we write these as overloaded operators, e.g. it1 + it2, it[i:j]? No!
- Iterators are a (nice, small) protocol
- There is no common base class
- We wouldn't want to burden each iterator implementation with such extra methods
- We could offer an optional base class, but that would make use of the overloading unreliable
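The protocol-based alternative in action: these functions work on any iterators, precisely because no base class or extra methods are required.

```python
import itertools

it1 = iter([1, 2])
it2 = iter([3, 4, 5])
chained = itertools.chain(it1, it2)           # concatenation without '+'
sliced = itertools.islice(chained, 1, 4)      # "slicing" without '[i:j]'
print(list(sliced))                           # [2, 3, 4]
```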