Python Generators

Introduction

Python generators provide an easy way to create iterators that produce sequences of values efficiently.

They can be implemented with special functions called generator functions, which use the yield keyword to produce values one at a time rather than returning them all at once. Generators can also be created with generator expressions, which look similar to list comprehensions but produce their values lazily instead of building a list. (Generator expressions will be discussed in a later lesson.)

Generators are particularly useful for producing values dynamically (i.e. on-the-fly) without holding them all in memory at once. They enable lazy evaluation, meaning that values are computed only when requested, which can lead to significant memory savings and improved performance compared to eagerly generating and storing all values up front.
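To make the memory difference concrete, here is a minimal sketch that compares a list with a generator over the same one million squares. The squares() function and the size of one million are illustrative choices, not part of the original lesson. Note that sys.getsizeof() reports only the size of the object itself, so the exact numbers vary by Python version, and the list's real footprint is even larger than reported because the integers it refers to are not counted.

```python
import sys

# Eager: a list holds all one million squares in memory at once.
squares_list = [n * n for n in range(1_000_000)]

def squares(limit):
    """Yield the squares of 0..limit-1, one at a time (illustrative helper)."""
    for n in range(limit):
        yield n * n

# Lazy: the generator object only stores its current state.
squares_gen = squares(1_000_000)

print(sys.getsizeof(squares_list))  # several megabytes; exact value varies
print(sys.getsizeof(squares_gen))   # a small, fixed size; exact value varies

# Both give the same total, but the generator never holds all values at once.
print(sum(squares(1_000_000)) == sum(squares_list))  # True
```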

Another advantage of generators is their ability to represent infinite sequences or streams of data. Because they produce each value only when it is requested, a generator never has to compute or store the whole sequence, and the code consuming it decides when to stop.

Benefits of Generators

  1. Memory Efficiency: Generators produce values on-the-fly, allowing you to process large datasets without loading everything into memory at once. This makes them suitable for handling data streams or infinite sequences.
  2. Lazy Evaluation: Values are generated only when needed, reducing unnecessary computation. This can noticeably improve performance, especially when only some of the values end up being used or when each value is expensive to compute or to read (for example, from a file or a network connection).
  3. Simplified Code: Generators enable you to write concise and readable code by separating the generation of values from their consumption. This can result in more modular and maintainable codebases.
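The separation between producing and consuming values can be sketched as a small pipeline of generators. The function names read_numbers() and only_even() are hypothetical, and the raw_lines list stands in for a real data source; a file object, which also yields its lines one at a time, could be passed in its place.

```python
def read_numbers(lines):
    """Producer: turn raw text lines into integers, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield int(line)

def only_even(numbers):
    """Filter stage: pass through only the even numbers."""
    for n in numbers:
        if n % 2 == 0:
            yield n

# Hypothetical input standing in for a file or network stream.
raw_lines = ["4\n", "7\n", "\n", "10\n", "3\n"]

# Consumer: sum() pulls values through the whole pipeline one at a time.
total = sum(only_even(read_numbers(raw_lines)))
print(total)  # 14
```

Each stage only knows how to handle one value at a time, so stages can be added, removed, or reordered without changing the others.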

Generator Functions

In Python, generator functions are characterized by the use of the yield keyword. The presence of yield in a function's body tells Python to treat it as a generator function rather than a regular function. Like a return statement, a yield hands a value back to the caller. However, unlike return, yield does not end the function; it pauses it, so that execution resumes where it left off when the next value is requested, with the function's state maintained between invocations.

Another difference between generator functions and regular functions is that calling a generator function does not execute its code. Instead, the call returns a generator object, which is a kind of iterator. Like any other iterator, a generator object can be iterated over with a for loop or advanced with the next() function. The generator function’s code runs only when the iterator is asked to produce its next item.
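As a quick illustration of this iterator behavior, the following sketch (letters() is a made-up example function) mixes next() and a for loop on the same generator object:

```python
def letters():
    yield "a"
    yield "b"
    yield "c"

gen = letters()

# Like any iterator, a generator object is its own iterator.
print(iter(gen) is gen)  # True

# next() pulls a single value...
print(next(gen))  # a

# ...and the for loop continues from where next() left off.
for letter in gen:
    print(letter)  # prints b, then c
```

Because next() and the for loop advance the same iterator, the loop only sees the values that next() has not already consumed.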

Let’s see how a generator function works with an example that counts to three.

Example:

def count_to_three():
    print("One")
    yield 1
    print("Two")
    yield 2 
    print("Three")
    yield 3 
    print("The End")

counter = count_to_three()
for num in counter:
    print(num)

Output:

One
1
Two
2
Three
3
The End

In the example above, count_to_three() is a generator function (line 1) since it uses yield statements (lines 3, 5, and 7). Following the function definition, the code gets the generator by calling count_to_three() (line 10) and then iterates over it with a for loop (line 11), whose body prints each yielded value (line 12).

On each iteration, count_to_three() runs until it reaches the next yield statement, passes that statement’s value to the for loop, and pauses. On the following iteration, execution resumes immediately after that yield. After the third value, the for loop asks for one more item, so the function runs its final print() call and then ends, which signals to the for loop that the iteration is over.

The resulting output interleaves the words printed inside count_to_three() (lines 2, 4, 6, and 8), including “The End”, with the numbers printed by the body of the for loop (line 12).
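The example above shows execution pausing and resuming. The sketch below, using a hypothetical running_total() generator, shows that the function’s local variables also keep their values between yields:

```python
def running_total(values):
    total = 0  # local state that survives across yields
    for value in values:
        total += value
        yield total  # pause here; total keeps its value for the next iteration

for subtotal in running_total([5, 10, 20]):
    print(subtotal)  # prints 5, then 15, then 35
```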

Iterators Returned By Generator Functions

This section takes a closer look at the iterators that generator functions return. Let’s begin with an example that implements a very simple generator.

Example:

def generator_fn():
    print("Hello")
    yield 1

my_generator = generator_fn()
print(my_generator)

Output:

<generator object generator_fn at 0x7f894f1cb920>

The example above defines the generator_fn() generator function (line 1), which prints “Hello” (line 2) and yields the number 1 once (line 3). The last two lines of code in the example get the generator from the generator_fn() function (line 5), and print out its representation (line 6). As you can see from the output, “Hello” is not printed because the call to generator_fn() does not execute the function’s code. The call just returns a generator object, which is an iterator. The generator object is assigned to my_generator, and its printout indicates that it is indeed a generator object.

The following example extends the code in the previous one to iterate over the generator using the next() function call.

Example:

next(my_generator)

Output:

Hello

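The call to next(my_generator) in the example above performs one iteration, which runs generator_fn()’s code up to its yield and prints “Hello”. The call returns the yielded value, 1, but the example does not print it. (In an interactive Python session, the returned 1 would also be echoed.)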

Note that a second call to next(my_generator) would raise a StopIteration exception, which is how any iterator signals that it has no values left. Because generator_fn() executes only one yield, it provides only one value.
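The following sketch repeats generator_fn() from above and shows two ways of dealing with an exhausted generator: catching StopIteration, or passing a default value as the second argument to the built-in next() function.

```python
def generator_fn():
    print("Hello")
    yield 1

my_generator = generator_fn()
print(next(my_generator))  # prints "Hello", then 1

try:
    next(my_generator)  # no more yields: the function runs to its end
except StopIteration:
    print("The generator is exhausted")

# With a default, next() returns it instead of raising StopIteration.
print(next(my_generator, "no more values"))  # no more values
```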

Infinite Iteration

Since generator functions compute values dynamically, they can keep producing values for as long as the code using the iterator requests them, providing a theoretically infinite supply of values. This is impossible with collections such as lists, which also provide iterators but store all their values in memory at once.

Infinite generators are usually implemented with infinite loops. The next example creates an infinite generator for the Fibonacci sequence.

The Fibonacci sequence is an infinite sequence of integers in which each number is the sum of the previous two, starting with 1 and 1. It proceeds as follows: 1, 1, 2, 3, 5, 8, 13…

Example:

def fibonacci():
    prev = 0
    curr = 1
    while True:
        yield curr
        prev, curr = curr, curr + prev
               
fib_gen = fibonacci()
for _ in range(6):
    print(next(fib_gen))

Output:

1
1
2
3
5
8

In the example above, fibonacci() is a generator function that creates an infinite generator for the Fibonacci sequence. It uses a while loop whose condition is always true (line 4), so the loop never ends on its own. Following the function’s definition, the generator is assigned to the variable fib_gen (line 8). A for loop then runs six times (line 9), each time calling next(fib_gen) and printing the result (line 10), which prints the first six Fibonacci numbers. (The underscore _ in the for loop is a variable name that, by convention, signals that the variable is not used.)
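Calling next() inside a range() loop is one way to take a fixed number of values from an infinite generator. As an alternative sketch, the standard library’s itertools module provides islice(), which takes the first n values, and takewhile(), which takes values while a condition holds. Both consume the generator lazily, so they stop without ever trying to exhaust it.

```python
from itertools import islice, takewhile

def fibonacci():
    prev = 0
    curr = 1
    while True:
        yield curr
        prev, curr = curr, curr + prev

# islice() lazily takes the first 6 values.
# Calling list(fibonacci()) on its own would never finish.
first_six = list(islice(fibonacci(), 6))
print(first_six)  # [1, 1, 2, 3, 5, 8]

# takewhile() keeps taking values until the condition first fails.
below_100 = list(takewhile(lambda n: n < 100, fibonacci()))
print(below_100)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```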

Generator Function Parameters

Generator functions can accept any parameters, just like regular functions can. Those parameters can be used to control any aspect of the generator.

The next example will demonstrate using parameterized generators by creating a Fibonacci sequence generator that has a fixed length instead of being infinite.

Example:

def fibonacci(length):
    prev = 0
    curr = 1
    for _ in range(length):
        yield curr
        prev, curr = curr, curr + prev
               
for num in fibonacci(6):
    print(num)

Output:

1
1
2
3
5
8

The example above modifies the fibonacci() function from the previous example to take a length parameter, which limits the length of the sequence: the infinite while loop is replaced with a for loop that repeats length times (line 4). With this change, the for loop that prints the numbers (line 8) can iterate over fibonacci(6) directly, because the generator ends on its own after six values. In the previous example, the for loop had to use range(6) to stop the iteration, since the generator was infinite.
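A parameter can control other aspects of a generator as well. In the sketch below, the hypothetical fibonacci_up_to() takes a maximum value instead of a length, so the sequence stops at the last Fibonacci number that does not exceed it. As with the length-limited version, the generator ends naturally when its loop finishes.

```python
def fibonacci_up_to(max_value):
    """Yield Fibonacci numbers that do not exceed max_value (hypothetical helper)."""
    prev = 0
    curr = 1
    while curr <= max_value:
        yield curr
        prev, curr = curr, curr + prev

print(list(fibonacci_up_to(20)))  # [1, 1, 2, 3, 5, 8, 13]
```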

Restarting a Generator

Like any iterator, a generator only moves forward through its sequence and cannot be restarted. However, calling the generator function again creates a new generator object that starts from the beginning.

Example:

def fibonacci():
    prev = 0
    curr = 1
    while True:
        yield curr
        prev, curr = curr, curr + prev
              
print("First Sequence:")
fib_gen = fibonacci()
for _ in range(3):
    print(next(fib_gen))
    
print()
print("Second Sequence:")
fib_gen = fibonacci()
for _ in range(4):
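    print(next(fib_gen))

Output:

First Sequence:
1
1
2

Second Sequence:
1
1
2
3

Because the second call to fibonacci() creates a new generator object, the second sequence starts again from 1 instead of continuing from where the first generator left off.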
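To see why the same generator object cannot be restarted, the sketch below reuses the length-limited fibonacci() from the earlier section and consumes one generator object twice with list():

```python
def fibonacci(length):
    prev = 0
    curr = 1
    for _ in range(length):
        yield curr
        prev, curr = curr, curr + prev

fib_gen = fibonacci(4)
print(list(fib_gen))  # [1, 1, 2, 3]
print(list(fib_gen))  # [] because this generator object is already exhausted

# Calling the generator function again produces a fresh generator.
print(list(fibonacci(4)))  # [1, 1, 2, 3]
```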

Summary & Reference for Python Generators

Python generators are a tool for creating efficient sequences of values with iterators. They are particularly useful for handling large datasets or infinite sequences. They enable lazy evaluation, meaning values are computed on-the-fly only when needed, leading to memory savings and improved performance.

Generator functions are special functions characterized by the use of the yield keyword. A yield is like a return in that it hands a value back to the caller, but instead of exiting, the function pauses and resumes where it left off when the next value is requested, maintaining its state between invocations.

def greeting():
    yield "Hello"
    yield "there"
    yield "neighbor."

for x in greeting():
    print(x)

A call to a generator function returns a generator object, which is a kind of iterator.

def greeting():
    yield "Hello"
    yield "there"
    yield "neighbor."

print(greeting())  # --> <generator object greeting at 0x7f894f1cb8b0> (the address varies)

Infinite sequences can be generated using generator functions with infinite loops. For instance, the Fibonacci sequence generator produces values dynamically as needed.

def fibonacci():
    prev = 0
    curr = 1
    while True:
        yield curr
        prev, curr = curr, curr + prev

fib_gen = fibonacci()

Generator functions can accept parameters to control aspects of the generated sequence. This allows for flexible and customizable behavior.

def count_up_to(limit):
    for i in range(1, limit + 1):
        yield i

counter = count_up_to(5)

While generators technically can’t be restarted, calling the generator function again creates a new generator object, effectively “restarting” the sequence.