Generators in Python

Generators in Python are a special type of iterable, allowing you to iterate over a sequence of values without storing the entire sequence in memory at once. They are particularly useful when working with large datasets or sequences that would be inefficient to store entirely in memory. Instead, generators produce values one at a time, only when required, making them memory efficient.

Overview

Python generator

  • In Python, a generator function is a function that returns a generator object: an iterator that produces a sequence of values, one at a time, as it is iterated over.
  • Generators are useful when we want to produce a large sequence of values, but we don’t want to store all of them in memory at once.
  • A generator function is defined just like a normal function, but whenever it needs to generate a value, it does so with the yield keyword rather than return.
  • If the body of a def contains yield, the function automatically becomes a generator function. 
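As a minimal sketch of the points above: calling a generator function does not run its body. It returns a generator object, and the body runs only when a value is requested. The names `number_source` and `log` are made up for this illustration.

```python
import inspect

log = []  # records when the function body actually runs

def number_source():
    log.append("body started")
    yield 1
    yield 2

gen = number_source()        # nothing in the body has run yet
ran_before_next = list(log)  # snapshot of the log: still empty

first = next(gen)            # now the body runs up to the first yield

print(inspect.isgeneratorfunction(number_source))  # True
print(ran_before_next)                             # []
print(first, log)                                  # 1 ['body started']
```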

Yield

  • The yield statement suspends a function’s execution and sends a value back to the caller, but retains enough state to enable the function to resume where it left off.
  • When the function resumes, it continues execution immediately after the yield that last ran, with its local variables intact.
  • This allows its code to produce a series of values over time, rather than computing them at once and sending them back like a list.
  • Yield is used in Python generators.
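The suspend-and-resume behavior described above can be traced with a small sketch. The generator `two_steps` and the `events` list are hypothetical names used only to record the order in which things happen.

```python
events = []  # trace of what ran, in order

def two_steps():
    events.append("before first yield")
    yield "a"
    events.append("between yields")  # runs only when the caller asks again
    yield "b"
    events.append("after last yield")  # never reached in this example

gen = two_steps()
first = next(gen)
events.append("caller got " + first)
second = next(gen)
events.append("caller got " + second)

for line in events:
    print(line)
# before first yield
# caller got a
# between yields
# caller got b
```

Control passes back and forth between the caller and the generator: each call to next() runs the function only as far as the next yield.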

Yield over return

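  • return sends a single result back to the caller and ends the function, whereas yield hands back one value and pauses the function, so a single call can produce a whole sequence of values over time.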
  • We should use yield when we want to iterate over a sequence but don’t want to store the entire sequence in memory.
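Here is a small side-by-side sketch of the difference. The function names `squares_with_return` and `squares_with_yield` are invented for this comparison.

```python
def squares_with_return(n):
    """Build the whole list first, then hand it back with a single return."""
    result = []
    for x in range(n):
        result.append(x * x)
    return result

def squares_with_yield(n):
    """Yield each square as it is computed; no list is built."""
    for x in range(n):
        yield x * x

print(squares_with_return(5))       # [0, 1, 4, 9, 16]
print(list(squares_with_yield(5)))  # [0, 1, 4, 9, 16]
print(sum(squares_with_yield(5)))   # 30
```

Both versions produce the same values. The yield version only holds one value at a time unless the caller chooses to collect them, as list() does here.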

Example

def top():
    yield 1
    yield 2
    yield 3
    yield 4
    yield 5

values = top()

print(values.__next__())  # 1
print(values.__next__())  # 2
print(values.__next__())  # 3
print(values.__next__())  # 4
print(values.__next__())  # 5

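Different ways to consume a generator

A generator can be consumed only once, so each way below starts from a fresh generator.

# Way 1: call __next__() directly (works, but the built-in next() is preferred)
values = top()
print(values.__next__())  # 1
print(values.__next__())  # 2

# Way 2: a for loop, which stops cleanly when the generator is exhausted
for i in top():
    print(i)  # 1, 2, 3, 4, 5

# Way 3: the built-in next()
values = top()
print(next(values))  # 1
print(next(values))  # 2
print(next(values))  # 3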
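One detail worth knowing when calling next() by hand: once a generator has no values left, another call raises StopIteration. The sketch below uses a made-up two-value generator, `letters`, to show this along with the optional default argument of next().

```python
def letters():
    yield "a"
    yield "b"

gen = letters()
print(next(gen))  # a
print(next(gen))  # b

# A third call has nothing left to produce, so it raises StopIteration.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)  # True

# next() accepts a default that is returned instead of raising.
print(next(gen, "done"))  # done
```

A for loop handles StopIteration for you, which is why it is usually the most convenient way to consume a generator.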

Creating a Generator:

You can create a generator by simply using the yield keyword inside a function.

def count_up_to(max_value):
    count = 1
    while count <= max_value:
        yield count
        count += 1

# Using the generator
counter = count_up_to(5)
print(next(counter))  # Output: 1
print(next(counter))  # Output: 2

for number in counter:
    print(number)  # Output: 3, 4, 5

Generator Expressions

Generator expressions are similar to list comprehensions but use parentheses instead of square brackets. Here’s a deeper look into generator expressions:

Syntax

(expression for item in iterable if condition)

The if condition part is optional.

Example 1

With Yield

# Generator function
def generate_squares(n):
    for x in range(n):
        yield x**2

Equivalent expression

squares_gen_expr = (x**2 for x in range(10))
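As a quick check, the sketch below repeats the generator function so it runs on its own and confirms that both forms yield the same values. It also shows the optional filter clause and passing a generator expression directly to a function. The variable names are chosen only for this example.

```python
def generate_squares(n):
    for x in range(n):
        yield x**2

squares_from_function = list(generate_squares(10))
squares_from_expression = list(x**2 for x in range(10))
print(squares_from_function == squares_from_expression)  # True

# With a filter clause: squares of even numbers only
even_squares = list(x**2 for x in range(10) if x % 2 == 0)
print(even_squares)  # [0, 4, 16, 36, 64]

# A generator expression can be the sole argument of a call
# without an extra pair of parentheses.
total = sum(x**2 for x in range(10))
print(total)  # 285
```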

Example 2

With Yield

# Generator function to produce Fibonacci numbers
def generate_fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

# Using the generator function
print("Using Generator Function:")
fibonacci_gen_func = generate_fibonacci(10)
print(list(fibonacci_gen_func))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

Equivalent expression

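# Helper that returns the i-th Fibonacci number (fib(0) == 0, fib(1) == 1).
# It recomputes from the start on every call, so the generator function
# above is the more natural fit for a sequence that carries state.
def fib(i):
    a, b = 0, 1
    for _ in range(i):
        a, b = b, a + b
    return a

# Generator expression (using a helper function)
fibonacci_gen_expr = (fib(i) for i in range(10))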

Memory usage comparison

Without generator

import sys

# Large list of numbers
numbers = range(1, 1000001)  # 1 to 1,000,000

# Calculate squares and store them in a list
squares_list = [n**2 for n in numbers]

# Calculate memory usage
# (getsizeof counts only the list object itself, not the integers it references)
print(f"Memory used by list: {sys.getsizeof(squares_list)} bytes")

Output

Memory used by list: 8448728 bytes

With Generator

import sys

# Same range of numbers as in the list version
numbers = range(1, 1000001)  # 1 to 1,000,000

# Generator to calculate squares
squares_gen = (n**2 for n in numbers)

# Calculate memory usage
print(f"Memory used by generator: {sys.getsizeof(squares_gen)} bytes")

Output

Memory used by generator: 208 bytes
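The exact byte counts above depend on the Python version and platform, so the sketch below compares sizes rather than printing them. It assumes a small helper, `lazy_squares`, introduced only for this illustration, and shows that a generator's size does not grow with the length of the sequence it will produce, while a list's size does.

```python
import sys

def lazy_squares(limit):
    """Return a generator expression over the first `limit` squares."""
    return (n**2 for n in range(limit))

small_gen = lazy_squares(10)
large_gen = lazy_squares(10_000_000)  # nothing is computed yet

small_list = [n**2 for n in range(10)]
large_list = [n**2 for n in range(100_000)]

# The generator object is the same size no matter how many values it will yield.
print(sys.getsizeof(small_gen) == sys.getsizeof(large_gen))   # True

# The list grows with the number of elements it holds.
print(sys.getsizeof(small_list) < sys.getsizeof(large_list))  # True
```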

Advantages of Generators:

  • Memory Efficiency: Since generators don’t store the entire sequence in memory, they are ideal for working with large data sets.
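  • Performance: Generators start producing values immediately and skip the up-front cost of building a full list, which helps most when only part of the sequence is consumed. Iterating over every value is not necessarily faster than using a list.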
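To illustrate on-the-fly generation, the sketch below searches for the first square above 50 in a range of one million numbers. The helper `checked_square` and the `computed` list are hypothetical and exist only to count how much work is actually done.

```python
computed = []  # records every input that was actually squared

def checked_square(n):
    computed.append(n)
    return n * n

# Nothing is computed when the generator expression is created.
lazy_squares = (checked_square(n) for n in range(1_000_000))

# Pull values only until the first one above 50 turns up.
first_big = next(sq for sq in lazy_squares if sq > 50)

print(first_big)      # 64
print(len(computed))  # 9
```

Only the inputs 0 through 8 are squared. A list comprehension in the same position would have computed all one million squares before the search began.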

When to Use Generators:

  • When dealing with large datasets or streams of data.
  • When you only need to iterate over the data once, since a generator is exhausted after a single pass.
  • When you don’t need the whole dataset in memory at once.
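The single-pass point is easy to miss, so here is a short sketch of what happens when a generator is iterated over twice. The variable names are chosen only for this example.

```python
squares = (n * n for n in range(4))

first_pass = list(squares)   # consumes every value
second_pass = list(squares)  # nothing left to produce

print(first_pass)   # [0, 1, 4, 9]
print(second_pass)  # []

# If the values are needed again, create a new generator (or keep a list).
squares = (n * n for n in range(4))
total = sum(squares)
print(total)  # 14
```

If the same data has to be traversed several times, a list is usually the simpler choice.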

Conclusion

Generators are a powerful feature in Python that enable efficient iteration over large data sets without consuming excessive memory. By leveraging the yield keyword, they provide a way to produce values one at a time, allowing for lazy evaluation and reducing the overhead associated with creating and maintaining large lists or other collections in memory.
