List comprehensions vs generator expressions in Python

Jul 18, 2024 #python

While the syntax for list comprehensions and generator expressions in Python is nearly identical, the difference in brackets leads to different behavior. A list comprehension builds the whole list in memory at once, whereas a generator expression creates a generator that yields items one at a time, which makes it useful for processing data streams and large datasets.

Syntax

Both use the same basic structure, an expression followed by a for loop and an optional if condition, and differ only in the wrapping: square brackets for a list comprehension, parentheses for a generator expression. The similarity can be confusing at first.

list_comp = [expression for item in iterable if condition]
gen_expr = (expression for item in iterable if condition)
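One related syntax rule: when a generator expression is the only argument to a function call, its own parentheses can be dropped. With any additional argument they are required. A short illustration:

```python
# A generator expression that is the only argument to a call
# can be written without its own extra parentheses.
total = sum(x**2 for x in range(10) if x % 2 == 0)
print(total)  # 120

# With more than one argument, the parentheses are required;
# sorted(x**2 for x in range(5), reverse=True) is a SyntaxError.
ordered = sorted((x**2 for x in range(5)), reverse=True)
print(ordered)  # [16, 9, 4, 1, 0]
```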

A list comprehension returns a list; a generator expression returns a generator object, which produces its values only as it is iterated over.

# Create a list of squares of even numbers
squares = [x**2 for x in range(10) if x % 2 == 0]
print(squares)  # Output: [0, 4, 16, 36, 64]

# Create a generator for squares of even numbers
squares_gen = (x**2 for x in range(10) if x % 2 == 0)
print(list(squares_gen))  # Output: [0, 4, 16, 36, 64]
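To inspect what each form actually produces, here is a small sketch using the standard-library types module. It also shows next() pulling items from the generator one at a time:

```python
import types

squares = [x**2 for x in range(10) if x % 2 == 0]
squares_gen = (x**2 for x in range(10) if x % 2 == 0)

print(type(squares))                                 # <class 'list'>
print(isinstance(squares_gen, types.GeneratorType))  # True

# Items are pulled one at a time with next()
print(next(squares_gen))  # 0
print(next(squares_gen))  # 4
print(list(squares_gen))  # [16, 36, 64] (only the remaining items)
```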

Generators can be iterated over only once. Once all items are consumed, they cannot be reused or reset. If you need to reuse the values produced by a generator, you can store them in a list or another collection.

squares_gen = (x**2 for x in range(10))
squares = list(squares_gen)  # Store the generated values in a list

# Now you can iterate multiple times
for square in squares:
    print(square)

for square in squares:
    print(square)
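The one-shot behavior is easy to demonstrate: a second pass over the same generator produces nothing, and next() signals exhaustion by raising StopIteration.

```python
squares_gen = (x**2 for x in range(5))

first_pass = list(squares_gen)   # consumes every item
second_pass = list(squares_gen)  # the generator is already exhausted

print(first_pass)   # [0, 1, 4, 9, 16]
print(second_pass)  # []

# next() on an exhausted generator raises StopIteration
try:
    next(squares_gen)
except StopIteration:
    print("exhausted")
```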

If you need to iterate again and you can’t store all values in memory, consider creating a new generator each time.

def squares_gen_func():
    return (x**2 for x in range(10))

# First iteration
for square in squares_gen_func():
    print(square)

# Second iteration
for square in squares_gen_func():
    print(square)
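A related pattern, sketched here with a hypothetical Squares class (the name is ours, not a library type), is an iterable whose __iter__ method returns a fresh generator on every call. Each for loop, list() or sum() then starts from the beginning, and the values are still never stored all at once.

```python
class Squares:
    """Hypothetical reusable iterable: each iteration builds a new generator."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # A fresh generator expression per call, so every loop starts from 0
        return (x**2 for x in range(self.n))


squares = Squares(5)
print(list(squares))  # [0, 1, 4, 9, 16]
print(list(squares))  # [0, 1, 4, 9, 16] (works again)
print(sum(squares))   # 30
```

Compared with a plain function, the class lets callers write for square in squares repeatedly without remembering to call anything first.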

Performance

Because generator expressions produce items on the fly instead of storing them, they are much more memory efficient than list comprehensions: a generator object stays the same small size no matter how many items it will produce, while a list grows with its contents.

import sys

list_comp = [x**2 for x in range(1000000)]
gen_expr = (x**2 for x in range(1000000))

# Exact figures vary by Python version and platform
print(sys.getsizeof(list_comp))  # Size of list in bytes (e.g. 8448728)
print(sys.getsizeof(gen_expr))   # Size of generator in bytes (e.g. 104)
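Note that sys.getsizeof measures only the container object itself, not the integer objects a list refers to, so the list's real footprint is larger than the number it reports. A fuller comparison, sketched here with the standard-library tracemalloc module, records the peak memory allocated while summing the squares each way. The helper name peak_bytes is our own, and the printed numbers will differ between Python versions and platforms.

```python
import tracemalloc


def peak_bytes(func):
    """Run func() and return the peak memory traced while it ran, in bytes."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


N = 100_000
# The list version materializes all N squares before summing them
list_peak = peak_bytes(lambda: sum([x**2 for x in range(N)]))
# The generator version creates and discards one square at a time
gen_peak = peak_bytes(lambda: sum(x**2 for x in range(N)))

print(f"list comprehension peak:   {list_peak} bytes")
print(f"generator expression peak: {gen_peak} bytes")
```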

Generator expressions use lazy evaluation: they compute each item only when it is requested, for example by a for loop or by next(). A list comprehension, by contrast, evaluates the expression for every item as soon as it runs and stores all the results in a list.
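The difference in timing can be made visible with a function that records each call. In this sketch, square and calls are illustrative names:

```python
calls = []


def square(x):
    calls.append(x)  # record every evaluation
    return x * x


gen = (square(x) for x in range(5))
print(calls)      # [] (nothing evaluated yet)
print(next(gen))  # 0
print(next(gen))  # 1
print(calls)      # [0, 1] (only the items requested so far)

lst = [square(x) for x in range(5)]
print(calls)      # [0, 1, 0, 1, 2, 3, 4] (all five evaluated at once)
```

One subtlety: the outermost iterable (here range(5)) is evaluated immediately when the generator expression is created; only the per-item work is deferred.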

Use cases

List comprehensions are ideal when you need the entire dataset immediately, when the dataset is small to medium-sized, and when you need to access the data multiple times.

names = ["apple", "banana", "cherry"]
uppercase_names = [name.upper() for name in names]
print(uppercase_names)  # Output: ['APPLE', 'BANANA', 'CHERRY']
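Because a list holds all of its items, it supports len() and indexing; a generator supports neither. A quick comparison using the same data:

```python
names = ["apple", "banana", "cherry"]
upper_list = [name.upper() for name in names]
upper_gen = (name.upper() for name in names)

print(len(upper_list))  # 3
print(upper_list[1])    # BANANA

# A generator has no length and cannot be indexed
try:
    len(upper_gen)
except TypeError as exc:
    print("TypeError:", exc)
```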

Generators are particularly useful for processing data streams or large files. For example, this generator function (note the yield keyword) reads lines from a large file one at a time, without loading the entire file into memory:

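def process(item):
    # Hypothetical placeholder for real per-item work
    print(item)

def read_large_file(file_path):
    # Yield one line at a time; the file is never fully loaded into memory
    with open(file_path, 'r') as file:
        for line in file:
            yield line.strip()

lines_gen = read_large_file('large_file.txt')
for line in lines_gen:
    process(line)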
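The same idea works with a generator expression over an open file. The sketch below is self-contained: it writes a small temporary file (standing in for a large one) and counts its non-empty lines without building a list.

```python
import os
import tempfile

# Write a small sample file to stand in for a large one
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\n\nsecond\nthird\n")
    path = tmp.name

try:
    with open(path) as file:
        # Lines are read and tested one at a time; no list is built
        non_empty = sum(1 for line in file if line.strip())
    print(non_empty)  # 3
finally:
    os.remove(path)
```

The generator must be fully consumed inside the with block: once the block exits, the file is closed, and iterating further raises ValueError (I/O operation on closed file).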

Generators can be used to create data processing pipelines where each step in the pipeline processes one item at a time. This is useful for chaining operations without creating intermediate lists.

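def process(item):
    # Hypothetical placeholder for real per-item work
    pass

numbers = (x for x in range(1000000))
squares = (n**2 for n in numbers if n % 2 == 0)  # squares of even numbers
cubes = (s**3 for s in squares if s % 4 == 0)    # always true here: an even number's square is a multiple of 4

for cube in cubes:
    process(cube)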
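Because every stage is lazy, a pipeline can even start from an infinite source and still finish, as long as something downstream stops asking for items. A minimal sketch using itertools.count and itertools.islice from the standard library:

```python
from itertools import count, islice

numbers = count()                           # infinite: 0, 1, 2, ...
evens = (n for n in numbers if n % 2 == 0)  # nothing computed yet
squares = (n * n for n in evens)            # still nothing computed

# islice pulls just five items through the whole pipeline
first_five = list(islice(squares, 5))
print(first_five)  # [0, 4, 16, 36, 64]
```

With list comprehensions in place of the generator expressions, the first stage would try to build an infinite list and never return.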