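Generators are an easily overlooked but important Python feature. When processing large datasets, they can substantially reduce memory usage, and yield is the core syntax behind generator-based coroutines, the mechanism from which Python's async/await style of asynchronous programming grew. This article looks at how generators work and how to apply them in practice.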
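The iterator protocol

Before looking at generators, it helps to understand Python's iterator protocol.

Iterables and iterators

Iterable: an object that implements __iter__ and can therefore be traversed with a for loop.
Iterator: an object that implements both __iter__ and __next__ and returns its elements one at a time.

```python
numbers = [1, 2, 3]

# A list is iterable, but it is not itself an iterator
print(hasattr(numbers, '__iter__'))  # True
print(hasattr(numbers, '__next__'))  # False

# iter() returns an iterator over the list
it = iter(numbers)
print(type(it))  # <class 'list_iterator'>
print(next(it))  # 1
print(next(it))  # 2
print(next(it))  # 3
print(next(it))  # raises StopIteration: the iterator is exhausted
```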
How a for loop works internally

```python
iterable = [1, 2, 3]

# A for loop such as this one...
for item in iterable:
    print(item)

# ...is roughly equivalent to:
iterator = iter(iterable)
while True:
    try:
        item = next(iterator)
        print(item)
    except StopIteration:
        break
```
A custom iterator

```python
class Range:
    """An iterator that mimics range(start, end)."""

    def __init__(self, start, end):
        self.current = start
        self.end = end

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current >= self.end:
            raise StopIteration
        value = self.current
        self.current += 1
        return value


for i in Range(0, 3):
    print(i)  # 0, 1, 2
```
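Because the Range class above is its own iterator, it can be traversed only once: after the first pass, current has reached end and stays there. A minimal sketch of one common way around this, assuming two hypothetical classes (RangeIterator and ReusableRange, neither of which appears in the original example), is to separate the iterable from the iterator so that every traversal gets a fresh iterator:

```python
class RangeIterator:
    """Hypothetical helper: a one-shot iterator that does the counting."""

    def __init__(self, start, end):
        self.current = start
        self.end = end

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.end:
            raise StopIteration
        value = self.current
        self.current += 1
        return value


class ReusableRange:
    """Hypothetical iterable: hands out a new RangeIterator on every traversal."""

    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __iter__(self):
        return RangeIterator(self.start, self.end)


one_shot = RangeIterator(0, 3)
first_pass = list(one_shot)
second_pass = list(one_shot)  # the iterator is already exhausted

reusable = ReusableRange(0, 3)
first_reusable = list(reusable)
second_reusable = list(reusable)  # a fresh iterator is created each time

print(first_pass, second_pass)
print(first_reusable, second_reusable)
```

This mirrors how built-in containers behave: a list is an iterable, and each call to iter() on it returns a new, independent iterator.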
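Generator functions and generator objects

Generator functions

If a function body contains the yield keyword, it is no longer an ordinary function but a generator function:

```python
def count_up_to(max_val):
    """Yield the integers from 0 to max_val, inclusive."""
    count = 0
    while count <= max_val:
        yield count
        count += 1


generator = count_up_to(3)
print(type(generator))  # <class 'generator'>
print(list(generator))  # [0, 1, 2, 3]
```

Note: calling a generator function does not run its body. The call only creates a generator object; the body starts executing on the first call to next() or when iteration begins.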
How a generator executes

```python
def simple_gen():
    print("started")
    yield 1
    print("resumed")
    yield 2
    print("finishing")
    yield 3


gen = simple_gen()
print("generator created")  # nothing in the body has run yet
print(next(gen))  # prints "started", then 1
print(next(gen))  # prints "resumed", then 2
print(next(gen))  # prints "finishing", then 3
print(next(gen))  # raises StopIteration
```

Execution flow:

```
create generator   (the body has not started running)
      │
      ▼
1st next() ──────► runs to yield 1 ──────► pauses, returns 1
      │
      ▼
2nd next() ──────► runs to yield 2 ──────► pauses, returns 2
      │
      ▼
3rd next() ──────► runs to yield 3 ──────► pauses, returns 3
      │
      ▼
4th next() ──────► body returns    ──────► raises StopIteration
```
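The pause-and-resume behaviour described above can be observed directly. The sketch below uses inspect.getgeneratorstate from the standard library to record the state of a generator at three points in its life; the two_step function and the states list are illustrative names, not part of the original example.

```python
import inspect


def two_step():
    yield 1
    yield 2


gen = two_step()
states = []

states.append(inspect.getgeneratorstate(gen))  # created, body not started

next(gen)
states.append(inspect.getgeneratorstate(gen))  # paused at the first yield

remaining = list(gen)  # run the generator to completion
states.append(inspect.getgeneratorstate(gen))  # the body has returned

print(states)
```

The three recorded states are GEN_CREATED, GEN_SUSPENDED, and GEN_CLOSED. A fourth state, GEN_RUNNING, is only visible from inside the generator while its body is executing.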
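Using yield from

yield from, introduced in Python 3.3 (PEP 380), lets a generator delegate to another generator or to any iterable.

Basic usage

```python
def gen1():
    yield 1
    yield 2


def gen2():
    yield from gen1()  # delegate to gen1 until it is exhausted
    yield 3


# For plain iteration, gen2 behaves the same as:
def gen2_equivalent():
    yield 1
    yield 2
    yield 3


print(list(gen2()))  # [1, 2, 3]
```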
Why yield from is useful

yield from is mainly used in two scenarios:

1. Chaining iterables

```python
def chain(*iterables):
    """Yield the items of several iterables, one iterable after another."""
    for it in iterables:
        yield from it


# Equivalent version without yield from
def chain_v2(*iterables):
    for it in iterables:
        for item in it:
            yield item


print(list(chain([1, 2], [3, 4], [5, 6])))  # [1, 2, 3, 4, 5, 6]
```

2. Generator delegation

```python
def flatten(nested_list):
    """Flatten an arbitrarily nested list."""
    for item in nested_list:
        if isinstance(item, list):
            yield from flatten(item)  # delegate to a recursive call
        else:
            yield item


nested = [1, [2, 3], [4, [5, 6]], 7]
print(list(flatten(nested)))  # [1, 2, 3, 4, 5, 6, 7]
```
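Both scenarios above could also be written with a nested for loop. What a nested loop cannot do is receive the subgenerator's return value: yield from is an expression that evaluates to whatever the subgenerator returns, and it also forwards send() and throw() calls to the subgenerator. A minimal sketch of the return-value part, using hypothetical functions sub and outer:

```python
def sub():
    """Hypothetical subgenerator that yields two values and then returns a result."""
    yield 1
    yield 2
    return "done"


def outer():
    # yield from passes 1 and 2 through to the caller,
    # then evaluates to the value returned by sub()
    result = yield from sub()
    yield f"sub returned {result}"


values = list(outer())
print(values)
```

With a plain for loop over sub(), the string "done" would be lost, because the loop silently consumes the StopIteration that carries it.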
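Lazy evaluation

The core advantage of generators is lazy evaluation: each value is computed only when it is requested.

Memory efficiency

```python
def get_squares_list(n):
    # Builds the entire list in memory before returning
    return [x ** 2 for x in range(n)]


def get_squares_gen(n):
    # Produces one value at a time and stores none of them
    for x in range(n):
        yield x ** 2


squares_list = get_squares_list(10_000_000)  # holds ten million integers in memory
squares_gen = get_squares_gen(10_000_000)    # a small generator object, whatever n is
```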
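The difference can be measured with sys.getsizeof. The sketch below is a rough illustration under two assumptions: it uses a smaller n than the example above so that it runs quickly, and it relies on sys.getsizeof, which reports only the size of the container object itself (for a list, its array of references) and not the integers it points to, so the list's real footprint is larger than the number shown.

```python
import sys

n = 100_000  # smaller than 10_000_000 so the sketch runs quickly

squares_list = [x ** 2 for x in range(n)]   # all values exist at once
squares_gen = (x ** 2 for x in range(n))    # generator expression: values on demand

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(f"list: {list_size} bytes, generator: {gen_size} bytes")

# Both produce the same values; the generator just computes them one at a time
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
print(total_from_list == total_from_gen)
```

The exact byte counts depend on the Python version and platform, but the generator's size does not grow with n, while the list's does.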
Practical use cases

```python
def read_large_file(file_path):
    """Yield the lines of a file one at a time instead of loading the whole file."""
    with open(file_path, 'r') as f:
        for line in f:
            yield line.strip()


# Scan a large log file for errors (assumes huge_log.txt exists)
for line in read_large_file('huge_log.txt'):
    if 'ERROR' in line:
        print(line)


def fibonacci():
    """Yield the Fibonacci sequence indefinitely."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


fib = fibonacci()
for _ in range(10):
    print(next(fib))  # 0, 1, 1, 2, 3, 5, 8, 13, 21, 34
```
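An infinite generator such as fibonacci() must never be passed to list() directly, because the call would not terminate. The standard library's itertools module offers lazy tools for taking a bounded slice instead. A short sketch, repeating the fibonacci definition so that it is self-contained:

```python
from itertools import islice, takewhile


def fibonacci():
    """Yield the Fibonacci sequence indefinitely."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# Take a fixed number of items
first_ten = list(islice(fibonacci(), 10))

# Take items until a condition stops holding
below_100 = list(takewhile(lambda x: x < 100, fibonacci()))

print(first_ten)
print(below_100)
```

Both islice and takewhile are themselves lazy, so they pull only as many values from the generator as they need.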
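Advanced generator methods: send, throw, close

Generators provide three more methods for finer-grained control.

The send() method

send() resumes the generator and passes a value into it at the same time. That value becomes the result of the yield expression at which the generator was paused:

```python
def counter():
    count = 0
    while True:
        received = yield count
        if received is not None:
            count = received   # reset the counter to the value sent in
        else:
            count += 1


gen = counter()
print(next(gen))      # 0
print(gen.send(10))   # 10
print(next(gen))      # 11
print(gen.send(5))    # 5
print(next(gen))      # 6
```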
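Typical application: coroutines. send() is the basis for communicating with generator-based coroutines.

```python
def coro():
    print("coroutine started")
    while True:
        value = yield
        print(f"received: {value}")


c = coro()
next(c)       # prime the coroutine: run to the first yield
c.send(100)   # prints "received: 100"
c.send(200)   # prints "received: 200"
c.close()
```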
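A coroutine can send results back as well as receive values, because each send() call returns whatever the generator yields next. The sketch below is a classic illustration of that two-way exchange, using a hypothetical running_average coroutine that is not part of the original examples. Note that a newly created generator has to be advanced to its first yield with next() (or send(None)) before a real value can be sent into it.

```python
def running_average():
    """Hypothetical coroutine: yields the mean of all values received so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average  # hand back the current mean, wait for the next value
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)  # prime the coroutine; the first yielded value is None

a1 = avg.send(10)
a2 = avg.send(20)
a3 = avg.send(30)
avg.close()

print(a1, a2, a3)
```

The state (total and count) lives in the generator's local variables between calls, so no class or global variable is needed.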
The throw() method

throw() raises an exception inside the generator, at the yield where it is currently paused:

```python
def gen_with_error():
    try:
        yield 1
        yield 2
    except ValueError:
        yield "caught ValueError"
    yield 3


g = gen_with_error()
print(next(g))               # 1
print(next(g))               # 2
print(g.throw(ValueError))   # caught ValueError
print(next(g))               # 3
```
The close() method

close() shuts the generator down cleanly:

```python
def simple_gen():
    yield 1
    yield 2


g = simple_gen()
print(next(g))  # 1
g.close()
print(next(g))  # raises StopIteration: the generator is closed
```
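Under the hood, close() raises GeneratorExit inside the generator at the yield where it is paused. That makes try/finally a dependable place for cleanup code. A minimal sketch, using a hypothetical managed_resource generator and an events list that stands in for real acquire and release operations:

```python
events = []  # stands in for real resource handling in this sketch


def managed_resource():
    events.append("acquire")
    try:
        yield 1
        yield 2
    finally:
        # Runs when the generator is closed early or runs to completion
        events.append("release")


g = managed_resource()
first = next(g)  # the generator is now paused inside the try block
g.close()        # GeneratorExit is raised at the yield; finally runs

print(first, events)
```

This is the same reason the with open(...) block in the earlier file-reading example is safe: closing the generator early still triggers the file's cleanup.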
In practice: a memory-efficient data processing pipeline

Combining the ideas above, we can build a data processing pipeline out of generators:

```python
def read_users(file_path):
    """Yield one user dict per row of a CSV file with name, age, city columns."""
    with open(file_path, 'r') as f:
        next(f, None)  # skip the header row (the default avoids an error on an empty file)
        for line in f:
            name, age, city = line.strip().split(',')
            yield {'name': name, 'age': int(age), 'city': city}


def filter_age(users, min_age):
    """Yield only the users who are at least min_age years old."""
    for user in users:
        if user['age'] >= min_age:
            yield user


def group_by_city(users):
    """Group users by city.

    Grouping needs every record, so this stage consumes its whole input
    before it yields the first (city, users) pair.
    """
    groups = {}
    for user in users:
        groups.setdefault(user['city'], []).append(user)
    yield from groups.items()


def process_pipeline(file_path):
    """Wire the stages together; no work happens until the result is iterated."""
    users = read_users(file_path)
    adults = filter_age(users, 18)
    grouped = group_by_city(adults)
    return grouped


# Assumes users.csv exists and has a header row
for city, users in process_pipeline('users.csv'):
    print(f"{city}: {len(users)} users")
```
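Advantages of this pipeline:

Streaming: the read and filter stages handle one record at a time, so the file is never loaded into memory in full. The grouping stage is the exception, because it has to hold every record that passes the filter before it can yield a group.
Lazy evaluation: data is processed only when the pipeline is iterated.
Composable: each stage is independent, which makes it easy to maintain and test.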
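When only a per-city count is needed, the final stage does not have to keep the user records at all. The sketch below is one possible variation under stated assumptions: read_users is changed to accept any iterable of lines so that it can run on an in-memory sample (io.StringIO) rather than a file on disk, count_by_city is a hypothetical replacement for the grouping stage, and the sample rows are invented purely for illustration.

```python
import io
from collections import Counter


def read_users(lines):
    """Variant of read_users that takes any iterable of CSV lines."""
    lines = iter(lines)
    next(lines, None)  # skip the header row
    for line in lines:
        name, age, city = line.strip().split(',')
        yield {'name': name, 'age': int(age), 'city': city}


def filter_age(users, min_age):
    for user in users:
        if user['age'] >= min_age:
            yield user


def count_by_city(users):
    """Hypothetical final stage: keeps one integer per city, not the records."""
    counts = Counter()
    for user in users:
        counts[user['city']] += 1
    return counts


# Invented sample data, used only to make the sketch runnable
sample = io.StringIO(
    "name,age,city\n"
    "Alice,30,Beijing\n"
    "Bob,17,Shanghai\n"
    "Carol,25,Beijing\n"
    "Dave,40,Shanghai\n"
)

counts = count_by_city(filter_age(read_users(sample), 18))
for city, total in counts.items():
    print(f"{city}: {total} users")
```

Memory use in this version grows with the number of distinct cities rather than with the number of users, so every stage of the pipeline streams.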
Summary

Generators are an essential part of Python:
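| Feature | Description |
| --- | --- |
| __iter__ / __next__ | The basis of the iterator protocol |
| yield | Creates a generator function and enables lazy evaluation |
| yield from | Delegates to a subgenerator |
| send() | Sends a value into a generator |
| throw() | Raises an exception inside a generator |
| close() | Shuts a generator down cleanly |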
With a solid grasp of generators, you can write memory-efficient Python code that copes with large datasets.