Async overview:
- Asynchronous: during I/O the program does not sit and wait; it switches to other work in the meantime.
- Basics: coroutines & async
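The point above can be shown with a minimal stdlib-only sketch (the names `fake_io` and the delays are illustrative): two coroutines that each "wait" one second finish in roughly one second total, because the event loop runs other work during each I/O-style wait.

```python
import asyncio
import time

async def fake_io(task_id, delay):
    # asyncio.sleep stands in for a network/disk wait; it yields
    # control to the event loop instead of blocking the thread
    await asyncio.sleep(delay)
    return task_id

async def main():
    start = time.perf_counter()
    # Both "waits" overlap, so total time is ~1s, not ~2s
    results = await asyncio.gather(fake_io(1, 1.0), fake_io(2, 1.0))
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results)  # [1, 2]
print(round(elapsed, 1))
```

Run the same two calls sequentially with plain `time.sleep` and the total would be about two seconds instead.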
Async crawler structure:

```python
async def xxx(): pass

async def main(): pass

if __name__ == '__main__':
    asyncio.run(main())
```
aiohttp overview:
requests.get() is synchronous code, whereas aiohttp is a powerful asynchronous crawler library. asyncio implements protocols such as TCP, UDP and SSL; aiohttp is an HTTP framework built on top of asyncio.
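As a small illustration of the TCP support mentioned above, asyncio alone (no aiohttp) can run a TCP echo server and client in one script; the handler name and the `b'ping'` payload here are made up for the example.

```python
import asyncio

async def handle(reader, writer):
    # Echo one message back to the client
    data = await reader.read(100)
    writer.write(data)
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    # Start a TCP server on an OS-assigned free port (port 0)
    server = await asyncio.start_server(handle, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    # Connect as a client and round-trip one message
    reader, writer = await asyncio.open_connection('127.0.0.1', port)
    writer.write(b'ping')
    await writer.drain()
    reply = await reader.read(100)
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return reply

reply = asyncio.run(main())
print(reply)  # b'ping'
```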
aiohttp usage:
- Import the module: import aiohttp
- x = aiohttp.ClientSession() <==> the requests module:
  - x.get() <==> requests.get()
  - x.post() <==> requests.post()
- async with aiohttp.ClientSession() as xxx:
  - async enables the asynchronous operation
  - with closes the session automatically when the block ends
- async with xxx.get(url) as res:
  - issues the GET request as a coroutine
  - res.content.read() <==> res.content
  - res.text() <==> res.text
- Example:

```python
# Async crawler exercise -- download images asynchronously
import asyncio
import aiohttp

urls = [
    "http://kr.shanghai-jiuxin.com/file/2021/1104/d74a24d86d8b4a76ee39e90edaf99018.jpg",
    "http://kr.shanghai-jiuxin.com/file/2021/1104/d9a5dfe5771fcdd9ddb128f969d48956.jpg",
    "http://kr.shanghai-jiuxin.com/file/2020/0810/cf05e8310aceaa43a01530b84eebd380.jpg",
]

async def aiodownload(link):
    # Send the request, fetch the image bytes, save them to a file
    name = link.rsplit("/", 1)[1]
    async with aiohttp.ClientSession() as session:
        async with session.get(link) as res:
            with open('images/' + name, 'wb') as w:
                # Reading the body is asynchronous, so it must be awaited
                w.write(await res.content.read())
    print(f"{name} downloaded")

async def main():
    # Wrap coroutines in Tasks: asyncio.wait no longer accepts bare coroutines
    tasks = [asyncio.create_task(aiodownload(link)) for link in urls]
    await asyncio.wait(tasks)

asyncio.run(main())  # in a Jupyter notebook, use `await main()` instead
```
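A common alternative to asyncio.wait is asyncio.gather, which accepts coroutines directly and returns their results in input order. A minimal, network-free sketch of the same fan-out pattern (`download_one` is a hypothetical stand-in for the real aiohttp download, and the example URLs are made up):

```python
import asyncio

async def download_one(link):
    # Placeholder for the real aiohttp download; just simulates I/O
    await asyncio.sleep(0.1)
    return link.rsplit("/", 1)[1]

async def main():
    links = ["http://example.com/a.jpg", "http://example.com/b.jpg"]
    # gather runs all coroutines concurrently and preserves input order
    names = await asyncio.gather(*(download_one(l) for l in links))
    return names

names = asyncio.run(main())
print(names)  # ['a.jpg', 'b.jpg']
```

Unlike asyncio.wait, which returns (done, pending) sets of Tasks, gather hands back the return values themselves, which is usually what a crawler wants.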