Async introduction:
Async: when the program hits an I/O operation it does not block and wait; it switches to other work in the meantime - the basic idea
Coroutine & async crawler structure:

```python
import asyncio

async def xxx():
    pass

async def main():
    pass

if __name__ == '__main__':
    asyncio.run(main())
```
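The skeleton above can be fleshed out into a runnable sketch (the function names and the simulated delay are illustrative, not from the original notes):

```python
import asyncio

async def fetch_page(n):
    # Stand-in for an I/O-bound step; a real crawler would await an HTTP request here
    await asyncio.sleep(0.01)
    return f"page-{n}"

async def main():
    # Schedule three coroutines concurrently and collect their results in order
    return await asyncio.gather(*(fetch_page(i) for i in range(3)))

if __name__ == '__main__':
    print(asyncio.run(main()))
```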
aiohttp overview:
requests.get() is synchronous code, whereas aiohttp is a powerful library for asynchronous crawlers. asyncio implements the TCP, UDP, SSL, etc. protocols; aiohttp is an HTTP framework built on top of asyncio.
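The payoff of going async shows up when several requests are in flight at once. A minimal sketch, using asyncio.sleep as a stand-in for network latency (no real HTTP and no aiohttp involved): three 0.1 s "requests" overlap, so the total wall time is close to 0.1 s rather than the 0.3 s a synchronous loop would take.

```python
import asyncio
import time

async def fake_request(delay):
    # Stand-in for an aiohttp request: yields control while "waiting on the network"
    await asyncio.sleep(delay)
    return delay

async def crawl():
    # The three coroutines run concurrently on one event loop
    return await asyncio.gather(fake_request(0.1), fake_request(0.1), fake_request(0.1))

start = time.perf_counter()
results = asyncio.run(crawl())
elapsed = time.perf_counter() - start
```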
aiohttp usage:
- Import the module:

```python
import aiohttp

session = aiohttp.ClientSession()  # counterpart of the requests module
session.get()                      # counterpart of requests.get()
session.post()                     # counterpart of requests.post()
```

- Use `async with aiohttp.ClientSession() as session:`
  - `async` makes the operation asynchronous
  - `with` closes the session automatically when the block exits
- Use `async with session.get(url) as res:` - fetch the URL inside a coroutine
  - `res.content.read()` is the counterpart of requests' `res.content`
  - `res.text()` is the counterpart of requests' `res.text`
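The auto-closing behavior of `async with` is not aiohttp magic: any object implementing `__aenter__`/`__aexit__` gets it. A minimal, aiohttp-free sketch of the same mechanics (the `FakeSession` class is a made-up illustration, not part of aiohttp):

```python
import asyncio

class FakeSession:
    # Mimics the ClientSession pattern: open on entry, closed on exit
    def __init__(self):
        self.closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.closed = True  # cleanup runs even if the body raises

async def demo():
    async with FakeSession() as session:
        in_block = session.closed       # False while the block is open
    return in_block, session.closed     # True once the block has exited

flags = asyncio.run(demo())
```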
- Example:

```python
# Async crawler exercise -- download images asynchronously
import asyncio
import aiohttp

urls = [
    "http://kr.shanghai-jiuxin.com/file/2021/1104/d74a24d86d8b4a76ee39e90edaf99018.jpg",
    "http://kr.shanghai-jiuxin.com/file/2021/1104/d9a5dfe5771fcdd9ddb128f969d48956.jpg",
    "http://kr.shanghai-jiuxin.com/file/2020/0810/cf05e8310aceaa43a01530b84eebd380.jpg",
]

async def aiodownload(link):
    # Send the request, fetch the image bytes, save them to a file
    name = link.rsplit("/", 1)[1]
    async with aiohttp.ClientSession() as session:
        async with session.get(link) as res:
            with open('images/' + name, 'wb') as w:
                # Reading the body is asynchronous, so it must be awaited
                w.write(await res.content.read())
            print(f"{name} downloaded")

async def main():
    # Wrap coroutines in Tasks: asyncio.wait() no longer accepts bare coroutines (Python 3.11+)
    tasks = [asyncio.create_task(aiodownload(link)) for link in urls]
    await asyncio.wait(tasks)

asyncio.run(main())
# In Jupyter an event loop is already running, so call `await main()` instead
```
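Real sites often throttle crawlers, so it is common to cap how many downloads run at once. A hedged sketch of that pattern using asyncio.Semaphore, with asyncio.sleep standing in for the actual request and file write (the names and the limit of 2 are illustrative):

```python
import asyncio

async def download(sem, name):
    # At most 2 downloads run concurrently; the rest wait at the semaphore
    async with sem:
        await asyncio.sleep(0.01)  # placeholder for session.get(...) + saving the file
        return name

async def main():
    sem = asyncio.Semaphore(2)
    names = ['a.jpg', 'b.jpg', 'c.jpg']
    # gather preserves input order in its result list
    return await asyncio.gather(*(download(sem, n) for n in names))

done = asyncio.run(main())
```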