
Advanced Crawling: the aiohttp Asynchronous Module

Async overview:

  • Asynchronous: when the program reaches an I/O operation it does not sit and wait for it; it switches to other work until the I/O completes.
  • Basic coroutine & async crawler skeleton (a runnable sketch follows it below):
async def xxx():
    pass

async def main():
    pass

if __name__ == '__main__':
    asyncio.run(main())
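To make the skeleton concrete, here is a minimal runnable sketch (the coroutine name, delays, and printed messages are illustrative, not from the original article): because awaiting asyncio.sleep hands control back to the event loop, the two jobs run concurrently and the total run time is roughly the longest single delay rather than the sum.

import asyncio

async def job(name, delay):
    # awaiting asyncio.sleep suspends this coroutine without blocking the thread
    await asyncio.sleep(delay)
    print(f"{name} finished after {delay}s")

async def main():
    # schedule both coroutines concurrently; total wall time is about 2s, not 3s
    await asyncio.gather(job("task-1", 1), job("task-2", 2))

if __name__ == '__main__':
    asyncio.run(main())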

aiohttp in brief:

  • requests.get() is synchronous code; aiohttp is a powerful asynchronous alternative for crawlers.
  • asyncio provides support for protocols such as TCP, UDP and SSL; aiohttp is an HTTP framework built on top of asyncio.
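To make the contrast concrete, here is a minimal sketch (the URL and function names are placeholders, not from the original article): the requests call blocks the whole thread until the response arrives, while the aiohttp version only suspends the current coroutine, so the event loop can keep running other tasks in the meantime.

import asyncio

import aiohttp
import requests

URL = "https://example.com"  # placeholder URL

def fetch_sync():
    # synchronous: the thread is blocked until the full response is received
    return requests.get(URL).text

async def fetch_async():
    # asynchronous: each await suspends this coroutine, not the whole thread
    async with aiohttp.ClientSession() as session:
        async with session.get(URL) as res:
            return await res.text()

if __name__ == '__main__':
    print(len(fetch_sync()))
    print(len(asyncio.run(fetch_async())))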

Using aiohttp:

  • Import the module: import aiohttp
  • x = aiohttp.ClientSession() <==> a requests session
    • x.get() <==> requests.get()
    • x.post() <==> requests.post()
  • async with aiohttp.ClientSession() as xxx:
    • async makes the operation asynchronous
    • with closes the session automatically when the block exits
  • async with xxx.get(url) as res:
    • the GET request is issued inside a coroutine
    • await res.content.read() <==> res.content in requests
    • await res.text() <==> res.text in requests (see the short text() sketch after the full example below)
  • Example:
# Async crawler exercise -- download images asynchronously
import asyncio
import os

import aiohttp

urls = [
    "http://kr.shanghai-jiuxin.com/file/2021/1104/d74a24d86d8b4a76ee39e90edaf99018.jpg",
    "http://kr.shanghai-jiuxin.com/file/2021/1104/d9a5dfe5771fcdd9ddb128f969d48956.jpg",
    "http://kr.shanghai-jiuxin.com/file/2020/0810/cf05e8310aceaa43a01530b84eebd380.jpg"
]

async def aiodownload(link):
    # send the request, read the image bytes, and save them to a file
    name = link.rsplit("/", 1)[1]
    async with aiohttp.ClientSession() as session:
        async with session.get(link) as res:
            with open('images/' + name, 'wb') as w:
                # reading the body is asynchronous, so it must be awaited
                w.write(await res.content.read())
    print(f"{name} downloaded")

async def main():
    tasks = []
    for link in urls:
        # wrap each coroutine in a Task; asyncio.wait no longer accepts bare coroutines
        tasks.append(asyncio.create_task(aiodownload(link)))
    await asyncio.wait(tasks)

if __name__ == '__main__':
    os.makedirs('images', exist_ok=True)  # make sure the output folder exists
    asyncio.run(main())
    # In a Jupyter notebook, use `await main()` instead of asyncio.run()

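The bullet list above also maps res.text() to requests' res.text, but the example only exercises res.content.read(). Here is a minimal sketch of the text path, assuming an arbitrary HTML page as the target (the URL is a placeholder):

import asyncio

import aiohttp

async def fetch_html(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as res:
            # unlike requests' res.text attribute, aiohttp's res.text() is a coroutine
            return await res.text()

if __name__ == '__main__':
    html = asyncio.run(fetch_html("https://example.com"))  # placeholder URL
    print(html[:200])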