Using HTTPX
Some websites are served over HTTP/2. The urllib and requests modules only speak HTTP/1.1, so they cannot fetch data from such sites; in that case, use httpx instead.
Official documentation
Basic usage
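httpx is very similar to requests, but it provides a Client class that lets you choose the protocol. It is therefore best to instantiate a Client object first and use it for all subsequent requests.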
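import httpx
# HTTP/2 must be enabled explicitly (requires: pip install "httpx[http2]"); the default is HTTP/1.1
client = httpx.Client(http2=True)
# url and headers are assumed to be defined beforehand
response = client.get(url, headers=headers)
print(response.text)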
The Client object
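Client is similar to the Session object in requests. You can think of it as holding the crawler's session (its connections) open; it can be closed at any time, much like an open file.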
Example code:
with httpx.Client(http2=True) as client:
    response = client.get(url, headers=headers)
Asynchronous requests
httpx also supports asynchronous requests through AsyncClient, which works with Python's async/await syntax. It is written like this:
import httpx
import asyncio

async def fetch(url):
    async with httpx.AsyncClient(http2=True) as client:
        response = await client.get(url)  # the request must be awaited
        print(response.text)

url = ''  # fill in the target URL
asyncio.run(fetch(url))