Using httpx

Web Scraping Notes

Publish Date: 2022-01-18


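Some websites are served only over HTTP/2. The urllib and requests modules speak only HTTP/1.1 and cannot fetch data from such sites, so httpx is needed instead. HTTP/2 support in httpx is an optional extra and has to be installed separately (pip install "httpx[http2]").
Official documentation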

Basic usage

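httpx works much like requests, but it also provides a Client class on which the protocol can be configured, so it is best to instantiate a Client first and use it for all subsequent requests.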

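import httpx

url = ''      # target URL goes here
headers = {}  # request headers, e.g. a User-Agent

# HTTP/2 has to be enabled explicitly; the default is HTTP/1.1
client = httpx.Client(http2=True)
response = client.get(url, headers=headers)
print(response.text)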

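The Client object

A Client plays the same role as a requests Session: it keeps connections and shared settings (such as headers and cookies) alive across requests for the lifetime of the crawl. It can be closed at any time, and, like an open file, it is best used in a with block so that it is closed automatically.

Example: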

with httpx.Client(http2=True) as client:
    response = client.get(url, headers=headers)

Asynchronous requests

httpx also supports asynchronous requests through AsyncClient, which works with Python's async/await syntax:

import httpx
import asyncio

async def fetch(url):
    async with httpx.AsyncClient(http2=True) as client:
        # client.get() is a coroutine here and must be awaited
        response = await client.get(url)
        print(response.text)

url = ''
asyncio.run(fetch(url))

Dovahkiin

https://the-tarnished.github.io/2022/01/18/httpx%E7%9A%84%E4%BD%BF%E7%94%A8/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: Dovahkiin.

python spider
