Python requests Module: Basic Usage Examples and Advanced Applications (Automatic Login, Fetching Page Source) Explained in Detail

2023-08-01 12:37:04 375

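1. About the Python requests module

requests is an HTTP library released under the Apache2 license.

It is written in Python.

It is more concise than the urllib2 module.

Requests supports HTTP keep-alive and connection pooling, cookie-based session persistence, file uploads, automatic decoding of response content, and automatic encoding of internationalized URLs and POST data.

It is a high-level wrapper around lower-level HTTP machinery (urllib3 and the standard library) that makes network requests in Python far more user-friendly. With Requests you can easily reproduce most of what a browser does at the HTTP level.

Modern, international, and friendly.

requests handles persistent connections (keep-alive) automatically; connections are reused when requests go through the same Session.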

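2. Getting started with the Python requests module

1) Import the module

import requests

2) Sending requests is simple

Example code: fetch a web page (a personal GitHub page)

import requests
r = requests.get('https://github.com/Ranxf')  # the most basic GET request, without parameters
r1 = requests.get(url='http://dict.baidu.com/s', params={'wd': 'python'})  # GET request with parameters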

The requests module also provides the other HTTP methods:

requests.get('https://github.com/timeline.json')  # GET request
requests.post('http://httpbin.org/post')  # POST request
requests.put('http://httpbin.org/put')  # PUT request
requests.delete('http://httpbin.org/delete')  # DELETE request
requests.head('http://httpbin.org/get')  # HEAD request
requests.options('http://httpbin.org/get')  # OPTIONS request

3) Passing parameters in the URL

>>> url_params = {'key': 'value'}  # pass parameters as a dict; keys whose value is None are not added to the URL
>>> r = requests.get('yoururl', params=url_params)
>>> print(r.url)

yoururl?key=value
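A minimal sketch of how params ends up in the URL. It builds the request offline with requests.Request(...).prepare(), so nothing is sent over the network. The URL https://example.com/search and the parameter names are placeholders chosen for illustration, not part of the original example.

```python
import requests

# Hypothetical URL and parameters, used only for illustration.
params = {'key': 'value', 'skipped': None}

# prepare() builds the final request without sending it.
prepared = requests.Request('GET', 'https://example.com/search', params=params).prepare()

# The key whose value is None is left out of the query string.
print(prepared.url)
```

Preparing a request this way is a convenient check of what would be sent before pointing the code at a real server.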

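4) Response content

r.encoding  # the encoding currently used to decode r.text

r.encoding = 'utf-8'  # set the encoding

r.text  # the response body as a string, decoded with r.encoding (guessed from the response headers unless you set it)

r.content  # the response body as bytes; gzip and deflate transfer encodings are decoded for you automatically

r.headers  # the server's response headers as a dict-like object with case-insensitive keys; r.headers.get('name') returns None for a missing key, while r.headers['name'] raises KeyError

r.status_code  # the response status code

r.raw  # the raw response (the underlying urllib3 response object); pass stream=True to the request, then use r.raw.read()

r.ok  # True when the status code is below 400; it tells you the HTTP request did not fail, not that a login succeeded

# *Special methods* #

r.json()  # Requests' built-in JSON decoder; returns the parsed body and raises an exception if the body is not valid JSON

r.raise_for_status()  # raises an exception for failed requests (4xx or 5xx status codes)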
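The rules for r.headers, r.ok and r.raise_for_status() can be checked without a network connection. The sketch below builds a CaseInsensitiveDict and bare Response objects by hand, which is an illustration device only; in real code these objects come back from requests.get() and friends. The helper name check is hypothetical.

```python
import requests
from requests.structures import CaseInsensitiveDict

# r.headers is a CaseInsensitiveDict; build one by hand to show the lookup rules.
headers = CaseInsensitiveDict({'Content-Type': 'application/json'})
print(headers['content-type'])    # lookup ignores case
print(headers.get('X-Missing'))   # .get() gives None for a missing key


def check(status_code):
    """Return (ok, raised) for a bare Response carrying the given status code."""
    resp = requests.Response()
    resp.status_code = status_code
    try:
        resp.raise_for_status()
        raised = False
    except requests.HTTPError:
        raised = True
    return resp.ok, raised


# 2xx and 3xx codes count as ok; 4xx and 5xx codes raise HTTPError.
for code in (200, 302, 404, 500):
    print(code, check(code))
```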

Sending JSON with POST:

import requests
import json

r = requests.post('https://api.github.com/some/endpoint', data=json.dumps({'some': 'data'}))

print(r.json())
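Requests also accepts a json= keyword argument. The sketch below compares it with data=json.dumps(...) by preparing both requests offline; the endpoint https://example.com/api is a placeholder and is never contacted.

```python
import json
import requests

url = 'https://example.com/api'  # placeholder endpoint, never contacted here
payload = {'some': 'data'}

# data=json.dumps(...) sends the JSON text but leaves Content-Type unset.
manual = requests.Request('POST', url, data=json.dumps(payload)).prepare()

# json=... serializes the payload and sets Content-Type: application/json.
automatic = requests.Request('POST', url, json=payload).prepare()

print(manual.headers.get('Content-Type'))
print(automatic.headers.get('Content-Type'))
print(json.loads(automatic.body))
```

If the server checks the Content-Type header, json= is usually the simpler choice; with data= you would set the header yourself.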

5) Custom headers and cookies

header = {'user-agent': 'my-app/0.0.1'}
cookie = {'key': 'value'}
r = requests.get('yoururl', headers=header, cookies=cookie)  # requests.post accepts the same keyword arguments
data = {'some': 'data'}
headers = {'content-type': 'application/json',
           'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0'}

# json=data serializes the dict so that the body matches the JSON content type
r = requests.post('https://api.github.com/some/endpoint', json=data, headers=headers)
print(r.text)

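6) Response status code

Calling a requests method returns a Response object that stores what the server sent back, such as the r.text and r.status_code attributes already mentioned above.

Getting the response body as text: when you access r.text, the body is decoded with the response's text encoding, and you can change r.encoding so that r.text is decoded with an encoding of your choice.

r = requests.get('http://www.itwhy.org')
print(r.text, '\n{}\n'.format('*' * 79), r.encoding)
r.encoding = 'GBK'
print(r.text, '\n{}\n'.format('*' * 79), r.encoding)

Example code:

import requests

r = requests.get('https://github.com/Ranxf')  # the most basic GET request, without parameters
print(r.status_code)  # the returned status code
r1 = requests.get(url='http://dict.baidu.com/s', params={'wd': 'python'})  # GET request with parameters
print(r1.url)
print(r1.text)  # print the decoded response body

Output:

/usr/bin/python3.5 /home/rxf/python3_1000/1000/python3_server/python3_requests/demo1.py

200

http://dict.baidu.com/s?wd=python

…………

Process finished with exit code 0

r.status_code  # if the request failed (4xx or 5xx), r.raise_for_status() raises an exception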

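7) Response

r.headers  # the response headers, as a dict-like object

r.request.headers  # the headers that were sent to the server

r.cookies  # the cookies returned by the server

r.history  # the redirect history; pass allow_redirects=False in the request to disable redirect following

8) Timeout

r = requests.get('url', timeout=1)  # timeout in seconds; it covers connecting and each wait for data from the server, not the total download time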

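9) Session objects, which persist certain parameters across requests

s = requests.Session()
s.auth = ('auth', 'passwd')
s.headers.update({'key': 'value'})  # update() keeps the session's default headers; assigning a new dict would replace them
r = s.get('url')
r1 = s.get('url1')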
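A small offline sketch of what a Session carries into each request. It uses Session.prepare_request() to merge the session's headers and cookies into a request without sending it. The header name X-Token, the cookie sessionid and the URL are made up for the example.

```python
import requests

s = requests.Session()
s.headers.update({'X-Token': 'abc'})   # hypothetical header kept for every request
s.cookies.set('sessionid', '123')      # hypothetical cookie stored on the session

# A per-request header is merged with the session-level headers.
req = requests.Request('GET', 'https://example.com/profile',
                       headers={'Accept': 'application/json'})
prepared = s.prepare_request(req)

print(prepared.headers['X-Token'])
print(prepared.headers['Accept'])
print(prepared.headers['Cookie'])
```

The same merging happens inside s.get() and s.post(), which is why a login cookie obtained on one request is sent automatically on the next.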

10) Proxies

proxies = {'http': 'ip1', 'https': 'ip2'}
requests.get('url', proxies=proxies)
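The keys of the proxies dict are matched against the URL being requested. As a sketch, the select_proxy helper from requests.utils (the function Requests uses for this lookup) can show which proxy would be picked. The proxy addresses are the sample values used elsewhere in this article, and example.com is a placeholder.

```python
from requests.utils import select_proxy

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.100:4444',
    'http://10.20.1.128': 'http://10.10.1.10:5323',  # proxy for one specific host
}

# A scheme://host key is more specific and is checked before the bare scheme key.
for url in ('http://example.com/', 'https://example.com/', 'http://10.20.1.128/page'):
    print(url, '->', select_proxy(url, proxies))
```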

Summary:

import json
import requests

# HTTP request types
# GET
r = requests.get('https://github.com/timeline.json')
# POST
r = requests.post("http://m.ctrip.com/post")
# PUT
r = requests.put("http://m.ctrip.com/put")
# DELETE
r = requests.delete("http://m.ctrip.com/delete")
# HEAD
r = requests.head("http://m.ctrip.com/head")
# OPTIONS
r = requests.options("http://m.ctrip.com/get")

# Getting the response content
print(r.content)  # shown as bytes; Chinese text appears as escaped byte values
print(r.text)  # shown as text

# Passing URL parameters
payload = {'keyword': '香港', 'salecityid': '2'}
r = requests.get("http://m.ctrip.com/webapp/tourvisa/visa_list", params=payload)
print(r.url)  # e.g. http://m.ctrip.com/webapp/tourvisa/visa_list?salecityid=2&keyword=香港 (non-ASCII characters are percent-encoded in r.url)

# Getting/changing the page encoding
r = requests.get('https://github.com/timeline.json')
print(r.encoding)
r.encoding = 'utf-8'  # override the encoding used by r.text

# JSON handling
r = requests.get('https://github.com/timeline.json')
print(r.json())  # r.json() is built in; no separate import json is needed for it

# Custom request headers
url = 'http://m.ctrip.com'
headers = {'User-Agent': 'Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 4 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 Mobile Safari/535.19'}
r = requests.post(url, headers=headers)
print(r.request.headers)

# Complex POST request
url = 'http://m.ctrip.com'
payload = {'some': 'data'}
r = requests.post(url, data=json.dumps(payload))  # to send the payload as a JSON string instead of form fields, serialize the dict with json.dumps first

# POST a multipart-encoded file
url = 'http://m.ctrip.com'
files = {'file': open('report.xls', 'rb')}
r = requests.post(url, files=files)

# Response status code
r = requests.get('http://m.ctrip.com')
print(r.status_code)

# Response headers
r = requests.get('http://m.ctrip.com')
print(r.headers)
print(r.headers['Content-Type'])
print(r.headers.get('content-type'))  # two ways to read a single response header

# Cookies
url = 'http://example.com/some/cookie/setting/url'
r = requests.get(url)
r.cookies['example_cookie_name']  # read cookies

url = 'http://m.ctrip.com/cookies'
cookies = dict(cookies_are='working')
r = requests.get(url, cookies=cookies)  # send cookies

# Setting a timeout
r = requests.get('http://m.ctrip.com', timeout=0.001)  # a deliberately tiny value, so this call is expected to raise a timeout exception

# Using a proxy
proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.100:4444",
}
r = requests.get('http://m.ctrip.com', proxies=proxies)


# If the proxy requires a username and password, write it like this:
proxies = {
    "http": "http://user:pass@10.10.1.10:3128/",
}


3. Example code

GET requests

# 1. Without parameters

import requests

ret = requests.get('https://github.com/timeline.json')

print(ret.url)
print(ret.text)


# 2. With parameters

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.get("http://httpbin.org/get", params=payload)

print(ret.url)
print(ret.text)

POST requests

# 1. Basic POST

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.post("http://httpbin.org/post", data=payload)

print(ret.text)


# 2. Sending request headers and data

import requests
import json

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}

ret = requests.post(url, data=json.dumps(payload), headers=headers)

print(ret.text)
print(ret.cookies)

Request parameters

def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How long to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout)` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Set to True if POST/PUT/DELETE redirect following is allowed.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) whether the SSL cert will be verified. A CA_BUNDLE path can also be provided. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'http://httpbin.org/get')
    """

Parameter example code

def param_method_url():
    # requests.request(method='get', url='http://127.0.0.1:8000/test/')
    # requests.request(method='post', url='http://127.0.0.1:8000/test/')
    pass


def param_param():
    # - can be a dict
    # - can be a string
    # - can be bytes (ASCII characters only)

    # requests.request(method='get',
    #                  url='http://127.0.0.1:8000/test/',
    #                  params={'k1': 'v1', 'k2': '水电费'})

    # requests.request(method='get',
    #                  url='http://127.0.0.1:8000/test/',
    #                  params="k1=v1&k2=水电费&k3=v3&k3=vv3")

    # requests.request(method='get',
    #                  url='http://127.0.0.1:8000/test/',
    #                  params=bytes("k1=v1&k2=k2&k3=v3&k3=vv3", encoding='utf8'))

    # Wrong: bytes that contain non-ASCII characters
    # requests.request(method='get',
    #                  url='http://127.0.0.1:8000/test/',
    #                  params=bytes("k1=v1&k2=水电费&k3=v3&k3=vv3", encoding='utf8'))
    pass


def param_data():
    # can be a dict
    # can be a string
    # can be bytes
    # can be a file object

    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  data={'k1': 'v1', 'k2': '水电费'})

    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  data="k1=v1;k2=v2;k3=v3;k3=v4"
    #                  )

    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  data="k1=v1;k2=v2;k3=v3;k3=v4",
    #                  headers={'Content-Type': 'application/x-www-form-urlencoded'}
    #                  )

    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  data=open('data_file.py', mode='r', encoding='utf-8'),  # file content: k1=v1;k2=v2;k3=v3;k3=v4
    #                  headers={'Content-Type': 'application/x-www-form-urlencoded'}
    #                  )
    pass


def param_json():
    # Serializes the value passed as json= into a string, like json.dumps(...),
    # sends it in the request body, and sets the header Content-Type: application/json
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     json={'k1': 'v1', 'k2': '水电费'})


def param_headers():
    # Send request headers to the server; an explicit Content-Type header
    # takes precedence over the one implied by json=
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     json={'k1': 'v1', 'k2': '水电费'},
                     headers={'Content-Type': 'application/x-www-form-urlencoded'}
                     )


def param_cookies():
    # Send cookies to the server
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     data={'k1': 'v1', 'k2': 'v2'},
                     cookies={'cook1': 'value1'},
                     )
    # A CookieJar can be used as well (the dict form is a wrapper built on top of it)
    from http.cookiejar import CookieJar
    from http.cookiejar import Cookie

    obj = CookieJar()
    obj.set_cookie(Cookie(version=0, name='c1', value='v1', port=None, domain='', path='/', secure=False, expires=None,
                          discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False,
                          port_specified=False, domain_specified=False, domain_initial_dot=False, path_specified=False)
                   )
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     data={'k1': 'v1', 'k2': 'v2'},
                     cookies=obj)


def param_files():
    # Send a file
    # file_dict = {
    #     'f1': open('readme', 'rb')
    # }
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  files=file_dict)

    # Send a file with a custom file name
    # file_dict = {
    #     'f1': ('test.txt', open('readme', 'rb'))
    # }
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  files=file_dict)

    # Send a string as the file content, with a custom file name
    # file_dict = {
    #     'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf")
    # }
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  files=file_dict)

    # Send a string as the file content, with a custom file name, content type and extra headers
    # file_dict = {
    #     'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf", 'application/text', {'k1': '0'})
    # }
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  files=file_dict)

    pass


def param_auth():
    from requests.auth import HTTPBasicAuth, HTTPDigestAuth

    ret = requests.get('https://api.github.com/user', auth=HTTPBasicAuth('wupeiqi', 'sdfasdfasdf'))
    print(ret.text)

    # ret = requests.get('http://192.168.1.1',
    #                    auth=HTTPBasicAuth('admin', 'admin'))
    # ret.encoding = 'gbk'
    # print(ret.text)

    # ret = requests.get('http://httpbin.org/digest-auth/auth/user/pass', auth=HTTPDigestAuth('user', 'pass'))
    # print(ret)
    #


def param_timeout():
    # ret = requests.get('http://google.com/', timeout=1)
    # print(ret)

    # ret = requests.get('http://google.com/', timeout=(5, 1))  # (connect timeout, read timeout)
    # print(ret)
    pass


def param_allow_redirects():
    ret = requests.get('http://127.0.0.1:8000/test/', allow_redirects=False)
    print(ret.text)


def param_proxies():
    # proxies = {
    #     "http": "61.172.249.96:80",
    #     "https": "http://61.185.219.126:3128",
    # }

    # proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}

    # ret = requests.get("http://www.proxy360.cn/Proxy", proxies=proxies)
    # print(ret.headers)


    # from requests.auth import HTTPProxyAuth
    #
    # proxyDict = {
    #     'http': '77.75.105.165',
    #     'https': '77.75.105.165'
    # }
    # auth = HTTPProxyAuth('username', 'mypassword')
    #
    # r = requests.get("http://www.google.com", proxies=proxyDict, auth=auth)
    # print(r.text)

    pass


def param_stream():
    ret = requests.get('http://127.0.0.1:8000/test/', stream=True)
    print(ret.content)
    ret.close()

    # from contextlib import closing
    # with closing(requests.get('http://httpbin.org/get', stream=True)) as r:
    #     # Process the response here.
    #     for i in r.iter_content():
    #         print(i)


def requests_session():
    import requests

    session = requests.Session()

    ### 1. First visit any page to obtain a cookie

    i1 = session.get(url="http://dig.chouti.com/help/service")

    ### 2. Log in, carrying the cookie from the previous step; the backend authorizes the gpsd value in the cookie
    i2 = session.post(
        url="http://dig.chouti.com/login",
        data={
            'phone': "8615131255089",
            'password': "xxxxxx",
            'oneMonth': ""
        }
    )

    ### 3. Upvote; the session sends the authorized cookie automatically
    i3 = session.post(
        url="http://dig.chouti.com/link/vote?linksId=8589623",
    )
    print(i3.text)

JSON request:

#!/usr/bin/python3
import requests
import json


class url_request():
    def __init__(self):
        '''init'''


if __name__ == '__main__':
    headers = {'Content-Type': 'application/json'}
    payload = {'CountryName': '中国',
               'ProvinceName': '四川省',
               'L1CityName': 'chengdu',
               'L2CityName': 'yibing',
               'TownName': '',
               'Longitude': '107.33393',
               'Latitude': '33.157131',
               'Language': 'CN'}
    # The keyword argument is headers=; the dict is serialized so the body matches the JSON content type
    r = requests.post("http://www.xxxxxx.com/CityLocation/json/LBSLocateCity", headers=headers, data=json.dumps(payload))
    data = r.json()
    if r.status_code != 200:
        print('LBSLocateCity API Error ' + str(r.status_code))
    print(data['CityEntities'][0]['CityID'])  # print the value of a key in the returned JSON
    print(data['ResponseStatus']['Ack'])
    # Pretty-print the JSON with json.dumps (json.dump writes to a file object instead);
    # ensure_ascii must be False, otherwise Chinese text is shown as \u escapes
    print(json.dumps(data, indent=4, sort_keys=True, ensure_ascii=False))
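To show the effect of ensure_ascii on its own, here is a standard-library sketch. The dictionary is invented sample data shaped loosely like the response fields used above; it is not real API output.

```python
import json

# Invented sample data, shaped loosely like the response fields used above.
data = {'ResponseStatus': {'Ack': 'Success'},
        'CityEntities': [{'CityID': 1, 'CityName': '成都'}]}

# json.dumps returns a string; json.dump would need a file object to write to.
pretty = json.dumps(data, indent=4, sort_keys=True, ensure_ascii=False)
escaped = json.dumps(data, sort_keys=True)  # ensure_ascii defaults to True

print(pretty)
print(escaped)
```

Both strings parse back to the same dictionary; only the way non-ASCII characters are written differs.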

XML request:

#!/usr/bin/python3
import requests


class url_request():
    def __init__(self):
        """init"""


if __name__ == '__main__':
    headers = {'Content-type': 'text/xml'}
    # Only the text content of this payload is present here; the XML tags are missing
    XML = 'WeChatJSTicket.JobWS.Job.JobRefreshTicket,WeChatJSTicket.JobWSRUN1127.0.0.11false'
    url = 'http://jobws.push.mobile.xxxxxxxx.com/RefreshWeiXInTokenJob/RefreshService.asmx'
    r = requests.post(url=url, headers=headers, data=XML)  # the keyword argument is headers=
    data = r.text
    print(data)

Handling status errors

import requests

URL = 'http://ip.taobao.com/service/getIpInfo.php'  # Taobao IP address lookup API
try:
    r = requests.get(URL, params={'ip': '8.8.8.8'}, timeout=1)
    r.raise_for_status()  # raise an exception if the status code indicates failure (4xx or 5xx)
except requests.RequestException as e:
    print(e)
else:
    result = r.json()
    print(type(result), result, sep='\n')
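requests.RequestException is the base class of the library's exceptions, so a single except clause catches timeouts, connection failures and HTTP errors alike. If they need different treatment, the subclasses can be told apart. The sketch below assumes a hypothetical helper named describe and builds the exceptions by hand rather than triggering them over the network.

```python
import requests


def describe(exc):
    """Map an exception to a short label; the labels are arbitrary."""
    # Timeout is checked first because ConnectTimeout is also a ConnectionError.
    if isinstance(exc, requests.Timeout):
        return 'timeout'
    if isinstance(exc, requests.ConnectionError):
        return 'connection'
    if isinstance(exc, requests.HTTPError):
        return 'http'
    if isinstance(exc, requests.RequestException):
        return 'other requests error'
    return 'not a requests error'


for exc in (requests.ConnectTimeout(), requests.ReadTimeout(),
            requests.ConnectionError(), requests.HTTPError(),
            requests.TooManyRedirects(), ValueError()):
    print(type(exc).__name__, '->', describe(exc))
```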

Uploading files

Requests can also upload files, and the multipart encoding is handled automatically:

import requests

url = 'http://127.0.0.1:8080/upload'
files = {'file': open('/home/rxf/test.jpg', 'rb')}
# files = {'file': ('report.jpg', open('/home/lyb/sjzl.mpg', 'rb'))}  # set the file name explicitly

r = requests.post(url, files=files)
print(r.text)

Even more conveniently, Requests can upload a string as if it were a file:

import requests

url = 'http://127.0.0.1:8080/upload'
files = {'file': ('test.txt', b'Hello Requests.')}  # the file name must be set explicitly

r = requests.post(url, files=files)
print(r.text)
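To see what the files argument produces without running an upload server, the request can be prepared offline and inspected. This is only a sketch for looking at the generated multipart body; the URL is the one from the example above and is never contacted.

```python
import requests

# Same in-memory "file" as in the example above.
files = {'file': ('test.txt', b'Hello Requests.')}

# prepare() encodes the multipart body without sending anything.
prepared = requests.Request('POST', 'http://127.0.0.1:8080/upload', files=files).prepare()

print(prepared.headers['Content-Type'])          # multipart/form-data with a generated boundary
print(b'filename="test.txt"' in prepared.body)   # the explicit file name is part of the body
```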

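Authentication

Basic authentication (HTTP Basic Auth)

import requests
from requests.auth import HTTPBasicAuth

r = requests.get('https://httpbin.org/hidden-basic-auth/user/passwd', auth=HTTPBasicAuth('user', 'passwd'))
# r = requests.get('https://httpbin.org/hidden-basic-auth/user/passwd', auth=('user', 'passwd'))  # shorthand
print(r.json())

Another very popular form of HTTP authentication is digest authentication, which Requests also supports out of the box:

from requests.auth import HTTPDigestAuth

requests.get(URL, auth=HTTPDigestAuth('user', 'pass'))  # URL stands for the address of the protected resource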
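Basic authentication only adds an Authorization header that contains the base64-encoded user:password pair. A short offline sketch that confirms this with a prepared request; the URL https://example.com/protected is a placeholder.

```python
import base64
import requests

# The (user, password) tuple is shorthand for HTTPBasicAuth('user', 'passwd').
prepared = requests.Request('GET', 'https://example.com/protected',
                            auth=('user', 'passwd')).prepare()

# Build the same header value by hand for comparison.
expected = 'Basic ' + base64.b64encode(b'user:passwd').decode('ascii')

print(prepared.headers['Authorization'])
print(prepared.headers['Authorization'] == expected)
```

Because base64 is an encoding and not encryption, Basic authentication should only be sent over HTTPS.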

Cookies and Session objects

If a response contains cookies, you can access them quickly:

import requests

r = requests.get('http://www.google.com.hk/')
print(r.cookies['NID'])
print(tuple(r.cookies))

To send your own cookies to the server, use the cookies parameter:

import requests

url = 'http://httpbin.org/cookies'
cookies = {'testCookies_1': 'Hello_Python3', 'testCookies_2': 'Hello_Requests'}
# Cookie Version 0 specifies that spaces, square brackets, parentheses, equals signs, commas, double quotes, slashes, question marks, @, colons, semicolons and similar special characters cannot be used in cookie content.
r = requests.get(url, cookies=cookies)
print(r.json())

A Session object lets you persist certain parameters across requests. Most conveniently, cookies are kept across all requests made from the same Session instance, and this is handled automatically.

Here is a real example, a check-in script for Kuaipan:

import requests

headers = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Accept-Encoding': 'gzip, deflate, compress',
           'Accept-Language': 'en-us;q=0.5,en;q=0.3',
           'Cache-Control': 'max-age=0',
           'Connection': 'keep-alive',
           'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0'}

s = requests.Session()
s.headers.update(headers)
# s.auth = ('superuser', '123')
s.get('https://www.kuaipan.cn/account_login.htm')

_URL = 'http://www.kuaipan.cn/index.php'
s.post(_URL, params={'ac': 'account', 'op': 'login'},
       data={'username': '****@foxmail.com', 'userpwd': '********', 'isajax': 'yes'})
r = s.get(_URL, params={'ac': 'zone', 'op': 'taskdetail'})
print(r.json())
s.get(_URL, params={'ac': 'common', 'op': 'usersign'})

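Fetching page source with the requests module and saving it to a file

This is a basic file-saving operation, but a few points are worth noting:

1. Install the requests package by running pip install requests on the command line. Many people recommend requests, although the built-in urllib.request can also fetch page source.

2. Set the encoding argument of open to utf-8, otherwise the saved file may contain garbled characters.

3. Printing the fetched content directly in cmd tends to raise encoding errors, so save it to a file and inspect it there.

4. with open is the better style: the file is closed automatically when the block finishes.

Example code:

#!/usr/bin/python3
import requests

'''Fetch page source with the requests module and save it to a file'''
html = requests.get("http://www.baidu.com")
with open('test.txt', 'w', encoding='utf-8') as f:
    f.write(html.text)

'''Read a txt file one line at a time and save each line to another txt file'''
with open('testt.txt', 'w', encoding='utf-8') as ff:
    with open('test.txt', encoding="utf-8") as f:
        for line in f:
            ff.write(line)

Printing each line to the command line can raise encoding errors for Chinese text, so each line is written to another file instead to check that reading works. (Remember to specify the encoding when calling open.)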
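The save-and-copy pattern can be tried without any network access. The sketch below replaces html.text with a hard-coded sample string and works inside a temporary directory; both are assumptions made so that the example is self-contained.

```python
import os
import tempfile

# Stand-in for html.text; in the real script this comes from requests.get(...).text
text = '<html>\n<title>百度一下</title>\n</html>\n'

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'test.txt')
    dst = os.path.join(tmp, 'testt.txt')

    # Save the page source with an explicit encoding.
    with open(src, 'w', encoding='utf-8') as f:
        f.write(text)

    # Copy it line by line into a second file.
    with open(src, encoding='utf-8') as fin, open(dst, 'w', encoding='utf-8') as fout:
        for line in fin:
            fout.write(line)

    # Read the copy back to confirm nothing was garbled.
    with open(dst, encoding='utf-8') as f:
        copied = f.read()

print(copied == text)
```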

Python requests module automatic login example (Chouti):

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import requests


############### Approach 1 ##############
"""
### 1. First visit any page to obtain a cookie
i1 = requests.get(url="http://dig.chouti.com/help/service")
i1_cookies = i1.cookies.get_dict()

### 2. Log in, carrying the cookie from the previous step; the backend authorizes the gpsd value in the cookie
i2 = requests.post(
    url="http://dig.chouti.com/login",
    data={
        'phone': "8615131255089",
        'password': "xxooxxoo",
        'oneMonth': ""
    },
    cookies=i1_cookies
)

### 3. Upvote (only the authorized gpsd cookie needs to be sent)
gpsd = i1_cookies['gpsd']
i3 = requests.post(
    url="http://dig.chouti.com/link/vote?linksId=8589523",
    cookies={'gpsd': gpsd}
)

print(i3.text)
"""


############### Approach 2 ##############
"""
import requests

session = requests.Session()
i1 = session.get(url="http://dig.chouti.com/help/service")
i2 = session.post(
    url="http://dig.chouti.com/login",
    data={
        'phone': "8615131255089",
        'password': "xxooxxoo",
        'oneMonth': ""
    }
)
i3 = session.post(
    url="http://dig.chouti.com/link/vote?linksId=8589523"
)
print(i3.text)

"""

Python requests module: automatic GitHub login

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import requests
from bs4 import BeautifulSoup

############### Approach 1 ##############
#
# # 1. Visit the login page and get the authenticity_token
# i1 = requests.get('https://github.com/login')
# soup1 = BeautifulSoup(i1.text, features='lxml')
# tag = soup1.find(name='input', attrs={'name': 'authenticity_token'})
# authenticity_token = tag.get('value')
# c1 = i1.cookies.get_dict()
# i1.close()
#
# # 2. Send the authenticity_token together with the username and password to authenticate
# form_data = {
#     "authenticity_token": authenticity_token,
#     "utf8": "",
#     "commit": "Sign in",
#     "login": "wupeiqi@live.com",
#     'password': 'xxoo'
# }
#
# i2 = requests.post('https://github.com/session', data=form_data, cookies=c1)
# c2 = i2.cookies.get_dict()
# c1.update(c2)
# i3 = requests.get('https://github.com/settings/repositories', cookies=c1)
#
# soup3 = BeautifulSoup(i3.text, features='lxml')
# list_group = soup3.find(name='div', class_='listgroup')
#
# from bs4.element import Tag
#
# for child in list_group.children:
#     if isinstance(child, Tag):
#         project_tag = child.find(name='a', class_='mr-1')
#         size_tag = child.find(name='small')
#         temp = "Project: %s (%s); project path: %s" % (project_tag.get('href'), size_tag.string, project_tag.string,)
#         print(temp)



############### Approach 2 ##############
# session = requests.Session()
# # 1. Visit the login page and get the authenticity_token
# i1 = session.get('https://github.com/login')
# soup1 = BeautifulSoup(i1.text, features='lxml')
# tag = soup1.find(name='input', attrs={'name': 'authenticity_token'})
# authenticity_token = tag.get('value')
# c1 = i1.cookies.get_dict()
# i1.close()
#
# # 2. Send the authenticity_token together with the username and password to authenticate
# form_data = {
#     "authenticity_token": authenticity_token,
#     "utf8": "",
#     "commit": "Sign in",
#     "login": "wupeiqi@live.com",
#     'password': 'xxoo'
# }
#
# i2 = session.post('https://github.com/session', data=form_data)
# c2 = i2.cookies.get_dict()
# c1.update(c2)
# i3 = session.get('https://github.com/settings/repositories')
#
# soup3 = BeautifulSoup(i3.text, features='lxml')
# list_group = soup3.find(name='div', class_='listgroup')
#
# from bs4.element import Tag
#
# for child in list_group.children:
#     if isinstance(child, Tag):
#         project_tag = child.find(name='a', class_='mr-1')
#         size_tag = child.find(name='small')
#         temp = "Project: %s (%s); project path: %s" % (project_tag.get('href'), size_tag.string, project_tag.string,)
#         print(temp)

Python requests module: automatic Zhihu login

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import time

import requests
from bs4 import BeautifulSoup

session = requests.Session()

i1 = session.get(
    url='https://www.zhihu.com/#signin',
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
    }
)

soup1 = BeautifulSoup(i1.text, 'lxml')
xsrf_tag = soup1.find(name='input', attrs={'name': '_xsrf'})
xsrf = xsrf_tag.get('value')

current_time = time.time()
i2 = session.get(
    url='https://www.zhihu.com/captcha.gif',
    params={'r': current_time, 'type': 'login'},
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
    })

with open('zhihu.gif', 'wb') as f:
    f.write(i2.content)

captcha = input('Open zhihu.gif, read the captcha, and type it here: ')
form_data = {
    "_xsrf": xsrf,
    'password': 'xxooxxoo',
    "captcha": captcha,  # send the value the user typed, not the literal string 'captcha'
    'email': '424662508@qq.com'
}
i3 = session.post(
    url='https://www.zhihu.com/login/email',
    data=form_data,
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
    }
)

i4 = session.get(
    url='https://www.zhihu.com/settings/profile',
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
    }
)

soup4 = BeautifulSoup(i4.text, 'lxml')
tag = soup4.find(id='rename-section')
nick_name = tag.find('span', class_='name').string
print(nick_name)

Python requests module: automatic cnblogs login

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import re
import json
import base64

import rsa
import requests


def js_encrypt(text):
    b64der = 'MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCp0wHYbg/NOPO3nzMD3dndwS0MccuMeXCHgVlGOoYyFwLdS24Im2e7YyhB0wrUsyYf0/nhzCzBK8ZC9eCWqd0aHbdgOQT6CuFQBMjbyGYvlVYU2ZP7kG9Ft6YV6oc9ambuO7nPZh+bvXH0zDKfi02prknrScAKC0XhadTHT3Al0QIDAQAB'
    der = base64.standard_b64decode(b64der)

    pk = rsa.PublicKey.load_pkcs1_openssl_der(der)
    v1 = rsa.encrypt(bytes(text, 'utf8'), pk)
    value = base64.encodebytes(v1).replace(b'\n', b'')
    value = value.decode('utf8')

    return value


session = requests.Session()

i1 = session.get('https://passport.cnblogs.com/user/signin')
rep = re.compile("'VerificationToken':'(.*)'")
v = re.search(rep, i1.text)
verification_token = v.group(1)

form_data = {
    'input1': js_encrypt('wptawy'),
    'input2': js_encrypt('asdfasdf'),
    'remember': False
}

i2 = session.post(url='https://passport.cnblogs.com/user/signin',
                  data=json.dumps(form_data),
                  headers={
                      'Content-Type': 'application/json; charset=UTF-8',
                      'X-Requested-With': 'XMLHttpRequest',
                      'VerificationToken': verification_token}
                  )

i3 = session.get(url='https://i.cnblogs.com/EditDiary.aspx')

print(i3.text)

Python requests module: automatic Lagou login

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import re

import requests


# Step 1: visit the login page and get X_Anti_Forge_Token and X_Anti_Forge_Code
# 1. Request URL: https://passport.lagou.com/login/login.html
# 2. Request method: GET
# 3. Request headers:
#    User-agent
r1 = requests.get('https://passport.lagou.com/login/login.html',
                  headers={
                      'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
                  },
                  )

X_Anti_Forge_Token = re.findall("X_Anti_Forge_Token='(.*?)'", r1.text, re.S)[0]
X_Anti_Forge_Code = re.findall("X_Anti_Forge_Code='(.*?)'", r1.text, re.S)[0]
print(X_Anti_Forge_Token, X_Anti_Forge_Code)
# print(r1.cookies.get_dict())
# Step 2: log in
# 1. Request URL: https://passport.lagou.com/login/login.json
# 2. Request method: POST
# 3. Request headers:
#    cookie
#    User-agent
#    Referer: https://passport.lagou.com/login/login.html
#    X-Anit-Forge-Code: 53165984
#    X-Anit-Forge-Token: 3b6a2f62-80f0-428b-8efb-ef72fc100d78
#    X-Requested-With: XMLHttpRequest
# 4. Request body:
#    isValidate: true
#    username: 15131252215
#    password: ab18d270d7126ea65915c50288c22c0d
#    request_form_verifyCode: ''
#    submit: ''
r2 = requests.post(
    'https://passport.lagou.com/login/login.json',
    headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
        'Referer': 'https://passport.lagou.com/login/login.html',
        'X-Anit-Forge-Code': X_Anti_Forge_Code,
        'X-Anit-Forge-Token': X_Anti_Forge_Token,
        'X-Requested-With': 'XMLHttpRequest'
    },
    data={
        "isValidate": True,
        'username': '15131255089',
        'password': 'ab18d270d7126ea65915c50288c22c0d',
        'request_form_verifyCode': '',
        'submit': ''
    },
    cookies=r1.cookies.get_dict()
)
print(r2.text)
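Token extraction with re.findall depends on the exact text of the page, so it helps to test the pattern on a small sample first. The sketch below uses an invented one-line page fragment (not real Lagou output) and shows why the non-greedy (.*?) matters when several quoted values share a line. Allowing optional whitespace around = is an extra assumption, not something the original pattern does.

```python
import re

# Invented page fragment for illustration; the real page may be laid out differently.
page = "window.X_Anti_Forge_Token = 'abc-123'; window.X_Anti_Forge_Code = '456';"

# Non-greedy (.*?) stops at the first closing quote after the name.
token = re.findall(r"X_Anti_Forge_Token\s*=\s*'(.*?)'", page, re.S)[0]
code = re.findall(r"X_Anti_Forge_Code\s*=\s*'(.*?)'", page, re.S)[0]

# Greedy (.*) runs on to the last quote on the line and captures too much.
greedy = re.findall(r"X_Anti_Forge_Token\s*=\s*'(.*)'", page)[0]

print(token, code)
print(greedy)
```

Indexing with [0] raises IndexError when nothing matches, so a production script would check the list before using it.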
