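Python requests module: basic usage examples and advanced applications (automatic login, fetching page source) explained in detail
1. About the Python requests module
requests is an HTTP library released under the Apache2 license.
It is written in Python.
It is more concise than the urllib2 module.
Requests supports HTTP keep-alive and connection pooling, cookie-based sessions, file uploads, automatic decoding of response content, internationalized URLs, and automatic encoding of POST data.
It is a high-level wrapper around Python's built-in modules that makes network requests far more user-friendly; with Requests you can easily perform most of the HTTP operations a browser performs.
Modern, international, friendly.
requests implements persistent connections (keep-alive) automatically.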
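2. Getting started with the Python requests module
1) Import the module
import requests
2) Sending a request is concise
Example code: fetch a web page (a personal GitHub page)
import requests
r = requests.get('https://github.com/Ranxf')  # the most basic GET request, without parameters
r1 = requests.get(url='http://dict.baidu.com/s', params={'wd': 'python'})  # GET request with parameters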
The requests module also provides the other request methods:
requests.get('https://github.com/timeline.json')  # GET request
requests.post('http://httpbin.org/post')  # POST request
requests.put('http://httpbin.org/put')  # PUT request
requests.delete('http://httpbin.org/delete')  # DELETE request
requests.head('http://httpbin.org/get')  # HEAD request
requests.options('http://httpbin.org/get')  # OPTIONS request
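3) Passing parameters in the URL
>>> url_params = {'key': 'value'}  # pass parameters as a dict; keys whose value is None are not added to the URL
>>> r = requests.get('yoururl', params=url_params)  # 'yoururl' is a placeholder for a full URL
>>> print(r.url)
yoururl?key=value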
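To check how params ends up in the URL without sending anything, a request can be prepared locally. The sketch below assumes a placeholder host (http://example.com/search) and a hypothetical helper named build_url; it relies on requests.Request(...).prepare(), which builds the final URL the same way requests.get would.

```python
import requests


def build_url(base_url, params):
    """Return the URL requests would use for base_url plus params (no network I/O)."""
    prepared = requests.Request("GET", base_url, params=params).prepare()
    return prepared.url


# Keys whose value is None are dropped; a list value repeats the key.
print(build_url("http://example.com/search", {"key": "value", "unused": None}))
print(build_url("http://example.com/search", {"k3": ["v3", "vv3"]}))
```

Preparing a request this way is handy for debugging query strings before pointing the code at a real server.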
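4) Response content
r.encoding  # the encoding currently used to decode the body
r.encoding = 'utf-8'  # set the encoding
r.text  # the response body as a string, decoded with r.encoding (taken from the response headers by default)
r.content  # the response body as bytes; gzip and deflate compression are decoded for you automatically
r.headers  # the server's response headers as a dict-like object with case-insensitive keys; r.headers.get(key) returns None if the key is missing
r.status_code  # response status code
r.raw  # the raw response, i.e. the underlying urllib3 response object; pass stream=True to the request and read it with r.raw.read()
r.ok  # boolean, True when the status code is below 400, so it can serve as a quick success check (for example after a login request)
# *Special methods* #
r.json()  # Requests' built-in JSON decoder; returns the parsed body and raises an exception if the body is not valid JSON
r.raise_for_status()  # raises an exception for a failed request (4xx or 5xx response)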
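The attributes listed above can be tried out without an external site. This is a minimal sketch that starts a throwaway server on 127.0.0.1 using the standard library; the handler DemoHandler, the paths /ok and /missing, and the helper inspect_responses are all made up for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: /ok returns JSON, every other path returns 404."""

    def do_GET(self):
        if self.path == "/ok":
            status, body = 200, json.dumps({"some": "data"}).encode("utf-8")
            content_type = "application/json; charset=utf-8"
        else:
            status, body = 404, b"not found"
            content_type = "text/plain; charset=utf-8"
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def inspect_responses():
    """Start a local server and collect a few Response attributes."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    session = requests.Session()
    session.trust_env = False  # ignore proxy environment variables for localhost
    try:
        good = session.get(base + "/ok", timeout=5)
        bad = session.get(base + "/missing", timeout=5)
        try:
            bad.raise_for_status()
            raised = False
        except requests.HTTPError:
            raised = True
        return {
            "good_status": good.status_code,
            "good_json": good.json(),
            "content_type": good.headers["content-type"],  # case-insensitive key
            "missing_header": good.headers.get("x-not-there"),
            "bad_status": bad.status_code,
            "bad_ok": bad.ok,
            "raised": raised,
        }
    finally:
        session.close()
        server.shutdown()
        server.server_close()


result = inspect_responses()
print(result)
```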
Sending JSON with POST:
import requests
import json
r = requests.post('https://api.github.com/some/endpoint', data=json.dumps({'some': 'data'}))
print(r.json())
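How the request body and Content-Type header differ between data= and json= can be inspected locally. The sketch below prepares three POST requests without sending them; the helper name describe is hypothetical, and the endpoint is only used as a placeholder URL.

```python
import json

import requests

URL = "https://api.github.com/some/endpoint"  # placeholder endpoint; never contacted here
PAYLOAD = {"some": "data"}


def describe(**kwargs):
    """Prepare a POST locally and report its Content-Type header and body."""
    prepared = requests.Request("POST", URL, **kwargs).prepare()
    return prepared.headers.get("Content-Type"), prepared.body


form_type, form_body = describe(data=PAYLOAD)            # dict: form-encoded
raw_type, raw_body = describe(data=json.dumps(PAYLOAD))  # str: sent as-is, no Content-Type set
json_type, json_body = describe(json=PAYLOAD)            # json=: serialized, JSON Content-Type

print(form_type, form_body)
print(raw_type, raw_body)
print(json_type, json_body)
```

When a server insists on a JSON Content-Type, either pass json= or keep data=json.dumps(...) and set the header yourself.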
5) Custom headers and cookies
header = {'user-agent': 'my-app/0.0.1'}
cookie = {'key': 'value'}
r = requests.get('yoururl', headers=header, cookies=cookie)  # requests.post accepts the same arguments
data = {'some': 'data'}
headers = {'content-type': 'application/json',
           'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0'}
r = requests.post('https://api.github.com/some/endpoint', data=json.dumps(data), headers=headers)  # serialize the dict so the body matches the JSON content-type
print(r.text)
6) Response status code
Calling a requests method returns a Response object that stores what the server sent back, such as the r.text and r.status_code already mentioned above.
Getting the response body as text: when you access r.text, the body is decoded with the response's text encoding, and you can change r.encoding to make r.text decode with an encoding of your choice.
r = requests.get('http://www.itwhy.org')
print(r.text, '\n{}\n'.format('*' * 79), r.encoding)
r.encoding = 'GBK'
print(r.text, '\n{}\n'.format('*' * 79), r.encoding)
Example code:
import requests
r = requests.get('https://github.com/Ranxf')  # the most basic GET request, without parameters
print(r.status_code)  # get the status code
r1 = requests.get(url='http://dict.baidu.com/s', params={'wd': 'python'})  # GET request with parameters
print(r1.url)
print(r1.text)  # print the decoded response body
Output:
/usr/bin/python3.5 /home/rxf/python3_1000/1000/python3_server/python3_requests/demo1.py
200
http://dict.baidu.com/s?wd=python
…………
Process finished with exit code 0
r.status_code  # for a 4xx or 5xx status, r.raise_for_status() raises an exception
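The effect of changing r.encoding can be reproduced offline. In this sketch a made-up local handler (GbkHandler) serves GBK bytes with a text/html Content-Type that declares no charset; in that case Requests falls back to ISO-8859-1, so r.text is garbled until the encoding is set to GBK.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAGE_TEXT = "你好"  # sample text, sent by the server as GBK bytes


class GbkHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves GBK bytes without declaring a charset."""

    def do_GET(self):
        body = PAGE_TEXT.encode("gbk")
        self.send_response(200)
        self.send_header("Content-Type", "text/html")  # no charset on purpose
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_with_and_without_gbk():
    """Return (encoding guessed from headers, text before the fix, text after)."""
    server = HTTPServer(("127.0.0.1", 0), GbkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    session = requests.Session()
    session.trust_env = False  # ignore proxy environment variables for localhost
    try:
        r = session.get("http://127.0.0.1:%d/" % server.server_port, timeout=5)
        guessed = r.encoding
        garbled = r.text          # decoded with the guessed encoding
        r.encoding = "GBK"        # r.text is decoded again on the next access
        return guessed, garbled, r.text
    finally:
        session.close()
        server.shutdown()
        server.server_close()


guessed, garbled, fixed = fetch_with_and_without_gbk()
print(guessed, repr(garbled), fixed)
```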
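7) Response
r.headers  # the response headers, as a dict-like object
r.request.headers  # the headers that were sent to the server
r.cookies  # the cookies returned by the server
r.history  # the redirect history; pass allow_redirects=False in the request to disable redirects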
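8) Timeout
r = requests.get('url', timeout=1)  # timeout in seconds; a single value applies to both the connect and the read timeout, not to the total download time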
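A read timeout can be observed against a deliberately slow local server. The sketch below is only an illustration: SlowHandler and request_with_timeout are made-up names, the server sleeps 0.5 seconds before answering, and ThreadingHTTPServer requires Python 3.7 or later.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that waits half a second before answering."""

    def do_GET(self):
        time.sleep(0.5)
        try:
            self.send_response(200)
            self.send_header("Content-Length", "2")
            self.end_headers()
            self.wfile.write(b"ok")
        except OSError:
            pass  # the client may already have given up

    def log_message(self, *args):
        pass  # keep the demo output quiet


def request_with_timeout(read_timeout):
    """Return 'completed' or 'read timeout' for the given read timeout in seconds."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    session = requests.Session()
    session.trust_env = False  # ignore proxy environment variables for localhost
    try:
        # timeout=(connect timeout, read timeout), both in seconds
        session.get("http://127.0.0.1:%d/" % server.server_port,
                    timeout=(2, read_timeout))
        return "completed"
    except requests.exceptions.ReadTimeout:
        return "read timeout"
    finally:
        session.close()
        server.shutdown()
        server.server_close()


print(request_with_timeout(0.1))
print(request_with_timeout(2))
```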
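9) Session objects, which persist certain parameters across requests
s = requests.Session()
s.auth = ('auth', 'passwd')
s.headers.update({'key': 'value'})  # update() keeps the default headers; assigning a new dict would replace them
r = s.get('url')
r1 = s.get('url1')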
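What a session carries from one request to the next can be checked without any network access by preparing requests through it. This is a sketch under assumptions: the example.com URLs and the token cookie are made up, and prepared_headers is a hypothetical helper built on Session.prepare_request.

```python
import requests

session = requests.Session()
session.auth = ("auth", "passwd")          # sent as a Basic Authorization header
session.headers.update({"key": "value"})   # added on top of the default headers
session.cookies.set("token", "abc")        # hypothetical cookie, for illustration


def prepared_headers(url):
    """Build (but do not send) a GET request through the session and return its headers."""
    request = requests.Request("GET", url)
    return dict(session.prepare_request(request).headers)


first = prepared_headers("http://example.com/a")
second = prepared_headers("http://example.com/b")
print(first)
print(second)
```

Both prepared requests carry the same custom header, cookie, and Authorization header, which is the point of configuring them once on the session.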
10) Proxies
proxies = {'http': 'ip1', 'https': 'ip2'}
requests.get('url', proxies=proxies)
Summary:
# HTTP request types
# GET
r = requests.get('https://github.com/timeline.json')
# POST
r = requests.post("http://m.ctrip.com/post")
# PUT
r = requests.put("http://m.ctrip.com/put")
# DELETE
r = requests.delete("http://m.ctrip.com/delete")
# HEAD
r = requests.head("http://m.ctrip.com/head")
# OPTIONS
r = requests.options("http://m.ctrip.com/get")
# Getting the response content
print(r.content)  # shown as bytes; Chinese characters appear as escaped byte sequences
print(r.text)  # shown as text
# Passing URL parameters
payload = {'keyword': '香港', 'salecityid': '2'}
r = requests.get("http://m.ctrip.com/webapp/tourvisa/visa_list", params=payload)
print(r.url)  # for example http://m.ctrip.com/webapp/tourvisa/visa_list?keyword=%E9%A6%99%E6%B8%AF&salecityid=2 (香港 is percent-encoded)
# Getting/changing the page encoding
r = requests.get('https://github.com/timeline.json')
print(r.encoding)
# JSON handling
r = requests.get('https://github.com/timeline.json')
print(r.json())  # uses Requests' built-in decoder; importing json is not required
# Custom request headers
url = 'http://m.ctrip.com'
headers = {'User-Agent': 'Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 4 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 Mobile Safari/535.19'}
r = requests.post(url, headers=headers)
print(r.request.headers)
# More complex POST request
url = 'http://m.ctrip.com'
payload = {'some': 'data'}
r = requests.post(url, data=json.dumps(payload))  # to send the payload as a JSON string instead of a form-encoded dict, serialize it with json.dumps first (requires import json)
# POST a multipart-encoded file
url = 'http://m.ctrip.com'
files = {'file': open('report.xls', 'rb')}
r = requests.post(url, files=files)
# Response status code
r = requests.get('http://m.ctrip.com')
print(r.status_code)
# Response headers
r = requests.get('http://m.ctrip.com')
print(r.headers)
print(r.headers['Content-Type'])
print(r.headers.get('content-type'))  # two ways to access part of the response headers
# Cookies
url = 'http://example.com/some/cookie/setting/url'
r = requests.get(url)
r.cookies['example_cookie_name']  # read cookies
url = 'http://m.ctrip.com/cookies'
cookies = dict(cookies_are='working')
r = requests.get(url, cookies=cookies)  # send cookies
# Setting a timeout
r = requests.get('http://m.ctrip.com', timeout=0.001)
# Setting proxies
proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.100:4444",
}
r = requests.get('http://m.ctrip.com', proxies=proxies)
# If the proxy requires a username and password, do it like this:
proxies = {
    "http": "http://user:pass@10.10.1.10:3128/",
}
3. Example code
GET requests
# 1. Example without parameters
import requests
ret = requests.get('https://github.com/timeline.json')
print(ret.url)
print(ret.text)
# 2. Example with parameters
import requests
payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.get("http://httpbin.org/get", params=payload)
print(ret.url)
print(ret.text)
POST requests
# 1. Basic POST example
import requests
payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.post("http://httpbin.org/post", data=payload)
print(ret.text)
# 2. Example sending request headers and data
import requests
import json
url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}
ret = requests.post(url, data=json.dumps(payload), headers=headers)
print(ret.text)
print(ret.cookies)
Request parameters
def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How long to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout)` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Set to True if POST/PUT/DELETE redirect following is allowed.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) whether the SSL cert will be verified. A CA_BUNDLE path can also be provided. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'http://httpbin.org/get')
    """
Parameter example code
def param_method_url():
    # requests.request(method='get', url='http://127.0.0.1:8000/test/')
    # requests.request(method='post', url='http://127.0.0.1:8000/test/')
    pass
def param_param():
    # - can be a dict
    # - can be a string
    # - can be bytes (ASCII characters only)
    # requests.request(method='get',
    #                  url='http://127.0.0.1:8000/test/',
    #                  params={'k1': 'v1', 'k2': '水电费'})
    # requests.request(method='get',
    #                  url='http://127.0.0.1:8000/test/',
    #                  params="k1=v1&k2=水电费&k3=v3&k3=vv3")
    # requests.request(method='get',
    #                  url='http://127.0.0.1:8000/test/',
    #                  params=bytes("k1=v1&k2=k2&k3=v3&k3=vv3", encoding='utf8'))
    # Wrong: bytes that contain non-ASCII characters
    # requests.request(method='get',
    #                  url='http://127.0.0.1:8000/test/',
    #                  params=bytes("k1=v1&k2=水电费&k3=v3&k3=vv3", encoding='utf8'))
    pass
def param_data():
    # can be a dict
    # can be a string
    # can be bytes
    # can be a file object
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  data={'k1': 'v1', 'k2': '水电费'})
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  data="k1=v1;k2=v2;k3=v3;k3=v4"
    #                  )
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  data="k1=v1;k2=v2;k3=v3;k3=v4",
    #                  headers={'Content-Type': 'application/x-www-form-urlencoded'}
    #                  )
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  data=open('data_file.py', mode='r', encoding='utf-8'),  # file content: k1=v1;k2=v2;k3=v3;k3=v4
    #                  headers={'Content-Type': 'application/x-www-form-urlencoded'}
    #                  )
    pass
def param_json():
    # Serializes the data passed as json into a string with json.dumps(...),
    # then sends it in the request body with the header {'Content-Type': 'application/json'}
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     json={'k1': 'v1', 'k2': '水电费'})
def param_headers():
    # Send request headers to the server
    # (an explicitly passed Content-Type takes precedence over the one json= would set)
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     json={'k1': 'v1', 'k2': '水电费'},
                     headers={'Content-Type': 'application/x-www-form-urlencoded'}
                     )
def param_cookies():
    # Send cookies to the server
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     data={'k1': 'v1', 'k2': 'v2'},
                     cookies={'cook1': 'value1'},
                     )
    # A CookieJar can also be used (the dict form is a wrapper built on top of it)
    from http.cookiejar import CookieJar
    from http.cookiejar import Cookie
    obj = CookieJar()
    obj.set_cookie(Cookie(version=0, name='c1', value='v1', port=None, domain='', path='/', secure=False, expires=None,
                          discard=True, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False,
                          port_specified=False, domain_specified=False, domain_initial_dot=False, path_specified=False)
                   )
    requests.request(method='POST',
                     url='http://127.0.0.1:8000/test/',
                     data={'k1': 'v1', 'k2': 'v2'},
                     cookies=obj)
def param_files():
    # Send a file
    # file_dict = {
    #     'f1': open('readme', 'rb')
    # }
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  files=file_dict)
    # Send a file with a custom file name
    # file_dict = {
    #     'f1': ('test.txt', open('readme', 'rb'))
    # }
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  files=file_dict)
    # Send a string as the file content, with a custom file name
    # file_dict = {
    #     'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf")
    # }
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  files=file_dict)
    # Send a string as the file content, with a custom file name, content type and extra headers
    # file_dict = {
    #     'f1': ('test.txt', "hahsfaksfa9kasdjflaksdjf", 'application/text', {'k1': '0'})
    # }
    # requests.request(method='POST',
    #                  url='http://127.0.0.1:8000/test/',
    #                  files=file_dict)
    pass
def param_auth():
    from requests.auth import HTTPBasicAuth, HTTPDigestAuth
    ret = requests.get('https://api.github.com/user', auth=HTTPBasicAuth('wupeiqi', 'sdfasdfasdf'))
    print(ret.text)
    # ret = requests.get('http://192.168.1.1',
    #                    auth=HTTPBasicAuth('admin', 'admin'))
    # ret.encoding = 'gbk'
    # print(ret.text)
    # ret = requests.get('http://httpbin.org/digest-auth/auth/user/pass', auth=HTTPDigestAuth('user', 'pass'))
    # print(ret)
    #
def param_timeout():
    # ret = requests.get('http://google.com/', timeout=1)
    # print(ret)
    # ret = requests.get('http://google.com/', timeout=(5, 1))  # (connect timeout, read timeout)
    # print(ret)
    pass
def param_allow_redirects():
    ret = requests.get('http://127.0.0.1:8000/test/', allow_redirects=False)
    print(ret.text)
def param_proxies():
    # proxies = {
    #     "http": "61.172.249.96:80",
    #     "https": "http://61.185.219.126:3128",
    # }
    # proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}
    # ret = requests.get("http://www.proxy360.cn/Proxy", proxies=proxies)
    # print(ret.headers)
    # from requests.auth import HTTPProxyAuth
    #
    # proxyDict = {
    #     'http': '77.75.105.165',
    #     'https': '77.75.105.165'
    # }
    # auth = HTTPProxyAuth('username', 'mypassword')
    #
    # r = requests.get("http://www.google.com", proxies=proxyDict, auth=auth)
    # print(r.text)
    pass
def param_stream():
    ret = requests.get('http://127.0.0.1:8000/test/', stream=True)
    print(ret.content)
    ret.close()
    # from contextlib import closing
    # with closing(requests.get('http://httpbin.org/get', stream=True)) as r:
    #     # Process the response here.
    #     for i in r.iter_content():
    #         print(i)
def requests_session():
    import requests
    session = requests.Session()
    # 1. First visit any page to obtain a cookie
    i1 = session.get(url="http://dig.chouti.com/help/service")
    # 2. Log in, carrying the cookie from the previous step; the backend authorizes the gpsd value in the cookie
    i2 = session.post(
        url="http://dig.chouti.com/login",
        data={
            'phone': "8615131255089",
            'password': "xxxxxx",
            'oneMonth': ""
        }
    )
    i3 = session.post(
        url="http://dig.chouti.com/link/vote?linksId=8589623",
    )
    print(i3.text)
JSON request:
#!/usr/bin/python3
import requests
import json
class url_request():
    def __init__(self):
        '''init'''
if __name__ == '__main__':
    headers = {'Content-Type': 'application/json'}
    payload = {'CountryName': '中国',
               'ProvinceName': '四川省',
               'L1CityName': 'chengdu',
               'L2CityName': 'yibing',
               'TownName': '',
               'Longitude': '107.33393',
               'Latitude': '33.157131',
               'Language': 'CN'}
    # headers= is the correct keyword; the dict is serialized so the body matches the JSON Content-Type
    r = requests.post("http://www.xxxxxx.com/CityLocation/json/LBSLocateCity", headers=headers, data=json.dumps(payload))
    data = r.json()
    if r.status_code != 200:
        print('LBSLocateCity API Error ' + str(r.status_code))
    print(data['CityEntities'][0]['CityID'])  # print the value of one key in the returned JSON
    print(data['ResponseStatus']['Ack'])
    print(json.dumps(data, indent=4, sort_keys=True, ensure_ascii=False))  # pretty-print the JSON; ensure_ascii must be False, otherwise Chinese is shown as \u escapes
XML request:
#!/usr/bin/python3
import requests
class url_request():
    def __init__(self):
        """init"""
if __name__ == '__main__':
    headers = {'Content-type': 'text/xml'}
    XML = 'WeChatJSTicket.JobWS.Job.JobRefreshTicket,WeChatJSTicket.JobWS RUN 1127.0.0.1 1 false '
    url = 'http://jobws.push.mobile.xxxxxxxx.com/RefreshWeiXInTokenJob/RefreshService.asmx'
    r = requests.post(url=url, headers=headers, data=XML)
    data = r.text
    print(data)
Handling status errors
import requests
URL = 'http://ip.taobao.com/service/getIpInfo.php'  # Taobao IP address library API
try:
    r = requests.get(URL, params={'ip': '8.8.8.8'}, timeout=1)
    r.raise_for_status()  # raise an exception if the response status code is 4xx or 5xx
except requests.RequestException as e:
    print(e)
else:
    result = r.json()
    print(type(result), result, sep='\n')
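The try/except pattern above can be wrapped in a small helper and exercised against a port where nothing is listening, so the failure path runs without any external service. Both helper names (fetch_json, unused_local_url) are made up for this sketch.

```python
import socket

import requests


def fetch_json(url, timeout=1):
    """Return the decoded JSON body, or None if the request or decoding fails."""
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()
        return r.json()
    except (requests.RequestException, ValueError) as e:
        # RequestException covers connection errors, timeouts and HTTP errors;
        # ValueError covers a body that is not valid JSON.
        print("request failed:", type(e).__name__)
        return None


def unused_local_url():
    """Hypothetical helper: pick a local port that nothing is listening on."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        port = s.getsockname()[1]
    return "http://127.0.0.1:%d/" % port


print(fetch_json(unused_local_url()))
```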
Uploading files
The requests module can also upload files; the file type is handled automatically:
import requests
url = 'http://127.0.0.1:8080/upload'
files = {'file': open('/home/rxf/test.jpg', 'rb')}
# files = {'file': ('report.jpg', open('/home/lyb/sjzl.mpg', 'rb'))}  # set the file name explicitly
r = requests.post(url, files=files)
print(r.text)
Even more conveniently, requests can upload a string as if it were a file:
import requests
url = 'http://127.0.0.1:8080/upload'
files = {'file': ('test.txt', b'Hello Requests.')}  # the file name must be set explicitly
r = requests.post(url, files=files)
print(r.text)
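To see what such an upload looks like on the wire, the multipart body can be built locally without a running upload server. The helper prepare_upload is a hypothetical name; the URL is only a placeholder and is never contacted.

```python
import requests


def prepare_upload(url, files):
    """Encode a multipart upload locally and return (Content-Type, body bytes)."""
    prepared = requests.Request("POST", url, files=files).prepare()
    return prepared.headers["Content-Type"], prepared.body


content_type, body = prepare_upload(
    "http://127.0.0.1:8080/upload",               # placeholder URL; nothing is sent
    {"file": ("test.txt", b"Hello Requests.")},   # bytes uploaded as a file
)
print(content_type)
print(body.decode("utf-8"))
```

The printed body shows the form field name, the explicit file name, and the content between the multipart boundaries.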
Authentication
Basic authentication (HTTP Basic Auth)
import requests
from requests.auth import HTTPBasicAuth
r = requests.get('https://httpbin.org/hidden-basic-auth/user/passwd', auth=HTTPBasicAuth('user', 'passwd'))
# r = requests.get('https://httpbin.org/hidden-basic-auth/user/passwd', auth=('user', 'passwd'))  # shorthand
print(r.json())
Another very popular form of HTTP authentication is Digest authentication, which Requests also supports out of the box:
from requests.auth import HTTPDigestAuth
requests.get(URL, auth=HTTPDigestAuth('user', 'pass'))
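For Basic authentication, the header that Requests attaches can be inspected by preparing a request locally. This is a small sketch with a hypothetical helper, basic_auth_header; it shows that the tuple shorthand and HTTPBasicAuth produce the same Authorization header, and nothing is sent to httpbin.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

URL = "https://httpbin.org/hidden-basic-auth/user/passwd"  # only used as a placeholder


def basic_auth_header(auth):
    """Return the Authorization header requests builds for the given auth (no network I/O)."""
    return requests.Request("GET", URL, auth=auth).prepare().headers["Authorization"]


explicit = basic_auth_header(HTTPBasicAuth("user", "passwd"))
shorthand = basic_auth_header(("user", "passwd"))
expected = "Basic " + base64.b64encode(b"user:passwd").decode("ascii")
print(explicit, shorthand == explicit, explicit == expected)
```

Because the credentials are only base64-encoded, Basic authentication should be used over HTTPS.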
Cookies and session objects
If a response contains cookies, you can access them quickly:
import requests
r = requests.get('http://www.google.com.hk/')
print(r.cookies['NID'])
print(tuple(r.cookies))
To send your own cookies to the server, use the cookies parameter:
import requests
url = 'http://httpbin.org/cookies'
cookies = {'testCookies_1': 'Hello_Python3', 'testCookies_2': 'Hello_Requests'}
# Cookie Version 0 specifies that special characters such as spaces, square brackets, parentheses, equals signs, commas, double quotes, slashes, question marks, @, colons and semicolons cannot be used in cookie content.
r = requests.get(url, cookies=cookies)
print(r.json())
A session object lets you persist certain parameters across requests. Most conveniently, cookies are kept across all requests made from the same Session instance, and this is handled automatically.
Here is a real-world example, a Kuaipan sign-in script:
import requests
headers = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Accept-Encoding': 'gzip, deflate, compress',
           'Accept-Language': 'en-us;q=0.5,en;q=0.3',
           'Cache-Control': 'max-age=0',
           'Connection': 'keep-alive',
           'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0'}
s = requests.Session()
s.headers.update(headers)
# s.auth = ('superuser', '123')
s.get('https://www.kuaipan.cn/account_login.htm')
_URL = 'http://www.kuaipan.cn/index.php'
s.post(_URL, params={'ac': 'account', 'op': 'login'},
       data={'username': '****@foxmail.com', 'userpwd': '********', 'isajax': 'yes'})
r = s.get(_URL, params={'ac': 'zone', 'op': 'taskdetail'})
print(r.json())
s.get(_URL, params={'ac': 'common', 'op': 'usersign'})
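The cookie persistence that makes such a sign-in script work can be demonstrated with a tiny local server instead of a real site. In this sketch the handler CookieHandler, its /login and /whoami paths, and the sessionid cookie are all invented for illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class CookieHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: /login sets a cookie, every path echoes the Cookie header."""

    def do_GET(self):
        body = (self.headers.get("Cookie") or "no cookie").encode("utf-8")
        self.send_response(200)
        if self.path == "/login":
            self.send_header("Set-Cookie", "sessionid=abc123; Path=/")
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def compare_sessions():
    """Return what /whoami sees for a logged-in session and for a fresh one."""
    server = HTTPServer(("127.0.0.1", 0), CookieHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    logged_in, fresh = requests.Session(), requests.Session()
    logged_in.trust_env = fresh.trust_env = False  # ignore proxy settings for localhost
    try:
        logged_in.get(base + "/login", timeout=5)   # server sets the cookie here
        kept = logged_in.get(base + "/whoami", timeout=5).text
        lost = fresh.get(base + "/whoami", timeout=5).text
        return kept, lost
    finally:
        logged_in.close()
        fresh.close()
        server.shutdown()
        server.server_close()


kept, lost = compare_sessions()
print(kept, "|", lost)
```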
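Example: fetching page source with the requests module and saving it to a file
This is a basic file-saving operation, but there are a few points worth noting:
1. Install the requests package by running pip install requests on the command line. Many people recommend requests, although the built-in urllib.request can also fetch page source.
2. Set the encoding parameter of open to utf-8, otherwise the saved file will contain garbled characters.
3. Printing the fetched content directly in cmd triggers various encoding errors, so save it to a file and inspect it there.
4. Using with open is the better style, because the resource is released automatically when the block finishes.
#!/usr/bin/python3
import requests
'''Example: fetch page source with the requests module and save it to a file'''
html = requests.get("http://www.baidu.com")
with open('test.txt', 'w', encoding='utf-8') as f:
    f.write(html.text)
'''Example: read a txt file one line at a time and save it to another txt file'''
with open('test.txt', encoding="utf-8") as f, open('testt.txt', 'w', encoding='utf-8') as ff:
    for line in f:
        ff.write(line)
Printing each line on the command line causes encoding errors for Chinese text, so each line is read and saved to another file to test whether reading works correctly. (Remember to specify the encoding when calling open.)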
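The save-then-copy steps can be tried without network access by substituting a fixed string for html.text. In the sketch below, page_text and the helper save_and_copy are assumptions made for illustration, and the files are written to a temporary directory.

```python
import os
import tempfile

# Stand-in for html.text so the sketch runs offline (assumed sample content).
page_text = "<html>\n<body>你好, requests</body>\n</html>\n"


def save_and_copy(text, directory):
    """Write text to test.txt, copy it line by line to testt.txt, return the copy."""
    src = os.path.join(directory, "test.txt")
    dst = os.path.join(directory, "testt.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write(text)
    # Both files are closed automatically, even if an error occurs mid-copy.
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(line)
    with open(dst, encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    copied = save_and_copy(page_text, tmp)
print(copied == page_text)
```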
Python requests module: automatic login example (Chouti, dig.chouti.com):
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import requests
# ############## Approach 1 ##############
"""
# 1. First visit any page to obtain a cookie
i1 = requests.get(url="http://dig.chouti.com/help/service")
i1_cookies = i1.cookies.get_dict()
# 2. Log in, carrying the cookie from the previous step; the backend authorizes the gpsd value in the cookie
i2 = requests.post(
    url="http://dig.chouti.com/login",
    data={
        'phone': "8615131255089",
        'password': "xxooxxoo",
        'oneMonth': ""
    },
    cookies=i1_cookies
)
# 3. Upvote (only the already-authorized gpsd needs to be sent)
gpsd = i1_cookies['gpsd']
i3 = requests.post(
    url="http://dig.chouti.com/link/vote?linksId=8589523",
    cookies={'gpsd': gpsd}
)
print(i3.text)
"""
# ############## Approach 2 ##############
"""
import requests
session = requests.Session()
i1 = session.get(url="http://dig.chouti.com/help/service")
i2 = session.post(
    url="http://dig.chouti.com/login",
    data={
        'phone': "8615131255089",
        'password': "xxooxxoo",
        'oneMonth': ""
    }
)
i3 = session.post(
    url="http://dig.chouti.com/link/vote?linksId=8589523"
)
print(i3.text)
"""
Python requests module: GitHub automatic login
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import requests
from bs4 import BeautifulSoup
# ############## Approach 1 ##############
#
# # 1. Visit the login page and get the authenticity_token
# i1 = requests.get('https://github.com/login')
# soup1 = BeautifulSoup(i1.text, features='lxml')
# tag = soup1.find(name='input', attrs={'name': 'authenticity_token'})
# authenticity_token = tag.get('value')
# c1 = i1.cookies.get_dict()
# i1.close()
#
# # 2. Send the login request with the authenticity_token, user name, password and other fields
# form_data = {
#     "authenticity_token": authenticity_token,
#     "utf8": "",
#     "commit": "Sign in",
#     "login": "wupeiqi@live.com",
#     'password': 'xxoo'
# }
#
# i2 = requests.post('https://github.com/session', data=form_data, cookies=c1)
# c2 = i2.cookies.get_dict()
# c1.update(c2)
# i3 = requests.get('https://github.com/settings/repositories', cookies=c1)
#
# soup3 = BeautifulSoup(i3.text, features='lxml')
# list_group = soup3.find(name='div', class_='listgroup')
#
# from bs4.element import Tag
#
# for child in list_group.children:
#     if isinstance(child, Tag):
#         project_tag = child.find(name='a', class_='mr-1')
#         size_tag = child.find(name='small')
#         temp = "Project: %s (%s); project path: %s" % (project_tag.get('href'), size_tag.string, project_tag.string, )
#         print(temp)
# ############## Approach 2 ##############
# session = requests.Session()
# # 1. Visit the login page and get the authenticity_token
# i1 = session.get('https://github.com/login')
# soup1 = BeautifulSoup(i1.text, features='lxml')
# tag = soup1.find(name='input', attrs={'name': 'authenticity_token'})
# authenticity_token = tag.get('value')
# c1 = i1.cookies.get_dict()
# i1.close()
#
# # 2. Send the login request with the authenticity_token, user name, password and other fields
# form_data = {
#     "authenticity_token": authenticity_token,
#     "utf8": "",
#     "commit": "Sign in",
#     "login": "wupeiqi@live.com",
#     'password': 'xxoo'
# }
#
# i2 = session.post('https://github.com/session', data=form_data)
# c2 = i2.cookies.get_dict()
# c1.update(c2)
# i3 = session.get('https://github.com/settings/repositories')
#
# soup3 = BeautifulSoup(i3.text, features='lxml')
# list_group = soup3.find(name='div', class_='listgroup')
#
# from bs4.element import Tag
#
# for child in list_group.children:
#     if isinstance(child, Tag):
#         project_tag = child.find(name='a', class_='mr-1')
#         size_tag = child.find(name='small')
#         temp = "Project: %s (%s); project path: %s" % (project_tag.get('href'), size_tag.string, project_tag.string, )
#         print(temp)
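The first step of the GitHub flow, reading a hidden token out of the login form, can be tried offline. The sketch below swaps BeautifulSoup for the standard library's html.parser so it has no extra dependencies; the HTML snippet, the token value, and the names HiddenInputFinder and extract_token are all made up, and the real GitHub markup may differ.

```python
from html.parser import HTMLParser

# Made-up login form for illustration only.
SAMPLE_LOGIN_HTML = """
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="TOKEN-123">
  <input type="text" name="login">
  <input type="password" name="password">
</form>
"""


class HiddenInputFinder(HTMLParser):
    """Remember the value of the first <input> whose name equals field_name."""

    def __init__(self, field_name):
        super().__init__()
        self.field_name = field_name
        self.value = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (tag == "input" and self.value is None
                and attributes.get("name") == self.field_name):
            self.value = attributes.get("value")


def extract_token(html, field_name="authenticity_token"):
    """Return the token value, or None if the field is not present."""
    finder = HiddenInputFinder(field_name)
    finder.feed(html)
    return finder.value


token = extract_token(SAMPLE_LOGIN_HTML)
# Placeholder credentials; this dict is what would be posted as form data.
form_data = {"authenticity_token": token, "login": "user@example.com", "password": "secret"}
print(form_data)
```

Checking for None before posting avoids sending a login request with a missing token when the page layout changes.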
Python requests module: Zhihu automatic login
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import time
import requests
from bs4 import BeautifulSoup
session = requests.Session()
i1 = session.get(
    url='https://www.zhihu.com/#signin',
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
    }
)
soup1 = BeautifulSoup(i1.text, 'lxml')
xsrf_tag = soup1.find(name='input', attrs={'name': '_xsrf'})
xsrf = xsrf_tag.get('value')
current_time = time.time()
i2 = session.get(
    url='https://www.zhihu.com/captcha.gif',
    params={'r': current_time, 'type': 'login'},
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
    })
with open('zhihu.gif', 'wb') as f:
    f.write(i2.content)
captcha = input('Open zhihu.gif, look at the image and enter the captcha: ')
form_data = {
    "_xsrf": xsrf,
    'password': 'xxooxxoo',
    "captcha": captcha,  # the value typed in above, not the literal string 'captcha'
    'email': '424662508@qq.com'
}
i3 = session.post(
    url='https://www.zhihu.com/login/email',
    data=form_data,
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
    }
)
i4 = session.get(
    url='https://www.zhihu.com/settings/profile',
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
    }
)
soup4 = BeautifulSoup(i4.text, 'lxml')
tag = soup4.find(id='rename-section')
nick_name = tag.find('span', class_='name').string
print(nick_name)
Python requests module: cnblogs automatic login
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import re
import json
import base64
import rsa
import requests
def js_encrypt(text):
    b64der = 'MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCp0wHYbg/NOPO3nzMD3dndwS0MccuMeXCHgVlGOoYyFwLdS24Im2e7YyhB0wrUsyYf0/nhzCzBK8ZC9eCWqd0aHbdgOQT6CuFQBMjbyGYvlVYU2ZP7kG9Ft6YV6oc9ambuO7nPZh+bvXH0zDKfi02prknrScAKC0XhadTHT3Al0QIDAQAB'
    der = base64.standard_b64decode(b64der)
    pk = rsa.PublicKey.load_pkcs1_openssl_der(der)
    v1 = rsa.encrypt(bytes(text, 'utf8'), pk)
    value = base64.encodebytes(v1).replace(b'\n', b'')
    value = value.decode('utf8')
    return value
session = requests.Session()
i1 = session.get('https://passport.cnblogs.com/user/signin')
rep = re.compile(r"'VerificationToken':\s*'(.*?)'")  # non-greedy, tolerant of whitespace after the colon
v = re.search(rep, i1.text)
verification_token = v.group(1)
form_data = {
    'input1': js_encrypt('wptawy'),
    'input2': js_encrypt('asdfasdf'),
    'remember': False
}
i2 = session.post(url='https://passport.cnblogs.com/user/signin',
                  data=json.dumps(form_data),
                  headers={
                      'Content-Type': 'application/json; charset=UTF-8',
                      'X-Requested-With': 'XMLHttpRequest',
                      'VerificationToken': verification_token}
                  )
i3 = session.get(url='https://i.cnblogs.com/EditDiary.aspx')
print(i3.text)
Python requests module: Lagou automatic login
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import re
import requests
# Step 1: visit the login page and get X_Anti_Forge_Token and X_Anti_Forge_Code
# 1. Request URL: https://passport.lagou.com/login/login.html
# 2. Request method: GET
# 3. Request headers:
#    User-agent
r1 = requests.get('https://passport.lagou.com/login/login.html',
                  headers={
                      'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
                  },
                  )
# \s* tolerates optional whitespace around the equals sign in the page source
X_Anti_Forge_Token = re.findall(r"X_Anti_Forge_Token\s*=\s*'(.*?)'", r1.text, re.S)[0]
X_Anti_Forge_Code = re.findall(r"X_Anti_Forge_Code\s*=\s*'(.*?)'", r1.text, re.S)[0]
print(X_Anti_Forge_Token, X_Anti_Forge_Code)
# print(r1.cookies.get_dict())
# Step 2: log in
# 1. Request URL: https://passport.lagou.com/login/login.json
# 2. Request method: POST
# 3. Request headers:
#    cookie
#    User-agent
#    Referer: https://passport.lagou.com/login/login.html
#    X-Anit-Forge-Code: 53165984
#    X-Anit-Forge-Token: 3b6a2f62-80f0-428b-8efb-ef72fc100d78
#    X-Requested-With: XMLHttpRequest
# 4. Request body:
#    isValidate: true
#    username: 15131252215
#    password: ab18d270d7126ea65915c50288c22c0d
#    request_form_verifyCode: ''
#    submit: ''
r2 = requests.post(
    'https://passport.lagou.com/login/login.json',
    headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
        'Referer': 'https://passport.lagou.com/login/login.html',
        'X-Anit-Forge-Code': X_Anti_Forge_Code,
        'X-Anit-Forge-Token': X_Anti_Forge_Token,
        'X-Requested-With': 'XMLHttpRequest'
    },
    data={
        "isValidate": True,
        'username': '15131255089',
        'password': 'ab18d270d7126ea65915c50288c22c0d',
        'request_form_verifyCode': '',
        'submit': ''
    },
    cookies=r1.cookies.get_dict()
)
print(r2.text)