pandas read_excel()和to_excel()函数解析
前言
数据分析时候,需要将数据进行加载和存储,本文主要介绍和excel的交互。
read_excel()
加载函数为read_excel(),其具体参数如下。
read_excel(io,sheetname=0,header=0,skiprows=None,skip_footer=0,index_col=None,names=None,parse_cols=None,parse_dates=False,date_parser=None,na_values=None,thousands=None,convert_float=True,has_index_names=None,converters=None,dtype=None,true_values=None,false_values=None,engine=None,squeeze=False,**kwds)
常用参数解析:
- io:string,pathobject;excel路径。
- sheetname:string,int,mixedlistofstrings/ints,orNone,default0返回多表使用sheetname=[0,1],若sheetname=None是返回全表注意:int/string返回的是dataframe,而none和list返回的是dictofdataframe
- header:int,listofints,default0指定列名行,默认0,即取第一行,数据为列名行以下的数据若数据不含列名,则设定header=None
- skiprows:list-like,Rowstoskipatthebeginning,省略指定行数的数据
- skip_footer:int,default0,省略从尾部数的int行数据
- index_col:int,listofints,defaultNone指定列为索引列,也可以使用u”strings”
- names:array-like,defaultNone,指定列的名字。
数据源:
sheet1: IDNUM-1NUM-2NUM-3 36901142168661 3690278521602 36903144600521 3690495457468 3690569596695 sheet2: IDNUM-1NUM-2NUM-3 36906190527691 36907101403470
(1)函数原型
basestation="F://pythonBook_PyPDAM/data/test.xls" data=pd.read_excel(basestation) printdata
输出:是一个dataframe
IDNUM-1NUM-2NUM-3 036901142168661 13690278521602 236903144600521 33690495457468 43690569596695
(2)sheetname参数:返回多表使用sheetname=[0,1],若sheetname=None是返回全表注意:int/string返回的是dataframe,而none和list返回的是dictofdataframe
data_1=pd.read_excel(basestation,sheetname=[0,1]) printdata_1 printtype(data_1)
输出:dictofdataframe
OrderedDict([(0,IDNUM-1NUM-2NUM-3 036901142168661 13690278521602 236903144600521 33690495457468 43690569596695), (1,IDNUM-1NUM-2NUM-3 036906190527691 136907101403470)])
(3)header参数:指定列名行,默认0,即取第一行,数据为列名行以下的数据若数据不含列名,则设定header=None,注意这里还有列名的一行。
data=pd.read_excel(basestation,header=None) printdata 输出: 0123 0IDNUM-1NUM-2NUM-3 136901142168661 23690278521602 336903144600521 43690495457468 53690569596695 data=pd.read_excel(basestation,header=[3]) printdata 输出: 36903144600521 03690495457468 13690569596695
(4)skiprows参数:省略指定行数的数据
data=pd.read_excel(basestation,skiprows=[1]) printdata 输出: IDNUM-1NUM-2NUM-3 03690278521602 136903144600521 23690495457468 33690569596695
(5)skip_footer参数:省略从尾部数的int行的数据
data=pd.read_excel(basestation,skip_footer=3) printdata 输出: IDNUM-1NUM-2NUM-3 036901142168661 13690278521602
(6)index_col参数:指定列为索引列,也可以使用u”strings”
data=pd.read_excel(basestation,index_col="NUM-3") printdata 输出: IDNUM-1NUM-2 NUM-3 66136901142168 6023690278521 52136903144600 4683690495457 6953690569596
(7)names参数:指定列的名字。
data=pd.read_excel(basestation,names=["a","b","c","e"]) printdata abce 036901142168661 13690278521602 236903144600521 33690495457468 43690569596695
具体参数如下:
>>>printhelp(pandas.read_excel)
Helponfunctionread_excelinmodulepandas.io.excel:
read_excel(io,sheetname=0,header=0,skiprows=None,skip_footer=0,index_col=None,names=None,parse_cols=None,parse_dates=False,date_parser=None,na_values=None,thousands=None,convert_float=True,has_index_names=None,converters=None,dtype=None,true_values=None,false_values=None,engine=None,squeeze=False,**kwds)
ReadanExceltableintoapandasDataFrame
Parameters
----------
io:string,pathobject(pathlib.Pathorpy._path.local.LocalPath),
file-likeobject,pandasExcelFile,orxlrdworkbook.
ThestringcouldbeaURL.ValidURLschemesincludehttp,ftp,s3,
andfile.ForfileURLs,ahostisexpected.Forinstance,alocal
filecouldbefile://localhost/path/to/workbook.xlsx
sheetname:string,int,mixedlistofstrings/ints,orNone,default0
Stringsareusedforsheetnames,Integersareusedinzero-indexed
sheetpositions.
Listsofstrings/integersareusedtorequestmultiplesheets.
SpecifyNonetogetallsheets.
str|int->DataFrameisreturned.
list|None->DictofDataFramesisreturned,withkeysrepresenting
sheets.
AvailableCases
*Defaultsto0->1stsheetasaDataFrame
*1->2ndsheetasaDataFrame
*"Sheet1"->1stsheetasaDataFrame
*[0,1,"Sheet5"]->1st,2nd&5thsheetasadictionaryofDataFrames
*None->AllsheetsasadictionaryofDataFrames
header:int,listofints,default0
Row(0-indexed)touseforthecolumnlabelsoftheparsed
DataFrame.Ifalistofintegersispassedthoserowpositionswill
becombinedintoa``MultiIndex``
skiprows:list-like
Rowstoskipatthebeginning(0-indexed)
skip_footer:int,default0
Rowsattheendtoskip(0-indexed)
index_col:int,listofints,defaultNone
Column(0-indexed)touseastherowlabelsoftheDataFrame.
PassNoneifthereisnosuchcolumn.Ifalistispassed,
thosecolumnswillbecombinedintoa``MultiIndex``.Ifa
subsetofdataisselectedwith``parse_cols``,index_col
isbasedonthesubset.
names:array-like,defaultNone
Listofcolumnnamestouse.Iffilecontainsnoheaderrow,
thenyoushouldexplicitlypassheader=None
converters:dict,defaultNone
Dictoffunctionsforconvertingvaluesincertaincolumns.Keyscan
eitherbeintegersorcolumnlabels,valuesarefunctionsthattakeone
inputargument,theExcelcellcontent,andreturnthetransformed
content.
dtype:Typenameordictofcolumn->type,defaultNone
Datatypefordataorcolumns.E.g.{'a':np.float64,'b':np.int32}
Use`object`topreservedataasstoredinExcelandnotinterpretdtype.
Ifconvertersarespecified,theywillbeappliedINSTEAD
ofdtypeconversion.
..versionadded::0.20.0
true_values:list,defaultNone
ValuestoconsiderasTrue
..versionadded::0.19.0
false_values:list,defaultNone
ValuestoconsiderasFalse
..versionadded::0.19.0
parse_cols:intorlist,defaultNone
*IfNonethenparseallcolumns,
*Ifintthenindicateslastcolumntobeparsed
*Iflistofintsthenindicateslistofcolumnnumberstobeparsed
*IfstringthenindicatescommaseparatedlistofExcelcolumnlettersand
columnranges(e.g."A:E"or"A,C,E:F").Rangesareinclusiveof
bothsides.
squeeze:boolean,defaultFalse
IftheparseddataonlycontainsonecolumnthenreturnaSeries
na_values:scalar,str,list-like,ordict,defaultNone
AdditionalstringstorecognizeasNA/NaN.Ifdictpassed,specific
per-columnNAvalues.Bydefaultthefollowingvaluesareinterpreted
asNaN:'','#N/A','#N/AN/A','#NA','-1.#IND','-1.#QNAN','-NaN','-nan',
'1.#IND','1.#QNAN','N/A','NA','NULL','NaN','nan'.
thousands:str,defaultNone
Thousandsseparatorforparsingstringcolumnstonumeric.Notethat
thisparameterisonlynecessaryforcolumnsstoredasTEXTinExcel,
anynumericcolumnswillautomaticallybeparsed,regardlessofdisplay
format.
keep_default_na:bool,defaultTrue
Ifna_valuesarespecifiedandkeep_default_naisFalsethedefaultNaN
valuesareoverridden,otherwisethey'reappendedto.
verbose:boolean,defaultFalse
IndicatenumberofNAvaluesplacedinnon-numericcolumns
engine:string,defaultNone
Ifioisnotabufferorpath,thismustbesettoidentifyio.
AcceptablevaluesareNoneorxlrd
convert_float:boolean,defaultTrue
convertintegralfloatstoint(i.e.,1.0-->1).IfFalse,allnumeric
datawillbereadinasfloats:Excelstoresallnumbersasfloats
internally
has_index_names:boolean,defaultNone
DEPRECATED:forversion0.17+indexnameswillbeautomatically
inferredbasedonindex_col.ToreadExceloutputfrom0.16.2and
priorthathadsavedindexnames,useTrue.
Returns
to_excel()
存储函数为pd.DataFrame.to_excel(),注意,必须是DataFrame写入excel,即WriteDataFrametoanexcelsheet。其具体参数如下:
to_excel(self,excel_writer,sheet_name='Sheet1',na_rep='',float_format=None,columns=None,header=True,index=True,index_label=None,startrow=0,startcol=0,engine=None,merge_cells=True,encoding=None, inf_rep='inf',verbose=True,freeze_panes=None)
常用参数解析
- -excel_writer:stringorExcelWriterobjectFilepathorexistingExcelWriter目标路径
- -sheet_name:string,default‘Sheet1'NameofsheetwhichwillcontainDataFrame,填充excel的第几页
- -na_rep:string,default”,Missingdatarepresentation缺失值填充
- -float_format:string,defaultNoneFormatstringforfloatingpointnumbers
- -columns:sequence,optional,Columnstowrite选择输出的的列。
- -header:booleanorlistofstring,defaultTrueWriteoutcolumnnames.Ifalistofstringisgivenitisassumedtobealiasesforthecolumnnames
- -index:boolean,defaultTrue,Writerownames(index)
- -index_label:stringorsequence,defaultNone,Columnlabelforindexcolumn(s)ifdesired.IfNoneisgiven,andheaderandindexareTrue,thentheindexnamesareused.AsequenceshouldbegiveniftheDataFrameusesMultiIndex.
- -startrow:upperleftcellrowtodumpdataframe
- -startcol:upperleftcellcolumntodumpdataframe
- -engine:string,defaultNone,writeenginetouse-youcanalsosetthisviatheoptions,io.excel.xlsx.writer,io.excel.xls.writer,andio.excel.xlsm.writer.
- -merge_cells:boolean,defaultTrueWriteMultiIndexandHierarchicalRowsasmergedcells.
- -encoding:string,defaultNoneencodingoftheresultingexcelfile.Onlynecessaryforxlwt,otherwriterssupportunicodenatively.
- -inf_rep:string,default‘inf'Representationforinfinity(thereisnonativerepresentationforinfinityinExcel)
- -freeze_panes:tupleofinteger(length2),defaultNoneSpecifiestheone-basedbottommostrowandrightmostcolumnthatistobefrozen
数据源:
IDNUM-1NUM-2NUM-3 036901142168661 13690278521602 236903144600521 33690495457468 43690569596695 536906165453 加载数据: basestation="F://python/data/test.xls" basestation_end="F://python/data/test_end.xls" data=pd.read_excel(basestation)
(1)参数excel_writer,输出路径。
data.to_excel(basestation_end) 输出: IDNUM-1NUM-2NUM-3 036901142168661 13690278521602 236903144600521 33690495457468 43690569596695 536906165453
(2)sheet_name,将数据存储在excel的那个sheet页面。
data.to_excel(basestation_end,sheet_name="sheet2")
(3)na_rep,缺失值填充
data.to_excel(basestation_end,na_rep="NULL") 输出: IDNUM-1NUM-2NUM-3 036901142168661 13690278521602 236903144600521 33690495457468 43690569596695 536906165453NULL
(4)colums参数:sequence,optional,Columnstowrite选择输出的的列。
data.to_excel(basestation_end,columns=["ID"]) 输出 ID 036901 136902 236903 336904 436905 536906
(5)header参数:booleanorlistofstring,默认为True,可以用list命名列的名字。header=False则不输出题头。
data.to_excel(basestation_end,header=["a","b","c","d"]) 输出: abcd 036901142168661 13690278521602 236903144600521 33690495457468 43690569596695 536906165453 data.to_excel(basestation_end,header=False,columns=["ID"]) header=False则不输出题头 输出: 036901 136902 236903 336904 436905 536906
(6)index:boolean,defaultTrueWriterownames(index)
默认为True,显示index,当index=False则不显示行索引(名字)。
index_label:stringorsequence,defaultNone
设置索引列的列名。
data.to_excel(basestation_end,index=False) 输出: IDNUM-1NUM-2NUM-3 36901142168661 3690278521602 36903144600521 3690495457468 3690569596695 36906165453 data.to_excel(basestation_end,index_label=["f"]) 输出: fIDNUM-1NUM-2NUM-3 036901142168661 13690278521602 236903144600521 33690495457468 43690569596695 536906165453
以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持毛票票。