pandas read_excel()和to_excel()函数解析
前言
数据分析时候,需要将数据进行加载和存储,本文主要介绍和excel的交互。
read_excel()
加载函数为read_excel(),其具体参数如下。
read_excel(io,sheetname=0,header=0,skiprows=None,skip_footer=0,index_col=None,names=None,parse_cols=None,parse_dates=False,date_parser=None,na_values=None,thousands=None,convert_float=True,has_index_names=None,converters=None,dtype=None,true_values=None,false_values=None,engine=None,squeeze=False,**kwds)
常用参数解析:
- io:string,pathobject;excel路径。
- sheetname:string,int,mixedlistofstrings/ints,orNone,default0返回多表使用sheetname=[0,1],若sheetname=None是返回全表注意:int/string返回的是dataframe,而none和list返回的是dictofdataframe
- header:int,listofints,default0指定列名行,默认0,即取第一行,数据为列名行以下的数据若数据不含列名,则设定header=None
- skiprows:list-like,Rowstoskipatthebeginning,省略指定行数的数据
- skip_footer:int,default0,省略从尾部数的int行数据
- index_col:int,listofints,defaultNone指定列为索引列,也可以使用u”strings”
- names:array-like,defaultNone,指定列的名字。
数据源:
sheet1: IDNUM-1NUM-2NUM-3 36901142168661 3690278521602 36903144600521 3690495457468 3690569596695 sheet2: IDNUM-1NUM-2NUM-3 36906190527691 36907101403470
(1)函数原型
basestation="F://pythonBook_PyPDAM/data/test.xls" data=pd.read_excel(basestation) printdata
输出:是一个dataframe
IDNUM-1NUM-2NUM-3 036901142168661 13690278521602 236903144600521 33690495457468 43690569596695
(2)sheetname参数:返回多表使用sheetname=[0,1],若sheetname=None是返回全表注意:int/string返回的是dataframe,而none和list返回的是dictofdataframe
data_1=pd.read_excel(basestation,sheetname=[0,1]) printdata_1 printtype(data_1)
输出:dictofdataframe
OrderedDict([(0,IDNUM-1NUM-2NUM-3 036901142168661 13690278521602 236903144600521 33690495457468 43690569596695), (1,IDNUM-1NUM-2NUM-3 036906190527691 136907101403470)])
(3)header参数:指定列名行,默认0,即取第一行,数据为列名行以下的数据若数据不含列名,则设定header=None,注意这里还有列名的一行。
data=pd.read_excel(basestation,header=None) printdata 输出: 0123 0IDNUM-1NUM-2NUM-3 136901142168661 23690278521602 336903144600521 43690495457468 53690569596695 data=pd.read_excel(basestation,header=[3]) printdata 输出: 36903144600521 03690495457468 13690569596695
(4)skiprows参数:省略指定行数的数据
data=pd.read_excel(basestation,skiprows=[1]) printdata 输出: IDNUM-1NUM-2NUM-3 03690278521602 136903144600521 23690495457468 33690569596695
(5)skip_footer参数:省略从尾部数的int行的数据
data=pd.read_excel(basestation,skip_footer=3) printdata 输出: IDNUM-1NUM-2NUM-3 036901142168661 13690278521602
(6)index_col参数:指定列为索引列,也可以使用u”strings”
data=pd.read_excel(basestation,index_col="NUM-3") printdata 输出: IDNUM-1NUM-2 NUM-3 66136901142168 6023690278521 52136903144600 4683690495457 6953690569596
(7)names参数:指定列的名字。
data=pd.read_excel(basestation,names=["a","b","c","e"]) printdata abce 036901142168661 13690278521602 236903144600521 33690495457468 43690569596695
具体参数如下:
>>>printhelp(pandas.read_excel) Helponfunctionread_excelinmodulepandas.io.excel: read_excel(io,sheetname=0,header=0,skiprows=None,skip_footer=0,index_col=None,names=None,parse_cols=None,parse_dates=False,date_parser=None,na_values=None,thousands=None,convert_float=True,has_index_names=None,converters=None,dtype=None,true_values=None,false_values=None,engine=None,squeeze=False,**kwds) ReadanExceltableintoapandasDataFrame Parameters ---------- io:string,pathobject(pathlib.Pathorpy._path.local.LocalPath), file-likeobject,pandasExcelFile,orxlrdworkbook. ThestringcouldbeaURL.ValidURLschemesincludehttp,ftp,s3, andfile.ForfileURLs,ahostisexpected.Forinstance,alocal filecouldbefile://localhost/path/to/workbook.xlsx sheetname:string,int,mixedlistofstrings/ints,orNone,default0 Stringsareusedforsheetnames,Integersareusedinzero-indexed sheetpositions. Listsofstrings/integersareusedtorequestmultiplesheets. SpecifyNonetogetallsheets. str|int->DataFrameisreturned. list|None->DictofDataFramesisreturned,withkeysrepresenting sheets. AvailableCases *Defaultsto0->1stsheetasaDataFrame *1->2ndsheetasaDataFrame *"Sheet1"->1stsheetasaDataFrame *[0,1,"Sheet5"]->1st,2nd&5thsheetasadictionaryofDataFrames *None->AllsheetsasadictionaryofDataFrames header:int,listofints,default0 Row(0-indexed)touseforthecolumnlabelsoftheparsed DataFrame.Ifalistofintegersispassedthoserowpositionswill becombinedintoa``MultiIndex`` skiprows:list-like Rowstoskipatthebeginning(0-indexed) skip_footer:int,default0 Rowsattheendtoskip(0-indexed) index_col:int,listofints,defaultNone Column(0-indexed)touseastherowlabelsoftheDataFrame. PassNoneifthereisnosuchcolumn.Ifalistispassed, thosecolumnswillbecombinedintoa``MultiIndex``.Ifa subsetofdataisselectedwith``parse_cols``,index_col isbasedonthesubset. names:array-like,defaultNone Listofcolumnnamestouse.Iffilecontainsnoheaderrow, thenyoushouldexplicitlypassheader=None converters:dict,defaultNone Dictoffunctionsforconvertingvaluesincertaincolumns.Keyscan eitherbeintegersorcolumnlabels,valuesarefunctionsthattakeone inputargument,theExcelcellcontent,andreturnthetransformed content. dtype:Typenameordictofcolumn->type,defaultNone Datatypefordataorcolumns.E.g.{'a':np.float64,'b':np.int32} Use`object`topreservedataasstoredinExcelandnotinterpretdtype. Ifconvertersarespecified,theywillbeappliedINSTEAD ofdtypeconversion. ..versionadded::0.20.0 true_values:list,defaultNone ValuestoconsiderasTrue ..versionadded::0.19.0 false_values:list,defaultNone ValuestoconsiderasFalse ..versionadded::0.19.0 parse_cols:intorlist,defaultNone *IfNonethenparseallcolumns, *Ifintthenindicateslastcolumntobeparsed *Iflistofintsthenindicateslistofcolumnnumberstobeparsed *IfstringthenindicatescommaseparatedlistofExcelcolumnlettersand columnranges(e.g."A:E"or"A,C,E:F").Rangesareinclusiveof bothsides. squeeze:boolean,defaultFalse IftheparseddataonlycontainsonecolumnthenreturnaSeries na_values:scalar,str,list-like,ordict,defaultNone AdditionalstringstorecognizeasNA/NaN.Ifdictpassed,specific per-columnNAvalues.Bydefaultthefollowingvaluesareinterpreted asNaN:'','#N/A','#N/AN/A','#NA','-1.#IND','-1.#QNAN','-NaN','-nan', '1.#IND','1.#QNAN','N/A','NA','NULL','NaN','nan'. thousands:str,defaultNone Thousandsseparatorforparsingstringcolumnstonumeric.Notethat thisparameterisonlynecessaryforcolumnsstoredasTEXTinExcel, anynumericcolumnswillautomaticallybeparsed,regardlessofdisplay format. keep_default_na:bool,defaultTrue Ifna_valuesarespecifiedandkeep_default_naisFalsethedefaultNaN valuesareoverridden,otherwisethey'reappendedto. verbose:boolean,defaultFalse IndicatenumberofNAvaluesplacedinnon-numericcolumns engine:string,defaultNone Ifioisnotabufferorpath,thismustbesettoidentifyio. AcceptablevaluesareNoneorxlrd convert_float:boolean,defaultTrue convertintegralfloatstoint(i.e.,1.0-->1).IfFalse,allnumeric datawillbereadinasfloats:Excelstoresallnumbersasfloats internally has_index_names:boolean,defaultNone DEPRECATED:forversion0.17+indexnameswillbeautomatically inferredbasedonindex_col.ToreadExceloutputfrom0.16.2and priorthathadsavedindexnames,useTrue. Returns
to_excel()
存储函数为pd.DataFrame.to_excel(),注意,必须是DataFrame写入excel,即WriteDataFrametoanexcelsheet。其具体参数如下:
to_excel(self,excel_writer,sheet_name='Sheet1',na_rep='',float_format=None,columns=None,header=True,index=True,index_label=None,startrow=0,startcol=0,engine=None,merge_cells=True,encoding=None, inf_rep='inf',verbose=True,freeze_panes=None)
常用参数解析
- -excel_writer:stringorExcelWriterobjectFilepathorexistingExcelWriter目标路径
- -sheet_name:string,default‘Sheet1'NameofsheetwhichwillcontainDataFrame,填充excel的第几页
- -na_rep:string,default”,Missingdatarepresentation缺失值填充
- -float_format:string,defaultNoneFormatstringforfloatingpointnumbers
- -columns:sequence,optional,Columnstowrite选择输出的的列。
- -header:booleanorlistofstring,defaultTrueWriteoutcolumnnames.Ifalistofstringisgivenitisassumedtobealiasesforthecolumnnames
- -index:boolean,defaultTrue,Writerownames(index)
- -index_label:stringorsequence,defaultNone,Columnlabelforindexcolumn(s)ifdesired.IfNoneisgiven,andheaderandindexareTrue,thentheindexnamesareused.AsequenceshouldbegiveniftheDataFrameusesMultiIndex.
- -startrow:upperleftcellrowtodumpdataframe
- -startcol:upperleftcellcolumntodumpdataframe
- -engine:string,defaultNone,writeenginetouse-youcanalsosetthisviatheoptions,io.excel.xlsx.writer,io.excel.xls.writer,andio.excel.xlsm.writer.
- -merge_cells:boolean,defaultTrueWriteMultiIndexandHierarchicalRowsasmergedcells.
- -encoding:string,defaultNoneencodingoftheresultingexcelfile.Onlynecessaryforxlwt,otherwriterssupportunicodenatively.
- -inf_rep:string,default‘inf'Representationforinfinity(thereisnonativerepresentationforinfinityinExcel)
- -freeze_panes:tupleofinteger(length2),defaultNoneSpecifiestheone-basedbottommostrowandrightmostcolumnthatistobefrozen
数据源:
IDNUM-1NUM-2NUM-3 036901142168661 13690278521602 236903144600521 33690495457468 43690569596695 536906165453 加载数据: basestation="F://python/data/test.xls" basestation_end="F://python/data/test_end.xls" data=pd.read_excel(basestation)
(1)参数excel_writer,输出路径。
data.to_excel(basestation_end) 输出: IDNUM-1NUM-2NUM-3 036901142168661 13690278521602 236903144600521 33690495457468 43690569596695 536906165453
(2)sheet_name,将数据存储在excel的那个sheet页面。
data.to_excel(basestation_end,sheet_name="sheet2")
(3)na_rep,缺失值填充
data.to_excel(basestation_end,na_rep="NULL") 输出: IDNUM-1NUM-2NUM-3 036901142168661 13690278521602 236903144600521 33690495457468 43690569596695 536906165453NULL
(4)colums参数:sequence,optional,Columnstowrite选择输出的的列。
data.to_excel(basestation_end,columns=["ID"]) 输出 ID 036901 136902 236903 336904 436905 536906
(5)header参数:booleanorlistofstring,默认为True,可以用list命名列的名字。header=False则不输出题头。
data.to_excel(basestation_end,header=["a","b","c","d"]) 输出: abcd 036901142168661 13690278521602 236903144600521 33690495457468 43690569596695 536906165453 data.to_excel(basestation_end,header=False,columns=["ID"]) header=False则不输出题头 输出: 036901 136902 236903 336904 436905 536906
(6)index:boolean,defaultTrueWriterownames(index)
默认为True,显示index,当index=False则不显示行索引(名字)。
index_label:stringorsequence,defaultNone
设置索引列的列名。
data.to_excel(basestation_end,index=False) 输出: IDNUM-1NUM-2NUM-3 36901142168661 3690278521602 36903144600521 3690495457468 3690569596695 36906165453 data.to_excel(basestation_end,index_label=["f"]) 输出: fIDNUM-1NUM-2NUM-3 036901142168661 13690278521602 236903144600521 33690495457468 43690569596695 536906165453
以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持毛票票。