Parsing and Building Data in Python

2023-09-17 14:33:05

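Python can parse the common data formats we work with through a variety of libraries. CSV files store tabular data as plain text, with values separated by a delimiter character, usually a comma. XML (eXtensible Markup Language) looks much like HTML, but it is designed to give structure to documents and data and is commonly used to transport data. JSON is a lightweight data-interchange format that is more compact than XML yet no less expressive; a JSON document is essentially a string in a particular format. Microsoft Excel is a spreadsheet program for processing data, running statistical analyses, and supporting decisions; its file formats are xls and xlsx. The sections below show how to parse and build each of these formats with Python, mixing them all into one data "pearl, emerald and white jade soup".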

Parsing and Building CSV in Python

The standard library's csv module provides the functions reader() and writer() for basic reading and writing of CSV data.

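import csv

# Read: csv.reader() yields each row as a list of strings
with open('readtest.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)

# Write: writerow() takes one sequence of fields, writerows() an iterable of such sequences
with open('writetest.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['one', 'test'])
    writer.writerows([['some', 'iterable'], ['another', 'row']])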

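reader() returns a reader object, an iterator that yields each row as a list of strings. writer() returns a writer object whose writerow() and writerows() methods write one row or many rows. Both functions accept a dialect parameter that selects a dialect, a named set of formatting parameters (not a text encoding). The default dialect, 'excel', separates values with commas. The delimiter parameter sets the single character that separates fields and also defaults to a comma.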
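Here is a minimal sketch of the dialect and delimiter parameters. It writes tab-separated rows with the built-in 'excel-tab' dialect, then reads them back by passing delimiter='\t' explicitly. An in-memory io.StringIO buffer stands in for a file (an assumption so that no files are needed), and the rows are made-up sample data.

```python
import csv
import io

# Write tab-separated rows into an in-memory buffer using the built-in 'excel-tab' dialect
buf = io.StringIO(newline='')
writer = csv.writer(buf, dialect='excel-tab')
writer.writerow(['name', 'score'])
writer.writerow(['alice', '90'])
tsv_text = buf.getvalue()
print(repr(tsv_text))  # 'name\tscore\r\nalice\t90\r\n'

# Read it back with the default 'excel' dialect but an explicit tab delimiter
rows = list(csv.reader(io.StringIO(tsv_text, newline=''), delimiter='\t'))
print(rows)  # [['name', 'score'], ['alice', '90']]
```

Both dialects terminate lines with \r\n, which is why the buffer is created with newline='' so that no newline translation happens.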

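In Python 3, open the file object csvfile with newline='' so that the csv module handles line endings itself. Newlines embedded in quoted fields are then read correctly, and on platforms that use \r\n line endings no extra \r is added on write. In Python 2, csvfile must instead be opened in binary mode ('rb'/'wb'). Otherwise, on Windows, text mode translates line endings and treats the byte 0x1A (Ctrl-Z) as end-of-file (EOF), so the file may be read incompletely.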
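This sketch shows why newline='' matters. It writes a field that contains a line break, then reads the field back intact. The temporary file path and the sample rows are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical rows; the second field of the last row contains a line break
rows_out = [['id', 'comment'], ['1', 'first line\nsecond line']]

path = os.path.join(tempfile.mkdtemp(), 'newline_demo.csv')
with open(path, 'w', newline='') as f:
    csv.writer(f).writerows(rows_out)   # the multi-line field is written inside quotes

# With newline='', the csv module sees the raw line endings and keeps the
# embedded newline inside its quoted field
with open(path, newline='') as f:
    rows_in = list(csv.reader(f))

print(rows_in == rows_out)  # True
```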

In addition, the csv module's DictReader and DictWriter classes read and write rows as dictionaries.

import csv

# Read: each row is a dictionary keyed by the header row
with open('readtest.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['first_test'], row['last_test'])

# Write: fieldnames fixes the keys and their column order
with open('writetest.csv', 'w', newline='') as csvfile:
    fieldnames = ['first_test', 'last_test']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({'first_test': 'hello', 'last_test': 'world'})
    writer.writerow({'first_test': 'Hello', 'last_test': 'World'})
    # writer.writerows([{'first_test': 'hello', 'last_test': 'world'}, {'first_test': 'Hello', 'last_test': 'World'}])

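DictReader yields each row as a dictionary (an OrderedDict in Python 3.6 and 3.7, a plain dict since 3.8), so fields can be accessed by key. The keys come from the fieldnames parameter. If fieldnames is omitted, the first row of the file supplies the keys.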

DictWriter requires the fieldnames parameter to declare the keys. writeheader() writes those keys as the header row, and writerow() or writerows() writes one dictionary or many.
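Here is a short sketch of DictWriter and DictReader working together. It uses an in-memory buffer and the field names from the example above. restval is an extra DictWriter parameter that the text does not mention; it supplies the value written for any missing key.

```python
import csv
import io

fieldnames = ['first_test', 'last_test']
buf = io.StringIO(newline='')
writer = csv.DictWriter(buf, fieldnames=fieldnames, restval='N/A')
writer.writeheader()                      # header row: first_test,last_test
writer.writerows([
    {'first_test': 'hello', 'last_test': 'world'},
    {'first_test': 'Hello'},              # missing 'last_test' is filled with restval
])

# Without a fieldnames argument, DictReader takes the keys from the first row
buf.seek(0)
reader = csv.DictReader(buf)
print(reader.fieldnames)                  # ['first_test', 'last_test']
rows = [dict(r) for r in reader]
print(rows[1])                            # {'first_test': 'Hello', 'last_test': 'N/A'}
```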

Parsing and Building XML in Python

The standard library module xml.etree.ElementTree provides the Element and ElementTree classes for reading and writing XML data.

from xml.etree.ElementTree import Element, ElementTree

# Root node <language name="python">
root = Element('language')
root.set('name', 'python')

# Create four <direction> nodes and set their text
direction1 = Element('direction')
direction2 = Element('direction')
direction3 = Element('direction')
direction4 = Element('direction')
direction1.text = 'Web'
direction2.text = 'Spider'
direction3.text = 'BigData'
direction4.text = 'AI'

# Attach the children one by one with append() ...
root.append(direction1)
root.append(direction2)
root.append(direction3)
root.append(direction4)
# ... or all at once with extend(), which takes an iterable of elements
# root.extend([direction1, direction2, direction3, direction4])

# Wrap the root node in a tree and write it to a file
tree = ElementTree(root)
tree.write('xmltest.xml')

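When writing XML, use these calls:

- Element() creates a node.
- set() sets an attribute and its value.
- The text attribute holds a node's text.
- append() adds one child node.
- extend() adds several child nodes at once from an iterable of elements such as a list. itertools.chain() can join several such lists first.
- ElementTree() builds a tree around the root node.
- write() saves the tree to an XML file.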
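The extend() route can be sketched as follows. make_direction() is a hypothetical helper introduced only for this example. ET.tostring(..., encoding='unicode') returns the result as a string so it can be inspected without writing a file.

```python
import itertools
import xml.etree.ElementTree as ET

def make_direction(text):
    """Hypothetical helper: build one <direction> element with the given text."""
    elem = ET.Element('direction')
    elem.text = text
    return elem

# Attributes can also be passed to Element() as keyword arguments
root = ET.Element('language', name='python')

web_side = [make_direction('Web'), make_direction('Spider')]
data_side = [make_direction('BigData'), make_direction('AI')]

# extend() expects an iterable of Element objects; chain() joins the two lists
root.extend(list(itertools.chain(web_side, data_side)))

xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)
# <language name="python"><direction>Web</direction><direction>Spider</direction><direction>BigData</direction><direction>AI</direction></language>
```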

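import xml.etree.ElementTree as ET

tree = ET.parse('xmltest.xml')
# Alternative: create an empty tree and parse into it (parse() returns the root element)
# from xml.etree.ElementTree import ElementTree
# tree = ElementTree()
# root = tree.parse('xmltest.xml')
root = tree.getroot()

tag = root.tag                               # 'language'
attrib = root.attrib                         # {'name': 'python'}
text = root.text                             # None: the root has no text of its own

direction1 = root.find('direction')          # first matching direct child
direction2 = root[1]                         # children can be accessed by index
directions = root.findall('.//direction')    # XPath path: matches at any depth

for direction in root.findall('direction'):  # direct children only
    print(direction.text)
for direction in root.iter('direction'):     # whole subtree, recursively
    print(direction.text)

root.remove(direction2)                      # remove a direct child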

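When reading XML, ET.parse() reads the file and returns an ElementTree. Alternatively, ElementTree() creates an empty tree, and its parse() method loads the file into that tree. getroot() returns the root node, whose child nodes can be accessed by index. On any node, tag gives the node name, attrib the attribute dictionary, and text the node text. find() returns the first node with a matching tag, and findall() returns all of them. By default both look only at the direct children of the current node. Both also accept XPath-style paths such as './/direction' to search deeper; ElementTree supports only a limited subset of XPath. iter() creates a tree iterator that walks the current node and all of its descendants, yielding every node whose tag matches. remove() removes a child node.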
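This sketch contrasts direct-child searches with recursive ones on a small nested document parsed from a string with ET.fromstring(). The <group> element is hypothetical and is not part of xmltest.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested document: the 'AI' <direction> is a grandchild of the root
doc = ET.fromstring(
    '<language name="python">'
    '<direction>Web</direction>'
    '<group><direction>AI</direction></group>'
    '</language>'
)

direct = [d.text for d in doc.findall('direction')]        # direct children only
xpath_any = [d.text for d in doc.findall('.//direction')]  # any depth via XPath
walked = [d.text for d in doc.iter('direction')]           # recursive tree iterator

print(direct)     # ['Web']
print(xpath_any)  # ['Web', 'AI']
print(walked)     # ['Web', 'AI']
```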

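Besides ElementTree, XML can also be parsed and built with xml.sax and xml.dom.minidom. SAX is event-driven: as it streams through the document, it reports start tags, text, and end tags to handler callbacks. DOM parses the whole document into an in-memory tree, and you work with the XML by manipulating that tree. ElementTree is a lightweight DOM-like API. It is simple, efficient, fast, and uses less memory than a full DOM, but it does not pretty-print its output, so the output has to be formatted separately. Since Python 3.9, ET.indent() can do this.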
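One way to handle formatting is to pass ElementTree's compact output through xml.dom.minidom and its toprettyxml() method. This sketch assumes the document is small enough to re-parse cheaply. On Python 3.9 and later, ET.indent() is a simpler alternative.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

root = ET.Element('language', name='python')
for text in ['Web', 'AI']:
    ET.SubElement(root, 'direction').text = text

# ElementTree serializes everything on a single line
compact = ET.tostring(root, encoding='unicode')

# Re-parse with minidom (the DOM API mentioned above) and let it indent the tree
pretty = xml.dom.minidom.parseString(compact).toprettyxml(indent='  ')
print(pretty)
# <?xml version="1.0" ?>
# <language name="python">
#   <direction>Web</direction>
#   <direction>AI</direction>
# </language>
```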

Parsing and Building JSON in Python

The standard library's json module provides the functions dumps() and loads() for basic conversion between Python objects and JSON strings.

>>> import json
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
'["foo", {"bar": ["baz", null, 1.0, 2]}]'
>>> json.loads('["foo", {"bar": ["baz", null, 1.0, 2]}]')
['foo', {'bar': ['baz', None, 1.0, 2]}]

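json.dumps() serializes obj to a JSON-formatted str, and json.loads() does the reverse. dumps() accepts these parameters:

- ensure_ascii controls whether non-ASCII characters are escaped as \uXXXX sequences. It defaults to True.
- separators, a tuple (item_separator, key_separator), sets the two separators used in the output. For example, separators=(',', ':') gives the most compact form.
- sort_keys controls whether dictionary keys are sorted. It defaults to False.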
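This sketch compares the three dumps() parameters side by side. The dictionary is made-up sample data that contains non-ASCII text so the effect of ensure_ascii is visible.

```python
import json

# Hypothetical sample data with a non-ASCII value and unsorted keys
data = {'name': '数据', 'b': 1, 'a': [1, 2]}

default = json.dumps(data)                                   # ASCII-only, ', ' and ': '
compact = json.dumps(data, separators=(',', ':'))            # no spaces at all
readable = json.dumps(data, ensure_ascii=False, sort_keys=True)

print(default)   # {"name": "\u6570\u636e", "b": 1, "a": [1, 2]}
print(compact)   # {"name":"\u6570\u636e","b":1,"a":[1,2]}
print(readable)  # {"a": [1, 2], "b": 1, "name": "数据"}
```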

The json module also provides the functions dump() and load() for writing and reading JSON files.

import json

with open('jsontest.json', 'w') as jsonfile:
    json.dump(['foo', {'bar': ('baz', None, 1.0, 2)}], jsonfile)  # serialize into the file
with open('jsontest.json') as jsonfile:
    data = json.load(jsonfile)                                     # parse the file contents

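They do the same job as dumps() and loads(), but they operate on file objects. dump() takes the open file as an extra argument, and load() reads from a file object instead of a string.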
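Here is a round-trip sketch with dump() and load(), using a temporary file whose path is an assumption for the example. It also shows that a tuple does not come back unchanged.

```python
import json
import os
import tempfile

obj = ['foo', {'bar': ('baz', None, 1.0, 2)}]
path = os.path.join(tempfile.mkdtemp(), 'jsontest.json')

# dump() writes to an open text file instead of returning a str
with open(path, 'w', encoding='utf-8') as jsonfile:
    json.dump(obj, jsonfile)

# load() reads the whole file object and parses it
with open(path, encoding='utf-8') as jsonfile:
    loaded = json.load(jsonfile)

print(loaded)         # ['foo', {'bar': ['baz', None, 1.0, 2]}]
print(loaded == obj)  # False: JSON has no tuple type, so the tuple comes back as a list
```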

Parsing and Building Excel in Python

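To write and read Excel data, install the third-party packages xlwt and xlrd with pip. Both handle only the legacy .xls format. xlwt cannot write .xlsx files, and xlrd dropped .xlsx support in version 2.0, so .xlsx files need a different library such as openpyxl.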

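import xlwt

wbook = xlwt.Workbook(encoding='utf-8')  # create a new workbook
wsheet = wbook.add_sheet('sheet1')       # add a worksheet
wsheet.write(0, 0, 'HelloWorld')         # row 0, column 0 (cell A1)
wbook.save('exceltest.xls')              # save the workbook as .xls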

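To write Excel data, xlwt.Workbook() creates a workbook, with the encoding parameter setting its text encoding. add_sheet() adds a worksheet, and write() writes a value into the cell at the given row and column (both 0-based). save() saves the workbook.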

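import xlrd

rbook = xlrd.open_workbook('exceltest.xls')  # open the workbook
rsheet = rbook.sheets()[0]                   # first worksheet, by list index
# rsheet = rbook.sheet_by_index(0)           # ... or by position
# rsheet = rbook.sheet_by_name('sheet1')     # ... or by name
nr = rsheet.nrows                            # number of rows
nc = rsheet.ncols                            # number of columns
rv = rsheet.row_values(0)                    # values of the first row
cv = rsheet.col_values(0)                    # values of the first column
cell = rsheet.cell_value(0, 0)               # value of cell A1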

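To read Excel data, xlrd.open_workbook() opens a workbook. A worksheet can be obtained in three ways: by list index into sheets(), by position with sheet_by_index(), or by name with sheet_by_name(). nrows and ncols give the number of rows and columns. row_values() and col_values() return the values of a row or column as a list, and cell_value() returns the value of the cell at the given row and column.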
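Putting the two libraries together, a minimal round-trip sketch might look like the following. It assumes xlwt and xlrd are installed, uses hypothetical sample data, and writes to a temporary .xls file.

```python
import os
import tempfile

import xlrd  # third-party: pip install xlrd
import xlwt  # third-party: pip install xlwt

path = os.path.join(tempfile.mkdtemp(), 'exceltest.xls')

# Write a 2x2 table of hypothetical data; (row, col) indices are 0-based
wbook = xlwt.Workbook(encoding='utf-8')
wsheet = wbook.add_sheet('sheet1')
for r, (name, score) in enumerate([('alice', 90), ('bob', 85)]):
    wsheet.write(r, 0, name)
    wsheet.write(r, 1, score)
wbook.save(path)

# Read it back; xlrd returns numeric cells as float
rbook = xlrd.open_workbook(path)
rsheet = rbook.sheet_by_name('sheet1')
print(rsheet.nrows, rsheet.ncols)  # 2 2
print(rsheet.row_values(0))        # ['alice', 90.0]
print(rsheet.col_values(0))        # ['alice', 'bob']
print(rsheet.cell_value(1, 1))     # 85.0
```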
