【Question title】: writer.writerow parsing XML to CSV
【Posted】: 2015-05-26 19:03:58
【Question】:

As in my previous post, I am trying to parse an XML file. The file is:

<?xml version="1.0" ?>

<library>
 <book>
  <title>Sandman Volume 1: Preludes and Nocturnes</title>
  <author>Neil Gaiman</author>
 </book>
 <book>
  <title>Good Omens</title>
  <author>Neil Gamain</author>
  <author>Terry Pratchett</author>
  </book>
 <book>
  <title>All the Lovely Things</title>
  <author>James Daniel Wise</author>
 </book>
 <book>
  <title>Beginning Python</title>
  <author>Peter Norton, et al</author>
 </book>
</library>

My Python script is:

from xml.dom.minidom import parse
import xml.dom.minidom
import csv

def writeToCSV(myLibrary):
    with open('csvout.csv', 'wb') as csvfile:
        writer = csv.writer(csvfile, delimiter = ',')
        writer.writerow(['title', 'author', 'author'])
        books = myLibrary.getElementsByTagName("book")
        for book in books:
            titleValue = book.getElementsByTagName("title")[0].childNodes[0].data
            authors =[]
            for author in book.getElementsByTagName("author"):
                authors.append(author.childNodes[0].data)   
            writer.writerow([titleValue] + authors)

doc = parse('library.xml')
myLibrary = doc.getElementsByTagName("library")[0]

# Call main function
writeToCSV(myLibrary)

This gives me the following output:

title,author,author
Sandman Volume 1: Preludes and Nocturnes,Neil Gaiman
Good Omens,Neil Gamain,Terry Pratchett
All the Lovely Things,James Daniel Wise
Beginning Python,"Peter Norton, et al"

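First, I'd like to know why it puts quotes around "Peter Norton, et al" in the last line, and how to get rid of them! Putting QUOTE_NONE in my code prevents this line from being returned.

Also, I'd like to add another column header, "key". I want it populated with sequential numbers, to give me this output: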

key,title,author,author
1,Sandman Volume 1: Preludes and Nocturnes,Neil Gaiman
2,Good Omens,Neil Gamain,Terry Pratchett
3,All the Lovely Things,James Daniel Wise
4,Beginning Python,Peter Norton, et al

I have tried various things, such as setting a "key" variable = 0 and then doing key=+1 inside the loop in my def, but it doesn't work.

【Question comments】:

    Tags: python xml python-2.7 csv


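    【Solution 1】:

    First, try this (it is probably the program you were expecting). You can import count from itertools and use it as a counter to generate the key: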

    from xml.dom.minidom import parse
    import xml.dom.minidom
    import csv
    from itertools import count
    c = count(1) #starts from 1
    def writeToCSV(myLibrary):
        with open('csvout.csv', 'wb') as csvfile:
            writer = csv.writer(csvfile, delimiter = ',')
            writer.writerow(['key', 'title', 'author', 'author']) #added key
            books = myLibrary.getElementsByTagName("book")
            for book in books:
                titleValue = book.getElementsByTagName("title")[0].childNodes[0].data
                authors =[]
                for author in book.getElementsByTagName("author"):
                    authors.append(author.childNodes[0].data) 
                # next(c) works on Python 2.6+ and Python 3; c.next() is Python 2 only
                writer.writerow([next(c)] + [titleValue] + authors)
    
    doc = parse('library.xml')
    myLibrary = doc.getElementsByTagName("library")[0]

    # Call main function
    writeToCSV(myLibrary)
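As a side note on the counter: `key=+1` in the question assigns the value +1 on every pass rather than incrementing, which is written `key += 1`. Also, `writerow` accepts ints and converts them to text itself. The following is only a sketch of an alternative that numbers the rows with the built-in `enumerate` instead of `itertools.count`. It assumes Python 3, a trimmed two-book copy of the question's XML held in a string, and an in-memory buffer instead of `csvout.csv`; the names `XML` and `library_to_csv` are made up for illustration.

```python
import csv
import io
from xml.dom.minidom import parseString

# Assumption: a trimmed two-book copy of library.xml from the question
XML = """<?xml version="1.0" ?>
<library>
 <book>
  <title>Good Omens</title>
  <author>Neil Gamain</author>
  <author>Terry Pratchett</author>
 </book>
 <book>
  <title>Beginning Python</title>
  <author>Peter Norton, et al</author>
 </book>
</library>"""


def library_to_csv(xml_text):
    """Return the CSV text for a <library> document, with a 1-based key column."""
    library = parseString(xml_text).getElementsByTagName("library")[0]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["key", "title", "author", "author"])
    # enumerate(..., start=1) yields (1, first book), (2, second book), ...
    for key, book in enumerate(library.getElementsByTagName("book"), start=1):
        title = book.getElementsByTagName("title")[0].childNodes[0].data
        authors = [a.childNodes[0].data for a in book.getElementsByTagName("author")]
        # key is an int; csv.writer converts it to text when writing
        writer.writerow([key, title] + authors)
    return out.getvalue()


csv_text = library_to_csv(XML)
print(csv_text)
```

The last row still comes out with its author field in quotes, for the reason explained in the answer.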
    

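    So why are there quotes around "Peter Norton, et al" in the last line?

    Because you are using delimiter=',' and that name itself contains a comma. The writer wraps the field in double quotes so that it is read back as a single string in the CSV file rather than as two columns.

    【Comments】:

    • Genius, sir - thank you very much. I had no idea about the itertools library, and writerow didn't seem to like me adding an int to its arguments. This solved my problem - thank you!!! James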
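The quoting behaviour can be checked with a small experiment. This is a sketch assuming Python 3 and an in-memory buffer; the helper `render` is a made-up name, not part of the csv module. It shows the default quoting, what happens with `csv.QUOTE_NONE` with and without an `escapechar`, and a tab delimiter as one possible way to avoid the quotes.

```python
import csv
import io


def render(rows, **writer_kwargs):
    """Write rows with csv.writer into a string buffer and return the text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n", **writer_kwargs)
    writer.writerows(rows)
    return buf.getvalue()


rows = [["Beginning Python", "Peter Norton, et al"]]

# Default (QUOTE_MINIMAL): only fields containing the delimiter get quoted
default_out = render(rows)

# QUOTE_NONE without an escapechar cannot represent the comma and raises csv.Error
try:
    render(rows, quoting=csv.QUOTE_NONE)
    quote_none_error = None
except csv.Error as exc:
    quote_none_error = str(exc)

# QUOTE_NONE with an escapechar writes a backslash before the comma instead
escaped_out = render(rows, quoting=csv.QUOTE_NONE, escapechar="\\")

# With a tab delimiter the comma is ordinary data, so no quoting is needed
tab_out = render(rows, delimiter="\t")

# A CSV reader removes the quotes again, so the data itself is unchanged
parsed = next(csv.reader(io.StringIO(default_out)))

print(default_out, escaped_out, tab_out, parsed, quote_none_error, sep="\n")
```

Note that the unquoted row shown in the question's desired output would have five columns when read back, which is why the writer quotes the field by default.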