Modify URL strings in CSV file - Output file contains each character in an individual cell

【Question title】: Modify URL-strings in CSV file - Output file contains each character in individual cell
【Posted】: 2014-05-11 19:50:46
【Question】:

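I am trying to write a function that lets me remove certain elements from URLs. The URLs are stored in a CSV named Backlink_Test. I want to iterate over each item in that list of URLs, remove the unwanted elements from the URL, add the modified URL to a new list, and then store that list in a new CSV named Cleaned_URLs.

The code works to the extent that I can open the source file, run the loop, and store the result in the target file. However, I have run into a very annoying problem: in the target file, each URL is stored with every character in a separate cell instead of the whole URL in one cell.

This surprised me, because I ran a small test in which I copied the contents of one CSV into another (without modifying anything), and multi-character words were stored just fine. So I suspect the for loop is causing the problem?

Any help/insight would be much appreciated! The code is below, and a screenshot of the target file is attached.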

import csv

new_strings = []    

#replace unwanted elements and add cleaned strings to new list
with open("Backlink_Test.csv", "rb") as csvfile:
    reader = csv.reader(csvfile)
    for string in reader:
        string = str(string) 
        string = string.replace("www.", "").replace("http://", "").replace("https://", "")
        new_strings.append(string)

new_strings.sort()
print new_strings #for testing only; will be removed once function is working

cleaned_file = open("Cleaned_URLS.csv", "w")
writer = csv.writer(cleaned_file)
writer.writerows(new_strings)
cleaned_file.close()

Here is the working code now:

import csv

new_strings = []    

#replace unwanted elements and add cleaned strings to new list
with open("Backlink_Test.csv", "rb") as csvfile:
    reader = csv.reader(csvfile)
    for string in reader:
        string = str(string) 
        string = string.replace("www.", "").replace("http://", "").replace("https://", "")
        new_strings.append(string)

new_strings.sort()
print new_strings

cleaned_file = open("Cleaned_URLS.csv", "w")
writer = csv.writer(cleaned_file)
for url in new_strings:
    writer.writerow([url])

cleaned_file.close()

【Question comments】:

Tags: python url csv

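【Solution 1】:

csvwriter.writerows expects an iterable of rows. A row is an iterable of cells.

You are feeding it a list of strings. Since a string is itself an iterable of characters, each character in your example is treated as a cell, which is exactly what gets written.

What you did wrong is assume that csv.reader yields strings. It yields rows, and each row is a list of cells.

Update: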

for url in urls:
    writer.writerow([url])
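
To make the difference concrete, here is a minimal sketch written for Python 3 (the question's code is Python 2). It writes to an in-memory buffer instead of a file; the helper write_rows and the sample URLs are hypothetical and exist only for illustration.

```python
import csv
import io


def write_rows(rows):
    """Write rows to an in-memory CSV and return the resulting text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


urls = ["a.com", "b.org"]  # hypothetical sample data

# Each string is taken as a row, so every character becomes its own cell.
split_output = write_rows(urls)

# Wrapping each URL in a list makes it a row with exactly one cell.
whole_output = write_rows([[url] for url in urls])

print(split_output)
print(whole_output)
```

The first call reproduces the one-character-per-cell output from the question; the second is equivalent to calling writer.writerow([url]) in a loop.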

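【Comments】:

Thanks, I see what you mean. What would you use instead of csv.reader then?
I would probably go with something like that. It looks like you don't really have a CSV file, just a single column, but it illustrates the correct use of the API.
Thanks a lot, that works great! The last thing I need to clean up is that all the URLs in the target file are now wrapped in [], but I assume there is no way around that?
Of course there is a way around that. Just don't do string = str(string): it flattens the CSV row (a list) into its string representation. Consider: str([1, 2, 3]) => "[1, 2, 3]".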
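
Putting the advice from this thread together, one possible version of the whole clean-up could look like the sketch below. It is written for Python 3, assumes the input has a single column, and uses in-memory buffers with made-up URLs in place of the real files; the names clean_url and clean_csv are hypothetical.

```python
import csv
import io


def clean_url(url):
    """Remove the same substrings that the question's code removes."""
    return url.replace("www.", "").replace("http://", "").replace("https://", "")


def clean_csv(source, target):
    """Read one-column rows from source and write cleaned, sorted URLs to target."""
    reader = csv.reader(source)
    # Each row is a list of cells: take the first cell instead of calling str(row).
    cleaned = sorted(clean_url(row[0]) for row in reader if row)
    writer = csv.writer(target, lineterminator="\n")
    # Wrap every URL in a list so it is written as a single cell.
    writer.writerows([url] for url in cleaned)


# Hypothetical sample input standing in for Backlink_Test.csv
source = io.StringIO("http://www.example.com\nhttps://abc.org\n")
target = io.StringIO()
clean_csv(source, target)
print(target.getvalue())
```

With real files in Python 3, both would be opened in text mode with newline="" (for example open("Cleaned_URLS.csv", "w", newline="")), as the csv module documentation recommends.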

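【Solution 2】:

That is what Python does when you loop over a string instead of a list. Check the return value of csv.reader() and adjust your code accordingly. In particular, string = str(string) is flattening your input.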
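
A quick way to see both effects is to inspect the values directly. The snippet below is only an illustrative sketch in Python 3; the row contents are made up.

```python
# A row as csv.reader would yield it: a list with one cell (made-up value).
row = ["http://example.com"]

# str() turns the list into its printed representation, brackets and quotes included.
flattened = str(row)

# Iterating over a string yields its characters one at a time.
characters = list("a.com")

print(flattened)
print(characters)
```

Indexing the row (row[0]) keeps the cell as a plain string, which avoids the brackets and quotes that str(row) adds.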

【Comments】:

I see, that makes sense. Could you tell me how I should adjust the code to read the input correctly and get the output as URL strings? Thanks!