【Question title】: read_csv from URL in pandas
【Posted】: 2020-11-12 14:08:20
【Question description】:

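I am reading CSV files from URLs and appending all of them into a single CSV. The final CSV does not contain the data from https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-regioni/dpc-covid19-ita-regioni-20201023.csv through

https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-regioni/dpc-covid19-ita-regioni-20201028.csv

All the other files are fine. I have tried everything and all the CSV files look fine, so I don't understand why these ones (from ...20201023.csv to ...20201028.csv) are a problem. If I read them individually they work, so the problem seems to appear at `pd.concat`.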
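A quick way to test whether `pd.concat` is really losing rows is to compare the row count of the combined frame with the sum of the row counts of the inputs. The sketch below uses two tiny made-up frames in place of the downloaded files (the column names `data` and `stato` come from the real CSVs; the values are invented):

```python
import pandas as pd

# Hypothetical stand-ins for two of the downloaded daily files.
day1 = pd.DataFrame({"data": ["2020-10-23T17:00:00"] * 3, "stato": ["ITA"] * 3})
day2 = pd.DataFrame({"data": ["2020-10-24T17:00:00"] * 3, "stato": ["ITA"] * 3})
ds = [day1, day2]

frame = pd.concat(ds, axis=0, ignore_index=True)

# pd.concat stacks every row of every input frame; nothing is filtered out.
expected_rows = sum(len(d) for d in ds)
print(len(frame), expected_rows)  # 6 6

# Every source date should still be present in the combined frame.
print(sorted(frame["data"].unique()))
```

If the two counts also match on the real downloaded data, all rows made it into the combined frame.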

Can you help?

```python
import io
from datetime import date, datetime, timedelta

import matplotlib.pyplot as plt
import pandas as pd
import requests

# url = 'https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-regioni/dpc-covid19-ita-regioni-20200224.csv'

# Build one URL per day, from yesterday back to begin_date (newest first).
begin_date = date(2020, 10, 23)
n = (date.today() - begin_date).days
url_path_base = 'https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-regioni/dpc-covid19-ita-regioni-'
data_vec = []
urls = []
for x in range(n):
    el = datetime.today() - timedelta(x + 1)
    data_vec.append(el.strftime('%Y%m%d'))
    url = url_path_base + el.strftime('%Y%m%d') + '.csv'
    urls.append(url)

# Download each daily file and parse it into a DataFrame.
ds = []
# print(urls)
for f in urls:
    s = requests.get(f).content
    ds.append(pd.read_csv(io.StringIO(s.decode('utf-8'))))

# Stack all daily frames into one.
frame = pd.concat(ds, axis=0, ignore_index=True)
frame.set_index("data")
```
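For comparison, here is a more compact sketch of the same download loop. It assumes the file naming scheme shown in the question holds for every day, and it relies on `pd.read_csv` accepting an http(s) URL directly, so `requests` and `io` are not needed. `build_urls` and `load_all` are hypothetical helper names, and a fixed end date is used so the URL list is reproducible:

```python
from datetime import date, timedelta

import pandas as pd

# Same base URL as in the question.
URL_BASE = (
    "https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/"
    "dati-regioni/dpc-covid19-ita-regioni-"
)


def build_urls(begin: date, end: date) -> list[str]:
    """Hypothetical helper: one daily CSV URL per day from begin to end, inclusive, oldest first."""
    n_days = (end - begin).days + 1
    days = [begin + timedelta(days=i) for i in range(n_days)]
    return [URL_BASE + d.strftime("%Y%m%d") + ".csv" for d in days]


def load_all(urls: list[str]) -> pd.DataFrame:
    """Hypothetical helper: read every CSV straight from its URL and stack the rows."""
    frames = [pd.read_csv(u) for u in urls]
    combined = pd.concat(frames, ignore_index=True)
    # set_index returns a new DataFrame, so return that result rather than discarding it.
    return combined.set_index("data")


# Fixed range so the example is reproducible; only the URL list is built here,
# nothing is downloaded.
urls = build_urls(date(2020, 10, 23), date(2020, 10, 28))
print(len(urls))  # 6
print(urls[0])
```

On real data, `frame = load_all(build_urls(date(2020, 10, 23), date.today() - timedelta(days=1)))` would cover the same range as the question's loop (oldest file first rather than newest first). It downloads one file per day, and `pd.read_csv` will raise an error if any daily file is missing.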

【Question discussion】:

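  • Welcome to Stack Overflow. It would help if you provided more details about the error message you receive, or about what looks wrong in the resulting DataFrame. Also, you may want to try reducing the problem to a small example with a small DataFrame, as detailed here and here.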

Tags: pandas csv


【Solution 1】:
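Your code works for me as written, except that `frame.set_index("data")` should be `frame.set_index("data", inplace=True)` (or, equivalently, `frame = frame.set_index("data")`). By default `set_index` returns a new DataFrame and leaves `frame` itself unchanged, so without `inplace=True` the result is simply discarded.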
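The behaviour behind this fix can be seen on a tiny invented frame: `set_index` returns a new DataFrame unless `inplace=True` is passed, in which case it modifies the frame itself and returns `None`. A minimal sketch:

```python
import pandas as pd

# Tiny invented frame with the same "data" column name as the real files.
frame = pd.DataFrame(
    {"data": ["2020-10-23T17:00:00", "2020-10-24T17:00:00"], "stato": ["ITA", "ITA"]}
)

# Without inplace, set_index returns a new DataFrame and leaves `frame` as it was.
result = frame.set_index("data")
print(frame.index.tolist())   # [0, 1]
print(result.index.tolist())  # ['2020-10-23T17:00:00', '2020-10-24T17:00:00']

# Option 1: keep the returned frame.
by_date = frame.set_index("data")

# Option 2: modify a frame in place; the call then returns None.
in_place = frame.copy()
returned = in_place.set_index("data", inplace=True)
print(returned is None)          # True
print(by_date.equals(in_place))  # True
```

Both options end up with the same indexed frame; reassigning the return value is often preferred because it also works in method chains.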

```
# 21 data points per file
In [25]: frame['stato'].groupby(frame.index).count()
Out[25]:
data
2020-10-23T17:00:00    21
2020-10-24T17:00:00    21
2020-10-25T17:00:00    21
2020-10-26T17:00:00    21
2020-10-27T17:00:00    21
2020-10-28T17:00:00    21
2020-10-29T17:00:00    21
2020-10-30T17:00:00    21
2020-10-31T17:00:00    21
2020-11-01T17:00:00    21
2020-11-02T17:00:00    21
2020-11-03T17:00:00    21
2020-11-04T17:00:00    21
2020-11-05T17:00:00    21
2020-11-06T17:00:00    21
2020-11-07T17:00:00    21
2020-11-08T17:00:00    21
2020-11-09T17:00:00    21
2020-11-10T17:00:00    21
2020-11-11T17:00:00    21
Name: stato, dtype: int64
```
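A per-date count like this shows which dates are present, but a date that produced no rows at all simply does not appear in it. To catch that case, the counts can be compared against the list of dates that should have been loaded. A sketch on a small synthetic frame (21 rows per date, as above; the dates and the deliberately absent 2020-10-26 entry are invented for illustration):

```python
import pandas as pd

# Synthetic stand-in for the combined frame: 21 rows for each of three dates,
# indexed by the "data" column as in the answer.
loaded_dates = ["2020-10-23T17:00:00", "2020-10-24T17:00:00", "2020-10-25T17:00:00"]
frame = pd.DataFrame(
    {"data": [d for d in loaded_dates for _ in range(21)], "stato": "ITA"}
).set_index("data")

# Rows per date; like frame['stato'].groupby(frame.index).count(),
# except that size() also counts rows where 'stato' is NaN.
per_date = frame.groupby(level=0).size()
print(per_date)

# Dates that should have been loaded (hypothetical list) but have no rows.
expected = pd.Index(loaded_dates + ["2020-10-26T17:00:00"])
missing = expected.difference(per_date.index)
print(list(missing))  # ['2020-10-26T17:00:00']
```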

【Discussion】:

  • Thanks a lot, so the only difference is `inplace=True`.