【Question Title】: Convert a list of data from url to csv in python
【Posted】: 2018-03-06 17:22:54
【Question】:

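I am trying to convert this Wisconsin breast cancer dataset from a list into a DataFrame with columns.

This is the dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data

These are the column names: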

   #  Attribute                     Domain
   -- -----------------------------------------
   1. Sample code number            id number
   2. Clump Thickness               1 - 10
   3. Uniformity of Cell Size       1 - 10
   4. Uniformity of Cell Shape      1 - 10
   5. Marginal Adhesion             1 - 10
   6. Single Epithelial Cell Size   1 - 10
   7. Bare Nuclei                   1 - 10
   8. Bland Chromatin               1 - 10
   9. Normal Nucleoli               1 - 10
  10. Mitoses                       1 - 10
  11. Class:                        (2 for benign, 4 for malignant)

This is how I imported the dataset into Python:

import requests

link = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"
f = requests.get(link)

print(f.text)

and I see the data as a comma-separated list:

1000025,5,1,1,1,2,1,3,1,1,2
1002945,5,4,4,5,7,10,3,2,1,2
1015425,3,1,1,1,2,2,3,1,1,2
1016277,6,8,8,1,3,4,3,7,1,2
1017023,4,1,1,3,2,1,3,1,1,2

I need to split the comma-separated values into columns and add names to the columns.

I tried this, but it did not work:

import requests
import pandas as pd
import io

urlData = requests.get(f.text).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')))

【Comments】:

Tags: python list csv dataframe


【Solution 1】:

The following worked for me:

import pandas as pd
import requests
link = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"
f = requests.get(link)
# separate each line
newf = f.text.splitlines()
# create pandas dataframe
df = pd.DataFrame([x.split(",") for x in newf])
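The DataFrame built this way has default integer column labels and holds every value as a string. A minimal sketch of one way to add the column names and convert the values to numbers follows. It uses a few sample rows from the question in place of `f.text`, and the `names` list is an assumption taken from the attribute table in the question.

```python
import pandas as pd

# Sample rows copied from the question; in practice this would be f.text.
sample_text = """1000025,5,1,1,1,2,1,3,1,1,2
1002945,5,4,4,5,7,10,3,2,1,2
1015425,3,1,1,1,2,2,3,1,1,2"""

# Column names taken from the attribute table in the question.
names = [
    "Sample code number",
    "Clump Thickness",
    "Uniformity of Cell Size",
    "Uniformity of Cell Shape",
    "Marginal Adhesion",
    "Single Epithelial Cell Size",
    "Bare Nuclei",
    "Bland Chromatin",
    "Normal Nucleoli",
    "Mitoses",
    "Class",
]

# Split each non-empty line on commas and label the resulting columns.
rows = [line.split(",") for line in sample_text.splitlines() if line]
df = pd.DataFrame(rows, columns=names)

# split() yields strings; errors="coerce" turns non-numeric entries into NaN.
df = df.apply(pd.to_numeric, errors="coerce")
print(df.dtypes)
```

After the conversion, `df.to_csv("data.csv", index=False)` would write the named columns to a CSV file (the file name here is only a placeholder).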

【Comments】:

【Solution 2】:

This will do it:

import requests

# Column names for the header row
headers = ['sample', 'Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape', 'Marginal Adhesion', 'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli', 'Mitoses', 'Class']
r = requests.get("http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data").text

# Write the header row, then the raw comma-separated data
with open('c:\\users\\user\\desktop\\data.csv', 'w') as csvFile:
    # Join the names with commas to form the header line
    csvFile.write(','.join(headers) + "\n")
    csvFile.write(r)
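As an alternative to building the header line by hand, the standard `csv` module can write it. The sketch below is only an illustration: it writes to an in-memory buffer instead of a file, uses two sample rows from the question in place of the downloaded text, and takes its `header` names from the attribute table in the question.

```python
import csv
import io

# Column names taken from the attribute table in the question.
header = [
    "Sample code number",
    "Clump Thickness",
    "Uniformity of Cell Size",
    "Uniformity of Cell Shape",
    "Marginal Adhesion",
    "Single Epithelial Cell Size",
    "Bare Nuclei",
    "Bland Chromatin",
    "Normal Nucleoli",
    "Mitoses",
    "Class",
]

# Two sample rows from the question; in practice this would be the response text.
raw_text = "1000025,5,1,1,1,2,1,3,1,1,2\n1002945,5,4,4,5,7,10,3,2,1,2\n"

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)   # header row, comma-separated with no stray spaces
buffer.write(raw_text)    # the data is already comma-separated, so append it as-is

csv_text = buffer.getvalue()
print(csv_text.splitlines()[0])
```

To save to disk instead, the same writer can be given a file opened with `open(path, "w", newline="")`.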
    

【Comments】:

【Solution 3】:

I'm sure there is a better way to do this, but... I sent the output to a CSV file with a static header line. Since the data is already comma-separated, I figured this would be the simplest way.

import requests

def main():
    outputFile = 'someName.csv'
    link = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"
    f = requests.get(link)
    headerLine = ("Sample code number(id number),Clump Thickness(1 - 10),Uniformity of Cell Size(1 - 10),Uniformity of Cell Shape(1 - 10),Marginal Adhesion(1 - 10),Single Epithelial Cell Size(1 - 10),Bare Nuclei(1 - 10),Bland Chromatin(1 - 10),Normal Nucleoli(1 - 10),Mitoses(1 - 10),Class:(2 for benign - 4 for malignant)")
    data = f.text
    # Write the static header line followed by the downloaded data
    with open(outputFile, "w") as ofile:
        ofile.write(headerLine + '\n')
        ofile.write(data)
    print("Success")

if __name__ == '__main__':
    main()
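A static header line only works if it has exactly as many comma-separated fields as each data row. That is presumably why the Class description above uses " - " where the attribute table has a comma. A small sketch of a sanity check that could run before writing the file is shown below. `field_count` is a hypothetical helper, not part of the answer, and the sample row is copied from the question.

```python
def field_count(line: str) -> int:
    """Return the number of comma-separated fields in a line."""
    return len(line.split(","))

header_line = ("Sample code number(id number),Clump Thickness(1 - 10),"
               "Uniformity of Cell Size(1 - 10),Uniformity of Cell Shape(1 - 10),"
               "Marginal Adhesion(1 - 10),Single Epithelial Cell Size(1 - 10),"
               "Bare Nuclei(1 - 10),Bland Chromatin(1 - 10),Normal Nucleoli(1 - 10),"
               "Mitoses(1 - 10),Class:(2 for benign - 4 for malignant)")

# First data row from the question.
first_row = "1000025,5,1,1,1,2,1,3,1,1,2"

# Refuse to continue if the header and the data disagree on the column count.
if field_count(header_line) != field_count(first_row):
    raise ValueError("header and data have different numbers of columns")

print(field_count(header_line), "columns")
```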
      

【Comments】:

【Solution 4】:

import requests
import pandas as pd
import io

names = ['Sample code number',
         'Clump Thickness',
         'Uniformity of Cell Size',
         'Uniformity of Cell Shape',
         'Marginal Adhesion',
         'Single Epithelial Cell Size',
         'Bare Nuclei',
         'Bland Chromatin',
         'Normal Nucleoli',
         'Mitoses',
         'Class']

link = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"
csv_text = requests.get(link).text
# if you don't care about column names, omit names=names and pass header=None instead
df = pd.read_csv(io.StringIO(csv_text), names=names)
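One thing to watch for: as far as I recall, this dataset marks missing values with "?", which would make pandas read the affected column as strings. The sketch below shows how `na_values` could handle that, assuming that convention holds. It runs on an in-memory sample instead of the downloaded text, and the second sample row is made up for illustration.

```python
import io

import pandas as pd

names = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
         'Uniformity of Cell Shape', 'Marginal Adhesion',
         'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
         'Normal Nucleoli', 'Mitoses', 'Class']

# First row is from the question; the second is a hypothetical row with a
# "?" placeholder in the Bare Nuclei position.
sample_text = ("1000025,5,1,1,1,2,1,3,1,1,2\n"
               "9999999,8,4,5,1,2,?,7,3,1,4\n")

# na_values="?" tells read_csv to treat "?" as a missing value (NaN),
# so the column is parsed as numeric instead of as strings.
df = pd.read_csv(io.StringIO(sample_text), names=names, na_values="?")
print(df["Bare Nuclei"])
```

Rows with missing values can then be dropped with `df.dropna()` or filled with `df.fillna(...)`, depending on what the analysis needs.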
        

【Comments】:
