【Question title】:Made A Table In Python Beautiful Soup, How Do I Write It Neatly Into CSV?
【Posted】:2021-07-28 06:37:22
【Question】:

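I'm fairly new to Python and web scraping, but I managed to get a table that prints nicely. I'm just curious how to save this table to a CSV file in exactly the same format as the print statements. Any explanation of the logic would be greatly appreciated and very helpful! My code is below...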

from bs4 import BeautifulSoup
import requests
import time

htmlText = requests.get('https://www.fangraphs.com/teams/mariners/stats').text
soup = BeautifulSoup(htmlText, 'lxml', )
playerTable = soup.find('div', class_='team-stats-table')


def BattingStats():
    headers = [th.text for th in playerTable.find_all("th")]
    fmt_string = " ".join(["{:<25}", *["{:<6}"] * (len(headers) - 1)])

    print(fmt_string.format(*headers))
    for tr in playerTable.find_all("tr")[1:55]:
        tds = [td.text for td in tr.select("td")]
        with open('MarinersBattingStats.csv', 'w') as f:
            f.write(fmt_string.format(*tds))
            print(fmt_string.format(*tds))

if __name__ == 'main':
    while True:
        BattingStats()
        timeWait = 100
        time.sleep(432 * timeWait)





BattingStats()

【Comments】:

    Tags: python html web-scraping beautifulsoup


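    【Solution 1】:

    To begin, import the built-in csv module, which makes CSV files much easier to work with. The steps are:

    First, open the CSV file for writing (w mode) with the open() function.

    Second, create a CSV writer object by calling the csv module's writer() function.

    Third, write the data to the CSV file by calling the writer object's writerow() or writerows() method.

    Finally, close the file once all the data has been written (a with block does this automatically).

    For more information, see this link. Translated into code, the steps above look like this: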

    import csv
    
    ...
    
    def BattingStats():
        # Strip stray whitespace from the header cells
        headers = [th.text.strip() for th in playerTable.find_all("th")]

        # Open the file once, outside the loop; newline='' lets csv control line endings
        with open('MarinersBattingStats.csv', 'w', encoding='UTF8', newline='') as f:
            writer = csv.writer(f)

            # Write the header list directly; formatting it and then calling
            # split() would break any header that contains a space
            writer.writerow(headers)

            for tr in playerTable.find_all("tr")[1:55]:
                # Write one CSV row per table row
                tds = [td.text for td in tr.select("td")]
                writer.writerow(tds)
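
The code in the question reopens the file in 'w' mode for every row, which truncates it each time, so only the last row survives, and it never writes a newline. The following is a minimal, self-contained sketch of the open-once pattern described in the steps above. The sample rows and the `write_table` helper are hypothetical stand-ins for the scraped cells, and `io.StringIO` replaces the real file so the example runs without network or disk access.

```python
import csv
import io

# Hypothetical sample rows standing in for the scraped table cells
headers = ["Name", "Age", "G", "PA"]
rows = [
    ["Mitch Haniger", "30", "96", "414"],
    ["Ty France", "26", "91", "380"],
]


def write_table(f, headers, rows):
    """Write one header row followed by all data rows to an open text file."""
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(rows)


# io.StringIO stands in for open('MarinersBattingStats.csv', 'w', newline='')
buffer = io.StringIO(newline="")
write_table(buffer, headers, rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Because each row is passed as a list, a name such as "Mitch Haniger" stays in a single field even though it contains a space.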
    

    【Comments】:

      【Solution 2】:

      You can use Pandas to build a DataFrame from the data you are scraping, then write it to a CSV file with to_csv.

      from bs4 import BeautifulSoup
      import requests
      import pandas as pd

      htmlText = requests.get('https://www.fangraphs.com/teams/mariners/stats').text
      soup = BeautifulSoup(htmlText, 'lxml')
      playerTable = soup.find('div', class_='team-stats-table')


      def BattingStats():
          headers = [th.text for th in playerTable.find_all("th")]
          # Build the row list locally so repeated calls do not accumulate rows
          data = []
          for tr in playerTable.find_all("tr")[1:55]:
              tds = [td.text for td in tr.select("td")]
              data.append(tds)
          return pd.DataFrame(data=data, columns=headers)


      # The original guard compared against 'main', which is never true
      if __name__ == '__main__':
          df_batting_stats = BattingStats()
          # index=False keeps the row index out of the CSV
          df_batting_stats.to_csv('MarinersBattingStats.csv', index=False)
      
      Sample Output
      
      Name,Age,G,PA,HR,SB,BB%,K%,ISO,BABIP,AVG,OBP,SLG,wOBA,wRC+,BsR,Off,Def,WAR
      Mitch Haniger,30,96,414,25,0,6.8%,24.2%,.252,.289,.263,.319,.515,.353,129,0.0,14.7,-7.6,2.1
      Ty France,26,91,380,9,0,7.1%,16.8%,.147,.314,.276,.355,.423,.341,121,0.4,10.0,-2.9,2.0
      Kyle Seager,33,99,411,19,2,8.8%,25.8%,.206,.247,.217,.290,.423,.306,98,-0.1,-1.2,4.4,1.7
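
As a rough illustration of what `index=False` changes, here is a small sketch that builds a DataFrame from a few hypothetical rows (a shortened version of the sample output above) and writes it to an in-memory buffer instead of a file. It assumes pandas is installed; the variable names are illustrative only.

```python
import io

import pandas as pd

# Hypothetical rows standing in for the scraped table cells
headers = ["Name", "Age", "G"]
data = [
    ["Mitch Haniger", "30", "96"],
    ["Ty France", "26", "91"],
]
df = pd.DataFrame(data=data, columns=headers)

# Without index=False, pandas adds the row index as an unnamed first column
with_index = io.StringIO()
df.to_csv(with_index)

# With index=False, only the table columns are written
without_index = io.StringIO()
df.to_csv(without_index, index=False)

csv_with_index = with_index.getvalue()
csv_without_index = without_index.getvalue()
print(csv_without_index)
```

Passing a file path instead of a buffer, as in the answer above, writes the same text to disk.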
      

      【Comments】:

        【Solution 3】:
        import pandas as pd
        
        
        df = pd.read_html('https://www.fangraphs.com/teams/mariners/stats',
                          attrs={'class': 'tablesort'})[0]
        print(df)
        df.to_csv('data.csv',index=False)
        

        Output:

                         Name  Age     G    PA   HR  SB  ...   wOBA    wRC+  BsR   Off   Def  WAR
        0       Mitch Haniger   30    97   419   25   0  ...  0.351   128.0  0.0  14.1  -7.7  2.1      
        1           Ty France   26    92   385    9   0  ...  0.338   119.0  0.4   9.4  -3.0  2.0      
        2         Kyle Seager   33   100   416   20   2  ...  0.307    98.0 -0.1  -0.9   4.4  1.8      
        3       J.P. Crawford   26   101   415    5   3  ...  0.309    99.0 -1.7  -2.0   3.1  1.5      
        4         Jake Fraley   26    40   149    7   7  ...  0.374   144.0  0.3   8.2  -3.4  1.0      
        5          Tom Murphy   30    60   203    8   0  ...  0.295    90.0 -0.2  -2.6   4.7  0.9      
        6        Luis Torrens   25    58   207   12   0  ...  0.317   105.0 -0.2   1.0  -4.8  0.3      
        7         Dylan Moore   28    77   266   10  15  ...  0.278    79.0  1.6  -5.2  -0.7  0.3      
        8          Kyle Lewis   25    36   147    5   2  ...  0.321   108.0 -0.4   0.9  -2.8  0.3      
        9         Cal Raleigh   24     9    33    1   0  ...  0.280    80.0  0.2  -0.6   2.2  0.3      
        10       Abraham Toro   24     1     1    1   0  ...  2.022  1256.0  0.0   1.4   0.0  0.1      
        11        Justin Dunn   25    11     3    0   0  ...  0.416   172.0  0.0   0.3   0.4  0.1      
        12   Justus Sheffield   25    15     2    0   0  ...  0.441   189.0  0.0   0.2   0.2  0.1      
        13      Dillon Thomas   28     4     9    0   0  ...  0.098   -43.0  0.5  -1.1   0.9  0.0      
        14      Eric Campbell   34     4    12    0   0  ...  0.278    79.0  0.2  -0.1  -0.2  0.0      
        15          Joe Smith   37     1     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        16    Hector Santiago   33    13     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        17         Ryan Weber   30     1     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        18       Casey Sadler   30    14     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        19       James Paxton   32     1     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        20      Domingo Tapia   29     2     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        21     Rafael Montero   30    40     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        22        JT Chargois   30    30     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        23        Paul Sewald   31    31     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        24         Brady Lail   27     2     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        25       Yacksel Rios   28     3     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        26   Keynan Middleton   27    26     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        27   Kendall Graveman   30    30     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        28       Erik Swanson   27    14     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        29      Vinny Nittoli   30     1     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        30      Daniel Zamora   28     4     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        31  Anthony Misiewicz   26    43     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        32  Drew Steckenrider   30    35     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        33      Robert Dugger   25    11     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        34      Yohan Ramirez   26     7     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        35          Will Vest   26    32     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        36       Ljay Newsome   24     7     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        37        Wyatt Mills   26     8     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        38   Nick Margevicius   25     5     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        39     Aaron Fletcher   25     4     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        40      Logan Gilbert   24    12     0    0   0  ...  0.000     NaN  0.0   0.0   0.0  0.0      
        41       Chris Flexen   26    19     1    0   0  ...  0.000  -100.0  0.0  -0.3   0.1  0.0      
        42     Marco Gonzales   29    13     2    0   0  ...  0.000  -100.0  0.0  -0.5   0.2  0.0      
        43   Darren McCaughan   25     2     2    0   0  ...  0.000  -100.0  0.0  -0.5   0.2  0.0      
        44      Yusei Kikuchi   30    18     2    0   0  ...  0.000  -100.0  0.0  -0.5   0.2  0.0      
        45     Donovan Walton   27    23    69    2   1  ...  0.268    72.0  0.2  -2.2  -0.5  0.0      
        46    Taylor Trammell   23    51   178    8   2  ...  0.271    74.0  0.0  -5.7  -0.9 -0.1      
        47      Braden Bishop   27     8     5    0   0  ...  0.220    40.0  0.1  -0.3  -0.5 -0.1      
        48   Jacob Nottingham   26    10    31    1   0  ...  0.213    35.0  0.0  -2.5   0.1 -0.1      
        49      Jack Mayfield   30    12    35    0   0  ...  0.181    13.0 -1.2  -4.9   1.8 -0.2      
        50         Jose Godoy   26    16    40    0   0  ...  0.193    21.0 -0.4  -4.2   0.3 -0.3      
        51        Jake Bauers   25    33   119    1   1  ...  0.257    64.0  0.2  -5.0  -2.1 -0.3      
        52       Sam Haggerty   27    35    94    2   5  ...  0.241    53.0  0.0  -5.3  -2.0 -0.4      
        53      Shed Long Jr.   25    33   117    4   1  ...  0.264    69.0 -0.9  -5.3  -3.0 -0.4      
        54    Jose Marmolejos   28    31    94    3   0  ...  0.251    60.0  0.0  -4.5  -3.5 -0.5      
        55         Evan White   25    30   104    2   0  ...  0.198    25.0  0.5  -9.0  -1.2 -0.7      
        56     Jarred Kelenic   21    34   135    2   3  ...  0.181    13.0  1.4 -12.9  -2.5 -1.1      
        57         Team Total   27  1450  3695  128  42  ...  0.296    91.0  0.3 -40.4 -19.9  6.6      
        
        [58 rows x 19 columns]
        

        【Comments】:
