[Question title]: Python: Submitting web forms (HTTP POST) with multiple select fields using Requests module
[Posted]: 2015-11-20 03:35:53
[Question]:

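I'm trying to write a Python script that uses the Requests module for its HTTP requests and fetches data from the Bureau of Justice Statistics. The page I'm requesting data from has "multiple select" fields, which let the user choose one or more options from a list.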

The page I'm trying to download data from is: http://www.ucrdatatool.gov/Search/Crime/Local/OneYearofData.cfm

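Here is the form I want to submit (it is the second step of the download process, reached after you submit the "State" selection form at the link above):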

<form name="CFForm_1" id="CFForm_1" action="RunCrimeOneYearofData.cfm" method="post" onsubmit="return _CF_checkCFForm_1(this)">
        <INPUT TYPE="Hidden" Name="StateId" Value="1">

        <INPUT TYPE="Hidden" Name="BJSPopulationGroupId" Value="">


    <table width="94%" border="0" height="151">
      <tr> 
        <td width="27%" valign="top"><font size="2" class="text"><b> 
          <LABEL FOR="agencies">a. Choose one or more agencies:</LABEL>
          </b></font><BR> <BR> <font size="2" class="text"> 
          <select name="CrimeCrossId" size="4" MULTIPLE ID="agencies">

              <option value="102" >Alabaster Police Dept</option>

              <option value="104" >Albertville Police Dept</option>

              <option value="105" >Alexander City Police Dept</option>

              <option value="110" >Anniston Police Dept</option>

              <option value="119" >Athens Police Dept</option>

              <option value="120" >Atmore Police Dept</option>

              <option value="122" >Auburn Police Dept</option>

              <option value="127" >Baldwin County Sheriff Deptartment</option>

              <option value="134" >Bessemer Police Dept</option>

              <option value="136" >Birmingham Police Dept</option>

              <option value="138" >Blount County Sheriff Department</option>

              <option value="156" >Calera Police Dept</option>

              <option value="157" >Calhoun County Sheriff Department</option>

              <option value="174" >Chilton County Sheriff Department</option>

              <option value="204" >Cullman County Sheriff Department</option>

              <option value="205" >Cullman Police Dept</option>

              <option value="210" >Daphne Police Dept</option>

              <option value="213" >Decatur Police Dept</option>

              <option value="214" >Dekalb County Sheriff Department</option>

              <option value="218" >Dothan Police Dept</option>

              <option value="228" >Elmore County Sheriff Department</option>

              <option value="229" >Enterprise Police Dept</option>

              <option value="232" >Etowah County Sheriff Department</option>

              <option value="233" >Eufaula Police Dept</option>

              <option value="237" >Fairfield Police Dept</option>

              <option value="238" >Fairhope Police Dept</option>

              <option value="247" >Florence Police Dept</option>

              <option value="248" >Foley Police Dept</option>

              <option value="251" >Fort Payne Police Dept</option>

              <option value="259" >Gadsden Police Dept</option>

              <option value="262" >Gardendale Police Dept</option>

              <option value="281" >Gulf Shores Police Dept</option>

              <option value="292" >Hartselle Police Dept</option>

              <option value="296" >Helena Police Dept</option>

              <option value="305" >Homewood Police Dept</option>

              <option value="306" >Hoover Police Dept</option>

              <option value="307" >Houston County Sheriff Department</option>

              <option value="308" >Hueytown Police Dept</option>

              <option value="310" >Huntsville Police Dept</option>

              <option value="314" >Irondale Police Dept</option>

              <option value="315" >Jackson County Sheriff Department</option>

              <option value="318" >Jacksonville Police Dept</option>

              <option value="320" >Jasper Police Dept</option>

              <option value="321" >Jefferson County Sheriff Department</option>

              <option value="334" >Lauderdale County Sheriff Department</option>

              <option value="335" >Lawrence County Sheriff Department</option>

              <option value="337" >Lee County Sheriff Department</option>

              <option value="338" >Leeds Police Dept</option>

              <option value="343" >Limestone County Sheriff Department</option>

              <option value="358" >Madison County Sheriff Department</option>

              <option value="359" >Madison Police Dept</option>

              <option value="365" >Marshall County Sheriff Department</option>

              <option value="371" >Millbrook Police Dept</option>

              <option value="374" >Mobile County Sheriff Department</option>

              <option value="375" >Mobile Police Dept</option>

              <option value="381" >Montgomery Police Dept</option>

              <option value="382" >Moody Police Dept</option>

              <option value="383" >Morgan County Sheriff Department</option>

              <option value="388" >Mountain Brook Police Dept</option>

              <option value="391" >Muscle Shoals Police Dept</option>

              <option value="400" >Northport Police Dept</option>

              <option value="406" >Opelika Police Dept</option>

              <option value="410" >Oxford Police Dept</option>

              <option value="411" >Ozark Police Dept</option>

              <option value="413" >Pelham Police Dept</option>

              <option value="414" >Pell City Police Dept</option>

              <option value="417" >Phenix Police Dept</option>

              <option value="426" >Pleasant Grove Police Dept</option>

              <option value="429" >Prattville Police Dept</option>

              <option value="431" >Prichard Police Dept</option>

              <option value="451" >Saraland Police Dept</option>

              <option value="454" >Scottsboro Police Dept</option>

              <option value="456" >Selma Police Dept</option>

              <option value="458" >Shelby County Sheriff Department</option>

              <option value="470" >St. Clair County Sheriff Department</option>

              <option value="478" >Sylacauga Police Dept</option>

              <option value="481" >Talladega County Sheriff Department</option>

              <option value="482" >Talladega Police Dept</option>

              <option value="497" >Troy Police Dept</option>

              <option value="500" >Trussville Police Dept</option>

              <option value="501" >Tuscaloosa County Sheriff Department</option>

              <option value="502" >Tuscaloosa Police Dept</option>

              <option value="517" >Vestavia Hills Police Dept</option>

              <option value="522" >Walker County Sheriff Department</option>

          </select>
          </font> </td>
        <td width="34%" valign="top"><font size="2" class="text"><b> 
          <LABEL FOR="groups">b. Choose one or more variable groups:</LABEL>*
                    </b></font><BR> 
          <BR> <font size="2" class="text"> 
          <select name="DataType" size="4" Multiple ID="groups">

              <option value="1" >Number 
              of violent crimes</option>
              <option value="2" >Number 
              of property crimes</option>
              <option value="3" >Violent 
              crime rates</option>
              <option value="4" >Property 
              crime rates</option>

          </select>
        </font> </td>
        <td width="31%" rowspan="2" valign="top" NOWRAP><font size="2" class="text"><b> 
          <LABEL FOR="year">c. Choose one year:</LABEL>
          </b></font><BR> <BR> <font size="2" class="text"> 
          <SELECT Name="YearStart" Size="1" ID="year">

                  <OPTION Value="1985" > 
                  1985 </OPTION>

                  <OPTION Value="1986" > 
                  1986 </OPTION>

                  <OPTION Value="1987" > 
                  1987 </OPTION>

                  <OPTION Value="1988" > 
                  1988 </OPTION>

                  <OPTION Value="1989" > 
                  1989 </OPTION>

                  <OPTION Value="1990" > 
                  1990 </OPTION>

                  <OPTION Value="1991" > 
                  1991 </OPTION>

                  <OPTION Value="1992" > 
                  1992 </OPTION>

                  <OPTION Value="1993" > 
                  1993 </OPTION>

                  <OPTION Value="1994" > 
                  1994 </OPTION>

                  <OPTION Value="1995" > 
                  1995 </OPTION>

                  <OPTION Value="1996" > 
                  1996 </OPTION>

                  <OPTION Value="1997" > 
                  1997 </OPTION>

                  <OPTION Value="1998" > 
                  1998 </OPTION>

                  <OPTION Value="1999" > 
                  1999 </OPTION>

                  <OPTION Value="2000" > 
                  2000 </OPTION>

                  <OPTION Value="2001" > 
                  2001 </OPTION>

                  <OPTION Value="2002" > 
                  2002 </OPTION>

                  <OPTION Value="2003" > 
                  2003 </OPTION>

                  <OPTION Value="2004" > 
                  2004 </OPTION>

                  <OPTION Value="2005" > 
                  2005 </OPTION>

                  <OPTION Value="2006" > 
                  2006 </OPTION>

                  <OPTION Value="2007" > 
                  2007 </OPTION>

                  <OPTION Value="2008" > 
                  2008 </OPTION>

                  <OPTION Value="2009" > 
                  2009 </OPTION>

                  <OPTION Value="2010" > 
                  2010 </OPTION>

                  <OPTION Value="2011" > 
                  2011 </OPTION>

                  <OPTION Value="2012" > 
                  2012 </OPTION>

          </SELECT>
          </font> </td>
      </tr>
      <tr> 
        <td colspan="2" valign="top" NOWRAP><BR> 
          <table border="1" cellspacing="0" cellpadding="4" bordercolor="#999999" bgcolor="#FFFFCC" align="left" width="450">
            <tr> 
              <td align="center" nowrap><font size="2" class="text" color="#330099"><b>Hold 
                down the control key to select more than one option.</b></font></td>
            </tr>
          </table>        </td>
      </tr>
      <tr> 
        <td valign="top" NOWRAP> <BR> <BR> <p> 
            <input name="NextPage" type="submit" value="Get Table">
            <input name="PreviousPage" type="submit" value="Previous">
            <input name="Cancel" type="reset" value="Reset Form">
          </p></td>
        <td colspan="2" valign="top" NOWRAP><table width="300" border="0" cellspacing="0" cellpadding="3">
            <tr align="left"> 
              <td width="4%" valign="top"><strong>* </strong></td>
              <td width="48%" valign="top">Violent crimes:</td>
              <td colspan="2" valign="top">Property crimes :</td>
            </tr>
            <tr> 
              <td align="center" valign="top"></td>
              <td valign="top"> <font class=text size=2> &#8226;murder<br>
                &#8226;forcible rape<br>
                &#8226;robbery<br>
                &#8226;aggravated assault </font></td>
              <td width="4%">&nbsp;</td>
              <td valign="top"> &#8226;burglary<br> 
                &#8226;larceny-theft<br> &#8226;motor 
                vehicle theft</td>
            </tr>
            <tr align="left"> 
              <td colspan="4" valign="top"><FONT class=text size=2>Tables with 
                many variables may be very wide.</FONT> </td>
            </tr>
          </table>
          <br> <FONT class=text 
  size=2>See <B><A 
  href="/offenses.cfm">UCR Offense Definitions</A></B> 
          for additional information about these crimes.</FONT> </td>
      </tr>
    </table>
    </form>

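I'm trying to select all of the options in several of these multiple-select fields (e.g., all agencies, all crime types, etc.) and submit an HTTP POST request that includes all of them.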

When I submit this form manually in Firefox and look at the output of Live HTTP Headers, I can see that the POST request contains the following query string:

StateId=1&BJSPopulationGroupId=&CrimeCrossId=102&CrimeCrossId=104&CrimeCrossId=105&CrimeCrossId=110&CrimeCrossId=119&CrimeCrossId=120&CrimeCrossId=122&CrimeCrossId=127&CrimeCrossId=134&CrimeCrossId=136&CrimeCrossId=138&CrimeCrossId=156&CrimeCrossId=157&CrimeCrossId=174&CrimeCrossId=204&CrimeCrossId=205&CrimeCrossId=210&CrimeCrossId=213&CrimeCrossId=214&CrimeCrossId=218&CrimeCrossId=228&CrimeCrossId=229&CrimeCrossId=232&CrimeCrossId=233&CrimeCrossId=237&CrimeCrossId=238&CrimeCrossId=247&CrimeCrossId=248&CrimeCrossId=251&CrimeCrossId=259&CrimeCrossId=262&CrimeCrossId=281&CrimeCrossId=292&CrimeCrossId=296&CrimeCrossId=305&CrimeCrossId=306&CrimeCrossId=307&CrimeCrossId=308&CrimeCrossId=310&CrimeCrossId=314&CrimeCrossId=315&CrimeCrossId=318&CrimeCrossId=320&CrimeCrossId=321&CrimeCrossId=334&CrimeCrossId=335&CrimeCrossId=337&CrimeCrossId=338&CrimeCrossId=343&CrimeCrossId=358&CrimeCrossId=359&CrimeCrossId=365&CrimeCrossId=371&CrimeCrossId=374&CrimeCrossId=375&CrimeCrossId=381&CrimeCrossId=382&CrimeCrossId=383&CrimeCrossId=388&CrimeCrossId=391&CrimeCrossId=400&CrimeCrossId=406&CrimeCrossId=410&CrimeCrossId=411&CrimeCrossId=413&CrimeCrossId=414&CrimeCrossId=417&CrimeCrossId=426&CrimeCrossId=429&CrimeCrossId=431&CrimeCrossId=451&CrimeCrossId=454&CrimeCrossId=456&CrimeCrossId=458&CrimeCrossId=470&CrimeCrossId=478&CrimeCrossId=481&CrimeCrossId=482&CrimeCrossId=497&CrimeCrossId=500&CrimeCrossId=501&CrimeCrossId=502&CrimeCrossId=517&CrimeCrossId=522&DataType=1&DataType=2&DataType=3&DataType=4&YearStart=2010&NextPage=Get+Table
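
For reference, the repeated-key encoding in that query string can be reproduced with the standard library. This is a minimal sketch, not the site's or the browser's actual code: it uses `urllib.parse.urlencode` with `doseq=True`, and the shortened value lists are illustrative only. Requests also encodes list values in a `data` dict as repeated keys, so a dict of this shape should serialize the same way when posted.

```python
from urllib.parse import urlencode

# Shortened, illustrative subset of the form fields shown above.
form_data = {
    "StateId": "1",
    "BJSPopulationGroupId": "",
    "CrimeCrossId": ["102", "104", "105"],  # multi-select: one key, many values
    "DataType": ["1", "2"],                 # multi-select
    "YearStart": "2010",                    # single select
    "NextPage": "Get Table",                # the submit button that was clicked
}

# doseq=True expands each list into repeated key=value pairs.
body = urlencode(form_data, doseq=True)
print(body)
```

Comparing a string built this way with the one captured from the browser is a quick way to spot fields missing from a payload.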

Here is the Python code I've tried so far. Note the part where I try to construct post_data2; it doesn't work (it just sends me back to the "step one" page):

import requests
from bs4 import BeautifulSoup as BS

base_url = 'http://www.ucrdatatool.gov/Search/Crime/Local/'
dl_page_url = base_url + 'OneYearofData.cfm'
post_url = base_url + 'OneYearofDataStepTwo.cfm'

r = requests.get(dl_page_url)
page = BS(r.content)

select_states = page.find('form', id = 'CFForm_1').find('select', id = 'state')
state_choices = select_states.findAll('option')

state = state_choices[2]   #DEBUGGING
#for state in state_choices:

state_id = int(state.get('value'))
state_name = state.text

post_data = { 'StateId': state_id, 'BJSPopulationGroupId' : ''}
r2 = requests.post(post_url, post_data)
page2 = BS(r2.content)

step2_form = page2.find('form', id = 'CFForm_1')
select_agencies =  step2_form.find('select', id = 'agencies')
select_crimes = step2_form.find('select', id = 'groups')
select_year =  step2_form.find('select', id = 'year')

agency_choices = select_agencies.findAll('option') 
crime_choices = select_crimes.findAll('option')
year_choices = select_year.findAll('option')

post_data2 = {'CrimeCrossId': list([a.get('value') for a in agency_choices]),
              'DataType' :  list([c.get('value') for c in crime_choices]),
              'YearStart': '2010'}

post_url2 = base_url + 'RunCrimeOneYearofData.cfm'
r3 = requests.post(post_url2, post_data2)    
state_results_page = BS(r3.content)

What is the correct way to submit multiple-select fields like these with the Python Requests module? Thanks!

[Comments]:

    Tags: python http web-crawler python-requests forms


    [Answer 1]:

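    I figured out the problem: the second form carries over two hidden fields from the first form, and I needed to include them in the POST data for the second step.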

    So instead of:

    post_data2 = {'CrimeCrossId': list([a.get('value') for a in agency_choices]),
                  'DataType' :  list([c.get('value') for c in crime_choices]),
                  'YearStart': '2010'}
    

    I just needed to include the StateId and BJSPopulationGroupId fields in the second request as well:

    post_data2 = {'StateId': state['id'], 'BJSPopulationGroupId': '',
                  'CrimeCrossId': list([a.get('value') for a in agencies]),
                  'DataType' :  list([c.get('value') for c in crimes]),
                  'YearStart': year}
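
One way to avoid missing carried-over fields is to read the hidden inputs from the step-two form itself rather than typing them in by hand. The sketch below is only an illustration under stated assumptions: it uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra dependencies, `FormFieldCollector` is a hypothetical helper name, and `sample_html` is a shortened copy of the form from the question.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Hypothetical helper: collects hidden inputs and all <option> values."""

    def __init__(self):
        super().__init__()
        self.hidden = {}    # name -> value for each <input type="hidden">
        self.options = {}   # select name -> list of its option values
        self._current_select = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            if attrs.get("name"):
                self.hidden[attrs["name"]] = attrs.get("value") or ""
        elif tag == "select":
            self._current_select = attrs.get("name")
            if self._current_select:
                self.options.setdefault(self._current_select, [])
        elif tag == "option" and self._current_select:
            self.options[self._current_select].append(attrs.get("value") or "")

    def handle_endtag(self, tag):
        if tag == "select":
            self._current_select = None


# Shortened copy of the step-two form, for illustration only.
sample_html = """
<form name="CFForm_1" id="CFForm_1" action="RunCrimeOneYearofData.cfm" method="post">
  <INPUT TYPE="Hidden" Name="StateId" Value="1">
  <INPUT TYPE="Hidden" Name="BJSPopulationGroupId" Value="">
  <select name="CrimeCrossId" size="4" MULTIPLE ID="agencies">
    <option value="102" >Alabaster Police Dept</option>
    <option value="104" >Albertville Police Dept</option>
  </select>
  <select name="DataType" size="4" Multiple ID="groups">
    <option value="1" >Number of violent crimes</option>
    <option value="2" >Number of property crimes</option>
  </select>
</form>
"""

collector = FormFieldCollector()
collector.feed(sample_html)

# Start from the hidden fields, then add every option of each multi-select.
payload = dict(collector.hidden)
payload["CrimeCrossId"] = collector.options["CrimeCrossId"]
payload["DataType"] = collector.options["DataType"]
payload["YearStart"] = "2010"
print(payload)
```

A dict like `payload` could then be passed as `requests.post(post_url2, data=payload)`. The browser capture in the question also sent `NextPage=Get+Table`; whether the server requires that field is not confirmed here.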
    

    [Comments]:
