How to find a substring in a string by specifying its beginning and end?

【Question title】: How to find a substring in a string - by specifying the beginning and end of it?
【Posted】: 2015-11-21 14:03:43
【Question description】:

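I want to get some data from a website.

My program currently uses urllib.request to read the whole HTML document. Because the website changes, the data in the HTML is different every time the program runs.

Some of the data stays the same: the start and end of the substring I need.

I want to tell Python what the start and end of the substring should be.

I have googled this, but only found a method that requires you to know the substring in advance, for example: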

str1.find(str2)

Here is a snippet of my program:

import urllib.request

def get_html():
    # response.read() returns the page as bytes, not str
    with urllib.request.urlopen("http://website.com/dynamic_page") as response:
        html = response.read()
        return html

print(get_html())

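This prints one long string, but I only need part of it. Otherwise my other function searches for the string in the whole document rather than in just a small part of it: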

def search_custom(string):

    html = get_html()
    string_var = string
    string_var = string_var.encode('utf-8')

    string_count = html.count(string_var)
    print(string_count)

    return string_count

【Comments on the question】:

What exactly do you want? To remove the <script>..</script> part of your HTML string?

Tags: python string python-3.x substring

【Solution 1】:

You can use something like this:

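start_tag = "<script>"
start = str1.find(start_tag)
if start > -1:
    # look for the closing marker only after the opening one
    end = str1.find("</script>", start)
    if end > -1:
        # skip past the opening marker itself
        data = str1[start + len(start_tag):end]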
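
The same idea can be wrapped in a small helper. This is only a sketch: the name `find_between` and the sample `html` value are made up for illustration. It assumes the text and both markers are of the same type. Since `response.read()` in the question returns bytes, the markers here are bytes literals; alternatively the page could be decoded to `str` first.

```python
def find_between(text, start_marker, end_marker):
    """Return the part of text between the first start_marker and the
    next end_marker, or None if either marker is missing.

    Works for str or bytes, as long as all three arguments share a type.
    """
    start = text.find(start_marker)
    if start == -1:
        return None
    content_start = start + len(start_marker)
    # Begin the second search after the opening marker.
    end = text.find(end_marker, content_start)
    if end == -1:
        return None
    return text[content_start:end]


# Hypothetical page content standing in for the result of get_html().
html = b"<html><script>var price = 42;</script><p>price</p></html>"

section = find_between(html, b"<script>", b"</script>")
print(section)  # b'var price = 42;'

# Count occurrences only inside the extracted section, not the whole page.
if section is not None:
    print(section.count("price".encode("utf-8")))  # 1
```

Returning `None` when a marker is missing lets the caller tell "not found" apart from an empty match between two adjacent markers.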

【Discussion】:

【Solution 2】:

The data on your page changes, but the structure will stay the same. Why not use BeautifulSoup and grab a specific div/script tag?

An example:

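from bs4 import BeautifulSoup

# page holds the HTML document, e.g. the result of get_html()
# naming the parser explicitly avoids bs4's "no parser specified" warning
soup = BeautifulSoup(page, "html.parser")
message = soup.find("script")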

This gives you the first script tag. You may not want the first one. There are many other ways to scrape; take a look at the docs.
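
As a rough sketch of picking something other than the first tag, the example below uses `find_all` and an attribute filter. The sample `page` content and the `price` id are invented for illustration, and it assumes the `beautifulsoup4` package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical page; in the question this would be the result of get_html().
page = b"""
<html><body>
  <script>var first = 1;</script>
  <div id="price">42 USD</div>
  <script>var second = 2;</script>
</body></html>
"""

soup = BeautifulSoup(page, "html.parser")

# find_all returns every matching tag, so a later one can be picked by index.
scripts = soup.find_all("script")
second_script = scripts[1].string
print(second_script)  # var second = 2;

# A tag can also be selected by attribute, here a made-up id.
price_div = soup.find("div", id="price")
price_text = price_div.get_text(strip=True) if price_div is not None else None
print(price_text)  # 42 USD
```

Selecting by tag name or attribute keeps working when the surrounding text changes, as long as the page structure stays the same.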

【Discussion】: