Python: extract text after keywords and brackets

【Question title】: Python extract text after keywords and brackets
【Posted】: 2015-10-21 20:49:06
【Question】:

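I'm new to Python and, after searching online and trying a few things, I'm a bit confused. What I want to do is extract some information from a website whose page source contains the code below. I want to extract the latitude/longitude values inside the last pair of parentheses: 19.xxxxx, -19.xxxxx.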

My idea is to search for mapOptions and then retrieve the coordinates inside the parentheses. How can I implement this? Thanks!

<script>
function initialize() {
    var map, mapOptions, info, i, func, func1, borrar, capa,
        marcador = [], marcadorcalle = [], locales = [], calles = [];

    func = function (num, tipo) {
        return function () {
            if (tipo) {
                info.setContent('<b>' + calles[num][0] + '</b>');
                info.open(map, marcadorcalle[num]);
            } else {
                info.setContent('<b>' + locales[num][0] + '</b><br />' + locales[num][3]);
                info.open(map, marcador[num]);
            }
        };
    };

    func1 = function (objeto, tipo) {
        return function () {
            if (tipo) {
                if (borrar) {borrar.setMap(null); }
                borrar = objeto;
                objeto.setMap(map);
            }
            map.setZoom(18);
            map.setCenter(objeto.getPosition());
            google.maps.event.trigger(objeto, 'click');
        };
    };

    mapOptions = {
        zoom: 16,
        scrollwheel: false,
        center: new google.maps.LatLng(19.xxxxx, -19.xxxxx)
    };

【Question comments】:

Tags: python regex text web-scraping extract

【Solution 1】:

This is where regular expressions work best:

import re

map_lat_long = re.compile(r'google\.maps\.LatLng\(([\d.-]+),\s*([\d.-]+)\)')
lat, long = map_lat_long.search(page_source).groups()

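This assumes the page contains actual numbers rather than xxxxx. The expression matches the literal google.maps.LatLng(..) text and captures the two numbers inside it, each as a run of one or more digits, dots, and hyphens.

Demo (with a reduced sample):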

>>> import re
>>> sample = '''\
... mapOptions = {
...     zoom: 16,
...     scrollwheel: false,
...     center: new google.maps.LatLng(19.12345, -19.67890)
... };
... '''
>>> map_lat_long = re.compile(r'google\.maps\.LatLng\(([\d.-]+),\s*([\d.-]+)\)')
>>> map_lat_long.search(sample).groups()
('19.12345', '-19.67890')
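
As a follow-up sketch (not part of the original answer): the captured groups are strings, and search() returns None when nothing matches, so a small helper can guard against the missing match and convert the values to floats. The helper name extract_lat_long is hypothetical and only used for illustration.

```python
import re

# Same pattern as in the answer above.
MAP_LAT_LONG = re.compile(r'google\.maps\.LatLng\(([\d.-]+),\s*([\d.-]+)\)')

def extract_lat_long(page_source):
    """Return (lat, long) as floats, or None if no LatLng(...) call is found."""
    match = MAP_LAT_LONG.search(page_source)
    if match is None:
        # No google.maps.LatLng(...) call in the text.
        return None
    lat, long_ = match.groups()
    # float() raises ValueError for malformed runs such as '1.2.3',
    # which the character class [\d.-]+ would also accept.
    return float(lat), float(long_)

sample = "center: new google.maps.LatLng(19.12345, -19.67890)"
print(extract_lat_long(sample))
print(extract_lat_long("no coordinates here"))
```

Returning None instead of raising keeps the caller in control of what to do with pages that lack a map.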

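【Comments】:

Thank you so much! The sample works great! But when I fetch the page source with source_page=requests.get(url) and then use your code, I get an error: TypeError: expected string or buffer. What does that mean?
@ximu: You have a response object, not text. Use source_page.content to get the (undecoded) page source.
Thank you so much!! That works. Just wondering, can I use re.compile(r'center: \new\google\.maps\.LatLng(([\d.-]+),\s*([\d.-]+))')? I tried running it, and it gives AttributeError: 'NoneType' object has no attribute 'groups'.
@ximu: That means your pattern is incorrect and nothing matched. Use r'center: new google\.maps\.LatLng\(([\d.-]+),\s*([\d.-]+)\)' instead.
@ximu: It may help to try patterns in an online regex editor first. Here is a Regex101 link with the updated pattern that you can experiment with: regex101.com/r/nX0tD8/1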
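
A minimal sketch tying the comments together, under a few assumptions: the bytes literal raw_page stands in for what source_page.content would return (so the example runs offline), and the page is assumed to be UTF-8 encoded. On Python 3 a str pattern cannot be applied to bytes, so the bytes are decoded first; with requests, source_page.text gives already-decoded text instead.

```python
import re

# The corrected, 'center:'-anchored pattern from the comments.
CENTER_LAT_LONG = re.compile(
    r'center: new google\.maps\.LatLng\(([\d.-]+),\s*([\d.-]+)\)'
)

# Stand-in for source_page.content (undecoded bytes); hypothetical sample data.
raw_page = b"mapOptions = { zoom: 16, center: new google.maps.LatLng(19.12345, -19.67890) };"

# Assumption: the page is UTF-8. Decode before matching with a str pattern.
page_text = raw_page.decode("utf-8")

match = CENTER_LAT_LONG.search(page_text)
# Check for None before calling .groups() to avoid the AttributeError
# seen in the comments when the pattern does not match.
coords = match.groups() if match else None
print(coords)
```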