【Question title】: Get all images from a board from a Pinterest web address
【Posted】: 2013-08-21 04:07:31
【Question】:

This question sounds easy, but it is not as simple as it sounds.

Brief summary of the problem

For example, take this board: http://pinterest.com/dodo/web-designui-and-mobile/

Inspecting the HTML of the board itself (inside the div with class GridItems) at the top of the page yields:

<div class="variableHeightLayout padItems GridItems Module centeredWithinWrapper" style="..">
    <!-- First div with a displayed board image -->
    <div class="item" style="top: 0px; left: 0px; visibility: visible;">..</div>
    ...
    <!-- Last div with a displayed board image -->
    <div class="item" style="top: 3343px; left: 1000px; visibility: visible;">..</div>
</div>

However, at the bottom of the page, after triggering the infinite scroll a few times, we get this HTML:

<div class="variableHeightLayout padItems GridItems Module centeredWithinWrapper" style="..">
    <!-- First div with a displayed board image -->
    <div class="item" style="top: 12431px; left: 750px; visibility: visible;">..</div>
    ...
    <!-- Last div with a displayed board image -->
    <div class="item" style="top: 19944px; left: 750px; visibility: visible;">..</div>
</div>

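As you can see, some of the image containers from higher up the page have disappeared, and not all of the image containers are loaded when the page first loads.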


What I want to do

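I want to create a C# script (or any server-side language for now) that can download the page's full HTML (i.e. retrieve every image on the page), so that the images can then be downloaded from their URLs. Downloading the web page and applying a suitable XPath is easy; the real challenge is getting the full HTML for every image.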

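Is there a way to simulate scrolling to the bottom of the page, or is there a simpler way to retrieve every image? I imagine Pinterest uses AJAX to change the HTML; is there a way to trigger those events programmatically so that I receive all the HTML? Thanks in advance for your suggestions and solutions, and thank you just for reading this long question even if you have none!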

Pseudocode

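using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        string pinterestUrl = "http://www.pinterest.com/...";
        string xPath = ".../img";

        // HtmlWeb downloads the page; HtmlDocument.Load expects a local file or stream, not a URL.
        HtmlDocument doc = new HtmlWeb().Load(pinterestUrl);

        // Currently only downloads the first 25 images.
        var imageLinks = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xPath);
        if (nodes != null) // SelectNodes returns null when nothing matches
        {
            foreach (HtmlNode img in nodes)
            {
                imageLinks.Add(img.GetAttributeValue("src", ""));
            }
        }

        // Use imageLinks ...
        Console.WriteLine(string.Join(Environment.NewLine, imageLinks));
    }
}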
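For comparison, the "easy" part (collecting the src of every <img> already present in the downloaded HTML) can be sketched in Python with only the standard library's html.parser. This is not an XPath engine: the class name GridImageCollector is hypothetical, and it assumes the markup shape shown above (image divs nested inside the div whose class list contains GridItems). Like the C# version, it only sees the roughly 25 pins present in the initial HTML.

```python
from html.parser import HTMLParser


class GridImageCollector(HTMLParser):
    """Collect <img src> values found inside the div whose class list contains `container_class`."""

    def __init__(self, container_class: str = "GridItems") -> None:
        super().__init__()
        self.container_class = container_class
        self.depth = 0              # > 0 while we are inside the container div
        self.srcs: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag == "img" and attrs.get("src"):
                self.srcs.append(attrs["src"])
            elif tag == "div":
                self.depth += 1     # nested div inside the container
        elif tag == "div" and self.container_class in (attrs.get("class") or "").split():
            self.depth = 1          # entered the container

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1         # leaving a nested div (or the container itself)


collector = GridImageCollector()
collector.feed('<div class="GridItems"><div class="item"><img src="a.jpg"></div></div>')
print(collector.srcs)
```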

【Question comments】:

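  • It only loads 25 because the rest are loaded on demand via AJAX when you scroll to the bottom, i.e. "infinite scroll". I imagine you'd have to mimic that scrolling. Or, if they had pulled their finger out, their API would have been released already.
  • Is there no way for me to handle what actually happens when the AJAX event is called? It's a real shame about the API.
  • Hmm, I don't think so. You might be better off doing this in JavaScript/jQuery: collect all the links, simulate scrolling to the end, repeat until there is nothing left to scroll, and then send the array of strings to the server.
  • I don't know how to write a working script like that. Do you have any links to tutorials or code snippets showing how to simulate JavaScript effects and return strings to the server side without reloading the page?
  • @NickBull How did you implement this in C#? Can you share some ideas?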

Tags: c# javascript html ajax pinterest


【Solution 1】:

Some people have suggested using JavaScript to simulate scrolling.

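I don't think you need to simulate scrolling at all. I think you just need to work out the format of the URIs that are called via AJAX when scrolling happens; then you can fetch each "page" of results in turn. It takes a bit of reverse engineering.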

Using the Network tab of the Chrome inspector, I can see that once I get a certain distance down the page, this URI is called:

http://pinterest.com/resource/BoardFeedResource/get/?source_url=%2Fdodo%2Fweb-designui-and-mobile%2F&data=%7B%22options%22%3A%7B%22board_id%22%3A%22158400180582875562%22%2C%22access%22%3A%5B%5D%2C%22bookmarks%22%3A%5B%22LT4xNTg0MDAxMTE4NjcxMTM2ODk6MjV8ZWJjODJjOWI4NTQ4NjU4ZDMyNzhmN2U3MGQyZGJhYTJhZjY2ODUzNTI4YTZhY2NlNmY0M2I1ODYwYjExZmQ3Yw%3D%3D%22%5D%7D%2C%22context%22%3A%7B%22app_version%22%3A%22fb43cdb%22%7D%2C%22module%22%3A%7B%22name%22%3A%22GridItems%22%2C%22options%22%3A%7B%22scrollable%22%3Atrue%2C%22show_grid_footer%22%3Atrue%2C%22centered%22%3Atrue%2C%22reflow_all%22%3Atrue%2C%22virtualize%22%3Atrue%2C%22item_options%22%3A%7B%22show_rich_title%22%3Afalse%2C%22squish_giraffe_pins%22%3Afalse%2C%22show_board%22%3Afalse%2C%22show_via%22%3Afalse%2C%22show_pinner%22%3Afalse%2C%22show_pinned_from%22%3Atrue%7D%2C%22layout%22%3A%22variable_height%22%7D%7D%2C%22append%22%3Atrue%2C%22error_strategy%22%3A1%7D&_=1377092055381

If we decode it, we see that it is mostly JSON:

http://pinterest.com/resource/BoardFeedResource/get/?source_url=/dodo/web-designui-and-mobile/&data=
{
"options": {
    "board_id": "158400180582875562",
    "access": [],
    "bookmarks": [
        "LT4xNTg0MDAxMTE4NjcxMTM2ODk6MjV8ZWJjODJjOWI4NTQ4NjU4ZDMyNzhmN2U3MGQyZGJhYTJhZjY2ODUzNTI4YTZhY2NlNmY0M2I1ODYwYjExZmQ3Yw=="
    ]
},
"context": {
    "app_version": "fb43cdb"
},
"module": {
    "name": "GridItems",
    "options": {
        "scrollable": true,
        "show_grid_footer": true,
        "centered": true,
        "reflow_all": true,
        "virtualize": true,
        "item_options": {
            "show_rich_title": false,
            "squish_giraffe_pins": false,
            "show_board": false,
            "show_via": false,
            "show_pinner": false,
            "show_pinned_from": true
        },
        "layout": "variable_height"
    }
},
"append": true,
"error_strategy": 1
}
&_=1377091719636
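The decoding step itself is mechanical. A minimal Python sketch using urllib.parse and json from the standard library; the helper name decode_resource_url is hypothetical, and SAMPLE_URL is a trimmed version of the captured URI above (only board_id, access and error_strategy kept) so the example stays short.

```python
import json
from urllib.parse import parse_qs, urlsplit


def decode_resource_url(url: str) -> tuple[str, dict, str]:
    """Split a /resource/.../get/ URI into (source_url, decoded `data` JSON, `_` value)."""
    query = parse_qs(urlsplit(url).query)   # percent-decodes every parameter
    source_url = query["source_url"][0]
    data = json.loads(query["data"][0])     # the `data` parameter is a JSON document
    return source_url, data, query.get("_", [""])[0]


# Trimmed copy of the captured request (most of the `data` fields removed).
SAMPLE_URL = (
    "http://pinterest.com/resource/BoardFeedResource/get/"
    "?source_url=%2Fdodo%2Fweb-designui-and-mobile%2F"
    "&data=%7B%22options%22%3A%7B%22board_id%22%3A%22158400180582875562%22"
    "%2C%22access%22%3A%5B%5D%7D%2C%22error_strategy%22%3A1%7D"
    "&_=1377092055381"
)

source, payload, underscore = decode_resource_url(SAMPLE_URL)
print(source, json.dumps(payload, indent=2), underscore)
```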

Scrolling down until we get a second request, we see this:

http://pinterest.com/resource/BoardFeedResource/get/?source_url=/dodo/web-designui-and-mobile/&data=
{
    "options": {
        "board_id": "158400180582875562",
        "access": [],
        "bookmarks": [
            "LT4xNTg0MDAxMTE4NjcwNTk1ODQ6NDl8ODFlMDUwYzVlYWQxNzVmYzdkMzI0YTJiOWJkYzUwOWFhZGFkM2M1MzhiNzA0ZDliZDIzYzE3NjkzNTg1ZTEyOQ=="
        ]
    },
    "context": {
        "app_version": "fb43cdb"
    },
    "module": {
        "name": "GridItems",
        "options": {
            "scrollable": true,
            "show_grid_footer": true,
            "centered": true,
            "reflow_all": true,
            "virtualize": true,
            "item_options": {
                "show_rich_title": false,
                "squish_giraffe_pins": false,
                "show_board": false,
                "show_via": false,
                "show_pinner": false,
                "show_pinned_from": true
            },
            "layout": "variable_height"
        }
    },
    "append": true,
    "error_strategy": 2
}
&_=1377092231234

As you can see, not much has changed. The board_id is the same. error_strategy is now 2, and the &_ at the end is different.
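Comparing captured payloads by eye is error-prone, so a small recursive diff helps. The sketch below is a hypothetical helper (diff_payloads) that walks two decoded JSON objects and reports every leaf that differs. The two payloads are trimmed copies of the requests above, keeping only board_id, bookmarks and error_strategy; note that the diff also flags the bookmarks entry, which changed between the two requests.

```python
def diff_payloads(a, b, path=""):
    """Yield (path, old, new) for every leaf value that differs between two decoded JSON payloads."""
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(a.keys() | b.keys()):
            yield from diff_payloads(a.get(key), b.get(key), f"{path}.{key}" if path else key)
    elif isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        for i, (x, y) in enumerate(zip(a, b)):
            yield from diff_payloads(x, y, f"{path}[{i}]")
    elif a != b:
        yield path, a, b


# Trimmed copies of the first and second captured requests.
FIRST = {
    "options": {
        "board_id": "158400180582875562",
        "bookmarks": ["LT4xNTg0MDAxMTE4NjcxMTM2ODk6MjV8ZWJjODJjOWI4NTQ4NjU4ZDMyNzhmN2U3MGQyZGJhYTJhZjY2ODUzNTI4YTZhY2NlNmY0M2I1ODYwYjExZmQ3Yw=="],
    },
    "error_strategy": 1,
}
SECOND = {
    "options": {
        "board_id": "158400180582875562",
        "bookmarks": ["LT4xNTg0MDAxMTE4NjcwNTk1ODQ6NDl8ODFlMDUwYzVlYWQxNzVmYzdkMzI0YTJiOWJkYzUwOWFhZGFkM2M1MzhiNzA0ZDliZDIzYzE3NjkzNTg1ZTEyOQ=="],
    },
    "error_strategy": 2,
}

for changed_path, old, new in diff_payloads(FIRST, SECOND):
    print(changed_path, old, "->", new)
```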

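The &_ parameter is the key here. I bet it tells the page where to start the next set of photos. I can't find a reference to it in either the response or the original page HTML, but it must be somewhere, or be generated client-side by JavaScript. Either way, the page/browser has to know what to ask for next, so you should be able to get hold of that information.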

【Comments】:

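  • Thank you very much for your answer. It is very informative, but not quite informative enough. I have tried really hard and, unfortunately, I'm stumped, because I had also come across this JSON and wondered what was going on. Note also the change in the "bookmarks" value, another mystery. I have offered a 50-rep bounty for an answer that tells me exactly which part of the JSON causes these updates and how to trigger it. I also believe the JSON information alone should let me request these URLs, get back the HTML and identify the images, then URL-encode the JSON for the next request, until the end of the board is reached.

【Solution 2】: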

You can trigger the JSON endpoint by sending the request with this header: X-Requested-With: XMLHttpRequest

Try this from the command line:

curl -H "X-Requested-With:XMLHttpRequest" "http://pinterest.com/resource/CategoryFeedResource/get/?source_url=%2Fall%2Fgeek%2F&data=%7B%22options%22%3A%7B%22feed%22%3A%22geek%22%2C%22scope%22%3Anull%2C%22bookmarks%22%3A%5B%22Pz8xMzc3NjU4MjEyLjc0Xy0xfDE1ZjczYzc4YzNlNDg3M2YyNDQ4NGU1ZTczMmM0ZTQyYzBjMWFiMWNhYjRhMDRhYjg2MTYwMGVkNWQ0ZDg1MTY%3D%22%5D%2C%22is_category_feed%22%3Atrue%7D%2C%22context%22%3A%7B%22app_version%22%3A%22addc92b%22%7D%2C%22module%22%3A%7B%22name%22%3A%22GridItems%22%2C%22options%22%3A%7B%22scrollable%22%3Atrue%2C%22show_grid_footer%22%3Atrue%2C%22centered%22%3Atrue%2C%22reflow_all%22%3Atrue%2C%22virtualize%22%3Atrue%2C%22item_options%22%3A%7B%22show_pinner%22%3Atrue%2C%22show_pinned_from%22%3Afalse%2C%22show_board%22%3Atrue%2C%22show_via%22%3Afalse%7D%2C%22layout%22%3A%22variable_height%22%7D%7D%2C%22append%22%3Atrue%2C%22error_strategy%22%3A2%7D&module_path=App()%3EHeader()%3EDropdownButton()%3EDropdown()%3ECategoriesMenu(resource%3D%5Bobject+Object%5D%2C+name%3DCategoriesMenu%2C+resource%3DCategoriesResource(browsable%3Dtrue))&_=1377658213300" | python -mjson.tool

You will see the pin data in the JSON output. You should be able to parse it and grab the next images you need.
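The same two steps (send the X-Requested-With header, then parse the pins out of the JSON) can be sketched in Python with urllib.request and json. Both helper names are hypothetical. The field path module.tree.children[*].data.images.{originals|orig}.url is an assumption borrowed from the structure that Solution 3's PHP reads for its scroll requests; this is an undocumented 2013-era format and may differ per endpoint.

```python
import json
import urllib.request


def build_resource_request(url: str) -> urllib.request.Request:
    # The X-Requested-With header is what makes the endpoint answer with JSON (per this answer).
    return urllib.request.Request(url, headers={"X-Requested-With": "XMLHttpRequest"})


def extract_image_urls(response_text: str) -> list[str]:
    """Pull image URLs out of a resource response.

    Assumed layout: module.tree.children[*].data.images.{originals|orig}.url
    """
    payload = json.loads(response_text)
    urls = []
    for child in payload.get("module", {}).get("tree", {}).get("children", []):
        images = (child.get("data") or {}).get("images") or {}
        best = images.get("originals") or images.get("orig")  # prefer 'originals', like the PHP
        if best and best.get("url"):
            urls.append(best["url"])
    return urls


# Usage (performs a network request, so it is left commented out):
# with urllib.request.urlopen(build_resource_request(resource_url)) as resp:
#     print(extract_image_urls(resp.read().decode("utf-8")))
```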

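As for this bit: &_=1377658213300. I speculate that it is the id of the last pin of the previous list. You should be able to replace it on each call with the last pin from the previous response.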

【Comments】:

    【Solution 3】:

    OK, so I think this may be what you need (with a few tweaks).

    Caveats:

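    1. This is PHP, not C# (but you said you were interested in any server-side language).
    2. This code hooks into the (unofficial) Pinterest search endpoints. You will need to change $data and $search_res to reflect the appropriate endpoints for your task (e.g. BoardFeedResource). Note: at least for search, Pinterest currently uses two endpoints, one for the initial page load and another for the infinite-scroll requests. Each has its own expected parameter structure.
    3. Pinterest has no official public API, so expect this to break without warning whenever they change anything.
    4. You may find pinterestapi.co.uk easier to implement and acceptable for what you are doing.
    5. Below the class there is some demo/debug code that should not be there once you are getting the data you need, as well as a default page-fetch limit you may want to change.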

    Points of interest:

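    1. The underscore _ parameter takes a timestamp in JavaScript format, i.e. like Unix time but in milliseconds. It is not actually used for pagination.
    2. Pagination uses the bookmarks property. You make the first request to the "new" endpoint, which doesn't need it, then take bookmarks from the result and use it in your request to get the next "page" of results, take bookmarks from those results to get the page after that, and so on, until you run out of results or hit your preset limit (or the server's maximum script execution time). I'd love to know exactly what the bookmarks field encodes; I suspect there is some interesting secret sauce beyond just a pin ID or other page marker.
    3. I skip the HTML and work with the JSON instead, because (for me) that is easier than a DOM-manipulation solution or a pile of regexes.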
    <?php
    
    if(!class_exists('Skrivener_Pins')) {
    
      class Skrivener_Pins {
    
        /**
         * Constructor
         */
        public function __construct() {
        }
    
        /**
         * Pinterest search function. Uses Pinterest's "internal" page APIs, so likely to break if they change.
         * @author [@skrivener] Philip Tillsley
         * @param $search_str     The string used to search for matching pins.
         * @param $limit          Max number of pages to get, defaults to 1 to avoid excessively large queries. Use care when passing in a value.
         * @param $bookmarks_str  Used internally for recursive fetches.
         * @param $page           Used internally to limit recursion.
         * @return array()        int['id'], obj['image'], str['pin_link'], str['orig_link'], bool['video_flag']
         * 
         * TODO:
            * 
            * 
         */
        public function get_tagged_pins($search_str, $limit = 1, $bookmarks_str = null, $page = 1) {
    
          // limit depth of recursion, ie. number of pages of 25 returned, otherwise we can hang on huge queries
          if( $page > $limit ) return false;
    
          // are we getting a next page of pins or not
          $next_page = false;
          if( isset($bookmarks_str) ) $next_page = true;
    
          // build url components
          if( !$next_page ) {
    
            // 1st time
            $search_res = 'BaseSearchResource'; // end point
            $path = '&module_path=' . urlencode('SearchInfoBar(query=' . $search_str . ', scope=boards)');
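            // strip only line breaks and the indentation after them, so spaces inside a multi-word search survive
            $data = preg_replace("'[\n\r]+\s*'","",'{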
              "options":{
                "scope":"pins",
                "show_scope_selector":true,
                "query":"' . $search_str . '"
              },
              "context":{
                "app_version":"2f83a7e"
              },
              "module":{
                "name":"SearchPage",
                "options":{
                  "scope":"pins",
                  "query":"' . $search_str . '"
                }
              },
              "append":false,
              "error_strategy":0
              }');
          } else {
    
            // this is a fetch for 'scrolling', what changes is the bookmarks reference, 
            // so pass the previous bookmarks value to this function and it is included
            // in query
            $search_res = 'SearchResource'; // different end point from 1st time search
            $path = '';
            $data = preg_replace("'[\n\r]+\s*'","",'{
              "options":{
                "query":"' . $search_str . '",
                "bookmarks":["' . $bookmarks_str . '"],
                "show_scope_selector":null,
                "scope":"pins"
              },
              "context":{
                "app_version":"2f83a7e"
              },
                "module":{
                  "name":"GridItems",
                "options":{
                  "scrollable":true,
                  "show_grid_footer":true,
                  "centered":true,
                  "reflow_all":true,
                  "virtualize":true,
                  "item_options":{
                    "show_pinner":true,
                    "show_pinned_from":false,
                    "show_board":true
                  },
                  "layout":"variable_height"
                }
              },
              "append":true,
              "error_strategy":2
            }');
          }
          $data = urlencode($data);
          $timestamp = time() * 1000; // unix time but in JS format (ie. has ms vs normal server time in secs), * 1000 to add ms (ie. 0ms)
    
          // build url
          $url = 'http://pinterest.com/resource/' . $search_res . '/get/?source_url=' . urlencode('/search/pins/?q=' . $search_str)
              . '&data=' . $data
              . $path
              . '&_=' . $timestamp;//'1378150472669';
    
          // setup curl
          $ch = curl_init();
          curl_setopt($ch, CURLOPT_URL, $url);
          curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
          curl_setopt($ch, CURLOPT_HTTPHEADER, array("X-Requested-With: XMLHttpRequest"));
    
          // get result
          $curl_result = curl_exec ($ch); // returns the body as a string because CURLOPT_RETURNTRANSFER is set
          $curl_result = json_decode($curl_result);
          curl_close ($ch);
    
          // clear html to make var_dumps easier to see when debugging
          // $curl_result->module->html = '';
    
          // isolate the pin data, different end points have different data structures
          if(!$next_page) $pin_array = $curl_result->module->tree->children[1]->children[0]->children[0]->children;
          else $pin_array = $curl_result->module->tree->children;
    
          // map the pin data into desired format
          $pin_data_array = array();
          $bookmarks = null;
          if(is_array($pin_array)) {
            if(count($pin_array)) {
    
              foreach ($pin_array as $pin) {
    
                //setup data
                $image_id = $pin->options->pin_id;
                $image_data = ( isset($pin->data->images->originals) ) ? $pin->data->images->originals : $pin->data->images->orig;
                $pin_url = 'http://pinterest.com/pin/' . $image_id . '/';
                $original_url = $pin->data->link;
                $video = $pin->data->is_video;
    
                array_push($pin_data_array, array(
                  "id"          => $image_id,
                  "image"       => $image_data,
                  "pin_link"    => $pin_url,
                  "orig_link"   => $original_url,
                  "video_flag"  => $video,
                  ));
              }
              $bookmarks = reset($curl_result->module->tree->resource->options->bookmarks);
    
            } else {
              $pin_data_array = false;
            }
          }
    
          // recurse until we're done
          if( !($pin_data_array === false) && !is_null($bookmarks) ) {
    
            // more pins to get
            $more_pins = $this->get_tagged_pins($search_str, $limit, $bookmarks, ++$page);
            if( !($more_pins === false) ) $pin_data_array = array_merge($pin_data_array, $more_pins);
            return $pin_data_array;
          }
    
          // end of recursion: return this page's pins, or false if there were none
          return empty($pin_data_array) ? false : $pin_data_array;
        }
    
      } // end class Skrivener_Pins
    } // end if
    
    
    
    /**
     * Debug/Demo Code
     * delete or comment this section for production
     */
    
    // output headers to control how the content displays
    // header("Content-Type: application/json");
    header("Content-Type: text/plain");
    // header("Content-Type: text/html");
    
    // define search term
    // $tag = "vader";
    $tag = "haemolytic";
    // $tag = "qjkjgjerbjjkrekhjk";
    
    if(class_exists('Skrivener_Pins')) {
    
      // instantiate the class
      $pin_handler = new Skrivener_Pins();
    
      // get pins, pinterest returns 25 per batch, function pages through this recursively, pass in limit to 
      // override default limit on number of pages to retrieve, avoid high limits (eg. limit of 20 * 25 pins/page = 500 pins to pull 
      // and 20 separate calls to Pinterest)
      $pins1 = $pin_handler->get_tagged_pins($tag, 2);
    
      // display the pins for demo purposes
      echo '<h1>Images on Pinterest mentioning "' . $tag . '"</h1>' . "\n";
      if( $pins1 != false ) {
        echo '<p><em>' . count($pins1) . ' images found.</em></p>' . "\n";
        skrivener_dump_images($pins1, 5);
      } else {
        echo '<p><em>No images found.</em></p>' . "\n";
      }
    }
    
    // demo function, dumps images in array to html img tags, can pass limit to only display part of array
    function skrivener_dump_images($pin_array, $limit = false) {
      if(is_array($pin_array)) {
        if($limit) $pin_array = array_slice($pin_array, -($limit));
        foreach ($pin_array as $pin) {
          echo '<img src="' . $pin['image']->url . '" width="' . $pin['image']->width . '" height="' . $pin['image']->height . '" >' . "\n";
        }
      }
    }
    
    ?>
    

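    Let me know if you have trouble adapting it to your specific endpoint. Apologies for any sloppiness in the code; it was not originally intended for production.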
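The bookmarks-driven pagination described in point 2, plus the millisecond timestamp from point 1, reduce to a small loop. A minimal Python sketch: fetch_page is a hypothetical callable you would implement per endpoint (returning the pins of one page and that response's bookmarks list), and treating "-end-" as the final bookmark is an assumption taken from the grep -v "-end-" filter in Solution 4's script.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical signature: given the previous bookmark (None for the first request),
# return (pins_on_this_page, bookmarks_list_from_the_response).
FetchPage = Callable[[Optional[str]], Tuple[List[dict], List[str]]]


def js_timestamp() -> int:
    """Milliseconds since the Unix epoch, the format described for the `_` parameter."""
    return int(time.time() * 1000)


def paginate(fetch_page: FetchPage, max_pages: int = 2) -> Iterator[dict]:
    """Follow the bookmarks chain until pins run out, an '-end-' bookmark appears, or max_pages is hit."""
    bookmark: Optional[str] = None
    for _ in range(max_pages):
        pins, bookmarks = fetch_page(bookmark)
        if not pins:
            return
        yield from pins
        if not bookmarks or bookmarks[0] == "-end-":
            return
        bookmark = bookmarks[0]   # feed this page's bookmark into the next request
```

A generator keeps the page limit (the PHP's $limit) in one place and lets the caller stop early without any recursion.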

    【Comments】:

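    • This question took a while to get answered, but you have basically solved it. Luckily I had also almost cracked that elusive bookmarks value. For example, putting some bookmark strings through a base64 decoder gives: ->18788523419059400:25|77a8c15de91998d843301116b0345928753478fa9ac0b7da855a8eeccb9c1f84 and ->18788523419039267:49|3686b33864aa96a215b28dd5e442afc06e6c76615a8adaae9f6f526432d47d12, i.e. the format ->{pinID}:{itemNumber}|{random 64-character base16 string}. Help me crack the last part and I think we'll have it!
    • Nice! I probably won't get a chance to dig further until I have a break between projects at the end of October. My instinct suggests some variation on a time/date stamp, or perhaps a hash of some part of the data for error checking, but those are stabs in the dark. I'll revisit when I have time :)
    • Thanks for your help; there aren't many 64-character hex hashing schemes. I have tried SHA256-encoding ->{pinID}:{item#}, {pinID}:{item#} and {pinID}, without success. The PHP you provided works anyway, but it would be great if this were fully programmatic! Thanks again for all your help :)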
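The decoding described in these comments can be reproduced with base64 from the Python standard library. A sketch assuming the ->{pinID}:{itemNumber}|{64 hex chars} layout reported above; the names Bookmark and decode_bookmark are hypothetical, and what the trailing hex digest hashes remains unknown, exactly as in the thread.

```python
import base64
import re
from typing import NamedTuple


class Bookmark(NamedTuple):
    pin_id: str
    item_number: int
    digest: str   # 64 hex chars; what it is a hash of is unknown


# Layout reported in the comments: "->{pinID}:{itemNumber}|{hex}" (the "->" prefix is optional here).
BOOKMARK_RE = re.compile(r"^(?:->)?(\d+):(\d+)\|([0-9a-f]+)$")


def decode_bookmark(token: str) -> Bookmark:
    raw = base64.b64decode(token).decode("ascii")
    match = BOOKMARK_RE.match(raw)
    if match is None:
        raise ValueError(f"unexpected bookmark layout: {raw!r}")
    return Bookmark(match.group(1), int(match.group(2)), match.group(3))


# The bookmark from the first request in Solution 1.
print(decode_bookmark(
    "LT4xNTg0MDAxMTE4NjcxMTM2ODk6MjV8ZWJjODJjOWI4NTQ4NjU4ZDMyNzhmN2U3MGQyZGJhYTJhZjY2ODUzNTI4YTZhY2NlNmY0M2I1ODYwYjExZmQ3Yw=="
))
```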

    【Solution 4】:
    #!/usr/bin/env bash 
    ##
    ## File: getpins.bsh 
    ## 
    ## Copyrighted by +A.M.Danischewski  2016+ (c)
    ## This program may be reutilized without limits, provided this 
    ## notice remain intact. 
    
    ## If this breaks one day, then just fire up firefox Developer Tools and check the network traffic to 
    ## capture "copy as curl" of the calls to the search page (filter with BaseSearchResource), then the 
    ## call to feed more data (filter with SearchResource). 
    ## 
    ## Do a search on whatever you want, remove the cookie header, and add -o ret2.html -D h2.txt -c c1.txt, 
    ## then search replace the search terms as SEARCHTOKEN1 and SEARCHTOKEN2. 
    ## 
    ## Description: this script facilitates alternate browsers by caching images/pins 
    ## from pinterest. This script is hardwired for two search terms. First create a directory 
    ## to where you want the images to go, then cd there. 
    ##  Usage: 
    ##    $> cd /big/drive/auto_gyros 
    ##    $> getpins.bsh "sleek autogyros"
    ## 
    ## Expect around 900 images to land wherever you select, so make sure you have space! =) 
    ##
    
    declare -r ORIG_IMGS="pin_orig_imgs.txt"
    declare -r TMP_IMGS="pin_imgs.txt"
    declare -r UA_HEADER="User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.$(($RANDOM%10))) Gecko/20100101 Firefox/19.0"
    
     ## Say Hello to the main page and get a cookie. 
    declare PINCMD1=$(cat << EOF
    curl -o ret1.html -D h1.txt -c c1.txt -H 'Host: www.pinterest.com' -H '${UA_HEADER}' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Accept-Language: en-US,en;q=0.5' --compressed -H 'Connection: keep-alive' 'https://www.pinterest.com/'
    EOF
    )
     ## Start a search for our dear search terms. 
    declare PINCMD2=$(cat << EOF
    curl -H 'X-APP-VERSION: ea7a93a' -o ret2.html -D h2.txt -c c1.txt -H 'Host: www.pinterest.com' -H '${UA_HEADER}' -H 'Accept: application/json, text/javascript, */*; q=0.01' -H 'Accept-Language: en-US,en;q=0.5' --compressed -H 'X-Pinterest-AppState: active' -H 'X-NEW-APP: 1'  -H 'X-Requested-With: XMLHttpRequest' -H 'Referer: https://www.pinterest.com' -H 'Connection: keep-alive' 'https://www.pinterest.com/resource/BaseSearchResource/get/?source_url=%2Fsearch%2Fpins%2F%3Fq%3DSEARCHTOKEN1%2520SEARCHTOKEN2%26rs%3Dtyped%260%3DSEARCHTOKEN1%257Ctyped%261%3DSEARCHTOKEN2%257Ctyped&data=%7B%22options%22%3A%7B%22restrict%22%3Anull%2C%22scope%22%3A%22pins%22%2C%22constraint_string%22%3Anull%2C%22show_scope_selector%22%3Atrue%2C%22query%22%3A%22SEARCHTOKEN1+SEARCHTOKEN2%22%7D%2C%22context%22%3A%7B%7D%2C%22module%22%3A%7B%22name%22%3A%22SearchPage%22%2C%22options%22%3A%7B%22restrict%22%3Anull%2C%22scope%22%3A%22pins%22%2C%22constraint_string%22%3Anull%2C%22show_scope_selector%22%3Atrue%2C%22query%22%3A%22SEARCHTOKEN1+SEARCHTOKEN2%22%7D%7D%2C%22render_type%22%3A1%2C%22error_strategy%22%3A0%7D&module_path=App%3EHeader%3ESearchForm%3ETypeaheadField(support_guided_search%3Dtrue%2C+resource_name%3DAdvancedTypeaheadResource%2C+tags%3Dautocomplete%2C+class_name%3DbuttonOnRight%2C+prefetch_on_focus%3Dtrue%2C+support_advanced_typeahead%3Dnull%2C+hide_tokens_on_focus%3Dundefined%2C+search_on_focus%3Dtrue%2C+placeholder%3DSearch%2C+show_remove_all%3Dtrue%2C+enable_recent_queries%3Dtrue%2C+name%3Dq%2C+view_type%3Dguided%2C+value%3D%22%22%2C+input_log_element_type%3D227%2C+populate_on_result_highlight%3Dtrue%2C+search_delay%3D0%2C+is_multiobject_search%3Dtrue%2C+type%3Dtokenized%2C+enable_overlay%3Dtrue)&_=1454779874891' 
    EOF
    )
     ## Load further images. 
    declare PINCMD3=$(cat << EOF
    curl -H 'X-APP-VERSION: ea7a93a' -D h3.txt -c c1.txt -H 'Host: www.pinterest.com' -H '${UA_HEADER}' -H 'Accept: application/json, text/javascript, */*; q=0.01' -H 'Accept-Language: en-US,en;q=0.5' --compressed -H 'X-Pinterest-AppState: active' -H 'X-NEW-APP: 1'  -H 'X-Requested-With: XMLHttpRequest' -H 'Referer: https://www.pinterest.com' -H 'Connection: keep-alive' 'https://www.pinterest.com/resource/SearchResource/get/?source_url=%2Fsearch%2Fpins%2F%3Fq%3DSEARCHTOKEN1%2520SEARCHTOKEN2%26rs%3Dtyped%260%3DSEARCHTOKEN1%257Ctyped%261%3DSEARCHTOKEN2%257Ctyped&data=%7B%22options%22%3A%7B%22layout%22%3Anull%2C%22places%22%3Afalse%2C%22constraint_string%22%3Anull%2C%22show_scope_selector%22%3Atrue%2C%22query%22%3A%22SEARCHTOKEN1+SEARCHTOKEN2%22%2C%22scope%22%3A%22pins%22%2C%22bookmarks%22%3A%5B%22_NEW_BOOK_MARK_%22%5D%7D%2C%22context%22%3A%7B%7D%7D&module_path=App%3EHeader%3ESearchForm%3ETypeaheadField(support_guided_search%3Dtrue%2C+resource_name%3DAdvancedTypeaheadResource%2C+tags%3Dautocomplete%2C+class_name%3DbuttonOnRight%2C+prefetch_on_focus%3Dtrue%2C+support_advanced_typeahead%3Dnull%2C+hide_tokens_on_focus%3Dundefined%2C+search_on_focus%3Dtrue%2C+placeholder%3DSearch%2C+show_remove_all%3Dtrue%2C+enable_recent_queries%3Dtrue%2C+name%3Dq%2C+view_type%3Dguided%2C+value%3D%22%22%2C+input_log_element_type%3D227%2C+populate_on_result_highlight%3Dtrue%2C+search_delay%3D0%2C+is_multiobject_search%3Dtrue%2C+type%3Dtokenized%2C+enable_overlay%3Dtrue)&_=1454779874911'
    EOF
    )
     ## Exactly 2 search terms in a single string are expected, you can hack it up if 
     ## you want something else.  
    declare SEARCHTOKEN1=$(echo "${1}" | cut -d " " -f1)
    declare SEARCHTOKEN2=$(echo "${1}" | cut -d " " -f2)
    
    PINCMD3=$(sed "s/SEARCHTOKEN1/${SEARCHTOKEN1}/g" <<< "${PINCMD3}") 
    PINCMD3=$(sed "s/SEARCHTOKEN2/${SEARCHTOKEN2}/g" <<< "${PINCMD3}") 
    PINCMD2=$(sed "s/SEARCHTOKEN1/${SEARCHTOKEN1}/g" <<< "${PINCMD2}") 
    PINCMD2=$(sed "s/SEARCHTOKEN2/${SEARCHTOKEN2}/g" <<< "${PINCMD2}") 
    
    function lspinimgs() { grep -o "\"url\": \"http[s]*://[^\"]*.pinimg.com[^\"]*.jpg\"" "${1}" | cut -d " " -f2 | tr -d "\""; }
    function mkpinorig() { sed "s#\(^http.*\)\(com/\)\([^/]*\)\(/.*jpg\$\)#\1\2originals\4#g" "${1}" > "${2}"; }    
    function getpinbm() { grep -o "bookmarks\": [^ ]* "  "${1}" | sed "s/^book.*\[\"//g;s/\"\].*\$//g" | sort | uniq | grep -v -e "-end-"; }
    function changepinbm() { PINCMD3=$(sed "s/\(^.*\)\(bookmarks%22%3A%5B%22\)\(.*\)\(%22%5D.*\$\)/\1\2${1}\4/g" <<< "${PINCMD3}"); }
    function cleanup() { rm ret*html c1.txt "${TMP_IMGS}" h{1..3}.txt "${ORIG_IMGS}"; } 
    
    function main() { 
    eval "${PINCMD1}" 
    eval "${PINCMD2}"
    for ((i=3,lasti=2; i<10000; i++,lasti++)); do 
     pinbm=$(getpinbm "ret${lasti}.html")
     [[ -z "${pinbm}" ]] && break 
     changepinbm "${pinbm}"
     eval "${PINCMD3}" > "ret${i}.html"
    done 
    for a in *.html; do lspinimgs "${a}" >> "${TMP_IMGS}"; done
    mkpinorig "${TMP_IMGS}" "${ORIG_IMGS}"
    IFS=$(echo -en "\n\b") && for a in $(sort "${ORIG_IMGS}" | uniq); do 
     wget --tries=3 -E -e robots=off -nc --random-wait --content-disposition --no-check-certificate -p --restrict-file-names=windows,lowercase,ascii --header "${UA_HEADER}" -nd "$a"  
    done
    cleanup 
    } 
    
    main 
    exit 0
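The two text-processing helpers in the script, lspinimgs (grep out "url": "...pinimg.com...jpg" values) and mkpinorig (swap the size segment after "com/" for "originals"), port directly to Python's re module. A sketch of that port: the function names are hypothetical, and unlike the shell regex it escapes the dots in .pinimg.com and .jpg. That the "originals" path serves the full-size image is the script's assumption, not something verified here.

```python
import re

# Like lspinimgs: "url": "http(s)://...pinimg.com...jpg"
PINIMG_URL_RE = re.compile(r'"url": "(https?://[^"]*\.pinimg\.com[^"]*\.jpg)"')
# Like mkpinorig: replace the path segment right after "com/" with "originals".
SIZE_SEGMENT_RE = re.compile(r"^(http.*com/)[^/]*(/.*jpg)$")


def list_pin_images(text: str) -> list[str]:
    return PINIMG_URL_RE.findall(text)


def to_original(url: str) -> str:
    return SIZE_SEGMENT_RE.sub(r"\1originals\2", url)


def original_image_urls(text: str) -> list[str]:
    """Unique original-size URLs, sorted like `sort | uniq` in the script."""
    return sorted({to_original(u) for u in list_pin_images(text)})
```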
    

    【Comments】:

      【Solution 5】:

      This may be a bit late, but with the open-source py3-pinterest project you can do it easily:

      First, get all the pins on the board as objects; they also include the original image URL.

      # get all pins for the board
      board_pins = []
      pin_batch = pinterest.board_feed(board_id=target_board['id'], board_url=target_board['url'])
      
      while len(pin_batch) > 0:
          board_pins += pin_batch
          pin_batch = pinterest.board_feed(board_id=target_board['id'], board_url=target_board['url'])
      

      Then you can take the image URLs and download them, or do whatever you like with them:

      for pin in board_pins:
          url = pin['image']
          # process image url..
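For the "download them" step, a standard-library sketch with urllib.request. It assumes, as the loop above implies, that pin['image'] holds a plain URL string; filename_for and download_image are hypothetical helpers, and skipping files that already exist mirrors wget -nc in Solution 4.

```python
import os
import urllib.request
from urllib.parse import urlsplit


def filename_for(url: str) -> str:
    """Local file name: the last path segment of the image URL (query string ignored)."""
    return os.path.basename(urlsplit(url).path) or "image"


def download_image(url: str, directory: str = ".") -> str:
    """Save one image into `directory`, skipping files that already exist."""
    target = os.path.join(directory, filename_for(url))
    if not os.path.exists(target):
        with urllib.request.urlopen(url) as resp, open(target, "wb") as out:
            out.write(resp.read())
    return target


# Usage with the board_pins list built above (performs network requests):
# for pin in board_pins:
#     download_image(pin['image'])
```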
      

      Full code example: https://github.com/bstoilov/py3-pinterest/blob/master/download_board_images.py

      Yes, it's Python, but if you're still set on C#, it should be easy to port :)

      【Comments】:

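      • I have been trying this approach but am facing two problems: (a) the code example does not pass , board_url=target_board['url'] to pinterest.board_feed(..), and (b) even when I use the code example, pin_batch comes back as an empty list.
      • I'll take a look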