Batch posting on Blogger using the gdata Python client

【Question title】: Batch posting on Blogger using the gdata Python client
【Posted】: 2014-08-26 05:37:32
【Question description】:

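I'm trying to copy all of my LiveJournal posts to my new blog on blogger.com. I do this with a slightly modified version of the example that ships with the gdata Python client. I have a JSON file containing all the posts exported from LiveJournal. The problem is that blogger.com has a daily limit of 50 new blog entries, so copying my 1300+ posts will take about a month, because after 50 imports I can't enter the captcha programmatically.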
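To put the limit in numbers, here is a rough sketch of the arithmetic. The figures 1300 and 50 come from the description above; the helper names `days_needed` and `daily_chunks` are hypothetical and only illustrate how the posts would have to be split up.

```python
import math

def days_needed(total_posts, daily_limit):
    """Days required when at most daily_limit posts can be created per day."""
    return math.ceil(total_posts / daily_limit)

def daily_chunks(posts, daily_limit):
    """Split a list of posts into consecutive per-day batches."""
    return [posts[i:i + daily_limit] for i in range(0, len(posts), daily_limit)]

print(days_needed(1300, 50))  # 26
```

With more than 1300 posts this comes to at least 26 days, which matches the "about a month" estimate.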

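I recently learned that gdata also has a batch mode somewhere, but I couldn't figure out how to use it. Googling didn't really help.

Any advice or help would be greatly appreciated.

Thanks.

Update

Just in case, here is the code I'm using: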

#!/usr/local/bin/python
import json

from gdata import service
import gdata
import atom

from datetime import datetime as dt
from datetime import timedelta as td
from datetime import tzinfo as tz

import time

allEntries = json.load(open("todays_copy.json", "r"))

class TZ(tz):
    def utcoffset(self, dt): return td(hours=-6)

class BloggerExample:
    def __init__(self, email, password):
        # Authenticate using ClientLogin.
        self.service = service.GDataService(email, password)
        self.service.source = "Blogger_Python_Sample-1.0"
        self.service.service = "blogger"
        self.service.server = "www.blogger.com"
        self.service.ProgrammaticLogin()

        # Get the blog ID for the first blog.
        feed = self.service.Get("/feeds/default/blogs")
        self_link = feed.entry[0].GetSelfLink()
        if self_link:
            self.blog_id = self_link.href.split("/")[-1]

    def CreatePost(self, title, content, author_name, label, time):
        LABEL_SCHEME = "http://www.blogger.com/atom/ns#"
        # Create the entry to insert.
        entry = gdata.GDataEntry()
        entry.author.append(atom.Author(atom.Name(text=author_name)))
        entry.title = atom.Title(title_type="xhtml", text=title)
        entry.content = atom.Content(content_type="html", text=content)
        entry.published = atom.Published(time)
        entry.category.append(atom.Category(scheme=LABEL_SCHEME, term=label))

        # Ask the service to insert the new entry.
        return self.service.Post(entry, 
            "/feeds/" + self.blog_id + "/posts/default")

    def run(self, data):
        for year in data:
            for month in year["yearlydata"]:
                for day in month["monthlydata"]:
                    for entry in day["daylydata"]:
                        # print year["year"], month["month"], day["day"], entry["title"].encode("utf-8")
                        atime = dt.strptime(entry["time"], "%I:%M %p")
                        hr = atime.hour
                        mn = atime.minute
                        ptime = dt(year["year"], int(month["month"]), int(day["day"]), hr, mn, 0, tzinfo=TZ()).isoformat("T")
                        public_post = self.CreatePost(entry["title"],
                            entry["content"],
                            "My name",
                            ",".join(entry["tags"]),
                            ptime)
                        print "%s, %s - published, Waiting 30 minutes" % (ptime, entry["title"].encode("utf-8"))
                        time.sleep(30*60)


def main(data):
    email = "my@email.com"
    password = "MyPassW0rd"

    sample = BloggerExample(email, password)
    sample.run(data)

if __name__ == "__main__":
    main(allEntries)
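As a side note, the nested loops in `run` can be read as a flattening step. The sketch below is written for Python 3 with only the standard library (the script above targets Python 2 and gdata), and the JSON shape is inferred from the keys the script accesses, so the sample data is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-6 offset, matching the TZ class in the script above.
UTC_MINUS_6 = timezone(timedelta(hours=-6))

def flatten_entries(all_entries):
    """Yield (iso_timestamp, entry) pairs from the nested year/month/day structure."""
    for year in all_entries:
        for month in year["yearlydata"]:
            for day in month["monthlydata"]:
                for entry in day["daylydata"]:
                    # Entry times look like "05:37 AM"; combine them with the date.
                    clock = datetime.strptime(entry["time"], "%I:%M %p")
                    stamp = datetime(int(year["year"]), int(month["month"]),
                                     int(day["day"]), clock.hour, clock.minute,
                                     tzinfo=UTC_MINUS_6)
                    yield stamp.isoformat(), entry

# Hypothetical sample in the shape the script appears to expect.
sample = [{"year": 2014, "yearlydata": [{"month": "8", "monthlydata": [
    {"day": "26", "daylydata": [
        {"time": "05:37 AM", "title": "Hello", "content": "<p>Hi</p>",
         "tags": ["a", "b"]}]}]}]}]

for stamp, entry in flatten_entries(sample):
    print(stamp, entry["title"])
```

Separating the flattening from the posting makes it easier to count the entries or slice them into batches before any request is sent.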

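【Question discussion】:

Could you get around it by writing each record from one database to the other with a standalone Python script? I'm not familiar with LiveJournal or Blogger, but I have had to batch-process a large number of posts, so I'm interested in helping.
@Joaq2Remember Sorry, I don't quite follow. Could you please clarify? Thanks.
Possibly establish two connections to the two databases, LiveJournal and Blogger. Select from LiveJournal and write a copy to Blogger, or establish a Blogger DB connection and write to it by parsing the JSON.
@Joaq2Remember I wish I could, but Blogger only provides a REST API, so I have no direct access to their database.
So your problem is a hard limit on POST requests to their API? Is there any way to publish items other than the API? I might suggest using a bot to publish the posts through the CMS, or trying to get in touch with someone at Blogger to see whether they can help you.

Tags: python batch-processing blogger gdata-python-client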

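【Solution 1】:

I would recommend using the Google Blog Converters instead (https://code.google.com/archive/p/google-blog-converters-appengine/)

To get started, you will have to go through:

https://github.com/google/gdata-python-client/blob/master/INSTALL.txt - steps for setting up the Google GData API
https://github.com/pra85/google-blog-converters-appengine/blob/master/README.txt - steps for using the blog converters

Once everything is set up, run the following command (the username and password are your LiveJournal credentials):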

livejournal2blogger.sh -u <username> -p <password> [-s <server>]

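Redirect its output to an .xml file. This file can then be imported directly into a Blogger blog by going to the Blogger dashboard, your blog > Settings > Other > Blog tools > Import blog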
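The run-and-redirect step can also be scripted. The following is only a sketch: it assumes `livejournal2blogger.sh` sits in the current directory, and the function names and the output file name are hypothetical. Only the `-u`, `-p`, and `-s` flags come from the command shown above.

```python
import subprocess

def build_command(username, password, server=None):
    """Build the argument list for livejournal2blogger.sh."""
    cmd = ["./livejournal2blogger.sh", "-u", username, "-p", password]
    if server:
        # -s is optional, as in the usage line above.
        cmd += ["-s", server]
    return cmd

def export_to_file(username, password, out_path, server=None):
    """Run the converter and write its standard output to out_path."""
    with open(out_path, "wb") as out:
        subprocess.run(build_command(username, password, server),
                       stdout=out, check=True)

# Example call (not executed here):
# export_to_file("my_lj_user", "my_lj_password", "livejournal_export.xml")
```

Passing the arguments as a list avoids shell quoting problems if the password contains special characters.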

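Remember to check the "Automatically publish all imported posts and pages" option here. I tried this once before on a blog with more than 400 posts, and Blogger imported and published them without any problems.

If you suspect that Blogger may run into problems (because the number of posts is very large), or if you have other Blogger blogs in your account, then as a precaution create a separate Blogger (Google) account and try importing the posts there. Afterwards you can transfer admin control to your real Blogger account. To transfer, first send an author invitation, then promote your real Blogger account to admin level, and finally remove the dummy account. The option for sending invitations is at Settings > Basic > Permissions > Blog Authors.

Also make sure that you are using Python 2.5, otherwise these scripts will not run. Before running livejournal2blogger.sh, change the following line (thanks to Michael Fleet for this fix: http://michael.f1337.us/2011/12/28/google-blog-converters-blogger2wordpress/)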

PYTHONPATH=${PROJ_DIR}/lib python ${PROJ_DIR}/src/livejournal2blogger/lj2b.py $*

to

PYTHONPATH=${PROJ_DIR}/lib python2.5 ${PROJ_DIR}/src/livejournal2blogger/lj2b.py $*
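If you prefer not to edit the file by hand, the same one-line change can be sketched in Python. The function name `pin_python_version` is hypothetical, and the sketch assumes the launch line looks exactly like the one quoted above.

```python
def pin_python_version(script_text, interpreter="python2.5"):
    """Replace the bare `python` interpreter on the lj2b.py launch line."""
    patched = []
    for line in script_text.splitlines(keepends=True):
        # Only touch the line that launches lj2b.py with a bare "python".
        if "lj2b.py" in line and " python " in line:
            line = line.replace(" python ", " " + interpreter + " ", 1)
        patched.append(line)
    return "".join(patched)

original = ("PYTHONPATH=${PROJ_DIR}/lib python "
            "${PROJ_DIR}/src/livejournal2blogger/lj2b.py $*\n")
print(pin_python_version(original), end="")
```

Applying the function a second time leaves the text unchanged, because the patched line no longer contains a bare `python`.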

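P.S. I know this is not an answer to your question as asked, but since its goal is the same as yours (importing more than 50 posts per day), I am sharing it. I don't know much about Python or the GData API; I set up the environment and followed the steps above to answer this question (and I was able to import posts from LiveJournal into Blogger this way).

【Discussion】:

OK, this one looks like a legitimate answer. I was able to transfer 1000 posts in a single import. It looks like 1000 is the new limit. Much better than 50. So I'll accept this one. Thanks @PrayagVerma.
So, this is actually a complete answer. The 1000-post limit turned out to be a LiveJournal limit, not a Blogger one.

【Solution 2】: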

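# Requires the gdata Python client (Python 2). Note that these classes come
# from gdata.base (the Google Base API), not from a Blogger-specific module.
import atom
import gdata.base
import gdata.base.service

# gd_client is assumed to be a service object; authenticate it (for example
# with ProgrammaticLogin) before executing the batch.
gd_client = gdata.base.service.GBaseService()

# Build the request feed
request_feed = gdata.base.GBaseItemFeed(atom_id=atom.Id(text='test batch'))

# Create each item from its XML and set its title
entry1 = gdata.base.GBaseItemFromString('--XML for your new item goes here--')
entry1.title.text = 'first batch request item'
entry2 = gdata.base.GBaseItemFromString('--XML for your new item here--')
entry2.title.text = 'second batch request item'

# Add each item to the request feed as an insert operation
request_feed.AddInsert(entry1)
request_feed.AddInsert(entry2)

# Execute all operations in the request feed as one batch
result_feed = gd_client.ExecuteBatch(request_feed)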
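For orientation, here is a rough standard-library sketch of what a GData batch request feed looks like on the wire: an Atom feed whose entries each carry a `batch:id` and a `batch:operation` element. The function name and the post dictionaries are hypothetical, and nothing in this thread confirms that the Blogger endpoint accepts such feeds, so treat it as an illustration of the feed format only.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
BATCH_NS = "http://schemas.google.com/gdata/batch"
ET.register_namespace("", ATOM_NS)
ET.register_namespace("batch", BATCH_NS)

def build_batch_feed(posts):
    """Build an Atom feed in which every entry is marked as a batch insert."""
    feed = ET.Element("{%s}feed" % ATOM_NS)
    for index, post in enumerate(posts, start=1):
        entry = ET.SubElement(feed, "{%s}entry" % ATOM_NS)
        # batch:id lets the caller match each result to its request entry.
        ET.SubElement(entry, "{%s}id" % BATCH_NS).text = str(index)
        ET.SubElement(entry, "{%s}operation" % BATCH_NS, {"type": "insert"})
        ET.SubElement(entry, "{%s}title" % ATOM_NS,
                      {"type": "text"}).text = post["title"]
        ET.SubElement(entry, "{%s}content" % ATOM_NS,
                      {"type": "html"}).text = post["content"]
    return ET.tostring(feed, encoding="unicode")

print(build_batch_feed([{"title": "first", "content": "<p>one</p>"},
                        {"title": "second", "content": "<p>two</p>"}]))
```

The `AddInsert` calls in the snippet above appear to play the same role: they tag each entry with an insert operation before the whole feed is sent in one request.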

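【Discussion】:

Wow, this looks promising. I'll test it right away. Thanks and +1. I'll mark it as the solution once I've tested it properly.
I also found different documentation. Building the request feed seems to be different there, but here are some docs at gdata-python-client.googlecode.com/hg/pydocs/… . It builds the feed using GDatafeed. Let me know what you think.
@Kaster the bounty is about to end... let me know how you're getting on so that it can be awarded or not, whatever the case may be :)
@JonClements Unfortunately, I haven't been able to get it working so far. The code leaves out some other important parts, and I'm not entirely sure how to use it. I've tried different things, but no luck so far. The other thing is that I haven't had much time to check all the options, but I'll keep trying.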