【Question title】: How to load data from the online GAE datastore into the local development server?
【Posted】: 2012-11-13 19:15:14
【Question】:

I previously downloaded a backup of my entities from the live datastore using the method described in the GAE docs.

Currently I have one csv file per entity kind, which I obtained by writing a bulkloader.yaml and running the following command:

appcfg.py download_data --config_file=bulkloader.yaml --filename=users.csv --kind=Permission --url=http://your_app_id.appspot.com/_ah/remote_api

I also have a sql3 dump file, obtained with the following command:

appcfg.py download_data --kind=<kind> --url=http://your_app_id.appspot.com/_ah/remote_api --filename=<data-filename>

Now if I try this command:

appcfg.py upload_data --url=http://your_app_id.appspot.com/_ah/remote_api --kind=<kind> --filename=<data-filename>

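with the URL replaced by localhost:8080, it asks me for a username/password. Even when I provide a mock username (test@example.com) at http://localhost:8080/_ah/remote_api and tick the "admin" checkbox, it always gives me an authentication error.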

Another option mentioned in the docs is to use this:

appcfg.py upload_data --config_file=album_loader.py --filename=album_data.csv --kind=Album --url=http://localhost:8080/_ah/remote_api <app-directory>

I wrote a loader and tried it out. It also asks for a username and password, but here it accepts anything. The output is as follows:

/usr/local/google_appengine/google/appengine/api/search/search.py:232: UserWarning: DocumentOperationResult._code is deprecated. Use OperationResult._code instead.
  'Use OperationResult.%s instead.' % (name, name))
/usr/local/google_appengine/google/appengine/api/search/search.py:232: UserWarning: DocumentOperationResult._CODES is deprecated. Use OperationResult._CODES instead.
  'Use OperationResult.%s instead.' % (name, name))
Application: knowledgetestgame
Uploading data records.
[INFO    ] Logging to bulkloader-log-20121113.210613
[INFO    ] Throttling transfers:
[INFO    ] Bandwidth: 250000 bytes/second
[INFO    ] HTTP connections: 8/second
[INFO    ] Entities inserted/fetched/modified: 20/second
[INFO    ] Batch Size: 10
[INFO    ] Opening database: bulkloader-progress-20121113.210613.sql3
Please enter login credentials for localhost
Email: test@example.com
Password for test@example.com: 
[INFO    ] Connecting to localhost:8080/_ah/remote_api
[INFO    ] Starting import; maximum 10 entities per post
[ERROR   ] [WorkerThread-4] WorkerThread:
Traceback (most recent call last):
  File "/usr/local/google_appengine/google/appengine/tools/adaptive_thread_pool.py", line 176, in WorkOnItems
    status, instruction = item.PerformWork(self.__thread_pool)
  File "/usr/local/google_appengine/google/appengine/tools/bulkloader.py", line 764, in PerformWork
    transfer_time = self._TransferItem(thread_pool)
  File "/usr/local/google_appengine/google/appengine/tools/bulkloader.py", line 933, in _TransferItem
    self.content = self.request_manager.EncodeContent(self.rows)
  File "/usr/local/google_appengine/google/appengine/tools/bulkloader.py", line 1394, in EncodeContent
    entity = loader.create_entity(values, key_name=key, parent=parent)
  File "/usr/local/google_appengine/google/appengine/tools/bulkloader.py", line 2728, in create_entity
    (len(self.__properties), len(values)))
AssertionError: Expected 17 columns, found 18.
[INFO    ] [WorkerThread-5] Backing off due to errors: 1.0 seconds
[INFO    ] Unexpected thread death: WorkerThread-4
[INFO    ] An error occurred. Shutting down...
[ERROR   ] Error in WorkerThread-4: Expected 17 columns, found 18.

[INFO    ] 980 entities total, 0 previously transferred
[INFO    ] 0 entities (278 bytes) transferred in 5.9 seconds
[INFO    ] Some entities not successfully transferred

I have about 4000 entities in total. The output here says 980 were transferred, but when I check the local datastore I cannot find them.

Below is the loader I used (I use NDB for the Guess entity):

import datetime
from google.appengine.ext import db
from google.appengine.tools import bulkloader
from google.appengine.ext.ndb import key


class Guess(db.Model):
    pass

class GuessLoader(bulkloader.Loader):
    def __init__(self):
        bulkloader.Loader.__init__(self, 'Guess',
                                   [('selectedAssociation', lambda x: x.decode('utf-8')),
                                    ('suggestionsList', lambda x: x.decode('utf-8')),
                                    ('associationIndexInList', int),                                    
                                    ('timeEntered',
                                     lambda x: datetime.datetime.strptime(x, '%m/%d/%Y').date()),
                                    ('rank', int),
                                    ('topicName', lambda x: x.decode('utf-8')),
                                    ('topic', int),
                                    ('player', int),
                                    ('game', int),
                                    ('guessString', lambda x: x.decode('utf-8')),
                                    ('guessTime',
                                     lambda x: datetime.datetime.strptime(x, '%m/%d/%Y').date()),
                                    ('accountType', lambda x: x.decode('utf-8')),
                                    ('nthGuess', int),
                                    ('score', float),
                                    ('cutByRoundEnd', bool),
                                    ('suggestionsListDelay', int),
                                    ('occurrences', float)
                                   ])

loaders = [GuessLoader]

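Edit: I just noticed this part of the error message: [ERROR ] Error in WorkerThread-0: Expected 17 columns, found 18. I went through the whole csv file and made sure that every row has 18 columns. I then checked the loader and found that I was missing the key column. I added it with the type int, but that did not work.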

【Comments】:

    Tags: google-app-engine google-cloud-datastore app-engine-ndb database-backups bulkloader


    【Answer 1】:

    If you are having trouble with authentication, put the following in your appengine_config.py:

    import os

    if os.environ.get('SERVER_SOFTWARE', '').startswith('Development'):
        # Dev server only: trust remote_api requests coming from localhost.
        remoteapi_CUSTOM_ENVIRONMENT_AUTHENTICATION = (
            'REMOTE_ADDR', ['127.0.0.1'])
    

    Then run:

    appcfg.py download_data --url=http://APPNAME.appspot.com/_ah/remote_api --filename=dump --kind=EntityName
    appcfg.py upload_data --url=http://localhost:8080/_ah/remote_api --filename=dump --application=dev~APPNAME
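The `SERVER_SOFTWARE` guard in the appengine_config.py snippet is what keeps the relaxed authentication from applying to the deployed app. A minimal sketch of that check in isolation follows. The function name `allows_local_remote_api` is hypothetical, and the sample `SERVER_SOFTWARE` strings are assumed examples of what the development server and production might report, not values taken from this thread.

```python
def allows_local_remote_api(environ):
    """Return True when SERVER_SOFTWARE marks the process as the dev server."""
    return environ.get('SERVER_SOFTWARE', '').startswith('Development')


# Assumed sample values, for illustration only.
print(allows_local_remote_api({'SERVER_SOFTWARE': 'Development/2.0'}))
print(allows_local_remote_api({'SERVER_SOFTWARE': 'Google App Engine/1.7.3'}))
print(allows_local_remote_api({}))  # variable not set at all
```

Because the default is an empty string, an environment without `SERVER_SOFTWARE` falls through to the normal authentication path.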
    

    【Comments】:

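      【Answer 2】:

      Try pressing Enter (no username/password). That seems to work for me. My command (wrapped in a bash script to prevent the import errors I occasionally get) is: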

      #!/bin/bash
      
      # Modify path
      export PYTHONPATH=$PYTHONPATH:.
      
      # Load data
      python /path/to/app/config/appcfg.py upload_data \
      --config_file=<my_loader.py> \
      --filename=<output.csv> \
      --kind=<kind> \
      --application=dev~<application_id> \
      --url=http://localhost:8088/_ah/remote_api ./
      

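      When prompted for Email, I pressed Enter and everything was uploaded to the development server. I was not using NDB in this case, though I don't think that makes a difference.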

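      【Comments】:

      • @MohamedKhamis How many columns are in your CSV? The error takes the form Expected <number of items in config> columns, found <number of columns in CSV>. I would suggest removing key, but also go to the first column of the CSV that contains no data, press Ctrl+Space, then Ctrl+Shift+Right Arrow, and actually delete those columns (Alt+HDC in Excel 2007, or right-click the column header and click Delete). Then save. There may be "phantom" data (whitespace, etc.) that makes GAE think there are more columns than there actually are.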
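The column check described in this comment can also be done without a spreadsheet. The following is a rough sketch, assuming a standalone Python 3 script run outside the App Engine SDK. The helper names `column_count_report` and `strip_trailing_empty` and the sample data are hypothetical. Note that stripping trailing empty cells would also drop a legitimately empty last property, so the cleaned rows should be reviewed before re-uploading.

```python
import csv
import io


def column_count_report(rows, expected):
    """Return (line_number, column_count) for rows whose width != expected."""
    return [(i, len(row)) for i, row in enumerate(rows, start=1)
            if len(row) != expected]


def strip_trailing_empty(rows):
    """Drop cells at the end of each row that are empty or whitespace-only."""
    cleaned = []
    for row in rows:
        row = list(row)
        while row and not row[-1].strip():
            row.pop()
        cleaned.append(row)
    return cleaned


# Hypothetical sample: three real columns, plus a phantom fourth on line 2.
sample = "a,b,c\n1,2,3, \n4,5,6\n"
rows = list(csv.reader(io.StringIO(sample)))

print(column_count_report(rows, expected=3))
print(column_count_report(strip_trailing_empty(rows), expected=3))
```

For a real file, `rows` would come from `csv.reader(open(path, newline=''))`, and `expected` would be the number of properties declared in the loader.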