【Question title】: Mismatch in number of rows imported into Cassandra table (COPY command)
【Posted】: 2018-11-20 17:23:29
【Question description】:

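I'm trying to load a CSV file into a Cassandra table with the COPY command. However, the number of rows in my CSV file does not match the number of rows in Cassandra.

Rows in the CSV file: 49765 (excluding the header)

Rows in the Cassandra table: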

cqlsh:test_df> select Count(*) from test_table;

 count
-------
 46982

(1 rows)

Warnings :
Aggregation query used without partition key

COPY command:

COPY test_table (column1,column2,column3) from 'temp.csv'  with delimiter = ',' and header = True;

Error output:

Starting copy of test_df.test_bhavcopy with columns [symbol, instrument, expiry_dt, strike_pr, option_typ, open, high, low, close, settle_pr, contracts, val_inlakh, open_int, ch_in_oi, price_date, key].
…Rate:    8387 rows/s; Avg. rate:    3937 rows/s
Process ImportProcess-3:
Traceback (most recent call last):
  File "X:\Anaconda\lib\multiprocessing\process.py", line 267, in _bootstrap
    self.run()
  File "X:\apache-cassandra-3.11.3\bin\..\pylib\cqlshlib\copyutil.py", line 2328, in run
    self.close()
  File "X:\apache-cassandra-3.11.3\bin\..\pylib\cqlshlib\copyutil.py", line 2332, in close
    self._session.cluster.shutdown()
  File "X:\apache-cassandra-3.11.3\bin\..\lib\cassandra-driver-internal-only-3.11.0-bb96859b.zip\cassandra-driver-3.11.0-bb96859b\cassandra\cluster.py", line 1259, in shutdown
    self.control_connection.shutdown()
  File "X:\apache-cassandra-3.11.3\bin\..\lib\cassandra-driver-internal-only-3.11.0-bb96859b.zip\cassandra-driver-3.11.0-bb96859b\cassandra\cluster.py", line 2850, in shutdown
    self._connection.close()
  File "X:\apache-cassandra-3.11.3\bin\..\lib\cassandra-driver-internal-only-3.11.0-bb96859b.zip\cassandra-driver-3.11.0-bb96859b\cassandra\io\asyncorereactor.py", line 373, in close
    AsyncoreConnection.create_timer(0, partial(asyncore.dispatcher.close, self))
  File "X:\apache-cassandra-3.11.3\bin\..\lib\cassandra-driver-internal-only-3.11.0-bb96859b.zip\cassandra-driver-3.11.0-bb96859b\cassandra\io\asyncorereactor.py", line 335, in create_timer
    cls._loop.add_timer(timer)
AttributeError: 'NoneType' object has no attribute 'add_timer'
Process ImportProcess-2:
Traceback (most recent call last):
  (same frames as ImportProcess-3)
AttributeError: 'NoneType' object has no attribute 'add_timer'
Process ImportProcess-1:
Traceback (most recent call last):
  (same frames as ImportProcess-3)
AttributeError: 'NoneType' object has no attribute 'add_timer'
Processed: 49765 rows; Rate:    4193 rows/s; Avg. rate:    3906 rows/s
49765 rows imported from 1 files in 12.742 seconds (0 skipped).

Could the mismatch be caused by this error?
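The last frame shows `create_timer` being called through the hard-coded class name `AsyncoreConnection`, so `cls._loop` is read from that exact class and turns out to be `None`. One plausible reading of why the one-line change in the answer (`AsyncoreConnection.create_timer()` → `self.create_timer()`) helps: if the loop was attached to a subclass's `_loop` rather than to `AsyncoreConnection` itself, only a call dispatched through `self` finds it. The sketch below reproduces that Python mechanism with hypothetical classes (`Loop`, `BaseConnection`, `SubConnection`); it is an assumption-based illustration, not the driver's actual code. Note also that every frame above sits in `close()`/`shutdown()`, and the summary line still reports 49765 rows imported with 0 skipped.

```python
class Loop:
    """Hypothetical stand-in for the driver's event loop."""

    def __init__(self):
        self.timers = []

    def add_timer(self, timer):
        self.timers.append(timer)


class BaseConnection:
    # Class attribute, analogous to a reactor-level _loop that starts as None
    _loop = None

    @classmethod
    def initialize_reactor(cls):
        # Assigning through cls creates _loop on whichever class this is called on
        if cls._loop is None:
            cls._loop = Loop()

    @classmethod
    def create_timer(cls, timer):
        cls._loop.add_timer(timer)
        return timer

    def close_buggy(self):
        # Hard-coded base class: reads BaseConnection._loop, which is still None here
        return BaseConnection.create_timer("close")

    def close_fixed(self):
        # Dispatches through the instance's class, so SubConnection._loop is used
        return self.create_timer("close")


class SubConnection(BaseConnection):
    pass


SubConnection.initialize_reactor()  # sets SubConnection._loop only
conn = SubConnection()
conn.close_fixed()  # succeeds
try:
    conn.close_buggy()
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'add_timer'
```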

【Comments on the question】:

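  • Your primary key definition may be causing some rows to be overwritten...
  • I did some googling, and it seems to be a cassandra-driver issue (AttributeError: 'NoneType' object has no attribute 'add_timer'), but I'm not sure how to fix it
  • Which version of Python do you have? cqlsh only works with 2.x
  • Yes, I'm using Python 2.7
  • @AlexOtt Yes, you're right, thanks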
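One way to test the overwrite explanation: Cassandra writes are upserts on the primary key, so CSV rows that share a key collapse into a single table row. Here 49765 - 46982 = 2783 rows would have to repeat an earlier key. A minimal standalone Python 3 sketch (run outside cqlsh) that counts data rows and distinct key tuples; the key columns used below are hypothetical, since the table's primary key is not shown in the question.

```python
import csv
import io


def count_rows_and_keys(lines, key_columns):
    """Return (data rows, distinct primary-key tuples) for CSV text with a header row."""
    reader = csv.DictReader(lines)
    total = 0
    seen = set()
    for row in reader:
        total += 1
        # Rows with an identical key tuple end up as a single row in Cassandra
        seen.add(tuple(row[col] for col in key_columns))
    return total, len(seen)


# Tiny in-memory example; the last row repeats the key of the first one.
sample = io.StringIO(
    "symbol,expiry_dt,close\n"
    "AAA,2018-11-29,10\n"
    "BBB,2018-11-29,20\n"
    "AAA,2018-11-29,11\n"
)
total, distinct = count_rows_and_keys(sample, ["symbol", "expiry_dt"])
print(total, distinct, total - distinct)  # 3 2 1
```

For the real file, pass `open("temp.csv", newline="")` and the table's actual primary-key columns; if `distinct` comes out at 46982, overwrites account for the whole gap.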

Tags: cassandra cql


【Solution 1】:

Found a solution. In

cassandra-driver-internal-only-3.11.0-bb96859b.zip/cassandra-driver-3.11.0-bb96859b/cassandra/io/asyncorereactor.py

I changed AsyncoreConnection.create_timer() to self.create_timer(), as suggested in this issue:

https://datastax-oss.atlassian.net/browse/PYTHON-862?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel
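Because the edited file lives inside a zip archive, it cannot be changed in place with Python's `zipfile` module; the archive has to be rewritten. A minimal Python 3 sketch, assuming the line to change is exactly the one shown in the traceback and that you write to a new file (never the source path), keep a backup of the original zip, and swap the files while cqlsh is not running. The helper name `patch_zip_member` is hypothetical; the dry run uses an in-memory fake archive.

```python
import io
import time
import zipfile

# Exact line from the traceback, and the replacement suggested in the answer
OLD = "AsyncoreConnection.create_timer(0, partial(asyncore.dispatcher.close, self))"
NEW = "self.create_timer(0, partial(asyncore.dispatcher.close, self))"


def patch_zip_member(src_zip, dst_zip, member, old=OLD, new=NEW):
    """Copy src_zip to dst_zip, replacing `old` with `new` inside one member.

    zipfile cannot modify an archive in place, so every entry is copied.
    Raises ValueError if `member` is missing or `old` does not occur exactly once.
    """
    with zipfile.ZipFile(src_zip) as zin:
        if member not in zin.namelist():
            raise ValueError("member not found: %s" % member)
        with zipfile.ZipFile(dst_zip, "w") as zout:
            for info in zin.infolist():
                data = zin.read(info.filename)
                if info.filename == member:
                    text = data.decode("utf-8")
                    count = text.count(old)
                    if count != 1:
                        raise ValueError("expected exactly one match, found %d" % count)
                    data = text.replace(old, new).encode("utf-8")
                    # Give the edited entry a fresh timestamp
                    info.date_time = time.localtime(time.time())[:6]
                zout.writestr(info, data)


# In-memory dry run on a fake archive (the member path here is illustrative)
src = io.BytesIO()
with zipfile.ZipFile(src, "w") as z:
    z.writestr("pkg/asyncorereactor.py", "def close(self):\n    " + OLD + "\n")
src.seek(0)
dst = io.BytesIO()
patch_zip_member(src, dst, "pkg/asyncorereactor.py")
with zipfile.ZipFile(dst) as z:
    print(z.read("pkg/asyncorereactor.py").decode("utf-8"))
```

For the real archive, the member name would be the path from the answer, `cassandra-driver-3.11.0-bb96859b/cassandra/io/asyncorereactor.py`, with forward slashes as zip entries use them.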

【Comments】:
