How to load data into Apache Cassandra with DataStax Bulk Loader (Ubuntu)?

[Question title]: How to load data into Apache Cassandra with DataStax Bulk Loader (Ubuntu)?
[Posted]: 2020-11-05 10:49:08
[Question]:

When I want to upload data to my "test cluster" in Apache Cassandra, I open a terminal and run:

export PATH=/home/mypc/dsbulk-1.7.0/bin:$PATH

source ~/.bashrc

dsbulk load -url /home/mypc/Desktop/test/file.csv -k keyspace_test -t table_test

But...

At least 1 record does not match the provided schema.mapping or schema.query. Please check that the connector configuration and the schema configuration are correct.
Operation LOAD_20201105-103000-577734 aborted: Too many errors, the maximum allowed is 100.

total | failed | rows/s | p50ms | p99ms | p999ms | batches
  104 |    104 |      0 |  0,00 |  0,00 |   0,00 |    0,00

Rejected records can be found in the following file(s): mapping.bad
Errors are detailed in the following file(s): mapping-errors.log
Last processed positions can be found in positions.txt

What does this mean? Why can't I load the data?

Thanks!

[Question comments]:

Tags: ubuntu cassandra datastax bulkloader

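[Solution 1]:

The error means that you didn't provide a mapping between the CSV data and the table. There are two ways to do that:

If the CSV file has a header whose names match the column names in Cassandra, use -header true
Provide the mapping explicitly with the -m option (see the docs); you will need to map the CSV columns to the Cassandra columns.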
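
As a rough sketch of those two options, using the path, keyspace, and table from the question: col1 and col2 are placeholder column names (the real ones are not shown in the question), and the commands are wrapped in hypothetical shell functions only so that nothing runs until you call one.

```shell
# Path and names taken from the question; adjust for your setup.
CSV_FILE=/home/mypc/Desktop/test/file.csv
KEYSPACE=keyspace_test
TABLE=table_test

# Option 1: the CSV's first line is a header whose names match the table's column names.
load_with_header() {
  dsbulk load -url "$CSV_FILE" -k "$KEYSPACE" -t "$TABLE" -header true
}

# Option 2: the CSV has no header line, so map field positions (0-based) to columns.
# col1 and col2 are placeholders for the real column names.
load_with_mapping() {
  dsbulk load -url "$CSV_FILE" -k "$KEYSPACE" -t "$TABLE" -header false -m '0=col1, 1=col2'
}
```

Call load_with_header or load_with_mapping from the same shell session once dsbulk is on your PATH.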

There is also a series of very good blog posts about different aspects of DSBulk usage; the first two cover data loading in detail.

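[Comments]:

[Solution 2]:

This means that the columns in the CSV input file don't match the columns in your table_test table. The details of the schema mismatch are in mapping-errors.log, which shows which columns are causing the problem.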
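
One way to do that comparison is sketched below. csv_header_fields is a hypothetical helper, not part of DSBulk, and it assumes a comma-delimited file whose first line is a header with no quoted commas.

```shell
# Print the fields of a CSV's first line, one per line, so they can be
# compared by eye with the table's column names.
# Assumes a comma delimiter and no quoted commas in the header.
csv_header_fields() {
  head -n 1 "$1" | tr -d '\r' | tr ',' '\n'
}
```

Running csv_header_fields /home/mypc/Desktop/test/file.csv alongside cqlsh -e "DESCRIBE TABLE keyspace_test.table_test" shows both sides of the mapping. With default settings, DSBulk typically writes mapping-errors.log to a per-operation directory under ./logs, relative to where the command was run.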

Since the CSV columns don't match the table schema, you need to map them manually by specifying the --schema.mapping flag (-m is its short form).

For details, see the DSBulk Common options page. You can also look at the schema mapping examples in this blog post. Cheers!

[Comments]:

I added -delim "," -header true -m '0=col1, 1=col2'
If you have -header then you probably don't need -m