【Question Title】:Loading quoted numbers into snowflake table from CSV with COPY TO <TABLE>
【Posted】:2020-02-07 20:45:37
【Question】:

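I'm having trouble loading CSV data into a Snowflake table. The fields are enclosed in double quotes, which causes problems when they are imported into the table.

I know COPY INTO has the CSV-specific option FIELD_OPTIONALLY_ENCLOSED_BY = '"', but it doesn't work at all.

Here are some snippets of the table definition and the COPY command: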

CREATE TABLE ...
(
GamePlayId NUMBER NOT NULL,
etc...
....);


COPY INTO ...
     FROM ...csv.gz'
FILE_FORMAT = (TYPE = CSV 
               STRIP_NULL_VALUES = TRUE 
               FIELD_DELIMITER = ',' 
               SKIP_HEADER = 1  
               error_on_column_count_mismatch=false 
               FIELD_OPTIONALLY_ENCLOSED_BY = '"'
              )
ON_ERROR = "ABORT_STATEMENT"
;

The CSV file looks like this:

"3922000","14733370","57256","2","3","2","2","2019-05-23 14:14:44",",00000000",",00000000",",00000000",",00000000","1000,00000000","1000,00000000","1317,50400000","1166,50000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000"

I get this error:

'''Numeric value '"3922000"' is not recognized '''

I'm fairly sure this is because Snowflake interprets the NUMBER value as a string when it reads the "" marks, but since I used

FIELD_OPTIONALLY_ENCLOSED_BY = '"' 

they shouldn't even be there... Does anyone have a solution for this?

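【Comments】:

  • Normally, number and date fields are not quoted like this. The purpose of quoting a field is to allow the field delimiter or record delimiter to appear inside the field, so there should be no reason to do it here. Can you export the file so that the number and datetime fields are not quoted?

Tags: csv import snowflake-cloud-data-platform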


【Answer 1】:

Maybe there is something wrong with your file? I was just able to run the following without any problem.

1. create the test table:
CREATE OR REPLACE TABLE 
    dbNameHere.schemaNameHere.stacko_58322339 (
    num1    NUMBER,  
    num2    NUMBER, 
    num3    NUMBER);

2. create test file, contents as follows 
1,2,3
"3922000","14733370","57256"
3,"2",1
4,5,"6"

3. create stage and put file in stage 

4. run the following copy command
COPY INTO dbNameHere.schemaNameHere.STACKO_58322339
     FROM @stageNameHere/stacko_58322339.csv.gz
FILE_FORMAT = (TYPE = CSV 
               STRIP_NULL_VALUES = TRUE 
               FIELD_DELIMITER = ',' 
               SKIP_HEADER = 0  
               ERROR_ON_COLUMN_COUNT_MISMATCH=FALSE 
               FIELD_OPTIONALLY_ENCLOSED_BY = '"'
              )
ON_ERROR = "CONTINUE";

5. results 
+-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file                                                | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| stageNameHere/stacko_58322339.csv.gz | LOADED |           4 |           4 |           4 |           0 | NULL        |             NULL |                  NULL | NULL                    |
+-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 2.436s

6. view the records
>SELECT * FROM dbNameHere.schemaNameHere.stacko_58322339;
+---------+----------+-------+                                                  
|    NUM1 |     NUM2 |  NUM3 |
|---------+----------+-------|
|       1 |        2 |     3 |
| 3922000 | 14733370 | 57256 |
|       3 |        2 |     1 |
|       4 |        5 |     6 |
+---------+----------+-------+

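Can you try a similar test?

Edit: a quick look at your data shows that many of your numeric fields appear to start with a comma, so there is definitely something wrong with the data.

【Comments】:

  • My guess is that it's a "float" from a country where the decimal separator is a comma.
  • There is nothing wrong with the input except that the floats use a non-US/UK decimal comma. After changing to a decimal point, the data loads without problems. It is quite unusual not to support the decimal comma, since it is one of the alternatives in the ISO number standards :O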
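
As the comment above notes, switching the decimal commas to decimal points lets the data load. One way to do that before staging the file is a small pre-processing step outside Snowflake. The following is a minimal sketch in Python, not part of the original answer; the function name `normalize_decimal_commas` and the regular expression are assumptions for illustration, and it assumes that a comma inside an otherwise all-digit field is always a decimal separator (a US-style value such as "1,000" would be rewritten incorrectly).

```python
import csv
import io
import re

# Matches values such as "1000,00000000", ",00000000" or "-12,5":
# optional sign, optional integer digits, one comma, at least one decimal digit.
DECIMAL_COMMA = re.compile(r"-?\d*,\d+")


def normalize_decimal_commas(src, dst):
    """Copy CSV rows from src to dst, rewriting decimal commas as decimal points."""
    reader = csv.reader(src)
    # Quote every field so the output keeps the same shape as the input file.
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        writer.writerow(
            [f.replace(",", ".") if DECIMAL_COMMA.fullmatch(f) else f for f in row]
        )


# Example with one shortened row in the same shape as the file in the question.
sample = '"3922000","2019-05-23 14:14:44",",00000000","1317,50400000"\n'
out = io.StringIO()
normalize_decimal_commas(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

For a compressed file, the same function could be fed handles opened with `gzip.open(path, "rt", newline="")` and `gzip.open(path, "wt", newline="")`; the timestamp and plain integer fields pass through unchanged because they do not match the pattern.
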
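【Answer 2】:

Assuming your numbers are in European format, with , for decimals and . for thousands: from reading the numeric formatting help, Snowflake does not appear to support this as input. I would open a feature request.

However, if you read the column in as text, you can then use REPLACE, like this: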

SELECT '100,1234'::text as A
    ,REPLACE(A,',','.') as B
    ,TRY_TO_DECIMAL(b, 20,10 ) as C;

which gives:

A         B         C
100,1234  100.1234  100.1234000000

A safer approach is to strip the thousands separators first:

SELECT '1.100,1234'::text as A
  ,REPLACE(A,'.') as B
  ,REPLACE(B,',','.') as C
  ,TRY_TO_DECIMAL(C, 20,10 ) as D;
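
The same two-step clean-up can be checked outside Snowflake on a few sample values before loading. Below is a rough Python sketch that mirrors the REPLACE calls above; `parse_european_decimal` is a hypothetical helper, and returning `None` on failure only imitates the way TRY_TO_DECIMAL returns NULL. It does not reproduce Snowflake's precision and scale handling.

```python
from decimal import Decimal, InvalidOperation


def parse_european_decimal(text):
    """Parse '1.100,1234'-style text; return None if it is not a number."""
    # Step 1: drop the thousands separators, '1.100,1234' -> '1100,1234'.
    without_thousands = text.replace(".", "")
    # Step 2: turn the decimal comma into a point, '1100,1234' -> '1100.1234'.
    with_point = without_thousands.replace(",", ".")
    try:
        return Decimal(with_point)
    except InvalidOperation:
        return None


print(parse_european_decimal("1.100,1234"))
print(parse_european_decimal(",00000000"))
print(parse_european_decimal("not a number"))
```

The order of the two steps matters: replacing the comma first would leave two kinds of points in the string that can no longer be told apart.
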

【Comments】:
