【Question title】: How to fix Data Lake Analytics script
【Posted】: 2018-07-03 04:22:46
【Question】:

I am trying to run an Azure Data Lake Analytics activity from Azure Data Factory, but it is not working.

Here is my pipeline script:

{
"name": "UsageStatistivsPipeline",
"properties": {
    "description": "Standardize JSON data into CSV, with friendly column names & consistent output for all event types. Creates one output (standardized) file per day.",
    "activities": [{
            "name": "UsageStatisticsActivity",
            "type": "DataLakeAnalyticsU-SQL",
            "linkedServiceName": {
                "referenceName": "DataLakeAnalytics",
                "type": "LinkedServiceReference"
            },
            "typeProperties": {
                "scriptLinkedService": {
                    "referenceName": "BlobStorage",
                    "type": "LinkedServiceReference"
                },
                "scriptPath": "adla-scripts/usage-statistics-adla-script.json",
                "degreeOfParallelism": 30,
                "priority": 100,
                "parameters": {
                    "sourcefile": "wasb://nameofblob.blob.core.windows.net/$$Text.Format('{0:yyyy}/{0:MM}/{0:dd}/0_647de4764587459ea9e0ce6a73e9ace7_2.json', SliceStart)",
                    "destinationfile": "$$Text.Format('wasb://nameofblob.blob.core.windows.net/{0:yyyy}/{0:MM}/{0:dd}/DailyResult.csv', SliceStart)"
                }
            },
            "inputs": [{
                    "type": "DatasetReference",
                    "referenceName": "DirectionsData"
                }
            ],
            "outputs": [{
                    "type": "DatasetReference",
                    "referenceName": "OutputData"
                }
            ],
            "policy": {
                "timeout": "06:00:00",
                "concurrency": 10,
                "executionPriorityOrder": "NewestFirst"
            }
        }
    ],
    "start": "2018-01-08T00:00:00Z",
    "end": "2017-01-09T00:00:00Z",
    "isPaused": false,
    "pipelineMode": "Scheduled"
}}

I have two parameters, sourcefile and destinationfile, which are dynamic (the paths are built from the date).

Then I have this ADLA script to execute:

REFERENCE ASSEMBLY master.[Newtonsoft.Json];
REFERENCE ASSEMBLY master.[Microsoft.Analytics.Samples.Formats]; 

USING Microsoft.Analytics.Samples.Formats.Json;

@Data = 
    EXTRACT 
        jsonstring string
    FROM @sourcefile
    USING Extractors.Tsv(quoting:false);


@CreateJSONTuple = 
    SELECT 
        JsonFunctions.JsonTuple(jsonstring) AS EventData 
    FROM 
        @Data;

@records = 
    SELECT
        JsonFunctions.JsonTuple(EventData["records"], "[*].*") AS record
    FROM 
        @CreateJSONTuple;

@properties =
    SELECT 
        JsonFunctions.JsonTuple(record["[0].properties"]) AS prop,
        record["[0].time"] AS time
    FROM 
        @records;

@result =
    SELECT 
        ...
    FROM @properties;


OUTPUT @result
TO @destinationfile
USING Outputters.Csv(outputHeader:false,quoting:true);

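The job execution fails with the error:

Edit:

It looks like Text.Format is not being evaluated and is passed into the script as a plain string. In the Data Lake Analytics job details, the parameter then looks like this: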

DECLARE @sourcefile string = "$$Text.Format('wasb://nameofblob.blob.core.windows.net/{0:yyyy}/{0:MM}/{0:dd}/0_647de4764587459ea9e0ce6a73e9ace7_2.json', SliceStart)";

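【Comments】:

    Tags: azure-data-factory azure-data-lake


    【Solution 1】:

    In your code sample, the sourcefile parameter is defined differently from destinationfile. The latter looks correct, while the former does not.

    The whole string should be wrapped in $$Text.Format():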

    "paramName" : "$$Text.Format('...{0:pattern}...', param)"
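
Applied to the question's pipeline, the pattern above would give a parameters block along these lines. This is only a sketch that reuses the storage URL and file names from the question unchanged. It also assumes the factory actually evaluates $$Text.Format and SliceStart, which are Data Factory version 1 constructs; the edit in the question suggests the expression may be reaching the job unevaluated, in which case this change alone may not be enough.

```json
"parameters": {
    "sourcefile": "$$Text.Format('wasb://nameofblob.blob.core.windows.net/{0:yyyy}/{0:MM}/{0:dd}/0_647de4764587459ea9e0ce6a73e9ace7_2.json', SliceStart)",
    "destinationfile": "$$Text.Format('wasb://nameofblob.blob.core.windows.net/{0:yyyy}/{0:MM}/{0:dd}/DailyResult.csv', SliceStart)"
}
```

Both values now start with $$Text.Format( and keep the whole path inside the single-quoted format string, so the two parameters are built the same way.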
    

    You could also consider passing only the formatted date, like this:

    "sliceStart": "$$Text.Format('{0:yyyy-MM-dd}', SliceStart)"
    

    Then do the rest in U-SQL:

    DECLARE @sliceStartDate DateTime = DateTime.Parse(@sliceStart);
    
    DECLARE @path string = String.Format("wasb://path/to/file/{0:yyyy}/{0:MM}/{0:dd}/file.csv", @sliceStartDate);
    

    Hope this helps.

    【Comments】:
