【Question title】: No data in Neptune after bulk load
【Posted】: 2021-07-12 00:57:42
【Question】:

After bulk loading a large amount of data from S3 into Neptune, I don't see any vertices in the database. Here is my loader status:

curl -G 'https://**.amazonaws.com:8182/loader/**?details=true&errors=true'
{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [
            {
                "LOAD_FAILED" : 1
            }
        ],
        "overallStatus" : {
            "fullUri" : "s3://**.nt",
            "runNumber" : 1,
            "retryNumber" : 0,
            "status" : "LOAD_FAILED",
            "totalTimeSpent" : 13035,
            "startTime" : 1626033369,
            "totalRecords" : 1745612081,
            "totalDuplicates" : 3580674,
            "parsingErrors" : 22,
            "datatypeMismatchErrors" : 0,
            "insertErrors" : 0
        },
        "failedFeeds" : [
            {
                "fullUri" : "s3://**.nt",
                "runNumber" : 1,
                "retryNumber" : 0,
                "status" : "LOAD_FAILED",
                "totalTimeSpent" : 13032,
                "startTime" : 1626033372,
                "totalRecords" : 1745612081,
                "totalDuplicates" : 3580674,
                "parsingErrors" : 22,
                "datatypeMismatchErrors" : 0,
                "insertErrors" : 0
            }
        ],
        "errors" : {
            "startIndex" : 1,
            "endIndex" : 10,
            "loadId" : "**",
            "errorLogs" : [
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "IRI includes string escapes: '\\92'",
                    "fileName" : "s3://**.nt",
                    "recordNum" : 195142350
                },
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "IRI includes string escapes: '\\92'",
                    "fileName" : "s3://**.nt",
                    "recordNum" : 213781671
                },
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "IRI includes string escapes: '\\92'",
                    "fileName" : "s3://**.nt",
                    "recordNum" : 223606399
                },
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "IRI includes string escapes: '\\92'",
                    "fileName" : "s3://**.nt",
                    "recordNum" : 237802811
                },
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "IRI includes string escapes: '\\92'",
                    "fileName" : "s3://**.nt",
                    "recordNum" : 459805351
                },
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "IRI includes string escapes: '\\92'",
                    "fileName" : "s3://**.nt",
                    "recordNum" : 603488680
                },
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "IRI includes string escapes: '\\92'",
                    "fileName" : "s3://**.nt",
                    "recordNum" : 644623634
                },
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "IRI includes string escapes: '\\92'",
                    "fileName" : "s3://**.nt",
                    "recordNum" : 696970927
                },
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "IRI includes string escapes: '\\92'",
                    "fileName" : "s3://**.nt",
                    "recordNum" : 700557784
                },
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "IRI includes string escapes: '\\92'",
                    "fileName" : "s3://**.nt",
                    "recordNum" : 714098924
                }
            ]
        }
    }
}

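As you can see, it reports 22 parsing errors out of roughly 1.7B records. Since I set "failOnError" : "FALSE" in the load request, I would assume the database should be fine, and I am perfectly happy to lose 22 records.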
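To put those numbers in proportion, here is a minimal Python sketch that pulls the headline counters out of a loader status response like the one above. The function name `summarize_load` and the trimmed `sample` string are made up for illustration; only the JSON field names come from the response itself.

```python
import json


def summarize_load(status_response: str) -> dict:
    """Extract the headline counters from a loader status response (hypothetical helper)."""
    overall = json.loads(status_response)["payload"]["overallStatus"]
    total = overall["totalRecords"]
    # Sum the three error counters that the status response reports.
    errors = (
        overall["parsingErrors"]
        + overall["datatypeMismatchErrors"]
        + overall["insertErrors"]
    )
    return {
        "status": overall["status"],
        "total_records": total,
        "total_errors": errors,
        "error_rate": errors / total if total else 0.0,
    }


# Trimmed copy of the response shown above (overallStatus only).
sample = """
{
    "status": "200 OK",
    "payload": {
        "overallStatus": {
            "fullUri": "s3://**.nt",
            "status": "LOAD_FAILED",
            "totalRecords": 1745612081,
            "totalDuplicates": 3580674,
            "parsingErrors": 22,
            "datatypeMismatchErrors": 0,
            "insertErrors": 0
        }
    }
}
"""

summary = summarize_load(sample)
print(summary)
```

Note that the overall status still reads LOAD_FAILED even though the error rate is tiny, so the status string alone does not tell you how many records were affected.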

At this point I am sure the database is up, but when I run a simple query I see nothing:

curl -G "https://**.amazonaws.com:8182?gremlin=g.V().count()"

{"requestId":"**","status":{"message":"","code":200,"attributes":{"@type":"g:Map","@value":[]}},"result":{"data":{"@type":"g:List","@value":[{"@type":"g:Int64","@value":0}]},"meta":{"@type":"g:Map","@value":[]}}}

【Comments】:

  • It looks like you loaded RDF data but are trying to read it back with Gremlin. For RDF data you need to use SPARQL. What does SELECT ?s ?p ?o where {?s ?p ?o } LIMIT 1 return?

Tags: amazon-web-services amazon-neptune


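【Answer 1】:

It looks like you loaded RDF data (in N-Triples format). With Amazon Neptune, RDF data must be queried using SPARQL. Gremlin can only be used with property graph data (loaded as CSV files via the bulk loader). To verify that you have some data, try a SPARQL query such as: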

SELECT ?s ?p ?o where {?s ?p ?o } LIMIT 1
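
If you would rather run that check from a script than from curl, the following is a rough Python sketch. It assumes the cluster accepts SPARQL queries as a form-encoded POST to the /sparql path and that IAM authentication is not enabled (otherwise the request would need to be signed). `NEPTUNE_ENDPOINT` is a placeholder, and `build_sparql_request` and `run_sparql` are illustrative names, not part of any Neptune client library.

```python
import json
import urllib.parse
import urllib.request


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a form-encoded SPARQL query request."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/sparql",
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def run_sparql(endpoint: str, query: str) -> dict:
    """Send the query and decode the JSON result; needs network access to the cluster."""
    with urllib.request.urlopen(build_sparql_request(endpoint, query)) as resp:
        return json.load(resp)


# Placeholder: substitute your own cluster endpoint and port.
NEPTUNE_ENDPOINT = "https://your-neptune-endpoint:8182"
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 1"

request = build_sparql_request(NEPTUNE_ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
```

Calling `run_sparql(NEPTUNE_ENDPOINT, QUERY)` from a host that can reach the cluster should return one binding if any triples were loaded. A full `SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }` would be the closest analogue of `g.V().count()`, but it may be slow on a dataset of this size.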

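【Comments】:

  • That was it, the data shows up through the SPARQL interface! It is a bit counterintuitive, since I expected these to be just different query interfaces over the same data.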