What does posenet return? Answers

【Question title】: What does posenet return?
【Posted】: 2019-09-28 21:34:31
【Question description】:

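I am working on a project that reads an image as input and displays an output image. The output image contains lines that indicate the human skeleton. I am using the pose estimation model from tensorflow-lite: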

https://www.tensorflow.org/lite/models/pose_estimation/overview

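I have read the documentation, which says the output contains a 4-dimensional array. I tried visualizing my model file with netron, and it looks like this: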

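I managed to get the resulting heatmap from the input, but I ran into a problem: all of the floats are negative. This confuses me, and I am not sure whether I did something wrong or how to interpret these outputs.

Here is the code that produces the output: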

            tfLite = new Interpreter(loadModelFile());
            Bitmap inputPhoto = BitmapFactory.decodeResource(getResources(), R.drawable.human2);
            inputPhoto = Bitmap.createScaledBitmap(inputPhoto, INPUT_SIZE_X, INPUT_SIZE_Y, false);
            inputPhoto = inputPhoto.copy(Bitmap.Config.ARGB_8888, true);

            int pixels[] = new int[INPUT_SIZE_X * INPUT_SIZE_Y];

            inputPhoto.getPixels(pixels, 0, INPUT_SIZE_X, 0, 0, INPUT_SIZE_X, INPUT_SIZE_Y);

            int pixelsIndex = 0;

            for (int i = 0; i < INPUT_SIZE_X; i ++) {
                for (int j = 0; j < INPUT_SIZE_Y; j++) {
                    int p = pixels[pixelsIndex];
                    inputData[0][i][j][0] = (p >> 16) & 0xff;
                    inputData[0][i][j][1] = (p >> 8) & 0xff;
                    inputData[0][i][j][2] = (p) & 0xff;
                    pixelsIndex ++;
                }
            }

            float outputData[][][][] = new float[1][23][17][17];

            tfLite.run(inputData, outputData);

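The output is an array of shape [1][23][17][17], and all of its values are negative. Does anyone who knows about this have advice that could help me? :(

Thank you very much!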

【Comments】:

Tags: java android tensorflow image-processing tensorflow-lite

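【Solution 1】:

This post was active today, so I am posting a late answer; apologies for that.
You should check the Posenet.kt file. The code there is documented in detail. You can see how the output buffers are set up: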

/** Initializes an outputMap of 1 * x * y * z FloatArrays for the model processing to populate. */

private fun initOutputMap(interpreter: Interpreter): HashMap<Int, Any> {
    val outputMap = HashMap<Int, Any>()

    // 1 * 9 * 9 * 17 contains heatmaps
    val heatmapsShape = interpreter.getOutputTensor(0).shape()
    outputMap[0] = Array(heatmapsShape[0]) {
      Array(heatmapsShape[1]) {
        Array(heatmapsShape[2]) { FloatArray(heatmapsShape[3]) }
      }
    }

    // 1 * 9 * 9 * 34 contains offsets
    val offsetsShape = interpreter.getOutputTensor(1).shape()
    outputMap[1] = Array(offsetsShape[0]) {
      Array(offsetsShape[1]) { Array(offsetsShape[2]) { FloatArray(offsetsShape[3]) } }
    }

    // 1 * 9 * 9 * 32 contains forward displacements
    val displacementsFwdShape = interpreter.getOutputTensor(2).shape()
    outputMap[2] = Array(displacementsFwdShape[0]) {
      Array(displacementsFwdShape[1]) {
        Array(displacementsFwdShape[2]) { FloatArray(displacementsFwdShape[3]) }
      }
    }

    // 1 * 9 * 9 * 32 contains backward displacements
    val displacementsBwdShape = interpreter.getOutputTensor(3).shape()
    outputMap[3] = Array(displacementsBwdShape[0]) {
      Array(displacementsBwdShape[1]) {
        Array(displacementsBwdShape[2]) { FloatArray(displacementsBwdShape[3]) }
      }
    }

    return outputMap
}
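Since the question is in Java, here is a rough Java sketch of the same idea: one 4-D float buffer per output tensor, collected in a map keyed by output index. The class name OutputBuffers and its methods are hypothetical helpers, not part of TensorFlow Lite, and the example shapes simply reuse the 23 x 17 grid from the question together with the channel counts from the comments above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of TensorFlow Lite.
final class OutputBuffers {
    /** Allocates a zero-filled 4-D float array for a tensor shape such as {1, 23, 17, 17}. */
    static float[][][][] allocate(int[] shape) {
        if (shape.length != 4) {
            throw new IllegalArgumentException("expected a 4-D shape");
        }
        return new float[shape[0]][shape[1]][shape[2]][shape[3]];
    }

    /** Builds the output-index -> buffer map, one entry per output tensor. */
    static Map<Integer, Object> build(int[][] shapes) {
        Map<Integer, Object> outputs = new HashMap<>();
        for (int i = 0; i < shapes.length; i++) {
            outputs.put(i, allocate(shapes[i]));
        }
        return outputs;
    }
}
```

In an app, each shape would presumably come from interpreter.getOutputTensor(i).shape(), and the map would be passed to interpreter.runForMultipleInputsOutputs(new Object[] {inputData}, outputs) instead of calling run with a single output array.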

And of course, here is how the output is converted to points on the screen:

/**
 * Estimates the pose for a single person.
 * args:
 *      bitmap: image bitmap of frame that should be processed
 * returns:
 *      person: a Person object containing data about keypoint locations and confidence scores
 */
fun estimateSinglePose(bitmap: Bitmap): Person {
    val estimationStartTimeNanos = SystemClock.elapsedRealtimeNanos()
    val inputArray = arrayOf(initInputArray(bitmap))
    Log.i(
        "posenet",
        String.format(
            "Scaling to [-1,1] took %.2f ms",
            1.0f * (SystemClock.elapsedRealtimeNanos() - estimationStartTimeNanos) / 1_000_000
        )
    )

    val outputMap = initOutputMap(getInterpreter())

    val inferenceStartTimeNanos = SystemClock.elapsedRealtimeNanos()
    getInterpreter().runForMultipleInputsOutputs(inputArray, outputMap)
    lastInferenceTimeNanos = SystemClock.elapsedRealtimeNanos() - inferenceStartTimeNanos
    Log.i(
        "posenet",
        String.format("Interpreter took %.2f ms", 1.0f * lastInferenceTimeNanos / 1_000_000)
    )

    val heatmaps = outputMap[0] as Array<Array<Array<FloatArray>>>
    val offsets = outputMap[1] as Array<Array<Array<FloatArray>>>

    val height = heatmaps[0].size
    val width = heatmaps[0][0].size
    val numKeypoints = heatmaps[0][0][0].size

    // Finds the (row, col) locations of where the keypoints are most likely to be.
    val keypointPositions = Array(numKeypoints) { Pair(0, 0) }
    for (keypoint in 0 until numKeypoints) {
        var maxVal = heatmaps[0][0][0][keypoint]
        var maxRow = 0
        var maxCol = 0
        for (row in 0 until height) {
            for (col in 0 until width) {
                if (heatmaps[0][row][col][keypoint] > maxVal) {
                    maxVal = heatmaps[0][row][col][keypoint]
                    maxRow = row
                    maxCol = col
                }
            }
        }
        keypointPositions[keypoint] = Pair(maxRow, maxCol)
    }

    // Calculating the x and y coordinates of the keypoints with offset adjustment.
    val xCoords = IntArray(numKeypoints)
    val yCoords = IntArray(numKeypoints)
    val confidenceScores = FloatArray(numKeypoints)
    keypointPositions.forEachIndexed { idx, position ->
        val positionY = keypointPositions[idx].first
        val positionX = keypointPositions[idx].second
        yCoords[idx] = (
                position.first / (height - 1).toFloat() * bitmap.height +
                        offsets[0][positionY][positionX][idx]
                ).toInt()
        xCoords[idx] = (
                position.second / (width - 1).toFloat() * bitmap.width +
                        offsets[0][positionY]
                                [positionX][idx + numKeypoints]
                ).toInt()
        confidenceScores[idx] = sigmoid(heatmaps[0][positionY][positionX][idx])
    }

    val person = Person()
    val keypointList = Array(numKeypoints) { KeyPoint() }
    var totalScore = 0.0f
    enumValues<BodyPart>().forEachIndexed { idx, it ->
        keypointList[idx].bodyPart = it
        keypointList[idx].position.x = xCoords[idx]
        keypointList[idx].position.y = yCoords[idx]
        keypointList[idx].score = confidenceScores[idx]
        totalScore += confidenceScores[idx]
    }

    person.keyPoints = keypointList.toList()
    person.score = totalScore / numKeypoints

    return person
}
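Note that the function above passes the heatmap value through sigmoid before using it as a confidence score, which suggests the raw heatmap values are unnormalized scores, so negative numbers by themselves are not an error. A negative value just maps to a confidence below 0.5. Below is a minimal Java sketch of the decoding for a single keypoint, modeled on the Kotlin above. HeatmapDecoder and decodeKeypoint are hypothetical names, and the arrays are assumed to have the batch dimension already removed, i.e. heatmaps is [height][width][numKeypoints] and offsets is [height][width][2 * numKeypoints].

```java
// Hypothetical helper modeled on estimateSinglePose; not part of any library.
final class HeatmapDecoder {
    /** Logistic function: maps any real-valued score into (0, 1). */
    static float sigmoid(float x) {
        return (float) (1.0 / (1.0 + Math.exp(-x)));
    }

    /**
     * Decodes one keypoint and returns {y, x, confidence} in image coordinates.
     * heatmaps: [height][width][numKeypoints], offsets: [height][width][2 * numKeypoints].
     */
    static float[] decodeKeypoint(float[][][] heatmaps, float[][][] offsets,
                                  int keypoint, int imageHeight, int imageWidth) {
        int height = heatmaps.length;
        int width = heatmaps[0].length;
        int numKeypoints = heatmaps[0][0].length;

        // Find the grid cell with the highest raw score for this keypoint.
        int maxRow = 0;
        int maxCol = 0;
        float maxVal = heatmaps[0][0][keypoint];
        for (int row = 0; row < height; row++) {
            for (int col = 0; col < width; col++) {
                if (heatmaps[row][col][keypoint] > maxVal) {
                    maxVal = heatmaps[row][col][keypoint];
                    maxRow = row;
                    maxCol = col;
                }
            }
        }

        // Scale the grid cell to image coordinates, then add the offset.
        // The y offsets sit in channels [0, numKeypoints), the x offsets after them.
        float y = maxRow / (float) (height - 1) * imageHeight
                + offsets[maxRow][maxCol][keypoint];
        float x = maxCol / (float) (width - 1) * imageWidth
                + offsets[maxRow][maxCol][keypoint + numKeypoints];

        return new float[] {y, x, sigmoid(maxVal)};
    }
}
```

If every confidence still comes out very low after the sigmoid, it may be worth checking the input preprocessing rather than the output handling.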

The whole .kt file is the core of going from a bitmap to points on the screen!
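One more thing that may matter: the log message in estimateSinglePose mentions scaling the input to [-1,1], while the code in the question feeds raw 0..255 channel values. Here is a hedged Java sketch of that preprocessing. The mean and std of 128 are an assumption modeled on the sample, so check what your model expects. InputNormalizer is a hypothetical name. The sketch also indexes the pixel array explicitly as row-major (y * width + x), which is the layout Bitmap.getPixels produces when the stride equals the width.

```java
// Hypothetical helper; MEAN and STD are assumptions, not values confirmed for every model.
final class InputNormalizer {
    static final float MEAN = 128.0f;
    static final float STD = 128.0f;

    /** Converts row-major ARGB pixels to a [1][height][width][3] tensor in roughly [-1, 1]. */
    static float[][][][] toInput(int[] pixels, int width, int height) {
        if (pixels.length != width * height) {
            throw new IllegalArgumentException("pixel count does not match width * height");
        }
        float[][][][] input = new float[1][height][width][3];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int p = pixels[y * width + x];
                input[0][y][x][0] = (((p >> 16) & 0xff) - MEAN) / STD; // red
                input[0][y][x][1] = (((p >> 8) & 0xff) - MEAN) / STD;  // green
                input[0][y][x][2] = ((p & 0xff) - MEAN) / STD;         // blue
            }
        }
        return input;
    }
}
```

With a non-square input the row-major indexing matters: rows are the first spatial dimension and columns the second.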

If you need anything else, tag me.

Happy coding

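【Discussion】:

Thanks for your reply, and sorry for my late response. Some time has passed and my project is still unfinished, so I may come back to it later. In any case, the example is great and I think I can get it working. I will try to put in more effort xD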