【Question title】: What does PoseNet return?
【Posted】: 2019-09-28 21:34:31
【Question description】:

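I'm working on a project that reads an image as input and displays an output image. The output image contains lines that indicate the human skeleton. I'm using the pose estimation model from TensorFlow Lite: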

https://www.tensorflow.org/lite/models/pose_estimation/overview

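I've read the documentation, and it says the output is a 4-dimensional array. I also tried visualizing my model file with Netron.

I managed to get the resulting heatmaps from the input, but all of the float values are negative. This confuses me: I'm not sure whether I did something wrong or how I'm supposed to interpret these outputs.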

Here is the code that produces the output:

            tfLite = new Interpreter(loadModelFile());
            Bitmap inputPhoto = BitmapFactory.decodeResource(getResources(), R.drawable.human2);
            inputPhoto = Bitmap.createScaledBitmap(inputPhoto, INPUT_SIZE_X, INPUT_SIZE_Y, false);
            inputPhoto = inputPhoto.copy(Bitmap.Config.ARGB_8888, true);

            int pixels[] = new int[INPUT_SIZE_X * INPUT_SIZE_Y];

            inputPhoto.getPixels(pixels, 0, INPUT_SIZE_X, 0, 0, INPUT_SIZE_X, INPUT_SIZE_Y);

            int pixelsIndex = 0;

            for (int i = 0; i < INPUT_SIZE_X; i ++) {
                for (int j = 0; j < INPUT_SIZE_Y; j++) {
                    int p = pixels[pixelsIndex];
                    inputData[0][i][j][0] = (p >> 16) & 0xff;
                    inputData[0][i][j][1] = (p >> 8) & 0xff;
                    inputData[0][i][j][2] = (p) & 0xff;
                    pixelsIndex ++;
                }
            }

            float outputData[][][][] = new float[1][23][17][17];

            tfLite.run(inputData, outputData);

The output is an array of shape [1][23][17][17], and all of its values are negative. Does anyone know what is going on? :(

Thanks a lot!

【Question discussion】:

    Tags: java android tensorflow image-processing tensorflow-lite


    【Solution 1】:

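    This post was active today, so I'm posting a late answer; sorry about that.
    You should check the Posenet.kt file, which contains thoroughly documented code. Here is how it sets up the output buffers: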

    /**
     * Initializes an outputMap of 1 * x * y * z FloatArrays for the model processing to populate.
     */
    private fun initOutputMap(interpreter: Interpreter): HashMap<Int, Any> {
        val outputMap = HashMap<Int, Any>()

        // 1 * 9 * 9 * 17 contains heatmaps
        val heatmapsShape = interpreter.getOutputTensor(0).shape()
        outputMap[0] = Array(heatmapsShape[0]) {
            Array(heatmapsShape[1]) {
                Array(heatmapsShape[2]) { FloatArray(heatmapsShape[3]) }
            }
        }

        // 1 * 9 * 9 * 34 contains offsets
        val offsetsShape = interpreter.getOutputTensor(1).shape()
        outputMap[1] = Array(offsetsShape[0]) {
            Array(offsetsShape[1]) { Array(offsetsShape[2]) { FloatArray(offsetsShape[3]) } }
        }

        // 1 * 9 * 9 * 32 contains forward displacements
        val displacementsFwdShape = interpreter.getOutputTensor(2).shape()
        outputMap[2] = Array(displacementsFwdShape[0]) {
            Array(displacementsFwdShape[1]) {
                Array(displacementsFwdShape[2]) { FloatArray(displacementsFwdShape[3]) }
            }
        }

        // 1 * 9 * 9 * 32 contains backward displacements
        val displacementsBwdShape = interpreter.getOutputTensor(3).shape()
        outputMap[3] = Array(displacementsBwdShape[0]) {
            Array(displacementsBwdShape[1]) {
                Array(displacementsBwdShape[2]) { FloatArray(displacementsBwdShape[3]) }
            }
        }

        return outputMap
    }
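For readers working in Java, as the question does, a rough equivalent of `initOutputMap` is sketched below. It is not code from Posenet.kt: the class `PoseNetBuffers` and its methods are hypothetical, and each shape is assumed to come from `interpreter.getOutputTensor(i).shape()`.

```java
import java.util.HashMap;
import java.util.Map;

class PoseNetBuffers {
    // Allocates a zero-filled float[d0][d1][d2][d3] buffer for one rank-4 output tensor.
    static float[][][][] allocate4d(int[] shape) {
        if (shape.length != 4) {
            throw new IllegalArgumentException("expected a rank-4 shape, got rank " + shape.length);
        }
        return new float[shape[0]][shape[1]][shape[2]][shape[3]];
    }

    // Builds the output-index -> buffer map that the interpreter fills in.
    // Assumption: outputShapes[i] is interpreter.getOutputTensor(i).shape().
    static Map<Integer, Object> initOutputMap(int[][] outputShapes) {
        Map<Integer, Object> outputMap = new HashMap<>();
        for (int i = 0; i < outputShapes.length; i++) {
            outputMap.put(i, allocate4d(outputShapes[i]));
        }
        return outputMap;
    }
}
```

On Android the shapes could be collected with `interpreter.getOutputTensorCount()` and `interpreter.getOutputTensor(i).shape()`, and the resulting map passed to `interpreter.runForMultipleInputsOutputs(inputs, outputMap)`. Sizing the buffers from the tensor shapes, rather than hard-coding them, keeps the Java arrays in sync with whichever model file is loaded.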
    

    And, of course, how the outputs are converted into points on the screen:

    /**
     * Estimates the pose for a single person.
     * args:
     *      bitmap: image bitmap of frame that should be processed
     * returns:
     *      person: a Person object containing data about keypoint locations and confidence scores
     */
    fun estimateSinglePose(bitmap: Bitmap): Person {
        val estimationStartTimeNanos = SystemClock.elapsedRealtimeNanos()
        val inputArray = arrayOf(initInputArray(bitmap))
        Log.i(
            "posenet",
            String.format(
                "Scaling to [-1,1] took %.2f ms",
                1.0f * (SystemClock.elapsedRealtimeNanos() - estimationStartTimeNanos) / 1_000_000
            )
        )
    
        val outputMap = initOutputMap(getInterpreter())
    
        val inferenceStartTimeNanos = SystemClock.elapsedRealtimeNanos()
        getInterpreter().runForMultipleInputsOutputs(inputArray, outputMap)
        lastInferenceTimeNanos = SystemClock.elapsedRealtimeNanos() - inferenceStartTimeNanos
        Log.i(
            "posenet",
            String.format("Interpreter took %.2f ms", 1.0f * lastInferenceTimeNanos / 1_000_000)
        )
    
        val heatmaps = outputMap[0] as Array<Array<Array<FloatArray>>>
        val offsets = outputMap[1] as Array<Array<Array<FloatArray>>>
    
        val height = heatmaps[0].size
        val width = heatmaps[0][0].size
        val numKeypoints = heatmaps[0][0][0].size
    
        // Finds the (row, col) locations of where the keypoints are most likely to be.
        val keypointPositions = Array(numKeypoints) { Pair(0, 0) }
        for (keypoint in 0 until numKeypoints) {
            var maxVal = heatmaps[0][0][0][keypoint]
            var maxRow = 0
            var maxCol = 0
            for (row in 0 until height) {
                for (col in 0 until width) {
                    if (heatmaps[0][row][col][keypoint] > maxVal) {
                        maxVal = heatmaps[0][row][col][keypoint]
                        maxRow = row
                        maxCol = col
                    }
                }
            }
            keypointPositions[keypoint] = Pair(maxRow, maxCol)
        }
    
        // Calculating the x and y coordinates of the keypoints with offset adjustment.
        val xCoords = IntArray(numKeypoints)
        val yCoords = IntArray(numKeypoints)
        val confidenceScores = FloatArray(numKeypoints)
        keypointPositions.forEachIndexed { idx, position ->
            val positionY = keypointPositions[idx].first
            val positionX = keypointPositions[idx].second
            yCoords[idx] = (
                    position.first / (height - 1).toFloat() * bitmap.height +
                            offsets[0][positionY][positionX][idx]
                    ).toInt()
            xCoords[idx] = (
                    position.second / (width - 1).toFloat() * bitmap.width +
                            offsets[0][positionY]
                                    [positionX][idx + numKeypoints]
                    ).toInt()
            // Heatmap entries are logits (often negative); sigmoid() maps them to a (0, 1) confidence score.
            confidenceScores[idx] = sigmoid(heatmaps[0][positionY][positionX][idx])
        }
    
        val person = Person()
        val keypointList = Array(numKeypoints) { KeyPoint() }
        var totalScore = 0.0f
        enumValues<BodyPart>().forEachIndexed { idx, it ->
            keypointList[idx].bodyPart = it
            keypointList[idx].position.x = xCoords[idx]
            keypointList[idx].position.y = yCoords[idx]
            keypointList[idx].score = confidenceScores[idx]
            totalScore += confidenceScores[idx]
        }
    
        person.keyPoints = keypointList.toList()
        person.score = totalScore / numKeypoints
    
        return person
    }
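The negative heatmap values in the question are consistent with how `estimateSinglePose` treats them: it takes the maximum cell per keypoint and passes it through `sigmoid` (defined elsewhere in Posenet.kt, assumed here to be the standard logistic function) to get a confidence score. In other words the heatmap holds logits, and any logit below 0 simply maps to a score under 0.5. Below is a minimal Java port of the decoding steps above; the class and method names are hypothetical, the tensor layouts follow the comments in `initOutputMap`, and the offsets are assumed to be in the same pixel units as `imageHeight`/`imageWidth` (in the sample, the bitmap that was fed to the model).

```java
class SinglePoseDecoder {
    // One decoded keypoint in image pixel coordinates plus its confidence in (0, 1).
    static final class Keypoint {
        final int x;
        final int y;
        final float score;

        Keypoint(int x, int y, float score) {
            this.x = x;
            this.y = y;
            this.score = score;
        }
    }

    // Logistic sigmoid: maps a raw logit (any sign) into (0, 1).
    static float sigmoid(float logit) {
        return (float) (1.0 / (1.0 + Math.exp(-logit)));
    }

    // heatmaps: [1][height][width][numKeypoints]
    // offsets:  [1][height][width][2 * numKeypoints], y offsets first, then x offsets.
    // Assumes height > 1 and width > 1 (the grid-to-pixel scaling divides by height - 1 and width - 1).
    static Keypoint[] decode(float[][][][] heatmaps, float[][][][] offsets,
                             int imageHeight, int imageWidth) {
        int height = heatmaps[0].length;
        int width = heatmaps[0][0].length;
        int numKeypoints = heatmaps[0][0][0].length;
        Keypoint[] result = new Keypoint[numKeypoints];

        for (int k = 0; k < numKeypoints; k++) {
            // Argmax over the grid: the cell where keypoint k is most likely to be.
            int bestRow = 0;
            int bestCol = 0;
            float best = heatmaps[0][0][0][k];
            for (int row = 0; row < height; row++) {
                for (int col = 0; col < width; col++) {
                    if (heatmaps[0][row][col][k] > best) {
                        best = heatmaps[0][row][col][k];
                        bestRow = row;
                        bestCol = col;
                    }
                }
            }
            // Scale the grid cell to image pixels, then add the per-cell offset.
            int y = (int) (bestRow / (float) (height - 1) * imageHeight
                    + offsets[0][bestRow][bestCol][k]);
            int x = (int) (bestCol / (float) (width - 1) * imageWidth
                    + offsets[0][bestRow][bestCol][k + numKeypoints]);
            result[k] = new Keypoint(x, y, sigmoid(best));
        }
        return result;
    }
}
```

Like the Kotlin version, this keeps only the single highest-scoring cell per keypoint, so it targets one person per image; `estimateSinglePose` itself reads only outputs 0 and 1 and leaves the displacement tensors unused.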
    

    This whole .kt file is the core of turning a bitmap into points on the screen!
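The excerpt calls `initInputArray(bitmap)` without showing it; its log message says it scales pixel values to [-1,1], whereas the loop in the question feeds raw 0 to 255 values. That loop also uses the x index for the outer loop even though `Bitmap.getPixels` returns pixels row by row, which only lines up when the input is square. Below is a minimal Java sketch of the preprocessing, assuming (value - 128) / 128 scaling and a float input of shape [1][height][width][3]; `PoseNetInput` and `toInputTensor` are hypothetical names, not the sample's code.

```java
class PoseNetInput {
    // Assumed normalization: (value - 128) / 128 maps 0..255 into [-1, 1).
    static final float MEAN = 128.0f;
    static final float STD = 128.0f;

    // pixels: ARGB ints in row-major order, e.g. from
    // bitmap.getPixels(pixels, 0, width, 0, 0, width, height).
    static float[][][][] toInputTensor(int[] pixels, int width, int height) {
        if (pixels.length != width * height) {
            throw new IllegalArgumentException("pixel count does not match width * height");
        }
        float[][][][] input = new float[1][height][width][3];
        for (int y = 0; y < height; y++) {         // rows first, matching getPixels order
            for (int x = 0; x < width; x++) {
                int p = pixels[y * width + x];
                input[0][y][x][0] = (((p >> 16) & 0xff) - MEAN) / STD; // red
                input[0][y][x][1] = (((p >> 8) & 0xff) - MEAN) / STD;  // green
                input[0][y][x][2] = ((p & 0xff) - MEAN) / STD;         // blue
            }
        }
        return input;
    }
}
```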

    If you need anything else, feel free to contact me.

    Happy coding!

    【Discussion】:

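    • Thanks for your reply, and sorry for responding so late. My project is still around and unfinished after all this time, so I may come back to it later. Anyway, the example is great, and I think I can get it working. I'll try to put in more effort xD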