How to use RGB Image as input for the C# EvalDll Wrapper? Answers

【Question title】: How to use RGB Image as input for the C# EvalDll Wrapper?
【Posted】: 2016-05-18 13:21:51
【Question description】:

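I have trained a network using the provided ImageReader, and now I am trying to use the CNTK EvalDll in a C# project to evaluate RGB images.

I have seen examples related to the EvalDll, but the input is always an array of floats/doubles, never an image.

How can I use the exposed interface to evaluate an RGB image with the trained network?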

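【Question discussion】:

Tags: cntk

【Solution 1】:

I assume that you want the equivalent of reading with the ImageReader, where your reader config looks something like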

features=[
        width=224
        height=224
        channels=3
        cropType=Center
]

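You will need helper functions to create the crop and to resize the image to the size accepted by the network.

I will define two extension methods on System.Drawing.Bitmap (the code below is F#), one to crop and one to resize: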

open System.Collections.Generic
open System.Drawing
open System.Drawing.Drawing2D
open System.Drawing.Imaging
type Bitmap with
    /// Crops the image in the present object, starting at the given (column, row), and retaining
    /// the given number of columns and rows.
    member this.Crop(column, row, numCols, numRows) = 
        let rect = Rectangle(column, row, numCols, numRows)
        this.Clone(rect, this.PixelFormat)
    /// Creates a resized version of the present image. The returned image
    /// will have the given width and height. This may distort the aspect ratio
    /// of the image.
    member this.ResizeImage(width, height, useHighQuality) =
        // Rather than using image.GetThumbnailImage, use direct image resizing.
        // GetThumbnailImage throws odd out-of-memory exceptions on some 
        // images, see also 
        // http://stackoverflow.com/questions/27528057/c-sharp-out-of-memory-exception-in-getthumbnailimage-on-a-server
        // Use the interpolation method suggested on 
        // http://stackoverflow.com/questions/1922040/resize-an-image-c-sharp
        let rect = Rectangle(0, 0, width, height);
        let destImage = new Bitmap(width, height);
        destImage.SetResolution(this.HorizontalResolution, this.VerticalResolution);
        use graphics = Graphics.FromImage destImage
        graphics.CompositingMode <- CompositingMode.SourceCopy;
        if useHighQuality then
            graphics.InterpolationMode <- InterpolationMode.HighQualityBicubic
            graphics.CompositingQuality <- CompositingQuality.HighQuality
            graphics.SmoothingMode <- SmoothingMode.HighQuality
            graphics.PixelOffsetMode <- PixelOffsetMode.HighQuality
        else
            graphics.InterpolationMode <- InterpolationMode.Low
        use wrapMode = new ImageAttributes()
        wrapMode.SetWrapMode WrapMode.TileFlipXY
        graphics.DrawImage(this, rect, 0, 0, this.Width,this.Height, GraphicsUnit.Pixel, wrapMode)
        destImage

Based on that, define a function to do the center crop:

/// Returns a square sub-image from the center of the given image, with
/// a size that is cropRatio times the smallest image dimension. The 
/// aspect ratio is preserved.
let CenterCrop cropRatio (image: Bitmap) =
    let cropSize = 
        float(min image.Height image.Width) * cropRatio
        |> int
    let startRow = (image.Height - cropSize) / 2
    let startCol = (image.Width - cropSize) / 2
    image.Crop(startCol, startRow, cropSize, cropSize)
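
For readers working in C#, the crop arithmetic can be checked in isolation. The following is only a sketch: `CropMath` and `CenterCropRect` are hypothetical names, not part of the answer or of CNTK, and the cast truncates in the same way as the F# `int` conversion above.

```csharp
using System;

static class CropMath
{
    // Hypothetical helper: returns (startCol, startRow, cropSize) for a centered
    // square crop whose side is cropRatio times the smaller image dimension.
    public static (int StartCol, int StartRow, int CropSize) CenterCropRect(
        int width, int height, double cropRatio)
    {
        // The cast truncates toward zero, like the F# `int` conversion.
        int cropSize = (int)(Math.Min(width, height) * cropRatio);
        int startCol = (width - cropSize) / 2;
        int startRow = (height - cropSize) / 2;
        return (startCol, startRow, cropSize);
    }
}
```

For a 640 x 480 image and a crop ratio of 1.0 this gives a 480 x 480 square starting at column 80, row 0, so only the left and right margins are discarded.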

Then plug it all together: crop, resize, and then traverse the image in the plane order that OpenCV uses:

/// Creates a list of CNTK feature values from a given bitmap.
/// The image is first resized to fit into an (targetSize x targetSize) bounding box,
/// then the image planes are converted to a CNTK tensor.
/// Returns a list with targetSize*targetSize*3 values.
let ImageToFeatures (image: Bitmap, targetSize) =
    // Apply the same image pre-processing that is typically done
    // in CNTK when running it in test or write mode: Take a center
    // crop of the image, then re-size it to the network input size.
    let cropped = CenterCrop 1.0 image
    let resized = cropped.ResizeImage(targetSize, targetSize, false)
    // Ensure that the initial capacity of the list is provided 
    // with the constructor. Creating the list via the default constructor
    // makes the whole operation 20% slower.
    let features = List (targetSize * targetSize * 3)
    // Traverse the image in the format that is used in OpenCV:
    // First the B plane, then the G plane, R plane
    for c in 0 .. 2 do
        for h in 0 .. (resized.Height - 1) do
            for w in 0 .. (resized.Width - 1) do
                let pixel = resized.GetPixel(w, h)
                let v = 
                    match c with 
                    | 0 -> pixel.B
                    | 1 -> pixel.G
                    | 2 -> pixel.R
                    | _ -> failwith "No such channel"
                    |> float32
                features.Add v
    features
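
The traversal above produces a planar layout: all B values first, then all G, then all R, each plane in row-major order. As a rough illustration of that index mapping, here is a C# sketch that converts an interleaved B,G,R byte buffer into the same planar order. `PlanarLayout` and `InterleavedToPlanar` are hypothetical names, and the interleaved input buffer is an assumption for the example rather than something the answer uses.

```csharp
using System;

static class PlanarLayout
{
    // Hypothetical helper: converts row-major interleaved pixels (B,G,R,B,G,R,...)
    // into planar order (all B values, then all G, then all R).
    public static float[] InterleavedToPlanar(byte[] interleaved, int width, int height, int channels = 3)
    {
        if (interleaved.Length != width * height * channels)
            throw new ArgumentException("Unexpected buffer length.", nameof(interleaved));

        var planar = new float[interleaved.Length];
        for (int c = 0; c < channels; c++)
            for (int h = 0; h < height; h++)
                for (int w = 0; w < width; w++)
                {
                    int src = (h * width + w) * channels + c;        // interleaved index
                    int dst = c * height * width + h * width + w;    // planar index
                    planar[dst] = interleaved[src];
                }
        return planar;
    }
}
```

A 2 x 1 image with pixels (B=1, G=2, R=3) and (B=4, G=5, R=6) is stored interleaved as 1,2,3,4,5,6 and comes out planar as 1,4,2,5,3,6.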

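Call ImageToFeatures with the image in question, feed the result into an instance of IEvaluateModelManagedF, and you're good. I'm assuming that your RGB image is in myImage, and that you are doing binary classification with a network input size of 224 x 224.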

let LoadModelOnCpu modelPath =
    let model = new IEvaluateModelManagedF()
    let description = sprintf "deviceId=-1\r\nmodelPath=\"%s\"" modelPath
    model.Init description
    model.CreateNetwork description
    model
let model = LoadModelOnCpu("myModelFile")
let featureDict = Dictionary()
featureDict.["features"] <- ImageToFeatures(myImage, 224)
model.Evaluate(featureDict, "OutputNodes.z", 2)
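
To turn the evaluation result into a class decision, something like the following C# sketch may help. It assumes that the call returns one raw score per class (here, two values taken before the final SoftMax, as the node name "OutputNodes.z" suggests); `OutputHelpers`, `ArgMax` and `Softmax` are hypothetical names and not part of the EvalDll API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OutputHelpers
{
    // Index of the largest score, i.e. the predicted class.
    public static int ArgMax(IReadOnlyList<float> scores)
    {
        if (scores.Count == 0)
            throw new ArgumentException("No scores given.", nameof(scores));
        int best = 0;
        for (int i = 1; i < scores.Count; i++)
            if (scores[i] > scores[best]) best = i;
        return best;
    }

    // Converts raw scores into probabilities that sum to 1.
    // Subtracting the maximum first keeps Math.Exp from overflowing.
    public static double[] Softmax(IReadOnlyList<float> scores)
    {
        double max = scores.Max();
        double[] exps = scores.Select(s => Math.Exp(s - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => e / sum).ToArray();
    }
}
```

Because SoftMax is monotonic, ArgMax on the raw scores picks the same class as ArgMax on the probabilities; Softmax is only needed if a confidence value is wanted.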

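【Discussion】:

A word of caution: the feature values you get here will not match 100% with those that CNTK generates. There are still small differences between the way .NET resizes images and the way OpenCV does, which lead to small differences in the network output (before the final SoftMax), but I have not observed any change in the assigned class.
I don't understand this answer at all. All it talks about is how to resize and crop. What does it have to do with CNTK (presumably IEvaluateModelManagedF)?
@thang: I kind of assumed it was obvious what to do with it, but apparently not. You just plug it into the call to model.Evaluate. Will update the answer.
Personally, I find it easier to use ImageProcessor for resizing/cropping than OpenCV.
@AntonSchwaighofer :) I see it the other way round. Cropping and resizing are the obvious part and have nothing to do with CNTK. The reason why sending data to CNTK is not obvious (to me) is github.com/Microsoft/CNTK/issues/337. There seem to be bugs in different versions.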

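【Solution 2】:

I implemented similar code in C#, which loads a model, reads a test image, does the appropriate cropping/scaling/etc., and runs the model. As Anton pointed out, the output does not match CNTK's 100%, but it is very close.

Code for image reading/cropping/scaling: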

    private static Bitmap ImCrop(Bitmap img, int col, int row, int numCols, int numRows)
    {
        var rect = new Rectangle(col, row, numCols, numRows);
        return img.Clone(rect, System.Drawing.Imaging.PixelFormat.DontCare);
    }

    /// Returns a square sub-image from the center of the given image, with
    /// a size that is cropRatio times the smallest image dimension. The 
    /// aspect ratio is preserved.
    private static Bitmap ImCropToCenter(Bitmap img, double cropRatio)
    {
        var cropSize = (int)Math.Round(Math.Min(img.Height, img.Width) * cropRatio);
        var startCol = (img.Width - cropSize) / 2;
        var startRow = (img.Height - cropSize) / 2;
        return ImCrop(img, startCol, startRow, cropSize, cropSize);
    }

    /// Creates a resized version of the present image. The returned image
    /// will have the given width and height. This may distort the aspect ratio
    /// of the image.
    private static Bitmap ImResize(Bitmap img, int width, int height)
    {
        return new Bitmap(img, new Size(width, height));
    }

Code to load the model and the XML file containing the pixel means:

    public static IEvaluateModelManagedF loadModel(string modelPath, string outputLayerName)
    {
        var networkConfiguration = String.Format("modelPath=\"{0}\" outputNodeNames=\"{1}\"", modelPath, outputLayerName);
        // Start timing here; a Stopwatch created with `new` is not running.
        var stopWatch = Stopwatch.StartNew();
        var model = new IEvaluateModelManagedF();
        model.CreateNetwork(networkConfiguration, deviceId: -1);
        stopWatch.Stop();
        Console.WriteLine("Time to create network: {0} ms.", stopWatch.ElapsedMilliseconds);
        return model;
    }

    /// Read the xml mean file, i.e. the offsets which are subtracted
    /// from each pixel in an image before using it as input to a CNTK model.
    public static float[] readXmlMeanFile(string XmlPath, int ImgWidth, int ImgHeight)
    {
        // Read and parse pixel value xml file
        XmlTextReader reader = new XmlTextReader(XmlPath);
        reader.ReadToFollowing("data");
        reader.Read();
        var pixelMeansXml =
            reader.Value.Split(new[] { "\r", "\n", " " }, StringSplitOptions.RemoveEmptyEntries)
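                // Parse with the invariant culture (assumes '.' as the decimal separator in the file).
                .Select(s => Single.Parse(s, System.Globalization.CultureInfo.InvariantCulture))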
                .ToArray();

        // Re-order mean pixel values from interleaved (row, column, channel) order
        // into planar (channel, row, column) order, matching the feature layout
        // built from the output of ImGetRGBChannels().
        int inputDim = 3 * ImgWidth * ImgHeight;
        Debug.Assert(pixelMeansXml.Length == inputDim);
        var pixelMeans = new float[inputDim];
        int counter = 0;
        for (int c = 0; c < 3; c++)
            for (int h = 0; h < ImgHeight; h++)
                for (int w = 0; w < ImgWidth; w++)
                {
                    int xmlIndex = h * ImgWidth * 3 + w * 3 + c;
                    pixelMeans[counter++] = pixelMeansXml[xmlIndex];
                }
        return pixelMeans;
    }

Code to load an image and convert it to model input:

    /// Creates a list of CNTK feature values from a given bitmap.
    /// The image is first resized to fit into an (targetSize x targetSize) bounding box,
    /// then the image planes are converted to a CNTK tensor, and the mean 
    /// pixel value subtracted. Returns a list with targetSize * targetSize * 3 floats.
    private static List<float> ImageToFeatures(Bitmap img, int targetSize, float[] pixelMeans)
    {
        // Apply the same image pre-processing that is done typically in CNTK:
        // Take a center crop of the image, then re-size it to the network input size.
        var imgCropped = ImCropToCenter(img, 1.0);
        var imgResized = ImResize(imgCropped, targetSize, targetSize);

        // Convert pixels to CNTK model input.
        // Fast pixel extraction is ~5x faster while giving identical output
        var features = new float[3 * imgResized.Height * imgResized.Width];
        var boFastPixelExtraction = true; 
        if (boFastPixelExtraction) 
        {
            var pixelsRGB = ImGetRGBChannels(imgResized);
            for (int c = 0; c < 3; c++)
            {
                byte[] pixels = pixelsRGB[2 - c];
                Debug.Assert(pixels.Length == imgResized.Height * imgResized.Width);
                for (int i = 0; i < pixels.Length; i++)
                {
                    int featIndex = i + c * pixels.Length;
                    features[featIndex] = pixels[i] - pixelMeans[featIndex];
                }
            }
        }
        else
        {
            // Traverse the image in the format that is used in OpenCV:
            // First the B plane, then the G plane, R plane
            // Note: calling GetPixel(w, h) repeatedly is slow!
            int featIndex = 0;
            for (int c = 0; c < 3; c++)
                for (int h = 0; h < imgResized.Height; h++)
                    for (int w = 0; w < imgResized.Width; w++)
                    {
                        var pixel = imgResized.GetPixel(w, h);
                        float v;
                        if (c == 0)
                            v = pixel.B;
                        else if (c == 1)
                            v = pixel.G;
                        else if (c == 2)
                            v = pixel.R;
                        else
                            throw new Exception("No such channel");

                        // Subtract pixel mean
                        features[featIndex] = v - pixelMeans[featIndex];
                        featIndex++;
                    }
        }  
        return features.ToList();
    }

    /// Convert bitmap image to R,G,B channel byte arrays.
    /// See: http://stackoverflow.com/questions/6020406/travel-through-pixels-in-bmp
    private static List<byte[]> ImGetRGBChannels(Bitmap bmp)
    {
        // Lock the bitmap's bits. Format24bppRgb stores each pixel as B, G, R bytes.
        Rectangle rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
        BitmapData bmpData = bmp.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);

        // Declare an array to hold the bytes of the bitmap. Stride is the row
        // length in bytes and can include padding beyond Width * 3.
        int stride = bmpData.Stride;
        int bytes = stride * bmp.Height;
        byte[] rgbValues = new byte[bytes];

        // Size the channel arrays by pixel count rather than bytes / 3, so
        // row padding cannot make them longer than Width * Height.
        int numPixels = bmp.Width * bmp.Height;
        byte[] r = new byte[numPixels];
        byte[] g = new byte[numPixels];
        byte[] b = new byte[numPixels];

        // Copy the RGB values into the array, starting from ptr to the first line
        IntPtr ptr = bmpData.Scan0;
        Marshal.Copy(ptr, rgbValues, 0, bytes);

        // Populate byte arrays, row by row
        int count = 0;
        for (int row = 0; row < bmp.Height; row++)
        {
            for (int col = 0; col < bmp.Width; col++)
            {
                int offset = (row * stride) + (col * 3);
                b[count] = rgbValues[offset];
                g[count] = rgbValues[offset + 1];
                r[count++] = rgbValues[offset + 2];
            }
        }
        bmp.UnlockBits(bmpData);
        return new List<byte[]> { r, g, b };
    }
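
One detail worth keeping in mind with LockBits: each row of a 24bpp bitmap is padded to a multiple of four bytes, which is why the loop indexes rows with Stride rather than Width * 3. A small sketch of that arithmetic follows; `StrideMath` and `Stride24bpp` are hypothetical names used only for illustration.

```csharp
static class StrideMath
{
    // Bytes per row for a 24bpp bitmap: 3 bytes per pixel,
    // rounded up to the next multiple of 4.
    public static int Stride24bpp(int width)
    {
        return (width * 3 + 3) / 4 * 4;
    }
}
```

For a 224-pixel-wide image the row is 672 bytes and needs no padding, whereas a 225-pixel-wide image has 675 bytes of pixel data per row padded to 676. Any code that derives the pixel count from Stride * Height / 3 is therefore only exact for widths such as 224.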

【Discussion】: