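[Question title]: How to use RGB Image as input for the C# EvalDll Wrapper?
[Posted]: 2016-05-18 13:21:51
[Question]:

I trained a network using the provided ImageReader, and now I am trying to use the CNTK EvalDll in a C# project to evaluate RGB images.

I have seen the examples related to the EvalDll, but the input is always an array of floats/doubles, never an image.

How can I use the exposed interface to evaluate an RGB image with the trained network?

[Question comments]:

    Tags: cntk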


    [Answer 1]:

    I'll assume that you want the equivalent of reading with the ImageReader, where your reader configuration looks something like

    features=[
            width=224
            height=224
            channels=3
            cropType=Center
    ]
    

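    You will need helper functions that create the crop and resize the image to the size the network accepts.

    I'll define two extension methods of System.Drawing.Bitmap, one to crop and one to resize: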

    open System.Collections.Generic
    open System.Drawing
    open System.Drawing.Drawing2D
    open System.Drawing.Imaging
    type Bitmap with
        /// Crops the image in the present object, starting at the given (column, row), and retaining
        /// the given number of columns and rows.
        member this.Crop(column, row, numCols, numRows) = 
            let rect = Rectangle(column, row, numCols, numRows)
            this.Clone(rect, this.PixelFormat)
        /// Creates a resized version of the present image. The returned image
        /// will have the given width and height. This may distort the aspect ratio
        /// of the image.
        member this.ResizeImage(width, height, useHighQuality) =
            // Rather than using image.GetThumbnailImage, use direct image resizing.
            // GetThumbnailImage throws odd out-of-memory exceptions on some 
            // images, see also 
            // http://stackoverflow.com/questions/27528057/c-sharp-out-of-memory-exception-in-getthumbnailimage-on-a-server
            // Use the interpolation method suggested on 
            // http://stackoverflow.com/questions/1922040/resize-an-image-c-sharp
            let rect = Rectangle(0, 0, width, height);
            let destImage = new Bitmap(width, height);
            destImage.SetResolution(this.HorizontalResolution, this.VerticalResolution);
            use graphics = Graphics.FromImage destImage
            graphics.CompositingMode <- CompositingMode.SourceCopy;
            if useHighQuality then
                graphics.InterpolationMode <- InterpolationMode.HighQualityBicubic
                graphics.CompositingQuality <- CompositingQuality.HighQuality
                graphics.SmoothingMode <- SmoothingMode.HighQuality
                graphics.PixelOffsetMode <- PixelOffsetMode.HighQuality
            else
                graphics.InterpolationMode <- InterpolationMode.Low
            use wrapMode = new ImageAttributes()
            wrapMode.SetWrapMode WrapMode.TileFlipXY
            graphics.DrawImage(this, rect, 0, 0, this.Width,this.Height, GraphicsUnit.Pixel, wrapMode)
            destImage
    

    Building on that, define a function that does the center crop:

    /// Returns a square sub-image from the center of the given image, with
    /// a size that is cropRatio times the smallest image dimension. The 
    /// aspect ratio is preserved.
    let CenterCrop cropRatio (image: Bitmap) =
        let cropSize = 
            float(min image.Height image.Width) * cropRatio
            |> int
        let startRow = (image.Height - cropSize) / 2
        let startCol = (image.Width - cropSize) / 2
        image.Crop(startCol, startRow, cropSize, cropSize)
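
    The crop arithmetic can be checked without any bitmap at all. Below is a minimal C# sketch of the same computation; the class name CropMath and the method name CenterCropRect are hypothetical and only serve this illustration. It truncates the crop size the same way the F# code does.

```csharp
using System;

// Hypothetical helper, not part of CNTK or System.Drawing.
public static class CropMath
{
    // Returns the top-left corner and side length of a centered square crop
    // whose side is cropRatio times the smaller image dimension.
    public static (int StartCol, int StartRow, int CropSize) CenterCropRect(
        int width, int height, double cropRatio)
    {
        // Truncate toward zero, like the F# "float ... |> int" conversion.
        int cropSize = (int)(Math.Min(width, height) * cropRatio);
        int startCol = (width - cropSize) / 2;
        int startRow = (height - cropSize) / 2;
        return (startCol, startRow, cropSize);
    }
}
```

    For a 640 x 480 image and cropRatio 1.0, this gives a 480 x 480 square starting at column 80, row 0.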
    

    Then plug it all together: crop, resize, and traverse the image in the plane order that OpenCV uses:

    /// Creates a list of CNTK feature values from a given bitmap.
    /// The image is first resized to fit into an (targetSize x targetSize) bounding box,
    /// then the image planes are converted to a CNTK tensor.
    /// Returns a list with targetSize*targetSize*3 values.
    let ImageToFeatures (image: Bitmap, targetSize) =
        // Apply the same image pre-processing that is typically done
        // in CNTK when running it in test or write mode: Take a center
        // crop of the image, then re-size it to the network input size.
        let cropped = CenterCrop 1.0 image
        let resized = cropped.ResizeImage(targetSize, targetSize, false)
        // Ensure that the initial capacity of the list is provided 
        // with the constructor. Creating the list via the default constructor
        // makes the whole operation 20% slower.
        let features = List (targetSize * targetSize * 3)
        // Traverse the image in the format that is used in OpenCV:
        // First the B plane, then the G plane, R plane
        for c in 0 .. 2 do
            for h in 0 .. (resized.Height - 1) do
                for w in 0 .. (resized.Width - 1) do
                    let pixel = resized.GetPixel(w, h)
                    let v = 
                        match c with 
                        | 0 -> pixel.B
                        | 1 -> pixel.G
                        | 2 -> pixel.R
                        | _ -> failwith "No such channel"
                        |> float32
                    features.Add v
        features
    

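    Call ImageToFeatures with the image in question and feed the result into an instance of IEvaluateModelManagedF, and you're done. I'll assume that your RGB image is in myImage and that you are doing binary classification with a network input size of 224 x 224.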

    let LoadModelOnCpu modelPath =
        let model = new IEvaluateModelManagedF()
        let description = sprintf "deviceId=-1\r\nmodelPath=\"%s\"" modelPath
        model.Init description
        model.CreateNetwork description
        model
    let model = LoadModelOnCpu("myModelFile")
    let featureDict = Dictionary()
    featureDict.["features"] <- ImageToFeatures(myImage, 224)
    model.Evaluate(featureDict, "OutputNodes.z", 2)
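
    If the evaluated node holds the raw scores before the final SoftMax (an assumption about the network definition, check your own model), you may want to turn them into probabilities and pick the winning class yourself. A minimal sketch in C#, the language of the question; OutputHelpers, Softmax and ArgMax are hypothetical names and not part of the EvalDll API:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical post-processing helpers for the list of output values.
public static class OutputHelpers
{
    // Numerically stable softmax: subtract the maximum before exponentiating.
    public static List<float> Softmax(IList<float> scores)
    {
        float max = scores.Max();
        List<double> exps = scores.Select(s => Math.Exp(s - max)).ToList();
        double sum = exps.Sum();
        return exps.Select(e => (float)(e / sum)).ToList();
    }

    // Index of the largest score, i.e. the predicted class.
    public static int ArgMax(IList<float> scores)
    {
        int best = 0;
        for (int i = 1; i < scores.Count; i++)
            if (scores[i] > scores[best])
                best = i;
        return best;
    }
}
```

    Since SoftMax is monotonic, ArgMax gives the same class whether it is applied to the raw scores or to the probabilities.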
    

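    [Comments]:

    • A word of caution: the feature values you get here will not match 100% with those CNTK produces. There are still some small differences between the way .NET resizes images and the way OpenCV does, leading to tiny differences in the network output (before the final SoftMax) - but I have not observed any change in the assigned class.
    • I don't understand this answer at all. All it talks about is how to resize and crop. What does it have to do with CNTK (presumably IEvaluateModelManagedF)?
    • @thang: I kind of thought it was obvious what to do with it, but apparently not. You just plug it into the call to model.Evaluate. Will update the answer.
    • Personally, I find it easier to resize/crop with ImageProcessor than with OpenCV.
    • @AntonSchwaighofer :) My thinking was exactly the opposite. Cropping and resizing are the obvious part and have nothing to do with CNTK. The reason it was not obvious (to me) how to send the data to CNTK is github.com/Microsoft/CNTK/issues/337. There seem to be bugs in different versions.
    [Answer 2]:

    I implemented similar code in C#. It loads a model, reads a test image, does the appropriate cropping/scaling/etc., and then runs the model. As Anton pointed out, the output does not match CNTK's output 100%, but it is very close.

    Code for image reading / cropping / scaling: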

        private static Bitmap ImCrop(Bitmap img, int col, int row, int numCols, int numRows)
        {
            var rect = new Rectangle(col, row, numCols, numRows);
            return img.Clone(rect, System.Drawing.Imaging.PixelFormat.DontCare);
        }
    
        /// Returns a square sub-image from the center of the given image, with
        /// a size that is cropRatio times the smallest image dimension. The 
        /// aspect ratio is preserved.
        private static Bitmap ImCropToCenter(Bitmap img, double cropRatio)
        {
            var cropSize = (int)Math.Round(Math.Min(img.Height, img.Width) * cropRatio);
            var startCol = (img.Width - cropSize) / 2;
            var startRow = (img.Height - cropSize) / 2;
            return ImCrop(img, startCol, startRow, cropSize, cropSize);
        }
    
        /// Creates a resized version of the present image. The returned image
        /// will have the given width and height. This may distort the aspect ratio
        /// of the image.
        private static Bitmap ImResize(Bitmap img, int width, int height)
        {
            return new Bitmap(img, new Size(width, height));
        }
    

    Code for loading the model and the XML file containing the pixel means:

        public static IEvaluateModelManagedF loadModel(string modelPath, string outputLayerName)
        {
            var networkConfiguration = String.Format("modelPath=\"{0}\" outputNodeNames=\"{1}\"", modelPath, outputLayerName);
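            // StartNew() starts timing immediately; a Stopwatch created with
            // "new" is never started and would always report 0 ms.
            Stopwatch stopWatch = Stopwatch.StartNew();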
            var model = new IEvaluateModelManagedF();
            model.CreateNetwork(networkConfiguration, deviceId: -1);
            stopWatch.Stop();
            Console.WriteLine("Time to create network: {0} ms.", stopWatch.ElapsedMilliseconds);
            return model;
        }
    
        /// Read the XML mean file, i.e. the offsets which are subtracted
        /// from each pixel in an image before using it as input to a CNTK model.
        public static float[] readXmlMeanFile(string XmlPath, int ImgWidth, int ImgHeight)
        {
            // Read and parse pixel value xml file
            XmlTextReader reader = new XmlTextReader(XmlPath);
            reader.ReadToFollowing("data");
            reader.Read();
            var pixelMeansXml =
                reader.Value.Split(new[] { "\r", "\n", " " }, StringSplitOptions.RemoveEmptyEntries)
                    .Select(Single.Parse)
                    .ToArray();
    
            // Re-order the mean pixel values from pixel-interleaved order to the
            // plane-by-plane order of the feature vector built in ImageToFeatures().
            int inputDim = 3 * ImgWidth * ImgHeight;
            Debug.Assert(pixelMeansXml.Length == inputDim);
            var pixelMeans = new float[inputDim];
            int counter = 0;
            for (int c = 0; c < 3; c++)
                for (int h = 0; h < ImgHeight; h++)
                    for (int w = 0; w < ImgWidth; w++)
                    {
                        int xmlIndex = h * ImgWidth * 3 + w * 3 + c;
                        pixelMeans[counter++] = pixelMeansXml[xmlIndex];
                    }
            return pixelMeans;
        }
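
    The index formula in the loop above implies that the mean file stores the values pixel by pixel (all channels of pixel 0, then all channels of pixel 1, and so on), while the feature vector wants them plane by plane. Here is a minimal, self-contained sketch of that reordering on a plain array; the class name Layout and the method name InterleavedToPlanar are hypothetical:

```csharp
using System;

// Hypothetical helper illustrating the interleaved-to-planar reordering.
public static class Layout
{
    // Converts [h][w][c] (interleaved) order into [c][h][w] (planar) order.
    public static float[] InterleavedToPlanar(float[] interleaved, int width, int height, int channels)
    {
        if (interleaved.Length != width * height * channels)
            throw new ArgumentException("Array length does not match the given dimensions.");

        var planar = new float[interleaved.Length];
        int counter = 0;
        for (int c = 0; c < channels; c++)
            for (int h = 0; h < height; h++)
                for (int w = 0; w < width; w++)
                {
                    // Same formula as xmlIndex above, with channels = 3.
                    int srcIndex = (h * width + w) * channels + c;
                    planar[counter++] = interleaved[srcIndex];
                }
        return planar;
    }
}
```

    For a 2 x 1 image with values { 1, 2, 3, 4, 5, 6 } in interleaved order, the planar result is { 1, 4, 2, 5, 3, 6 }: first channel 0 of both pixels, then channel 1, then channel 2.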
    

    Code for loading the image and converting it to model input:

        /// Creates a list of CNTK feature values from a given bitmap.
        /// The image is first center-cropped and resized to (targetSize x targetSize),
        /// then the image planes are converted to a CNTK tensor, and the mean 
        /// pixel value subtracted. Returns a list with targetSize * targetSize * 3 floats.
        private static List<float> ImageToFeatures(Bitmap img, int targetSize, float[] pixelMeans)
        {
            // Apply the same image pre-processing that is done typically in CNTK:
            // Take a center crop of the image, then re-size it to the network input size.
            var imgCropped = ImCropToCenter(img, 1.0);
            var imgResized = ImResize(imgCropped, targetSize, targetSize);
    
            // Convert pixels to CNTK model input.
            // Fast pixel extraction is ~5x faster while giving identical output
            var features = new float[3 * imgResized.Height * imgResized.Width];
            var boFastPixelExtraction = true; 
            if (boFastPixelExtraction) 
            {
                var pixelsRGB = ImGetRGBChannels(imgResized);
                for (int c = 0; c < 3; c++)
                {
                    byte[] pixels = pixelsRGB[2 - c];
                    Debug.Assert(pixels.Length == imgResized.Height * imgResized.Width);
                    for (int i = 0; i < pixels.Length; i++)
                    {
                        int featIndex = i + c * pixels.Length;
                        features[featIndex] = pixels[i] - pixelMeans[featIndex];
                    }
                }
            }
            else
            {
                // Traverse the image in the format that is used in OpenCV:
                // First the B plane, then the G plane, R plane
                // Note: calling GetPixel(w, h) repeatedly is slow!
                int featIndex = 0;
                for (int c = 0; c < 3; c++)
                    for (int h = 0; h < imgResized.Height; h++)
                        for (int w = 0; w < imgResized.Width; w++)
                        {
                            var pixel = imgResized.GetPixel(w, h);
                            float v;
                            if (c == 0)
                                v = pixel.B;
                            else if (c == 1)
                                v = pixel.G;
                            else if (c == 2)
                                v = pixel.R;
                            else
                                throw new Exception("");
    
                            // Subtract pixel mean
                            features[featIndex] = v - pixelMeans[featIndex];
                            featIndex++;
                        }
            }  
            return features.ToList();
        }
    
        /// Convert bitmap image to R,G,B channel byte arrays.
        /// See: http://stackoverflow.com/questions/6020406/travel-through-pixels-in-bmp
        private static List<byte[]> ImGetRGBChannels(Bitmap bmp)
        {
            // Lock the bitmap's bits.  
            Rectangle rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
            BitmapData bmpData = bmp.LockBits(rect, ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);
    
            // Declare an array to hold the raw bytes of the bitmap. Each row is
            // Stride bytes long, which can include padding beyond Width * 3.
            int bytes = bmpData.Stride * bmp.Height;
            byte[] rgbValues = new byte[bytes];
            // One byte per pixel and channel. Sizing these from the padded byte
            // count would break the length check in ImageToFeatures.
            int numPixels = bmp.Width * bmp.Height;
            byte[] r = new byte[numPixels];
            byte[] g = new byte[numPixels];
            byte[] b = new byte[numPixels];
    
            // Copy the RGB values into the array, starting from ptr to the first line
            IntPtr ptr = bmpData.Scan0;
            Marshal.Copy(ptr, rgbValues, 0, bytes);
    
            // Populate byte arrays (Format24bppRgb stores each pixel as B, G, R)
            int count = 0;
            int stride = bmpData.Stride;
            for (int row = 0; row < bmpData.Height; row++)
            {
                for (int col = 0; col < bmpData.Width; col++)
                {
                    int offset = (row * stride) + (col * 3);
                    b[count] = rgbValues[offset];
                    g[count] = rgbValues[offset + 1];
                    r[count++] = rgbValues[offset + 2];
                }
            }
            bmp.UnlockBits(bmpData);
            return new List<byte[]> { r, g, b };
        }
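
    A note on why the loop indexes rows with the stride rather than with width * 3: GDI+ rounds each scan line up to a four-byte boundary, so a row can be longer than the pixel data it holds. The value reported by BitmapData.Stride is authoritative; the sketch below only illustrates the rounding for the 24 bits-per-pixel case, and the names StrideMath and Stride24bpp are hypothetical:

```csharp
// Hypothetical helper; use BitmapData.Stride in real code.
public static class StrideMath
{
    // Bytes per scan line of a 24bpp bitmap: 3 bytes per pixel,
    // rounded up to the next multiple of 4.
    public static int Stride24bpp(int width)
    {
        return ((width * 3 + 3) / 4) * 4;
    }
}
```

    For a width of 224 the row needs 672 bytes, which is already a multiple of 4, so there is no padding. For a width of 225 the 675 data bytes are padded to 676.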
    
