Two weeks ago a colleague and I were asked to deliver a workshop on machine learning and bots. Our idea was to train a neural network in the first part of the workshop and use it in the bot demoed in the second part. Digit recognition on the MNIST dataset is the classic 'hello world' machine learning problem, so that's what we chose.
The only thing I hadn't paid attention to is that the MNIST dataset contains 28 by 28 pixel images with pure white backgrounds and black digits centered in a 20 by 20 pixel square, which is not quite what the images you shoot with your phone and send to the bot look like…
First, we'll apply a grayscale filter to the original source image. This basically sums the R, G and B components of each pixel and divides the result by 3.
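As a rough illustration, that naive averaging can be done by hand on each pixel. This is only a sketch (assuming the beta-era ImageSharp API used later in the post, with a pixel indexer on Image<Rgba32>); the actual pipeline simply calls ImageSharp's built-in Grayscale(), which may use weighted luminance coefficients rather than a plain average:

// Sketch only: manual grayscale by averaging the three color channels.
// Assumes: using SixLabors.ImageSharp; using SixLabors.ImageSharp.PixelFormats;
static void GrayscaleByAveraging(Image<Rgba32> image)
{
    for (var y = 0; y < image.Height; y++)
    {
        for (var x = 0; x < image.Width; x++)
        {
            var p = image[x, y];
            var avg = (byte)((p.R + p.G + p.B) / 3);      // sum R, G, B and divide by 3
            image[x, y] = new Rgba32(avg, avg, avg, p.A); // keep the alpha channel as-is
        }
    }
}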
Second, we'll remove the shadow marks (visible in the image above) with a vignette. This is a standard filter found in every photo app: it applies a radial gradient that normally darkens the corners of the image. Here we apply it with white instead, so the corners are brightened and the shadows disappear.
Third, we separate the foreground from the background, in this case the single digit from everything else. This is done with a binary threshold: every pixel above a given threshold becomes absolutely white, and everything below it pitch black.
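Conceptually, the threshold step does something like the following for each pixel. Again, this is only a sketch; the real pipeline uses ImageSharp's BinaryThreshold with a cut-off of 0.6, as shown in the full code below:

// Sketch only: manual binary threshold on an already grayscaled image.
// After the grayscale step R == G == B, so any single channel carries the luminance.
static void BinaryThresholdByHand(Image<Rgba32> image, float threshold = 0.6f)
{
    for (var y = 0; y < image.Height; y++)
    {
        for (var x = 0; x < image.Width; x++)
        {
            var luminance = image[x, y].R / 255f;
            image[x, y] = luminance >= threshold ? Rgba32.White : Rgba32.Black;
        }
    }
}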
The next step is a little more complex. We need to crop the image dynamically to its content, and for that we first have to find the bounding box. Briefly, the search iterates inward from each side of the image and finds the last row/column in which no pixel differs from the background color (in this case white). You can check out the code here, if you are interested; a sketch of the idea follows below.
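Since the linked implementation isn't reproduced here, the sketch below is a reconstruction of that search based on the description above. The method name matches the FindBoundingBox call in the full pipeline, but the details are an assumption, not the original code:

// Sketch only: find the smallest rectangle containing every non-background pixel.
// Assumes the image has already been thresholded to a pure white background.
private static Rectangle FindBoundingBox(Image<Rgba32> image)
{
    var background = Rgba32.White;
    int left = image.Width, right = -1, top = image.Height, bottom = -1;

    for (var y = 0; y < image.Height; y++)
    {
        for (var x = 0; x < image.Width; x++)
        {
            if (!image[x, y].Equals(background))
            {
                if (x < left) left = x;
                if (x > right) right = x;
                if (y < top) top = y;
                if (y > bottom) bottom = y;
            }
        }
    }

    // If nothing but background was found, fall back to the whole image.
    if (right < left || bottom < top)
        return new Rectangle(0, 0, image.Width, image.Height);

    return new Rectangle(left, top, right - left + 1, bottom - top + 1);
}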
Fifth, we'll make the image square by padding it: whichever of the width and height is smaller than the maximum of the two gets padded up to that maximum.
Lastly, to center the content we first downscale the image to 20 by 20 pixels and then add a 4 pixel margin on every side. This produces a perfectly centered 28 by 28 pixel image with a white background and black foreground, which is exactly what our neural network needs.
/// <summary>
/// Preprocess camera images for MNIST-based neural networks.
/// </summary>
/// <param name="image">Source image in a file format agnostic structure in memory as a series of Rgba32 pixels.</param>
/// <returns>Preprocessed image in a file format agnostic structure in memory as a series of Rgba32 pixels.</returns>
public static Image<Rgba32> Preprocess(Image<Rgba32> image)
{
    // Step 1: Apply a grayscale filter
    image.Mutate(i => i.Grayscale());

    // Step 2: Apply a white vignette on the corners to remove shadow marks
    image.Mutate(i => i.Vignette(Rgba32.White));

    // Step 3: Separate foreground and background with a threshold and set the correct colors
    image.Mutate(i => i.BinaryThreshold(0.6f, Rgba32.White, Rgba32.Black));

    // Step 4: Crop to bounding box
    var boundingBox = FindBoundingBox(image);
    image.Mutate(i => i.Crop(boundingBox));

    // Step 5: Make the image a square
    var maxWidthHeight = Math.Max(image.Width, image.Height);
    image.Mutate(i => i.Pad(maxWidthHeight, maxWidthHeight).BackgroundColor(_backgroundColor));

    // Step 6: Downscale to 20x20
    image.Mutate(i => i.Resize(20, 20));

    // Step 7: Add 4 pixel margin
    image.Mutate(i => i.Pad(28, 28).BackgroundColor(_backgroundColor));

    return image;
}
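The snippet references a _backgroundColor field that isn't shown above; presumably it is plain white. A minimal, assumed way to define it and run the pipeline on a photo could look like the sketch below (names and paths are hypothetical, and the using directives reflect the beta-era ImageSharp API used throughout the post; newer versions expose colors via Color.White instead of Rgba32.White):

// Assumed usings:
// using System;
// using SixLabors.ImageSharp;
// using SixLabors.ImageSharp.PixelFormats;
// using SixLabors.ImageSharp.Processing;
// using SixLabors.Primitives;   // Rectangle in the beta-era packages

// Assumption: the background color used for padding is plain white.
private static readonly Rgba32 _backgroundColor = Rgba32.White;

public static void PreprocessFile(string inputPath, string outputPath)
{
    using (var image = Image.Load<Rgba32>(inputPath))
    {
        // Preprocess mutates the image in place and returns the same instance.
        var processed = Preprocess(image);
        processed.Save(outputPath); // 28 by 28, white background, black digit
    }
}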
I'm a Partner Technology Strategist at Microsoft, helping partners grow and reach the global market from the technical side. A true geek, from time to time showing up at conferences and events around Central and Eastern Europe talking about some future stuff, probably with a HoloLens on my head.