Computer Vision using OpenCV - Part III (Histogram computing)

11 min read · Jan 17, 2020

Histograms

This post is mostly about histograms and how the pixel intensities of an image are distributed in a histogram. Part I of this series goes over how to create shapes using the CV package. You can check that out here. You can also check out Part II, which goes over the basics of image processing.

Before getting started, I would like to thank Adrian Rosebrock for his amazing posts on computer vision. All of my computer vision posts here draw on his work, so please go and check it out here.

Let’s get started !!

So, before getting into the details and code, let me briefly explain what a histogram is and what it has to do with an image.

A histogram is basically a way to represent the pixel intensities of an image. We can visualize it as a graph that gives an overall idea of how the pixel intensities are distributed.

For now, let’s consider the RGB color space. Each channel’s pixel values lie in the range 0 to 255. To plot a histogram, we divide that range into bins along the X-axis and count, on the Y-axis, how many pixels fall into each bin. For example, with 256 bins there is one bin per intensity value, so we are counting the number of times each pixel value occurs. With fewer bins, each bin covers a range of values.
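The counting described above can be sketched with plain NumPy before bringing OpenCV in. The tiny array below is made up purely for illustration, and the variable names are not from the original post.

```python
import numpy as np

# A tiny made-up 2x4 "image" of 8-bit intensities (values are illustrative only).
pixels = np.array([[0, 0, 12, 200],
                   [12, 12, 255, 200]], dtype=np.uint8)

# 256 bins: one bin per intensity, so counts_256[v] is how often value v occurs.
counts_256 = np.bincount(pixels.ravel(), minlength=256)

# 8 bins: each bin covers 32 consecutive intensities (0-31, 32-63, ...).
counts_8, edges = np.histogram(pixels, bins=8, range=(0, 256))

print(counts_256[12])  # occurrences of intensity 12
print(counts_8)        # counts per 32-wide bin
```

With 8 bins, the values 0 and 12 share the first bin, which is why fewer bins give a coarser picture of the same distribution.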

Now that we have a basic idea of what a histogram has to do with an image, let’s see how we can use OpenCV to compute histograms.

Compute Histogram with OpenCV

To build a histogram, we can use the cv2.calcHist function. Let’s see which parameters we need to pass to the function.

cv2.calcHist(images, channels, mask, histSize, ranges)

images: The image for which we want to compute the histogram, passed to the function wrapped in a list.
channels: A list of indexes specifying the channels for which we want to compute the histogram. For a grayscale image, the channel list would be [0]. To compute a histogram over all three color channels at once, the channel list would be [0,1,2].
mask: We have already discussed in detail what a mask is in the previous post. In the context of histograms, if we provide a mask, the histogram is computed only for the masked pixels. If we are not applying a mask and simply want the histogram for the entire image, we pass None.
histSize: The number of bins to use when computing the histogram. This is a list with one value per channel. For example, [8,8,8] corresponds to 8 bins for each of three channels.
ranges: The range of possible pixel values. Normally this is [0,256] for each RGB channel (the upper bound is exclusive, so it covers 0 to 255), however it may differ for other color spaces such as HSV.

Now that we understand the parameters of the cv2.calcHist function, it’s time to get into some code and see how to use it.

To start, let’s implement a grayscale histogram.

Gray Scale Histogram

Below is what a grayscale looks like for pixel values ranging from 0 to 255. This scale will come in handy when we compute the histogram for a grayscale image.

In the above code, we are loading the image for which we want to compute the grayscale histogram. The image shape is (234,351,3): 234 rows, 351 columns and 3 color channels, which explains the 3 in the tuple. We are also accessing an individual pixel: the value at the 0th row and 0th column is [12,15,29], one number per channel. Note that OpenCV stores the channels in BGR order rather than RGB.

Below is what our original image looks like:

In the above code, we are converting the color image to a grayscale image and storing it in hist. Its shape is just (234,351), as a grayscale image has a single channel and so needs no third dimension. Each row index holds 351 pixel values. In our case we are accessing index 233, the last row, and it has 351 pixel values.

Below is what the grayscale image looks like:

Now let’s tie this all together and compute the histogram for the above grayscale image. As discussed earlier, we will use the cv2.calcHist function. Below are the parameters for the function:

[hist], the grayscale image wrapped in a list.
[0] as we are using a grayscale image, so the index list will only contain 0.
None as we are not implementing masking here.
[256] as we are using 256 bins in this case.
[0,256] as the pixel values range from 0 to 255 (the upper bound 256 is exclusive).

If you look at the graph above, there are about 3400–3500 pixels which fall in the 0th bin. The reason is that the background of the image is completely dark/black. There is also a peak in the pixel count at around the 25th bin, as there are a few lighter shades of black on the tower as well.

The pixel count starts falling as we move towards the 50th bin, as there are white shades of color in the tower as well.

Color Histograms

1-Dimensional histogram

Let’s load our original image before computing the color histogram.

If you look at the above figure, there is a spike in blue around the 10 to 15 bin range, with a pixel count of around 7000. But as the pixel intensity is low (within the 10 to 15 range), we cannot clearly detect it in the image.

There is also a spike in green around bin 25, with a pixel count of around 2500. However, we cannot properly detect green in the image either, due to the low pixel intensity (around 25).

In the case of red, there is some fluctuation around bin 50. This makes sense, as the background of the image and the tower both have a shiny reflection of light with a shade of red to it. However, red is not a dominant color, so the fluctuation is slight and the graph flattens after bin 50.

Now that we have seen how to compute a 1-dimensional color histogram, let’s move on to compute a 2-dimensional histogram.

Remember that the shape of the result of the cv2.calcHist function in the 1-D case is (256,1) because our histogram size is [256].

2-Dimensional Histogram

The 2-D color histogram computation is very similar to the 1-D one. There are just a few extra things that we need to pass to the cv2.calcHist function in the case of a 2-D histogram.

Let’s have a look !!

Before that, let’s load up the image, split it into its B, G and R channels, and store them in a list called “chans”.

Below is what our original image looks like:

Now, let’s move to the 2nd part of the code and understand how we break down the image’s pixel intensities into a 2-dimensional histogram space.

Histogram for Green and Blue

# plot a 2D color histogram for green and blue
ax = fig.add_subplot(131)
hist = cv2.calcHist([chans[1], chans[0]], [0, 1], None, [32, 32], [0, 256, 0, 256])
print("2D histogram shape:", hist.shape)
p = ax.imshow(hist, interpolation = "nearest")
ax.set_title("2D Color Histogram for G and B")
plt.colorbar(p)

O/P:
2D histogram shape: (32, 32)

In the above code, we create a 2-D color histogram for the green and blue channels.

The parameters of the cv2.calcHist call are as follows:

chans[1] corresponds to the green channel and chans[0] corresponds to the blue channel.
[0,1] lists the channel indexes to use. As we pass two single-channel images, index 0 refers to the first (green) and index 1 to the second (blue). With three single-channel images it would be [0,1,2].
The 3rd parameter is None as we are not considering any mask in the computation.
We are using 32 bins for each channel in this case. The list [32,32] has two entries because we are using 2 channels here.
The last parameter corresponds to the range of pixel values. As we have 2 channels, the value is [0,256,0,256].

Remember that the shape of the result of the cv2.calcHist function will be (32,32), as our histogram size (histSize) is [32,32].

Histogram for Green and Red

ax = fig.add_subplot(132)
hist = cv2.calcHist([chans[1], chans[2]], [0, 1], None,[32, 32], [0, 256, 0, 256])
p = ax.imshow(hist, interpolation = "nearest")
ax.set_title("2D Color Histogram for G and R")
plt.colorbar(p)

In this case we are computing the histogram for green and red, so everything remains the same as in the code above. Only the 1st parameter of the cv2.calcHist function changes to [chans[1], chans[2]], as chans[1] corresponds to green and chans[2] corresponds to red.

Histogram for blue and Red

ax = fig.add_subplot(133)
hist = cv2.calcHist([chans[0], chans[2]], [0, 1], None,[32, 32], [0, 256, 0, 256])
p = ax.imshow(hist, interpolation = "nearest")
ax.set_title("2D Color Histogram for B and R")
plt.colorbar(p)

In this case we are computing the histogram for blue and red, so everything remains the same as in the code above. Only the 1st parameter of the cv2.calcHist function changes to [chans[0], chans[2]], as chans[0] corresponds to blue and chans[2] corresponds to red.

Now let’s tie it all together in a single piece of code and see the output:

If you look at the output above, you can see a peak in the green and blue histogram at around x = 22 and y = 12, which appears in the graph as the tiny yellow square.

As per the bar beside the graph, yellow corresponds to a higher count of pixels and blue represents a low count.

So, we saw how we could take 2 channels into account and compute a 2D histogram. What if we want to account for all 3 RGB channels? Yes, right!! We are going to use a 3D histogram. Let’s see how we could do that.

3 Dimensional Histogram

The 3D histogram is just an extension of the previous one. Let’s implement it in code.

histogram = cv2.calcHist([image], [0, 1, 2], None, [8, 8, 8], [0, 256, 0, 256, 0, 256])
print("3D histogram shape:{}, with {} values".format(histogram.shape, histogram.flatten().shape[0]))
plt.show()

O/P:
3D histogram shape:(8, 8, 8), with 512 values

If you look at the above code, we are passing the image, wrapped in a list, as the first parameter.
As we are using 3 channels, we pass [0,1,2].
We are passing None for the mask as we are not masking the image.
We are using 8 bins for each channel in this case. The list [8,8,8] has three entries because we are using 3 channels here, giving 8 × 8 × 8 = 512 bins in total.
The last parameter corresponds to the range of pixel values. As we have 3 channels, the value is [0,256,0,256,0,256].

Histogram Equalizer

Histogram equalization is the process through which we stretch the distribution of pixel intensities across the available range. It is used to improve the contrast of an image. Suppose we have a histogram with a huge peak in it, as we had in our first grayscale histogram (Fig: 2-Gray scale graph). Histogram equalization would spread that peak out towards both ends of the intensity range, which increases the global contrast of the image.

We apply histogram equalization to grayscale images. It is useful when the foreground and background of an image are both dark or both bright.

So, let’s implement this in code to understand the concept.

As histogram equalization is applied to grayscale images, we are converting our image to grayscale in the above code.

In the above code we pass our grayscale image to the cv2.equalizeHist function.

In the above output, you can see the original grayscale image on the left and the equalized image, with its pixel intensities spread more evenly, on the right.

Histograms with Masks

Till now we have computed histograms without a mask. Now let’s switch gears, apply a mask to the image and compute the histogram.

Before that, let’s first create a function plot_histogram for computing the histogram, so that we avoid rewriting the same code every time we have to compute a histogram for a new image.

Now that we have created the function plot_histogram, let’s test it out by passing it the image and creating a 1D histogram. Below is the image that we are using here:

Now, let’s go ahead and apply the mask to the image. If you want to learn what a mask is and how to apply one to an image, please refer to my Part II post.

The above picture shows the mask that we would be applying to the overall image.

The above picture is the output masked image.

Now let’s pass the masked image to the plot_histogram function and visualize the output graph.

If you notice, the red pixels fall within bins 0 to 80, which makes sense as red doesn’t contribute much to our masked image. The green pixels are present on the darker side of the spectrum. However, the blue pixels are present towards the brighter side of the graph, and that’s because most of the area in our masked image is blue sky.

If we want to compute the histogram only on a specific region of the image, we can easily do so by passing a mask to the cv2.calcHist function.

Parting Thoughts !!

In this post, we have gone into detail about how to compute histograms for the BGR color channels. BGR channels are the most commonly used, and we will apply this concept going forward when we implement image search.