Nuts and Bolts of Computer Vision using OpenCV - Part II (Image Processing)

Tanwir Khan
14 min read · Jan 13, 2020

This post covers the basics of image processing. Part I of this series goes over how to create shapes using the OpenCV package. You can check that out here.

Before getting started, I would like to thank Adrian Rosebrock for his amazing posts on computer vision. All the computer vision blogs I am creating here draw on his work, so please go and check it out here.

Let’s get started !!

We will be going over different methods of image processing here. The first one that we will be discussing is Translation.

Translation

Translation is shifting the image along the x and y axes, whether up, down, left, or right. Let’s implement this in code and see how we can translate our image.

import numpy as np
import imutils
import cv2

# Load the image from disk and display it until a key is pressed.
image = cv2.imread("/input33.jpg")
cv2.imshow("Image", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Below is the O/P for the code.

Now that we have loaded our image, let’s go ahead and do some translations to it.

M = np.float32([[1,0,20],[0,1,30]])
shifted = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))
cv2.imshow("Shifted Down and Right", shifted)
cv2.waitKey(0)
cv2.destroyAllWindows()

The first line above defines the translation matrix, which tells OpenCV how many pixels to shift the image left/right and up/down.

We define our translation matrix as a floating-point array because OpenCV expects it to be of floating-point type.

The first row of the matrix has the form [1, 0, Px], where Px is the number of pixels to shift the image left or right: a negative Px shifts the image left, and a positive Px (as in our case) shifts it right. Similarly, the second row has the form [0, 1, Py], where Py is the number of pixels to shift the image up or down: a negative Py shifts the image up, and a positive Py (as in our case) shifts it down. So here the image will be shifted 20 pixels right and 30 pixels down.

The first line of the code only defines the translation matrix; the actual translation is performed by the cv2.warpAffine method. Its arguments are the image we want to translate, the translation matrix M, and the output size as (width, height). Let’s see how our translated image looks now.

As you can see, the image has shifted slightly to the right and down.

M = np.float32([[1, 0, -50], [0, 1, -90]])
shifted = cv2.warpAffine(image,M,(image.shape[1],image.shape[0]))
cv2.imshow("Shifted up and left", shifted)
cv2.waitKey(0)
cv2.destroyAllWindows()

In the above code everything is the same, except that we are now shifting the image up and left. Below is how the image looks after the shift.
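Since only the shift values change between these two snippets, it can be convenient to wrap the translation in a small helper. Below is a minimal sketch of such a function (the name translate and its signature are mine, not part of OpenCV; the imutils package ships a very similar convenience function):

def translate(image, x, y):
    # Build the translation matrix: x pixels right (negative = left),
    # y pixels down (negative = up), then apply it with warpAffine.
    M = np.float32([[1, 0, x], [0, 1, y]])
    return cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))

# Example: same shift as the first snippet, 20 pixels right and 30 down.
shifted = translate(image, 20, 30)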

Rotation

Now that we are familiar with translation, let’s move on to rotation and see how we can implement it.

image = cv2.imread("input33.jpg")
cv2.imshow("Image",image)
cv2.waitKey(0)
cv2.destroyAllWindows()

The above code loads the image as discussed previously.

(h,w)= image.shape[:2]
center = (w//2,h//2)
M = cv2.getRotationMatrix2D(center,45,1.0)
rotated = cv2.warpAffine(image,M,(w,h))
cv2.imshow("Rotated by 45 degrees",rotated)
cv2.waitKey(0)
cv2.destroyAllWindows()

In the first line we grab the height and width of the image and divide each by 2 to determine the center of the image, using integer division (“//”). In the third line we call cv2.getRotationMatrix2D to create a rotation matrix, just as we created a translation matrix in the translation section above. Its arguments are the center point about which to rotate, the angle of rotation (45 degrees in our case), and the scale of the image. The scale has to do with the output size; we will discuss resizing in detail in the next section. For now we pass a floating-point value of 1.0, which keeps the size unchanged; a value of 0.5 would make the image half its original size, and 2.0 would double it.

Now that we have our rotation matrix, we apply it with cv2.warpAffine in the fourth line. Its arguments are the image, the rotation matrix, and the output dimensions as (width, height). The output is shown below:

M1 = cv2.getRotationMatrix2D(center,-90,1.0)
rotated1=cv2.warpAffine(image,M1,(w,h))
cv2.imshow("Rotated by -90 degrees",rotated1)
cv2.waitKey(0)
cv2.destroyAllWindows()

In the above code we rotate the image by -90 degrees, i.e. 90 degrees clockwise. Below is the output.
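As with translation, the rotation boilerplate can be folded into a helper. The sketch below is my own wrapper (imutils provides a similar rotate function); it defaults the pivot to the image center:

def rotate(image, angle, center=None, scale=1.0):
    # Default the pivot point to the center of the image.
    (h, w) = image.shape[:2]
    if center is None:
        center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, angle, scale)
    return cv2.warpAffine(image, M, (w, h))

# Example: rotate 45 degrees counter-clockwise about the center.
rotated = rotate(image, 45)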

Resize

1. Resizing while specifying the width

In this section we will see how to resize an image by specifying its new width, and what we need to keep in mind while doing so.

image = cv2.imread("input33.jpg")
cv2.imshow("Original Image",image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Basic housekeeping in the above code: just loading the image and displaying it. Below is how the original image looks:

print("Original dimensions: ",image.shape[:2])
new_width = 550
r = new_width/image.shape[1]
print("aspect ratio",r)
dim = (new_width,int(image.shape[0]*r))
print("New dimensions: ",dim)
O/P:
Original dimensions: (234, 351)
aspect ratio 1.566951566951567
New dimensions: (550, 366)

Since we are resizing the image by specifying the width, we define the new width as 550.

In the third line we calculate the ratio of the new width to the old width. Once we have the ratio, we compute the new dimensions (the new width and the correspondingly scaled height) in the fifth line.

resized = cv2.resize(image,dim,interpolation = cv2.INTER_AREA)
cv2.imshow("Resized Image",resized)
cv2.waitKey(0)
cv2.destroyAllWindows()

We use the cv2.resize function to resize our image. Its arguments are the image, the new dimensions (dim), and the interpolation method. The interpolation method is the algorithm that works behind the scenes to compute the new pixel values. Here we use cv2.INTER_AREA; other options include cv2.INTER_LINEAR, cv2.INTER_CUBIC, and cv2.INTER_NEAREST. Below is the output of the resized image:
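If you want to get a feel for how the interpolation flags differ, you can resize the same image with each of them and compare the results side by side. A minimal sketch (the list of methods and the window titles are mine):

# Resize the image to the same dimensions with different interpolation flags.
methods = [
    ("INTER_NEAREST", cv2.INTER_NEAREST),
    ("INTER_LINEAR", cv2.INTER_LINEAR),
    ("INTER_AREA", cv2.INTER_AREA),
    ("INTER_CUBIC", cv2.INTER_CUBIC),
]
for name, flag in methods:
    cv2.imshow(name, cv2.resize(image, dim, interpolation=flag))
cv2.waitKey(0)
cv2.destroyAllWindows()

As a rule of thumb, cv2.INTER_AREA tends to work well when shrinking an image, while cv2.INTER_LINEAR or cv2.INTER_CUBIC are usually better choices when enlarging.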

2. Resizing while specifying the height

new_height = 50
r_1 = new_height/image.shape[0]
dim = (int(image.shape[1]*r_1),new_height)
print("New dimensions: ",dim)
O/P:
New dimensions: (75, 50)

In the above code we resize the image by specifying the height. The logic is the same as before; the only difference is that the ratio is now based on the height.

resized = cv2.resize(image,dim,interpolation = cv2.INTER_AREA)
cv2.imshow("Resized image", resized)
cv2.waitKey(0)
cv2.destroyAllWindows()

Below is the resized image output:

Now let’s wrap the above into a resizing function, so we don’t have to write the same code every time we resize an image.

def resize(image, width=None, height=None, inter=cv2.INTER_AREA):
    dim = None
    (h, w) = image.shape[:2]

    # If neither width nor height is given, return the original image.
    if width is None and height is None:
        return image

    # Scale by height if only the height is given, otherwise by width,
    # preserving the aspect ratio in both cases.
    if width is None:
        r = height / float(h)
        dim = (int(w * r), height)
    else:
        r = width / float(w)
        dim = (width, int(h * r))

    resized = cv2.resize(image, dim, interpolation=inter)
    return resized

Now let’s call the function while specifying the height and see the O/P.

resized = resize(image,height=100)
cv2.imshow("resized image",resized)
cv2.waitKey(0)
cv2.destroyAllWindows()

Now let’s call the function while specifying the width and see the O/P.

resized = resize(image,width=700)
cv2.imshow("resized image",resized)
cv2.waitKey(0)
cv2.destroyAllWindows()

Flipping

Flipping is very straightforward and doesn’t require much explanation.

image = cv2.imread("/elephant.jpg")
cv2.imshow("Original Image",image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Below is how the original image looks:

flipped = cv2.flip(image,1)
cv2.imshow("Flipped Horizontally",flipped)
cv2.waitKey(0)
cv2.destroyAllWindows()

The cv2.flip function is used to flip the image. A flip code of 1 flips the image horizontally, around the y-axis. Let’s have a look at the O/P.

flipped = cv2.flip(image,0)
cv2.imshow("Flipped vertically",flipped)
cv2.waitKey(0)
cv2.destroyAllWindows()

To flip the image vertically, around the x-axis, we pass 0 as the flip code. Let’s have a look at the O/P.

flipped = cv2.flip(image,-1)
cv2.imshow("Flipped Horizontally and vertically",flipped)
cv2.waitKey(0)
cv2.destroyAllWindows()

To flip the image both horizontally and vertically, we pass -1 as the flip code.
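Since the only thing that changes between the three snippets is the flip code, here is a compact sketch that shows all three variants in one loop (the loop and window titles are mine):

# Flip codes: 1 = horizontal (around the y-axis),
# 0 = vertical (around the x-axis), -1 = both.
for code, title in [(1, "Horizontal"), (0, "Vertical"), (-1, "Both")]:
    cv2.imshow("Flipped: " + title, cv2.flip(image, code))
cv2.waitKey(0)
cv2.destroyAllWindows()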

Cropping

Cropping an image is simple and straightforward: we just slice the NumPy array, supplying the index ranges of the region we want to extract.

One thing to remember is that OpenCV represents images as NumPy arrays in which the first dimension is the height (rows) and the second is the width (columns). So when cropping we supply the height (y) range first and the width (x) range second, i.e. the y-axis before the x-axis.

image = cv2.imread("/elephant.jpg")
cv2.imshow("Original Image",image)
print("Image shape: ",image.shape[:2])
cv2.waitKey(0)
cv2.destroyAllWindows()
O/P
Image shape: (519, 778)

Below is how our original image looks:

sliced = image[30:500,0:500]
cv2.imshow("sliced image",sliced)
cv2.waitKey(0)
cv2.destroyAllWindows()

So, for the above slicing code we are providing the following indices:

  1. Starting y coordinate of the slice (height): y = 30
  2. Ending y coordinate of the slice (height): y = 500
  3. Starting x coordinate of the slice (width): x = 0
  4. Ending x coordinate of the slice (width): x = 500

Below is the output of the sliced image:
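Because the y range comes first and the x range second, it is easy to swap them by accident. A tiny helper can make the ordering explicit; this is just a sketch of my own, not an OpenCV function:

def crop(image, start_x, start_y, end_x, end_y):
    # NumPy indexing is [rows, columns], i.e. [y range, x range],
    # so the y coordinates go first even though the arguments read x, y.
    return image[start_y:end_y, start_x:end_x]

# Equivalent to image[30:500, 0:500] from above.
sliced = crop(image, start_x=0, start_y=30, end_x=500, end_y=500)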

Caveats in OpenCV and Numpy array

Now that we have explored the major image manipulation methods, let’s try to understand a few caveats of OpenCV and NumPy arithmetic and how they affect our images.

print("max of 255 pixels: {}".format(cv2.add(np.uint8([255]),np.uint8([100]))))O/P:
max of 255 pixels: [[255]]
print("min of 0 pixels: {}".format(cv2.subtract(np.uint8([50]),np.uint8([100]))))O/P:
min of 0 pixels: [[0]]

Normally an image in the RGB color space has pixel values in the range [0, 255]. But what happens if we add 15 to a pixel that is already at 255? Arithmetically the result would be 270, but since RGB images are stored as 8-bit unsigned integers, 270 is not a valid value.

There are two ways of handling this, and which one is appropriate typically depends on the use case:

  1. The first approach is OpenCV’s, which performs saturated arithmetic: it clips the result so that pixel values never fall outside the range [0, 255]. In the code above we add two 8-bit unsigned integer NumPy arrays; plain addition would give 355, but OpenCV clips the result and returns 255. In the subtraction we compute 50 - 100; arithmetically the result should be -50, but OpenCV clips it and returns 0.
print("wrap around: {}".format(np.uint8([100])+np.uint8([175])))
O/P:
wrap around: [19]
print("wrap around:{}".format(np.uint8([10])-np.uint8([31])))
O/P:
wrap around:[235]

2. The second approach is plain NumPy addition/subtraction, which wraps around modulo 256 instead of clipping. In the first line we add two 8-bit unsigned integer NumPy arrays with values 100 and 175. Plain addition would give 275, but NumPy wraps the result around and returns 275 mod 256 = 19.

In the second line we define two more arrays with values 10 and 31 and subtract them. The result, 10 - 31 = -21, again wraps around modulo 256, giving 235.
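If you want OpenCV-style clipping while staying in plain NumPy, one common workaround (my own sketch, not something shown in the original post) is to do the arithmetic in a wider integer type and clip back into [0, 255]:

a = np.uint8([100])
b = np.uint8([175])

# Promote to a wider type so 100 + 175 = 275 is representable,
# then clip to the valid 8-bit range and convert back to uint8.
clipped = np.clip(a.astype(np.int16) + b.astype(np.int16), 0, 255).astype(np.uint8)
print(clipped)   # [255], matching cv2.add's saturation behaviour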

Now that we understand these caveats, let’s apply them to an actual image and see how it looks:

image = cv2.imread("/elephant.jpg")
cv2.imshow("Original Image",image)
print("Image shape: ",image.shape[:2])
cv2.waitKey(0)
cv2.destroyAllWindows()

Below is how the original image looks like:

# An array of 200s with the same shape as the image.
M = np.ones(image.shape, dtype="uint8") * 200
# Saturated addition: any value above 255 is clipped to 255.
add = cv2.add(image, M)
cv2.imshow("Caveat Image_add", add)
cv2.waitKey(0)
cv2.destroyAllWindows()

Below is how the image looks after adding 200 to every pixel of the original image with cv2.add. Notice that the image has become noticeably lighter.

# An array of 60s with the same shape as the image.
M_1 = np.ones(image.shape, dtype="uint8") * 60
# Saturated subtraction: any value below 0 is clipped to 0.
sub = cv2.subtract(image, M_1)
cv2.imshow("Caveat Image_sub", sub)
cv2.waitKey(0)
cv2.destroyAllWindows()

Below is how the image looks after subtracting 60 from every pixel of the original image with cv2.subtract. Notice that the image has become noticeably darker.

Bitwise operations and its applications

Let’s implement the bitwise operations below in OpenCV and see how they can be used in real-life examples. First, let’s see what the different bitwise operations are:

  1. AND: the output pixel is “on” only if both input pixels are greater than zero.
  2. OR: the output pixel is “on” if either of the two input pixels is greater than zero.
  3. XOR: the output pixel is “on” if exactly one of the two input pixels is greater than zero, but not both.
  4. NOT: simply inverts the image, flipping the “on” and “off” pixels.

Let’s draw two shapes, a rectangle and a circle, and apply the above bitwise operations to them.

rectangle = np.zeros((300,300),dtype="uint8")
cv2.rectangle(rectangle,(25,25),(275,275),255,-1)
cv2.imshow("Rectangle",rectangle)
cv2.waitKey(0)
cv2.destroyAllWindows()

O/P:

circle = np.zeros((300,300),dtype="uint8")
cv2.circle(circle,(150,150),150,255,-1)
cv2.imshow("circle",circle)
cv2.waitKey(0)
cv2.destroyAllWindows()

O/P:

Bitwise AND

Now let’s perform a bitwise AND operation using these two images:

bitwiseAND = cv2.bitwise_and(rectangle,circle)
cv2.imshow("AND",bitwiseAND)
cv2.waitKey(0)
cv2.destroyAllWindows()

O/P:

Let’s understand what happened in the above operation:

Notice that the corners of the rectangle are masked out in the result. The corners of the rectangle were bright (values greater than zero), but that region is not covered by the circle. For an AND operation both pixels have to be greater than zero, which was not the case there, so the corners end up black.

Bitwise OR

bitwiseOR = cv2.bitwise_or(rectangle,circle)
cv2.imshow("OR",bitwiseOR)
cv2.waitKey(0)
cv2.destroyAllWindows()

O/P:

Notice that the corners of the rectangle are not masked out this time. They were bright (values greater than zero) in the rectangle image, and for an OR operation only one of the two pixels needs to be greater than zero, so those regions stay white.

Bitwise XOR

bitwiseXOR = cv2.bitwise_xor(rectangle,circle)
cv2.imshow("XOR",bitwiseXOR)
cv2.waitKey(0)
cv2.destroyAllWindows()

O/P:

Notice that most of the area is masked this time. The XOR operation turns off every region where both pixels are greater than zero, which is most of the overlap between the rectangle and the circle. The remaining white/bright areas are the regions where exactly one of the two images was bright, so they are not masked.
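The list above also mentions bitwise NOT, which the rectangle/circle examples don’t cover; for completeness, here is a small sketch of what it does to the circle image:

# NOT inverts every pixel: the white circle becomes black
# and the black background becomes white.
bitwiseNOT = cv2.bitwise_not(circle)
cv2.imshow("NOT", bitwiseNOT)
cv2.waitKey(0)
cv2.destroyAllWindows()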

One of the main applications of bitwise operations is masking. Let’s see how we can implement it.

image = cv2.imread("/elephant.jpg")
cv2.imshow("Original Image",image)
print("Image shape: ",image.shape[:2])
cv2.waitKey(0)
cv2.destroyAllWindows()

Below is the original image:

Now I want to mask out most of the elephant and show only its eyes and some area around them. We can do that as follows:

Masking using Rectangles

# Start with an all-black mask and draw a filled white rectangle
# over the region we want to keep.
mask = np.zeros(image.shape[:2], dtype="uint8")
(cX, cY) = (image.shape[1]//2, image.shape[0]//2)
new_rect = cv2.rectangle(mask, (cX-55, cY-75), (cX+175, cY+175), 255, -1)
cv2.imshow("mask Image", new_rect)
cv2.waitKey(0)
cv2.destroyAllWindows()

# Keep only the pixels of the original image where the mask is white.
masked = cv2.bitwise_and(image, image, mask=mask)
cv2.imshow("Final image masked rectangle", masked)
cv2.waitKey(0)
cv2.destroyAllWindows()

O/P:

Masking using Circles

mask = np.zeros(image.shape[:2],dtype="uint8")
circle_mask = cv2.circle(mask,(cX,cY),200,255,-1)
cv2.imshow("Final image masked",circle_mask)
cv2.waitKey(0)
cv2.destroyAllWindows()
masked = cv2.bitwise_and(image,image,mask=mask)
cv2.imshow("Final image masked circle",masked)
cv2.waitKey(0)
cv2.destroyAllWindows()

O/P:

Splitting and Merging

Now let’s see how we can split an RGB image into its individual channels and then merge them back together.

image = cv2.imread("/rgb.jpg")
cv2.imshow("Original Image",image)
print("Image shape: ",image.shape[:2])
cv2.waitKey(0)
cv2.destroyAllWindows()
(B,G,R) = cv2.split(image)

In the above piece of code we split the image into its individual channels. Note that OpenCV stores images in BGR order, which is why cv2.split returns the channels as (B, G, R). Let’s visualize how each channel looks.

cv2.imshow("Red",R)
cv2.waitKey(0)
cv2.destroyAllWindows()

O/P:

In the output for the red channel, the lower right circle appears bright/white because that is where the image has heavy shades of red.

cv2.imshow("Blue",B)
cv2.waitKey(0)
cv2.destroyAllWindows()

O/P:

In the output for the blue channel, the lower left circle appears bright/white because that is where the image has heavy shades of blue.

cv2.imshow("Green",G)
cv2.waitKey(0)
cv2.destroyAllWindows()

O/P:

In the output for the green channel, the top circle appears bright/white because that is where the image has heavy shades of green.

Now that we have split the image into its channels and viewed them individually, let’s merge them back and visualize the result.

merged = cv2.merge([B,G,R])
cv2.imshow("Merged",merged)
cv2.waitKey(0)
cv2.destroyAllWindows()

O/P:
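Since we merged the channels back in the same B, G, R order without modifying them, the merged image should be identical to the original; here is a quick sanity check (this snippet is mine):

# The merged image should match the original pixel for pixel.
print(np.array_equal(merged, image))   # expected: True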

Alternate method of visualizing the individual channels

# An all-zero channel; merging it with a single real channel
# displays that channel in its own color.
zeroes = np.zeros(image.shape[:2], dtype="uint8")
cv2.imshow("Red", cv2.merge([zeroes, zeroes, R]))
cv2.waitKey(0)
cv2.destroyAllWindows()

O/P:

cv2.imshow("Blue",cv2.merge([B,zeroes,zeroes]))
cv2.waitKey(0)
cv2.destroyAllWindows()

O/P:

cv2.imshow("Green",cv2.merge([zeroes,G,zeroes]))
cv2.waitKey(0)
cv2.destroyAllWindows()

O/P:

Other Color Scales

Now that we have learned how to split and merge the RGB channels of an image, let’s look at some other color spaces besides RGB that we might encounter.

Hue-Saturation-Value (HSV) is a color model that is often used in place of the RGB model in graphics and paint programs. In this model a color is specified first, and then white or black is added to easily adjust it. HSV may also be called HSB (short for hue, saturation, and brightness). This color space is closer to how humans think about and perceive the color of an object.

The L*a*b* color space is likewise tuned to how humans perceive color, separating lightness (L*) from the color components (a* and b*).

Let’s convert our image to these color spaces and see how the output looks.

image = cv2.imread("/rgb.jpg")
cv2.imshow("Original Image",image)
cv2.waitKey(0)
cv2.destroyAllWindows()

O/P:

Gray Color Space

gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
cv2.imshow("Gray Image",gray)
cv2.waitKey(0)
cv2.destroyAllWindows()

O/P:

HSV Color Space

hsv = cv2.cvtColor(image,cv2.COLOR_BGR2HSV)
cv2.imshow("HSV Image",hsv)
cv2.waitKey(0)
cv2.destroyAllWindows()

O/P:

LAB Color Space

lab = cv2.cvtColor(image,cv2.COLOR_BGR2LAB)
cv2.imshow("lab Image",lab)
cv2.waitKey(0)
cv2.destroyAllWindows()

O/P:

Parting thoughts !!

That’s it for this post. It’s been a long one, but I’m sure it won’t take long to read through since it covers very basic material.

Stay tuned for more content !!

References:

  1. https://www.xrite.com/blog/lab-color-space
  2. https://www.pyimagesearch.com/
  3. https://www.kdnuggets.com/2016/08/seven-steps-understanding-computer-vision.html
  4. https://machinelearningmastery.com/what-is-computer-vision/
  5. https://www.pyimagesearch.com/2018/07/19/opencv-tutorial-a-guide-to-learn-opencv/
