Nuts and Bolts of Computer Vision using opencv — Part I

Jan 8, 2020

I remember taking a computer graphics class during my bachelor’s degree. It was pretty cool, but back then I didn’t know how important those concepts would become in the field of computer vision. So, let’s look at the basic concepts that will help us on our journey into computer vision.

Before getting into the details, a shout-out to Adrian Rosebrock for his awesome blogs on computer vision. All the computer vision posts I am writing here are based on his work.

NOTE: This post takes baby steps towards learning computer vision. If you already know basic Python, just skim through the code without digging into the documentation :)

Let’s get started !!

Below are the basic operations we will use when working on computer vision. I will try to explain how to do each one and what happens under the hood, that is, how Python and NumPy represent these operations. But wait!! I almost forgot to mention the packages we need to install and import before we start.

Below are the packages that will come in handy for these tasks:

NumPy ($ pip install numpy)
SciPy ($ pip install scipy)
Matplotlib ($ pip install matplotlib)
OpenCV ($ pip install opencv-python)
mahotas ($ pip install mahotas)
scikit-learn ($ pip install scikit-learn)
scikit-image ($ pip install -U scikit-image)

Cool, now that we have installed all the required packages, let’s get started with our first basic task:

Loading the image

i. Load the input image and show its dimensions, keeping in mind that images are represented as a multi-dimensional NumPy array with shape

no. rows (height) x no. columns (width) x no. channels (depth)

import cv2

# Load the image from disk as a NumPy array
image = cv2.imread('jp.png')

# shape is (rows, columns, channels), i.e. (height, width, depth)
(h,w,d) = image.shape
print("width={},height={},depth={}".format(w,h,d))

O/P:
width=600,height=322,depth=3

A few things to note from the output. The width (number of columns) is 600 pixels, the height (number of rows) is 322 pixels, and the image has 3 channels (the red, green and blue components of the image), all stored in one NumPy array. So the shape of our image is (322,600,3).
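To see the row/column ordering without needing the image file, here is a minimal sketch that builds a blank array with the same dimensions as the output above. The array is only a stand-in for jp.png, not the real image.

```python
import numpy as np

# Stand-in for the loaded image: 322 rows, 600 columns, 3 channels
image = np.zeros((322, 600, 3), dtype="uint8")

# shape is ordered (height, width, depth), so unpack in that order
(h, w, d) = image.shape
print("width={},height={},depth={}".format(w, h, d))

# Total number of stored values is rows * columns * channels
print(image.size)
```

The easy mistake here is unpacking the shape as (w, h, d); NumPy always lists rows before columns.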

ii. Display the image on our screen. We need to click the window opened by OpenCV and press a key on our keyboard to continue execution.

cv2.imshow("Image",image)
cv2.waitKey(0)
cv2.destroyAllWindows()

iii. Saving the image

This will save the image to the specified path. Notice that we had loaded the image in the beginning in .png format and here we are writing back the same image in .jpg format.

The cv2.imwrite function takes 2 arguments: the first is the path where the new image should be written, and the second is the image that we loaded previously. It returns True when the write succeeds.

cv2.imwrite('/new.jpg',image)

O/P: True

Now that we have loaded the image and found its dimensions in pixels, let’s get into more detail about an image and its dimensions.

So what exactly is a pixel?

A pixel, also known as a picture element, is the smallest square element of an image. Pixels are the raw building blocks of an image, and the resolution of an image depends on the number of pixels.

Imagine an image as a regular grid, where each square in the grid is a single pixel.

Let’s take an image with a resolution of 100 X 100. This means that our image is represented as a grid of pixels with 100 rows and 100 columns, so there are 10000 pixels in total.

Pixels can be represented in 2 forms: Grayscale and Color.

In a grayscale image, each pixel value ranges from 0 to 255, with 0 being black and 255 being white. Color pixels, on the other hand, are usually represented in the RGB color space, with one value for the red component, one for green and one for blue.
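As a rough illustration of the grid idea and the 0 to 255 range, the sketch below builds a synthetic 100 X 100 grayscale image with NumPy. The half black, half white pattern is made up for the example.

```python
import numpy as np

# A 100 x 100 grayscale image: one 8-bit value per pixel, no channel axis
gray = np.zeros((100, 100), dtype="uint8")
print(gray.size)  # number of pixels: rows * columns

# Left half stays black (0), right half becomes white (255)
gray[:, 50:] = 255
print(gray.min(), gray.max())

# The uint8 type itself only allows values from 0 to 255
print(np.iinfo(gray.dtype).min, np.iinfo(gray.dtype).max)
```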

Each of these 3 components is represented by an integer in the range 0 to 255. To represent a color, we combine the 3 values into a tuple of (red, green, blue).

To construct white, we set every component to its maximum, (255,255,255), and for black we set them all to zero, (0,0,0).

To create a red color we would fill only the red part of the tuple : (255,0,0). Similarly for green it would be (0,255,0) and for blue (0,0,255).
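The tuples above can be turned into small solid-color images. In this sketch the dictionary COLORS_RGB and the helper solid_swatch are hypothetical names introduced only for illustration, and the arrays are kept in (red, green, blue) order as in the text.

```python
import numpy as np

# Colors as (red, green, blue) tuples, exactly as described above
COLORS_RGB = {
    "white": (255, 255, 255),
    "black": (0, 0, 0),
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
}


def solid_swatch(rgb, size=10):
    """Hypothetical helper: a size x size image filled with one RGB color."""
    return np.full((size, size, 3), rgb, dtype="uint8")


swatch = solid_swatch(COLORS_RGB["red"])
print(swatch.shape)
print(swatch[0, 0].tolist())
```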

Enough of the theory, let’s see how we could implement this in code:

import cv2
image = cv2.imread("/image.png")
cv2.imshow("Image",image)
cv2.waitKey(0)
cv2.destroyAllWindows()
print(image.shape)

O/P: (228,350,3)

Now that we have loaded our image, let’s go ahead and access the pixel values. Before that, a couple of things need to be remembered:

OpenCV stores images as NumPy arrays, which are conceptually matrices. To access a pixel we just need to provide its coordinates. Because the array is indexed by row first and then column, the pixel at position (x, y) is read as image[y, x].
So far we have described a color as a tuple of (red, green, blue). However, OpenCV stores the channels in reverse order: (blue, green, red).

(b,g,r)= image[0,0]
print("pixels at (0,0)- Red:{}, Green:{}, Blue:{}".format(r,g,b))

O/P : pixels at (0,0)- Red:254, Green:254, Blue:254

In the above code we grab the pixel value at location (0,0), the top-left corner of the image.
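Since (0,0) looks the same in either order, it hides the row-before-column rule. The sketch below uses a tiny synthetic image and a made-up off-diagonal position to make both the index order and the BGR channel order visible.

```python
import numpy as np

# Synthetic 4-row x 6-column image, all black
image = np.zeros((4, 6, 3), dtype="uint8")

# Paint the pixel at x=5 (column), y=1 (row) pure red in BGR order
x, y = 5, 1
image[y, x] = (0, 0, 255)

# Unpack in OpenCV's channel order: blue first, red last
(b, g, r) = image[y, x]
print("Red:{}, Green:{}, Blue:{}".format(r, g, b))

# Reversing the last axis converts the pixel from BGR to RGB
rgb_pixel = image[y, x][::-1]
print(rgb_pixel.tolist())
```

Writing image[x, y] here would ask for row 5 of a 4-row image and raise an IndexError, which is how this mix-up usually shows up in practice.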

Now that we have seen the pixel values at (0,0), let’s modify the pixel values a bit.

# (0,0,255) in BGR order is pure red
image[0,0] = (0,0,255)
(b,g,r) = image[0,0]
print("pixels at (0,0)- Red:{}, Green:{}, Blue:{}".format(r,g,b))

O/P: pixels at (0,0)- Red:255, Green:0, Blue:0

Let’s play around a bit more with these pixels:

corner = image[0:100,0:100]
cv2.imshow("Corner",corner)
cv2.waitKey(0)
cv2.destroyAllWindows()

In the above code we are grabbing a 100 X 100 pixel region of the image. To do this we need the following:

The starting y-coordinate (height), where the array slice begins. In our case the slice starts at y = 0.
The ending y-coordinate. In our case the slice ends at y = 100, and the end index itself is not included.
The starting x-coordinate (width). In our case the slice starts at x = 0.
The ending x-coordinate. In our case the slice ends at x = 100, again not included.
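The four slice values can be tried on a blank array. This sketch uses a synthetic image with the same shape as the one loaded earlier, and a deliberately non-square region so the y-then-x order is visible in the result.

```python
import numpy as np

image = np.zeros((228, 350, 3), dtype="uint8")  # same shape as the image above

# Rows 20..69 and columns 100..299: the end index of each slice is excluded
region = image[20:70, 100:300]
print(region.shape)  # (rows, columns, channels)

# A slice is a view: writing to it also changes the original image
region[0, 0] = (255, 255, 255)
print(image[20, 100].tolist())

# Use .copy() for an independent crop that leaves the original untouched
crop = image[20:70, 100:300].copy()
crop[0, 0] = (0, 0, 0)
print(image[20, 100].tolist())
```

The view behavior is worth keeping in mind: drawing on a cropped region without .copy() also draws on the source image.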

image[0:100,0:100] = (0,255,0)
cv2.imshow("Updated",image)
cv2.waitKey(0)
cv2.destroyAllWindows()

In the above code we access the same region of the image again and set every pixel in it to green, (0,255,0).
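A quick way to confirm what the assignment did is to count the pixels that changed. This is a minimal sketch on a synthetic black image; the counting step is an addition for checking the result.

```python
import numpy as np

image = np.zeros((228, 350, 3), dtype="uint8")
green = (0, 255, 0)  # BGR order

# The 3-value tuple is broadcast to every pixel in the 100 x 100 region
image[0:100, 0:100] = green

# Count pixels whose three channels all match green
painted = np.all(image == green, axis=-1)
print(int(painted.sum()))

# The first pixel outside the region is still black
print(image[100, 100].tolist())
```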

Some more basics

Let’s try to draw some shapes and play around a bit with the concepts to get familiar with this.

import numpy as np
import cv2
canvas = np.zeros((300,300,3),dtype = "uint8")

In the above code we create a NumPy array with 300 rows and 300 columns, basically a 300 X 300 pixel image with 3 channels. np.zeros fills the array with zeros, so every pixel starts out black, and the "uint8" dtype gives each channel the 0 to 255 range.

The purpose of this code is to create a 300 X 300 pixel canvas with 3 channels, so that we can draw some shapes on it.
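The properties of the canvas are easy to inspect. The sketch below also shows a white canvas built with np.full, which is an alternative added here for comparison.

```python
import numpy as np

# Black canvas: every channel of every pixel is 0
canvas = np.zeros((300, 300, 3), dtype="uint8")
print(canvas.shape, canvas.dtype, int(canvas.max()))

# A white canvas of the same size, filled with 255 instead
white_canvas = np.full((300, 300, 3), 255, dtype="uint8")
print(int(white_canvas.min()))

# Omitting dtype gives float64, which is not the 8-bit format used in this post
print(np.zeros((300, 300, 3)).dtype)
```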

Drawing a line

green = (0,255,0)
cv2.line(canvas,(0,0),(300,300),green)
cv2.imshow("Canvas",canvas)
red = (0,0,255)
cv2.line(canvas,(300,0),(0,300),red,3)
cv2.imshow("Canvas",canvas)
cv2.waitKey(0)
cv2.destroyAllWindows()

In the first line we create a tuple for the green color. Then we draw a green line on the canvas image using cv2.line, starting at (0,0) and ending at (300,300). The 4th argument is the color tuple, green in this case.

We do the same thing in the subsequent lines to draw a line in red. The only difference is the 5th argument, which has the value 3. This argument is the thickness of the line in pixels.

Drawing a rectangle

cv2.rectangle(canvas,(10,10),(60,60),green)
cv2.imshow("Canvas",canvas)
cv2.rectangle(canvas,(50,200),(200,225),red,5)
cv2.imshow("Canvas",canvas)
blue = (255,0,0)
cv2.rectangle(canvas,(200,50),(225,125),blue,-1)
cv2.imshow("Canvas",canvas)
cv2.waitKey(0)
cv2.destroyAllWindows()

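In the above code we draw rectangles using cv2.rectangle. Its signature is the same as that of cv2.line: the image, two points (here the top-left and bottom-right corners), the color and an optional thickness. A thickness of -1, as in the blue rectangle, draws a filled rectangle.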

Drawing a circle

canvas = np.zeros((300,300,3),dtype="uint8")
(centerX,centerY)=(canvas.shape[1] //2, canvas.shape[0]//2)
white = (255,255,255)
for r in range(0,175,25):
    cv2.circle(canvas,(centerX,centerY),r,white)
cv2.imshow("Canvas",canvas)
cv2.waitKey(0)
cv2.destroyAllWindows()

Let’s try to understand how the above code creates a number of circles.

In the first line we again create a NumPy array with 300 rows and 300 columns, a 300 X 300 pixel image with 3 channels, filled with zeros using np.zeros. In the 2nd line we find the center of the canvas: centerX is half the width, which is the 2nd element of canvas.shape, and centerY is half the height, which is the 1st element of canvas.shape.

Then we loop over the radius r, starting at 0 and going up to 150 in increments of 25, and draw a circle on each iteration using the cv2.circle method. The parameters of this method are the canvas image, the center coordinates (centerX, centerY), the radius r and the color of the circle (white in this case).

for i in range(0,5):
    radius = np.random.randint(5,high = 200)
    color = np.random.randint(0,high = 256,size=(3,)).tolist()
    # plain Python ints: some OpenCV versions reject NumPy integer types for the center
    pt = np.random.randint(0,high=300,size=(2,)).tolist()
    cv2.circle(canvas,tuple(pt),radius,color,-1)
cv2.imshow("Canvas",canvas)
cv2.waitKey(0)
cv2.destroyAllWindows()

There are a few things to understand in the above code. In the first line we loop 5 times to create 5 random circles.

To draw a circle we need the radius, the color and the (x,y) coordinate of the center. In the 2nd line we generate a radius from 5 to 199 using np.random.randint (the high value of 200 is exclusive).

In the 3rd line we randomly generate the color of the circle. A color needs 3 values, each in the range 0 to 255, so we pass size = (3,) to generate 3 random integers instead of one, and .tolist() converts the NumPy array into a plain Python list.

Finally, we generate the random point (pt), which is the (x,y) coordinate of the circle’s center. Each coordinate lies between 0 and 299, and size =(2,) generates 2 random integers.

We use the cv2.circle method to draw each circle. Its arguments are the canvas image we created before, the center pt (x,y), the radius, the color, and -1 as the thickness, since we are drawing solid circles.