
Learn OpenCV by Building a Document Scanner

June 13 2021 Yacine Rouizi
OpenCV Computer Vision

OpenCV is a library written in C++ that provides an infrastructure for computer vision and machine learning. The library contains more than 2500 algorithms that are used for face detection, gesture recognition, augmented reality, tracking moving objects, identifying objects, etc.

In this tutorial we will create a simple document scanner using the OpenCV library. This can be useful, for example, for scanning pages in a book.

This is a beginner tutorial, so I will explain each line of code in detail so that you can follow along with me.

The steps that we need to follow to build this project are:

  1. Convert the image to grayscale
  2. Find the edges in the image
  3. Use the edges to find all the contours
  4. Select only the contours of the document
  5. Apply warp perspective to get the top-down view of the document

Setup

So let's get started. Open a new terminal, create a directory, and install the necessary packages:

mkdir document-scanner
cd document-scanner

python3 -m venv venv
source venv/bin/activate
pip install imutils
pip install scipy # needed for the imutils package
pip install opencv-python

Let's check the version of OpenCV by running the command below:

$ python3
>>> import cv2
>>> print(cv2.__version__)
4.5.2

Ok great! We are now ready to start writing some code.

Load the Image

Create a new file inside the document-scanner directory, name it scanner.py and put the following code:

from imutils.perspective import four_point_transform
import cv2

height = 800
width = 600
green = (0, 255, 0)

image = cv2.imread("input/2.jpg")
image = cv2.resize(image, (width, height))
orig_image = image.copy()

We start by importing the OpenCV library and the four_point_transform helper function from the imutils package.

This function will help us perform a 4 point perspective transform to obtain the top-down view of the document.

Next, we set the height and width of the image so that we can resize it, and we also create the green variable for the contour display later on.

To load an image with OpenCV we use the imread() function, which takes the path of the image as an argument.

Note that this function does not throw an error if the image path is wrong; it simply returns None.

To resize the image, we use the resize() function. The first argument is the image we want to resize, and the second is the (width, height) of the new image.

The function also has an optional interpolation argument which defines the algorithm used for the resizing (the default one is cv2.INTER_LINEAR). Check the documentation for the other options.
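Resizing to a fixed 600x800 stretches any image that does not already have a 3:4 aspect ratio. If you prefer to keep the proportions, the target size can be computed from the original one. This is only a sketch, and fit_to_height is a hypothetical helper name:

```python
def fit_to_height(orig_width, orig_height, target_height):
    """Return (width, height) scaled to target_height, keeping the aspect ratio."""
    scale = target_height / orig_height
    # never let the width collapse to zero for very tall, narrow images
    new_width = max(1, round(orig_width * scale))
    return new_width, target_height
```

Remember that image.shape is (height, width, channels), so the call would look like cv2.resize(image, fit_to_height(image.shape[1], image.shape[0], 800), interpolation=cv2.INTER_AREA). cv2.INTER_AREA is a common choice when shrinking an image, but that is a suggestion on my part, not something this tutorial depends on.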

Lastly, we take a copy of our image. This will allow us later to display the contours of the document on the original image rather than the modified image.

Image Processing

Now we start preprocessing our image by converting it to grayscale, blurring it, and then finding the edges in the image. Let's see how to do it:

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # convert the image to gray scale
blur = cv2.GaussianBlur(gray, (5, 5), 0) # Add Gaussian blur
edged = cv2.Canny(blur, 75, 200) # Apply the Canny algorithm to find the edges

# Show the image and the edges
cv2.imshow('Original image:', image)
cv2.imshow('Edged:', edged)
cv2.waitKey(0)
cv2.destroyAllWindows()

Now that our image is loaded, we start by converting it from BGR (the channel order OpenCV uses when loading images) to grayscale.
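To get a feel for what the grayscale conversion does, here is a rough NumPy equivalent. It assumes the image is in BGR order, as returned by imread(), and uses the channel weights given in the OpenCV color conversion documentation. The function name is mine, and the result may differ from cv2.cvtColor by a rounding step:

```python
import numpy as np


def bgr_to_gray(image):
    """Approximate a BGR-to-grayscale conversion with a weighted channel sum."""
    b, g, r = image[..., 0], image[..., 1], image[..., 2]
    # green contributes the most and blue the least
    gray = 0.114 * b + 0.587 * g + 0.299 * r
    return np.clip(np.round(gray), 0, 255).astype(np.uint8)
```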

Next, to remove noise from the image, we smooth it by using the GaussianBlur function. The first argument is the image we want to blur. The second argument is the width and height of the kernel which must be positive and odd.

The last argument is the standard deviation in the X direction. If we set it to 0, OpenCV calculates it from the kernel size.
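If you are curious what value that is, the OpenCV documentation for getGaussianKernel gives a formula for the sigma derived from the kernel size. The sketch below just evaluates that formula. The helper name is hypothetical, and OpenCV may take a different internal path for very small kernels, so treat the number as an approximation:

```python
def default_sigma(ksize):
    """Sigma that OpenCV documents for a Gaussian kernel when sigma is not given."""
    return 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8


# our (5, 5) kernel, and the (9, 9) kernel used for the blur illustration
print(default_sigma(5))
print(default_sigma(9))
```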

Lastly, we apply the well-known Canny edge detector. This is a multi-stage algorithm that reduces noise and detects edges in the image.

The first argument is our input image. The second and third arguments are the lower and upper thresholds that the algorithm uses to separate the edges from the non-edges in the image.
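The role of the two thresholds is easier to see in code. The sketch below is not the Canny implementation, only the classification idea behind it: gradients above the upper threshold are strong edges, gradients below the lower one are discarded, and the ones in between are kept by Canny only if they connect to a strong edge. The function name and the label values are my own:

```python
import numpy as np


def classify_gradients(magnitude, low, high):
    """Label each value: 0 = non-edge, 1 = weak candidate, 2 = strong edge."""
    magnitude = np.asarray(magnitude)
    labels = np.zeros(magnitude.shape, dtype=np.uint8)
    labels[magnitude > low] = 1   # candidate, kept only if linked to a strong edge
    labels[magnitude > high] = 2  # always kept
    return labels


print(classify_gradients([10, 100, 250], 75, 200))
```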

We used the imshow function to display our images in a window.

The waitKey(delay) function will wait for a pressed key for delay milliseconds if delay is positive. Otherwise, it will wait indefinitely for a pressed key.

The destroyAllWindows() function simply destroys all the windows we created.

Below you can see the output that we get (you can find the image in the repository):

image and edged image

Below you can see different transformations of an image:

Original image:

original-image

Converting the image to grayscale:

image-processing-grayscale

Blurring the image using GaussianBlur function with a (9, 9) kernel size:

image-processing-blur

Applying Canny edge detector:

image-processing-edged

Use the Edges to Find all the Contours

Now we can use our edged image to find all the contours.

# In OpenCV v3, v4-pre, and v4-alpha, cv2.findContours returns
# a tuple with 3 elements instead of 2,
# and `contours` is the second one.
# In OpenCV v2.4, v4-beta, and v4-official,
# the function returns a tuple with 2 elements.
contours, _ = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
contours = sorted(contours, key=cv2.contourArea, reverse=True)

# Show the image and all the contours
cv2.imshow("Image", image)
cv2.drawContours(image, contours, -1, green, 3)
cv2.imshow("All contours", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

 

To find the contours in the image we use the cv2.findContours function. This function takes three arguments; the first one is the source image.

The second parameter is the contour retrieval mode. Here we are using cv2.RETR_LIST to retrieve all the contours. Please refer to the documentation for the other options.

The last argument represents the contour approximation method. For example, if we set it to cv2.CHAIN_APPROX_NONE, the function will store all the (x, y) coordinates of a contour. But do we really need that?

For example, for a rectangular contour, we only need 4 points.

That's why we used cv2.CHAIN_APPROX_SIMPLE. This will allow us to save memory by keeping only the important points.

Note that since OpenCV 3.2 this function does not modify the source image.
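If your code has to run on several OpenCV versions, the difference in the return value of findContours mentioned in the code comments can be hidden behind a small helper. This is a sketch of the idea; the imutils package ships a helper with the same name (imutils.grab_contours) that you can use instead:

```python
def grab_contours(result):
    """Return the contour list from the tuple returned by cv2.findContours."""
    if len(result) == 2:
        return result[0]  # (contours, hierarchy)
    if len(result) == 3:
        return result[1]  # (image, contours, hierarchy)
    raise ValueError("Unexpected return value from cv2.findContours")
```

The call then becomes contours = grab_contours(cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)).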

The drawContours function allows us to draw contours on an image. The first argument is the image to draw on, followed by the list of contours that we want to draw.

The third argument is to indicate which contour we want to draw, a negative value means draw all the contours.

The fourth parameter is the color of the contour and the fifth one is the thickness.

Let's see what we get so far:

contours

Cool! Let's keep going.

Select Only the Edges of the Document

Now we need to find the biggest rectangle contour in the image that will define our document. Here is how to do it:

# go through each contour
for contour in contours:
    # we approximate the contour
    peri = cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, 0.05 * peri, True)
    # if we found a contour with 4 points we break the for loop
    # (we can assume that we have found our document)
    if len(approx) == 4:
        doc_cnts = approx
        break

Here we use the arcLength function to compute the perimeter of the contour. It takes as first argument the contour, and the second argument is just a boolean to tell the function whether the contour is closed or not. True means that the contour is closed.

Then we use the approxPolyDP function to get the approximation of the contour with another contour with fewer vertices.

This function takes 3 arguments: the first one is the contour we want to approximate, and the second one specifies the approximation accuracy. In our case, we are approximating the contour with an accuracy that is proportional to the contour perimeter (0.05 * perimeter).

The last argument is a boolean to specify whether the approximated contour is closed or not.

Finally, we check if the approximated contour has four points. Since the contours are sorted by area, the first one we find is the biggest, so we assume that we have found our document and break out of the for loop.

You can start to see the limits of this algorithm ...

For example, if we use a document that is not a rectangle, our technique won't work.

Apply Warp Perspective to Get the Top-Down View of the Document

Now we are ready to apply the four_point_transform function to get the top-down view:

# We draw the contours on the original image not the modified one
cv2.drawContours(orig_image, [doc_cnts], -1, green, 3)
cv2.imshow("Contours of the document", orig_image)
# apply warp perspective to get the top-down view
warped = four_point_transform(orig_image, doc_cnts.reshape(4, 2))
# convert the warped image to grayscale
warped = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
cv2.imshow("Scanned", cv2.resize(warped, (600, 800)))
cv2.waitKey(0)
cv2.destroyAllWindows()

Basically, the four_point_transform function takes an image and a contour as input and returns the top-down view of the image.
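A transform like this needs the four corners in a known order before it can map them to a rectangle. The sketch below shows one common heuristic for that ordering, based on the sum and the difference of the coordinates. It is an illustration, not necessarily what imutils does internally, and it can pick the wrong corners for strongly rotated documents:

```python
import numpy as np


def order_points(pts):
    """Order four (x, y) points as top-left, top-right, bottom-right, bottom-left."""
    pts = np.asarray(pts, dtype="float32").reshape(4, 2)
    ordered = np.zeros((4, 2), dtype="float32")
    sums = pts.sum(axis=1)         # x + y
    diffs = pts[:, 1] - pts[:, 0]  # y - x
    ordered[0] = pts[np.argmin(sums)]   # top-left has the smallest x + y
    ordered[2] = pts[np.argmax(sums)]   # bottom-right has the largest x + y
    ordered[1] = pts[np.argmin(diffs)]  # top-right has the smallest y - x
    ordered[3] = pts[np.argmax(diffs)]  # bottom-left has the largest y - x
    return ordered
```

With the corners ordered, the rest is two OpenCV calls: cv2.getPerspectiveTransform computes the matrix that maps the four corners to the corners of the output rectangle, and cv2.warpPerspective applies it to the image.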

If you want to learn more about this function, I recommend this great tutorial from pyimagesearch.

Here is what we get:

top down view

Bonus

As a gift for you, I built a simple program that will loop over a directory (named input), find images in that directory, and apply the technique we saw in this tutorial to get the top-down view of each document and put it in a new directory (named output).

from pathlib import Path
import os

# ...

valid_formats = [".jpg", ".jpeg", ".png"]
# return the lowercased file extension, e.g. ".jpg"
get_ext = lambda f: os.path.splitext(f)[1].lower()

img_files = ['input/' + f for f in os.listdir('input') if get_ext(f) in valid_formats]
# create a new folder that will contain our images
Path("output").mkdir(exist_ok=True)

# go through each image file
for img_file in img_files:
    # read, resize, and make a copy of the image
    img = cv2.imread(img_file)
    img = cv2.resize(img, (width, height))
    orig_img = img.copy()

    # preprocess the image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    edged = cv2.Canny(blur, 75, 200)

    # find and sort the contours
    contours, _ = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    contours = sorted(contours, key=cv2.contourArea, reverse=True)
    # go through each contour
    doc_cnts = None
    for contour in contours:
        # approximate each contour
        peri = cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, 0.05 * peri, True)
        # check if we have found our document
        if len(approx) == 4:
            doc_cnts = approx
            break

    # skip this image if no four-point contour was found
    if doc_cnts is None:
        print(f"No document found in {img_file}, skipping")
        continue

    # apply warp perspective to get the top-down view
    warped = four_point_transform(orig_img, doc_cnts.reshape(4, 2))
    warped = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
    final_img = cv2.resize(warped, (600, 800))

    # write the image in the output directory
    cv2.imwrite(os.path.join("output", os.path.basename(img_file)), final_img)
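As a side note, the file listing can also be written with pathlib alone. This is just an alternative sketch; list_images and VALID_FORMATS are names I chose, and the behaviour is meant to match the list comprehension above, except that directories are skipped and the result is sorted:

```python
from pathlib import Path

VALID_FORMATS = {".jpg", ".jpeg", ".png"}


def list_images(directory):
    """Return the sorted paths of the image files directly inside directory."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in VALID_FORMATS
    )
```

Since imread() expects a string, you would call cv2.imread(str(path)) for each path returned by list_images("input").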

Summary

In this tutorial we learned how to build a simple document scanner with OpenCV. Of course the algorithm has its limitations but I tried to make this tutorial as simple as possible so that you don't feel overwhelmed.

For example, you can see that the quality of the scanned image is a bit poor. That's because we lose too much information when we resize the image.

Try keeping the original size of the image and you will get a better result.
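One way to do that without changing the detection step is to find the contour on the resized image and then map its corners back to the full-resolution image before warping. A minimal sketch, assuming both sizes are given as (width, height); scale_points is a hypothetical helper:

```python
import numpy as np


def scale_points(points, resized_size, original_size):
    """Map (x, y) points found on the resized image back to the original image."""
    points = np.asarray(points, dtype="float32").reshape(-1, 2)
    scale_x = original_size[0] / resized_size[0]
    scale_y = original_size[1] / resized_size[1]
    # x and y are scaled separately because the resize may not keep the aspect ratio
    return points * np.array([scale_x, scale_y], dtype="float32")
```

You would keep the image returned by imread() before resizing it, and pass that image together with scale_points(doc_cnts, (width, height), (original_width, original_height)) to four_point_transform.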

Also, you can apply adaptive thresholding at the final step to get a 'black and white' scanned image.

I got a lot of inspiration by following the tutorial How to Build a Kick-Ass Mobile Document Scanner in Just 5 Minutes. Great article! Thanks to Adrian.

As always, you can get the final code on GitHub at: https://github.com/Rouizi/learn-opencv-by-building-a-document-scanner

