OpenCV image recognition. Writing a script to search for books in images using Python and OpenCV

Basically, you need to detect circles. Have you looked at cvHoughCircles()? Are you allowed to use it?

This page has good information on how to detect things using OpenCV. You may be interested in section 2.5.

Here is a little demo I just wrote to detect the coins in this picture. I hope you can reuse some of the code.

Input:

Output:

// compiled with: g++ circles.cpp -o circles `pkg-config --cflags --libs opencv`
// (uses the legacy OpenCV 1.x/2.x C API, which was removed in OpenCV 4)
#include <stdio.h>
#include <cv.h>
#include <highgui.h>

int main(int argc, char** argv)
{
    if (argc < 2)
    {
        printf("usage: %s <image>\n", argv[0]);
        return -1;
    }

    IplImage* img = NULL;
    if ((img = cvLoadImage(argv[1])) == 0)
    {
        printf("cvLoadImage failed\n");
        return -1;
    }

    IplImage* gray = cvCreateImage(cvGetSize(img), IPL_DEPTH_8U, 1);
    CvMemStorage* storage = cvCreateMemStorage(0);
    cvCvtColor(img, gray, CV_BGR2GRAY);

    // This is done so as to prevent a lot of false circles from being detected
    cvSmooth(gray, gray, CV_GAUSSIAN, 7, 7);

    IplImage* canny = cvCreateImage(cvGetSize(img), IPL_DEPTH_8U, 1);
    IplImage* rgbcanny = cvCreateImage(cvGetSize(img), IPL_DEPTH_8U, 3);
    cvCanny(gray, canny, 50, 100, 3);

    CvSeq* circles = cvHoughCircles(gray, storage, CV_HOUGH_GRADIENT, 1, gray->height/3, 250, 100);
    cvCvtColor(canny, rgbcanny, CV_GRAY2BGR);

    for (int i = 0; i < circles->total; i++)
    {
        // each element is (x, y, radius) as floats; round them to ints
        float* p = (float*)cvGetSeqElem(circles, i);
        CvPoint center = cvPoint(cvRound(p[0]), cvRound(p[1]));
        int radius = cvRound(p[2]);

        // draw the circle center
        cvCircle(rgbcanny, center, 3, CV_RGB(0,255,0), -1, 8, 0);

        // draw the circle outline
        cvCircle(rgbcanny, center, radius+1, CV_RGB(0,0,255), 2, 8, 0);

        printf("x: %d y: %d r: %d\n", center.x, center.y, radius);
    }

    cvNamedWindow("circles", 1);
    cvShowImage("circles", rgbcanny);
    cvSaveImage("out.png", rgbcanny);
    cvWaitKey(0);

    return 0;
}

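Circle detection relies heavily on the parameters of cvHoughCircles(). Note that the Canny image in this demo is only used for display: cvHoughCircles() runs its own Canny edge detector internally, using param1 (250 here) as the upper threshold.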

I have to code an object detector (in this case, for a ball) using OpenCV. The problem is that every Google search returns something about FACE DETECTION. So I need help on where to start, what to use, etc.

Some information:

  • The ball does not have a fixed color; it will probably be white, but it can change.
  • I have to use machine learning; it doesn't have to be complex or robust. I'm thinking of KNN (it's simpler).
  • After all my searching, I found that computing a histogram of samples of just the balls and training an ML algorithm on it could be useful. My main concern is that the size of the ball can and will change (closer to and farther from the camera), and I have no idea what to pass to the ML algorithm for classification. I can't (or can I?) just check every pixel of the image for every possible window size (from, say, 5x5 up to WxH) and hope to find a positive result.
  • The background may be cluttered, e.g. people or cloth behind the ball.
  • As I said, I need to use an ML algorithm, which means no Haar cascades or Viola-Jones.
  • I was also thinking about using contours to find the circles in the Canny output; I just need a way to convert a contour into a data row (feature vector) for KNN training.

    So... suggestions?

    Thank you in advance. ;)

In this article, you will learn how to create a Python script to count the number of books in an image using OpenCV.

What are we going to do?

Let's take a look at the image in which we will look for books:

We can see that there are four books in the image, as well as distracting things such as a coffee mug, a Starbucks cup, some magnets, and a piece of candy.

Our goal is to find the four books in the image without identifying any other object as a book.

What libraries will we need?

To build a system for finding and detecting books in images, we will use OpenCV for computer vision and image processing. OpenCV's Python bindings also require NumPy, so make sure both libraries are installed!

Finding books in images using Python and OpenCV

Translator's note: You may notice that the source code in our article differs from the original code. The author probably installed the necessary libraries from system repositories; we suggest using pip, which is much easier. To avoid errors, we recommend using the version of the code given in our article.

Open your favorite code editor, create a new file named find_books.py, and let's start:

# -*- coding: utf-8 -*-
# import the necessary packages
import numpy as np
import cv2

# load the image, convert it to grayscale and blur it slightly
image = cv2.imread("example.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (3, 3), 0)
cv2.imwrite("gray.jpg", gray)

We start by importing NumPy and OpenCV. Loading an image from disk is handled by the cv2.imread function; we then convert the image from BGR (the channel order cv2.imread uses) to grayscale.

We also blur the image slightly to reduce high-frequency noise and improve the accuracy of our application. After running the code, the image should look like this:

We loaded the image from disk, converted it to grayscale, and blurred it a bit.

Now let's define the edges (i.e. outlines) of objects in the image:

# edge detection
edged = cv2.Canny(gray, 10, 250)
cv2.imwrite("edged.jpg", edged)

Our image now looks like this:

We have found the outlines of objects in the image. However, as you can see, some of the contours are not closed: there are gaps in them. To close the gaps between the white pixels of the image, we will use the morphological "closing" operation:

# create and apply a closing kernel
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (7, 7))
closed = cv2.morphologyEx(edged, cv2.MORPH_CLOSE, kernel)
cv2.imwrite("closed.jpg", closed)

Now the gaps in the outlines are closed.

The next step is to actually detect the outlines of objects in the image. To do this we will use the cv2.findContours function:

# find the contours in the image and initialize the book counter
# ([-2] picks the contour list in OpenCV 2.4, 3.x and 4.x alike)
cnts = cv2.findContours(closed.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
total = 0

Let's look at the geometry of the book.

The book is a rectangle. A rectangle has four vertices. Therefore, if we examine the outline and find that it has four vertices, then we can assume that it is a book and not another object in the image.

To check whether a contour is a book, we need to loop over each contour:

# loop over the contours
for c in cnts:
    # approximate (smooth) the contour
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)

    # if the approximated contour has 4 vertices, assume it is a book
    if len(approx) == 4:
        cv2.drawContours(image, [approx], -1, (0, 255, 0), 4)
        total += 1

For each of the contours, we calculate the perimeter using cv2.arcLength and then approximate (smooth) the contour using cv2.approxPolyDP .

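We approximate the contour because it may not be a perfect rectangle: due to noise and shadows in the photo, the contour of a book is unlikely to have exactly 4 vertices. cv2.approxPolyDP (an implementation of the Ramer-Douglas-Peucker algorithm) replaces the contour with a simpler polygon whose points stay within epsilon of the original, here 2% of the perimeter, so small bumps disappear and a noisy rectangle reduces to its 4 corners.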

Finally, we check whether the approximated contour has exactly four vertices. If so, we draw its outline on the image and increment the book counter.

Let's complete this example by showing the resulting image and the number of books found:

# print the number of books found and save the resulting image
print("I found {0} books in this picture.".format(total))
cv2.imwrite("output.jpg", image)

At this stage our image will look like this:

Let's sum it up

In this article, you learned how to find books in images with simple image processing and computer vision techniques, using Python and OpenCV.

Our approach was to:

  1. Load an image from disk and convert it to grayscale.
  2. Blur the image a little.
  3. Apply the Canny edge detector to find the edges of objects in the image.
  4. Close any gaps in the outlines.
  5. Find the outlines of objects in the image.
  6. Apply contour approximation to determine whether each contour is a rectangle and therefore a book.

You can download the script source code and image used in this article.

OpenCV is an open-source computer vision and machine learning library. It includes more than 2,500 algorithms, both classical and modern ones. The library has interfaces for various languages, including Python (which we use in this article), Java, C++ and MATLAB.

Installation

Installation instructions are available for Windows and for Linux.

Importing and viewing an image

import cv2

image = cv2.imread("./path/to/image.extension")
cv2.imshow("Image", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Note: An image read this way is stored in BGR channel order, not RGB (which most people are used to). This may not matter at first, but as soon as you start working with color, it is worth knowing. There are 2 solutions:

  1. Keep BGR and account for the swapped 1st and 3rd channels: the 1st channel is B (blue) and the 3rd is R (red), so red is (0, 0, 255), not (255, 0, 0).
  2. Convert the color space to RGB: rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    From then on, work with rgb_image instead of image.

Note: To close the window displaying the image, press any key. If you close it with the window's close button instead, the program may appear to freeze, because cv2.waitKey(0) keeps waiting for a key press.

Throughout the article, the following code will be used to display images:

import cv2

def viewImage(image, name_of_window):
    cv2.namedWindow(name_of_window, cv2.WINDOW_NORMAL)
    cv2.imshow(name_of_window, image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

Cropping

Doggie after cropping

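import cv2

# keep rows y..y+height and columns x..x+width
cropped = image[y:y + height, x:x + width]
viewImage(cropped, "Doggie after cropping")

Here (x, y) is the top-left corner of the region to keep, and width and height are its size. Because the image is a NumPy array, rows (y) are indexed first, then columns (x).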
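Because an OpenCV image in Python is just a NumPy array, cropping is plain slicing. A minimal sketch with a hypothetical blank image and crop box:

```python
import numpy as np

# hypothetical 600x800 three-channel image (height x width x channels)
image = np.zeros((600, 800, 3), dtype=np.uint8)

x, y, width, height = 100, 50, 300, 200   # hypothetical crop region
cropped = image[y:y + height, x:x + width]

print(cropped.shape)   # -> (200, 300, 3)
```

Slicing returns a view that shares memory with the original image, so drawing on cropped also changes image; call cropped.copy() if you need an independent copy.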

Resizing

After resizing by 20%

import cv2

scale_percent = 20  # percentage of the original size
width = int(image.shape[1] * scale_percent / 100)
height = int(image.shape[0] * scale_percent / 100)
dim = (width, height)
resized = cv2.resize(image, dim, interpolation=cv2.INTER_AREA)
viewImage(resized, "After resizing by 20%")

Because both dimensions are scaled by the same percentage, this code preserves the aspect ratio of the original image. Other image resizing functions are described in the OpenCV documentation.

Rotation

Doggie after rotating 180 degrees

import cv2

(h, w, d) = image.shape
center = (w // 2, h // 2)
M = cv2.getRotationMatrix2D(center, 180, 1.0)
rotated = cv2.warpAffine(image, M, (w, h))
viewImage(rotated, "Doggie after rotating 180 degrees")

image.shape returns the height, width and number of channels. M is the rotation matrix; here it rotates the image 180 degrees around its center. A negative angle rotates the image clockwise, and a positive angle counterclockwise.

Converting to grayscale and to black and white by thresholding

Doggie in grayscale

Black and white doggie

import cv2

gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
ret, threshold_image = cv2.threshold(gray_image, 127, 255, cv2.THRESH_BINARY)
viewImage(gray_image, "Doggie in grayscale")
viewImage(threshold_image, "Black and white doggie")

gray_image is a single-channel version of the image.

The threshold function returns the threshold that was used and an image in which all pixels with values up to and including 127 are set to 0, and all pixels brighter than 127 are set to 255.

For clarity, another example:

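ret, threshold = cv2.threshold(gray_image, 150, 200, cv2.THRESH_BINARY)

Here, every pixel brighter than 150 is set to 200, and everything else is set to 0. Note that the fourth argument is the thresholding type, not a value for dark pixels: with cv2.THRESH_BINARY, pixels at or below the threshold always become 0.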

The remaining threshold types are described in the OpenCV documentation.

Blur/Smooth

Blurred doggie

import cv2

blurred = cv2.GaussianBlur(image, (51, 51), 0)
viewImage(blurred, "Blurred Doggie")

The GaussianBlur call above passes 3 parameters:

  1. The original image.
  2. The kernel size, a tuple of 2 positive odd numbers. The larger the numbers, the stronger the smoothing.
  3. sigmaX, the standard deviation in the X direction (sigmaY can be passed as well and defaults to sigmaX). If left at 0, the value is calculated automatically from the kernel size.

Drawing rectangles

Draw a rectangle around the dog's face

import cv2

output = image.copy()
cv2.rectangle(output, (2600, 800), (4100, 2400), (0, 255, 255), 10)
viewImage(output, "Draw a rectangle around the dog's face")

This function takes 5 parameters:

  1. The image itself.
  2. Upper left corner coordinate (x1, y1) .
  3. Coordinate of the lower right corner (x2, y2) .
  4. Rectangle color (BGR/RGB depending on the selected color model).
  5. The line thickness of the rectangle.

Drawing lines

2 doggies separated by a line

import cv2

output = image.copy()
cv2.line(output, (60, 20), (400, 200), (0, 0, 255), 5)
viewImage(output, "2 doggies separated by a line")

The line function takes 5 parameters:

  1. The image itself on which the line is drawn.
  2. Coordinate of the first point (x1, y1) .
  3. Coordinate of the second point (x2, y2) .
  4. Line color (BGR/RGB depending on the selected color model).
  5. Line thickness.

Text on image

Image with text

import cv2

output = image.copy()
cv2.putText(output, "We<3 Dogs", (1500, 3600), cv2.FONT_HERSHEY_SIMPLEX, 15, (30, 105, 210), 40)
viewImage(output, "Image with text")

The putText function takes 7 parameters:

  1. The image itself.
  2. Text for the image.
  3. The lower-left corner coordinate of the beginning of the text (x, y).
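  4. The font type.
  5. The font scale (text size).
  6. The text color (BGR/RGB depending on the selected color model).
  7. The line thickness of the letters.

Face detection

Faces detected: 2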

import cv2

image_path = "./path/to/photo.extension"
# the cascade XML ships with OpenCV; pip installs of opencv-python keep it
# in the folder cv2.data.haarcascades
face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

image = cv2.imread(image_path)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

faces = face_cascade.detectMultiScale(
    gray,
    scaleFactor=1.1,
    minNeighbors=5,
    minSize=(10, 10)
)

faces_detected = "Faces detected: " + format(len(faces))
print(faces_detected)

# draw squares around the faces
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (255, 255, 0), 2)

viewImage(image, faces_detected)

detectMultiScale is a general-purpose function for detecting faces and other objects. To make it search specifically for faces, we give it the corresponding cascade.

The detectMultiScale call above passes 4 parameters:

  1. The image to process, in grayscale.
  2. scaleFactor. Some faces appear larger than others because they are closer to the camera, so the detector scans the image at several scales; scaleFactor is how much the image is shrunk at each step (1.1 means 10%).
  3. minNeighbors. The detector slides a window over the image, and a real face usually produces several overlapping candidate detections. minNeighbors sets how many neighboring candidates a detection needs in order to be kept. A value that is too small increases the number of false positives, while a value that is too large makes the detector so strict that it misses faces.
  4. minSize, the minimum size of a detected object; smaller windows are ignored.

Contours: object detection

Objects can be detected using color-based image segmentation. Two functions are useful for this: cv2.findContours and cv2.drawContours.

This article describes object detection using color segmentation in detail; everything you need is there.

Saving an image

import cv2

image = cv2.imread("./import/path.extension")
cv2.imwrite("./export/path.extension", image)

Conclusion

OpenCV is an excellent library of lightweight algorithms that can be used in 3D rendering, advanced image and video editing, tracking and identifying objects and people in video, finding identical images in a set, and much more.

This library is an essential tool for anyone developing machine learning projects that work with images.

The most important sources of information about the outside world for a robot are its optical sensors and cameras. After receiving the image, it is necessary to process it to analyze the situation or make a decision. As I said earlier, computer vision combines many methods of working with images. When the robot operates, it is assumed that video information from cameras is processed by some program running on the controller. To avoid writing code from scratch, you can use ready-made software solutions. Currently, there are many ready-made computer vision libraries:

  • Matrox Imaging Library
  • Camellia Library
  • Open eVision
  • HALCON
  • libCVD
  • OpenCV
  • etc…
These SDKs can vary greatly in functionality, licensing terms, and supported programming languages. We will look at OpenCV in more detail. It is free for both educational and commercial use. It is written in optimized C/C++, supports C, C++, Python and Java interfaces, and includes implementations of over 2,500 algorithms. In addition to standard image processing functions (filtering, blurring, geometric transformations, etc.), it can solve more complex problems, such as detecting an object in a photograph and “recognizing” it. Keep in mind that detection and recognition tasks can be quite different:
  • search and recognition of a specific object,
  • search for objects of the same category (without recognition),
  • recognition only (given an image that is known to contain the object).
To detect features in an image and check for a match, OpenCV has the following methods:
  • Histogram of Oriented Gradients (HOG) - can be used for pedestrian detection
  • Viola-Jones algorithm - used to search for faces
  • SIFT (Scale Invariant Feature Transform) feature detection algorithm
  • SURF (Speeded Up Robust Features) feature detection algorithm
For example, SIFT detects sets of keypoints that can be used to identify an object. Besides the methods above, OpenCV has other detection and recognition algorithms, as well as a set of machine learning algorithms such as k-nearest neighbors, neural networks, support vector machines, etc. In general, OpenCV provides tools sufficient for the vast majority of computer vision problems. If an algorithm is not included in the SDK, it can usually be implemented on top of it without much trouble. In addition, there are many user-written variants of algorithms based on OpenCV. Note also that OpenCV has grown a lot in recent years and has become somewhat heavyweight, so various groups of enthusiasts are creating “lightweight” libraries based on OpenCV, for example SimpleCV, liuliu ccv and tinycv.

Useful sites
  1. http://opencv.org/ - Main website of the project
  2. http://opencv.willowgarage.com/wiki/ - Old project website with documentation for old versions