Java – detect objects (words) in images
I want to detect an object (the city name) on a license plate. I have an image:
I want to detect whether the image contains the word "بابل":
I have tried template matching with OpenCV and with MATLAB, but the results are very poor when tested on other images.
I also read this page, but I could not work out what to do.
Can anyone help me, or walk me through solving this step by step? I have a license plate recognition project. We can already detect and recognize the digits, but I also need to detect and recognize these words (they are the same on most cars).
Solution
Your question is very broad, but I will do my best to explain optical character recognition (OCR) in a programming context and give you a general project workflow of the kind followed by successful OCR algorithms.
The problem you face is easier than most, because you don't have to recognize or distinguish between different characters; you only have to recognize a single image (assuming this is the only city you want to recognize). You are, however, subject to many of the limitations of any image recognition algorithm (quality, lighting, image variation).
What you need to do
1) Image isolation
You must isolate your image from the noisy background:
I think the best isolation technique is to isolate the license plate first, and then isolate the specific characters you are looking for. Important things to keep in mind during this step:
> Does the license plate always appear in the same place on the car?
> Is the car always in the same position when the image is taken?
> Is the word you are looking for always in the same place on the license plate?
The difficulty and implementation of this task depend largely on the answers to these three questions.
2) Image capture / preprocessing
This is a very important step for your particular implementation. Although it is possible, it is highly unlikely that your image will look like this:
because your camera would have to be directly in front of the license plate. More likely, your image will look something like this:
depending on the perspective from which the image was captured. Ideally, all of your images will be taken from the same vantage point, and you can apply a single transform so that they all look alike (or apply none at all). If your photos are taken from different vantage points, you need to transform them, or else you will be comparing two different images. Also, especially if you take images from only one vantage point and decide not to apply a transform, make sure that the text your algorithm is looking for has been transformed to the same point of view. If you don't, you will have a poor success rate that is difficult to debug or figure out.
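As a rough illustration of "applying one transform", the sketch below computes a perspective (homography) mapping from the four corners of a skewed plate to an upright rectangle. It is a minimal pure-Python sketch; the function names and the corner coordinates are made up for illustration, and in practice you would more likely use a library routine such as OpenCV's getPerspectiveTransform and warpPerspective.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """Return 8 coefficients mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # v = (h3*x + h4*y + h5) / (h6*x + h7*y + 1)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b)


def apply_homography(h, x, y):
    """Map one point (x, y) through the homography h."""
    d = h[6] * x + h[7] * y + 1
    return ((h[0] * x + h[1] * y + h[2]) / d,
            (h[3] * x + h[4] * y + h[5]) / d)


# Hypothetical plate corners as seen in a skewed photo, and where they should land.
skewed = [(10, 20), (110, 10), (120, 60), (5, 70)]
upright = [(0, 0), (100, 0), (100, 50), (0, 50)]
h = homography(skewed, upright)
```

Once `h` is known for a fixed camera position, the same coefficients can be reused for every image taken from that vantage point.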
3) Image optimization
You will probably want to (a) convert your images to black and white and (b) reduce the noise in your images. These two processes are called binarization and despeckling, respectively. There are many implementations of these algorithms available in many different languages, most of them findable with a Google search. You can batch process your images with any language or free tool if you want, or find an implementation that works with whatever language you decide to work in.
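To make the two terms concrete, here is a minimal pure-Python sketch of both steps on a grayscale image stored as rows of 0-255 integers. The fixed threshold of 128 and the 3x3 median filter are assumptions chosen for simplicity, not the only (or best) way to do either step; real projects often use an adaptive threshold and a library filter.

```python
from statistics import median


def despeckle(img):
    """3x3 median filter: each interior pixel becomes the median of its
    neighborhood, which removes isolated specks. Border pixels are left as-is."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out


def binarize(gray, threshold=128):
    """Map grayscale pixels to 0/1, where 1 marks dark 'ink' pixels."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


# A white 5x5 image with a single dark speck in the middle.
noisy = [[255] * 5 for _ in range(5)]
noisy[2][2] = 0
cleaned = binarize(despeckle(noisy))
```

Despeckling before binarizing follows the order used in the high-level flow later in this answer; a median filter works on either grayscale or 0/1 data.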
4) Pattern recognition
If you only want to search for the name of this one city (a single word), you will most likely want to implement a matrix matching strategy. Many people also refer to matrix matching as pattern recognition, so you may have heard of it in that context. This is an excellent paper detailing an algorithmic implementation, which should help you a great deal if you choose to use matrix matching. The other available approach is feature extraction, which attempts to identify words based on patterns within the letters (i.e. loops, curves, lines). You might use that method if the font style of the word on the license plate ever changes, but if the same font is always used, I think matrix matching will give the best results.
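A minimal sketch of matrix matching, assuming both the plate image and the city-name template have already been binarized to 0/1 grids at the same scale: slide the template over the image and score each position by the fraction of pixels that agree. The function names and the 0.9 cutoff are illustrative assumptions, not taken from the paper mentioned above.

```python
def match_score(window, template):
    """Fraction of pixels that agree between two equally sized 0/1 grids."""
    total = agree = 0
    for wrow, trow in zip(window, template):
        for w, t in zip(wrow, trow):
            total += 1
            agree += (w == t)
    return agree / total


def best_match(image, template):
    """Slide the template over the image; return (score, x, y) of the best spot."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best = (0.0, 0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            window = [row[x:x + tw] for row in image[y:y + th]]
            score = match_score(window, template)
            if score > best[0]:
                best = (score, x, y)
    return best


def contains_word(image, template, min_score=0.9):
    """True if some position matches the template at least min_score well."""
    return best_match(image, template)[0] >= min_score


# Toy data: a 2x2 "word" template placed at x=1, y=2 in a 4x4 image.
template = [[1, 0],
            [0, 1]]
image = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0]]
score, x, y = best_match(image, template)
```

This only works well when the template and the image share scale and perspective, which is why the normalization steps above matter so much.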
5) Algorithm training
Depending on the approach you take (if you use a learning algorithm), you may need to train your algorithm with labeled data. This means that you have a series of images that you have identified as True (contains the city name) or False (does not). Here is a pseudocode example of how this works:
train = [(img1, True), (img2, True), (img3, False), (img4, False)]
img_recognizer = algorithm(train)
Then, you apply your trained algorithm to identify untagged images:
test_untagged = [img5, img6, img7]
for image in test_untagged:
    img_recognizer(image)
Your training set should be much larger than four data points; generally speaking, the bigger the better. As mentioned earlier, make sure that all of the images have had the same transformation applied.
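As one very simple stand-in for "algorithm(train)", the sketch below learns a single cutoff on a match score from labeled examples and returns a recognizer that applies it. It assumes you have already reduced each training image to one similarity score (for example, a template match score); the function names and the sample scores are hypothetical, and a real learning algorithm would usually be more sophisticated.

```python
def train_threshold(labeled_scores):
    """labeled_scores: list of (score, has_city) pairs.
    Returns a cutoff halfway between the weakest positive and strongest negative."""
    pos = [s for s, label in labeled_scores if label]
    neg = [s for s, label in labeled_scores if not label]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    return (min(pos) + max(neg)) / 2


def make_recognizer(labeled_scores):
    """Train on labeled scores and return a function score -> True/False."""
    cutoff = train_threshold(labeled_scores)
    return lambda score: score >= cutoff


# Hypothetical match scores for four labeled training images.
train_scores = [(0.95, True), (0.90, True), (0.60, False), (0.55, False)]
recognizer = make_recognizer(train_scores)
```

If the positive and negative scores overlap, a single cutoff cannot separate them, which is usually a sign that the preprocessing steps need more work or that a richer model is required.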
Here is a very, very high-level code flow that may help you in implementing your algorithm:
img_in = capture_image()
cropped_img = isolate(img_in)
scaled_img = normalize_scale(cropped_img)
img_desp = despeckle(scaled_img)
img_final = binarize(img_desp)
# train
match = train_match(training_set)
boolCity = match(img_final)
The processes above have been implemented many times and are thoroughly documented in many languages. Here are some implementations in the languages tagged in your question:
> Pure Java
> cvblob in OpenCV (check this tutorial and this blog post)
> Tesseract OCR in C
> MATLAB OCR
Good luck!