Friday, July 14, 2017

Can Robotic Vision Guide a Robot Down a Row of an Orchard?

Based on this color-near infrared image, is this robot driving down the middle of the orchard row? Can the robot determine how much and in what direction it must adjust its driving path? Image from the NNU Robotics Vision Lab. 


Orchard work is labor intensive, and labor costs money. To keep costs down, agriculture, like manufacturing, is trying to automate its processes. In agriculture, automation means things like programming robots to drive down the rows between trees in an orchard to inspect the fruit or spray the trees. For a robot to drive through an orchard without crashing into trees, it must first recognize the trees in its camera imagery, determine its location from that image, and then plan a driving path. I was given an opportunity to analyze an image recorded by the camera system of a robot built by NNU and see what I could come up with. Here's what I did, using ImageJ to analyze the image above (as told by the images generated in each step). With a pinch of luck, this will help robots see the trees for the orchard (forest).


First, crop the image. I took roughly the center third of the image; my method doesn't seem to care exactly how much of the image is cropped, as long as the crop includes the tree trunks.
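I did this step in ImageJ, but for anyone who prefers to script it, here's a rough Python equivalent using scikit-image. The filename is a placeholder, and I'm assuming the middle third is taken vertically:

```python
from skimage import io

# "orchard.png" is a placeholder for the robot's color-NIR image.
image = io.imread("orchard.png")               # shape: (rows, cols, channels)
rows = image.shape[0]
cropped = image[rows // 3: 2 * rows // 3]      # keep the middle third (vertically)
```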


Next, split apart the three color channels and retain just the near infrared. Notice that the tree trunks appear very dark compared to the leaves, grass, and even the sky. On a cloudy day, the sky should appear even brighter, which makes the next step even easier.
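Pulling out a single channel is one line in Python. Which channel holds the NIR band depends on how the camera packs the composite, so the index below is an assumption:

```python
# I assume the NIR band sits in the first channel of the composite;
# swap the index if the camera packs it differently.
nir = cropped[:, :, 0]
```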
The image is then segmented by setting a threshold using the Otsu method. Segmenting an image means finding a threshold value at which to split the image into either black or white pixels. In this case, I chose to invert the image by isolating the high end of the histogram. I suspect one could invert the image first and then let the Otsu method segment the image however it determines is best.
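In Python, scikit-image's threshold_otsu does the same job. Keeping the pixels above the threshold isolates the high end of the histogram, as described:

```python
from skimage.filters import threshold_otsu

t = threshold_otsu(nir)
segmented = nir > t    # True = bright vegetation/sky, False = dark trunks
```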


After the image is segmented, it's filtered to remove the more distant trees and grasses. The filter works by making each pixel as bright as the brightest pixel in its ten-pixel neighborhood, which is why it's called maximum filtering.
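SciPy's maximum filter sketches the same idea; the ten-pixel window below is my reading of the filter size, not an exact match for ImageJ's radius setting:

```python
import numpy as np
from scipy.ndimage import maximum_filter

# Every pixel takes the value of the brightest pixel in a 10-pixel window,
# erasing small dark specks (distant trees, grass) while thick trunks survive.
filtered = maximum_filter(segmented.astype(np.uint8), size=10)
```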

Now the image is scaled. The scaling shrinks the x-axis by a factor of two (a scaling factor of 0.5) and stretches the y-axis by a factor of four. So essentially, the image is stretched vertically and shrunk horizontally. It's shrunk along the x-axis to keep the image size from growing too large.
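scikit-image can apply both scale factors in one call; this is a sketch of the same operation:

```python
from skimage.transform import rescale

# scale=(y, x): stretch rows by 4, shrink columns by half.
# order=0 (nearest neighbor) keeps the pixel values binary.
scaled = rescale(filtered, scale=(4.0, 0.5), order=0,
                 anti_aliasing=False, preserve_range=True)
```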


After scaling, the image is cropped to keep just the middle third. The stretching and cropping are then repeated a second time.
This is what the image looks like after the second round of scaling and cropping.
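Restarting from the filtered image, the whole scale-and-crop cycle fits in a small loop (again assuming the middle third is taken vertically):

```python
from skimage.transform import rescale

img = filtered
for _ in range(2):                  # two rounds of scaling and cropping
    img = rescale(img, scale=(4.0, 0.5), order=0,
                  anti_aliasing=False, preserve_range=True)
    r = img.shape[0]
    img = img[r // 3: 2 * r // 3]   # keep the middle third (vertically)
```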


Now the image is converted to a binary image. In other words, each pixel is just a 1 or a 0.
In this step, the image is skeletonized: each region is reduced to a one-pixel-wide line running through its middle. Notice that this process has turned the tree trunks visible in the near infrared image into a series of vertical black lines.
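In Python, scikit-image's skeletonize() does the thinning. It treats True pixels as the object, so the dark trunks have to be flipped to True first:

```python
import numpy as np
from skimage.morphology import skeletonize

binary = img.astype(bool)       # each pixel is now just a 1 or a 0
trunks = ~binary                # skeletonize() wants the trunks as True pixels
skeleton = skeletonize(trunks)  # one-pixel-wide centerlines of each trunk
display = ~skeleton             # flip back: black lines on a white background
```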
Now the robot's vision system just needs to detect the black lines across the image. I think the reference line can be taken at any height in the image.
Final Thoughts
My feeling is that the vision system should record the x-axis location of each black line as it scans across the image. Since the lines are a single pixel wide, the location of each black line becomes a single number. Each location must be taken relative to the center of the image (in other words, the origin of the horizontal sampling line is the center of the image). Pixel locations left of center have negative values, and pixel locations right of center have positive values. Now add these values together. The sum indicates where the center of the tree rows sits relative to the center of the image. If the sum is positive, the robot needs to drive forward and to the right. If the sum is negative, the robot needs to drive forward and to the left. And of course, if the sum is zero, the robot just needs to drive straight ahead. The absolute value of the sum indicates just how far off center the robot is.
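Here's that steering rule as a small Python function. steering_offset is my name for it, and sampling the middle row is an arbitrary choice:

```python
import numpy as np

def steering_offset(skeleton, row):
    """Sum of trunk-line x-positions relative to the image center."""
    xs = np.flatnonzero(skeleton[row])    # x-locations of the one-pixel lines
    center = skeleton.shape[1] / 2.0
    return np.sum(xs - center)

offset = steering_offset(skeleton, skeleton.shape[0] // 2)
if offset > 0:
    print("drive forward and to the right")
elif offset < 0:
    print("drive forward and to the left")
else:
    print("drive straight ahead")
```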




For higher accuracy and certainty, the robotic vision system might want to take measurements across the final skeletonized image at several row heights.
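That could look something like this, reusing the steering_offset function from above and averaging over a handful of evenly spaced rows:

```python
import numpy as np

# Average the offset over five evenly spaced rows of the skeleton.
sample_rows = np.linspace(0, skeleton.shape[0] - 1, num=5, dtype=int)
mean_offset = np.mean([steering_offset(skeleton, r) for r in sample_rows])
```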
