Robot vision review Martin Jagersand
What is Computer Vision? Three related fields: Image Processing changes 2D images into other 2D images; Computer Graphics takes 3D models and renders 2D images; Computer Vision extracts scene information from 2D images and video, e.g. geometry (where something is in 3D) and objects (what something is). What information is in a 2D image? What information do we need for 3D analysis? Computer vision is hard; only special cases can be solved.
Machine Vision. Camera vision in general 3D environments is hard. Machine vision: use an engineered environment, use 2D when possible, use special markers/LEDs; you can buy a working system! Photogrammetry: outdoor 3D surveying using cameras.
Vision. Full: human vision; we don't know how it works in detail. Limited vision: machines, robots, AI. What is the most basic useful information we can get from a camera? 1. Location of a dot (LED/marker): [u,v] = f(I). 2. Segmentation of object pixels. All of these are 2D image-plane measurements! What is the best camera location? Usually overhead, pointing straight down. Adjust the camera position so that pixel [u,v] = s[x,y]: pixel coordinates are scaled world coordinates.
Tracking LEDs / special markers. Put the camera overhead, pointing straight down on the worktable, and adjust its position so that pixel [u,v] = s[x,y]: pixel coordinates are scaled world coordinates. Lower the brightness so the LED is the brightest point in the image, and put the LED on the robot end-effector. Detection algorithm: threshold the brightest pixels, I(u,v) > 200, then find the centroid [u,v] of the thresholded pixels. Variations: a blinking LED can enhance detection in ambient light; different-color LEDs can be detected separately from the R, G, B channels of color video.
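A minimal sketch of this detection step, assuming the camera frame is available as a grayscale NumPy array; the threshold value 200 is taken from the slide, and the function name is illustrative:

import numpy as np

def detect_led(gray_image, threshold=200):
    """Threshold the brightest pixels and return the centroid [u, v] in pixels."""
    mask = gray_image > threshold             # I(u,v) > 200
    vs, us = np.nonzero(mask)                 # row (v) and column (u) indices of bright pixels
    if us.size == 0:
        return None                           # no LED visible
    return np.array([us.mean(), vs.mean()])   # centroid of the thresholded blob

# With the camera overhead and [u,v] = s[x,y], world coordinates follow by dividing by s:
# x, y = detect_led(img) / s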
Commercial tracking systems: Polaris Vicra infra-red system (Northern Digital Inc.); MicronTracker visible-light system (Claron Technology Inc.)
Commercial tracking system Images acquired by the Polaris Vicra infra-red stereo system: left image right image
IMAGE SEGMENTATION How many objects are there in the image below? Assuming the answer is 4, what exactly defines an object?
8-BIT GRAYSCALE IMAGE [figure: image with gray-level scale 0-200]
GRAY LEVEL THRESHOLDING [figure: gray-level histogram; the threshold is set between the object gray levels and the background]
BINARY IMAGE
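A small sketch of the thresholding step that produces the binary image; the threshold value and the assumption of dark objects on a bright background are illustrative (in practice the threshold is set in the valley of the gray-level histogram):

import numpy as np

def to_binary(gray_image, threshold=128):
    """Gray-level thresholding: object pixels -> 1, background -> 0."""
    # Assumes dark objects on a bright background; flip the comparison for bright objects.
    return (gray_image < threshold).astype(np.uint8)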
CONNECTED COMPONENT LABELING: FIRST PASS [figure: provisional labels A, B, C assigned pixel by pixel; equivalence recorded: B = C]
CONNECTED COMPONENT LABELING: SECOND PASS [figure: equivalent labels merged, C replaced by B] TWO OBJECTS!
IMAGE SEGMENTATION: CONNECTED COMPONENT LABELING. 4-neighbor connectivity vs. 8-neighbor connectivity of pixel P_i,j.
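A sketch of the two-pass labeling algorithm for a binary image with 4-neighbor connectivity; this is a minimal hand-rolled version (libraries such as scipy.ndimage.label provide the same result):

import numpy as np

def label_4conn(binary):
    """Two-pass connected component labeling, 4-neighbor connectivity."""
    labels = np.zeros(binary.shape, dtype=int)
    parent = {}                                    # equivalence table (union-find)

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    next_label = 1
    rows, cols = binary.shape
    # First pass: assign provisional labels and record equivalences (e.g. B = C).
    for i in range(rows):
        for j in range(cols):
            if not binary[i, j]:
                continue
            up = labels[i - 1, j] if i > 0 else 0
            left = labels[i, j - 1] if j > 0 else 0
            neighbors = [l for l in (up, left) if l > 0]
            if not neighbors:
                labels[i, j] = next_label
                parent[next_label] = next_label
                next_label += 1
            else:
                labels[i, j] = min(neighbors)
                if len(neighbors) == 2 and up != left:
                    parent[find(max(neighbors))] = find(min(neighbors))
    # Second pass: replace each provisional label by its equivalence-class representative.
    for i in range(rows):
        for j in range(cols):
            if labels[i, j]:
                labels[i, j] = find(labels[i, j])
    return labels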
What are some examples of form parameters that would be useful in identifying the objects in the image below?
OBJECT RECOGNITION: BLOB ANALYSIS. Examples of form parameters that are invariant with respect to position, orientation, and scale: number of holes in the object; compactness or complexity, (Perimeter)^2 / Area; moment invariants. All of these parameters can be evaluated during contour following.
GENERALIZED MOMENTS. Shape features or form parameters provide a high-level description of objects or regions in an image. For a digital image of size n by m pixels: M_ij = Σ_{x=1..n} Σ_{y=1..m} x^i y^j f(x, y). For binary images the function f(x, y) takes the value 1 for pixels belonging to the class "object" and 0 for the class "background".
GENERALIZED MOMENTS. M_ij = Σ_{x=1..n} Σ_{y=1..m} x^i y^j f(x, y). [figure: example blob on a small pixel grid, X = 0..9, Y = 1..6]
i  j  M_ij
0  0    7   (Area)
1  0   33
0  1   20
2  0  159
0  2   64
1  1   93   (the second-order moments are the moments of inertia)
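The generalized moment M_ij of a binary image can be computed directly from the definition above; a minimal sketch (pixel coordinates are 0-based here, while the slide's example uses 1-based indexing):

import numpy as np

def moment(binary, i, j):
    """Generalized moment M_ij = sum_x sum_y x^i y^j f(x, y) for a binary image."""
    ys, xs = np.nonzero(binary)        # coordinates of object pixels, where f = 1
    return np.sum((xs ** i) * (ys ** j))

# M_00 is the area; M_10 and M_01 are the first moments used for the center of mass.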
SOME USEFUL MOMENTS. The center of mass of a region can be defined in terms of generalized moments as follows: X̄ = M_10 / M_00, Ȳ = M_01 / M_00.
SOME USEFUL MOMENTS. The moments of inertia relative to the center of mass can be determined by applying the general form of the parallel axis theorem: M'_20 = M_20 - M_10^2 / M_00, M'_02 = M_02 - M_01^2 / M_00, M'_11 = M_11 - M_10 M_01 / M_00.
SOME USEFUL MOMENTS. The principal axis of an object is the axis passing through the center of mass which yields the minimum moment of inertia. This axis forms an angle θ with respect to the X axis and is useful in robotics for determining the orientation of randomly placed objects: tan 2θ = 2 M'_11 / (M'_20 - M'_02).
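Putting these formulas together, a sketch that recovers the position and orientation of a blob from its moments; it reuses the moment() helper sketched above, and atan2 keeps the correct quadrant for 2θ:

import numpy as np

def blob_pose(binary):
    """Center of mass and principal-axis angle (radians) of a binary region."""
    m00 = moment(binary, 0, 0)
    xbar = moment(binary, 1, 0) / m00
    ybar = moment(binary, 0, 1) / m00
    # Central moments via the parallel axis theorem
    mu20 = moment(binary, 2, 0) - m00 * xbar ** 2
    mu02 = moment(binary, 0, 2) - m00 * ybar ** 2
    mu11 = moment(binary, 1, 1) - m00 * xbar * ybar
    theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)   # tan(2θ) = 2 M'_11 / (M'_20 - M'_02)
    return xbar, ybar, theta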
Example [figure: object with its center of mass and principal axis drawn in the X-Y image frame]
3D MACHINE VISION SYSTEM [figure: XY table, laser projector, and digital camera over a granite surface plate; the camera's field of view intersects the plane of laser light at the measured point P(x,y,z)]
3D MACHINE VISION SYSTEM
Morphological processing: Erosion
Dilation
Opening: erosion followed by dilation, denoted A ∘ B = (A ⊖ B) ⊕ B. Eliminates protrusions, breaks necks, smoothes the contour.
Opening
Opening: A ∘ B = (A ⊖ B) ⊕ B. Equivalently, A ∘ B = ∪ { (B)_z : (B)_z ⊆ A }, the union of all translates of B that fit inside A.
Closing: dilation followed by erosion, denoted A • B = (A ⊕ B) ⊖ B. Smoothes the contour, fuses narrow breaks and long thin gulfs, eliminates small holes, fills gaps in the contour.
Closing: A • B = (A ⊕ B) ⊖ B
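A sketch of opening and closing for binary images, assuming SciPy is available (scipy.ndimage provides binary_erosion and binary_dilation); the 3x3 square structuring element is an arbitrary choice:

import numpy as np
from scipy import ndimage

structure = np.ones((3, 3), dtype=bool)     # structuring element B (3x3 square, an assumption)

def opening(A):
    """A ∘ B = (A ⊖ B) ⊕ B : erosion followed by dilation."""
    return ndimage.binary_dilation(ndimage.binary_erosion(A, structure), structure)

def closing(A):
    """A • B = (A ⊕ B) ⊖ B : dilation followed by erosion."""
    return ndimage.binary_erosion(ndimage.binary_dilation(A, structure), structure)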
Image Processing. Filtering: noise suppression, edge detection.
Image filtering: mean filter. The 3x3 mean filter h has all weights equal to 1/9 ≈ 0.11. Convolving the input image f (mostly zero, with isolated bright pixels of 200 and 100) with h gives the output image g, in which each bright pixel is spread over its 3x3 neighborhood (values of about 22 and 11, and 33 where the neighborhoods overlap).
Animation of convolution: to accomplish convolution of the whole image, we just slide the mask over the original image, computing one output pixel at each position. [figure: 3x3 mask m sliding over the image pixels p_i,j to produce the convolved values c_i,j]
Gaussian filter: the filter weights follow a Gaussian (they can be computed empirically). Convolving the input image f with the filter h gives the output image g: g = f * h.
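A sketch of smoothing by convolution, assuming SciPy's convolve2d; the 3x3 mean kernel has weights 1/9 ≈ 0.11 as on the slide, and the Gaussian kernel weights are built from the Gaussian function (kernel size and σ are arbitrary choices here):

import numpy as np
from scipy.signal import convolve2d

mean_kernel = np.full((3, 3), 1.0 / 9.0)          # all weights ≈ 0.11

def gaussian_kernel(size=5, sigma=1.0):
    """Square Gaussian kernel, normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def smooth(f, h):
    """g = f * h, output the same size as the input image."""
    return convolve2d(f, h, mode='same', boundary='symm')

# g_mean = smooth(f, mean_kernel); g_gauss = smooth(f, gaussian_kernel())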
Convolution for noise elimination: the noise is eliminated, but the operation causes loss of sharp edge definition. In other words, the image becomes blurred.
Convolution with a non-linear mask: median filtering. There are many masks used for noise elimination; the median mask is a typical one. The principle is to take a sub-image (the pixel's neighborhood) and use the median of its values as the pixel's value in the new image. Example 3x3 neighborhood from the original image:
23  65  64
120 187  90
47 209  72
Ranked: 23, 47, 64, 65, 72, 90, 120, 187, 209; the median is 72.
Median Filtering
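A sketch of median filtering over 3x3 neighborhoods, written out by hand to show the ranking step; SciPy also provides this directly as scipy.ndimage.median_filter:

import numpy as np

def median_filter3(image):
    """Replace each pixel by the median of its 3x3 neighborhood (borders left unchanged)."""
    out = image.copy()
    rows, cols = image.shape
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            window = image[i - 1:i + 2, j - 1:j + 2].ravel()
            out[i, j] = np.sort(window)[4]        # middle of the 9 ranked values
    return out

# For the slide's example window [23 65 64; 120 187 90; 47 209 72] the ranked values are
# 23, 47, 64, 65, 72, 90, 120, 187, 209 and the median is 72.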
Edge detection using the Sobel operator. In practice, it is common to use the two 3x3 kernels
Gx = [ -1 0 1 ; -2 0 2 ; -1 0 1 ],   Gy = [ -1 -2 -1 ; 0 0 0 ; 1 2 1 ].
Magnitude: sqrt(gx^2 + gy^2). Orientation: θ = atan2(gy, gx).
Sobel operator [figure panels: original image, gradient magnitude, gradient orientation]
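A sketch of the Sobel operator using the two kernels above, again via 2-D convolution (a hand-rolled version of what e.g. scipy.ndimage.sobel provides; note that true convolution flips the kernels, which only changes the sign of the gradients, not the magnitude):

import numpy as np
from scipy.signal import convolve2d

Gx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])   # horizontal gradient kernel
Gy = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])   # vertical gradient kernel

def sobel(image):
    gx = convolve2d(image, Gx, mode='same', boundary='symm')
    gy = convolve2d(image, Gy, mode='same', boundary='symm')
    magnitude = np.hypot(gx, gy)                       # sqrt(gx^2 + gy^2)
    orientation = np.arctan2(gy, gx)                   # gradient direction
    return magnitude, orientation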
Effects of noise on edge detection / derivatives Consider a single row or column of the image Plotting intensity as a function of position gives a signal Where is the edge?
Solution: smooth first. Where is the edge? Look for peaks in the derivative of the smoothed signal, d/dx (g * f).
Vision-Based Control (Visual Servoing). The user specifies a desired image; the controller moves the robot from the initial image toward it.
Vision-Based Control (Visual Servoing). y: current image features; y*: desired image features. [animation frames: the current features converge to the desired features]
Image-Space Error. y: current image features; y*: desired image features. One point (pixel coordinates y = [u, v]^T): E = [y - y*]. Many points: stack the coordinates, E = [y_1 ... y_16]^T - [y_1* ... y_16*]^T.
Other (easier) solution: image-based motion control. Note: what follows will work for one or two (or 3..n) cameras, either fixed in the workspace or mounted on the robot (eye-in-hand); here we will use two cameras. Motor-visual function: y = f(x). Achieving 3D tasks via 2D image control.
Vision-Based Control Image 1 Image 2
Image-based Visual Servoing. Observed features: y = [y_1 y_2 ... y_m]^T. Motor joint angles: x = [x_1 x_2 ... x_n]^T. Local linear model: Δy = J Δx. Visual servoing steps: 1. Solve y* - y_k = J Δx for Δx. 2. Update x_{k+1} = x_k + Δx.
Find J, Method 1: test movements along basis directions. Remember: J is an unknown m x n matrix,
J = [ ∂f_1/∂x_1 ... ∂f_1/∂x_n ; ... ; ∂f_m/∂x_1 ... ∂f_m/∂x_n ].
Assume test movements Δx_1 = [1, 0, ..., 0]^T, Δx_2 = [0, 1, ..., 0]^T, ..., Δx_n = [0, 0, ..., 1]^T. Finite difference: Ĵ ≈ [ Δy_1  Δy_2  ...  Δy_n ], each column being the observed feature change for the corresponding test movement.
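A sketch of Method 1, assuming hypothetical helpers move_to(x) and observe_features() that command the robot and return the current feature vector; the step size is an arbitrary choice:

import numpy as np

def estimate_jacobian_fd(x0, move_to, observe_features, step=0.05):
    """Estimate J column by column with small test movements along each joint axis."""
    n = x0.size
    move_to(x0)
    y0 = observe_features()
    J = np.zeros((y0.size, n))
    for k in range(n):
        dx = np.zeros(n)
        dx[k] = step                                  # Δx_k = step * e_k (basis direction)
        move_to(x0 + dx)
        J[:, k] = (observe_features() - y0) / step    # finite-difference column Δy_k / step
    move_to(x0)                                       # return to the starting configuration
    return J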
Find J, Method 2: secant constraints. A constraint along a line, Δy = J Δx, defines m equations. Collect n arbitrary, but different, motion/measurement pairs and solve for J:
[ Δy_1^T ; Δy_2^T ; ... ; Δy_n^T ] = [ Δx_1^T ; Δx_2^T ; ... ; Δx_n^T ] J^T
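Method 2 stacks the secant constraints from n (or more) recorded motion pairs and solves the linear system above for J^T, here in the least-squares sense; a sketch where the rows of dX are the Δx^T and the rows of dY are the Δy^T:

import numpy as np

def estimate_jacobian_secant(dX, dY):
    """Solve dY = dX @ J.T for J from stacked motion/feature increments.

    dX : (n_moves, n) array of joint increments Δx^T, one per row
    dY : (n_moves, m) array of feature increments Δy^T, one per row
    """
    Jt, *_ = np.linalg.lstsq(dX, dY, rcond=None)   # least-squares solution for J^T
    return Jt.T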
Find J, Method 3: recursive secant constraints. Based on an initial Ĵ_k and one measurement pair Δy, Δx, adjust J so that Δy = Ĵ_{k+1} Δx. Rank-1 (Broyden) update:
Ĵ_{k+1} = Ĵ_k + (Δy - Ĵ_k Δx) Δx^T / (Δx^T Δx)
Considered in rotated coordinates, the update is the same as the finite-difference estimate for n orthogonal moves.
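Method 3 is the rank-1 secant update written out above; a one-line sketch:

import numpy as np

def broyden_update(J, dx, dy):
    """Rank-1 secant update: J_{k+1} = J_k + (Δy - J_k Δx) Δx^T / (Δx^T Δx)."""
    return J + np.outer(dy - J @ dx, dx) / (dx @ dx)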
Achieving visual goals: Uncalibrated Visual Servoing. 1. Solve for motion: [y* - y_k] = J Δx. 2. Move robot joints: x_{k+1} = x_k + Δx. 3. Read the actual visual move Δy. 4. Update the Jacobian: Ĵ_{k+1} = Ĵ_k + (Δy - Ĵ_k Δx) Δx^T / (Δx^T Δx). Can we always guarantee when a task is achieved/achievable?
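Putting the four steps together, a sketch of the uncalibrated visual servoing loop; move_to and observe_features are the same hypothetical robot/camera interfaces as above, and the damping gain and tolerance are arbitrary choices:

import numpy as np

def visual_servo(x, J, y_star, move_to, observe_features,
                 gain=0.3, tol=1.0, max_iters=200):
    """Drive the image error y* - y to zero while updating J on line."""
    move_to(x)
    y = observe_features()
    for _ in range(max_iters):
        error = y_star - y
        if np.linalg.norm(error) < tol:                          # task achieved (within tolerance)
            break
        dx = gain * np.linalg.lstsq(J, error, rcond=None)[0]     # 1. solve J Δx = y* - y
        x = x + dx                                               # 2. move robot joints
        move_to(x)
        y_new = observe_features()
        dy = y_new - y                                           # 3. read actual visual move
        J = J + np.outer(dy - J @ dx, dx) / (dx @ dx)            # 4. Broyden Jacobian update
        y = y_new
    return x, J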
Visually guided motion control. Issues: 1. What tasks can be performed? Camera models, geometry, visual encodings. 2. How to do vision-guided movement? Hand-eye transform estimation, feedback, feedforward motion control. 3. How to plan, decompose and perform whole tasks?
How to specify a visual task?
Task and Image Specifications. Task function T(x) = 0: task-space error, with the task satisfied when T(x) = 0. Image encoding E(y) = 0: image-space error, satisfied when E(y) = 0.
Visual specifications: Point to Point. Task error: E = [y - y_0]. With several tracked points, stack the coordinates: E = [y_1 ... y_16]^T - [y_1^0 ... y_16^0]^T. Why 16 elements?
Visual specifications 2: Point to Line. E_pl(y, l) = [ y_l · l_l ; y_r · l_r ], with the image line in each camera formed from two tracked points, l_l = y_3 x y_1, l_r = y_4 x y_2, and y_l, y_r the observed point in the left and right image. Note: y in homogeneous coordinates.
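A sketch of the point-to-line error in one camera: the line through two tracked points is their cross product in homogeneous coordinates, and the error is the dot product of the free point (also homogeneous) with that line; stacking one such scalar per camera gives E_pl. The variable names in the usage comment are illustrative:

import numpy as np

def homogeneous(p):
    """[u, v] -> [u, v, 1]."""
    return np.append(p, 1.0)

def point_to_line_error(point, line_pt_a, line_pt_b):
    """Signed incidence error of `point` with the image line through a and b."""
    line = np.cross(homogeneous(line_pt_a), homogeneous(line_pt_b))   # l = a x b
    return homogeneous(point) @ line                                  # y · l, zero when on the line

# E_pl stacks one such scalar per camera, e.g.
# E = np.array([point_to_line_error(y_left, y3_left, y1_left),
#               point_to_line_error(y_right, y4_right, y2_right)])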
Parallel composition example. E_wrench(y) = [ y_2 - y_5 ; y_4 - y_7 ; y_6 · (y_1 x y_2) ; y_8 · (y_3 x y_4) ] (plus e.p. checks), stacking point-to-point and point-to-line constraints.
Visual Specifications, additional examples: line to line; point to conic (identify the conic C from 5 points on the rim, error function y^T C y); image moments on segmented images; any image feature vector that encodes pose; etc.
Achieving visual goals: Uncalibrated Visual Servoing (recap). 1. Solve for motion: [y* - y_k] = J Δx. 2. Move robot joints: x_{k+1} = x_k + Δx. 3. Read the actual visual move Δy. 4. Update the Jacobian: Ĵ_{k+1} = Ĵ_k + (Δy - Ĵ_k Δx) Δx^T / (Δx^T Δx). Can we always guarantee when a task is achieved/achievable?