Machine vision is concerned with the sensing of vision data and its interpretation by a computer. The typical vision system consists of the camera and digitizing hardware, a digital computer, and hardware and software necessary to interface them.
This interface hardware and software is often referred to as a preprocessor.
The operation of the vision system consists of three functions:
- Sensing and digitizing image data
- Image processing and analysis
- Application of the results
The relationships between the three functions are illustrated in the diagram of fig.
The sensing and digitizing functions involve the input of vision data by means of a camera focused on the scene of interest. Special lighting techniques are frequently used to obtain an image of sufficient contrast for later processing.
The image viewed by the camera is typically digitized and stored in computer memory. The digital image is called a frame of vision data, and is frequently captured by a hardware device called a frame grabber.
These devices are capable of digitizing images at the rate of 30 frames per second. Each frame consists of a matrix of data representing the projection of the scene sensed by the camera. The elements of the matrix are called picture elements, or pixels.
The number of pixels is determined by a sampling process performed on each image frame. A single pixel represents the light intensity projected from a small portion of the scene, and each pixel intensity is converted into a digital value. (We are ignoring the additional complexities involved in the operation of a color video camera.)
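The sampling-and-quantization step can be sketched as follows. The frame size and the 256-level (8-bit) depth used here are illustrative assumptions, not values fixed by the text:

```python
# Sketch of the sampling-and-quantization step performed by a frame grabber.
# The bit depth (256 levels) is an illustrative assumption.

def quantize(intensity, levels=256):
    """Map a normalized light intensity (0.0-1.0) to a digital gray level."""
    # Clamp to the valid range, then scale to the available levels.
    intensity = max(0.0, min(1.0, intensity))
    return min(int(intensity * levels), levels - 1)

def digitize_frame(analog_frame, levels=256):
    """Convert a matrix of sensed intensities into a matrix of pixel values."""
    return [[quantize(v, levels) for v in row] for row in analog_frame]

# A tiny 2x3 "frame" of analog intensities sensed by the camera.
frame = digitize_frame([[0.0, 0.5, 1.0],
                        [0.25, 0.75, 0.1]])
```

Each element of the resulting matrix is one pixel: an integer gray level standing in for the light intensity at that point in the scene.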
The digitized image matrix for each frame is stored and then subjected to image processing and analysis functions for data reduction and interpretation of the image.
These steps are required to permit the real-time vision analysis demanded by robotic applications.
Typically, an image frame will be thresholded to produce a binary image, and then various feature measurements will further reduce the data representation of the image.
This data reduction can change the representation of a frame from several hundred thousand bytes of raw image data to several hundred bytes of feature value data. The resultant feature data can be analyzed in the available time for action by the robot system.
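The data-reduction pipeline just described can be sketched in a few lines: threshold a gray-scale frame to a binary image, then reduce it to a handful of feature values. The threshold value and the particular features computed here are illustrative assumptions:

```python
# Minimal sketch of thresholding followed by feature extraction.
# The threshold (128) and the chosen features are illustrative assumptions.

def to_binary(frame, threshold=128):
    """Threshold each pixel: 1 for object (bright), 0 for background."""
    return [[1 if p >= threshold else 0 for p in row] for row in frame]

def features(binary):
    """Reduce a binary image to a small dictionary of feature values."""
    area = sum(sum(row) for row in binary)
    rows = [r for r, row in enumerate(binary) if any(row)]
    cols = [c for row in binary for c, p in enumerate(row) if p]
    return {
        "area": area,
        "height": (max(rows) - min(rows) + 1) if rows else 0,
        "width": (max(cols) - min(cols) + 1) if cols else 0,
    }

frame = [[200, 210,  40],
         [190, 220,  30],
         [ 20,  10,  25]]
feats = features(to_binary(frame))
```

The nine raw pixel values collapse to three feature values; at full frame sizes this is the reduction from hundreds of thousands of bytes to a few hundred described above.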
Various techniques for computing the feature values can be programmed into the computer to obtain descriptors of the image, which are then matched against previously computed values stored in the computer.
These descriptors include shape and size characteristics that can be readily calculated from the thresholded image matrix.
To accomplish image processing and analysis, the vision system frequently must be trained. In training, information is obtained on prototype objects and stored as computer models.
The information gathered during training consists of features such as the area of the object, its perimeter length, major and minor diameters, and similar features.
During subsequent operation of the system, feature values computed on unknown objects viewed by the camera are compared with the computer models to determine whether a match has occurred. The training of a vision system is discussed later.
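The training-and-matching idea can be sketched as follows: prototype feature values gathered during training are stored as models, and an unknown object's features are compared against them. The feature names, prototype values, tolerance, and best-match rule here are all assumptions for illustration:

```python
# Hedged sketch of matching unknown-object features against stored models.
# Feature names, values, and the tolerance rule are illustrative assumptions.

def match(unknown, models, tolerance=0.1):
    """Return the name of the best-matching model, or None.

    Features are compared by relative difference; a match requires every
    feature to agree within the given tolerance.
    """
    best, best_score = None, float("inf")
    for name, proto in models.items():
        diffs = [abs(unknown[k] - proto[k]) / max(proto[k], 1e-9)
                 for k in proto]
        if max(diffs) <= tolerance and sum(diffs) < best_score:
            best, best_score = name, sum(diffs)
    return best

# Models built during training (illustrative values).
models = {
    "washer":  {"area": 310.0, "perimeter": 72.0},
    "bracket": {"area": 905.0, "perimeter": 140.0},
}
match({"area": 300.0, "perimeter": 70.0}, models)   # → "washer"
```

If no model agrees within tolerance, the function reports no match, which a robotic application would treat as an unrecognized object.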
The third function of a machine vision system is the applications function.
The current applications of machine vision in robotics include inspection, part identification, location, and orientation. Research is ongoing in advanced applications of machine vision for use in complex inspection, guidance, and navigation.
Vision systems can be classified in a number of ways. One obvious classification is whether the system deals with a two-dimensional or three-dimensional model of the scene.
Some vision applications require only a two-dimensional analysis. Examples of two-dimensional vision problems include checking the dimensions of a part or verifying the presence of components on a subassembly.
Many two-dimensional vision systems can operate on a binary image, which is the result of a simple thresholding technique. This is based on an assumed high contrast between the object(s) and the background. The desired contrast can often be accomplished by using a controlled lighting system.
Three-dimensional vision systems may require special lighting techniques and more sophisticated image processing algorithms to analyze the image. Some systems require two cameras in order to achieve a stereoscopic view of the scene, while other three-dimensional systems rely on the use of structured light and optical triangulation techniques with a single camera.
An example of a structured light system is one that projects a controlled band of light across the object. The light band is distorted according to the three-dimensional shape of the object.
The vision system sees the distorted band and utilizes triangulation to deduce the shape.
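A simplified version of this triangulation can be written down directly. The geometry assumed here (a light sheet inclined at a known angle, with stripe displacement measured on the surface in the same units) is an illustrative simplification of the technique described above:

```python
# Sketch of structured-light triangulation: a light sheet inclined at a known
# angle projects a stripe; a surface raised by height h shifts the stripe
# sideways by h * tan(angle). The geometry is a simplified assumption.

import math

def height_from_offset(offset, angle_deg):
    """Estimate surface height from the observed stripe displacement.

    offset    -- lateral displacement of the light stripe
    angle_deg -- inclination of the projected light sheet from the vertical
    """
    # Invert the relation offset = h * tan(angle).
    return offset / math.tan(math.radians(angle_deg))
```

With a 45-degree light sheet, for example, the sideways shift of the stripe equals the height of the surface, so measuring the distortion of the band in the image recovers the object's three-dimensional shape line by line.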
Another way of classifying vision systems is according to the number of gray levels (light intensity levels) used to characterize the image.
In a binary image the gray level values are divided into one of two categories, black or white. Other systems classify each pixel's gray level into one of a larger number of levels, the range of which is called a gray scale.
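The two classification schemes can be contrasted in code. The 8-bit input range, the threshold, and the 8-level gray scale below are illustrative choices, not values from the text:

```python
# Sketch contrasting binary and gray-scale classification of a pixel value.
# The 8-bit range, threshold, and 8-level scale are illustrative assumptions.

def classify_binary(pixel, threshold=128):
    """Two categories only: 0 = black, 1 = white."""
    return 1 if pixel >= threshold else 0

def classify_gray(pixel, levels=8, max_value=255):
    """Assign the pixel to one of `levels` bins across the gray scale."""
    return min(pixel * levels // (max_value + 1), levels - 1)

classify_binary(200)   # → 1 (white)
classify_gray(200)     # → 6 (one of 8 gray levels)
```

A binary system keeps only one bit per pixel, while a gray-scale system preserves intermediate intensities at the cost of more data per frame.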