Understand How Autofocus Works: Part 2
After our first look into autofocus, we'll now take a more technical approach. I'm firmly of the belief that the more you understand about your camera and how it interprets the world from an engineering perspective, the more you can get out of it to accurately create your vision.
A Quick Technical History
Leitz, now known as Leica, started patenting a series of autofocus technologies in 1960, and demonstrated an autofocusing camera at Photokina (a trade show running since 1950) in 1976. The first mass-produced AF camera was the Konica C35 point-and-shoot, released in 1977. The first AF 35mm SLR was the Pentax ME-F in 1981, followed by the similar Nikon F3AF in 1983.

These early AF SLRs all put the motor in the lens: essentially a standard lens with a big, ugly motor block stuck onto it. This continued until 1985, when the Minolta Maxxum 7000 put the drive motor in the camera body along with the sensors.
This was the first AF SLR to meet with reasonable commercial success. The earlier attempts were slow, inaccurate and only worked under ideal conditions, which didn't really make a case for roughly double the price of a comparable manual focus camera. The Maxxum 7000 also ended up costing Minolta $130 million in 1991, after a protracted patent battle with the US company Honeywell over the AF technology.

Nikon followed Minolta's lead, but reverted to lens motors in 1992, which is why modern entry-level Nikons lack an integrated AF drive motor. Canon's EOS (Electro-Optical System) arrived in 1987, annoying many photographers by dropping the FD lens mount in favour of the completely electronic EF mount.

Well, that's generally what happened and the order that it happened in. So what about the technology itself? Let's dig a little more.
Physical Implementations
Phase Detection
Phase detect autofocus is the fast AF found in DSLRs (and increasingly in mirrorless cameras as part of a hybrid AF system). In DSLRs, part of the main mirror is semi-silvered: it passes about a quarter of the light from the lens through to a small secondary mirror behind it, which directs it down into the base of the mirror box. There, small lenses focus the light from opposite edges of the lens onto a CCD sensor array.

The array is generally made up of a number of one-dimensional strips of pixels in various orientations. Each strip can only detect a feature that contrasts perpendicular to it, because the only change it can register is along its own length. If a feature in the image runs parallel to the strip, the strip sees just one slice of that feature at a time, rather than its "shape".
Contrast Detection
Contrast detection generally happens directly on the imaging sensor itself, hence its use for live view on DSLRs. It's usually the only detection system available on mirrorless and compact cameras. It's essentially a software implementation, so there's no dedicated physical hardware beyond the sensor and a processor.
Hybrid Detection
As the name implies, this is a combination of both systems. It can take the form of converting some of the sensor pixels into AF pixels, or of layering a phase-detect array over the sensor; either way, it works in tandem with the contrast-detect system to improve AF speed.
How Things Work
OK, now that we know the physical setup of each type of AF system, let's cover how each uses its implementation to do its job.
Focus And Distance
Your camera lens is a compound lens: a single optical system made up of a number of simple lenses, usually called "elements" in photography literature. One or more of these elements move in order to focus the light rays at the image plane.
The distance to the subject dictates how far the corrective element needs to move in order to focus. Think of it as a pair of glasses for the main optics, except that instead of changing the lens power, its position is changed.
Let's take a very simple example with just one simple lens, to show that as the subject moves, the image blurs. This is approximated by the thin lens formula:
$${1 \over f} = {1 \over S_1} + {1 \over S_2}$$
This equation assumes lenses of negligible thickness in air, so it doesn't accurately translate to real-world lenses, but it allows me to get the point across more simply.

We use a point source of light and a lens of focal length 1m (1000mm). This gives a \(1 \over f\) value of 1. If \(S_1\) is two metres, \(1 \over S_1\) is 0.5, so \(1 \over S_2\) must also be 0.5 and \(S_2\) is 2m when the lens is focused. If we move the point source back to 8m from the lens, \(1 \over S_1\) becomes 1/8. To compensate, \(1 \over S_2\) must become 7/8, which requires an \(S_2\) value of 8/7, or about 1.14m. Of course, \(S_2\) is fixed because the sensor is stationary, so the image is thrown out of focus.
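If you'd like to check those numbers yourself, here's a minimal Python sketch of the thin lens calculation above; the function name and printed values are my own, purely for illustration.

```python
def image_distance(focal_length_m, subject_distance_m):
    """Solve the thin lens formula 1/f = 1/S1 + 1/S2 for S2."""
    return 1.0 / (1.0 / focal_length_m - 1.0 / subject_distance_m)

f = 1.0  # focal length of 1 m (1000 mm), as in the example above

# Subject at 2 m: the image forms 2 m behind the lens, right on the sensor.
print(image_distance(f, 2.0))   # 2.0

# Subject moves back to 8 m: the image now forms at 8/7, about 1.14 m,
# but the sensor hasn't moved, so the image is thrown out of focus.
print(image_distance(f, 8.0))   # 1.142857...
```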
If we insert a second, corrective lens at distance \(d\) from the first one, creating a compound lens, we can keep the image focused as the subject moves. According to the compound thin lens equation, the new combined focal length is:
$${1 \over f} = {1 \over f_1} + {1 \over f_2} - {d \over f_1 f_2}$$
So we have a new focal length. The distance from the new lens to the new focal point for the combined system is called the back focal length, which should be a relatively familiar term in photography, since it's the distance from the rear element to the sensor. If I call the back focal length "\(d_2\)", this is given by:
$$d_2 = {{f_2 (d - f_1)} \over {d - (f_1 + f_2)}}$$
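Here's a similarly small Python sketch of these two formulas. The values below (a 1m primary lens with a diverging -2m corrective lens half a metre behind it) are my own made-up numbers, just to show the calculation.

```python
def combined_focal_length(f1, f2, d):
    """Compound thin lens equation: 1/f = 1/f1 + 1/f2 - d/(f1*f2)."""
    return 1.0 / (1.0 / f1 + 1.0 / f2 - d / (f1 * f2))

def back_focal_length(f1, f2, d):
    """Distance from the corrective lens to the new focal point."""
    return f2 * (d - f1) / (d - (f1 + f2))

f1, f2, d = 1.0, -2.0, 0.5  # metres; illustrative values only

print(combined_focal_length(f1, f2, d))  # 1.333... m
print(back_focal_length(f1, f2, d))      # 0.666... m
```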
Let's try an example where the image is focused on a fixed image plane, then the subject moves. Adding diverging corrective lenses and crunching the numbers gives us this:

The math may not be flawless, but it's good enough to get the point across! So as the subject moves, the corrective lens must move to compensate because the imaging plane is fixed.
In AF systems, the electronics calculate where the lens needs to move to and instruct the lens motor to move it there. How do they do this? That brings us to the detection systems.
Phase Detect
The small lenses in the base of the mirror box focus the light from opposite sides of the lens. Because of the gap between these two points, a parallax is created where each one sees slightly different views of the subject, just like the two input lenses in a rangefinder camera.
The individual points are in focus, just as in a rangefinder; it's the overlapping of countless such points across the two-dimensional image field that creates the focal blur in an actual image. This is also why wide apertures create more blur: not through some kind of optical manipulation, but simply because more of the diameter of the glass is used, so more points overlap and blur together. Think of the AF system as using an f/22 or smaller aperture at each side of the lens, so its view stays sharp regardless of the lens's focus position.

While the light comes from opposite sides of the lens, the split image going to the AF sensors is of the same part of the subject, where the AF dots in the viewfinder are.

The CCD strips are read out and sent to a dedicated AF chip, which compares the two signals. Individual manufacturers, improving technology, patent-infringement avoidance and different price points likely all alter the exact algorithm used, but the general approach is to perform a mathematical function called an autocorrelation, or something similar.

Autocorrelation is a pattern-matching operation under the umbrella of cross-correlation in signal processing, but instead of comparing two different signals, it compares a signal with a shifted version of itself. Essentially, it's an integral (or, for discrete sets of pixel values like these, a summation) that measures the overlap between the two superimposed signals.
The goal is to calculate how far one signal has to be shifted to maximise that overlap and thus line up the two views. The mathematics is long-winded (it would take several articles to work through even a basic example), but the normalised result falls between -1 and 1, and the camera looks for the shift at which the correlation is as close to 1 as possible.
By doing this, it identifies the same feature as seen from each side of the lens, and the spatial shift between the two views along the pixel strip tells it, via trigonometry based on the camera's known dimensions, how far and in which direction the lens is out of focus. It can then send a focusing command to the lens and check the focus again after the move; that's when your camera indicates focus lock and allows the image to be shot.
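Real AF chips run proprietary algorithms, but as a rough sketch of the idea, the toy Python below slides one strip signal against the other and picks the shift with the highest normalised correlation score (a value between -1 and 1). The signal shapes, names and 20-pixel search range are all my own assumptions for illustration.

```python
import numpy as np

def best_shift(strip_a, strip_b, max_shift=20):
    """Slide strip_b against strip_a and return the shift that gives the
    highest normalised correlation (a score between -1 and 1)."""
    a = strip_a - strip_a.mean()
    best_score, best_offset = -1.0, 0
    for shift in range(-max_shift, max_shift + 1):
        b = np.roll(strip_b, shift)
        b = b - b.mean()
        score = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
        if score > best_score:
            best_score, best_offset = score, shift
    return best_offset, best_score

# Fake a strip read-out with two bright features, plus a copy shifted by
# 7 pixels, as if seen through the opposite side of the lens.
x = np.arange(200)
strip_a = np.exp(-(x - 80) ** 2 / 50.0) + 0.5 * np.exp(-(x - 130) ** 2 / 30.0)
strip_b = np.roll(strip_a, 7)

offset, score = best_shift(strip_a, strip_b)
print(offset, round(score, 3))  # -7 1.0: shift strip_b back 7 pixels to line up
```

In a real camera, that measured shift, combined with the known geometry of the AF module, maps directly to how far and in which direction the focus motor needs to drive.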

You may have heard of "dot" or "point" type AF points versus "cross" type AF points. Dot-type points are single, one-dimensional strips of pixels, whereas cross-type points are two strips arranged perpendicular to each other. Because an AF strip is one-dimensional, it can only see luminance changing along its length: dot-type sensors are therefore only sensitive to detail in one direction, whereas cross-types can see across two dimensions.
If a dot-type sensor lies parallel to a major detail feature, it cannot see the difference between that feature and the contrasting one next to it, and so it has significant difficulty focusing.
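As a quick illustration, here's a toy Python/NumPy snippet: a strip lying across a vertical edge sees plenty of contrast, while a strip parallel to the same edge sees none at all. The scene and strip positions are a made-up example.

```python
import numpy as np

# A tiny scene: left half dark, right half bright (a vertical edge).
scene = np.zeros((8, 8))
scene[:, 4:] = 1.0

horizontal_strip = scene[4, :]  # a dot-type strip lying across the edge
vertical_strip = scene[:, 2]    # the same strip rotated parallel to the edge

print(np.ptp(horizontal_strip))  # 1.0 -> contrast to lock onto
print(np.ptp(vertical_strip))    # 0.0 -> no contrast along its length
```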

Contrast Detect
This method reads a few pixels at the desired focus position straight off the imaging sensor. The processor calculates a contrast value from these pixels: the difference in luminance across the measured pixel area. By calculating the gradient of luminance along the pixel rows and columns, it can then try to maximise that gradient.

The lens focus is then moved fractionally and the contrast is calculated again. If the contrast is lower, the lens has been moved in the wrong direction, so it's moved the opposite way instead. The contrast is measured again, the lens is moved further, and this process repeats while the contrast value climbs, until it dips. When it falls, the lens has gone too far, so the algorithm moves it back again, making ever finer adjustments.
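As a rough sketch of that hunting behaviour, here's a toy Python hill-climber over a simulated contrast curve. The contrast_at() stand-in, its peak at position 4.2 and the step sizes are all invented for illustration; a real camera measures contrast from actual sensor pixels.

```python
def contrast_at(lens_position):
    """Stand-in for a real contrast measurement: a smooth curve that
    peaks when the hypothetical lens is at position 4.2."""
    return 1.0 / (1.0 + (lens_position - 4.2) ** 2)

def contrast_detect_focus(start_position, step=0.5, min_step=0.01):
    position = start_position
    direction = 1.0
    best = contrast_at(position)
    while step > min_step:
        trial = position + direction * step
        trial_contrast = contrast_at(trial)
        if trial_contrast > best:
            # Contrast improved: keep moving the lens this way.
            position, best = trial, trial_contrast
        else:
            # Contrast dropped: we've overshot the peak, so reverse
            # direction and take smaller steps (the visible "hunting").
            direction = -direction
            step /= 2.0
    return position

print(round(contrast_detect_focus(0.0), 2))  # converges near 4.2
```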

Contrast-detect AF has the potential to be extremely accurate because it works at the sensor level, with no separate system: the lens simply moves until contrast is maximised. Unfortunately, for the same reason, it seems unlikely to ever be quick. You could argue that it should only need measurements at two focus positions to know how far the lens is defocused, but that would require the camera to know in advance exactly how contrasty the subject is.
It has no way of knowing what the "true" distribution of luminance values being measured should be, because that depends on the subject. This is also why there can be no "threshold gradient" or "ideal peak luma value": these things vary greatly from scene to scene.

Thus, for the foreseeable future, professional filmmaking will continue to use manual focus pullers as it always has, and mirrorless point-and-shoots will continue to be slow. Unless...
Hybrid Systems
What if you could get the best of both worlds? What if you could have the speed of phase detection, without the hunting, combined with the accuracy and simplicity of contrast detection? Well, that's exactly what manufacturers are doing now.
Instead of putting the phase-detection strips in the base of a mirror box, which is useless for mirrorless cameras and for DSLRs in live view, manufacturers are now building dedicated arrays onto the image sensor itself. But surely there's nothing to phase-match on the sensor, because it's getting blasted by light from the whole lens in a big blurry circle of confusion, as I said earlier? Not so fast!
Because the pixels (technically "sensels", since they're sensor elements rather than picture elements) on an imaging sensor are covered in microlenses for improved light-gathering, all we need to do is mask off half of a pixel for it to see the image from only one side of the lens. Is this ideal? No: the image will still be blurry, though only half as blurry as when the pixel sees the entire lens, and pairs of pixels masked on opposite sides can now be used to detect focus more accurately, because there will be a parallax between their two images.

In the Fuji X100s, this technology is used to beef up the manual-focusing aids with a split-prism-like EVF overlay, while Sony uses it as a true hybrid system in conjunction with contrast-detect AF, branded "Fast Hybrid AF", in its higher-end NEX cameras. Canon and Nikon also use the concept in their lower-end cameras. In Sony's A99, a second dedicated phase-detection array overlaid directly in front of the imaging sensor takes advantage of the translucent mirror, a setup known as Dual AF.

So far, on-sensor phase detection isn't much good in low light, it tends to be limited to a central cluster of points to reduce the number of pixels taken out of imaging use, and the technology is in its infancy. But with more dedicated systems like Sony's Dual AF array, and perhaps some "sacrificed" image-sensor pixels (filled in by software interpolation) under more directional microlenses, this looks like the future of autofocus.
Conclusion
So we've come from the invention of autofocus, through its development and widespread adoption. We've looked at the fundamental optical mechanics of focus. We know what types of AF there are, where they are in the camera, and how they work, as well as how these attributes practically affect the performance of the camera. We've taken a look at recent developments in hybrid autofocus systems, and considered where they may continue from here.
When using AF, consider how the camera is seeing the scene and adjust accordingly. When shopping for cameras, take a good look at their AF systems and how well they can work for your style of shooting.
Well, that's a wrap on this technical overview of autofocus. Questions? Comments? Hit up the comments below!