Saturday, September 20, 2008

Invariant Representations

I just finished rereading Jeff Hawkins' On Intelligence on audiobook. One of the keys to developing intelligent systems is to enable the system to learn invariant representations of things in the world, and then use current information to make predictions about what's coming next.

An invariant representation is a way of storing information so that if the information appears in a slightly different form, it is still recognizable. For example, you recognize the melody "Happy Birthday" no matter what key it's played in. What lets you recognize it is the sequence of intervals between the notes, not the absolute pitches.
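Here's a rough sketch of the melody idea in Python. The note numbers are my own transcription of the opening phrase as MIDI pitches, and the `intervals` helper is just a name I made up for illustration. The point is only that the step sizes stay the same when the key changes.

```python
def intervals(notes):
    """Return the semitone steps between consecutive notes."""
    return [b - a for a, b in zip(notes, notes[1:])]

# Opening phrase as MIDI note numbers (illustrative transcription): G G A G C B
in_c = [67, 67, 69, 67, 72, 71]

# The same phrase transposed up two semitones: every absolute pitch changes
in_d = [n + 2 for n in in_c]

print(in_c == in_d)                        # False: the raw notes differ
print(intervals(in_c) == intervals(in_d))  # True: the intervals are identical
```

Comparing the raw note lists fails as soon as the key changes, while comparing the interval lists does not.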

One example Hawkins uses is the "train station example". Let's say you live in a town a long time ago, and your sweetheart is supposed to come to the town to live with you, and they're arriving by train. Every day you go to the train station, but your sweetheart doesn't arrive. You know that two trains run per day, and you've gotten a letter saying they'll be on the later train. After a couple of weeks of visiting the station, you see a pattern. The morning train arrives at different times, but the afternoon train is always exactly 4 hours later than the morning train. You develop an invariant representation of the train schedule. So if the morning train arrives at 10:14, you know that the afternoon train will arrive at exactly 2:14. If the morning train arrives at 9:27, you know the afternoon train will arrive at exactly 1:27. What you have encoded is relative information, rather than absolute. So given your representation, and the time of the morning train for that day, you can reliably predict when the afternoon train will arrive.
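The train schedule can be sketched the same way. This is a toy version, assuming times are given as 24-hour "HH:MM" strings; the names `AFTERNOON_OFFSET` and `predict_afternoon` are mine, not Hawkins'. All that gets stored is the 4-hour offset, never a specific arrival time.

```python
from datetime import datetime, timedelta

# The invariant part of the schedule: a relative offset, not an absolute time
AFTERNOON_OFFSET = timedelta(hours=4)

def predict_afternoon(morning):
    """Predict the afternoon arrival from today's morning arrival ("HH:MM")."""
    arrival = datetime.strptime(morning, "%H:%M")
    return (arrival + AFTERNOON_OFFSET).strftime("%H:%M")

print(predict_afternoon("10:14"))  # 14:14, i.e. 2:14 in the afternoon
print(predict_afternoon("09:27"))  # 13:27, i.e. 1:27 in the afternoon
```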

Same thing with vision or audition. If you have an invariant representation of an object, like a dog, it doesn't matter if the lighting conditions are slightly different, or that the dog is near you or far away, or that it's upside down or rotated. Given your invariant representation and information about, say, where the ears are, you can predict where the eyes, legs, and tail are going to be.

This reminded me of the difference between raster and vector graphics.

Raster graphics are pixel-by-pixel encodings of images. They encode absolute information about every single pixel. Sometimes this is good, but sometimes it's bad, as when you want the image to scale without losing resolution.

Vector graphics, on the other hand, store an invariant representation of the image, and the computer renders the objects from it at whatever size and position is asked for.

For example, a circle stored in raster graphics would encode the position and color of every single pixel that makes up the circle. When you zoomed in, the circle would look "blocky". And you'd need a lot of information to store the image.

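A circle stored in vector graphics would only need the formula for a circle plus a few parameters, like the radius, expressed in relative terms. Then all it would need is the center, so it would know where to render it. The pixels are computed from that description at whatever size you ask for, so zooming in never looks "blocky".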
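To make the contrast concrete, here is a minimal sketch, assuming a circle described by a center and radius in normalized 0-to-1 coordinates. The function names and the dictionary layout are hypothetical, and this is not how any real vector format stores things. The same three numbers get rendered at two resolutions.

```python
def rasterize_circle(cx, cy, r, size):
    """Render a filled circle into a size x size grid of 0/1 pixels."""
    return [
        [1 if (x + 0.5 - cx) ** 2 + (y + 0.5 - cy) ** 2 <= r ** 2 else 0
         for x in range(size)]
        for y in range(size)
    ]

# "Vector" description: three relative numbers, independent of resolution
circle = {"cx": 0.5, "cy": 0.5, "r": 0.4}

def render(shape, size):
    """Scale the relative description to a pixel grid of the given size."""
    return rasterize_circle(shape["cx"] * size, shape["cy"] * size,
                            shape["r"] * size, size)

small = render(circle, 8)    # 8 x 8 = 64 stored pixel values
large = render(circle, 64)   # 64 x 64 = 4096 stored pixel values

for row in small:
    print("".join("#" if p else "." for p in row))
```

The raster output grows with the square of the resolution, while the description it was rendered from stays at three numbers.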

Hawkins makes the claim that AI researchers have not used such an approach, and instead rely on absolute encoding to teach machines to recognize patterns and produce actions. It would surprise me, though, if the general approach had not been tried.
