CECS Spotlight: Lost & found in the supermarket
Try doing your weekly shop blindfolded and you'd be close to the experience of a vision-impaired person in the supermarket. But new computer matching technology could take the pain out of navigating those long aisles.
Navigating the towering, long and utterly indistinguishable shelves of a supermarket is challenging at the best of times, not least when the shopper is hunting that one special item - say, a pack of needles. Those inclined to stick things out might seek assistance from a supermarket employee, which might in turn lead to one of those squawking announcements that temporarily disrupt the muzak: "Aisle check on tools, aisle check!" By this stage, no one could be blamed for abandoning the search and making his or her own check for the nearest exit.

Now, imagine attempting such a task if you were vision impaired. What was previously a minor test of resilience and recognition suddenly becomes much, much harder. A red-coloured carton of laundry detergent can look much like a red-coloured carton of red wine, particularly if you're not in possession of 20/20 sight. One product will leave your whites whiter than white; the other might turn those same whites a striking shade of crimson. While admittedly at the extreme end of misrecognition, such a mistake illustrates how forbidding a supermarket can appear to those with vision problems.

Help could be on the way for all shoppers, but especially those whose sight is impaired. A team of researchers from the Research School of Information Sciences and Engineering is developing software that will locate products anywhere in a supermarket. All you need do is point your camera at a sample of the item you're looking for and the computer will do the rest: matching the product to a record in a massive visual dictionary, then giving you precise directions to its shelf and aisle within the store.
Master's student Yuhang Zhang is the man driving the project forward. A hard-working student, he confesses to sometimes cutting his sleep down to five hours a night to maximise the time he can spend on the problem: how to get a computer to reliably match a product based on visual recognition. After 10 months of consistent labour and consultation with his supervisors, Zhang has come up with a system that works well and is getting better.
"Do you know how Google works? If you're looking for a keyword, the search engine will find your search item by locating any document that includes them," Zhang says. "What we are trying to do is describe each image as a 'text file'. Just as there are many words in a text file, we think that each image contains many subregions, which can be thought of as 'visual words'. We use software to extract the visual words and describe them mathematically so we can store them in a database, just as we do with a text document."
Last January, Zhang and his supervisors spent three days haunting a test supermarket in southern Canberra. They slowly walked up each aisle, painstakingly capturing overlapping photographs of shelves and taking detailed notes about locations. Their goal: to collect an image of every product in the store. At the end of their shuffling, the researchers had gathered 3,000 shots illustrating the 18 aisles of the market. The next step was to break each image down into sub-regions, resulting in 75,000 smaller images to analyse. The team then created software to hunt out 'local invariant features', those bits of a picture that remain recognisably the same when factors such as scale, viewpoint or illumination change.
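The article doesn't name the specific detector the team used, but SIFT-style descriptors are a standard way to obtain features that tolerate changes in scale, viewpoint and illumination. A sketch of the extraction step might look like this, using OpenCV (the filename is illustrative):

```python
# Sketch of the feature-extraction step, assuming SIFT-style descriptors;
# the team's actual detector isn't named in the article.
# Requires OpenCV 4.4+ (pip install opencv-python).
import cv2

image = cv2.imread("shelf_photo_0001.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file
sift = cv2.SIFT_create()

# keypoints: where each local feature sits (position, scale, orientation).
# descriptors: one 128-dimensional vector per keypoint -- the raw material
# that is later clustered into 'visual words'.
keypoints, descriptors = sift.detectAndCompute(image, None)
print(f"{len(keypoints)} local invariant features extracted")
```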
"The features are not things that you would recognise as letters or words or logos. It's more generic," explains Professor Richard Hartley, expert in computer vision and senior researcher on Zhang's team of supervisors. "To call them visual words is maybe a bit misleading. They're more like phonemes - units of sound that have no meaning in their own but can be combined to make words."
Hartley takes the analogy of the phoneme a step further to explain the process of clustering, where near-identical local invariant features are grouped together to allow for easier sorting. "You might have someone saying 'r' in a variety of ways with subtle differences, but to all intents and purposes they might as well be the same," Hartley says. "In the same way, you take a bunch of things that appear the same and simplify the problem." This clustering process brought the total number of recognisable visual features down to around the 100,000 mark. The team describes these useful units as 'visual words'.
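In code, the clustering step amounts to running an algorithm such as k-means over the full stack of descriptors. The sketch below uses scikit-learn's MiniBatchKMeans on random stand-in data and a deliberately small vocabulary; the team's actual method isn't specified, and their reported dictionary is far larger.

```python
# Sketch of the clustering step: group near-identical descriptors so that
# minor variations collapse onto the same 'visual word'. MiniBatchKMeans
# is an assumption -- the article doesn't say which algorithm was used.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Stand-in for the descriptors extracted from all 75,000 sub-region images
# (real SIFT data would also have 128 columns).
all_descriptors = np.random.rand(50_000, 128).astype(np.float32)

kmeans = MiniBatchKMeans(n_clusters=1_000, batch_size=4_096)
kmeans.fit(all_descriptors)

# Each cluster centre is one entry in the visual dictionary; any new
# descriptor is labelled with the ID of its nearest centre.
dictionary = kmeans.cluster_centers_
print(dictionary.shape)  # (1000, 128)
```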
"We call [the 100,000 clustered features] a dictionary," says Dr Lei Wang, Zhang's immediate supervisor. "For each image you then scan, we'd be looking for matches between the features captured by the query and those stored in the database. It's a case of matching by approximation."
It's also a case of taking a lot of care. Any error in the programming at an early stage of the visual dictionary's development could compound disastrously later on. Zhang says the project has required intense concentration, and he admits to some dissatisfaction that visual matching is not yet 100 per cent accurate. At the outset, the software could match just one item in 30; now it gets it right up to two-thirds of the time, with only a small number of repeated searches. Each individual search takes just 25 milliseconds.
"I cannot say I'm satisfied yet, because sometimes the picture match can prove quite difficult," Zhang says. "If you're looking for a box, it's easier. But if you're looking for a bag, its shape is much more complicated." Difficulties also arise when the object is too small to extract many useful visual words. This can be the case with something like a pen or a downsized chocolate bar.
But Wang speaks more positively about the work of his student. "I think [Zhang's] progress is quite good. Given he only started in January 2007, he's done very well. Especially when you take into account the need to check the ground truths manually for each search, which means physically going through the images each time to see that the software is working correctly. He's also done well to develop the software to cluster features efficiently."
Hartley says the software would need to match visual objects correctly at least 95 per cent of the time before commercial applications could be considered. But he's excited about the possibilities of expanding the project to cover the stock of more than one supermarket, perhaps even building an Australia-wide network.
He says the work forms part of a very large endeavour in computer vision, where many researchers around the world are working to solve problems like object recognition and category recognition. The latter might describe a system that can recognise different kinds of dog, for example, while knowing that a Chihuahua and a Great Dane are both still 'dogs'.
As for Zhang, he intends to continue refining his visual recognition software through further research. In the meantime, keep an eye on the aisles of your local supermarket for a system that makes navigating the stacks as simple as point and click.
