Google’s debut in augmented reality began with an argument in a bar. In 2008, David Petrou, a longtime Google engineer, was out with friends, explaining how someday you’d just point your phone’s camera at something and the phone would automatically run a search for it. You’d be able to search for things you wouldn’t even know how to put into words. Based on what he had seen inside Google, Petrou thought the technology could already work. To his friends the idea was impossible; they called him crazy.
Petrou, angry and disappointed, went home and started coding. Although he had no background in computer vision, he taught himself Java so he could write an Android app, and immersed himself in Google’s latest computer-vision work. After a month of furious hacking, he had the first prototype of what would later become Google Goggles.
A video of an early demo survives: Petrou sits in a Google conference room with Ted Power, a UX designer, talking into a webcam. Petrou explains, “The idea is generic image annotation, where an image can come in to Google and a number of back-ends can annotate that image with some interesting features.”
To demonstrate, he grabbed a G1 (Google’s newly launched Android phone at the time) and took a photo of a newspaper article about Congressional oversight of ExxonMobil. A moment later, the phone displayed the article’s text in white on a black background, looking less like a smartphone app than a DOS prompt. The recognition was impressive, though it rendered the company’s name as Em>onMobile. Then Petrou took a picture of Power’s desk, cluttered with books and cables, with a MacBook in the center. The app read the image and offered ten possible terms to describe it, including Room, Interior, and Nokia; two in particular excited Petrou: Laptop and MacBook. The camera could see and understand objects. Still, Petrou told the webcam, “We are a long way from providing perfect results.”
When the first version of Google Goggles shipped, it couldn’t do much, but the idea of searching the web just by taking a photo was enchanting.
After Google provided that crucial example of a smartphone interacting with the real world, Apple built ARKit, Microsoft built the HoloLens, and many others began exploring AR.
Then … Goggles died. Smartphones’ first AR experiment came and went before rivals could even copy it.
Robin Williams often joked that the Irish discovered civilization, then had a Guinness and forgot where they left it. So it went with Google and smartphone cameras. The ideas you find on Snapchat and Facebook today, Google engineers were working on nearly a decade ago. As the tech industry moves toward a camera-first future, where you talk, play, and work through your smartphone’s camera, Google is now circling back, collecting those same ideas, and trying to finish what it started.
When Petrou started this project, he didn’t know how many of his colleagues were working on the same idea, or how long they had been at it. In 2006, Google acquired a Santa Monica-based company called Neven Vision, which held some of the most advanced computer-vision technology in the world, and decided to deploy it in its Picasa photo-sharing app.
“It could be as simple as detecting whether or not a photo contains a person, or, one day, as complex as recognizing people, places, and objects,” Adrian Graham, Picasa’s product manager, wrote in a blog post announcing the acquisition. “This technology just may make it a lot easier for you to organize and find the photos you care about.”
A few years later, once Neven Vision’s technology had been integrated more deeply into Picasa, company founder Hartmut Neven and his team began thinking of something bigger.
“We were all inspired by the Terminator movie, when he walks into the bar and everything gets identified,” says Shailesh Nalawadi, a former product manager on the team and now CEO at Mavin, Inc. “We thought, ‘Hey, wouldn’t it be amazing if you could have something like that, match it against a database, and it would tell you what’s in that picture?'”
Eventually Petrou met the Neven Vision team, and together they built a better prototype: an app that could identify paintings, book covers, album art, landscapes, and many other well-known images, returning search results about 20 seconds after you took a photo.
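The matching idea Nalawadi describes — comparing a photo’s features against a database of known images — can be sketched in miniature. This is a hypothetical illustration, not Goggles’ actual code: the labels and four-dimensional vectors are invented, whereas a real system would use high-dimensional descriptors extracted by computer-vision models.

```python
import math

# Invented "feature vectors" for a few known images. A real recognition
# system would compute much larger descriptors from the image pixels.
DATABASE = {
    "Starry Night (painting)": [0.9, 0.1, 0.0, 0.2],
    "Abbey Road (album art)":  [0.1, 0.8, 0.3, 0.0],
    "Golden Gate Bridge":      [0.0, 0.2, 0.9, 0.4],
}

def cosine(a, b):
    """Cosine similarity: 1.0 for identical direction, near 0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(query, database, threshold=0.8):
    """Return the best-matching label, or None if nothing is similar enough."""
    best = max(database, key=lambda label: cosine(query, database[label]))
    return best if cosine(query, database[best]) >= threshold else None

# A query vector close to the "Starry Night" entry matches it;
# a vector unlike anything in the database matches nothing.
print(identify([0.85, 0.15, 0.05, 0.25], DATABASE))
print(identify([1.0, 1.0, 1.0, 1.0], DATABASE))
```

The threshold is what lets the system say “I don’t know” instead of forcing a bad guess, which is why early Goggles returned results only for categories it matched confidently.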
This is how projects have always grown at Google: one person builds something, shares it with the crew, gets more people interested, and together they push it forward. For the Goggles team the recruiting was easy, because the idea itself was so compelling. Two executives in particular, Vic Gundotra, a vice president of engineering, and Alan Eustace, a senior vice president of knowledge, became high-level champions, contributing resources, energy, and ambition to Goggles. Googlers were genuinely excited and often talked about how great it would be when the app was universal, recognizing everything, everywhere.
“Everyone at Google understood that this was possible, this was familiar, and yet transformative. That we were on the cusp of this thing, and it could be done,” recalled Nalawadi.
In December 2009, at an event at the Computer History Museum, Google launched Goggles as a public product. It could identify only certain landmarks, works of art, and a few other things, but Google aimed to build something far more capable over time.
“Google Goggles today works very well on certain types of objects in certain categories, but it is our goal to be able to visually identify any image over time,” Gundotra said at the launch. “Today you have to frame a picture and snap a photo, but in the future you’ll simply be able to point to it…and we’ll be able to treat it like a mouse pointer for the real world.”
The team was wrestling with the technology and knew well that while the mouse-pointer future was possible, it was still a long way off.
“We always knew it was more like a research project,” one former engineer said.
Google couldn’t solve every problem, and some weren’t even Google’s to solve: smartphone cameras weren’t good enough, and users didn’t really know how to use them. Even a good picture contained many interesting things that Google couldn’t process all at once. Text recognition was new, and curved or handwritten text defeated the algorithms. Plants were difficult to recognize because of their sheer variety. Barcodes were simple; animals were impossible.
And the biggest issue: Google couldn’t even use the thing it did best, facial recognition. “If there are six or more pictures of you on the internet that are well-tagged, and you through our system take a seventh picture, you had 90 percent probability of the right answer being in the first ten search results,” Nalawadi says. But Google couldn’t roll out the feature at a time when users were already worried about how much Google knew about them. Google Buzz, launched a few months earlier, had been rife with privacy violations. Scarred by that episode, Google decided to abandon facial recognition.
Even as the team moved on to other priorities, it did not forget about Goggles. In 2010, Petrou delivered a keynote address at the Hot Chips conference at Stanford. Partway through, he flipped to a slide titled ‘Digression into Augmented Reality.’ The Googlers had realized that if the camera understood what it was seeing, it could add things to the scene. In another Terminator-inspired moment, Petrou imagined how certain objects in your view could be amplified, as if you were looking through a thermal camera.
Toward the end of his address, Petrou showed the iconic image from WALL-E: rows of uniform-wearing, obese people seated, sipping drinks and staring at screens. “If this is our future, then maybe AR is not at all that important,” he said. AR and image search only matter if people care about the world around them, rather than just sitting in place and staring at screens.
Google kept searching for ways to draw people to Goggles. The app gained a Sudoku solver, a translation tool, and a barcode scanner. Petrou also worked on a feature called ‘Virtual Graffiti,’ in which you could draw something in AR and leave it somewhere for others to find, much like the AR art Facebook showed off for Facebook Camera in 2017. Google had the idea first, but never shipped it.
The company continued developing Goggles, but progress soon stalled. Google had promised a full iPhone version of Goggles, but instead folded it into the Google app, then quickly removed it. By 2011 the hype around Goggles had all but died, and by 2012 the company stopped development entirely.
One team member said Goggles died because of technical limits; another said people simply weren’t comfortable walking around with their cameras held up all the time.
The key reason behind Goggles’ death was actually ‘Google Glass’. In 2011, Google filed a patent application for a “head-mounted display that displays a visual representation of physical interaction with an input interface outside of the field of view.” The name on the patent was David Petrou.
Although Petrou said that “we never questioned mobile phones,” others say the Goggles team always knew smartphones were not the ideal AR display. Users cannot hold up their phones all the time; they would rather have a device that is more comfortable to use, like a pair of glasses or contact lenses.
Google Glass was hyped like few other Google products. In 2012, co-founder Sergey Brin introduced it at the Google I/O conference, complete with Glass-wearing skydivers landing on the roof of the conference venue, and showed a video of the product. He promoted it again at the TED conference in 2013. Unfortunately, the technology didn’t work out.
“I think the momentum shifted to Project Glass,” said Nalawadi. Some of the team went to work on other projects, while others left Google. By 2014, no one was left to even update the app.
As Google gave up on Goggles, other companies picked up the idea. Snapchat launched in 2011 as a way to send disappearing messages, but quickly made the smartphone camera the centerpiece of the experience. Pinterest worked on turning images into search queries.
Google didn’t let the technology go to waste; in fact, it improved it. “We had this big step-function jump because of deep learning,” says Aparna Chennapragada, a senior director of product at Google. “The same step-function jump we got with voice, we started seeing in image search.” Thanks to the acquisition of DeepMind, investment in AI chips, and a company-wide shift toward AI, things improved for Google; Google Photos is a prime example.
In May 2017, at the I/O developer conference, Google CEO Sundar Pichai announced Goggles again, this time renamed Google Lens.
“Google Lens is a set of vision-based computing abilities that can understand what you’re looking at, and help you take action based on that information,” said Pichai.
He also gave demos of the product identifying flowers and connecting a phone to Wi-Fi from nothing more than a photo of the network name and password. What Goggles did in 2010, Lens does today, only much faster and much better.
Google took a long break between Goggles and Google Lens, but it is still not too late. Google’s AR technology has had a long, difficult journey, but the company learned its lessons and has come back with a product that is clearly excellent and built for the long term.