The Possibilities of Ubiquitous Video Streams – AI Everywhere. Are you ready?

The idea of video cameras everywhere used to conjure up thoughts of police states or 1984. Today, however, most of us have at least two cameras at the ready: one on our phone and likely another, such as a camera monitoring the interior of our home or overlooking our doorstep. Cameras are everywhere, and the tech giants have been investing huge sums into making this technology cheap, accessible, and ubiquitous.

In March, Amazon announced it was acquiring Ring, the video doorbell company. Several years earlier, Google acquired Nest, which then acquired Dropcam and brought it into the fold. Together, the two deals represent billions of dollars invested in developing both the hardware and the infrastructure to support large-scale video recording and analysis. Over the same period, dozens of alternative products have come to market, such as the WeMo NetCam, Netgear Arlo, and Canary. The cameras on other devices, such as the Echo Show and the JIBO, can also double as cameras for the home.

With so much video data being streamed, the question becomes: what’s possible when you combine multiple streams with some of the latest AI technologies? What can consumers expect of these devices over the next 2-3 years, and what considerations, especially around privacy, will we need to resolve?

Large advances in hardware, coupled with new means of processing video, have driven costs down sharply over the years while the capabilities of these devices have grown just as quickly. As the bandwidth, latency, and congestion issues of wireless networks are addressed, 4K, 60 FPS video can be streamed without concern about a grainy image or buffering.

Behind the scenes, computer vision has become commoditized, with more service providers offering the technology and more functionality being exposed to developers. New edge computing technology may deliver the benefits of computer vision AI with the security of local processing.

What can you do today?

In the home, the primary placements of cameras likely include:

Baby cams to check on infants and toddlers
Doorbell cameras that face out onto a front porch
Outdoor cameras looking at backyards
Indoor cameras looking at entry ways

Most of the cameras on the market today can stream video to a phone or desktop, back up the video online, take time-lapse images, and push voice out through the camera’s speaker. Some also send alerts by app notification, email, or text message for event triggers such as motion or the identification of a person.

With these features alone, you can already do a lot:

Know if a package has been delivered
See if anyone is home or has come / gone
Check if the surrounding area is safe
Get a sense of the environment remotely (e.g. is there light inside yet)
Provide voice communication to someone in the area
Check if an infant child is sleeping / safe
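The motion-based alerts mentioned above are often explained in terms of frame differencing: compare two consecutive frames and raise an alert when enough pixels have changed. The sketch below is only an illustration of that idea, not how any particular camera implements it. The function names, the grayscale list-of-lists frame format, and both thresholds are assumptions chosen for the example.

```python
from typing import List

Frame = List[List[int]]  # rows of grayscale pixel values, 0-255 (assumed format)


def changed_fraction(prev: Frame, curr: Frame, pixel_delta: int = 25) -> float:
    """Return the fraction of pixels whose brightness changed by more than pixel_delta."""
    total = 0
    changed = 0
    for prev_row, curr_row in zip(prev, curr):
        for a, b in zip(prev_row, curr_row):
            total += 1
            if abs(a - b) > pixel_delta:
                changed += 1
    return changed / total if total else 0.0


def motion_detected(prev: Frame, curr: Frame, min_fraction: float = 0.02) -> bool:
    """Flag motion when at least min_fraction of the pixels changed (hypothetical threshold)."""
    return changed_fraction(prev, curr) >= min_fraction


# Toy example: an empty 8x8 "porch", then the same view with a bright 3x3 "package".
empty_porch = [[10] * 8 for _ in range(8)]
package_on_porch = [row[:] for row in empty_porch]
for y in range(2, 5):
    for x in range(2, 5):
        package_on_porch[y][x] = 200

if motion_detected(empty_porch, package_on_porch):
    print("Motion alert: something changed on the porch")
```

A real camera would also need to cope with lighting changes, noise, and compression artifacts, which this sketch ignores.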

However, when you start to add more cameras and combine them with AI, you can extract a lot more information about the environment.

Google, Amazon, Microsoft, and IBM, among others, now offer computer vision APIs that even a novice developer can use to add remarkably capable functionality. These include the ability to:

Identify an object
Identify a person
Understand logos
Extract text
Determine “inappropriate content”
Transcribe the video
Identify handwriting
Identify smiles
Identify emotion
Estimate people’s ages
Identify gestures
Identify foods

There is a lot of overlap among the service providers. Today these services are still too expensive for continuous use (they cost pennies per minute), but the price will likely drop to pennies per hour or per day over the next few years. To put that in perspective, at a hypothetical rate of two cents per minute, a single camera analyzed around the clock would cost about $28.80 per day. Even with only this capability, it’s already possible to extend the applications available on today’s webcams, such as:

Tracking a user from room to room
Logging when someone arrives home or leaves
Tracking the emotion of different people in frame throughout the day
Keeping a record of what we’re talking about
Tracking visitors to the home
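As a rough illustration of the arrival and departure logging in the list above, the sketch below turns per-frame person identifications into "arrived" and "left" events. It assumes some vision service has already labeled who appears in each frame; the labels, the `presence_events` function, and the 60-second grace period are all hypothetical and not part of any vendor's API.

```python
from typing import Dict, Iterable, List, Set, Tuple

Observation = Tuple[float, Set[str]]  # (timestamp in seconds, labels seen in that frame)
Event = Tuple[float, str, str]        # (timestamp, label, "arrived" or "left")


def presence_events(observations: Iterable[Observation],
                    absence_seconds: float = 60.0) -> List[Event]:
    """Convert time-ordered per-frame labels into arrival and departure events."""
    last_seen: Dict[str, float] = {}
    events: List[Event] = []
    for timestamp, labels in observations:
        # Anyone missing for longer than the grace period is treated as having
        # left at the moment they were last seen.
        for label in sorted(last_seen):
            if label not in labels and timestamp - last_seen[label] > absence_seconds:
                events.append((last_seen[label], label, "left"))
                del last_seen[label]
        # Anyone seen now who was not being tracked has just arrived.
        for label in sorted(labels):
            if label not in last_seen:
                events.append((timestamp, label, "arrived"))
            last_seen[label] = timestamp
    # Departures are back-dated, so restore chronological order (stable sort).
    events.sort(key=lambda event: event[0])
    return events


# Made-up observations from a single entryway camera.
observations = [
    (0.0, {"alice"}),
    (30.0, {"alice", "bob"}),
    (60.0, {"bob"}),
    (150.0, {"bob"}),
    (400.0, set()),
    (500.0, set()),
]
events = presence_events(observations)
for timestamp, label, kind in events:
    print(f"{timestamp:>6.1f}s  {label} {kind}")
```

The grace period keeps a person who is briefly out of frame from being logged as leaving and arriving again. People still in view when the stream ends are not reported as having left.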

Today, this is achievable without needing to develop new technologies. What’s coming next will reshape how we adopt these devices.

The Next Five Years

The next generation of in-home cameras will combine advanced embedded AI features with a highly reliable connection to online processing. We’ll see Alexa, Google Assistant, and Bixby, among others, embedded into these products, giving them the capability to understand what’s happening around us. Maybe we’ll become more comfortable with the idea of live streaming inside our homes if the benefits are substantial.

Original blog post by:

Leor Grebler

http://www.ucic.io
