Human-centered Vision Systems:
------------------------------
Ideas for enabling Smart Environments, Ambient Intelligence, and Social Networks

Hamid Aghajan, Stanford University
aghajan@stanford.edu

Vision offers rich information about events involving human activities in applications ranging from gesture recognition to occupancy reasoning. Multi-camera vision allows for applications based on 3D perception and reconstruction, offers opportunities for collaborative decision making, and enables hybrid processing through task assignment to different cameras based on their views.

Smart Environments and Ambient Intelligence
-------------------------------------------
Smart environments are spaces that sense, perceive, and react to the presence, commands, or observed events of their occupants through a variety of interfaces, offering services such as multimedia, home control, and pervasive communications, as well as accident detection and well-being applications. The notion of ambient intelligence refers to endowing such systems with unobtrusive and intuitive interfaces, as well as with mechanisms to learn and adapt to the behavior models and preferences of their users, in order to offer context-aware, customized services tailored to each user's needs.

User-centric Design and Social Networks
---------------------------------------
A user-centric design paradigm for vision-based applications considers user acceptance and the social aspects of the intended solution as part of the design effort. Adaptation to the user's preferences and behavior model, seamless and intuitive interfaces, automated setup and configuration, ease of use, awareness of context, and responsiveness to the user's privacy options are among the attributes of a user-centric design.
Novel opportunities in application development for smart homes, offices, seminar rooms, automotive settings, health-care and well-being domains, and experience sharing in social networks are enabled by employing user-centric approaches in vision-based development.

Context-based Processing
------------------------
Interpreting an event or a scene from visual data often requires additional contextual information, which may be obtained from different sources. Generally speaking, in an interactive system the sources of context can be loosely categorized as environmental context and user-centric context. Environmental context refers to information derived from domain knowledge or from concurrently sensed effects in the area of operation. User-centric context refers to information obtained and accumulated from the user. Both types of context can include static or dynamic elements.

Interfacing Vision and Other Layers
-----------------------------------
In addition to the inherent complexities of vision processing stemming from perspective views and occlusions, setup and calibration requirements have challenged the creation of meaningful applications that can operate in uncontrolled environments. Moreover, the study of user acceptance criteria such as privacy management, and of the implications of visual ambient communication, has for the most part stayed outside the realm of technology design, further hindering the roll-out of vision-based applications in spite of the available sensing, processing, and networking technologies. The output of visual processing often consists of instantaneous measurements such as location and pose, enabling the vision module to supply quantitative knowledge to higher levels of reasoning. The extracted information is not always flawless and often needs further interpretation at a data fusion level.
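As a minimal sketch of what interpretation at a data fusion level can look like, the following combines noisy per-camera location measurements by weighting each with a confidence value. The function name, camera reports, and all numbers are hypothetical, introduced only for illustration; they are not taken from the talk.

```python
# Hypothetical sketch: fusing per-camera location estimates at a data-fusion
# level, weighting each measurement by its confidence. All values illustrative.

def fuse_locations(measurements):
    """Confidence-weighted average of (x, y, confidence) measurements."""
    total = sum(c for _, _, c in measurements)
    if total == 0:
        return None  # no usable measurement from any camera
    x = sum(xi * c for xi, _, c in measurements) / total
    y = sum(yi * c for _, yi, c in measurements) / total
    return (x, y)

# Three cameras report the same person's position with varying confidence,
# e.g. due to occlusion or an oblique perspective in some views.
reports = [(2.0, 3.0, 0.9), (2.4, 3.2, 0.6), (5.0, 1.0, 0.1)]
print(fuse_locations(reports))
```

The low-confidence outlier from the occluded view barely shifts the fused estimate, which stays close to the two high-confidence reports.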
Furthermore, while quantitative knowledge is essential in many smart environment applications such as gesture control and accident detection, most ambient intelligence applications must also draw on qualitative knowledge accumulated over time in order to learn the user's behavior models and adapt their services to the preferences explicitly or implicitly stated by the user. Proper interfacing of vision to high-level reasoning allows information arriving at different times and from different cameras to be integrated, and interpreted at the application level according to the associated confidence levels, the available contextual data, and the knowledge base accumulated from the user's history and behavior model.

Smart Camera Networks
---------------------
Designing scalable, network-based applications that employ high-bandwidth data such as multi-source video calls for a shift in processing methodology from centralized to distributed methods. In a smart camera network, local processing reduces the acquired video sequence to meta-data that can be transmitted to a server for data or decision fusion. Besides enabling scalable networks, smart cameras can also offer a solution for applications in which the privacy of the user is a priority.

The Talk
--------
This talk presents ideas for human-centric application development based on visual input. A number of applications in smart environments, ambient intelligence, and social network settings are discussed in which the vision processing task of recognizing user activities is linked with other processing modules in charge of higher-level interpretation or user behavior modeling. The notion of employing contextual data is examined through examples in which prior information can assist vision processing to function more effectively.
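One simple way prior information can assist vision processing is a Bayesian combination of a contextual prior over activities with a visual classifier's likelihoods. The sketch below is only an illustration of that idea; the activity labels and all probabilities are hypothetical, not examples from the talk.

```python
# Hypothetical sketch of context-assisted vision: combine a contextual prior
# over activities with visual likelihoods via a normalized elementwise product.
# Activity names and probabilities are illustrative only.

def posterior(prior, likelihood):
    """Bayes rule over a discrete set of activity hypotheses."""
    unnorm = {a: prior[a] * likelihood[a] for a in prior}
    z = sum(unnorm.values())
    return {a: p / z for a, p in unnorm.items()}

# Environmental context (e.g. evening, lights dimmed) favors "watching_tv";
# the visual evidence alone is ambiguous between two activities.
prior = {"watching_tv": 0.6, "exercising": 0.1, "reading": 0.3}
likelihood = {"watching_tv": 0.5, "exercising": 0.45, "reading": 0.05}
print(posterior(prior, likelihood))
```

With the contextual prior folded in, the ambiguous visual evidence resolves clearly in favor of the activity the context makes plausible.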
Case studies in algorithm development for human pose analysis with smart cameras are discussed to illustrate the relationship between application requirements and available processing resources. Context-aware and user-adaptive methods for light and ambience control in smart homes, exercise monitoring and experience sharing using avatars, speaker assistance systems, and automated environment discovery based on user interaction are presented as example applications.