Human-centered Vision Systems:
------------------------------
Ideas for enabling Smart Environments, Ambient Intelligence, and Social Networks

Hamid Aghajan, Stanford University
aghajan@stanford.edu

Vision offers rich information about events involving human activities in applications ranging from gesture recognition to occupancy reasoning. Multi-camera vision allows for applications based on 3D perception and reconstruction, offers opportunities for collaborative decision making, and enables hybrid processing through task assignment to different cameras based on their views.

Smart Environments and Ambient Intelligence
-------------------------------------------
Smart environments are spaces that sense, perceive, and react to the presence, commands, or observed events of their occupants through a variety of interfaces, offering services such as multimedia, home control, and pervasive communications, as well as accident detection and well-being applications. The notion of ambient intelligence refers to endowing such systems with unobtrusive and intuitive interfaces, as well as with mechanisms to learn and adapt to the behavior models and preferences of their users, in order to offer context-aware, customized services tailored to each user's needs.

User-centric Design and Social Networks
---------------------------------------
A user-centric design paradigm for vision-based applications considers user acceptance and the social aspects of the intended solution as part of the design effort. Adaptation to the user's preferences and behavior model, seamless and intuitive interfaces, automated setup and configuration, ease of use, awareness of context, and responsiveness to the user's privacy options are among the attributes of a user-centric design.
Novel opportunities in application development for smart homes, offices, seminar rooms, automotive settings, health-care and well-being domains, and experience sharing in social networks are enabled by employing user-centric approaches in vision-based development.

Context-based Processing
------------------------
Interpreting an event or a scene from visual data often requires additional contextual information, which may be obtained from different sources. Generally speaking, in an interactive system the sources of context can be loosely categorized as environmental context and user-centric context. Environmental context refers to information derived from domain knowledge or from concurrently sensed effects in the area of operation. User-centric context refers to information obtained and accumulated from the user. Both types of context can include static or dynamic elements.

Interfacing Vision and Other Layers
-----------------------------------
In addition to the inherent complexities of vision processing stemming from perspective views and occlusions, setup and calibration requirements have challenged the creation of meaningful applications that can operate in uncontrolled environments. Moreover, the study of user acceptance criteria such as privacy management, and of the implications of visual ambient communication, has for the most part stayed outside the realm of technology design, further hindering the roll-out of vision-based applications in spite of the available sensing, processing, and networking technologies. The output of visual processing often consists of instantaneous measurements such as location and pose, enabling the vision module to supply quantitative knowledge to higher levels of reasoning. The extracted information is not always flawless and often needs further interpretation at a data fusion level.
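As a minimal sketch of what interpretation at a data fusion level can look like, the following combines noisy per-camera location measurements by weighting each with a confidence value. The function name, camera reports, and all numbers are hypothetical, introduced only for illustration; they are not taken from the talk.

```python
# Hypothetical sketch: fusing per-camera location estimates at a data-fusion
# level, weighting each measurement by its confidence. All values illustrative.

def fuse_locations(measurements):
    """Confidence-weighted average of (x, y, confidence) measurements."""
    total = sum(c for _, _, c in measurements)
    if total == 0:
        return None  # no usable measurement from any camera
    x = sum(xi * c for xi, _, c in measurements) / total
    y = sum(yi * c for _, yi, c in measurements) / total
    return (x, y)

# Three cameras report the same person's position with varying confidence,
# e.g. due to occlusion or an oblique perspective in some views.
reports = [(2.0, 3.0, 0.9), (2.4, 3.2, 0.6), (5.0, 1.0, 0.1)]
print(fuse_locations(reports))
```

The low-confidence outlier from the occluded view barely shifts the fused estimate, which stays close to the two high-confidence reports.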
Furthermore, while quantitative knowledge is essential in many smart environment applications such as gesture control and accident detection, most ambient intelligence applications must also draw on qualitative knowledge accumulated over time in order to learn the user's behavior models and adapt their services to the preferences explicitly or implicitly stated by the user. Proper interfacing of vision to high-level reasoning allows information arriving at different times and from different cameras to be integrated, and interpreted at the application level according to the associated confidence levels, the available contextual data, and the knowledge base accumulated from the user's history and behavior model.

Smart Camera Networks
---------------------
Designing scalable, network-based applications that employ high-bandwidth data such as multi-source video calls for a shift in processing methodology from centralized to distributed methods. In a smart camera network, local processing reduces the acquired video sequence to meta-data that can be transmitted to a server for data or decision fusion. Besides enabling scalable networks, smart cameras can also offer a solution for applications in which the privacy of the user is a priority.

The Talk
--------
This talk presents ideas for human-centric application development based on visual input. A number of applications in smart environments, ambient intelligence, and social network settings are discussed in which the vision processing task of recognizing user activities is linked with other processing modules in charge of higher-level interpretation or user behavior modeling. The notion of employing contextual data is examined through examples in which prior information can assist vision processing to function more effectively.
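One simple way prior information can assist vision processing is a Bayesian combination of a contextual prior over activities with a visual classifier's likelihoods. The sketch below is only an illustration of that idea; the activity labels and all probabilities are hypothetical, not examples from the talk.

```python
# Hypothetical sketch of context-assisted vision: combine a contextual prior
# over activities with visual likelihoods via a normalized elementwise product.
# Activity names and probabilities are illustrative only.

def posterior(prior, likelihood):
    """Bayes rule over a discrete set of activity hypotheses."""
    unnorm = {a: prior[a] * likelihood[a] for a in prior}
    z = sum(unnorm.values())
    return {a: p / z for a, p in unnorm.items()}

# Environmental context (e.g. evening, lights dimmed) favors "watching_tv";
# the visual evidence alone is ambiguous between two activities.
prior = {"watching_tv": 0.6, "exercising": 0.1, "reading": 0.3}
likelihood = {"watching_tv": 0.5, "exercising": 0.45, "reading": 0.05}
print(posterior(prior, likelihood))
```

With the contextual prior folded in, the ambiguous visual evidence resolves clearly in favor of the activity the context makes plausible.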
Case studies in algorithm development for human pose analysis with smart cameras are discussed to illustrate the relationship between application requirements and available processing resources. Context-aware and user-adaptive methods for light and ambience control in smart homes, exercise monitoring and experience sharing using avatars, speaker assistance systems, and automated environment discovery based on user interaction are presented as example applications.