Plenary Lectures

Brief Vita
Dr. Marc Schröder is a Senior Researcher at DFKI and the leader of the DFKI speech group. Since 1998 he has been responsible at DFKI for building up technology and research in TTS and emotion-oriented computing. Within the FP6 NoE HUMAINE, Schröder built up the scientific portal http://emotion-research.net, which won the Grand Prize for the best IST project website in 2006. He is Editor of the W3C Emotion Markup Language specification, Coordinator of the FP7 STREP SEMAINE, and project leader of the nationally funded basic research project PAVOQUE and the FP7 Network of Excellence SSPNet. Dr. Schröder has authored more than 50 scientific publications and serves on the programme committees of many conferences and workshops.
OpenMary Text-to-Speech (followed by tutorial)
14 July 2010, 9:00-11:00, Science Park FNWI Gebouw, C.1.112

In this lecture Dr. Schröder will present the key properties of the OpenMary Text-to-Speech system, an open-source, multilingual client-server platform for generating speech from text. OpenMary is written in 100% pure Java and currently supports British and American English, German, Turkish, and Telugu. Dr. Schröder will present both the runtime system and the toolkit for supporting new languages and building new voices. The plenary will present the concepts; the tutorial will give participants hands-on experience with the OpenMary system. Programming skills are useful but not strictly required for the tutorial.
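As a concrete illustration of the client-server design, the sketch below builds a synthesis request URL for the HTTP interface that MARY/OpenMary servers expose. The default port and the parameter names (`INPUT_TEXT`, `OUTPUT_TYPE`, etc.) follow the MARY 4.x convention but should be treated as assumptions and checked against the server version in use.

```python
from urllib.parse import urlencode

# Build a synthesis request for a locally running OpenMary server.
# Port 59125 and the parameter names below are assumptions based on
# the MARY 4.x HTTP interface; verify them against your installation.
def build_mary_request(text, locale="en_GB", host="localhost", port=59125):
    params = {
        "INPUT_TEXT": text,      # the text to be synthesised
        "INPUT_TYPE": "TEXT",    # plain text in
        "OUTPUT_TYPE": "AUDIO",  # synthesised speech out
        "AUDIO": "WAVE",         # requested audio container
        "LOCALE": locale,        # e.g. en_GB, en_US, de, tr, te
    }
    return f"http://{host}:{port}/process?" + urlencode(params)

url = build_mary_request("Hello world")
# With a server actually running, the URL could then be fetched with
# urllib.request.urlopen(url) to receive the WAVE audio.
```

Fetching the URL is left out so the sketch stands alone; the point is that any HTTP-capable client, in any language, can drive the server.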
Building emotion-oriented real-time interactive systems with the SEMAINE API (followed by tutorial)
15 July 2010, 9:00-11:00, Science Park FNWI Gebouw, C.1.112

The plenary talk will first describe the SEMAINE project and the Sensitive Artificial Listener system being built within it. Dr. Schröder will then zoom in on the system integration level, describing the SEMAINE API as a cross-platform component integration framework based on the message-oriented middleware ActiveMQ. By using standard representation formats such as SSML, EMMA, BML, and EmotionML wherever possible, the SEMAINE API aims to make it easy to build new emotion-oriented real-time interactive systems from both existing and new components with minimal integration overhead. The plenary will outline the concepts; the tutorial session will allow participants to build their own emotion-oriented systems using the SEMAINE API. Programming skills in Java or C++ are required to participate in the tutorial.
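The integration idea — components that never call each other directly but only exchange messages over named topics — can be illustrated with a minimal in-process publish/subscribe broker. This is a conceptual sketch only: the real SEMAINE API rides on ActiveMQ across processes and languages, and the topic name below is invented for illustration.

```python
from collections import defaultdict

class Broker:
    """Toy topic-based message broker standing in for real middleware."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # Register a component's callback for one topic.
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every subscriber of the topic.
        for callback in self._subscribers[topic]:
            callback(message)

# Two hypothetical components, coupled only through a topic name.
broker = Broker()
received = []
broker.subscribe("data.state.user", received.append)        # e.g. an interpreter
broker.publish("data.state.user", {"emotion": "interest"})  # e.g. an analyser
```

Because publisher and subscriber share only the topic name and the message format, either side can be replaced — or written in a different language — without touching the other, which is the integration overhead the SEMAINE API aims to minimise.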

Brief Vita
Hamid Aghajan
Stanford Univ.

Dr. Hamid Aghajan obtained his Ph.D. degree in Electrical Engineering from Stanford University in 1995 and has been a consulting professor of Electrical Engineering at Stanford since 2003. He supervises the Ambient Intelligence Research Lab and the Wireless Sensor Networks Lab at Stanford. His recent research topics include interfacing vision processing with artificial intelligence for semantic interpretation of user-object relationships, acquiring user behavior models and implicitly stated application preferences, and human gesture analysis for avatar-based representation as a means of social connectedness. Dr. Aghajan has chaired numerous conferences, is an editorial board member of the book series on Artificial Intelligence and Smart Environments by IOS Press, an associate editor of Machine Vision and Applications, and an organizer of short courses on Distributed Vision Processing in Smart Camera Networks at CVPR 2007, CVPR 2008, ACIVS 2007, and ICASSP 2009, among many others. He is also co-founder and co-editor-in-chief of the Journal of Ambient Intelligence and Smart Environments and has co-edited four volumes.
Ambient Intelligence: From Sensor Networks to Smart Environments and Social Networks
19 July 2010, 15:00-17:00, Science Park FNWI Gebouw, C.1.112

Vision offers rich information about events involving human activities in applications from gesture recognition to occupancy reasoning. Multi-camera vision allows for applications based on 3D perception and reconstruction, offers opportunities for collaborative decision making, and enables hybrid processing through task assignment to different cameras based on their views. This talk presents ideas and techniques for human-centric application development based on visual input. A number of applications in smart environments, ambient intelligence, and social network settings are discussed in which the vision processing task involving the recognition of user activities is linked with other processing modules in charge of higher-level interpretation or user behavior modeling. The notion of employing contextual data is examined through examples in which prior information can assist vision processing to function more effectively. Case studies in algorithm development for human pose analysis based on smart cameras are discussed to illustrate the relationship between application requirements and available processing resources. Context-aware and user-adaptive methods for light and ambience control services in smart homes, exercise monitoring and experience sharing using avatars, speaker assistance systems, and automated environment discovery based on user interaction will be presented as example applications.

Brief Vita
Anton Nijholt
Univ. of Twente

Dr. Anton Nijholt is a full professor of computer science at the University of Twente. Previously he held positions in the computer science departments of the University of Nijmegen, the Vrije Universiteit Amsterdam, McMaster University in Canada, and the Vrije Universiteit Brussel in Belgium. His main research interests are multi-party interaction, multimodal interaction, and entertainment computing. He coordinates the Human Media Interaction research group. In 1995-1996 he was a NIAS fellow in Wassenaar. He is presently also a scientific advisor to Philips Research Europe, in particular to the Experience Processing group of the Philips Lifestyle programme.

People as Content
21 July 2010, 15:00-17:00, Science Park FNWI Gebouw, C.0.110

In the first part of this talk we survey our research efforts on human-computer interaction: natural, affective and social interactions. The assumption is that sensor-equipped environments are able to detect, interpret and anticipate our intentions and feelings. This allows more natural interaction between humans and intelligent environments that support human activity. However, it also allows these environments to collect more information about their human partners than those partners may find desirable. Environments collect our lives; environments process our lives. In the second part of the presentation we look at situations where it is quite acceptable, or even desirable, that part of the intentions and feelings of an interacting partner remains hidden from the other. This can happen in everyday life, but also in sports and entertainment. Non-cooperative behavior is often more natural than cooperative behavior. We will also discuss the many useful applications of non-cooperative behavior, both from the point of view of a smart environment and from the point of view of the human partners, users, or inhabitants of smart environments.

Brief Vita
Leon Rothkrantz
Delft Technical Univ.

Dr. Leon J. M. Rothkrantz received the M.Sc. degree in mathematics from the University of Utrecht, the Ph.D. degree in mathematics from the University of Amsterdam, and the M.Sc. degree in psychology from the University of Leiden, in 1971, 1980, and 1990, respectively. He joined the Data and Knowledge Systems group of the Mediamatics Department, Delft University of Technology, as an Associate Professor in 1992. Since 2008 he has also been a full Professor of Sensor Technology at the Netherlands Defence Academy. His long-range research goal is the design and development of natural, context-aware, multimodal man-machine interfaces. His current research focuses on a wide range of issues, including lip-reading, speech recognition and synthesis, facial expression analysis and synthesis, multimodal information fusion, natural dialogue management, and human affective feedback recognition.
Surveillance by multimodal camera systems
28 July 2010, 15:00-17:00, Science Park FNWI Gebouw, C.0.110

Surveillance of public spaces is now widely used to monitor locations and the behavior of people in those areas. Closed-circuit television (CCTV) systems around the world monitor the safety of people in public spaces 24 hours a day. Since events such as the terrorist attacks in Madrid and London, there has been a further increase in the need for video network systems to guarantee the safety of people in public areas; such systems can also be used in cases of crisis or disaster. However, the greater the number of cameras, the more operators and supervisors are needed to monitor the video streams. A solution to this problem could be an automated surveillance system, but a fully automated surveillance system is currently not commercially available. Some software packages do exist, but they mostly record video streams and provide little further functionality. Behavior detection, motion detection, and human tracking are widely researched topics. A combination of these methods has been used in our project to develop an automated system capable of classifying human behavior and computing situational awareness. The results achieved today in object detection and motion tracking open up many opportunities for current and future applications, and both fields receive great attention in society. The ability to infer object identity, behavior, and position is being used in monitoring services, security applications, and information retrieval. We will discuss examples of crowd surveillance in trains, shopping malls, and public spaces.

Brief Vita
Josef Kittler
University of Surrey

Dr. Josef Kittler graduated in Electrical Engineering from the University of Cambridge in 1971, where he also obtained his Ph.D. in Pattern Recognition in 1974 and the Sc.D. degree in 1991. Professor Kittler heads the Centre for Vision, Speech and Signal Processing at the University of Surrey, U.K. He gained the title of Distinguished Professor in 2004. He has published more than 600 papers, co-authored a book, and serves on the editorial boards of several journals. He is a Series Editor of the Springer Lecture Notes in Computer Science. He is a fellow of the Royal Academy of Engineering, IAPR, and EURASIP. His current research interests include pattern recognition, image processing, and computer vision. Dr. Kittler will be giving a EURASIP Plenary Lecture at eNTERFACE.
Information fusion in content-based retrieval from multimedia databases
4 August 2010, 15:00-17:00, Science Park FNWI Gebouw, C.0.110

The retrieval of information from multimedia databases is a challenging problem because of the number of different concepts that may be of interest to the user and the multifaceted characteristics of each concept. The concept properties may span different sensing modalities and, within each modality, call for the use of a diverse set of features. Commonly, the retrieval problem is formulated as a detection problem (a two-class pattern recognition problem), whereby the content of interest is looked for in the multimedia material and discriminated from the anti-concept class. The detectors are designed to capture the different manifestations of each concept class (colour, texture, shape, sounds). The design process is often hampered by small-sample-set and class-imbalance problems. The nature of the retrieval problem raises issues in information fusion. Both feature-level and decision-level fusion provide useful mechanisms for tackling different aspects of the concept detector design process. At the feature level, fusion is often accomplished with multi-kernel machine learning methods. The key question in this approach is how to weigh the contributions of the respective kernels. The weight allocation is normally controlled by regularisation. We discuss the effect of different norms on weight assignment. The findings lead to a two-stage machine learning strategy in which the first stage serves simply as a means to eliminate non-informative kernels. In contrast, decision-level fusion is adopted for dealing with the class population imbalance problem. We show that by extreme undersampling of the negative (anti-concept) class we can create a large number of weak classifiers, whose fusion has the capacity to improve retrieval performance. The techniques discussed are evaluated on standard benchmark databases, including the PASCAL VOC 08 image data set and the Mediamill Challenge video database, based on the NIST TRECVID 2005 benchmark.
Performance is measured using average precision, which combines precision and recall into a single figure. The benefits of the various fusion mechanisms are demonstrated.
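The decision-level strategy described in the abstract — many weak classifiers, each trained on the full positive set plus an extreme undersample of the negative class, then fused — can be sketched on toy one-dimensional data. This is an illustration of the idea with a simple threshold stump and majority-vote fusion, not the authors' implementation:

```python
import random
import statistics

def train_stump(pos, neg):
    """Weak classifier: threshold at the midpoint of the class means."""
    threshold = (statistics.mean(pos) + statistics.mean(neg)) / 2
    return lambda x: 1 if x > threshold else 0

def undersampled_ensemble(pos, neg, n_models=11, seed=0):
    """Fuse weak classifiers built on balanced undersamples of `neg`."""
    rng = random.Random(seed)
    # Each model sees all positives and only len(pos) random negatives,
    # i.e. an extreme undersample of the majority (anti-concept) class.
    models = [train_stump(pos, rng.sample(neg, len(pos)))
              for _ in range(n_models)]
    # Decision-level fusion: simple majority vote over the ensemble.
    return lambda x: 1 if sum(m(x) for m in models) > n_models / 2 else 0

# Toy data: 3 positives vs 100 negatives (severe class imbalance).
pos = [8.0, 9.0, 10.0]
neg = [float(i % 5) for i in range(100)]
detect = undersampled_ensemble(pos, neg)
```

Each undersample yields a balanced training set, so no weak learner is swamped by the negative class, and the vote aggregates the many views of the negative population.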
