Program

Tuesday, July 10

8:50 AM - 9:50 AM

Keynote Presentation

Google Like Search of Image Databases and Videos: Andrew Zisserman

How can we retrieve images containing specific objects, scenes or people with the ease, speed and accuracy with which Google retrieves web pages containing specific words? This talk will give one answer to this question and show that it is possible to generalize the concept of text-based search to non-textual information. Starting from a visual query, images or shots containing an object can be retrieved from large scale image databases or videos.

In the visual case we are faced with a number of additional problems over those of text retrieval: the object in a target image may appear at a different viewpoint or under different illumination or at a different size to that specified in the query image. When retrieving people by specifying their face, their expressions may differ between the query and target images. We will show that solutions to these problems are available, and that video can facilitate greater automation.

We will demonstrate retrieval over a number of feature films and over a large scale database of Flickr images. The talk will conclude with a mention of some of the current challenges in taking these ideas further.

9:50 AM - 10:10 AM

Coffee Break

10:10 AM - 11:50 AM

Oral Session

Scalable Near Identical Image and Shot Detection: Ondra Chum (University of Oxford, United Kingdom); James Philbin (University of Oxford, United Kingdom); Michael Isard (Microsoft Research, USA); Andrew Zisserman (University of Oxford, United Kingdom)
Detection of Near-duplicate Images for Web Search: Jun Jie Foo (RMIT University, Australia); Justin Zobel (RMIT University, Australia); Ranjan Sinha (RMIT University, Australia); Seyed Tahaghoghi (RMIT University, Australia)
The Feature and Spatial Covariant Kernel: Adding Implicit Spatial Constraints to Histogram: Xiaobing Liu (Tsinghua University, P.R. China); Dong Wang (Tsinghua University, P.R. China); Jianmin Li (Tsinghua University, P.R. China); Bo Zhang (Tsinghua University, P.R. China)
New local descriptors based on dissociated dipoles: Alexis Joly (INRIA, France)
Using Multiple Segmentations for Image Auto-Annotation: Jiayu Tang (University of Southampton, United Kingdom); Paul Lewis (University of Southampton, United Kingdom)

11:50 AM - 1:20 PM

Lunch

1:20 PM - 2:40 PM

Oral Session

Probabilistic Model Supported Rank Aggregation for the Semantic Concept Detection in Video: Dayong Ding (Tsinghua University, P.R. China); Bo Zhang (Tsinghua University, P.R. China)
Cluster-based Data Modeling for Semantic Video Search: Jelena Tesic (IBM T. J. Watson Research Center, USA); Apostol Natsev (IBM T. J. Watson Research Center, USA); John Smith (IBM T. J. Watson Research Center, USA)
Video Search in Concept Subspace: A Text-Like Paradigm: Xirong Li (Tsinghua University, P.R. China); Dong Wang (Tsinghua University, P.R. China); Jianmin Li (Tsinghua University, P.R. China); Bo Zhang (Tsinghua University, P.R. China)
Semantic Video Analysis for Psychological Research on Violence in Computer Games: Markus Muehling (University of Marburg, Germany); Ralph Ewerth (University of Marburg, Germany); Thilo Stadelmann (University of Marburg, Germany); Bernd Freisleben (University of Marburg, Germany); Rene Weber (University of California Santa Barbara, USA); Klaus Mathiak (RWTH Aachen University, Germany)

2:40 PM - 10:45 PM

Social Event

The social event will take place at the Netherlands Institute for Sound and Vision. The institute was established in 1997 as the result of a merger between three large audiovisual archives and the Broadcast Museum. It holds substantial collections of radio and television programmes, documentaries, commercials, amateur films, photographs and music. The total archival holdings include materials dating from the earliest days of cinema right up to current news broadcasts. The size of the archive is estimated at 700,000 hours' worth of viewing and listening. The major collection currently held is that of the Dutch public broadcasters, which grows daily. On December 1st, 2006, the Netherlands Institute for Sound and Vision opened a futuristic building which houses the Beeld en Geluid Experience, a hands-on exhibition where the general public and schools can experience the position and the power the media hold in our society.

See New York Times Architecture Review: Heaven, Hell and Purgatory, Encased in Glass by Nicolai Ouroussoff.

2:40 PM - 4:15 PM

Bus Trip to Hilversum

4:15 PM - 5:30 PM

Best Paper Session

Information-Theoretic Semantic Multimedia Indexing: Joao Magalhaes (Imperial College London, United Kingdom); Stefan Rueger (The Open University, United Kingdom)
How many high-level concepts will fill the semantic gap in news video retrieval?: Alexander Hauptmann (Carnegie Mellon University, USA); Rong Yan (IBM T. J. Watson Research Center, USA); Wei-Hao Lin (Carnegie Mellon University, USA)
VISTO: VIsual STOryboard for Web Video Browsing: Marco Furini (University of Piemonte Orientale, Italy); Filippo Geraci (IIT-CNR, Italy); Manuela Montangero (University of Modena and Reggio Emilia, Italy); Marco Pellegrini (IIT-CNR, Italy)

5:30 PM - 7:00 PM

Image Retrieval Showcase

Video Copy Detection Evaluation Showcase

VideOlympics: Video Retrieval Showcase

Columbia University's Semantic Video Search Engine: Shih-Fu Chang (Dept. of E.E., Columbia Univ., USA); Lyndon Kennedy (Columbia University, USA); Eric Zavesky (Columbia University, USA)
FXPAL MediaMagic Video Search System: John Adcock (FX Palo Alto Laboratory, USA); Matthew Cooper (FX Palo Alto Laboratory, USA); Francine Chen (FX Palo Alto Laboratory, USA)
IBM Multimedia Search and Retrieval System: Apostol Natsev (IBM T. J. Watson Research Center, USA); Jelena Tesic (IBM T. J. Watson Research Center, USA); Lexing Xie (IBM Research, USA); Rong Yan (IBM T. J. Watson Research Center, USA); John Smith (IBM T. J. Watson Research Center, USA)
Carnegie Mellon University Traditional Informedia Digital Video Retrieval System: Michael Christel (Carnegie Mellon University, USA)
ITI Interactive Video Retrieval System: Stefanos Vrochidis (Informatics and Telematics Institute (ITI), Greece); Vasileios Mezaris (Informatics and Telematics Institute / CERTH, Greece); Ioannis Kompatsiaris (Informatics and Telematics Institute (ITI), Greece)
MediaMill: Semantic Video Search using the RotorBrowser: Ork de Rooij (University of Amsterdam, The Netherlands); Cees Snoek (University of Amsterdam, The Netherlands); Marcel Worring (University of Amsterdam, The Netherlands)
Video Search By Multi-modal and Clustering Analysis: Duy-Dinh Le (National Institute of Informatics, Japan); Fuminori Yamagishi (The University of Tokyo, Japan); Shin'ichi Satoh (National Institute of Informatics, Japan)
Active Learning Approach to Interactive Spatio-temporal News Video Retrieval: Huan-Bo Luan (Institute of Computing Technology, Chinese Academy of Sciences, P.R. China); Shi-Yong Neo (National University of Singapore, Singapore); Tat-Seng Chua (National University of Singapore, Singapore); Yan-Tao Zheng (National University of Singapore, Singapore); Sheng Tang (Institute of Computing Technology, Chinese Academy of Sciences, P.R. China); Yong-Dong Zhang (Institute of Computing Technology, Chinese Academy of Sciences, P.R. China); Jintao Li (Institute of Computing Technology, Chinese Academy of Sciences, P.R. China)
Video Retrieval with Multi-modal features: Jianmin Li (Tsinghua University, P.R. China); Zhi-Kun Wang (Tsinghua University, P.R. China); Xirong Li (Tsinghua University, P.R. China); Tongchun Xiao (Tsinghua University, P.R. China); Dong Wang (Tsinghua University, P.R. China); Zheng Wujie (Tsinghua University, P.R. China); Bo Zhang (Tsinghua University, P.R. China)
