Session: Video "Smart" Viewers

EgoScanning: Quickly Scanning First-Person Videos with Egocentric Elastic Timelines

Paper URL: http://dl.acm.org/citation.cfm?doid=3025453.3025821

Paper abstract: This work presents EgoScanning, a novel video fast-forwarding interface that helps users to find important events from lengthy first-person videos recorded with wearable cameras continuously. This interface is featured by an elastic timeline that adaptively changes playback speeds and emphasizes egocentric cues specific to first-person videos, such as hand manipulations, moving, and conversations with people, based on computer-vision techniques. The interface also allows users to input which of such cues are relevant to events of their interests. Through our user study, we confirm that users can find events of interests quickly from first-person videos thanks to the following benefits of using the EgoScanning interface: 1) adaptive changes of playback speeds allow users to watch fast-forwarded videos more easily; 2) Emphasized parts of videos can act as candidates of events actually significant to users; 3) Users are able to select relevant egocentric cues depending on events of their interests.

Summary:

Proposes an interface that supports fast browsing of first-person videos. When the user selects cues of interest, the corresponding scenes are played back at low speed while the remaining scenes are fast-forwarded. A user study confirmed that the proposed interface is useful.
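
As a rough illustration of the elastic-timeline idea, the sketch below maps per-frame egocentric cue scores and the user's selected cues to per-frame playback speeds. The cue names, scores, speed values, and threshold are assumptions for illustration, not the authors' implementation.

def playback_speeds(cue_scores, selected_cues, slow=1.0, fast=10.0, threshold=0.5):
    """cue_scores: one dict per frame, e.g. {"hand": 0.8, "moving": 0.1, "conversation": 0.0}.
       selected_cues: cues the user marked as relevant to the events of interest.
       Returns a playback-speed multiplier per frame."""
    speeds = []
    for frame in cue_scores:
        relevant = max((frame.get(c, 0.0) for c in selected_cues), default=0.0)
        # Emphasized frames (strong relevant cue) play near normal speed;
        # everything else is fast-forwarded.
        speeds.append(slow if relevant >= threshold else fast)
    return speeds

# Example: the user cares about hand manipulations only.
frames = [{"hand": 0.9, "moving": 0.2}, {"hand": 0.1, "moving": 0.8}]
print(playback_speeds(frames, ["hand"]))  # [1.0, 10.0]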

Retargeting Video Tutorials Showing Tools With Surface Contact to Augmented Reality

Paper URL: http://dl.acm.org/citation.cfm?doid=3025453.3025688

Paper abstract: A video tutorial effectively conveys complex motions, but may be hard to follow precisely because of its restriction to a predetermined viewpoint. Augmented reality (AR) tutorials have been demonstrated to be more effective. We bring the advantages of both together by interactively retargeting conventional, two-dimensional videos into three-dimensional AR tutorials. Unlike previous work, we do not simply overlay video, but synthesize 3D-registered motion from the video. Since the information in the resulting AR tutorial is registered to 3D objects, the user can freely change the viewpoint without degrading the experience. This approach applies to many styles of video tutorials. In this work, we concentrate on a class of tutorials which alter the surface of an object.

Summary:

Uses AR to render the two-dimensional information from a tutorial video onto a three-dimensional object. The approach was applied to tutorial videos such as makeup and kanji handwriting, and its effectiveness was demonstrated.
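
Only as a hedged sketch of the general re-registration idea (not the paper's method): assuming the tool-surface contact points have already been tracked in the 2D video and a homography H from the video image plane to the tracked target surface is known, the points can be mapped onto that surface. H and the tracked points below are placeholders.

import numpy as np

def map_to_surface(points_2d, H):
    """points_2d: (N, 2) pixel positions of tool-surface contact in the video.
       H: 3x3 homography from video image coordinates to surface coordinates.
       Returns (N, 2) positions on the target surface plane."""
    pts = np.hstack([points_2d, np.ones((len(points_2d), 1))])  # homogeneous coords
    mapped = (H @ pts.T).T
    return mapped[:, :2] / mapped[:, 2:3]  # dehomogenize

H = np.eye(3)  # placeholder homography
print(map_to_surface(np.array([[120.0, 80.0]]), H))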

Close to the Action: Eye-Tracking Evaluation of Speaker-Following Subtitles

Paper URL: http://dl.acm.org/citation.cfm?doid=3025453.3025772

Paper abstract: The incorporation of subtitles in multimedia content plays an important role in communicating spoken content. For example, subtitles in the respective language are often preferred to expensive audio translation of foreign movies. The traditional representation of subtitles displays text centered at the bottom of the screen. This layout can lead to large distances between text and relevant image content, causing eye strain and even that we miss visual content. As a recent alternative, the technique of speaker-following subtitles places subtitle text in speech bubbles close to the current speaker. We conducted a controlled eye-tracking laboratory study (n = 40) to compare the regular approach (center-bottom subtitles) with content-sensitive, speaker-following subtitles. We compared different dialog-heavy video clips with the two layouts. Our results show that speaker-following subtitles lead to higher fixation counts on relevant image regions and reduce saccade length, which is an important factor for eye strain.

Summary:

When subtitling foreign-language films, placing subtitles near the speaker reduces the eye strain and missed visual content caused by the distance between the subtitles and the image content. Compared with conventional subtitles placed at the bottom center, the approach reduced saccade length, an indicator of eye strain.
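
For reference, mean saccade length can be approximated as the average Euclidean distance between consecutive fixation positions. The minimal sketch below assumes a simple list of fixation coordinates and is not the study's analysis code.

import math

def mean_saccade_length(fixations):
    """fixations: list of (x, y) gaze positions of consecutive fixations."""
    if len(fixations) < 2:
        return 0.0
    dists = [math.dist(a, b) for a, b in zip(fixations, fixations[1:])]
    return sum(dists) / len(dists)

# Shorter saccades (as reported for speaker-following subtitles) mean less eye travel.
print(mean_saccade_length([(100, 500), (120, 480), (400, 900)]))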

Responsive Action-based Video Synthesis

Paper URL: http://dl.acm.org/citation.cfm?doid=3025453.3025880

Paper abstract: We propose technology to enable a new medium of expression, where video elements can be looped, merged, and triggered, interactively. Like audio, video is easy to sample from the real world, but hard to segment into clean reusable elements. Reusing a video clip means non-linear editing, and compositing with novel footage. The new context dictates how carefully a clip must be prepared, so our end-to-end approach enables previewing and easy iteration. We convert static-camera videos into loopable sequences, synthesizing them in response to simple end-user requests. This is hard because a) users want essentially semantic-level control over the synthesized video content, and b) automatic loop-finding is brittle and leaves users limited opportunity to work through problems. We propose a human-in-the-loop system where adding effort gives the user progressively more creative control. Artists help us evaluate how our trigger interfaces can be used for authoring of videos and video-performances.

Summary:

Proposes a new interactive medium of expression in which videos are segmented into elements that can be looped, merged, and triggered. The system was evaluated by having artists actually use it.
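
As a hedged illustration of what naive automatic loop-finding looks like (and why it can be brittle), the sketch below picks the pair of frames with the smallest average pixel difference as a loop candidate. The frame data and minimum loop length are placeholders, and this is not the authors' algorithm.

import numpy as np

def find_loop(frames, min_length=10):
    """frames: list of equally sized numpy arrays (grayscale frames).
       Returns (start, end) indices of the best loop candidate."""
    best, best_cost = None, float("inf")
    for i in range(len(frames)):
        for j in range(i + min_length, len(frames)):
            # Lower cost means a less visible seam when jumping from frame j back to frame i.
            cost = np.mean(np.abs(frames[i].astype(float) - frames[j].astype(float)))
            if cost < best_cost:
                best, best_cost = (i, j), cost
    return best

frames = [np.random.rand(4, 4) for _ in range(30)]  # placeholder frames
print(find_loop(frames))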