Session: "Talk to me; Text me"

Patterns for How Users Overcome Obstacles in Voice User Interfaces

Paper URL: http://dl.acm.org/citation.cfm?doid=3173574.3173580

Paper abstract: Voice User Interfaces (VUIs) are growing in popularity. However, even the most current VUIs regularly cause frustration for their users. Very few studies exist on what people do to overcome VUI problems they encounter, or how VUIs can be designed to aid people when these problems occur. In this paper, we analyze empirical data on how users (n=12) interact with our VUI calendar system, DiscoverCal, over three sessions. In particular, we identify the main obstacle categories and types of tactics our participants employ to overcome them. We analyzed the patterns of how different tactics are used in each obstacle category. We found that while NLP Error obstacles occurred the most, other obstacles are more likely to frustrate or confuse the user. We also found patterns that suggest participants were more likely to employ a "guessing" approach rather than rely on visual aids or knowledge recall.

Summary:

An analysis of voice user interface (VUI) use. Through an experiment with a VUI calendar system, the authors classify the obstacles that arise during VUI use into four categories and identify the tactics users employed to overcome them.
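
As an aside, the core analysis (which tactics appear in which obstacle category) boils down to a cross-tabulation. Below is a minimal Python sketch, assuming a hypothetical log of (obstacle category, tactic) pairs; aside from "NLP Error", which the abstract names, the category and tactic labels are illustrative, not the paper's actual coding scheme.

    from collections import Counter, defaultdict

    # Hypothetical coded observations: (obstacle_category, tactic) pairs.
    # Labels other than "NLP Error" are illustrative placeholders.
    observations = [
        ("NLP Error", "rephrase"),
        ("NLP Error", "simplify"),
        ("System Error", "restart"),
        ("NLP Error", "rephrase"),
        ("Ambiguity", "guess"),
    ]

    # Tally how often each tactic is used within each obstacle category.
    tactics_by_category = defaultdict(Counter)
    for category, tactic in observations:
        tactics_by_category[category][tactic] += 1

    for category, tactics in tactics_by_category.items():
        total = sum(tactics.values())
        for tactic, n in tactics.most_common():
            print(f"{category}: {tactic} {n}/{total} ({n/total:.0%})")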

DeepWriting: Making Digital Ink Editable via Deep Generative Modeling

Paper URL: http://dl.acm.org/citation.cfm?doid=3173574.3173779

Paper abstract: Digital ink promises to combine the flexibility and aesthetics of handwriting and the ability to process, search and edit digital text. Character recognition converts handwritten text into a digital representation, albeit at the cost of losing personalized appearance due to the technical difficulties of separating the interwoven components of content and style. In this paper, we propose a novel generative neural network architecture that is capable of disentangling style from content and thus making digital ink editable. Our model can synthesize arbitrary text, while giving users control over the visual appearance (style). For example, it allows for style transfer without changing the content, editing of digital ink at the word level, and other application scenarios such as spell-checking and correction of handwritten text. We furthermore contribute a new dataset of handwritten text with fine-grained annotations at the character level and report results from an initial user evaluation.

Summary:

A deep-neural-network approach to editable digital ink. By modeling handwritten text with separate latent variables for content and style, the system gives users control over the style of arbitrary text and enables spell-checking and correction of handwriting.
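
The key architectural idea is to encode handwriting into two separate latent codes, one for what is written (content) and one for how it looks (style), and to recombine them at decoding time. Here is a minimal PyTorch sketch of that separation, assuming a GRU over (dx, dy, pen-up) stroke triples; the stroke encoding and layer sizes are assumptions for illustration, not the paper's actual architecture.

    import torch
    import torch.nn as nn

    class HandwritingModel(nn.Module):
        """Sketch: split a stroke sequence into content and style latents."""
        def __init__(self, stroke_dim=3, content_dim=32, style_dim=16, hidden=128):
            super().__init__()
            # Encode a stroke sequence (dx, dy, pen-up) into two latents.
            self.encoder = nn.GRU(stroke_dim, hidden, batch_first=True)
            self.to_content = nn.Linear(hidden, content_dim)
            self.to_style = nn.Linear(hidden, style_dim)
            # Decode the combined latent back into a stroke sequence.
            self.decoder = nn.GRU(content_dim + style_dim, hidden, batch_first=True)
            self.to_stroke = nn.Linear(hidden, stroke_dim)

        def forward(self, strokes):
            _, h = self.encoder(strokes)             # h: (layers, batch, hidden)
            content = self.to_content(h[-1])         # (batch, content_dim)
            style = self.to_style(h[-1])             # (batch, style_dim)
            z = torch.cat([content, style], dim=-1)  # combined latent
            steps = strokes.size(1)
            out, _ = self.decoder(z.unsqueeze(1).expand(-1, steps, -1))
            return self.to_stroke(out), content, style

    # Style transfer = decode one sample's content with another's style.
    model = HandwritingModel()
    a = torch.randn(1, 50, 3)  # strokes of writer A
    b = torch.randn(1, 50, 3)  # strokes of writer B
    _, content_a, _ = model(a)
    _, _, style_b = model(b)
    z = torch.cat([content_a, style_b], dim=-1)
    out, _ = model.decoder(z.unsqueeze(1).expand(-1, 50, -1))
    transferred = model.to_stroke(out)  # A's content rendered in B's style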

EDITalk: Towards Designing Eyes-free Interactions for Mobile Word Processing

Paper URL: http://dl.acm.org/citation.cfm?doid=3173574.3173977

Paper abstract: We present EDITalk, a novel voice-based, eyes-free word processing interface. We used a Wizard-of-Oz elicitation study to investigate the viability of eyes-free word processing in the mobile context and to elicit user requirements for such scenarios. Results showed that meta-level operations like highlight and comment, and core operations like insert, delete and replace are desired by users. However, users were challenged by the lack of visual feedback and the cognitive load of remembering text while editing it. We then studied a commercial-grade dictation application and discovered serious limitations that preclude comfortable speak-to-edit interactions. We address these limitations through EDITalk's closed-loop interaction design, enabling eyes-free operation of both meta-level and core word processing operations in the mobile context. Finally, we discuss implications for the design of future mobile, voice-based, eyes-free word processing interfaces.

Summary:

A voice-based, eyes-free word processing interface for mobile use. The authors identify the operations users require of such a system and the limitations of an existing dictation application, then design the proposed system to address them.
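
To make the elicited operation set concrete, here is a minimal Python sketch that dispatches recognized utterances to core and meta-level operations (insert, delete, replace, highlight). The command grammar and phrasings are hypothetical, not EDITalk's actual closed-loop design.

    import re

    # Hypothetical spoken-command grammar; patterns are illustrative.
    COMMANDS = [
        (re.compile(r"^replace (.+) with (.+)$"), "replace"),
        (re.compile(r"^delete (.+)$"), "delete"),
        (re.compile(r"^insert (.+) after (.+)$"), "insert"),
        (re.compile(r"^highlight (.+)$"), "highlight"),
    ]

    def apply_command(text, utterance):
        """Apply a single voice editing command to the document text."""
        for pattern, op in COMMANDS:
            m = pattern.match(utterance.strip().lower())
            if not m:
                continue
            if op == "replace":
                return text.replace(m.group(1), m.group(2), 1)
            if op == "delete":
                return text.replace(m.group(1), "", 1)
            if op == "insert":
                target = m.group(2)
                return text.replace(target, target + " " + m.group(1), 1)
            if op == "highlight":
                return text.replace(m.group(1), "*" + m.group(1) + "*", 1)
        return text  # unrecognized command: leave the text unchanged

    doc = "the quick brown fox"
    doc = apply_command(doc, "replace quick with sly")
    doc = apply_command(doc, "highlight fox")
    print(doc)  # the sly brown *fox*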

Identifying Speech Input Errors Through Audio-Only Interaction

Paper URL: http://dl.acm.org/citation.cfm?doid=3173574.3174141

Paper abstract: Speech has become an increasingly common means of text input, from smartphones and smartwatches to voice-based intelligent personal assistants. However, reviewing the recognized text to identify and correct errors is a challenge when no visual feedback is available. In this paper, we first quantify and describe the speech recognition errors that users are prone to miss, and investigate how to better support this error identification task by manipulating pauses between words, speech rate, and speech repetition. To achieve these goals, we conducted a series of four studies. Study 1, an in-lab study, showed that participants missed identifying over 50% of speech recognition errors when listening to audio output of the recognized text. Building on this result, Studies 2 to 4 were conducted using an online crowdsourcing platform and showed that adding a pause between words improves error identification compared to no pause, the ability to identify errors degrades with higher speech rates (300 WPM), and repeating the speech output does not improve error identification. We derive implications for the design of audio-only speech dictation.

Summary:

Experiments on identifying speech input errors through audio-only feedback, revealing how pauses between words, speech rate, and repetition affect error identification.
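
The three manipulated variables (inter-word pauses, speech rate, repetition) map directly onto standard SSML, which most commercial TTS engines accept. Below is a minimal Python sketch of rendering recognized text for audio-only review; the function, its defaults, and the parameter values are illustrative assumptions, not the study's stimulus-generation code.

    from xml.sax.saxutils import escape

    def to_ssml(recognized_text, pause_ms=300, rate="100%", repetitions=1):
        """Render recognized text as SSML with pauses between words."""
        words = [escape(w) for w in recognized_text.split()]
        # Insert a fixed-length break between consecutive words.
        spoken = f'<break time="{pause_ms}ms"/>'.join(words)
        # Control speech rate via the SSML prosody element.
        body = f'<prosody rate="{rate}">{spoken}</prosody>'
        # Optionally repeat the whole output (the repetition condition).
        return "<speak>" + "".join(body for _ in range(repetitions)) + "</speak>"

    print(to_ssml("meet Ann at noon", pause_ms=300, rate="100%", repetitions=2))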