Paper abstract: Voice User Interfaces (VUIs) are growing in popularity. However, even the most current VUIs regularly cause frustration for their users. Very few studies exist on what people do to overcome VUI problems they encounter, or on how VUIs can be designed to aid people when these problems occur. In this paper, we analyze empirical data on how users (n=12) interact with our VUI calendar system, DiscoverCal, over three sessions. In particular, we identify the main obstacle categories and the types of tactics our participants employ to overcome them. We analyzed the patterns of how different tactics are used in each obstacle category. We found that while NLP Error obstacles occurred most often, other obstacles are more likely to frustrate or confuse users. We also found patterns suggesting that participants were more likely to employ a "guessing" approach than to rely on visual aids or knowledge recall.
Paper abstract: Digital ink promises to combine the flexibility and aesthetics of handwriting with the ability to process, search, and edit digital text. Character recognition converts handwritten text into a digital representation, albeit at the cost of losing personalized appearance due to the technical difficulty of separating the interwoven components of content and style. In this paper, we propose a novel generative neural network architecture that is capable of disentangling style from content and thus making digital ink editable. Our model can synthesize arbitrary text while giving users control over the visual appearance (style). For example, it allows style transfer without changing the content, editing of digital ink at the word level, and other application scenarios such as spell-checking and correction of handwritten text. We furthermore contribute a new dataset of handwritten text with fine-grained annotations at the character level, and report results from an initial user evaluation.
Paper abstract: We present EDITalk, a novel voice-based, eyes-free word processing interface. We used a Wizard-of-Oz elicitation study to investigate the viability of eyes-free word processing in the mobile context and to elicit user requirements for such scenarios. Results showed that users desire meta-level operations like highlight and comment, and core operations like insert, delete, and replace. However, users were challenged by the lack of visual feedback and by the cognitive load of remembering text while editing it. We then studied a commercial-grade dictation application and discovered serious limitations that preclude comfortable speak-to-edit interactions. We address these limitations through EDITalk's closed-loop interaction design, enabling eyes-free operation of both meta-level and core word processing operations in the mobile context. Finally, we discuss implications for the design of future mobile, voice-based, eyes-free word processing interfaces.
Paper abstract: Speech has become an increasingly common means of text input, from smartphones and smartwatches to voice-based intelligent personal assistants. However, reviewing the recognized text to identify and correct errors is a challenge when no visual feedback is available. In this paper, we first quantify and describe the speech recognition errors that users are prone to miss, and investigate how to better support this error identification task by manipulating pauses between words, speech rate, and speech repetition. To achieve these goals, we conducted a series of four studies. Study 1, an in-lab study, showed that participants missed over 50% of speech recognition errors when listening to audio output of the recognized text. Building on this result, Studies 2 to 4 were conducted on an online crowdsourcing platform and showed that adding a pause between words improves error identification compared to no pause, that the ability to identify errors degrades at higher speech rates (300 WPM), and that repeating the speech output does not improve error identification. We derive implications for the design of audio-only speech dictation.