論文アブストラクト： Predicting human performance in interaction tasks allows designers or developers to understand the expected performance of a target interface without actually testing it with real users. In this work, we present a deep neural net to model and predict human performance in performing a sequence of UI tasks. In particular, we focus on a dominant class of tasks, i.e., target selection from a vertical list or menu. We experimented with our deep neural net using a public dataset collected from a desktop laboratory environment and a dataset collected from hundreds of touchscreen smartphone users via crowdsourcing. Our model significantly outperformed previous methods on these datasets. Importantly, our method, as a deep model, can easily incorporate additional UI attributes such as visual appearance and content semantics without changing model architectures. By understanding about how a deep learning model learns from human behaviors, our approach can be seen as a vehicle to discover new patterns about human behaviors to advance analytical modeling.
論文アブストラクト： As display environments become larger and more diverse - now often encompassing multiple walls and room surfaces - it is becoming more common that users must find and manipulate digital artifacts not directly in front of them. There is little understanding, however, about what techniques and devices are best for carrying out basic operations above, behind, or to the side of the user. We conducted an empirical study comparing two main techniques that are suitable for full-coverage display environments: mouse-based pointing, and ray-cast 'laser' pointing. Participants completed search and pointing tasks on the walls and ceiling, and we measured completion time, path lengths and perceived effort. Our study showed a strong interaction between performance and target location: when the target position was not known a priori the mouse was fastest for targets on the front wall, but ray-casting was faster for targets behind the user. Our findings provide new empirical evidence that can help designers choose pointing techniques for full-coverage spaces.
論文アブストラクト： We introduce a mobile app for collecting in-the-wild data, including sensor measurements and self-reported labels describing people's behavioral context (e.g., driving, eating, in class, shower). Labeled data is necessary for developing context-recognition systems that serve health monitoring, aging care, and more. Acquiring labels without observers is challenging and previous solutions compromised ecological validity, range of behaviors, or amount of data. Our user interface combines past and near-future self-reporting of combinations of relevant context-labels. We deployed the app on the personal smartphones of 60 users and analyzed quantitative data collected in-the-wild and qualitative user-experience reports. The interface's flexibility was important to gain frequent, detailed labels, support diverse behavioral situations, and engage different users: most preferred reporting their past behavior through a daily journal, but some preferred reporting what they're about to do. We integrated insights from this work back into the app, which we make available to researchers for conducting in-the-wild studies.
論文アブストラクト： Cognitive load has been shown, over hundreds of validated studies, to be an important variable for understanding human performance. However, establishing practical, non-contact approaches for automated estimation of cognitive load under real-world conditions is far from a solved problem. Toward the goal of designing such a system, we propose two novel vision-based methods for cognitive load estimation, and evaluate them on a large-scale dataset collected under real-world driving conditions. Cognitive load is defined by which of 3 levels of a validated reference task the observed subject was performing. On this 3-class problem, our best proposed method of using 3D convolutional neural networks achieves 86.1% accuracy at predicting task-induced cognitive load in a sample of 92 subjects from video alone. This work uses the driving context as a training and evaluation dataset, but the trained network is not constrained to the driving environment as it requires no calibration and makes no assumptions about the subject's visual appearance, activity, head pose, scale, and perspective.