Session: "All about Data"

Variolite: Supporting Exploratory Programming by Data Scientists

Paper URL: http://dl.acm.org/citation.cfm?doid=3025453.3025626

Abstract: How do people ideate through code? Using semi-structured interviews and a survey, we studied data scientists who program, often with small scripts, to experiment with data. These studies show that data scientists frequently code new analysis ideas by building off of their code from a previous idea. They often rely on informal versioning interactions like copying code, keeping unused code, and commenting out code to repurpose older analysis code while attempting to keep those older analyses intact. Unlike conventional version control, these informal practices allow for fast versioning of any size code snippet, and quick comparisons by interchanging which versions are run. However, data scientists must maintain a strong mental map of their code in order to distinguish versions, leading to errors and confusion. We explore the needs for improving version control tools for exploratory tasks, and demonstrate a tool for lightweight local versioning, called Variolite, which programmers found usable and desirable in a preliminary usability study.

Summary:

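Data scientists copy code aside, keep unused code around, and comment code out, which is a way of working that differs from conventional version control. To support this kind of exploratory programming, the paper proposes a system that lets the user select part of the code and branch it into versions, and that provides a timeline and a tree-style view.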

The Trials and Tribulations of Working with Structured Data: -a Study on Information Seeking Behaviour

Paper URL: http://dl.acm.org/citation.cfm?doid=3025453.3025838

Abstract: Structured data such as databases, spreadsheets and web tables is becoming critical in every domain and professional role. Yet we still do not know much about how people interact with it. Our research focuses on the information seeking behaviour of people looking for new sources of structured data online, including the task context in which the data will be used, data search, and the identification of relevant datasets from a set of possible candidates. We present a mixed-methods study covering in-depth interviews with 20 participants with various professional backgrounds, supported by the analysis of search logs of a large data portal. Based on this study, we propose a framework for human structured-data interaction and discuss challenges people encounter when trying to find and assess data that helps their daily work. We provide design recommendations for data publishers and developers of online data platforms such as data catalogs and marketplaces. These recommendations highlight important questions for HCI research to improve how people engage and make use of this incredibly useful online resource.

Summary:

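The study looks at how people seek information in structured data online, based on interviews with 20 participants from different occupations and positions together with an accompanying analysis. From this, the authors propose a framework for human structured-data interaction.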

Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing

Paper URL: http://dl.acm.org/citation.cfm?doid=3025453.3025912

Abstract: Datasets which are identical over a number of statistical properties, yet produce dissimilar graphs, are frequently used to illustrate the importance of graphical representations when exploring data. This paper presents a novel method for generating such datasets, along with several examples. Our technique varies from previous approaches in that new datasets are iteratively generated from a seed dataset through random perturbations of individual data points, and can be directed towards a desired outcome through a simulated annealing optimization strategy. Our method has the benefit of being agnostic to the particular statistical properties that are to remain constant between the datasets, and allows for control over the graphical appearance of resulting output.

Summary:

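Anscombe's Quartet is the standard example used to show why graphical display matters: the scatter plots look completely different, yet the means, standard deviations, and correlation coefficients come out the same. The paper implements a system that generates datasets of this kind using a simulated annealing algorithm.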
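
The perturb-and-check loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the circle target, the linear cooling schedule, the step size, and every function name here are hypothetical, and "same statistics" is taken to mean equal after rounding to two decimals.

```python
import math
import random

def summary(points):
    """Mean x, mean y, SD x, SD y, and Pearson r, each rounded to 2 decimals."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sd_x = math.sqrt(sxx / (n - 1))
    sd_y = math.sqrt(syy / (n - 1))
    r = sxy / math.sqrt(sxx * syy)
    return tuple(round(v, 2) for v in (mx, my, sd_x, sd_y, r))

def distance_to_circle(points, cx, cy, radius):
    """Fitness (lower is better): mean distance from an assumed target circle."""
    return sum(abs(math.hypot(x - cx, y - cy) - radius) for x, y in points) / len(points)

def anneal(seed, steps, rng, cx=50.0, cy=50.0, radius=30.0):
    """Nudge one point at a time, keeping the rounded statistics fixed."""
    target_stats = summary(seed)
    current = list(seed)
    fitness = distance_to_circle(current, cx, cy, radius)
    for step in range(steps):
        temperature = 0.4 * (1 - step / steps)  # assumed linear cooling schedule
        i = rng.randrange(len(current))
        x, y = current[i]
        candidate = list(current)
        candidate[i] = (x + rng.gauss(0, 0.1), y + rng.gauss(0, 0.1))
        if summary(candidate) != target_stats:
            continue  # reject any move that changes the summary statistics
        new_fitness = distance_to_circle(candidate, cx, cy, radius)
        # Accept improvements always, and worse moves with probability = temperature.
        if new_fitness < fitness or rng.random() < temperature:
            current, fitness = candidate, new_fitness
    return current

rng = random.Random(1)
seed = [(rng.gauss(50, 15), rng.gauss(50, 15)) for _ in range(60)]
result = anneal(seed, 5000, rng)
print(summary(seed), summary(result))
```

Accepting some worse moves early on is what makes this annealing and not plain hill climbing; the statistics check is a hard constraint that is never relaxed. Many more steps would be needed before the points visibly take on the target shape.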

Inferring Cognitive Models from Data using Approximate Bayesian Computation

Paper URL: http://dl.acm.org/citation.cfm?doid=3025453.3025576

Abstract: An important problem for HCI researchers is to estimate the parameter values of a cognitive model from behavioral data. This is a difficult problem, because of the substantial complexity and variety in human behavioral strategies. We report an investigation into a new approach using approximate Bayesian computation (ABC) to condition model parameters to data and prior knowledge. As the case study we examine menu interaction, where we have click time data only to infer a cognitive model that implements a search behaviour with parameters such as fixation duration and recall probability. Our results demonstrate that ABC (i) improves estimates of model parameter values, (ii) enables meaningful comparisons between model variants, and (iii) supports fitting models to individual users. ABC provides ample opportunities for theoretical HCI research by allowing principled inference of model parameter values and their uncertainty.

Summary:

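The paper infers cognitive models using approximate Bayesian computation (ABC). Applied to a menu selection model, the approach reproduced similar behaviour from observations of menu item click times alone, and accurately estimated properties of the user's visual system such as fixation duration!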
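
To give a feel for the general idea of ABC, here is a minimal sketch of plain rejection ABC on a toy problem. It is not the paper's menu-search model or its inference procedure: the simulator, the uniform prior, the tolerance, and all names below are assumptions made for illustration, and the "observed" data is synthetic.

```python
import random
import statistics

def simulate_click_times(fixation_duration, n_trials, rng):
    """Hypothetical toy simulator: the user fixates 1-8 items, then clicks."""
    times = []
    for _ in range(n_trials):
        n_fixations = rng.randint(1, 8)
        motor_time = rng.gauss(0.3, 0.05)  # assumed pointing/clicking cost (s)
        times.append(n_fixations * fixation_duration + motor_time)
    return times

def rejection_abc(observed, n_samples, tolerance, rng):
    """Keep prior draws whose simulated mean click time is close to the observed one."""
    observed_mean = statistics.mean(observed)
    accepted = []
    for _ in range(n_samples):
        candidate = rng.uniform(0.1, 0.6)  # assumed prior over fixation duration (s)
        simulated = simulate_click_times(candidate, len(observed), rng)
        if abs(statistics.mean(simulated) - observed_mean) < tolerance:
            accepted.append(candidate)
    return accepted

rng = random.Random(0)
# Synthetic "observed" click times generated with a true fixation duration of 0.25 s.
observed = simulate_click_times(0.25, 200, rng)
posterior = rejection_abc(observed, 2000, 0.05, rng)
print(len(posterior), statistics.mean(posterior))
```

The accepted candidates approximate the posterior over the parameter, so their spread gives an uncertainty estimate as well as a point estimate. Only click times are compared, which mirrors the setting in the abstract where the fixations themselves are never observed.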

Effects of Frequency Distribution on Linear Menu Performance

Paper URL: http://dl.acm.org/citation.cfm?doid=3025453.3025707

Abstract: While it is well known that menu usage follows a Zipfian distribution, there has been little interest in the impact of menu item frequency distribution on user's behavior. In this note, we explore the effects of frequency distribution on average menu performance as well as individual item performance. We compare three frequency distributions of menu item usage: Uniform; Zipfian with s=1 and Zipfian with s=2. The results show that (1) user's behavior is sensitive to different frequency distributions at both menu and item level; (2) individual item selection time depends on, not only its frequency, but also the frequency of other items in the menu. Finally, we discuss how these findings might have impacts on menu design, empirical studies and menu modeling.

Summary:

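Menu usage follows a Zipfian distribution (in a 12-item menu, the most frequent item is selected 12 times as often as the least frequent one). This paper examines how the frequency distribution affects both the selection time for individual menu items and the selection time for the menu as a whole.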
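
The 12x figure in the summary corresponds to a Zipfian exponent of s = 1, where an item's selection frequency is proportional to 1/rank^s. A small sketch of the three distributions compared in the abstract, for an assumed 12-item menu (the function name is hypothetical, and s = 0 is used here to stand in for the uniform case):

```python
def zipf_frequencies(n_items: int, s: float) -> list[float]:
    """Normalized selection probabilities for ranks 1..n_items under Zipf exponent s."""
    weights = [1.0 / rank ** s for rank in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]

for s in (0, 1, 2):
    probs = zipf_frequencies(12, s)
    ratio = probs[0] / probs[-1]
    print(f"s={s}: top/bottom ratio = {ratio:.0f}")

# s=0: top/bottom ratio = 1
# s=1: top/bottom ratio = 12
# s=2: top/bottom ratio = 144
```

With s = 2 the skew is far steeper, so the most frequent item dominates the menu's overall usage much more than it does with s = 1.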