Session: "Data Extraction"

SEER: Auto-Generating Information Extraction Rules from User-Specified Examples

Paper URL: http://dl.acm.org/citation.cfm?doid=3025453.3025540

Abstract: Time-consuming and complicated best describe the current state of the Information Extraction (IE) field. Machine learning approaches to IE require large collections of labeled datasets that are difficult to create and use obscure mathematical models, occasionally returning unwanted results that are unexplainable. Rule-based approaches, while resulting in easy-to-understand IE rules, are still time-consuming and labor-intensive. SEER combines the best of these two approaches: a learning model for IE rules based on a small number of user-specified examples. In this paper, we explain the design behind SEER and present a user study comparing our system against a commercially available tool in which users create IE rules manually. Our results show that SEER helps users complete text extraction tasks more quickly, as well as more accurately.

Summary:

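Support for annotating text documents to build machine-learning data. Once the user annotates the first few examples, the system automatically learns extraction rules from them and handles the rest on its own.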
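To make "learn a rule from a few examples" concrete, here is a minimal Python sketch. It is not SEER's algorithm or rule language: the tokenizer, the token classes (`NUM`, `CAP`, `WORD`), and the helpers `learn_rule` and `extract` are hypothetical. Tokens on which all examples agree are kept as literals, and tokens that differ are generalized to a shared class, yielding one regular expression that is then applied to new text.

```python
import re
from typing import List, Optional

# Hypothetical tokenizer and token classes; SEER's actual rule language differs.
TOKEN_RE = re.compile(r"\d+|[A-Za-z]+|\S")
CLASS_PATTERNS = {
    "NUM": r"\d+",
    "CAP": r"[A-Z][a-z]*",
    "WORD": r"[a-z]+",
}


def token_class(tok: str) -> Optional[str]:
    """Return the first class whose pattern matches the whole token, else None."""
    for name, pattern in CLASS_PATTERNS.items():
        if re.fullmatch(pattern, tok):
            return name
    return None  # Punctuation or mixed case: can only be kept as a literal.


def learn_rule(examples: List[str]):
    """Generalize a few user-highlighted examples into one regex rule."""
    token_lists = [TOKEN_RE.findall(ex) for ex in examples]
    if len({len(tokens) for tokens in token_lists}) != 1:
        raise ValueError("examples must share the same token structure")
    parts = []
    for column in zip(*token_lists):
        if len(set(column)) == 1:
            # Every example agrees on this token: keep it literally.
            parts.append(re.escape(column[0]))
            continue
        classes = {token_class(tok) for tok in column}
        if len(classes) != 1 or None in classes:
            raise ValueError(f"cannot generalize tokens {column}")
        parts.append(CLASS_PATTERNS[classes.pop()])
    # Allow optional whitespace between consecutive tokens.
    return re.compile(r"\s*".join(parts))


def extract(rule, text: str) -> List[str]:
    """Apply a learned rule to unseen text."""
    return [m.group(0) for m in rule.finditer(text)]


rule = learn_rule(["$12.50", "$3.99"])
print(rule.pattern)
print(extract(rule, "Coffee costs $4.25 and tea $2.10 today."))  # ['$4.25', '$2.10']
```

Because the output is an ordinary regular expression, the user can read and edit it, which is the readability advantage of rule-based IE that the abstract points to.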

Leveraging Human Routine Models to Detect and Generate Human Behaviors

Paper URL: http://dl.acm.org/citation.cfm?doid=3025453.3025571

Abstract: An ability to detect behaviors that negatively impact people's wellbeing and show people how they can correct those behaviors could enable technology that improves people's lives. Existing supervised machine learning approaches to detect and generate such behaviors require lengthy and expensive data labeling by domain experts. In this work, we focus on the domain of routine behaviors, where we model routines as a series of frequent actions that people perform in specific situations. We present an approach that bypasses labeling each behavior instance that a person exhibits. Instead, we weakly label instances using people's demonstrated routine. We classify and generate new instances based on the probability that they belong to the routine model. We illustrate our approach on an example system that helps drivers become aware of and understand their aggressive driving behaviors. Our work enables technology that can trigger interventions and help people reflect on their behaviors when those behaviors are likely to negatively impact them.

Summary:

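Proposes a method based on a Markov Decision Process that makes labeling data for machine-learning behavior recognition easier: labels are attached to people rather than to each individual behavior, and instance labels are inferred from there. The same model can also be used to generate additional (augmented) data.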
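The weak-labeling idea can be sketched with a simpler model than the paper's. The sketch below swaps in a first-order Markov chain over discrete driving actions in place of a Markov Decision Process; the action names, the toy trips, and the class `MarkovRoutine` are hypothetical. Each trip inherits its label from its driver (aggressive or calm) instead of being labeled individually, one chain is fit per group, a new sequence is classified by which chain gives it the higher log-likelihood, and new sequences are generated by sampling from a chain.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


class MarkovRoutine:
    """First-order Markov chain over actions; a stand-in for the paper's MDP."""

    def __init__(self, sequences: List[List[str]], vocab: Sequence[str], alpha: float = 1.0):
        self.vocab = list(vocab)
        self.alpha = alpha  # Laplace smoothing so unseen transitions keep nonzero probability.
        self.counts: Dict[str, Counter] = defaultdict(Counter)
        for seq in sequences:
            for prev, nxt in zip(seq, seq[1:]):
                self.counts[prev][nxt] += 1

    def prob(self, prev: str, nxt: str) -> float:
        """Smoothed transition probability P(nxt | prev)."""
        row = self.counts.get(prev, Counter())
        total = sum(row.values())
        return (row[nxt] + self.alpha) / (total + self.alpha * len(self.vocab))

    def log_likelihood(self, seq: List[str]) -> float:
        return sum(math.log(self.prob(p, n)) for p, n in zip(seq, seq[1:]))

    def sample(self, start: str, length: int, rng: random.Random) -> List[str]:
        """Generate a new synthetic sequence from the routine model."""
        seq = [start]
        while len(seq) < length:
            weights = [self.prob(seq[-1], a) for a in self.vocab]
            seq.append(rng.choices(self.vocab, weights=weights)[0])
        return seq


def classify(seq: List[str], aggressive: MarkovRoutine, calm: MarkovRoutine) -> str:
    """Label a sequence with the routine model under which it is more probable."""
    if aggressive.log_likelihood(seq) > calm.log_likelihood(seq):
        return "aggressive"
    return "calm"


VOCAB = ["accelerate", "brake", "cruise", "turn"]
# Weak labels: each trip inherits its driver's label; no per-action annotation.
aggressive_trips = [
    ["accelerate", "brake", "accelerate", "brake", "turn"],
    ["accelerate", "accelerate", "brake", "accelerate", "brake"],
]
calm_trips = [
    ["cruise", "cruise", "turn", "cruise", "brake"],
    ["accelerate", "cruise", "cruise", "turn", "cruise"],
]
aggressive = MarkovRoutine(aggressive_trips, VOCAB)
calm = MarkovRoutine(calm_trips, VOCAB)

print(classify(["accelerate", "brake", "accelerate", "brake"], aggressive, calm))  # aggressive
print(classify(["cruise", "cruise", "turn", "cruise"], aggressive, calm))  # calm
print(aggressive.sample("accelerate", 5, random.Random(0)))
```

A Markov chain only captures which action tends to follow which; the paper's MDP additionally conditions on the situation a person is in, which this sketch leaves out.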

Interactive Vectorization

Paper URL: http://dl.acm.org/citation.cfm?doid=3025453.3025872

Abstract: Vectorization turns photographs into vector art. Manual vectorization, where the artist traces over the image by hand, requires skill and time. On the other hand, automatic approaches allow users to generate a result by setting a few global parameters. However, global settings often leave too much detail/complexity in some parts of the image while missing important details in others. We propose interactive vectorization tools that offer more local control than automatic systems, but are more powerful and high-level than simple curve editing. Our system enables novices to vectorize images significantly faster than even experts with state-of-the-art tools.

Summary:

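Proposes a method that lets the user control quality locally when vectorizing an image: for example, removing only the unwanted background, increasing the level of detail in just one region, or extracting only the outlines.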
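As an illustration of local versus global control (not the paper's actual tools), the sketch below uses a variant of Ramer-Douglas-Peucker polyline simplification in which the tolerance is a function of position rather than one global parameter. The `tolerance` callback and the region boundary `x < 4` are hypothetical: a small tolerance keeps detail where the user asks for it, and a large tolerance removes detail elsewhere.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points: List[Point], tolerance: Callable[[Point], float]) -> List[Point]:
    """Ramer-Douglas-Peucker simplification with a position-dependent tolerance."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    # Interior point farthest from the chord a-b.
    idx = max(range(1, len(points) - 1),
              key=lambda i: point_segment_distance(points[i], a, b))
    if point_segment_distance(points[idx], a, b) <= tolerance(points[idx]):
        return [a, b]  # Farthest point is within its local tolerance: drop the interior.
    left = simplify(points[: idx + 1], tolerance)
    right = simplify(points[idx:], tolerance)
    return left[:-1] + right  # The split point appears in both halves; keep one copy.


zigzag = [(0, 0), (1, 0.5), (2, 0), (3, 0.5), (4, 0), (5, 0.5), (6, 0), (7, 0.5), (8, 0)]
coarse = simplify(zigzag, lambda p: 1.0)                       # one global setting
local = simplify(zigzag, lambda p: 0.1 if p[0] < 4 else 1.0)   # fine detail only where x < 4
print(coarse)  # [(0, 0), (8, 0)]
print(local)
```

With a single global tolerance the whole zigzag collapses to a straight line, while the position-dependent tolerance keeps the wiggles in the region the user marked.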

ChartSense: Interactive Data Extraction from Chart Images

Paper URL: http://dl.acm.org/citation.cfm?doid=3025453.3025957

Abstract: Charts are commonly used to present data in digital documents such as web pages, research papers, or presentation slides. When the underlying data is not available, it is necessary to extract the data from a chart image to utilize the data for further analysis or improve the chart for more accurate perception. In this paper, we present ChartSense, an interactive chart data extraction system. ChartSense first determines the chart type of a given chart image using a deep learning based classifier, and then extracts underlying data from the chart image using semi-automatic, interactive extraction algorithms optimized for each chart type. To evaluate chart type classification accuracy, we compared ChartSense with ReVision, a system with the state-of-the-art chart type classifier. We found that ChartSense was more accurate than ReVision. In addition, to evaluate data extraction performance, we conducted a user study, comparing ChartSense with WebPlotDigitizer, one of the most effective chart data extraction tools among publicly accessible ones. Our results showed that ChartSense was better than WebPlotDigitizer in terms of task completion time, error rate, and subjective preference.

Summary:

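Reads numerical values directly from chart images whose source data is unavailable, and can convert them into a different chart type. After a CNN identifies the chart type, each type is handled by its own rule-based image analysis.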
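After the chart type is known, a rule-based extractor for bar charts might look like the following minimal sketch. It assumes an already-binarized mask (1 = bar pixel) and two known tick positions on the y-axis for calibration; the helpers `make_axis` and `extract_bars` and the toy image are hypothetical and are not ChartSense's implementation, which is semi-automatic and interactive.

```python
from typing import Callable, List


def make_axis(p1: float, v1: float, p2: float, v2: float) -> Callable[[float], float]:
    """Linear pixel-to-value map calibrated from two tick marks (pixel p -> value v)."""
    scale = (v2 - v1) / (p2 - p1)
    return lambda p: v1 + (p - p1) * scale


def extract_bars(mask: List[List[int]], y_to_value: Callable[[float], float]) -> List[float]:
    """Scan columns left to right; each run of non-empty columns is one bar."""
    height, width = len(mask), len(mask[0])

    def filled(x: int) -> bool:
        return any(mask[y][x] for y in range(height))

    values = []
    x = 0
    while x < width:
        if not filled(x):
            x += 1
            continue
        start = x
        while x < width and filled(x):
            x += 1
        # The bar's value is read at its topmost filled row.
        top = min(y for y in range(height) for c in range(start, x) if mask[y][c])
        values.append(y_to_value(top))
    return values


# Toy 10x7 image (hypothetical): column -> topmost bar row; rows grow downward.
tops = {1: 6, 2: 6, 4: 2, 6: 8}
mask = [[1 if x in tops and y >= tops[x] else 0 for x in range(7)] for y in range(10)]
# Assumed calibration: pixel row 10 is value 0, pixel row 0 is value 100.
y_to_value = make_axis(10, 0.0, 0, 100.0)
print(extract_bars(mask, y_to_value))  # [40.0, 80.0, 20.0]
```

A real chart also needs rules for gridlines, labels, and anti-aliased edges; this toy version ignores them.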