Emotional Dialogue Generation using Image-Grounded Language Models

Paper URL: http://dl.acm.org/citation.cfm?doid=3173574.3173851

Abstract: Computer-based conversational agents are becoming ubiquitous. However, for these systems to be engaging and valuable to the user, they must be able to express emotion, in addition to providing informative responses. Humans rely on much more than language during conversations; visual information is key to providing context. We present the first example of an image-grounded conversational agent using visual sentiment, facial expression and scene features. We show that key qualities of the generated dialogue can be manipulated by the features used for training the agent. We evaluate our model on a large and very challenging real-world dataset of conversations from social media (Twitter). The image-grounding leads to significantly more informative, emotional and specific responses, and the exact qualities can be tuned depending on the image features used. Furthermore, our model improves the objective quality of dialogue responses when evaluated on standard natural language metrics.

Summary:

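The authors build an image-grounded dialogue generation agent that uses visual sentiment, facial expression, and scene features. The model is evaluated on a dataset of social media conversations, where it generates responses to questions. The results indicate that image and emotion features play an important role in generating responses.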

(106 characters in the original Japanese)

Presentation slides: