Michael White

Ohio State University

Homepage: http://u.osu.edu/white.1240/

Abstract

Dependency Locality in Natural Language Generation

Michael White, Rajakrishnan Rajkumar, Marten van Schijndel and William Schuler

Temperley's (2007) corpus study of written English found evidence that the tendency to minimize dependency length, or dependency locality (Gibson 2000, inter alia), has a strong influence on constituent ordering choices. In this talk, I'll begin by examining dependency locality in the context of discriminative realization ranking, showing that adding a global feature that captures a preference for dependency length minimization to an otherwise comprehensive realization ranking model yields significant improvements in generation decisions, particularly for heavy/light ordering choices. Next, complementing this realization ranking study, I'll present the results of a recent corpus study that goes beyond Temperley's by taking lexical and syntactic surprisal into account as competing control factors. There we find that dependency length remains a significant predictor of the corpus sentence for a wide variety of syntactic constructions, and moreover that embedding depth and embedding difference (Wu et al. 2010) together help to improve prediction accuracy in cases of anti-locality. Finally, I'll briefly connect these results with NLG challenges for future work, including how to incorporate psycholinguistic metrics into incremental, discourse-aware generation models.
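
As a rough illustration of the measure discussed above, the sketch below sums head-dependent distances, counted in words, for two orderings of a verb-particle sentence with a heavy object. This is only one simple way to operationalize dependency length; it is not the feature used in the realization ranking model or the metric from the corpus study. The function name, the example sentences, and the dependency analyses are all assumptions made for illustration.

```python
def total_dependency_length(heads):
    """Sum |dependent position - head position| over all non-root words.

    heads[i] is the 1-based position of the head of word i+1;
    0 marks the root, which contributes no dependency.
    """
    return sum(abs(dep - head)
               for dep, head in enumerate(heads, start=1)
               if head != 0)


# Hypothetical analyses (illustrative, not taken from any treebank).
# "Sue looked up the number that Bob gave her"
#   1    2     3   4    5     6    7   8    9
particle_first = [2, 0, 2, 5, 2, 8, 8, 5, 8]

# "Sue looked the number that Bob gave her up"
#   1    2     3    4     5    6   7    8   9
particle_last = [2, 0, 4, 2, 7, 7, 4, 7, 2]

print(total_dependency_length(particle_first))  # 13
print(total_dependency_length(particle_last))   # 18
```

Under these assumed analyses, placing the light particle before the heavy object gives the shorter total, which is the kind of heavy/light ordering preference that dependency locality predicts.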

Biography

Dr. Michael White is an Associate Professor in the Department of Linguistics at The Ohio State University. After obtaining his Ph.D. in Computer and Information Science from the University of Pennsylvania in 1994, Dr. White worked for eight years at CoGenTex, Inc., where he focused on developing practical applications of natural language generation technologies. In 2002, Dr. White crossed the pond to Scotland, where he worked for three years as a Research Fellow at the University of Edinburgh, managing Edinburgh's effort on the COMIC dialogue system project as part of the EU's Fifth Framework Programme. During this time, Dr. White also took over the development of the open-source OpenCCG library, the first practical system for parsing and realization with Combinatory Categorial Grammar. With his colleagues in Edinburgh, Dr. White developed grammar-based and data-driven methods for producing utterances that use prosody to help highlight trade-offs among the available options that are important to a user. Since joining the faculty at OSU in 2005, Dr. White has continued to develop OpenCCG, extending it to a broad-coverage setting. His research interests also include NLG evaluation methods and paraphrase generation and recognition.