Resnik, Philip

Philip Resnik
Department of Linguistics
Program in Neuroscience and Cognitive Science
Department of Computer Science
College of Computer, Mathematical and Natural Sciences
College of Arts and Humanities
1401C Marie Mount Hall
General Research Interests: 
  • Computational social science
  • Crowdsourcing and translation
  • Clinical informatics
  • Computational psycholinguistics
  • Empirical linguistics

I do research in computational linguistics, with interests both in the application of natural language processing techniques to practical problems such as machine translation and sentiment analysis, and in the modeling of human linguistic processes (especially related to lexical semantics). My general research agenda for language technology is to improve the state of the art by finding the right balance between knowledge-free statistical modeling and linguistically informed techniques -- and in so doing, to obtain a better scientific understanding of human language itself. Currently I'm director of the University of Maryland Computational Linguistics and Information Processing (CLIP) Laboratory.

Computational social science:
I have been doing work on sentiment analysis and related topics such as persuasion, framing, and "spin", with a particular interest in the connections among lexical semantics, surface linguistic expression, and underlying internal state. One area in which I'm excited about applying these ideas is computational political science. For example, why does my son say "My toy broke" instead of "I broke my toy"? He's using syntax to package up the statement about what happened in a way that de-emphasizes semantic properties such as causation, volition, and change-of-state. (This is an example of using syntax for "spin", just the same way that Ronald Reagan did in 1987 when he sidestepped attributing responsibility for the Iran-contra scandal; remember "Mistakes were made"? Precocious child.) My student Stephan Greene did a fascinating dissertation on this topic, and for a conference-paper-length description see our 2009 NAACL paper. Current topics of investigation include modeling syntax/semantics/sentiment connections in a Bayesian framework, bootstrapping multilingual sentiment analysis capabilities, and working with political scientists to model agenda setting and framing in political discourse. I've also been working with political scientist collaborators on the React Labs project, a smartphone app for large scale, real-time collection of people's responses during live events like political debates. Outside academia, I do real-world sentiment analysis as Lead Scientist with Converseon Inc., a leading social media firm.
Machine translation:
My recent work has largely been focused on machine translation and multilingual natural language processing, exploiting parallel corpora and linguistically informed modeling in statistical machine translation and in multilingual natural language processing more generally (with a focus on Chinese and Arabic, as well as other less-studied languages). As part of this effort, my postdoc David Chiang (now at USC/ISI) developed Hiero, the first syntax-based system to demonstrate performance comparable to then state-of-the-art statistical phrase-based MT systems (see 2005 NIST MT Evaluation results). I have worked with a number of students to further improve hierarchical phrase-based translation, and some innovations include the introduction of lattice decoding (useful in translation of speech recognition output and also for text translation of morphologically complex languages), development of efficient algorithms for using suffix array representations in hierarchical decoding, use of English-to-English translation to create artificial reference translations for use in parameter tuning, the introduction of soft syntactic constraints based on source language structure, and exploitation of lattices and forests to represent source language paraphrase and syntactically driven reordering alternatives.
Crowdsourcing and translation:
Connected with my machine translation research, Ben Bederson and I have been working on an ambitious attempt to achieve low-cost, high-quality translation by taking advantage of monolingual human participants in a computer-assisted translation protocol, in a project we call "Translation as a Collaborative Process". We're blending ideas from machine translation, human-computer interfaces, and distributed human computation ("crowdsourcing"), and tackling the real-world problem of translating books in the International Children's Digital Library. We received a 2009 Google Research Award sponsoring this work, as well as funding from NSF. In September 2009, Ben gave a Google tech talk about the project, which is available on YouTube. Ben and I now have a follow-up Google Research Award in which we're collaborating with Chris Callison-Burch to bring his crowdsourcing work and ours together in a framework we're calling "Translate the World".
Clinical informatics:
Since about 1999 I've been involved in natural language processing for clinical documentation. I helped start up CodeRyte, Inc., which became the nation's fastest growing provider of NLP solutions in healthcare (see, e.g., Deloitte's Technology Fast 500 and the Inc. 5000 listings) and was acquired in April 2012 by 3M Health Information Systems. I developed major pieces of the core technology, helped build an excellent language technology team, and I continue to advise on technology development and strategic direction. Somewhere along the way, much to my surprise, I was listed at #82 on the Future Health 100, a list of "the most creative and influential innovators working in healthcare today".
Computational psycholinguistics:
During the next several years, I hope to re-engage more fully with my interests in computational psycholinguistics. I'm particularly interested in the possibility that ideas from (statistical) information theory may have a useful role to play in explaining why language works the way it does. (This is an idea I first began exploring in my dissertation, back in 1993, and in recent years a variety of people like John Hale, Roger Levy, and Florian Jaeger, among others, have done very interesting work in the same spirit.) I'm also interested in using Bayesian modeling as a way to bring linguists here with cognitive modeling interests together with computational linguists focusing on applications. Momentum for that around here has already started building with the recent arrival of Naomi Feldman in our Linguistics Department.
Empirical linguistics:
I'm quite interested in promoting the use of naturally occurring data as evidence in linguistics research. I led the development of the Linguist's Search Engine, a tool designed to make it easier for linguists to search naturally occurring data using syntactic and lexical criteria. This tool was intended to make it easier for more linguists to go beyond the exclusive use of introspective judgments as empirical evidence, which can lead to useful and interesting results. In follow-on work with the Center for the Advanced Study of Language (CASL), we ported the LSE to Chinese, and the LSE code is available under an open source license. (Aaron Elkiss was the LSE's chief architect, implementor, and guru. I kept it running for a number of years after he graduated, but eventually retired it. Anyone interested in resurrecting it: the source code is available.)

Philip Resnik is Professor of Linguistics at the University of Maryland, with a joint appointment at the University of Maryland Institute for Advanced Computer Studies and an affiliate appointment in Computer Science. He received his bachelor's degree in Computer Science at Harvard in 1987 and his Ph.D. in Computer and Information Science at the University of Pennsylvania in 1993, and joined the University of Maryland faculty in 1996. His industry experience prior to entering academia includes time in R&D at Bolt Beranek and Newman, IBM T.J. Watson Research Center, and Sun Microsystems Laboratories.

Resnik's research focuses on computational modeling of language that brings together linguistic knowledge, domain-relevant context, and data-driven machine learning/modeling methods, with an emphasis on questions in computational social science, multilingual text analysis, and lexical semantics. He holds two patents (plus one pending) and has authored or co-authored more than 100 peer-reviewed articles and conference papers. At various times his work has been highlighted in Newsweek, The Economist, New Scientist, and on National Public Radio, and he has been a frequent organizer and panelist at SXSW Interactive.

Outside academia, Resnik is a serial entrepreneur, with experience that includes being technical co-founder of CodeRyte (clinical natural language processing, acquired in 2012 by 3M Health Information Systems), lead scientist for Converseon (spearheading development of their sentiment analysis platform, now marketed as ConveyAPI), and founder of React Labs, which is commercializing his research on scalable real-time response measurement and engagement using mobile devices.