Nicole Cote and Rob Hammond, New York University
The close relationship between rhetoric and political language, together with the rapid dissemination of new information over the Internet, makes it difficult to decipher language that purposefully eludes its audience. Rhetoric is defined by the Oxford English Dictionary as “the art of using language effectively so as to persuade or influence others, esp. the exploitation of figures of speech and other compositional techniques to this end”. We believe that rhetoric used to deceive is a form of obfuscation, in which politicians hide the ideas in their language behind misleading narrative intended to obscure their true intentions.
We therefore suggest that fully analyzing a political speech with a natural language model requires taking rhetoric into account; it is then necessary to ask whether and how a model might analyze or filter rhetoric. This project does not deal with obfuscation as a privacy tactic used to redress a power imbalance by protecting or hiding information from a more powerful adversary. It approaches obfuscation not as a tool for users, but as something that those in a position of power can exploit to mislead a generally less informed audience and so deepen the asymmetry of power. The theoretical model we propose would thus aim to complete two tasks in an effort to tackle rhetoric: 1) extractive summarization of the underlying ideas present in political text, and 2) categorization of the summarized political speeches into specific political ideologies. To our knowledge, no single existing language model performs both tasks together. The neural network approach we take stems from analysis of existing natural language processing and neural network approaches to political speech, attention, and sentence summarization and entailment [4, 5], among other methods.
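The two-stage structure described above (summarize, then categorize) can be illustrated with a minimal sketch. This is not the proposed neural model: word-frequency sentence scoring stands in for the summarizer and a simple lexicon count stands in for the ideology classifier. The function names, the stopword list, and the labels `label_a` and `label_b` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "we", "our", "that", "this", "for", "on", "will", "it", "be"}


def split_sentences(text):
    # Naive splitter: break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    # Lowercased alphabetic tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def extractive_summary(text, k=2):
    # Stage 1 stand-in: keep the k sentences whose content words are most
    # frequent across the whole speech, returned in their original order.
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(range(len(sentences)),
                 key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def classify_ideology(summary, lexicons):
    # Stage 2 stand-in: count hits against hypothetical per-label lexicons
    # and return the best-scoring label together with all scores.
    words = Counter(w for s in summary for w in content_words(s))
    scores = {label: sum(words[t] for t in terms)
              for label, terms in lexicons.items()}
    return max(scores, key=scores.get), scores


speech = ("Taxes are too high. We will cut taxes for families. "
          "The weather is nice today.")
summary = extractive_summary(speech, k=2)
label, scores = classify_ideology(
    summary, {"label_a": {"taxes", "cut"}, "label_b": {"weather"}})
```

In a realistic version each stand-in would be replaced by a trained component; the point of the sketch is only that the classifier consumes the summarizer's output rather than the raw speech.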
Our model also connects theoretically to the CLSA model posited by Cambria et al., who suggest that a truly effective single language model should be an ensemble of models that each excel at an individual task; they accordingly offer a hypothetical model that brings existing individual tasks together into one holistic model. Our project is likewise posited as a means to explore rhetoric, an open issue in neural network research, by thinking about how we might create one holistic model that combines individual tasks from neural network models that already exist and applies them to political text. With our theoretical model we would hope to read a speech, extract a summary, and then classify the sentiment of the speech into a political ideology in order to inform a reader about a politician. We are interested in how to use politicians’ actual words (and not those of journalists writing about them) to train a neural network language model. Specifically, we have investigated a way to train the model on transcripts of speeches made by politicians in the context of a campaign, political rally, or debate. There are general challenges in incorporating concepts such as the semantic separation of message and noise in a political speech, where rhetoric and content may be intertwined. However, most of the individual tasks involved in the project are already well defined within language modeling, and current research actively builds on the cited works.
Because this work is an attempt to join multiple models rather than create a new task, much of the model is relatively well defined from a computational perspective; obtaining adequate data for supervised learning poses the biggest problem. As this is a supervised machine learning task, a nuanced model requires finding ways to define “rhetoric” for the necessary tagging of the dataset on which the model would train, which would need to comprise thousands of political speeches. A non-exhaustive list of the linguistic markers explored includes repetition, excessive synonyms, frequent oppositional sentences or phrases (bait and switch), and frequent pronouns (we, you, they). Because content and rhetoric are intertwined, and rhetoric is already a complex idea to define, tagging thousands of speeches in a scalable and consistent manner will be the most challenging aspect of this project. The model remains theoretical, as curating such a dataset would carry substantial financial, computational, and time costs. This theoretical model raises questions about how we solve problems with neural network models and how such models address nuanced language issues such as the obfuscation of language through rhetoric, an area we identify as needing further exploration in the obfuscation and NLP communities.
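As a rough illustration of how two of the markers listed above might be turned into measurable signals, the sketch below computes a pronoun rate and a repeated-bigram rate for a speech. It is a simple heuristic under stated assumptions, not a tagging scheme from this project: the function name, the feature names, and the choice of bigrams as a proxy for repetition are all hypothetical.

```python
import re
from collections import Counter

# Pronouns named in the text as a possible rhetorical marker.
PRONOUNS = {"we", "you", "they"}


def rhetoric_features(text):
    # Lowercased alphabetic tokens; punctuation is discarded.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {"pronoun_rate": 0.0, "repeated_bigram_rate": 0.0}

    # Share of tokens that are one of the listed pronouns.
    pronoun_rate = sum(t in PRONOUNS for t in tokens) / len(tokens)

    # Share of bigram occurrences that belong to a bigram seen more than
    # once, used here as a crude proxy for repetition.
    bigrams = list(zip(tokens, tokens[1:]))
    counts = Counter(bigrams)
    repeated = sum(c for c in counts.values() if c > 1)
    repeated_bigram_rate = repeated / len(bigrams) if bigrams else 0.0

    return {"pronoun_rate": pronoun_rate,
            "repeated_bigram_rate": repeated_bigram_rate}


features = rhetoric_features("We will win. We will win. They will lose.")
```

Counts like these could at most pre-screen speeches for human annotators; they do not separate rhetoric from content, which is the harder problem the paragraph describes.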
Iyyer, Mohit and Enns, Peter and Boyd-Graber, Jordan and Resnik, Philip. 2014. Political Ideology Detection Using Recursive Neural Networks. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 1113-1122.
Olah, Chris and Carter, Shan. 2016. Attention and Augmented Recurrent Neural Networks.
Bowman, Samuel R. and Angeli, Gabor and Potts, Christopher and Manning, Christopher D. 2015. A large annotated corpus for learning natural language inference. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing.
Rush, Alexander M. and Chopra, Sumit and Weston, Jason. 2015. A Neural Attention Model for Abstractive Sentence Summarization. arXiv preprint arXiv:1509.00685.
Cambria, Erik and Poria, Soujanya and Bisio, Federica and Bajpai, Rajiv and Chaturvedi, Iti. 2015. The CLSA Model: A Novel Framework for Concept-Level Sentiment Analysis. International Conference on Intelligent Text Processing and Computational Linguistics.