research
2024
- Improving Probabilistic Models in Text Classification via Active Learning. Mitchell Bosley, Saki Kuzushima, Ted Enamorado, and 1 more author. American Political Science Review, 2024
Social scientists often classify text documents to use the resulting labels as an outcome or a predictor in empirical research. Automated text classification has become a standard tool because it requires less human coding. However, scholars still need many human-labeled documents to train automated classifiers. To reduce labeling costs, we propose a new algorithm for text classification that combines a probabilistic model with active learning. The probabilistic model uses both labeled and unlabeled data, and active learning concentrates labeling efforts on documents that are difficult to classify. Our validation study shows that the classification performance of our algorithm is comparable to state-of-the-art methods at a fraction of the computational cost. Moreover, we replicate two recently published articles and reach the same substantive conclusions with only a small proportion of the original labeled data used in those studies. We provide activeText, open-source software that implements our method.
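As a rough illustration of how active learning can concentrate labeling effort on hard documents, the sketch below ranks unlabeled documents by the entropy of their predicted class probabilities and picks the most uncertain ones. This is a generic uncertainty-sampling example, not the activeText implementation; the function names, document ids, and probabilities are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predicted, k):
    """Return the ids of the k documents whose predictions are most uncertain."""
    ranked = sorted(predicted, key=lambda doc_id: entropy(predicted[doc_id]), reverse=True)
    return ranked[:k]


# Hypothetical class probabilities from some probabilistic classifier.
predicted = {
    "doc_a": [0.98, 0.02],  # confident prediction, low entropy
    "doc_b": [0.55, 0.45],  # near the decision boundary, high entropy
    "doc_c": [0.80, 0.20],
}

# Documents to send to human coders in the next labeling round.
queried = select_for_labeling(predicted, k=2)
```

In a full loop, the newly labeled documents would be added to the training set and the classifier refit before the next round of selection.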
2023
- Public Preferences for International Law Compliance: Respecting Legal Obligations or Conforming to Common Practices? Saki Kuzushima, Kenneth Mori McElwain, and Yuki Shiraito. Review of International Organizations, 2023
Despite significant debate about the ability of international law to constrain state behavior, recent research points to domestic mechanisms that deter non-compliance, most notably public disapproval of governments that violate treaty agreements. However, existing studies have not explicitly differentiated two distinct, theoretically important motivations that underlie this disapproval: respect for legal obligations versus the desire to follow common global practices. We design an innovative survey experiment in Japan that manipulates information about these two potential channels directly. We examine attitudes towards four controversial practices that fall afoul of international law—same-surname marriage, whaling, hate speech regulation, and capital punishment—and find that the legal obligation cue has a stronger effect on respondent attitudes than the common practices cue. We also show subgroup differences based on partisanship and identification with global civil society. These results demonstrate that the legal nature of international law is crucial to domestic compliance pull.
- Paragraph-Citation Topic Models for Corpora with Citation Networks. ByungKoo Kim, Saki Kuzushima, and Yuki Shiraito. Working Paper, 2023
Topic modeling is one of the most popular approaches to statistical text analysis in many fields, especially in the social sciences. An important feature of text data in social sciences is that many corpora consist of document networks in which documents cite other documents. However, existing topic models either ignore the network structure or make simplifying assumptions that do not reflect the structural properties of actual citation networks. In this paper, we propose a topic model that jointly analyzes both text and citations. In the proposed paragraph-citation topic model (PCTM), topics are assigned to paragraphs rather than tokens. The topic of a paragraph then shapes both the distribution of words and the likelihood of citations emanating from that paragraph to other documents. To model the likelihood of citations to other documents, we introduce a latent citation propensity variable that incorporates two stylized facts about citation networks: the authority and the topic similarity of the documents. We demonstrate the utility of our model by applying it to two subsets of majority opinions of the Supreme Court of the United States: all opinions on Privacy and Voting Rights issues.
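To make the idea of a citation propensity more concrete, here is a toy sketch in which the chance that a paragraph cites a document rises with the document's authority and with how much of the document is devoted to the paragraph's topic. The functional form (a logistic link over a weighted sum), the weights, and every name below are assumptions made for illustration; the abstract does not specify the PCTM likelihood, and this is not it.

```python
import math


def citation_probability(paragraph_topic, doc_topic_shares, authority,
                         w_authority=1.0, w_similarity=2.0, bias=-2.0):
    """Toy probability that a paragraph cites a candidate document.

    paragraph_topic: index of the single topic assigned to the paragraph.
    doc_topic_shares: topic proportions of the candidate document.
    authority: hypothetical authority score of the candidate document.
    The weights and bias are illustrative values, not estimates.
    """
    # Assumed similarity measure: share of the document on the paragraph's topic.
    similarity = doc_topic_shares[paragraph_topic]
    propensity = bias + w_authority * authority + w_similarity * similarity
    # Logistic link maps the latent propensity to a probability.
    return 1.0 / (1.0 + math.exp(-propensity))


# A paragraph on topic 0 considering two equally authoritative documents.
p_on_topic = citation_probability(0, [0.9, 0.1], authority=1.0)
p_off_topic = citation_probability(0, [0.1, 0.9], authority=1.0)
```

Under these assumed weights, the on-topic document receives the higher citation probability, and raising `authority` increases the probability for either document.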