Tomek Strzalkowski

Constellation Chair Professor

About

Prof. Tomek Strzalkowski’s research interests span a wide spectrum of human language technology, including computational linguistics and sociolinguistics, socio-behavioral computing, interactive information retrieval, question answering, human-computer dialogue, serious games, social media analytics, formal semantics, and reversible grammars. He has directed research sponsored by IARPA, DARPA, ARL, AFRL, NSF, the European Commission, and NSERC, as well as a number of industry-funded projects. He was involved in IBM’s Jeopardy! Challenge in advanced question answering. Dr. Strzalkowski has published over a hundred and fifty scientific papers and is the editor of several books, including Advances in Open Domain Question Answering. He serves on the Editorial Board of the journal Natural Language Engineering.

Prior to joining RPI, Dr. Strzalkowski was Professor of Computer Science at SUNY Albany. At SUNY, he was the founding Director of the Institute for Informatics, Logics, and Security Studies, with a research budget of more than $35 million. He came to SUNY from GE CRD, where he was a Natural Language Group Leader and Principal Scientist. At GE, Dr. Strzalkowski directed projects on automated technical manuals, medical informatics, speech recognition, automated summarization, and multimedia processing, including language and video. Before coming to GE, he was a research faculty member at the Courant Institute of New York University, where he worked on applications of natural language processing to information retrieval.

Current projects include research into the social dimensions of information spread online, internet ethnography, and building effective AI defenses against disinformation and exploitation of human socio-cognitive vulnerabilities online, including social engineering attacks. Example projects include:
GATOR: The Goal-oriented Autonomous Dialogue System. We develop a new type of human-machine dialogue system that uses deep learning technologies (such as transformers) to learn how to recognize and generate dialogue plans, i.e., semantic and pragmatic structures that represent one party’s goals and intentions, as well as the impact these are having on the other party. Unlike current transformer-driven chatbots, the core learning task is not to transform one language expression (input) into another language expression (response) but to construct a response plan that properly addresses the plan in the input and the history of interaction. Consequently, the learning process takes three types of information: (1) the input utterance; (2) its semantic-pragmatic plan, i.e., the plan that was used to produce the utterance; and (3) the history of interaction up to this point. Furthermore, the cumulative history of the dialogue is not merely a memory of the utterances exchanged earlier; it captures, in a condensed semantic form, the evolving state of the parties’ objectives as well as the emerging sociolinguistic behavioral patterns of both (all) parties.
Personalized AutoNomous Agents Countering Social Engineering Attacks (PANACEA) protects online users against current and future forms of social engineering. PANACEA serves as an intermediary between attackers (human, automated, hybrid, coordinated) and the potential victim(s) they target. Depending upon the nature and source of communication, PANACEA either handles it autonomously, or allows the user to proceed with an exchange while monitoring the conversation and intervening as needed by (1) inserting or modifying users’ messages, (2) instructing the user how best to respond, while at the same time (3) initiating an investigation to identify the attacker. (DARPA ASED Program)
COMETH (Computational Ethnography from Metaphors and Polarized Language). The objective of this project is to develop a methodology and accompanying software tools for constructing dynamic socio-behavioral models of communities based on the online content that their members produce. A community can be defined by the set of salient concepts that its members recognize, along with the values they assign to them. The resulting causal models are then applied to derive culturally biased interpretations of novel information by prototyping the process by which such new information is adapted to fit into the community’s current model. (DARPA UGB)
Social Convos: A New Approach to Modeling Information Diffusion in Social Media. In this project, we recast our understanding of all social media as a landscape of collectives, or “convos”: sets of users connected by a common interest in a (possibly evolving) information artifact, such as a repository in GitHub, a subreddit in Reddit, or a group of hashtags in Twitter. Convos are represented by collections of features that capture their internal social dynamics. Furthermore, convos are the basis for modeling large and small internet-based communities as “hybrid organisms” that interact in various ways with one another and react collectively to external stimuli, including information and disinformation campaigns. (DARPA SocialSim)

Other affiliations: Computer Science

Research

Other Focus Areas

Artificial Intelligence, Natural Language Processing, Computational Sociolinguistics

Research Centers

Institute for Data Exploration and Applications (IDEA)

Publications

The following is a selection of recent publications in Scopus. Tomek Strzalkowski has 85 indexed publications in the subject areas of Arts and Humanities and Social Sciences.

ARCLIGHT: Automated Clustering and Curriculum Learning Guided by Human Training

Jamie McCusker, Henrique Santos, Rishi Singh, Sabbir M. Rashid, Abraham Sanders, Grace Roessling, Hongji Guo, Bashirul Biswas, Deborah L. McGuinness, Tomek Strzalkowski, Qiang Ji, Jay Miller

CEUR Workshop Proceedings, 3953, 2025

Prediction of U.S. daily mask wearing and social distancing using psychologically valid agents during three waves of COVID-19

Choh Man Teng, Peter Pirolli, Archna Bhatia, Kathleen Carley, Bonnie Dorr, Christian Lebiere, Brodie Mather, Konstantinos Mitsopoulos, Don Morrison, Mark Orr, Tomek Strzalkowski

Frontiers in Epidemiology, 5, 2025

Figuratively Speaking: Authorship Attribution via Multi-Task Figurative Language Modeling

Gregorios A. Katsios, Ning Sa, Tomek Strzalkowski

Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2024, pp. 13240-13255

Social Convos: Capturing Agendas and Emotions on Social Media

Ankita Bhaumik, Ning Sa, Gregorios Katsios, Tomek Strzalkowski

2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings, 2024, pp. 14984-14994

Uncovering Agendas: A Novel French & English Dataset for Agenda Detection on Social Media

Gregorios A. Katsios, Ning Sa, Ankita Bhaumik, Tomek Strzalkowski

2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings, 2024, pp. 16984-16997

Adapting Emotion Detection to Analyze Influence Campaigns on Social Media

Ankita Bhaumik, Andy Bernhardt, Gregorios A. Katsios, Ning Sa, Tomek Strzalkowski

Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2023, pp. 441-451

Multiuser, multimodal sensemaking cognitive immersive environment with a task-oriented dialog system

Shannon Briggs, Sam Chabot, Abraham Sanders, Matthew Peveler, Tomek Strzalkowski, Jonas Braasch

2022 IEEE International Symposium on Technologies for Homeland Security, HST 2022, 2022

BeSt: The Belief and Sentiment Corpus

Jennifer Tracey, Owen Rambow, Michael Arrigo, Claire Cardie, Adam Dalton, Hoa Dang, Mona Diab, Bonnie Dorr, Louise Guthrie, Magdalena Markowska, Smaranda Muresan, Vinodkumar Prabhakaran, Samira Shaikh, Tomek Strzalkowski, Janyce Wiebe

2022 Language Resources and Evaluation Conference, LREC 2022, 2022, pp. 2460-2467

Towards a Progression-Aware Autonomous Dialogue Agent

Abraham Sanders, Tomek Strzalkowski, Mei Si, Albert Chang, Deepanshu Dey, Jonas Braasch, Dakuo Wang

NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 2022, pp. 1194-1212

A General Framework for Domain-Specialization of Stance Detection: A Covid-19 Response Use Case

Brodie Mather, Bonnie J. Dorr, Owen Rambow, Tomek Strzalkowski

Proceedings of the International Florida Artificial Intelligence Research Society Conference, FLAIRS, 34, 2021
