Tomek Strzalkowski


Prof. Tomek Strzalkowski’s research interests span a wide spectrum of human language technology, including computational linguistics and sociolinguistics, socio-behavioral computing, interactive information retrieval, question answering, human-computer dialogue, serious games, social media analytics, formal semantics, and reversible grammars. He has directed research sponsored by IARPA, DARPA, ARL, AFRL, NSF, the European Commission, and NSERC, as well as a number of industry-funded projects. He was involved in IBM’s Jeopardy! Challenge in advanced question answering. Dr. Strzalkowski has published over a hundred and fifty scientific papers and is the editor of several books, including Advances in Open Domain Question Answering. He serves on the Editorial Board of the journal Natural Language Engineering.
Prior to joining RPI, Dr. Strzalkowski was Professor of Computer Science at SUNY Albany. At SUNY, he was the founding Director of the Institute for Informatics, Logics, and Security Studies, with a research budget of more than $35 million. He came to SUNY from GE CRD, where he was a Natural Language Group Leader and Principal Scientist. At GE, Dr. Strzalkowski directed projects on automated technical manuals, medical informatics, speech recognition, automated summarization, and multimedia processing including language and video. Before coming to GE, he was a research faculty member at the Courant Institute of New York University, where he worked on applications of natural language processing to information retrieval.
Current projects include research into the social dimensions of information spread online, internet ethnography, and building effective AI defenses against disinformation and the exploitation of human socio-cognitive vulnerabilities online, including social engineering attacks. Example projects:
GATOR: The Goal-oriented Autonomous Dialogue System. We develop a new type of human-machine dialogue system that uses deep learning technologies (such as transformers) to learn how to recognize and generate dialogue plans, i.e., semantic and pragmatic structures that represent one party’s goals and intentions, as well as the impact these have on the other party. Unlike current transformer-driven chatbots, the core learning task is not to transform one language expression (input) into another (response), but to construct a response plan that properly addresses the plan in the input and the history of interaction. Consequently, the learning process takes three types of information: (1) the input utterance; (2) its semantic-pragmatic plan, i.e., the plan that was used to produce the utterance; and (3) the history of interaction up to this point. Furthermore, the cumulative history of the dialogue is not merely a memory of the utterances exchanged earlier; it captures, in a condensed semantic form, the evolving state of the parties’ objectives as well as the emerging sociolinguistic behavioral patterns of both (all) parties.
Personalized AutoNomous Agents Countering Social Engineering Attacks (PANACEA) protects online users against current and future forms of social engineering. PANACEA serves as an intermediary between attackers (human, automated, hybrid, coordinated) and the potential victim(s) they target. Depending upon the nature and source of a communication, PANACEA either handles it autonomously or allows the user to proceed with the exchange while monitoring the conversation and intervening as needed by (1) inserting or modifying the user’s messages, (2) instructing the user how best to respond, and (3) initiating, at the same time, an investigation to identify the attacker. (DARPA ASED Program)
COMETH (Computational Ethnography from Metaphors and Polarized Language). The objective of this project is to develop a methodology and accompanying software tools for constructing dynamic socio-behavioral models of communities based on the online content that their members produce. A community can be defined by the set of salient concepts that its members recognize, along with the values they assign to them. The resulting causal models are then applied to derive culturally biased interpretations of novel information by prototyping the process by which such new information is adapted to fit into the community’s current model. (DARPA UGB)
Social Convos: A New Approach to Modeling Information Diffusion in Social Media. In this project, we recast our understanding of all social media as a landscape of collectives, or “convos”: sets of users connected by a common interest in a (possibly evolving) information artifact, such as a repository on GitHub, a subreddit on Reddit, or a group of hashtags on Twitter. Convos are represented by collections of features that capture their internal social dynamics. Furthermore, convos are the basis for modeling large and small internet-based communities as “hybrid organisms” that interact in various ways with one another and react collectively to external stimuli, including information and disinformation campaigns. (DARPA SocialSim)

Other affiliations: Computer Science


Other Focus Areas

Artificial Intelligence, Natural Language Processing, Computational Sociolinguistics


The following is a selection of recent publications in Scopus. Tomek Strzalkowski has 87 indexed publications in the subjects of Computer Science, Social Sciences, Arts and Humanities.

Ankita Bhaumik, Andy Bernhardt, Gregorios A. Katsios, Ning Sa, Tomek Strzalkowski. Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2023, pp. 441-451.
Shannon Briggs, Sam Chabot, Abraham Sanders, Matthew Peveler, Tomek Strzalkowski, Jonas Braasch. 2022 IEEE International Symposium on Technologies for Homeland Security, HST 2022, 2022.
Jennifer Tracey, Owen Rambow, Michael Arrigo, Claire Cardie, Adam Dalton, Hoa Dang, Mona Diab, Bonnie Dorr, Louise Guthrie, Magdalena Markowska, Smaranda Muresan, Vinodkumar Prabhakaran, Samira Shaikh, Tomek Strzalkowski, Janyce Wiebe. 2022 Language Resources and Evaluation Conference, LREC 2022, 2022, pp. 2460-2467.
Abraham Sanders, Tomek Strzalkowski, Mei Si, Albert Chang, Deepanshu Dey, Jonas Braasch, Dakuo Wang. NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 2022, pp. 1194-1212.
Brodie Mather, Bonnie J. Dorr, Owen Rambow, Tomek Strzalkowski. Proceedings of the International Florida Artificial Intelligence Research Society Conference, FLAIRS, 34, 2021.
Tomek Strzalkowski, Anna Newheiser, Nathan Kemper, Ning Sa, Bharvee Acharya, Gregorios Katsios. Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2020, pp. 165-175.
Sashank Santhanam, Zhuo Cheng, Brodie Mather, Bonnie Dorr, Archna Bhatia, Bryanna Hebenstreit, Alan Zemel, Adam Dalton, Tomek Strzalkowski, Samira Shaikh. Findings of the Association for Computational Linguistics: EMNLP 2020, 2020, pp. 2736-2750.
Bonnie J. Dorr, Archna Bhatia, Adam Dalton, Brodie Mather, Bryanna Hebenstreit, Sashank Santhanam, Z. Cheng, Samira Shaikh, Alan Zemel, Tomek Strzalkowski. AAAI 2020 - 34th AAAI Conference on Artificial Intelligence, 2020, pp. 7675-7682.
Gregorios Katsios, Ning Sa, Tomek Strzalkowski. Advances in Intelligent Systems and Computing, 965, 2020, pp. 25-36.
Arun Sharma, Tomek Strzalkowski. LREC 2018 - 11th International Conference on Language Resources and Evaluation, 2019, pp. 689-693.
