Prof. Tomek Strzalkowski's research interests span a wide spectrum of human language technology, including computational linguistics and sociolinguistics, socio-behavioral computing, interactive information retrieval, question answering, human-computer dialogue, serious games, social media analytics, formal semantics, and reversible grammars. He has directed research sponsored by IARPA, DARPA, ARL, AFRL, NSF, the European Commission, and NSERC, as well as a number of industry-funded projects. He was involved in IBM’s Jeopardy! Challenge in advanced question answering. Dr. Strzalkowski has published over a hundred and fifty scientific papers and is the editor of several books, including Advances in Open Domain Question Answering. He serves on the Editorial Board of the journal Natural Language Engineering.
Prior to joining RPI, Dr. Strzalkowski was Professor of Computer Science at SUNY Albany. At SUNY, he was the founding Director of the Institute for Informatics, Logics, and Security Studies, with a research budget of more than $35 million. He came to SUNY from GE CRD, where he was a Natural Language Group Leader and Principal Scientist. At GE, Dr. Strzalkowski directed projects on automated technical manuals, medical informatics, speech recognition, automated summarization, and multimedia processing, including language and video. Before coming to GE, he was a research faculty member at the Courant Institute of New York University, where he worked on applications of natural language processing to information retrieval.
Current projects include research into social dimensions of information spread online, internet ethnography, and building effective AI defenses against disinformation and exploitation of human socio-cognitive vulnerabilities online, including social engineering attacks. Some example projects include:
GATOR: The Goal-oriented Autonomous Dialogue System. We develop a new type of human-machine dialogue system that uses deep learning technologies (such as transformers) to learn how to recognize and generate dialogue plans, i.e., semantic and pragmatic structures that represent one party’s goals and intentions, as well as the impact these are having on the other party. Unlike in current transformer-driven chatbots, the core learning task is not to transform one language expression (input) into another language expression (response), but to construct a response plan that properly addresses the plan in the input and the history of interaction. Consequently, the learning process takes three types of information: (1) the input utterance; (2) its semantic-pragmatic plan, i.e., the plan that was used to produce the utterance; and (3) the history of interaction up to this point. Furthermore, the cumulative history of the dialogue is not merely a memory of the utterances exchanged earlier; it captures, in a condensed semantic form, the evolving state of the parties’ objectives as well as the emerging sociolinguistic behavioral patterns of both (all) parties.
Personalized AutoNomous Agents Countering Social Engineering Attacks (PANACEA) protects online users against current and future forms of social engineering. PANACEA serves as an intermediary between attackers (human, automated, hybrid, coordinated) and the potential victim(s) they target. Depending upon the nature and source of a communication, PANACEA either handles it autonomously or allows the user to proceed with the exchange while monitoring the conversation and intervening as needed by (1) inserting or modifying the user’s messages, (2) instructing the user on how best to respond, and (3) at the same time initiating an investigation to identify the attacker. (DARPA ASED Program)
COMETH (Computational Ethnography from Metaphors and Polarized Language). The objective of this project is to develop a methodology and accompanying software tools for constructing dynamic socio-behavioral models of communities based on the online content that their members produce. A community can be defined by the set of salient concepts that its members recognize, along with the values they assign to them. The resulting causal models are then applied to derive culturally biased interpretations of novel information by prototyping the process by which such new information is adapted to fit into the community’s current model. (DARPA UGB)
Social Convos: A New Approach to Modeling Information Diffusion in Social Media. In this project, we recast our understanding of all social media as a landscape of collectives, or “convos”: sets of users connected by a common interest in a (possibly evolving) information artifact, such as a repository on GitHub, a subreddit on Reddit, or a group of hashtags on Twitter. Convos are represented by collections of features that capture their internal social dynamics. Furthermore, convos are the basis for modeling large and small internet-based communities as “hybrid organisms” that interact in various ways with one another and react collectively to external stimuli, including information and disinformation campaigns. (DARPA SocialSim)
Artificial Intelligence, Natural Language Processing, Computational Sociolinguistics
The following is a selection of recent publications in Scopus. Tomek Strzalkowski has 70 indexed publications in the subjects of Social Sciences, Computer Science, and Arts and Humanities.