Publications (grouped by year): 2022, 2021, 2020, 2019, 2018, 2017, 2016.
2022
• FRUIT: Faithfully Reflecting Updated Information in Text. Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). 2022 Conference
@inproceedings{fruit:naacl22,
author = {Robert L. Logan IV and Alexandre Passos and Sameer Singh and Ming-Wei Chang},
title = { {FRUIT: Faithfully Reflecting Updated Information in Text} },
booktitle = {Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
year = {2022}
}
• ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension. Association for Computational Linguistics (ACL). 2022 Conference
@inproceedings{reclip:acl22,
author = {Sanjay Subramanian and William Merrill and Trevor Darrell and Matt Gardner and Sameer Singh and Anna Rohrbach},
title = { {ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension} },
booktitle = {Association for Computational Linguistics (ACL)},
year = {2022}
}
• Combining Feature and Instance Attribution to Detect Artifacts. Findings of the Association for Computational Linguistics (ACL Findings). 2022 Conference
Training the deep neural networks that dominate NLP requires large datasets. These are often collected automatically or via crowdsourcing, and may exhibit systematic biases or annotation artifacts. By the latter we mean spurious correlations between inputs and outputs that do not represent a generally held causal relationship between features and classes; models that exploit such correlations may appear to perform a given task well, but fail on out-of-sample data. In this paper we evaluate the use of different attribution methods for aiding identification of training data artifacts. We propose new hybrid approaches that combine saliency maps (which highlight "important" input features) with instance attribution methods (which retrieve training samples "influential" to a given prediction). We show that this proposed training-feature attribution can be used to efficiently uncover artifacts in training data when a challenging validation set is available. We also carry out a small user study to evaluate whether these methods are useful to NLP researchers in practice, with promising results.
@inproceedings{tfa:facl22,
author = {Pouya Pezeshkpour and Sarthak Jain and Sameer Singh and Byron Wallace},
title = { {Combining Feature and Instance Attribution to Detect Artifacts} },
booktitle = {Findings of the Association for Computational Linguistics (ACL Findings)},
year = {2022}
}
• Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models. Findings of the Association for Computational Linguistics (ACL Findings). 2022 Conference
@inproceedings{cutting:facl22,
author = {Robert L. Logan IV and Ivana Balažević and Eric Wallace and Fabio Petroni and Sameer Singh and Sebastian Riedel},
title = { {Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models} },
booktitle = {Findings of the Association for Computational Linguistics (ACL Findings)},
year = {2022}
}
• BottleFit: Learning Compressed Representations in Deep Neural Networks for Effective and Efficient Split Computing. IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM). 2022 Conference
@inproceedings{bottlefit:wowmom22,
author = {Yoshitomo Matsubara and Davide Callegaro and Sameer Singh and Marco Levorato and Francesco Restuccia},
title = { {BottleFit: Learning Compressed Representations in Deep Neural Networks for Effective and Efficient Split Computing} },
booktitle = {IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM)},
year = {2022}
}
• An Empirical Comparison of Machine Learning Methods for Text-based Sentiment Analysis of Online Consumer Reviews. International Journal of Research in Marketing. 2022 Journal
@article{sentiment:ijrm22,
author = {Huwail J. Alantari and Imran S. Currim and Yiting Deng and Sameer Singh},
title = { {An Empirical Comparison of Machine Learning Methods for Text-based Sentiment Analysis of Online Consumer Reviews} },
journal = {International Journal of Research in Marketing},
volume = {39},
number = {1},
doi = {10.1016/j.ijresmar.2021.10.011},
pages = {1--19},
year = {2022}
}
2021
• COVR: A Test-Bed for Visually Grounded Compositional Generalization with Real Images. Empirical Methods in Natural Language Processing (EMNLP). 2021 Conference
@inproceedings{covr:emnlp21,
author = {Ben Bogin and Shivanshu Gupta and Jonathan Berant and Matt Gardner},
title = { {COVR: A Test-Bed for Visually Grounded Compositional Generalization with Real Images} },
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = {2021}
}
• Counterfactual Explanations Can Be Manipulated. Neural Information Processing Systems (NeurIPS). 2021 Conference
@inproceedings{manipcfs:neurips21,
author = {Dylan Slack and Sophie Hilgard and Himabindu Lakkaraju and Sameer Singh},
title = { {Counterfactual Explanations Can Be Manipulated} },
booktitle = {Neural Information Processing Systems (NeurIPS)},
year = {2021}
}
• Reliable Post hoc Explanations: Modeling Uncertainty in Explainability. Neural Information Processing Systems (NeurIPS). 2021 Conference
@inproceedings{bayeslimeshap:neurips21,
author = {Dylan Slack and Sophie Hilgard and Sameer Singh and Himabindu Lakkaraju},
title = { {Reliable Post hoc Explanations: Modeling Uncertainty in Explainability} },
booktitle = {Neural Information Processing Systems (NeurIPS)},
year = {2021}
}
• Generative Context Pair Selection for Multi-hop Question Answering. Empirical Methods in Natural Language Processing (EMNLP). 2021 Conference
@inproceedings{genqa:emnlp21,
author = {Dheeru Dua and Cicero Nogueira dos Santos and Patrick Ng and Ben Athiwaratkun and Bing Xiang and Matt Gardner and Sameer Singh},
title = { {Generative Context Pair Selection for Multi-hop Question Answering} },
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = {2021}
}
• Entity-Based Knowledge Conflicts in Question Answering. Empirical Methods in Natural Language Processing (EMNLP). 2021 Conference
@inproceedings{qaconflicts:emnlp21,
author = {Shayne Longpre and Kartik Perisetla and Anthony Chen and Nikhil Ramesh and Chris DuBois and Sameer Singh},
title = { {Entity-Based Knowledge Conflicts in Question Answering} },
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = {2021}
}
• Learning with Instance Bundles for Reading Comprehension. Empirical Methods in Natural Language Processing (EMNLP). 2021 Conference
When training most modern reading comprehension models, all the questions associated with a context are treated as being independent from each other. However, closely related questions and their corresponding answers are not independent, and leveraging these relationships could provide a strong supervision signal to a model. Drawing on ideas from contrastive estimation, we introduce several new supervision techniques that compare question-answer scores across multiple related instances. Specifically, we normalize these scores across various neighborhoods of closely contrasting questions and/or answers, adding another cross entropy loss term that is used in addition to traditional maximum likelihood estimation. Our techniques require bundles of related question-answer pairs, which we can either mine from within existing data or create using various automated heuristics. We empirically demonstrate the effectiveness of training with instance bundles on two datasets -- HotpotQA and ROPES -- showing up to 11% absolute gains in accuracy.
@inproceedings{bundles:emnlp21,
author = {Dheeru Dua and Pradeep Dasigi and Sameer Singh and Matt Gardner},
title = { {Learning with Instance Bundles for Reading Comprehension} },
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = {2021}
}
• Competency Problems: On Finding and Removing Artifacts in Language Data. Empirical Methods in Natural Language Processing (EMNLP). 2021 Conference
@inproceedings{competency:emnlp21,
author = {Matt Gardner and William Merrill and Jesse Dodge and Matthew Peters and Alexis Ross and Sameer Singh and Noah A. Smith},
title = { {Competency Problems: On Finding and Removing Artifacts in Language Data} },
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = {2021}
}
• Paired Examples as Indirect Supervision in Latent Decision Models. Empirical Methods in Natural Language Processing (EMNLP). 2021 Conference
@inproceedings{pairednmn:emnlp21,
author = {Nitish Gupta and Sameer Singh and Matt Gardner and Dan Roth},
title = { {Paired Examples as Indirect Supervision in Latent Decision Models} },
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = {2021}
}
• Calibrate Before Use: Improving Few-shot Performance of Language Models. International Conference on Machine Learning (ICML). 2021 Conference
@inproceedings{poisoning:icml21,
author = {Tony Z. Zhao and Eric Wallace and Shi Feng and Dan Klein and Sameer Singh},
title = { {Calibrate Before Use: Improving Few-shot Performance of Language Models} },
booktitle = {International Conference on Machine Learning (ICML)},
year = {2021}
}
• Evaluating Entity Disambiguation and the Role of Popularity in Retrieval-Based NLP. Association for Computational Linguistics (ACL). 2021 Conference
@inproceedings{amber:acl21,
author = {Anthony Chen and Pallavi Gudipati and Shayne Longpre and Xiao Ling and Sameer Singh},
title = { {Evaluating Entity Disambiguation and the Role of Popularity in Retrieval-Based NLP} },
booktitle = {Association for Computational Linguistics (ACL)},
doi = {10.18653/v1/2021.acl-long.345},
year = {2021}
}
• Benchmarking Scalable Methods for Streaming Cross Document Coreference. Association for Computational Linguistics (ACL). 2021 Conference
@inproceedings{streamingcdcr:acl21,
author = {Robert L. Logan IV and Andrew McCallum and Sameer Singh and Dan Bikel},
title = { {Benchmarking Scalable Methods for Streaming Cross Document Coreference} },
booktitle = {Association for Computational Linguistics (ACL)},
doi = {10.18653/v1/2021.acl-long.364},
year = {2021}
}
• Enforcing Consistency in Weakly Supervised Semantic Parsing. Association for Computational Linguistics (ACL). 2021 Conference
@inproceedings{spconsistency:acl21,
author = {Nitish Gupta and Sameer Singh and Matt Gardner},
title = { {Enforcing Consistency in Weakly Supervised Semantic Parsing} },
booktitle = {Association for Computational Linguistics (ACL)},
doi = {10.18653/v1/2021.acl-short.22},
year = {2021}
}
• An Empirical Comparison of Instance Attribution Methods for NLP. Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). 2021 Conference
Widespread adoption of deep pretrained (masked) neural language models has motivated a pressing need for approaches for interpreting network outputs and for facilitating model debugging. Instance attribution methods constitute one means of accomplishing these goals by retrieving training instances that (may have) led to a particular prediction. Influence functions (IF) provide machinery for doing this by quantifying the effect that perturbing individual train instances would have on a specific test prediction. However, even approximating the IF is computationally expensive, to a degree that may be prohibitive in many cases. Might simpler approaches (e.g., retrieving the train instances most similar to a given test point) perform comparably? In this work we evaluate the degree to which different potential instance attribution methods agree with respect to the importance of training samples. We find that simple retrieval methods yield training instances that differ from those identified via gradient-based methods (such as the IF), but that nonetheless exhibit desirable characteristics similar to more complex attribution methods.
@inproceedings{emp-instance:naacl21,
author = {Pouya Pezeshkpour and Sarthak Jain and Byron Wallace and Sameer Singh},
title = { {An Empirical Comparison of Instance Attribution Methods for NLP} },
booktitle = {Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
year = {2021}
}
• Concealed Data Poisoning Attacks on NLP Models. Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). 2021 Conference
Adversarial attacks alter NLP model predictions by perturbing test-time inputs. However, it is much less understood whether, and how, predictions can be manipulated with small, concealed changes to the training data. In this work, we develop a new data poisoning attack that allows an adversary to control model predictions whenever a desired trigger phrase is present in the input. For instance, we insert 50 poison examples into a sentiment model’s training set that causes the model to frequently predict Positive whenever the input contains “James Bond”. Crucially, we craft these poison examples using a gradient-based procedure so that they do not mention the trigger phrase. We also apply our poison attack to language modeling (“Apple iPhone” triggers negative generations) and machine translation (“iced coffee” mistranslated as “hot coffee”). We conclude by proposing three defenses that can mitigate our attack at some cost in prediction accuracy or extra human annotation.
@inproceedings{poisoning:naacl21,
author = {Eric Wallace and Tony Z. Zhao and Shi Feng and Sameer Singh},
title = { {Concealed Data Poisoning Attacks on NLP Models} },
booktitle = {Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
year = {2021}
}
• Improved Consistency Regularization for GANs. AAAI Conference on Artificial Intelligence (AAAI). 2021 Conference
Recent work has increased the performance of Generative Adversarial Networks (GANs) by enforcing a consistency cost on the discriminator. We improve on this technique in several ways. We first show that consistency regularization can introduce artifacts into the GAN samples and explain how to fix this issue. We then propose several modifications to the consistency regularization procedure designed to improve its performance. We carry out extensive experiments quantifying the benefit of our improvements. For unconditional image synthesis on CIFAR-10 and CelebA, our modifications yield the best known FID scores on various GAN architectures. For conditional image synthesis on CIFAR-10, we improve the state-of-the-art FID score from 11.48 to 9.21. Finally, on ImageNet-2012, we apply our technique to the original BigGAN model and improve the FID from 6.66 to 5.38, which is the best score at that model size.
@inproceedings{icrgan:aaai21,
author = {Zhengli Zhao and Sameer Singh and Honglak Lee and Zizhao Zhang and Augustus Odena and Han Zhang},
title = { {Improved Consistency Regularization for GANs} },
booktitle = {AAAI Conference on Artificial Intelligence (AAAI)},
year = {2021}
}
• PARSINLU: A Suite of Language Understanding Challenges for Persian. Transactions of the Association for Computational Linguistics (TACL). 2021 Journal
Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains concentrated on resource-rich languages like English. This work focuses on the Persian language, one of the widely spoken languages in the world, for which few NLU datasets are nonetheless available. The availability of high-quality evaluation datasets is a necessity for reliable assessment of the progress on different NLU tasks and domains. We introduce PARSINLU, the first benchmark in the Persian language that includes a range of high-level tasks — Reading Comprehension, Textual Entailment, etc. These datasets are collected in a multitude of ways, often involving manual annotations by native speakers. This results in over 14.5k new instances across 6 distinct NLU tasks. We also present the first results of state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope PARSINLU fosters further research and advances in Persian language understanding.
@article{parsinlu:tacl21,
author = {Daniel Khashabi and Arman Cohan and Siamak Shakeri and Pedram Hosseini and Pouya Pezeshkpour and Malihe Alikhani and Moin Aminnaseri and Marzieh Bitaab and Faeze Brahman and Sarik Ghazarian and Mozhdeh Gheini and Arman Kabiri and Rabeeh Karimi Mahabadi and Omid Memarrast and Ahmadreza Mosallanezhad and Erfan Noury and Shahab Raji and Mohammad Sadegh Rasooli and Sepideh Sadeghi and Erfan Sadeqi Azer and Niloofar Safi Samghabadi and Mahsa Shafaei and Saber Sheybani and Ali Tazarv and Yadollah Yaghoobzadeh},
title = { {PARSINLU: A Suite of Language Understanding Challenges for Persian} },
journal = {Transactions of the Association for Computational Linguistics (TACL)},
year = {2021}
}
• Climatology and Evolution of the Antarctic Peninsula Föhn Wind‐Induced Melt Regime From 1979–2018. Journal of Geophysical Research: Atmospheres. 2021 Journal
@article{fohn:jgr21,
author = {Matthew K Laffin and Charles Zender and Sameer Singh and J. Van Wessem and C. J. P. P. Smeets and C. H. Reijmer},
title = { {Climatology and Evolution of the Antarctic Peninsula Föhn Wind‐Induced Melt Regime From 1979–2018} },
journal = {Journal of Geophysical Research: Atmospheres},
volume = {126},
number = {4},
doi = {10.1029/2020JD033682},
year = {2021}
}
• Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models. NeurIPS Workshop on Efficient Natural Language and Speech Processing (ENLSP). 2021 Workshop
Best Poster Award
@inproceedings{nullprompts:effnlp21,
author = {Robert L. Logan IV and Ivana Balažević and Eric Wallace and Fabio Petroni and Sameer Singh and Sebastian Riedel},
title = { {Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models} },
booktitle = {NeurIPS Workshop on Efficient Natural Language and Speech Processing (ENLSP)},
year = {2021}
}
• Modular Framework for Visuomotor Language Grounding. Embodied AI Workshop at CVPR. 2021 Workshop
@inproceedings{modulargl:embodied21,
author = {Kolby Nottingham and Litian Liang and Daeyun Shin and Charless C. Fowlkes and Roy Fox and Sameer Singh},
title = { {Modular Framework for Visuomotor Language Grounding} },
booktitle = {Embodied AI Workshop at CVPR},
year = {2021}
}
• Deriving Behavioral Tests from Common Sense Knowledge Graphs. AAAI Workshop on Common Sense Knowledge Graphs (CSKGs). 2021 Workshop
Although NLP models have demonstrated “superhuman” performance on common sense reasoning tasks, it is unclear whether these models truly have common sense knowledge. Constructing evaluation datasets to test this knowledge is expensive due to the manual effort involved, and is also limited in scope. Meanwhile, common sense knowledge graphs (CSKGs) aim for a wide coverage of structured common sense knowledge, but cannot be directly used for testing purposes. In this work, we introduce a semi-automated approach that leverages CSKGs to construct out-of-domain evaluation sets for NLP tasks that are more scalable than purely manual approaches. Using this procedure, we create test cases from two popular CSKGs—ConceptNet and ATOMIC—to test the common sense reasoning capability of models trained for natural language inference (NLI) and question answering (QA). These tests reveal interesting differences in failure modes of these models; models trained on NLI tend to perform better on tests of ontological knowledge, e.g. ‘is a’ and ‘used for’ relations, failing on tests that require understanding ‘desires’, ‘needs’, and ‘wants’, while QA models perform better on tests that involve ‘wants’ and ‘desires’.
@inproceedings{cskgtests:cskg21,
author = {Yasaman Razeghi and Robert L. Logan IV and Sameer Singh},
title = { {Deriving Behavioral Tests from Common Sense Knowledge Graphs} },
booktitle = {AAAI Workshop on Common Sense Knowledge Graphs (CSKGs)},
year = {2021}
}
• What Models Know About Their Attackers: Deriving Attacker Information From Latent Representations. EMNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackBoxNLP). 2021 Workshop
Adversarial attacks curated against NLP models are increasingly becoming practical threats. Although various methods have been developed to detect adversarial attacks, securing learning-based NLP systems in practice would require more than identifying and evading perturbed instances. To address these issues, we propose a new set of adversary identification tasks, Attacker Attribute Classification via Textual Analysis (AACTA), that attempts to obtain more detailed information about the attackers from adversarial texts. Specifically, given a piece of adversarial text, we hope to accomplish tasks such as localizing perturbed tokens, identifying the attacker’s access level to the target model, determining the evasion mechanism imposed, and specifying the perturbation type employed by the attacking algorithm. Our contributions are as follows: we formalize the task of classifying attacker attributes, and create a benchmark on various target models from sentiment classification and abuse detection domains. We show that signals from BERT models and target models can be used to train classifiers that reveal the properties of the attacking algorithms. We demonstrate that adversarial attacks leave interpretable traces in both feature spaces of pre-trained language models and target models, making AACTA a promising direction towards more trustworthy NLP systems.
@inproceedings{advdetect:bbox21,
author = {Zhouhang Xie and Jonathan Brophy and Adam Noack and Wencong You and Kalyani Asthana and Carter Perkins and Sabrina Reis and Zayd Hammoudeh and Daniel Lowd and Sameer Singh},
title = { {What Models Know About Their Attackers: Deriving Attacker Information From Latent Representations} },
booktitle = {EMNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackBoxNLP)},
year = {2021}
}
2020
• AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts. Empirical Methods in Natural Language Processing (EMNLP). 2020 Conference
The remarkable success of pretrained language models has motivated the study of what kinds of knowledge these models learn during pretraining. Reformulating tasks as fill-in-the-blanks problems (e.g., cloze tests) is a natural approach for gauging such knowledge; however, its usage is limited by the manual effort and guesswork required to write suitable prompts. To address this, we develop AutoPrompt, an automated method to create prompts for a diverse set of tasks, based on a gradient-guided search. Using AutoPrompt, we show that masked language models (MLMs) have an inherent capability to perform sentiment analysis and natural language inference without additional parameters or finetuning, sometimes achieving performance on par with recent state-of-the-art supervised models. We also show that our prompts elicit more accurate factual knowledge from MLMs than the manually created prompts on the LAMA benchmark, and that MLMs can be used as relation extractors more effectively than supervised relation extraction models. These results demonstrate that automatically generated prompts are a viable parameter-free alternative to existing probing methods, and as pretrained LMs become more sophisticated and capable, potentially a replacement for finetuning.
@inproceedings{autoprompt:emnlp20,
author = {Taylor Shin and Yasaman Razeghi and Robert L. Logan IV and Eric Wallace and Sameer Singh},
title = { {AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts} },
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
pages = {4222--4235},
year = {2020}
}
• MOCHA: A Dataset for Training and Evaluating Generative Reading Comprehension Metrics. Empirical Methods in Natural Language Processing (EMNLP). 2020 Conference
Posing reading comprehension as a generation problem provides a great deal of flexibility, allowing for open-ended questions with few restrictions on possible answers. However, progress is impeded by existing generation metrics, which rely on token overlap and are agnostic to the nuances of reading comprehension. To address this, we introduce a benchmark for training and evaluating generative reading comprehension metrics: MOdeling Correctness with Human Annotations. MOCHA contains 40K human judgement scores on model outputs from 6 diverse question answering datasets and an additional set of minimal pairs for evaluation. Using MOCHA, we train a Learned Evaluation metric for Reading Comprehension, LERC, to mimic human judgement scores. LERC outperforms baseline metrics by 10 to 36 absolute Pearson points on held-out annotations. When we evaluate robustness on minimal pairs, LERC achieves 80% accuracy, outperforming baselines by 14 to 26 absolute percentage points while leaving significant room for improvement. MOCHA presents a challenging problem for developing accurate and robust generative reading comprehension metrics.
@inproceedings{mocha:emnlp20,
author = {Anthony Chen and Gabriel Stanovsky and Sameer Singh and Matt Gardner},
title = { {MOCHA: A Dataset for Training and Evaluating Generative Reading Comprehension Metrics} },
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
pages = {6521--6532},
year = {2020}
}
• Gradient-based Analysis of NLP Models is Manipulable. Findings of the Association for Computational Linguistics: EMNLP (EMNLP Findings). 2020 Conference
@inproceedings{facade:femnlp20,
author = {Junlin Wang and Jens Tuyls and Eric Wallace and Sameer Singh},
title = { {Gradient-based Analysis of NLP Models is Manipulable} },
booktitle = {Findings of the Association for Computational Linguistics: EMNLP (EMNLP Findings)},
pages = {247--258},
year = {2020}
}
• Evaluating Models’ Local Decision Boundaries via Contrast Sets. Findings of the Association for Computational Linguistics: EMNLP (EMNLP Findings). 2020 Conference
@inproceedings{contrast:femnlp20,
author = {Matt Gardner and Yoav Artzi and Victoria Basmov and Jonathan Berant and Ben Bogin and Sihao Chen and Pradeep Dasigi and Dheeru Dua and Yanai Elazar and Ananth Gottumukkala and Nitish Gupta and Hannaneh Hajishirzi and Gabriel Ilharco and Daniel Khashabi and Kevin Lin and Jiangming Liu and Nelson F. Liu and Phoebe Mulcaire and Qiang Ning and Sameer Singh and Noah A. Smith and Sanjay Subramanian and Reut Tsarfaty and Eric Wallace and Ally Zhang and Ben Zhou},
title = { {Evaluating Models’ Local Decision Boundaries via Contrast Sets} },
booktitle = {Findings of the Association for Computational Linguistics: EMNLP (EMNLP Findings)},
pages = {1307--1323},
year = {2020}
}
• MedICaT: A Dataset of Medical Images, Captions, and Textual References. Findings of the Association for Computational Linguistics: EMNLP (EMNLP Findings). 2020 Conference
@inproceedings{medicat:femnlp20,
author = {Sanjay Subramanian and Lucy Lu Wang and Ben Bogin and Sachin Mehta and Madeleine van Zuylen and Sravanthi Parasa and Sameer Singh and Matt Gardner and Hannaneh Hajishirzi},
title = { {MedICaT: A Dataset of Medical Images, Captions, and Textual References} },
booktitle = {Findings of the Association for Computational Linguistics: EMNLP (EMNLP Findings)},
pages = {2112--2120},
year = {2020}
}
• Beyond Accuracy: Behavioral Testing of NLP models with CheckList. Association for Computational Linguistics (ACL). 2020 Conference
Best Paper Award
Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific behaviors. Inspired by principles of behavioral testing in software engineering, we introduce CheckList, a task-agnostic methodology for testing NLP models. CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation, as well as a software tool to generate a large and diverse number of test cases quickly. We illustrate the utility of CheckList with tests for three tasks, identifying critical failures in both commercial and state-of-the-art models. In a user study, a team responsible for a commercial sentiment analysis model found new and actionable bugs in an extensively tested model. In another user study, NLP practitioners with CheckList created twice as many tests, and found almost three times as many bugs as users without it.
@inproceedings{checklist:acl20,
author = {Marco Tulio Ribeiro and Tongshuang Wu and Carlos Guestrin and Sameer Singh},
title = { {Beyond Accuracy: Behavioral Testing of NLP models with CheckList} },
booktitle = {Association for Computational Linguistics (ACL)},
pages = {4902--4912},
year = {2020}
}
• On Importance Sampling-Based Evaluation of Latent Language Models. Association for Computational Linguistics (ACL). 2020 Conference
Language models that use additional latent structures (e.g., syntax trees, coreference chains, knowledge graph links) provide several advantages over traditional language models. However, likelihood-based evaluation of these models is often intractable as it requires marginalizing over the latent space. Existing works avoid this issue by using importance sampling. Although this approach has asymptotic guarantees, analysis is rarely conducted on the effect of decisions such as sample size and choice of proposal distribution on the reported estimates. In this paper, we carry out this analysis for three models: RNNG, EntityNLM, and KGLM. In addition, we elucidate subtle differences in how importance sampling is applied in these works that can have substantial effects on the final estimates, as well as provide theoretical results which reinforce the validity of this technique.
@inproceedings{impsample:acl20,
author = {Robert L. Logan IV and Matt Gardner and Sameer Singh},
title = { {On Importance Sampling-Based Evaluation of Latent Language Models} },
booktitle = {Association for Computational Linguistics (ACL)},
pages = {2171--2176},
year = {2020}
}
• Obtaining Faithful Interpretations from Compositional Neural Networks. Association for Computational Linguistics (ACL). 2020 Conference
Neural module networks (NMNs) are a popular approach for modeling compositionality: they achieve high accuracy when applied to problems in language and vision, while reflecting the compositional structure of the problem in the network architecture. However, prior work implicitly assumed that the structure of the network modules, describing the abstract reasoning process, provides a faithful explanation of the model’s reasoning; that is, that all modules perform their intended behaviour. In this work, we propose and conduct a systematic evaluation of the intermediate outputs of NMNs on NLVR2 and DROP, two datasets which require composing multiple reasoning steps. We find that the intermediate outputs differ from the expected output, illustrating that the network structure does not provide a faithful explanation of model behaviour. To remedy that, we train the model with auxiliary supervision and propose particular choices for module architecture that yield much better faithfulness, at a minimal cost to accuracy.
@inproceedings{nmninterpret:acl20,
author = {Sanjay Subramanian and Ben Bogin and Nitish Gupta and Tomer Wolfson and Sameer Singh and Jonathan Berant and Matt Gardner},
title = { {Obtaining Faithful Interpretations from Compositional Neural Networks} },
booktitle = {Association for Computational Linguistics (ACL)},
pages = {5594-5608},
year = {2020}
}
• .Benefits of Intermediate Annotations in Reading Comprehension. Association for Computational Linguistics (ACL). 2020 Conference
[ PDF, ACL Anthology, Video+Slides, Abstract, BibTex ]
Complex compositional reading comprehension datasets require performing latent sequential decisions that are learned via supervision from the final answer. A large combinatorial space of possible decision paths that result in the same answer, compounded by the lack of intermediate supervision to help choose the right path, makes the learning particularly hard for this task. In this work, we study the benefits of collecting intermediate reasoning supervision along with the answer during data collection. We find that these intermediate annotations can provide two-fold benefits. First, we observe that for any collection budget, spending a fraction of it on intermediate annotations results in improved model performance, for two complex compositional datasets: DROP and Quoref. Second, these annotations encourage the model to learn the correct latent reasoning steps, helping combat some of the biases introduced during the data collection process.
@inproceedings{intannot:acl20,
author = {Dheeru Dua and Sameer Singh and Matt Gardner},
title = { {Benefits of Intermediate Annotations in Reading Comprehension} },
booktitle = {Association for Computational Linguistics (ACL)},
pages = {5627-5634},
year = {2020}
}
• .Dynamic Sampling Strategies for Multi-Task Reading Comprehension. Association for Computational Linguistics (ACL). 2020 Conference
[ PDF, ACL Anthology, Video+Slides, Abstract, BibTex ]
Building general reading comprehension systems, capable of solving multiple datasets at the same time, is a recent aspirational goal in the research community. Prior work has focused on model architecture or generalization to held out datasets, and largely passed over the particulars of the multi-task learning setup. We show that a simple dynamic sampling strategy, selecting instances for training proportional to the multi-task model’s current performance on a dataset relative to its single task performance, gives substantive gains over prior multi-task sampling strategies, mitigating the catastrophic forgetting that is common in multi-task learning. We also demonstrate that allowing instances of different tasks to be interleaved as much as possible between each epoch and batch has a clear benefit in multi-task performance over forcing task homogeneity at the epoch or batch level. Our final model shows greatly increased performance over the best model on ORB, a recently-released multitask reading comprehension benchmark.
@inproceedings{dynsample:acl20,
author = {Ananth Gottumukkala and Dheeru Dua and Sameer Singh and Matt Gardner},
title = { {Dynamic Sampling Strategies for Multi-Task Reading Comprehension} },
booktitle = {Association for Computational Linguistics (ACL)},
pages = {920-924},
year = {2020}
}
• .Revisiting Evaluation of Knowledge Base Completion Models. Automated Knowledge Base Construction (AKBC). 2020 Conference
Runner-up for Best Paper Award
[ PDF, Yago3-TC Data, Video+Slides, OpenReview, AKBC Page, Abstract, BibTex ]
Representing knowledge graphs (KGs) by learning embeddings for entities and relations has led to accurate models for existing KG completion benchmarks. However, due to the open-world assumption of existing KGs, evaluation of KG completion uses ranking metrics and triple classification with negative samples, and is thus unable to directly assess models on the goals of the task: completion. In this paper, we first study the shortcomings of these evaluation metrics. Specifically, we demonstrate that these metrics (1) are unreliable for estimating how calibrated the models are, (2) make strong assumptions that are often violated, and (3) do not sufficiently, and consistently, differentiate embedding methods from each other, or from simpler approaches. To address these issues, we gather a semi-complete KG referred to as YAGO3-TC, using a random subgraph from the test and validation data of YAGO3-10, which enables us to compute accurate triple classification accuracy on this data. Conducting thorough experiments on existing models, we provide new insights and directions for KG completion research. Along with the dataset and the open source implementation of the models, we also provide a leaderboard for knowledge graph completion that consists of a hidden, and growing, test set, available at https://pouyapez.github.io/yago3-tc/.
@inproceedings{kbeval:akbc20,
author = {Pouya Pezeshkpour and Yifan Tian and Sameer Singh},
title = { {Revisiting Evaluation of Knowledge Base Completion Models} },
booktitle = {Automated Knowledge Base Construction (AKBC)},
year = {2020}
}
• .Building a Better Lie Detector with BERT: The Difference Between Truth and Lies. International Joint Conference on Neural Networks (IJCNN). 2020 Conference
[ PDF, BibTex ]
@inproceedings{bertdecept:ijcnn20,
author = {Dan Barsever and Sameer Singh and Emre Neftci},
title = { {Building a Better Lie Detector with BERT: The Difference Between Truth and Lies} },
booktitle = {International Joint Conference on Neural Networks (IJCNN)},
year = {2020}
}
• .Neural Module Networks for Reasoning over Text. International Conference on Learning Representations (ICLR). 2020 Conference
[ PDF, arXiv, OpenReview, Code, Abstract, BibTex ]
Answering compositional questions that require multiple steps of reasoning against text is challenging, especially when they involve discrete, symbolic operations. Neural module networks (NMNs) learn to parse such questions as executable programs composed of learnable modules, performing well on synthetic visual QA domains. However, we find that it is challenging to learn these models for non-synthetic questions on open-domain text, where a model needs to deal with the diversity of natural language and perform a broader range of reasoning. We extend NMNs by: (a) introducing modules that reason over a paragraph of text, performing symbolic reasoning (such as arithmetic, sorting, counting) over numbers and dates in a probabilistic and differentiable manner; and (b) proposing an unsupervised auxiliary loss to help extract arguments associated with the events in text. Additionally, we show that a limited amount of heuristically-obtained question program and intermediate module output supervision provides sufficient inductive bias for accurate learning. Our proposed model significantly outperforms state-of-the-art models on a subset of the DROP dataset that poses a variety of reasoning challenges that are covered by our modules.
@inproceedings{nmn:iclr20,
author = {Nitish Gupta and Kevin Lin and Dan Roth and Sameer Singh and Matt Gardner},
title = { {Neural Module Networks for Reasoning over Text} },
booktitle = {International Conference on Learning Representations (ICLR)},
year = {2020}
}
• .Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution. International Conference on Learning Representations (ICLR). 2020 Conference
[ PDF, Project page, arXiv, Code+Data, OpenReview, Abstract, BibTex ]
As deep reinforcement learning (RL) is applied to more tasks, there is a need to visualize and understand the behavior of learned agents. Saliency maps explain agent behavior by highlighting the features of the input state that are most relevant for the agent in taking an action. Existing perturbation-based approaches to compute saliency often highlight regions of the input that are not relevant to the action taken by the agent. Our proposed approach, SARFA (Specific and Relevant Feature Attribution), generates more focused saliency maps by balancing two aspects (specificity and relevance) that capture different desiderata of saliency. The first captures the impact of perturbation on the relative expected reward of the action to be explained. The second downweighs irrelevant features that alter the relative expected rewards of actions other than the action to be explained. We compare SARFA with existing approaches on agents trained to play board games (Chess and Go) and Atari games (Breakout, Pong and Space Invaders). We show through illustrative examples (Chess, Atari, Go), human studies (Chess), and automated evaluation methods (Chess) that SARFA generates saliency maps that are more interpretable for humans than existing approaches. For the code release and demo videos, see: https://nikaashpuri.github.io/sarfa-saliency/.
@inproceedings{salrl:iclr20,
author = {Piyush Gupta and Nikaash Puri and Sukriti Verma and Dhruv Kayastha and Shripad Deshmukh and Balaji Krishnamurthy and Sameer Singh},
title = { {Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution} },
booktitle = {International Conference on Learning Representations (ICLR)},
year = {2020}
}
• .Minecraft as a Platform for Project-Based Learning in AI. AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI). 2020 Conference
[ PDF, Website, Poster, Spotlight, AAAI Page, Abstract, BibTex ]
Undergraduate courses that focus on open-ended, project-based learning teach students how to define concrete goals, transfer conceptual understanding of algorithms to code, and evaluate/analyze/present their solution. However, AI, along with machine learning, is getting increasingly varied in terms of both the approaches and applications, making it challenging to design project courses that span a sufficiently wide spectrum of AI. For these reasons, existing AI project courses are restricted to a narrow set of approaches (e.g. only reinforcement learning) or applications (e.g. only computer vision).
In this paper, we propose to use Minecraft as the platform for teaching AI via project-based learning. Minecraft is an open-world sandbox game with elements of exploration, resource gathering, crafting, construction, and combat, and is supported by the Malmo library that provides a programmatic interface to the player observations and actions at various levels of granularity. In Minecraft, students can design projects to use approaches like search-based AI, reinforcement learning, supervised learning, and constraint satisfaction, on data types like text, audio, images, and tabular data. We describe our experience with an open-ended, undergraduate AI projects course using Minecraft that includes 82 different projects, covering themes that ranged from navigation, instruction following, object detection, combat, and music/image generation.
@inproceedings{malmo:eaai20,
author = {Sameer Singh},
title = { {Minecraft as a Platform for Project-Based Learning in AI} },
booktitle = {AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI)},
doi = {10.1609/aaai.v34i09.7070},
pages = {13504-13505},
year = {2020}
}
• .Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods. AAAI/ACM Conference on AI, Ethics, and Society (AIES). 2020 Conference
[ PDF, arXiv, ACM Page, Abstract, BibTex ]
As machine learning black boxes are increasingly being deployed in domains such as healthcare and criminal justice, there is growing emphasis on building tools and techniques for explaining these black boxes in an interpretable manner. Such explanations are being leveraged by domain experts to diagnose systematic errors and underlying biases of black boxes. In this paper, we demonstrate that post hoc explanation techniques that rely on input perturbations, such as LIME and SHAP, are not reliable. Specifically, we propose a novel scaffolding technique that effectively hides the biases of any given classifier by allowing an adversarial entity to craft an arbitrary desired explanation. Our approach can be used to scaffold any biased classifier in such a way that its predictions on the input data distribution still remain biased, but the post hoc explanations of the scaffolded classifier look innocuous. Using extensive evaluation with multiple real-world datasets (including COMPAS), we demonstrate how extremely biased (racist) classifiers crafted by our framework can easily fool popular explanation techniques such as LIME and SHAP into generating innocuous explanations which do not reflect the underlying biases.
@inproceedings{advlime:aies20,
author = {Dylan Slack and Sophie Hilgard and Emily Jia and Sameer Singh and Himabindu Lakkaraju},
title = { {Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods} },
booktitle = {AAAI/ACM Conference on AI, Ethics, and Society (AIES)},
doi = {10.1145/3375627.3375830},
pages = {180-186},
year = {2020}
}
• .Head Network Distillation: Splitting Distilled Deep Neural Networks for Resource-Constrained Edge Computing Systems. IEEE Access. 2020 Journal
[ Journal, BibTex ]
@article{headnet:ieee20,
author = {Yoshitomo Matsubara and Davide Callegaro and Sabur Baidya and Marco Levorato and Sameer Singh},
title = { {Head Network Distillation: Splitting Distilled Deep Neural Networks for Resource-Constrained Edge Computing Systems} },
journal = {IEEE Access},
volume = {126},
number = {4},
doi = {10.1109/ACCESS.2020.3039714},
year = {2020}
}
• .On the Utility of Active Instance Selection for Few-Shot Learning. NeurIPS Workshop on Human And Model in the Loop Evaluation and Training Strategies (HAMLETS). 2020 Workshop
[ PDF, OpenReview, BibTex ]
@inproceedings{activefew:hamlets20,
author = {Pouya Pezeshkpour and Zhengli Zhao and Sameer Singh},
title = { {On the Utility of Active Instance Selection for Few-Shot Learning} },
booktitle = {NeurIPS Workshop on Human And Model in the Loop Evaluation and Training Strategies (HAMLETS)},
year = {2020}
}
• .COVIDLies: Detecting COVID-19 Misinformation on Social Media. EMNLP NLP Covid19 Workshop. 2020 Workshop
Best Paper Award
[ PDF, ACL Anthology, Website (w/ demo), Abstract, BibTex ]
The ongoing pandemic has heightened the need for developing tools to flag COVID-19-related misinformation on the internet, specifically on social media such as Twitter. However, due to novel language and the rapid change of information, existing misinformation detection datasets are not effective for evaluating systems designed to detect misinformation on this topic. Misinformation detection can be divided into two sub-tasks: (i) retrieval of misconceptions relevant to posts being checked for veracity, and (ii) stance detection to identify whether the posts Agree, Disagree, or express No Stance towards the retrieved misconceptions. To facilitate research on this task, we release COVIDLies (https://ucinlp.github.io/covid19), a dataset of 6761 expert-annotated tweets to evaluate the performance of misinformation detection systems on 86 different pieces of COVID-19 related misinformation. We evaluate existing NLP systems on this dataset, providing initial benchmarks and identifying key challenges for future models to improve upon.
@inproceedings{covidlies:nlpcovid20,
author = {Tamanna Hossain and Robert L. Logan IV and Arjuna Ugarte and Yoshitomo Matsubara and Sean Young and Sameer Singh},
title = { {COVIDLies: Detecting COVID-19 Misinformation on Social Media} },
booktitle = {EMNLP NLP Covid19 Workshop},
doi = {10.18653/v1/2020.nlpcovid19-2.11},
year = {2020}
}
• .Tweeki: Linking Named Entities on Twitter to a Knowledge Graph. EMNLP Workshop on Noisy, User-generated Text (W-NUT). 2020 Workshop
[ PDF, ACL Anthology, Abstract, BibTex ]
To identify what entities are being talked about in tweets, we need to automatically link named entities that appear in tweets to structured KBs like WikiData. Existing approaches often struggle with such short, noisy texts, or their complex design and reliance on supervision make them brittle, difficult to use and maintain, and prone to losing significance over time. Further, there is a lack of a large, linked corpus of tweets to aid researchers, along with a lack of a gold dataset to evaluate the accuracy of entity linking. In this paper, we introduce (1) Tweeki, an unsupervised, modular entity linking system for Twitter, (2) TweekiData, a large, automatically-annotated corpus of Tweets linked to entities in WikiData, and (3) TweekiGold, a gold dataset for entity linking evaluation. Through comprehensive analysis, we show that Tweeki's performance is comparable to that of recent state-of-the-art entity linking models and that the dataset is of high quality, and we present a use case showing how the dataset can be used to improve a downstream task in social media analysis (geolocation prediction).
@inproceedings{tweeki:wnut20,
author = {Bahareh Harandizadeh and Sameer Singh},
title = { {Tweeki: Linking Named Entities on Twitter to a Knowledge Graph} },
booktitle = {EMNLP Workshop on Noisy, User-generated Text (W-NUT)},
doi = {10.18653/v1/2020.wnut-1.29},
year = {2020}
}
• .Citations Beyond Self Citations: Identifying Authors, Affiliations, and Nationalities in Scientific Papers. Workshop on Mining Scientific Publications (WOSP). 2020 Workshop
[ PDF, Code, ACL Anthology, Abstract, BibTex ]
The question of the utility of the blind peer-review system is fundamental to scientific research. Some studies investigate exactly how “blind” the papers are in the double-blind review system by manually or automatically identifying the true authors, mainly suggesting the number of self-citations in the submitted manuscripts as the primary signal for identity. However, related work on automated approaches is limited by the sizes of its datasets and restricted experimental setups, and thus lacks practical insights into the blind review process. In this work, we train models that identify the authors, their affiliations, and their nationalities through real-world, large-scale experiments on the Microsoft Academic Graph, including the cold start scenario. Our models are accurate; we identify at least one of the authors, affiliations, and nationalities of held-out papers with 40.3%, 47.9% and 86.0% accuracy respectively, from the top-10 guesses of our models. However, through insights from the model, we demonstrate that these entities are identifiable with a small number of guesses primarily by using a combination of self-citations, social, and common citations. Moreover, further analysis of the results leads to interesting findings, such as that prominent affiliations are easily identifiable (e.g. 93.8% of test papers written by Microsoft are identified with top-10 guesses). The experimental results show, against conventional belief, that self-citations are no more informative than common citations, thus suggesting that removing self-citations is not sufficient for authors to maintain their anonymity.
@inproceedings{deblind:wosp20,
author = {Yoshitomo Matsubara and Sameer Singh},
title = { {Citations Beyond Self Citations: Identifying Authors, Affiliations, and Nationalities in Scientific Papers} },
booktitle = {Workshop on Mining Scientific Publications (WOSP)},
year = {2020}
}
• .Data Importance-Based Active Learning for Limited Labels. CVPR Workshop on Visual Learning with Limited Labels (VL3). 2020 Workshop
[ Video, BibTex ]
@inproceedings{ibal:vl320,
author = {Pouya Pezeshkpour and Zhengli Zhao and Sameer Singh},
title = { {Data Importance-Based Active Learning for Limited Labels} },
booktitle = {CVPR Workshop on Visual Learning with Limited Labels (VL3)},
year = {2020}
}
2019
• .Universal Adversarial Triggers for Attacking and Analyzing NLP. Empirical Methods in Natural Language Processing (EMNLP). 2019 Conference
[ PDF, arXiv, Blog post, Code, ACL Anthology, Abstract, BibTex ]
Adversarial examples highlight model vulnerabilities and are useful for evaluation and interpretation. We define universal adversarial triggers: input-agnostic sequences of tokens that trigger a model to produce a specific prediction when concatenated to any input from a dataset. We propose a gradient-guided search over tokens which finds short trigger sequences (e.g., one word for classification and four words for language modeling) that successfully trigger the target prediction. For example, triggers cause SNLI entailment accuracy to drop from 89.94% to 0.55%, 72% of “why” questions in SQuAD to be answered “to kill american people”, and the GPT-2 language model to spew racist output even when conditioned on non-racial contexts. Furthermore, although the triggers are optimized using white-box access to a specific model, they transfer to other models for all tasks we consider. Finally, since triggers are input-agnostic, they provide an analysis of global model behavior. For instance, they confirm that SNLI models exploit dataset biases and help to diagnose heuristics learned by reading comprehension models.
@inproceedings{trigger:emnlp19,
author = {Eric Wallace and Shi Feng and Nikhil Kandpal and Matt Gardner and Sameer Singh},
title = { {Universal Adversarial Triggers for Attacking and Analyzing NLP} },
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
doi = {10.18653/v1/D19-1221},
pages = {2153-2162},
year = {2019}
}
• .Do NLP Models Know Numbers? Probing Numeracy in Embeddings. Empirical Methods in Natural Language Processing (EMNLP). 2019 Conference
[ PDF, arXiv, ACL Anthology, BibTex ]
@inproceedings{numeracy:emnlp19,
author = {Eric Wallace and Yizhong Wang and Sujian Li and Sameer Singh and Matt Gardner},
title = { {Do NLP Models Know Numbers? Probing Numeracy in Embeddings} },
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
doi = {10.18653/v1/D19-1534},
pages = {5307-5315},
year = {2019}
}
• .Knowledge Enhanced Contextual Word Representations. Empirical Methods in Natural Language Processing (EMNLP). 2019 Conference
[ PDF, arXiv, ACL Anthology, BibTex ]
@inproceedings{knobert:emnlp19,
author = {Matthew E. Peters and Mark Neumann and Robert L. Logan IV and Roy Schwartz and Vidur Joshi and Sameer Singh and Noah A. Smith},
title = { {Knowledge Enhanced Contextual Word Representations} },
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
doi = {10.18653/v1/D19-1005},
pages = {43-54},
year = {2019}
}
• .Barack's Wife Hillary: Using Knowledge Graphs for Fact-Aware Language Modeling. Association for Computational Linguistics (ACL). 2019 Conference
[ PDF, arXiv, Data, Code, ACL Anthology, Abstract, BibTex ]
Modeling human language requires the ability to not only generate fluent text but also encode factual knowledge. However, traditional language models are only capable of remembering facts seen at training time, and often have difficulty recalling them. To address this, we introduce the knowledge graph language model (KGLM), a neural language model with mechanisms for selecting and copying facts from a knowledge graph that are relevant to the context. These mechanisms enable the model to render information it has never seen before, as well as generate out-of-vocabulary tokens. We also introduce the Linked WikiText-2 dataset, a corpus of annotated text aligned to the Wikidata knowledge graph whose contents (roughly) match the popular WikiText-2 benchmark. In experiments, we demonstrate that the KGLM achieves significantly better performance than a strong baseline language model. We additionally compare different language models’ ability to complete sentences requiring factual knowledge, showing that the KGLM outperforms even very large language models in generating facts.
@inproceedings{kglm:acl19,
author = {Robert L. Logan IV and Nelson F. Liu and Matthew E. Peters and Matt Gardner and Sameer Singh},
title = { {Barack's Wife Hillary: Using Knowledge Graphs for Fact-Aware Language Modeling} },
booktitle = {Association for Computational Linguistics (ACL)},
doi = {10.18653/v1/P19-1598},
pages = {5962-5971},
year = {2019}
}
• .Are Red Roses Red? Evaluating Consistency of Question-Answering Models. Association for Computational Linguistics (ACL). 2019 Conference
[ PDF, ACL Anthology, BibTex ]
@inproceedings{impl:acl19,
author = {Marco Tulio Ribeiro and Carlos Guestrin and Sameer Singh},
title = { {Are Red Roses Red? Evaluating Consistency of Question-Answering Models} },
booktitle = {Association for Computational Linguistics (ACL)},
doi = {10.18653/v1/P19-1621},
pages = {6174-6184},
year = {2019}
}
• .Compositional Questions Do Not Necessitate Multi-hop Reasoning. Association for Computational Linguistics (ACL). 2019 Conference
[ PDF, arXiv, ACL Anthology, BibTex ]
@inproceedings{mhop:acl19,
author = {Sewon Min and Eric Wallace and Sameer Singh and Matt Gardner and Hannaneh Hajishirzi and Luke Zettlemoyer},
title = { {Compositional Questions Do Not Necessitate Multi-hop Reasoning} },
booktitle = {Association for Computational Linguistics (ACL)},
doi = {10.18653/v1/P19-1416},
pages = {4249-4257},
year = {2019}
}
• .DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs. Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). 2019 Conference
[ PDF, Website, arXiv, Data, ACL Anthology, Leaderboard, Demo, Abstract, BibTex ]
Reading comprehension has recently seen rapid progress, with systems matching humans on the most popular datasets for the task. However, a large body of work has highlighted the brittleness of these systems, showing that there is much work left to be done. We introduce a new reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs. In this crowdsourced, adversarially-created, 55k-question benchmark, a system must resolve references in a question, perhaps to multiple input positions, and perform discrete operations over them (such as addition, counting, or sorting). These operations require a much more comprehensive understanding of the content of paragraphs, as they remove the paraphrase-and-entity-typing shortcuts available in prior datasets. We apply state-of-the-art methods from both the reading comprehension and semantic parsing literatures on this dataset and show that the best systems only achieve 38.4% F1 on our generalized accuracy metric, while expert human performance is 96%. We additionally present a new model that combines reading comprehension methods with simple numerical reasoning to achieve 51% F1.
@inproceedings{drop:naacl19,
author = {Dheeru Dua and Yizhong Wang and Pradeep Dasigi and Gabriel Stanovsky and Sameer Singh and Matt Gardner},
title = { {DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs} },
booktitle = {Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
doi = {10.18653/v1/N19-1246},
pages = {2368-2378},
year = {2019}
}
• .Investigating Robustness and Interpretability of Link Prediction via Adversarial Modifications. Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). 2019 Conference
[ PDF, Website, arXiv, Code, Video, ACL Anthology, Abstract, BibTex ]
Representing entities and relations in an embedding space is a well-studied approach for machine learning on relational data. Existing approaches, however, primarily focus on improving accuracy and overlook other aspects such as robustness and interpretability. In this paper, we propose adversarial modifications for link prediction models: identifying the fact to add into or remove from the knowledge graph that changes the prediction for a target fact after the model is retrained. Using these single modifications of the graph, we identify the most influential fact for a predicted link and evaluate the sensitivity of the model to the addition of fake facts. We introduce an efficient approach to estimate the effect of such modifications by approximating the change in the embeddings when the knowledge graph changes. To avoid the combinatorial search over all possible facts, we train a network to decode embeddings to their corresponding graph components, allowing the use of gradient-based optimization to identify the adversarial modification. We use these techniques to evaluate the robustness of link prediction models (by measuring sensitivity to additional facts), study interpretability through the facts most responsible for predictions (by identifying the most influential neighbors), and detect incorrect facts in the knowledge base.
@inproceedings{criage:naacl19,
author = {Pouya Pezeshkpour and Yifan Tian and Sameer Singh},
title = { {Investigating Robustness and Interpretability of Link Prediction via Adversarial Modifications} },
booktitle = {Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
doi = {10.18653/v1/N19-1337},
pages = {3336-3347},
year = {2019}
}
• .GenderQuant: Quantifying Mention-Level Genderedness. Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). 2019 Conference
[ PDF, Website, Code, ACL Anthology, Abstract, BibTex ]
Language is gendered if the context surrounding a mention is suggestive of a particular binary gender for that mention. Detecting the different ways in which language is gendered is an important task since gendered language can bias NLP models (such as for coreference resolution). This task is challenging since genderedness is often expressed in subtle ways. Existing approaches need considerable annotation efforts for each language, domain, and author, and often require handcrafted lexicons and features. Additionally, these approaches do not provide a quantifiable measure of how gendered the text is, nor are they applicable at the fine-grained mention level.
In this paper, we use existing NLP pipelines to automatically annotate the gender of mentions in the text. On corpora labeled using this method, we train a supervised classifier to predict the gender of any mention from its context and evaluate it on unseen text. The model confidence for a mention's gender can be used as a proxy to indicate the level of genderedness of the context. We test this gendered language detector on movie summaries, movie reviews, news articles, and fiction novels, achieving an AUC-ROC of up to 0.71, and observe that the model predictions agree with human judgments collected for this task. We also provide examples of detected gendered sentences from the aforementioned domains.
@inproceedings{gender:naacl19,
author = {Ananya Ananya and Nitya Parthasarthi and Sameer Singh},
title = { {GenderQuant: Quantifying Mention-Level Genderedness} },
booktitle = {Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
doi = {10.18653/v1/N19-1303},
pages = {2959-2969},
year = {2019}
}
• .PoMo: Generating Entity-Specific Post-Modifiers in Context. Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). 2019 Conference
[ PDF, Website, arXiv, Data, ACL Anthology, Abstract, BibTex ]
We introduce entity post-modifier generation as an instance of a collaborative writing task. Given a sentence about a target entity, the task is to automatically generate a post-modifier phrase that provides contextually relevant information about the entity. For example, for the sentence, "Barack Obama, _______, supported the #MeToo movement.", the phrase "a father of two girls" is a contextually relevant post-modifier. To this end, we build PoMo, a post-modifier dataset created automatically from news articles reflecting a journalistic need for incorporating entity information that is relevant to a particular news event. PoMo consists of more than 231K sentences with post-modifiers and associated facts extracted from Wikidata for around 57K unique entities. We use crowdsourcing to show that modeling contextual relevance is necessary for accurate post-modifier generation.
We adapt a number of existing generation approaches as baselines for this dataset. Our results show there is large room for improvement in terms of both identifying relevant facts to include (knowing which claims are relevant gives a >20% improvement in BLEU score), and generating appropriate post-modifier text for the context (providing relevant claims is not sufficient for accurate generation). We conduct an error analysis that suggests promising directions for future research.
@inproceedings{pomo:naacl19,
author = {Jun Seok Kang and Robert L. Logan IV and Zewei Chu and Yang Chen and Dheeru Dua and Kevin Gimpel and Sameer Singh and Niranjan Balasubramanian},
title = { {PoMo: Generating Entity-Specific Post-Modifiers in Context} },
booktitle = {Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
doi = {10.18653/v1/N19-1089},
pages = {826-838},
year = {2019}
}
• AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models. Demo at the Empirical Methods in Natural Language Processing (EMNLP). 2019 Demo
Best Demonstration Paper Award.
[ PDF, Project Page, ACL Anthology, ArXiv, Poster, Abstract, BibTex ]
Neural NLP models are increasingly accurate but are imperfect and opaque: they break in counterintuitive ways and leave end users puzzled at their behavior. Model interpretation methods ameliorate this opacity by providing explanations for specific model predictions. Unfortunately, existing interpretation codebases make it difficult to apply these methods to new models and tasks, which hinders adoption for practitioners and burdens interpretability researchers. We introduce AllenNLP Interpret, a flexible framework for interpreting NLP models. The toolkit provides interpretation primitives (e.g., input gradients) for any AllenNLP model and task, a suite of built-in interpretation methods, and a library of front-end visualization components. We demonstrate the toolkit's flexibility and utility by implementing live demos for five interpretation methods (e.g., saliency maps and adversarial attacks) on a variety of models and tasks (e.g., masked language modeling using BERT and reading comprehension using BiDAF). These demos, alongside our code and tutorials, are available at https://allennlp.org/interpret.
@inproceedings{interpret:emnlp19,
author = {Eric Wallace and Jens Tuyls and Junlin Wang and Sanjay Subramanian and Matt Gardner and Sameer Singh},
title = { {AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models} },
booktitle = {Demo at the Empirical Methods in Natural Language Processing (EMNLP)},
doi = {10.18653/v1/D19-3002},
pages = {7-12},
year = {2019}
}
• Detecting Conversation Topics in Primary Care Office Visits from Transcripts of Patient-Provider Interactions. Journal of the American Medical Informatics Association. 2019 Journal
[ PDF, Website, BibTex ]
@article{convtopics:jamia19,
author = {Jihyun Park and Dimitrios Kotzias and Patty Kuo and Robert L. Logan IV and Kritzia Merced and Sameer Singh and Michael Tanana and Efi Karra-Taniskidou and Jennifer Elston Lafata and David C. Atkins and Ming Tai-Seale and Zac E Imel and Padhraic Smyth},
title = { {Detecting Conversation Topics in Primary Care Office Visits from Transcripts of Patient-Provider Interactions} },
journal = {Journal of the American Medical Informatics Association},
volume = {26},
number = {12},
doi = {10.1093/jamia/ocz140},
pages = {1493-1504},
year = {2019}
}
• Comment on Semantic Based Adversarial Examples Fool Face Recognition. Synced Review. 2019 Online
[ Article, BibTex ]
@misc{review:synced19,
author = {Sameer Singh},
title = { {Comment on Semantic Based Adversarial Examples Fool Face Recognition} },
editor = {Synced Review},
month = {August},
url = {https://syncedreview.com/2019/08/09/semantic-based-adversarial-examples-fool-face-recognition/},
year = {2019}
}
• Distilled Split Deep Neural Networks for Edge-Assisted Real-Time Systems. Mobicom Workshop on Hot Topics in Video Analytics and Intelligent Edges. 2019 Workshop
[ PDF, BibTex ]
@inproceedings{distill:hottopics19,
author = {Yoshitomo Matsubara and Sabur Baidya and Davide Callegaro and Marco Levorato and Sameer Singh},
title = { {Distilled Split Deep Neural Networks for Edge-Assisted Real-Time Systems} },
booktitle = {Mobicom Workshop on Hot Topics in Video Analytics and Intelligent Edges},
year = {2019}
}
• Evaluating Question Answering Evaluation. Workshop on Machine Reading and Question Answering (MRQA). 2019 Workshop
Best Paper Award.
[ PDF, BibTex ]
@inproceedings{evalqa:mrqa19,
author = {Anthony Chen and Gabriel Stanovsky and Sameer Singh and Matt Gardner},
title = { {Evaluating Question Answering Evaluation} },
booktitle = {Workshop on Machine Reading and Question Answering (MRQA)},
year = {2019}
}
• ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension. Workshop on Machine Reading and Question Answering (MRQA). 2019 Workshop
[ PDF, BibTex ]
@inproceedings{orb:mrqa19,
author = {Dheeru Dua and Ananth Gottumukkala and Alon Talmor and Sameer Singh and Matt Gardner},
title = { {ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension} },
booktitle = {Workshop on Machine Reading and Question Answering (MRQA)},
year = {2019}
}
• Analyzing Compositionality of Visual Question Answering. NeurIPS Workshop on Visually Grounded Interaction and Language (ViGIL). 2019 Workshop
[ PDF, BibTex ]
@inproceedings{compvqa:vigil19,
author = {Sanjay Subramanian and Sameer Singh and Matt Gardner},
title = { {Analyzing Compositionality of Visual Question Answering} },
booktitle = {NeurIPS Workshop on Visually Grounded Interaction and Language (ViGIL)},
year = {2019}
}
• Improving Differentially Private Models with Active Learning. NeurIPS Workshop on Privacy in Machine Learning (PriML). 2019 Workshop
[ PDF, arXiv, BibTex ]
@inproceedings{dpal:priml19,
author = {Zhengli Zhao and Nicolas Papernot and Sameer Singh and Neoklis Polyzotis and Augustus Odena},
title = { {Improving Differentially Private Models with Active Learning} },
booktitle = {NeurIPS Workshop on Privacy in Machine Learning (PriML)},
year = {2019}
}
2018
• From Reinforcement Learning to Deep Reinforcement Learning: An Overview. Braverman Readings in Machine Learning: Key Ideas from Inception to Current State, Springer Press. 2018 Chapter
[ PDF (Springer), Springer, Amazon, Google Books, BibTex ]
@incollection{deeprl:chap18,
author = {Forest Agostinelli and Guillaume Hocquet and Sameer Singh and Pierre Baldi},
title = { {From Reinforcement Learning to Deep Reinforcement Learning: An Overview} },
booktitle = {Braverman Readings in Machine Learning: Key Ideas from Inception to Current State},
publisher = {Springer Press},
pages = {298-328},
year = {2018}
}
• Embedding Multimodal Relational Data for Knowledge Base Completion. Empirical Methods in Natural Language Processing (EMNLP). 2018 Conference
[ PDF, Code/Data, arXiv, ACL Anthology, Video, Abstract, BibTex ]
Representing entities and relations in an embedding space is a well-studied approach for machine learning on relational data. Existing approaches, however, primarily focus on simple link structure between a finite set of entities, ignoring the variety of data types that are often used in knowledge bases, such as text, images, and numerical values. In this paper, we propose multimodal knowledge base embeddings (MKBE) that use different neural encoders for this variety of observed data, and combine them with existing relational models to learn embeddings of the entities and multimodal data. Further, using these learned embeddings and different neural decoders, we introduce a novel multimodal imputation model to generate missing multimodal values, like text and images, from information in the knowledge base. We enrich existing relational datasets to create two novel benchmarks that contain additional information such as textual descriptions and images of the original entities. We demonstrate that our models utilize this additional information effectively to provide more accurate link prediction, achieving state-of-the-art results with a considerable gap of 5-7% over existing methods. Further, we evaluate the quality of our generated multimodal values via a user study.
@inproceedings{mmkb:emnlp18,
author = {Pouya Pezeshkpour and Liyan Chen and Sameer Singh},
title = { {Embedding Multimodal Relational Data for Knowledge Base Completion} },
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
doi = {10.18653/v1/D18-1359},
pages = {3208-3218},
year = {2018}
}
• Interpretation of Natural Language Rules in Conversational Machine Reading. Empirical Methods in Natural Language Processing (EMNLP). 2018 Conference
[ PDF, arXiv, ACL Anthology, Abstract, BibTex ]
Most work in machine reading focuses on question answering problems where the answer is directly expressed in the text to read. However, many real-world question answering problems require the reading of text not because it contains the literal answer, but because it contains a recipe to derive an answer together with the reader's background knowledge. One example is the task of interpreting regulations to answer "Can I...?" or "Do I have to...?" questions such as "I am working in Canada. Do I have to carry on paying UK National Insurance?" after reading a UK government website about this topic. This task requires both the interpretation of rules and the application of background knowledge. It is further complicated due to the fact that, in practice, most questions are underspecified, and a human assistant will regularly have to ask clarification questions such as "How long have you been working abroad?" when the answer cannot be directly derived from the question and text. In this paper, we formalise this task and develop a crowd-sourcing strategy to collect 32k task instances based on real-world rules and crowd-generated questions and scenarios. We analyse the challenges of this task and assess its difficulty by evaluating the performance of rule-based and machine-learning baselines. We observe promising results when no background knowledge is necessary, and substantial room for improvement whenever background knowledge is needed.
@inproceedings{quarc:emnlp18,
author = {Marzieh Saeidi and Max Bartolo and Patrick Lewis and Sameer Singh and Tim Rockt{\"a}schel and Mike Sheldon and Guillaume Bouchard and Sebastian Riedel},
title = { {Interpretation of Natural Language Rules in Conversational Machine Reading} },
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
doi = {10.18653/v1/D18-1233},
pages = {2087-2097},
year = {2018}
}
• Semantically Equivalent Adversarial Rules for Debugging NLP models. Association for Computational Linguistics (ACL). 2018 Conference
Honorable Mention for Best Paper.
[ PDF, Appendix, Code, ACL Anthology, Video, Slides, Abstract, BibTex ]
Complex machine learning models for NLP are often brittle, making different predictions for input instances that are extremely similar semantically. To automatically detect this behavior for individual instances, we present semantically equivalent adversaries (SEAs): semantic-preserving perturbations that induce changes in the model’s predictions. We generalize these adversaries into semantically equivalent adversarial rules (SEARs): simple, universal replacement rules that induce adversaries on many instances. We demonstrate the usefulness and flexibility of SEAs and SEARs by detecting bugs in black-box state-of-the-art models for three domains: machine comprehension, visual question-answering, and sentiment analysis. Via user studies, we demonstrate that we generate high-quality local adversaries for more instances than humans, and that SEARs induce four times as many mistakes as the bugs discovered by human experts. SEARs are also actionable: retraining models using data augmentation significantly reduces bugs, while maintaining accuracy.
@inproceedings{sears:acl18,
author = {Marco Tulio Ribeiro and Sameer Singh and Carlos Guestrin},
title = { {Semantically Equivalent Adversarial Rules for Debugging NLP models} },
booktitle = {Association for Computational Linguistics (ACL)},
doi = {10.18653/v1/P18-1079},
pages = {856-865},
year = {2018}
}
• Generating Natural Adversarial Examples. International Conference on Learning Representations (ICLR). 2018 Conference
[ PDF, Source Code, arXiv, OpenReview, Abstract, BibTex ]
Due to their complex nature, it is hard to characterize the ways in which machine learning models can misbehave or be exploited when deployed. Recent work on adversarial examples, i.e. inputs with minor perturbations that result in substantially different model predictions, is helpful in evaluating the robustness of these models by exposing the adversarial scenarios where they fail. However, these malicious perturbations are often unnatural, not semantically meaningful, and not applicable to complicated domains such as language. In this paper, we propose a framework to generate natural and legible adversarial examples that lie on the data manifold, by searching in semantic space of dense and continuous data representation, utilizing the recent advances in generative adversarial networks. We present generated adversaries to demonstrate the potential of the proposed approach for black-box classifiers for a wide range of applications such as image classification, textual entailment, and machine translation. We include experiments to show that the generated adversaries are natural, legible to humans, and useful in evaluating and analyzing black-box classifiers.
@inproceedings{natadv:iclr18,
author = {Zhengli Zhao and Dheeru Dua and Sameer Singh},
title = { {Generating Natural Adversarial Examples} },
booktitle = {International Conference on Learning Representations (ICLR)},
year = {2018}
}
• Combining Symbolic Expressions and Black-box Function Evaluations for Training Neural Programs. International Conference on Learning Representations (ICLR). 2018 Conference
[ PDF, Source Code, arXiv, OpenReview, Abstract, BibTex ]
Neural programming involves training neural networks to learn programs, mathematics, or logic from data. Previous works have failed to achieve good generalization performance, especially on problems and programs with high complexity or on large domains. This is because they mostly rely either on black-box function evaluations that do not capture the structure of the program, or on detailed execution traces that are expensive to obtain, and hence the training data has poor coverage of the domain under consideration. We present a novel framework that utilizes black-box function evaluations, in conjunction with symbolic expressions that define relationships between the given functions. We employ tree LSTMs to incorporate the structure of the symbolic expression trees. We use tree encoding for numbers present in function evaluation data, based on their decimal representation. We present an evaluation benchmark for this task to demonstrate our proposed model combines symbolic reasoning and function evaluation in a fruitful manner, obtaining high accuracies in our experiments. Our framework generalizes significantly better to expressions of higher depth and is able to fill partial equations with valid completions.
@inproceedings{funeval:iclr18,
author = {Forough Arabshahi and Sameer Singh and Animashree Anandkumar},
title = { {Combining Symbolic Expressions and Black-box Function Evaluations for Training Neural Programs} },
booktitle = {International Conference on Learning Representations (ICLR)},
year = {2018}
}
• Anchors: High-Precision Model-Agnostic Explanations. AAAI Conference on Artificial Intelligence (AAAI). 2018 Conference
[ PDF, Code (package), Code (results), AAAI Page, Abstract, BibTex ]
We introduce a novel model-agnostic system that explains the behavior of complex models with high-precision rules called anchors, representing local, “sufficient” conditions for predictions. We propose an algorithm to efficiently compute these explanations for any black-box model with high-probability guarantees. We demonstrate the flexibility of anchors by explaining a myriad of different models for different domains and tasks. In a user study, we show that anchors enable users to predict how a model would behave on unseen instances with less effort and higher precision, as compared to existing linear explanations or no explanations.
@inproceedings{anchors:aaai18,
author = {Marco Tulio Ribeiro and Sameer Singh and Carlos Guestrin},
title = { {Anchors: High-Precision Model-Agnostic Explanations} },
booktitle = {AAAI Conference on Artificial Intelligence (AAAI)},
pages = {1527-1535},
year = {2018}
}
• A Framework of Rapid Regional Tsunami Damage Recognition from Post-event TerraSAR-X Imagery Using Deep Neural Networks. IEEE Geoscience and Remote Sensing Letters. 2018 Journal
[ PDF, IEEE, Abstract, BibTex ]
Near real-time building damage mapping is an indispensable prerequisite for governments to make decisions for disaster relief. With high-resolution synthetic aperture radar (SAR) systems, such as TerraSAR-X, the provision of such products in a fast and effective way becomes possible. In this letter, a deep learning-based framework for rapid regional tsunami damage recognition using post-event SAR imagery is proposed. To perform such rapid damage mapping, a series of tile-based image split analyses is employed to generate the data set. Next, a selection algorithm with the SqueezeNet network is developed to swiftly distinguish between built-up (BU) and nonbuilt-up regions. Finally, a recognition algorithm with a modified wide residual network is developed to classify the BU regions into washed-away, collapsed, and slightly damaged regions. Experiments performed on the TerraSAR-X data from the 2011 Tohoku earthquake and tsunami in Japan show a BU region extraction accuracy of 80.4% and a damage-level recognition accuracy of 74.8%. Our framework takes around 2 h to train on a new region, and only several minutes for prediction.
@article{tsunami:geosense18,
author = {Yanbing Bai and Chang Gao and Sameer Singh and Magaly Koch and Bruno Adriano and Erick Mas and Shunichi Koshimura},
title = { {A Framework of Rapid Regional Tsunami Damage Recognition from Post-event TerraSAR-X Imagery Using Deep Neural Networks} },
journal = {IEEE Geoscience and Remote Sensing Letters},
volume = {15},
number = {1},
doi = {10.1109/LGRS.2017.2772349},
pages = {43-47},
year = {2018}
}
• Towards Solving Differential Equations through Neural Programming. ICML Workshop on Neural Abstract Machines and Program Induction (NAMPI). 2018 Workshop
[ PDF, Poster, BibTex ]
@inproceedings{diffeqeval:nampi18,
author = {Forough Arabshahi and Sameer Singh and Animashree Anandkumar},
title = { {Towards Solving Differential Equations through Neural Programming} },
booktitle = {ICML Workshop on Neural Abstract Machines and Program Induction (NAMPI)},
year = {2018}
}
2017
• Entity Linking via Joint Encoding of Types, Descriptions, and Context. Empirical Methods in Natural Language Processing (EMNLP). 2017 Conference
[ PDF, Code, ACL Anthology, Website, Abstract, BibTex ]
For accurate entity linking, we need to capture various information aspects of an entity, such as its description in a KB, contexts in which it is mentioned, and structured knowledge. Additionally, a linking system should work on texts from different domains without requiring domain-specific training data or hand-engineered features.
In this work we present a neural, modular entity linking system that learns a unified dense representation for each entity using multiple sources of information, such as its description, contexts around its mentions, and its fine-grained types. We show that the resulting entity linking system is effective at combining these sources, and performs competitively, sometimes out-performing current state-of-the-art systems across datasets, without requiring any domain-specific training data or hand-engineered features. We also show that our model can effectively "embed" entities that are new to the KB, and is able to link its mentions accurately.
@inproceedings{neuralel:emnlp17,
author = {Nitish Gupta and Sameer Singh and Dan Roth},
title = { {Entity Linking via Joint Encoding of Types, Descriptions, and Context} },
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
month = {September},
doi = {10.18653/v1/D17-1284},
pages = {2681-2690},
year = {2017}
}
• Intelligent Data Filtering in Constrained IoT Systems. Asilomar Conference on Signals, Systems, and Computers. 2017 Invited
[ PDF, IEEE Xplore, Abstract, BibTex ]
The expansion of complex autonomous sensing and control mechanisms in the Internet-of-Things systems clashes with constraints on computation and wireless communication resources. In this paper, we propose a framework to address this conflict for applications in which resolution using a centralized architecture with a general-purpose compression of observations is not appropriate. Three approaches for distributing observation detection workload between sensing and processing devices are considered for sensor systems within wireless islands. Each of the approaches is formulated for the shared configuration of a sensor-edge system, in which the network structure, observation monitoring problem, and machine learning-based detector implementing it are not modified. For every approach, a high-level strategy for realization of the detector for different assumptions on the relation between its complexity and the system's constraints is considered. In each case, the potential for the constraints' satisfaction is shown to exist and be exploitable via division, approximation, and delegation of the detector's workload to the sensing devices off the edge processor. We present examples of applications that benefit from the proposed approaches.
@inproceedings{semcompress:asilomar17,
author = {Igor Burago and Davide Callegaro and Marco Levorato and Sameer Singh},
title = { {Intelligent Data Filtering in Constrained IoT Systems} },
booktitle = {Asilomar Conference on Signals, Systems, and Computers},
year = {2017}
}
• Semantic Compression for Edge-Assisted Systems. Information Theory and Applications (ITA) Workshop. 2017 Invited
[ PDF, ArXiv version, BibTex ]
@inproceedings{semcompress:ita17,
author = {Igor Burago and Marco Levorato and Sameer Singh},
title = { {Semantic Compression for Edge-Assisted Systems} },
booktitle = {Information Theory and Applications (ITA) Workshop},
month = {February},
year = {2017}
}
• Generating Natural Adversarial Examples. NeurIPS Workshop on Machine Deception. 2017 Workshop
Amazon Best Poster Award at the Southern California Machine Learning Symposium.
Shorter version of the paper at ICLR 2018.
[ PDF, ArXiv (full paper), Abstract, BibTex ]
Due to their complex nature, it is hard to characterize the ways in which machine learning models can misbehave or be exploited when deployed. Recent work on adversarial examples, i.e. inputs with minor perturbations that result in substantially different model predictions, is helpful in evaluating the robustness of these models by exposing the adversarial scenarios where they fail. However, these malicious perturbations are often unnatural, not semantically meaningful, and not applicable to complicated domains such as language. In this paper, we propose a framework to generate natural and legible adversarial examples by searching in semantic space of dense and continuous data representation, utilizing the recent advances in generative adversarial networks. We present generated adversaries to demonstrate the potential of the proposed approach for black-box classifiers in a wide range of applications such as image classification, textual entailment, and machine translation. We include experiments to show that the generated adversaries are natural, legible to humans, and useful in evaluating and analyzing black-box classifiers.
@inproceedings{natadv:mldecept17,
author = {Zhengli Zhao and Dheeru Dua and Sameer Singh},
title = { {Generating Natural Adversarial Examples} },
booktitle = {NeurIPS Workshop on Machine Deception},
year = {2017}
}
• How Biased Are We? Automated Detection of Gendered Language. ACL Workshop on Women and Underrepresented Minorities in NLP (WiNLP). 2017 Workshop
Also presented at the NeurIPS 2017 Workshop for Women in Machine Learning (WiML).
[ PDF, BibTex ]
@inproceedings{gender:winlp17,
author = {Ananya Ananya and Sameer Singh},
title = { {How Biased Are We? Automated Detection of Gendered Language} },
booktitle = {ACL Workshop on Women and Underrepresented Minorities in NLP (WiNLP)},
month = {August},
year = {2017}
}
• Compact Factorization of Matrices Using Generalized Round-Rank. Southern California Machine Learning Symposium. 2017 Workshop
[ PDF, BibTex ]
@inproceedings{grank:southcal17,
author = {Pouya Pezeshkpour and Carlos Guestrin and Sameer Singh},
title = { {Compact Factorization of Matrices Using Generalized Round-Rank} },
booktitle = {Southern California Machine Learning Symposium},
year = {2017}
}
• Embedding Multimodal Relational Data. Workshop on Automated Knowledge Base Construction (AKBC). 2017 Workshop
[ PDF, BibTex ]
@inproceedings{mmkbe:akbc17,
author = {Pouya Pezeshkpour and Liyan Chen and Sameer Singh},
title = { {Embedding Multimodal Relational Data} },
booktitle = {Workshop on Automated Knowledge Base Construction (AKBC)},
year = {2017}
}
• Multimodal Attribute Extraction. Workshop on Automated Knowledge Base Construction (AKBC). 2017 Workshop
[ PDF, BibTex ]
@inproceedings{maed:akbc17,
author = {Robert L. Logan IV and Samuel Humeau and Sameer Singh},
title = { {Multimodal Attribute Extraction} },
booktitle = {Workshop on Automated Knowledge Base Construction (AKBC)},
year = {2017}
}
• Relational Learning and Feature Extraction by Querying over Heterogeneous Information Networks. International Workshop on Statistical Relational AI (StarAI). 2017 Workshop
[ PDF, ArXiv version, Abstract, BibTex ]
Many real-world systems need to operate on heterogeneous information networks that consist of numerous interacting components of different types. Examples include systems that perform data analysis on biological information networks; social networks; and information extraction systems processing unstructured data to convert raw text to knowledge graphs. Many previous works describe specialized approaches to perform specific types of analysis, mining and learning on such networks. In this work, we propose a unified framework consisting of a data model (a graph with a first-order schema) along with a declarative language for constructing, querying and manipulating such networks in ways that facilitate relational and structured machine learning. In particular, we provide an initial prototype for a relational and graph traversal query language where queries are directly used as relational features for structured machine learning models. Feature extraction is performed by making declarative graph traversal queries. Learning and inference models can directly operate on this relational representation and augment it with new data and knowledge that, in turn, is integrated seamlessly into the relational structure to support new predictions. We demonstrate this system's capabilities by showcasing tasks in natural language processing and computational biology domains.
@inproceedings{saul:starai17,
author = {Parisa Kordjamshidi and Sameer Singh and Daniel Khashabi and Christos Christodoulopoulos and Mark Sammons and Saurabh Sinha and Dan Roth},
title = { {Relational Learning and Feature Extraction by Querying over Heterogeneous Information Networks} },
booktitle = {International Workshop on Statistical Relational AI (StarAI)},
month = {July},
year = {2017}
}
2016
• Better call Saul: Flexible Programming for Learning and Inference in NLP. International Conference on Computational Linguistics (COLING). 2016 Conference
[ PDF, ACL Anthology, BibTex ]
@inproceedings{saul:coling16,
author = {Parisa Kordjamshidi and Daniel Khashabi and Christos Christodoulopoulos and Bhargav Mangipudi and Sameer Singh and Dan Roth},
title = { {Better call Saul: Flexible Programming for Learning and Inference in NLP} },
booktitle = {International Conference on Computational Linguistics (COLING)},
month = {December},
pages = {3030-3040},
year = {2016}
}
• Connotation Frames: A Data-Driven Investigation. Association for Computational Linguistics (ACL). 2016 Conference
[ PDF, arXiv, Website, ACL Anthology, BibTex ]
@inproceedings{connot:acl16,
author = {Hannah Rashkin and Sameer Singh and Yejin Choi},
title = { {Connotation Frames: A Data-Driven Investigation} },
booktitle = {Association for Computational Linguistics (ACL)},
month = {August},
doi = {10.18653/v1/P16-1030},
pages = {311-321},
year = {2016}
}
• "Why Should I Trust You?": Explaining the Predictions of Any Classifier. Knowledge Discovery and Data Mining (KDD). 2016 Conference
Audience Appreciation Award
Also presented at the CHI 2016 Workshop on Human-Centred Machine Learning (HCML).
[ PDF, arXiv, Code, Video, O'Reilly, Code (experiments), ACM Page, BibTex ]
@inproceedings{lime:kdd16,
author = {Marco Tulio Ribeiro and Sameer Singh and Carlos Guestrin},
title = { {"Why Should I Trust You?": Explaining the Predictions of Any Classifier} },
booktitle = {Knowledge Discovery and Data Mining (KDD)},
month = {August},
doi = {10.1145/2939672.2939778},
pages = {1135-1144},
year = {2016}
}
• "Why Should I Trust You?": Explaining the Predictions of Any Classifier. Demo at the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). 2016 Demo
Demonstration of the KDD 2016 paper.
[ PDF, Code, BibTex ]
@inproceedings{lime:naacl16,
author = {Marco Tulio Ribeiro and Sameer Singh and Carlos Guestrin},
title = { {"Why Should I Trust You?": Explaining the Predictions of Any Classifier} },
booktitle = {Demo at the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
month = {June},
year = {2016}
}
• Introduction to Local Interpretable Model-Agnostic Explanations (LIME). O'Reilly Media. 2016 Online
[ Article, BibTex ]
@misc{lime:oreilly16,
author = {Marco Tulio Ribeiro and Sameer Singh and Carlos Guestrin},
title = { {Introduction to Local Interpretable Model-Agnostic Explanations (LIME)} },
editor = {O'Reilly Media},
month = {August},
url = {https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-explanations-lime},
year = {2016}
}
• Programs as Black-Box Explanations. NeurIPS Workshop on Interpretable Machine Learning in Complex Systems. 2016 Workshop
[ PDF, arXiv, Abstract, BibTex ]
Recent work in model-agnostic explanations of black-box machine learning has demonstrated that interpretability of complex models does not have to come at the cost of accuracy or model flexibility. However, it is not clear what kind of explanations, such as linear models, decision trees, and rule lists, are the appropriate family to consider, and different tasks and models may benefit from different kinds of explanations. Instead of picking a single family of representations, in this work we propose to use "programs" as model-agnostic explanations. We show that small programs can be expressive yet intuitive as explanations, and generalize over a number of existing interpretable families. We propose a prototype program induction method based on simulated annealing that approximates the local behavior of black-box classifiers around a specific prediction using random perturbations. Finally, we present preliminary application on small datasets and show that the generated explanations are intuitive and accurate for a number of classifiers.
@inproceedings{prog:nipsws16,
author = {Sameer Singh and Marco Tulio Ribeiro and Carlos Guestrin},
title = { {Programs as Black-Box Explanations} },
booktitle = {NeurIPS Workshop on Interpretable Machine Learning in Complex Systems},
month = {November},
year = {2016}
}
• Nothing Else Matters: Model-Agnostic Explanations By Identifying Prediction Invariance. NeurIPS Workshop on Interpretable Machine Learning in Complex Systems. 2016 Workshop
[ PDF, arXiv, Abstract, BibTex ]
At the core of interpretable machine learning is the question of whether humans are able to make accurate predictions about a model's behavior. Assumed in this question are three properties of the interpretable output: coverage, precision, and effort. Coverage refers to how often humans think they can predict the model's behavior, precision to how accurate humans are in those predictions, and effort is either the up-front effort required in interpreting the model, or the effort required to make predictions about a model's behavior.
In this work, we propose anchor-LIME (aLIME), a model-agnostic technique that produces high-precision rule-based explanations for which the coverage boundaries are very clear. We compare aLIME to linear LIME with simulated experiments, and demonstrate the flexibility of aLIME with qualitative examples from a variety of domains and tasks.
@inproceedings{anchor:nipsws16,
author = {Marco Tulio Ribeiro and Sameer Singh and Carlos Guestrin},
title = { {Nothing Else Matters: Model-Agnostic Explanations By Identifying Prediction Invariance} },
booktitle = {NeurIPS Workshop on Interpretable Machine Learning in Complex Systems},
month = {November},
year = {2016}
}
• "Why Should I Trust You?": Explaining the Predictions of Any Classifier. CHI Workshop on Human-Centred Machine Learning (HCML). 2016 Workshop
Shorter version of the paper presented at KDD 2016.
[ PDF, BibTex ]
@inproceedings{lime:hcml16,
author = {Marco Tulio Ribeiro and Sameer Singh and Carlos Guestrin},
title = { {"Why Should I Trust You?": Explaining the Predictions of Any Classifier} },
booktitle = {CHI Workshop on Human-Centred Machine Learning (HCML)},
month = {May},
year = {2016}
}
• Model-Agnostic Interpretability of Machine Learning. ICML Workshop on Human Interpretability in Machine Learning (WHI). 2016 Workshop
Best Paper Award
[ PDF, BibTex ]
@inproceedings{lime:whi16,
author = {Marco Tulio Ribeiro and Sameer Singh and Carlos Guestrin},
title = { {Model-Agnostic Interpretability of Machine Learning} },
booktitle = {ICML Workshop on Human Interpretability in Machine Learning (WHI)},
month = {June},
year = {2016}
}
• Creating Interactive and Visual Educational Resources for AI. AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI). 2016 Workshop
[ PDF, AAAI Page, Abstract, BibTex ]
Teaching artificial intelligence is effective if the experience is a visual and interactive one, with educational materials that combine various content types such as text, math, and code into an integrated experience. Unfortunately, easy-to-use tools for creating such pedagogical resources are not available to educators, so most courses are taught using a disconnected set of static materials, which is not only ineffective for learning AI but also requires repeated and redundant effort from the instructor. In this paper, we introduce Moro, a software tool for easily creating and presenting AI-friendly teaching materials. Moro notebooks integrate content of different types (text, math, code, images), allow real-time interactions via modifiable and executable code blocks, and are viewable in browsers both as long-form pages and as presentations. Creating notebooks is easy and intuitive; the creation tool is also in-browser, is WYSIWYG for quick iterations of editing, and supports a variety of shortcuts and customizations for efficiency. We present three deployed case studies of Moro that differ widely from each other, demonstrating its utility in a variety of scenarios such as in-class teaching and conference tutorials.
@inproceedings{moro:eaai16,
author = {Sameer Singh and Sebastian Riedel},
title = { {Creating Interactive and Visual Educational Resources for {AI}} },
booktitle = {AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI)},
year = {2016}
}