
Yang Xiao

  • Ph.D. student @ PolyU-CS
  • yangxiaocq12@gmail.com
  • PolyU Rm VA316
  • Natural Language Processing

About Me

Hi there! I am currently a Ph.D. student at The Hong Kong Polytechnic University (PolyU), advised by Prof. Wenjie Li. My research experience has mainly focused on generative agents, dialogue generation, and evaluation for NLP tasks, but I am open to other areas as well. My long-term research goal is to develop intelligent agents for human good.

I received my Bachelor's degree from Fudan University in 2022, majoring in software engineering and computer science. During my undergraduate studies, I worked closely with Dr. Pengfei Liu, Dr. Jinlan Fu, and Prof. Graham Neubig, which sparked my interest in natural language processing.


Awards


Projects

  • DataLab: A Platform for Data Analysis and Intervention: Homepage Code
  • ExplainaBoard: An Explainable Leaderboard for NLP: Homepage Code

Selected Publications

A complete publication list is available on Google Scholar.

* denotes the corresponding author.

2022

  • Are All the Datasets in Benchmark Necessary? A Pilot Study of Dataset Evaluation for Text Classification

    Yang Xiao, Jinlan Fu*, See-Kiong Ng, Pengfei Liu
    NAACL Full Text Code DataLab Abstract BibTeX
    In this paper, we ask the research question of whether all the datasets in the benchmark are necessary. We approach this by first characterizing the distinguishability of datasets when comparing different systems. Experiments on 9 datasets and 36 systems show that several existing benchmark datasets contribute little to discriminating top-scoring systems, while those less used datasets exhibit impressive discriminative power. We further, taking the text classification task as a case study, investigate the possibility of predicting dataset discrimination based on its properties (e.g., average sentence length). Our preliminary experiments promisingly show that given a sufficient number of training experimental records, a meaningful predictor can be learned to estimate dataset discrimination over unseen datasets. We released all datasets with features explored in this work on DataLab. (A toy sketch of this dataset-discrimination idea appears after the 2022 entries below.)
    @inproceedings{xiao2022eval,
      title = {Are All the Datasets in Benchmark Necessary? A Pilot Study of Dataset Evaluation for Text Classification},
      author = {Yang Xiao and Jinlan Fu and See-Kiong Ng and Pengfei Liu},
      booktitle = {NAACL},
      year = {2022}
    }
    
  • DataLab: A Platform for Data Analysis and Intervention

    Yang Xiao, Jinlan Fu, Weizhe Yuan, Vijay Viswanathan, Zhoumianze Liu, Yixin Liu, Graham Neubig, Pengfei Liu
    ACL-2022, Outstanding Demo Full Text Code DataLab Abstract BibTeX
    Despite data’s crucial role in machine learning, most existing tools and research tend to focus on systems on top of existing data rather than how to interpret and manipulate data. In this paper, we propose DataLab, a unified data-oriented platform that not only allows users to interactively analyze the characteristics of data but also provides a standardized interface so that many data processing operations can be provided within a unified interface. Additionally, in view of the ongoing surge in the proliferation of datasets, DataLab has features for dataset recommendation and global vision analysis that help researchers form a better view of the data ecosystem. So far, DataLab covers 1,300 datasets and 3,583 of its transformed version, where 313 datasets support different types of analysis (e.g., with respect to gender bias) with the help of 119M samples annotated by 318 feature functions. DataLab is under active development and will be supported going forward. We have released a web platform, web API, Python SDK, and PyPI published package, which, hopefully, can meet the diverse needs of researchers.
    @inproceedings{xiao2022datalab,
      title = {DataLab: A Platform for Data Analysis and Intervention},
      author = {Yang Xiao and Jinlan Fu and Weizhe Yuan and Vijay Viswanathan and Zhoumianze Liu and Yixin Liu and Graham Neubig and Pengfei Liu},
      booktitle = {ACL},
      year = {2022}
    }
    
  • On the Robustness of Reading Comprehension Models to Entity Renaming

    Jun Yan, Yang Xiao, Sagnik Mukherjee, Bill Yuchen Lin, Robin Jia, Xiang Ren
    NAACL Full Text Code Abstract BibTeX
    We study the robustness of machine reading comprehension (MRC) models to entity renaming—do models make more wrong predictions when the same questions are asked about an entity whose name has been changed? Such failures imply that models overly rely on entity information to answer questions, and thus may generalize poorly when facts about the world change or questions are asked about novel entities. To systematically audit this issue, we present a pipeline to automatically generate test examples at scale, by replacing entity names in the original test sample with names from a variety of sources, ranging from names in the same test set, to common names in life, to arbitrary strings. Across five datasets and three pretrained model architectures, MRC models consistently perform worse when entities are renamed, with particularly large accuracy drops on datasets constructed via distant supervision. We also find large differences between models: SpanBERT, which is pretrained with span-level masking, is more robust than RoBERTa, despite having similar accuracy on unperturbed test data. We further experiment with different masking strategies as the continual pretraining objective and find that entity-based masking can improve the robustness of MRC models.
    @article{yan2021robustness,
      title={On the Robustness of Reading Comprehension Models to Entity Renaming},
      author={Yan, Jun and Xiao, Yang and Mukherjee, Sagnik and Lin, Bill Yuchen and Jia, Robin and Ren, Xiang},
      journal={arXiv preprint arXiv:2110.08555},
      year={2021}
    }
    

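The dataset-evaluation and DataLab entries above both deal with analyzing datasets through simple properties and through the scores that systems obtain on them. The sketch below is only a rough, self-contained illustration of that idea, not the papers' actual metric or the DataLab SDK: it treats the score gap among the top-scoring systems as a proxy for how well a dataset discriminates them, and computes average sentence length as one example of a dataset property. All numbers and names are made up.

    from statistics import mean

    # Toy system accuracies on two hypothetical datasets (made-up numbers,
    # not results from the paper).
    SCORES = {
        "dataset_a": {"sys1": 0.950, "sys2": 0.949, "sys3": 0.948, "sys4": 0.910},
        "dataset_b": {"sys1": 0.810, "sys2": 0.760, "sys3": 0.720, "sys4": 0.650},
    }

    def top_k_gap(system_scores, k=3):
        """Rough proxy for discriminative power: the spread among the top-k systems.
        A tiny gap suggests the dataset barely separates the best systems."""
        top = sorted(system_scores.values(), reverse=True)[:k]
        return max(top) - min(top)

    def avg_sentence_length(samples):
        """A simple dataset property (whitespace-tokenized length) of the kind
        that can be related to dataset discrimination."""
        return mean(len(text.split()) for text in samples)

    if __name__ == "__main__":
        for name, system_scores in SCORES.items():
            print(f"{name}: top-3 score gap = {top_k_gap(system_scores):.3f}")
        toy_samples = ["a short sentence", "a somewhat longer example sentence here"]
        print(f"average sentence length = {avg_sentence_length(toy_samples):.1f} tokens")

Under this toy proxy, dataset_a looks far less discriminative than dataset_b, which mirrors the paper's observation that some widely used benchmark datasets barely separate top-scoring systems.
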
2021

  • EXPLAINABOARD: An Explainable Leaderboard for NLP

    Pengfei Liu, Jinlan Fu, Yang Xiao, Weizhe Yuan, Shuaicheng Chang, Junqi Dai, Yixin Liu, Zihuiwen Ye, Zi-Yi Dou, Graham Neubig
    ACL, Best Demo Full Text Code ExplainaBoard Abstract BibTeX
    With the rapid development of NLP research, leaderboards have emerged as one tool to track the performance of various systems on various NLP tasks. They are effective in this goal to some extent, but generally present a rather simplistic one-dimensional view of the submitted systems, communicated only through holistic accuracy numbers. In this paper, we present a new conceptualization and implementation of NLP evaluation: the ExplainaBoard, which in addition to inheriting the functionality of the standard leaderboard, also allows researchers to (i) diagnose strengths and weaknesses of a single system (e.g., what is the best-performing system bad at?), (ii) interpret relationships between multiple systems (e.g., where does system A outperform system B? What if we combine systems A, B, and C?), and (iii) examine prediction results closely (e.g., what are common errors made by multiple systems, or in what contexts do particular errors occur?). So far, ExplainaBoard covers more than 400 systems, 50 datasets, 40 languages, and 12 tasks. ExplainaBoard keeps updated and was recently upgraded to support (1) multilingual multi-task benchmarks, (2) meta-evaluation, and (3) a more complicated task, machine translation, which reviewers also suggested. We not only released an online platform at http://explainaboard.nlpedia.ai/ but also make our evaluation tool available as an API with an MIT License on GitHub (https://github.com/neulab/explainaBoard) and PyPI (https://pypi.org/project/interpret-eval/) so that users can conveniently assess their models offline. We additionally release all output files from systems that we have run or collected to motivate "output-driven" research in the future.
    @inproceedings{liu2021explain,
      title = {EXPLAINABOARD: An Explainable Leaderboard for NLP},
      author = {Pengfei Liu and Jinlan Fu and Yang Xiao and Weizhe Yuan and Shuaicheng Chang and Junqi Dai and Yixin Liu and Zihuiwen Ye and Zi-Yi Dou and Graham Neubig},
      booktitle = {ACL},
      year = {2021}
    }
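
ExplainaBoard's core idea is fine-grained diagnosis: instead of one holistic score, a system's performance is broken down over buckets of examples grouped by attributes such as sentence length. The sketch below re-creates that bucketing idea in plain Python on made-up predictions; it only illustrates the concept, and the data, bucket edges, and function names are invented here rather than taken from the ExplainaBoard / interpret-eval API.

    from collections import defaultdict

    # Made-up (input text, gold label, predicted label) triples -- illustrative only.
    EXAMPLES = [
        ("short text", "pos", "pos"),
        ("a much longer input sentence that the model struggles with", "neg", "pos"),
        ("medium length input here", "neg", "neg"),
        ("another fairly long input sentence for the toy example", "pos", "neg"),
    ]

    def length_bucket(text, edges=(4, 8)):
        """Assign an example to a sentence-length bucket (whitespace tokens)."""
        n = len(text.split())
        if n <= edges[0]:
            return f"len<={edges[0]}"
        if n <= edges[1]:
            return f"{edges[0]}<len<={edges[1]}"
        return f"len>{edges[1]}"

    def bucketed_accuracy(examples):
        """Accuracy per bucket: the kind of breakdown that shows where a single
        system is strong or weak, rather than one overall number."""
        hits, totals = defaultdict(int), defaultdict(int)
        for text, gold, pred in examples:
            bucket = length_bucket(text)
            totals[bucket] += 1
            hits[bucket] += int(gold == pred)
        return {bucket: hits[bucket] / totals[bucket] for bucket in totals}

    if __name__ == "__main__":
        for bucket, acc in sorted(bucketed_accuracy(EXAMPLES).items()):
            print(f"{bucket}: accuracy = {acc:.2f}")

On these toy examples the short-input bucket scores 1.00 while the long-input bucket scores 0.00, which is exactly the kind of per-attribute breakdown a leaderboard entry can expose alongside its overall accuracy.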