Department of Computer Science
USC Viterbi School of Engineering
I'm an Assistant Professor in the Department of Computer Science at USC, affiliated with the USC Machine Learning Center and USC ISI. Previously, I was a visiting researcher at Stanford University, collaborating with Dan Jurafsky and Jure Leskovec, and received my PhD in CS@UIUC, where I worked with Jiawei Han. I'm interested in computational methods and systems that extract machine-actionable knowledge from massive unstructured data (e.g., text data). I'm particularly excited about problems in the space of modeling sequence and graph data under weak supervision (learning with partial/noisy labels, semi-supervised learning) and indirect supervision (multi-task learning, transfer learning, reinforcement learning). My dissertation research received a Google PhD Fellowship, a Yahoo!-DAIS Research Excellence Award and a David J. Kuck Outstanding Thesis Award.
Research
- I'm co-organizing the 1st Workshop on Knowledge Base Construction, Reasoning and Mining (KBCOM'18), co-located with WSDM'18 on Feb 9, 2018. Our invited speakers include Luna Dong, Oren Etzioni, Lise Getoor, Alon Halevy, Monica Lam, Chris Ré, Xifeng Yan and Luke Zettlemoyer. See you in LA!
Blog posts: Information Extraction with Indirect Supervision and Heterogeneous Supervision, Dynamic Network Embedding.
- Learning with weak supervision: In many information extraction tasks, direct supervision in the form of manually annotated text sequences is expensive to obtain, but different kinds of weak supervision (e.g., KB facts, hand-crafted rules, crowd-sourced labels, user feedback) are much easier to collect at a large scale. Our WWW 2017 tutorial summarizes recent advances in denoising distant supervision, multi-task extraction, and leveraging QA data as indirect supervision.
- To self-learn from a few examples of given relations (and a large corpus), REPEL jointly optimizes an embedding-based discriminator and a pattern-based generator.
- Both human annotators and external knowledge bases can provide weak supervision for information extraction tasks. Such heterogeneous forms of weak supervision trade off label quality against the amount of labeled data one can obtain. How can we leverage these heterogeneous sources of supervision in a principled way?
- Indirect supervision may result in noisily and partially labeled data. This is especially challenging when dealing with a complex label space (e.g., a label hierarchy). We propose hierarchical partial-label embedding to overcome these issues.
News
Dec 2017 - Giving a full-day tutorial at WWW 2018 on "Construction and Querying of Large-scale Knowledge Bases".
Nov 2017 - Two papers were accepted to AAAI 2018. Congrats Lucas and Lekui!
Oct 2017 - Our new paper on improving relation extraction with question-answer pairs has been accepted to WSDM'18. Congrats Ellen!
Oct 2017 - Excited to be back at CS@Illinois and to have received a David Kuck Master Thesis Award.
Aug 2017 - Talking about entity and relation typing in the KDD'17 tutorial on Mining Entity-Relation-Attribute Structures from Massive Text Data.
Aug 2017 - Talking about text-rich recommendation models in the KDD'17 tutorial on Context-Rich Recommendation: Integrating Links, Text, and Spatio-Temporal Dimensions.
Aug 2017 - One paper on multi-view network embedding was accepted to CIKM'17.
Jul 2017 - New work on Heterogeneous Supervision has been accepted to EMNLP'17!
May 2017 - Two research papers were accepted to KDD 2017!