Dong Wang

Ph.D.
Computer Science Department
University of Illinois at Urbana-Champaign (UIUC)
201 North Goodwin Avenue
Urbana, IL 61801, USA
Email: dwang24 at illinois dot edu

I will join the CSE Department of the University of Notre Dame as an Assistant Professor in Fall 2014. Here is my new homepage at Notre Dame. I am looking for highly motivated Ph.D. students and visiting scholars. If you are interested in working with me, please feel free to email me your CV.

Biography

I received my Ph.D. degree in Computer Science from the University of Illinois at Urbana-Champaign (UIUC) in December 2012, advised by Professor Tarek F. Abdelzaher. I received my Master's degree in Electrical Engineering from Peking University (PKU) and my Bachelor's degree in Electrical Engineering from the University of Electronic Science and Technology of China (UESTC).


Research Interests


Research Theme

The advent of online social media (e.g., Twitter and Flickr), the ubiquity of wireless communication capabilities (e.g., 4G and WiFi), and the proliferation of a wide variety of sensors in the possession of common individuals (e.g., smartphones) allow humans to create a deluge of unfiltered, unstructured, and unvetted data about their physical environment. This opens up unprecedented challenges and opportunities in the field of social sensing, where the goal is to distill, from social sources (e.g., humans) and the devices in their possession, accurate and credible information that describes the state of the physical world. The problem requires multidisciplinary solutions that combine data mining, statistics, network science, and cyber-physical computing. My research addresses these needs by building theories, techniques, and tools for accurately extracting high-quality information from data generated with humans in the loop, and for reconstructing the correct "state of the world," both physical and social. I believe my research can lead to the next generation of information distillation services, where predictable, reliable, and timely answers are found in the huge amount of real-time and heterogeneous data feeds, empowering humans to better understand and use such data and to make sound decisions from it.


Selected Research Projects

My primary research focus is reliable information distillation in the emerging area of Social Sensing, where data are collected from human sources or from devices on their behalf. Social sensing systems are one example of information distillation systems in the current era of Big Data. I carried out a set of projects to address several key challenges in social sensing, and I believe the theories, algorithms, frameworks, and systems developed in these projects are useful for building future information distillation systems in general.

Truth Discovery in Social Sensing

This project solves a fundamental problem of information distillation in social sensing, where data are collected from human sources or devices in their possession: how to ascertain the credibility of information and estimate the reliability of sources when the sources are usually unvetted and potentially unreliable. We call this problem truth discovery. Current research in data mining and machine learning (e.g., fact-finding) solves similar problems, but with important limitations in analysis semantics and with suboptimal solutions. In contrast, our research presented, for the first time, an optimal truth discovery framework and system that provides accurate and quantifiable conclusions on both information credibility and source reliability without prior knowledge of either. Our work provides a new generic foundation for distilling reliable and quantifiable information from unreliable sources (e.g., humans). The results have been published in Fusion 11, ACM/IEEE IPSN 12, and ACM ToSN.
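
To make the problem concrete, here is a minimal sketch of a basic iterative fact-finder of the kind mentioned above. It is not the optimal framework from the papers; the function name `simple_fact_finder`, the update rules, and the toy reports are illustrative assumptions. It alternates between scoring claims by the reliability of the sources asserting them and scoring sources by the belief in their claims, with no prior knowledge of either.

```python
def simple_fact_finder(reports, n_iter=20):
    """Toy fact-finder (illustrative only, not the published method).

    reports: dict mapping a source id to the non-empty set of claim ids
             that the source asserted.
    Returns (belief, reliability), both dicts with values in [0, 1].
    """
    all_claims = sorted({c for claims in reports.values() for c in claims})
    belief = {c: 0.5 for c in all_claims}       # no prior knowledge of claims
    reliability = {s: 0.5 for s in reports}     # no prior knowledge of sources

    for _ in range(n_iter):
        # A source is as reliable as the average belief in what it asserted.
        reliability = {
            s: sum(belief[c] for c in claims) / len(claims)
            for s, claims in reports.items()
        }
        # A claim's score is the total reliability of the sources asserting it.
        score = {
            c: sum(reliability[s] for s, claims in reports.items() if c in claims)
            for c in all_claims
        }
        # Normalise so that the best-supported claim has belief 1.
        top = max(score.values())
        belief = {c: score[c] / top for c in all_claims}

    return belief, reliability


# Hypothetical reports: three sources agree on claim "A", one lone source says "C".
reports = {"s1": {"A", "B"}, "s2": {"A", "B"}, "s3": {"A"}, "s4": {"C"}}
belief, reliability = simple_fact_finder(reports)
ranking = sorted(belief, key=belief.get, reverse=True)
```

In this toy input, the claim backed by the most sources ranks first and the lone claim ranks last. Heuristics like this one give only a ranking, which is the kind of limitation in semantics that motivates a framework with quantifiable conclusions.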

Quality of Information (QoI) Assurance in Social Sensing

This project investigates another critical problem in social sensing: how to accurately assess the quality of the truth discovery results by quantifying estimation errors and providing confidence bounds. Such guaranteed quality analysis is immensely important in any practical setting where errors have consequences, yet it is largely missing from the current literature. We derived the first performance bound that accurately predicts the estimation errors of the truth discovery results. Our work allows real-world applications to assess the quality of data obtained from unreliable sources to a desired confidence level, in the absence of independent means to verify the data and of prior knowledge of the reliability of sources. Our work was mentioned explicitly in the National Academies Press as a "good example of Army's cross-genre research" in 2013. The research results have been published in DMSN 11, IEEE SECON 12, and IEEE JSAC.

Link Analysis across Multi-genre Networks

Social sensing data are generated through the complicated interactions of information, social, and physical networks. These interdisciplinary network systems are so complex that link analysis across multi-genre networks is essential. However, link analysis that takes all three networks into account together is rare in current research. In this project, we generalized the truth discovery framework to jointly analyze links across multi-genre networks and developed a new information distillation system, called Apollo. Apollo has been continuously tested through real-world case studies using large-scale datasets collected from open source media and smart road applications. The results showed good correspondence between observations deemed correct by Apollo and ground truth, demonstrating the power of link analysis across multi-genre networks for efficient information distillation. The results have been published in IEEE RTSS 13.

Real-Time Information Distillation from Streaming Data

Social sensing data usually arrive in such large volume and at such high speed (e.g., more than 200k tweets are uploaded to Twitter every minute) that they must be processed in real time in order to maximize their value. However, the truth discovery approaches in current research are mostly batch algorithms that generally either cannot scale with streaming data or do not exploit all the data available. In this project, we developed the first online truth estimation approach to determine the quality of information and the reliability of sources in real time for social sensing applications. The results demonstrated that our approach was able to analyze the data 10-100 times faster than the state of the art while keeping the estimation accuracy approximately the same. The results have been published in ICDCS 13.

Using Humans as Sensors: The Uncertain Data Provenance Challenge

The explosive growth in social network content suggests that the largest "sensor network" yet might be human. Extending the social sensing model, this project explores the prospect of utilizing social networks as sensor networks, which gives rise to an interesting reliable sensing problem. From a networked sensing standpoint, what makes this problem formulation different is that, in the case of human participants, not only is the reliability of sources usually unknown, but the original data provenance may also be uncertain: individuals may report observations made by others as their own. The contribution of this project lies in developing a model that considers the impact of such information sharing on the analytical foundations of reliable sensing, and in embedding it into our tool Apollo, which uses Twitter as a "sensor network" for observing events in the physical world. Evaluation using Twitter-based case studies shows good correspondence between observations deemed correct by Apollo and ground truth. The results have been published in ACM/IEEE IPSN 14.

Provenance-assisted Classification in Social Networks

Signal feature extraction and classification are two common tasks in the signal processing literature. This project investigates the use of source identities as a common mechanism for enhancing the classification accuracy of social signals. We define social signals as outputs, such as microblog entries, geotags, or uploaded images, contributed by users in a social network. While the design of such classifiers is application-specific, social signals share one key property: they are augmented by the explicit identity of the source. This motivates investigating whether knowing the source of each signal allows the classification accuracy to be improved. We call this provenance-assisted classification. This project answers the question affirmatively, demonstrating how source identities can improve classification accuracy, and derives confidence bounds to quantify the accuracy of results. Evaluation is performed in two real-world contexts: (i) fact-finding that classifies microblog entries as true or false, and (ii) language classification of tweets issued by a set of possibly multilingual speakers. The results show that provenance features significantly improve the classification accuracy of social signals. This observation offers a general mechanism for enhancing classification results in social networks. The results of this work will appear in IEEE J-STSP 14.

I also carried out projects to address energy, scheduling, and optimization challenges in cyber-physical computing, real-time and embedded systems, data fusion, and sensor networks.

Energy-Optimal Scheduling on mPlatform

This project derives energy-optimal batching periods for asynchronous multistage data processing on sensor nodes, that is, periods that minimize energy consumption while meeting end-to-end deadlines. Batching the processing of (sensor) data maximizes processor sleep periods, hence minimizing the wakeup frequency and the corresponding overhead. The algorithm is evaluated on mPlatform, a next-generation heterogeneous sensor node platform equipped with both a low-end microcontroller (MSP430) and a higher-end embedded systems processor (ARM). Experimental results show that mPlatform, when processing data flows at their optimal batching periods on the appropriate processor, uses as much as 80% less energy than running the same task set on the ARM alone and 25% less energy than running the task set on the MSP430 alone. The results have been published in the Journal of Real-Time Systems and IEEE RTAS (Best Paper Award).
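
The intuition behind batching can be sketched with a deliberately simplified single-stage model. This is not the multistage algorithm from the papers; the cost model, the function names, and all numbers below are hypothetical. Each wakeup costs a fixed energy, so longer batching periods spread that overhead over more data, and the deadline caps how long a period can be.

```python
def average_power(period_s, wake_energy_j, active_power_w, sleep_power_w,
                  work_s_per_item, rate_hz):
    """Average power (watts) of one processor that wakes once per period.

    Assumed toy model: items arrive at a steady rate_hz, are buffered while
    the processor sleeps, and are processed back to back after each wakeup.
    """
    busy_s = rate_hz * period_s * work_s_per_item   # processing time per batch
    energy_j = (wake_energy_j                       # fixed wakeup overhead
                + active_power_w * busy_s           # processing the batch
                + sleep_power_w * (period_s - busy_s))  # sleeping the rest
    return energy_j / period_s


def longest_feasible_period(deadline_s, work_s_per_item, rate_hz):
    """Longest batching period that still meets the deadline in the toy model.

    Worst case: an item arrives just after a batch starts, waits one full
    period, and is then processed last in the next batch, so its latency is
    period + rate * period * work. Setting that equal to the deadline gives:
    """
    return deadline_s / (1.0 + rate_hz * work_s_per_item)


# Hypothetical numbers: 10 items/s, 1 ms of work per item, 1.01 s deadline.
params = dict(wake_energy_j=0.01, active_power_w=0.1, sleep_power_w=0.001,
              work_s_per_item=0.001, rate_hz=10.0)
best_period = longest_feasible_period(1.01, 0.001, 10.0)
power_short = average_power(0.1, **params)          # waking every 100 ms
power_best = average_power(best_period, **params)   # waking at the deadline limit
```

Because the wakeup overhead is amortized over the period, average power in this model falls as the period grows, so the longest period that still meets the deadline is the cheapest one. The actual project additionally handles multiple stages and the choice between the two processors.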

Optimizing Quality-of-Information in Cost-sensitive Sensor Data Fusion

This project investigates maximizing quality of information subject to cost constraints in data fusion systems. Rather than optimizing generic network-level metrics such as latency or throughput, we achieve more resource-efficient sensor network operation by directly optimizing an application-level notion of quality, namely prediction error. Unlike prior cost-sensitive prediction/regression schemes, our solution considers the more complex prediction problems that arise in sensor networks, where phenomena behave differently under different conditions. The scheme is evaluated through real sensor network applications in localization and path planning. Experimental results show that our scheme achieves non-trivial cost savings compared to popular cost-insensitive schemes, and a significantly lower prediction error compared to cost-sensitive linear regression schemes. The results have been published in IEEE DCoSS 2011.


Book

Dong Wang, Tarek Abdelzaher, and Lance Kaplan. Social Sensing: Building Reliable Systems on Unreliable Data, Elsevier, (expected in) late 2014.

Book Chapter

Tarek Abdelzaher and Dong Wang. Analytic Challenges in Social Sensing. The Art of Wireless Sensor Networks, Springer, 2014.

Refereed Papers

Paper Code: [J]: Journal paper; [C]: Conference paper; [W]: Workshop paper.

Note: In Computer Science, conference publications are as competitive as journals.


Tool and Demo

My research also produced a reliable information distillation tool called Apollo, which is used to summarize the flow of important events that are fundamentally changing our world and lives (e.g., the Egyptian uprising, the Japanese nuclear disaster, Hurricane Sandy, and the Boston Marathon explosion). Apollo was demoed to very high-level Army personnel (e.g., Mr. Gary Martin, the Executive Deputy to the Commanding General, and Dr. Thomas Russell, the Director of the Army Research Lab), as well as to the United States Army Intelligence and Security Command (INSCOM). Apollo was selected as one of very few top showcases of the Network Science Collaborative Technology Alliance founded by the U.S. Army Research Laboratory (ARL) in 2011, 2012, and 2013. Apollo is now used by different branches at ARL.

Apollo: Information Distillation Tool for Social (Human-Centric) Sensing

Apollo Demo Video 2012: "Fighting Information Overload"

Apollo Demo Video 2011: "Fact Finding from Noisy Data"


Honors and Awards


Professional Service