Aditya Parameswaran

I am an assistant professor of Computer Science at the University of Illinois (UIUC). My research interests are broadly in simplifying and improving data analytics, i.e., helping users make better use of their data.

My work involves building real data analytics systems with principled foundations, designing algorithms (with formal guarantees) for the systems, as well as mining data obtained from such systems.

Biographical Sketch

Aditya Parameswaran is an Assistant Professor in Computer Science at the University of Illinois (UIUC). He spent the 2013-14 year visiting MIT CSAIL and Microsoft Research New England, after completing his Ph.D. at Stanford University, advised by Prof. Hector Garcia-Molina. He is broadly interested in data analytics, with research results in human computation, visual analytics, information extraction and integration, and recommender systems.

Aditya is a recipient of the Arthur Samuel award for the best dissertation in CS at Stanford (2014), the SIGMOD Jim Gray dissertation award (2014), the SIGKDD dissertation award runner-up (2014), a Google Faculty Research Award (2015), an "Excellent Instructor" award from Illinois (2016), the Key Scientific Challenges Award from Yahoo! Research (2010), three best-of-conference citations (VLDB 2010, KDD 2012 and ICDE 2014), the Terry Groswith graduate fellowship at Stanford (2007), and the Gold Medal in Computer Science at IIT Bombay (2007). His research group is supported with funding from the Siebel Energy Institute, the NIH, the NSF, and Google.


  • June 15, 2016: After two years of extensive collaborations with folks at the two institutes, I am now an "official" affiliate of the Institute for Genomic Biology, and the Beckman Institute for Advanced Science and Technology.
  • June 1, 2016: Our paper on Squish, a tool for compression of relational datasets, was accepted at KDD 2016! Our code is open-source and available on GitHub.
  • May 1, 2016: New release of our visual data exploration platform, zenvisage. Paper here, and website dedicated to zenvisage here. Contact us if you'd like to test run zenvisage on your datasets!
  • April 15, 2016: We just received a small seed grant from the Siebel Energy Institute to develop Zenvisage in collaboration with battery scientists at Carnegie Mellon! Excited to see what happens next.
  • April 10, 2016: We received a whopping 3X the number of submissions for the undergraduate research contest. Who knows what these young researchers will accomplish next?
  • April 1, 2016: Our paper on Decibel, the storage engine underlying DataHub, was accepted at SIGMOD 2016!
  • March 1, 2016: Thrilled to be among the "List of Teachers Ranked as Excellent by their Students" at Illinois! Happy to see that students enjoy my classes.
  • January 6, 2016: Adam and I are proud to finally release a book on crowdsourced data management, a labor of love under development for two years. The book not only covers the state of the art, but also contains a survey of both industry users of crowdsourcing and managers of crowdsourcing marketplaces. We hope that this book will be the definitive reference for how crowdsourcing is used in practice. Do send us comments!
  • January 1, 2016: Our vision paper on the unsolved challenges in large-scale data crowdsourcing was accepted at TKDE.
  • December 15, 2015: Our paper on interactive exploration using a more expressive drill-down operator was accepted at ICDE 2016 in Finland.
  • November 25, 2015: Some Illinois press on our NSF-funded DataHub grant. Thrilled and honored to be working with the amazing Sam Madden and Amol Deshpande at solving the problems underlying collaborative data analytics.
  • November 15, 2015: Our paper on optimally managing worker and answer quality in crowdsourcing was accepted at SIGMOD 2016.
  • October 1, 2015: We just heard word that NIH has funded our BD2K commons supplement. Looking forward to working with folks at UChicago to improve data publication workflows!
  • September 15, 2015: Student awesomeness: my student Silu Huang won the 3M foundation fellowship, while Tarique Siddiqui won the Siebel Foundation fellowship.
  • September 1, 2015: Thanks to the NSF, we now have funding to support research and development on DataHub via a Medium IIS grant with MIT and UMD! Link to the project page here.
  • August 1, 2015: The full SeeDB paper has been accepted at VLDB 2016 in India!
  • July 1, 2015: Our JellyBean paper on using humans to count objects in images will appear at HCOMP 2015!
  • June 9, 2015: Release of a new preprint on calibrating the output of confidence estimates from classification algorithms, using classical learning theory tools. This is work driven by my awesome student Yihan Gao.
  • June 6, 2015: Our DataHub query language proposal was accepted at TaPP, a focused provenance workshop.
  • June 1, 2015: Final tally for VLDB 2015 -- three papers and three demos on a variety of topics:
    • papers: crowds, visualizations, and versioning;
    • demos: data exploration, Excel-meets-databases, and collaborative data analytics.
  • May 27, 2015: Our paper on versioning principles was accepted at VLDB'15 without any revisions!
  • May 15, 2015: Undergraduate research news: Andrew Kuznetsov, a freshman working in our group, won the ISUR undergraduate research prize, and Andrew, with two other freshmen (Andrew Thieck and Radhir Kothuri), won the third prize in the Illinois Engineering Open House competition for their crowdsourcing tool.
  • May 12, 2015: Our paper on debiasing was accepted at KDD 2015!
  • April 9, 2015: Our first release of a new project, titled Data-Spread, with my esteemed colleague Kevin Chang and student Mangesh Bendre. Data-Spread is a tool that unifies databases and spreadsheets. You have to see it to believe it!
    • Here is a YouTube video showing Data-Spread in action.
    • Here is our demo paper on Data-Spread.
  • March 10, 2015: Four more new preprints in the last month! These were:
    • our paper on SeeDB for query driven automatic visualization generation;
    • our JellyBean paper on counting objects in images; it turns out we can do way better than humans or computer vision algorithms!
    • our paper on debiasing of batches; crowdsourcing practitioners often use batching to save costs, but this can lead to non-independence: we deal with this issue.
    • our versioning theory paper; to build a solid foundation for our DataHub project, we explored how to trade off storage and retrieval costs.
  • February 9, 2015: Our paper on exploiting correlations to avoid expensive predicate evaluations was accepted at SIGMOD 2015!
  • February 12, 2015: Many thanks to Google for their support via a Google Faculty Research Award! Excited to be building the next generation visualization toolkit.
  • December 10, 2014: Three new preprints in the last month! These were:
    • smart drill-down, our tool for zooming into portions of a dataset quickly;
    • our paper on globally optimal crowdsourcing quality management; and
    • our paper on gathering data using the crowd, exploiting a hierarchy and multi-armed bandits (MABs).
  • November 10, 2014: Three new paper acceptances in the last month!
  • October 10, 2014: Thrilled to be a part of the new NIH BD2K (Big Data 2 Knowledge) center for revolutionizing genomic data analysis. Thank you, NIH, for the support!
Synergistic Activities

I am currently serving on or have served on the Program Committees of: VLDB 2013-14-15, KDD 2015, SIGMOD 2014-15, WSDM 2015, WWW 2014, SOCC 2014, HCOMP 2014, ICDE 2014, and EDBT 2014.

I am an Associate Editor for SIGMOD Record. Please consider sending us your most controversial and/or interesting papers!

I am the SIGMOD 2016 Undergraduate Research Chair. Contact me for more details on how your students can participate. We're trying to get undergrads hooked on databases early!

Visual Analytics

Automatically recommending visualizations or visual summaries on very large volumes of data

Interactive Analytics

Interactive querying of large datasets, keeping track of versions, while possibly sacrificing slightly on accuracy of query results

Crowd-Powered Analytics

Using crowdsourcing to process and make sense of large volumes of data

Recent Releases

Selected Projects



zenvisage is a tool for effortlessly visualizing insights from very large data sets. It automates finding the right visualization for a query, significantly simplifying the laborious task of identifying appropriate visualizations.



DataSpread is a tool that marries the best of databases and spreadsheets.


DataHub: Collaborative Dataset Version Management at Scale

DataHub (or "GitHub for Data") is a system that enables collaborative data science by keeping track of large numbers of versions and their dependencies compactly, and allowing users to progressively clean, integrate and visualize their datasets.


DataSift: A Crowd-Powered Search Engine

DataSift is a crowd-powered search engine that is useful for long or complex queries that traditional search engines have trouble with, or with queries that contain rich media, such as images or videos.


Crowd Algorithms

Our work has developed a number of algorithms for gathering, processing, and understanding data obtained from humans (or crowds), while minimizing cost, latency, and error.


NeedleTail: A System for Browsing

NeedleTail is a system tuned towards near-instantly returning a small number (a "screenful") of query results on extremely large datasets.