Summary
Social media platforms produce massive amounts of data. Twitter, a microblogging platform, enables users to post short updates as “tweets” on an unprecedented scale. When analyzed in aggregate with machine learning (ML) techniques, Twitter data can be an invaluable resource for gaining insight. However, when existing ML approaches are applied to real-time data streams, covariate shifts in the data (i.e., changes in the distributions of the inputs to ML algorithms) introduce different types of biases and make the outputs uncertain. This research describes a visual analytics system (i.e., a tool that combines data visualization, human-data interaction, and ML) that helps users monitor, analyze, and make sense of streams of discussions on Twitter in real time. The system helps users understand “who” is talking about “what,” and “how” and/or “why” a tweet is posted. As case studies, we use public-health and election discussions to demonstrate the capabilities of the system. The system not only provides categorized and aggregate results of such discussions but also enables stakeholders to diagnose errors in the outcome and heuristically suggest fixes for them, resulting in a more detailed understanding of the discussions.