Statistics and Data Science Seminar

Department of Mathematics and Statistics

The Statistics & Data Science Seminar is hosted by the Department of Mathematics and Statistics and provides a weekly platform for academics and researchers from different domains to present and discuss problems and solutions regarding data collection, management and analysis.


Fall 2023 Seminars

Welcome to the Fall 2023 Seminar series! The seminar takes place on Wednesdays at 2 p.m. CT. The seminars will be hybrid (in-person and over Zoom) or virtual only (over Zoom). The location is Parker Hall 354. For any questions or requests, please contact Haoran Li. The list of speakers for this series can be found in the table below, which is followed by the title and abstract of each talk.


Speaker                Institution                    Date      Format
Wenying Li             Auburn University              Sep. 6    Hybrid
Yang Chen              University of Michigan         Sep. 13   Virtual
Davide Guzzetti        Auburn University              Sep. 20   Hybrid
Jilei Yang (Canceled)  LinkedIn                       Sep. 27
Takumi Saegusa         University of Maryland         Oct. 4    Hybrid
Chenglong Ye           University of Kentucky         Oct. 11   Virtual
Francesca Chiaromonte  Pennsylvania State University  Oct. 18
Yanyuan Ma             Pennsylvania State University  Oct. 25
Subrata Kundu          George Washington University   Nov. 1    Hybrid
Haiying Wang           University of Connecticut      Nov. 8
Raghu Pasupathy        Purdue University              Nov. 15   Hybrid
Shujie Ma              UC Riverside                   Nov. 29   TBD

 * Location: Parker Hall 354



Wenying Li (Auburn University)

Title: Dimension Reduction: Addressing Aggregation Bias in Large Consumer Demand Systems

Abstract: Building on an insight of Lewbel (1996) that aggregation bias is a special case of omitted variable bias, we propose two strategies for reducing bias in inconsistently aggregated consumer demand systems. The first uses a penalized lasso approach, and the second relies on a residual-based instrumental variable technique to control for the correlation between group prices and the residual in an aggregate demand. In an example, the preferred strategy reduces bias by up to 91% in own-price elasticities and 57% in cross-price elasticities. These strategies are useful in situations where an inconsistently aggregated demand has to be used for practical purposes.



Yang Chen (University of Michigan)

Title: Video Imputation and Prediction Methods with Applications in Space Weather

Abstract: The total electron content (TEC) maps can be used to estimate the signal delay of GPS due to the ionospheric electron content between a receiver and a satellite. This delay can result in a GPS positioning error. Thus, it is important to monitor and forecast the TEC maps. However, the observed TEC maps have big patches of missingness in the ocean and scattered small areas on the land. Thus, precise imputation and prediction of the TEC maps are crucial in space weather forecasting. 


In this talk, I first present several extensions of existing matrix completion algorithms to achieve TEC map reconstruction, accounting for spatial smoothness and temporal consistency while preserving important structures of the TEC maps. We call the proposed method video imputation with softImpute, temporal smoothing, and auxiliary data (VISTA). We show that our proposed method achieves better reconstructed TEC maps than existing methods in the literature. I will also briefly describe the use of our large-scale complete TEC database. Then, I present a new model for forecasting time series data distributed on a matrix-shaped spatial grid, using the historical spatiotemporal data and auxiliary vector-valued time series data. We model the matrix time series as an auto-regressive process, where a future matrix is jointly predicted by the historical values of the matrix time series and an auxiliary vector time series. Large sample asymptotics of the estimators are established, and performances of the model are validated with extensive simulation studies and a real data application to forecast the global TEC distributions.




Davide Guzzetti (Auburn University)

Title: Orbit Shapes in the Three-Body Problem: Importance and Applications

Abstract: Within an unperturbed central-body gravitational field, Keplerian orbital elements form a coordinate set that is also an effective and intuitive topological description, amenable to the visualization of orbit properties and the design of space flight solutions. Unfortunately, a compact and elegant topological description for all orbits in the Circular Restricted Three-Body Problem (CR3BP), akin to the widely used Keplerian orbital elements, or alternative two-body-problem coordinate sets, is not currently available. As a result, there exists a disconnect between coordinate sets and topological features that may render orbit uniqueness within CR3BP dynamics. Tools from topological data analysis offer the opportunity to bridge this disconnect by further equipping coordinate sets with additional elements—signatures and distance metrics—that precisely represent orbit topology. Our current work explores the possibility of developing a comprehensive and dependable representation of dynamical structures within gravitational multi-body environments at all levels of fidelity, one that is derived from the study of persistence of topology generators, such as loops and voids. Synergistically, our work introduces spatial computing interfaces as a new paradigm for trajectory design. In particular, we explore the challenges of mapping user-drawn curves in virtual reality to feasible spacecraft trajectories in the Earth-Moon system. Such new modalities in human-computer interactions could enhance the interface between human insight and algorithmic processes. More effective visual steering strategies are particularly beneficial for trajectory designers who have temporary, limited access to the solution space of a dynamical system, like in the case of CR3BP dynamics.



Jilei Yang (LinkedIn)

Title: Canceled




Takumi Saegusa (University of Maryland)

Title: Data Integration in Public Health Research

Abstract: Various data sets collected from numerous sources have great potential to enhance the quality of inference and accelerate scientific discovery. Inference for merged data is, however, quite challenging because such data may contain unidentified duplication from overlapping inhomogeneous sources, and each data set, often opportunistically collected, induces complex dependence. In public health research, for example, epidemiological studies have different inclusion and exclusion criteria in contrast to hospital records without a well-defined target population, and when combined with a disease registry, patients appear in multiple data sets. In this talk, we present several examples in public health research that potentially enjoy the merits of data integration. We review existing research, such as the random effects model approach and multiple frame surveys, and discuss their limitations in view of inferential goals, privacy protection, and large sample theory. We then propose our estimation and testing method in the context of survival analysis and two-sample tests. We illustrate our theory with simulation and real data examples. If time permits, we will discuss extensions of our proposed method in several directions.



Chenglong Ye (University of Kentucky)

Title: Meta Clustering for Collaborative Learning

Abstract: In collaborative learning, learners coordinate to enhance each of their learning performances. From the perspective of any learner, a critical challenge is to filter out unqualified collaborators. We propose a framework named meta clustering to address the challenge. Unlike the classical problem of clustering data points, meta clustering categorizes learners. Assuming each learner performs a supervised regression on a standalone local dataset, we propose a Select-Exchange-Cluster (SEC) method to classify the learners by their underlying supervised functions. We theoretically show that the SEC can cluster learners into accurate collaboration sets. Empirical studies corroborate the theoretical analysis and demonstrate that SEC can be computationally efficient, robust against learner heterogeneity, and effective in enhancing single-learner performance. Also, we show how the proposed approach may be used to enhance data fairness.



Francesca Chiaromonte (Pennsylvania State University)

Title: TBD

Abstract: TBD



Yanyuan Ma (Pennsylvania State University)

Title: TBD

Abstract: TBD



Subrata Kundu (George Washington University)

Title: TBD

Abstract: TBD



Haiying Wang (University of Connecticut)

Title: TBD

Abstract: TBD



Raghu Pasupathy (Purdue University)

Title: TBD

Abstract: TBD



Shujie Ma (UC Riverside)

Title: TBD

Abstract: TBD