Statistics and Data Science Seminar
Department of Mathematics and Statistics
Spring 2026 Seminars
Welcome to the Spring 2026 Seminar series! The seminar takes place on Wednesdays at 2 p.m. CT. Seminars are either hybrid (in-person and over Zoom) or virtual only (over Zoom). The location is Parker Hall 358. For any questions or requests, please contact Huan He or Haotian Xu. The speakers for this series are listed in the table below, followed by the title and abstract of each talk.
| Speaker | Institution | Date | Format |
|---|---|---|---|
| | | Feb. 4 | |
| Sayar Karmakar | U of Florida | Feb. 11 | In-person |
| | | Feb. 18 | |
| Jiajin Sun | Florida State | Feb. 25 | In-person |
| Shuoyang Wang | U of Louisville | Mar. 4 | In-person |
| NA | NA | Mar. 11 | NA |
| Florian Gunsilius | Emory | Mar. 18 | In-person |
| Yan Li | Auburn | Mar. 25 | In-person |
| Rich Lehoucq | Sandia National Labs | Apr. 1 | In-person |
| Mine Dogucu | UC Irvine | Apr. 8 | |
| | | Apr. 15 | |
| Shivam Kumar | U Chicago | Apr. 22 | In-person |
Sayar Karmakar (U of Florida)
Title: Epidemic Changepoints: Applications in spatial anomaly detection and localizing LLM watermarks
Abstract: We present epidemic change-points as a unifying lens for two localization problems: (i) detecting spatial anomalies and (ii) segmenting watermarked regions in mixed-source text. For spatial data, we formalize a 'spatial' change-point as an anomalous region (an epidemic in space), provide detection-accuracy results for single and multiple breaks, and propose a block-based scan that delivers substantial computational savings with guarantees. Next, we move to a seemingly unrelated but very pertinent topic.
As large language models proliferate, ensuring content provenance has become a statistical challenge. For the problem of finding localized segments of modified text, we introduce WISER, a fast epidemic-segmentation approach with finite-sample error bounds and consistency for multiple watermarked segments, and we demonstrate empirical gains over state-of-the-art baselines on benchmark datasets.
We emphasize how classical changepoint ideas tailored to epidemic and transient departures yield principled, scalable solutions to modern problems in text provenance and spatial anomaly detection. Simulations and empirical studies corroborate the theory and point to open questions for PhD-level research.
Joint work with Soham Bonnerjee & Subhrajyoty Roy (watermarks) and with Soham Bonnerjee & George Michailidis (spatial anomaly)
Jiajin Sun (Florida State)
Title: Efficient Analysis of Latent Spaces in Heterogeneous Networks
Abstract: This work proposes a unified framework for efficient estimation under latent space modeling of heterogeneous networks. We consider a class of latent space models that decompose latent vectors into shared and network-specific components across networks. We develop a novel procedure that first identifies the shared latent vectors and further refines estimates through efficient score equations to achieve statistical efficiency. Oracle error rates for estimating the shared and heterogeneous latent vectors are established simultaneously. The analysis framework offers remarkable flexibility, accommodating various types of edge weights under general distributions.
Shuoyang Wang (U of Louisville)
Title: Deep Learning for Complex Functional Data Analysis
Abstract: Functional data are realizations of random functions observed over a continuum, such as signals and images. In many modern applications, including neuroscience and biomedical research, observations are more naturally represented as random functions rather than finite-dimensional vectors. The intrinsic complexity of such data stems from high-dimensional functional domains, cross-cohort heterogeneity, and unknown data-generating distributions, which together complicate principled modeling and performance guarantees. Although deep learning has shown strong empirical performance in biomedical studies, its methodological and theoretical foundations for complex functional data settings remain limited. In this talk, I will present two methodological contributions that develop principled deep learning frameworks for complex functional data.
First, I will introduce a federated deep learning approach for functional data classification across multiple heterogeneous cohorts. The learner visits each cohort once, performs local updates, and transmits only compressed model weights, thereby preserving privacy and reducing communication and computational costs. To address cross-cohort heterogeneity, we develop an adaptive sequential weight updating strategy that progressively corrects distributional shifts and improves performance on a target cohort. We establish minimax optimal excess risk bounds and characterize a sharp sampling threshold governing learnability under both densely and sparsely observed functional data.
Second, I will present a deep-learning-based functional graphical modeling framework for learning conditional independence structures in multivariate functional data. Each node’s neighborhood is estimated via flexible functional regression with embedded feature selection, allowing a fully nonparametric specification, and the overall graph is recovered by aggregating the neighborhood estimates. The method avoids restrictive distributional assumptions and does not rely on a well-defined functional precision operator. We prove global model selection consistency and establish convergence rates that attain the classical nonparametric regression rate up to a logarithmic factor, with a fundamental sampling threshold determining the estimator’s convergence behavior. Empirical performance is demonstrated through simulations and real data applications, including analyses of the ADNI dataset and the ADHD-200 Consortium.
Florian Gunsilius (Emory)
Title: Partial Identification with Schrödinger Bridges
Abstract: Partial identification provides an alternative to point identification: instead of pinning down a unique parameter estimate, the goal is to characterize a set guaranteed to contain the true parameter value. Many partial identification approaches take the form of linear optimization problems, which seek the "best- and worst-case scenarios" of a proposed model subject to the constraint that the model replicates correct observable information. However, such linear programs become intractable in settings with multivalued or continuous variables. This paper introduces a novel method to overcome this computational and statistical curse of cardinality: we provide a duality between a general class of optimal transportation problems and the lower bound of a partially identified effect. Building on this duality, we propose a discretization of the instrument realizations and an entropy transform of these potentially infinite-dimensional linear programs. This maps the problem into general versions of multi-marginal Schrödinger bridges, enabling efficient approximation of their solutions. In the process, we establish novel statistical and mathematical properties of such multi-marginal Schrödinger bridges, including consistency of the estimator and an analysis of the asymptotic distribution of entropic approximations to infinite-dimensional linear programs. We illustrate this approach by analyzing instrumental variable models with continuous variables, a setting that has been out of reach for existing methods that do not rely on sampling. (Joint work with Bruno Nunes Costa from the University of Michigan.)
Yan Li (Auburn)
Title:
Abstract:
Rich Lehoucq (Sandia National Labs)
Title: The Poisson tensor completion parametric estimator
Abstract: We introduce the Poisson tensor completion (PTC) estimator that exploits inter-sample relationships to compute a low-rank Poisson tensor decomposition of the frequency histogram for samples of a multivariate distribution. Our crucial observation is that the histogram bins are an instance of a space partitioning of counts and thus can be identified with a spatial non-homogeneous Poisson process. The Poisson tensor decomposition leads to a completion of the mean measure over all bins, including those containing few to no samples, and leads to our proposed estimator. A Poisson tensor decomposition models the underlying distribution of the count data and guarantees non-negative estimated values, obviating the need for additional constraints to ensure non-negativity. Furthermore, we demonstrate that our PTC estimator is a substantial improvement over standard histogram-based estimators for sub-Gaussian probability distributions because of the concentration-of-norm phenomenon.
Mine Dogucu (UC Irvine)
Title:
Abstract:
Shivam Kumar (U Chicago)
Title:
Abstract: