Events

DMS Statistics and Data Science Seminar

Time: Nov 15, 2023 (02:00 PM)
Location: 354 Parker Hall

Details:
 
Speaker: Dr. Raghu Pasupathy (Purdue University)
 
Title: Batching as an Uncertainty Quantification Device
 
 
Abstract: Consider the context of a statistician, simulationist, or optimizer seeking to assess the quality of \(\theta_n\), an estimator of an unknown object \(\theta \in \mathbb{R}^d\), constructed using data \((Y_1, Y_2,\ldots, Y_n)\) gathered from a source such as a dataset, a simulation, or an optimization routine. The unknown object \(\theta\) is assumed to be a statistical function of the probability measure that generates the stationary time series \((Y_1, Y_2,\ldots, Y_n)\). In such contexts, resampling methods such as the bootstrap or subsampling have been the classical answer to the question of how to approximate the sampling distribution of the error \(\theta_n - \theta\). In this talk, we propose a simple alternative called batching. Batching works by appropriately grouping the data \((Y_1, Y_2,\ldots, Y_n)\) into contiguous and possibly overlapping batches, each of which is then used to construct an estimate of \(\theta\). These batch estimates, along with the original estimate \(\theta_n\), are then combined and scaled appropriately to approximate any functional of the error \(\theta_n - \theta\), such as its bias or mean-squared error, or to construct a \((1-\alpha)\)-confidence region for \(\theta\). We show that batching, like bootstrapping, enjoys strong consistency and high-order accuracy properties. Furthermore, we show that the weak asymptotics of batched studentized statistics are not necessarily normal, but are characterizable. In particular, using large overlapping batches when constructing confidence regions delivers consistently favorable performance. A number of theoretical and practical questions about batching remain open.
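As an illustration of the idea (not the speaker's exact construction), the following sketch applies batching to the simplest case, where \(\theta\) is a mean: overlapping batch estimates of \(\theta\) are combined with the full-sample estimate \(\theta_n\) to approximate its sampling variability and form a confidence interval. The batch size, offset, and use of a normal critical value are all illustrative choices here; as the abstract notes, the correct limiting distribution for overlapping-batch studentized statistics is generally non-normal.

```python
import numpy as np

def batch_estimates(y, batch_size, offset):
    """Estimates of the mean from contiguous, possibly overlapping batches.
    offset < batch_size yields overlapping batches; offset == batch_size
    yields non-overlapping ones."""
    n = len(y)
    starts = range(0, n - batch_size + 1, offset)
    return np.array([y[s:s + batch_size].mean() for s in starts])

def batching_ci(y, batch_size, offset, z=1.96):
    """Approximate confidence interval for the mean theta.

    The spread of the batch estimates around theta_n estimates the
    sampling variability; each batch mean has variance roughly
    sigma^2 / batch_size, so scaling by sqrt(batch_size / n) converts
    it to the variance scale of theta_n. The normal quantile z is a
    simplification -- see the talk for the correct critical values.
    """
    theta_n = y.mean()
    b = batch_estimates(y, batch_size, offset)
    se = b.std(ddof=1) * np.sqrt(batch_size / len(y))
    return theta_n - z * se, theta_n + z * se
```

For non-overlapping batches (offset equal to batch size) this reduces to the classical batch-means interval; taking a small offset gives the large overlapping batches highlighted in the abstract.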