Events

DMS Statistics and Data Science Seminar

Time: Nov 08, 2023 (02:00 PM)
Location: ZOOM

Details:

Speaker:  Dr. HaiYing Wang (University of Connecticut)

Title: Rare Events Data and Maximum Sampled Conditional Likelihood 


Abstract: We show that the available information about unknown parameters in rare events data is tied only to the relatively small number of cases, which justifies the use of negative sampling. However, if the negative instances are subsampled to the same level as the positive cases, there is information loss. We derive an optimal sampling probability for the inverse probability weighted (IPW) estimator to minimize the information loss. We further propose a likelihood-based estimator to improve the estimation efficiency, and show that the improved estimator has the smallest asymptotic variance among a large class of estimators. It is also more robust to pilot misspecification. The likelihood-based estimator is also generalized to a class of models beyond binary response models. We validate our approach on simulated data, the MNIST data, and a real click-through rate dataset with more than 0.3 trillion instances.
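
For readers unfamiliar with negative sampling and IPW estimation, the following is a minimal illustrative sketch, not the speaker's method. It assumes a simple logistic model on simulated data, keeps every positive case, subsamples negatives with a uniform probability (the optimal sampling probability and the likelihood-based estimator from the talk are not implemented), and weights each retained negative by the inverse of its sampling probability. All names (fit_weighted_logistic, rho, true_beta) and all numeric settings are hypothetical choices made for this illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_weighted_logistic(X, y, w, n_iter=50, tol=1e-10):
    """Fit a weighted logistic regression by Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (y - p))                       # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Simulated rare events data (assumed setup): roughly 1% positives.
rng = np.random.default_rng(0)
n_full = 200_000
true_beta = np.array([-5.0, 1.0])  # intercept, slope (hypothetical values)
x = rng.normal(size=n_full)
X_full = np.column_stack([np.ones(n_full), x])
y_full = (rng.random(n_full) < sigmoid(X_full @ true_beta)).astype(float)

# Negative sampling: keep all positives, keep each negative with probability rho.
rho = 0.05  # uniform sampling probability, an assumption of this sketch
keep = (y_full == 1) | (rng.random(n_full) < rho)
X_sub, y_sub = X_full[keep], y_full[keep]

# IPW: weight 1 for positives, 1 / rho for the retained negatives.
w_sub = np.where(y_sub == 1, 1.0, 1.0 / rho)
beta_ipw = fit_weighted_logistic(X_sub, y_sub, w_sub)

print("positives:", int(y_full.sum()), "subsample size:", X_sub.shape[0])
print("IPW estimate:", beta_ipw)
```

The inverse probability weights are what correct the intercept here; fitting an unweighted model on the subsample would shift the intercept because negatives are underrepresented relative to the full data.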