Event Detail

Event Type: 
Friday, June 4, 2021 - 11:00 to 13:00
Zoom - If you are interested in attending this presentation, please send an email to Nikki Sullivan - nikki.sullivan@oregonstate.edu - to request Zoom log in details.

The HyperLogLog (HLL) algorithm is used to estimate the cardinality of large sets. This thesis gives novel analysis of the HyperLogLog algorithm by using techniques from statistics and probability. Initially, closed form bounds for the mean and variance of the max of n independent and identically distributed geometric random variables are developed. Surprisingly, the beta, digamma and trigamma functions show up in the bounds discovered in integral ways. These bounds and other discovered techniques are used to apply the delta method to investigate the asymptotic behavior of the estimator for the HyperLogLog algorithm. The maximum likelihood estimator of the HyperLogLog algorithm is also derived and compared to the original HyperLogLog algorithm. Finally, some preliminary work is included that abstracts the idea of random interval subdivision to random area splitting for triangular regions. This is a genrealization of the Kakutani-splitting process.