Event Detail

Event Type: Mathematical Biology Seminar
Wednesday, May 24, 2017 - 16:00 to 16:45
BEXL 323

MinHash and Bloom filters are two relatively new probabilistic techniques that seek to provide fast, low-memory approximate answers to queries on extremely large data sets. In this talk, I will discuss some recent work with Hooman Zabeti in which we improve a probabilistic estimate of the Jaccard index when comparing very large sets (hundreds of millions to billions of elements) to sets of comparatively small size (tens of millions of elements). As an application, we demonstrate that this technique can be used to quickly identify the presence or absence (and relative abundance) of microbial organisms in a metagenomic sample.

I will assume no background in probabilistic data structures or "data sketches" for this talk, and it should be appropriate for a very general audience.
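
For readers new to these terms, the following is a minimal sketch of the textbook MinHash estimate of the Jaccard index, |A ∩ B| / |A ∪ B|. It is not the improved estimator discussed in the talk. The function names, the choice of BLAKE2b as the hash function, and the default of 256 hash functions are all illustrative assumptions.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard index |A & B| / |A | B| of two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention assumed here for two empty sets
    return len(a & b) / len(a | b)


def _hash(item, seed):
    """Hypothetical helper: one 64-bit hash function per seed (BLAKE2b)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=256):
    """Sketch of a set: the minimum hash value under each of num_hashes seeds."""
    items = set(items)
    if not items:
        raise ValueError("cannot sketch an empty set")
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]


def minhash_jaccard(sig_a, sig_b):
    """Estimate the Jaccard index as the fraction of positions where the minima agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    a = range(0, 1000)
    b = range(500, 1500)
    print("exact:   ", jaccard(a, b))
    print("estimate:", minhash_jaccard(minhash_signature(a), minhash_signature(b)))
```

Each signature has a fixed length regardless of the size of the set it summarizes, which is where the memory savings come from. With k hash functions the standard error of this classic estimate is roughly sqrt(J(1 - J)/k), so small Jaccard values, as arise when a small set is compared to a much larger one, are hard to resolve without a large k.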