Brief Introduction
Invertible Bloom Lookup Table (abbr. IBLT)
a fancy hash-table stores key-value pair \((x,y)\)
space budget: \(t\)
Guarantee
It "works" when the number of key-value pairs, \(n\), in table has \(n\le t\).
"temporarily" stops working if \(n\gt t\).
starts working again if \(n\) drops below \(t\).
Operations
(x,y)
is key-value pair.
insert(x,y)
, delete(x,y)
, get(x)
, listentries
.
Details
Start with table with \(k\ge3\) rows, \(m=2\times t\) columns.
In each row, we have a hash function \(h_1,\cdots,h_k\).
Success
\(h_i\) is universal.
Each of the entry, called cell, stores values:
count
key_sum
value_sum
Insert Operation
- for \(i=1\) to \(k\)
- \(c=A[i][h_i(x)]\)
- \(c.count++\)
- \(c.key\_sum+= x\)
- \(c.value\_sum+=y\)
Delete Operation
"undo" insert operation
Danger
We must know the pair is there before deletion.
- for \(i=1\) to \(k\)
- \(c=A[i][h_i(x)]\)
- \(c.count--\)
- \(c.key\_sum-= x\)
- \(c.value\_sum-=y\)
Get Operation
slightly different from min sketch count
-
\(ret=\infty\)
-
for \(i=1\) to \(k\)
-
\(c=A[i][h_i(x)]\)
-
if \(c.count==0\) return "not there"
-
else if \(c.count==1\) then
- if \(c.key\_sum==x\)
return \(c.value\_sum\)
- else return "not there"
-
return "don't know"
Probability of Success
If \(n\le t\), then get(x)
succeeds with probability \(\ge1-2^{-k}\).
Assume \(k\gt1\), \(n\) pairs: \((x_i,y_i)\) is stored in IBLT, \(n\le t\).
For get(x)
- If \(x\) is in IBLT
- 当 \(x\) 在 IBLT 的这一行里面,但是 get 函数发现对应的哈希值指向的位置都发生了碰撞的概率。
- 明明 \(x\) 不在 IBLT 中,但是 get 函数发现对应的哈希值指向的位置中存在一个值,即 count=1 但 value 不是 \(x\) 对应的。
Define fail-event \(E_i:\{h(x)=h(x_i)\}\),
- If \(x\) is not in IBLT.
明明 \(x\) 不在 IBLT 的这一行里面,但是 get 函数得出该键对应的哈希值指向的位置在 IBLT 这一行中产生碰撞的概率。
Define fail-event \(E_{ij}=\{h(x)=h(x_i)=h(x_j)\}\) $$ \Pr[\bigcup_{i\neq j}E_{ij}]\le\sum_{i\lt j}\Pr[E_{ij}]=\sum_{i\lt j}\frac{1}{m2}=C_n2\frac{1}{m^2}\ $$
$$ =\frac{n(n-1)}{2m2}\le\frac{t2}{2m2}\le\frac{t2}{8t^2}=\frac{1}{8} $$
Success
If we choose \(k=\log_2t+\log_2\frac{1}{\delta}\), the probability of failure is at most \(\delta\).
How to Avoid False Deletion?
another set of universal hash function \(g_1,\cdots,g_k:U\rightarrow[R]\)
while query, check whether \(c.hash\_sum==g(c.key\_sum)\)
ListEntries Operation
-
\(Q=\emptyset\)
-
for \(c\) in IBLT
-
if \(c.count==1\)
- \(Q.push(c.key\_sum,c.value\_sum)\)
// Max size of \(Q\) is \(mk\) here
-
while \(Q.size\gt0\)
-
\((x,y)=Q.top()\)
-
\(Q.pop()\)
-
if \(output.continas(x)==false\)
\(output.add(x,y)\)
-
for \(i=1\) to \(k\)
-
\(c'=A[i][h_i(x)]\)
- \(c'.count--\)
- \(c'.key\_sum-=x\)
- \(c'.value\_sum-=y\)
- if \(c'.count==1\)
- \(Q.push(c'.key\_sum,c',value\_sum)\)
-
Analysis of Success Probability
hyper-graph
node: memory location
\(h\) is truly random
\(n\) edges, \(e_1,\cdots,e_n\) are independently uniformly random.
List entries fails iff non-empty 2-core exists.
i.e. largest subgraph where every node has \(degree\ge2\)
\(F=\{ListEntries\text{ fails}\}\)
\(F_j=\{ListEntries\text{ fails with j key-value left}\}\)
For every \(S\in C_{x}^j\), \(F_{jS}=\{ListEntries\text{ fails, with S as j keys}\}\)
Denote: \((T_1,\cdots,T_k)\in (C_{[m]}^{j/2})^k\)
\(F_{j,S,T_1,\cdots,T_k}=\{LE\text{ fails for S and }\forall_i:h_i(S)\subseteq T_i\}\)