Đăng bởi tqlong on Tháng Sáu 23, 2009
Tổng quát hóa bất đẳng thức Hoeffding cho lớp hàm bị chặn ta được bất đẳng thức McDiarmid như sau
Định lý (McDiarmid): Nếu hàm
sao cho

và các biến
độc lập xác suất thì
![\displaystyle \mathrm{Pr}_{X_1,\ldots,X_n}\{f - E[f]\geq\epsilon\}\leq \exp \left\{-\frac{2\epsilon^2}{\sum_{i=1}^n c_i^2} \right\} \displaystyle \mathrm{Pr}_{X_1,\ldots,X_n}\{f - E[f]\geq\epsilon\}\leq \exp \left\{-\frac{2\epsilon^2}{\sum_{i=1}^n c_i^2} \right\}](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+%5Cmathrm%7BPr%7D_%7BX_1%2C%5Cldots%2CX_n%7D%5C%7Bf+-+E%5Bf%5D%5Cgeq%5Cepsilon%5C%7D%5Cleq+%5Cexp+%5Cleft%5C%7B-%5Cfrac%7B2%5Cepsilon%5E2%7D%7B%5Csum_%7Bi%3D1%7D%5En+c_i%5E2%7D+%5Cright%5C%7D&bg=fafcff&fg=2a2a2a&s=0)
Ta thấy bất đẳng thức Hoeffding là hệ quả của bất đẳng thức McDiarmid với hàm
.
Chứng minh: Đặt
là kì vọng của hàm
nếu biết trước giá trị của
biến đầu tiên. Như vậy
![\displaystyle Z_0 = E[f], Z_n = f \displaystyle Z_0 = E[f], Z_n = f](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+Z_0+%3D+E%5Bf%5D%2C+Z_n+%3D+f&bg=fafcff&fg=2a2a2a&s=0)
![\displaystyle f - E[f] = Z_n - Z_0 = \sum_{i=1}^n Z_i - Z_{i-1} \displaystyle f - E[f] = Z_n - Z_0 = \sum_{i=1}^n Z_i - Z_{i-1}](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+f+-+E%5Bf%5D+%3D+Z_n+-+Z_0+%3D+%5Csum_%7Bi%3D1%7D%5En+Z_i+-+Z_%7Bi-1%7D&bg=fafcff&fg=2a2a2a&s=0)
Theo bất đẳng thức Chernoff, với mọi
ta có
![\displaystyle \mathrm{Pr}\{f - E[f]\geq \epsilon\} = e^{-s\epsilon}E\left[\exp \left\{s \sum_{i=1}^n Z_i - Z_{i-1}\right\}\right] \displaystyle \mathrm{Pr}\{f - E[f]\geq \epsilon\} = e^{-s\epsilon}E\left[\exp \left\{s \sum_{i=1}^n Z_i - Z_{i-1}\right\}\right]](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+%5Cmathrm%7BPr%7D%5C%7Bf+-+E%5Bf%5D%5Cgeq+%5Cepsilon%5C%7D+%3D+e%5E%7B-s%5Cepsilon%7DE%5Cleft%5B%5Cexp+%5Cleft%5C%7Bs+%5Csum_%7Bi%3D1%7D%5En+Z_i+-+Z_%7Bi-1%7D%5Cright%5C%7D%5Cright%5D&bg=fafcff&fg=2a2a2a&s=0)
![\displaystyle = e^{-s\epsilon}E\left[E\left[\left.\exp \left\{s \sum_{i=1}^n Z_i - Z_{i-1}\right\}\right|X_1,\ldots,X_{n-1}\right]\right] \displaystyle = e^{-s\epsilon}E\left[E\left[\left.\exp \left\{s \sum_{i=1}^n Z_i - Z_{i-1}\right\}\right|X_1,\ldots,X_{n-1}\right]\right]](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+%3D+e%5E%7B-s%5Cepsilon%7DE%5Cleft%5BE%5Cleft%5B%5Cleft.%5Cexp+%5Cleft%5C%7Bs+%5Csum_%7Bi%3D1%7D%5En+Z_i+-+Z_%7Bi-1%7D%5Cright%5C%7D%5Cright%7CX_1%2C%5Cldots%2CX_%7Bn-1%7D%5Cright%5D%5Cright%5D&bg=fafcff&fg=2a2a2a&s=0)
![\displaystyle = e^{-s\epsilon}E\left[\exp \left\{s \sum_{i=1}^{n-1} Z_i - Z_{i-1}\right\}E\left[\left.e^{s(Z_n - Z_{n-1}) }\right|X_1,\ldots,X_{n-1}\right]\right] \displaystyle = e^{-s\epsilon}E\left[\exp \left\{s \sum_{i=1}^{n-1} Z_i - Z_{i-1}\right\}E\left[\left.e^{s(Z_n - Z_{n-1}) }\right|X_1,\ldots,X_{n-1}\right]\right]](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+%3D+e%5E%7B-s%5Cepsilon%7DE%5Cleft%5B%5Cexp+%5Cleft%5C%7Bs+%5Csum_%7Bi%3D1%7D%5E%7Bn-1%7D+Z_i+-+Z_%7Bi-1%7D%5Cright%5C%7DE%5Cleft%5B%5Cleft.e%5E%7Bs%28Z_n+-+Z_%7Bn-1%7D%29+%7D%5Cright%7CX_1%2C%5Cldots%2CX_%7Bn-1%7D%5Cright%5D%5Cright%5D&bg=fafcff&fg=2a2a2a&s=0)
Bây giờ ta đánh giá
bằng bổ đề Hoeffding.
Đặt
. Ta có
![\displaystyle E[Y] = E[Z_n - Z_{n-1}|X_1,\ldots,X_{n-1}] = 0 \displaystyle E[Y] = E[Z_n - Z_{n-1}|X_1,\ldots,X_{n-1}] = 0](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+E%5BY%5D+%3D+E%5BZ_n+-+Z_%7Bn-1%7D%7CX_1%2C%5Cldots%2CX_%7Bn-1%7D%5D+%3D+0&bg=fafcff&fg=2a2a2a&s=0)
Đặt
![\displaystyle U_n = \sup_u E[f|X_1,\ldots,X_{n-1},u]-E[f|X_1,\ldots,X_{n-1}] \displaystyle U_n = \sup_u E[f|X_1,\ldots,X_{n-1},u]-E[f|X_1,\ldots,X_{n-1}]](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+U_n+%3D+%5Csup_u+E%5Bf%7CX_1%2C%5Cldots%2CX_%7Bn-1%7D%2Cu%5D-E%5Bf%7CX_1%2C%5Cldots%2CX_%7Bn-1%7D%5D&bg=fafcff&fg=2a2a2a&s=0)
![\displaystyle L_n = \inf_u E[f|X_1,\ldots,X_{n-1},u]-E[f|X_1,\ldots,X_{n-1}] \displaystyle L_n = \inf_u E[f|X_1,\ldots,X_{n-1},u]-E[f|X_1,\ldots,X_{n-1}]](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+L_n+%3D+%5Cinf_u+E%5Bf%7CX_1%2C%5Cldots%2CX_%7Bn-1%7D%2Cu%5D-E%5Bf%7CX_1%2C%5Cldots%2CX_%7Bn-1%7D%5D&bg=fafcff&fg=2a2a2a&s=0)
Ta có
và 
Vậy
.
Tiếp tục ta được
![\displaystyle \mathrm{Pr}\{f - E[f]\geq \epsilon\}\leq e^{-s\epsilon}E\left[\exp \left\{s \sum_{i=1}^{n-1} Z_i - Z_{i-1}\right\}\right] e^{s^2 c_n^2/8} \displaystyle \mathrm{Pr}\{f - E[f]\geq \epsilon\}\leq e^{-s\epsilon}E\left[\exp \left\{s \sum_{i=1}^{n-1} Z_i - Z_{i-1}\right\}\right] e^{s^2 c_n^2/8}](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+%5Cmathrm%7BPr%7D%5C%7Bf+-+E%5Bf%5D%5Cgeq+%5Cepsilon%5C%7D%5Cleq+e%5E%7B-s%5Cepsilon%7DE%5Cleft%5B%5Cexp+%5Cleft%5C%7Bs+%5Csum_%7Bi%3D1%7D%5E%7Bn-1%7D+Z_i+-+Z_%7Bi-1%7D%5Cright%5C%7D%5Cright%5D+e%5E%7Bs%5E2+c_n%5E2%2F8%7D&bg=fafcff&fg=2a2a2a&s=0)

Chọn
để cực tiểu hóa hàm
ta được điều phải chứng minh.
Đăng trong Công cụ xác suất | Tagged: Hoeffding, McDiarmid | Leave a Comment »
Đăng bởi tqlong on Tháng Sáu 21, 2009
Bất đẳng thức Chebyshev cho ta biết cận trên của xác suất một biến ngẫu nhiên nằm xa kì vọng của nó (tail bound):
![\displaystyle \mathrm{Pr}\{|X-E[X]|\geq u\}\leq \frac{V[X]}{u^2} \displaystyle \mathrm{Pr}\{|X-E[X]|\geq u\}\leq \frac{V[X]}{u^2}](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+%5Cmathrm%7BPr%7D%5C%7B%7CX-E%5BX%5D%7C%5Cgeq+u%5C%7D%5Cleq+%5Cfrac%7BV%5BX%5D%7D%7Bu%5E2%7D&bg=fafcff&fg=2a2a2a&s=0)
Trong bài này ta sẽ tìm hiểu xác suất nằm gần kì vọng của biến
với
là các biến ngẫu nhiên độc lập có cùng phân bố xác suất với biến ngẫu nhiên
(independent and identically distributed – i.i.d).
Từ bất đẳng thức Chebyshev
![\displaystyle \mathrm{Pr}\{|\overline{X}-E[X]|\geq \epsilon\}\leq \frac{V[\overline{X}]}{\epsilon^2}=\frac{V[X]} {n\epsilon^2}\underset{n\rightarrow\infty}{\rightarrow} 0 \displaystyle \mathrm{Pr}\{|\overline{X}-E[X]|\geq \epsilon\}\leq \frac{V[\overline{X}]}{\epsilon^2}=\frac{V[X]} {n\epsilon^2}\underset{n\rightarrow\infty}{\rightarrow} 0](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+%5Cmathrm%7BPr%7D%5C%7B%7C%5Coverline%7BX%7D-E%5BX%5D%7C%5Cgeq+%5Cepsilon%5C%7D%5Cleq+%5Cfrac%7BV%5B%5Coverline%7BX%7D%5D%7D%7B%5Cepsilon%5E2%7D%3D%5Cfrac%7BV%5BX%5D%7D+%7Bn%5Cepsilon%5E2%7D%5Cunderset%7Bn%5Crightarrow%5Cinfty%7D%7B%5Crightarrow%7D+0&bg=fafcff&fg=2a2a2a&s=0)
Như vậy, khi
, giá trị trung bình
càng gần giá trị kì vọng
(theo nghĩa xác suất).
Để bất đẳng thức chặt hơn, ta có thể dùng bất đẳng thức Chernoff
![\displaystyle \mathrm{Pr}\{\overline{X}-E[X]\geq \epsilon\}=\mathrm{Pr}\left\{\sum_{i=1}^n X_i - nE[X]\geq n\epsilon\right\} \displaystyle \mathrm{Pr}\{\overline{X}-E[X]\geq \epsilon\}=\mathrm{Pr}\left\{\sum_{i=1}^n X_i - nE[X]\geq n\epsilon\right\}](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+%5Cmathrm%7BPr%7D%5C%7B%5Coverline%7BX%7D-E%5BX%5D%5Cgeq+%5Cepsilon%5C%7D%3D%5Cmathrm%7BPr%7D%5Cleft%5C%7B%5Csum_%7Bi%3D1%7D%5En+X_i+-+nE%5BX%5D%5Cgeq+n%5Cepsilon%5Cright%5C%7D&bg=fafcff&fg=2a2a2a&s=0)
![\displaystyle \leq e^{-sn\epsilon} E\left[e^{s\sum_{i=1}^n (X_i - E[X])}\right] = e^{-sn\epsilon} \prod_{i=1}^n E\left[e^{s(X_i-E[X_i])}\right] \displaystyle \leq e^{-sn\epsilon} E\left[e^{s\sum_{i=1}^n (X_i - E[X])}\right] = e^{-sn\epsilon} \prod_{i=1}^n E\left[e^{s(X_i-E[X_i])}\right]](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+%5Cleq+e%5E%7B-sn%5Cepsilon%7D+E%5Cleft%5Be%5E%7Bs%5Csum_%7Bi%3D1%7D%5En+%28X_i+-+E%5BX%5D%29%7D%5Cright%5D+%3D+e%5E%7B-sn%5Cepsilon%7D+%5Cprod_%7Bi%3D1%7D%5En+E%5Cleft%5Be%5E%7Bs%28X_i-E%5BX_i%5D%29%7D%5Cright%5D&bg=fafcff&fg=2a2a2a&s=0)
Bổ đề (Hoeffding): Nếu
là biến ngẫu nhiên với
và
thì
![\displaystyle M_X(s) = E[e^{sX}]\leq e^{s^2(b-a)^2/8} \displaystyle M_X(s) = E[e^{sX}]\leq e^{s^2(b-a)^2/8}](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+M_X%28s%29+%3D+E%5Be%5E%7BsX%7D%5D%5Cleq+e%5E%7Bs%5E2%28b-a%29%5E2%2F8%7D&bg=fafcff&fg=2a2a2a&s=0)
Chứng minh: Với mọi
, do
là hàm lồi nên

Lấy kì vọng hai vế suy ra
![\displaystyle E[e^{sX}]\leq \frac{be^{sa}-ae^{sb}}{b-a} \displaystyle E[e^{sX}]\leq \frac{be^{sa}-ae^{sb}}{b-a}](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+E%5Be%5E%7BsX%7D%5D%5Cleq+%5Cfrac%7Bbe%5E%7Bsa%7D-ae%5E%7Bsb%7D%7D%7Bb-a%7D&bg=fafcff&fg=2a2a2a&s=0)
Đặt
ta có

Vậy
(đpcm).
Tiếp tục áp dụng bổ đề Hoeffding cho biến ngẫu nhiên
, nếu
, ta được
![\displaystyle \mathrm{Pr}\{\overline{X}-E[X]\geq \epsilon\}\leq e^{-sn\epsilon+ns^2(b-a)^2/8}, \forall s \displaystyle \mathrm{Pr}\{\overline{X}-E[X]\geq \epsilon\}\leq e^{-sn\epsilon+ns^2(b-a)^2/8}, \forall s](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+%5Cmathrm%7BPr%7D%5C%7B%5Coverline%7BX%7D-E%5BX%5D%5Cgeq+%5Cepsilon%5C%7D%5Cleq+e%5E%7B-sn%5Cepsilon%2Bns%5E2%28b-a%29%5E2%2F8%7D%2C+%5Cforall+s&bg=fafcff&fg=2a2a2a&s=0)
Chọn
để tối thiểu hóa hàm
ta được
và
![\displaystyle \mathrm{Pr}\{\overline{X}-E[X]\geq \epsilon\}\leq e^{-\frac{2n\epsilon^2}{(b-a)^2}} \displaystyle \mathrm{Pr}\{\overline{X}-E[X]\geq \epsilon\}\leq e^{-\frac{2n\epsilon^2}{(b-a)^2}}](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+%5Cmathrm%7BPr%7D%5C%7B%5Coverline%7BX%7D-E%5BX%5D%5Cgeq+%5Cepsilon%5C%7D%5Cleq+e%5E%7B-%5Cfrac%7B2n%5Cepsilon%5E2%7D%7B%28b-a%29%5E2%7D%7D&bg=fafcff&fg=2a2a2a&s=0)
Vậy ta có định lý
Định lý (Hoeffding): Nếu
là biến ngẫu nhiên với
thì
![\displaystyle \mathrm{Pr}\{\overline{X}-E[X]\geq \epsilon\}\leq e^{-\frac{2n\epsilon^2}{(b-a)^2}} \displaystyle \mathrm{Pr}\{\overline{X}-E[X]\geq \epsilon\}\leq e^{-\frac{2n\epsilon^2}{(b-a)^2}}](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+%5Cmathrm%7BPr%7D%5C%7B%5Coverline%7BX%7D-E%5BX%5D%5Cgeq+%5Cepsilon%5C%7D%5Cleq+e%5E%7B-%5Cfrac%7B2n%5Cepsilon%5E2%7D%7B%28b-a%29%5E2%7D%7D&bg=fafcff&fg=2a2a2a&s=0)
Tổng quát hơn, nếu
là các biến ngẫu nhiên độc lập và
thì
![\displaystyle \mathrm{Pr}\{\overline{X}-E[\overline{X}]\geq \epsilon\}\leq e^{-\frac{2n^2\epsilon^2}{\sum_{i=1}^n(b_i-a_i)^2}} \displaystyle \mathrm{Pr}\{\overline{X}-E[\overline{X}]\geq \epsilon\}\leq e^{-\frac{2n^2\epsilon^2}{\sum_{i=1}^n(b_i-a_i)^2}}](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+%5Cmathrm%7BPr%7D%5C%7B%5Coverline%7BX%7D-E%5B%5Coverline%7BX%7D%5D%5Cgeq+%5Cepsilon%5C%7D%5Cleq+e%5E%7B-%5Cfrac%7B2n%5E2%5Cepsilon%5E2%7D%7B%5Csum_%7Bi%3D1%7D%5En%28b_i-a_i%29%5E2%7D%7D&bg=fafcff&fg=2a2a2a&s=0)
![\displaystyle \mathrm{Pr}\{|\overline{X}-E[\overline{X}]|\geq \epsilon\}\leq 2e^{-\frac{2n^2\epsilon^2}{\sum_{i=1}^n(b_i-a_i)^2}} \displaystyle \mathrm{Pr}\{|\overline{X}-E[\overline{X}]|\geq \epsilon\}\leq 2e^{-\frac{2n^2\epsilon^2}{\sum_{i=1}^n(b_i-a_i)^2}}](http://l.wordpress.com/latex.php?latex=%5Cdisplaystyle+%5Cmathrm%7BPr%7D%5C%7B%7C%5Coverline%7BX%7D-E%5B%5Coverline%7BX%7D%5D%7C%5Cgeq+%5Cepsilon%5C%7D%5Cleq+2e%5E%7B-%5Cfrac%7B2n%5E2%5Cepsilon%5E2%7D%7B%5Csum_%7Bi%3D1%7D%5En%28b_i-a_i%29%5E2%7D%7D&bg=fafcff&fg=2a2a2a&s=0)
Bất đẳng thức Hoeffding thường được dùng khi đánh giá chất lượng phân lớp bởi giá trị hàm phân lớp thường bị chặn (ví dụ:
).
Đăng trong Công cụ xác suất | Tagged: Chebyshev, Chernoff, Hoeffding | Leave a Comment »