Federated Learning

[ICLR 2020] Convergence of FedAvg - (2) 본문

Federated Learning/Papers

[ICLR 2020] Convergence of FedAvg - (2)

pseudope 2022. 8. 16. 22:30
728x90

논문 제목: On the Convergence of FedAvg on Non-IID Data

출처: https://arxiv.org/abs/1907.02189

 

 지난 포스트에서, 우리는 full device participation case에서 FedAvg의 convergence analysis를 진행하기 위한 사전 준비를 마쳤습니다. (이전 글 보기) 이번 포스트에서는 해당 내용에 관한 증명 과정을 다루도록 하겠습니다. 증명 과정이 다소 길기 때문에, 두세 편으로 나누어서 게재할 예정입니다.

 

3. Key Lemmas for $\text{Theorem 1}$

 

 우선, 증명에 앞서 추가적인 notation을 언급하도록 하겠습니다. (이는 증명 과정에서만 사용됩니다.) $\bar{v}_t := \sum_{k=1}^N p_k v_t^k$와 $\bar{w}_t := \sum_{k=1}^N p_k w_t^k$는 일반적으로 우리가 접근할 수 없는 값입니다. 오직 $t + 1 \in \mathcal{I}$일 때에만 $\bar{w}_{t+1}$를 알 수 있을 뿐입니다. 다음으로, $\bar{g}_t := \sum_{k=1}^N p_k \nabla F_k(w_t^k)$, $g_t := \sum_{k=1}^N p_k \nabla F_k(w_t^k. \xi_t^k)$로 정의하는데, 이 둘은 mini-batch와 full-batch의 차이입니다. 그리고 이 정의에 따르면 $\bar{v}_{t+1} = \bar{w}_t - \eta_t g_t$이며, $\mathbb{E}[g_t] = \bar{g}_t$입니다. 또한, 편의 상 $F_k^* := F_k(w_k^*)$, $F^* := F(w^*)$로 표기합니다.

 

$\text{Lemma 1}$ [Results of one step SGD]

$\text{Assumption 1, 2}$가 성립한다고 가정하자. 만약 $\eta_t \leq \frac {1} {4L}$이라면, 다음 부등식이 성립한다.

$$\mathbb{E} [||\bar{v}_{t + 1} - w^*||^2] \leq (1 - \eta_t \mu) \mathbb{E} [||\bar{w}_t - w^*||^2] + \eta_t^2 \mathbb{E} [||g_t - \bar{g}_t||^2] + 6L \eta_t^2 \Gamma + 2 \mathbb{E} \sum_{k = 1}^N p_k ||\bar{w}_t - w_t^k||^2$$

$\text{Proof}$

 $\bar{v}_{t+1} = \bar{w}_t - \eta_t g_t$임을 이용하면 다음과 같이 식을 정리할 수 있다. 이중 $A_1$을 제외한 term은 expectation을 취할 경우 $0$이 되기 때문에, 우리는 $A_1$만 bound시키면 된다.

\begin{align*} ||\bar{v}_{t+1} - w^*||^2 &= ||\bar{w}_t - \eta_t g_t - w^*||^2 = ||\bar{w}_t - \eta_t g_t - w^* - \eta_t \bar{g}_t + \eta_t \bar{g_t}||^2 \\&= \underbrace{||\bar{w}_t - w^* - \eta_t \bar{g_t}||^2}_{=: A_1} + 2 \eta_t \langle \bar{w}_t - w^* - \eta_t \bar{g}_t, \bar{g}_t - g_t \rangle + \eta_t^2 ||g_t - \bar{g}_t||^2 \end{align*}

 다음으로, $A_1 = |\bar{w}_t - w^* - \eta_t \bar{g_t}||^2 = ||\bar{w}_t - w^*||^2 \underbrace{- 2 \eta_t \langle \bar{w}_t - w^*, \bar{g}_t \rangle}_{=: B_1} + \underbrace{\eta_t^2 ||\bar{g}_t||^2}_{=: B_2}$로 $A_1$을 분리할 수 있다. 이때, 우선 $B_2$를 정리하자.  $\text{Assumption 1}$에 의하여

\begin{align*} ||\nabla F_k (w_t^k)||^2 &= ||\nabla F_k^* (w_k^*) - \nabla F_k (w_t^k)|| ^ 2 \; (\because \nabla F_k^* (w_k^*) = 0) \\&\leq ||\nabla F_k (w^*) - \nabla F_k (w_t^k)||^2 \\&\leq 2L \left(F_k(w_t^k) - F_k(w_t^k) - \nabla F_k(W_t^k)^T (w_t^k - w_k^*) \right) \; (\because \text{Assumption 1}) \\&\leq 2L (F_k(w_t^k) - F_k(w_k^*)) \\&= 2L (F_k(w_t^k) - F_k^*)  \end{align*}

이므로,  $(1)$ $B_2$는 다음과 같이 정리된다.

\begin{align*} B_2 &= \eta_t^2 ||\bar{g}_t||^2 = \eta_t^2 ||\sum_{k=1}^N p_k \nabla F_k (w_t^k)||^2 \; (\because \bar{g}_t = \sum_{k=1}^N p_k \nabla F_k(w_t^k)) \\&\leq \eta_t^2 \sum_{k=1}^N p_k ||\nabla F_k (w_t^k)||^2 \; (\because \text{convexity of } ||\cdot||^2) \\&\leq 2L \eta_t^2 \sum_{k=1}^N p_k (F_k (w_t^k) - F_k^*) \; (\because (1))              \end{align*}

 다음으로, $B_1$을 정리하자.

\begin{align*} B_1 &= - 2 \eta_t \langle \bar{w}_t - w^*, \bar{g}_t \rangle = -2\eta_t \sum_{k=1}^N p_k \langle \bar{w}_t - w^*, \nabla F_k (w_t^k) \rangle \; (\because \bar{g}_t = \sum_{k=1}^N  p_k \nabla F_k(w_t^k)) \\&=  -2\eta_t \sum_{k=1}^N p_k \langle \bar{w}_t -w_t^k + w_t^k - w^*, \nabla F_k (w_t^k) \rangle \\&=  -2\eta_t \sum_{k=1}^N p_k \langle \bar{w}_t -w_t^k, \nabla F_k (w_t^k) \rangle -2\eta_t \sum_{k=1}^N p_k \langle w_t^k - w^*, \nabla F_k (w_t^k) \rangle \end{align*}

 여기에서, $\text{Assumption 2}$에 의하여 $\langle w_t^k - w^*, \nabla F_k (w_t^k) \rangle \leq - \left( F_k (w_t^k) - F_k (w^*) \right) - \frac {\mu} {2} ||w_t^k - w^*||^2$이며,

\begin{align*} \frac {1} {\eta_t} ||\bar{w}_t - w_t^k||^2 + \eta_t ||\nabla F_k (w_k^t)||^2 &\geq 2 \sqrt{||\bar{w}_t - w_t^k||^2 ||\nabla F_k (w_t^k)||^2} \; (\because \text{AM-GM}) \\&\geq -2\eta_t \sum_{k=1}^N \langle \bar{w}_t -w_t^k, \nabla F_k (w_t^k) \rangle \; (\because \text{Cauchy - Schwarz Inequality})\end{align*}

이므로, 이 두 식을 이용하여 $B_1$을 정리하면 다음과 같다.

\begin{align*} B_1 &= 2\eta_t \sum_{k=1}^N p_k \langle \bar{w}_t -w_t^k, \nabla F_k (w_t^k) \rangle -2\eta_t \sum_{k=1}^N p_k \langle w_t^k - w^*, \nabla F_k (w_t^k) \rangle \\&\leq \eta_t \sum_{k=1}^N p_k \left( \frac {1} {\eta_t} ||\bar{w}_t - w_t^k||^2 + \eta_t ||\nabla F_k (w_t^k)||^2 \right) - 2 \eta_t \sum_{k=1}^N p_k \left( F_t (w_t^k) - F_k(w^*) + \frac {\mu} {2} ||w_t^k - w^*||^2 )\right) \end{align*}

 따라서, $A_1$은 다음과 같이 정리된다.

\begin{align*} A_1 &= ||\bar{w}_t - w^* - \eta_t \bar{g_t}||^2 = ||\bar{w}_t - w^*||^2 + B_1 +  B_2 \\&\leq ||\bar{w}_t - w^*||^2 + \eta_t \sum_{k=1}^N p_k \left( \frac {1} {\eta_t} ||\bar{w}_t - w_t^k||^2 + \eta_t ||\nabla F_k (w_t^k)||^2 \right) \\&\quad - 2 \eta_t \sum_{k=1}^N p_k \left( F_k (w_t^k) - F_k(w^*) + \frac {\mu} {2} ||w_t^k - w^*||^2 )\right) + 2L \eta_t^2 \sum_{k=1}^N p_k (F_k (w_t^k) - F_k^*) \\&\leq ||\bar{w}_t - w^*||^2 + \eta_t \sum_{k=1}^N p_k \left( \frac {1} {\eta_t} ||\bar{w}_t - w_t^k||^2 + \eta_t 2L (F_k (w_t^k - F_k^*)) \right) \\&\quad - 2 \eta_t \sum_{k=1}^N p_k \left( F_k (w_t^k) - F_k(w^*) + \frac {\mu} {2} ||w_t^k - w^*||^2 )\right) + 2L \eta_t^2 \sum_{k=1}^N p_k (F_k (w_t^k) - F_k^*) \; (\because (1)) \\&= ||\bar{w}_t - w^*||^2 + \sum_{k=1}^N p_k ||\bar{w}_t - w_t^k||^2 + 2L \eta_t^2 \sum_{k=1}^N p_k (F_k (w_t^k) - F_k^*) \\&\quad - 2 \eta_t \sum_{k=1}^N p_k (F_t (w_t^k) - F_k (w^*)) - \mu \eta_t \sum_{k=1}^N p_k ||w_t^k - w^*||^2 + 2L \eta_t^2 \sum_{k=1}^N p_k (F_k (w_t^k) - F_k^*) \\&= (1 - \mu \eta_t) ||\bar{w}_t - w^*||^2 + \sum_{k=1}^N p_k ||\bar{w}_t - w_t^k||^2 \\&\quad + \underbrace{4L \eta_t^2 \sum_{k=1}^N p_k (F_k (w_t^k) - F_k^*) - 2 \eta_t \sum_{k=1}^N p_k (F_k (w_t^k) - F_k (w^*))}_{=:C} \end{align*}

 이제, $C$를 bound시키자. 우선, $\gamma_t := 2 \eta_t (1 - 2L \eta_t)$로 정의하자. $\text{Lemma 1}$의 가정에 의하여 $\eta_t \leq \frac {1} {4L}$이므로, $\gamma_t \geq 2 \eta_t \times (1 - \frac {1} {2}) = \eta_t$이며, $\gamma_t \leq 2 \eta_t$이다. 즉, $\eta_t \leq \gamma_t \leq 2 \eta_t$이다. $(2)$ 이 $\gamma_t$를 이용하여 $C$를 정리하면 다음과 같다.

\begin{align*} C &= 4L \eta_t^2 \sum_{k=1}^N p_k (F_k (w_t^k) - F_k^*) - 2 \eta_t \sum_{k=1}^N p_k (F_k (w_t^k) - F_k (w^*)) \\&= 4L \eta_t^2 \sum_{k=1}^N p_k (F_k (w_t^k) - F_k^*) - 2 \eta_t \sum_{k=1}^N p_k (F_k (w_t^k) - F_k^* + F_k^* - F_k (w^*)) \\&= 4L \eta_t^2 \sum_{k=1}^N p_k (F_k (w_t^k) - F_k^*) - 2 \eta_t \sum_{k=1}^N p_k (F_k (w_t^k) - F_k^*) + 2 \eta_t \sum_{k=1}^N p_k (F_k (w^*) - F_k^*) \\&= - 2 \eta_t(1 - 2L \eta_t) \sum_{k=1}^N p_k (F_k (w_t^k) - F_k^*) + 2 \eta_t \sum_{k=1}^N p_k (F_k (w^*) - F_k^*) \\&= - 2 \eta_t(1 - 2L \eta_t) \sum_{k=1}^N p_k (F_k (w_t^k) - F_k^*) + 2 \eta_t \sum_{k=1}^N p_k (F^* - F_k^*) \; (\because \sum_{k=1}^N p_k F_k(w^*) = \sum_{k=1}^N p_k F^*) \\&= - \gamma_t \sum_{k=1}^N p_k (F_k (w_t^k) - F_k^*) + 2 \eta_t \sum_{k=1}^N p_k (F^* - F_k^*) \; (\because \gamma_t = 2 \eta_t (1 - 2L \eta_t)) \\&= - \gamma_t \sum_{k=1}^N p_k (F_k (w_t^k) - F^* + F^* - F_k^*) + 2 \eta_t \sum_{k=1}^N p_k (F^* - F_k^*) \\&= - \gamma_t \sum_{k=1}^N p_k (F_k (w_t^k) - F^*)  - \gamma_t \sum_{k=1}^N p_k (F^* - F_k^*) + 2 \eta_t \sum_{k=1}^N p_k (F^* - F_k^*) \\&= - \gamma_t \sum_{k=1}^N p_k (F_k (w_t^k) - F^*) + (2 \eta_t- \gamma_t) \sum_{k=1}^N p_k (F^* - F_k^*) \\&= - \gamma_t \sum_{k=1}^N p_k (F_k (w_t^k) - F^*) + 4L \eta_t^2 \sum_{k=1}^N p_k (F^* - F_k^*) \; (\because \gamma_t = 2 \eta_t (1 - 2L \eta_t)) \\&= \underbrace{-\gamma_t \sum_{k=1}^N p_k (F_k (w_t^k) - F^*)}_{=: D} + 4L \eta_t^2 \Gamma \; (\because \Gamma = F^* - \sum_{k=1}^N p_k F_k^*) \end{align*}

 $D$를 bound하기 위하여 다음 식을 사용하자. $(3)$ (여기에서, $F_k$의 convexity는 $\text{Assumption 2}$로부터 자명하다.)

\begin{align*} \sum_{k=1}^N p_k (F_k (w_t^k) - F^*) &= \sum_{k=1}^N p_k (F_k (w_t^k) - F_k (\bar{w_t}) + F_k (\bar{w_t}) - F^*) \\&=  \sum_{k=1}^N p_k (F_k (w_t^k) - F_k (\bar{w_t})) + \sum_{k=1}^N p_k (F_k (\bar{w_t}) - F^*) \\&=  \sum_{k=1}^N p_k (F_k (w_t^k) - F_k (\bar{w_t})) + (F (\bar{w}_t) - F^*) \\&\geq  \sum_{k=1}^N p_k \langle \nabla F_k (\bar{w}_t), \bar{w}_t^k - \bar{w}_t \rangle + (F (\bar{w}_t) - F^*) \; (\because \text{convexity of } F_k) \\&\geq  - \frac {1} {2} \sum_{k=1}^N p_k \left[ \eta_t ||\nabla F_k(\bar{(w)_t})||^2 + \frac {1} {\eta_t} ||w_t^k - \bar{w}_t||^2 \right] + (F (\bar{w}_t) - F^*) \; (\because \text{AM-GM}) \\&\geq  - \frac {1} {2} \sum_{k=1}^N p_k \left[ 2 \eta_t L (F_k (\bar{w}_t) - F_k^*) + \frac {1} {\eta_t} ||w_t^k - \bar{w}_t||^2 \right] + (F (\bar{w}_t) - F^*) \; (\because (1)) \\&\geq  - \sum_{k=1}^N p_k \left[ \eta_t L (F_k (\bar{w}_t) - F_k^*) + \frac {1} {2 \eta_t} ||w_t^k - \bar{w}_t||^2 \right] + (F (\bar{w}_t) - F^*)           \end{align*}

 이를 사용하여 $C$를 다시 정리하면 다음과 같다.

\begin{align*} C &= -\gamma_t \sum_{k=1}^N p_k (F_k (w_t^k) - F^*) + 4L \eta_t^2 \Gamma \\&\leq \gamma_t \sum_{k=1}^N p_k \left[ \eta_t L (F_k (\bar{w}_t) - F_k^*) + \frac {1} {2 \eta_t} ||w_t^k - \bar{w}_t||^2 \right] - \gamma_t (F (\bar{w}_t) - F^*) + 4L \eta_t^2 \Gamma \; (\because (3)) \\&= \gamma_t \sum_{k=1}^N p_k \left[ \eta_t L (F_k (\bar{w}_t) - F^* + F^* - F_k^*) + \frac {1} {2 \eta_t} ||w_t^k - \bar{w}_t||^2 \right] - \gamma_t (F (\bar{w}_t) - F^*) + 4L \eta_t^2 \Gamma \\&= \gamma_t \sum_{k=1}^N p_k \eta_t L (F_k (\bar{w}_t) - F^*) + \gamma_t \sum_{k=1}^N p_k \eta_t L (F^* - F_k^*) + \gamma_t \sum_{k=1}^N p_k \frac {1} {2 \eta_t} ||w_t^k - \bar{w}_t||^2 \\&\quad - \gamma_t (F (\bar{w}_t) - F^*) + 4L \eta_t^2 \Gamma \\&= \gamma_t(\eta_t L - 1) \sum_{k=1}^N p_k (F_k (\bar{w}_t) - F^*) + \gamma_t \sum_{k=1}^N p_k \eta_t L (F^* - F_k^*) + \frac {\gamma_t} {2 \eta_t} \sum_{k=1}^N p_k  ||w_t^k - \bar{w}_t||^2 + 4L \eta_t^2 \Gamma \\&= \gamma_t(\eta_t L - 1) \sum_{k=1}^N p_k  (F_k (\bar{w}_t) - F^*) + \gamma_t \eta_t L \Gamma + \frac {\gamma_t} {2 \eta_t} \sum_{k=1}^N p_k  ||w_t^k - \bar{w}_t||^2 + 4L \eta_t^2 \Gamma \\&\quad (\because \Gamma = F^* - \sum_{k=1}^N p_k F_k^*) \\&= \underbrace{\gamma_t(\eta_t L - 1)}_{\leq 0} \sum_{k=1}^N p_k  (F_k (\bar{w}_t) - F^*) + (4 L \eta_t^2 + \gamma_t \eta_t L) \Gamma + \frac {\gamma_t} {2 \eta_t} \sum_{k=1}^N p_k  ||w_t^k - \bar{w}_t||^2 \\&\leq (4 L \eta_t^2 + \gamma_t \eta_t L) \Gamma + \frac {\gamma_t} {2 \eta_t} \sum_{k=1}^N p_k  ||w_t^k - \bar{w}_t||^2 \; (\because \eta_t \leq \frac {1} {4L} \text{ by assumption}) \\&\leq (4 L \eta_t^2 + \gamma_t \eta_t L) \Gamma + \sum_{k=1}^N p_k  ||w_t^k - \bar{w}_t||^2 \; (\because (2)) \\&\leq 6 \eta_t^2 L \Gamma + \sum_{k=1}^N p_k  ||w_t^k - \bar{w}_t||^2 \; (\because (2)) \end{align*}

 그러므로, $A_1$은 다음과 같이 정리되며,

\begin{align*} A_1 &\leq (1 - \mu \eta_t) ||\bar{w}_t - w^*||^2 + \sum_{k=1}^N p_k ||\bar{w}_t - w_t^k||^2 + C \\&\leq (1 - \mu \eta_t) ||\bar{w}_t - w^*||^2 + \sum_{k=1}^N p_k ||\bar{w}_t - w_t^k||^2 + 6 \eta_t^2 L \Gamma + \sum_{k=1}^N p_k  ||w_t^k - \bar{w}_t||^2 \\&= (1 - \mu \eta_t) ||\bar{w}_t - w^*||^2 + 2 \sum_{k=1}^N p_k ||\bar{w}_t - w_t^k||^2 + 6 \eta_t^2 L \Gamma \end{align*}

 우리가 bound하고자 하는 $||\bar{v}_{t+1} - w^*||^2$ 다음과 같이 정리된다.

\begin{align*} ||\bar{v}_{t+1} - w^*||^2 &= A_1 + 2 \eta_t \langle \bar{w}_t - w^* - \eta_t \bar{g}_t, \bar{g}_t - g_t \rangle + \eta_t^2 ||g_t - \bar{g}_t||^2 \\&\leq (1 - \mu \eta_t) ||\bar{w}_t - w^*||^2 + 2 \sum_{k=1}^N p_k ||\bar{w}_t - w_t^k||^2 + 6 \eta_t^2 L \Gamma \\&\quad + 2 \eta_t \langle \bar{w}_t - w^* - \eta_t \bar{g}_t, \bar{g}_t - g_t \rangle + \eta_t^2 ||g_t - \bar{g}_t||^2 \end{align*}

 최종적으로, 위 식의 양변에 expectation을 취해주는 것으로 증명을 마무리한다. $\square$

 

 다음 포스트에서 $\text{Lemma 2, 3}$을 마저 확인해보도록 하겠습니다.

'Federated Learning > Papers' 카테고리의 다른 글

[ICLR 2020] Convergence of FedAvg - (4)  (0) 2022.08.25
[ICLR 2020] Convergence of FedAvg - (3)  (0) 2022.08.19
[ICLR 2020] Convergence of FedAvg - (1)  (0) 2022.08.15
[MLSys 2020] FedProx - (4)  (0) 2022.08.13
[MLSys 2020] FedProx - (3)  (0) 2022.08.01
Comments