Communication-efficient algorithms for online distributed multitask learning

online optimization
distributed optimization
multitask learning
communication-efficient algorithms
Author
Affiliation

The Hong Kong Polytechnic University

Published

March 6, 2024

Modified

March 14, 2024

Abstract

This is a short review of communication-efficient algorithms for online distributed multitask learning.

Keywords

online optimization, distributed optimization, heterogeneity, communication-efficient algorithm, subgroup analysis, multitask learning

1 Introduction

Traditional machine learning paradigms aggregate all data in advance and transfer them to a centralized entity for processing and training. This offline centralized approach, however, suffers from significant challenges, including inflexibility in model updating, high communication and storage costs, heavy computational demands, and concerns regarding data privacy; for a detailed discussion, see (Verbraeken et al. 2020, sec. 2). To address the issues inherent to the offline centralized approach, and to accommodate the proliferation of intelligent devices generating massive data as well as increasing privacy concerns, online distributed learning has emerged as a paradigm shift (Predd, Kulkarni, and Poor 2006; Yang et al. 2019; T.-H. Chang et al. 2020; Verbraeken et al. 2020; S. Hu et al. 2021; X. Li, Xie, and Li 2023; Beltrán et al. 2023). This new paradigm enables multiple entities to collaboratively train models in a distributed fashion using their local online/streaming data, without sharing the data themselves, thereby significantly improving training efficiency and enhancing privacy protection (see Lian et al. 2017; also Verbraeken et al. 2020, sec. 5).

Online distributed learning can be roughly divided into two components: online optimization and distributed learning. In the field of online optimization, compared with the well-developed and long-studied online convex optimization (Hazan 2021; X. Li, Xie, and Li 2023), the exploration of online non-convex optimization remains relatively limited (Xin 2022, secs. 4, 5; Lu and Wang 2023; T.-H. Chang et al. 2020). In addition, research also focuses on how to handle different kinds of streaming data, such as the stochastic setting (Alghunaim and Yuan 2024) and the multiarmed bandit problem (Hazan 2021, sec. 6). On the other hand, the efficiency of distributed learning is impeded by constraints such as the limited and unbalanced computational resources across the nodes of the network and the substantial communication costs. These factors pose significant challenges to efficient online distributed learning and have led to a surge of research interest, especially in communication-efficient algorithm design (see X. Cao et al. 2023 and references therein) and the computation-communication trade-off (Nedic, Olshevsky, and Rabbat 2018; Berahas et al. 2019; Berahas, Bollapragada, and Gupta 2023).

The goal of distributed learning is typically to obtain a single global model, which assumes consensus among all nodes in the network. This assumption rests on the premise that the data across different nodes are independently and identically distributed (i.i.d.). However, this premise can be violated in many real-world scenarios, especially when the data are collected from multiple sources; see, for example, (Lee, Zhu, and Toscas 2015; Tong et al. 2022). The violation of this assumption, termed heterogeneity, complicates the design and analysis of algorithms, particularly gradient-based ones. For instance, decentralized gradient descent (DGD) and its variants (Kar, Moura, and Ramanan 2012; Nedic and Ozdaglar 2009; J. Chen and Sayed 2012), which combine average consensus (Olfati-Saber, Fax, and Murray 2007) with gradient descent, may diverge or converge to biased solutions when node-specific gradients differ significantly from the global gradient due to heterogeneity. To address this issue, numerous studies have been conducted over the past decades, most notably on the well-known gradient tracking technique; see, for example, (Zhu and Martínez 2010; Xin and Khan 2018; Pu et al. 2021; Nedić, Olshevsky, and Shi 2017; Lorenzo and Scutari 2016; Qu and Li 2018; B. Li et al. 2020; Alghunaim and Yuan 2024). Its basic idea is that each node maintains a local estimate of the global gradient in order to eliminate the influence of heterogeneity. Under the assumption of bounded heterogeneity, i.e., that the differences between the local gradients and the global gradient are bounded from above, the gradient tracking technique achieves better performance; see (Xin 2022, secs. 1.4, 1.5) for a detailed discussion and a review of the related literature. The assumption of bounded heterogeneity is essential in the convergence analysis of DGD and decentralized stochastic gradient descent (DSGD), especially in non-convex settings: a counterexample in (T.-H. Chang et al. 2020) shows that, without bounded heterogeneity, DSGD diverges for any constant step-size. Many efforts have been made to remove this kind of assumption, such as the GT-VR framework in (Xin 2022, chaps. 2, 3; Xin, Kar, and Khan 2020).
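
To make the gradient-tracking idea concrete, the following minimal sketch runs decentralized gradient tracking on a toy least-squares problem over a ring network. The data, mixing weights, step size, and iteration count are illustrative assumptions of ours, not taken from any of the cited works.

```python
import numpy as np

# Gradient tracking on a toy problem: node i holds f_i(x) = 0.5*||A_i x - b_i||^2
# and the nodes cooperate to minimize the sum over a ring network.
rng = np.random.default_rng(0)
n_nodes, dim = 5, 3
A = [rng.standard_normal((10, dim)) for _ in range(n_nodes)]
b = [rng.standard_normal(10) for _ in range(n_nodes)]

def grad(i, x):
    return A[i].T @ (A[i] @ x - b[i])

# Doubly stochastic mixing matrix for the ring graph.
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

alpha = 0.01                                            # step size (assumed)
x = np.zeros((n_nodes, dim))                            # local models
y = np.array([grad(i, x[i]) for i in range(n_nodes)])   # local gradient trackers

for _ in range(1000):
    x_new = W @ x - alpha * y            # mix neighbours' models, step along tracker
    g_old = np.array([grad(i, x[i]) for i in range(n_nodes)])
    g_new = np.array([grad(i, x_new[i]) for i in range(n_nodes)])
    y = W @ y + g_new - g_old            # tracker keeps following the average gradient
    x = x_new

# All local models should now be close to the minimizer of the global sum.
x_star = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)[0]
print(np.max(np.linalg.norm(x - x_star, axis=1)))
```

Even on this small example the role of the tracker \(y\) is visible: replacing it with the purely local gradient recovers DGD, which under a constant step size settles at a bias that grows with the discrepancy between local and global gradients.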

So-called personalized learning offers an alternative approach to handling unbounded heterogeneity by allowing nodes to train personalized models distinct from the global model (Tan et al. 2023). For instance, as a specific case of personalized learning, clustered federated learning (Ghosh et al. 2022; Sattler, Muller, and Samek 2021; Armacki et al. 2024) aims to partition the nodes of the network into \(K\) mutually disjoint clusters, with the number of clusters \(K\) known beforehand. Within each cluster, the nodes share the same model, resulting in \(K\) different models.1 In settings where the network is fully decentralized and the number of clusters is unknown in advance, clustered federated learning can be further extended to subgroup analysis, which originated in statistics (S. Ma and Huang 2017). Following the pioneering work (S. Ma and Huang 2017), subgroup analysis is typically formulated as an optimization problem with a pairwise fusion regularizer. It is precisely because of this pairwise fusion regularizer that subgroup analysis appears in the literature under various names, including network lasso (Hallac, Leskovec, and Boyd 2015), spatial heterogeneity detection/identification (F. Li and Sang 2019; X. Zhang, Liu, and Zhu 2022), spatial cost minimization (Y. Lin et al. 2022; P. Li et al. 2023), network cost minimization (X. Cao and Liu 2018; X. Cao and Liu 2021), pairwise constrained optimization (X. Cao and Basar 2020), and the broader topic of multitask learning (Caruana 1997; Yu Zhang and Yang 2022; Jacob, Bach, and Vert 2008; Zhou, Chen, and Ye 2011; Smith et al. 2017; Koppel, Sadler, and Ribeiro 2017; X. Cao and Liu 2017; Mortaheb, Vahapoglu, and Ulukus 2022; H. Hu et al. 2023; H. Ma, Guo, and Lau 2023; Okazaki and Kawano 2023; Singh et al. 2024; Yan and Cao 2024).

1 It should be noted that clustered federated learning differs from conventional clustering in unsupervised learning in that the former clusters models obtained by training on data distributed over different nodes, while the latter clusters the data directly.
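
For concreteness, the pairwise fusion formulation of subgroup analysis in (S. Ma and Huang 2017) can be sketched as follows (the notation here is ours and slightly simplified): \[ \min_{\beta,\,\mu_1,\dots,\mu_n} \; \frac{1}{2}\sum_{i=1}^{n}\bigl(y_i - z_i^{\top}\beta - \mu_i\bigr)^2 \;+\; \sum_{1 \le i < j \le n} p\bigl(|\mu_i - \mu_j|;\,\lambda\bigr), \] where \(y_i\) and \(z_i\) are the response and covariates of subject \(i\), \(\mu_i\) is a subject-specific intercept, \(\beta\) collects the common regression coefficients, and \(p(\cdot;\lambda)\) is a (typically concave) penalty such as SCAD or MCP. Subjects whose estimated intercepts \(\mu_i\) coincide are assigned to the same subgroup, and it is the pairwise fusion term that encourages such coincidences.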

The prevalence of unbounded heterogeneity in the real world has led to increased research interest in clustered multitask learning, epitomized by subgroup analysis. However, the lack of a methodological framework analogous to gradient tracking has resulted in a notable paucity of research concerning online, communication-efficient algorithms for this topic.

In Section 2, we present related work in the areas surrounding the aforementioned problem, including intuitive explanations and crucial techniques. A discussion of the problem formulation and its challenges is then provided in Section 3.

3 Problem formulation and challenges

In this section, we provide a brief introduction to the problem we consider: how to design communication-efficient algorithms for online distributed multitask learning. Since the mathematical formulation of multitask learning varies from application to application, we only present two widely used formulations that cover many applications, including consensual distributed optimization, network lasso, and subgroup analysis.

The first formulation is an unconstrained optimization problem of the form \[ \min_{\mathbf{x}} \sum_{k=1}^K f_k(x_k) + \sum_{(j, k) \in \mathcal{E}} g_{jk}(x_j, x_k), \tag{4}\] where \(\mathbf{x} := (x_k^{\top})_{k=1}^K\), \(f_k\) is the local cost function at the \(k\)-th node, and \(g_{jk}\) is the cost function associated with the edge \((j, k)\). Alternatively, one may consider a constrained optimization problem of the form \[ \begin{aligned} \min_{\mathbf{x}} & \quad \sum_{k=1}^K f_k(x_k) \\ \text{s.t. } & \quad h_{jk}(x_j, x_k) \leq \gamma_{jk} \quad \forall \, (j, k) \in \mathcal{E}, \end{aligned} \tag{5}\] where \(h_{jk}\) is the cost function associated with the edge \((j, k)\).

Next we give some examples.

  • If we let \(h_{jk}(x_j, x_k)\) be a norm of the difference \(x_j - x_k\) and let \(\gamma_{jk} = 0\), then problem (5) reduces to consensual distributed optimization.

  • If we let \(g_{jk}(x_j, x_k) = \|x_j - x_k\|_2\), then (4) reduces to network lasso in (Hallac, Leskovec, and Boyd 2015).

  • If we let \(f_k(x_k) = \frac{1}{2}\| A_k x_k - b_k \|_2^2\) with some local data \((A_k, b_k)\) and let \(g_{jk}(x_j, x_k) = \|x_j - x_k\|_1\), then (4) reduces to the subgroup analysis problem in (F. Li and Sang 2019; X. Zhang, Liu, and Zhu 2022).
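
To make formulation (4) concrete, the following sketch simply evaluates the subgroup-analysis instance above on a toy graph; the graph and data are illustrative assumptions of ours.

```python
import numpy as np

# Evaluate objective (4) with quadratic local costs f_k and an l1 pairwise
# fusion term g_jk, i.e. the subgroup-analysis example above.
rng = np.random.default_rng(1)
K, dim = 4, 2
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]      # edge set E (a ring, assumed)
A = [rng.standard_normal((8, dim)) for _ in range(K)]
b = [rng.standard_normal(8) for _ in range(K)]

def f(k, x_k):
    # local cost f_k(x_k) = 0.5 * ||A_k x_k - b_k||_2^2
    return 0.5 * np.sum((A[k] @ x_k - b[k]) ** 2)

def g(x_j, x_k):
    # edge cost g_jk(x_j, x_k) = ||x_j - x_k||_1
    return np.sum(np.abs(x_j - x_k))

def objective(x):
    # x has shape (K, dim): one local model per node
    return sum(f(k, x[k]) for k in range(K)) + sum(g(x[j], x[k]) for j, k in edges)

print(objective(np.zeros((K, dim))))
```

Note that the local terms \(f_k\) are separable across nodes, while the edge terms couple neighbouring nodes; this coupling is exactly what distinguishes (4) and (5) from consensual distributed optimization.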

The main challenge stems from the edge functions \(g_{jk}\) and \(h_{jk}\), which make the problem non-consensual, so many existing algorithms for consensual distributed optimization cannot be applied directly. One possible way to solve (4) is to transform it into a consensual distributed optimization problem by letting each node maintain a copy of the whole variable \(\mathbf{x}\), as in (Jiang, Mukherjee, and Sarkar 2018). Although this allows us to use consensual distributed optimization techniques, it significantly increases the problem scale and does not exploit the problem structure.
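
The lifting just described can be sketched as follows (our notation, meant only to illustrate the idea rather than the exact formulation of the cited work): each node \(k\) keeps a full copy \(\mathbf{x}^{(k)}\) of all task variables and the network solves \[ \min_{\mathbf{x}^{(1)},\dots,\mathbf{x}^{(K)}} \; \sum_{k=1}^{K} F_k\bigl(\mathbf{x}^{(k)}\bigr) \quad \text{s.t.} \quad \mathbf{x}^{(j)} = \mathbf{x}^{(k)} \quad \forall\,(j,k)\in\mathcal{E}, \] where \(F_k\) collects \(f_k\) together with a share of the edge costs incident to node \(k\), chosen so that \(\sum_{k=1}^{K} F_k(\mathbf{x})\) recovers the objective in (4). Any consensual distributed algorithm then applies, but each node must store and communicate a variable that is \(K\) times larger than its own task variable, which is precisely the scalability issue noted above.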

Viewing the problem as a regularized separable problem, it is natural to consider the celebrated ADMM; efforts in this direction include (Hallac, Leskovec, and Boyd 2015; X. Cao and Liu 2017; X. Cao and Liu 2018; F. Li and Sang 2019; X. Zhang, Liu, and Zhu 2022). However, the ADMM variants in these papers still require full communication in each iteration, with the exception of two minimum spanning tree (MST)-based methods (F. Li and Sang 2019; X. Zhang, Liu, and Zhu 2022); these two methods, in turn, implicitly assume that the data at each node are homogeneous and sufficient to provide a good initial point, and hence fail to handle heterogeneity and online settings.
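
To make the discussion concrete, the edge-based splitting underlying such ADMM approaches can be sketched as follows (our notation; not the exact form used in any single cited paper): introduce a pair of copies \(z_{jk}, z_{kj}\) for every edge and solve \[ \min_{\mathbf{x},\,\mathbf{z}} \; \sum_{k=1}^{K} f_k(x_k) + \sum_{(j,k)\in\mathcal{E}} g_{jk}(z_{jk}, z_{kj}) \quad \text{s.t.} \quad z_{jk} = x_j,\; z_{kj} = x_k \quad \forall\,(j,k)\in\mathcal{E}. \] The resulting augmented Lagrangian separates nicely: the \(x\)-update decouples across nodes, the \(z\)-update decouples across edges (a proximal step involving \(g_{jk}\)), and the dual update is local to each edge. The price is that every iteration requires each pair of neighbouring nodes to exchange their current iterates, which is the full per-iteration communication mentioned above.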

Some other attempts can be found in the literature. For example, (X. Cao and Liu 2021) uses a distributed Newton’s method to solve (4). However, this Newton’s method requires \(f_k\) and \(g_{jk}\) to be twice differentiable for all \(k\) and \((j, k) \in \mathcal{E}\), which is restrictive in practice. Saddle point methods are employed in (Koppel, Sadler, and Ribeiro 2017; X. Cao and Basar 2020; Singh et al. 2024; Yan and Cao 2024) to solve (5), where the smoothness of \(f_k\) and \(h_{jk}\) for all \(k\) and \((j, k) \in \mathcal{E}\) is required. However, neither the network lasso problem nor the subgroup analysis problem satisfies these conditions.
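
For reference, the saddle-point reformulation of (5) used in this line of work can be sketched as the Lagrangian problem (our notation) \[ \min_{\mathbf{x}} \; \max_{\lambda_{jk} \ge 0} \; \sum_{k=1}^{K} f_k(x_k) + \sum_{(j,k)\in\mathcal{E}} \lambda_{jk}\bigl(h_{jk}(x_j, x_k) - \gamma_{jk}\bigr), \] typically tackled by alternating a primal (stochastic sub)gradient step on \(\mathbf{x}\) with a projected dual ascent step on the multipliers \(\lambda_{jk}\). The gradient-based primal step is where the smoothness of \(f_k\) and \(h_{jk}\) enters, and it is exactly this requirement that the nonsmooth penalties in network lasso and subgroup analysis violate.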

To conclude, research on these two problems remains limited so far.

References

Alghunaim, Sulaiman A., and Kun Yuan. 2024. “An Enhanced Gradient-Tracking Bound for Distributed Online Stochastic Convex Optimization.” Signal Processing 217 (April): 109345. https://doi.org/10.1016/j.sigpro.2023.109345.
Armacki, Aleksandar, Dragana Bajović, Dušan Jakovetić, and Soummya Kar. 2024. “A One-Shot Framework for Distributed Clustered Learning in Heterogeneous Environments.” IEEE Transactions on Signal Processing 72: 636–51. https://doi.org/10.1109/tsp.2023.3343561.
Beltrán, Enrique Tomás Martínez, Mario Quiles Pérez, Pedro Miguel Sánchez Sánchez, Sergio López Bernal, Gérôme Bovet, Manuel Gil Pérez, Gregorio Martínez Pérez, and Alberto Huertas Celdrán. 2023. “Decentralized Federated Learning: Fundamentals, State of the Art, Frameworks, Trends, and Challenges.” IEEE Communications Surveys & Tutorials 25 (4): 2983–3013. https://doi.org/10.1109/comst.2023.3315746.
Berahas, Albert S., Raghu Bollapragada, and Shagun Gupta. 2023. “Balancing Communication and Computation in Gradient Tracking Algorithms for Decentralized Optimization.” 2023. https://arxiv.org/abs/2303.14289v2.
Berahas, Albert S., Raghu Bollapragada, Nitish Shirish Keskar, and Ermin Wei. 2019. “Balancing Communication and Computation in Distributed Optimization.” IEEE Transactions on Automatic Control 64 (8): 3141–55. https://doi.org/10.1109/tac.2018.2880407.
Borsos, Zalán, Sebastian Curi, Kfir Yehuda Levy, and Andreas Krause. 2019. “Online Variance Reduction with Mixtures.” In International Conference on Machine Learning, 705–14. PMLR.
Borsos, Zalán, Andreas Krause, and Kfir Y Levy. 2018. “Online Variance Reduction for Stochastic Optimization.” In Conference on Learning Theory, 324–57. PMLR.
Boyd, S., A. Ghosh, B. Prabhakar, and D. Shah. 2006. “Randomized Gossip Algorithms.” IEEE Transactions on Information Theory 52 (6): 2508–30. https://doi.org/10.1109/tit.2006.874516.
Boyd, Stephen, and Lieven Vandenberghe. 2004. Convex Optimization. Cambridge University Press. https://doi.org/10.1017/cbo9780511804441.
Cao, Ming, Daniel A Spielman, and Edmund M Yeh. 2006. “Accelerated Gossip Algorithms for Distributed Computation.” In Proc. Of the 44th Annual Allerton Conference on Communication, Control, and Computation, 952–59.
Cao, Xuanyu, and Tamer Basar. 2020. “Decentralized Multi-Agent Stochastic Optimization with Pairwise Constraints and Quantized Communications.” IEEE Transactions on Signal Processing 68: 3296–3311. https://doi.org/10.1109/tsp.2020.2997394.
———. 2021. “Decentralized Online Convex Optimization with Event-Triggered Communications.” IEEE Transactions on Signal Processing 69: 284–99. https://doi.org/10.1109/tsp.2020.3044843.
Cao, Xuanyu, Tamer Başar, Suhas Diggavi, Yonina C. Eldar, Khaled B. Letaief, H. Vincent Poor, and Junshan Zhang. 2023. “Communication-Efficient Distributed Learning: An Overview.” IEEE Journal on Selected Areas in Communications 41 (4): 851–73. https://doi.org/10.1109/jsac.2023.3242710.
Cao, Xuanyu, and K. J. Ray Liu. 2017. “Decentralized Sparse Multitask RLS over Networks.” IEEE Transactions on Signal Processing 65 (23): 6217–32. https://doi.org/10.1109/tsp.2017.2750110.
———. 2021. “Distributed Newton’s Method for Network Cost Minimization.” IEEE Transactions on Automatic Control 66 (3): 1278–85. https://doi.org/10.1109/tac.2020.2989266.
Cao, Xuanyu, and K. J. Ray Liu. 2018. “Distributed Linearized ADMM for Network Cost Minimization.” IEEE Transactions on Signal and Information Processing over Networks 4 (3): 626–38. https://doi.org/10.1109/tsipn.2018.2806841.
Caruana, Rich. 1997. “Multitask Learning.” Machine Learning 28 (1): 41–75. https://doi.org/10.1023/A:1007379606734.
Chang, Ting-Jui, and Shahin Shahrampour. 2021. “On Online Optimization: Dynamic Regret Analysis of Strongly Convex and Smooth Problems.” In Proceedings of the AAAI Conference on Artificial Intelligence, 35:6966–73. 8.
Chang, Tsung-Hui, Mingyi Hong, Hoi-To Wai, Xinwei Zhang, and Songtao Lu. 2020. “Distributed Learning in the Nonconvex World: From Batch Data to Streaming and Beyond.” IEEE Signal Processing Magazine 37 (3): 26–38. https://doi.org/10.1109/msp.2020.2970170.
Chen, Jianshu, and Ali H. Sayed. 2012. “Diffusion Adaptation Strategies for Distributed Optimization and Learning over Networks.” IEEE Transactions on Signal Processing 60 (8): 4289–4305. https://doi.org/10.1109/tsp.2012.2198470.
Chen, Mingzhe, Nir Shlezinger, H. Vincent Poor, Yonina C. Eldar, and Shuguang Cui. 2021. “Communication-Efficient Federated Learning.” Proceedings of the National Academy of Sciences 118 (17). https://doi.org/10.1073/pnas.2024789118.
Chen, Tianlong, Xiaohan Chen, Wuyang Chen, Howard Heaton, Jialin Liu, Zhangyang Wang, and Wotao Yin. 2022. “Learning to Optimize: A Primer and A Benchmark.” Journal of Machine Learning Research 23: 189:1–59. http://jmlr.org/papers/v23/21-0308.html.
Dimakis, Alexandros G, Soummya Kar, José MF Moura, Michael G Rabbat, and Anna Scaglione. 2010. “Gossip Algorithms for Distributed Signal Processing.” Proceedings of the IEEE 98 (11): 1847–64.
Dong, Xingrong, Zhaoxian Wu, Qing Ling, and Zhi Tian. 2024. “Byzantine-Robust Distributed Online Learning: Taming Adversarial Participants in an Adversarial Environment.” IEEE Transactions on Signal Processing 72: 235–48. https://doi.org/10.1109/tsp.2023.3340028.
Elgabli, Anis, Jihong Park, Amrit S. Bedi, Mehdi Bennis, and Vaneet Aggarwal. 2020. “GADMM: Fast and Communication Efficient Framework for Distributed Machine Learning.” Journal of Machine Learning Research 21: 76:1–39. http://jmlr.org/papers/v21/19-718.html.
Elgabli, Anis, Jihong Park, Amrit Singh Bedi, Chaouki Ben Issaid, Mehdi Bennis, and Vaneet Aggarwal. 2021. “Q-GADMM: Quantized Group ADMM for Communication Efficient Decentralized Machine Learning.” IEEE Transactions on Communications 69 (1): 164–81. https://doi.org/10.1109/tcomm.2020.3026398.
Fang, Huang, Nicholas J. A. Harvey, Victor S. Portella, and Michael P. Friedlander. 2022. “Online Mirror Descent and Dual Averaging: Keeping Pace in the Dynamic Case.” Journal of Machine Learning Research 23 (121): 1–38. http://jmlr.org/papers/v23/21-1027.html.
Ghosh, Avishek, Jichan Chung, Dong Yin, and Kannan Ramchandran. 2022. “An Efficient Framework for Clustered Federated Learning.” IEEE Transactions on Information Theory 68 (12): 8076–91. https://doi.org/10.1109/tit.2022.3192506.
Gower, Robert M., Mark Schmidt, Francis Bach, and Peter Richtarik. 2020. “Variance-Reduced Methods for Machine Learning.” Proceedings of the IEEE 108 (11): 1968–83. https://doi.org/10.1109/jproc.2020.3028013.
Hallac, David, Jure Leskovec, and Stephen Boyd. 2015. “Network Lasso: Clustering and Optimization in Large Graphs.” In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’15. ACM. https://doi.org/10.1145/2783258.2783313.
Hazan, Elad. 2021. Introduction to Online Convex Optimization. now Publishers Inc. https://doi.org/10.1561/9781680831719.
Hazan, Elad, Amit Agarwal, and Satyen Kale. 2007. “Logarithmic Regret Algorithms for Online Convex Optimization.” Machine Learning 69: 169–92.
Hazan, Elad, and Haipeng Luo. 2016. “Variance-Reduced and Projection-Free Stochastic Optimization.” In International Conference on Machine Learning, 1263–71. PMLR.
He, Xuechao, Heng Zhu, and Qing Ling. 2023. “C-RSA: Byzantine-Robust and Communication-Efficient Distributed Learning in the Non-Convex and Non-IID Regime.” Signal Processing 213 (December): 109222. https://doi.org/10.1016/j.sigpro.2023.109222.
Hu, Haoyang, Youlong Wu, Yuanming Shi, Songze Li, Chunxiao Jiang, and Wei Zhang. 2023. “Communication-Efficient Coded Computing for Distributed Multi-Task Learning.” IEEE Transactions on Communications 71 (7): 3861–75. https://doi.org/10.1109/tcomm.2023.3275194.
Hu, Shuyan, Xiaojing Chen, Wei Ni, Ekram Hossain, and Xin Wang. 2021. “Distributed Machine Learning for Wireless Communication Networks: Techniques, Architectures, and Applications.” IEEE Communications Surveys & Tutorials 23 (3): 1458–93. https://doi.org/10.1109/comst.2021.3086014.
Jacob, Laurent, Francis R. Bach, and Jean-Philippe Vert. 2008. “Clustered Multi-Task Learning: A Convex Formulation.” In Advances in Neural Information Processing Systems 21, Proceedings of the Twenty-Second Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 8-11, 2008, 745–52. https://proceedings.neurips.cc/paper/2008/hash/fccb3cdc9acc14a6e70a12f74560c026-Abstract.html.
Jaggi, Martin, Virginia Smith, Martin Takác, Jonathan Terhorst, Sanjay Krishnan, Thomas Hofmann, and Michael I. Jordan. 2014. “Communication-Efficient Distributed Dual Coordinate Ascent.” In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, 3068–76. https://proceedings.neurips.cc/paper/2014/hash/894b77f805bd94d292574c38c5d628d5-Abstract.html.
Jiang, Zhanhong, Kushal Mukherjee, and Soumik Sarkar. 2018. “On Consensus-Disagreement Tradeoff in Distributed Optimization.” In 2018 Annual American Control Conference (ACC). IEEE. https://doi.org/10.23919/acc.2018.8430883.
Kar, Soummya, José M. F. Moura, and Kavita Ramanan. 2012. “Distributed Parameter Estimation in Sensor Networks: Nonlinear Observation Models and Imperfect Communication.” IEEE Transactions on Information Theory 58 (6): 3575–605. https://doi.org/10.1109/tit.2012.2191450.
Kaya, Ege C., Mehmet Berk Sahin, and Abolfazl Hashemi. 2023. “Communication-Efficient Zeroth-Order Distributed Online Optimization: Algorithm, Theory, and Applications.” IEEE Access 11: 61173–91. https://doi.org/10.1109/access.2023.3284891.
Koppel, Alec, Brian M. Sadler, and Alejandro Ribeiro. 2017. “Proximity Without Consensus in Online Multiagent Optimization.” IEEE Transactions on Signal Processing 65 (12): 3062–77. https://doi.org/10.1109/tsp.2017.2686368.
Lamport, Leslie, Robert E. Shostak, and Marshall C. Pease. 1982. “The Byzantine Generals Problem.” ACM Trans. Program. Lang. Syst. 4 (3): 382–401. https://doi.org/10.1145/357172.357176.
Lan, Guanghui, Soomin Lee, and Yi Zhou. 2020. “Communication-Efficient Algorithms for Decentralized and Stochastic Optimization.” Mathematical Programming 180 (1–2): 237–84. https://doi.org/10.1007/s10107-018-1355-4.
Lee, D.‐J., Z. Zhu, and P. Toscas. 2015. “Spatio‐temporal Functional Data Analysis for Wireless Sensor Networks Data.” Environmetrics 26 (5): 354–62. https://doi.org/10.1002/env.2344.
Lesage-Landry, Antoine, Joshua A. Taylor, and Iman Shames. 2021. “Second-Order Online Nonconvex Optimization.” IEEE Transactions on Automatic Control 66 (10): 4866–72. https://doi.org/10.1109/tac.2020.3040372.
Li, Boyue, Shicong Cen, Yuxin Chen, and Yuejie Chi. 2020. “Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction.” Journal of Machine Learning Research 21: 180:1–51. http://jmlr.org/papers/v21/20-210.html.
Li, Furong, and Huiyan Sang. 2019. “Spatial Homogeneity Pursuit of Regression Coefficients for Large Datasets.” Journal of the American Statistical Association 114 (527): 1050–62. https://doi.org/10.1080/01621459.2018.1529595.
Li, Liping, Wei Xu, Tianyi Chen, Georgios B. Giannakis, and Qing Ling. 2019. “RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets.” In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, the Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, the Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, 1544–51. https://doi.org/10.1609/AAAI.V33I01.33011544.
Li, Pengfei, Jianyi Yang, Adam Wierman, and Shaolei Ren. 2023. “Learning-Augmented Decentralized Online Convex Optimization in Networks.” 2023. https://arxiv.org/abs/2306.10158v2.
Li, Xiuxian, Lihua Xie, and Na Li. 2023. “A Survey on Distributed Online Optimization and Online Games.” Annual Reviews in Control 56: 100904. https://doi.org/10.1016/j.arcontrol.2023.100904.
Lian, Xiangru, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, and Ji Liu. 2017. “Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent.” In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, 5330–40. https://proceedings.neurips.cc/paper/2017/hash/f75526659f31040afeb61cb7133e4e6d-Abstract.html.
Liao, Yiwei, Zhuorui Li, Kun Huang, and Shi Pu. 2022. “A Compressed Gradient Tracking Method for Decentralized Optimization with Linear Convergence.” IEEE Transactions on Automatic Control 67 (10): 5622–29. https://doi.org/10.1109/tac.2022.3180695.
Lin, Sen, Daouda Sow, Kaiyi Ji, Yingbin Liang, and Ness B. Shroff. 2023. “Non-Convex Bilevel Optimization with Time-Varying Objective Functions.” In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023. http://papers.nips.cc/paper_files/paper/2023/hash/5ee60ca5686bbcf756e56a6c75e66f32-Abstract-Conference.html.
Lin, Yiheng, Judy Gan, Guannan Qu, Yash Kanoria, and Adam Wierman. 2022. “Decentralized Online Convex Optimization in Networked Systems.” In International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, 13356–93. https://proceedings.mlr.press/v162/lin22c.html.
Liu, Huikang, Jiaojiao Zhang, Anthony Man-Cho So, and Qing Ling. 2023. “A Communication-Efficient Decentralized Newton’s Method with Provably Faster Convergence.” IEEE Transactions on Signal and Information Processing over Networks 9: 427–41. https://doi.org/10.1109/tsipn.2023.3290397.
Liu, Sijia, Jie Chen, Pin-Yu Chen, and Alfred Hero. 2018. “Zeroth-Order Online Alternating Direction Method of Multipliers: Convergence Analysis and Applications.” In International Conference on Artificial Intelligence and Statistics, 288–97. PMLR.
Liu, Yuanyuan, Jiacheng Geng, Fanhua Shang, Weixin An, Hongying Liu, and Qi Zhu. 2022. “Loopless Variance Reduced Stochastic ADMM for Equality Constrained Problems in IoT Applications.” IEEE Internet of Things Journal 9 (3): 2293–303. https://doi.org/10.1109/jiot.2021.3095561.
Liu, Yuanyuan, Fanhua Shang, and James Cheng. 2017. “Accelerated Variance Reduced Stochastic ADMM.” In Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 31. 1.
Liu, Yuanyuan, Fanhua Shang, Hongying Liu, Lin Kong, Licheng Jiao, and Zhouchen Lin. 2021. “Accelerated Variance Reduction Stochastic ADMM for Large-Scale Machine Learning.” IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (12): 4242–55. https://doi.org/10.1109/tpami.2020.3000512.
Lorenzo, Paolo Di, and Gesualdo Scutari. 2016. “NEXT: In-Network Nonconvex Optimization.” IEEE Transactions on Signal and Information Processing over Networks 2 (2): 120–36. https://doi.org/10.1109/tsipn.2016.2524588.
Lu, Kaihong, and Long Wang. 2023. “Online Distributed Optimization with Nonconvex Objective Functions via Dynamic Regrets.” IEEE Transactions on Automatic Control 68 (11): 6509–24. https://doi.org/10.1109/tac.2023.3239432.
Ma, Haoyu, Huayan Guo, and Vincent K. N. Lau. 2023. “Communication-Efficient Federated Multitask Learning over Wireless Networks.” IEEE Internet of Things Journal 10 (1): 609–24. https://doi.org/10.1109/jiot.2022.3201310.
Ma, Shujie, and Jian Huang. 2017. “A Concave Pairwise Fusion Approach to Subgroup Analysis.” Journal of the American Statistical Association 112 (517): 410–23. https://doi.org/10.1080/01621459.2016.1148039.
Maass, Alejandro I., Chris Manzie, Dragan Nešić, Jonathan H. Manton, and Iman Shames. 2022. “Tracking and Regret Bounds for Online Zeroth-Order Euclidean and Riemannian Optimization.” SIAM Journal on Optimization 32 (2): 445–69. https://doi.org/10.1137/21m1405551.
Mao, Xianghui, Kun Yuan, Yubin Hu, Yuantao Gu, Ali H. Sayed, and Wotao Yin. 2020. “Walkman: A Communication-Efficient Random-Walk Algorithm for Decentralized Optimization.” IEEE Transactions on Signal Processing 68: 2513–28. https://doi.org/10.1109/tsp.2020.2983167.
Molinaro, Marco. 2021. “Robust Algorithms for Online Convex Problems via Primal-Dual.” In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), 2078–92. Society for Industrial; Applied Mathematics. https://doi.org/10.1137/1.9781611976465.124.
Mortaheb, Matin, Cemil Vahapoglu, and Sennur Ulukus. 2022. “Personalized Federated Multi-Task Learning over Wireless Fading Channels.” Algorithms 15 (11): 421. https://doi.org/10.3390/A15110421.
Nedic, Angelia, Alex Olshevsky, and Michael G. Rabbat. 2018. “Network Topology and Communication-Computation Tradeoffs in Decentralized Optimization.” Proceedings of the IEEE 106 (5): 953–76. https://doi.org/10.1109/jproc.2018.2817461.
Nedic, Angelia, and Asuman Ozdaglar. 2009. “Distributed Subgradient Methods for Multi-Agent Optimization.” IEEE Transactions on Automatic Control 54 (1): 48–61. https://doi.org/10.1109/tac.2008.2009515.
Nedić, Angelia, Alex Olshevsky, and Wei Shi. 2017. “Achieving Geometric Convergence for Distributed Optimization over Time-Varying Graphs.” SIAM Journal on Optimization 27 (4): 2597–2633. https://doi.org/10.1137/16m1084316.
Niu, Youcheng, Jinming Xu, Ying Sun, Yan Huang, and Li Chai. 2023. “Distributed Stochastic Bilevel Optimization: Improved Complexity and Heterogeneity Analysis.” 2023. https://arxiv.org/abs/2312.14690v2.
Nocedal, Jorge, and Stephen J. Wright. 2006. Numerical Optimization. Second. Springer Series in Operations Research and Financial Engineering. Springer, New York.
Okazaki, Akira, and Shuichi Kawano. 2023. “Multi-Task Learning Regression via Convex Clustering.” 2023. https://arxiv.org/abs/2304.13342v1.
Olfati-Saber, Reza, J. Alex Fax, and Richard M. Murray. 2007. “Consensus and Cooperation in Networked Multi-Agent Systems.” Proceedings of the IEEE 95 (1): 215–33. https://doi.org/10.1109/jproc.2006.887293.
Ozkara, Kaan, Navjot Singh, Deepesh Data, and Suhas N. Diggavi. 2021. “QuPeD: Quantized Personalization via Distillation with Applications to Federated Learning.” In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, 3622–34. https://proceedings.neurips.cc/paper/2021/hash/1dba3025b159cd9354da65e2d0436a31-Abstract.html.
Predd, Joel B., Sanjeev B. Kulkarni, and H. Vincent Poor. 2006. “Distributed Learning in Wireless Sensor Networks.” IEEE Signal Processing Magazine 23 (4): 56–69. https://doi.org/10.1109/msp.2006.1657817.
Pu, Shi, Wei Shi, Jinming Xu, and Angelia Nedic. 2021. “Push–Pull Gradient Methods for Distributed Optimization in Networks.” IEEE Transactions on Automatic Control 66 (1): 1–16. https://doi.org/10.1109/tac.2020.2972824.
Qu, Guannan, and Na Li. 2018. “Harnessing Smoothness to Accelerate Distributed Optimization.” IEEE Transactions on Control of Network Systems 5 (3): 1245–60. https://doi.org/10.1109/tcns.2017.2698261.
Sattler, Felix, Klaus-Robert Muller, and Wojciech Samek. 2021. “Clustered Federated Learning: Model-Agnostic Distributed Multitask Optimization Under Privacy Constraints.” IEEE Transactions on Neural Networks and Learning Systems 32 (8): 3710–22. https://doi.org/10.1109/tnnls.2020.3015958.
Scavuzzo, Lara, Karen Aardal, Andrea Lodi, and Neil Yorke-Smith. 2024. “Machine Learning Augmented Branch and Bound for Mixed Integer Linear Programming.” 2024. https://arxiv.org/abs/2402.05501v1.
Schraudolph, Nicol N, Jin Yu, and Simon Günter. 2007. “A Stochastic Quasi-Newton Method for Online Convex Optimization.” In Artificial Intelligence and Statistics, 436–43. PMLR.
Shalev-Shwartz, Shai, and Yoram Singer. 2007. “A Primal-Dual Perspective of Online Learning Algorithms.” Machine Learning 69: 115–42.
Shalev-Shwartz, Shai, and Tong Zhang. 2013. “Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization.” Journal of Machine Learning Research 14 (1).
Shen, Lingqing, Nam Ho-Nguyen, and Fatma Kılınç-Karzan. 2022. “An Online Convex Optimization-Based Framework for Convex Bilevel Optimization.” Mathematical Programming 198 (2): 1519–82. https://doi.org/10.1007/s10107-022-01894-5.
Shi, Wei, Qing Ling, Gang Wu, and Wotao Yin. 2015. “EXTRA: An Exact First-Order Algorithm for Decentralized Consensus Optimization.” SIAM Journal on Optimization 25 (2): 944–66. https://doi.org/10.1137/14096668x.
Singh, Navjot, Xuanyu Cao, Suhas Diggavi, and Tamer Başar. 2024. “Decentralized Multi-Task Stochastic Optimization with Compressed Communications.” Automatica 159 (January): 111363. https://doi.org/10.1016/j.automatica.2023.111363.
Smith, Virginia, Chao-Kai Chiang, Maziar Sanjabi, and Ameet Talwalkar. 2017. “Federated Multi-Task Learning.” In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, 4424–34. https://proceedings.neurips.cc/paper/2017/hash/6211080fa89981f66b1a0c9d55c61d0f-Abstract.html.
Smith, Virginia, Simone Forte, Chenxin Ma, Martin Takáč, Michael I Jordan, and Martin Jaggi. 2018. “CoCoA: A General Framework for Communication-Efficient Distributed Optimization.” Journal of Machine Learning Research 18 (230): 1–49.
Suzuki, Taiji. 2014. “Stochastic Dual Coordinate Ascent with Alternating Direction Method of Multipliers.” In Proceedings of the 31th International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, 736–44. http://proceedings.mlr.press/v32/suzuki14.html.
Tan, Alysa Ziying, Han Yu, Lizhen Cui, and Qiang Yang. 2023. “Towards Personalized Federated Learning.” IEEE Transactions on Neural Networks and Learning Systems 34 (12): 9587–9603. https://doi.org/10.1109/tnnls.2022.3160699.
Tarzanagh, Davoud Ataee, Parvin Nazari, Bojian Hou, Li Shen, and Laura Balzano. 2022. “Online Bilevel Optimization: Regret Analysis of Online Alternating Gradient Methods.” 2022. https://arxiv.org/abs/2207.02829v6.
Tong, Jiayi, Chongliang Luo, Md Nazmul Islam, Natalie E. Sheils, John Buresh, Mackenzie Edmondson, Peter A. Merkel, Ebbing Lautenbach, Rui Duan, and Yong Chen. 2022. “Distributed Learning for Heterogeneous Clinical Data with Application to Integrating COVID-19 Data Across 230 Sites.” Npj Digital Medicine 5 (1). https://doi.org/10.1038/s41746-022-00615-8.
Verbraeken, Joost, Matthijs Wolting, Jonathan Katzy, Jeroen Kloppenburg, Tim Verbelen, and Jan S. Rellermeyer. 2020. “A Survey on Distributed Machine Learning.” ACM Computing Surveys 53 (2): 1–33. https://doi.org/10.1145/3377454.
Wang, Lei, Le Bao, and Xin Liu. 2024. “A Decentralized Proximal Gradient Tracking Algorithm for Composite Optimization on Riemannian Manifolds.” 2024. https://arxiv.org/abs/2401.11573v1.
Wang, Lei, and Xin Liu. 2022. “Decentralized Optimization over the Stiefel Manifold by an Approximate Augmented Lagrangian Function.” IEEE Transactions on Signal Processing 70: 3029–41. https://doi.org/10.1109/tsp.2022.3182883.
Wang, Shuai, Yanqing Xu, Zhiguo Wang, Tsung-Hui Chang, Tony Q. S. Quek, and Defeng Sun. 2023. “Beyond ADMM: A Unified Client-Variance-Reduced Adaptive Federated Learning Framework.” In Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2023, Washington, DC, USA, February 7-14, 2023, 10175–83. https://doi.org/10.1609/AAAI.V37I8.26212.
Wang, Xi, Zhipeng Tu, Yiguang Hong, Yingyi Wu, and Guodong Shi. 2021. “No-Regret Online Learning over Riemannian Manifolds.” Advances in Neural Information Processing Systems 34: 28323–35.
Xin, Ran. 2022. “Decentralized Non-Convex Optimization and Learning over Heterogeneous Networks.” PhD thesis, Carnegie Mellon University. https://www.proquest.com/openview/9eec69c1293e6fb01fd4daa8cb514015.
Xin, Ran, Soummya Kar, and Usman A Khan. 2020. “Decentralized Stochastic Optimization and Machine Learning: A Unified Variance-Reduction Framework for Robust Performance and Fast Convergence.” IEEE Signal Processing Magazine 37 (3): 102–13.
Xin, Ran, and Usman A. Khan. 2018. “A Linear Algorithm for Optimization over Directed Graphs with Geometric Convergence.” IEEE Control Systems Letters 2 (3): 315–20. https://doi.org/10.1109/lcsys.2018.2834316.
Xu, Ping, Yue Wang, Xiang Chen, and Zhi Tian. 2024. “QC-ODKLA: Quantized and Communication-Censored Online Decentralized Kernel Learning via Linearized ADMM.” IEEE Transactions on Neural Networks and Learning Systems, 1–13. https://doi.org/10.1109/tnnls.2023.3310499.
Yan, Wenjing, and Xuanyu Cao. 2024. “Decentralized Multitask Online Convex Optimization Under Random Link Failures.” IEEE Transactions on Signal Processing 72: 622–35. https://doi.org/10.1109/tsp.2023.3348954.
Yang, Tao, Xinlei Yi, Junfeng Wu, Ye Yuan, Di Wu, Ziyang Meng, Yiguang Hong, Hong Wang, Zongli Lin, and Karl H. Johansson. 2019. “A Survey of Distributed Optimization.” Annual Reviews in Control 47: 278–305. https://doi.org/10.1016/j.arcontrol.2019.05.006.
Zhang, Jiaojiao, Huikang Liu, Anthony Man-Cho So, and Qing Ling. 2023. “Variance-Reduced Stochastic Quasi-Newton Methods for Decentralized Learning.” IEEE Transactions on Signal Processing 71: 311–26. https://doi.org/10.1109/tsp.2023.3240652.
Zhang, Mengfei, Danqi Jin, and Jie Chen. 2020. “Variance Reduced Diffusion Adaptation for Online Learning over Networks.” In 2020 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC). IEEE. https://doi.org/10.1109/icspcc50002.2020.9259540.
Zhang, Xin, Jia Liu, and Zhengyuan Zhu. 2022. “Learning Coefficient Heterogeneity over Networks: A Distributed Spanning-Tree-Based Fused-Lasso Regression.” Journal of the American Statistical Association, December, 1–13. https://doi.org/10.1080/01621459.2022.2126363.
Zhang, Yijian, Emiliano Dall’Anese, and Mingyi Hong. 2021. “Online Proximal-ADMM for Time-Varying Constrained Convex Optimization.” IEEE Transactions on Signal and Information Processing over Networks 7: 144–55. https://doi.org/10.1109/tsipn.2021.3051292.
Zhang, Yu, and Qiang Yang. 2022. “A Survey on Multi-Task Learning.” IEEE Transactions on Knowledge and Data Engineering 34 (12): 5586–5609. https://doi.org/10.1109/tkde.2021.3070203.
Zhou, Jiayu, Jianhui Chen, and Jieping Ye. 2011. “Clustered Multi-Task Learning via Alternating Structure Optimization.” In Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a Meeting Held 12-14 December 2011, Granada, Spain, 702–10. https://proceedings.neurips.cc/paper/2011/hash/a516a87cfcaef229b342c437fe2b95f7-Abstract.html.
Zhu, Minghui, and Sonia Martínez. 2010. “Discrete-Time Dynamic Average Consensus.” Automatica 46 (2): 322–29. https://doi.org/10.1016/j.automatica.2009.10.021.
Zinkevich, Martin. 2003. “Online Convex Programming and Generalized Infinitesimal Gradient Ascent.” In Proceedings of the 20th International Conference on Machine Learning (Icml-03), 928–36.