Методы первого порядка для задач оптимизации с неточной информацией о градиенте / First-Order Methods for Optimization Problems with Inexact Gradient Information
Candidate of Sciences thesis, Ilya Alekseevich Kuruzov
- Specialty (VAK RF): 00.00.00
- Number of pages: 135
Contents
Introduction
1 Low-Dimensional Optimization with Inexact Gradient Information and Its Application
for Saddle-Point Problems
1.1 Problem Statement and Obtained Results
1.2 Description of the Multidimensional Dichotomy Method
1.3 Low-Dimensional Subproblem in x
1.4 Numerical Experiments
1.4.1 Compared Methods
1.4.2 Lagrangian Saddle Point Problem for Quadratic Optimization
2 Methods with Inexact Gradient Based on Sequential Subspace Optimization for Quasar-Convex Problems
2.1 Introduction
2.2 SESOP method with Inexact Gradient
2.2.1 Subspace Optimization Method with Inexact Gradient
2.2.2 Subspace Optimization Method with Inexact Solutions of Auxiliary Subproblems
2.2.3 Conjugate Gradient Method
2.3 Complexity of Solution for Auxiliary Problem
2.3.1 Ellipsoid Methods for SESOP
2.3.2 Multidimensional Dichotomy
2.4 Numerical Experiments
2.5 Proof of Results in Section 2.2
2.5.1 Estimation for the Sum of w_k
2.5.2 Technical Lemma for Theorem
2.5.3 Proof for Theorem
3 Stopping Rules for Gradient Methods for Non-Convex Problems with Additive Noise in
Gradient
3.1 Introduction
3.1.1 The formulation of the problem
3.2 The Proposed Approach and Main Theoretical Results
3.2.1 Variant of the Gradient Descent method with a constant step-size
3.2.2 Gradient Descent Method with an Adaptive Step-Size Policy
3.3 Numerical Experiments
3.3.1 The quadratic form minimization problem
3.3.2 The problem of minimizing the logistic regression function
3.3.3 Some experiments with the Rosenbrock-type function
3.3.4 Some experiments with the Nesterov-Skokov function
3.4 Proof of some statements in Section
3.4.1 The proof of Theorem
3.4.2 Proof of Proposition
3.5 The Nesterov-Skokov function
3.6 Summary
4 The Mirror-Prox Sliding Method for Non-smooth decentralized saddle-point problems
4.1 Introduction
4.2 The mirror-prox sliding method with inexactness
4.3 Mirror-prox sliding for decentralized optimization
4.3.1 Decentralized problem statement
4.3.2 Moving the affine constraints of SPP to penalty
4.3.3 Main Result
4.4 Numerical Experiments
4.5 Summary
Conclusion
A Appendix for Chapter
A.1 Proof of Lemma
A.2 Proof of Theorem
A.3 Proof of Theorem
A.4 Proof of Theorem
A.5 Proof of Theorem
A.6 Proof of Theorem
A.7 Some Auxiliary Results
B Appendix for Chapter
B.1 Proof of Results in Section
B.1.1 Proof of Lemma
B.1.2 Proof of Theorem
B.1.3 Proof of Theorem
B.2 Proof of Results in Section
B.2.1 Proof of Theorem
B.2.2 Proof of Theorem
Introduction
This chapter provides an overview of the results of the thesis. Let us start with a description of some applications in which only an inexact gradient is available.
There are numerous applications where the exact gradient is not available. One of the first works devoted to this topic is [20]. Its author emphasizes optimization problems in infinite-dimensional spaces. Because of limited computational power, one cannot work with such objects directly in most applications. For this reason, various approximations of the optimization process are used, and these approximations introduce errors. Some of these methods estimate the gradient by finite differences to solve inverse problems in infinite-dimensional spaces (see, e.g., [31]). Note that the class of gradient-free methods is itself very rich; some such methods were considered in [78], [79], [25], [18]. Gradient-free methods can be applied even in finite-dimensional spaces when the gradient is not available or is too hard to compute.
The class of problems in which the gradient of the objective function cannot be computed exactly includes minimization of max-type functions, known as saddle-point problems (SPP). They arise in various applications, e.g., fair training [46], adversarial machine learning [42], and generative adversarial networks [81]. One approach to such problems is to compute, inexactly, a point maximizing the inner function and to use this point to evaluate the gradient of the objective given by the Demyanov-Danskin theorem. A well-known algorithm of this kind is Multi-Step Gradient Descent Ascent (see, e.g., [81, 82, 83]).
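To make this concrete, below is a minimal Python sketch (not an algorithm from the thesis) of how an inexact gradient of f(x) = max_y F(x, y) is typically formed: a few ascent steps approximate the inner maximizer, and Danskin's theorem then suggests returning grad_x F(x, y_hat). The bilinear-quadratic F used here is a hypothetical example chosen only so that the exact gradient is available for comparison.

import numpy as np

rng = np.random.default_rng(2)
n, m = 20, 15
A = rng.normal(size=(m, n))

def inexact_grad_f(x, ascent_steps=10, step=0.5):
    # F(x, y) = <y, A x> - 0.5 ||y||^2, so grad_y F = A x - y and the maximizer is y = A x
    y = np.zeros(m)
    for _ in range(ascent_steps):          # approximate inner maximization
        y += step * (A @ x - y)
    return A.T @ y                         # Danskin-style gradient grad_x F(x, y_hat)

x = rng.normal(size=n)
exact = A.T @ (A @ x)                      # gradient of f(x) = 0.5 ||A x||^2
print("gradient error:", np.linalg.norm(inexact_grad_f(x) - exact))

The error of this oracle shrinks as the number of inner ascent steps grows, which is exactly the additive-inexactness model discussed below.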
On the other hand, recent developments in decentralized computation provide another class of problems with an uncomputable gradient [10]. Such approaches allow one to significantly decrease the hardware requirements for solving large-scale problems. Their main difficulty is that no single node knows the full function value or the full gradient. Due to the popularity of this approach, various methods have been developed based on primal-dual ideas, gradient tracking,
etc. Here we want to emphasize methods based on the so-called consensus procedure (e.g., MSDA [84], D-NC [85], and Mudag [86]). These papers approximate the unknown gradient by sequential averaging through a so-called communication matrix, which leads to inexactness.
Furthermore, inexactness has other applications. In some optimization problems it models our ignorance of the structure of the problem. There are various models that allow one to decompose a complex optimization problem into a simple problem plus an approximation error. Well-known examples are papers that treat non-smooth problems as smooth problems with inexactness [37]. In such cases, non-smoothness is regarded as additive noise in the gradient of a function with good properties. This makes it possible to prove theoretical estimates that are significantly better than the lower bounds for the corresponding class of problems, and this point of view explains the practical speed of some methods for non-smooth optimization. Note that there are non-smooth saddle-point problems as well, in particular problems with $\ell_1$-penalization [44], [45] and some formulations of the Support Vector Machine [43].
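For instance (a standard calculation in the spirit of [18], not a result quoted from the thesis), a convex function $f$ with subgradients bounded by $M$ admits a $(\delta, L)$-oracle with $L = 2M^2/\delta$: the exact subgradient $g(x)$ gives the lower bound of an $L$-smooth model, while for the upper bound
$$f(y) - f(x) - \langle g(x), y - x \rangle \le \langle g(y) - g(x), y - x \rangle \le 2M\|y - x\| \le \frac{L}{2}\|y - x\|^2 + \frac{2M^2}{L} = \frac{L}{2}\|y - x\|^2 + \delta \quad \text{for } L = \frac{2M^2}{\delta}.$$
The smaller the admissible inexactness $\delta$, the larger the effective smoothness constant, which is the trade-off behind treating non-smooth problems as inexactly smooth ones.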
Another class of problems with limited structural knowledge is non-convex optimization. It is well known that many applications require minimizing non-convex functions. Despite the numerous use cases and the wide range of methods with good practical performance, there is a sizable gap between theory and practice. One way to explain the practical performance is to view the optimization problem as the sum of a dominating convex function and a small non-convex perturbation. In particular, paper [80] uses this approach and an inexact model to explain why some practical algorithms perform well.
Further, we want to emphasize another aspect of problems with an inexact gradient oracle. Due to the numerous applications with inexact gradients, there is considerable interest in methods that work in such settings. One important question distinguishes research on such methods: error accumulation. A special case of this question is the influence of inexactness on the convergence of a method in comparison with the exact case. Usually, the convergence rate remains the same up to some accuracy level.
A significantly more important question is the quality that can be reached by a method under inexactness. Some methods, like gradient descent [20], [78], [18], are known to converge to an accuracy proportional to the inexactness. It is natural that inexactness limits the performance of an algorithm. On the other hand, there are methods for which we cannot guarantee even such a result. A special case is accelerated gradient-type methods. They demonstrate great performance in practice (see [81]) and attain the lower bounds for the class of (strongly) convex smooth problems. At the same time, it is well known that such methods may diverge for singular convex minimization problems with an inexact gradient, as demonstrated in [26] and [27]. Note that the latter paper shows that relative inexactness avoids this issue, unlike additive inexactness.
Even non-accelerated methods can exhibit undesirable behavior, such as trajectories diverging far from the starting point. If the initial point possesses desirable properties (e.g., small norm, sparsity, low rank), this divergence may lead to their loss.
Finally, let us present some models of inexactness. We start from the classical assumption of additive inexactness. Consider the minimization of a convex and L-smooth function f (here $\|\cdot\|$ is the usual Euclidean norm),
$$\|\nabla f(x) - \nabla f(y)\| \le L\|x - y\| \quad \forall x, y \in \mathbb{R}^n, \qquad (1)$$
with an inexact gradient $g : \mathbb{R}^n \to \mathbb{R}^n$:
$$\|g(x) - \nabla f(x)\| \le \delta, \qquad (2)$$
where $L > 0$ and $\delta > 0$. The following estimate is known for smooth optimization problems (see [26, 27]):
$$f(x_N) - \min_{x \in \mathbb{R}^n} f(x) = O\left(L\|x_0 - x_*\|^2 N^{-2} + \delta \max_{k \le N} \|x_k - x_*\|\right)$$
for each $x_*$ such that $f(x_*) = \min_{x \in \mathbb{R}^n} f(x)$. The quantity $\max_{k \le N} \|x_k - x_*\|$ reflects the effect of error accumulation described above. It can be rather large and lead to divergence.
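As an illustration (a minimal sketch, not an experiment from the thesis), the following Python fragment runs gradient descent on a random convex quadratic with an additively perturbed gradient satisfying (2); the suboptimality gap decreases and then stalls at a small level governed by $\delta$, in line with the estimate above.

import numpy as np

rng = np.random.default_rng(1)
n, L, delta = 50, 10.0, 1e-3
A = rng.normal(size=(n, n))
A = A.T @ A + n * np.eye(n)                    # well-conditioned positive definite matrix
A *= L / np.linalg.eigvalsh(A).max()           # largest eigenvalue equals L
x_star = rng.normal(size=n)
b = A @ x_star                                 # so that the minimizer is x_star

f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

def inexact_grad(x):                           # additive noise of norm exactly delta
    e = rng.normal(size=n)
    return grad(x) + delta * e / np.linalg.norm(e)

x = np.zeros(n)
for k in range(2000):
    x -= inexact_grad(x) / L                   # constant step 1/L
print("f(x_N) - f(x_*):", f(x) - f(x_star))    # stalls near a level ~ delta * ||x_k - x_*||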
Another important class of inexact oracles arises for variational inequalities. Note that this class of problems covers saddle-point optimization problems. The following inequality is known as a variational inequality:
$$\langle H(z) + \nabla G(z), z_* - z \rangle \le 0 \quad \forall z \in Z. \qquad (3)$$
To solve it, one needs to find the corresponding $z_*$. The usual assumptions are that $H(z)$ is a monotone field and $G(z)$ is a differentiable convex function; we use them throughout this work as well. At the same time, $H(z)$ can have different properties. A well-known case is a smooth operator:
$$\langle H(z_1) - H(z_2), z_1 - z_3 \rangle \le M\|z_1 - z_2\|^2 + M\|z_1 - z_3\|^2.$$
The non-smooth case is significantly more complex. One way to handle this non-smoothness is to introduce the following inexact oracle.
Assumption 1. The field $H(z)$ is monotone, i.e., for any $z_1, z_2 \in Z$ it holds that
$$\langle H(z_2) - H(z_1), z_2 - z_1 \rangle \ge 0.$$
Moreover, there exist constants $M > 0$ and $\delta > 0$ such that for any $z_1, z_2, z_3 \in Z$ we have
$$\langle H(z_1) - H(z_2), z_1 - z_3 \rangle \le M\|z_1 - z_2\|^2 + M\|z_1 - z_3\|^2 + \delta. \qquad (4)$$
In this definition, the "smoothness constant" M and the "inexactness" $\delta$ measure the non-smoothness of the problem. This gives a balance between the convergence rate and the achievable quality.
Remark 2. Let us consider two cases when Assumption 1 holds.
1. Let H be an L-Lipschitz field, i.e., for any $z_1, z_2 \in Z$ it holds that $\|H(z_1) - H(z_2)\| \le L\|z_1 - z_2\|$. As shown in [28], Assumption 1 holds with $M = L$ and $\delta = 0$.
2. Let H be bounded, i.e., for any $z \in Z$ it holds that $\|H(z)\|^2 \le C$. Then for any $\gamma > 0$ we have
$$\langle H(z_1) - H(z_2), z_1 - z_3 \rangle \le \frac{\gamma}{2}\|z_1 - z_3\|^2 + \frac{1}{2\gamma}\|H(z_1) - H(z_2)\|^2 \le \frac{\gamma}{2}\|z_1 - z_3\|^2 + \frac{1}{2\gamma}\left(2\|H(z_1)\|^2 + 2\|H(z_2)\|^2\right) \le \frac{\gamma}{2}\|z_1 - z_3\|^2 + \frac{2C}{\gamma},$$
i.e., Assumption 1 holds with $M = \gamma$ and $\delta = 2C/\gamma$.
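As a quick sanity check of case 2 (a hypothetical example, not taken from the thesis), one can take the bounded monotone field $H(z) = \mathrm{sign}(z)$, for which $\|H(z)\|^2 \le n =: C$, and verify inequality (4) with $M = \gamma$ and $\delta = 2C/\gamma$ on random points:

import numpy as np

rng = np.random.default_rng(0)
n, gamma = 5, 1.0
C = float(n)                                   # ||sign(z)||^2 <= n
H = np.sign                                    # bounded, monotone (subgradient field of ||z||_1)

violations = 0
for _ in range(10_000):
    z1, z2, z3 = rng.normal(size=(3, n))
    lhs = np.dot(H(z1) - H(z2), z1 - z3)
    rhs = (gamma * np.linalg.norm(z1 - z2) ** 2
           + gamma * np.linalg.norm(z1 - z3) ** 2
           + 2 * C / gamma)
    violations += lhs > rhs + 1e-12
print("violations of (4):", violations)        # expected: 0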
The main goals of the thesis are the following:
• Development of new optimization methods based on the inexact oracle model for saddle-point problems
• Research on accelerated methods that are robust to noise in the gradient
• Theoretical study of gradient method trajectories under gradient inexactness
• Development of an optimal method for non-smooth saddle-point problems
Scientific novelty. As noted above, an important class of problems that can be solved by methods with inexactness is saddle-point problems. Despite the numerous methods for this class, none of them exploits a possible difference between the dimensions of the inner and outer optimization problems. We consider the specific case when one group of variables is low-dimensional. In such cases, the usual gradient methods are inefficient in comparison with geometric methods like the Ellipsoid Method. We propose another method: multidimensional dichotomy. Note that one-dimensional dichotomy is efficient and universal enough for unimodal problems. Therefore, it is natural to consider its generalization and to apply it to this class of saddle-point problems.
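To illustrate the idea in the simplest non-trivial setting, below is a hedged two-dimensional Python sketch in the spirit of the square-dichotomy method of [21]; the multidimensional method developed in the thesis is more involved and handles gradient inexactness, which this toy version does not. Each step minimizes the objective on the mid-segment of the current rectangle along one coordinate and uses the sign of the orthogonal gradient component at that point to discard half of the rectangle.

import numpy as np

def ternary_search(phi, lo, hi, iters=60):
    # 1-D minimization of a unimodal function on [lo, hi]
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if phi(m1) < phi(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def dichotomy_2d(f, grad, box, iters=60):
    (lx, ux), (ly, uy) = box
    for k in range(iters):
        if k % 2 == 0:                      # cut along x: work on the segment x = xm
            xm = 0.5 * (lx + ux)
            ym = ternary_search(lambda y: f(np.array([xm, y])), ly, uy)
            gx = grad(np.array([xm, ym]))[0]
            # for convex f: if df/dx > 0 at the segment minimizer, the solution has x <= xm
            if gx > 0: ux = xm
            else:      lx = xm
        else:                               # cut along y
            ym = 0.5 * (ly + uy)
            xm = ternary_search(lambda x: f(np.array([x, ym])), lx, ux)
            gy = grad(np.array([xm, ym]))[1]
            if gy > 0: uy = ym
            else:      ly = ym
    return np.array([0.5 * (lx + ux), 0.5 * (ly + uy)])

f = lambda z: (z[0] - 0.3) ** 2 + 2 * (z[1] + 0.4) ** 2 + 0.5 * z[0] * z[1]
grad = lambda z: np.array([2 * (z[0] - 0.3) + 0.5 * z[1],
                           4 * (z[1] + 0.4) + 0.5 * z[0]])
print(dichotomy_2d(f, grad, box=((-1.0, 1.0), (-1.0, 1.0))))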
The second part of our research is devoted to the problem of error accumulation in accelerated methods for singular convex problems. We provide a theoretical analysis of a method based on sequential subspace optimization (SESOP) and demonstrate that this accelerated method does not accumulate errors.
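For intuition, here is a hedged Python sketch of a SESOP-type iteration on a quadratic (the subspace choice and the analysis in Chapter 2 differ in details): at each step the objective is minimized exactly over a low-dimensional subspace spanned by the current inexact gradient, the previous step, and the direction to the starting point, so the per-iteration cost is a tiny linear system.

import numpy as np

rng = np.random.default_rng(3)
n, delta = 100, 1e-6
A = rng.normal(size=(n, n))
A = A.T @ A + np.eye(n)                        # positive definite quadratic f(x) = 0.5 x'Ax - b'x
b = rng.normal(size=n)

grad = lambda x: A @ x - b
def inexact_grad(x):
    e = rng.normal(size=n)
    return grad(x) + delta * e / np.linalg.norm(e)

x0 = np.zeros(n)
x, x_prev = x0.copy(), x0.copy()
for k in range(50):
    dirs = [inexact_grad(x), x - x_prev, x - x0]
    D = np.stack([d for d in dirs if np.linalg.norm(d) > 1e-15], axis=1)
    # subspace step: minimize f(x + D alpha), i.e. solve (D'AD) alpha = D'(b - A x)
    alpha = np.linalg.lstsq(D.T @ A @ D, D.T @ (b - A @ x), rcond=None)[0]
    x_prev, x = x, x + D @ alpha

x_star = np.linalg.solve(A, b)
gap = 0.5 * x @ A @ x - b @ x - (0.5 * x_star @ A @ x_star - b @ x_star)
print("f(x_N) - f(x_*):", gap)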
Further, we propose a solution to the problem of the trajectory of gradient methods with inexact information. In particular, we consider the well-known class of non-convex optimization problems satisfying the PL-condition. In this case, we demonstrate that the trajectory can move far away from the starting point; moreover, this can lead to overfitting in machine learning problems. Note that recent developments in this area have led to the wide use of such methods in mathematical modeling. To address this problem, we provide a stopping rule based on the gradient norm, together with theoretical and practical evidence of its benefit. In particular, we verify it on such important problems as linear and logistic regression, which are used in many areas for regression and classification.
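The flavour of such a rule can be conveyed by a short hedged sketch (the thresholds and constants in Chapter 3 are different, and the threshold 2*delta below is only a hypothetical choice): gradient descent with an inexact gradient is stopped as soon as the observed gradient norm falls to the order of the noise level, instead of running a fixed iteration budget, which limits how far the iterates wander from the starting point.

import numpy as np

rng = np.random.default_rng(4)
n, L, delta = 30, 5.0, 1e-2
A = rng.normal(size=(n, n))
A = A.T @ A + np.eye(n)
A *= L / np.linalg.eigvalsh(A).max()
b = rng.normal(size=n)
grad = lambda x: A @ x - b

def inexact_grad(x):
    e = rng.normal(size=n)
    return grad(x) + delta * e / np.linalg.norm(e)

x0 = np.zeros(n)
x = x0.copy()
for k in range(100_000):
    g = inexact_grad(x)
    if np.linalg.norm(g) <= 2 * delta:         # stop once the inexact gradient is at the noise level
        break
    x -= g / L
print("stopped at iteration", k,
      "| distance from start:", np.linalg.norm(x - x0))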
Finally, we use an inexact oracle to attain the lower bound in the case of non-smooth saddle-point problems. The recent paper [10] shows the importance of decentralized algorithms for the effective solution of mathematical modeling problems; therefore, we consider the decentralized setting for non-smooth SPP as well. Despite the extensive research in this area and the well-known lower bounds for classes of convex-concave saddle-point problems, there was no optimal algorithm for this problem in the decentralized case. To solve
this problem, we combine three ideas: sliding, a penalty for the decentralized problem, and an inexact oracle that avoids a separate non-smooth analysis. This gives us a theoretically optimal method for deterministic and stochastic saddle-point problems with a non-smooth objective in the decentralized setting.
Therefore, the scientific novelty can be summarized in the following four main points:
• New effective algorithm for Saddle-Point problems with low dimension in one group of variables
• Analysis of methods based on subspace optimization with inexact gradient and inexact solution of the auxiliary problem
• New stopping rule for adaptive gradient method with inexact gradient
• Optimal algorithm for non-smooth decentralized saddle-point problems and its analysis based on an inexact oracle
Theoretical and practical value of the work in the thesis. The work addresses several open theoretical questions, including the construction of an optimal distributed algorithm for non-smooth saddle-point problems and the reduction of error accumulation in accelerated gradient methods. Moreover, the proposed optimization methods are applicable to real-world problems, demonstrating their practical utility and effectiveness.
Statements to be defended are the following:
1. A new efficient algorithm for low-dimensional optimization with applications to saddle-point problems of special structure, namely with low-dimensional external or internal variables, was developed.
2. A new analysis of the SESOP method was conducted, demonstrating its robustness to gradient noise while maintaining accelerated convergence rates.
3. The problem of potential unboundedness of gradient method trajectories when using inexact gradients was studied, and an approach to this problem was proposed based on a stopping criterion in terms of the small norm of the inexact gradient.
4. A new method for non-smooth saddle-point problems in a distributed setting was developed, combining sliding and penalty methods. Its optimality for this class of problems was proven using the inexact oracle framework.
The main results in the thesis are presented in the following papers. Published papers.
[1] I. Kuruzov and F. Stonyakin (2021). Sequential Subspace Optimization for Quasar-Convex Optimization Problems with Inexact Gradient. In: Olenev, N.N., Evtushenko, Y.G., Jacimovic, M., Khachay, M., Malkova, V. (eds) Advances in Optimization and Applications. OPTIMA 2021. Communications in Computer and Information Science, vol 1514. Springer, Cham. https://doi.org/10.1007/978-3-030-92711-0_2.
[2] E. Gladin, I. Kuruzov, M. Alkousa, A. Gasnikov, D. Pasechnyuk, F. Stonyakin (2023). Solving strongly convex-concave composite saddle-point problems with low dimension of one group of variable. Sbornik: Mathematics. 214. 285-333. 10.4213/sm9700e.
[3] F. Stonyakin, I. Kuruzov, B.Polyak (2023). Stopping Rules for Gradient Methods for Non-convex Problems with Additive Noise in Gradient. Journal of Optimization Theory and Applications. 198. 1-21. 10.1007/s10957-023-02245-w.
[4] I. Kuruzov, A. Rogozin, D. Yarmoshik, A. Gasnikov (2025). The mirror-prox sliding method for non-smooth decentralized saddle-point problems. Optimization Methods and Software. 1-25. 10.1080/10556788.2024.2396295.
The results were presented at the following conferences:
1. Sequential subspace optimization for quasar-convex optimization problems with inexact gradient. XII International Conference on Optimization and Applications. Petrovac, Montenegro, September 27, 2021 (online)
2. Gradient-Type Methods for Optimization Problems with Polyak-Lojasiewicz Condition: Early Stopping and Adaptivity to Inexactness Parameter. XIII International Conference on Optimization and Applications. Petrovac, Montenegro, September 26, 2022 (online)
3. On Some Versions of Subspace Optimization Methods with Inexact Gradient Information. International Conference on Computational Optimization 2024. Russia, Kazan, October 10, 2024 (poster)
The personal contribution of the applicant to the thesis and the listed works is the following.
• Publications [1,3, 4]: the development of optimization methods, their theoretical analysis, and the implementation of numerical experiments.
• Publication [2]: the theoretical analysis of the multidimensional dichotomy method, its application to saddle-point problems, and its practical implementation.
All results presented in this dissertation were obtained by the author personally. The research problems were formulated by the author's scientific supervisor, F.S. Stonyakin, and scientific consultant, A.V. Gasnikov. From collaborative works, only the results obtained by the author's personal contribution have been included in the dissertation.
The thesis consists of an introduction, four main chapters, a conclusion, a list of references, and two appendices.
Conclusion
First, we present a new low-dimensional method based on the well-known dichotomy method. It generalizes the one-dimensional approach to a multidimensional hypercube and provides theoretical guarantees for convergence in the case of an inexact gradient. Furthermore, we provide an estimate of the complexity required to apply the new method to a saddle-point problem with a low-dimensional block of variables. Note that the complexity of this method grows significantly faster than that of the Ellipsoid method. At the same time, it is more stable numerically and outperforms other low-dimensional methods in some experiments.
Furthermore, we propose a modification of the Sequential Subspace Optimization Method [53] for the case of $\delta$-additive noise in the gradient (2). For the first time, a result has been obtained that describes the influence of this inexactness on the convergence rate estimate, in which the quantity $O(\delta \max_k \|x_k - x_*\|)$ is replaced by the constant $O(\delta \|x_0 - x_*\|)$. Obviously, the sequence $\|x_k - x_*\|$ can be unbounded without additional assumptions. Therefore, the obtained result guarantees the convergence of the method at an accelerated rate without error accumulation.
The aforementioned result was obtained by solving an auxiliary problem at each iteration. Therefore, we investigate the influence of inexactness in solving the auxiliary minimization problems on the overall theoretical estimate for the algorithm. This allows estimating the complexity of this subprocedure for different low-dimensional methods such as the Ellipsoid method.
The next problem concerns the trajectory of gradient methods with an inexact gradient. We propose stopping criteria for such methods. The main focus of this research is the case of non-convex functions. The chapter presents a stopping criterion that establishes a compromise between the accuracy of the resulting point and the distance to the starting point. Moreover, it is shown that the method does not move farther from the starting point than a distance comparable to the distance to the nearest solution, provided the
function satisfies the PL-condition.
Additionally, the chapter considers the cases of both constant and adaptive step sizes in gradient methods. For both cases, we present a theoretical analysis and estimate the number of iterations required to satisfy the stopping criterion or to find a point with the required accuracy.
Finally, we propose an optimal algorithm for the decentralized saddle-point problem with a non-smooth convex-concave objective function. The main idea of this optimal approach is to use the mirror-prox sliding technique, recently proposed in [28], for the penalized initial problem with affine constraints. The proposed approach is considered for both deterministic and stochastic cases. We show that in both cases it finds an $\varepsilon$-optimal point in $O\left(\frac{1}{\varepsilon}\right)$ total communication steps and $O\left(\frac{1}{\varepsilon^2}\right)$ gradient computations per node. Note that these estimates are optimal and cannot be improved.
Moreover, the mirror-prox sliding technique proposed in [28] is proven to converge to a solution even for a non-smooth operator H. We propose a generalization of the Lipschitz condition for the operator H that introduces a new parameter $\delta$. We show that the error $\delta$ in the inexact setup (4) does not accumulate in the sliding procedure. This means that the mirror-prox sliding method can be applied to a much wider class of optimization problems.
In conclusion, it is important to note the limitations of the present work. While the SESOP method and the proposed stopping rule have a comprehensive analysis for the case of deterministic noise, their extension to stochastic settings remains an open question, despite the prevalence of stochasticity in numerous applications.
Furthermore, the approach based on MPS faces a common practical challenge: its implementation requires prior knowledge of several parameters. Specifically, to run the algorithm one needs estimates of the constants L and M, as well as appropriate choices of the regularization parameters $R_a$ and $R_p$. In the stochastic case, this issue is compounded by the need for an estimate of the gradient variance $\sigma^2$ and the selection of the sample batch sizes $Q_{x_0}$ and $Q_{y_0}$ to achieve the desired accuracy.
Bibliography
[1] I. Kuruzov and F. Stonyakin. Sequential Subspace Optimization for Quasar-Convex Optimization Problems with Inexact Gradient // Advances in Optimization and Applications. OPTIMA 2021. Communications in Computer and Information Science - 2021 - Vol. 1514 P. 19-33.
[2] E. Gladin, I. Kuruzov, F. Stonyakin, D. Pasechnyuk, M. Alkousa, A. Gasnikov. Solving strongly convex-concave composite saddle point problems with a small dimension of one of the variables // Sbornik: Mathematics - 2023 - Vol. 214 - P. 1-25.
[3] F. Stonyakin, I. Kuruzov, B. Polyak. Stopping rules for gradient methods for non-convex problems with additive noise in gradient // Journal of Optimization Theory and Applications - 2023 - Vol. 198 -P. 1-21.
[4] I. Kuruzov, A. Rogozin, D. Yarmoshik, A. Gasnikov. The mirror-prox sliding method for non-smooth decentralized saddle-point problems // Optimization Methods and Software - 2025 - P. 1-25
[5] S. Guminov, A. Gasnikov, I. Kuruzov. Accelerated methods for weakly-quasi-convex optimization problems // Computational Management Science - 2023 - Vol. 20 - P. 1-25.
[6] H. Xu, C. Caramanis, S. Mannor. Robust regression and lasso // Advances in Neural Information Processing Systems - 2008 - Vol. 21 - P. 1-8.
[7] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers // Foundations and Trends in Machine Learning -
2011-Vol. 3, № 1-P. 1-122.
[8] A. Nemirovski, S. Onn, U. G. Rothblum. Accuracy certificates for computational problems with convex structure // Mathematics of Operations Research - 2010 - Vol. 35, № 1-P. 52-78.
[9] G. Lan. First-order and Stochastic Optimization Methods for Machine Learning // Springer - 2020 -350 p.
[10] X. Lian, C. Zhang, H. Zhang, C.-J. Hsieh, W. Zhang, J. Liu. Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent // Advances in Neural Information Processing Systems - 2017 - Vol. 30 - P. 1-10.
[11] Yu. Nesterov. Universal gradient methods for convex optimization problems // Mathematical Programming - 2015 - Vol. 152, № 1-P. 381-404.
[12] A. V. Gasnikov. Modern numerical optimization methods: The universal gradient descent method // MCCME - Moscow - 2021 - 250 p.
[13] D. Kovalev, A. Beznosikov, A. Sadiev, M. Persiianov, P. Richtarik, A. Gasnikov. Optimal algorithms for decentralized stochastic variational inequalities // arXiv preprint - 2022. URL: https://arxiv.org/abs/2202.02771 (Date of access: 30.06.2025).
[14] A. Beznosikov, V. Samokhin, A. Gasnikov. Distributed saddle-point problems: Lower bounds, near-optimal and robust algorithms // Optimization Methods and Software - 2025 - Vol. 40, № 1 -P. 1-25.
[15] B. T. Polyak. Gradient methods for minimizing functionals // Comput. Math. Math. Phys. - 1963 -Vol. 3, № 4 - P. 864-878.
[16] H. Karimi, J. Nutini, M. Schmidt. Linear convergence of gradient and proximal-gradient methods under the Polyak-Lojasiewicz condition // Joint European Conference on Machine Learning and Knowledge Discovery in Databases - 2016 - P. 795-811.
[17] M. Belkin. Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation // Acta Numerica - 2021 - Vol. 30 - P. 203-248.
[18] O. Devolder, F. Glineur, Yu. Nesterov. First-order methods of smooth convex optimization with inexact oracle // Mathematical Programming - 2014 - Vol. 146, № 1-P. 37-75.
[19] A. V. Gasnikov, D. M. Dvinskikh, P. E. Dvurechensky, D. I. Kamzolov, V. V. Matyukhin, D. A. Pasechnyuk, N. K. Tupitsa, A. V. Chernov. Accelerated meta-algorithm for convex opti-
mization problems // Computational Mathematics and Mathematical Physics - 2020 - Vol. 60, № 12 -P. 1-15.
[20] B. T. Polyak. Introduction to Optimization // Nauka - Moscow - 1983 - 384 p.
[21] D. A. Pasechnyuk, F. S. Stonyakin. On a method for minimizing a convex Lipschitz function of two variables on a square // Computer Research and Modeling - 2019 - Vol. 11, № 3 - P. 379-395.
[22] A. V. Gasnikov, P. E. Dvurechensky, Yu. E. Nesterov. Stochastic gradient methods with inexact oracle // Proceedings of MIPT - 2016 - Vol. 8, № 1 - P. 41-91.
[23] A. V. Gasnikov, Yu. E. Nesterov. Universal method for stochastic composite optimization problems // Computational Mathematics and Mathematical Physics - 2018 - Vol. 28, № 1 - P. 51-68.
[24] M. Alkousa, D. Dvinskikh, F. Stonyakin, A. Gasnikov, D. Kovalev. Accelerated methods for saddle point problems // Computational Mathematics and Mathematical Physics - 2020 - Vol. 60, № 11-P. 1843-1866.
[25] O. Devolder. Exactness, inexactness and stochasticity in first-order methods for large-scale convex optimization // PhD thesis - Université Catholique de Louvain - 2013 - 150 p.
[26] A. d'Aspremont. Smooth optimization with approximate gradient // SIAM Journal on Optimization -2008-Vol. 19, № 3-P. 1171-1183.
[27] A. Vasin, A. Gasnikov, V. Spokoiny. Stopping rules for accelerated gradient methods with additive noise in gradient // arXiv preprint - 2021. URL: https://arxiv.org/abs/2102.02921 (Date of access: 30.06.2025).
[28] G. Lan, Y. Ouyang. Mirror-prox sliding methods for solving a class of monotone variational inequalities // arXiv preprint - 2021. URL: https://arxiv.org/abs/2111.00996 (Date of access: 30.06.2025).
[29] B. Polyak, A. Tremba. New versions of Newton method: step-size choice, convergence domain and under-determined equations // Optimization Methods and Software - 2020 - Vol. 35, № 6 -P. 1272-1303.
[30] Yu. Nesterov, B. T. Polyak. Cubic regularization of Newton method and its global performance // Mathematical Programming - 2006 - Vol. 108, № 1 - P. 177-205.
[31] S. I. Kabanikhin. Inverse and Ill-posed Problems // deGruyter - 2011 - 450 p.
[32] F. Vasilyev. Optimization Methods // FP - Moscow - 2002 - 300 p.
[33] Yu. Nesterov. Algorithmic convex optimization // Doctoral thesis - 2013 - 200 p.
[34] Yu. Nesterov, V. Skokov. On the issue of testing unconstrained optimization algorithms // Numerical methods of mathematical programming - 1980 - P. 77-91.
[35] E. Vorontsova, R. Hildbrand, A. Gasnikov, F. Stonyakin. Convex optimization // MIPT - Moscow -2021 -280 p.
[36] E. Gorbunov, D. Dvinskikh, A. Gasnikov. Optimal decentralized distributed algorithms for stochastic convex optimization // arXiv preprint - 2019. URL: https://arxiv.org/abs/1911.07363 (Date of access: 30.06.2025).
[37] E. Gorbunov, A. Rogozin, A. Beznosikov, D. Dvinskikh, A. Gasnikov. Recent theoretical advances in decentralized distributed convex optimization // High-Dimensional Optimization and Probability -2022-P. 253-325.
[38] Y. Drori, S. Sabach, M. Teboulle. A simple algorithm for a class of nonsmooth convex-concave saddle-point problems // Operations Research Letters - 2015 - Vol. 43, № 2 - P. 209-214.
[39] O. Yufereva, M. Persiianov, P. Dvurechensky, A. Gasnikov, D. Kovalev. Decentralized computation of Wasserstein barycenter over time-varying networks // arXiv preprint - 2022. URL: https://arxiv.org/abs/2205.15669 (Date of access: 30.06.2025).
[40] D. Dvinskikh, A. Gasnikov. Decentralized and parallel primal and dual accelerated methods for stochastic convex programming problems // Journal of Inverse and Ill-posed Problems - 2021 -Vol. 29, № 3-P. 385-405.
[41] G. Lan, S. Lee, Y. Zhou. Communication-efficient algorithms for decentralized and stochastic optimization // Mathematical Programming - 2017 - P. 1-48.
[42] A. Kurakin, I. J. Goodfellow, S. Bengio. Adversarial machine learning at scale // International Conference on Learning Representations - 2017 - P. 1-15.
[43] L. Bottou, C.-J. Lin. Support vector machine solvers // Large Scale Kernel Machines - 2007 -P. 301-320.
[44] S. Mahdizadehaghdam, A. Panahi, H. Krim. Sparse generative adversarial network // IEEE International Conference on Computer Vision Workshops - 2019 - P. 3063-3071.
[45] J. Mairal, F. Bach, J. Ponce, G. Sapiro. Online dictionary learning for sparse coding // Proceedings of the 26th International Conference on Machine Learning - 2009 - Vol. 382 - P. 87.
[46] B. Barazandeh, D. A. Tarzanagh, G. Michailidis. Solving a class of non-convex min-max games using adaptive momentum methods // IEEE International Conference on Acoustics, Speech and Signal Processing - 2021 - P. 3625-3629.
[47] M. Collins, R. Schapire, Y. Singer. Logistic regression, AdaBoost and Bregman distances // Machine Learning - 2002 - Vol. 48 - P. 253-285.
[48] L. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming // USSR Computational Mathematics and Mathematical Physics - 1967 - Vol. 7 - P. 207-217.
[49] I. Necoara, Yu. Nesterov, F. Glineur. Linear convergence of first order methods for non-strongly convex optimization // Mathematical Programming - 2019 - Vol. 175, № 1 - P. 69-107.
[50] A. S. Nemirovsky, D. B. Yudin. Problem Complexity and Optimization Method Efficiency // Nauka -Moscow - 1979-400 p.
[51] O. Hinder, A. Sidford, N. S. Sohoni. Near-Optimal Methods for Minimizing Star-Convex Functions and Beyond // Proceedings of Machine Learning Research - 2020 - Vol. 125 - P. 1-45.
[52] Y. Zhou, J. Yang, H. Zhang, Y. Liang, V. Tarokh. SGD Converges To Global Minimum In Deep Learning Via Star-Convex Path // arXiv preprint - 2019. URL: https://arxiv.org/abs/1901.00451 (Date of access: 30.06.2025).
[53] G. Narkiss, M. Zibulevsky. Sequential subspace optimization method for large-scale unconstrained problems // Technical Report - Technion-IIT - 2005 - 25 p.
[54] M. Hardt, T. Ma, B. Recht. Gradient descent learns linear dynamical systems // Journal of Machine Learning Research - 2018 - Vol. 19, № 29 - P. 1-44.
[55] A. d'Aspremont, D. Scieur, A. Taylor. Acceleration methods // Foundations and Trends® in Optimization - 2021 - Vol. 5 - P. 1-245.
[56] P. M. Vaidya. A new algorithm for minimizing convex functions over convex sets // Mathematical Programming - 1996 - Vol. 73 - P. 291-341.
[57] A. Nemirovski. Prox-method with rate of convergence O(1/T) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems // SIAM J. Optim. - 2004 - Vol. 15, № 1 - P. 229-251.
[58] S. Bubeck. Convex optimization: algorithms and complexity // Foundations and Trends in Machine Learning - 2015 - Vol. 8, № 3-4 - P. 231-357.
[59] A. V. Gasnikov, A. I. Tyurin. Fast Gradient Descent for Convex Minimization Problems with an Oracle Producing a (5, L)-Model of Function at the Requested Point // Comput. Math. and Math. Phys. - 2019 - Vol. 59, № 7 - P. 1085-1097.
[60] T. Lin, C. Jin, M. I. Jordan. Near-Optimal Algorithms for Minimax Optimization // arXiv preprint -2020. URL: https://arxiv.org/abs/2002.02417 (Date of access: 30.06.2025).
[61] Yu. Nesterov. Lectures on convex optimization // Springer - 2018 - 500 p.
[62] J. Zhang, M. Hong, S. Zhang. On Lower Iteration Complexity Bounds for the Saddle Point Problems // Mathematical Programming - 2021 - Vol. 189 - P. 1-35.
[63] O. Devolder, F. Glineur, Yu. Nesterov. First-order methods with inexact oracle: the strongly convex case // CORE Discussion Papers - 2013 - Vol. 2013016 - P. 47.
[64] L. T. K. Hien, R. Zhao, W. B. Haskell. An inexact primal-dual framework for large-scale non bilinear saddle point problem // Journal of Optimization Theory and Applications - 2023 - Vol. 197 - P. 1-30.
[65] Yu. Nesterov. Excessive Gap Technique in Nonsmooth Convex Minimization // SIAM J. Optim. -2005 - Vol. 16, № 1-P. 235-249.
[66] B. Cox, A. Juditsky, A. Nemirovski. Decomposition Techniques for Bilinear Saddle Point Problems and Variational Inequalities with Affine Monotone Operators // J. Optim. Theory Appl. - 2017 -Vol. 172 - P. 402-435.
[67] Y. Wang, J. Li. Improved Algorithms for Convex-Concave Minimax Optimization // arXiv preprint -2020. URL: https://arxiv.org/abs/2006.06359 (Date of access: 30.06.2025).
[68] W. Azizian, I. Mitliagkas, S. Lacoste-Julien, G. Gidel. A Tight and Unified Analysis of Extragradient for a Whole Spectrum of Differentiable Games // Proceedings of Machine Learning Research - 2020 -Vol. 108-P. 2863-2873.
[69] F. Stonyakin, A. Tyurin, A. Gasnikov, P. Dvurechensky, A. Agafonov, D. Dvinskikh, D. Pasech-nyuk, S. Artamonov, V. Piskunova. Inexact Relative Smoothness and Strong Convexity for Optimization and Variational Inequalities by Inexact Model // arXiv preprint - 2020. URL: https://arxiv.org/abs/2001.09013 (Date of access: 30.06.2025).
[70] Y. T. Lee, A. Sidford, S. C. Wong. A faster cutting plane method and its implications for combinatorial and convex optimization //2015 IEEE 56th Annual Symposium on Foundations of Computer Science - 2015 - P. 1049-1065.
[71] J. M. Danskin. The theory of Max-Min, with applications // J. SIAM Appl. Math. - 1966 - Vol. 14, № 4-P. 1-25.
[72] F. S. Stonyakin, M. S. Alkousa, A. A. Titov, V. V. Piskunova. On Some Methods for Strongly Convex Optimization Problems with One Functional Constraint // MOTOR 2019 - 2019 - P. 82-96.
[73] A. Nemirovski. Information-based complexity of convex programming // Lecture Notes - 1995 -100 p.
[74] I. V. Emelin, M. A. Krasnosel'skii. The stoppage rule in iterative procedures of solving ill-posed problems // Autom. Remote Control - 1979 - Vol. 39 - P. 1783-1787.
[75] C. Shen, H. Li. On the Dual Formulation of Boosting Algorithms // IEEE Transactions on - 2010 -Vol. 32, № 12-P. 2216-2231.
[76] A. Gasnikov. Searching equilibriums in large transport network // arXiv preprint - 2016. URL: https://arxiv.org/abs/1607.03124 (Date of access: 30.06.2025).
[77] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio. Generative adversarial networks // NeurIPS - 2014 - P. 1-9.
[78] A. Ben-Tal, A. Nemirovski. Lectures on Modern Convex Optimization (Lecture Notes) // Personal web-page of A. Nemirovski - 2015 - 300 p.
[79] A. Conn, K. Scheinberg, L. Vicente. Introduction to Derivative-Free Optimization // Society for Industrial and Applied Mathematics - 2009 - 400 p.
[80] A. Risteski, Y. Li. Algorithms and matching lower bounds for approximately-convex optimization // Advances in Neural Information Processing Systems - 2016 - Vol. 29 - P. 4745-4753.
[81] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio. Generative adversarial networks // Communications of the ACM - 2020 - Vol. 63, № 11 - P. 139-144.
[82] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, A. C. Courville. Improved training of wasserstein gans // Advances in Neural Information Processing Systems - 2017 - Vol. 30 - P. 1-10.
[83] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, A. Vladu. Towards deep learning models resistant to adversarial attacks // International Conference on Learning Representations - 2018 - P. 1-15. URL: https://openreview.net/pdf?id=rJzIBfZAb (Date of access: 30.06.2025).
[84] K. Scaman, F. Bach, S. Bubeck, Y. T. Lee, L. Massoulié. Optimal algorithms for smooth and strongly convex distributed optimization in networks // International Conference on Machine Learning -2017-P. 3027-3036.
[85] D. Jakovetic. A unification and generalization of exact distributed first-order methods // IEEE Transactions on Signal and Information Processing over Networks - 2018 - Vol. 5, № 1 - P. 31-46.
[86] H. Ye, L. Luo, Z. Zhou, T. Zhang. Multi-consensus decentralized accelerated gradient descent // Journal of Machine Learning Research - 2023 - Vol. 24, № 306 - P. 1-50.