A Comparison of Artificial Intelligence Algorithms
for Dynamic Power Allocation in Flexible High
Throughput Satellites
by
Juan Jose Garau Luis
Submitted to the Department of Aeronautics and Astronautics
in partial fulfillment of the requirements for the degree of
Master of Science in Aeronautics and Astronautics
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
May 2020
© Massachusetts Institute of Technology 2020. All rights reserved.
Author . . . .
Department of Aeronautics and Astronautics
May 13, 2020
Certified by . . . .
Prof. Edward F. Crawley
Professor of Aeronautics and Astronautics
Thesis Supervisor
Accepted by . . . .
Sertac Karaman
Associate Professor of Aeronautics and Astronautics
Chair, Graduate Program Committee
A Comparison of Artificial Intelligence Algorithms for
Dynamic Power Allocation in Flexible High Throughput
Satellites
by
Juan Jose Garau Luis
Submitted to the Department of Aeronautics and Astronautics on May 13, 2020, in partial fulfillment of the
requirements for the degree of
Master of Science in Aeronautics and Astronautics
Abstract
The Dynamic Resource Management (DRM) problem in the context of multibeam satellite communications is becoming more relevant than ever. The future landscape of the industry will be defined by a substantial increase in demand alongside the introduction of digital and highly flexible payloads able to operate and reconfigure hundreds or even thousands of beams in real time. This increase in complexity and dimensionality puts the spotlight on new resource allocation strategies that use autonomous algorithms at the core of their decision-making systems. These algorithms must be able to find optimal resource allocations in real or near-real time. Traditional optimization approaches no longer meet all these DRM requirements and the research community is studying the application of Artificial Intelligence (AI) algorithms to the problem as a potential alternative that satisfies the operational constraints.
Although multiple AI approaches have been proposed in recent years, most of the analyses have been conducted under assumptions that do not entirely reflect the requirements of the new operation scenarios, such as near-real-time performance or high dimensionality. Furthermore, little work has been done to thoroughly compare and characterize the performance of different algorithms. This Thesis considers the Dynamic Power Allocation problem, a DRM subproblem, as a use case and compares nine different AI algorithms under the same near-real-time operational assumptions, using the same satellite and link budget models, and four different demand datasets. The study focuses on Genetic Algorithms (GA), Simulated Annealing (SA), Particle Swarm Optimization (PSO), Deep Reinforcement Learning (DRL), and hybrid approaches, including a novel DRL-GA hybrid.
The comparison considers the following characteristics: time convergence, continuous operability, scalability, and robustness. After evaluating the algorithms’ performance on the different test scenarios, three algorithms are identified as potential candidates to be used during real satellite operations. The novel DRL-GA implementation shows the best overall performance and is also the most robust. When the update period is on the order of seconds, DRL is identified as the best algorithm, since it is the fastest. Finally, when the online data substantially diverges from the training dataset of the DRL algorithm, both DRL and the DRL-GA hybrid might not perform adequately, and an individual GA might be the best option instead.
Thesis Supervisor: Prof. Edward F. Crawley Title: Professor of Aeronautics and Astronautics
Acknowledgments
This Thesis has not been written under normal circumstances. Right now the world is suffering the social, economic, and structural effects of the COVID-19 global pandemic. My first thoughts of gratitude go to all healthcare workers on the front line and people carrying out essential tasks that keep our society moving. I also appreciate the efforts that my advisor, the department of Aeronautics and Astronautics, and MIT have made to facilitate my work during these months.
I would like to sincerely thank my advisor, Prof. Edward Crawley, and Dr. Bruce Cameron for their support, advice, and guidance throughout these past two years. Ed, Bruce, I really appreciate your honest input and your commitment to making this research project succeed. It is truly an honor to work side by side with you.
I would also like to thank everyone else who has been part of this research project. First, I really appreciate the support received from SES, especially the feedback and motivation from Joel Grotz and Valvanera Moreno. Next, I would like to express my gratitude to my labmate Markus Guerster, with whom I have shared many moments of joy during the project. Markus, working together on this project has been a rewarding learning experience; I sincerely wish you all the best in your future endeavors. I also appreciate the valuable input and help received from Dr. Kalyan Veeramachaneni. I would like to acknowledge the rest of my labmates who have also been part of this project at some point during these two years. Nils, Damon, Rubén, and Skylar, you have been a source of inspiration and stimulating discussions; best of luck on your next steps.
The rest of my labmates at the Engineering Systems Lab have been a key factor to succeed in this first part of my graduate studies. Sydney, Matt, Alex, Anne, Eric, Beldon, George, Tommy, Katie, Michael, thank you for your constant encouragement and for all the fun times together. I would also like to show my appreciation to my former labmate, Íñigo del Portillo, for his valuable advice and honest guidance during my first year at the lab. I do not want to miss the chance to say thank you for the invaluable administrative support I always get from Amy Jarvis, Beth Marois, and
Ping Lee, and the counseling received from Suraiya Baluch.
Going through these two years would not have been possible without the support of my roommate Marc de Cea. Marc, I feel very fortunate to share the MIT adventure with you. I am also deeply grateful to my friends for their continuous doses of joy and fun throughout these two years: María, Inés, Helena, Ximo, Alex, Dani, Álvaro, Reus, Íñigo, Ondrej, Lukas, Regina, Faisal. Also, being part of Spain@MIT has been so much fun!
I also want to thank the tremendous support received from my family and friends in Spain. Big thanks to my parents, Ana and Simón, for encouraging me to work hard and aim high. Also, thanks to my grandparents, godparents, aunts, uncles, cousins and the rest of my family for motivating me and constantly checking in. I am also deeply grateful to my closest friends – you know who you are – who always show me that distance is nothing when it comes to our friendship.
Finally, I would like to dedicate this Thesis to my late grandmother Antònia, who passed away shortly before it was completed. Thank you for everything you have done for me, pradina.
Contents
1 Introduction
1.1 Motivation
1.2 General Objectives
1.3 Literature Review
1.4 Specific Objectives
1.5 Thesis Overview
2 Dynamic Power Allocation in Multibeam Satellites
2.1 Introduction
2.2 Dynamic Resource Management
2.3 Multibeam Satellite Communications Systems
2.3.1 Overview
2.3.2 Dynamic Resource Management in Multibeam Satellites
2.3.3 Artificial Intelligence for the DRM problem in satellite communications
2.4 Dynamic Power Allocation Problem
2.4.1 Power Allocation and Transmission
2.4.2 Problem Statement
2.4.3 Objective Metrics
3 Algorithm Implementations
3.1 Introduction
3.2.1 Genetic Algorithm
3.2.2 Simulated Annealing
3.2.3 Particle Swarm Optimization
3.3 Deep Reinforcement Learning
3.4 Hybrid Algorithms
3.4.1 SA-GA Hybrid
3.4.2 PSO-GA Hybrid
3.4.3 DRL-GA Hybrid
4 Simulation Models
4.1 Introduction
4.2 Satellite Model
4.3 Demand Models
4.4 Link Budget Model
5 Results
5.1 Introduction
5.2 Convergence Analyses
5.3 Continuous Operation Performance
5.4 Scalability Analyses
5.5 Robustness Analyses
5.5.1 Sequential Activation
5.5.2 Spurious Events
5.5.3 Non-stationarity
5.5.4 Conclusions on robustness
6 Conclusions
6.1 Thesis Summary
6.2 Main Findings
6.3 Future Work
A Additional Figures
A.1 Convergence Analyses
A.2 Continuous Operation
A.3 Scalability Analyses
A.4 Robustness Analyses
B Metric details
B.1 Satisfaction-Gap Measure
List of Figures
1-1 Data rate provided by a slightly flexible system (blue) and a highly flexible system (green) with respect to the requested data rate (red). The amount of resource savings corresponds to the area between the green and the blue curves.
2-1 Multibeam satellite with 7 beams.
3-1 DRL Architecture.
4-1 Normalized aggregated demand plot for the four scenarios considered.
5-1 Average aggregated power and 95% confidence interval against computing time available. Power is normalized with respect to the optimal aggregated power. Reference scenario used.
5-2 Average aggregated UD and 95% confidence interval against computing time available. UD is normalized with respect to the aggregated demand. Reference scenario used.
5-3 Aggregated power delivered by every algorithm during the continuous execution simulations. Power is normalized with respect to the optimal aggregated power (optimal power is 1). Reference scenario used.
5-4 Aggregated UD achieved by every algorithm during the continuous execution simulations. UD is normalized with respect to aggregated demand (optimal UD is 0). Reference scenario used.
5-5 Average power and UD performance per algorithm on the Reference dataset. Standard deviation for each metric is shown as one of the semiaxes of the ellipses.
5-6 Average aggregated power against number of beams. Power is normalized with respect to the optimal aggregated power. Reference scenario used.
5-7 Average aggregated UD against number of beams. UD is normalized with respect to the aggregated demand. Reference scenario used.
5-8 Aggregated power delivered in a continuous execution using the Sequential activation dataset. Power is normalized with respect to the optimal aggregated power.
5-9 Aggregated UD achieved in a continuous execution using the Sequential activation dataset. UD is normalized with respect to the aggregated demand.
5-10 Average power and UD performance per algorithm on the Sequential activation dataset. Standard deviation for each metric is shown as one of the semiaxes of the ellipses.
5-11 Aggregated power delivered in a continuous execution using the Spurious dataset. Power is normalized with respect to the optimal aggregated power.
5-12 Aggregated UD achieved in a continuous execution using the Spurious dataset. UD is normalized with respect to the aggregated demand.
5-13 Average power and UD performance per algorithm on the Spurious dataset. Standard deviation for each metric is shown as one of the semiaxes of the ellipses.
5-14 Aggregated power delivered in a continuous execution using the Non-stationary dataset. Power is normalized with respect to the optimal aggregated power.
5-15 Aggregated UD achieved in a continuous execution using the Non-stationary dataset. UD is normalized with respect to the aggregated demand.
5-16 Average power and UD performance per algorithm on the Non-stationary dataset. Standard deviation for each metric is shown as one of the semiaxes of the ellipses.
A-1 Average aggregated SGM and 95% confidence interval against computing time available for the SA algorithm. Reference scenario used.
A-2 Aggregated power delivered by every algorithm during the continuous execution simulations. Power is normalized with respect to the optimal aggregated power (optimal power is 1). Reference scenario used.
A-3 Aggregated UD achieved by every algorithm during the continuous execution simulations. UD is normalized with respect to aggregated demand (optimal UD is 0). Reference scenario used.
A-4 Average power and UD performance per algorithm on the Reference dataset. Standard deviation for each metric is shown as one of the semiaxes of the ellipses.
A-5 Aggregated power delivered by each metaheuristic algorithm for the scalability test using 200 beams. Power is normalized with respect to the optimal aggregated power (optimal power is 1). Reference scenario used.
A-6 Aggregated UD achieved by each metaheuristic algorithm for the scalability test using 200 beams. UD is normalized with respect to aggregated demand (optimal UD is 0). Reference scenario used.
A-7 Aggregated power delivered by each metaheuristic algorithm for the scalability test using 400 beams. Power is normalized with respect to the optimal aggregated power (optimal power is 1). Reference scenario used.
A-8 Aggregated UD achieved by each metaheuristic algorithm for the scalability test using 400 beams. UD is normalized with respect to aggregated demand (optimal UD is 0). Reference scenario used.
A-9 Aggregated power delivered by each metaheuristic algorithm for the scalability test using 1000 beams. Power is normalized with respect to the optimal aggregated power (optimal power is 1). Reference scenario used.
A-10 Aggregated UD achieved by each metaheuristic algorithm for the scalability test using 1000 beams. UD is normalized with respect to aggregated demand (optimal UD is 0). Reference scenario used.
A-11 Aggregated power delivered by each metaheuristic algorithm for the scalability test using 2000 beams. Power is normalized with respect to the optimal aggregated power (optimal power is 1). Reference scenario used.
A-12 Aggregated UD achieved by each metaheuristic algorithm for the scalability test using 2000 beams. UD is normalized with respect to aggregated demand (optimal UD is 0). Reference scenario used.
A-13 Aggregated power delivered in a continuous execution using the Sequential activation dataset. Power is normalized with respect to the optimal aggregated power.
A-14 Aggregated UD achieved in a continuous execution using the Sequential activation dataset. UD is normalized with respect to the aggregated demand.
A-15 Aggregated power delivered in a continuous execution using the Spurious dataset. Power is normalized with respect to the optimal aggregated power.
A-16 Aggregated UD achieved in a continuous execution using the Spurious dataset. UD is normalized with respect to the aggregated demand.
A-17 Aggregated power delivered in a continuous execution using the Non-stationary dataset. Power is normalized with respect to the optimal aggregated power.
A-18 Aggregated UD achieved in a continuous execution using the Non-stationary dataset. UD is normalized with respect to the aggregated demand.
List of Tables
2.1 List of algorithms and their respective optimization metrics.
3.1 GA Parameters.
3.2 SA Parameters.
3.3 PSO Parameters.
3.4 DRL Parameters.
4.1 Link Budget Parameters.
5.1 Aggregated Power and UD results for each algorithm. Power and UD are normalized with respect to the optimal aggregated power and aggregated demand, respectively.
5.2 Aggregated Power and UD results for each algorithm, using the Sequential Activation dataset. Power and UD are normalized with respect to the optimal aggregated power and aggregated demand, respectively.
5.3 Aggregated Power and UD results for each algorithm, using the Spurious dataset. Power and UD are normalized with respect to the optimal aggregated power and aggregated demand, respectively.
5.4 Aggregated Power and UD results for each algorithm, using the Non-stationary dataset. Power and UD are normalized with respect to the optimal aggregated power and aggregated demand, respectively.
Chapter 1
Introduction
1.1 Motivation
In the coming years, competitiveness in the satellite communications market will be largely driven by the operators’ ability to automate part of their systems’ key processes, such as capacity management or telemetry analysis. Companies will rely on autonomous engines to make decisions about their operation policies in order to adapt to faster and larger changes in their customer pools, and to better manage their systems’ efficiency [10].
Two trends set up this new scenario: a shift from static communications payloads to highly-flexible payloads [2] and an increasing demand for data [48]. The former responds to the recent improvements in multibeam satellite technology, where the number of beams and individually-configurable parameters in orbit is growing exponentially – the power, frequency, routing, pointing direction, and sizing of each beam will be individually tunable in real time. The latter reflects the growing need for data transmission through satellite links, since services such as guaranteeing connectivity in isolated regions or providing streaming capabilities on planes and ships are becoming more common – aeronautical connectivity grew by $400M and passenger aircraft retail revenues reached $1B in 2018; in-flight connectivity is expected to reach $36B in cumulative revenue by 2028 [47].
order to operate on tighter margins, as seen in Figure 1-1. When shifting from a nonflexible or slightly flexible system (blue in the figure) to a highly flexible resource allocation strategy (green in the figure), the same data rate demand (red in the figure) can be served using fewer resources. These resource savings constitute additional capacity that can be used to accommodate new users in the system. This shift will be key to remaining competitive in the new markets.
Figure 1-1: Data rate provided by a slightly flexible system (blue) and a highly flexible system (green) with respect to the requested data rate (red). The amount of resource savings corresponds to the area between the green and the blue curves.
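As a rough numerical illustration of the caption above, the resource savings can be estimated by integrating the gap between the two provided-rate curves over time. The sketch below applies the trapezoidal rule to made-up samples; the function name `area_between` and every number in it are hypothetical and are not taken from the thesis data.

```python
def area_between(times, upper, lower):
    """Trapezoidal estimate of the area between two sampled curves."""
    if not (len(times) == len(upper) == len(lower)):
        raise ValueError("all series must have the same length")
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        gap_prev = upper[i - 1] - lower[i - 1]
        gap_curr = upper[i] - lower[i]
        total += 0.5 * (gap_prev + gap_curr) * dt
    return total


# Hypothetical samples (hours, Gbps); illustrative only.
hours = [0, 6, 12, 18, 24]
slightly_flexible = [10.0, 10.0, 10.0, 10.0, 10.0]  # "blue": provisioned for the peak
highly_flexible = [4.0, 6.0, 10.0, 8.0, 4.0]        # "green": follows the demand

savings = area_between(hours, slightly_flexible, highly_flexible)
print(f"estimated savings: {savings} Gbps*hours")
```

In this toy setting the two curves only touch at the demand peak, so all of the savings come from the off-peak hours.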
On top of the flexibility and demand increase trends, an increase in hardware scalability also contributes to the complexity of future satellite systems. Multibeam technology is expected to improve, and constellations are assumed to be able to sustain hundreds to thousands of active spot beams simultaneously. Examples include SpaceX’s 4,425-satellite LEO constellation with up to 32 beams per satellite and SES’s O3b mPower MEO constellation, consisting of 7 satellites able to power thousands of beams each [32]. In both cases, Terabit-level capacity is expected to be offered [14][59]. This scenario entails that, when it comes to efficiently defining operation policies that exploit the flexibility of communications satellites, a larger number of decisions will be necessary – new flexible parameters and more beams – and these decisions will be reevaluated often – dynamically – due to increasing fluctuations in demand. This is
known as the Dynamic Resource Management (DRM) problem.
The DRM problem is common across many industries. Multiple studies emphasize the importance of finding efficient resource allocation policies in domains such as supply chain [72], vehicle-to-vehicle communications [68], car parking management [71], cloud computing resource assignment [67], and air transportation optimization [5]. System scalability and the introduction of new hardware technologies that allow better resource control have motivated the study and design of resource optimization algorithms that form the basis of new decision-making pipelines in these industries. The performance of these algorithms is therefore evaluated on two levels: optimality and runtime. The former is the ability to allocate resources as efficiently as possible given a certain context. The latter determines how frequently the allocation process can be repeated, and therefore how quickly the system adapts to changes in the environment. If a company or institution uses an algorithm that produces a poor allocation within the maximum computing time available, its competitiveness may suffer substantially relative to companies using faster or better algorithms.
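The interplay between optimality and runtime can be pictured with a generic "anytime" optimizer that keeps improving a solution until its computing budget expires and then returns the best point found so far. The sketch below is a plain stochastic local search on a toy cost function, assuming a wall-clock budget in seconds; `anytime_search`, `toy_cost`, `toy_neighbor`, and `TARGET` are hypothetical names, and the search is not one of the algorithms compared in this Thesis.

```python
import random
import time


def anytime_search(cost, start, neighbor, budget_s, rng):
    """Improve `start` until the time budget expires; return the best point found."""
    best, best_cost = start, cost(start)
    deadline = time.perf_counter() + budget_s
    evaluations = 0
    while time.perf_counter() < deadline:
        candidate = neighbor(best, rng)
        candidate_cost = cost(candidate)
        evaluations += 1
        if candidate_cost < best_cost:  # keep only strict improvements
            best, best_cost = candidate, candidate_cost
    return best, best_cost, evaluations


# Toy problem (hypothetical): match a 5-element target vector in [0, 1].
TARGET = [0.2, 0.5, 0.1, 0.9, 0.4]


def toy_cost(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))


def toy_neighbor(x, rng):
    """Perturb one randomly chosen coordinate and clip it to [0, 1]."""
    i = rng.randrange(len(x))
    y = list(x)
    y[i] = min(1.0, max(0.0, y[i] + rng.uniform(-0.1, 0.1)))
    return y


rng = random.Random(0)
start = [1.0] * 5
for budget in (0.001, 0.05):
    _, cost_found, n = anytime_search(toy_cost, start, toy_neighbor, budget, rng)
    print(f"budget={budget}s evaluations={n} cost={cost_found:.4f}")
```

A larger budget allows more evaluations and typically a lower final cost, but it also limits how often the allocation can be refreshed, which is the trade-off described above.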
In the specific case of the DRM problem in multibeam satellites, while traditional approaches were based on static, human-controlled policies that relied on conservative operational margins, new systems will make use of dynamic algorithms capable of handling the increasing dimensionality and the rapidly-changing nature of the user demand [31]. Consequently, these systems must be prepared to make quick decisions on multiple parameters per beam across thousands of active spot beams. These decisions will be taken simultaneously and repeatedly, responding to every change in the constellation’s or satellite’s environment. This is especially relevant in the case of High Throughput Satellites (HTS), which are able to provide around 20 times the throughput of classic FSS satellites [45], and therefore usually serve more customers – a higher variation in demand.
Multiple algorithm families have already been proposed as possible alternatives to current human decision-making processes in the satellite communications industry. Specifically, in recent years, the spotlight has been placed on Artificial Intelligence
(AI) algorithms given their success in other areas [1]. These algorithms range from evolutionary approaches to Machine Learning-based methods. While one can find DRM studies in the communications domain that emphasize the potential of each algorithm, there has been little effort to characterize and compare these methods under the same operationally-realistic premises. It is important to thoroughly characterize the different algorithm alternatives in order to pick the one that offers the greatest competitive advantage. To study their suitability for the upcoming resource allocation challenges, this Thesis tries to close this research gap by comparing the most recent and popular AI algorithms on an instance of the DRM problem in the context of multibeam communications satellites.
1.2 General Objectives
The most important objective of this Thesis is to offer the reader insight into how different AI algorithms perform on a specific but generalizable DRM problem, set in a multibeam communications satellite context, that is high-dimensional and must be solved under short runtimes. This work aims to provide a complete comparison such that the key conclusions can be extrapolated to other problems, in the same or different domains, that share features such as high dimensionality or near-real-time control.
In order to compare a set of methods, first it is necessary to find and rigorously characterize the problem under which those are compared. Consequently, another objective of this Thesis is to identify and formulate a suitable DRM problem to be used as a comparative benchmark and provide a clear problem statement. As part of this process, important test scenarios that will help to quantify performance need to be identified.
Finally, this Thesis also provides details on how different AI algorithms are implemented and how parameters are selected in order to best adapt to the problem in question. Although parameter tuning is a process intimately linked to each specific problem, this work aims to contrast algorithms with respect to implementations found in literature and motivate the use of specific parameters or subroutines.
1.3 Literature Review
The problem of allocating resources for multibeam communication satellites, especially in an offline fashion, is a well-studied NP-hard [4] and non-convex [8] problem. Mathematical Programming (MP), which includes well-known subdomains such as Linear Programming, Integer Optimization, or Convex Optimization, has been a popular method for years to deal with highly-constrained instances of these resource allocation problems. Examples include the use of MP-based algorithms for beam layout optimization [6], for power allocation [66], for spectrum allocation [35], and even joint power and bandwidth allocation [39]. To deal with the complexities of the problem, these studies need to rely on relaxations such as piecewise linearization or coordinate descent optimization that guarantee convexity or solve the problem in an often suboptimal iterative fashion.
The increasing dimensionality present in the industry brings up new problems with more optimization variables. Consequently, in addition to the nature of the problem, scalability adds a new layer of complexity. The availability of computing resources is already critical to MP algorithms, but if the dimensionality of the problems increases, having MP algorithms as the core of these online and fast optimization tools is inefficient and impractical. As a reference, in [6] it is shown that, with a timeout of 600 seconds, an MP-based algorithm is able to find optimal beam layouts for 200 user terminals, but starts failing to do so when the number of user terminals reaches 370. The best layout shows a gap of 150% with respect to the best theoretical outcome for 800 user terminals. It is therefore difficult to envision this kind of algorithm working in near-real time.
AI is a potential solution to overcome issues like exponential computing cost or complex non-linear solution spaces. Metaheuristic algorithms [64] are a popular group of AI algorithms that have been thoroughly studied in the context of DRM subproblems for satellite communications. In [4] a metaheuristic algorithm known as Genetic Algorithm (GA) is used to dynamically allocate power in a 37-beam satellite and is compared to other metaheuristic approaches. A similar GA formulation is
extended to include bandwidth optimization in [50], which demonstrates the advantages of dynamic allocation based on two 37-beam and 61-beam (Viasat-1) scenarios. Joint power and carrier allocation is also proposed in [42], where a two-step heuristic optimization is carried out iteratively for an 84-beam use case. In time-dependent problems, beam task scheduling has been addressed using the GA, with positive results for a 20-beam system [33]; the GA has also been applied to beam hopping problems, optimizing the illumination schedule for different time slots in a 100-beam satellite [3].
Other authors have opted for approaches based on the Simulated Annealing (SA) algorithm. In [8] a single-objective and discrete power and bandwidth allocation problem was formulated using SA and applied to a 200-beam case, with a focus on fairness. Fairness is a relevant topic that has also been considered in [66], where an iterative dual algorithm is proposed to optimally and fairly allocate power in a 4-beam setting. SA also appears frequently in hybrid approaches, such as in [65], where it is combined with a GA to optimize channel assignment in a 64-cell use case, or in [56], where it is used in combination with a simple neural network. A prior hybrid optimization stage combining the GA and SA is also considered in [4]. SA is also proposed as one of the components of a constrained beam layout greedy optimization approach [7], where it is responsible for the reflector allocation decision-making process and is validated with a 150-beam use case.
Besides population-based and annealing approaches, swarm-based methods have also been applied to the DRM problem in the context of satellite communications. In [18] the Particle Swarm Optimization (PSO) algorithm is applied to a 16-beam satellite in order to optimally allocate power. Then, in [49] a PSO-based power allocation algorithm is implemented and tested on 200-beam use cases. This last work also studies the improvements of adding a subsequent GA stage and forming a PSO-GA hybrid algorithm.
While the performance of the presented methods is compared to equal-allocation, heuristic-based, or stochastic approaches, authors only consider the offline performance when presenting their results and do not account for runtime thresholds that specific scenarios might impose. In other words, none of these studies show the results
of continuously applying the respective methods in a dynamic and time-changing environment where there is limited computing time and resources available, nor do they provide the performance metrics as aggregations over complete operation cycles. Therefore, the suitability of these algorithms for online scenarios has yet to be fully tested and contrasted.
As a potential alternative for repeated uses of the optimization tool, some studies have recently focused on Machine Learning algorithms, specifically Deep Reinforcement Learning (DRL) architectures, which address the need for fast online performance. In [22], the authors use a DRL architecture to control a communication channel in real time considering multiple objectives. DRL has also been applied to multibeam scenarios, as in [37], where a Deep Q-network was used to carry out channel allocation in a 37-beam scenario. Then, in [25] a continuous DRL architecture is used to allocate power in a 30-beam HTS, showing a 1,300-fold speed increase with respect to a comparable GA approach. Finally, in [38] DRL is used to carry out beam-hopping tasks in a 10-beam and 37-cell scenario, showing a stable performance throughout a 24-hour test case.
Prior to the increase of research interest in the DRL field, authors had relied on neural networks as a way of creating hybrid algorithm architectures in combination with heuristic or metaheuristic algorithms. Two examples of this kind of combination are the use of multiple perceptrons together with convergence heuristics to carry out spectrum allocation in multisatellite systems [24], and the adoption of a recurrent neural network plus a GA to improve the same task [55]. Compared to these basic structures, DRL achieves an additional level of abstraction, as it is able to improve the resource allocation performance with respect to its supervised learning counterparts. It is no surprise that it has become an active field of research in the communications community [43].
The cited works show that DRL architectures can reach good solutions and exploit the temporal and spatial dependencies of the DRM problem. However, most of the test cases focus on optimality and leave other important features, such as robustness, out of the studies. A lack of robustness in a
DRL architecture might lead to undesirable allocations in cases in which the input does not match the average behaviour of the environment or the system has not been well-trained. The fact that these algorithms must go through a prior training stage raises the question of how a DRL training process should be designed in order to result in robust DRL architectures ready for operation.
1.4 Specific Objectives
As seen in the previous section, there are multiple papers that address one or more dimensions of the DRM problem for multibeam satellite communications. In all cases, the studies focus on specific resources to optimize and allocate, such as power, bandwidth, or beam placement. Designing a method or algorithm able to handle several of these optimization variables simultaneously and repeatedly is still a work in progress in the community.
The results obtained so far show that, apart from the lack of comparative analyses in most of the studies presented, two important modeling decisions that reflect the future of satellite communications are not adequately addressed. First, while the majority of papers highlight the adaptive nature of the algorithms, there are no results on their performance under continuous execution, in which the algorithms must be applied repeatedly and computing time is a limiting factor. Second, few of the studies explore the scalability of their approach. Those that do stop at use cases of up to 200 beams, and none explores systems in the range of thousands of beams. While the expected dimensionality of the problem lies in the range of hundreds to thousands of beams, the performance of the algorithms has mostly been assessed, on average, for scenarios with fewer than 100 beams.
This Thesis considers the four AI algorithms introduced in the literature review – GA, SA, PSO, and DRL – as a focus of the comparison. In addition, since hybrid algorithms have proved their usefulness for instances of resource allocation problems, this work also considers the two hybrids introduced in the Literature Review – SA-GA and PSO-SA-GA – alongside a novel implementation of a DRL-SA-GA hybrid. This
Thesis presents an implementation for each of these algorithms and analyzes their performance on the same use case and scenarios under a realistic operational context.
The power allocation problem is used as the use case of the comparison, given that it is one of the most studied resource allocation problems for satellite communications. To make it a full instance of the DRM problem, this Thesis specifically considers the Dynamic Power Allocation (DPA) problem, with allocation decisions revisited every 3 minutes. In the context of this problem, the main specific objectives addressed by this Thesis are:
1. To provide a formulation of several of the most well-studied AI algorithms in the literature for solving the DPA problem.
2. To implement a novel DRL-GA hybrid approach as a baseline for the DPA problem.
3. To compare and contrast the performance of the considered algorithms under the same satellite models and on a set of different test scenarios to account for robustness analyses.
4. To characterize the online performance of the algorithms given computing time restrictions.
5. To provide scalability and robustness performance results.
This Thesis focuses on the DPA problem for simplicity and to reduce the amount of uncertainty around the results. Given the current state of the research in the field, the complexity of the full-scale DRM problem would add a noise layer when determining which algorithms work better, since finding a good representation for the complete problem is still a work in progress in the research community. Using the DPA problem as a baseline, the intended objective is to offer the reader a comprehensive and fair comparison of these methods in order to guide the algorithm downselection process for future instances, possibly more complex, of the DRM problem in the context of communication satellites and other fields.
If the reader is interested in finding a robust implementation for another DRM problem, the nature or structure of such problem should be carefully analyzed first.
Other DRM problems in the satellite communications domain, such as dynamic bandwidth allocation, or in other domains, such as CPU allocation in data centers, have a similar nature to that of the DPA problem and therefore might share several features, such as the constraints-to-variables ratio. In those cases, the structure of the algorithms could be mapped from one problem to the other without large modifications, and the algorithm that performed best on the DPA problem would likely keep a similar standing relative to the other algorithms. This would potentially reduce the cost of implementing, tuning, and characterizing multiple algorithms for the second problem.
In cases in which the nature of the second DRM problem is not aligned with the DPA problem, the downselection process is less immediate. For these situations, this Thesis aims to explain why each algorithm performs better or worse relative to the others given the context and the input data. These differences depend on the optimization procedure of each algorithm. It is the intention of the author that these insights allow the reader to rank the algorithms in terms of their suitability to the second problem, although modifying some parts of the implementations or testing more than one algorithm might be necessary.
1.5
Thesis Overview
The remainder of this document is structured as follows: Chapter 2 explains the generic DRM problem concept, puts it in the context of satellite communications and describes the problem statement and the formulation used throughout the rest of this work; Chapter 3 presents the implementations for each of the nine algorithms studied in this Thesis alongside the main algorithm parameters to allow for reproducibility; Chapter 4 covers the satellite, data rate demand, and link budget models considered in the simulations and used to compute the performance metrics; Chapter 5 discusses the performance of the algorithms in terms of their convergence behavior, online operation results, scalability, and robustness against multiple scenarios; and finally Chapter 6
summarizes the work and outlines the conclusions and future research directions as a result of this Thesis.
Chapter 2
Dynamic Power Allocation in
Multibeam Satellites
2.1
Introduction
This chapter addresses the Dynamic Resource Management (DRM) problem in the context of multibeam communications satellites. Specifically, it outlines and focuses on the Dynamic Power Allocation (DPA) problem as a use case for the purpose of this Thesis.
To that end, Section 2.2 starts with a general overview of DRM problems and highlights their main challenges. Then, Section 2.3 puts the spotlight on satellite communications systems and introduces the peculiarities of the DRM problem in this context. To close the chapter, the minutiae of the DPA problem are presented in Section 2.4. These include a synopsis of the power allocation mechanisms in multibeam communications satellites, a detailed problem formulation with the assumptions considered, and an overview of the different objective functions that will be used throughout the rest of the Thesis.
2.2
Dynamic Resource Management
Resource Management problems consist in the division and assignment of a limited supply known as the resource to a set of users or systems that make use of it under certain restrictions. A unit or part of this resource can be exclusive, i.e. only one user can utilize it, or non-exclusive, when a finite or infinite number of users are able to exploit it. Exclusive examples include stock management in storage facilities or the cargo-vehicle assignment operations of a transportation company. On the other hand, spectrum allocation in frequency reuse systems and vehicle routing are examples of non-exclusive cases.
Generally, for every resource allocation decision, one incurs a cost and obtains a benefit from it. Then, the goal of the problem is to jointly maximize the total benefit and minimize the total cost, which can be defined in multiple ways. All decisions are taken according to this goal. For instance, a transportation company might want to maximize the revenues per unit of distance travelled and therefore vehicles and cargo would be paired according to this objective.
In all the examples presented, the users’ resource demands might vary over time due to temporal effects, such as seasonality or contingency scenarios. Resource management problems are usually time-dependent and, consequently, multiple instances of the problem should be considered to account for these time variations. One way to be robust against time variations is to make the resource allocation decisions at a single time instant such that the resource demands at all subsequent moments are also fulfilled, assuming the future demand, or an estimate of the maximum possible demand, is perfectly known.
However, when the future demand is uncertain, there is a need to adapt to the time variations and constantly reconsider the resource allocation decisions made in past instances of the problem. This is known as the Dynamic Resource Management (DRM) problem, where time-dependency plays a relevant role. Usually, following these adapting strategies also leads to a better fulfillment of the goal compared to the worst-case estimations, by increasing the benefits and/or reducing the cost (e.g.
there is no need to rent the same storage space every week for a product that is highly seasonal on a yearly basis).
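As a minimal illustration of the difference between worst-case and dynamic allocation, the following Python sketch compares both policies on a made-up weekly storage demand. The demand values and variable names are hypothetical and are not taken from any dataset used in this Thesis. The dynamic policy additionally assumes that the demand of each week is known when the decision for that week is made.

```python
# Hypothetical weekly demand for storage space (arbitrary units); illustrative values only.
weekly_demand = [20, 25, 90, 30, 15, 10]

# Static (worst-case) policy: reserve the peak demand every week.
static_allocation = [max(weekly_demand)] * len(weekly_demand)

# Dynamic policy: revisit the decision every week and reserve only what is needed.
# Assumption: the demand of a week is known when that week's decision is made.
dynamic_allocation = list(weekly_demand)

static_cost = sum(static_allocation)
dynamic_cost = sum(dynamic_allocation)

# Both policies serve all the demand; they differ in how much resource they use.
unmet_static = sum(max(d - a, 0) for d, a in zip(weekly_demand, static_allocation))
unmet_dynamic = sum(max(d - a, 0) for d, a in zip(weekly_demand, dynamic_allocation))

print("static cost:", static_cost, "dynamic cost:", dynamic_cost)
```

In this toy case both policies leave no demand unserved, but the static policy reserves the peak amount in every week, which is the inefficiency that motivates the dynamic formulation.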
2.3
Multibeam Satellite Communications Systems
This Thesis focuses on the DRM problem in the context of satellite communications, where the goal is to successfully provide communication service to multiple users or customers on Earth by means of satellite coverage. At any given moment, a user has a certain throughput demand, which then changes over time. To satisfy that varying demand, a satellite has a pool of resources such as power and bandwidth. This section addresses that scenario and presents, first, the main features of communication satellites and, second, the specific nuances of the DRM problem in this context.
2.3.1
Overview
A communications satellite is a complex system that is primarily designed to provide communications service to user terminals through space-borne links. Services such as monitoring cargo ships in the middle of the ocean, providing streaming capabilities in airplanes, or internet connectivity in remote areas, rely on satellite communications. Generally, the users serviced by these systems have limited or no access at all to other communications infrastructure, such as terrestrial links.
A satellite system is composed of three different segments: the space segment, the ground segment, and the control segment. The first segment comprises all spacecraft and inter-satellite links involved in the communication process. These spacecraft are placed in a given orbit chosen from a wide range of available LEO, MEO, and GEO orbits. The ground segment includes all user terminals and ground stations that participate in the information transmission process. Finally, the control segment consists of all stations that manage and monitor the satellites. The space and ground segments are connected through two types of links: the uplinks, which support the communication from Earth stations to spacecraft; and the downlinks, which do the same from spacecraft to Earth stations. A satellite can establish multiple uplinks and
downlinks simultaneously, each associated with a radio frequency-modulated carrier. A carrier is the signal transmitted to (downlink) or from (uplink) a specific user terminal.
A satellite is composed of the payload and the platform. The former comprises all the hardware and antennas that sustain the carrier transmission, while the latter consists of all the subsystems that allow the payload to operate (e.g. electric power supply, Telemetry, Tracking and Control (TT&C)). The payload incorporates one or multiple antennas, which support the creation of one or multiple beams, each with the capacity to carry multiple carriers. A beam covers an area on Earth’s surface called its footprint. A multibeam satellite comprises many of these beams, from tens to thousands, to provide coverage to multiple different regions, as seen in Figure 2-1. The narrower a beam is, the better it can serve the terminals within its footprint, since it can better concentrate the transmitting power on a specific area. The communication effectiveness of a terminal also depends on the relative position of the user inside the beam’s footprint: the closer the user is to the center of the beam, the less power the satellite needs to serve that user.
Figure 2-1: Multibeam satellite with 7 beams.
spectrum, the International Telecommunication Union (ITU) is responsible for creating regulations that states must adopt in their efforts to exploit space communications technology. Among these regulations, the ITU establishes the allowed frequency bands for the uplinks and downlinks of different radiocommunications services.
2.3.2
Dynamic Resource Management in Multibeam Satellites
A satellite user requires a service from the satellite in the form of a specific data rate or throughput. To successfully serve all of its users, i.e. provide at least the required throughput to each user, a multibeam communications satellite has multiple available resources, which are mostly defined by the hardware architecture of its payload and the regulatory restrictions. In addition to the beams, which constitute individual resources per se, a satellite also possesses spectrum and power resources which have to be shared among all active beams. Before a satellite can transmit information to a user, satellite operators need to allocate enough of these power and spectrum resources to the beam that covers the user.
Given the location of multiple users on Earth, a multibeam satellite generates a certain number of spot beams such that the union of all of their footprints covers all the locations of interest. Operators must carefully decide how many beams are required in the process. In the case that all beams are equal in size and shape, operators deal with a minimum coverage problem, which imposes a lower bound on the number of required beams. An appropriate maximum number of beams can also be estimated this way, since placing too many beams can lead to interference problems during operations. If the payload allows flexibility in the size and shape of the beams, those can be exploited to create better coverage configurations with a wider range in the possible number of beams (e.g. one architecture might have a large number of narrow beams while a different one might completely cover users with a few wide beams).
After choosing an appropriate coverage configuration, operators need to address the frequency plan decisions: how much bandwidth and which part of the spectrum each beam should use. A satellite has an available spectrum pool, limited by ITU restrictions, to split among its beams. While ideally one would want to maximize
bandwidth usage, an excessive allocation of bandwidth might lead to interference and information loss problems. Additionally, some systems incorporate frequency reuse mechanisms in their payloads, thus increasing the ability to exploit and make full use of the available spectrum.
Once a specific beam has been placed at a location with one or more users and a sufficient amount of bandwidth has been assigned to it, the payload needs to provide enough power such that the received data rate at the user’s terminal is equal to or greater than the one requested. In summary, four different satellite resources are involved in the user-beam communication process: the beam’s position relative to the user, the shape of the beam, the frequency assignment to that beam, and finally the power allocated to initiate the transmission.
Although user demand is time-dependent (e.g. users might not require the same data rate during the morning and the evening), in the past most of these decisions were made only once, generally before launching the satellite. Payloads were fixed by design or allowed very little flexibility, so the resource management problem was addressed a single time and the payload was then configured to accommodate the decisions made. User demand was known in advance and operators could account for the worst-case demand requirements (peak demand) to efficiently make the resource decisions.
However, in recent years, with the introduction of modern digital payloads, satellites have seen a significant increase in their flexibility. It is expected that in the next generation of multibeam satellites, in addition to an increase in the number of active spot beams being sustained simultaneously (from tens to thousands), the beam placement, frequency, and power resources will allow for reallocation in real time. In addition, the satellite communications market is seeing an increase in the number of users, especially those in the data segment, which is expected to double by 2025 [48].
Thanks to the new payload flexibility, operators no longer need to account for the worst-case scenario, which limits their Service Level Agreements (SLA) when designing a payload and its resource management policies. Instead, they are able to reconfigure the payload in orbit and adapt to the changing user pool and user needs, resulting in a
more efficient use of their systems. However, this adaptation comes at the expense of a continuous recomputing of the payload’s resource allocation decisions. Operators now face the problem of making and reconsidering their allocation decisions at a chosen time frequency in order to maximize the efficiency of their systems. This is known as the DRM problem for multibeam satellite systems [31]. For instance, one might want to reduce the transmitted power for beams covering users in nighttime longitudes, since the throughput demand is reduced. By minimizing the use of resources to satisfy the demand needs, operators can successfully accommodate new demand requests in their user pools and thus the increase in the number of users can be well-handled without the need to launch extra satellites.
In the past, making resource allocation decisions was straightforward, since the number of variables was small due to the limited flexibility and the small number of active spot beams. With the addition of a large set of tunable variables per beam, the optimization problem involved no longer has an immediate solution. The complexity of the problem has three dimensions:
∙ NP-hardness: Since it is not guaranteed that the DRM problem can be solved in polynomial time [4], exhaustive search methods that explore the whole solution space require massive amounts of computing resources, especially when dimensionality is high.
∙ Non-convexity: Given the relationship between the optimization variables, the problem is not convex [8] and therefore classic convex optimization methods and solvers cannot be used in this context without involving relaxations that lead to suboptimal solutions. Nevertheless, some specific subproblems of the DRM problem in the context of satellite communications do have convex formulations, although this Thesis focuses on the need to address the non-convex nature of the whole problem.
∙ Multiobjectiveness: The problem consists not only of serving all users satisfactorily, but also of minimizing the amount of resources used when doing so. At the same time, this minimization task can be framed in many ways: minimizing the number of beams used, minimizing spectrum needs, or reducing power consumption.
On top of these challenges, high-dimensionality and real-time operation constraints add an extra computational burden. Being able to solve the problem in near real-time ensures the maximization of efficiency, but involves an additional use of computing resources due to the large number of optimization variables, mainly as a consequence of the large number of beams modern payloads are able to sustain.
2.3.3
Artificial Intelligence for the DRM problem in satellite
communications
Given the nature of the DRM problem for satellite communications, it is necessary that the resource allocation decisions are transferred to autonomous or semi-autonomous systems that are capable of handling the task in near-real time using powerful DRM algorithms, as opposed to manual or simple rule-based approaches that need to rely on additional resource usage to guarantee user service and/or are hard to deploy in near-real time systems. Still, finding an appropriate algorithm is a challenging task in itself, since most of the well-established optimization techniques can be hard to implement for the reasons outlined in the previous section. As a consequence, a significant amount of research effort has targeted Artificial Intelligence (AI) methods as a potential solution to overcome these issues.
AI is a large field that involves ideas from philosophy, mathematics, neuroscience, decision theory, computer science, and probability. It focuses on the design and implementation of systems that can “think” and act rationally, without the need of a constant human supervision [54]. AI comprises a wide range of disciplines including logic-based decision-making, Probabilistic Reasoning, and Machine Learning (ML). Throughout the years, AI has proved its usefulness in numerous real-world domains as well as in advanced computer simulations. This supports the potential of AI-based algorithms to solve the DRM problem for satellite communications. AI algorithms could provide a solution to the optimization problem which is then translated into a
specific resource allocation setting.
As discussed in Section 1.3, most of the recent work on AI-based solutions for the DRM problem focuses on two algorithm categories:
∙ Metaheuristics: Optimization algorithms that consist of the iterative improvement of a non-optimal solution or set of solutions by means of heuristics. Examples of these include Genetic Algorithms, Ant Colony Optimization, or Simulated Annealing. Given an instance of a DRM problem, these algorithms produce an initial “bad” solution or set of solutions and improve them iteratively, obtaining a close-to-optimal solution after a certain number of iterations.
∙ Machine Learning: Inference-based algorithms that, after a training stage involving the use of known or “seen” data, perform a specific task without the need of human supervision. Neural Networks or Support Vector Machines are examples of algorithms that belong in this category. These algorithms would process several instances of resource allocation optimization problems, alongside their solutions, and “learn” an underlying optimization function. Then, when faced with an unseen problem instance, the algorithms would be able to provide a solution following their learned function.
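To make the iterative-improvement idea behind metaheuristics concrete, the sketch below shows a generic Simulated Annealing loop applied to a toy quadratic cost. It is an illustrative assumption and not the implementation used in this Thesis (those are presented in Chapter 3). The function names, default parameters, and the cost function are hypothetical.

```python
import math
import random

def simulated_annealing(cost, x0, step=0.5, t0=1.0, cooling=0.95, iters=500, seed=0):
    """Generic Simulated Annealing sketch: improve an initial solution iteratively."""
    rng = random.Random(seed)
    x, fx = list(x0), cost(x0)
    best, fbest = list(x), fx
    temp = t0
    for _ in range(iters):
        # Heuristic move: perturb one randomly chosen variable.
        cand = list(x)
        i = rng.randrange(len(cand))
        cand[i] += rng.uniform(-step, step)
        fc = cost(cand)
        # Always accept improvements; accept worse candidates with a probability
        # that shrinks as the temperature decreases.
        if fc <= fx or rng.random() < math.exp((fx - fc) / temp):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = list(x), fx
        temp *= cooling
    return best, fbest

def toy_cost(x):
    """Hypothetical cost with its minimum at x = (3, 3, 3)."""
    return sum((xi - 3.0) ** 2 for xi in x)

initial = [0.0, 0.0, 0.0]  # deliberately "bad" starting solution
best, fbest = simulated_annealing(toy_cost, initial)
print("initial cost:", toy_cost(initial), "best cost found:", fbest)
```

The loop never needs gradients or convexity, only the ability to evaluate the cost of a candidate, which is why this family of methods is attractive for non-convex problems. The price is that the number of iterations, and therefore the computing time, must be budgeted explicitly.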
On one hand, metaheuristic algorithms have been widely applied to DRM problems mostly involving the optimization of power and/or frequency variables. Examples include the use of Genetic Algorithms [4][50], Particle Swarm Optimization [18][49], and Simulated Annealing [8]. These algorithms have been shown to reach close-to-optimal solutions, but the majority of test cases have not accounted for dimensionality, since almost all the studies use satellite simulations with fewer than a hundred beams. The need to achieve near real-time performance has not been addressed by these studies either. The contribution of these algorithms in a real satellite operation scenario, or in a DRM problem from another domain, remains unclear.
On the other hand, since most ML algorithms rely on function approximators known as neural networks, whose evaluation is fast once trained, they are able to operate in near real-time. However, there are few dimensionality studies that support their applicability to cases with hundreds or thousands of beams. In addition, these algorithms are greatly affected by the characteristics of the training data, and therefore risk performing poorly when the input data diverges from the nature of the training dataset. This uncertainty in their robustness needs to be further addressed in order to characterize and approve them for a real DRM problem in the satellite communications context.
Although AI algorithms are a promising component of autonomous DRM engines that operate in near real-time, there is still little understanding of the implications each of the methods studied in the literature would have in a real deployment. Furthermore, the results suggest that operators might need to consider different algorithms and decide which algorithm or group of algorithms to use in each context. Without a comprehensive and in-depth comparison of the capabilities of each of these methods, the selection and implementation of an algorithm involves an excessive amount of trial and error, slowing down progress in the field. This problem is shared across multiple domains in which DRM is a central issue.
While the complete DRM problem for multibeam satellite communications has not been solved (there is currently no algorithm implementation that can handle all resource flexibilities in near real-time), this Thesis tries to offer a complete comparison procedure that helps downselect the set of most promising algorithms. To that end, the Dynamic Power Allocation (DPA) problem is chosen as a supporting example throughout the rest of this Thesis. Since the optimality criteria are clear for this specific problem, i.e. there is a good understanding of the tradeoffs between the different optimization variables, the analyses can focus on the specific benchmarking tasks across the different algorithms considered. In the following section, the specifics of the DPA problem formulation are presented.
2.4
Dynamic Power Allocation Problem
The power allocation problem in multibeam communications satellites consists of, given a data rate demand for every beam and a fixed setting to the rest of resources,
choosing how much power should be allocated to each beam. The goal of the operator is to carry out this task using a policy that allows serving every user while minimizing the use of the power of the system. This way, the power efficiency is maximized and the operator can serve a larger number of new users when following this policy.
The particularity of the Dynamic Power Allocation problem is the need to carry out the power allocation task in a continued manner, constantly updating the time-dependent demand requirements of the users and adapting the power allocation mechanisms of the system accordingly. This section first describes the equations that govern the power transmission subsystems and then focuses on the specific problem statement that will be followed for the rest of this Thesis.
2.4.1
Power Allocation and Transmission
Given an active spot beam with a fixed position and shape, and with a predefined frequency band to transmit in, the allocation of a certain power level to that beam entails that a certain data rate will be provided to a user located on its footprint. Multiple elements are involved in and affect this process, all of them defined in what is known as the link budget. This term comprises the set of equations that govern a communication link between a transmitter and a receiver. For the specific case of a downlink, the beam’s antenna is the transmitter and the user terminal the receiver. The link budget equations make it possible to determine the data rate at the receiver, $R_b$, given a certain amount of power, $P_{TX}$, allocated at the transmitter.
While this section covers the basics, an in-depth explanation of the details of the link budget elements can be found in [44]. The link’s carrier-to-noise ratio, $C/N_0$, expresses the ratio of the carrier power at the receiver over the noise power spectral density at the receiver and is defined as
$$\frac{C}{N_0} = P_{TX} - \mathrm{OBO} + G_{TX} + G_{RX} - L - 10 \cdot \log_{10}(k \cdot T_{sys}) \quad \text{[dB]} \qquad (2.1)$$
where OBO is the power-amplifier output back-off, $G_{TX}$ and $G_{RX}$ are the transmitting and receiving antenna gains, respectively, $k$ is the Boltzmann constant, and $T_{sys}$ is the system’s temperature. $L$ represents the sum of all the losses involved in the communication process, being
$$L = \mathrm{FSPL} + L_{atm} + L_{RF_{TX}} + L_{RF_{RX}} \quad \text{[dB]} \qquad (2.2)$$
where FSPL indicates the free-space path losses, $L_{atm}$ the atmospheric losses, and $L_{RF_{TX}}$ and $L_{RF_{RX}}$ the transmitting and receiving radiofrequency chain losses, respectively.
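Equations (2.1) and (2.2) can be evaluated directly, as in the following minimal Python sketch. The function names and all numeric link values are hypothetical and chosen only for illustration. The sketch also assumes that the transmit power is expressed in dBW and that the system temperature is given as an input rather than computed.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant k

def total_losses_db(fspl_db, l_atm_db, l_rf_tx_db, l_rf_rx_db):
    """Eq. (2.2): sum of all link losses, in dB."""
    return fspl_db + l_atm_db + l_rf_tx_db + l_rf_rx_db

def carrier_to_noise_density_db(p_tx_dbw, obo_db, g_tx_db, g_rx_db, losses_db, t_sys_k):
    """Eq. (2.1): C/N0 in dB, assuming the transmit power is given in dBW."""
    noise_density_db = 10 * math.log10(BOLTZMANN_J_PER_K * t_sys_k)
    return p_tx_dbw - obo_db + g_tx_db + g_rx_db - losses_db - noise_density_db

# Hypothetical link values, for illustration only.
losses = total_losses_db(fspl_db=210.0, l_atm_db=0.5, l_rf_tx_db=1.0, l_rf_rx_db=0.5)
cn0 = carrier_to_noise_density_db(p_tx_dbw=20.0, obo_db=3.0, g_tx_db=50.0,
                                  g_rx_db=40.0, losses_db=losses, t_sys_k=200.0)
print(f"L = {losses:.1f} dB, C/N0 = {cn0:.2f} dB")
```

Because every term is expressed in decibels, a 1 dB increase in transmit power translates into a 1 dB increase in $C/N_0$ when all other terms are held fixed, which is the lever the power allocation algorithms act on.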
The system temperature $T_{sys}$ (K) from Eq. (2.1) is computed using the Friis formula
$$T_{sys} = T_{ant} \cdot 10^{-L_{RF_{RX}}/10} + T_{atm} \cdot 10^{(L_{RF_{RX}} + L_{atm})/10} + T_w \cdot \left(1 - 10^{-L_{RF_{RX}}/10}\right) \qquad (2.3)$$
where $T_{ant}$ is the receiving antenna temperature, $T_{atm}$ is the atmospheric temperature, and $T_w$ is the waveguide temperature.
Next, considering interference sources that add to the transmitted signal, the carrier-to-noise-plus-interference ratio 𝐶/(𝑁0+ 𝐼) is computed as
$$\frac{C}{N_0 + I} = \left( \frac{1}{CABI} + \frac{1}{CASI} + \frac{1}{CXPI} + \frac{1}{C3IM} + \frac{1}{C/N_0} \right)^{-1} \qquad (2.4)$$
where CABI is the Carrier to Adjacent Beam Interference, CASI is the Carrier to Adjacent Satellites Interference, CXPI is the Carrier to cross Polarization Interference, and C3IM is the Carrier to third order Inter-Modulation products interference. Since the use case and scenarios considered in the following chapters assume interference minimization (𝐼 ≃ 0) no further details on interference sources are provided in this section.
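The combination rule of Eq. (2.4) operates on linear ratios, so values given in decibels must be converted first. The sketch below illustrates this, assuming that all terms are expressed in consistent units. The function names and the numeric values are hypothetical.

```python
import math

def db_to_linear(x_db):
    return 10 ** (x_db / 10)

def linear_to_db(x):
    return 10 * math.log10(x)

def carrier_to_noise_plus_interference_db(cn0_db, interference_db=()):
    """Eq. (2.4): combine C/N0 with carrier-to-interference terms (all in dB)."""
    inverse_sum = 1 / db_to_linear(cn0_db)
    inverse_sum += sum(1 / db_to_linear(ci_db) for ci_db in interference_db)
    return linear_to_db(1 / inverse_sum)

# With no interference terms (I ~ 0, as assumed in this Thesis) the result is C/N0.
no_interference = carrier_to_noise_plus_interference_db(100.0)

# One interference term as strong as the noise halves the ratio (about -3 dB).
one_equal_term = carrier_to_noise_plus_interference_db(100.0, [100.0])
print(round(no_interference, 3), round(one_equal_term, 3))
```

The weakest term dominates the sum of inverses, so a single strong interference source can cap the achievable ratio regardless of how much power is allocated.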
With the carrier-to-noise-plus-interference ratio, the bit-energy-to-noise-plus-interference ratio, $E_b/(N + I)$, is defined as
$$\frac{E_b}{N + I} = \frac{C}{N_0 + I} \cdot \frac{BW}{R_b} \qquad (2.5)$$
where $BW$ is the bandwidth allocated to the link. At the same time, the data rate can be computed as
$$R_b = \frac{BW}{1 + \alpha_r} \cdot \Gamma\left(\frac{E_b}{N + I}\right) \qquad (2.6)$$
where $\alpha_r$ is the roll-off factor and $\Gamma$ is a parametric function that represents the spectral efficiency of the modulation and coding scheme (MODCOD) (bps/Hz), given the value of $E_b/(N + I)$ itself.
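Equations (2.5) and (2.6) depend on each other, since the data rate determines the bit-energy ratio and the bit-energy ratio determines the usable spectral efficiency. One possible way to resolve this, sketched below, is to test each MODCOD and keep the fastest one whose requirement is met. This is an illustrative assumption rather than the procedure used in this Thesis. The MODCOD table is made up, and the carrier-to-noise-plus-interference term is treated as a dimensionless linear ratio.

```python
import math

# Hypothetical MODCOD table: (required Eb/(N+I) in dB, spectral efficiency in bps/Hz).
# These values are illustrative and do not correspond to any real standard.
EXAMPLE_MODCODS = [(1.0, 0.5), (2.0, 1.0), (4.0, 2.0), (7.0, 3.0)]

def data_rate_bps(cni_linear, bw_hz, roll_off, modcods=EXAMPLE_MODCODS):
    """Return the highest data rate whose MODCOD requirement is satisfied."""
    best_rate = 0.0
    for required_ebn_db, efficiency in modcods:
        rate = bw_hz / (1 + roll_off) * efficiency   # Eq. (2.6) for this MODCOD
        ebn_linear = cni_linear * bw_hz / rate       # Eq. (2.5)
        if 10 * math.log10(ebn_linear) >= required_ebn_db:
            best_rate = max(best_rate, rate)
    return best_rate  # 0.0 means no MODCOD in the table closes the link

rate_high = data_rate_bps(cni_linear=10.0, bw_hz=100e6, roll_off=0.1)
rate_low = data_rate_bps(cni_linear=1.0, bw_hz=100e6, roll_off=0.1)
print(rate_high / 1e6, rate_low / 1e6)  # data rates in Mbps
```

Under these assumptions the data rate is a step function of the allocated power: increasing power only helps when it moves the link past the threshold of the next MODCOD.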
2.4.2
Problem Statement
This Thesis considers a multibeam High Throughput GEO Satellite with 𝑁𝑏
non-steerable beams. These beams are already pointed to a location and have a defined shape. In addition, each beam is allocated a certain amount of spectrum beforehand such that interference among beams is minimized and can be ignored. A sequence of timesteps {1, ..., 𝑇 } represents all instants in which the satellite is requested a certain throughput demand per beam, i.e. there needs to be a power allocation decision per beam. The goal of the problem is to, at every timestep, allocate a sufficient amount of power to each beam in order to satisfy the demand and constraints imposed by the system while minimizing resource consumption.
At a given timestep 𝑡, the demand requested at beam 𝑏 is represented by 𝐷𝑏,𝑡.
Likewise, the power allocated to beam 𝑏 at timestep 𝑡 and the data rate attained when doing so are denoted as 𝑃𝑏,𝑡 and 𝑅𝑏,𝑡, respectively (the notation 𝑃𝑏,𝑡 is used, instead of
the notation from Section 2.4.1, to refer to the transmitting power 𝑃𝑇 𝑋 of beam 𝑏 at
timestep 𝑡). As seen in the previous section, there is an explicit dependency between the data rate achieved and the power allocated to a particular beam.
It is assumed that the satellite, at any moment, has a total available power of $P_{tot}$. Similarly, every beam $b$ has a maximum power constraint, denoted by $P_b^{max}$.
Apart from maximum total and individual beam power constraints, some payloads are further limited by some of their subsystems, such as power amplifiers. This work assumes the satellite is equipped with 𝑁𝑎 power amplifiers, with 𝑁𝑎 ≤ 𝑁𝑏. Every
more than one amplifier. These connections are given and cannot be changed during operation. The amplifiers also impose a maximum power constraint: the sum of the power allocated to a group of beams connected to amplifier $a$ cannot exceed a certain amount $P_a^{max}$.
Taking all the constraints into account, the problem is formulated as follows
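$$
\begin{aligned}
\min_{P_{b,t}} \quad & \sum_{t=1}^{T} f(\mathcal{P}_t, \mathcal{D}_t) && && (2.7) \\
\text{s.t.} \quad & P_{b,t} \leq P_b^{max}, && \forall b \in \mathcal{B},\ \forall t \in \{1, ..., T\} && (2.8) \\
& \sum_{b=1}^{N_b} P_{b,t} \leq P_{tot}, && \forall t \in \{1, ..., T\} && (2.9) \\
& \sum_{b \in a} P_{b,t} \leq P_a^{max}, && \forall a \in \mathcal{A},\ \forall t \in \{1, ..., T\} && (2.10) \\
& \gamma_m(R_{b,t}) \geq \gamma_M, && \forall b \in \mathcal{B},\ \forall t \in \{1, ..., T\} && (2.11) \\
& P_{b,t} \geq 0, && \forall b \in \mathcal{B},\ \forall t \in \{1, ..., T\} && (2.12)
\end{aligned}
$$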
where ℬ and 𝒜 represent the set of beams and amplifiers of the satellite, respectively; and 𝒟𝑡, 𝒫𝑡, and ℛ𝑡 denote the set of throughput demand, power allocated, and data
rate attained per beam at timestep 𝑡, respectively.
This formulation includes the constraints presented so far: on one hand, constraints (2.8) and (2.12) represent the upper and lower bounds of the power for each beam in ℬ at any given timestep, respectively. On the other hand, constraints (2.9) and (2.10) express the limitations imposed by the satellite’s and amplifiers’ maximum power, respectively. Then, constraint (2.11) refers to the need of achieving a certain link-margin per beam greater than 𝛾𝑀, which depends on the data rate attained, as will
be described in Section 4.4. Finally, the objective (2.7) is a function of the requested data rate and the power allocated that reflects the goal of the problem: successfully serving all users while minimizing the power resource consumption. Multiple specific objective formulations can be considered to represent this goal; this Thesis focuses on three different metrics extracted from the literature, which constitute the objective functions of the algorithms compared.
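The power constraints (2.8), (2.9), (2.10), and (2.12) can be checked for a single timestep with a few lines of code, as in the minimal sketch below. The function name, the dictionary that maps amplifiers to beam indices, and the numeric values are hypothetical. The link-margin constraint (2.11) is left out because it depends on the models described in Section 4.4.

```python
def satisfies_power_constraints(power, p_max_beam, p_tot, amp_groups, p_max_amp):
    """Check constraints (2.8), (2.9), (2.10), and (2.12) for one timestep.

    power:      list with the power allocated to each beam
    p_max_beam: list with the maximum power of each beam
    p_tot:      total power available on the satellite
    amp_groups: dict mapping an amplifier id to the indices of its beams
    p_max_amp:  dict mapping an amplifier id to its maximum power
    """
    if any(p < 0 for p in power):                          # (2.12)
        return False
    if any(p > p_max for p, p_max in zip(power, p_max_beam)):  # (2.8)
        return False
    if sum(power) > p_tot:                                 # (2.9)
        return False
    for amp, beams in amp_groups.items():                  # (2.10)
        if sum(power[b] for b in beams) > p_max_amp[amp]:
            return False
    return True

# Hypothetical payload with four beams and two amplifiers.
p_max_beam = [25.0, 25.0, 25.0, 25.0]
amp_groups = {0: [0, 1], 1: [2, 3]}
p_max_amp = {0: 35.0, 1: 30.0}

ok = satisfies_power_constraints([10.0, 20.0, 5.0, 15.0], p_max_beam, 60.0,
                                 amp_groups, p_max_amp)
too_much = satisfies_power_constraints([10.0, 30.0, 5.0, 15.0], p_max_beam, 60.0,
                                       amp_groups, p_max_amp)
print(ok, too_much)
```

A check of this kind is cheap compared with evaluating the link budget, so it can be applied to every candidate allocation before the objective is computed.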
2.4.3
Objective Metrics
Regarding the objective (2.7), 𝑓 (𝒫𝑡, 𝒟𝑡), different metrics have been used across works on power allocation algorithms for multibeam satellites. Three objective metrics are chosen in this Thesis, all of them based on the power allocation at a given timestep 𝑡. First, the Total Power (TP) is introduced as a measure of resource consumption. This metric simply sums the power allocated to every beam:
\begin{equation}
TP_t = \sum_{b=1}^{N_b} P_{b,t} \tag{2.13}
\end{equation}
The second metric is the Unmet Demand (UD), defined as the portion of the demand that is not served given the power allocation. This metric has already been used in numerous DRM studies [4][25][50][53] and is formulated as
\begin{equation}
UD_t = \sum_{b=1}^{N_b} \max\left[D_{b,t} - R_{b,t}(P_{b,t}),\ 0\right] \tag{2.14}
\end{equation}
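Equations (2.13) and (2.14) can be evaluated directly once the data rate per beam is known. The short Python sketch below assumes that the rates 𝑅𝑏,𝑡(𝑃𝑏,𝑡) have already been obtained from the link budget model and are passed in as a plain list; the function names and the numbers are illustrative only.

```python
from typing import Sequence


def total_power(power: Sequence[float]) -> float:
    """TP_t, Eq. (2.13): sum of the power allocated to every beam."""
    return sum(power)


def unmet_demand(demand: Sequence[float], rate: Sequence[float]) -> float:
    """UD_t, Eq. (2.14): demand left unserved, summed over all beams.

    `rate` holds R_{b,t}(P_{b,t}), assumed to be precomputed elsewhere.
    """
    return sum(max(d - r, 0.0) for d, r in zip(demand, rate))


# Illustrative numbers only
power = [10.0, 20.0, 5.0]
demand = [100.0, 250.0, 80.0]
rate = [120.0, 200.0, 80.0]

print(total_power(power))          # 35.0
print(unmet_demand(demand, rate))  # 50.0
```

Note that the first beam is over-served by 20 units, but the max operator in Eq. (2.14) prevents this surplus from offsetting the 50 units missing in the second beam.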
Finally, the third metric is the Satisfaction-Gap Measure (SGM), which is used as the objective function in [8]. The SGM is based on the demand and data rate per beam (𝐷𝑏,𝑡 and 𝑅𝑏,𝑡, respectively) and, following a series of transformations, measures the mismatch with respect to an ideal allocation case, ensuring fairness among beams. The SGM takes values in the interval [0, 1], with 1 indicating the best possible performance. A detailed description of how this metric is computed can be found in Appendix B.1. Because Eq. (2.7) is posed as a minimization, the quantity minimized is the negative SGM.
For each algorithm, the metric or metrics that show the best convergence towards an optimal solution are used as objective functions. Table 2.1 summarizes the relationship between the algorithms and the metrics used. While GA and PSO allow for multi-objective implementations, SA and DRL are generally single-objective approaches and can therefore use only one metric. Specifically, in the case of DRL a linear combination of the TP and UD is considered, as explained in Section 3.3.
Table 2.1: List of algorithms and their respective optimization metrics.
Algorithm    Optimization type    Metric(s)
GA           Multi-objective      TP and UD
SA           Single-objective     SGM
PSO          Multi-objective      TP and UD
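The difference between the multi-objective and single-objective settings can be sketched in a few lines of Python. The example below assumes that multi-objective candidates are compared through Pareto dominance, a common choice that is not confirmed here as the one used in this Thesis, and that a single-objective method collapses TP and UD into one scalar through a weighted sum. The weights are hypothetical placeholders; the actual combination used for DRL is the one explained in Section 3.3.

```python
from typing import Tuple

Objectives = Tuple[float, float]  # (TP, UD), both to be minimized


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if `a` is no worse than `b` in both metrics and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def weighted_sum(obj: Objectives, w_tp: float, w_ud: float) -> float:
    """Scalarize (TP, UD) with hypothetical weights w_tp and w_ud."""
    return w_tp * obj[0] + w_ud * obj[1]


low_power = (30.0, 60.0)   # less power, more unmet demand
high_power = (35.0, 50.0)  # more power, less unmet demand

# Neither candidate dominates the other, so a multi-objective method keeps both
print(dominates(low_power, high_power), dominates(high_power, low_power))  # False False

# A scalarized objective must rank them, and the ranking depends on the weights
print(weighted_sum(low_power, 1.0, 1.0), weighted_sum(high_power, 1.0, 1.0))  # 90.0 85.0
```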
Chapter 3
Algorithm Implementations
3.1
Introduction
After presenting the details of the DRM problem for satellite communications and the precise DPA problem formulation that constitutes the test case of this Thesis, this Chapter covers the implementation minutiae of the algorithms compared.
First, Section 3.2 presents the three metaheuristic algorithms (GA, SA, and PSO) individually. Then, Section 3.3 introduces the main features of the DRL architecture used and the considerations involving the use of two different types of neural networks, which constitute two separate algorithms for comparison purposes. Finally, Section 3.4 explains the concepts behind each of the hybrid algorithms considered, namely SA-GA, PSO-GA, and the two hybrid versions of DRL-GA (one with each neural network).
3.2
Metaheuristics
Metaheuristic algorithms are a class of optimization algorithms that prove useful for problems with non-linear, complex search spaces, especially when time plays a relevant role [29]. While achieving optimality is generally hard for these algorithms, they provide “good enough” solutions in admissible amounts of time. Metaheuristics span a wide range of approaches and can be classified, e.g., as nature-inspired versus non-nature-inspired, or as population-based versus single-point search [64].
This Thesis considers three of the most popular metaheuristic algorithms, which have already been applied to the power allocation problem. These are the Genetic Algorithm (GA), the Simulated Annealing (SA) method, and the Particle Swarm Optimization (PSO) approach. The GA and PSO are population-based algorithms while SA is a single-point search method. All three algorithms are nature-inspired.
3.2.1
Genetic Algorithm
A GA [36] is a population-based metaheuristic optimization algorithm inspired by the natural evolution of the species. The algorithm operates with a set of solutions to the optimization problem known as a population. Each single solution is called an individual. Iteratively, the procedure combines different solutions (crossing) to generate new individuals and then selects the fittest ones (in terms of the goodness of the solution with respect to the objective function of the problem) to keep a constant population size. On top of that, some individuals might undergo mutations, which change the solutions associated with them. This is done to avoid focusing on only one area of the search space and to keep exploring potentially better zones. After a certain number of iterations, when the algorithm execution ends, the fittest individual is chosen as the best solution to the problem.
In the implementation chosen for the comparison, in the context of a GA execution at timestep 𝑡, an individual $x_{k,t}$ is defined as an array of power allocations $\{P^k_{1,t}, P^k_{2,t}, ..., P^k_{N_b,t}\}$, for $k \ge 1$. To fulfill constraints (2.8) and (2.12), these values are always kept between 0 and $P_b^{\max}$ for each beam 𝑏. The population is then composed of $N_p$ individuals. When new individuals are generated by the procedures described above, they are denoted as $x_{N_p+1,t}$, $x_{N_p+2,t}$, and so on.
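The procedure described above can be sketched as follows. This is a minimal single-objective illustration and not the implementation compared in this Thesis, which uses TP and UD in a multi-objective setting. The uniform crossover, the Gaussian mutation, the parameter values, and the toy linear rate model inside `toy_fitness` are all assumptions chosen to keep the example short and self-contained.

```python
import random
from typing import Callable, List, Sequence

Individual = List[float]  # x_{k,t} = [P^k_{1,t}, ..., P^k_{N_b,t}]


def clip(x: Individual, p_beam_max: Sequence[float]) -> Individual:
    """Keep every value within [0, P_b^max], as required by (2.8) and (2.12)."""
    return [min(max(p, 0.0), p_max) for p, p_max in zip(x, p_beam_max)]


def run_ga(
    fitness: Callable[[Individual], float],  # lower is better
    p_beam_max: Sequence[float],
    n_pop: int = 20,     # N_p, constant population size
    n_iter: int = 50,    # number of iterations
    p_mut: float = 0.1,  # per-beam mutation probability (assumed value)
    seed: int = 0,
) -> Individual:
    rng = random.Random(seed)
    # Random initialization within the per-beam bounds
    pop = [[rng.uniform(0.0, p_max) for p_max in p_beam_max] for _ in range(n_pop)]
    for _ in range(n_iter):
        children = []
        for _ in range(n_pop):
            mother, father = rng.sample(pop, 2)
            # Crossing: uniform crossover (assumed operator)
            child = [m if rng.random() < 0.5 else f for m, f in zip(mother, father)]
            # Mutation: Gaussian perturbation of some values (assumed operator)
            child = [
                g + rng.gauss(0.0, 0.1 * p_max) if rng.random() < p_mut else g
                for g, p_max in zip(child, p_beam_max)
            ]
            children.append(clip(child, p_beam_max))
        # Selection: keep the N_p fittest among parents and new individuals
        pop = sorted(pop + children, key=fitness)[:n_pop]
    return min(pop, key=fitness)


# Toy problem: linear rate model R = gain * P (NOT the link budget of this Thesis)
demand = [100.0, 250.0, 80.0]
gain = [10.0, 10.0, 10.0]
p_beam_max = [30.0, 30.0, 30.0]


def toy_fitness(x: Individual) -> float:
    ud = sum(max(d - g * p, 0.0) for d, g, p in zip(demand, gain, x))
    return ud + 0.1 * sum(x)  # hypothetical scalarization of UD and TP


best = run_ga(toy_fitness, p_beam_max)
print(best, toy_fitness(best))
```

Because the selection step keeps the fittest individuals among both parents and new individuals, the best fitness in the population never worsens from one iteration to the next. Clipping after mutation is one simple way to enforce the per-beam bounds; other repair strategies are possible.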
Depending on the case, the population is initialized randomly or the final population from the previous timestep execution is used as initial population for the subsequent timestep execution, i.e.