Eco-Vehicular Edge Networks for Connected Transportation: A Distributed Multi-Agent Reinforcement Learning Approach

 

Problem the paper addresses

In user-centric virtual cells (VCs), considering V2I communication, maximize energy efficiency (EE) through resource allocation while guaranteeing reliability, data rate, and user fairness.

The problem is solved with distributed multi-agent reinforcement learning.

Communication scenario


Let U denote the set of VUs (vehicle users), A the set of APs, and B the set of edge servers. The edge servers are connected to the cloud; their radio resources are limited, denoted W_l Hz. Full CSI is available at the cloud server, which can schedule the APs' beamforming weights.

A dedicated VC is formed to serve each user; the VU-AP association is expressed by the binary indicator

$$u_{i,j} = \begin{cases} 1, & \text{if AP } a_j \text{ serves VU } v_i \\ 0, & \text{otherwise} \end{cases}$$

V2I communication model

A multiple-input single-output (MISO) model is considered: each vehicle has a single antenna and each AP is equipped with multiple antennas.

Within one time slot, the channel follows quasi-static flat fading.

The channel between VU $v_i$ and (multi-antenna) AP $a_j$ is the vector $\mathbf{h}_{i,j}$, modeled as the product of large-scale fading, log-normal shadowing, and fast fading:

$$\mathbf{h}_{i,j} = \sqrt{\zeta_{i,j}\,\chi_{i,j}}\;\mathbf{g}_{i,j}$$

where $\zeta_{i,j}$ is the large-scale path loss, $\chi_{i,j}$ the log-normal shadowing, and $\mathbf{g}_{i,j} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I})$ the fast-fading component.
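As a sketch, one draw of this three-factor channel (path loss × log-normal shadowing × Rayleigh fast fading) can be generated as follows; all parameter values are illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def draw_channel(distance_m, n_antennas, pl_exponent=3.5, shadow_sigma_db=8.0):
    """One channel draw: large-scale path loss x log-normal shadowing x Rayleigh fading.
    pl_exponent and shadow_sigma_db are assumed values, not from the paper."""
    path_loss = distance_m ** (-pl_exponent)                     # large-scale attenuation
    shadowing = 10.0 ** (rng.normal(0.0, shadow_sigma_db) / 10)  # log-normal shadowing
    fast = (rng.standard_normal(n_antennas)                      # CN(0, I) fast fading
            + 1j * rng.standard_normal(n_antennas)) / np.sqrt(2)
    return np.sqrt(path_loss * shadowing) * fast                 # complex channel vector

h = draw_channel(distance_m=100.0, n_antennas=4)
```

Each slot would redraw the fast-fading factor while the large-scale terms change only with vehicle position, matching the quasi-static flat-fading assumption above.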

The beamforming weight from AP $a_j$ to VU $v_i$ is $\mathbf{w}_{i,j}$.

The signal transmitted by AP $a_j$ is $\mathbf{s}_j = \sum_{v_i} u_{i,j}\,\mathbf{w}_{i,j}\,x_i$, where $x_i$ is a unit-energy symbol.

The downlink received signal at VU $v_i$:

$$y_i = \sum_{a_j} u_{i,j}\,\mathbf{h}_{i,j}^{H}\mathbf{w}_{i,j}\,x_i + \sum_{v_k \neq v_i}\sum_{a_j} u_{k,j}\,\mathbf{h}_{i,j}^{H}\mathbf{w}_{k,j}\,x_k + n_i$$

User-centric cell formation

The achievable rate of VU $v_i$:

$$R_i = W_l \log_2\!\left(1 + \Gamma_i\right)$$

where $\Gamma_i$ is the SINR:

$$\Gamma_i = \frac{\Big|\sum_{a_j} u_{i,j}\,\mathbf{h}_{i,j}^{H}\mathbf{w}_{i,j}\Big|^2}{\sum_{v_k \neq v_i}\Big|\sum_{a_j} u_{k,j}\,\mathbf{h}_{i,j}^{H}\mathbf{w}_{k,j}\Big|^2 + \sigma^2}$$

The downlink power spent on VU $v_i$: $P_i = \sum_{a_j} u_{i,j}\,\|\mathbf{w}_{i,j}\|^2$

EE computation (bits/Joule): $\eta = \dfrac{\sum_{v_i} R_i}{\sum_{v_i} P_i}$
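A minimal numeric sketch of the two quantities, assuming the standard definitions (desired signal summed coherently over the VUs own VC, interference from beams aimed at other VUs, EE as sum rate over total consumed power; the circuit-power term is an illustrative addition):

```python
import numpy as np

def sinr(h, w, assoc, i, noise_power):
    """SINR of VU i. h[j][i]: channel from AP j to VU i; w[j][k]: beam of AP j
    for VU k; assoc[j][k] in {0, 1} is the VU-AP association indicator."""
    n_ap, n_vu = len(h), len(h[0])
    signal = abs(sum(assoc[j][i] * np.vdot(h[j][i], w[j][i])
                     for j in range(n_ap))) ** 2
    interference = sum(
        abs(sum(assoc[j][k] * np.vdot(h[j][i], w[j][k]) for j in range(n_ap))) ** 2
        for k in range(n_vu) if k != i
    )
    return signal / (interference + noise_power)

def energy_efficiency(rates_bps, tx_powers_w, circuit_power_w):
    """EE (bits/Joule) = achieved sum rate / total consumed power."""
    return sum(rates_bps) / (sum(tx_powers_w) + circuit_power_w)
```

With a single AP serving a single VU over a noiseless beam-aligned channel, the SINR reduces to signal power over noise, which makes the function easy to sanity-check.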

The joint optimization problem is formulated as

$$\max_{u,\,\mathbf{w}} \;\; \eta \qquad (7a)$$

i.e., find the VU-AP association and the beamforming weights, subject to: (7b) each VC contains at least one AP; (7c) each VU's SINR exceeds the threshold; (7d) the total transmit power is limited; (7e) the association variables are Boolean.

To make the problem tractable, the AP transmit power is discretized into K levels.

The beamforming vector is then parameterized by the discrete power level chosen on each VU-AP link, so the action space becomes finite; its size still grows exponentially with the number of APs (on the order of $K^{|A|}$ combinations).
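To illustrate why discretization makes the problem finite but still combinatorially large, suppose (hypothetically) each of the APs serving a VU picks one of K discrete power levels; the joint action count is then K raised to the number of APs:

```python
# Illustrative counting only; K, n_aps, and p_max are assumed values, not the paper's.
K = 4          # number of discrete transmit-power levels
n_aps = 3      # candidate APs in a virtual cell
p_max = 1.0    # maximum per-AP transmit power (watts)

# K evenly spaced levels in [0, p_max]; level 0 doubles as "do not transmit".
power_levels = [k * p_max / (K - 1) for k in range(K)]

# Each AP independently picks one level, so the joint action space is K^n_aps.
joint_action_count = K ** n_aps
```

Even these toy sizes give 64 joint actions; realistic AP counts are what motivate the distributed decomposition later in the note.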

Q-learning (QL) is used to solve the above optimization problem.

Solution: reinforcement learning

The state space comprises the VU positions, the AP positions, and the CSI of the links.

Action space: the VU-AP association and the beamforming vector (i.e., the chosen power levels).

Reward function: the achieved energy efficiency $\eta$.

To ensure fairness, the following restriction is imposed, which guarantees the agent never chooses an action that drives any user's SINR below the threshold:

$$\Gamma_i \ge \Gamma_{\mathrm{th}}, \qquad \forall\, v_i \in U$$
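A sketch of this reward shaping (assumed form: the agent earns the EE only when every VU clears the SINR threshold, so threshold-violating actions are never reinforced):

```python
def reward(sinrs, ee, sinr_threshold, penalty=0.0):
    """Return the energy efficiency as reward only if all VUs meet the SINR
    threshold; otherwise return the penalty. The penalty value is an assumption."""
    if all(s >= sinr_threshold for s in sinrs):
        return ee
    return penalty
```

Masking the reward this way is what couples fairness into an otherwise pure EE objective: any action that starves a single user yields no reward at all.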

SARL (Single Agent Reinforcement Learning)

The Q-learning update is:

$$Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\Big[r + \gamma \max_{a'} Q(s',a')\Big]$$
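The standard tabular form of this update, with α the learning rate and γ the discount factor (table sizes here are illustrative):

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a'))."""
    Q[s][a] = (1 - alpha) * Q[s][a] + alpha * (r + gamma * max(Q[s_next]))
    return Q[s][a]

Q = [[0.0, 0.0], [0.0, 0.0]]   # 2 states x 2 actions, zero-initialized
q_update(Q, s=0, a=1, r=1.0, s_next=1)
```

After one step with reward 1.0 from a zero-initialized table, Q[0][1] moves to α·r = 0.1, showing how the estimate creeps toward the discounted return.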

SARL becomes intractable when the state and action spaces are large. Moreover, even when the environment's states are approximated by a finite set, the agent must still pick a suitable action from the full action space for every approximated state. The D-MARL algorithm is introduced to address this.

Distributed Multi-Agent RL (D-MARL)

In general, MARL has each agent choose actions for its own local state, which shrinks the action space each agent must search.

For the N agents in the scenario, the joint Q-table factorizes into N per-agent tables, and each agent's action space shrinks accordingly (roughly from $K^{N}$ joint actions down to $K$ per agent).

In addition, a centralized vector Q stores, for each state, the best macro action, i.e., the concatenation of every agent's locally best action.
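A minimal sketch of this decomposition, under assumed shapes (N agents each keep a local Q-table over shared states and a small local action set, and a centralized vector records each state's best joint action):

```python
N_AGENTS, N_STATES, N_ACTIONS = 3, 5, 4   # illustrative sizes, not the paper's

# One small Q-table per agent instead of one joint table over N_ACTIONS**N_AGENTS actions.
q_tables = [[[0.0] * N_ACTIONS for _ in range(N_STATES)] for _ in range(N_AGENTS)]

def best_joint_action(state):
    """Centralized vector entry: concatenation of every agent's locally best action."""
    return [max(range(N_ACTIONS), key=lambda a: q_tables[n][state][a])
            for n in range(N_AGENTS)]

# The centralized store: one best joint action per state.
central_q = [best_joint_action(s) for s in range(N_STATES)]

joint_size_sarl = N_ACTIONS ** N_AGENTS   # 64 joint actions if one agent chose everything
joint_size_marl = N_AGENTS * N_ACTIONS    # 12 table columns in total across the agents
```

The size comparison at the end is the whole point of the decomposition: per-agent tables grow additively with N, while a single-agent joint table grows exponentially.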

The overall algorithm:

(figure: overall D-MARL algorithm pseudocode)

Experiment design

The simulation settings are as follows:

(table: simulation parameters)

Reference [14]: "3rd Generation Partnership Project; Technical Specification Group Radio Access Network; Study on LTE-based V2X Services," 3GPP TR 36.885 V14.0.0, Release 14, Jun. 2016.

The scenario is shown below, with the VUs set to drive in parallel.

(figure) Communication scenario (from 3GPP TR 36.885 V14.0.0, Release 14, Jun. 2016)

The baselines are: 1. exhaustive search over all possible actions for the maximum reward; 2. SARL; 3. MARL (as the state of the art); 4. equal power allocation; 5. random power allocation.

Performance results:

(figure: performance comparison)

Conclusions:

  1. Compared with the two state-of-the-art MARL baselines, the proposed algorithm needs only 1/4 of the episodes to reach similar performance.
  2. Effect of the SINR threshold: a larger SINR threshold places a stricter demand on the agents, so every scheme's transmission success probability decreases as the threshold grows; the gap between D-MARL and MARL widens, and schemes 4 and 5 perform especially poorly in this experiment.
  3. Effect of coverage: as the cell radius grows, an AP can serve more VUs, which raises the sum rate within a cell; moreover, the larger VU-AP distances force the APs to transmit at higher power to reach edge VUs, pushing the agents to find the best power allocation. Hence D-MARL achieves better EE as the cell range increases (though still not as good as SARL).
  4. Effect of user fairness: fairness means the APs deliver data to all users at the same rate, measured here by Eq. (9). The proposed method achieves a fairness of 0.99915; exhaustive search, SARL, and MARL achieve 0.99915, 0.99915, and 0.99899, respectively.
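The reported values near 1.0 are consistent with Jain's fairness index; a sketch assuming that is the metric of Eq. (9):

```python
def jains_index(rates):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2). Equals 1.0 when all
    user rates are identical and approaches 1/n when one user takes everything.
    That Eq. (9) is this metric is an assumption of this sketch."""
    n = len(rates)
    return sum(rates) ** 2 / (n * sum(r * r for r in rates))
```

Under this metric, the 0.99915 figures above would mean the per-user rates are nearly identical across the compared schemes.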

 

 

 
