TEST mathjax

这里是第一个公式 $ F = ma^2 $

\[ \text{Reinforcement Learning} \doteq \pi_* \\ \quad \updownarrow \\ \pi_* \doteq \{ \pi(s) \}, \ s \in \mathcal{S} \\ \quad \updownarrow \\ \begin{cases} \pi(s) = \underset{a}{argmax} \ v_{\pi}(s' | s, a), \ s' \in S(s), \quad \text{or} \\ \pi(s) = \underset{a}{argmax} \ q_{\pi}(s, a) \\ \end{cases} \\ \quad \updownarrow \\ \begin{cases} v_*(s), \quad \text{or} \\ q_*(s, a) \\ \end{cases} \\ \quad \updownarrow \\ \text{approximation cases:} \\ \begin{cases} \hat{v}(s, \theta) \doteq \theta^T \phi(s), \quad \text{state value function} \\ \hat{q}(s, a, \theta) \doteq \theta^T \phi(s, a), \quad \text{action value function} \\ \end{cases} \\ where \\ \theta \text{ - value function;s weight vector} \\ \]

2022-12-23
2020-04-16
2022-12-23
2022-12-23
2021-11-12
2021-08-19
2021-09-08
2021-10-14