In general, ELU > leaky ReLU (and its variants) > ReLU > tanh > logistic. If you care a lot about runtime performance, you may prefer leaky ReLUs over ELUs. If you don't want to tweak yet another hyperparameter, you can just use the default $\alpha$ values suggested earlier (0.01 for the leaky ReLU, and 1 for ELU). If you have spare time and computing power, you can use cross-validation to evaluate other activation functions: in particular RReLU if your network is overfitting, or PReLU if you have a huge training set.
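As a minimal sketch of the two alternatives mentioned above, here are NumPy implementations of the leaky ReLU and ELU, using the default $\alpha$ values from the text (0.01 and 1, respectively). The function names are just illustrative; in practice you would use your framework's built-in versions.

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # z if z > 0, else alpha * z: a small negative slope keeps the
    # gradient nonzero, so units cannot "die" as with plain ReLU
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha=1.0):
    # z if z > 0, else alpha * (exp(z) - 1): smooth everywhere and
    # saturates to -alpha for large negative inputs
    return np.where(z > 0, z, alpha * (np.exp(z) - 1))

z = np.array([-2.0, 0.0, 3.0])
print(leaky_relu(z))  # negative inputs are scaled by 0.01
print(elu(z))         # negative inputs saturate toward -1
```

Both functions are positive-identity on $z > 0$; they differ only in how they treat negative inputs.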
\begin{equation}
\operatorname{ReLU}(z) = \max(0, z)
\end{equation}
TensorFlow provides this activation function as tf.nn.relu.
import matplotlib.pyplot as plt
import numpy as np

def relu(z):
    return np.maximum(0, z)

z = np.linspace(-5, 5, 200)

plt.plot(z, relu(z), "r--", linewidth=2)
props = dict(facecolor='black', shrink=0.1)
plt.annotate('ReLU', xytext=(-3.5, 0.5), xy=(-5, 0.1),
             arrowprops=props, fontsize=14, ha="center")
plt.title("ReLU activation function", fontsize=14)
plt.plot([-5, 5], [0, 0], 'k-')
plt.plot([0, 0], [-0.5, 4.2], 'k-')
plt.grid(True)
plt.axis([-5, 5, -0.5, 4.2])
plt.tight_layout()
plt.show()