skywalking是一款国产的开源的链路追踪软件,那么链路追踪、监控系统、日志系统的区别是什么呢。本质上链路追踪也算是一种监控,而链路追踪跟监控系统都是日志。
skywalking中文文档: https://skyapm.github.io/document-cn-translation-of-skywalking/zh/8.0.0/
与日常监控不同的是我们对监控得出的结果处理可以更主动。以prometheus为例,prometheus收集了数据在grafana上展出出来,并且按制定的规则报警,但是我们一般不会主动去看prometheus的线图然后得出哪里哪里马上要出问题了,我们得提前处理,都是它报警了我去看下情况,然后再去看看日志,根据经验,进行处理以及后续的优化。在常规运维中,这是一个被动的行为,可以理解为“亡羊补牢”。
而链路追踪软件在启用后,就可以看到哪个调用链用得频率高,哪个函数方法执行的慢,跟XXX的连接延时比较大,此时就可以根据实际排期进行更高性价比的调整优化,此时业务并没有出问题,可能就是稍慢一点。当然了,也会出现某个业务使用过程中慢,才要对此进行分析的,这个行为可以理解成普通的被动监控了。不过在在常规运维中,我们对链路追踪的期望是前者,这是一个主动的行为,可以理解为“未雨绸缪”。
那么日志系统呢?日志系统收集了很多日志,而监控跟链路追踪其实是对自己所需要的日志进行了收集及聚合处理后得出了自己所需要的数值、目标等等,最后进行了不同的展示。所以日志系统是最底层的东西,监控报警我只看线条没有用,我得去看当时的日志,到底系统、业务是因为什么才波动了;链路追踪也一样,函数运行的慢,那我去看这个函数的处理逻辑,处理流程都经历了什么才能去调优。
目前,APM中skywalking与pinpoint是实现了对代码完全无任何侵入,这样比较符合运维人员的想法,毕竟Zipkin类的对代码侵入了,那么那就需要有风险担责,这个业务运行时的锅我们还是不要轻易背。具体的对比大家可以看https://www.jianshu.com/p/626cae6c0522 这篇文章。
我们使用k8s内运行的方式来安装skywalking,官方指引是用helm安装,这边笔者已经将yaml导出并进行修改调整
elasticsearch:skywalking可以对接的后端很多:https://skyapm.github.io/document-cn-translation-of-skywalking/zh/8.0.0/setup/backend/backend-storage.html,当然了你的elasticsearch不用跑在容器里,所以这是一个非必要操作,如果跑在容器里记得要分配对应的存储进行持久化。下面这个文件在只有一个节点时重启后会起不来,因为他无法变成green状态不符合健康检查,所以在单独测试时将健康检查的那段注释掉即可。
apiVersion: v1 kind: Service metadata: name: skywalking-elasticsearch namespace: default labels: app: skywalking-elasticsearch spec: ports: - name: http port: 9200 protocol: TCP targetPort: 9200 - name: transport port: 9300 protocol: TCP targetPort: 9300 selector: app: skywalking-elasticsearch --- apiVersion: v1 kind: Service metadata: name: skywalking-elasticsearch-headless namespace: default labels: app: skywalking-elasticsearch spec: clusterIP: None publishNotReadyAddresses: true ports: - name: http port: 9200 protocol: TCP targetPort: 9200 - name: transport port: 9300 protocol: TCP targetPort: 9300 selector: app: skywalking-elasticsearch --- apiVersion: apps/v1 kind: StatefulSet metadata: name: skywalking-elasticsearch namespace: default labels: app: skywalking-elasticsearch spec: replicas: 1 podManagementPolicy: Parallel selector: matchLabels: app: skywalking-elasticsearch serviceName: skywalking-elasticsearch-headless template: metadata: name: skywalking-elasticsearch labels: app: skywalking-elasticsearch spec: # affinity: # podAntiAffinity: # requiredDuringSchedulingIgnoredDuringExecution: # - labelSelector: # matchExpressions: # - key: app # operator: In # values: # - skywalking-elasticsearch # topologyKey: kubernetes.io/hostname initContainers: - command: - sysctl - -w - vm.max_map_count=262144 image: docker.elastic.co/elasticsearch/elasticsearch:7.5.1 imagePullPolicy: IfNotPresent name: configure-sysctl resources: {} securityContext: privileged: true runAsUser: 0 securityContext: fsGroup: 1000 runAsUser: 1000 containers: - env: - name: node.name valueFrom: fieldRef: apiVersion: v1 fieldPath: metadata.name - name: cluster.initial_master_nodes value: skywalking-elasticsearch-0 - name: discovery.seed_hosts value: skywalking-elasticsearch-headless - name: cluster.name value: skywalking-elasticsearch - name: network.host value: 0.0.0.0 - name: ES_JAVA_OPTS value: -Xmx1g -Xms1g - name: node.data value: "true" - name: node.ingest value: "true" - name: node.master value: "true" name: skywalking-elasticsearch image: docker.elastic.co/elasticsearch/elasticsearch:7.5.1 imagePullPolicy: IfNotPresent ports: - containerPort: 9200 name: http protocol: TCP - containerPort: 9300 name: transport protocol: TCP resources: limits: cpu: "1" memory: 2Gi requests: cpu: 100m memory: 2Gi readinessProbe: exec: command: - sh - -c - | #!/usr/bin/env bash -e # If the node is starting up wait for the cluster to be ready (request params: 'wait_for_status=green&timeout=1s' ) # Once it has started only check that the node itself is responding START_FILE=/tmp/.es_start_file http () { local path="${1}" if [ -n "${ELASTIC_USERNAME}" ] && [ -n "${ELASTIC_PASSWORD}" ]; then BASIC_AUTH="-u ${ELASTIC_USERNAME}:${ELASTIC_PASSWORD}" else BASIC_AUTH='' fi curl -XGET -s -k --fail ${BASIC_AUTH} http://127.0.0.1:9200${path} } if [ -f "${START_FILE}" ]; then echo 'Elasticsearch is already running, lets check the node is healthy and there are master nodes available' http "/_cluster/health?timeout=0s" else echo 'Waiting for elasticsearch cluster to become cluster to be ready (request params: "wait_for_status=green&timeout=1s" )' if http "/_cluster/health?wait_for_status=green&timeout=1s" ; then touch ${START_FILE} exit 0 else echo 'Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )' exit 1 fi fi failureThreshold: 3 initialDelaySeconds: 10 periodSeconds: 10 successThreshold: 3 timeoutSeconds: 5 securityContext: capabilities: drop: - ALL runAsNonRoot: true runAsUser: 1000 volumeMounts: - name: skywalking-elasticsearch mountPath: /usr/share/elasticsearch/data terminationGracePeriodSeconds: 120 volumeClaimTemplates: - metadata: name: skywalking-elasticsearch spec: accessModes: - ReadWriteOnce storageClassName: yizhuang-nfs resources: requests: storage: 100Gi