Hystrix and High-Availability System Architecture: An In-Depth Analysis (Part 2)

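Scenario analysis: a product service interface failure lets high-concurrency traffic exhaust the cache service's resources

This chapter covers the most basic scenario: calls to the product service interface fail, and as a result the cache service runs out of resources.

To summarize the information in the diagram above: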

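Our cache architecture looks roughly like the one above. In brief:
1. nginx keeps a local cache; when an entry expires, it queries the redis cache
2. redis runs as a sentinel cluster: highly available, able to hold large data volumes and serve high concurrency
3. When nginx cannot find the data in redis, it asks the cache service
4. The cache service looks in its own local cache first; on a miss it fetches from the product service, returns the result to nginx, and updates the redis cache (with some mechanism to keep concurrent updates from overwriting each other)
5. When product information changes, a message queue notifies the cache service to update the corresponding redis entries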
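
The lookup order in the list above can be sketched with plain maps standing in for each cache layer. This is only an illustration under simplifying assumptions: the class name, the use of `String` as the product payload, and the backfill of every faster tier are all hypothetical, and the real system uses nginx, redis, and HTTP calls rather than in-memory maps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for the nginx -> redis -> cache service -> product service chain. */
class TieredCacheSketch {
    // Tiers are ordered from fastest (nginx local cache) to slowest (cache service local cache).
    private final List<Map<Long, String>> tiers = new ArrayList<>();
    private final Function<Long, String> productService; // the source of truth
    int sourceCalls = 0; // lookups that fell through to the product service (not thread-safe, demo only)

    TieredCacheSketch(int tierCount, Function<Long, String> productService) {
        for (int i = 0; i < tierCount; i++) {
            tiers.add(new ConcurrentHashMap<>());
        }
        this.productService = productService;
    }

    String get(long productId) {
        for (int i = 0; i < tiers.size(); i++) {
            String hit = tiers.get(i).get(productId);
            if (hit != null) {
                backfill(i, productId, hit); // refresh the faster tiers that missed
                return hit;
            }
        }
        // Every tier missed: this is the path that hits the product service.
        sourceCalls++;
        String loaded = productService.apply(productId);
        backfill(tiers.size(), productId, loaded);
        return loaded;
    }

    // Copies the value into the first `upTo` tiers.
    private void backfill(int upTo, long productId, String value) {
        for (int i = 0; i < upTo; i++) {
            tiers.get(i).put(productId, value);
        }
    }

    /** Simulates every cache layer expiring at the same time. */
    void expireAll() {
        tiers.forEach(Map::clear);
    }
}
```

After `expireAll()`, every `get` falls through to the product service, which is exactly the situation described next.
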
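How the cache failure arises

When every cache layer has expired, a flood of product-detail requests reaches the product service, which reads from the database (whether the database can cope is out of scope here). If the product service interface then takes longer than usual to respond, large numbers of requests block.

The cache service's threads block and nginx's threads block as well, so large numbers of product-detail page requests fail. A service also exposes other interfaces, such as the shop interface; once its threads are exhausted, those interfaces can no longer be served either.

At that point no service can respond to outside requests, traffic keeps pouring in, and the system collapses.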
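
The starvation effect can be reproduced with nothing but the JDK. The sketch below is a hypothetical illustration, not code from the project: one shared pool serves both "product" and "shop" calls, the product calls hang (a latch stands in for the slow interface), and a perfectly healthy shop call then cannot get a thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class SharedPoolStarvationSketch {
    /**
     * Returns true if a fast "shop" call could not finish within waitMillis
     * because every worker thread was stuck on a slow "product" call.
     */
    static boolean shopCallStarved(int workerThreads, long waitMillis) {
        ExecutorService sharedPool = Executors.newFixedThreadPool(workerThreads);
        CountDownLatch productServiceRecovers = new CountDownLatch(1);
        try {
            // Slow product calls occupy every worker thread.
            for (int i = 0; i < workerThreads; i++) {
                sharedPool.submit(() -> {
                    productServiceRecovers.await();
                    return "product";
                });
            }
            // The shop call itself is instant, but it has no thread to run on.
            Future<String> shop = sharedPool.submit(() -> "shop");
            try {
                shop.get(waitMillis, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException e) {
                return true;
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            productServiceRecovers.countDown(); // let the blocked tasks finish
            sharedPool.shutdown();
        }
    }
}
```

Calling `SharedPoolStarvationSketch.shopCallStarved(4, 200)` reports that the shop call was starved even though nothing is wrong with the shop logic itself.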

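How do we use hystrix in a concrete business scenario to build a highly available architecture?

Here we introduce hystrix's most basic resource isolation technique: thread pool isolation.

Hystrix provides a Command abstraction. All calls to a given dependency run on threads from one dedicated thread pool and never use any other thread resources. This is what resource isolation means.

Command: each service call is executed as a command on one thread from that pool; the command contains your business logic.

Suppose the thread pool for this group of services has 3 threads and 1000 requests arrive at once. At most 3 threads execute requests, so even if the service fails it cannot exhaust all resources.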
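
To make the "3 threads, 1000 requests" claim concrete, here is a small JDK-only sketch that measures the peak number of tasks running at once on a dedicated pool. It is an assumption-laden stand-in: a plain fixed thread pool queues the excess work, whereas hystrix would reject it once its queue is full; the class and method names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class DedicatedPoolSketch {
    /** Runs `requests` tasks on a pool of `poolSize` threads and returns the peak number running at once. */
    static int peakConcurrency(int poolSize, int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // remember the highest value seen
                try {
                    Thread.sleep(1); // pretend to call the dependency
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return peak.get();
    }
}
```

`DedicatedPoolSketch.peakConcurrency(3, 1000)` never returns more than 3: however many requests arrive, the dependency can only ever tie up the threads of its own pool.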

Wrapping the product service call in a HystrixCommand

HystrixCommand is the abstraction for fetching a single result.

import com.alibaba.fastjson.JSON;
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandProperties;
import com.netflix.hystrix.HystrixThreadPoolProperties;

import java.util.concurrent.TimeUnit;

public class GetProductCommand extends HystrixCommand<ProductInfo> {
    private Long productId;

    public GetProductCommand(Long productId) {
//        super(HystrixCommandGroupKey.Factory.asKey("GetProductCommandGroup"));
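        // Command group key (by default it also names the thread pool)
        super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("GetProductCommandGroup"))
                // Execution timeout in milliseconds
                .andCommandPropertiesDefaults(HystrixCommandProperties.Setter().withExecutionTimeoutInMilliseconds(6000))
                .andThreadPoolPropertiesDefaults(HystrixThreadPoolProperties.Setter()
                        // Thread pool size: the maximum number of commands executing concurrently
                        .withCoreSize(2)
                        // Queue size (default -1, i.e. no queueing). With 10 requests, 2 execute, 2 wait in the queue, and the other 6 fail immediately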
                        .withMaxQueueSize(2)
                )

        );
        this.productId = productId;
    }

    @Override
    protected ProductInfo run() throws Exception {
        // Product service URL
        String url = "http://localhost:7000/getProduct?productId=" + productId;
        String response = HttpClientUtils.sendGetRequest(url);
        // Logs "sleep 5 seconds, simulating", then sleeps to simulate a slow interface
        System.out.println("睡眠 5 秒，模拟");
        TimeUnit.SECONDS.sleep(5);
        return JSON.parseObject(response, ProductInfo.class);
    }
}

Calling it from the controller

@RequestMapping("/getProduct")
public ProductInfo getProduct(Long productId) {
    GetProductCommand getProductCommand = new GetProductCommand(productId);
    // Synchronous execution
    ProductInfo productInfo = getProductCommand.execute();
    return productInfo;
}

Test by visiting: http://localhost:7001/getProduct?productId=1

After clicking 6 times in total, only 4 requests were executed; the other two failed immediately.

睡眠 5 秒，模拟
睡眠 5 秒，模拟
com.netflix.hystrix.exception.HystrixRuntimeException: GetProductCommand could not be queued for execution and no fallback available.
    at com.netflix.hystrix.AbstractCommand$22.call(AbstractCommand.java:819) ~[hystrix-core-1.5.12.jar:1.5.12]
睡眠 5 秒，模拟
睡眠 5 秒，模拟

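About the order of the log above: the last two lines belong to the 4 accepted requests. Two of those four were waiting in the queue, so they ran only after the first two had finished. The two failing requests were rejected: the message says they could not be queued and that no fallback is available (fallback is explained later).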
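
The "2 run, 2 queue, the rest fail" arithmetic can be checked with a plain `ThreadPoolExecutor` configured the same way. This is a JDK analogue written for illustration, not hystrix's own implementation; the class name is hypothetical and a latch stands in for the 5-second sleep.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedQueueSketch {
    /** Submits `requests` slow tasks and returns how many were rejected outright. */
    static int countRejected(int coreSize, int maxQueueSize, int requests) {
        // Fixed number of threads plus a bounded queue; the default handler throws on overflow.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                coreSize, coreSize, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(maxQueueSize));
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // stands in for the slow product service call
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // no free thread and no room in the queue
                }
            }
        } finally {
            release.countDown(); // let the accepted tasks finish
            pool.shutdown();
        }
        return rejected;
    }
}
```

With `coreSize = 2` and `maxQueueSize = 2`, six requests leave two rejected, matching the test above, and ten requests leave six rejected, matching the comment in the command's constructor.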

Wrapping batch product fetching in a HystrixObservableCommand

The usage shown in this chapter follows the Hello World examples in the official tutorial.


import com.alibaba.fastjson.JSON;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixObservableCommand;

import rx.Observable;
import rx.schedulers.Schedulers;


public class GetProductsCommand extends HystrixObservableCommand<ProductInfo> {
    private Long[] pids;

    public GetProductsCommand(Long[] pids) {
        super(HystrixCommandGroupKey.Factory.asKey("GetProductCommandGroup"));
        this.pids = pids;
    }

    @Override
    protected Observable<ProductInfo> construct() {
        // Observable.create(OnSubscribe) is deprecated;
        // the documentation says to use unsafeCreate instead.
        // Despite its name, the onSubscribe parameter below is the downstream Subscriber.
        return Observable.unsafeCreate((Observable.OnSubscribe<ProductInfo>) onSubscribe -> {
//            for (Long pid : pids) {
//                String url = "http://localhost:7000/getProduct?productId=" + pid;
//                String response = HttpClientUtils.sendGetRequest(url);
//                onSubscribe.onNext(JSON.parseObject(response, ProductInfo.class));
//            }
//            onSubscribe.onCompleted();
            try {
                if (!onSubscribe.isUnsubscribed()) {
                    for (Long pid : pids) {
                        String url = "http://localhost:7000/getProduct?productId=" + pid;
                        String response = HttpClientUtils.sendGetRequest(url);
                        onSubscribe.onNext(JSON.parseObject(response, ProductInfo.class));
                    }
                    onSubscribe.onCompleted();
                }
            } catch (Exception e) {
                onSubscribe.onError(e);
            }
        }).subscribeOn(Schedulers.io());
    }
}

Ways to call a HystrixObservableCommand

Action1 style

Call it with a lambda expression and subscribe to receive each result

/**
 * @param productIds product IDs separated by commas
 */
@RequestMapping("/getProducts")
public void getProduct(String productIds) {
    List<Long> pids = Arrays.stream(productIds.split(",")).map(Long::parseLong).collect(Collectors.toList());
    GetProductsCommand getProductsCommand = new GetProductsCommand(pids.toArray(new Long[pids.size()]));
    // First way to receive data
    getProductsCommand.observe().subscribe(productInfo -> {
        System.out.println(productInfo);
    });
    // Logs "method has finished executing"
    System.out.println("方法已执行完成");
}

Log output when visiting http://localhost:7001/getProducts?productIds=1,2,3

方法已执行完成
ProductInfo{id=1, name='iphone7手机', price=5599.0, pictureList='a.jpg,b.jpg', specification='iphone7的规格', service='iphone7的售后服务', color='红色,白色,黑色', size='5.5', shopId=1, modifyTime=Mon May 13 22:00:00 CST 2019}
ProductInfo{id=2, name='iphone7手机', price=5599.0, pictureList='a.jpg,b.jpg', specification='iphone7的规格', service='iphone7的售后服务', color='红色,白色,黑色', size='5.5', shopId=1, modifyTime=Mon May 13 22:00:00 CST 2019}
ProductInfo{id=3, name='iphone7手机', price=5599.0, pictureList='a.jpg,b.jpg', specification='iphone7的规格', service='iphone7的售后服务', color='红色,白色,黑色', size='5.5', shopId=1, modifyTime=Mon May 13 22:00:00 CST 2019}

Observer style

// Second way to receive data
// Be careful not to subscribe to the same command more than once,
// otherwise it fails with: GetProductsCommand command executed multiple times - this is not permitted.
getProductsCommand.observe().subscribe(new Observer<ProductInfo>() {

    @Override
    public void onCompleted() {
        System.out.println("Observer: onCompleted");
    }

    @Override
    public void onError(Throwable e) {
        System.out.println("Observer: onError:" + e);
    }

    @Override
    public void onNext(ProductInfo productInfo) {
        System.out.println("Observer: onNext:" + productInfo);
    }
});

方法已执行完成
Observer: onNext:ProductInfo{id=1, name='iphone7手机', price=5599.0, pictureList='a.jpg,b.jpg', specification='iphone7的规格', service='iphone7的售后服务', color='红色,白色,黑色', size='5.5', shopId=1, modifyTime=Mon May 13 22:00:00 CST 2019}
Observer: onNext:ProductInfo{id=2, name='iphone7手机', price=5599.0, pictureList='a.jpg,b.jpg', specification='iphone7的规格', service='iphone7的售后服务', color='红色,白色,黑色', size='5.5', shopId=1, modifyTime=Mon May 13 22:00:00 CST 2019}
Observer: onNext:ProductInfo{id=3, name='iphone7手机', price=5599.0, pictureList='a.jpg,b.jpg', specification='iphone7的规格', service='iphone7的售后服务', color='红色,白色,黑色', size='5.5', shopId=1, modifyTime=Mon May 13 22:00:00 CST 2019}
Observer: onCompleted

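The difference between the two styles is visible from the methods each object offers. For example, an Observer receives exceptions through its onError callback, whereas the single-argument Action1 form used above has no error callback (RxJava's subscribe does accept a second Action1<Throwable> for that purpose).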

Observer: onError:com.netflix.hystrix.exception.HystrixRuntimeException: GetProductsCommand timed-out and no fallback available.
java.net.ConnectException: Connection refused: connect
    at java.net.DualStackPlainSocketImpl.connect0(Native Method)

Synchronous call style

// Synchronous call: block and iterate over the results
Iterator<ProductInfo> iterator = getProductsCommand.observe().toBlocking().getIterator();
while (iterator.hasNext()) {
    System.out.println(iterator.next());
}

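The log shows that the synchronous style does work as intended: all results are printed before the method's final log line.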

ProductInfo{id=1, name='iphone7手机', price=5599.0, pictureList='a.jpg,b.jpg', specification='iphone7的规格', service='iphone7的售后服务', color='红色,白色,黑色', size='5.5', shopId=1, modifyTime=Mon May 13 22:00:00 CST 2019}
ProductInfo{id=2, name='iphone7手机', price=5599.0, pictureList='a.jpg,b.jpg', specification='iphone7的规格', service='iphone7的售后服务', color='红色,白色,黑色', size='5.5', shopId=1, modifyTime=Mon May 13 22:00:00 CST 2019}
ProductInfo{id=3, name='iphone7手机', price=5599.0, pictureList='a.jpg,b.jpg', specification='iphone7的规格', service='iphone7的售后服务', color='红色,白色,黑色', size='5.5', shopId=1, modifyTime=Mon May 13 22:00:00 CST 2019}
方法已执行完成

Resource isolation in effect


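Using hystrix semaphores to isolate and rate-limit the geographic location lookup

What is a semaphore?

A semaphore is also known as a counter. The JDK's concurrency utilities provide one as well (java.util.concurrent.Semaphore).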
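
As a quick reminder of how the JDK semaphore behaves as a counter, here is a minimal sketch using `java.util.concurrent.Semaphore`. The class and method names are hypothetical; only `Semaphore`, `tryAcquire`, and `release` come from the JDK.

```java
import java.util.concurrent.Semaphore;

class SemaphoreBasicsSketch {
    /** Tries to take `attempts` permits from a semaphore holding `permits`; returns how many succeeded. */
    static int acquired(int permits, int attempts) {
        Semaphore semaphore = new Semaphore(permits);
        int granted = 0;
        for (int i = 0; i < attempts; i++) {
            // tryAcquire never blocks: it returns false when the counter is at zero
            if (semaphore.tryAcquire()) {
                granted++;
            }
        }
        return granted;
    }

    /** Shows that a released permit can be taken again. */
    static boolean permitReusableAfterRelease() {
        Semaphore semaphore = new Semaphore(1);
        boolean first = semaphore.tryAcquire();  // takes the only permit
        boolean second = semaphore.tryAcquire(); // nothing left
        semaphore.release();                     // give the permit back
        boolean third = semaphore.tryAcquire();  // available again
        return first && !second && third;
    }
}
```

With 2 permits and 5 attempts, only 2 attempts succeed; the rest are turned away immediately instead of waiting.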

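How do thread pool isolation and semaphore isolation differ?

Resource isolation is one of the core features of hystrix, and it comes in two flavors: thread pools and semaphores. So what is the difference between them?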


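In short:

Thread pool:
- Business logic runs on a dedicated thread pool, not on the current request thread (tomcat)
- A blocked thread can be interrupted, so timeouts are supported
- Can execute asynchronously
Semaphore:
- Works as a counter; the business logic can only run on the current request thread
- Because it uses the current request thread, timeouts should not be possible (in my tests a timeout did take effect; I have not determined why)
- Because it uses the current request thread, it cannot execute asynchronously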
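
The two lists above can be illustrated with a JDK-only sketch. It is not how hystrix is implemented; the class name, thread name, and error messages are made up for the example. The point is only to show which thread does the work in each style, and why a pool lets the caller walk away on timeout.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class IsolationStylesSketch {
    // Dedicated pool with a recognizable thread name (daemon threads so the JVM can exit).
    private static final ExecutorService POOL = Executors.newFixedThreadPool(2, runnable -> {
        Thread thread = new Thread(runnable, "isolated-pool-thread");
        thread.setDaemon(true);
        return thread;
    });
    private static final Semaphore PERMITS = new Semaphore(2);

    /** Thread pool style: the work runs on a pool thread; the caller waits at most timeoutMillis. */
    static <T> T viaThreadPool(Callable<T> work, long timeoutMillis) {
        Future<T> future = POOL.submit(work);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; the caller is already free to move on
            throw new IllegalStateException("timed out", e);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Semaphore style: the work runs on the calling thread, guarded only by a permit count. */
    static <T> T viaSemaphore(Callable<T> work) {
        if (!PERMITS.tryAcquire()) {
            throw new IllegalStateException("no permit available");
        }
        try {
            return work.call(); // same thread as the caller, so the caller cannot walk away
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            PERMITS.release();
        }
    }
}
```

Passing `() -> Thread.currentThread().getName()` to each method shows the difference directly: the pool version reports `isolated-pool-thread`, while the semaphore version reports the caller's own thread.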

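The official site gives a long list of thread pool benefits. The main drawback of thread pools is that they add computational overhead: every command execution involves the queueing, scheduling, and context switching needed to run the command on a separate thread.

When designing the system, Netflix decided to accept this cost in exchange for the benefits, judging it small enough not to have a significant cost or performance impact.

So the semaphore approach is only for cases where you believe the client will not fail and you want to shed the performance overhead of the thread pool.

The figure below illustrates how thread pools and semaphores differ in their use of threads, and how each works.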

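Using semaphores in code

With the difference between semaphores and thread pools in mind, the rough idea is this: product information includes a shipping address, the address data is cached in a local map, and a semaphore limits how many requests may fetch it at once.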
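
Stripped of hystrix, that idea looks roughly like the sketch below: an in-memory map guarded by a JDK semaphore. Everything here is hypothetical (the class name, the sample cities, the error message); it only illustrates the pattern of limiting concurrent access to a purely local lookup without handing the work to another thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

class CityCacheSketch {
    // Local, in-memory address data (sample values are made up for the example).
    private final Map<Long, String> cityById = new ConcurrentHashMap<>();
    private final Semaphore permits;

    CityCacheSketch(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
        cityById.put(1L, "Beijing");
        cityById.put(2L, "Shanghai");
    }

    /** Looks up a city on the calling thread, rejecting the call when all permits are taken. */
    String getCityName(long cityId) {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("could not acquire a semaphore for execution");
        }
        try {
            return cityById.getOrDefault(cityId, "unknown");
        } finally {
            permits.release(); // always hand the permit back
        }
    }
}
```

Because the lookup never leaves the JVM, there is little to gain from a separate thread pool here, which is the kind of case the semaphore strategy is meant for.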

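The official site already explains this clearly, so the semaphore usage here is only a demonstration.

Using the semaphore strategy is simple: when constructing the command, change the isolation strategy to SEMAPHORE.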


import com.alibaba.fastjson.JSON;
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandProperties;

import java.util.concurrent.TimeUnit;



public class GetCityCommand extends HystrixCommand<ProductInfo> {
    private Long productId;

    public GetCityCommand(Long productId) {
//        super(HystrixCommandGroupKey.Factory.asKey("GetProductCommandGroup"));
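        // Command group key
        super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("GetProductCommandGroup"))
                // Command properties: timeout and isolation strategy
                .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                        // Timeout in milliseconds; to see whether a timeout takes effect, set it below the 5-second sleep in run(), e.g. 4000
                        .withExecutionTimeoutInMilliseconds(6000)
                        .withExecutionIsolationStrategy(HystrixCommandProperties.ExecutionIsolationStrategy.SEMAPHORE)
                        // Maximum number of concurrent requests the semaphore allows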
                        .withExecutionIsolationSemaphoreMaxConcurrentRequests(2)
                )

        );
        this.productId = productId;
    }

    @Override
    protected ProductInfo run() throws Exception {
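        // Print the executing thread's name to see which thread runs the command
        System.out.println(Thread.currentThread().getName());
        String url = "http://localhost:7000/getProduct?productId=" + productId;
        String response = HttpClientUtils.sendGetRequest(url);
        // Logs "sleep 5 seconds, simulating"
        System.out.println("睡眠 5 秒，模拟");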
        TimeUnit.SECONDS.sleep(5);
        return JSON.parseObject(response, ProductInfo.class);
    }
}

Calling code

@RequestMapping("/semaphore/getProduct")
public ProductInfo semaphoreGetProduct(Long productId) {
    GetCityCommand getCityCommand = new GetCityCommand(productId);
    System.out.println(Thread.currentThread().getName());
    ProductInfo productInfo = getCityCommand.execute();
    return productInfo;
}

Visit: http://localhost:7001/semaphore/getProduct?productId=1

Test result:

When the limit is hit, the log shows this error:

com.netflix.hystrix.exception.HystrixRuntimeException: GetCityCommand could not acquire a semaphore for execution and no fallback available.

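The timeout also took effect in this test, but I do not know how it is implemented. I looked at the source, which contains a lot of JDK multithreading code that I could not follow.

Hystrix does not appear to use its own thread pool here: the thread name printed in the log is a tomcat thread.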