【发布时间】:2019-02-24 18:55:15
【问题描述】:
在 RxJava 和 Reactor 中有虚拟时间的概念来测试依赖于时间的操作符。我不知道如何在 Flink 中做到这一点。例如,我整理了以下示例,我想在其中玩弄迟到的事件以了解它们是如何处理的。但是我无法理解这样的测试会是什么样子?有没有办法将 Flink 和 Reactor 结合起来让测试变得更好?
public class PlayWithFlink {
public static void main(String[] args) throws Exception {
final OutputTag<MyEvent> lateOutputTag = new OutputTag<MyEvent>("late-data"){};
// TODO understand how BoundedOutOfOrderness is related to allowedLateness
BoundedOutOfOrdernessTimestampExtractor<MyEvent> eventTimeFunction = new BoundedOutOfOrdernessTimestampExtractor<MyEvent>(Time.seconds(10)) {
@Override
public long extractTimestamp(MyEvent element) {
return element.getEventTime();
}
};
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
DataStream<MyEvent> events = env.fromCollection(MyEvent.examples())
.assignTimestampsAndWatermarks(eventTimeFunction);
AggregateFunction<MyEvent, MyAggregate, MyAggregate> aggregateFn = new AggregateFunction<MyEvent, MyAggregate, MyAggregate>() {
@Override
public MyAggregate createAccumulator() {
return new MyAggregate();
}
@Override
public MyAggregate add(MyEvent myEvent, MyAggregate myAggregate) {
if (myEvent.getTracingId().equals("trace1")) {
myAggregate.getTrace1().add(myEvent);
return myAggregate;
}
myAggregate.getTrace2().add(myEvent);
return myAggregate;
}
@Override
public MyAggregate getResult(MyAggregate myAggregate) {
return myAggregate;
}
@Override
public MyAggregate merge(MyAggregate myAggregate, MyAggregate acc1) {
acc1.getTrace1().addAll(myAggregate.getTrace1());
acc1.getTrace2().addAll(myAggregate.getTrace2());
return acc1;
}
};
KeySelector<MyEvent, String> keyFn = new KeySelector<MyEvent, String>() {
@Override
public String getKey(MyEvent myEvent) throws Exception {
return myEvent.getTracingId();
}
};
SingleOutputStreamOperator<MyAggregate> result = events
.keyBy(keyFn)
.window(EventTimeSessionWindows.withGap(Time.seconds(10)))
.allowedLateness(Time.seconds(20))
.sideOutputLateData(lateOutputTag)
.aggregate(aggregateFn);
DataStream lateStream = result.getSideOutput(lateOutputTag);
result.print("SessionData");
lateStream.print("LateData");
env.execute();
}
}
class MyEvent {
private final String tracingId;
private final Integer count;
private final long eventTime;
public MyEvent(String tracingId, Integer count, long eventTime) {
this.tracingId = tracingId;
this.count = count;
this.eventTime = eventTime;
}
public String getTracingId() {
return tracingId;
}
public Integer getCount() {
return count;
}
public long getEventTime() {
return eventTime;
}
public static List<MyEvent> examples() {
long now = System.currentTimeMillis();
MyEvent e1 = new MyEvent("trace1", 1, now);
MyEvent e2 = new MyEvent("trace2", 1, now);
MyEvent e3 = new MyEvent("trace2", 1, now - 1000);
MyEvent e4 = new MyEvent("trace1", 1, now - 200);
MyEvent e5 = new MyEvent("trace1", 1, now - 50000);
return Arrays.asList(e1,e2,e3,e4, e5);
}
@Override
public String toString() {
return "MyEvent{" +
"tracingId='" + tracingId + '\'' +
", count=" + count +
", eventTime=" + eventTime +
'}';
}
}
class MyAggregate {
private final List<MyEvent> trace1 = new ArrayList<>();
private final List<MyEvent> trace2 = new ArrayList<>();
public List<MyEvent> getTrace1() {
return trace1;
}
public List<MyEvent> getTrace2() {
return trace2;
}
@Override
public String toString() {
return "MyAggregate{" +
"trace1=" + trace1 +
", trace2=" + trace2 +
'}';
}
}
运行这个的输出是:
SessionData:1> MyAggregate{trace1=[], trace2=[MyEvent{tracingId='trace2', count=1, eventTime=1551034666081}, MyEvent{tracingId='trace2', count=1, eventTime=1551034665081}]}
SessionData:3> MyAggregate{trace1=[MyEvent{tracingId='trace1', count=1, eventTime=1551034166081}], trace2=[]}
SessionData:3> MyAggregate{trace1=[MyEvent{tracingId='trace1', count=1, eventTime=1551034666081}, MyEvent{tracingId='trace1', count=1, eventTime=1551034665881}], trace2=[]}
但是我希望看到 e5 事件的 lateStream 触发器应该在第一个事件触发前 50 秒。
【问题讨论】:
标签: java rx-java apache-flink flink-streaming project-reactor