
Research Article
Toward Sliding Time Window of Low Watermark to Detect Delayed Stream Arrival
@INPROCEEDINGS{10.1007/978-3-030-67540-0_28,
  author={Xiaoqian Zhang and Kun Ma},
  title={Toward Sliding Time Window of Low Watermark to Detect Delayed Stream Arrival},
  proceedings={Collaborative Computing: Networking, Applications and Worksharing. 16th EAI International Conference, CollaborateCom 2020, Shanghai, China, October 16--18, 2020, Proceedings, Part II},
  proceedings_a={COLLABORATECOM PART 2},
  year={2021},
  month={1},
  keywords={Stream processing Watermark Out-of-order data Stragglers Late data},
  doi={10.1007/978-3-030-67540-0_28}
}
Xiaoqian Zhang
Kun Ma
Year: 2021
COLLABORATECOM PART 2
Springer
DOI: 10.1007/978-3-030-67540-0_28
Abstract
Emergency events such as time intervals between input streams, operator misoperation, and network delay can cause a stream processing system to produce unbounded out-of-order data streams. Recent work on this issue focuses on explicit punctuation or heartbeats to handle faults and stragglers (outlier data). Most parallel and distributed stream processing models, such as Google MillWheel and Apache Flink, rely on expensive hot replication, logging, and upstream backup, yet they ignore straggler processing. These recent frameworks handle disorder only at the operator level, and only point-in-time and fixed-window low watermarks have been discussed. We therefore propose a new sliding time window of low watermarks to detect delayed stream arrival. The contributions of our method are adaptive low watermarks, distinguishing stragglers from late data, and dynamic rectification of the low watermark. Experiments show that our method tolerates more late data while still detecting stragglers accurately.
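To make the idea of a low watermark over a sliding time window concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes the watermark is the smallest event time among records that arrived within the last `window` time units, and that a record trailing the watermark by more than `straggler_gap` is a straggler while one trailing by less is merely late. The class name `SlidingLowWatermark` and both parameters are hypothetical and do not come from the paper.

```python
from collections import deque


class SlidingLowWatermark:
    """Illustrative sketch only; names and rules are assumptions, not the paper's."""

    def __init__(self, window, straggler_gap):
        self.window = window                # width of the sliding window (arrival time)
        self.straggler_gap = straggler_gap  # max lag behind the watermark still treated as "late"
        self.recent = deque()               # (arrival_time, event_time) of admitted records
        self.watermark = float("-inf")      # current low watermark (event time)

    def observe(self, arrival_time, event_time):
        """Classify one record, then update the window and the watermark."""
        if event_time >= self.watermark:
            label = "on_time"
        elif self.watermark - event_time <= self.straggler_gap:
            label = "late"
        else:
            label = "straggler"

        # Assumption: stragglers are reported but never admitted, so a single
        # outlier cannot drag the watermark back.
        if label != "straggler":
            self.recent.append((arrival_time, event_time))

        # Slide the window: drop records that arrived too long ago.
        while self.recent and self.recent[0][0] <= arrival_time - self.window:
            self.recent.popleft()

        # Assumption: the watermark is the minimum event time in the window,
        # so an admitted late record lowers it (a simple form of rectification).
        if self.recent:
            self.watermark = min(e for _, e in self.recent)
        return label


if __name__ == "__main__":
    wm = SlidingLowWatermark(window=10, straggler_gap=5)
    stream = [(0, 0), (5, 4), (12, 11), (20, 19), (21, 8), (22, 1)]
    for arrival, event in stream:
        print(arrival, event, wm.observe(arrival, event))
```

In this sketch the record `(21, 8)` trails the watermark of 11 by only 3 units, so it is admitted as late and pulls the watermark back to 8. The record `(22, 1)` trails by 7 units and is flagged as a straggler without affecting the watermark. Unlike a conventional monotonic watermark, this one may move backwards, which is a deliberate simplification for illustration.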