University of Toronto, 2013.
Modern data-intensive applications that handle massive event streams, such as real-time traffic monitoring, require both rich data filtering and aggregation capabilities. While the pub/sub communication paradigm provides an effective solution for the semantic diversity that event filtering demands, the event processing capabilities of existing pub/sub systems are restricted to matching individual events. They offer no support for stream aggregation, which can be accommodated only at the endpoints, following the end-to-end principle.
In this paper, we propose the first systematic solution for supporting a range of time-based aggregation semantics in a pub/sub system. To avoid disseminating a large number of publications to the subscriber, we distribute the aggregation computation within the pub/sub overlay network. By enriching the pub/sub language with aggregation semantics, we allow pub/sub brokers to aggregate incoming publications instead of forwarding them to the next broker downstream. We show that our two baseline solutions, one that aggregates early (at the publisher edge) and another that aggregates late (at the subscriber edge), are not optimal strategies for minimizing bandwidth consumption. We therefore propose an adaptive, rate-based heuristic that determines which brokers should aggregate publications. Using real datasets extracted from our traffic monitoring use case, we show that this adaptive solution outperforms both baselines.
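As a rough illustration of the two ideas above, the following minimal sketch shows a time-based (tumbling-window) aggregation that a broker could apply to incoming publications, together with a simple rate comparison for deciding whether aggregating at that broker saves messages. The function names, the window model, and the decision rule are assumptions made for illustration only; they are not the aggregation semantics or the heuristic defined in the paper.

```python
from collections import defaultdict


def tumbling_window_sum(events, window):
    """Sum (timestamp, value) pairs per fixed, non-overlapping time window.

    Hypothetical helper: windows are [k*window, (k+1)*window) and are
    keyed by their start time.
    """
    totals = defaultdict(float)
    for ts, value in events:
        start = (ts // window) * window  # start of the window containing ts
        totals[start] += value
    return dict(sorted(totals.items()))


def should_aggregate(pub_rate, window, num_aggregate_subs):
    """Assumed decision rule, not the paper's heuristic.

    Forwarding raw publications costs about pub_rate * window messages
    per window; aggregating locally costs one result per aggregate
    subscription per window. Aggregate only if that is cheaper.
    """
    raw_msgs_per_window = pub_rate * window
    agg_msgs_per_window = num_aggregate_subs
    return agg_msgs_per_window < raw_msgs_per_window


if __name__ == "__main__":
    readings = [(0, 1.0), (3, 2.0), (10, 5.0)]  # (seconds, value)
    print(tumbling_window_sum(readings, window=10))
    print(should_aggregate(pub_rate=5.0, window=10, num_aggregate_subs=3))
```

Under this simplified cost model, a broker that sees many publications per window and serves few aggregate subscriptions benefits from aggregating locally, whereas a broker with a low publication rate may do better by forwarding publications unchanged.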