INfrastructure for disTributed stReam mAnagement usiNg Self organIzing neTwork-overlays
Project Overview
We consider distributed applications that continuously stream data across the network, where data needs to be aggregated and processed to produce a 'useful' stream of updates. Centralized approaches to data aggregation suffer from high communication overheads, poor scalability, and unpredictably high processing workloads at central servers. inTransit provides a scalable and efficient solution to distributed stream management based on (1) resource-awareness, which is middleware-level knowledge of the underlying network and processing resources, (2) overlay-based in-network data aggregation, and (3) high-level programming constructs that describe data-flow graphs for composing useful streams. inTransit uses a novel algorithm based on resource-aware network partitioning to support dynamic deployment of data-flow graph components across the network, and it maintains the efficiency of the deployed overlay through partition-level resource-awareness. Contributions also include efficient middleware-based support for component deployment that uses runtime code generation rather than interpretation, thereby addressing both high-performance and resource-constrained applications.
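To illustrate the idea of in-network aggregation described above, here is a minimal sketch and not the inTransit implementation. It assumes the overlay is a tree, and the names `OverlayNode`, `aggregate`, and the two message-count helpers are hypothetical. Each node combines its own value with the partial sums of its children, so every overlay link carries one partial result instead of every raw update travelling to the sink.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical overlay node: one local reading plus links to child nodes.
struct OverlayNode {
    double local_value;                 // latest reading produced at this node
    std::vector<std::size_t> children;  // indices of child nodes in the overlay tree
};

// Returns the partial sum for the subtree rooted at `id`.
// Assumes the overlay is a tree (no cycles), so the recursion terminates.
double aggregate(const std::vector<OverlayNode>& nodes, std::size_t id) {
    double sum = nodes[id].local_value;
    for (std::size_t child : nodes[id].children) {
        sum += aggregate(nodes, child);  // one partial result per overlay link
    }
    return sum;
}

// Messages arriving at the sink per round if every other node sends its raw
// update directly to the sink (the centralized case).
std::size_t sink_messages_centralized(const std::vector<OverlayNode>& nodes) {
    return nodes.empty() ? 0 : nodes.size() - 1;
}

// Messages arriving at the sink per round with in-network aggregation:
// only the sink's direct children report to it.
std::size_t sink_messages_in_network(const std::vector<OverlayNode>& nodes,
                                     std::size_t sink) {
    return nodes[sink].children.size();
}
```

With a five-node tree whose sink has two children, the sink handles two partial sums per round under this scheme rather than four raw updates. The sketch ignores resource-awareness and partitioning, which determine how the overlay is actually shaped.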
Download
inTransit version 0.1 has been released. It includes basic functionality such as system bootstrapping, node-information exchange and discovery, and support for SQL joins and selects, along with a dynamic query modification module that lets the user modify query parameters at runtime. The system is implemented in C++ and provides a simple GUI for issuing queries and viewing the streaming results. More information can be found in the README accompanying the distribution.
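As a rough illustration of what modifying a query parameter at runtime can look like, the sketch below shows a select operator whose threshold may be changed while updates keep flowing. It is an assumption-laden example and does not reflect the inTransit API: `Quote`, `ThresholdSelect`, and `run_select` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical stream record; not taken from the inTransit distribution.
struct Quote {
    int symbol_id;
    double price;
};

// A select operator (roughly "WHERE price >= min_price") whose parameter
// can be changed while the stream is being processed.
class ThresholdSelect {
public:
    explicit ThresholdSelect(double min_price) : min_price_(min_price) {}

    // Control path: for example, invoked when the user edits the query.
    void set_min_price(double value) {
        min_price_.store(value, std::memory_order_relaxed);
    }

    // Data path: evaluated for every incoming update.
    bool accepts(const Quote& q) const {
        return q.price >= min_price_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<double> min_price_;  // shared between control and data paths
};

// Applies the operator to a batch of updates and returns the survivors.
std::vector<Quote> run_select(const ThresholdSelect& op,
                              const std::vector<Quote>& updates) {
    std::vector<Quote> out;
    for (const Quote& q : updates) {
        if (op.accepts(q)) {
            out.push_back(q);
        }
    }
    return out;
}
```

Storing the parameter in a `std::atomic` lets a control thread update it without stopping the data path. Redeploying or regenerating the operator would be another option, and the sketch makes no claim about which approach the released system takes.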
People
Zhongtang Cai
Brian F. Cooper
Greg Eisenhauer
Vibhore Kumar
Karsten Schwan
Sangeetha Seshadri
Balasubramanian Seshasayee
Patrick Widener
Publications
[1] Zhongtang Cai, Vibhore Kumar, Brian F. Cooper, Greg Eisenhauer, Karsten Schwan, Rob Strom. Utility-Driven Management of Availability in Enterprise-Scale Information Flows. ACM/IFIP/USENIX 7th International Middleware Conference, Melbourne, Australia, 2006.
[2] Vibhore Kumar, Zhongtang Cai, Brian F. Cooper, Greg Eisenhauer, Karsten Schwan, Mohamed Mansour, Balasubramanian Seshasayee, Patrick Widener. Implementing Diverse Messaging Models with Self-Managing Properties using IFLOW. 3rd IEEE International Conference on Autonomic Computing (ICAC 2006), Dublin, Ireland.
[3] Sangeetha Seshadri, Vibhore Kumar, Brian F. Cooper. Optimizing Multiple Queries in Distributed Data Stream Systems. 2nd IEEE International Workshop on Networking Meets Database (NetDB), in conjunction with ICDE 2006.
[4] Karsten Schwan, Brian F. Cooper, Greg Eisenhauer, Ada Gavrilovska, Matt Wolf, Hasan Abbasi, Sandip Agarwala, Zhongtang Cai, Vibhore Kumar, Jay Lofstead, Mohamed Mansour, Balasubramanian Seshasayee, and Patrick Widener. AutoFlow: Autonomic Information Flows for Critical Information Systems. Autonomic Computing: Concepts, Infrastructure, and Applications, ed. Manish Parashar and Salim Hariri, CRC Press, 2006.
[5] Vibhore Kumar, Brian F. Cooper, Zhongtang Cai, Greg Eisenhauer, Karsten Schwan. Middleware for Enterprise Scale Data Stream Management using Utility-Driven Self-Adaptive Information Flows. In Cluster Computing Journal, Springer Publishing, invited for publication, 2006.
[6] Vibhore Kumar, Brian F. Cooper, Zhongtang Cai, Greg Eisenhauer, Karsten Schwan. Resource-Aware Distributed Stream Management using Dynamic Overlays. 25th IEEE International Conference on Distributed Computing Systems (ICDCS), 2005.
[7] Vibhore Kumar, Brian F. Cooper, Karsten Schwan. Distributed Stream Management using Utility-Driven Self-Adaptive Middleware. 2nd IEEE International Conference on Autonomic Computing (ICAC), 2005. Best student paper award.