SCSQ - SuperComputer Stream Query processor
This work was funded by VINNOVA, SSF, and ASTRON.
Instruments such as radio telescopes, colliders, sensor networks, loggers, and simulators generate very high volumes of data streams, which scientists, engineers, and monitoring systems analyze to detect and understand physical phenomena. Processing these streams requires advanced computations over very large data volumes, and therefore substantial hardware resources and scalable stream processing.
We address these challenges by developing SCSQ (pronounced 'sisque', SuperComputer Stream Query processor), a Data Stream Management System (DSMS) that enables queries over high-volume distributed streams. Our SCSQ prototype runs on a variety of hardware platforms, from Windows to IBM BlueGene massively parallel computers. SCSQ enables high-level specification of distributed stream queries involving advanced computations in such heterogeneous communication and computation environments. SCSQ queries filter, transform, and join data from different kinds of distributed streaming data sources.
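The three kinds of operations named above (filter, transform, join) can be illustrated outside SCSQ. The following is a minimal Python sketch over in-memory generators; it is not SCSQL and not SCSQ's implementation. The event schema `(source_id, timestamp, value)`, the function names, and the choice of a symmetric hash join for combining two streams are all assumptions made for illustration.

```python
from itertools import zip_longest
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event schema: (source_id, timestamp, value).
Event = Tuple[str, int, float]


def filter_stream(events: Iterable[Event], threshold: float) -> Iterator[Event]:
    """Filter: pass on only events whose value reaches the threshold."""
    for event in events:
        if event[2] >= threshold:
            yield event


def transform_stream(events: Iterable[Event], scale: float) -> Iterator[Event]:
    """Transform: rescale each event's value, e.g. to calibrate a sensor."""
    for source, ts, value in events:
        yield (source, ts, value * scale)


def symmetric_hash_join(
    left: Iterable[Event], right: Iterable[Event]
) -> Iterator[Tuple[Event, Event]]:
    """Join two streams on timestamp as tuples arrive (symmetric hash join).

    Each new tuple is stored in its own side's hash table and probed against
    the other side's table, so matches are emitted incrementally. This sketch
    keeps all tuples forever; a real DSMS would bound the state with windows.
    """
    left_seen: Dict[int, List[Event]] = {}
    right_seen: Dict[int, List[Event]] = {}
    # Alternate between the two inputs to mimic interleaved arrival.
    for l, r in zip_longest(left, right):
        if l is not None:
            left_seen.setdefault(l[1], []).append(l)
            for match in right_seen.get(l[1], []):
                yield (l, match)
        if r is not None:
            right_seen.setdefault(r[1], []).append(r)
            for match in left_seen.get(r[1], []):
                yield (match, r)


# Usage: two hypothetical antenna streams, filtered, calibrated, and joined.
antenna_a = [("a", 1, 0.5), ("a", 2, 3.0), ("a", 3, 4.0)]
antenna_b = [("b", 2, 2.5), ("b", 3, 0.1), ("b", 4, 5.0)]
query = symmetric_hash_join(
    transform_stream(filter_stream(antenna_a, 1.0), 2.0),
    filter_stream(antenna_b, 1.0),
)
print(list(query))  # [(('a', 2, 6.0), ('b', 2, 2.5))]
```

Because every operator is a generator, results flow out as soon as matching inputs have arrived, which mirrors the continuous, non-blocking evaluation a stream query needs.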
An important target application for the SCSQ technology was LOFAR, a large digital radio telescope being developed in the Netherlands. An antenna array distributed over the Netherlands and Germany produces massive amounts of data, which are streamed through heterogeneous cluster computers that include Linux clusters and a 12,000-node IBM BlueGene. SCSQ runs in this massively parallel and heterogeneous computing environment, where it optimizes and executes data stream queries over the space radio signals delivered by LOFAR's digital receivers.
The SCSQ prototype has been evaluated using the Linear Road Benchmark, which simulates a toll expressway system producing data streams to be processed by a data stream management system. The implementation is called SCSQ-LR.
The SCSQ prototype is being further developed in the iStreams project, where data stream management techniques are applied to searching and analyzing industrial streams.
New: The massively scalable parallel implementation of Linear Road, SCSQ-PLR, is now network-bound and achieves scalability (L > 512) orders of magnitude beyond any previously published result for the Linear Road Benchmark:
- E. Zeitler and T. Risch: Massive scale-out of expensive continuous queries. Presented at the 37th International Conference on Very Large Data Bases (VLDB 2011); Proceedings of the VLDB Endowment, Vol. 4, No. 11, 2011.
- E. Zeitler and T. Risch: Scalable Splitting of Massive Data Streams. Proc. 15th International Conference on Database Systems for Advanced Applications (DASFAA 2010), Tokyo, Japan, April 1-4, 2010 (abstract).
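Scaling out an expensive continuous query means splitting its input stream across many parallel query processes. As a generic illustration only (the splitting strategies of SCSQ-PLR are described in the papers above and are not reproduced here), the Python sketch below hash-partitions a stream by key. The functions `split_by_key` and `worker_for` and the vehicle-report tuples are hypothetical.

```python
import zlib
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def worker_for(key: Hashable, n_workers: int) -> int:
    """Map a key to a worker index deterministically.

    zlib.crc32 is used instead of the built-in hash(), whose value for
    strings can change between Python runs.
    """
    return zlib.crc32(repr(key).encode("utf-8")) % n_workers


def split_by_key(
    stream: Iterable[T], key: Callable[[T], Hashable], n_workers: int
) -> List[List[T]]:
    """Hash-partition a stream so all tuples with the same key reach one worker.

    Arrival order is preserved within each partition, so any per-key state
    can be kept locally by the single worker that owns that key.
    """
    partitions: List[List[T]] = [[] for _ in range(n_workers)]
    for item in stream:
        partitions[worker_for(key(item), n_workers)].append(item)
    return partitions


# Hypothetical position reports: (vehicle_id, time).
reports = [(vid, t) for t in range(5) for vid in (10, 11, 12, 13)]
parts = split_by_key(reports, key=lambda r: r[0], n_workers=3)
for i, part in enumerate(parts):
    print(i, sorted({vid for vid, _ in part}))
```

Hash partitioning keeps each key's state on one worker, but load can become unbalanced when a few keys dominate. Round-robin splitting balances load instead, at the cost of scattering each key's tuples across workers.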
Publications
The first overview of the SCSQ project was given in:
- E. Zeitler and T. Risch: Processing high-volume stream queries on a supercomputer. ICDE Ph.D. Workshop 2006, Atlanta, GA, April 2006.
The following paper shows how the flexibility of the SCSQ query language (SCSQL, pronounced 'Siskel') can be used for investigating the performance of a heterogeneous and massively parallel computer environment:
- E. Zeitler and T. Risch: Using stream queries to measure communication performance of a parallel computing environment. Presented at the First International Workshop on Distributed Event Processing, Systems and Applications (DEPSA), Toronto, Canada, June 29, 2007.
SCSQ was applied to large-scale collective transportation systems in:
- G. Gidofalvi, T. B. Pedersen, T. Risch, and E. Zeitler: Highly Scalable Trip Grouping for Large Scale Collective Transportation Systems. Proc. 11th International Conference on Extending Database Technology, EDBT 2008, Nantes, France, March 2008.
Popular presentations of the project have appeared in ASTRON News and at the European Space Agency in Sweden.
People
Tore Risch is responsible for this project, which forms the basis for the PhD work of Erik Zeitler.