In recent years, our ability to produce information has grown steadily, driven by ever-increasing computing power and communication rates and by the spread of hardware and software sensors. Data Stream Processing (DaSP, or simply DSP) is a recent and highly active research area that deals with processing streaming data under high-throughput and low-latency requirements. Several important online and real-time applications can be modeled with DaSP paradigms, including network traffic analysis, financial trading, and data mining, among many others. Strict performance requirements are typical of DaSP scenarios: high throughput and low latency are unavoidable constraints that call for careful design and for new tools and libraries that leverage parallel hardware resources. Most existing DaSP solutions (e.g., Apache Storm, Apache Flink) target clusters of commodity machines (scale-out scenarios). However, they are not optimized to fully exploit a single scale-up server equipped with several multi-core CPUs and co-processors such as GPUs. For this reason, we are currently designing and developing WindFlow, a parallel library for DaSP built on top of the Building Block layer provided by our FastFlow programming environment.