APACHE NIFI AS A TOOL FOR STREAM PROCESSING OF MEASUREMENT DATA
Keywords: Apache NiFi, streaming data, Apache Kafka, Apache ZooKeeper, Apache Spark Structured Streaming
To analyze data in real time, without first spending time on aggregating it, organizations are increasingly changing the way they make decisions by adopting streaming analysis. Streaming analysis monitors data continuously as it arrives, so that it can be evaluated and irregularities detected immediately. This approach increases the speed and accuracy of decision-making, which matters wherever a quick response to data anomalies is needed, e.g. to signal failures or threats.
The aim of the article is to present data streaming as an alternative to traditional batch data processing, and Apache NiFi as a tool for processing data streams.
To illustrate the approach, the article describes a project whose task was to design and implement a system that uses Apache NiFi to stream process air quality measurement data from the API of the Chief Inspectorate of Environmental Protection.
The project made it possible to automate the flow of measurement data between systems. The data flow was designed and displayed in NiFi's visual interface. NiFi enabled early filtering of the data by dividing it according to measurement thresholds, which made it possible to monitor and evaluate the data while taking possible data gaps into account. Because the measurement data was processed as a stream, the project was able to register missing measurements. This would be impossible in a batch solution, in which measurements are collected in a database and then analyzed as static data: the missing values would be filled in at a later time, overwriting the record of the gap.
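The threshold-based routing described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the NiFi flow built in the project: the threshold values, the labels, the record layout (`station`, `value`) and the function names are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical upper bounds (in µg/m³) and labels; illustrative only,
# not the thresholds used in the project.
THRESHOLDS = [(50.0, "good"), (100.0, "moderate")]


def route_measurement(value: Optional[float]) -> str:
    """Return the name of the route for a single measurement value."""
    if value is None:
        # A gap is registered at the moment it is seen in the stream,
        # before any later backfill could overwrite it.
        return "missing"
    for upper, label in THRESHOLDS:
        if value <= upper:
            return label
    return "bad"


def route_stream(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group records by route, processing them one at a time in arrival order."""
    routes: Dict[str, List[dict]] = {}
    for record in records:
        route = route_measurement(record.get("value"))
        routes.setdefault(route, []).append(record)
    return routes


if __name__ == "__main__":
    sample = [
        {"station": "A", "value": 12.5},
        {"station": "A", "value": None},   # measurement not delivered
        {"station": "B", "value": 130.0},
    ]
    for route, items in route_stream(sample).items():
        print(route, len(items))
```

Each record is classified as it arrives, so a record with no value ends up on its own "missing" route instead of being silently replaced by a value supplied later.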
Chief Inspectorate of Environmental Protection, https://powietrze.gios.gov.pl
API portal "Air Quality", https://powietrze.gios.gov.pl/pjp/content/api
Apache Kafka Documentation, https://kafka.apache.org/documentation
Apache NiFi Documentation, https://nifi.apache.org/docs.html
Apache ZooKeeper, https://zookeeper.apache.org/doc/r3.7.0/index.html
Docker Documentation, https://docs.docker.com/get-docker/
Structured Streaming Programming Guide, https://spark.apache.org/docs/2.2.2/structured-streaming-programming-guide.html