Commit 51c781e

gvdongen authored
Merge pull request #3 from Klarrio/v3.0.0
release v3.0.0
2 parents 685877b + c1255c1 commit 51c781e

223 files changed: +47262 -70 lines changed

README.md

Lines changed: 23 additions & 11 deletions
@@ -4,24 +4,36 @@ This repository contains the code of the open stream processing benchmark.
 
 All documentation can be found in our [wiki](https://github.com/Klarrio/open-stream-processing-benchmark/wiki).
 
-This code base includes:
-- benchmark pipeline implementations:
-- data stream generator to generate input streams locally or on a DC/OS cluster
-- Kafka scripts to start a cluster and read from a topic for local development
+It includes:
+- [benchmark](./benchmark): benchmark pipeline implementations
+- [data-stream-generator](./data-stream-generator): data stream generator to generate input streams locally or on a DC/OS cluster
+- [output-consumer](./output-consumer): consumes the output of the processing job and metrics-exporter from Kafka and stores it on S3.
+- [evaluator](./evaluator): computes performance metrics on the output of the output consumer.
+- [result analysis](./result-analysis): Jupyter notebooks to visualize the results.
+- [deployment](./deployment): deployment scripts to run the benchmark on an DC/OS setup on AWS.
+- [kafka-cluster-tools](./kafka-cluster-tools): Kafka scripts to start a cluster and read from a topic for local development
+- [metrics-exporter](./metrics-exporter): exports metrics of JMX and cAdvisor and writes them to Kafka.
 
 Currently the benchmark includes Apache Spark (Spark Streaming and Structured Streaming), Apache Flink and Kafka Streams.
-
-## Running the benchmark locally
-
-Documentation on running the benchmark locally can be found in our [wiki](https://github.com/Klarrio/open-stream-processing-benchmark/wiki/Architecture-and-deployment).
-
 ## References, Publications and Talks
 - [van Dongen, G., & Van den Poel, D. (2020). Evaluation of Stream Processing Frameworks. IEEE Transactions on Parallel and Distributed Systems, 31(8), 1845-1858.](https://ieeexplore.ieee.org/abstract/document/9025240)
 The Supplemental Material of this paper can be found [here](https://s3.amazonaws.com/ieeecs.cdn.csdl.public/trans/td/2020/08/extras/ttd202008-09025240s1-supp1-2978480.pdf).
 
+- [van Dongen, G., & Van den Poel, D. (2021). A Performance Analysis Fault Recovery in Stream Processing Frameworks. IEEE Access.](https://ieeexplore.ieee.org/document/9466838)
+
+- [van Dongen, G., & Van den Poel, D. (2021). Influencing Factors in the Scalability of Distributed Stream Processing Jobs. IEEE Access.](https://ieeexplore.ieee.org/document/9507502)
+
 - Earlier work-in-progress publication:
 [van Dongen, G., Steurtewagen, B., & Van den Poel, D. (2018, July). Latency measurement of fine-grained operations in benchmarking distributed stream processing frameworks. In 2018 IEEE International Congress on Big Data (BigData Congress) (pp. 247-250). IEEE.](https://ieeexplore.ieee.org/document/8457759)
-
-Talks related to this publication:
+Talks related to this publication:
 
 - Spark Summit Europe 2019: [Stream Processing: Choosing the Right Tool for the Job - Giselle van Dongen](https://www.youtube.com/watch?v=PiEQR9AXgl4&t=2s)
+
+## Issues, questions or need help?
+
+Are you having issues with anything related to the project? Do you wish to use this project or extend it? The fastest way to contact me is through:
+
+LinkedIn: [giselle-van-dongen](https://www.linkedin.com/in/giselle-van-dongen/)
+
+
+

benchmark/README.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# OSPBench: Open Stream Processing Benchmark
+
+Currently the benchmark includes Apache Spark (Spark Streaming and Structured Streaming), Apache Flink and Kafka Streams.
+
+Please consult the wiki of the repository to see details on deployment and running locally.

benchmark/build.sbt

Lines changed: 1 addition & 1 deletion
@@ -76,4 +76,4 @@ def frameworkSettings(framework: String, versionDocker: String) = Seq(
   version := versionDocker,
   fork in Test := true,
   envVars in Test := Map("DEPLOYMENT_TYPE" -> "local", "MODE" -> "constant-rate", "KAFKA_BOOTSTRAP_SERVERS" -> "localhost:9092")
-)
+)

benchmark/common-benchmark/src/main/resources/local.conf

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ environment {
 }
 
 general {
-  last.stage = "101"
+  last.stage = "3"
   partitions = 2
   stream.source {
     volume = "1"

benchmark/flink-benchmark/src/main/scala/flink/benchmark/BenchmarkSettingsForFlink.scala

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ class BenchmarkSettingsForFlink(overrides: Map[String, Any] = Map()) extends Ser
     val checkpointDir = new File(general.configProperties.getString("flink.checkpoint.dir"))
     Try(FileUtils.cleanDirectory(checkpointDir))
     "file://" + checkpointDir.getCanonicalPath
-  } else general.configProperties.getString("flink.checkpoint.dir") + System.currentTimeMillis() + "/"
+  } else general.configProperties.getString("flink.checkpoint.dir") + general.outputTopic + "/"
 
 
   val jobProfileKey: String = general.mkJobProfileKey("flink", bufferTimeout)
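
Note on the BenchmarkSettingsForFlink.scala change: the non-local checkpoint path is now suffixed with the output topic instead of the current time. The sketch below only illustrates that difference in path construction. It is not code from the repository: `CheckpointDirSketch`, `baseDir`, and the example values are hypothetical stand-ins for `general.configProperties.getString("flink.checkpoint.dir")` and `general.outputTopic`, and the reason for the change is not stated in the commit.

```scala
// Illustrative sketch only; names and values are assumptions, not repository code.
object CheckpointDirSketch {

  // Old behaviour: the suffix is the wall-clock time, so every start of the
  // job yields a different directory.
  def timestamped(baseDir: String, nowMillis: Long): String =
    baseDir + nowMillis + "/"

  // New behaviour: the suffix is the output topic name, so the directory is
  // the same whenever the same topic name is used.
  def perTopic(baseDir: String, outputTopic: String): String =
    baseDir + outputTopic + "/"

  def main(args: Array[String]): Unit = {
    val base = "s3://example-bucket/checkpoints/" // hypothetical base directory
    println(timestamped(base, System.currentTimeMillis()))
    println(perTopic(base, "example-output-topic")) // hypothetical topic name
  }
}
```

With the topic-based suffix, two starts that share an output topic resolve to the same checkpoint directory, whereas the timestamp-based suffix differs on every start.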
