Public Repository

Last pushed: 2 months ago
Short Description
Container with preinstalled Kafka, Spark streaming (PySpark), and Cassandra.
Full Description

This Dockerfile sets up a complete streaming environment for experimenting with Kafka, Spark streaming (PySpark), and Cassandra. It installs

  • Kafka 0.10.2.1
  • Spark 2.1.1 for Scala 2.11
  • Cassandra 3.7

It additionnally installs

  • Anaconda 2.4.4 Python distribution
  • Jupyter notebook for Python

See https://github.com/Yannael/kafka-sparkstreaming-cassandra for details.

Docker Pull Command
Owner
yannael

Comments (2)
eman23
3 months ago

Hello Yannael

I want to know how to make docker volume for this container for backup and recovery purposes.

Thank in advance

onlyricks
6 months ago

Hey Thanks for the image but i am getting below error, Need your help to get it fixed.


Py4JJavaError Traceback (most recent call last)

<ipython-input-4-cc5a6ea484a2> in <module>()
1 ssc = StreamingContext(sc, 5)
----> 2 kvs = KafkaUtils.createStream(ssc, "127.0.0.1:2181", "spark-streaming-consumer", {'test': 1})
3 data = kvs.map(lambda x: x[1])
4 rows= data.map(lambda x:Row(time_sent=x,time_received=time.strftime("%Y-%m-%d %H:%M:%S")))
5 rows.foreachRDD(saveToCassandra)

/home/guest/spark/python/pyspark/streaming/kafka.py in createStream(ssc, zkQuorum, groupId, topics, kafkaParams, storageLevel, keyDecoder, valueDecoder)
78 if 'ClassNotFoundException' in str(e.java_exception):
79 KafkaUtils._printErrorMsg(ssc.sparkContext)
---> 80 raise e
81 ser = PairDeserializer(NoOpSerializer(), NoOpSerializer())
82 stream = DStream(jstream, ssc, ser)

Py4JJavaError: An error occurred while calling o30.createStream.
: java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:91)
at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:168)
at org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper.createStream(KafkaUtils.scala:632)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 25 more