segence/spark
Docker images of Apache Spark.
Image | Description | Dockerfile |
---|---|---|
Base | Spark base image with the default installation; only the Avro library is added on top of the official installation. | Dockerfile |
Cloud | Contains the Hadoop AWS library for accessing S3. | Dockerfile.cloud |
The base image contains the default Spark installation.
Reading from AWS S3 (requires the Cloud image, which includes the Hadoop AWS library):
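```scala
// Read a JSON file from S3 through the s3a:// connector
val jsonDf = spark.read.json("s3a://mybucket/sth.json")
// Read an Avro file from S3 (uses the Avro library bundled in the image)
val avroDf = spark.read.format("avro").load("s3a://mybucket/sth.avro")
```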
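The reads above need S3 credentials. One way to supply them is to set the S3A connector's `fs.s3a.access.key` and `fs.s3a.secret.key` properties on the Hadoop configuration from inside `spark-shell`. The following is a minimal sketch. It assumes the keys are passed to the container as the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`; that setup is an assumption and is not something the image is documented to provide. The variable names `hadoopConf` and `jsonDf` are illustrative.

```scala
// Run inside spark-shell, where `spark` is the pre-created SparkSession.
// Assumption: AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are set in the container
// environment; sys.env(...) throws NoSuchElementException if either is missing.
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
hadoopConf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

// Subsequent s3a:// reads pick up these credentials.
val jsonDf = spark.read.json("s3a://mybucket/sth.json")
jsonDf.printSchema()
```

Set these properties before the first `s3a://` access in the session. Hadoop caches `FileSystem` instances, so values changed after a bucket has been opened may not take effect. Reading the keys from the environment also keeps them out of notebooks and shell history.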
Pull the image from Docker Hub:

```shell
docker pull segence/spark
```