pyspark-job
Base image for PySpark jobs with Python 3, Spark 3.5.3, OpenJDK 11, JDBC driver, and dependencies.
This Docker image provides a base environment for running PySpark jobs, pre-configured with Python 3, Apache Spark 3.5.3, and OpenJDK 11. It includes JDBC drivers for SQL Server, utilities such as curl, wget, and bash, and preset Spark environment settings, so PySpark applications run out of the box. Configurations for SQL Server tools are also included, making the image well suited for ETL workflows that read from and write to external data sources over JDBC, and for scalable data processing and analytics tasks in general.
To use this image, extend it in your own Dockerfile, for example:
# Base image ships Python 3, Spark 3.5.3, OpenJDK 11, and PySpark preinstalled
FROM yajith2001/pyspark-job:latest
WORKDIR /app
# Copy the job entrypoint and its Python dependencies
COPY main.py requirements.txt /app/
RUN pip install -r requirements.txt
# Run the PySpark job when the container starts
CMD ["python", "/app/main.py"]
Content type: Image
Digest: sha256:b56527002…
Size: 1.6 GB
Last updated: about 1 year ago
docker pull yajith2001/pyspark-job
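To build and run a job image based on the Dockerfile above (the tag my-pyspark-job is illustrative):

docker build -t my-pyspark-job .
docker run --rm my-pyspark-job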