yajith2001/pyspark-job

By yajith2001

Updated about 1 year ago

Base image for PySpark jobs with Python 3, Spark 3.5.3, OpenJDK 11, JDBC driver, and dependencies.

yajith2001/pyspark-job repository overview

This Docker image provides a base environment for running PySpark jobs, pre-configured with Python 3, Apache Spark 3.5.3, and OpenJDK 11. It bundles a JDBC driver for SQL Server, common utilities such as curl, wget, and bash, and preset Spark environment settings, so Spark applications can run out of the box. The image is intended for ETL workflows: with JDBC integration configured, it can connect to SQL Server and similar data sources, making it a practical starting point for scalable data processing and analytics jobs.
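As a rough illustration of the kind of job this image is meant to run, the sketch below reads a SQL Server table over JDBC and writes it out as Parquet. The host, database, table, and credentials are placeholders, and the driver class name assumes the bundled JDBC driver is Microsoft's mssql-jdbc; adjust them to your environment.

# main.py -- minimal ETL sketch for this base image.
# Connection details are placeholders; the driver class assumes the
# image ships Microsoft's SQL Server JDBC driver (mssql-jdbc).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("example-etl-job").getOrCreate()

# Read a table from SQL Server over JDBC (hypothetical host and credentials).
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://example-host:1433;databaseName=example_db")
    .option("dbtable", "dbo.example_table")
    .option("user", "example_user")
    .option("password", "example_password")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load()
)

# Write the result out, here as Parquet under the job's working directory.
df.write.mode("overwrite").parquet("/app/output")

spark.stop()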

To use this image, extend it in a Dockerfile, as in the example below:

FROM yajith2001/pyspark-job:latest

# Copy the job script and its dependency list into the image.
WORKDIR /app
COPY main.py requirements.txt /app/

# Install any extra Python dependencies on top of the base image.
RUN pip install -r requirements.txt

# Run the PySpark job when the container starts.
CMD ["python", "/app/main.py"]
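With main.py and requirements.txt alongside this Dockerfile, the job can then be built and started with docker build -t pyspark-etl-job . followed by docker run pyspark-etl-job, where pyspark-etl-job is just an example tag.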

Tag summary

Content type: Image
Digest: sha256:b56527002
Size: 1.6 GB
Last updated: about 1 year ago

docker pull yajith2001/pyspark-job