tomerlevi/jupyter-spark-snowflake

By tomerlevi

Updated about 6 years ago

Jupyter notebook server including Spark and Snowflake spark connector


This image provides a Jupyter notebook server with Spark 2.2, the Snowflake Spark connector, and the GraphFrames library. It is based on the official jupyter/all-spark-notebook image.

Running:

docker run -p 8888:8888 -p 4040:4040 -v ~:/home/jovyan/workspace --name jupyter tomerlevi/jupyter-spark-snowflake

Providing Spark options:

docker run -p 8888:8888 -p 4040:4040 -v ~:/home/jovyan/workspace -e SPARK_OPTS='--conf spark.executor.memory=1500m --conf spark.driver.memory=512m' --name jupyter tomerlevi/jupyter-spark-snowflake

Follow my blog post for more details: https://medium.com/fundbox-engineering/overview-d3759e83969c

Sample Spark code:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local").appName("Spark-Snowflake").getOrCreate()

# You might need to set these if your data is staged on S3
#spark.sparkContext._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "<YOUR_AWS_KEY>")
#spark.sparkContext._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "<YOUR_AWS_SECRET>")

# Set options below
sfOptions = {
  "sfURL" : "<account_name>.<region>.snowflakecomputing.com",
  "sfAccount" : "<account_name>",
  "sfUser" : "<user>",
  "sfPassword" : "<password>",
  "sfDatabase" : "<db_name>",
  "sfSchema" : "<schema_name>",
  "sfWarehouse" : "<warehouse_name>",
}
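Hardcoding credentials in a notebook is risky. As an alternative, the same options dictionary can be built from environment variables. This is a minimal sketch — the helper function and the variable names (`SNOWFLAKE_URL`, `SNOWFLAKE_USER`, etc.) are illustrative, not something the image defines:

```python
import os

def sf_options_from_env():
    # Map Snowflake connector option names to illustrative environment
    # variable names; adjust to whatever your deployment uses.
    keys = {
        "sfURL": "SNOWFLAKE_URL",
        "sfAccount": "SNOWFLAKE_ACCOUNT",
        "sfUser": "SNOWFLAKE_USER",
        "sfPassword": "SNOWFLAKE_PASSWORD",
        "sfDatabase": "SNOWFLAKE_DATABASE",
        "sfSchema": "SNOWFLAKE_SCHEMA",
        "sfWarehouse": "SNOWFLAKE_WAREHOUSE",
    }
    missing = [env for env in keys.values() if env not in os.environ]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {opt: os.environ[env] for opt, env in keys.items()}
```

Pass the variables into the container with `-e` flags on `docker run`, then call `sf_options_from_env()` in place of the hardcoded `sfOptions`.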

SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

df = spark.read.format(SNOWFLAKE_SOURCE_NAME) \
  .options(**sfOptions) \
  .option("query", "select a,b from MYTABLE") \
  .load()

df.show()
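The connector can also write a DataFrame back to Snowflake through the `dbtable` option. A hedged sketch, reusing `sfOptions` and `SNOWFLAKE_SOURCE_NAME` from above — `MYTABLE_COPY` is a placeholder target table name, and this requires a live Snowflake connection to run:

```python
# Write df to a Snowflake table; "overwrite" mode replaces the
# table contents if MYTABLE_COPY already exists.
df.write.format(SNOWFLAKE_SOURCE_NAME) \
  .options(**sfOptions) \
  .option("dbtable", "MYTABLE_COPY") \
  .mode("overwrite") \
  .save()
```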

Docker Pull Command

docker pull tomerlevi/jupyter-spark-snowflake