tomerlevi/jupyter-spark-snowflake
Jupyter Notebook server with Spark and the Snowflake Spark connector
This image provides a Jupyter Notebook server with Spark 2.2, the Snowflake Spark connector, and the GraphFrames library. It is based on the official jupyter/all-spark-notebook image.
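To start the container (Jupyter on port 8888, the Spark UI on port 4040, and your home directory mounted at /home/jovyan/workspace):
docker run -p 8888:8888 -p 4040:4040 -v ~:/home/jovyan/workspace --name jupyter tomerlevi/jupyter-spark-snowflake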
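To also set Spark memory limits, pass spark-submit options through the SPARK_OPTS environment variable. Spark memory values must be whole numbers with a unit, so use 1536m rather than 1.5g:
docker run -p 8888:8888 -p 4040:4040 -v ~:/home/jovyan/workspace -e SPARK_OPTS='--conf spark.executor.memory=1536m --conf spark.driver.memory=512m' --name jupyter tomerlevi/jupyter-spark-snowflake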
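Spark parses memory settings as integers with a unit suffix and rejects fractional values such as 1.5g. The sketch below converts fractional sizes into whole megabytes and assembles a `--conf`-style SPARK_OPTS string. The helper names `spark_memory` and `spark_opts` are hypothetical, and it assumes the image forwards SPARK_OPTS to spark-submit.

```python
# Sketch: turn sizes like "1.5g" into integer Spark memory strings ("1536m")
# and join Spark properties into a SPARK_OPTS value. Helper names are hypothetical.
import re

# Megabytes per supported unit suffix.
_UNIT_TO_MB = {"m": 1, "g": 1024, "t": 1024 * 1024}


def spark_memory(size: str) -> str:
    """Return an integer Spark memory string, e.g. '1.5g' -> '1536m'."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([mgt])", size.strip().lower())
    if match is None:
        raise ValueError(f"unsupported size: {size!r}")
    number, unit = match.groups()
    megabytes = float(number) * _UNIT_TO_MB[unit]
    if not megabytes.is_integer():
        raise ValueError(f"{size!r} is not a whole number of megabytes")
    return f"{int(megabytes)}m"


def spark_opts(conf: dict) -> str:
    """Join Spark properties into '--conf key=value' pairs."""
    return " ".join(f"--conf {key}={value}" for key, value in conf.items())


opts = spark_opts({
    "spark.executor.memory": spark_memory("1.5g"),
    "spark.driver.memory": spark_memory("0.5g"),
})
print(opts)
# --conf spark.executor.memory=1536m --conf spark.driver.memory=512m
```

The printed string can be passed as the value of `-e SPARK_OPTS='...'` in the docker run command.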
See my blog post for more details: https://medium.com/fundbox-engineering/overview-d3759e83969c
Example notebook code for reading from Snowflake with PySpark:
from pyspark.sql import SparkSession
# Create (or reuse) a local Spark session and get its SparkContext
spark = SparkSession.builder.master("local").appName("Spark-Snowflake").getOrCreate()
sc = spark.sparkContext
# You might need to set AWS credentials for S3 access:
#sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "<YOUR_AWS_KEY>")
#sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "<YOUR_AWS_SECRET>")
# Snowflake connection options: replace the <...> placeholders with your own values
sfOptions = {
"sfURL" : "<account_name>.<region>.snowflakecomputing.com",
"sfAccount" : "<account_name>",
"sfUser" : "<user>",
"sfPassword" : "<password>",
"sfDatabase" : "<db_name>",
"sfSchema" : "<schema_name>",
"sfWarehouse" : "<warehouse_name>",
}
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
df = spark.read.format(SNOWFLAKE_SOURCE_NAME) \
.options(**sfOptions) \
.option("query", "select a,b from MYTABLE") \
.load()
df.show()
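The read above can be wrapped in a small helper that accepts either a table name (the connector's `dbtable` option) or a SQL query (its `query` option). This is a sketch: the function name `read_snowflake` is hypothetical, and it expects an existing SparkSession and an options dict like `sfOptions`.

```python
# Sketch: load a Snowflake table or query result through the connector.
# read_snowflake is a hypothetical helper, not part of the connector.
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"


def read_snowflake(spark, sf_options, table=None, query=None):
    """Return a DataFrame for `table` (dbtable) or `query`; pass exactly one."""
    if (table is None) == (query is None):
        raise ValueError("pass exactly one of table= or query=")
    key, value = ("dbtable", table) if table is not None else ("query", query)
    return (
        spark.read.format(SNOWFLAKE_SOURCE_NAME)
        .options(**sf_options)
        .option(key, value)
        .load()
    )
```

For example, `read_snowflake(spark, sfOptions, table="MYTABLE").show()` reads a whole table, while `query="select a,b from MYTABLE"` reproduces the read above.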