Getting Started
Java
Dependency Management
Provide the Spark SQL and Redis Connector for Spark dependencies to your dependency management tool.

Gradle:

dependencies {
    implementation 'com.redis:redis-spark-connector:0.9.1'
    implementation 'org.apache.spark:spark-sql_2.12:3.5.4'
}
Maven:

<dependencies>
    <dependency>
        <groupId>com.redis</groupId>
        <artifactId>redis-spark-connector</artifactId>
        <version>0.9.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>3.5.4</version>
    </dependency>
</dependencies>
Spark Session Configuration
package com.redis.examples;

import org.apache.spark.sql.SparkSession;

public class RedisSparkExample {
    public static void main(String[] args) {
        // Configure the connector's read and write connection URIs
        // on the session before using the Redis data source.
        SparkSession spark = SparkSession.builder()
                .master("local")
                .appName("RedisSparkExample")
                .config("spark.redis.read.connection.uri", "redis://localhost:6379")
                .config("spark.redis.write.connection.uri", "redis://localhost:6379")
                .getOrCreate();
    }
}
For Redis Connector for Spark configuration details, see the Configuration section.
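Once the session is configured, the connector is driven through the standard DataFrame API. The following is a minimal sketch continuing from the example above; it assumes the connector registers its data source under the short format name redis (the connection URIs are picked up from the spark.redis.* settings), so check the Configuration section for the identifiers and options the data source actually accepts.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Continuing inside main() above. The "redis" format name is an
// assumption for illustration, not a confirmed identifier.
Dataset<Row> df = spark.read().format("redis").load();
df.show();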
Python
PySpark
This guide describes how to use the Redis Connector for Spark from PySpark, but the same approach works in self-contained Python applications as well.
When starting pyspark, you must use one of the following options to add the package to the classpath:

--packages com.redis:redis-spark-connector:0.9.1 - downloads the Redis Connector for Spark package using the given Maven coordinates, or
--jars path/to/redis-spark-connector-0.9.1.jar - adds the downloaded Redis Connector for Spark jar to the classpath
You can specify --conf option(s) to configure the connector:
pyspark --conf "spark.redis.read.connection.uri=redis://localhost:6379" \
--conf "spark.redis.write.connection.uri=redis://localhost:6379" \
--packages com.redis:redis-spark-connector:0.9.1
Python Application
Create a SparkSession object using the same configuration options as before:
from pyspark.sql import SparkSession

# Configure the connector's read and write connection URIs on the session.
spark_session = (
    SparkSession.builder
    .appName("myApp")
    .config("spark.redis.read.connection.uri", "redis://localhost:6379")
    .config("spark.redis.write.connection.uri", "redis://localhost:6379")
    .getOrCreate()
)
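With the session in place, the connector is used through the standard DataFrame reader. A minimal sketch, again assuming the data source registers under the short format name redis; see the Configuration section for the actual identifiers and options.

# The "redis" format name is an assumption for illustration; the
# connection URIs come from the spark.redis.* settings configured above.
df = spark_session.read.format("redis").load()
df.show()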
Scala
Spark Shell
When starting the Spark shell, you must use one of the following options to add the package to the classpath:

--packages com.redis:redis-spark-connector:0.9.1 - downloads the Redis Connector for Spark package using the given Maven coordinates, or
--jars path/to/redis-spark-connector-0.9.1.jar - adds the downloaded Redis Connector for Spark jar to the classpath
You can specify --conf option(s) to configure the connector:
spark-shell --conf "spark.redis.read.connection.uri=redis://localhost:6379" \
--conf "spark.redis.write.connection.uri=redis://localhost:6379" \
--packages com.redis:redis-spark-connector:0.9.1
Scala Application
Dependency Management
Provide the Spark SQL and Redis Connector for Spark dependencies to your dependency management tool.
scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  "com.redis" %% "redis-spark-connector" % "0.9.1",
  "org.apache.spark" %% "spark-sql" % "3.5.4"
)
Spark Session Configuration
package com.redis

import org.apache.spark.sql.SparkSession

object RedisSparkExample {
  def main(args: Array[String]): Unit = {
    // Configure the connector's read and write connection URIs on the session.
    val sparkSession = SparkSession.builder()
      .master("local")
      .appName("RedisSparkExample")
      .config("spark.redis.read.connection.uri", "redis://localhost:6379")
      .config("spark.redis.write.connection.uri", "redis://localhost:6379")
      .getOrCreate()
  }
}
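As in the Java and Python examples, the configured session then drives the connector through the DataFrame API. A minimal sketch continuing from the example above, again assuming the data source registers under the short format name redis:

// Continuing inside main() above; the "redis" format name is an
// assumption for illustration, not a confirmed identifier.
val df = sparkSession.read.format("redis").load()
df.show()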
Databricks
For Databricks-specific setup instructions, see the Databricks Integration guide.