Getting Started

Java

Dependency Management

Provide the Spark SQL and Redis Connector for Spark dependencies to your dependency management tool.

Gradle
dependencies {
    implementation 'com.redis:redis-spark-connector:0.9.1'
    implementation 'org.apache.spark:spark-sql_2.12:3.5.4'
}
Maven
<dependencies>
  <dependency>
    <groupId>com.redis</groupId>
    <artifactId>redis-spark-connector</artifactId>
    <version>0.9.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.5.4</version>
  </dependency>
</dependencies>

Spark Session Configuration

package com.redis.examples;

import org.apache.spark.sql.SparkSession;

public class RedisSparkExample {

  public static void main(String[] args) throws Exception {
    SparkSession spark = SparkSession.builder()
      .master("local")
      .appName("RedisSparkExample")
      .config("spark.redis.read.connection.uri", "redis://localhost:6379")
      .config("spark.redis.write.connection.uri", "redis://localhost:6379")
      .getOrCreate();
  }
}
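
With the session configured, reads go through the standard DataFrame API. The following is a minimal sketch, assuming the connector registers the "redis" data source short name; the read would continue inside main above:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Load the configured Redis database into a DataFrame and print a sample.
Dataset<Row> df = spark.read()
    .format("redis") // assumed data source short name
    .load();
df.show();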

For Redis Connector for Spark configuration details, see the Configuration section.

Python

PySpark

This guide describes how to use PySpark with the Redis Connector for Spark, but the same steps apply to self-contained Python applications.

When starting pyspark, you must use one of the following options to add the connector to the classpath:

--packages com.redis:redis-spark-connector:0.9.1

downloads the Redis Connector for Spark package from Maven using the given coordinates, or

--jars path/to/redis-spark-connector-0.9.1.jar

adds an already-downloaded Redis Connector for Spark jar to the classpath.

You can also pass one or more --conf options to configure the connector.

PySpark example
pyspark --conf "spark.redis.read.connection.uri=redis://localhost:6379" \
        --conf "spark.redis.write.connection.uri=redis://localhost:6379" \
        --packages com.redis:redis-spark-connector:0.9.1
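
Once the shell is running, the predefined spark session already carries these settings, so you can try a read straight away. A minimal sketch; the "redis" data source name is assumed here:

df = spark.read.format("redis").load()
df.show()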

Python Application

Create a SparkSession object using the same configuration options as before:

from pyspark.sql import SparkSession

# Build the session with the same connector settings used on the shell command line.
spark_session = (
    SparkSession.builder
    .appName("myApp")
    .config("spark.redis.read.connection.uri", "redis://localhost:6379")
    .config("spark.redis.write.connection.uri", "redis://localhost:6379")
    .getOrCreate()
)
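
To verify the session end to end, you can write a small DataFrame through the connector. This is only a sketch: it assumes the "redis" data source name and that the default write options fit your data model; see the Configuration section for any options your deployment may need.

# Hypothetical smoke test: a two-column DataFrame written through the connector.
df = spark_session.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])
df.write.format("redis").mode("append").save()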

Scala

Spark Shell

When starting the Spark shell, you must use one of the following options to add the connector to the classpath:

--packages com.redis:redis-spark-connector:0.9.1

downloads the Redis Connector for Spark package from Maven using the given coordinates, or

--jars path/to/redis-spark-connector-0.9.1.jar

adds an already-downloaded Redis Connector for Spark jar to the classpath.

You can also pass one or more --conf options to configure the connector.

Spark shell example
spark-shell --conf "spark.redis.read.connection.uri=redis://localhost:6379" \
            --conf "spark.redis.write.connection.uri=redis://localhost:6379" \
            --packages com.redis:redis-spark-connector:0.9.1
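
As in the PySpark case, the shell's predefined spark session is already configured, so a quick read works directly at the prompt. The "redis" data source name below is an assumption:

val df = spark.read.format("redis").load()
df.show()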

Scala Application

Dependency Management

Provide the Spark SQL and Redis Connector for Spark dependencies to your dependency management tool.

SBT example
scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  "com.redis" %% "redis-spark-connector" % "0.9.1",
  "org.apache.spark" %% "spark-sql" % "3.5.4"
)

Spark Session Configuration

package com.redis

import org.apache.spark.sql.SparkSession

object RedisSparkExample {
  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder()
      .master("local")
      .appName("RedisSparkExample")
      .config("spark.redis.read.connection.uri", "redis://localhost:6379")
      .config("spark.redis.write.connection.uri", "redis://localhost:6379")
      .getOrCreate()
  }
}
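
With the session in place, reading from Redis follows the standard DataFrame API. A minimal sketch, again assuming the connector's "redis" data source name; the lines below would continue inside main:

// Load the configured Redis database into a DataFrame and print a sample.
val df = sparkSession.read
  .format("redis") // assumed data source short name
  .load()
df.show()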

Databricks

For Databricks-specific setup instructions, see the Databricks Integration guide.