Learning Apache Cassandra (4): Reading and Writing Cassandra with Spark SQL

Spark version: 2.2

Cassandra version: 3.0.15

Dependencies

Add the spark-cassandra-connector dependency provided by DataStax.

Edit the pom.xml file:

<dependency>  
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.11</artifactId>
    <version>2.0.6</version>
</dependency>
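If you build with sbt instead of Maven, the equivalent dependency (same coordinates, assuming Scala 2.11) would look like this:

```scala
// build.sbt fragment — same artifact as the Maven coordinates above.
// The %% operator appends the Scala binary version (_2.11) automatically.
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.6"
```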

Configuration

spark.setCassandraConf("Cassandra Cluster", CassandraConnectorConf.ConnectionHostParam.option("cassandra-1") ++ CassandraConnectorConf.ConnectionPortParam.option(9042))  

This configures the Cassandra cluster. The cluster name should match the cluster_name parameter configured in conf/cassandra.yaml; the port defaults to 9042.
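Besides setCassandraConf, the connection host and port can also be supplied directly when building the SparkSession. A minimal sketch, assuming a Cassandra node reachable at the hostname cassandra-1 (a placeholder for your contact point):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: pass connector settings as Spark config keys on the builder.
// "cassandra-1" is a placeholder contact-point hostname.
val spark = SparkSession.builder()
  .appName("cassandra-demo")
  .config("spark.cassandra.connection.host", "cassandra-1")
  .config("spark.cassandra.connection.port", "9042") // 9042 is the default
  .getOrCreate()
```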

Reading and Writing

Reading data:

val df = spark
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "table" -> "trans",
    "keyspace" -> "dyingbleed"
  )).load()

Writing data:

df.write  
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "table" -> "trans",
    "keyspace" -> "dyingbleed"
  )).save()
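By default, save() fails if the target already contains data. To append rows instead, set the save mode explicitly; the target table must already exist in Cassandra. A sketch:

```scala
import org.apache.spark.sql.SaveMode

// Sketch: append rows instead of failing when the table already holds data.
df.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "trans", "keyspace" -> "dyingbleed"))
  .mode(SaveMode.Append)
  .save()
```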

Reference: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/14_data_frames.md