Prerequisites:
- HDFS is running on the Hadoop cluster in the VM
- Hive is configured on the VM, with its metastore backed by MySQL
- A Hadoop environment is configured on Windows, and Spark SQL jobs already run inside IDEA
- The VM's firewall is disabled
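Before wiring up Hive, it can help to confirm HDFS is reachable from Windows. A minimal sketch, assuming the NameNode listens at master:8020 (adjust the host and port to your cluster; older setups often use 9000):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsCheck {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://master:8020") // assumed NameNode address
    val fs = FileSystem.get(conf)
    // Prints true if the client can reach the NameNode
    println(fs.exists(new Path("/")))
    fs.close()
  }
}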
Many online write-ups make this more complicated than it needs to be; the steps boil down to the following:
1. Add the dependencies to pom.xml (the MySQL driver, the Hive dependency, and the spark-hive dependency):
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.27</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.12</artifactId>
    <version>3.0.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1</version>
</dependency>
2. Copy hive-site.xml from the VM's hive/conf directory into the project's resources directory, adjusting the JDBC URL, username, and password to match your own MySQL setup:
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>hive.metastore.schema.verification</name> <value>false</value> </property> <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value> <description>JDBC connect string for a JDBC metastore</description> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.jdbc.Driver</value> <description>Driver class name for a JDBC metastore</description> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>root</value> <description>username to use against metastore database</description> </property> <property> <name>javax.jdo.option.ConnectionPassword</name> <value>123456</value> <description>password to use against metastore database</description> </property> </configuration>
3. Check whether hive-site.xml was automatically copied into the IDEA project's target/classes directory. If not, place it there manually; otherwise Spark falls back to a local embedded metastore instead of your Hive metastore. A quick runtime check is sketched below.
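A minimal sketch for verifying the file is actually on the runtime classpath, which is where Spark looks for it:

// Returns null when hive-site.xml is missing from target/classes
val url = getClass.getClassLoader.getResource("hive-site.xml")
println(if (url == null) "hive-site.xml NOT found on classpath" else s"found: $url")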
4. Enable Hive support by adding enableHiveSupport() when creating the SparkSession:
import org.apache.spark.sql.SparkSession

// Create a SparkSession with Hive support enabled
val spark: SparkSession = SparkSession
  .builder()
  .enableHiveSupport()
  .master("local[*]")
  .appName("sql")
  .getOrCreate()
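Once the session is up, a quick smoke test is to list the databases the metastore knows about and query a table; mydb and mytable below are hypothetical names, substitute your own:

spark.sql("show databases").show() // should list your Hive databases, not just 'default'
spark.sql("select * from mydb.mytable limit 10").show() // hypothetical database/table
spark.stop()

If "show databases" only returns 'default', the session is most likely still using a local metastore, which points back to a missing or misconfigured hive-site.xml (step 3).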