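Spark: converting an RDD to a DataFrame and displaying it
Author: 高景洋 Date: 2020-11-13 15:14:40 Views: 2027
How do you convert an RDD to a DataFrame in Spark?
***Note***: every record in the RDD must have the same structure. Otherwise the conversion fails with the following error: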
ValueError: Length of object (3) does not match with length of fields (4)
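The message compares the number of values in one record (3) with the number of fields in the schema (4). As a minimal sketch, assuming the records are plain tuples, a check like the following could locate the offending rows before any conversion is attempted. The helper name `find_length_mismatches` and the sample rows are hypothetical and are not part of PySpark.

```python
from typing import List, Sequence, Tuple


def find_length_mismatches(rows: Sequence[Sequence], n_fields: int) -> List[Tuple[int, int]]:
    """Return (row_index, row_length) for every row whose length differs from n_fields."""
    return [(i, len(row)) for i, row in enumerate(rows) if len(row) != n_fields]


# Hypothetical sample: the second row is missing its fourth value.
rows = [
    ("Alex", "male", 3, 10),
    ("Nancy", "female", 6),
    ("Jack", "male", 9, None),
]

mismatches = find_length_mismatches(rows, n_fields=4)
print(mismatches)
```

An empty result means every row has as many values as the schema has fields.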
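Where this matters: our data in HBase did not have a uniform structure. For example, some rows had a JobHistory column and others did not. So when we read the data out of HBase and called toDF, the conversion failed with this error.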
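One possible way to handle such data is to pad every record to the same set of columns before converting it. The sketch below assumes each HBase row has already been read into a Python dict keyed by column name. The column list and the helper `normalize_record` are hypothetical; only JobHistory is named in the text above.

```python
from typing import Any, Dict, Optional, Sequence, Tuple

# Hypothetical column list; only JobHistory comes from the scenario described above.
COLUMNS = ("RowKey", "Name", "JobHistory")


def normalize_record(record: Dict[str, Any],
                     columns: Sequence[str] = COLUMNS) -> Tuple[Optional[Any], ...]:
    """Return one value per column, in order, using None for columns the record lacks."""
    return tuple(record.get(col) for col in columns)


records = [
    {"RowKey": "r1", "Name": "Alex", "JobHistory": "engineer"},
    {"RowKey": "r2", "Name": "Nancy"},  # this row has no JobHistory column
]

normalized = [normalize_record(r) for r in records]
print(normalized)
```

With an RDD of such dicts, the same function could presumably be applied with `rdd.map(normalize_record)` before the conversion, provided every field in the schema is declared nullable.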
The following is a working example of converting an RDD to a DataFrame:
from pyspark import SparkContext, SparkConf
from pyspark.sql.session import SparkSession
from pyspark.sql.types import StructField, StructType, StringType

if __name__ == '__main__':
    conf = SparkConf()
    sc = SparkContext(conf=conf)

    # Every tuple has the same four positions; a missing value is written as None
    data = [('Alex', 'male', 3, 10), ('Nancy', 'female', 6, 10), ('Jack', 'male', 9, None)]
    rdd = sc.parallelize(data)

    schema = StructType([
        # The third argument is nullable: True means the field may be null
        StructField("name", StringType(), True),
        StructField("gender", StringType(), True),
        # num and price are declared as strings in this example
        StructField("num", StringType(), True),
        StructField("price", StringType(), True)
    ])

    # Hive support is left disabled here (.enableHiveSupport() is not called on the builder)
    spark = SparkSession.builder.master("local").appName("SparkOnHive").getOrCreate()

    df = spark.createDataFrame(rdd, schema=schema)
    df.show()

    spark.stop()
    sc.stop()
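Running the script prints the DataFrame as a table with the columns name, gender, num, and price. [Screenshot of the output]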
Permanent link to this article:
<a href="http://r4.com.cn/art161.aspx">Spark: converting an RDD to a DataFrame and displaying it</a>