How to pull table contents out of Spark 🦄

```python
from pyspark.sql import SparkSession

if __name__ == '__main__':
    # Connect to the YARN cluster with Hive support so we can query Hive tables.
    spark = SparkSession.builder \
        .master("yarn") \
        .appName("pyspark location process") \
        .enableHiveSupport() \
        .getOrCreate()
    sc = spark.sparkContext

    # spark.sql('show databases').show()
    spark.sql('use annals').show()
    # spark.sql('describe gps2').show()
    spark.sql('select * from gps2 limit 1').show()

    sql_df = spark.sql('select uid, lat, lgt, app_adjust_time from gps2 limit 5')
    # sql_df.show()
    print(type(sql_df))  # <class 'pyspark.sql.dataframe.DataFrame'>

    # The default save format is Parquet; the path is relative to the HDFS home dir.
    sql_df.write.save("data/GrMWKfDj9eIjsRuh.parquet")
```

This gives us a Parquet-format file on HDFS.

Copy it from HDFS to the jump host:

hadoop fs -get data/GrMWKfDj9eIjsRuh.parquet .

Then use scp or the like to copy it to your local machine. 🐶🐒