Hi Kevin,
If you are using the visual editor, there is no out-of-the-box transform for ordering your data, so you have to create your own. The simplest way is to add a custom SQL Query transform.
Click (+) to add a node and select "SQL Query", then write a simple query such as "SELECT * FROM .... ORDER BY <column>".
This generates the following script, which runs a Spark SQL query:
def sparkSqlQuery(glueContext, query, mapping, transformation_ctx) -> DynamicFrame:
    for alias, frame in mapping.items():
        frame.toDF().createOrReplaceTempView(alias)
    result = spark.sql(query)
    return DynamicFrame.fromDF(result, glueContext, transformation_ctx)

# Script generated for node SQL Query
SqlQuery0 = """
select * from myDataSource
order by <myDataSourceColumn>
"""
SQLQuery_node1692843137953 = sparkSqlQuery(
    glueContext,
    query=SqlQuery0,
    mapping={"myDataSource": ChangeSchema_node2},
    transformation_ctx="SQLQuery_node1692843137953",
)
Additionally, if you are writing your own script, you could convert your DynamicFrame to a Spark DataFrame and then sort the data using the Spark API [1]:
sorted_df = myframe.toDF().orderBy(["mycolumn"])
Hope this helps. If you have further questions, please let me know.
Reference: [1] https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrame.orderBy.html
i = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={
        "paths": [input_loc],
        "recurse": True,
        "compressionType": "gzip",
        "groupFiles": "inPartition",
        "groupSize": "104857600",
    },
    format="json",
)
I plan to load the JSON with the grouping settings shown above.
If I sort by a column on the dynamic data frame:
sorted_df = i.toDF().orderBy(["col"])
Then output it into parquet, will each parquet file be sorted by the column within the file? I would instead like the column to be sorted "across" the parquet files, if that makes sense.
Something like "z-ordering"?
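To make the "within each file" versus "across files" distinction concrete, here is a small pure-Python sketch (not Spark or Glue code). It assumes that each list stands in for one output file, and the helper names `range_partition_sort`, `sort_within_partitions`, and `is_sorted_across` are made up for illustration. As far as I understand, Spark's `orderBy` behaves like the first helper (a total-order sort built on range partitioning, so each partition holds a distinct key range), while `sortWithinPartitions` behaves like the second. Z-ordering is a different technique that interleaves several columns and is not shown here.

```python
from bisect import bisect_right


def range_partition_sort(rows, key, boundaries):
    """Global sort: route rows into key ranges, then sort each range."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        # bisect_right picks the range whose lower bound is <= the key.
        parts[bisect_right(boundaries, row[key])].append(row)
    return [sorted(p, key=lambda r: r[key]) for p in parts]


def sort_within_partitions(parts, key):
    """Local sort: each partition is sorted alone, so ranges may overlap."""
    return [sorted(p, key=lambda r: r[key]) for p in parts]


def is_sorted_across(parts, key):
    """True if reading the partitions in order yields sorted keys."""
    flat = [r[key] for p in parts for r in p]
    return flat == sorted(flat)


# Hypothetical data: one column named "col", two "files".
rows = [{"col": v} for v in [7, 2, 9, 4, 1, 8, 3, 6]]
arbitrary_parts = [rows[:4], rows[4:]]

local = sort_within_partitions(arbitrary_parts, "col")
global_ = range_partition_sort(rows, "col", boundaries=[4])

print(is_sorted_across(local, "col"))
print(is_sorted_across(global_, "col"))
```

If the goal is ordering across the Parquet files, the range-partitioned form is the one to aim for. It is worth checking the written files on a small sample, since any repartitioning between the sort and the write could change the layout.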
Are you using the visual editor, or do you have a script?