Generating Parquet files from Glue Data Catalog

I have a Glue job that writes to a Data Catalog table. I originally set the table up as CSV, and everything works fine. Now I would like to use Parquet instead. I thought I would just have to re-create the table and select Parquet instead of CSV, so I did that as follows:

CREATE EXTERNAL TABLE `gp550_load_database_beta.gp550_load_table_beta`(
  `vid` string,
  `altid` string,
  `vtype` string,
  `time` timestamp,
  `timegmt` timestamp,
  `value` float,
  `filename` string)
PARTITIONED BY (
  `year` int,
  `month` int,
  `day` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://ds905-load-forecast/data_store_beta/'
TBLPROPERTIES (
  'classification'='parquet')

I left my Glue job unchanged. It sends its output to the Data Catalog table like so:

    # Update the catalog on write and partition the output by date
    additionalOptions = {"enableUpdateCatalog": True, "updateBehavior": "LOG"}
    additionalOptions["partitionKeys"] = ["year", "month", "day"]

    # Data Catalog WRITE
    DataCatalogtable_node2 = glueContext.write_dynamic_frame.from_catalog(
        frame=dynamicDF,
        database=db_name,
        table_name=tbl_name,
        additional_options=additionalOptions,
        transformation_ctx="DataCatalogtable_node2",
    )

When I checked the files that the job writes to s3://ds905-load-forecast/data_store_beta/, they still appear to be CSV. What do I need to do to use Parquet? Can I just change the sink routine to use glueContext.write_dynamic_frame.from_options()?
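
For reference, one alternative that may be worth trying is an S3 sink with an explicit output format instead of from_catalog(). The sketch below follows the getSink() pattern from the AWS Glue documentation on updating the Data Catalog from ETL jobs. The helper name write_parquet_to_catalog and its parameters are hypothetical, and whether this resolves the CSV output seen here is an assumption that has not been verified against this job.

```python
def write_parquet_to_catalog(glue_context, frame, s3_path, database, table_name):
    """Write a DynamicFrame to S3 as Parquet and register it in the Data Catalog.

    Hypothetical helper: `glue_context` is expected to be an awsglue GlueContext
    and `frame` a DynamicFrame. Neither is constructed here.
    """
    # S3 sink with the same catalog-update options used in the original job
    sink = glue_context.getSink(
        connection_type="s3",
        path=s3_path,
        enableUpdateCatalog=True,
        updateBehavior="LOG",
        partitionKeys=["year", "month", "day"],
    )
    # Request Parquet output explicitly rather than relying on the table definition
    sink.setFormat("glueparquet")
    # Point the sink at the existing catalog table
    sink.setCatalogInfo(catalogDatabase=database, catalogTableName=table_name)
    sink.writeFrame(frame)
    return sink
```

In the job above this would be called as write_parquet_to_catalog(glueContext, dynamicDF, "s3://ds905-load-forecast/data_store_beta/", db_name, tbl_name) in place of the from_catalog() call.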

bfeeny
asked 2 years ago · 128 views
No answers
