My job succeeds with no errors, but when I query the table, it's empty.
The CSV is a single column with no header
The table has a single column, record_id
SELECT TOP 1000 [record_id]
FROM [h3-glue-db].[dbo].[h3_load]
Here's the job script:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)
# Script generated for node S3 bucket
S3bucket_node1 = glueContext.create_dynamic_frame.from_options(
    format_options={
        "quoteChar": '"',
        "withHeader": False,
        "separator": ",",
        "optimizePerformance": False,
    },
    connection_type="s3",
    format="csv",
    connection_options={"paths": ["s3://h3-public/testfile2.csv"]},
    transformation_ctx="S3bucket_node1",
)
# Script generated for node ApplyMapping
ApplyMapping_node2 = ApplyMapping.apply(
    frame=S3bucket_node1,
    mappings=[("col0", "string", "record_id", "string")],
    transformation_ctx="ApplyMapping_node2",
)
# Script generated for node SQL Server table
SQLServertable_node3 = glueContext.write_dynamic_frame.from_catalog(
    frame=ApplyMapping_node2,
    database="h3-sql-database",
    table_name="h3_glue_db_dbo_h3_load",
    transformation_ctx="SQLServertable_node3",
)
job.commit()
Have you verified that the S3 endpoint to the file is correct?
Have you run ApplyMapping_node2.show() to make sure it has data loaded and the mapping was applied correctly?
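Besides .show(), a row count gives a clearer signal. As a minimal local sketch of what that ("col0", "string", "record_id", "string") mapping does, using made-up sample rows since the real frame only exists inside Glue:

```python
# Stand-in rows, shaped the way Glue reads a headerless CSV:
# with withHeader=False each line gets an auto-generated "col0" field.
rows = [{"col0": "A001"}, {"col0": "A002"}]

# ApplyMapping with ("col0", "string", "record_id", "string")
# renames col0 to record_id and keeps the string type.
mapped = [{"record_id": r["col0"]} for r in rows]

print(len(mapped))  # a zero here would explain the empty table
```

In the job itself, the equivalent check is ApplyMapping_node2.count() (or ApplyMapping_node2.toDF().count()); if that prints 0, the problem is upstream of the SQL Server write, not in the sink.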
vtjean Yes, the endpoint is correct; I verified it.
Gonzalo Herreros ApplyMapping_node2.show() does not display anything; does that mean it is not mapping? I put the statement in the job (see the "Start - .show()" / "End - .show()" markers in the log output below):
23/05/22 16:01:23 INFO DAGScheduler: Job 0 finished: save at JDBCUtils.scala:897, took 0.020324 s
23/05/22 16:01:19 INFO GlueContext: The DataSink in action for the given format/connectionType (sqlserver) is com.amazonaws.services.glue.sinks.SQLServerDataSink
23/05/22 16:01:19 INFO GlueContext: Glue secret manager integration: secretId is not provided.
23/05/22 16:01:19 INFO GlueContext: Using location: h3-glue-db.dbo.h3_load
23/05/22 16:01:19 INFO GlueContext: getCatalogSink: catalogId: null, nameSpace: h3-sql-database, tableName: h3_glue_db_dbo_h3_load, isRegisteredWithLF: false
23/05/22 16:01:18 WARN main: End - .show()
23/05/22 16:01:17 WARN main: Start - .show()
23/05/22 16:01:16 INFO GlueContext: The DataSource in action : com.amazonaws.services.glue.HadoopDataSource
23/05/22 16:01:15 INFO GlueContext: Glue secret manager integration: secretId is not provided.
23/05/22 16:01:14 INFO GlueContext: GlueMetrics configured and enabled
23/05/22 16:01:12 INFO Utils: Successfully started service 'sparkDriver' on port 44775.
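Nothing printed between the Start/End markers suggests the frame is empty before the write ever happens. As a local sanity check outside Glue, Python's csv module can confirm the file parses as one column per line; this is a sketch with inline sample data, since testfile2.csv itself isn't available here:

```python
import csv
import io

# Inline stand-in for a headerless, single-column CSV like testfile2.csv
sample = "1001\n1002\n1003\n"

# Same parsing settings the job uses: quoteChar '"' and separator ','
reader = csv.reader(io.StringIO(sample), quotechar='"', delimiter=",")
rows = list(reader)

# Every row should have exactly one field (what Glue will call col0)
assert all(len(r) == 1 for r in rows)
print(len(rows))  # -> 3, the row count that should land in h3_load
```

If a check like this passes against the real file but the frame is still empty, the next things to look at are the S3 path itself (an empty prefix read succeeds silently) and the job's permissions to read the object.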