How do you combine a DynamicFrame collection after relationalise and then select fields for a new dynamic frame?


It is not currently possible to modify placeholders at all, including deleting them. Deletion should in principle be possible, since it doesn't actually modify the object, but because placeholders cannot currently be selected, they cannot be deleted either.

asked 2 months ago · 150 views
1 Answer

Using relationalize to flatten and then joining the tables back sounds like overkill (it also assumes the joins know which columns to use).
I would just flatten using the DataFrame schema, which is what the Flatten visual transform does, e.g.:

from pyspark.sql.types import StructType


def escape_name(name):
    # escape_name is not shown in the answer; this assumed stand-in backtick-quotes
    # one path element and doubles any embedded backticks, as Spark expects
    return "`" + name.replace("`", "``") + "`"


def flatten(df, maxLevels=0, separator="."):
    cols_path = []
    # Receives a list with the nested names in order and returns a Spark escaped column selector
    col_path_select = lambda paths: '.'.join([escape_name(path) for path in paths])

    def add_fields(fields, level=1, prefix=()):
        for field in fields:
            # If it's a Struct, and we are supposed to flatten it based on the maxLevels config
            if isinstance(field.dataType, StructType) and (maxLevels == 0 or level <= maxLevels):
                add_fields(field.dataType.fields, level + 1, prefix + (field.name,))
            else:
                # Add the column path to the list; each element of the path list represents one level
                cols_path.append(list(prefix) + [field.name])

    add_fields(df.schema.fields)
    # Enforce nested fields to use the full name with the alias, otherwise it will just use the last name
    col_list = [df[col_path_select(path)].alias(separator.join(path)) for path in cols_path]
    return df.select(col_list)
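The recursion in add_fields is what decides the output columns. To check which columns a given schema yields without starting a Spark session, here is a plain-Python sketch of the same path-collection logic. It assumes the schema is modelled as nested dicts (a hypothetical stand-in for StructType: a dict value is a struct, anything else is a leaf type); collect_paths and the sample schema are illustrative names, not part of Spark or Glue.

```python
def collect_paths(schema, max_levels=0, level=1, prefix=()):
    """Mirror of add_fields: return the column paths the flatten step would select.

    `schema` is a hypothetical stand-in for a Spark StructType: each key is a
    field name, and each value is either a nested dict (a struct) or a type name.
    """
    paths = []
    for name, dtype in schema.items():
        # Recurse into structs while within max_levels (0 means no limit)
        if isinstance(dtype, dict) and (max_levels == 0 or level <= max_levels):
            paths.extend(collect_paths(dtype, max_levels, level + 1, prefix + (name,)))
        else:
            # Leaf field, or a struct past the depth limit: kept as a single column
            paths.append(list(prefix) + [name])
    return paths


schema = {
    "id": "int",
    "customer": {"name": "string", "address": {"city": "string", "zip": "string"}},
}

# Column aliases as flatten(df, separator="_") would name them
print(["_".join(p) for p in collect_paths(schema)])
# ['id', 'customer_name', 'customer_address_city', 'customer_address_zip']
print(["_".join(p) for p in collect_paths(schema, max_levels=1)])
# ['id', 'customer_name', 'customer_address']
```

With max_levels=1 the nested address struct is kept as one column, which matches how the maxLevels check stops descending once level exceeds the limit.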
AWS
answered 2 months ago
