How do you combine a DynamicFrame collection after relationalise and then select fields for a new dynamic frame?

0

It is not at present possible to modify placeholder at all, including deletion. Deleting them is at least in principle possible as it doesn’t actually modify the object, but since the placeholders cannot currently be selected, they also cannot be deleted.

1 réponse
0

Using relationationalize to flatten and then join back sounds like a big overkill (also you are assuming the the joins know which columns to use)
I would just flatten using the DataFrame schema, which is that the Flatten visual transform does. e.g.:

def flatten(df, maxLevels=0, separator="."):
    cols_path = []
    # Receives a list with the nested names in order and returns a Spark escaped column selector
    col_path_select = lambda paths: '.'.join([escape_name(path) for path in paths])

    def add_fields(fields, level=1, prefix=()):
        for field in fields:
            # If it's a Struct, and we are supposed to flatten it based on maxLevel config
            if (type(field.dataType) == StructType) and ((level <= maxLevels) or (maxLevels == 0)):
                add_fields(field.dataType.fields, level + 1, prefix + (field.name,))
            else:
                # Add the column path to the list, each element of the path list represents on leve
                cols_path.append(list(prefix) + [field.name])
    add_fields(self.schema)
    # Enforce nested fields to use the full name with the alias, otherwise it will just use the last name
    col_list = [self[col_path_select(cols_path)].alias(separator.join(cols_path)) for cols_path in cols_path]
    return self.select(col_list)
profile pictureAWS
EXPERT
répondu il y a 3 mois

Vous n'êtes pas connecté. Se connecter pour publier une réponse.

Une bonne réponse répond clairement à la question, contient des commentaires constructifs et encourage le développement professionnel de la personne qui pose la question.

Instructions pour répondre aux questions