How do you combine a DynamicFrame collection after Relationalize and then select fields for a new DynamicFrame?


It is not currently possible to modify placeholders at all, and that includes deleting them. Deletion is at least possible in principle, because it doesn't actually modify the object. However, placeholders cannot currently be selected, so they cannot be deleted either.

Asked 3 months ago, 191 views
1 answer

Using Relationalize to flatten and then joining the pieces back together sounds like big overkill (it also assumes that the joins know which columns to use).
I would just flatten using the DataFrame schema, which is what the Flatten visual transform does, e.g.:

from pyspark.sql.types import StructType

def escape_name(name):
    # escape_name was not defined in the original snippet; this is an assumed helper.
    # Backtick-quote the name (doubling inner backticks) so dots in it are not read as nesting
    return "`" + name.replace("`", "``") + "`"

def flatten(df, maxLevels=0, separator="."):
    cols_path = []
    # Receives a list with the nested names in order and returns a Spark-escaped column selector
    col_path_select = lambda paths: '.'.join([escape_name(path) for path in paths])

    def add_fields(fields, level=1, prefix=()):
        for field in fields:
            # If it's a struct and we are supposed to flatten it based on maxLevels (0 = no limit), recurse
            if isinstance(field.dataType, StructType) and (maxLevels == 0 or level <= maxLevels):
                add_fields(field.dataType.fields, level + 1, prefix + (field.name,))
            else:
                # Add the column path to the list; each element of the path represents one nesting level
                cols_path.append(list(prefix) + [field.name])

    add_fields(df.schema.fields)
    # Alias each column with its full path, otherwise Spark would keep only the last name
    col_list = [df[col_path_select(path)].alias(separator.join(path)) for path in cols_path]
    return df.select(col_list)
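To see which columns the function produces, and how maxLevels stops the recursion, the same path-collection logic can be traced without Spark. This is a minimal sketch that assumes a plain nested dict as a stand-in for the Spark schema (a dict value is a struct, a string value is a leaf type); flatten_paths and select_expr are hypothetical helper names, not part of Spark or AWS Glue.

```python
# Assumption: a nested dict stands in for a Spark StructType schema.
# A dict value is a struct; any other value (e.g. "int") is a leaf column type.

def flatten_paths(schema, max_levels=0, level=1, prefix=()):
    """Return the column paths that flatten() would select (max_levels=0 means no limit)."""
    paths = []
    for name, dtype in schema.items():
        if isinstance(dtype, dict) and (max_levels == 0 or level <= max_levels):
            # Struct within the allowed depth: descend one level
            paths.extend(flatten_paths(dtype, max_levels, level + 1, prefix + (name,)))
        else:
            # Leaf, or a struct beyond max_levels: keep it as a single column
            paths.append(list(prefix) + [name])
    return paths

def escape_name(name):
    # Backtick-quote a name, doubling any backticks inside it
    return "`" + name.replace("`", "``") + "`"

def select_expr(path, separator="."):
    """Return (column selector, alias) as flatten() would build them."""
    return ".".join(escape_name(p) for p in path), separator.join(path)

schema = {"id": "int", "address": {"city": "string", "geo": {"lat": "double", "lon": "double"}}}
print(flatten_paths(schema))
# [['id'], ['address', 'city'], ['address', 'geo', 'lat'], ['address', 'geo', 'lon']]
print(flatten_paths(schema, max_levels=1))
# [['id'], ['address', 'city'], ['address', 'geo']]
print(select_expr(["address", "geo", "lat"], "_"))
# ('`address`.`geo`.`lat`', 'address_geo_lat')
```

In a Glue job, flatten() works on a Spark DataFrame, so a DynamicFrame would typically be converted with dyf.toDF() first, and the flattened result wrapped back into a new DynamicFrame with DynamicFrame.fromDF(flat_df, glueContext, "flattened"), selecting the fields you need in between.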
AWS Expert
Answered 3 months ago
