Redshift Spectrum: no way to glance at tables with nested data

0

Hi,
I'm enjoying working with Redshift Spectrum, but I find it difficult to quickly scan a table with nested fields. By that I mean something along the lines of "select * from table limit 100"

This is impossible if the table has nested fields, because Spectrum doesn't allow "*" queries on these.

"Query 1 ERROR: ERROR: Nested tables do not support '*' in the SELECT clause."

But there are few alternatives for me - say I have dozens of columns in my table, do I need to enumerate them while somehow deducing which ones are nested and thus to be avoided?

There is a related issue that I can't do "select nested_field from table" either, presumably because Spectrum can't serialize the field in any way.

These two issues require me to know the schema of all my tables before writing my queries, which is quite user unfriendly. It would be really nice if Spectrum supported displaying string representations of nested fields, if only for these simple selects, they are crucial for exploratory work. (I also use Presto, Hive, and Spark, neither of those has a problem with this.)

Thanks!

ondrejj
asked 5 years ago2426 views
3 Answers
0

Hi ondrejj,

If you're sure Presto readily allows you to query this external data in an exploratory way, then I respectfully suggest that you use Athena, that uses Presto as it's query engine, to do your exploratory SQL and Redshift Spectrum for actual analysis.

Assuming you are using the Glue data catalog for your external table catalog then Athena and Redshift can directly share the same table definitions. If you've already done the DDL in Redshift Spectrum then should be able to use it in Athena as is with little to no setup, nothing to provision or spin up, and the only cost would be the Athena scan costs.

I hope this helps,
-Kurt

klarson
answered 5 years ago
0

We'd love to use Athena, but we're not using Glue and we're not currently planning on doing so.

We also like to query Redshift at the same time as S3 (which both Spectrum and Presto allow), so that's another deal breaker.

Last but not least, we don't want to use multiple tools (like Athena + Spectrum), rather one that supports all our workloads - currently Presto leads the way, but we like Spectrum's simplicity in terms of ops.

ondrejj
answered 5 years ago
0

Thank you for for this feature suggestion. We have definitely heard requests for this feature and it will be considered for our roadmap.

We do not comment on the timing of new features until they are announced but new feature releases are noted in our regular maintenance announcements at the top of this forum and on our What's New page. https://aws.amazon.com/redshift/whats-new/

answered 5 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions