Hello, and thanks for bringing this to us. The data has been updated, and your requested changes should all be reflected. Please let us know if you find anything else or feel something has not been fully addressed.
Regarding the third question: what data update rate would be most helpful for you? That can then be taken into consideration for potential future updates.
Hi,
To address your query, I suggest reaching out to the team responsible for managing the dataset. You can log a ticket on this GitHub page: https://github.com/aws-solutions-library-samples/guidance-for-digital-assets-on-aws/tree/main/analytics
When it comes to handling "junk" data, excluding patterns in Athena can be tedious, particularly if your dataset is frequently updated. In such cases, using Redshift [1] may give better results. Additionally, making changes to the dataset requires control over it: you can copy the data from the AWS public blockchain dataset and apply transformation functions using Spark, Hive, or other big data ETL tools. To automate the ingestion of new data, you can use S3 replication rules [2] to copy objects to your bucket automatically. Moreover, S3 Intelligent-Tiering can reduce storage costs by moving rarely accessed data to cheaper tiers, and lifecycle rules can expire old data that is no longer needed for querying.
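As a rough illustration of the replication approach, the sketch below builds an S3 replication configuration with boto3. The bucket names, prefix, and IAM role ARN are placeholders, not real values; note also that replication is configured on a source bucket you own and requires versioning to be enabled on both buckets.

```python
# Sketch: mirror new objects under a prefix into another bucket via an
# S3 replication rule. All ARNs, bucket names, and the prefix below are
# hypothetical placeholders -- replace them with your own.
replication_config = {
    # IAM role that S3 assumes to replicate objects (placeholder ARN).
    "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
    "Rules": [
        {
            "ID": "copy-blockchain-data",
            "Status": "Enabled",
            "Priority": 1,
            # Only replicate objects under this (hypothetical) prefix.
            "Filter": {"Prefix": "v1.0/eth/token_transfers/"},
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::my-blockchain-copy"},
        }
    ],
}

# Applying the rule would look roughly like this (requires credentials
# and versioning enabled on both buckets, so it is commented out here):
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_replication(
#     Bucket="my-source-bucket",
#     ReplicationConfiguration=replication_config,
# )
```

The dict mirrors the `ReplicationConfiguration` shape that `put_bucket_replication` expects; adjust the filter and destination for your layout.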
I hope this information proves helpful. Should you require further assistance, please let me know.
[1] Redshift Spectrum Data Files, Amazon Redshift Documentation
[2] Configuring Replication, Amazon S3 User Guide
Hello, and thanks for commenting. However, the logs table still has two day folders containing incorrect files: 2022-11-05 and 2022-11-01. Also, data for 2023-06-12 and 2023-06-11 is missing.
I wasn't clear enough about the second point. Currently, the "token_transfers" table's "value" column has type 'double'. The same table in BigQuery uses type 'string', and if you check a blockchain explorer (for example) at this page, you can see that transferred token values can be bigger than a double can represent without losing precision. The value can be as large as 115792089237316195423570985008687907853269984665640564039457584007913129639935, the maximum of uint256. Storing that value as a string satisfies everyone: those who want a double can safely cast it in their queries. So, could you please re-upload just the "token_transfers" table's data with "value" as a string?
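The precision concern can be checked directly: a double has only 53 bits of mantissa, so it cannot represent the full uint256 range, while a string round-trips exactly. A minimal Python demonstration:

```python
# Maximum value of uint256, as quoted above.
MAX_UINT256 = 2**256 - 1

# Round-tripping through a double (Python float) loses precision:
# the nearest representable double is 2**256, not 2**256 - 1.
as_double = float(MAX_UINT256)
print(int(as_double) == MAX_UINT256)  # False: precision lost

# Stored as a string, the value survives exactly and any consumer
# can parse it back into an arbitrary-precision integer.
as_string = str(MAX_UINT256)
print(int(as_string) == MAX_UINT256)  # True: exact round trip
```

The same effect applies in Athena: `CAST(value AS double)` is lossy for large token amounts, whereas a string column preserves the exact value and still allows an explicit cast when approximate arithmetic is acceptable.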
As for the upload rate, it would be best to stream data from the blockchain; BigQuery supports that. If that is not possible, refreshing the data every hour would also be good. Thank you in advance!