1 answer
Using .read() would load the full dataset into memory at once. Avoid that by using a method that reads line by line, or better yet, use a recent pandas version to read directly from the S3 location - http://pandas.pydata.org/pandas-docs/stable/user_guide/io.html - I can see that v1.4 and v1.5 support S3 locations.
For example,
import pandas as pd
import boto3

# AWS credentials
aws_id = 'xxxx'
aws_secret = 'xxxx'

Client = boto3.client(
    's3',
    aws_access_key_id=aws_id,
    aws_secret_access_key=aws_secret
)

# To Pandas DataFrame (reading an s3:// path requires the s3fs package)
df_loan = pd.read_csv('s3://bucket-1/file1.txt')
If you are not using the latest version of pandas, stream the object line by line instead of calling read(). For example:
# Read the object from S3
result = Client.get_object(Bucket="bucket-1", Key="file1.txt")
print(result)

# iter_lines() streams the body instead of loading it all at once
for i, line in enumerate(result['Body'].iter_lines()):
    line_decoded = line.decode('utf-8')
    # Do your processing here
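If the file is too large to hold in a single DataFrame, pandas can also read it in chunks via the `chunksize` parameter of `read_csv`, which yields one small DataFrame at a time. A minimal sketch, using an in-memory buffer as a stand-in for the S3 object (the column names and chunk size are illustrative; in practice you would pass the s3:// path):

```python
import io
import pandas as pd

# Stand-in for the file on S3; in real use, pass 's3://bucket-1/file1.txt'
csv_data = io.StringIO("id,amount\n1,100\n2,200\n3,300\n4,400\n")

# chunksize yields DataFrames of at most that many rows,
# so the whole file is never held in memory at once
total = 0
for chunk in pd.read_csv(csv_data, chunksize=2):
    total += chunk['amount'].sum()

print(total)  # 1000
```

This keeps peak memory proportional to the chunk size rather than the file size.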
answered 2 years ago