
Questions tagged with Amazon Simple Storage Service


How to save a .html file to S3 that is created in a Sagemaker processing container

**Error message:** "FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/processing/output/profile_case.html'"

**Background:** I am working in SageMaker, using Python to profile a dataframe that is saved in an S3 bucket with pandas-profiling. The data is very large, so instead of spinning up a large EC2 instance, I am using an SKLearn processor. Everything runs fine, but when the job finishes it does not save the pandas profile (a .html file) to an S3 bucket or back to the instance SageMaker is running in. When I try to export the .html file that is created from the pandas profile, I keep getting errors saying that the file cannot be found. Does anyone know of a way to export the .html file out of the temporary 24xl instance that the SKLearn processor is running in to S3? Below is the exact code I am using:

```
import os
import sys
import subprocess

def install(package):
    subprocess.check_call([sys.executable, "-q", "-m", "pip", "install", package])

install('awswrangler')
install('tqdm')
install('pandas')
install('botocore==1.19.4')
install('ruamel.yaml')
install('pandas-profiling==2.13.0')

import awswrangler as wr
import pandas as pd
import numpy as np
import datetime as dt
from dateutil.relativedelta import relativedelta
from string import Template
import gc
import boto3
from pandas_profiling import ProfileReport

client = boto3.client('s3')
session = boto3.Session(region_name="eu-west-2")
```

```
%%writefile casetableprofile.py

import os
import sys
import subprocess

def install(package):
    subprocess.check_call([sys.executable, "-q", "-m", "pip", "install", package])

install('awswrangler')
install('tqdm')
install('pandas')
install('botocore')
install('ruamel.yaml')
install('pandas-profiling')

import awswrangler as wr
import pandas as pd
import numpy as np
import datetime as dt
from dateutil.relativedelta import relativedelta
from string import Template
import gc
import boto3
from pandas_profiling import ProfileReport

client = boto3.client('s3')
session = boto3.Session(region_name="eu-west-2")


def run_profile():
    query = """
    SELECT * FROM "healthcloud-refined"."case"
    ;
    """
    tableforprofile = wr.athena.read_sql_query(query,
                                               database="healthcloud-refined",
                                               boto3_session=session,
                                               ctas_approach=False,
                                               workgroup='DataScientists')
    print("read in the table queried above")
    print("got rid of missing and added a new index")

    profile_tblforprofile = ProfileReport(tableforprofile,
                                          title="Pandas Profiling Report",
                                          minimal=True)
    print("Generated carerequest profile")
    return profile_tblforprofile


if __name__ == '__main__':
    profile_tblforprofile = run_profile()
    print("Generated outputs")

    output_path_tblforprofile = ('profile_case.html')
    print(output_path_tblforprofile)

    profile_tblforprofile.to_file(output_path_tblforprofile)

    # Below is the only part where I am getting errors
    import boto3
    import os
    s3 = boto3.resource('s3')
    s3.meta.client.upload_file('/opt/ml/processing/output/profile_case.html',
                               'intl-euro-uk-datascientist-prod',
                               'Mark/healthclouddataprofiles/{}'.format(output_path_tblforprofile))
```

```
import sagemaker
from sagemaker.processing import ProcessingInput, ProcessingOutput

session = boto3.Session(region_name="eu-west-2")

bucket = 'intl-euro-uk-datascientist-prod'
prefix = 'Mark'

sm_session = sagemaker.Session(boto_session=session, default_bucket=bucket)
sm_session.upload_data(path='./casetableprofile.py',
                       bucket=bucket,
                       key_prefix=f'{prefix}/source')
```

```
import boto3
#import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn.processing import SKLearnProcessor

region = boto3.session.Session().region_name

S3_ROOT_PATH = "s3://{}/{}".format(bucket, prefix)

role = get_execution_role()
sklearn_processor = SKLearnProcessor(framework_version='0.20.0',
                                     role=role,
                                     sagemaker_session=sm_session,
                                     instance_type='ml.m5.24xlarge',
                                     instance_count=1)
```

```
sklearn_processor.run(code='s3://{}/{}/source/casetableprofile.py'.format(bucket, prefix),
                      inputs=[],
                      outputs=[ProcessingOutput(output_name='output',
                                                source='/opt/ml/processing/output',
                                                destination='s3://intl-euro-uk-datascientist-prod/Mark/')])
```

Thank you in advance!!!
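A minimal sketch of one possible fix, assuming the cause is the path mismatch visible in the question: the report is written to the container's working directory as `profile_case.html`, while both the `upload_file` call and the configured `ProcessingOutput` expect it under `/opt/ml/processing/output/`. This is untested against the exact setup; the bucket and prefix are the ones from the question:

```
# Inside casetableprofile.py, after run_profile() -- sketch, not verified.
import os

output_dir = '/opt/ml/processing/output'              # matches the ProcessingOutput source
os.makedirs(output_dir, exist_ok=True)                # make sure the directory exists

output_path_tblforprofile = os.path.join(output_dir, 'profile_case.html')
profile_tblforprofile.to_file(output_path_tblforprofile)

# Once the file sits under /opt/ml/processing/output, the ProcessingOutput
# destination ('s3://intl-euro-uk-datascientist-prod/Mark/') copies it to S3
# when the job completes, so the explicit upload_file call may be redundant.
```

If the explicit `boto3` upload is kept instead, its source argument would need to point at the same path that `to_file` actually wrote.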
1 answer · 0 votes · 23 views · asked a day ago

Allowing permission to Generate a policy based on CloudTrail events where the selected Trail logs events in an S3 bucket in another account

I have an AWS account (Account A) with CloudTrail enabled and logging management events to an S3 'logs' bucket in another, dedicated logs account (Account B, which I also own). The logging part works fine, but I'm now trying (and failing) to use the 'Generate policy based on CloudTrail events' tool in the IAM console (under the Users > Permissions tab) in Account A. This is supposed to read the CloudTrail logs for a given user/region/number of days, identify all of the actions the user performed, then generate a sample IAM security policy to allow only those actions, which is great for setting up least-privilege policies etc.

When I first ran the generator, it created a new service role to assume in the same account (Account A): AccessAnalyzerMonitorServiceRole_ABCDEFGHI

When I selected the CloudTrail trail to analyse, it (correctly) identified that the trail logs are stored in an S3 bucket in another account, and displayed this warning message:

> Important: Verify cross-account access is configured for the selected trail. The selected trail logs events in an S3 bucket in another account. The role you choose or create must have read access to the bucket in that account to generate a policy. Learn more.

Attempting to run the generator at this stage fails after a short amount of time, and if you hover over the 'Failed' status in the console you see the message:

> Incorrect permissions assigned to access CloudTrail S3 bucket. Please fix before trying again.

Makes sense, but actually giving read access to the S3 bucket to the automatically generated AccessAnalyzerMonitorServiceRole_ABCDEFGHI is where I'm now stuck! I'm relatively new to AWS so I might have done something dumb or be missing something obvious, but I'm trying to give the automatically generated role in Account A permission to the S3 bucket by adding to the bucket policy attached to the S3 logs bucket in our Account B. I've added the below extract to the existing bucket policy (which is just the standard policy for a CloudTrail logs bucket, extended to allow CloudTrail in Account A to write logs to it as well), but my attempts to run the policy generator still fail with the same error message.

```
{
    "Sid": "IAMPolicyGeneratorRead",
    "Effect": "Allow",
    "Principal": {
        "AWS": "arn:aws:iam::1234567890:role/service-role/AccessAnalyzerMonitorServiceRole_ABCDEFGHI"
    },
    "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:ListBucket"
    ],
    "Resource": [
        "arn:aws:s3:::aws-cloudtrail-logs-ABCDEFGHI",
        "arn:aws:s3:::aws-cloudtrail-logs-ABCDEFGHI/*"
    ]
}
```

Any suggestions how I can get this working?
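One hedged guess, since the question doesn't show the role's own permissions: cross-account S3 access needs both sides to allow the read, so in addition to the bucket policy in Account B, the AccessAnalyzerMonitorServiceRole_ABCDEFGHI in Account A would also need an identity policy granting the same actions on the Account B bucket. Below is a sketch of attaching such an inline policy with boto3; the policy name `CrossAccountCloudTrailBucketRead` is made up, and the role and bucket names are the placeholders from the question:

```
# Sketch only: grant the Account A service role read access to the Account B
# bucket via an identity policy, to complement the bucket policy shown above.
import json
import boto3

iam = boto3.client('iam')  # credentials for Account A

read_logs_bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:GetObjectVersion", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::aws-cloudtrail-logs-ABCDEFGHI",
                "arn:aws:s3:::aws-cloudtrail-logs-ABCDEFGHI/*"
            ]
        }
    ]
}

iam.put_role_policy(
    RoleName="AccessAnalyzerMonitorServiceRole_ABCDEFGHI",
    PolicyName="CrossAccountCloudTrailBucketRead",   # hypothetical policy name
    PolicyDocument=json.dumps(read_logs_bucket_policy),
)
```

If the trail objects are encrypted with SSE-KMS, the role would additionally need `kms:Decrypt` on the key in Account B (and the key policy would have to allow it); that is a separate assumption worth checking.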
1 answer · 0 votes · 29 views · asked 2 days ago