Amazon Redshift Serverless and AWS Data Exchange Healthcare Demo
Faster is better!
Intro
AWS makes it possible to increase the velocity of gaining insights from data with many serverless technologies including Amazon Redshift Serverless and AWS Data Exchange. Below is a step-by-step guide on how to use these technologies to gain insights from third party data in the AWS Data Exchange.
Setup
AWS Data Exchange
The AWS Data Exchange bridges the gap between providers and subscribers exchanging data, helping customers lower costs, become more agile, and innovate faster.
Steps
- Navigate to the AWS Data Exchange
- Click on "Browse 3,500+ third party data sets"
- Search for "Ovation IBD"
- You will see two different results from Ovation.
- Click on the second result, "Ovation IBD Whole Genome (WGS+RNAseq) with Linked Clinical Data [Sample]". Note: The provided SQL works with the second listing which is a sample of 10 patients of data.
- Click on the "Continue to subscribe" link
- After a period of time, Ovation will complete the process and you will be subscribed to this product listing.
Amazon Redshift Serverless
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. It is available both as a provisioned cluster or serverless which will be covered here.
Steps
- Log into the AWS console
- Navigate to Redshift
- Deploy Redshift Serverless by using the "Try Redshift Serverless free trial"
- It is recommended to use the default settings. Simply scroll to the bottom and click "Save configuration".
Once the Ovation product listing has been successfully subscribed to and the configuration of Redshift Serverless completes, you can proceed to the next step.
- Go to the Redshift Serverless dashboard and click the three lines in the top left.
- Click on datashares
- Click on the Subscriptions link
- Click on the "ovation_ibd" datashare
- Click on the "Create database from datashare" link
- Use the "default" namespace, database name of ibd_db_sample, and then click "Create". Note: the database name can be virtually anything you want but the SQL scripts provided are written for the ibd_db_sample name.
- Click on the "Query editor v2" link
- Paste the following queries and then click the "Run" button.
--PatientData
select patient_id, specimen_pathology_diagnosis_summary, disease_severity,
"% Aligned (STARUniquelyMappedPercent) - % Uniquely mapped reads"/100 as STAR_Uniquely_Mapped_Percent,
"% Dups (PicardPercentDuplication) - Mark Duplicates - Percent Duplication"::numeric(38,6) as Picard_Percent_Duplication ,
"% GC (FastQCTrimmedR1PercentGc) - Average % GC Content"/100::numeric as Fast_QC_Trimmed_R1_PercentGc,
"% GC (FastQCTrimmedR2PercentGc) - Average % GC Content"/100::numeric as Fast_QC_Trimmed_R2_PercentGc,
"M Aligned (QualiMapReadsAligned) - Reads Aligned (millions)"::int as Quali_Map_Reads_Aligned,
"M Aligned (STARUniquelyMapped) - Uniquely mapped reads (millions)"::int as STAR_Uniquely_Mapped,
"Total Reads (STARTotalReads) - Total Reads"::int as STAR_Total_Reads
from (
select patient_id, specimen_pathology_diagnosis_summary, disease_severity, metric, value
from (
select s.patient_id, s.specimen_pathology_diagnosis_summary, s.disease_severity,
q.metric, q.value
from ibd_db_sample.public.ovation_ibd_sample_specimen s
join ibd_db_sample.public.ovation_ibd_sample_sequencing_run r on s.specimen_id = r.specimen_id
join ibd_db_sample.public.ovation_ibd_sample_sequencing_qc q on r.sequencing_run_id = q.sequencing_run_id
where s.specimen_pathology_diagnosis_summary <> 'n/a'
) as sub
) pivot (
min(value) for metric in ('% Aligned (STARUniquelyMappedPercent) - % Uniquely mapped reads',
'% Dups (PicardPercentDuplication) - Mark Duplicates - Percent Duplication',
'% GC (FastQCTrimmedR1PercentGc) - Average % GC Content',
'% GC (FastQCTrimmedR2PercentGc) - Average % GC Content',
'M Aligned (QualiMapReadsAligned) - Reads Aligned (millions)',
'M Aligned (STARUniquelyMapped) - Uniquely mapped reads (millions)',
'Total Reads (STARTotalReads) - Total Reads')
);
--PatientDiagnosisCrosstab
select specimen_pathology_diagnosis_summary, coalesce(unspecified, 0) as unspecified,
coalesce(minimal, 0) as minimal, coalesce(mild, 0) as mild,
coalesce(moderate, 0) as moderate, coalesce("moderate-to-severe", 0) as moderate_to_severe, coalesce(severe, 0) as severe
from (
select s.specimen_pathology_diagnosis_summary, s.disease_severity, count(distinct s.patient_id) as total
from ibd_db_sample.public.ovation_ibd_sample_specimen s
join ibd_db_sample.public.ovation_ibd_sample_sequencing_run r on s.specimen_id = r.specimen_id
join ibd_db_sample.public.ovation_ibd_sample_sequencing_output o on r.sequencing_run_id = o.sequencing_run_id
group by s.specimen_pathology_diagnosis_summary, s.disease_severity
) as sub pivot (sum(sub.total) for disease_severity in
('Unspecified', 'Mild', 'Moderate', 'Severe', 'Moderate-to-Severe', 'Minimal'))
order by specimen_pathology_diagnosis_summary;
--Colitis Patients Drugs
with patients as
(select s.patient_id, s.specimen_pathology_diagnosis_summary, s.disease_severity
from ibd_db_sample.public.ovation_ibd_sample_specimen s
join ibd_db_sample.public.ovation_ibd_sample_sequencing_run r on s.specimen_id = r.specimen_id
join ibd_db_sample.public.ovation_ibd_sample_sequencing_output o on r.sequencing_run_id = o.sequencing_run_id
where s.specimen_pathology_diagnosis_summary <> 'n/a'
)
select p.patient_id, r.generic_name, p.specimen_pathology_diagnosis_summary, p.disease_severity, count(*) as total_claims
from ibd_db_sample.public.ovation_ibd_sample_rx_products r
join ibd_db_sample.public.ovation_ibd_sample_rx_claims c on r.ndc_product = c.ndc_product
join patients p on c.patient_id = p.patient_id
where generic_name <> 'Not Specified'
group by p.patient_id, r.generic_name, p.specimen_pathology_diagnosis_summary, p.disease_severity
order by 5 desc;
Summary
Amazon Redshift Serverless with AWS Data Exchange makes it very quick and easy to increase the velocity of gaining insights. Third party data like the above example from Ovation, is quick and easy to access with Redshift Serverless. You can combine third party data with your own to enhance it. You can also monetize your data by selling it on the AWS Data Exchange.
Be sure to watch this YouTube video that demonstrates these steps and also an Amazon QuickSight dashboard. QuickSight enables BI for everyone in the organization and is also built on a serverless architecture that enables consumption based pricing and auto-scale.
Relevant content
- Accepted Answerasked 2 years agolg...
- asked 2 years agolg...
- asked 2 years agolg...
- AWS OFFICIALUpdated 2 years ago
- AWS OFFICIALUpdated a year ago
- AWS OFFICIALUpdated a year ago
- AWS OFFICIALUpdated 2 years ago