Can I use Amazon S3 for Hadoop storage instead of HDFS?


I want to configure Amazon EMR to use Amazon Simple Storage Service (Amazon S3) as the Apache Hadoop storage system instead of the Hadoop Distributed File System (HDFS).


You can't configure Amazon EMR to use Amazon S3 instead of HDFS for the Hadoop storage layer. Amazon EMR supports both HDFS and the EMR File System (EMRFS), which stores data in Amazon S3, but the two aren't interchangeable. HDFS is an implementation of the Hadoop FileSystem API, which models POSIX file system behavior. EMRFS is backed by Amazon S3, which is an object store, not a file system, so operations such as directory rename don't behave the way they do on HDFS. For more information, see Hadoop documentation for Object Stores vs. Filesystems.
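To illustrate why an object store isn't a drop-in replacement for a file system, the following Python sketch models a flat key-to-object map. It is a toy model only, not how Amazon S3 or EMRFS is implemented; the class `ToyObjectStore` and its methods are hypothetical names chosen for this example. The sketch assumes, as the Hadoop documentation describes for object stores in general, that "directories" are just key prefixes and that renaming one means copying and deleting every object under it.

```python
class ToyObjectStore:
    """Hypothetical flat key -> bytes map; 'directories' are only key prefixes."""

    def __init__(self):
        self.objects = {}
        self.requests = 0  # number of simulated per-object requests

    def put(self, key, data):
        self.objects[key] = data
        self.requests += 1

    def rename_prefix(self, src, dst):
        # There is no single rename operation in this model: each object
        # is copied to its new key and the old key is then deleted.
        matching = sorted(k for k in self.objects if k.startswith(src))
        for key in matching:
            new_key = dst + key[len(src):]
            self.objects[new_key] = self.objects[key]  # copy
            del self.objects[key]                      # delete
            self.requests += 2


store = ToyObjectStore()
for i in range(3):
    store.put(f"tmp/part-{i}", b"data")

store.requests = 0  # count only the requests made by the rename
store.rename_prefix("tmp/", "final/")

print(sorted(store.objects))  # ['final/part-0', 'final/part-1', 'final/part-2']
print(store.requests)         # 6
```

In this model, the cost of the rename grows with the number of objects, and a reader can observe a half-moved "directory" while the loop is running. On a file system such as HDFS, a directory rename is a single metadata operation.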

For an overview of the storage layers in Amazon EMR, see Overview of Amazon EMR architecture.

For recommendations about when to use each file system, see Work with storage and file systems.

Related information

EMR File System (EMRFS)

HDFS configuration

AWS OFFICIAL | Updated 2 years ago