Running an HDFS application on S3 using the S3A connector
NickName:Subash Kunjupillai Ask DateTime:2021-10-07T00:48:30

I'm trying to understand what the S3A connector can do for my use case: I need to run my current HDFS-based application on S3 storage without much change to the application.

From a quick look at the S3A documentation (https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html), my understanding is that changing the filesystem connection URI to use s3a instead of hdfs, and configuring the S3 endpoint and credentials, should be enough to make my current application work on top of S3. Is my understanding right?


I'm getting the error below when running my application, and I'm not sure where my configuration is going wrong.

Connection Handler:

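// Requires: java.io.IOException, java.net.URI,
// org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.FileSystem
public static FileSystem getConnection() throws IOException {
    // Build the configuration and open the filesystem only on first use.
    if (hdfsConnInstance == null) {
        config = new Configuration();
        // String uri = "hdfs://" + cluster + "/";
        String uri = "s3a://tmpBkt/";
        config.set("fs.defaultFS", uri);
        // Credentials for the S3 bucket.
        config.set("fs.s3a.access.key", "EXAMPLE");
        config.set("fs.s3a.secret.key", "EXAMPLEKEY");
        config.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        // S3 endpoint; empty in this snippet.
        config.set("fs.s3a.endpoint", "");
        fs = FileSystem.get(URI.create(uri), config);
    }
    return fs;
}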

Error:

java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
        at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:217)
        at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2624)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2634)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at com.subash.hdfshandler.HdfsConnectionHandler.getConnection(HdfsConnectionHandler.java:29)
        at com.subash.datagenerator.DataHandler.run(DataHandler.java:27)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
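
The top frames show FileSystem.loadFileSystems calling getScheme on DistributedFileSystem, and the call ends up in the base FileSystem method that throws. One possible cause, and this is only an assumption drawn from the trace, is that the Hadoop jars on the classpath come from different releases (for example a hadoop-hdfs jar that does not match hadoop-common). A minimal sketch for checking which jar each class is actually loaded from is shown below. JarLocator and locate are hypothetical names, not part of Hadoop, and the code uses only the JDK.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Hadoop): reports where a class was loaded from.
class JarLocator {

    // Returns the jar or directory URL for the class, or a short marker string.
    static String locate(String className) {
        try {
            // Load without initializing, so static initializers do not run.
            Class<?> cls = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the JDK itself usually have no code source.
            if (src == null || src.getLocation() == null) {
                return "bootstrap (no code source)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace and the configuration above.
        String[] names = {
            "org.apache.hadoop.fs.FileSystem",
            "org.apache.hadoop.hdfs.DistributedFileSystem",
            "org.apache.hadoop.fs.s3a.S3AFileSystem"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run with the same classpath as the application, this prints one line per class. If the jar names carry different Hadoop version numbers, that would support the mismatch assumption; if they match, the cause is likely elsewhere.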

