How to move files from one S3 bucket directory to another directory in same bucket? Scala/Java
NickName:Dylan Sanderson Ask DateTime:2022-06-07T03:59:47

I want to move all files under a directory in my S3 bucket to another directory within the same bucket, using Scala.

Here is what I have:

def copyFromInputFilesToArchive(spark: SparkSession) : Unit = {
    val sourcePath = new Path("s3a://path-to-source-directory/")
    val destPath = new Path("s3a:/path-to-destination-directory/")
    val fs = sourcePath.getFileSystem(spark.sparkContext.hadoopConfiguration)
    fs.moveFromLocalFile(sourcePath,destPath)
  }

I get this error:

fs.copyFromLocalFile returns Wrong FS: s3a:// expected file:///

Copyright Notice:Content Author:「Dylan Sanderson」,Reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/72522888/how-to-move-files-from-one-s3-bucket-directory-to-another-directory-in-same-buck

Answers
Salvatore 2022-06-22T21:48:51

Error explained

The error occurs because copyFromLocalFile and moveFromLocalFile are meant for transferring files from the local filesystem to S3. They resolve the source path against the local filesystem, which is why the message says "expected file:///". You are trying to "move" files whose source and destination are both already in S3.

It is important to note that directories don't really exist in Amazon S3 buckets. All objects sit in the same big, single-level container, and the folder/file hierarchy you see is an illusion created by the "/" separators in each object's key (its full name).

S3 has no rename operation, so to "move" a file within a bucket you have to create a copy under the new key and then remove the object under the old key.

How to do a "move" within a bucket with Scala

To accomplish this, copy the original object to its new key and, once the copy has succeeded, delete the source object. Small objects can be copied with a single copyObject call; large objects can be copied in parts with a multipart copy.

Try something like this (from datahackr):

    import java.util.ArrayList
    import com.amazonaws.AmazonServiceException
    import com.amazonaws.services.s3.model.{CompleteMultipartUploadRequest, CopyPartRequest, CopyPartResult, InitiateMultipartUploadRequest, PartETag}

    // s3client, logger, getS3ObjectMetadata and ALLOWED_OBJECT_SIZE are not shown here;
    // they are assumed to be members of the enclosing class.

    /**
     * Copy an object from one key to another in multiple parts.
     *
     * @param sourceS3Path     source S3 object key
     * @param targetS3Path     target S3 object key
     * @param fromS3BucketName source bucket name
     * @param toS3BucketName   target bucket name
     */
    @throws(classOf[Exception])
    @throws(classOf[AmazonServiceException])
    def copyMultipart(sourceS3Path: String, targetS3Path: String, fromS3BucketName: String, toS3BucketName: String): Unit = {

      // Initiate the multipart upload.
      val initRequest = new InitiateMultipartUploadRequest(toS3BucketName, targetS3Path)
      val initResponse = s3client.initiateMultipartUpload(initRequest)

      // Get the object size to track the end of the copy operation.
      val metadataResult = getS3ObjectMetadata(sourceS3Path, fromS3BucketName)
      val objectSize = metadataResult.getContentLength()

      // Copy the object using 50 MB parts.
      val partSize = 50L * 1024 * 1024
      var bytePosition = 0L
      var partNum = 1
      val copyResponses = new ArrayList[CopyPartResult]()
      while (bytePosition < objectSize) {
        // The last part might be smaller than partSize, so check to make sure
        // that lastByte isn't beyond the end of the object.
        val lastByte = Math.min(bytePosition + partSize - 1, objectSize - 1)

        // Copy this part. Part numbers start at 1.
        val copyRequest = new CopyPartRequest()
          .withSourceBucketName(fromS3BucketName)
          .withSourceKey(sourceS3Path)
          .withDestinationBucketName(toS3BucketName)
          .withDestinationKey(targetS3Path)
          .withUploadId(initResponse.getUploadId())
          .withFirstByte(bytePosition)
          .withLastByte(lastByte)
          .withPartNumber(partNum)
        partNum += 1
        copyResponses.add(s3client.copyPart(copyRequest))
        bytePosition += partSize
      }

      // Complete the upload request, passing the ETag of every copied part, to
      // concatenate the parts and make the copied object available.
      val completeRequest = new CompleteMultipartUploadRequest(
        toS3BucketName,
        targetS3Path,
        initResponse.getUploadId(),
        getETags(copyResponses))
      s3client.completeMultipartUpload(completeRequest)
      logger.info("Multipart upload complete.")
    }

    // This is a helper function to construct a list of ETags.
    def getETags(responses: java.util.List[CopyPartResult]): ArrayList[PartETag] = {
      val etags = new ArrayList[PartETag]()
      val it = responses.iterator()
      while (it.hasNext()) {
        val response = it.next()
        etags.add(new PartETag(response.getPartNumber(), response.getETag()))
      }
      etags
    }

    def moveObject(sourceS3Path: String, targetS3Path: String, fromBucketName: String, toBucketName: String): Unit = {

      logger.info(s"Moving S3 file from $sourceS3Path ==> $targetS3Path")
      // Get the object size to decide between a single copy and a multipart copy.
      val metadataResult = getS3ObjectMetadata(sourceS3Path, fromBucketName)
      val objectSize = metadataResult.getContentLength()

      if (objectSize > ALLOWED_OBJECT_SIZE) {
        logger.info("Object size is greater than 1GB. Initiating multipart upload.")
        copyMultipart(sourceS3Path, targetS3Path, fromBucketName, toBucketName)
      } else {
        s3client.copyObject(fromBucketName, sourceS3Path, toBucketName, targetS3Path)
      }
      // Delete source object after successful copy
      s3client.deleteObject(fromBucketName, sourceS3Path)
    }
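
If you would rather stay with the Hadoop FileSystem API used in the question, a possible alternative is to call rename on each entry instead of moveFromLocalFile. The following is only a minimal sketch under some assumptions: the object name ArchiveInputFiles, the method name moveDirectoryContents and the bucket paths in the usage note are hypothetical, and the s3a connector is assumed to be configured already. It moves only the entries directly under the source path.

```scala
import java.io.IOException

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, not part of Hadoop or Spark.
object ArchiveInputFiles {

  /** Moves every entry directly under `source` into `dest` on the same filesystem. */
  def moveDirectoryContents(conf: Configuration, source: String, dest: String): Unit = {
    val sourcePath = new Path(source)
    val destPath = new Path(dest)

    // Resolve the filesystem from the path's own scheme (for example s3a://),
    // so the source is never interpreted as a local file.
    val fs: FileSystem = sourcePath.getFileSystem(conf)

    // Make sure the destination "directory" exists before renaming into it.
    fs.mkdirs(destPath)

    fs.listStatus(sourcePath).foreach { status =>
      val target = new Path(destPath, status.getPath.getName)
      // rename may signal failure by returning false instead of throwing.
      if (!fs.rename(status.getPath, target)) {
        throw new IOException(s"Could not move ${status.getPath} to $target")
      }
    }
  }
}
```

From a Spark job this could be called as ArchiveInputFiles.moveDirectoryContents(spark.sparkContext.hadoopConfiguration, "s3a://my-bucket/input/", "s3a://my-bucket/archive/"), where the bucket and prefixes are placeholders. Keep in mind that on S3A a rename is carried out as a copy followed by a delete, so it is not atomic and takes time proportional to the amount of data moved.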

