How to upload data that does not fit in memory to AWS S3 using HTTP requests

I need to upload large data files (around 200MB) from my STM32L485 board to AWS S3 storage. I was planning to use direct S3 upload via an HTTP request, as there is an example available. However, my data cannot be loaded into RAM (only 128KB), so I was thinking of sending it in chunks. But I have seen that the HTTP library used in the examples, coreHTTP from the AWS C SDK, does not support streaming. Because of this, I’m not sure that uploading my data via HTTP is the best way to go. Is there a way to do this using direct HTTP upload?

I am new to these kinds of challenges, so any advice on how to solve this problem will be much appreciated!

Thanks in advance!

Hi! You can send data in chunks to S3 using multipart uploads. Documentation for multipart uploads is available at: Uploading and copying objects using multipart upload - Amazon Simple Storage Service. You are able to use S3 APIs over either HTTP or MQTT.

Thanks for your answer @archigup. The problem is that the minimum part size is 5MB. However, I have come across this post: Building microcontroller-based IoT applications using HTTPS client in Amazon FreeRTOS | The Internet of Things on AWS – Official Blog. It mentions “You can transfer files in pieces and specify the size of each payload piece”. However, the examples I could find that use this mechanism are always for download, never for upload. I’m not sure how to construct my PUT requests so that they are accepted by AWS and treated as one single file.

Hmm, you might be able to use HTTP chunked requests. There is documentation here for using HTTP chunking with SigV4 authentication: Signature Calculations for the Authorization Header: Transferring Payload in Multiple Chunks (Chunked Upload) (AWS Signature Version 4) - Amazon Simple Storage Service.
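In case it helps to see the framing concretely, here is a rough Python sketch of how each chunk is laid out on the wire for that chunked-upload scheme. It only illustrates the format described on that page (the placeholder signature string is not a real value):

import binascii  # not required, just here if you want to hex-dump the result

# Sketch of the aws-chunked framing used together with
# x-amz-content-sha256: STREAMING-AWS4-HMAC-SHA256-PAYLOAD.
def frame_chunk(data: bytes, chunk_signature: str) -> bytes:
    # Each chunk: hex length of the data, the chunk's own signature
    # (a plain hex string, not quoted), CRLF, the data, CRLF.
    header = "{:x};chunk-signature={}\r\n".format(len(data), chunk_signature)
    return header.encode("ascii") + data + b"\r\n"

# Chunk signatures are chained: the first chunk signs against the seed
# signature from the Authorization header, each later chunk against the
# previous chunk's signature. The final chunk carries zero data bytes.
final_chunk = frame_chunk(b"", "signature-of-the-empty-final-chunk")  # placeholder signature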

We have examples for coreHTTP here, but unfortunately they only show uploading in a single piece.

Thanks @archigup. Would that process mean that I need to make one request for each chunk of data?
For example, if I have 1000 chunks to send, will I need to make 1000 HTTP requests?

That is correct.
If the overhead of HTTP requests is a concern, you could also consider MQTT. If the constraints of the existing solutions don’t work for you, you could upload the chunks to separate files and have a Lambda function combine them.
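If you go the separate-files route, the combining Lambda can be quite small. A rough Python sketch (the bucket name, prefix, and output key below are placeholders, not anything from your setup), assuming the device uploads chunks as objects with lexicographically ordered names such as part-00000, part-00001, ...:

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    bucket = "my-upload-bucket"      # placeholder bucket name
    prefix = "upload/part-"          # device uploads chunk objects under this prefix
    out_key = "upload/combined.bin"  # final assembled object

    # List all chunk objects and sort by key so the parts stay in order.
    keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    keys.sort()

    # Download and concatenate the chunks. Fine for ~200MB if the Lambda has
    # enough memory; for bigger data, stream to /tmp or use multipart upload.
    body = b"".join(
        s3.get_object(Bucket=bucket, Key=k)["Body"].read() for k in keys
    )
    s3.put_object(Bucket=bucket, Key=out_key, Body=body)
    return {"combined_key": out_key, "parts": len(keys)}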

Hello @archigup, can you correct me if I’m wrong? As far as I understand, you said that when we want to send 1000 chunks we need only 1000 requests. But what about creating the multipart upload? Is sending data in chunks using AWS Signature Version 4 something different from that? Why, in the example of sending data in chunks using AWS SigV4, is there no ?uploads after the key value? What value should x-amz-content-sha256 have when we create the first chunk to get the UploadId?

As far as I understand, you said that when we want to send 1000 chunks we need only 1000 requests.

For full accuracy, it’ll be slightly more than 1,000 requests, because a few additional requests are needed. The request flow is CreateMultipartUpload (1 request), UploadPart (1 request per chunk, so 1000 requests), and CompleteMultipartUpload (1 request).
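For illustration, the same three-call flow looks roughly like this with boto3’s low-level S3 client (bucket, key, file name, and part size below are placeholders; this is just a sketch of the API sequence, not something you would run on the device):

import boto3

s3 = boto3.client("s3")
bucket, key = "my-bucket", "bigdata.bin"   # placeholder names

def read_parts(path="bigdata.bin", part_size=5 * 1024 * 1024):
    # Every part except the last must be at least 5MB.
    with open(path, "rb") as f:
        while True:
            data = f.read(part_size)
            if not data:
                break
            yield data

resp = s3.create_multipart_upload(Bucket=bucket, Key=key)          # 1 request
upload_id = resp["UploadId"]

parts = []
for part_number, data in enumerate(read_parts(), start=1):         # 1 request per part
    r = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                       PartNumber=part_number, Body=data)
    parts.append({"ETag": r["ETag"], "PartNumber": part_number})

s3.complete_multipart_upload(Bucket=bucket, Key=key,                # 1 request
                             UploadId=upload_id,
                             MultipartUpload={"Parts": parts})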

Is sending data in chunks using AWS Signature Version 4 something different from that?

I’m not sure I understand this question. I suppose you could send data in chunks to individual files but that doesn’t make as much sense as using a multipart upload. You’d end up implementing a lot of the functionality of the multipart upload on your end which isn’t worth it - so just use the multipart upload API.

Why, in the example of sending data in chunks using AWS SigV4, is there no ?uploads after the key value?

Which example are you referencing?

What value should x-amz-content-sha256 have when we create the first chunk to get the UploadId?

The x-amz-content-sha256 value can be set according to this documentation.
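Roughly, the documented options look like this (a quick Python illustration; please double-check the linked documentation for your exact case):

import hashlib

# 1) Signed payload: the hex SHA-256 of the request body. CreateMultipartUpload
#    (a POST to <key>?uploads) has an empty body, so that is the hash of the
#    empty string:
empty_body_hash = hashlib.sha256(b"").hexdigest()
# -> "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

# 2) Chunked upload: the literal value "STREAMING-AWS4-HMAC-SHA256-PAYLOAD";
#    the per-chunk signatures then cover the actual data.
# 3) "UNSIGNED-PAYLOAD" if you choose not to sign the body at all.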

Hello @kstribrn. I spent some time understanding this. I tried to reproduce the example from there in this way: sending a PUT request whose body contains two chunks with 1024 bytes of data each (+88 bytes of chunk metadata per chunk) and a final chunk with 0 bytes of payload, only 86 bytes of metadata. As I discovered, the SigV4 library doesn’t include chunk support, so I changed the SigV4_GenerateHTTPAuthorization function to optionally calculate the chunked string to sign. I also adapted the coreHTTP download example to get temporary credentials. My request looks like:

PUT /objecttoput.txt HTTP/1.1
User-Agent: [some agent]
Host: "myS3".s3.us-east-1.amazonaws.com
Connection: keep-alive
x-amz-date: 20221007T072721Z
x-amz-security-token: [from getting temporary credential]
x-amz-content-sha256: STREAMING-AWS4-HMAC-SHA256-PAYLOAD
x-amz-storage-class: REDUCED_REDUNDANCY
x-amz-decoded-content-length: 2048 (two chunks of 1024 bytes of data)
Content-Length: 2310 (2048 bytes of data + 2 x 88 bytes of chunk metadata + 86 bytes for the final metadata chunk)
Authorization: AWS4-HMAC-SHA256 Credential=...

For chunks 1 and 2, the payload looks like this:

400;chunk-signature="chunk signature"\r\n + <1024 bytes> \r\n

and the last chunk is metadata only:

0;chunk-signature="chunk signature"\r\n + \r\n

Is everything fine? I’m getting the temporary credentials (HTTP/1.1 200 OK), but when I send the chunked request I get only HTTP/1.1 400 Bad Request without any response body. Thank you for your help.

Server-side logging may provide more information that may help in finding the cause of the 400 error - Logging options for Amazon S3 - Amazon Simple Storage Service
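If it is easier than clicking through the console, server access logging can also be enabled with boto3, roughly like this (the bucket names are placeholders, and the target bucket must already allow S3 log delivery):

import boto3

s3 = boto3.client("s3")
s3.put_bucket_logging(
    Bucket="my-upload-bucket",                # placeholder: bucket to log
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-log-bucket",  # placeholder: where the logs land
            "TargetPrefix": "s3-access/",
        }
    },
)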

Hello @aggarg, I logged the bucket using server access logging in its properties. I then tested uploading 1024 bytes in a single upload and got: REST.PUT.OBJECT test.txt "PUT /test.txt HTTP/1.1" 200 - - 1024 43 15 "-" "my-platform-name", but when I send in chunks and get HTTP/1.1 400 Bad Request, I don’t get any logs for it. I waited almost 4 hours after the request and no log appeared other than the one for that successful request.
The result is the same when I use CloudTrail as the logger. Only successful requests are logged for the bucket.

Thank you for doing that. So there is probably an issue with the generated HTTP request, which is possible as you are trying multi-part upload for the first time and are also changing/updating the signature calculation code. We can generate the same HTTP request using the Python boto3 client and compare it with the one generated by your code. Would you please do that and share your findings?
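One way to capture what boto3 generates is to turn on botocore’s debug logging, which prints the canonical request, string to sign, and the final request headers for each call. A sketch (the file and bucket names are placeholders):

import logging
import boto3

# Print botocore's debug output, including the SigV4 signing details.
boto3.set_stream_logger("botocore", level=logging.DEBUG)

s3 = boto3.client("s3")
s3.upload_file("bigdata.bin", "my-bucket", "bigdata.bin")  # placeholder names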

Hello @aggarg, I tried upload_file with boto3 and configured multipart_threshold to send data in chunks as described here. Then I saw in the logs that this generates 3 requests: CreateMultipartUpload, UploadPart, and CompleteMultipartUpload. I’m a little confused now, because those requests are different from what is described in Transferring Payload in Multiple Chunks. The example in Transferring Payload in Multiple Chunks uses PutObject as the API action; it doesn’t have any partNumber or uploadId parameters like the UploadPart API action does.

I see that multipart upload is different from what you are trying to do. The page that you linked provides an example calculation. Can you try that example and verify that your code calculates the same values?
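For reference, here is a small Python sketch of the chunk signature calculation as that page describes it; plug in the credential, date, region, scope, and seed-signature values from the documentation example (not reproduced here) and compare the output with what your code produces:

import hashlib
import hmac

def hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def signing_key(secret_key: str, date: str, region: str, service: str = "s3") -> bytes:
    # Standard SigV4 key derivation: AWS4<secret> -> date -> region -> service -> aws4_request.
    k = hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k = hmac_sha256(k, region)
    k = hmac_sha256(k, service)
    return hmac_sha256(k, "aws4_request")

def chunk_string_to_sign(timestamp: str, scope: str, previous_signature: str, chunk_data: bytes) -> str:
    # The first chunk chains off the seed signature from the Authorization
    # header; every later chunk chains off the previous chunk's signature.
    return "\n".join([
        "AWS4-HMAC-SHA256-PAYLOAD",
        timestamp,                               # e.g. 20221007T072721Z
        scope,                                   # e.g. 20221007/us-east-1/s3/aws4_request
        previous_signature,
        hashlib.sha256(b"").hexdigest(),         # hash of an empty string
        hashlib.sha256(chunk_data).hexdigest(),  # hash of this chunk's data
    ])

def chunk_signature(secret_key, date, region, timestamp, scope, previous_signature, chunk_data):
    key = signing_key(secret_key, date, region)
    sts = chunk_string_to_sign(timestamp, scope, previous_signature, chunk_data)
    return hmac.new(key, sts.encode("utf-8"), hashlib.sha256).hexdigest()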

Hello again @aggarg. Thank you for the suggestion. I tried hardcoding the example parameters and found that I had a small mistake: I had forgotten to add chunk support to the assignDefaultArguments function, so it was passing the default AWS4-HMAC-SHA256 as the algorithm. I changed it so that only the seed signature uses that, while the remaining strings to sign use AWS4-HMAC-SHA256-PAYLOAD. After that change I get the same chunk signatures as in the example code. With these changes I updated my chunked upload, and it is still getting HTTP/1.1 400 Bad Request. I don’t think it is related to those signatures, because then I would expect something like an access-forbidden error due to a bad Authorization header payload, but maybe I’m wrong.

Can you paste the complete generated HTTP request so that I can ask people more familiar with S3?

Hello @aggarg, I used the same request headers as shown there - or do you want all of the raw data?

Once you have serialized your HTTP request into the buffer, can you print the content of that buffer, perhaps in hex format?

Given the limitations of your STM32L485 board (only 128KB of RAM) and the fact that the coreHTTP library from the AWS C SDK does not support streaming, you can still upload large data files to AWS S3 over HTTP in chunks. One recommended approach is to use the AWS S3 multipart upload feature. This feature lets you split your large data file into smaller parts and upload them one at a time (or in parallel), so the device only needs to hold one part’s worth of data at a time, which helps work around the RAM limitation.

Here’s a high-level overview of the steps:

  • Divide your large data file into parts; every part except the last must be at least 5MB, and parts of roughly 5MB to 100MB each are typical.

  • Use an AWS SDK available for your platform (if any) or implement AWS Signature Version 4 signing manually (for example with the SigV4 library discussed above) to create valid S3 requests.

  • Initiate a multipart upload request to AWS S3, which will provide you with a unique upload ID.

  • Upload each chunk of your data using separate HTTP requests, specifying the part number and upload ID. You can do this in a loop.

  • After uploading all parts, complete the multipart upload, which will assemble them into the final object in your S3 bucket.

This approach allows you to work within the constraints of your board’s memory while still achieving the desired upload. Be sure to refer to the AWS S3 documentation for specific API details and examples related to multipart uploads for your preferred programming language and SDK.