Amazon S3 Integration
About the Amazon S3 Integration
The Amazon S3 integration automatically uploads the results of Web Scraper crawling tasks to an S3 bucket you specify, which makes it easier to back up, share, or further process and analyze the data.
Integration Configuration:
Integration Function Name: Enter a custom name for this integration task so that you can identify and manage it later. We recommend naming it after its purpose or crawl target, such as "Upload Product Review Results to S3".
Event Type Setting: Choose one of the following two methods to trigger data sending, according to your needs:
Specify Task ID: Sends the results of specific, known scraping tasks to S3. Use it to handle results from several task IDs within one scraper. Separate multiple task IDs with commas; up to 10 task IDs are supported.
Follow Task: Automatically uploads all subsequent results from the scraper to S3. Once configured, it stays in effect until it is manually disabled or deleted. It is better suited to continuous or periodic scraping tasks that need automated data archiving.
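To illustrate the Specify Task ID rules (comma-separated values, at most 10 IDs), here is a minimal Python sketch that checks a task ID string before you paste it into the integration form. The function name `parse_task_ids` and the constant `MAX_TASK_IDS` are hypothetical helpers written for this example; they are not part of the product.

```python
MAX_TASK_IDS = 10  # limit stated for the "Specify Task ID" option


def parse_task_ids(raw: str) -> list[str]:
    """Split a comma-separated task ID string and enforce the 10-ID limit.

    Hypothetical helper for illustration only; not a product API.
    """
    # Drop surrounding whitespace and ignore empty entries such as "a,,b".
    ids = [part.strip() for part in raw.split(",") if part.strip()]
    if not ids:
        raise ValueError("at least one task ID is required")
    if len(ids) > MAX_TASK_IDS:
        raise ValueError(
            f"at most {MAX_TASK_IDS} task IDs are supported, got {len(ids)}"
        )
    return ids
```

A string such as `"101, 102,103"` passes the check, while a string containing 11 IDs raises `ValueError`.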
Amazon S3 Parameter Configuration: Configure the following information to complete the data upload setup:
awsAccessKey, AWS Access Key (Required)
The AWS access key ID used to authorize uploads. You can obtain it from the AWS Console -> IAM -> Users -> Create User/Select Existing User -> Security Credentials -> Access Keys. It functions like a username.
awsSecretKey, AWS Secret Key (Required)
The AWS secret access key used to authorize uploads. You can obtain it from the AWS Console -> IAM -> Users -> Create User/Select Existing User -> Security Credentials -> Access Keys -> Create Access Key. The secret access key is displayed only once, immediately after the access key is created, so save it at that point. It functions like a password.
Viewing Transferred Files: If your integration task shows a “Success” status, you can view the results in your Amazon S3 account, or access them directly via a link of this form:
https://s3.us-east-2.amazonaws.com/downloaddirectory/your-target-path/filename
For example, if your target path is path/to, the file name is 123, and the file format is json, then the access link will be:
https://s3.us-east-2.amazonaws.com/downloaddirectory/path/to/123.json
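The link pattern above can be sketched in Python as follows. This is only an illustration that joins the target path, file name, and file format in the way the example shows. The function name `build_access_link` is hypothetical, and the base URL is copied from the example link; your own region and bucket may differ.

```python
from urllib.parse import quote

# Base URL taken from the example link in this document (assumption:
# your integration uses the same region and directory).
BASE_URL = "https://s3.us-east-2.amazonaws.com/downloaddirectory"


def build_access_link(target_path: str, file_name: str, file_format: str) -> str:
    """Join target path, file name, and format into an access link.

    Hypothetical helper for illustration only; not a product API.
    """
    path = target_path.strip("/")   # tolerate leading/trailing slashes
    ext = file_format.lstrip(".")   # accept "json" or ".json"
    key = f"{path}/{file_name}.{ext}" if path else f"{file_name}.{ext}"
    # quote() percent-encodes unsafe characters but keeps "/" separators.
    return f"{BASE_URL}/{quote(key)}"


print(build_access_link("path/to", "123", "json"))
# prints https://s3.us-east-2.amazonaws.com/downloaddirectory/path/to/123.json
```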
If you need further assistance, please contact us via email at [email protected].