AWS

RDS

re:Invent

2024 - [re:Invent 2024 on-site report] Two new S3 storage features target AI needs: queries on very large Iceberg tables up to 3x faster, plus automatic Metadata generation | iThome - [re:Invent 2024 on-site report] AWS CEO reveals updates to two database services, offering high availability, low latency, and no infrastructure to manage | iThome

S3

Server-side encryption

Metadata

CLI

download object by prefix

```shell=
aws s3api list-objects-v2 --bucket {bucket name} --prefix {prefix} > download.json
jq '.Contents[].Key' download.json | awk -F '"' '{print $2}' > s3_object_keys
```

```bash=
#!/bin/bash

FILENAME="s3_object_keys"
BUCKET_NAME="bucket name"
PREFIX="prefix"

# List all object keys under the prefix and save them to a file.
aws s3api list-objects-v2 --bucket "${BUCKET_NAME}" --prefix "${PREFIX}" > download.json
jq '.Contents[].Key' download.json | awk -F '"' '{print $2}' > "${FILENAME}"

LINES=$(cat "${FILENAME}")

for s3_object_key in ${LINES}
do
    echo "${s3_object_key}"
    # Strip the first path segment (the prefix) to build the local file name.
    local_file_name=$(echo "${s3_object_key}" | awk -F '/' '{print $2}')
    echo "${local_file_name}"
    aws s3api get-object --bucket "${BUCKET_NAME}" --key "${s3_object_key}" "${local_file_name}"
done
```
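
For comparison, a minimal boto3 sketch of the same prefix download (bucket and prefix below are placeholders, not values from these notes); it paginates the listing and avoids the intermediate JSON file:

```python
import os
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"    # placeholder
prefix = "my-prefix/"   # placeholder

# Paginate so more than 1000 keys are handled, unlike a single list call.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        local_name = os.path.basename(key)
        if local_name:  # keys ending in "/" have no file component
            s3.download_file(bucket, key, local_name)
```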

IAM

aws:MultiFactorAuthPresent

The aws:MultiFactorAuthPresent key is present when the principal uses temporary credentials to make the request. Temporary credentials are used to authenticate:
- IAM roles
- federated users
- IAM users with temporary tokens from sts:GetSessionToken
- users of the AWS Management Console

The aws:MultiFactorAuthPresent key is NOT present when an API or CLI command is called with long-term credentials, such as user access key pairs.

...IfExists condition operators: you use these to say "If the policy key is present in the context of the request, process the key as specified in the policy. If the key is not present, evaluate the condition element as true."
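
As an illustration of BoolIfExists, a common deny-unless-MFA statement can be expressed as follows (a generic example, not from these notes):

```python
import json

# Generic example policy: deny everything unless the request was
# authenticated with MFA. BoolIfExists treats a missing
# aws:MultiFactorAuthPresent key as matching, so long-term-credential
# requests (where the key is absent) are also denied.
deny_without_mfa = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Action": "*",
        "Resource": "*",
        "Condition": {
            "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}
        }
    }]
}

print(json.dumps(deny_without_mfa, indent=2))
```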

IAM:PassRole

EC2

Spot instance

You set the maximum price you are willing to pay as part of the launch configuration or launch template. If the Spot price is within your maximum price, whether your request is fulfilled depends on Spot Instance capacity. You pay only the Spot price for the Spot Instances that you launch.
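
For reference, a minimal boto3 sketch of launching a Spot Instance with a maximum price; the AMI ID, instance type, and price below are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI
    InstanceType="t3.micro",          # placeholder type
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            "MaxPrice": "0.01",             # max USD/hour you are willing to pay
            "SpotInstanceType": "one-time", # do not re-request after interruption
        },
    },
)
print(response["Instances"][0]["InstanceId"])
```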

vCPU

EBS

SSH

DNS

AMI

ELB

Q. Can the IPs behind an Application Load Balancer (ALB) DNS name change? Yes. ELB updates the load balancer's DNS records, so when ELB adds resources, each added resource gets a corresponding IP registered in DNS.

If one of the EC2 instances in the pool behind the ELB malfunctions or becomes unhealthy, it is replaced with a healthy one, and at that point the IP changes.

The Application Load Balancer itself also has an auto scaling mechanism. Suppose an ALB is backed by 2 EC2 nodes and therefore has 2 IPs, A and B. When traffic increases, the newly grown EC2 nodes each get a corresponding IP, say C and D. When traffic drops and it scales in, there is no fixed pattern for which EC2 nodes are removed first; it may scale in the A and B nodes first, leaving the C and D nodes, at which point the IPs change to C and D.

Q: Does a wildcard match a slash? Using the example in the documentation, I set two rules on an ALB: /img/* => forward to Target Group A, /img/*/pics => forward to Target Group B. Does rule 1 subsume rule 2, so that a request starting with /img never reaches Target Group B?

Yes, rule 2 is indeed subsumed by rule 1. Whether a request is sent to Target Group B depends on your rule priorities. The load balancer evaluates rules from the lowest priority value to the highest, and the default rule is evaluated last. If your /img/*/pics rule has a lower priority value than /img/*, requests can still be sent to Target Group B.

Q: When an ALB rule is changed, is it applied to subsequent requests immediately, or does it take some time before it is applied to the ALB?

Rule changes do not take effect immediately. As you said, a new rule takes some time before it is applied to the ALB.

Integration options - Using AWS Lambda with an Application Load Balancer - AWS APPLICATION LOAD BALANCER (ALB) AND ECS WITH FLASK APP

Troubleshoot - The load balancer generates an HTTP error - Access logs for your Application Load Balancer

ECS

Lambda

Overview

  • What is AWS Lambda
  • Lambda concepts
  • Managing Lambda reserved concurrency
  • :star: Understanding AWS Lambda scaling and throughput | AWS Compute Blog
  • Lambda function scaling
  • Working with Lambda function metrics
  • Runtime deprecation policy

Security

  • Lambda operator guide
  • Understanding the Lambda execution environment
  • Encrypting data in Lambda-based applications
  • Security in AWS Lambda

Using AWS Lambda with Amazon API Gateway

  • Handling errors with an API Gateway API
    • If the Lambda API rejects the invocation request, API Gateway returns a 500 error code.
    • If the function runs but returns an error, or returns a response in the wrong format, API Gateway returns a 502.
    • In both cases, the body of the response from API Gateway is {"message": "Internal server error"}.
  • Handle Lambda errors in API Gateway
  • Amazon API Gateway
  • Resolve HTTP 502 errors from API Gateway REST APIs with Lambda functions
  • SNS to Lambda or SNS to SQS to Lambda, what are the trade-offs? | theburningmonk.com

https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html
https://aws.amazon.com/amazon-linux-ami/2018-03-packages/
https://www.openssl.org/news/openssl-1.0.2-notes.html
Exploring the AWS Lambda Execution Environment

```python=
import json
import subprocess

def run(cmd):
    # Run a shell command to completion and return its combined stdout/stderr.
    # Waiting on each command avoids racing the next one against it.
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return p.communicate()[0]

def lambda_handler(event, context):
    # Check which OpenSSL version ships in the execution environment.
    print(run('openssl version'))
    # /tmp is the only writable path in the Lambda execution environment.
    run('touch /tmp/key.txt && echo 456 >> /tmp/key.txt')
    run('touch /tmp/test.txt && echo 123 >> /tmp/test.txt')
    # Encrypt the test file with OpenSSL, then inspect the result.
    run('openssl aes-256-cbc -k 456 -salt -in /tmp/test.txt -out /tmp/test.enc')
    for line in run('file /tmp/test.enc').splitlines():
        print(line)
    # Decrypt; the decrypted file should have the same md5sum as the original.
    run('openssl aes-256-cbc -d -k 456 -in /tmp/test.enc -out /tmp/test.dec')
    for line in run('ls -la /tmp && md5sum /tmp/*').splitlines():
        print(line)
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
```
Function logs:
START RequestId: cd75f4ff-b209-44ac-bfc5-fcf3deee287e Version: $LATEST
b'OpenSSL 1.0.2k-fips  26 Jan 2017\n'
b"/tmp/test.enc: openssl enc'd data with salted password"
b'total 24'
b'drwx------  2 sbx_user1051  991 4096 Sep 21 10:00 .'
b'dr-xr-xr-x 21 root         root 4096 Aug 31 10:28 ..'
b'-rw-rw-r--  1 sbx_user1051  991    4 Sep 21 10:00 key.txt'
b'-rw-rw-r--  1 sbx_user1051  991    4 Sep 21 10:00 test.dec'
b'-rw-rw-r--  1 sbx_user1051  991   32 Sep 21 10:00 test.enc'
b'-rw-rw-r--  1 sbx_user1051  991    4 Sep 21 10:00 test.txt'
b'd2d362cdc6579390f1c0617d74a7913d  /tmp/key.txt'
b'ba1f2511fc30423bdbb183fe33f3dd0f  /tmp/test.dec'
b'189671449c7e95e7bf09942b654df82f  /tmp/test.enc'
b'ba1f2511fc30423bdbb183fe33f3dd0f  /tmp/test.txt'
END RequestId: cd75f4ff-b209-44ac-bfc5-fcf3deee287e
REPORT RequestId: cd75f4ff-b209-44ac-bfc5-fcf3deee287e Duration: 588.25 ms Billed Duration: 600 ms Memory Size: 128 MB Max Memory Used: 48 MB Init Duration: 1.40 ms

Hit the 6MB Lambda payload limit? Here’s what you can do

AWS KMS, Boto3 and Python: Complete Guide with examples

Layer

Parameters

Extensions

Local test

Support

:::info I understand that you would like to know why the CloudWatch "ConcurrentExecutions" metric shows only 826 across all Lambda functions in the us-east-1 region while you are still seeing Throttle errors.

To investigate this further, I discussed it with an internal Lambda expert; kindly refer to the following explanation:

The Lambda service uses a counter-like mechanism to count the number of current execution environments, while the CloudWatch ConcurrentExecutions metric is recorded by sampling, so the sampling interval can leave gaps. For example, even though the current sampled value of ConcurrentExecutions is 826, with many invocations in flight, ConcurrentExecutions may suddenly exceed the upper limit of 1000 a moment later.

Later, when some Lambda function executions complete, their execution environments are released, so by the time of the next sample ConcurrentExecutions has returned to normal. This is why we advise customers to watch for the ConcurrentExecutions metric getting very close to the upper limit; at that point, consider raising the "ConcurrentExecutions" limit, which should help reduce Throttle errors. :::

VPC

API Gateway

Lambda integration

Custom HTTP Status Code

The routing of Lambda function errors to HTTP responses in API Gateway is achieved by pattern matching against this “errorMessage” field in the Lambda response. The Lambda function must exit with an error in order for the response pattern to be evaluated – it is not possible to “fake” an error response by simply returning an “errorMessage” field in a successful Lambda response.
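
A minimal sketch of what this looks like in a Python handler; the "BadRequest:" message prefix, and the matching selection pattern (e.g. BadRequest:.* mapped to HTTP 400 in a non-proxy integration response), are illustrative choices, not values from these notes:

```python
# Raising an exception makes Lambda populate the "errorMessage" field of its
# error response, which API Gateway can pattern-match (e.g. "BadRequest:.*")
# to select a custom HTTP status code. Returning a dict that merely contains
# an "errorMessage" key from a successful invocation would NOT be matched.
def lambda_handler(event, context):
    name = event.get("name")
    if not name:
        raise Exception("BadRequest: missing required field 'name'")
    return {"greeting": f"Hello, {name}!"}
```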

Metrics

Usage plan

ACM

Supported Regions

Certificates in ACM are regional resources. To use a certificate with Elastic Load Balancing for the same fully qualified domain name (FQDN) or set of FQDNs in more than one AWS region, you must request or import a certificate for each region. For certificates provided by ACM, this means you must revalidate each domain name in the certificate for each region. You cannot copy a certificate between regions.

To use an ACM certificate with Amazon CloudFront, you must request or import the certificate in the US East (N. Virginia) region. ACM certificates in this region that are associated with a CloudFront distribution are distributed to all the geographic locations configured for that distribution.
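
A minimal boto3 sketch of requesting a certificate for use with CloudFront; the domain names are placeholders, and the only fixed part is the us-east-1 region:

```python
import boto3

# CloudFront only picks up ACM certificates from us-east-1.
acm = boto3.client("acm", region_name="us-east-1")

response = acm.request_certificate(
    DomainName="example.com",                    # placeholder
    SubjectAlternativeNames=["www.example.com"], # placeholder
    ValidationMethod="DNS",
)
print(response["CertificateArn"])
```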

System Manager

SSM can also manage on-premises machines and VMs; the SSM agent must be installed on each machine or VM that SSM should manage. Session Manager: the benefit is that inbound ports can be closed; it also works for Windows RDP. Distributor: installs software packages.

Additional information from an AWS Senior SA:

  1. Automation with rollback – see the document below; more complex flows still need Lambda to complete:

AWS-PatchInstanceWithRollback: https://docs.aws.amazon.com/systems-manager-automation-runbooks/latest/userguide/automation-aws-patchinstancewithrollback.html

  2. AWS Health Events automation (EC2 Retired)

First, configure AWS Health Events as a CloudWatch Events (EventBridge) source: https://docs.aws.amazon.com/health/latest/ug/cloudwatch-events-health.html
How can I receive notifications for scheduled events for my EC2 instance using CloudWatch Events? https://aws.amazon.com/tw/premiumsupport/knowledge-center/cloudwatch-notification-scheduled-events/

CloudWatch Events can trigger SSM Automation directly: https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/SSM_Automation_as_Target.html

To centrally manage ops events through SSM OpsCenter: https://docs.aws.amazon.com/systems-manager/latest/userguide/OpsCenter-automatically-create-OpsItems-2.html

Route 53

Unlike a CNAME record, you can create an alias record at the top node of a DNS namespace, also known as the zone apex. For example, if you register the DNS name example.com, the zone apex is example.com. You can't create a CNAME record for example.com, but you can create an alias record for example.com that routes traffic to www.example.com.
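
A minimal boto3 sketch of that apex alias record; the hosted zone ID is a placeholder:

```python
import boto3

route53 = boto3.client("route53")

ZONE_ID = "Z0000000000000000000"  # placeholder hosted zone ID

# Alias A record at the zone apex (example.com) routing to www.example.com.
# For an alias to a record in the same hosted zone, AliasTarget.HostedZoneId
# is the zone's own ID.
route53.change_resource_record_sets(
    HostedZoneId=ZONE_ID,
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "example.com",
                "Type": "A",
                "AliasTarget": {
                    "HostedZoneId": ZONE_ID,
                    "DNSName": "www.example.com",
                    "EvaluateTargetHealth": False,
                },
            },
        }]
    },
)
```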

tag

Overview

  • Focus on Required and Conditionally Required Tags
  • Consider naming your tags using all lowercase, with hyphens separating words, and a prefix identifying the organization name or abbreviated name
  • In 2016, the number of tags per resource was increased to 50 (with a few exceptions, such as S3 objects)
  • it’s generally recommended to follow good data management practice by including only one data attribute per tag
  • Remediate Untagged Resources
  • Tag Editor is a feature of the AWS Management Console that allows you to search for resources using a variety of search criteria and add, modify, or delete tags in bulk.
  • The AWS Resource Tagging API allows you to perform these same functions programmatically.

Practice

  • purpose
  • key and value
  • product
  • component
  • application
  • owner
  • department
  • environment
  • version
  • required, conditionally required

examples: anycompany:cost-center anycompany:environment-type anycompany:application-id

SNS

SQS

Duplicated messages

SQS messages may be duplicated in some situations - Resolve Duplicate Messages in Amazon SQS for the Same Amazon S3 Event - At-least-once delivery

Warm Greetings from AWS Premium Support. Thank you for contacting AWS Premium Support. 
This is Jennifer and I will be assisting you with your case today.

From the case note, I understand that you would like to confirm that the message ID of standard SQS Queue will be the same for the following two scenarios. Kindly refer to the following information:

### case1 ### Producer application sends a message, but the consumer application receives two duplicate messages

As you may already know, there are some inherent characteristics of a Standard SQS Queue that allow for duplicative messaging. As per the note on this document [1]:

For Standard SQS Queues, the `Visibility Timeout` is not a guarantee against receiving a message more than once.

➜ As per this document [2], a Standard SQS Queue ensures "at-least-once delivery", which implies that it is possible for the same message to be delivered more than once.
➜ When messages are added to a Standard SQS Queue, a unique Message ID is allocated to each message. Amazon SQS returns the Message ID in the response of the "SendMessage" [3] API call.
➜ Amazon SQS stores copies of the messages on multiple servers for redundancy and high availability.
➜ On rare occasions, one of the servers that stores a copy of a message might be unavailable when you receive or delete a message.
➜ This can result in a duplicate message being received when the server becomes available again.

I would like to highlight that, duplicate messages (introduced by Amazon SQS as a result of the above mentioned point) will contain the SAME Message ID.


### case2 ### Putting a single file into an S3 bucket triggers duplicate SQS messages for the PutObject action

To investigate this issue, I have set up the configuration and performed the testing in my environment. Based on my test, when I upload the same object 3 times in a row, normally the `sequencer` key and `message ID` of the 3 responses are totally different. However, it is difficult to reproduce the phenomenon of sequencer key duplication, as it occurs in rare cases [4]. After delving into this issue, I am able to confirm from internal sources: "For S3 event notifications, it is expected to see duplicates. However, they'd show up as DIFFERENT SQS message ids if they were generated by the events system".

Furthermore, for case1, here is an example of the application logic that would need to be implemented in order to facilitate idempotency (see the sketch after the references below):

1. Extract the value of a unique attribute of the input event (such as, the Message ID).
2. Check if the attribute value already exists in a control database. Depending on the outcome, do the following:
➜ If a unique value exists, end the action without producing an error.
➜ If a unique value does not exist, proceed with the actions that you designed.
3. Thereafter, include a record of the attribute value in the control database.

I hope the above information helps.
Have a nice day :)

■ References:
============
[1] Amazon SQS Visibility Timeout: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-visibility-timeout.html 
[2] At-least-once Delivery: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/standard-queues.html#standard-queues-at-least-once-delivery 
[3] SendMessage: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_SendMessage.html 
[4] https://aws.amazon.com/tw/premiumsupport/knowledge-center/s3-duplicate-sqs-messages/ 
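
A minimal Python sketch of the idempotency logic from case1 above, assuming a hypothetical DynamoDB table named processed-messages as the control database (the queue URL is a placeholder); it records the Message ID first via a conditional write, a common variation of steps 1-3:

```python
import boto3
from botocore.exceptions import ClientError

sqs = boto3.client("sqs")
table = boto3.resource("dynamodb").Table("processed-messages")  # hypothetical control table

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

def process(message):
    ...  # business logic goes here

resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10)
for msg in resp.get("Messages", []):
    try:
        # Conditional put fails if this MessageId was already recorded,
        # so duplicates (which share the same Message ID) are skipped.
        table.put_item(
            Item={"message_id": msg["MessageId"]},
            ConditionExpression="attribute_not_exists(message_id)",
        )
    except ClientError as e:
        if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
            raise
        # Duplicate: end without producing an error.
    else:
        process(msg)
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```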

CloudWatch

Metrics

Metric data is kept for 15 months

Publishing Single Data Points

Each metric is one of the following:

  • Standard resolution, with data having a one-minute granularity
    • Metrics produced by AWS services are standard resolution by default
  • High resolution, with data at a granularity of one second
    • :money_with_wings: Keep in mind that every PutMetricData call for a custom metric is charged, so calling PutMetricData more often on a high-resolution metric can lead to higher charges.
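
A minimal boto3 sketch of publishing a high-resolution custom metric; namespace, metric name, and dimension are placeholders:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_data(
    Namespace="MyApp",  # placeholder namespace
    MetricData=[{
        "MetricName": "RequestLatency",
        "Dimensions": [{"Name": "Service", "Value": "file-search-api"}],
        "Value": 42.0,
        "Unit": "Milliseconds",
        "StorageResolution": 1,  # 1 = high resolution (per second); 60 = standard
    }],
)
```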

Namespace: like a category. Dimension: like a label used for grouping. SampleCount: the number of records.

Others

Alarms

Using Amazon CloudWatch Alarms Why did my CloudWatch alarm trigger when its metric doesn't have any breaching data points

  • Type
    • metric alarm
    • composite alarm
  • Alarm States
    • OK
    • ALARM
    • INSUFFICIENT_DATA

An alarm invokes actions only when the alarm changes state

The exception is for alarms with Auto Scaling actions. For Auto Scaling actions, the alarm continues to invoke the action once per minute that the alarm remains in the new state.

==Three settings== to enable CloudWatch to evaluate when to change the alarm state:

  • Period: the checking interval; each period emits a data point
    • the length of time to evaluate the metric or expression to create each individual data point for an alarm
    • If you choose one minute as the period, the alarm evaluates the metric once per minute.
    • each specific data point reported to CloudWatch falls under one of three categories
      • Not breaching (within the threshold)
      • Breaching (violating the threshold)
      • Missing - missing data points
  • Evaluation Periods: how many of the most recent data points are candidates, similar to a check window
    • the number of the most recent periods, or data points, to evaluate when determining alarm state
  • Datapoints to Alarm: how many of those data points must breach
    • the number of data points within the Evaluation Periods that must be breaching to cause the alarm to go to the ALARM state

When you configure Evaluation Periods and Datapoints to Alarm as different values, you're setting an "M out of N" alarm. Datapoints to Alarm is ("M") and Evaluation Periods is ("N"). The evaluation interval is the number of data points multiplied by the period. For example, if you configure 4 out of 5 data points with a period of 1 minute, the evaluation interval is 5 minutes. If you configure 3 out of 3 data points with a period of 10 minutes, the evaluation interval is 30 minutes.
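
A minimal boto3 sketch of the "4 out of 5 data points with a period of 1 minute" example; the alarm name, metric, and threshold are placeholders:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# ALARM when 4 of the last 5 one-minute data points breach the threshold,
# i.e. a 5-minute evaluation interval.
cloudwatch.put_metric_alarm(
    AlarmName="high-cpu",  # placeholder
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    Statistic="Average",
    Period=60,            # seconds per data point
    EvaluationPeriods=5,  # "N"
    DatapointsToAlarm=4,  # "M"
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="missing",
)
```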

Logs

MetricFilter - Filter and pattern syntax
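
A minimal boto3 sketch of creating a metric filter; the log group, pattern, and metric names are placeholders:

```python
import boto3

logs = boto3.client("logs")

# Count log events whose message contains the term "ERROR".
logs.put_metric_filter(
    logGroupName="/aws/lambda/file_search_api",  # placeholder log group
    filterName="error-count",
    filterPattern="ERROR",
    metricTransformations=[{
        "metricName": "ErrorCount",
        "metricNamespace": "MyApp",
        "metricValue": "1",
    }],
)
```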

Insight

file_search_api log

file info done

fields @timestamp, @message
| filter @message like /File Info Status: 5/

fields @timestamp, @message
| filter @message like /File Info Status: 5/ or @message like /File Info Status: 6/ or @message like /File Info Status: 7/ or @message like /File Info Status: 8/ or @message like /File Info Status: 9/

Hit prefilter cache

fields @timestamp, @message
| filter @message like 'File with valid score'
| filter @message like '-1'

get upload url

fields @timestamp, @message
| filter @message like /Response json string of new file case/

Undefined or unsupported file type

sum

fields @timestamp, @message, @logStream
| filter @message like 'Undefined or unsupported file type'
| filter @message like 'file_type'
| parse @message '"hash": "*"' as hash
| parse @message '"file_type": *,' as file_type
| parse @logStream '*/*/*/[$LATEST]' as year, month, day
| stats count_distinct(hash) as sum by day

group results

fields @timestamp, @message
| filter @message like 'Undefined or unsupported file type'
| filter @message like 'file_type'
| parse @message '"hash": "*"' as hash
| parse @message '"file_type": *,' as file_type
| stats count_distinct(hash) as sum by hash, file_type

scan task forwarder log

send vendor

fields @timestamp, @message, @logStream
| filter @message like 'handle sandbox reply task id' or @message like 'handle sandbox reply report'
| parse @logStream '*/*/*/[$LATEST]' as year, month, day
| stats count(*) as sum by day

report forwarder log

score distribution

fields @timestamp, @message
| filter @message like 'virus_score'
| parse @message 'virus_score * is' as score
| stats count(score) as sum by bin(1d), score

quarantine

file type distribution

fields @timestamp, @message
| filter @message like 'filter queue msg'
| parse @message '"file_type": *}' as file_type
| stats count(file_type) as sum by bin(1d), file_type

cloud query

hit cache and score

fields @timestamp, @message
| filter @message like 'has been cached'
| parse @message /(?<md5>[0-9a-z]{32}) has been cached and score is (?<score>-?[0-9]+)/
| filter score > 0
| stats count(md5) as sum by md5, score

EventBridge

CloudFront

CloudTrail

Elasticsearch

DynamoDB

Primary key

The primary key uniquely identifies each item in the table, so that no two items can have the same key.

Each primary key attribute must be a scalar (meaning that it can hold only a single value). The only data types allowed for primary key attributes are string, number, or binary
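
A minimal boto3 sketch of a table with a composite primary key; the table and attribute names are placeholders:

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="Orders",  # placeholder
    AttributeDefinitions=[
        {"AttributeName": "customer_id", "AttributeType": "S"},  # string
        {"AttributeName": "order_ts", "AttributeType": "N"},     # number
    ],
    KeySchema=[
        {"AttributeName": "customer_id", "KeyType": "HASH"},  # partition key
        {"AttributeName": "order_ts", "KeyType": "RANGE"},    # sort key
    ],
    BillingMode="PAY_PER_REQUEST",
)
```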

Local secondary indexes (max is 5)

At table creation

docker for local dev

https://github.com/instructure/dynamo-local-admin-docker

UpdateExpression

RCU & WCU

EFS

I understand that you want to find out about the 12 additional connections shown in CloudWatch. I would like to share some information with you after speaking with the EFS service team:
1. The connections in CloudWatch can get overcounted when connections get closed and re-established within the same period.
2. The issue might be caused by a specific behavior of the Linux NFS client with regard to TCP reconnection events. When a reconnection event occurs, the Linux NFS client reuses the TCP source port. This behavior is not conformant with the TCP RFC, and can cause a network issue where NFS responses from EFS to an EC2 instance are blocked for multiple minutes.

To resolve this issue, we recommend to add the "noresvport" mount option when mounting an EFS file system. This option has the effect that a new port is allocated when a reconnection event occurs.

  • noresvport – Tells the NFS client to use a new Transmission Control Protocol (TCP) source port when a network connection is reestablished. Doing this helps make sure that the EFS file system has uninterrupted availability after a network recovery event.

DMS

Athena

:::success For S3, a YYYY/MM/dd/1.log key layout is recommended, because Glue can then create partitions directly when parsing. :::

Basic

``` sql=
SHOW PARTITIONS dc_log
```

``` sql=
ALTER TABLE dc_log DROP PARTITION (year='2020', month='04');
```

Get DUTs to DC Total Request Counts Per Day/Month

Per day per DUT

``` sql=
SELECT date_format(CAST(dc.time_stamp as timestamp), '%Y%m%d') as day, dc.device_info.sn, count(*) as requests
FROM dc_log as dc
WHERE dc.category = 'file-search-api' AND
dc.extra_info.system_tag['device-request-counter'] is not null
GROUP BY date_format(CAST(dc.time_stamp as timestamp), '%Y%m%d'), dc.device_info.sn
order by day asc
```

``` sql=
SELECT DAY(CAST(dc.time_stamp as timestamp)) as day, dc.device_info.sn, count(*) as requests 
FROM dc_log as dc 
WHERE dc.category = 'cloud-anti-malware-query' AND
dc.extra_info.system_tag['device-request-counter'] is not null AND
YEAR(CAST(dc.time_stamp as timestamp)) = 2019 AND
MONTH(CAST(dc.time_stamp as timestamp)) = 6
GROUP BY DAY(CAST(dc.time_stamp as timestamp)), dc.device_info.sn
order by day asc
```

Glue

Polly

Integration scenarios