AWS Certification

A carefully crafted and highly important piece. Note: this will only be useful after completing the entire course on Udemy.

Udemy Course for AWS ML Specialty

Cheat Sheet

Reduce the cost of Automatic Hyperparameter Tuning on SageMaker

use log scales on parameter ranges
run fewer training jobs concurrently, because the tuner learns from the results of earlier runs
keep the hyperparameter ranges as small as possible
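
A minimal sketch of why log scales help, assuming a hypothetical learning-rate range of 1e-5 to 1e-1 (the range and function names are illustrative, not from the course). Sampling uniformly on a linear scale almost never tries the small values, while sampling on a log scale spreads trials evenly across the decades:

```python
import math
import random

def sample_linear(low, high, rng):
    """Draw one value uniformly on a linear scale."""
    return rng.uniform(low, high)

def sample_log(low, high, rng):
    """Draw one value uniformly on a log scale (low and high must be > 0)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)
low, high = 1e-5, 1e-1  # hypothetical learning-rate range

linear = [sample_linear(low, high, rng) for _ in range(1000)]
logged = [sample_log(low, high, rng) for _ in range(1000)]

# Fraction of draws that land in the two lowest decades (below 1e-3).
frac_linear = sum(v < 1e-3 for v in linear) / len(linear)
frac_log = sum(v < 1e-3 for v in logged) / len(logged)
print(f"linear scale: {frac_linear:.2%} of samples below 1e-3")
print(f"log scale:    {frac_log:.2%} of samples below 1e-3")
```

In SageMaker itself the equivalent choice is the scaling type of a parameter range (Auto, Linear, Logarithmic, ReverseLogarithmic).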

Recall is an important metric when classes are highly imbalanced and the positive case is rare. Accuracy tends to be misleading in these cases.

Example: Fraud Detection
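
To make the point concrete, here is a small sketch on made-up data (1000 transactions, 10 of them fraudulent; the numbers are assumptions for illustration). A model that never flags fraud still scores 99% accuracy, while its recall is zero:

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical data: 10 fraud cases among 1000 transactions.
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000  # a model that always predicts "not fraud"

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)
rec = recall(tp, fn)
print(f"accuracy = {acc:.2f}, recall = {rec:.2f}")
```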

Cheat Sheet for Confusion Matrix

More epochs and overfitting?

use dropout regularization
early stopping of epochs is good advice

SageMaker notebook instances are Internet-enabled by default, creating a potential security hole in your VPC.

use a VPC Interface Endpoint (PrivateLink)
modify the instance's security group to allow outbound connections for training and hosting

Edge

SageMaker Neo, IoT Greengrass
sample edge device: Nvidia Jetson

To design and push something to the edge

design something to do the job, say a TensorFlow model
compile it for the edge device, say an Nvidia Jetson, using SageMaker Neo
run it on the edge using IoT Greengrass

NLP on Amazon: Comprehend

Another solution would be to use natural language processing through a service such as Amazon Comprehend.

You are training an XGBoost model on SageMaker with millions of rows of training data, and you wish to use Apache Spark to pre-process this data at scale. What is the simplest architecture that achieves this?

The SageMakerEstimator classes allow tight integration between Spark and SageMaker for several models, including XGBoost, and offer the simplest solution.

You can't deploy SageMaker to an EMR cluster.

XGBoost actually requires LibSVM or CSV input.

Imputation: best ML choices for filling in missing values?

Categorical: Deep Learning
Numerical: kNN

ML with sporadic spikes of traffic?

Use Spot Instances: using Spot Instances in response to anticipated surges in usage is the most cost-effective approach for scaling up an EMR cluster.

Pixel-level classification is called Semantic Segmentation.

What is a Loss Function?

Whatever you don't want to lose while building your model is what your loss function should penalize.
Example: for fraud detection, you don't want false negatives, so FN / (FN + TP), the false negative rate, is the quantity to minimize.

Reduce the Dimensionality

PCA
K-Means Clustering

KNN: supervised; K-Means: unsupervised

/opt/ml/code/train.py

the Dockerfile should set the environment variable SAGEMAKER_PROGRAM to train.py

Organizing data by date using S3 prefixes allows Glue to partition the data by date, which leads to faster queries on date ranges.

S3 lifecycle policies can automate the process of archiving old data to Glacier.

Make your own Alexa

Transcribe (speech to text) → Lex (chatbot engine that works on intents) → Polly (text to speech; reads the given text aloud)
a real implementation also uses DynamoDB and Lambda functions

AWS Rekognition won't know about your company logo, nor will Object Detection, until you have trained it first.

While Ground Truth can use the Mechanical Turk workforce as an option, it is purpose-built for this sort of task and can be set up very quickly.

Factorization machines are relevant to handling sparse data, but they don't perform dimensionality reduction per se.

Factorization Machines → Sparse Data
Sparse Data → Factorization Machines

PCA is a powerful dimensionality reduction technique that will find the best dimensions.

Given a multi-class confusion matrix shown as a heat map with a diagonal axis

When the question asks for the class with the fewest correct predictions, the choice with the lightest color along the diagonal is the correct one, because the diagonal cells hold the correct predictions (assuming darker cells mean higher counts).

We can never say a graph captures trend well but seasonality badly.

either both good or both bad

Seasonality refers to periodic changes, while trends are longer-term changes over time.

Kinesis Analytics can do minimal transformations natively using SQL.

Amazon Forecast: RTF service on AWS for forecasting.

You are on EMR and need to use S3 → always EMRFS

SMOTE: an ingenious oversampling technique

Synthetic Minority Oversampling Technique
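
The core idea of SMOTE can be sketched as interpolation between a minority-class point and one of its nearest minority neighbours. This is a simplified illustration with made-up points and a hypothetical helper name, not the reference implementation:

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

# Hypothetical minority-class samples with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like(minority, n_new=6)
print(new_points)
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region the minority class already occupies.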

Large batch size → stuck in local minima → you will miss the true minimum

L1 regularization → reduces the number of features (very useful for fixing overfitting) → if applied too aggressively, it can also cause underfitting.

L2 regularization → weights each feature instead of removing any entirely, which can lead to better accuracy

Tackle underfitting?

use L2 instead of L1
or just reduce the L1 regularization term (the term that controls how strongly L1 is applied)

Tackle overfitting?

dropout regularization
early stopping of epochs
using fewer layers may help

Quantile Binning

splits data into a fixed number of buckets, with the same number of observations in each bin.

Unevenly distributed data and you want to preserve the distribution?

Quantile binning

What if interval binning is used?

some intervals could have very few items and some could have far more → this hides the shape of the distribution
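
The difference between the two binning schemes shows up clearly on skewed data. A small sketch with made-up income values (the data and function names are assumptions for illustration):

```python
def interval_bins(values, n_bins):
    """Equal-width bins: return the count of values in each bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp the max value
        counts[idx] += 1
    return counts

def quantile_bins(values, n_bins):
    """Equal-frequency bins: cut the sorted data into n_bins chunks of
    near-equal size and return the size of each chunk."""
    n = len(values)
    return [(i + 1) * n // n_bins - i * n // n_bins for i in range(n_bins)]

# Hypothetical skewed data: many small values and one very large one.
incomes = [20, 22, 25, 27, 30, 31, 33, 35, 40, 45, 50, 400]

by_interval = interval_bins(incomes, 4)
by_quantile = quantile_bins(incomes, 4)
print("interval binning counts:", by_interval)
print("quantile binning counts:", by_quantile)
```

With interval binning almost everything falls into the first bucket, while quantile binning keeps the buckets evenly populated.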

SageMaker Distributed Training

can't be done out of the box

Horovod
Parameter Servers

Did training fail?

Training with unshuffled data may cause training to fail.

Training data should always be normalized and shuffled.

SageMaker Linear Learner supports both classification and regression tasks.

F1 Score → 2 · P · R / (P + R)

P: Precision
R: Recall
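
As a quick worked example of the formula, assuming made-up confusion-matrix counts (8 true positives, 2 false positives, 8 false negatives):

```python
def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Hypothetical counts taken from a confusion matrix.
tp, fp, fn = 8, 2, 8

p = precision(tp, fp)  # 8 / 10
r = recall(tp, fn)     # 8 / 16
f1 = f1_score(p, r)
print(f"P = {p:.2f}, R = {r:.2f}, F1 = {f1:.3f}")
```

The harmonic mean pulls F1 toward the weaker of the two values, so a model cannot score well by being strong on precision alone.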

Glue and Glue ETL can impart structure to unstructured data and perform transformations on that data as it is received.

Athena is a serverless solution that can query S3 data lakes directly when paired with Glue.

Data in S3 and need visualizations?

S3 → Glue Crawlers → Glue Data Catalog → Athena → QuickSight

When you need to prepare a very large amount of data → you want it done in parallel, and Apache Spark is the tool best suited to it.

So much data on S3 and you want to use it for ML?

approach 1:
- use PySpark + XGBoostSageMakerEstimator to prepare data using Spark
- then pass the data to SageMaker
approach 2: without using XGBoostSageMakerEstimator
- use Spark on EMR to pre-process the data and store it back in the same or another S3 bucket
- keep the S3 bucket accessible to SageMaker to train on

Neither Glue ETL nor Kinesis Analytics can convert to LibSVM format.

`scikit-learn` is not for a distributed solution.

LibSVM: A Library for Support Vector Machines

What is the best imputation technique?

supervised learning is always the first choice → discrete data
Deep Learning → categorical data
mean or median next
dropping the rows comes last

Training involves multiple long-running ETL jobs which need to execute in order

order → Step Functions

QuickSight's ML Insights feature allows forecasting using QuickSight itself. This is a serverless solution that contains the least number of components.

Forecasting without any overhead at all?

put data in S3
use QuickSight's native ML Insights feature
also use a QuickSight dashboard for visualization

XGBoost hyperparameters

subsample
alpha
eta
gamma
lambda

Recall (TP / (TP + FN)) is important when the cost of a false negative is higher than that of a false positive.

Detect a custom logo or t-shirt at a location with cameras?

custom CNN for computer vision or image detection
camera at the location
DeepLens
DeepLens_kinesis_Video module
SageMaker

Quickly build another classifier beside the current one?

use transfer learning: clone the existing model and start building on top of it

Transfer learning

can be below or above
Transfer learning generally involves using an existing model or adding additional layers on top of one.

Factorization Machines → float32

Factorization Machines

handle sparse data
RecordIO-protobuf in float32 format (highly unusual)

For SageMaker Pipe Mode

RecordIO is efficient

SageMaker Notebook created with the default IAM role

it can access S3 buckets with "sagemaker" in the name

Unless you add a policy with S3FullAccess permission to the role, it is restricted to buckets with "sagemaker" in the bucket name. Strange but true.

BlazingText format

Each line of the input file contains one training sentence along with its labels. Each label must be prefixed with __label__, and the tokens within the sentence, including punctuation, should be space-separated.
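
A minimal sketch of producing one line in this format. The helper name, the lowercasing, and the regex-based tokenization are assumptions for illustration; only the `__label__` prefix and the space-separated tokens come from the format itself:

```python
import re

def to_blazingtext_line(text, labels):
    """Format one example for BlazingText supervised mode:
    '__label__<label> token token token ...'."""
    # Split punctuation off as separate tokens, then join on single spaces.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    prefix = " ".join(f"__label__{label}" for label in labels)
    return f"{prefix} {' '.join(tokens)}"

line = to_blazingtext_line("Great phone, fast delivery!", ["positive"])
print(line)  # __label__positive great phone , fast delivery !
```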

Why Pipe mode?

with Pipe mode, we don't copy the data to the training machine
we stream the data
it makes a big difference for big datasets
requirements of Pipe mode? → RecordIO format

SageMaker LDA → Pipe mode only with RecordIO-protobuf (CSV input works in File mode only)

SageMaker LDA → training only on a single instance

SageMaker Factorization Machines → RecordIO && float32

AWS Batch

plans, schedules, and executes your batch computing workloads across the full range of AWS compute services and features, such as Amazon EC2 and Spot Instances.

Complex workflow?

must execute in order → Step Functions
only scheduling needed, no ordering required → AWS Batch

Learning Rate

Too large → overshoots the true minimum
Too small → slows down convergence, takes more time
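
Both failure modes can be seen on a toy problem. The sketch below runs plain gradient descent on f(x) = x², whose minimum is at 0; the function, starting point, and learning rates are assumptions chosen for illustration:

```python
def gradient_descent(lr, steps=20, start=5.0):
    """Minimise f(x) = x**2 (gradient 2x) and return the final x."""
    x = start
    for _ in range(steps):
        x -= lr * 2 * x
    return x

too_large = gradient_descent(lr=1.1)    # each step overshoots and diverges
too_small = gradient_descent(lr=0.001)  # barely moves in 20 steps
reasonable = gradient_descent(lr=0.1)   # approaches the minimum at 0

print(f"lr=1.1   -> x = {too_large:.2f}")
print(f"lr=0.001 -> x = {too_small:.2f}")
print(f"lr=0.1   -> x = {reasonable:.4f}")
```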

Batch Size

Too large → stuck at local minima
Smaller → reaches the true minimum

What is this true minimum?

when training, we usually want the model to be as little "bad" as possible on one quality
that one quality → loss function
the least bad it can actually get → the actual minimum → the true minimum

SageMaker Seq2Seq

machine translation
we need to provide vocabulary files
tokenize our words into integers
RecordIO-protobuf format with integer tokens

Your own Universal Language Translator?

You speak in language 1 → AWS Transcribe → AWS Translate → AWS Polly speaks in language 2

BlazingText

this is for sentiment analysis
because for sentiment analysis alone → the order of words doesn't matter
uses Skip-gram and CBOW (Continuous Bag of Words)
BlazingText doesn't use LSTM or CNN

Categorical features need to be converted into one-hot, binary representations prior to use in a neural network.
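
A small sketch of one-hot encoding in plain Python, with made-up color values (in practice a library encoder would normally be used; the function here is only illustrative):

```python
def one_hot(values):
    """Return (categories, rows) where each row is the one-hot vector
    of the corresponding input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows

colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot(colors)
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```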

RDS, Elasticsearch, and EMR all require the provisioning of servers.

S3, Glue, Athena, and QuickSight are all serverless solutions.

Celebrity Detection

already-trained model under the hood of AWS Rekognition

Detect an anomaly on a stream?

Kinesis Data Analytics has a native Random Cut Forest algorithm ❤️, use that.
Random Cut Forest is Amazon's own algorithm for anomaly detection and is usually the right choice when anomaly detection is asked for on the exam. It is implemented within both Kinesis Data Analytics and SageMaker, but only Kinesis Data Analytics applies it directly to a stream.

LSTM: a specific kind of RNN, Long Short-Term Memory

RNN

feeds its output back into the same neuron (hence "recurrent": reoccurring)
when the fed-back state needs to persist over long or short spans → LSTM

Generate music?

it is a time-series problem
use an RNN

Kinesis Firehose has the ability to convert JSON data to Parquet or ORC format on the fly.

Athena performs much more efficiently and at lower cost when using columnar formats such as Parquet or ORC.

Serverless analytics?

JSON data input via Kinesis Streams
- send to Firehose
Supply to Kinesis Firehose
- convert to Parquet or ORC and load into S3
Athena queries the data in S3 using Glue Crawler and the Glue Data Catalog and provides analytics

AWS Rekognition can identify common objects in images right out of the box.

Comprehend could be used to produce topics for the text in the posts.

Comprehend: RTF AWS NLP

BlazingText: just an algorithm for NLP on SageMaker

Vanishing gradient?

use ReLU

Reasons for vanishing gradient?

multiplying together many small derivatives of the sigmoid activation function across multiple layers
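
The effect can be illustrated numerically. The sketch below ignores the weights and simply multiplies the same local derivative across 10 layers, which is a deliberate simplification (the pre-activation value 0.5 is an arbitrary assumption):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid; never larger than 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def chained_gradient(grad_fn, pre_activation, n_layers):
    """Product of n_layers identical local derivatives (weights ignored)."""
    return grad_fn(pre_activation) ** n_layers

sigmoid_chain = chained_gradient(sigmoid_grad, 0.5, n_layers=10)
relu_chain = chained_gradient(relu_grad, 0.5, n_layers=10)
print(f"sigmoid, 10 layers: {sigmoid_chain:.2e}")
print(f"ReLU,    10 layers: {relu_chain:.2e}")
```

Since the sigmoid derivative is at most 0.25, the product shrinks geometrically with depth, while ReLU passes a gradient of 1 through every active unit.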

SageMaker Object2Vec vs. SageMaker BlazingText

both are algorithms
Object2Vec creates embeddings for arbitrary objects, like Tweets
BlazingText can only find relationships between words, not entire tweets

XGBoost instance type?

M4
XGBoost is a CPU-only algorithm
no benefit from GPUs
GPU Type → P3 or P2

Instance Types: https://aws.amazon.com/sagemaker/pricing/instance-types/

GPU (Accelerated Computing): P, G
CPU (Standard): M, T
Memory Optimized (current generation): R
Compute Optimized (current generation): C
Inference Accelerator: another level

Non-linear clustering solutions

kNN
SVM + RBF
SVM: Support Vector Machine
RBF: Radial Basis Function

Outliers can skew linear models.

discard them by identifying points that lie more than some multiple of the standard deviation away from the mean
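
A minimal sketch of that rule, assuming a cutoff of two standard deviations and made-up sensor readings (both are illustrative choices, not values from the course):

```python
import statistics

def drop_outliers(values, n_std=2.0):
    """Keep only values within n_std population standard deviations of the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) <= n_std * std]

# Hypothetical readings with one obvious outlier.
readings = [10, 11, 9, 10, 12, 11, 10, 95]
cleaned = drop_outliers(readings)
print(cleaned)  # [10, 11, 9, 10, 12, 11, 10]
```

Note that the outlier itself inflates both the mean and the standard deviation, so on small samples a very strict multiple may fail to remove it.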

Spot Instances → task nodes on EMR

Deduplication?

Glue ETL: FindMatchesML ❤️ feature

Assign topics to a bunch of texts

LDA: Latent Dirichlet Allocation, unsupervised topic modeling
NTM: Neural Topic Model, a SageMaker algorithm

Find topics

SageMaker LDA algorithm
SageMaker NTM algorithm
Amazon Comprehend also (it does sentiment analysis and more)

Imputation?

No outliers? → Mean
Outliers present? → Median
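
Why the median is safer when outliers are present can be seen on a small made-up column with one missing value (the salary figures and function name are assumptions for illustration):

```python
import statistics

def impute(values, strategy):
    """Fill None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

# Hypothetical salaries (in thousands) with one missing entry and one outlier.
salaries = [40, 42, 45, None, 43, 1000]

with_mean = impute(salaries, "mean")
with_median = impute(salaries, "median")
print("mean fill:  ", with_mean[3])    # 234
print("median fill:", with_median[3])  # 43
```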

Can a new SageMaker model be tested without impact to customers?

Yes
Production Variants are made for this purpose, like Tesla Shadow Mode

Curves

AUC: Area Under Curve
ROC: Receiver Operating Characteristic
A good ROC curve bends up toward the top-left corner (0, 1)
Perfect AUC is 1.0
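
A small sketch of how the ROC points and the AUC can be computed by hand, using made-up labels and scores and assuming all scores are distinct (ties are not handled here):

```python
def roc_points(y_true, scores):
    """Sweep the threshold from high to low and return (FPR, TPR) points."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical labels and model scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

model_auc = auc(roc_points(y_true, scores))
perfect_auc = auc(roc_points([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))
print(f"model AUC = {model_auc:.3f}, perfect AUC = {perfect_auc:.1f}")
```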

Shuffling is recommended with SageMaker Linear Learner.

How to restrict access to SageMaker notebooks to specific IAM groups?

put tags on SageMaker resources
use ResourceTag conditions in IAM policies to match the tags on those SageMaker instances
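
A sketch of what such a policy document could look like, built as a Python dict. The tag key and value, the selected actions, and the helper name are assumptions for illustration; the policy would then be attached to the IAM group in question:

```python
import json

def notebook_access_policy(tag_key, tag_value):
    """Build an IAM policy document that allows notebook actions only on
    notebook instances carrying the given tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "sagemaker:CreatePresignedNotebookInstanceUrl",
                    "sagemaker:StartNotebookInstance",
                    "sagemaker:StopNotebookInstance",
                ],
                "Resource": "arn:aws:sagemaker:*:*:notebook-instance/*",
                "Condition": {
                    "StringEquals": {
                        f"sagemaker:ResourceTag/{tag_key}": tag_value
                    }
                },
            }
        ],
    }

# Hypothetical tag: notebooks tagged team=data-science.
policy = notebook_access_policy("team", "data-science")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```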

Full encryption while training because of PII data in the dataset?

Inter-container encryption is just a checkbox away when creating a training job via the SageMaker console.
It can also be specified using the SageMaker API with a little extra work.

Custom inference container requirements?

Your inference container must respond on port 8080, and
must respond to ping requests in under 2 seconds.
Model artifacts need to be compressed in tar format, not zip.
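
For the last point, a minimal sketch of packaging artifacts as a gzip-compressed tar archive with the standard library. The file names and the temporary directory are placeholders for illustration:

```python
import os
import tarfile
import tempfile

def package_model(artifact_paths, output_path):
    """Write the given files into a gzip-compressed tar archive."""
    with tarfile.open(output_path, "w:gz") as tar:
        for path in artifact_paths:
            # Store each file at the archive root under its base name.
            tar.add(path, arcname=os.path.basename(path))
    return output_path

# Create a dummy artifact so the example is self-contained.
workdir = tempfile.mkdtemp()
weights = os.path.join(workdir, "model.bin")
with open(weights, "wb") as f:
    f.write(b"\x00\x01\x02")

archive = package_model([weights], os.path.join(workdir, "model.tar.gz"))
with tarfile.open(archive, "r:gz") as tar:
    names = tar.getnames()
print(names)  # ['model.bin']
```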

K-Means is unsupervised. (mnemonic: KUM)

To optimize the number of clusters?
WSS (within-cluster sum of squares) is one way, used in what is also called the elbow method

End.

Translated from: https://medium.com/swlh/cheat-sheet-for-aws-ml-specialty-certification-e8f9c88566ba