A manufacturing company has a production line with sensors that collect hundreds of quality metrics. The company has stored sensor data and manual inspection results in a data lake for several months. To automate quality control, the machine learning team must build an automated mechanism that determines whether the produced goods are good quality, replacement market quality, or scrap quality based on the manual inspection results. Which modeling approach will deliver the MOST accurate prediction of product quality?
A. Amazon SageMaker DeepAR forecasting algorithm
B. Amazon SageMaker XGBoost algorithm
C. Amazon SageMaker Latent Dirichlet Allocation (LDA) algorithm
D. A convolutional neural network (CNN) and ResNet
Explanation: A convolutional neural network (CNN) is a type of deep learning model that can learn to extract features from images and perform tasks such as classification, segmentation, and detection. ResNet is a popular CNN architecture that uses residual connections to overcome the problem of vanishing gradients and enable very deep networks. For the task of predicting product quality based on sensor data, a CNN and ResNet approach can leverage the spatial structure of the data and learn complex patterns that distinguish the different quality levels.
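For illustration only, a minimal sketch of the residual block that gives ResNet its name, written in Python with PyTorch; the channel count and layer sizes are assumptions for demonstration and are not part of the question.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A minimal residual block: output = activation(F(x) + x)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                      # skip (residual) connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity              # add the input back before the final activation
        return self.relu(out)

# Example: a 4-image batch of 64-channel 56x56 feature maps passes through unchanged in shape.
block = ResidualBlock(64)
print(block(torch.randn(4, 64, 56, 56)).shape)  # torch.Size([4, 64, 56, 56])
```

The skip connection lets gradients flow directly through the addition, which is what allows very deep networks to train without vanishing gradients.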
An Amazon SageMaker notebook instance is launched into an Amazon VPC. The SageMaker notebook references data contained in an Amazon S3 bucket in another account. The bucket is encrypted using SSE-KMS. The instance returns an access denied error when trying to access data in Amazon S3. Which of the following are required to access the bucket and avoid the access denied error? (Select THREE)
A. An AWS KMS key policy that allows access to the customer master key (CMK)
B. A SageMaker notebook security group that allows access to Amazon S3
C. An IAM role that allows access to the specific S3 bucket
D. A permissive S3 bucket policy
E. An S3 bucket owner that matches the notebook owner
F. A SageMaker notebook subnet ACL that allows traffic to Amazon S3.
Explanation: To access an Amazon S3 bucket in another account that is encrypted using SSE-KMS, the following are required: an AWS KMS key policy that allows the notebook's execution role to use the customer master key (CMK), an IAM role attached to the notebook that allows access to the specific S3 bucket, and an S3 bucket policy in the bucket owner's account that grants access to that role. Cross-account access must be granted on both sides, so the bucket policy and key policy in the owning account and the IAM role in the notebook's account are all needed. Security groups, subnet ACLs, and matching bucket ownership are not the cause of the access denied error in this scenario.
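As an illustration only, the cross-account grants could look roughly like the sketch below in Python with boto3. The account IDs, role name, bucket name, and key ARN are placeholders, and in practice the KMS statement would be merged into the existing key policy (which must keep its root-account statement) rather than replacing it.

```python
import json
import boto3

# Placeholder identifiers -- not values from the question.
NOTEBOOK_ROLE_ARN = "arn:aws:iam::111111111111:role/SageMakerNotebookRole"
BUCKET = "shared-training-data"
KMS_KEY_ID = "arn:aws:kms:us-east-1:222222222222:key/example-key-id"

# Bucket policy (in the bucket owner's account) granting the notebook role read access.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": NOTEBOOK_ROLE_ARN},
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
    }],
}
boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(bucket_policy))

# KMS key policy statement (also in the bucket owner's account) allowing decryption.
# NOTE: put_key_policy replaces the whole policy; shown here only as a sketch.
key_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": NOTEBOOK_ROLE_ARN},
        "Action": ["kms:Decrypt", "kms:DescribeKey"],
        "Resource": "*",
    }],
}
boto3.client("kms").put_key_policy(KeyId=KMS_KEY_ID, PolicyName="default",
                                   Policy=json.dumps(key_policy))
```

The notebook's own IAM role must also include s3:GetObject on the bucket and kms:Decrypt on the key, which is the third required piece.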
A Machine Learning Specialist is attempting to build a linear regression model. Given the displayed residual plot only, what is the MOST likely problem with the model?
A. Linear regression is inappropriate. The residuals do not have constant variance.
B. Linear regression is inappropriate. The underlying data has outliers.
C. Linear regression is appropriate. The residuals have a zero mean.
D. Linear regression is appropriate. The residuals have constant variance.
Explanation: A residual plot displays the fitted values (or a predictor variable) of a regression model along the x-axis and the residuals along the y-axis. It is used to check the assumptions of linear regression, in particular whether the residuals are centered on zero and have constant variance. Heteroscedasticity means that the variance of the residuals is not constant across the range of fitted values; this violates one of the assumptions of linear regression and leads to unreliable standard errors and predictions. The displayed residual plot shows a clear funnel-shaped pattern of heteroscedasticity, with the residuals spreading out as the fitted values increase. This indicates that linear regression is inappropriate for this data and a different model, or a transformation of the target variable, should be used.
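A minimal Python sketch of how such a residual plot is produced, using synthetic data whose noise grows with the predictor so the funnel shape appears; the data and column choices are assumptions for demonstration only.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic data whose noise grows with x, i.e. heteroscedastic residuals.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500).reshape(-1, 1)
y = 3.0 * x.ravel() + rng.normal(0, 0.5 * x.ravel())  # noise scale increases with x

model = LinearRegression().fit(x, y)
fitted = model.predict(x)
residuals = y - fitted

# Residual plot: a funnel shape indicates non-constant variance (heteroscedasticity).
plt.scatter(fitted, residuals, s=8)
plt.axhline(0, color="red", linewidth=1)
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("Residuals vs. fitted values")
plt.show()
```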
A company is using Amazon Polly to convert plaintext documents to speech for automated company announcements. However, company acronyms are being mispronounced in the current documents. How should a Machine Learning Specialist address this issue for future documents?
A. Convert current documents to SSML with pronunciation tags
B. Create an appropriate pronunciation lexicon.
C. Output speech marks to guide in pronunciation
D. Use Amazon Lex to preprocess the text files for pronunciation
Explanation: A pronunciation lexicon is a file that defines how specific words or phrases should be pronounced by Amazon Polly. A lexicon can customize the speech output for words that are uncommon, foreign, acronyms, or have multiple pronunciations. A lexicon must conform to the Pronunciation Lexicon Specification (PLS) standard and is stored in an AWS Region through the Amazon Polly API. To apply a lexicon when synthesizing speech, its name is passed in the LexiconNames parameter of the SynthesizeSpeech request, so future documents do not need to be rewritten, unlike the SSML conversion in option A.
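A minimal Python sketch of this workflow with boto3; the lexicon name, the example acronym entry, the voice, and the file names are placeholders for illustration and are not from the question.

```python
import boto3

polly = boto3.client("polly")

# A minimal PLS lexicon that tells Polly how to say the acronym "W3C".
LEXICON_NAME = "acronyms"
LEXICON_CONTENT = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>"""

# Store the lexicon once in the Region.
polly.put_lexicon(Name=LEXICON_NAME, Content=LEXICON_CONTENT)

# Future documents are synthesized unchanged; the lexicon is applied by name.
response = polly.synthesize_speech(
    Text="The W3C announcement will be read at noon.",
    OutputFormat="mp3",
    VoiceId="Joanna",
    LexiconNames=[LEXICON_NAME],
)
with open("announcement.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```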
A company's machine learning (ML) specialist is building a computer vision model to classify 10 different traffic signs. The company has stored 100 images of each class in Amazon S3, and the company has another 10,000 unlabeled images. All the images come from dash cameras and are 224 × 224 pixels in size. After several training runs, the model is overfitting on the training data. Which actions should the ML specialist take to address this problem? (Select TWO.)
A. Use Amazon SageMaker Ground Truth to label the unlabeled images
B. Use image preprocessing to transform the images into grayscale images.
C. Use data augmentation to rotate and translate the labeled images.
D. Replace the activation of the last layer with a sigmoid.
E. Use the Amazon SageMaker k-nearest neighbors (k-NN) algorithm to label the unlabeled images.
Explanation:
Data augmentation is a technique to increase the size and diversity of the training data by applying random transformations such as rotation, translation, scaling, and flipping. This can help reduce overfitting and improve the generalization of the model. Data augmentation can be done using the Amazon SageMaker image classification algorithm, which supports various augmentation options such as horizontal_flip, vertical_flip, rotate, brightness, and contrast.
The Amazon SageMaker k-nearest neighbors (k-NN) algorithm is a supervised learning algorithm that can be used to label unlabeled data based on similarity to the labeled data. The k-NN algorithm assigns a label to an unlabeled instance by finding the k closest labeled instances in the feature space and taking a majority vote among their labels. This can help increase the size and diversity of the training data and reduce overfitting. The k-NN algorithm can be combined with the Amazon SageMaker image classification algorithm by extracting features from the images with a pre-trained model and then applying k-NN to the feature vectors.
Using Amazon SageMaker Ground Truth to label the unlabeled images is not a good option because it is a manual and costly process that requires human annotators. Moreover, it does not address the issue of overfitting on the existing labeled data.
Using image preprocessing to transform the images into grayscale is not a good option because it reduces the amount of information and variation in the images, which can degrade the performance of the model. Moreover, it does not address the issue of overfitting on the existing labeled data.
Replacing the activation of the last layer with a sigmoid is not a good option because it is not suitable for a multi-class classification problem. A sigmoid activation outputs a value between 0 and 1, which can be interpreted as the probability of belonging to a single class. For a multi-class classification problem, the output should be a vector of probabilities that sum to 1, which is achieved with a softmax activation.
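Outside of the built-in algorithm's options, the augmentation idea itself is easy to sketch; below is a minimal, illustrative pipeline in Python using torchvision, where the specific transforms and parameter values are assumptions for demonstration only. Mirror flips are deliberately omitted because flipping a traffic sign can change its meaning.

```python
from torchvision import transforms

# Illustrative augmentation pipeline for 224 x 224 dash-camera images.
# Applied on the fly, each training epoch sees a slightly different version
# of every labeled image, which increases effective diversity and reduces overfitting.
train_transforms = transforms.Compose([
    transforms.RandomRotation(degrees=10),                      # small random rotations
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),   # small random translations
    transforms.ColorJitter(brightness=0.2, contrast=0.2),       # lighting variation
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
```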
A company is building a demand forecasting model based on machine learning (ML). In the
development stage, an ML specialist uses an Amazon SageMaker notebook to perform
feature engineering during work hours that consumes low amounts of CPU and memory
resources. A data engineer uses the same notebook to perform data preprocessing once a
day on average that requires very high memory and completes in only 2 hours. The data
preprocessing is not configured to use GPU. All the processes are running well on an
ml.m5.4xlarge notebook instance.
The company receives an AWS Budgets alert that the billing for this month exceeds the
allocated budget.
Which solution will result in the MOST cost savings?
A. Change the notebook instance type to a memory optimized instance with the same vCPU number as the ml.m5.4xlarge instance has. Stop the notebook when it is not in use. Run both data preprocessing and feature engineering development on that instance.
B. Keep the notebook instance type and size the same. Stop the notebook when it is not in use. Run data preprocessing on a P3 instance type with the same memory as the ml.m5.4xlarge instance by using Amazon SageMaker Processing.
C. Change the notebook instance type to a smaller general-purpose instance. Stop the notebook when it is not in use. Run data preprocessing on an ml.r5 instance with the same memory size as the ml.m5.4xlarge instance by using Amazon SageMaker Processing.
D. Change the notebook instance type to a smaller general-purpose instance. Stop the notebook when it is not in use. Run data preprocessing on an R5 instance with the same memory size as the ml.m5.4xlarge instance by using the Reserved Instance option.
Explanation: The best solution to reduce the cost of the notebook instance and the data preprocessing job is to change the notebook instance type to a smaller general-purpose instance, stop the notebook when it is not in use, and run data preprocessing on an ml.r5 instance with the same memory size as the ml.m5.4xlarge instance by using Amazon SageMaker Processing. This results in the most cost savings because the interactive feature engineering consumes only small amounts of CPU and memory, so a small general-purpose notebook instance that is stopped outside work hours is sufficient. The memory-intensive preprocessing runs for only about 2 hours once a day, so moving it to an on-demand SageMaker Processing job on a memory-optimized ml.r5 instance means the company pays for that capacity only while the job runs. Because the preprocessing does not use a GPU, a P3 instance (option B) adds cost without benefit, and a Reserved Instance used for only 2 hours a day (option D) wastes the reserved capacity.
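A minimal sketch of such a SageMaker Processing job in Python with the SageMaker SDK; the role ARN, script name, S3 paths, and framework version are placeholders, not values from the question.

```python
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput

# Placeholder role, paths, and framework version -- for illustration only.
role = "arn:aws:iam::111111111111:role/SageMakerExecutionRole"

processor = SKLearnProcessor(
    framework_version="1.2-1",
    role=role,
    instance_type="ml.r5.4xlarge",   # memory-optimized, billed only while the job runs
    instance_count=1,
)

processor.run(
    code="preprocessing.py",         # the data engineer's existing preprocessing script
    inputs=[ProcessingInput(source="s3://example-bucket/raw/",
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                              destination="s3://example-bucket/processed/")],
)
```

The job tears down the ml.r5 instance as soon as the script finishes, so the high-memory capacity is paid for only during the roughly 2-hour daily run.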
An aircraft engine manufacturing company is measuring 200 performance metrics in a time-series. Engineers want to detect critical manufacturing defects in near-real time during testing. All of the data needs to be stored for offline analysis. What approach would be the MOST effective to perform near-real time defect detection?
A. Use AWS IoT Analytics for ingestion, storage, and further analysis. Use Jupyter notebooks from within AWS IoT Analytics to carry out analysis for anomalies.
B. Use Amazon S3 for ingestion, storage, and further analysis. Use an Amazon EMR cluster to carry out Apache Spark ML k-means clustering to determine anomalies.
C. Use Amazon S3 for ingestion, storage, and further analysis. Use the Amazon SageMaker Random Cut Forest (RCF) algorithm to determine anomalies.
D. Use Amazon Kinesis Data Firehose for ingestion and Amazon Kinesis Data Analytics Random Cut Forest (RCF) to perform anomaly detection. Use Kinesis Data Firehose to store data in Amazon S3 for further analysis.
Explanation: Option D meets both requirements. Amazon Kinesis Data Firehose ingests the streaming sensor metrics, Amazon Kinesis Data Analytics applies the built-in Random Cut Forest (RCF) function to the stream to flag anomalous readings in near-real time, and Kinesis Data Firehose also delivers all of the raw data to Amazon S3 for offline analysis. The other options operate on data that is already at rest in Amazon S3 or AWS IoT Analytics and rely on batch or notebook-driven analysis, so they cannot detect defects in near-real time during testing.
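A minimal Python sketch of the ingestion side with boto3; the delivery stream name and record layout are placeholders for illustration, not values from the question.

```python
import json
import time
import boto3

firehose = boto3.client("firehose")

# Placeholder delivery stream name and record schema.
STREAM_NAME = "engine-test-metrics"

def publish_reading(machine_id: str, metrics: dict) -> None:
    """Send one sensor reading into the Firehose delivery stream.

    Kinesis Data Analytics reads the stream and applies its RANDOM_CUT_FOREST
    function for near-real-time anomaly detection, while Firehose also delivers
    every record to Amazon S3 for offline analysis.
    """
    record = {"machine_id": machine_id, "timestamp": time.time(), **metrics}
    firehose.put_record(
        DeliveryStreamName=STREAM_NAME,
        Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
    )

publish_reading("engine-42", {"temperature_c": 612.4, "vibration_hz": 118.0, "rpm": 10450})
```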
A company is setting up a mechanism for data scientists and engineers from different
departments to access an Amazon SageMaker Studio domain. Each department has a
unique SageMaker Studio domain.
The company wants to build a central proxy application that data scientists and engineers
can log in to by using their corporate credentials. The proxy application will authenticate
users by using the company's existing Identity provider (IdP). The application will then
route users to the appropriate SageMaker Studio domain.
The company plans to maintain a table in Amazon DynamoDB that contains SageMaker
domains for each department.
How should the company meet these requirements?
A. Use the SageMaker CreatePresignedDomainUrl API to generate a presigned URL for each domain according to the DynamoDB table. Pass the presigned URL to the proxy application.
B. Use the SageMaker CreateHumanTaskUi API to generate a UI URL. Pass the URL to the proxy application.
C. Use the Amazon SageMaker ListHumanTaskUis API to list all UI URLs. Pass the appropriate URL to the DynamoDB table so that the proxy application can use the URL.
D. Use the SageMaker CreatePresignedNotebookInstanceUrl API to generate a presigned URL. Pass the presigned URL to the proxy application.
Explanation: The SageMaker CreatePresignedDomainUrl API is the best option to meet
the requirements of the company. This API creates a URL for a specified UserProfile in a
Domain. When accessed in a web browser, the user will be automatically signed in to the
domain, and granted access to all of the Apps and files associated with the Domain’s
Amazon Elastic File System (EFS) volume. This API can only be called when the
authentication mode equals IAM, which means the company can use its existing IdP to
authenticate users.
The company can use the DynamoDB table to store the domain IDs
and user profile names for each department, and use the proxy application to query the
table and generate the presigned URL for the appropriate domain according to the user’s
credentials. The presigned URL is valid only for a specified duration, which can be set by
the SessionExpirationDurationInSeconds parameter. This can help enhance the security
and prevent unauthorized access to the domains.
The other options are not suitable for the company’s requirements. The SageMaker
CreateHumanTaskUi API is used to define the settings for the human review workflow user
interface, which is not related to accessing the SageMaker Studio domains. The
SageMaker ListHumanTaskUis API is used to return information about the human task
user interfaces in the account, which is also not relevant to the company’s use case. The
SageMaker CreatePresignedNotebookInstanceUrl API is used to create a URL to connect
to the Jupyter server from a notebook instance, which is different from accessing the
SageMaker Studio domain.
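A minimal Python sketch of what the proxy application could do after authenticating a user; the DynamoDB table name, key schema, and session duration are placeholders, not values from the question.

```python
import boto3

dynamodb = boto3.resource("dynamodb")
sagemaker = boto3.client("sagemaker")

# Placeholder table name and key schema -- for illustration only.
domain_table = dynamodb.Table("DepartmentSageMakerDomains")

def get_studio_url(department: str, user_profile_name: str) -> str:
    """Look up the department's Studio domain and return a presigned login URL."""
    item = domain_table.get_item(Key={"department": department})["Item"]

    response = sagemaker.create_presigned_domain_url(
        DomainId=item["domain_id"],
        UserProfileName=user_profile_name,
        SessionExpirationDurationInSeconds=1800,  # limits how long the session stays valid
    )
    return response["AuthorizedUrl"]

# The proxy application calls this after the IdP authenticates the user,
# then redirects the browser to the returned presigned URL.
```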
A data science team is planning to build a natural language processing (NLP) application. The application's text preprocessing stage will include part-of-speech tagging and key phrase extraction. The preprocessed text will be input to a custom classification algorithm that the data science team has already written and trained using Apache MXNet. Which solution can the team build MOST quickly to meet these requirements?
A. Use Amazon Comprehend for the part-of-speech tagging, key phrase extraction, and classification tasks.
B. Use an NLP library in Amazon SageMaker for the part-of-speech tagging. Use Amazon Comprehend for the key phrase extraction. Use AWS Deep Learning Containers with Amazon SageMaker to build the custom classifier.
C. Use Amazon Comprehend for the part-of-speech tagging and key phrase extraction tasks. Use the Amazon SageMaker built-in Latent Dirichlet Allocation (LDA) algorithm to build the custom classifier.
D. Use Amazon Comprehend for the part-of-speech tagging and key phrase extraction tasks. Use AWS Deep Learning Containers with Amazon SageMaker to build the custom classifier.
Explanation: Amazon Comprehend is a natural language processing (NLP) service that can perform part-of-speech tagging and key phrase extraction tasks. AWS Deep Learning Containers are Docker images that are pre-installed with popular deep learning frameworks such as Apache MXNet. Amazon SageMaker is a fully managed service that can help build, train, and deploy machine learning models. Using Amazon Comprehend for the text preprocessing tasks and AWS Deep Learning Containers with Amazon SageMaker to build the custom classifier is the solution that can be built most quickly to meet the requirements.
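A minimal Python sketch of the preprocessing calls with boto3; the sample text is a placeholder for illustration only.

```python
import boto3

comprehend = boto3.client("comprehend")

text = "AWS announced new machine learning services at the annual conference."

# Part-of-speech tagging
syntax = comprehend.detect_syntax(Text=text, LanguageCode="en")
pos_tags = [(t["Text"], t["PartOfSpeech"]["Tag"]) for t in syntax["SyntaxTokens"]]

# Key phrase extraction
phrases = comprehend.detect_key_phrases(Text=text, LanguageCode="en")
key_phrases = [p["Text"] for p in phrases["KeyPhrases"]]

print(pos_tags)
print(key_phrases)

# The preprocessed features would then be fed to the team's existing MXNet classifier,
# packaged with an AWS Deep Learning Container and hosted on Amazon SageMaker.
```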
An agricultural company is interested in using machine learning to detect specific types of
weeds in a 100-acre grassland field. Currently, the company uses tractor-mounted
cameras to capture multiple images of the field as 10 × 10 grids. The company also has a
large training dataset that consists of annotated images of popular weed classes like
broadleaf and non-broadleaf docks.
The company wants to build a weed detection model that will detect specific types of
weeds and the location of each type within the field. Once the model is ready, it will be
hosted on Amazon SageMaker endpoints. The model will perform real-time inferencing
using the images captured by the cameras.
Which approach should a Machine Learning Specialist take to obtain accurate predictions?
A. Prepare the images in RecordIO format and upload them to Amazon S3. Use Amazon SageMaker to train, test, and validate the model using an image classification algorithm to categorize images into various weed classes.
B. Prepare the images in Apache Parquet format and upload them to Amazon S3. Use Amazon SageMaker to train, test, and validate the model using an object-detection single-shot multibox detector (SSD) algorithm.
C. Prepare the images in RecordIO format and upload them to Amazon S3. Use Amazon SageMaker to train, test, and validate the model using an object-detection single-shot multibox detector (SSD) algorithm.
D. Prepare the images in Apache Parquet format and upload them to Amazon S3. Use Amazon SageMaker to train, test, and validate the model using an image classification algorithm to categorize images into various weed classes.
Explanation: The problem of detecting specific types of weeds and their location within
the field is an example of object detection, which is a type of machine learning model that
identifies and localizes objects in an image. Amazon SageMaker provides a built-in object
detection algorithm that uses a single-shot multibox detector (SSD) to perform real-time
inference on streaming images. The SSD algorithm can handle multiple objects of varying
sizes and scales in an image, and generate bounding boxes and scores for each object
category. Therefore, option C is the best approach to obtain accurate predictions.
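A minimal sketch of training the built-in object detection algorithm with the SageMaker Python SDK; the role ARN, S3 paths, instance types, and hyperparameter values are placeholders for illustration, not values from the question.

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::111111111111:role/SageMakerExecutionRole"  # placeholder
container = image_uris.retrieve("object-detection", region=session.boto_region_name)

estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    output_path="s3://example-bucket/weed-detector/output/",
    sagemaker_session=session,
)
estimator.set_hyperparameters(
    num_classes=2,                 # e.g. broadleaf and non-broadleaf docks
    num_training_samples=10000,    # placeholder count of annotated images
    epochs=30,
    mini_batch_size=16,
)

# RecordIO-formatted annotated images uploaded to S3, as described in option C.
estimator.fit({
    "train": TrainingInput("s3://example-bucket/weed-detector/train/",
                           content_type="application/x-recordio"),
    "validation": TrainingInput("s3://example-bucket/weed-detector/validation/",
                                content_type="application/x-recordio"),
})

# estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge") would then
# host the trained model on a SageMaker endpoint for real-time inference from the cameras.
```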
A Machine Learning Specialist must build out a process to query a dataset on Amazon S3 using Amazon Athena. The dataset contains more than 800,000 records stored as plaintext CSV files. Each record contains 200 columns and is approximately 1.5 MB in size. Most queries will span only 5 to 10 columns. How should the Machine Learning Specialist transform the dataset to minimize query runtime?
A. Convert the records to Apache Parquet format
B. Convert the records to JSON format
C. Convert the records to GZIP CSV format
D. Convert the records to XML format
Explanation:
Amazon Athena is an interactive query service that allows you to
analyze data stored in Amazon S3 using standard SQL. Athena is serverless, so
you only pay for the queries that you run and there is no infrastructure to manage.
To optimize the query performance of Athena, one of the best practices is to
convert the data into a columnar format, such as Apache Parquet or Apache ORC.
Columnar formats store data by columns rather than by rows, which allows Athena
to scan only the columns that are relevant to the query, reducing the amount of
data read and improving the query speed. Columnar formats also support
compression and encoding schemes that can reduce the storage space and the
data scanned per query, further enhancing the performance and reducing the cost.
In contrast, plaintext CSV files store data by rows, which means that Athena has to
scan the entire row even if only a few columns are needed for the query. This
increases the amount of data read and the query latency. Moreover, plaintext CSV
files do not support compression or encoding, which means that they take up more
storage space and incur higher query costs.
Therefore, the Machine Learning Specialist should transform the dataset to
Apache Parquet format to minimize query runtime.
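A minimal Python sketch of the format conversion with pandas; the file names are placeholders, and pyarrow (or fastparquet) is assumed to be installed. At this scale the conversion is more often done with AWS Glue or an Athena CTAS query; the sketch only illustrates the change of format.

```python
import pandas as pd

# Rewrite a plaintext CSV record file as columnar, compressed Parquet.
df = pd.read_csv("records.csv")
df.to_parquet("records.snappy.parquet",
              engine="pyarrow",
              compression="snappy",
              index=False)

# After the Parquet files are uploaded to Amazon S3 and an Athena table is defined
# over that location, a query touching 5-10 of the 200 columns scans only those
# columns, which reduces the data read, the query runtime, and the per-query cost.
```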
A manufacturer is operating a large number of factories with a complex supply chain
relationship where unexpected downtime of a machine can cause production to stop at
several factories. A data scientist wants to analyze sensor data from the factories to identify
equipment in need of preemptive maintenance and then dispatch a service team to prevent
unplanned downtime. The sensor readings from a single machine can include up to 200
data points including temperatures, voltages, vibrations, RPMs, and pressure readings.
To collect this sensor data, the manufacturer deployed Wi-Fi and LANs across the
factories. Even though many factory locations do not have reliable or high-speed internet
connectivity, the manufacturer would like to maintain near-real-time inference capabilities.
Which deployment architecture for the model will address these business requirements?
A. Deploy the model in Amazon SageMaker. Run sensor data through this model to predict which machines need maintenance.
B. Deploy the model on AWS IoT Greengrass in each factory. Run sensor data through this model to infer which machines need maintenance.
C. Deploy the model to an Amazon SageMaker batch transformation job. Generate inferences in a daily batch report to identify machines that need maintenance.
D. Deploy the model in Amazon SageMaker and use an IoT rule to write data to an Amazon DynamoDB table. Consume a DynamoDB stream from the table with an AWS Lambda function to invoke the endpoint.
Explanation: AWS IoT Greengrass is a service that extends AWS to edge devices, such
as sensors and machines, so they can act locally on the data they generate, while still
using the cloud for management, analytics, and durable storage. AWS IoT Greengrass
enables local device messaging, secure data transfer, and local computing using AWS
Lambda functions and machine learning models. AWS IoT Greengrass can run machine learning inference locally on devices using models that are created and trained in the
cloud. This allows devices to respond quickly to local events, even when they are offline or
have intermittent connectivity. Therefore, option B is the best deployment architecture for
the model to address the business requirements of the manufacturer.
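For illustration only, a minimal Greengrass v1-style Lambda function sketch in Python; the model path, topic name, event shape, and use of a joblib-serialized model are all assumptions for demonstration, not the manufacturer's actual design.

```python
import json
import greengrasssdk          # available on the Greengrass core device
import joblib

# Hypothetical local model file deployed to the core as a machine learning resource.
MODEL_PATH = "/greengrass-machine-learning/model/maintenance_model.joblib"
model = joblib.load(MODEL_PATH)

iot_client = greengrasssdk.client("iot-data")

def function_handler(event, context):
    """Runs locally on the factory's Greengrass core for each batch of sensor readings."""
    readings = event["readings"]                  # up to 200 data points per machine
    needs_maintenance = bool(model.predict([readings])[0])

    if needs_maintenance:
        # Publish an alert locally; it syncs to the cloud when connectivity allows.
        iot_client.publish(
            topic="factory/maintenance/alerts",   # hypothetical topic name
            payload=json.dumps({"machine_id": event["machine_id"],
                                "needs_maintenance": True}),
        )
    return {"needs_maintenance": needs_maintenance}
```

Because the model and the inference loop both run on the core device, predictions continue in near-real time even when the factory's internet connection is slow or unavailable.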