Professional-Data-Engineer Practice Test Questions

Which method is preferred for avoiding hotspotting in time series data in Bigtable?

A. Field promotion
B. Randomization
C. Salting
D. Hashing
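
For reference while studying, the row-key techniques named in the options can be sketched roughly as below. This is only an illustration under stated assumptions: the function names, the `#` separator, the bucket count, and the prefix lengths are all hypothetical choices, and no Bigtable client library is used. It only shows how each technique changes the shape of the key string.

```python
import hashlib
import random


def field_promotion_key(device_id: str, timestamp: int) -> str:
    # Field promotion: move an identifying field out of the row data
    # and into the front of the row key, ahead of the timestamp.
    return f"{device_id}#{timestamp}"


def salted_key(timestamp: int, num_buckets: int = 4) -> str:
    # Salting: prefix the key with a small calculated value
    # (here the timestamp modulo an assumed number of buckets).
    return f"{timestamp % num_buckets}#{timestamp}"


def hashed_key(device_id: str, timestamp: int) -> str:
    # Hashing: prefix the key with a digest of its contents, which
    # spreads writes but makes range scans by time impractical.
    raw = f"{device_id}#{timestamp}"
    prefix = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:8]
    return f"{prefix}#{raw}"


def randomized_key(timestamp: int, rng: random.Random) -> str:
    # Randomization: prefix the key with a random value, so the
    # prefix cannot be recomputed later from the row's own fields.
    return f"{rng.randrange(16 ** 4):04x}#{timestamp}"


print(field_promotion_key("sensor-42", 1700000000))
print(salted_key(1700000001))
print(hashed_key("sensor-42", 1700000000))
print(randomized_key(1700000000, random.Random(0)))
```

Comparing the printed keys shows which techniques keep a given device's rows contiguous and which scatter them across the key space.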

What are all of the BigQuery operations that Google charges for?

A. Storage, queries, and streaming inserts
B. Storage, queries, and loading data from a file
C. Storage, queries, and exporting data
D. Queries and streaming inserts

Which of the following job types are supported by Cloud Dataproc? (Select 3 answers.)

A. Hive
B. Pig
C. YARN
D. Spark

The CUSTOM tier for Cloud Machine Learning Engine allows you to specify the number of which types of cluster nodes?

A. Workers
B. Masters, workers, and parameter servers
C. Workers and parameter servers
D. Parameter servers

Why do you need to split a machine learning dataset into training data and test data?

A. So you can try two different sets of features
B. To make sure your model is generalized for more than just the training data
C. To allow you to create unit tests in your code
D. So you can use one dataset for a wide model and one for a deep model
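
As a study aid, the kind of split this question refers to can be sketched in a few lines of plain Python. The helper name `train_test_split`, the 20% test fraction, and the fixed seed are assumptions made for illustration, not part of any particular library. The point is only that the held-out rows are never shown to the model during fitting, so scoring on them estimates behavior on unseen data.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    # Shuffle a copy so the caller's list is left untouched, then
    # carve off the first chunk as the held-out test set.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for labeled examples
train, test = train_test_split(rows)
print(len(train), len(test))
```

A model would be fitted on `train` only and evaluated on `test`; the two lists share no rows.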

Which of the following is NOT a valid use case to select HDD (hard disk drives) as the storage for Google Cloud Bigtable?

A. You expect to store at least 10 TB of data.
B. You will mostly run batch workloads with scans and writes, rather than frequently executing random reads of a small number of rows.
C. You need to integrate with Google BigQuery.
D. You will not use the data to back a user-facing or latency-sensitive application.

Which of the following statements about the Wide & Deep Learning model are true? (Select 2 answers.)

A. The wide model is used for memorization, while the deep model is used for generalization.
B. A good use for the wide and deep model is a recommender system.
C. The wide model is used for generalization, while the deep model is used for memorization.
D. A good use for the wide and deep model is a small-scale linear regression problem.

Which of the following statements is NOT true regarding Bigtable access roles?

A. Using IAM roles, you cannot give a user access to only one table in a project, rather than all tables in a project.
B. To give a user access to only one table in a project, grant the user the Bigtable Editor role for that table.
C. You can configure access control only at the project level.
D. To give a user access to only one table in a project, you must configure access through your application.

Which software libraries are supported by Cloud Machine Learning Engine?

A. Theano and TensorFlow
B. Theano and Torch
C. TensorFlow
D. TensorFlow and Torch

You want to use a BigQuery table as a data sink. In which writing mode(s) can you use BigQuery as a sink?

A. Both batch and streaming
B. BigQuery cannot be used as a sink
C. Only batch
D. Only streaming

Which of the following is NOT possible using primitive roles?

A. Give a user viewer access to BigQuery and owner access to Google Compute Engine instances.
B. Give UserA owner access and UserB editor access for all datasets in a project.
C. Give a user access to view all datasets in a project, but not run queries on them.
D. Give GroupA owner access and GroupB editor access for all datasets in a project.

Which methods can be used to reduce the number of rows processed by BigQuery?

A. Splitting tables into multiple tables; putting data in partitions
B. Splitting tables into multiple tables; putting data in partitions; using the LIMIT clause
C. Putting data in partitions; using the LIMIT clause
D. Splitting tables into multiple tables; using the LIMIT clause