
ACA Sample Exam Questions

Single selection
1. Scenario: Jack is the administrator of project prj1. The project involves a large volume of
sensitive data such as bank accounts and medical records. Jack wants to properly protect
the data. Which of the following statements is necessary?
a) set ProjectACL=true;
b) add accountprovider ram;
c) set ProjectProtection=true;
d) use prj1;

2. Where is the metadata (e.g., table schemas) stored in Hive?
a) Stored as metadata on the NameNode
b) Stored along with the data in HDFS
c) Stored in the RDBMS like MySQL
d) Stored in ZooKeeper

3. MaxCompute tasks comprise computational tasks and non-computational tasks. The
computational tasks require actual operations on data stored in the table: MaxCompute
parses the task to obtain its execution plan, and submits the task for execution. The
non-computational tasks require substantial reading of and modification to metadata
information; therefore, the task is not parsed, no execution plan is provided, and the
task is directly submitted for execution. The latter has a faster response speed than
the former. Which of the following operations on the table t_test is a computational
task?
a) desc t_test
b) alter table t_test add columns (comments string);
c) select count(*) from t_test;
d) truncate table t_test;

4. When we use the MaxCompute tunnel command to upload the log.txt file to the t_log
table, the t_log is a partition table and the partitioning column is (p1 string, p2 string).
Which of the following commands is correct?
a) tunnel upload log.txt t_log/p1="b1", p2="b2"
b) tunnel upload log.txt t_log/(p1="b1", p2="b2")
c) tunnel upload log.txt t_log/p1="b1"/p2="b2"
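
For reference, the odpscmd Tunnel upload command generally takes the form table/partition-spec. A hedged sketch follows, based on the Tunnel command style as recalled from MaxCompute documentation; verify the partition-spec delimiter against the current docs, since that is exactly what this question tests:

```sql
-- Sketch of an odpscmd Tunnel upload into a two-level partition
-- (the file, table, and partition values are the question's own).
-- General form: tunnel upload <local_file> <table>/<partition_spec>;
tunnel upload log.txt t_log/p1="b1",p2="b2";
```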

5. A Log table named log in MaxCompute is a partition table, and the partition key is dt. A
new partition is created daily to store the new data of that day. Now we have one
month's data, starting from dt='20180101' to dt='20180131', and we may use ________
to delete the data on 20180101.
a) delete from log where dt='20180101'
b) truncate table where dt='20180101'
c) drop partition log (dt='20180101')
d) alter table log drop partition(dt='20180101')
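
The partition-maintenance DDL named in the options can be sketched as follows (a minimal sketch of MaxCompute's ALTER TABLE ... DROP PARTITION syntax; the optional IF EXISTS guard is an addition not present in the options):

```sql
-- Drop one day's partition from the partitioned log table; all data stored
-- under dt='20180101' is removed with it. IF EXISTS avoids an error when
-- the partition is already gone.
alter table log drop if exists partition (dt='20180101');
```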

6. DataV is a powerful yet accessible data visualization tool, which features geographic
information systems allowing for rapid interpretation of data to understand
relationships, patterns, and trends. When a DataV screen is ready, it can be embedded into
the existing portal of the enterprise through ______.
a) URL after the release
b) URL in the preview
c) MD5 code obtained after the release
d) Jar package imported after the release

7. By integrating live dashboards, DataV can present and monitor business data
simultaneously. This data-driven approach enables well-organized data mining and
analysis, allowing the user to seize new opportunities that might otherwise remain
hidden. It supports a wide range of databases and data formats. Which of the following
options does DataV not support?
a) Alibaba Cloud's AnalyticDB, ApsaraDB
b) Static data in CSV and JSON formats
c) Oracle Database
d) MaxCompute Project

8. You want to understand more about how users browse your public website. For example,
you want to know which pages they visit prior to placing an order. You have a server farm
of 100 web servers hosting your website. Which is the most efficient way to gather the
logs from these web servers into a traditional Hadoop ecosystem?
a) Just copy them into HDFS using curl
b) Ingest the server web logs into HDFS using Apache Flume
c) Channel these clickstreams into Hadoop using Hadoop Streaming
d) Import all user clicks from your OLTP databases into Hadoop using Sqoop

9. Your company stores user profile records in an OLTP database. You want to join these
records with web server logs you have already ingested into the Hadoop file system.
What is the best way to obtain and ingest these user records?
a) Ingest with Hadoop streaming
b) Ingest using Hive
c) Ingest with sqoop import
d) Ingest with Pig's LOAD command

10. You are working on a project where you need to chain together MapReduce and Hive jobs.
You also need the ability to use forks, decision points, and path joins. Which ecosystem
project should you use to perform these actions?
a) Apache HUE
b) Apache Zookeeper
c) Apache Oozie
d) Apache Spark

Multiple selections

1. In DataWorks, we can configure alert policies to monitor periodically scheduled tasks, so that an alert will
be issued timely. Currently DataWorks supports ________ alerts.
(Number of correct answers: 2)
a) Email
b) Text message
c) Telephone
d) Aliwangwang
2. Which of the following task types does DataWorks support?
(Number of correct answers: 4)
a) Data Synchronization
b) SHELL
c) MaxCompute SQL
d) MaxCompute MR
e) Scala

3. In order to improve the processing efficiency when using MaxCompute, you can specify
the partition when creating a table. That is, several fields in the table are specified as
partition columns. Which of the following descriptions about MaxCompute partition
tables are correct? (Number of correct answers: 4)
a) In most cases, the user can consider a partition to be a directory under the file system
b) The user can specify multiple partitions, that is, multiple fields of the table are
considered as the partitions of the table, and the relationship among partitions is
similar to that of multiple directories
c) If the partition columns to be accessed are specified when using the data, then only
the corresponding partitions are read and a full table scan is avoided, which improves
processing efficiency and saves costs
d) MaxCompute partitions only support the string type, and conversion of any other type
is not allowed
e) The partition value cannot contain double-byte characters (such as Chinese)
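
The behavior described above can be sketched with MaxCompute SQL (the table and column names below are hypothetical illustrations, not part of the exam):

```sql
-- Two partition columns: each (region, dt) pair behaves like a nested
-- directory under the table's storage path.
create table if not exists t_sale (
    order_id bigint,
    amount   double
) partitioned by (region string, dt string);

-- Filtering on the partition columns lets MaxCompute read only the
-- matching partitions instead of scanning the full table.
select count(*) from t_sale where region = 'east' and dt = '20180101';
```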
4. In DataWorks, a task must be instantiated before each scheduled run, that is, a
corresponding instance is generated and executed to run the scheduled task. The status
differs in each phase of the scheduling process, including ________. (Number of correct
answers: 3)
a) Not running
b) Running
c) Running Successfully
5. Alibaba Cloud E-MapReduce can be easily plugged into other Alibaba Cloud services such
as Log Service, ONS, and MNS, which act as data ingestion channels for real-time data
streams. Which of the following descriptions about real-time processing are correct?
(Number of correct answers: 3)
a) This data is streamed and processed using Apache Flume or Kafka in integration with
Apache Storm using complex algorithms
b) Kafka is usually preferred with Apache Storm to provide a data pipeline
c) The final processed data can be stored in HDFS, HBase or any other big data store
service in real time
d) Apache Sqoop is used to do the real-time data transmission of structured data

True-or-false questions
1. One Alibaba Cloud account is entitled to join only one organization that uses DataWorks.
True
False
2. DataWorks can be used to create all types of tasks and configure scheduling cycles as
needed. The supported granularity levels of scheduling cycles include days, weeks,
months, hours, minutes and seconds.
True
False

3. MaxCompute SQL is suitable for processing less real-time massive data, and employs a
syntax similar to that of SQL. The efficiency of data query can be improved through
creating proper indexes in the table.
True
False
Corrected Q&As

1. Function Studio is a web project coding and development tool independently developed by the
Alibaba Group for function development scenarios. It is an important component of DataWorks.
Function Studio supports several programming languages and platform-based function development
scenarios, except for ______.

A. Real-time computing

B. Python

C. Java

D. Scala

My Answer: D

2. A business flow in DataWorks integrates different node task types by business type, a structure
that facilitates business code development. Which of the following descriptions about the node
type is INCORRECT?

A. A zero-load (virtual) node is a control node that does not generate any data. It is generally used
as the root node for planning the overall node workflow.

B. An ODPS SQL task allows you to edit and maintain the SQL code on the Web, and easily implement
code runs, debug, and collaboration. 

C. The PyODPS node in DataWorks can be integrated with MaxCompute Python SDK. You can edit the
Python code to operate MaxCompute on a PyODPS node in DataWorks.

D. The SHELL node supports standard SHELL syntax and the interactive syntax. The SHELL task can run on
the default resource group.

3. Apache Spark, included in Alibaba Cloud E-MapReduce (EMR), is a fast and general-purpose cluster
computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine
that supports general execution graphs. It also supports a rich set of higher-level tools. Which of
the following tools is not included in Spark?

A. Spark SQL for SQL and structured data processing

B.  MLlib for machine learning

C. GraphX for graph processing
D. TensorFlow for AI

My Answer: D. Another file says A. Siddesh corrected file A.

Correct Answer: A or D?

4. DataWorks provides two billing methods: Pay-As-You-Go (post-payment) and subscription
(pre-payment). When DataWorks is activated in pay-as-you-go mode, which of the following billing
items will not apply?

A. Shared resource groups for scheduling and Data Integration instances

B. Baseline instances generated by Intelligent Monitor and Data Quality checks

C. Task nodes created by developer

D. Calls and execution time of APIs compiled in DataService Studio

My Answer: C

5. Users can use major BI tools, such as Tableau and FineReport, to easily connect to MaxCompute
projects, and perform BI analysis or ad hoc queries. The quick query feature in MaxCompute, called
_________, allows you to provide services by encapsulating project table data in APIs, supporting
diverse application scenarios without data migration.

A. Lightning

B. MaxCompute Manager

C. Tunnel

D. Labelsecurity

6. If a MySQL database contains 100 tables, and Jack wants to migrate all those tables to MaxCompute
using DataWorks Data Integration, the conventional method would require him to configure 100 data
synchronization tasks. With the _______ feature in DataWorks, he can upload all tables at the same time.

A.  Full-Database Migration feature

B. Configure a MySQL Reader plug-in

C. Configure a MySQL Writer plug-in

D. Add data sources in Bulk Mode


My Answer: D. Another file says B. Siddesh corrected file A.

Correct Answer: A or B or D?

7 .Machine Learning Platform for Artificial Intelligence (PAI) node is one of the node types in DataWorks
business flow. It is used to call tasks created on PAI and schedule production activities based on the
node configuration. PAI nodes can be added to DataWorks only _________ .

A. after PAI experiments are created on PAI

B. after PAI service is activated

C. after MaxCompute service is activated

D. Spark on MaxCompute Machine Learning project is created

8 .In a scenario where a large enterprise plans to use MaxCompute to process and analyze its data, tens
of thousands of tables and thousands of tasks are expected for this project, and a project team of 40
members is responsible for the project construction and O&M. From the perspective of engineering,
which of the following can considerably reduce the cost of project construction and management?

A. Develop directly on MaxCompute and use script-timed scheduling tasks

B. Use DataWorks

C. Use Eclipse

D. Use a private platform specially developed for this project

My Answer: B

9. AliOrg Company plans to migrate their data with virtually no downtime. They want all data
changes that occur in the source database during the migration to be continuously replicated to
the target, allowing the source database to remain fully operational during the migration process.
After the migration is completed, the target database will remain synchronized with the source for
as long as they choose, allowing them to switch over the database at a convenient time. Which of
the following Alibaba Cloud products is the right choice:

A. Log Service

B. DTS(Data Transmission Service)

C. Message Service 

D. CloudMonitor
My Answer: B

10. There are three types of node instances in an E-MapReduce cluster: master, core, and _____ .

A. task

B. zero-load

C. gateway

D. agent

11 .A dataset includes the following items (time, region, sales amount). If you want to present the
information above in a chart, ______ is applicable.

A. Bubble Chart

B. Tree Chart

C. Pie Chart

D. Radar Chart

12. Alibaba Cloud Quick BI reporting tools support a variety of data sources, making it easy for
users to analyze and present their data from different data sources. ______ is not supported as a
data source yet.

A. Results returned from the API

B. MaxCompute

C. Local Excel files

D. MySQL RDS

My Answer: A. Another file says C. Siddesh corrected file C.

Correct Answer: A or C?
13. DataV is a powerful yet accessible data visualization tool, which features geographic information
systems allowing for rapid interpretation of data to understand relationships, patterns, and trends.
When a DataV screen is ready, it can be embedded into the existing portal of the enterprise through
______.

A. URL after the release

B. URL in the preview

C. MD5 code obtained after the release

D. Jar package imported after the release

My Answer: A. Another file says C.

Correct Answer: A or C?

14. Where is the metadata (e.g., table schemas) stored in Hive?

A. Stored as metadata on the NameNode

B. Stored along with the data in HDFS

C. Stored in the RDBMS like MySQL

D. Stored in ZooKeeper

My Answer: C. Siddesh corrected file A.

Correct Answer: A or C?

15 ._______ instances in E-MapReduce are responsible for computing and can quickly add computing
power to a cluster. They can also scale up and down at any time without impacting the operations of the
cluster.

A. Task

B. Gateway

C. Master

D. Core

My Answer: A
16. Your company stores user profile records in an OLTP database. You want to join these records with
web server logs you have already ingested into the Hadoop file system. What is the best way to obtain
and ingest these user records?

A. Ingest with Hadoop streaming

B. Ingest using Hive

C. Ingest with sqoop import

D. Ingest with Pig's LOAD command

My Answer: C. Other file says B. Siddesh corrected file B.
Correct Answer: B or C?

17. You are working on a project where you need to chain together MapReduce and Hive jobs. You also
need the ability to use forks, decision points, and path joins. Which ecosystem project should you use to
perform these actions?

A. Spark

B. HUE

C. Zookeeper

D. Oozie

My Answer: Other file says C. Another file says C.

Ans: D

18 .Which node type in DataWorks can edit the Python code to operate data in MaxCompute?

A. PyODPS

B. ODPS MR Node

C. ODPS Script Node

D. SHELL node

My Answer: A

19 .DataService Studio in DataWorks aims to build a data service bus to help enterprises centrally
manage private and public APIs. DataService Studio allows you to quickly create APIs based on data
tables and register existing APIs with the DataService Studio platform for centralized management and
release.  Which of the following descriptions about DataService Studio in DataWorks is INCORRECT?

A. DataService Studio is connected to API Gateway. Users can deploy APIs to API Gateway with one-
click. 

B. DataService Studio adopts the serverless architecture. All you need to care about is the query logic
of the APIs, not the infrastructure such as the running environment.

C. To meet the personalized query requirements of advanced users, DataService Studio provides a
custom Python script mode that allows you to compile the API query yourself. It also supports multi-table
association, complex query conditions, and aggregate functions.

D. Users can deploy any APIs created and registered in DataService Studio to API Gateway for
management, such as API authorization and authentication, traffic control, and metering.

My Answer: C

20. MaxCompute Tunnel provides highly concurrent data upload and download services. Users can use
the Tunnel service to upload data to or download data from MaxCompute. Which of the following
descriptions about Tunnel is NOT correct:

A. MaxCompute Tunnel provides the Java programming interface for users

B. MaxCompute provides two data import and export methods: using Tunnel operations on the console
directly, or using the Tunnel SDK written in Java

C. If data fails to be uploaded, use the restore command to restore the upload from where it was
interrupted

D. Tunnel commands are mainly used to upload or download data. They provide the following
functions: upload, download, resume, show, purge, etc.

My Answer: B

21. Which of the following is not proper for granting the permission on an L4 MaxCompute table to a
user? (L4 is a level in MaxCompute Label-based security (LabelSecurity); it is a required MaxCompute
Access Control (MAC) policy at the project space level. It allows project administrators to control the
user access to column-level sensitive data with improved flexibility.)

A. If no permissions have been granted to the user and the user does not belong to the project, add the
user to the project. The user does not have any permissions before they are added to the project.

B. Grant a specific operation permission to the user.


C. If the user manages resources that have labels, such as datasheets and packages with datasheets,
grant label permissions to the user. 

D. The user needs to create a project in simple mode

My Answer: D. Other file says A. Another file says A. Siddesh corrected file A.

Correct Answer: A or D?

22. MaxCompute supports two charging methods: Pay-As-You-Go and Subscription (CU cost).
Pay-As-You-Go means each task is measured according to the input size by job cost. In this charging
method, the billing items do not include charges due to ______.

A. Data upload

B. Data download

C. Computing

D. Storage

My Answer: B

23 .MaxCompute is a general purpose, fully managed, multi-tenancy data processing platform for large-
scale data warehousing, and it is mainly used for storage and computing of batch structured data. Which
of the following is not a use case for MaxCompute?

A. Order management

B. Data Warehouse

C. Social networking analysis

D. User profile

24 .Tom is the administrator of a project prj1 in MaxCompute. The project involves a large volume of
sensitive data such as user IDs and shopping records, and many data mining algorithms with proprietary
intellectual property rights. Tom wants to properly protect these sensitive data and algorithms. To be
specific, project users can only access the data within the project, all data flows only within the project.
What operation should he perform?

A. Use ACL authorization to set the status to read-only for all users

B. Use Policy authorization to set the status to read-only for all users
C. Allow the object creator to access the object

D. Enable the data protection mechanism in the project, using set ProjectProtection=true;

25 .There are multiple connection clients for MaxCompute, which of the following is the easiest way to
configure workflow and scheduling for MaxCompute tasks?

A. Use DataWorks

B. Use IntelliJ IDEA

C. Use MaxCompute Console

D. No supported tool yet

26. In MaxCompute, you can use the Tunnel command line for data upload and download. Which of the
following descriptions of Tunnel commands is NOT correct:

A. Upload: Supports file or directory (level-one) uploading. Data can only be uploaded to a single table or
table partition each time.

B. Download: You can only download data to a single file. Only data in one table or partition can be
downloaded to one file each time. For partitioned tables, the source partition must be specified.

C. Resume: If an error occurs due to the network or the Tunnel service, you can resume transmission of
the file or directory after interruption.

D. Purge: Clears the table directory. By default, use this command to clear information of the last three
days.

My Answer: B. Siddesh corrected file D.

Correct Answer: B or D?

27. Scenario: Jack is the administrator of project prj1. A new team member, Alice (who already has an
Alibaba Cloud account alice@aliyun.com), applies to join this project with the following permissions:
view table lists, submit jobs, and create tables. Which of the following SQL statements is useless:

A. use prj1;

B. add user aliyun$alice@aliyun.com;

C. grant List, CreateTable, CreateInstance on project prj1 to user aliyun$alice@aliyun;

D. flush privileges;
28. Spark is an open-source framework that functions on the service level to support data processing
and analysis operations. Equipped with unified computing resources and data set permissions, Spark on
MaxCompute allows you to submit and run jobs while using your preferred development methods.
Which of the following descriptions about Spark on MaxCompute is NOT correct:

A. Spark on MaxCompute provides you with native Spark Web UIs.

B. Different versions of Spark can run in MaxCompute at the same time.

C. Similar to MaxCompute SQL and MaxCompute MapReduce, Spark on MaxCompute runs in the unified
computing resources activated for MaxCompute projects.

D. Spark on MaxCompute has a separate permission system which will not allow users to query data
without any additional permission modifications required.

My Answer: B

29 .In MaxCompute command line, if you want to view all tables in a project, you can execute command:
______.

A. show tables;

B. use tables;

C. desc tables;

D. select tables;

30 .When odpscmd is used to connect to a project in MaxCompute, the command ______ can be
executed to view the size of the space occupied by table table_a.

A. select size from table_a;

B. size table_a;

C. desc table_a;

D. show table table_a;
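
A brief sketch of the odpscmd metadata commands behind questions 29 and 30 (output formats vary by MaxCompute version, so treat the described output as an assumption to verify):

```sql
-- Question 29: list all tables in the current project.
show tables;

-- Question 30: describe table_a; the output includes the schema, owner,
-- and a Size field showing the physical space the table occupies.
desc table_a;
```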


True/False
31 .Data Migration Unit (DMU) is used to measure the amount of resources consumed by data
integration, including CPU, memory, and network. One DMU represents the minimum amount of
resources used for a data synchronization task.

True

False

My Answer: True

32 .DataWorks can be used to create all types of tasks and configure scheduling cycles as needed. The
supported granularity levels of scheduling cycles include days, weeks, months, hours, minutes and
seconds.

True

False

My Answer: False

33 .If a task node of DataWorks is deleted from the recycle bin, it can still be restored.

True

False

34. If the DataWorks (MaxCompute) tables in your request belong to two owners, Data Guard
(a DataWorks component) automatically splits your request into two by table owner.

True

False

35 .The FTP data source in DataWorks allows you to read/write data to FTP, and supports configuring
synchronization tasks in wizard and script mode.

True

False

My Answer: True
36 .In each release of E-MapReduce, the software and software version are flexible. You can select
multiple software versions.

True

False

37 .Alibaba Cloud Elastic MapReduce (E-MapReduce) is a big data processing solution to quickly process
huge amounts of data. Based on open source Apache Hadoop and Apache Spark, E-MapReduce flexibly
manages your big data use cases such as trend analysis, data warehousing, and analysis of continuously
streaming data.

True

False

My Answer: True

38. An enterprise uses Alibaba Cloud MaxCompute for storage of service orders, system logs, and
management data. Because the security levels of the data are different, multiple Alibaba Cloud
accounts need to be registered for data management.

True

False

My Answer: False

39 .JindoFS in E-MapReduce provided by SmartData uses OSS as the storage back end. 

True

False

My Answer: True

40 .In DataWorks table permission system, you can revoke permissions only on the fields whose security
level is higher than the security level of your account.

True

False

My Answer: True
41 .Project is an important concept in MaxCompute. A user can create multiple projects, and each object
belongs to a certain project.

True

False

My Answer: True

42 .Assume that Task 1 is configured to run at 02:00 each day. In this case, the scheduling system
automatically generates a snapshot at the time predefined by the periodic node task at 23:30 each day.
That is, the instance of Task 1 will run at 02:00 the next day. If the system detects the upstream task is
complete, the system automatically runs the Task 1 instance at 02:00 the next day.

True

False

My Answer: True

43. In MaxCompute, if an error occurs in Tunnel transmission due to the network or the Tunnel service,
the user can resume the interrupted transmission through the command tunnel resume;.

True

False

My Answer: True

44. A company originally handled its local data services through Java programs. The local data has
been migrated to MaxCompute on the cloud; now the data can be accessed by modifying the Java
code and using the Java APIs provided by MaxCompute.

True

False

My Answer: True

45 .MaxCompute takes Project as a charged unit. The bill is charged according to three aspects: the
usage of storage, computing resource, and data download respectively. You pay for compute and
storage resources by the day with no long-term commitments.

True
False

My Answer: True

46 .There are various methods for accessing to MaxCompute, for example, through management
console, client command line, and Java API. Command line tool odpscmd can be used to create, operate,
or delete a table in a project.

True

False

My Answer: True

47. A start-up company wants to use Alibaba Cloud MaxCompute to provide product recommendation
services for its users. However, the company does not have many users at the initial stage, while the
charge for MaxCompute is higher than that of ApsaraDB RDS, so the company should not be
recommended to use the MaxCompute service until the number of its users increases to a certain size.

True

False

48 .Synchronous development in DataWorks provides both wizard and script modes.

True

False

My Answer: True

49 .MaxCompute SQL is suitable for processing less real-time massive data, and employs a syntax similar
to that of SQL. The efficiency of data query can be improved through creating proper indexes in the
table.

True

False

My Answer: B. Siddesh corrected file says True.

Correct Answer: True or False?


50 .Table is a data storage unit in MaxCompute. It is a two-dimensional logical structure composed of
rows and columns. All data is stored in the tables. Operating objects of computing tasks are all tables. A
user can perform create table, drop table, and tunnel upload as well as update the qualified data in the
table.

True

False

My Answer: True

51. Which of the following Hadoop ecosystem components can you choose to set up a streaming log
analysis system? (Number of correct answers: 3)

A. Apache Flume

B. Apache Kafka

C. Apache Spark

D. Apache Lucene

52. Distributed file systems like GFS and HDFS are designed to have much larger block (or chunk)
sizes, like 64 MB or 128 MB. Which of the following descriptions are correct? (Number of correct
answers: 4)

A. It reduces clients' need to interact with the master because reads and writes on the same block (or
chunk) require only one initial request to the master for block location information

B. Since, on a large block (or chunk), a client is more likely to perform many operations on a given
block, it can reduce network overhead by keeping a persistent TCP connection to the metadata server
over an extended period of time

C. It reduces the size of the metadata stored on the master

D. The servers storing those blocks may become hot spots if many clients are accessing the same small
files

E. If necessary to support even larger file systems, the cost of adding extra memory to the metadata
server is a big price

My Answer: Other file says ABCDE.

Correct Answer: ABCD or ABCDE?

53 .MaxCompute can coordinate multiple users to operate one project through ACL authorization. The
objects that can be authorized by ACL include ______. (Number of correct answers: 3)
A. Project

B. Table

C. Resource

D. Procedure

E. Job

My Answer: ACD. Another file says ABC.

Correct Answer: ABC or ACD?

54 .DataWorks can be used to develop and configure data sync tasks. Which of the following statements
are correct? (Number of correct answers: 3)

A. The data source configuration in the project management is required to add data source

B. Some of the columns in source tables can be extracted to create a mapping relationship between
fields, and constants or variables can't be added

C. For the extraction of source data, "where" filtering clause can be referenced as the criteria of
incremental synchronization

D. Clean-up rules can be set to clear or preserve existing data before data write

My Answer: A,B,D. Another file say ABCD.

Correct Answer: ABD or ABCD?

55. The data development mode in DataWorks has been upgraded to a three-level structure
comprising _____, _____, and ______. (Number of correct answers: 3)

A. Project

B. Solution

C. Business flow

D. Directory

My Answer: A,B,C

56 .In DataWorks, we can configure alert policies to monitor periodically scheduled tasks, so that an
alert will be issued timely. Currently DataWorks supports ________ alerts. (Number of correct answers:
2)

A. Email
B. Text message

C. Telephone

D. Aliwangwang

My Answer: A,B

57. DataWorks provides powerful scheduling capabilities, including time-based or dependency-based
task trigger mechanisms, to perform tens of millions of tasks accurately and punctually each day
based on DAG relationships. It supports multiple scheduling frequency configurations like: (Number
of correct answers: 4)

A. By Minute

B. By Hour

C. By Day

D. By Week

E. By Second

My Answer: A,B,C,D

58 .MaxCompute is a fast and fully-managed TB/PB-level data warehousing solution provided by Alibaba
Cloud. Which of the following product features are correct? ______ (Number of correct answers: 3)

A. Distributed architecture

B. High security and reliability

C. Multi-level management and authorization

D. Efficient transaction processing

E. Fast real-time response

My Answer: A,B,E

59. Resource is a particular concept of MaxCompute. If you want to use a user-defined function (UDF)
or MapReduce, resources are needed. For example: after you have prepared a UDF, you must upload
the compiled JAR package to MaxCompute as a resource. Which of the following objects are
MaxCompute resources? (Number of correct answers: 4)

A. Files

B. Tables: Tables in MaxCompute


C. Jar: Compiled Java jar package

D. Archive: Recognize the compression type according to the postfix in the resource name

E. ACL Policy

My Answer: Other file says ABCD. Other file says ABCDE.

Correct Answer: ABCD or ABCDE?

60. In order to ensure smooth processing of tasks in the DataWorks data development kit, you must
create an AccessKey. An AccessKey is primarily used for access permission verification between
various Alibaba Cloud products. The AccessKey consists of two parts: ____. (Number of correct
answers: 2)

A. Access Username

B. Access Key ID 

C. Access Key Secret

D. Access Password

My Answer: B,C

61. DataWorks uses MaxCompute as the core computing and storage engine to provide massive offline
data processing, analysis, and mining capabilities. It introduces both simple and standard modes
workspace. Which of the following descriptions about DataWorks Workspace and MaxCompute Project
is INCORRECT?

A. Simple mode refers to a DataWorks Workspace that corresponds to a single MaxCompute Project and
cannot set up separate Development and Production Environments

B. The advantage of simple mode is fast iteration: code takes effect as soon as it is submitted, without
publishing. The risk of simple mode is that the development role is privileged enough to delete the
tables under the project, so there is a table-permission risk.

C. Standard mode refers to a DataWorks Workspace corresponding to two MaxCompute Projects, which
can be set up as dual Development and Production Environments. This improves code development
standards, allows table permissions to be strictly controlled, prohibits direct operations on tables in the
Production Environment, and guarantees the data security of production tables.

D. All Task edits can be performed in the Development Environment, and the Production Environment
Code can also be directly modified

My Answer: D

Correct Answer: D (Production Environment code cannot be modified directly; changes are made in the Development Environment and then published)
62. MaxCompute provides SQL and MapReduce for calculation and analysis services. Which of the
following descriptions about MaxCompute and SQL is NOT correct:

A. In MaxCompute, data is stored in forms of tables, MaxCompute provides a SQL query function for the
external interface

B. You can operate MaxCompute much like traditional database software, but it is worth mentioning that
MaxCompute SQL does not support transactions, indexes, or Update/Delete operations

C. MaxCompute SQL syntax differs from Oracle and MySQL, so users cannot migrate SQL statements
from other databases into MaxCompute seamlessly

D. MaxCompute SQL can complete a query in minutes or even seconds, and it is able to return results in
milliseconds without using another processing engine.

My Answer: D

Correct Answer: D
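Option B can be illustrated with a sketch: since classic MaxCompute SQL has no UPDATE or DELETE, changed rows are typically re-materialized by rewriting a partition with INSERT OVERWRITE. The table and column names below are hypothetical:

```sql
-- Hypothetical table t_user: simulate an UPDATE by rewriting the partition.
-- Classic MaxCompute SQL has no UPDATE/DELETE, so the changed rows are
-- recomputed and the whole partition is overwritten in one pass.
INSERT OVERWRITE TABLE t_user PARTITION (dt='20180101')
SELECT user_id,
       CASE WHEN user_id = '1001' THEN 'new_name' ELSE user_name END AS user_name
FROM t_user
WHERE dt='20180101';
```

This overwrite-the-partition pattern is also why the exam's log tables are partitioned by date: a day's data can be replaced or dropped as a unit.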

63. DataWorks provides scheduling capabilities including time-based or dependency-based task trigger
functions to perform tens of millions of tasks accurately and timely each day, based on DAG
relationships. Which of the following descriptions about scheduling and dependency in DataWorks is
INCORRECT?

A. Users can configure an upstream dependency for a task. In this way, even if the current task instance
reaches its scheduled time, the task only runs after its upstream task instance is completed.

B. If no upstream task is configured, the current task is triggered by the project by default. As a
result, the default upstream task of the current task is project_start in the scheduling system. By default,
a project_start task is created as a root task for each project.

C. If the task is submitted after 23:30, the scheduling system automatically generates cyclic instances
starting from the second day and runs them on time.

D. Only after a task is submitted to the scheduling system does the system automatically generate an
instance for the task at each time point according to its scheduling attribute configuration, and
periodically run the task from the second day.

My Answer: C

Correct Answer: C (tasks submitted after 23:30 only generate instances starting from the third day, not the second)

64. E-MapReduce simplifies big data processing, making it easy, fast, scalable and cost-effective for you
to provision distributed Hadoop clusters and process your data. This helps you to streamline your
business through better decisions based on massive data analysis completed in real time. Which of the
following descriptions about E-MR is NOT true?
A. E-MapReduce allows you to simply select the required ECS model (CPU or memory) and disks, along
with the required software, for automatic deployment

B. Saves extra overheads involved in managing the underlying instances

C. Seamless integration with other Alibaba Cloud products to be used as the input source or output
destination of Hadoop/Spark calculation engine

D. It supports the Pay-As-You-Go payment method, which means that the cost of each task is measured
according to the input size

My Answer: D

Correct Answer: D (E-MapReduce Pay-As-You-Go billing is based on cluster specifications and running time, not on the input size of each task)

65. When a local file is uploaded to Quick BI for presentation, the data is stored in ______.

A. Exploration space built in Quick BI

B. MaxCompute built in Quick BI

C. AnalyticDB

D. Client local cache

66. Which HDFS daemon or service manages all the metadata stored in HDFS?

A. secondary namenode

B. namenode

C. datanode

D. node manager

67. Which of the following descriptions about MaxCompute security is NOT correct:

A. MaxCompute supports two account systems: the Alibaba Cloud account system and RAM user system

B. MaxCompute recognizes RAM users but cannot recognize RAM permissions. That is, you can add RAM
users under your Alibaba Cloud account to a MaxCompute project. However, MaxCompute does not
consider the RAM permission definitions when it verifies the permissions of RAM users.

C. LabelSecurity is a workspace-level mandatory access control (MAC) policy that enables workspace
administrators to control user access to row-level sensitive data more flexibly.
D. MaxCompute users can share data and resources, such as tables and functions, among workspaces by
using packages.

My Answer: C

Correct Answer: C (LabelSecurity controls access to column-level, not row-level, sensitive data; B matches the documented behavior for RAM users)
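Option D, sharing tables across projects via packages, can be sketched with the MaxCompute package commands; the project, package, and table names below are hypothetical:

```sql
-- In the provider project prj_a: bundle a table and authorize another project.
CREATE PACKAGE datashare;
ADD TABLE t_log TO PACKAGE datashare WITH PRIVILEGES Select;
ALLOW PROJECT prj_b TO INSTALL PACKAGE datashare;

-- In the consumer project prj_b: install the package to use the shared table.
INSTALL PACKAGE prj_a.datashare;
```

After installation, users in prj_b can reference the shared table as prj_a.t_log without being added as members of prj_a.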

68. MaxCompute SQL is suitable for scenarios where there is massive data (TB level) to be processed and
the real-time requirement is not high. It takes seconds or even minutes to prepare and submit each
job, so MaxCompute SQL is not suitable for services that need to process thousands to tens of
thousands of transactions per second. Which of the following descriptions about MaxCompute SQL is
NOT correct:

A. The syntax of ODPS SQL is similar to SQL. It can be considered a subset of standard SQL

B. MaxCompute SQL is not equivalent to a database; it lacks database characteristics in many
aspects, such as transactions, primary key constraints, and indexes

C. At present, the maximum length of SQL in MaxCompute is 2MB

D. MaxCompute SQL is 100% equivalent to Hive SQL

69. By default, the resource group in DataWorks provides you with 50 slots, and each DMU occupies 2
slots. This means the default resource group supports 25 DMUs at the same time.

True

False

70. JindoFS is a cloud-native file system that combines the advantages of OSS and local storage. JindoFS
is also the next-generation storage system that provides efficient and reliable storage services for cloud
computing. To use JindoFS, select the related services when creating an E-MapReduce cluster.

True

False

My Answer: True

71. A partition table can be created through the following statement in MaxCompute SQL:

             create table if not exists t_student(
                 name string,
                 number string)
             partitioned by (department string);

True

False
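The DDL above is valid MaxCompute SQL (True). As a follow-up sketch, a typical workflow then creates a partition explicitly, loads rows into it, and queries with partition pruning; the partition value and rows below are hypothetical:

```sql
-- Explicitly create a partition, then target it on insert.
ALTER TABLE t_student ADD IF NOT EXISTS PARTITION (department='cs');
INSERT INTO TABLE t_student PARTITION (department='cs')
VALUES ('Alice', '2023001');

-- Filtering on the partition column prunes the scan to one partition.
SELECT name, number FROM t_student WHERE department='cs';
```

Note that the partition column department is not declared in the column list; in MaxCompute (as in Hive), partition columns are defined only in the partitioned by clause.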

72. The E-MapReduce (EMR) Auto Scaling feature is designed to reduce costs and improve execution
efficiency. Which of the following descriptions about EMR Auto Scaling are correct? (Number of correct
answers: 3)

A. Auto Scaling only supports scaling in and scaling out a cluster by adding or removing task nodes.

B. Scale by Time is recommended as the rule type if you can specify the time to scale a cluster.

C. Scale by Rule is recommended as the rule type if you cannot specify the time to scale a cluster and
need to add and remove computing resources based on the specified YARN metrics.

D. Auto Scaling only supports Pay-As-You-Go Hadoop clusters.

My Answer: A,B,D

Correct Answer: ABD?

73. DataWorks App Studio is a tool designed to facilitate your data product development. It comes with
a rich set of frontend components that you can drag and drop to easily and quickly build frontend apps.
With App Studio, you do not need to download and install a local integrated development environment
(IDE) or configure and maintain environment variables. Instead, you can use a browser to write, run, and
debug apps and enjoy the same coding experience as that in a local IDE. App Studio also allows you to
publish apps online. Which of the following descriptions about APP Studio in DataWorks is CORRECT?
(Number of correct answers: 3)

A. App Studio comes with all breakpoint types and operations of a local IDE. It supports thread switching
and filtering, variable viewing and watching, remote debugging, and hot code replacement.

B. You can directly access the runtime environment, which is currently built based on MacOS as the base
image.

C. You and your team members can use App Studio to share the development environment for
collaborative coding. 

D. App Studio supports real-time collaborative coding. Multiple collaborators of a team can develop and
write code at the same time in the same project, and view changes made by other collaborators in real
time. This feature helps avoid the hassle of synchronizing code and merging branches and significantly
improve the development efficiency.

E. APP Studio is included in Basic edition of DataWorks

My Answer: A,B,C

Correct Answer: ABC?

74. MaxCompute Graph is a processing framework designed for iterative graph computing.
MaxCompute Graph jobs use graphs to build models. Graphs are composed of vertices and edges.
Which of the following operations can MaxCompute support? (Number of correct answers: 3)

A. Modify the value of a vertex or edge.

B. Add/delete a vertex.

C. Add/delete an edge.

D. When editing a vertex and an edge, you don't have to maintain their relationship.

75. There are various methods for connecting to and using MaxCompute. Which of the following options
have lower limits on the size of uploaded files? ______. (Number of correct answers: 2)

A. DataWorks

B. IntelliJ IDEA

C. MaxCompute Tunnel

D. Alibaba DTS

My Answer: A,D
