
Assigned: 2021/12/28

Big Data Analysis SOL05    Due: -

Part I: Glossary (20 pts., each 4 pts.)

1. Data Architect
• The Data Architect is responsible for establishing and maintaining the company's data storage equipment and technical standards, planning the hardware and software architecture, and ensuring that the overall data storage system can support future data volumes and analysis requirements.
• Constructs the overall system infrastructure and architecture

2. Data Scientist
• Statisticians who can extract information from large data sets through big data analysis and perform statistical analysis and inference
• Data analysts who can skillfully perform analysis and draw conclusions or evaluate recommendations
• A general term for "senior data analysts"

3. Domain Expert
• Has sufficient knowledge and experience in their area of expertise and a certain level of problem-solving ability in that area
• Usually acts as a supporter or advisor to data analysts during analysis, either as a solution provider or as a proxy who evaluates the thinking of potential users (proxy opinion)
• Needs to coordinate with data analysts, or work through project managers, to translate their expertise into knowledge and rules in the data system

4. Data Analyst
• Data analysts mainly analyze and interpret organized data, trying to extract information from it to draw conclusions for decision-making or to identify trends for prediction.
• Responsible for the analysis and application stages of data processing
• The work is more exploratory: it tries to solve unknown problems, does not necessarily find the right solution, and offers no guarantee of results (results may even be hard to predict), so business outcomes are hard to guarantee; however, the results can often significantly affect company operations

5. Data Engineer
• Data engineers are mainly responsible for the sourcing, collection, import, and preprocessing of data, as well as for setting up, structuring, and configuring the enterprise's data systems and frameworks.
• Handle cleaning, conversion, and integration in the data collection and processing stages
• Focus on developing and maintaining server-side and database-related functions; the nature of the work is basically similar to that of a software engineer.

Part II: Short-Answer Questions (80 pts., each 8 pts.)

1. What are the talent demand directions in big data analysis? (three)
• Big data architecture: infrastructure and data architecture
• Big data analysis: system modeling and data analysis
• Big data development: data applications and implementation

2. What are the emerging jobs in big data analysis? (four)
• Data Architect
• Data Scientist
• Data Analyst
• Data Engineer

3. What are the categories of data analysts? (two)
• Technical orientation
• Business orientation

4. Please list five job titles of big data analysis talent and the abilities each requires.
• Chief Information Officer/Project Manager: strategic analysis, team management, corporate communication, direction planning, foundational knowledge of data science management and application, etc.
• Data Scientist/Domain Expert: problem definition/clarification, logical thinking, cross-disciplinary integration/collaboration, programming and system development, machine learning and artificial intelligence, etc.
• Data Analyst: statistical analysis, database applications, programming and development, data exploration, big data processing, etc.
• Data Architect: data storage management, relational database systems, distributed data storage systems, system architecture planning and integration, etc.
• Data Engineer: open-source software frameworks/system applications (Hadoop, Spark, etc.), programming and system development, development of ETL (extraction/transformation/loading) processes, building data pipelines, system integration, etc.
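
The ETL (extraction/transformation/loading) process listed for data engineers can be illustrated with a minimal sketch. It assumes a small hypothetical CSV of raw sales records, uses Python's standard `csv` and `sqlite3` modules, and stands in an in-memory SQLite database for a real data warehouse; the table name `sales` and the cleaning rules are illustrative assumptions, not part of the course material.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: inconsistent casing, stray spaces, and a missing amount.
RAW_CSV = """region,amount
 north ,120
South,80
east,
NORTH,30
"""


def extract(text):
    """Extract: read raw rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize region names and drop rows without an amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((row["region"].strip().lower(), int(amount)))
    return cleaned


def load(conn, records):
    """Load: write cleaned records into a target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(conn, transform(extract(RAW_CSV)))
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # prints [('north', 150), ('south', 80)]
```

Keeping the three stages as separate functions mirrors how a data pipeline is usually organized: each stage can be tested, replaced, or scheduled independently.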

5. What are the main considerations when optimizing a big data analysis system architecture?
• Trade-off between speed and resources
• Trade-off between stability and resources
• Trade-off between scalability and resources

6. What are the trade-offs between system speed and resources?
• Whether the processing structure is reasonable
• Whether the logic is reasonable
• Whether resources are unreasonably allocated or insufficient
• Whether the data storage and access structure is reasonable

7. What are the trade-offs between system stability and resources?
• A fast server is better than a slow server
• A slow server is better than a crashed server
• A server on standby is better than a broken server
• Evaluating a hybrid of local servers and cloud services
• Heat dissipation and stability of the server hardware

8. What are the trade-offs between system scalability and resources?
• Choice of system architecture
• Relational database creation and file storage methods
• Creation and selection of data/server nodes

9. Please list the contents of system architecture optimization (software and hardware systems).
• Data collection: [Soft] web data collection (crawlers), operational data collection (logs)
  [Hard] enterprise network (intranet/extranet), data storage servers
• Data storage: [Soft] data warehousing, relational databases (structured/unstructured)
  [Hard] distributed data systems, data storage servers, in-memory storage
• Data processing: [Soft] batch processing, message queues, real-time processing
  [Hard] in-memory storage, in-memory computing, computing servers
• Data retrieval: [Soft] query matching, data correlation, distributed search
  [Hard] distributed data systems, data storage/computing servers, in-memory storage and computing
• Data mining: [Soft] data mining, machine learning
  [Hard] data storage/computing servers, in-memory storage and computing
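
To make the [Soft] items under data processing concrete, the sketch below contrasts batch processing with real-time processing fed by a message queue. It is a single-process illustration under simplifying assumptions: Python's standard `queue.Queue` stands in for a real message queue (a production system would use a dedicated broker), and the event values are hypothetical.

```python
import queue
import threading

# Hypothetical stream of events (e.g., page-view counts parsed from logs).
EVENTS = [3, 1, 4, 1, 5, 9, 2, 6]


def batch_total(events):
    """Batch processing: wait until all data is collected, then compute once."""
    return sum(events)


def stream_totals(events):
    """Real-time processing: a consumer updates a running total per message."""
    q = queue.Queue()  # stand-in for a message queue
    running = []

    def consumer():
        total = 0
        while True:
            item = q.get()
            if item is None:  # sentinel marks the end of the stream
                break
            total += item
            running.append(total)  # a result is available after every event

    worker = threading.Thread(target=consumer)
    worker.start()
    for event in events:  # producer publishes events to the queue
        q.put(event)
    q.put(None)
    worker.join()
    return running


print(batch_total(EVENTS))    # prints 31: one result at the end
print(stream_totals(EVENTS))  # prints [3, 4, 8, 9, 14, 23, 25, 31]
```

Both approaches reach the same final total; the difference is latency. Batch processing uses resources in one burst, while the queue-based consumer spends resources continuously in exchange for up-to-date results.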

10. Please list the data analysis and processing stages with the corresponding tools.
• Visualization
  - Offline interface/report: Excel, PowerPoint, Tableau, ...
  - Offline programming: R, SAS, Python, Processing, ...
  - Online interface: ECharts, Tagxedo, cloud services, ...
• Data Analysis
  - Interface operation: Excel, SPSS, ...
  - Programming: VBA, Python, R, SAS, ...
• Data Modeling
  - Interface operation: SPSS, ...
  - Programming (special-purpose languages): R, SAS, ...
  - Programming (general-purpose languages): Python, ...
• Data Storage
  - Database: SQL, Hadoop, Hive, ...
  - Interface operation: Excel, SPSS, ...
  - Programming: VBA, Python, R, SAS, ...
• Data Collecting
  - Database: SQL, Hadoop, Hive, ...
  - Crawler: Python, Java, PHP, C/C++, ...
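
The crawler row lists Python among the data collection tools. A crawler's core step is fetching a page and extracting the links to visit next; the minimal sketch below shows only the extraction half, using the standard-library `html.parser` on a hypothetical HTML string so it runs without network access. The page content and `example.com` URLs are placeholders; a real crawler would also download pages (e.g., with `urllib.request`), respect `robots.txt`, and de-duplicate visited URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


# Hypothetical page content and URL, used instead of a live download.
PAGE = """
<html><body>
  <a href="/news/1">News 1</a>
  <a href="https://example.com/about">About</a>
  <p>No link here</p>
</body></html>
"""

parser = LinkExtractor("https://example.com/index.html")
parser.feed(PAGE)
print(parser.links)  # prints ['https://example.com/news/1', 'https://example.com/about']
```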

11. What are the environment and components of data visualization tools?
• Currently, most of them are built on a web environment.
• They mainly use HTML5 (markup syntax) and CSS3 (Cascading Style Sheets), together with various JavaScript libraries and HTML DOM (Document Object Model) containers or APIs, to implement the supporting modules that build a visual interface.
• Applications are usually combined with the Apache HTTP Server (web server) and an SQL server (database server), and JSON (JavaScript Object Notation) is commonly used to convert structured data into JavaScript objects.
• Open-source tools are mainly implemented as JavaScript libraries, which let users quickly create charts with automatic and animated updates through APIs.
• They also provide access, through JavaScript, to control elements written in other programming languages (e.g., chart components written in C/C++ in Visual Studio).
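
The JSON step described above can be sketched from the server side: rows queried from a database are serialized into a JSON string that a JavaScript charting library can parse into objects (e.g., with `JSON.parse`). The sketch assumes Python's standard `json` and `sqlite3` modules, a hypothetical `visits` table, and a generic `{labels, values}` shape; the exact structure a given charting library expects will differ.

```python
import json
import sqlite3

# Hypothetical data source standing in for the database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (month TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("2021-10", 120), ("2021-11", 150), ("2021-12", 90)],
)

rows = conn.execute("SELECT month, total FROM visits ORDER BY month").fetchall()

# Reshape structured rows into a chart-friendly object, then serialize to JSON.
chart_data = {
    "labels": [month for month, _ in rows],
    "values": [total for _, total in rows],
}
payload = json.dumps(chart_data)
print(payload)
# prints {"labels": ["2021-10", "2021-11", "2021-12"], "values": [120, 150, 90]}
```

The web server would return `payload` to the browser, where the JavaScript library turns it back into objects and draws the chart.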

