1. Data Architect
The Data Architect is responsible for establishing and maintaining the company's data storage
related equipment and technical benchmarks, planning the hardware and software operational
architecture, and ensuring that the overall data storage system can support future data volume and
analysis requirements.
Construct the overall system infrastructure and architecture
2. Data Scientist
Statisticians who can extract information from large data sets through big data
analysis and perform statistical inference
Data analysts who can skillfully perform analysis and draw conclusions or evaluate
recommendations
General term for "senior data analysts"
3. Domain Expert
Have sufficient knowledge and experience in their respective areas of expertise and have a certain
level of problem solving ability in that area
Usually act as a support or advisor to the data analysts in their analysis, either as a solution
provider or as a proxy for evaluating the thinking of potential users (proxy opinion)
Need to coordinate with data analysts, directly or through project managers, to translate their
expertise into knowledge and rules in the data system
4. Data Analyst
Data analysts mainly analyze and interpret the organized data and try to get information from it to
draw conclusions for judgment or to identify trends for prediction.
Analysis and application in the data processing stages
The work is more exploratory: it tries to solve unknown problems, will not necessarily find the
right solution, and cannot guarantee results (which are hard even to predict) or business
outcomes. When results do come, however, they can usually significantly affect the operation of
the company
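As a rough illustration of "identifying trends for prediction", the sketch below fits a straight line to a short series by ordinary least squares and extrapolates one step ahead. The monthly order counts, `fit_line`, and `predict` are hypothetical names and made-up numbers, not taken from any real data set, and a real analysis would rarely be this simple.

```python
# Hypothetical monthly order counts (made-up numbers for illustration only).
months = [1, 2, 3, 4, 5, 6]
orders = [100, 110, 120, 130, 140, 150]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(months, orders)


def predict(month):
    """Extrapolate the fitted trend line to a given month."""
    return slope * month + intercept


print(f"trend: {slope:.1f} orders/month, forecast for month 7: {predict(7):.1f}")
```

Because the sample series is perfectly linear, the fitted line reproduces it exactly. With real data the fit would only approximate the trend, which is part of why the results of this kind of work are hard to guarantee.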
5. Data Engineer
Data engineers are mainly responsible for securing data sources and for the collection, import,
and pre-processing of data, as well as verifying and integrating the setup, structure, and
configuration of data systems and frameworks in the enterprise.
Cleaning, conversion, and integration in the data collection and processing stages
Focus on developing and maintaining server-side and database-related functions; the
nature of the work is basically similar to that of a software engineer.
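The cleaning, conversion, and integration steps described above can be sketched as a tiny extract/transform/load pipeline. This is only a minimal illustration using the Python standard library: the CSV sample, the field names, and the `extract`, `transform`, and `load` helpers are all hypothetical, and a plain list stands in for the real target data store.

```python
import csv
import io

# Hypothetical raw export with inconsistent formatting (made-up sample data).
RAW_CSV = """customer_id,city,amount
1, Taipei ,100
2,taipei,
3,Kaohsiung,250.5
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, normalize city names, convert types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # cleaning: skip incomplete records
        cleaned.append({
            "customer_id": int(row["customer_id"]),
            "city": row["city"].strip().title(),
            "amount": float(amount),
        })
    return cleaned


def load(rows, target):
    """Load: append cleaned rows to the target store (a plain list here)."""
    target.extend(rows)
    return target


warehouse = load(transform(extract(RAW_CSV)), [])
print(warehouse)
```

In practice each stage would read from and write to real systems (files, databases, message queues), but the division of work into three stages stays the same.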
1. What are the three directions of talent demand in big data analysis?
Big data architecture: the level/direction of infrastructure and data architecture
Big data analysis: system modeling and data analysis
Big data development: focus on data application and construction implementation level/direction
4. Please list the five job titles and the required abilities of big data analysis talents.
Chief Information Officer/Project Manager: Strategic analysis, team management, corporate
communication, direction planning, data science foundation management and application
knowledge, etc.
Data Scientist/Domain Expert: Problem definition/clarification, logical thinking, cross-disciplinary
integration/collaboration, programming and system development, machine learning and artificial
intelligence, etc.
Data Analyst: Statistical analysis, database application, programming and development, data
exploration, big data processing, etc.
Data Architect: Data storage management, relational database systems, distributed data storage
systems, system architecture planning and integration, etc.
Data Engineer: Open source software frameworks/system applications (Hadoop, Spark, etc.),
programming and system development, development of ETL (extraction/transformation/loading)
processes, establishment of data pipelines, system integration, etc.
5. What are the main considerations when optimizing the big data analysis system architecture?
Trade-off between speed and resources
Trade-off between stability and resources
Trade-off between expansion and resources
6. What are the trade-offs between system speed and resources?
Whether the processing structure is reasonable
Whether the logic is reasonable
Whether resources are allocated unreasonably or are insufficient
Whether the data storage and access structure is reasonable
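The last point, the data storage and access structure, can be illustrated with a small sketch: the same lookup done by scanning a list and by using a prebuilt index. The records and function names below are hypothetical. The index spends extra memory once in exchange for faster lookups, which is one concrete form of the speed/resource trade-off.

```python
# Hypothetical records: (user_id, score) pairs, made up for illustration.
records = [(user_id, user_id * 2) for user_id in range(100_000)]


def lookup_by_scan(user_id):
    """Linear scan: no extra memory, but each lookup walks the list (O(n))."""
    for uid, score in records:
        if uid == user_id:
            return score
    return None


# Building an index costs extra memory once; each lookup is then O(1) on average.
index = dict(records)


def lookup_by_index(user_id):
    """Indexed lookup: uses the prebuilt dictionary instead of scanning."""
    return index.get(user_id)


# Both access structures return the same answer; only the cost differs.
assert lookup_by_scan(99_999) == lookup_by_index(99_999) == 199_998
```

Timing the two functions (for example with the standard `timeit` module) would show the scan getting slower as the data grows while the indexed lookup stays roughly constant.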
9. Please list the contents of system architecture optimization (software and hardware system).
Data collection: [Soft] Network data collection (crawlers), operational data collection (logs)
[Hard] Enterprise network (intranet/extranet), data storage server
Data storage: [Soft] Data warehousing, relational database (structured/unstructured)
[Hard] Distributed data system, data storage server, memory storage
Data processing: [Soft] Batch processing, message queues, real-time processing
[Hard] Memory storage, memory computing, computing server
Data retrieval: [Soft] Query matching, data correlation, distributed search
[Hard] Distributed data systems, data storage computing servers, memory storage and computing
Data mining: [Soft] Data mining, machine learning
[Hard] Data storage computing server, memory storage computing
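To make the difference between batch processing and queue-based real-time processing in the data processing row more concrete, here is a minimal sketch. The event values are made up, and a `collections.deque` is only an in-memory stand-in for a real message queue; the function names are hypothetical.

```python
from collections import deque

events = [3, 5, 2, 8]  # hypothetical event values


def batch_total(stored_events):
    """Batch processing: compute over the full stored data set at once."""
    return sum(stored_events)


def stream_totals(queue):
    """Real-time style: consume messages one by one, updating a running total."""
    running = 0
    while queue:
        running += queue.popleft()
        yield running


message_queue = deque(events)  # in-memory stand-in for a message queue
print(batch_total(events))                 # 18
print(list(stream_totals(message_queue)))  # [3, 8, 10, 18]
```

The batch version gives one answer after all data has arrived, while the streaming version produces an up-to-date result after every message and ends at the same total.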
10. Please list the data analysis and processing stages with the corresponding tools.
11. What are the environment and content of data visualization tools?
Currently, most of them are built on a web environment.
They mainly use HTML5 (markup language) and CSS3 (Cascading Style Sheets), together with
various JavaScript libraries and HTML DOM (Document Object Model) containers or APIs, to
implement the supporting modules that build the visual interface.
The application is usually combined with Apache HTTP Server (web server) and an SQL server
(database server), and JSON (JavaScript Object Notation) is usually used to convert
structured data into JavaScript objects for use.
Open source tools are mainly implemented as JavaScript libraries, which let users quickly
create the corresponding charts, with automatic and animated updates, through APIs.
They also provide access, through JavaScript, to control elements written in other programming
languages (e.g., diagram elements written in C/C++ in Visual Studio).
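The database-to-JSON step mentioned above can be sketched on the server side as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the SQL database server, and the `sales` table, its rows, and `sales_as_json` are hypothetical.

```python
import json
import sqlite3

# In-memory SQLite stands in for the SQL database server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, total REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Jan", 120.0), ("Feb", 95.5), ("Mar", 143.0)],  # made-up sample rows
)


def sales_as_json(connection):
    """Query the structured rows and serialize them as a JSON array of objects."""
    rows = connection.execute(
        "SELECT month, total FROM sales ORDER BY rowid"
    ).fetchall()
    return json.dumps([{"month": month, "total": total} for month, total in rows])


payload = sales_as_json(conn)
print(payload)
```

A web page receiving this string could call `JSON.parse` on it to obtain an array of JavaScript objects, which a charting library can then draw.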
https://docs.google.com/document/d/1efn7KleBV5Gc5Aa_oxa4LaICoZ-ea5wH/edit?usp=sharing&ouid=100235357095740738175&rtpof=true&sd=true