Course outline
Department of Refrigeration, Air Conditioning and Energy Engineering, National Chin-Yi University of Technology
Big Data Analysis and Big Data Talents
- Data analysis has become a major trend in the information age, which has increased both the number of openings for data talent and the compensation offered. Data talent is even more sought after in European and American countries.
- As big data is collected, processed, analyzed, and applied, the professionals required for many key steps can gradually become independent roles or turn into important occupations in response to the importance of the existing work and
- However, because enterprises are still working out how to position and define data talent, job content and nature are often confused or the division of labor is unclear. Such positions are simply called data analysts or engineers, even though their actual duties differ greatly.
- For example, data engineers (Data Engineer) mostly come from software engineering backgrounds, but they focus on building and applying data analysis systems and on automating processing steps such as data collection, cleaning, conversion, and integration; sometimes they also assist with part of the analysis.
- Other common occupational types include data scientists (Data Scientist), data analysts (Data Analyst), and data architects (Data Architect).
- The demand for data talent in the industry can be classified by level/direction of demand:
- Big data architecture: emphasis on infrastructure and data architecture
- Focus on the implementation principles, deployment, optimization, and stability of the various open source frameworks, and build a basic environment for big data applications with data flow tools and visualization tools
- Big data analysis: emphasis on system modeling and data analysis
- Focus on statistical analysis, indicator definition, data correlation, in-depth mining, and machine learning applied to the collected data and business content, in order to obtain information for analysis and reasoning, draw conclusions or predict possibilities, and make recommendations or domain-specific proposals
- Big data development: emphasis on data application and implementation
- Focus on data application systems, server-side and database development, related application software, data operation interfaces, data carrier connections, data processing, client application development, etc., drawing on familiarity with the data application content to build what is needed quickly
- Data architect: mainly responsible for establishing and maintaining the equipment and technical standards for company data storage, planning the operating structure of hardware and software, and ensuring that the overall data storage system can support future data volumes and analysis needs
- The main development directions and common technical tools of big data architecture include:
- Data analyst: mainly analyzes and interprets organized data to obtain information from it, in order to draw conclusions for decision making or identify trends for prediction
- The work is exploratory in nature: it tries to solve unknown problems, does not necessarily find a correct solution, and cannot guarantee results (which are even hard to predict) or business outcomes, yet the results can usually have a large effect on the company's operations
- This is the core business of big data processing, so when the team is short-staffed, data analysts usually form the core and take on the other aspects of the work.
- The main development directions and common technical tools of big data analysis include:
- Data analysis: analysis through data collection (mining), clustering, classification, regression analysis, data modeling, machine learning, business knowledge, domain knowledge, etc.
- From processing (pre-processing) and statistics to analysis, commonly used general tools include R, SAP, SPSS, SSAS, SSRS, Excel, etc.
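As a small illustration of the regression analysis named above, the sketch below fits a least-squares line in plain Python. The sample values and the names `fit_line`, `ad_spend`, and `sales` are hypothetical and chosen only for illustration; in practice one of the tools listed above would normally do this work.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical sample data (the points lie exactly on y = 2x + 1)
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 6.0 + intercept  # prediction for an unseen x value
print(slope, intercept, predicted)
```

The fitted line can then be used to predict an unseen value, which is the "find trends for prediction" part of the analyst's work.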
- Data analysts can usually be roughly divided into two categories: technically oriented and business oriented. The capabilities and work content of the two are quite different, and so are their tool requirements.
- The technical orientation leans toward applying big data in all aspects, including data application in
- The business orientation focuses on applying data in combination with the business content or professional field of the work, including project analysis, cost calculation, business planning, etc.
- The data application methods and processing stages of the two are often different, and they usually have the ability to
- The role feels closer to an analyst who combines a statistician's skills with a software engineering background, and it is
- Depending on the stage of work, it is usually divided into database engineers, ETL
- Usually subdivided into groups such as data warehousing, thematic analysis, modeling analysis, data governance, etc.
- The work combines data acquisition, data organization, database management, data algorithm development, and report design. In this way, data scattered across various places can be collected and calculated into commonly used indicators,
- Business data analysts often serve in the operations, marketing, or sales departments, etc.
- Closer to a traditional analytical researcher, but requires a deep understanding of the business domain
- Can be roughly divided into roles such as data operations, business analysis, member analysis, and business analyst
- The specific problems, analysis approaches, and systems differ across business areas, but in general the processing logic for data analysis is derived by analogy from the analysis logic of the business itself
- The work mainly consists of organizing business reports, doing special analyses for specific business areas, and measuring, planning, and applying data for business growth.
- Data scientist
- Statisticians who can extract information from large data sets and make statistics and
- Data analysts who are skilled in analyzing and deriving conclusions or evaluating recommendations
- Domain expert
- Has sufficient knowledge and experience in their professional field, and has a certain degree of
- Requires data analysts to communicate and coordinate with project managers to translate their expertise into
- Data engineer: mainly responsible for securing data sources, collecting, importing, and pre-processing data, as well as confirming and integrating the setup, structure, and configuration of data systems
- Cleans, converts, integrates, and otherwise processes the collected data
- Focuses more on the development and maintenance of server-side and database-related functions, and the nature of work is basically
- The main development directions and common technical tools of big data development include:
- Data acquisition and development: Python, embedded controller development languages (C/C++, Wiring), etc., applied to web crawlers, word segmentation, semantic analysis, natural language learning, and other applications
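The cleaning, conversion, and integration steps attributed to the data engineer above can be sketched in a few lines of Python. This is only an illustrative sketch: the record layout, the sensor names, and the helper functions `clean`, `convert`, and `integrate` are all hypothetical, not taken from the slides.

```python
def clean(rows):
    """Cleaning: drop records that lack a sensor id or a reading."""
    return [r for r in rows if r.get("sensor") and r.get("temp") not in (None, "")]


def convert(rows):
    """Conversion: normalize ids and turn text readings into degrees Celsius."""
    out = []
    for r in rows:
        value = float(r["temp"])
        if r.get("unit", "C") == "F":
            value = (value - 32.0) * 5.0 / 9.0
        out.append({"sensor": r["sensor"].strip().upper(), "temp_c": round(value, 2)})
    return out


def integrate(readings, locations):
    """Integration: join readings with a second source keyed by sensor id."""
    return [{**r, "location": locations.get(r["sensor"], "unknown")} for r in readings]


# Hypothetical raw records from a collection step
raw = [
    {"sensor": " a1 ", "temp": "21.5", "unit": "C"},
    {"sensor": "b2", "temp": "68", "unit": "F"},
    {"sensor": "", "temp": "19.0", "unit": "C"},  # missing id, dropped
    {"sensor": "c3", "temp": "", "unit": "C"},    # missing reading, dropped
]
locations = {"A1": "chiller room", "B2": "rooftop unit"}

result = integrate(convert(clean(raw)), locations)
print(result)
```

Keeping each step as its own function makes it easier to automate the flow and to test the steps separately.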
- The demand for data talent in the information age is quite diverse. To face larger, more diverse, and more complex information, in addition to the ability to process huge amounts of data quickly, it is also necessary to master data structures, programming languages, applied statistics, data mining,
- However, there is too much specialized knowledge across fields. Even if individual members are recruited according to the classification of talent needs, the team can easily fail to integrate smoothly because of the communication gap in expressing their
- A project manager with professional management knowledge and basic knowledge of data science management and application can serve as team leader to establish a professional data team in which members divide the work, cooperate, and each perform their own duties
- This resembles today's typical software team, which started with a few members handling a vast and complex workload and gradually grew into a complete team through a process of breaking in, refining the work, and reorganizing.
- Most companies expect to quickly build a new data team, or split one off from the existing software team, based on the experience that team has established, and to have it start operating independently. However, it still takes time and continued operation for the team structure to be optimized.
- Data Analyst: statistical analysis, database application, programming and development, data mining, big
- Data Architect: data warehousing management, relational database systems, distributed data storage systems, architecture planning and integration, etc.
Big Data Architecture and Architecture Optimization
- For the construction of commercial big data systems, the processing flow/operations performed and the software and hardware systems that need to be built mainly include:
- Data collection: [Software] network data collection (crawlers), job data collection (logs); [Hardware] enterprise network (intranet/extranet), data storage server
- Optimizing the overall system involves adjusting the cluster structure of the storage and computing servers as needed, deciding whether the in-memory storage and computing servers (In-memory Server) should be independent, adjusting network bandwidth, separating internal and external networks for data security, and analyzing and confirming software, resources, and links
- The main aspects to be considered can be divided as follows:
- Large tasks take too long to execute: try splitting tasks and processing them in parallel
- Small tasks take too long to respond: try to improve program structure or data storage speed
- For example, when distributed parallel computing is introduced for batch processing, check whether causal relationships between the data or the order of calculation force later stages to sit idle waiting for earlier results, wasting computing resources
- For example, with virtual machines (KVM) or containers (Docker), check whether server resources are allocated to too many virtual servers or to duplicate configurations
- Confirm whether it is necessary to introduce in-memory technology according to the frequency of data use or demand
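The first suggestion above, splitting a large task and processing the parts in parallel, can be sketched as follows. The workload (a sum of squares) and the names `split`, `partial_sum_of_squares`, and `parallel_sum_of_squares` are hypothetical. A thread pool is used here only to keep the sketch portable; for CPU-bound work in Python, `ProcessPoolExecutor` from the same module is the usual choice.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, n_chunks):
    """Split a list into at most n_chunks contiguous chunks."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def partial_sum_of_squares(chunk):
    """Work done independently on one chunk (no dependency on other chunks)."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, workers=4):
    chunks = split(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is processed by a worker; partial results are combined.
        return sum(pool.map(partial_sum_of_squares, chunks))


data = list(range(1, 1001))
total = parallel_sum_of_squares(data)
print(total)
```

The split only pays off because the chunks are independent. If one chunk needed the result of another, workers would sit idle waiting, which is the wasted-resource case described in the list above.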
- Best experience
- Use mechanisms such as distributed queuing and resource locking to avoid system errors
- Evaluate and spread out system load to avoid system damage and rebuilds that cost a lot of manpower, time, and money
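Queuing and resource locking, mentioned above, can be illustrated on a single machine with Python's standard library. This is a sketch under assumptions: the shared `inventory` resource, the `pending` queue, and the `worker` function are hypothetical, and a real distributed system would rely on a message queue and a distributed lock service instead.

```python
import queue
import threading

inventory = {"stock": 100}          # shared resource (hypothetical)
inventory_lock = threading.Lock()   # resource lock guarding the shared state
pending = queue.Queue()             # requests wait here instead of hitting the resource at once


def worker():
    while True:
        amount = pending.get()
        if amount is None:          # sentinel: no more work for this worker
            pending.task_done()
            break
        with inventory_lock:        # only one worker updates the stock at a time
            if inventory["stock"] >= amount:
                inventory["stock"] -= amount
        pending.task_done()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for _ in range(50):
    pending.put(1)                  # 50 requests, one unit each
for _ in threads:
    pending.put(None)               # one sentinel per worker
pending.join()
for t in threads:
    t.join()

print(inventory["stock"])
```

The queue smooths out bursts of requests, and the lock keeps the check-then-update step from interleaving between workers.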
- Use the scaling features of data nodes in distributed data systems to configure server clusters quickly
- How the database is created affects how the index table is built (total table or data chain), and thus the query speed
- When unstructured data is stored in a distributed manner, whether the fragmented file block storage is allocated considering
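The effect of indexing on query speed, noted above, can be observed with a small experiment. The sketch below uses SQLite from the Python standard library as a stand-in for a production database; the `readings` table, its columns, and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, sensor TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO readings (sensor, temp) VALUES (?, ?)",
    [(f"s{i % 100}", 20.0 + i % 7) for i in range(10_000)],
)


def query_plan(sql, params=()):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


sql = "SELECT temp FROM readings WHERE sensor = ?"
plan_before = query_plan(sql, ("s42",))   # no index yet: full table scan

conn.execute("CREATE INDEX idx_readings_sensor ON readings (sensor)")
plan_after = query_plan(sql, ("s42",))    # index available: indexed search

print(plan_before)
print(plan_after)
```

Comparing the two plans shows the same query switching from scanning every row to searching through the index.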
Big data tools and their applications
- In terms of how data is applied and how it passes through its life cycle, the process can be roughly divided
- Data collection: data is collected through manual import, programs that automatically capture and record information, analysis of streams or network connections, parsing or building tables, connecting to other databases, and so on; it is then pre-processed (cleaning, conversion, and integration) to form the raw data
- Data storage: after the collected data is organized, structured and unstructured data are stored separately (data tables or data directories). Currently, the more common practice is to establish a data warehouse
- Data modeling: a model is formed by working out the mathematical relationships among the data and establishing a calculation method or data indicator. Sometimes additional processing is required to
- Information analysis: attempts to find the logic of causality or influence among the data, or to make appropriate
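The four stages above (collection, storage, modeling, analysis) can be walked through end to end with a toy example. Everything here is assumed for illustration: the CSV content, the `energy_log` table, and the deliberately crude two-point model are hypothetical and stand in for real sources, a real data warehouse, and a real model.

```python
import csv
import io
import sqlite3

RAW_CSV = """timestamp,outdoor_temp,power_kw
08:00,26.0,10.0
10:00,30.0,14.0
12:00,34.0,18.0
14:00,,19.0
"""

# 1. Collection and pre-processing: parse, drop incomplete rows, convert types
rows = [
    (r["timestamp"], float(r["outdoor_temp"]), float(r["power_kw"]))
    for r in csv.DictReader(io.StringIO(RAW_CSV))
    if r["outdoor_temp"] and r["power_kw"]
]

# 2. Storage: structured data goes into a table
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE energy_log (ts TEXT, outdoor_temp REAL, power_kw REAL)")
db.executemany("INSERT INTO energy_log VALUES (?, ?, ?)", rows)

# 3. Modeling: a simple indicator, kW of load per degree of outdoor temperature
stored = db.execute(
    "SELECT outdoor_temp, power_kw FROM energy_log ORDER BY ts"
).fetchall()
(t_first, p_first), (t_last, p_last) = stored[0], stored[-1]
kw_per_degree = (p_last - p_first) / (t_last - t_first)

# 4. Analysis: use the model to estimate the load at 32 degrees
predicted_kw = p_first + kw_per_degree * (32.0 - t_first)
print(len(rows), kw_per_degree, predicted_kw)
```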
Data analysis and processing stages and corresponding tools
Organizing common tools
- Log collection: Flume, Scribe, Logstash, Kibana
- Message queue: Kafka, StormMQ, ZeroMQ, RabbitMQ
- Query analysis: Hive, Impala, Pig, Presto, Phoenix, Spark SQL, Drill, Flink, Kylin, Druid
- Data synchronization: Sqoop
- Task scheduling: Oozie
Apache open source projects/frameworks/tools
- Hadoop: an open source cluster computing framework that uses the distributed file system HDFS together with MapReduce, a distributed computing system (parallel computing framework) that integrates the concepts of map and reduce
- Spark: an open source cluster computing framework that references the Resilient Distributed Dataset (
- Nutch: an open source extensible web crawler (Crawler) with a search (Searcher) engine
- Flume: an open source distributed log collection system that can aggregate data from a variety of sources, including network communication data, social media data, email data, and event data
- Kafka: an open source streaming platform that can be viewed as a large-scale publish/subscribe message queue according to a
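The map and reduce concepts behind Hadoop's MapReduce, mentioned in the list above, can be shown with the classic word count, run here in a single Python process. This sketch only imitates the programming model; it does not use Hadoop, and the function names `map_phase`, `shuffle`, and `reduce_phase` are chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


lines = ["big data analysis", "Big Data tools", "data engineer"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

In a real cluster the map calls run on the nodes that hold each block of input, and the reduce calls run in parallel per key; the structure of the program stays the same.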
- Hive: an open source data warehousing and analysis suite that maps structured data files to database tables, which can be
- Cassandra: an open source distributed NoSQL database system, including a data model and a fully decentralized architecture
- PrestoDB: a high-performance distributed SQL query engine that can query databases of different source types at the same time (relational databases such as MySQL, PostgreSQL, AWS Redshift, MS SQL Server, and Teradata; or non-relational sources such as HDFS, AWS S3, Cassandra, Kafka, and MongoDB)
compatible with most browsers for the visualization of charts and geographic maps
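The idea in the PrestoDB item above, one SQL statement that reads from several separate data sources, can be imitated on a small scale with SQLite's `ATTACH DATABASE`. This is an analogy only and does not use Presto; the two in-memory databases, the `orders` and `customers` tables, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                 # first source, schema name "main"
conn.execute("ATTACH DATABASE ':memory:' AS crm")  # second, separate source

conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?)",
    [(1, 120.0), (2, 80.0), (1, 30.0)],
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)

# One query joins tables that live in two different databases
totals = conn.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM main.orders AS o
    JOIN crm.customers AS c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(totals)
```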
Visual application tool
- JavaScript can also be used to access control elements written in another programming language (such as chart elements written in C/C++ in Visual Studio)
- There are also many business intelligence (BI) visual analysis applications that use a GUI to create visual analysis reports/dashboards interactively, and most of them also support publishing as a web
Glossary
- Data Architect
- Data Scientist
- Domain Expert
- Data Analyst
- Data Engineer
- Hadoop
- Spark
- HDFS
- Nutch
- Flume
- Hive
- HBase
- EChart
Discussion questions
- What are the three directions/levels of data talent needs in the industry?
- From the three directions/levels of data talent needs in the industry, which four emerging
- Please list the five job titles of big data analysis talent and the basic abilities each requires.
- What are the main aspects to consider when optimizing a big data analysis system architecture?
- Please list the optimization items of the software and hardware system architecture corresponding to each processing flow/operation performed.
- Please list the data analysis and processing stages and their corresponding tools.
Q&A