
Name: Yasa Hapipudin

Class: 122 SA
NPM: 22262011235

Data Analytics Assignment


Page 23
1. What are the three characteristics of Big Data, and what are the main considerations
in processing Big Data?
2. What is an analytic sandbox, and why is it important?
3. Explain the differences between BI and Data Science.
4. Describe the challenges of the current analytical architecture for data scientists.
5. What are the key skill sets and behavioral characteristics of a data scientist?

Answer
1. Three characteristics define Big Data: volume, variety, and velocity.
Processing Big Data involves several main considerations: storage, processing power,
and data management. Storage: Big Data requires large amounts of storage space, so
it is important to have a system that can accommodate the volume of data collected.
Processing power: Big Data requires significant processing power to analyze and
understand the data, which can be achieved through distributed computing, parallel
processing, or other advanced computing techniques. Data management: Big Data can
come from many sources and in different formats, so it is important to have a
system to manage and organize the data, including data cleansing, data integration,
and data governance. In addition, data privacy, security, and the ethical use of data
are also important considerations when processing Big Data.
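As a small illustration of the processing-power consideration, here is a minimal Python sketch of chunked, parallel processing; the file transactions.csv, the chunk size, and the amount column are hypothetical, not part of the original answer.

    from multiprocessing import Pool

    import pandas as pd

    def summarize(chunk: pd.DataFrame) -> float:
        # Per-chunk work: total of the hypothetical "amount" column.
        return chunk["amount"].sum()

    if __name__ == "__main__":
        # Read the file in chunks so the full volume never has to fit in memory.
        chunks = pd.read_csv("transactions.csv", chunksize=100_000)
        # Spread the per-chunk work across several worker processes.
        with Pool(processes=4) as pool:
            partial_sums = pool.map(summarize, chunks)
        print("Grand total:", sum(partial_sums))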
2. An analytic sandbox is part of your data warehouse area where you can perform
experimental and development work on your analytical systems.
It is important because it lets you confirm that what you plan to implement actually
works (called "staging") before you deploy it to a live system that could impact
customers.
This prevents you from looking like you are running an "amateur hour" operation, so
your customers don't take their business elsewhere.
When you deploy (called "rollout"), everything actually works.
Note that you should only add new analytics, and not remove old ones
(called "comparable value"), or your customers won't be able to compare the data you
provide before and after the launch.
You should keep the old analytics at least until the end of the customer's fiscal year,
so that you don't disrupt even their financial statements, but you can warn the customer
that the old values you maintain during this transition will be dropped in a
future release (this is called "deprecation").
You may get some larger customers who insist that you retain some of the
old values for them (this is called "customer resistance"). This usually indicates that
you are making a bad business decision.
3. Like Business Intelligence, Data Science enables the analysis of past data.
However, whereas BI enables descriptive analysis, Data Science enables predictive or
prescriptive analysis, looking to the future. In the past, only teams of IT experts could
exploit Business Intelligence tools and techniques.
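To make the descriptive-versus-predictive contrast concrete, a minimal Python sketch follows; the tiny sales table and the linear model are illustrative assumptions, not a prescribed BI or Data Science toolchain.

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # A tiny, made-up sales history.
    sales = pd.DataFrame({
        "month": [1, 2, 3, 4, 5, 6],
        "revenue": [100, 120, 130, 150, 170, 180],
    })

    # Descriptive (BI-style): summarize what has already happened.
    print("Average monthly revenue:", sales["revenue"].mean())

    # Predictive (Data Science-style): fit a model and look ahead.
    model = LinearRegression().fit(sales[["month"]], sales["revenue"])
    forecast = model.predict(pd.DataFrame({"month": [7]}))
    print("Forecast for month 7:", forecast[0])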
4. It is clear that architects must pay attention to data and how it is handled. The hardest
part for a data scientist is carefully analyzing the data to find the gaps and identify
the problems. They must render the data in a format that is easier for users to read,
although machine learning and deep learning now help reduce manual intervention.
Big Data helps scientists reach more data, and they also use data warehousing
where data challenges across applications are met. Virtual data handling can be
considered one of the best options for data scientists. Further data exploration and
selection of appropriate models help data scientists with analysis. Explaining data
science in business language is also important.
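As a small illustration of the "find the gaps" step described above, the following Python sketch profiles missing values in a table and renders a readable report; the data and column names are hypothetical.

    import pandas as pd

    # A made-up extract with gaps in it.
    raw = pd.DataFrame({
        "customer_id": [1, 2, 3, 4],
        "age": [34, None, 29, None],
        "region": ["west", "east", None, "east"],
    })

    # Identify the gaps: share of missing values per column,
    # rendered as a small, readable report.
    gap_report = raw.isna().mean().rename("missing_share")
    print(gap_report.to_string())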
5. 1.) Critical thinking.
2.) Effective communication.
3.) Proactive problem solving.
4.) Intellectual curiosity.
5.) Business sense.
6.) Ability to prepare data for effective analysis.
7.) Ability to leverage self-service analytics platforms.
8.) Ability to write efficient and maintainable code.
9.) Ability to apply mathematics and statistics correctly.
10.) Ability to leverage machine learning and artificial intelligence (AI).

Page 61
1. In which phase would the team expect to invest most of the project time? Why?
Where would the team expect to spend the least time?
2. What are the benefits of doing a pilot program before a full-scale rollout of a new
analytical methodology? Discuss this in the context of the mini case study.
3. What kinds of tools would be used in the following phases, and for which kinds of use
scenarios?
a. Phase 2: Data preparation
b. Phase 4: Model building

Answer
1. The team would be expected to spend most of the project time in the second phase,
the data preparation phase. Data preparation requires the team to have an
analytic sandbox in which they work with the data and perform analytics. In the
second phase, the team also performs ETL and ELT, familiarizes itself
with the data, decides what to keep and discard, and finally surveys and visualizes the
data. The team needs to invest most of the project time in the second phase to ensure
it has enough quality data for the project. The team can spend the least time in the
first phase, where it formulates hypotheses to test and analytic challenges to explore
in the following phases.
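A minimal Python sketch of these Phase 2 activities (load, decide what to keep, survey, visualize) is shown below; the file raw_extract.csv and the column names are hypothetical.

    import matplotlib.pyplot as plt
    import pandas as pd

    # Load the raw extract into the team's working copy of the data.
    raw = pd.read_csv("raw_extract.csv")

    # Decide what to keep and what to discard (hypothetical columns).
    kept = raw.dropna(subset=["order_id"]).drop(columns=["legacy_flag"])

    # Survey and visualize the data before any modeling begins.
    print(kept.describe(include="all"))
    kept["order_value"].hist(bins=30)
    plt.show()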
2. By launching a pilot program, companies can test the efficacy and accuracy of
the new methodology and assess its potential risks.
This allows them to identify potential issues and make the necessary
adjustments before a full-scale rollout.
Additionally, by conducting a pilot program, companies can measure the level of
acceptance of the new methodology from stakeholders. This will allow the company
to determine how well the new methodology is received and how employees and
customers will respond to it.
Finally, conducting a pilot program before a full-scale rollout can also help companies
develop a better understanding of the costs associated with this new methodology.
This will allow companies to better plan the financial implementation of this new
methodology, giving them the opportunity to identify potential cost savings and create
more accurate budgets.
3. a.) Phase 2: Data preparation
Several tools are commonly used in this phase:
1. Hadoop
2. Alpine Miner
3. OpenRefine
4. Data Wrangler
 The IT department set up a new analytics sandbox to store and experiment on
the data.
 The data scientists and data engineers began to notice that certain data needed
conditioning and normalization (see the sketch after this list).
 As the team explored the data, it quickly realized that if it did not have data of
sufficient quality or could not get good quality data, it would not be able to
perform the subsequent steps in the lifecycle process.
 It was important to determine what level of data quality and cleanliness was
sufficient for the project being undertaken.
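The following is a minimal Python sketch of the conditioning and normalization mentioned above, using pandas rather than the tools listed; the column names and values are hypothetical.

    import pandas as pd

    # A made-up slice of the idea data before conditioning.
    raw = pd.DataFrame({
        "Idea Title": ["  Smart Meter ", "smart meter", "Mobile App"],
        "votes": ["10", "7", None],
    })

    # Conditioning: tidy column names, trim and lowercase text, coerce types.
    conditioned = raw.rename(columns={"Idea Title": "idea_title"})
    conditioned["idea_title"] = conditioned["idea_title"].str.strip().str.lower()
    conditioned["votes"] = pd.to_numeric(conditioned["votes"]).fillna(0)

    # Normalization: rescale votes to a 0-1 range so ideas can be compared.
    conditioned["votes_scaled"] = conditioned["votes"] / conditioned["votes"].max()
    print(conditioned)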
b.) Phase 4: Model building
Commercial tools:
1. SAS Enterprise Miner
2. SPSS Modeler (provided by IBM and now called IBM SPSS Modeler)
3. MATLAB
4. Alpine Miner
5. STATISTICA
6. Mathematica
 This included work by the data scientists using natural language processing
(NLP) techniques on the textual descriptions of the innovation roadmap ideas.
 Social network analysis was performed using R and RStudio.
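As a rough illustration of the NLP work mentioned above, the following Python sketch computes TF-IDF features for a few made-up idea descriptions; it stands in for, and does not reproduce, the team's actual NLP or R-based social network analysis.

    from sklearn.feature_extraction.text import TfidfVectorizer

    # A few made-up textual descriptions of roadmap ideas.
    ideas = [
        "smart meter data for outage prediction",
        "mobile app for customer outage reporting",
        "predictive maintenance for transformers",
    ]

    # Turn the free text into TF-IDF features that a model could use.
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(ideas)

    print(vectorizer.get_feature_names_out())
    print(tfidf.toarray().round(2))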
