LEARNING OUTCOMES
DECLARATION –
I DECLARE THAT THIS ASSIGNMENT IS MY INDIVIDUAL WORK. I HAVE NOT COPIED IT FROM ANY
OTHER STUDENT’S WORK OR FROM ANY OTHER SOURCE EXCEPT WHERE DUE
ACKNOWLEDGEMENT IS MADE EXPLICITLY IN THE TEXT, NOR HAS ANY PART BEEN WRITTEN FOR ME
BY ANY OTHER PERSON.
Q.1.2. Describe the primary data structures in Pandas, namely Series and
DataFrame. Explain the differences and use cases for each.
Ans: Pandas Series:
Use cases:
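As a minimal sketch of the two structures named in the question: a Series is a one-dimensional labelled array, while a DataFrame is a two-dimensional table whose columns are each a Series. The product labels, column names, and values are hypothetical and only serve as an illustration.

```python
import pandas as pd

# A Series: one-dimensional data with an index (hypothetical values).
sales = pd.Series([120, 85, 150], index=["A", "B", "C"], name="units_sold")
print(sales["B"])          # look up one value by its label

# A DataFrame: a table of several columns sharing one index (hypothetical values).
table = pd.DataFrame({
    "product": ["A", "B", "C"],
    "units_sold": [120, 85, 150],
    "unit_price": [2.5, 4.0, 1.0],
})

# Column arithmetic works element by element and adds a new column.
table["revenue"] = table["units_sold"] * table["unit_price"]

# Selecting a single column of a DataFrame gives back a Series.
units_column = table["units_sold"]
print(type(units_column).__name__)
print(table)
```

A Series tends to fit a single variable, such as one column of measurements, while a DataFrame fits tabular data with several related variables.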
Part 2: NumPy
Q.1. Write a brief description of what NumPy is and why it is important for
scientific computing and data analysis in Python.
Ans: NumPy, short for "Numerical Python", is a widely used Python package. It
supports large, multi-dimensional arrays and matrices and provides a broad
collection of mathematical functions for operating on these arrays efficiently.
Scientific computing, data analysis, and machine learning tasks all make
extensive use of NumPy. It enables effective manipulation and calculation of
numerical data and provides high-performance numerical operations. NumPy is
frequently used alongside other libraries such as Pandas and Matplotlib for
complex data analysis and visualization tasks.
The foundation of Python's scientific computing and data analysis is NumPy. For
anyone working with numerical data, its robust array structures, well-optimized
functions, and broad integration make it a vital tool.
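The array features described above can be illustrated with a minimal sketch. The matrix values are hypothetical; the calls used (`np.array`, `mean`, and the `@` matrix product) are standard NumPy.

```python
import numpy as np

# Hypothetical 2 x 3 matrix of measurements (illustrative values only).
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

print(data.shape)   # number of rows and columns
print(data.ndim)    # number of dimensions

# Arithmetic applies to every element without an explicit loop.
scaled = data * 10

# Reductions can run along an axis: axis=0 gives one mean per column.
column_means = data.mean(axis=0)

# Matrix product of the 2 x 3 matrix with its 3 x 2 transpose.
product = data @ data.T

print(scaled)
print(column_means)
print(product)
```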
Q.2. Explain the significance of NumPy in terms of performance and efficiency
when working with large datasets and numerical computations.
Ans: Vectorization and memory optimization are two important aspects of
NumPy that contribute to its efficiency and performance on large datasets and
numerical computations.
Vectorization:
a. With NumPy, you operate on whole arrays (vectors) rather than on
single elements. The loop over the elements runs in compiled code
instead of the Python interpreter, and it can take advantage of the
Single Instruction, Multiple Data (SIMD) features of the CPU, where one
instruction is applied to several elements at once. This is
significantly faster than iterating through each element separately
in Python.
b. Consider the task of adding up a million numbers. With a Python list
and a for loop, each element is added one after the other by the
interpreter. NumPy performs the same sum in a single call to optimized
compiled code, greatly cutting down on execution time.
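The million-number example can be sketched as follows. This is an illustrative comparison, not a rigorous benchmark: the timings depend on the machine, so no specific speed-up is claimed here.

```python
import time
import numpy as np

n = 1_000_000
values_list = list(range(n))                  # plain Python list
values_array = np.arange(n, dtype=np.int64)   # same numbers as a NumPy array

# Element-by-element sum in the Python interpreter.
start = time.perf_counter()
loop_total = 0
for value in values_list:
    loop_total += value
loop_seconds = time.perf_counter() - start

# Vectorized sum: one call, the loop runs inside NumPy's compiled code.
start = time.perf_counter()
numpy_total = int(values_array.sum())
numpy_seconds = time.perf_counter() - start

print(f"for loop: total={loop_total}, {loop_seconds:.4f} s")
print(f"np.sum:   total={numpy_total}, {numpy_seconds:.4f} s")
```

Both approaches give the same total; only the time taken differs.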
Memory optimization:
Unlike Python lists, whose elements are separate objects that can be scattered
throughout memory, NumPy stores data in contiguous blocks of memory. There are
two main benefits to this contiguous storage:
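The contiguous layout can be inspected directly. The sketch below assumes a hypothetical array of 1,000 64-bit integers and compares its data buffer with a rough estimate of the memory used by the equivalent Python list. The list estimate counts the list object plus each integer object and is only approximate.

```python
import sys
import numpy as np

n = 1000
array = np.arange(n, dtype=np.int64)

# The array reports whether its data sits in one C-ordered contiguous block.
print(array.flags["C_CONTIGUOUS"])

# Every element has the same fixed size, so the buffer is n * itemsize bytes.
print(array.itemsize, array.nbytes)

# The stride is the number of bytes to step from one element to the next.
print(array.strides)

# Rough size of the same data as a list: the list of references
# plus one separate integer object per element.
as_list = array.tolist()
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
print(list_bytes)
```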
UNIT-5
Data Visualization:
1. Create a matplotlib bar plot showing the sales of products in a store for
a given month. Label the axes, add a title, and customize the
appearance (e.g., colour, width).
Ans: Matplotlib is a robust Python package that can be used to create a
wide variety of visualizations, from straightforward line plots to complex
three-dimensional plots. It is a fundamental component of Python data
visualization, offering flexibility, customization, and integration with
other scientific tools.
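One possible way to build the bar plot asked for in the question is sketched below. The product names and sales figures are hypothetical, since no dataset is given, and the colour, bar width, and output file name are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs anywhere
import matplotlib.pyplot as plt

# Hypothetical sales for one month (illustrative values, not real data).
products = ["Product A", "Product B", "Product C", "Product D"]
units_sold = [120, 85, 150, 60]

fig, ax = plt.subplots(figsize=(8, 5))

# Customize the appearance: fill colour, edge colour, and bar width.
bars = ax.bar(products, units_sold, color="steelblue",
              edgecolor="black", width=0.6)

# Label the axes and add a title.
ax.set_xlabel("Product")
ax.set_ylabel("Units sold")
ax.set_title("Product sales for one month")

# Light horizontal gridlines make the bar heights easier to read.
ax.grid(axis="y", linestyle="--", alpha=0.5)

fig.savefig("monthly_sales.png", dpi=150)
```

To view the chart in a window instead of saving it, remove the `matplotlib.use("Agg")` line and call `plt.show()`.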
a. Types of plots: Scatter plots, joint plots, pair plots, heatmaps
Why Seaborn excels:
a. Types of plots: Bar plots, count plots, box plots, violin plots, swarm plots
Why Seaborn excels:
a. Purpose: The overall container for the data and graphical components. It
determines the visualization's dimensions and bounds.
b. Components:
1. Canvas: The region where the graphical elements are drawn.
2. Axes: Establish the coordinate system and scales in which the
data is plotted.
3. Gridlines: Optional lines that make values easier to read and serve
as reference points.
4. Titles and labels: Give context and clarification for the
visualization's content.
5. Legends (for multi-series charts): Explain the meanings of the
various colours, shapes, and symbols.
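The components listed above map onto Matplotlib objects roughly as in the following sketch. The store names and values are hypothetical, and Matplotlib is used here only as one example library.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs anywhere
import matplotlib.pyplot as plt

# Hypothetical monthly values for two series (illustrative only).
months = ["Jan", "Feb", "Mar", "Apr"]
store_a = [10, 14, 9, 17]
store_b = [8, 11, 13, 12]

# The figure is the container; figsize sets its dimensions in inches.
fig, ax = plt.subplots(figsize=(6, 4))

# The axes hold the coordinate system in which the data is drawn.
ax.plot(months, store_a, marker="o", label="Store A")
ax.plot(months, store_b, marker="s", label="Store B")

# Gridlines act as reference points for reading values.
ax.grid(True, linestyle=":")

# Title and axis labels give the chart its context.
ax.set_title("Monthly sales by store")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")

# The legend explains which line belongs to which series.
legend = ax.legend()
```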
2. Data:
a. Purpose: The core information being visualized, presented in a visual
form.
b. Representation:
1. Numerical values: Shown as regions, bars, points, lines, or
other shapes.
2. Categorical data: Depicted using text labels, shapes, or
colours.
3. Spatial data: Mapped to a visual space's coordinates.
4. Textual data: Shown as headings, comments, or as part of an
image.
3. Layout:
a. Purpose: The data and visual elements are arranged within the figure to
improve comprehension and communication.
b. Elements:
1. Positioning: Deciding where each element is placed on the
canvas.
2. Spacing: Adjusting the space between components to
improve visual clarity.
3. Hierarchy: Emphasizing specific components to direct
attention.
4. Alignment: Establishing visual coherence and organization.
5. Grouping: Placing related components together.
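These layout elements can be sketched with two side-by-side Matplotlib panels. The regions, product names, and values are hypothetical; the spacing and font settings are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs anywhere
import matplotlib.pyplot as plt

# Hypothetical sales per product in two regions (illustrative values).
products = ["A", "B", "C"]
north = [30, 45, 25]
south = [20, 50, 35]

# Positioning and grouping: related charts sit side by side in one figure.
# Alignment: sharey=True gives both panels the same vertical scale.
fig, (left, right) = plt.subplots(1, 2, figsize=(9, 4), sharey=True)

left.bar(products, north, color="teal")
left.set_title("North")
left.set_ylabel("Units sold")

right.bar(products, south, color="coral")
right.set_title("South")

# Hierarchy: a larger, bold overall title draws attention first.
fig.suptitle("Sales by region", fontsize=14, fontweight="bold")

# Spacing: adjust the gap between panels and leave room for the title.
fig.subplots_adjust(wspace=0.15, top=0.85)
```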
Q.2. Load a sales dataset that includes a ‘sales’ column and create a Plotly
line chart to visualize the total sales trend. Include axis labels, a title, and
customize the appearance.
Ans: