
Project Report on

“Data Analysis and Visualization”


At
National Centre for Medium Range Weather Forecasting
Submitted in Fulfilment of the Degree of
Master of Computer Applications
(2023-24)

M.K.M College of Management and Information Technology, Hodal
Affiliated to
MAHARISHI DAYANAND UNIVERSITY (Rohtak)

Under the Supervision of: Mrs. Bhanu Tomar

Submitted By: Komal
Registration No. 1911420768
Declaration:-

I, Komal, solemnly declare that this report is submitted in fulfilment of the Training Programme at the National Centre for Medium Range Weather Forecasting, Noida. The contents of this report represent the outcome of work undertaken by me under the diligent guidance of Dr. Indira Rani S.

I further affirm, to the best of my knowledge, that the structure and content of this report are entirely original
and have not been previously submitted for any purpose whatsoever.

Sincerely,

Komal
MCA 4th Semester
M.K.M College of Management and Information Technology for Girls, Hodal
Approval Letter
Acknowledgement

I would like to take this opportunity to express my heartfelt gratitude to NCMRWF (Noida) for providing me with the invaluable opportunity to complete my internship with your esteemed organization.

My time at NCMRWF has been a remarkable and enriching experience. I am deeply thankful to the entire team
for their warm welcome, guidance, and support throughout my internship journey. Your professionalism,
dedication, and commitment to excellence have been truly inspiring.

I would like to extend my special thanks to Dr. Indira Rani S. for their mentorship, valuable insights, and
continuous encouragement. Their guidance played a pivotal role in my personal and professional development
during this internship.

The knowledge and skills I gained from working with each of you have been invaluable.

I would like to express my appreciation towards Anadi J. (Dean) for providing me with exposure to diverse projects and tasks, which enhanced my learning experience and allowed me to contribute meaningfully to the organization.

Once again, thank you, NCMRWF, for this incredible opportunity, and for being an integral part of my academic
and career development.

Sincerely,
Ms. Komal

Index

S.No.  Contents
1. Organizational Profile
   1.1 Name
   1.2 Location
   1.3 Background
   1.4 Functions and Objectives
   1.5 Research Activity
   1.6 Technological Advancements
   1.7 Collaboration
   1.8 Achievement
   1.9 Challenges
   1.10 Future Directions
2. Internship Details
   2.1 Internship Domain
   2.2 Internship Goal
   2.3 Internship Outcome
   2.4 Internship Duration
3. Technology
   3.1 Python
   3.2 Shell Scripting
   3.3 Excel
4. Report: Data Analysis and Visualisation
   4.1 Introduction
   4.2 Script 1
   4.3 Script 2
   4.4 Script 3
   4.5 Script 4
5. Code Overview
6. Project Detail

Organizational Profile

Name: National Centre for Medium Range Weather Forecasting (NCMRWF)

Location: Noida, Uttar Pradesh, India

Background:

• NCMRWF is one of India's premier meteorological research institutions.


• It was established in 1988 with the goal of improving medium-range weather forecasting capabilities
in India.

• The center operates under the Ministry of Earth Sciences, Government of India.
Functions and Objectives:

• NCMRWF's primary function is to provide medium-range weather forecasts for India.

• It plays a crucial role in issuing forecasts for up to 10 days in advance, helping various sectors,
including agriculture, disaster management, and transportation.

• The center conducts research and development in atmospheric and oceanic sciences.

• It focuses on improving the accuracy of weather forecasts, climate modeling, and disaster
management.
Research Activities:

• NCMRWF is involved in a wide range of research activities related to atmospheric and oceanic
sciences.

• It conducts research on numerical weather prediction (NWP) models, data assimilation techniques, and
climate modeling.

• The center also conducts studies on monsoons, cyclones, and extreme weather events.
Technological Advancements:

• NCMRWF uses advanced technologies and high-performance computing systems for weather
modeling and forecasting.

• The center has developed its own NWP model called the "Unified Model for Medium-Range Weather
Prediction" (UMM).
Collaborations:

• NCMRWF collaborates with various national and international meteorological organizations and
research institutions.

• It shares data and expertise with organizations such as the India


Meteorological Department (IMD), the World Meteorological Organization
(WMO), and the European Centre for Medium-Range Weather Forecasts (ECMWF).
Achievements:

• NCMRWF has made significant contributions to improving the accuracy of medium-range weather
forecasts in India.

• It has successfully predicted weather events, including monsoons, cyclones, and extreme weather
conditions.

• The center's research has led to advancements in weather modeling and data assimilation techniques.
Challenges:

• NCMRWF faces challenges related to data assimilation, model improvement, and the need for constant
technological upgrades.

• Predicting extreme weather events, such as cyclones and heavy rainfall, remains a complex task.
Future Directions:
• NCMRWF continues to work on improving its NWP models and expanding its forecasting
capabilities.

• It aims to enhance its climate modeling efforts and contribute to climate change research.

• Collaboration with other organizations and the development of state-of-the-art technologies are part of
its future plans.

Conclusion:
NCMRWF plays a crucial role in enhancing India's weather forecasting capabilities and contributes
significantly to various sectors, including agriculture, disaster management, and climate research.

Internship Details:

Internship Domain: Data analytics techniques using various methods.

Internship Goal:

• Analyse data and gain internal insights from it


• Solve real-world problems using data analysis

Internship Outcomes:

• Gain valuable work experience

• Develop and refine communication skills and gain confidence

• Industrial training is beneficial for students; it improves 'personal attitude' and 'work attitude'.

Duration of Internship: July 2023 to September 2023

PYTHON

Python is a popular programming language that is easy to learn and use. It has a rich ecosystem of open-source
libraries and tools that allow scientists to build sophisticated applications. Python is particularly popular in data
science because it can perform complex analyses on various data sets.

What Python Is Used For:

The uses of Python are varied and quite impactful. Here is a list of fields where Python is commonly used:
Web Development

As a web developer, you have the option to choose from a wide range of web frameworks while using Python
as a server-side programming language. Both Django and Flask are popular among Python programmers.
Django is a full-stack web framework for Python used to develop complex, large web applications, whereas Flask
is a lightweight and extensible Python web framework for building simple web applications; it is easy to learn,
closer to plain Python, and a good starting point for beginners.

Application giants like YouTube, Spotify, Mozilla, Dropbox, and Instagram use the Django framework, whereas
Airbnb, Netflix, Uber, and Samsung use the Flask framework.

Machine Learning:-

As Python is a very accessible language, there are many great libraries built on top of it that make your work
easier. The large number of existing Python libraries helps you focus on more interesting problems rather than
reinventing the wheel. Python is also an excellent wrapper language for working with more efficient C/C++
implementations of algorithms and CUDA/cuDNN, which is why existing machine learning and deep learning
libraries run efficiently in Python. This is also very important for working in the fields of machine learning
and AI.

Data Analysis :

"Data is everywhere": in sheets, on social media platforms, in product reviews and feedback. In the current
information age it is created at blinding speed and, when analyzed correctly, can be a company's most valuable
asset. "To grow your business, and even to grow in your life, sometimes all you need to do is analysis!"
If your business is not growing, you have to look back, recognize your mistakes, and make a new plan that avoids
repeating them. And if your business is growing, you have to look forward to making it grow even more.

All you need to do is analyze your business data and business processes. Data analysis is the process of studying
data to find out how and why things happened in the past. Usually, the result of data analysis is a final dataset,
i.e. a pattern, or a detailed report that you can use further for data analytics.

Data Analysis tools :

While there are many libraries available to perform data analysis in Python, here are a few to get you started:
• NumPy: For scientific computing with Python, NumPy is essential. It supports large, multi-
dimensional arrays and matrices and includes an assortment of high-level mathematical functions to
operate on these arrays.

• SciPy: This works with NumPy arrays and provides efficient routines for numerical integration and
optimization.

• Pandas: This is also built on top of NumPy, and offers data structures and operations for manipulating
numerical tables and time series.

• Matplotlib: A 2D plotting library that can generate data visualizations such as histograms, power spectra,
bar charts, and scatterplots with just a few lines of code.

• DataFrame: In Pandas, a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous
tabular data structure. It is the key component for storing and manipulating the combined data from the
text files; DataFrames are used to organize and analyze the data.

• Subplots: Matplotlib's plt.subplots function is used to create multiple subplots (in this project, vertically
stacked subplots) within a single figure to display the 'Reception' and 'Used' data together.

• Legends: Legends are added to the subplots to label the lines in the graph. They are created using
Matplotlib's legend function.

• Custom Labels: Custom labels for the x-axis and y-axes are set using functions like set_xlabel and
set_ylabel.

• Data Calculation: Basic calculations are performed to calculate the percentage of 'Reception' data that
is 'Used'. This is done using arithmetic operations on DataFrame columns.
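
Taken together, the items above cover most of what the project scripts do. The following is a minimal, illustrative sketch of how they fit together; the input file sample.csv and its date, reception, and used columns are assumptions chosen for the example, not the project's actual data.

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical input file and column names, used only to illustrate the tools listed above
df = pd.read_csv('sample.csv', parse_dates=['date'])

# Data calculation: percentage of 'Reception' data that is 'Used'
df['percentage_used'] = (df['used'] / df['reception']) * 100

# Two vertically stacked subplots sharing the x-axis
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(8, 6), sharex=True)
ax1.plot(df['date'], df['reception'], label='Reception')
ax2.plot(df['date'], df['used'], label='Used')

# Custom labels and legends
ax1.set_ylabel('Reception')
ax2.set_ylabel('Used')
ax2.set_xlabel('Date')
ax1.legend()
ax2.legend()

plt.show()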

Desktop Applications:-

Desktop applications are software programs designed to run on personal computers and laptops, typically within
an operating system environment like Windows, macOS, or Linux.

Platform-Specific: These applications are generally developed for specific operating systems and take
advantage of platform-specific features, such as Windows APIs, macOS frameworks, or Linux libraries.

Installation: Desktop applications usually require installation on a device, which involves copying files,
creating shortcuts, and potentially updating system configurations.

Performance: Desktop applications often have direct access to system resources, which can result in
faster performance compared to web-based applications.

Offline Capabilities: Since they're installed on a device, desktop applications can often function
without an internet connection, providing a seamless user experience.

Shell Scripting

A shell script is a text file that contains a sequence of commands for a UNIX-based operating system. It is
called a shell script because it combines a sequence of commands, that would otherwise have to be typed into
the keyboard one at a time, into a single script. The shell is the operating system's command-line interface
(CLI) and the interpreter for the set of commands that are used to communicate with the system.

A shell script is usually created for command sequences that a user needs to run repeatedly, in order to
save time. Like other programs, a shell script can contain parameters, comments and subcommands that the
shell must follow. Users initiate the sequence of commands in the shell script by simply entering the file name
on a command line.

The basic steps involved with shell scripting are writing the script, making the script accessible to the shell and
giving the script execute permission.
Shell scripts contain ASCII text and are written using a text editor, word processor or graphical user interface
(GUI). The content of the script is a series of commands in a language that can be interpreted by the shell.
Features that shell scripts support include loops, variables, if/then/else statements, arrays and shortcuts. Once
complete, the file is typically saved with a .sh (or .txt) extension in a location that the shell can access.

Bash (Bourne Again SHell) : The most widely used shell in Linux and

macOS environments. It is an extension of the original Bourne shell (sh),

offering additional features and flexibility.

Sh (Bourne Shell): A simpler shell, often used as a default in many UNIX systems.

Zsh (Z Shell): An advanced shell with additional features, like better scripting capabilities, command-
line completion, and customization.

Csh/Tcsh (C Shell/TENEX C Shell): Shells with a syntax similar to the C programming language,
popular in some UNIX environments.

Shell scripts typically have the following structure:

Shebang: The first line of the script, indicating which shell should interpret the script. For Bash, it is
usually #!/bin/bash.

Comments: Lines starting with # are comments and are ignored by the shell.

Commands: The actual instructions to be executed. Commands can include built-in shell functions, system
utilities, and custom logic.

Variables: Shell scripts use variables to store values for later use. Variables can be assigned and referenced
using $ (e.g., $VAR).

Control Structures: Shell scripts support loops (for, while), conditionals (if, case), and functions for
complex logic and flow control.


Common uses of shell scripts include:

• Automation of System Tasks: Automating repetitive tasks like backups, system monitoring, and
software installations.

• File Manipulation: Creating, modifying, moving, and deleting files and directories.

• Text Processing: Processing and transforming text data using tools like awk, sed, grep, and cut.

• System Administration: Managing system services, user accounts, and system configurations.

• Batch Processing: Running multiple commands or scripts in sequence to process large amounts of data.

• Integration: Integrating different programs and utilities to create complex workflows.

Best practices for writing shell scripts include:

• Use Comments: Add comments to explain complex code sections.

• Follow Consistent Naming Conventions: Use clear and descriptive variable and function names.

• Error Handling: Include checks for errors and handle them appropriately.

• Permissions: Ensure scripts have the correct permissions for execution (chmod +x script.sh).

• Security: Be cautious with user input and avoid common security vulnerabilities, such as command
injection.

EXCEL

Excel is frequently used for data analysis because of its strong data visualization features, which enable the
creation of illuminating graphics. Each Excel chart type has a specific purpose, and Excel comes with a significant
selection of built-in charts that can be used to make the best use of data. Data visualization is the graphic
depiction of data, which makes the data easier to understand. It can also be done with dedicated tools such as
Data Wrapper and Google Charts; Excel, as a spreadsheet application, is likewise used to organize and visualize data.

Data visualization in Excel can be done with its many chart and graph types, or with ready-made Excel templates.
Excel offers charts of many kinds, including column charts, bar charts, pie charts, line charts, area charts,
scatter charts, surface charts, and many more.

Microsoft Excel is a versatile and widely used spreadsheet software that serves a

variety of purposes, from simple data entry to complex data analysis. Here's a theoretical overview of Excel
and its key components:

1. Spreadsheet Structure

Worksheets: Excel is composed of multiple worksheets within a workbook.

A worksheet is a grid of cells arranged in rows and columns.

Cells: Each cell is an intersection of a row and a column, identified by a combination of letters (for
columns) and numbers (for rows). For example, cell A1 is in the first column and the first row.

Workbooks: A workbook is a complete Excel file that can contain one or more worksheets.

2. Basic Operations

Data Entry: You can enter text, numbers, dates, and formulas into cells.

Formulas: Formulas are mathematical expressions used for calculations within Excel. They typically
begin with an equals sign (=) and can involve basic arithmetic operations, functions, and references to
other cells.

Functions: Excel offers a wide range of functions for various tasks, such as mathematical, statistical,
financial, and logical operations. Examples include

SUM(), AVERAGE(), IF(), VLOOKUP(), and many more.

3. Data Manipulation and Analysis

Sorting and Filtering: Excel allows you to sort data based on specific
criteria and filter rows that meet certain conditions.

PivotTables: PivotTables are powerful tools for summarizing and analyzing large data sets. They
enable you to create customizable reports, group data,

and perform calculations.

Data Validation: This feature allows you to restrict data entry in cells to specific types, ranges, or
lists.
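
For comparison with the Python workflow used later in this report, the same sorting, filtering, and PivotTable-style summarisation can be reproduced with pandas. The sketch below is illustrative only: the workbook sales.xlsx and its Region and Amount columns are assumed for the example, and reading .xlsx files with pandas additionally requires the openpyxl package.

import pandas as pd

# Hypothetical workbook and column names, for illustration only
df = pd.read_excel('sales.xlsx')

# Sorting: order rows by a column
df_sorted = df.sort_values('Amount', ascending=False)

# Filtering: keep only the rows that meet a condition
high_value = df[df['Amount'] > 10000]

# PivotTable-style summary: total Amount per Region
summary = pd.pivot_table(df, index='Region', values='Amount', aggfunc='sum')
print(summary)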

4. Data Visualization

Charts: Excel provides a variety of chart types (e.g., bar, line, pie, scatter) to

visually represent data. Charts can be customized with labels, titles, and other elements.

Conditional Formatting: This feature allows you to apply formatting (such

as colors or icons) to cells based on specific criteria, which is useful for

highlighting trends or anomalies.

5. Advanced Features

Macros and VBA: Excel supports automation through macros, which are

sequences of instructions that can be recorded and replayed. VBA (Visual Basic for Applications) is a
programming language that allows for more complex automation and customizations.

Data Connections: Excel can connect to external data sources, such as databases, CSV files, and online
services, allowing for real-time data analysis and integration.

Collaboration: With Excel Online and cloud-based solutions like Microsoft 365, multiple users can
collaborate on the same workbook in real time, making it easier to share and work on data.
Report: Data Analysis and Visualization

Introduction:-

This report presents the analysis and visualization of data obtained from 'Reception' and 'Used' sources over a
two-year period (2021-22). The data was processed using Python and Matplotlib to create informative graphs. This
document outlines the purpose, methodology, and results of the analysis.

1. Purpose

The report aims to present a comprehensive analysis of data derived from

'Reception' and 'Used' sources over the course of two years, from 2021 to 2022. It seeks to provide insights
into trends, patterns, or noteworthy findings from these

datasets.

2. Methodology

The analysis involved the use of Python, a powerful programming language for

data science, and Matplotlib, a popular library for creating visualizations. Key stages in the methodology
might include:

Data Collection : Gathering data from the specified sources (e.g., 'Reception' and 'Used') to
ensure that it is accurate, complete, and relevant.

Data Cleaning: Removing inconsistencies, missing values, and outliers to ensure data quality.

Data Analysis: Applying statistical techniques and descriptive analytics to


understand the underlying trends and patterns.

Data Visualization: Using Matplotlib to create graphs and charts that

visually represent the data, making it easier to comprehend and extract

insights.
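
As an illustration of the cleaning and analysis stages, a short sketch is given below. The file name combined.txt, its column names, and the rule used to discard invalid rows are assumptions for demonstration, not the project's exact procedure.

import pandas as pd

# Hypothetical combined file with the three columns used in this report
df = pd.read_csv('combined.txt', sep=' ', header=None, names=['date', 'reception', 'used'])

# Data cleaning: enforce a consistent date type, drop missing values, remove impossible rows
df['date'] = pd.to_datetime(df['date'])
df = df.dropna()
df = df[df['reception'] > 0]

# Data analysis: quick descriptive statistics for the cleaned data
print(df.describe())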

3. Results

The report would likely contain various visualizations to showcase the results of

the analysis. Some possible types of visualizations include:

Line Graphs : To illustrate trends over time, such as the volume of

'Reception' and 'Used' sources.

Bar Charts : To compare different categories within the data.

Scatter Plots : To examine relationships or correlations between different

variables.

Pie Charts : To show the proportional distribution of a specific category.

4. Discussion

In this section, the report may discuss the key findings and their implications. This

might include identifying any significant trends, correlations, or unexpected

outcomes. It could also include potential recommendations based on the analysis.

5. Conclusion

The report would summarize the main takeaways from the data analysis and

visualization. It might also suggest next steps for further research or action based on the findings.

6. Appendices
Any additional information, such as the Python scripts used for analysis, data tables, or raw data, might be
included in the appendices for reference or further

study.

7. Raw Data

Sources: The origin of the data; in this report, the data comes from 'Reception' and 'Used' sources.

Reception Sources: This could refer to incoming data, like goods received,

customer orders, or any other initial input in a process.

Used Sources: This could indicate data representing how resources or products are utilized, distributed,
or consumed.

Format: The raw data could be in various formats, like CSV, Excel, SQL

databases, or text files. Understanding the structure of these files is essential for processing.

Variables/Attributes: These are the different fields or columns in the data,

representing specific information such as dates, categories, quantities, or other measurements.

Timeframe: The data spans a period from 2021 to 2022. You'd need to know if it's continuous (daily,
weekly) or collected at specific intervals.

Continuous Data: Collected at regular intervals, like daily or weekly, providing consistent snapshots
over time.

Specific Intervals: Collected at irregular intervals or specific events, requiring a different approach to
identify trends.
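
One quick way to check whether the raw data is continuous or collected at irregular intervals is to look at the gaps between consecutive dates. The sketch below assumes the date file contains one parseable date per line with no header; this is an illustration, not a statement about the actual file layout.

import pandas as pd

# Assumed layout: one date per line, no header
dates = pd.read_csv('date.dat', header=None, names=['date'])
dates['date'] = pd.to_datetime(dates['date'])

# Count the gaps between consecutive dates: a single dominant gap (e.g. 1 day)
# suggests continuous data; many different gaps suggest irregular collection.
print(dates['date'].diff().value_counts())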
Chapter 1
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Define the folder path and file list
folder_path = '/home/payal_komal/mihir_to_linux/mihir_new'
file_list = ['date.dat', 'reception_00.txt', 'used_00.txt']

# Initialize the combined DataFrame with the first file
main_dataframe = pd.DataFrame(pd.read_table(file_list[0]))

# Loop through the remaining files and concatenate data column-wise
for i in range(1, len(file_list)):
    data = pd.read_table(file_list[i])
    df = pd.DataFrame(data)
    main_dataframe = pd.concat([main_dataframe, df], axis=1)

# Export the combined data to a text file without headers
main_dataframe.to_csv('AMSR_00.txt', header=None, index=None, sep=' ')

# Read the combined data from the text file without headers
df = pd.read_csv('AMSR_00.txt', header=None, sep=' ')

# Add headers to the DataFrame
df.columns = ['date', 'reception', 'used']

# Convert the 'date' column to datetime
df['date'] = pd.to_datetime(df['date'])

# Create three separate subplots
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(10, 8), sharex=True)

# Plot 'reception' data on the first subplot
ax1.plot(df['date'], df['reception'], label='Reception', color='blue')
ax1.set_ylabel('Reception Data', color='blue')
ax1.tick_params(axis='y', labelcolor='blue')

# Plot 'used' data on the second subplot
ax2.plot(df['date'], df['used'], label='Used', color='green')
ax2.set_ylabel('Used Data', color='green')
ax2.tick_params(axis='y', labelcolor='green')

# Set the x-axis interval to 6 months
ax3.xaxis.set_major_locator(mdates.MonthLocator(interval=6))
ax3.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
ax3.set_xlabel('year', color='black')

ytick_positions = [0, 100000, 200000, 300000, 400000, 500000, 600000]
ytick_labels = [str(position) for position in ytick_positions]
ax1.set_yticks(ytick_positions)
ax1.set_yticklabels(ytick_labels, rotation=45)

# Add legends for both subplots
ax1.legend(loc='upper left', bbox_to_anchor=(0.0, 1.05))
ax2.legend(loc='upper left', bbox_to_anchor=(0.0, 0.95))

# Calculate the percentage of reception data that is used
df['percentage_used'] = (df['used'] / df['reception']) * 100

# Print the DataFrame with the new 'percentage_used' column
print(df)

# Plot the percentage data
ax3.plot(df['date'], df['percentage_used'], label='Percentage Used', color='red')
ax3.set_ylabel('Percentage Used')
ax3.legend()
ax3.grid(True)

# Define y_ticks and set y-tick labels for ax3
y_ticks = [0, 1, 2, 3, 4, 5, 6, 7]
ax3.set_yticks(y_ticks)
ax3.set_yticklabels([f'{y}%' for y in y_ticks])

# Save the plot as an image
plt.savefig('Percentage_Used_00.png')

# Display the plot
plt.show()

Output:
Code detail

This Python code processes and visualizes data from files in a folder. Here is a breakdown of what each part of the code does:

1. Importing Libraries: The code imports the necessary libraries: pandas for data manipulation, matplotlib.pyplot for plotting graphs, and matplotlib.dates for date formatting.

2. Defining Folder Path and File List: It defines the folder path where the files are located and lists the names of the files to be processed.

3. Reading and Concatenating Data: It reads each file in the file list, converts it into a DataFrame, and concatenates them into a single DataFrame called main_dataframe.

4. Exporting Concatenated Data: It exports the concatenated DataFrame to a text file named 'AMSR_00.txt' with no header and space-separated values.

5. Reading Data from a Text File: It reads the combined data back from the text file into a DataFrame.

6. Adding Headers to the DataFrame: It assigns column names ('date', 'reception', 'used') to the DataFrame.

7. Converting Date Column to Datetime: It converts the 'date' column to datetime format using pandas' to_datetime function.

8. Creating Subplots: It creates a figure with three vertically stacked subplots using plt.subplots(3, 1, figsize=(10, 8), sharex=True).

9. Plotting Data on Subplots: It plots 'reception' data on the first subplot and 'used' data on the second subplot, and sets labels and colors for each subplot accordingly.

10. Formatting the Date Axis: It sets the x-axis interval to 6 months and formats the date display to show only the year and month.

11. Setting Labels: It sets the x-axis label for the third subplot and defines the y-tick positions for better readability.


Chapter 2
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Define the folder path and file list
folder_path = '/home/payal_komal/mihir_to_linux/mihir_new'
file_list = ['date.dat', 'reception_06.txt', 'used_06.txt']

# Initialize a DataFrame with the first file to store the combined data
main_dataframe = pd.DataFrame(pd.read_table(file_list[0]))

# Loop through the remaining files and concatenate data column-wise
for i in range(1, len(file_list)):
    data = pd.read_table(file_list[i])
    df = pd.DataFrame(data)
    main_dataframe = pd.concat([main_dataframe, df], axis=1)

# Export the combined data to a text file without headers
main_dataframe.to_csv('ahiclr_06.txt', header=None, index=None, sep=' ')

# Read the combined data from the text file without headers
df = pd.read_csv('ahiclr_06.txt', header=None, sep=' ')

# Add headers to the DataFrame
df.columns = ['date', 'reception', 'used']

# Convert the 'date' column to datetime
df['date'] = pd.to_datetime(df['date'])

# Create three separate subplots
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(10, 8), sharex=True)

# Plot 'reception' data on the first subplot
ax1.plot(df['date'], df['reception'], label='Reception', color='blue')
ax1.set_ylabel('Reception Data', color='blue')
ax1.tick_params(axis='y', labelcolor='blue')

# Plot 'used' data on the second subplot
ax2.plot(df['date'], df['used'], label='Used', color='green')
ax2.set_ylabel('Used Data', color='green')
ax2.tick_params(axis='y', labelcolor='green')

# Set the x-axis interval to 6 months
ax3.xaxis.set_major_locator(mdates.MonthLocator(interval=6))
ax3.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
ax3.set_xlabel('year', color='black')

ytick_positions = [0, 100000, 200000, 300000, 400000, 500000, 600000, 700000, 800000]
ytick_labels = [str(position) for position in ytick_positions]
ax1.set_yticks(ytick_positions)
ax1.set_yticklabels(ytick_labels, rotation=45)

# Add legends for both subplots
ax1.legend(loc='upper left', bbox_to_anchor=(0.0, 1.05))
ax2.legend(loc='upper left', bbox_to_anchor=(0.0, 0.95))

# Calculate the percentage of reception data that is used
df['percentage_used'] = (df['used'] / df['reception']) * 100

# Print the DataFrame with the new 'percentage_used' column
print(df)

# Plot the percentage data
ax3.plot(df['date'], df['percentage_used'], label='Percentage Used', color='red')
ax3.set_ylabel('Percentage Used')
ax3.legend()
ax3.grid(True)

# Define y_ticks and set y-tick labels for ax3
y_ticks = [0, 0.2, 0.4, 0.6, 0.8, 1.0]
ax3.set_yticks(y_ticks)
ax3.set_yticklabels([f'{y}%' for y in y_ticks])

# Save the plot as an image
plt.savefig('Percentage_Used_06.png')

# Display the plot
plt.show()

Output:
Chapter 3
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Define the folder path and file list
folder_path = '/home/payal_komal/mihir_to_linux/mihir_new'
file_list = ['date.dat', 'reception_12.txt', 'used_12.txt']

# Initialize a DataFrame with the first file to store the combined data
main_dataframe = pd.DataFrame(pd.read_table(file_list[0]))

# Loop through the remaining files and concatenate data column-wise
for i in range(1, len(file_list)):
    data = pd.read_table(file_list[i])
    df = pd.DataFrame(data)
    main_dataframe = pd.concat([main_dataframe, df], axis=1)

# Export the combined data to a text file without headers
main_dataframe.to_csv('ahiclr_12.txt', header=None, index=None, sep=' ')

# Read the combined data from the text file without headers
df = pd.read_csv('ahiclr_12.txt', header=None, sep=' ')

# Add headers to the DataFrame
df.columns = ['date', 'reception', 'used']

# Convert the 'date' column to datetime
df['date'] = pd.to_datetime(df['date'])

# Create three separate subplots
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(10, 8), sharex=True)

# Plot 'reception' data on the first subplot
ax1.plot(df['date'], df['reception'], label='Reception', color='blue')
ax1.set_ylabel('Reception Data', color='blue')
ax1.tick_params(axis='y', labelcolor='blue')

# Plot 'used' data on the second subplot
ax2.plot(df['date'], df['used'], label='Used', color='green')
ax2.set_ylabel('Used Data', color='green')
ax2.tick_params(axis='y', labelcolor='green')

# Set the x-axis interval to 6 months
ax3.xaxis.set_major_locator(mdates.MonthLocator(interval=6))
ax3.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
ax3.set_xlabel('year', color='black')

ytick_positions = [0, 100000, 200000, 300000, 400000, 500000, 600000]
ytick_labels = [str(position) for position in ytick_positions]
ax1.set_yticks(ytick_positions)
ax1.set_yticklabels(ytick_labels, rotation=45)

# Add legends for both subplots
ax1.legend(loc='upper left', bbox_to_anchor=(0.0, 1.05))
ax2.legend(loc='upper left', bbox_to_anchor=(0.0, 0.95))

# Calculate the percentage of reception data that is used
df['percentage_used'] = (df['used'] / df['reception']) * 100

# Print the DataFrame with the new 'percentage_used' column
print(df)

# Plot the percentage data
ax3.plot(df['date'], df['percentage_used'], label='Percentage Used', color='red')
ax3.set_ylabel('Percentage Used')
ax3.legend()
ax3.grid(True)

# Define y_ticks and set y-tick labels for ax3
y_ticks = [0, 0.2, 0.4, 0.6, 0.8, 1.0]
ax3.set_yticks(y_ticks)
ax3.set_yticklabels([f'{y}%' for y in y_ticks])

# Save the plot as an image
plt.savefig('Percentage_Used_12.png')

# Display the plot
plt.show()

Output:
Chapter 4
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Define the folder path and file list
folder_path = '/home/payal_komal/mihir_to_linux/mihir_new'
file_list = ['date.dat', 'reception_18.txt', 'used_18.txt']

# Initialize a DataFrame with the first file to store the combined data
main_dataframe = pd.DataFrame(pd.read_table(file_list[0]))

# Loop through the remaining files and concatenate data column-wise
for i in range(1, len(file_list)):
    data = pd.read_table(file_list[i])
    df = pd.DataFrame(data)
    main_dataframe = pd.concat([main_dataframe, df], axis=1)

# Export the combined data to a text file without headers
main_dataframe.to_csv('ahiclr_18.txt', header=None, index=None, sep=' ')

# Read the combined data from the text file without headers
df = pd.read_csv('ahiclr_18.txt', header=None, sep=' ')

# Add headers to the DataFrame
df.columns = ['date', 'reception', 'used']

# Convert the 'date' column to datetime
df['date'] = pd.to_datetime(df['date'])

# Create three separate subplots
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(10, 8), sharex=True)

# Plot 'reception' data on the first subplot
ax1.plot(df['date'], df['reception'], label='Reception', color='blue')
ax1.set_ylabel('Reception Data', color='blue')
ax1.tick_params(axis='y', labelcolor='blue')

# Plot 'used' data on the second subplot
ax2.plot(df['date'], df['used'], label='Used', color='green')
ax2.set_ylabel('Used Data', color='green')
ax2.tick_params(axis='y', labelcolor='green')

# Set the x-axis interval to 6 months
ax3.xaxis.set_major_locator(mdates.MonthLocator(interval=6))
ax3.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
ax3.set_xlabel('year', color='black')

ytick_positions = [0, 100000, 200000, 300000, 400000, 500000, 600000]
ytick_labels = [str(position) for position in ytick_positions]
ax1.set_yticks(ytick_positions)
ax1.set_yticklabels(ytick_labels, rotation=45)

# Add legends for both subplots
ax1.legend(loc='upper left', bbox_to_anchor=(0.0, 1.05))
ax2.legend(loc='upper left', bbox_to_anchor=(0.0, 0.95))

# Calculate the percentage of reception data that is used
df['percentage_used'] = (df['used'] / df['reception']) * 100

# Print each percentage value on its own line
for percentage in df['percentage_used']:
    print(percentage)

# Print the DataFrame with the new 'percentage_used' column
print(df)

# Plot the percentage data
ax3.plot(df['date'], df['percentage_used'], label='Percentage Used', color='red')
ax3.set_ylabel('Percentage Used')
ax3.legend()
ax3.grid(True)

# Define y_ticks and set y-tick labels for ax3
y_ticks = [0, 0.2, 0.4, 0.6, 0.8, 1.0]
ax3.set_yticks(y_ticks)
ax3.set_yticklabels([f'{y}%' for y in y_ticks])

# Save the plot as an image
plt.savefig('Percentage_Used_18.png')

# Display the plot
plt.show()

Output:
Code Overview
This code is a Python script that performs several tasks related to data manipulation, visualization, and export
using the pandas library for data handling and matplotlib for plotting. Let's break down the code step by step:

This report documents the analysis and visualization of data obtained from

'Reception' and 'Used' sources. The data was processed using Python and Matplotlib to create informative
graphs. This document outlines the purpose, methodology, and results of the analysis.

1. Importing Libraries:

- `pandas` is imported as `pd` for data manipulation.

- `matplotlib.pyplot` is imported as `plt` for creating plots.

- `matplotlib.dates` is imported as `mdates` for date-related formatting in plots.

2. Defining Folder Path and File List:

- `folder_path` specifies the directory where the data files are located.

- `file_list` is a list of three file names that you want to read and combine.

3. Initializing the Combined DataFrame:

- `main_dataframe` is created from the first file in the list. It will be used to store the combined data from
the remaining files.

4. Reading and Combining Data:


- The code then enters a loop to read and concatenate data from the files specified in `file_list`. It
iterates through the files and reads each one using `pd.read_table`.

- Each file's data is stored in a temporary DataFrame called `df`, and then `main_dataframe` is updated
by concatenating these `df` objects horizontally using `pd.concat`.

5. Exporting Combined Data:

- After combining all the data, `main_dataframe` is saved to a text file named "ahiclr_xx.txt" without
headers and using space (' ') as a separator.

6. Reading Combined Data:

- The code reads the "ahiclr_xx.txt" file into a new DataFrame named `df`, specifying that there are
no headers and that the separator used is a space (' ').

7. Adding Headers:

- The script adds headers to the DataFrame `df` for clarity. The columns are named 'date,' 'reception,'
and 'used.'

8. Converting 'date' Column to DateTime:

- The 'date' column is converted to a datetime format using `pd.to_datetime`.

This allows for proper date formatting in the plots.

9. Creating Subplots:

- Three separate subplots (`ax1`, `ax2`, and `ax3`) are created using

`plt.subplots`. These subplots will be used to visualize different aspects of the data.
10. Plotting 'reception' and 'used' Data:

- Data from the 'reception' and 'used' columns of the DataFrame are plotted on the first and second
subplots, respectively.

- Axis labels and tick parameters are set for each subplot.

11. Formatting X-Axis:

- The x-axis of the third subplot (`ax3`) is formatted to display the date at 6-month intervals.

12. Adding Legends:

- Legends are added to the first and second subplots to label the data series.

13. Calculating and Printing Percentage Used:

- The code calculates the percentage of 'used' data relative to 'reception' data and stores it in a new
column 'percentage_used' in the DataFrame.

- It then iterates through the 'percentage_used' values and prints them.

14. Plotting Percentage Data:

- The 'percentage_used' data is plotted on the third subplot (`ax3`), labeled as 'Percentage Used,' and
the y-axis is formatted as percentages.

15. Saving the Plot:

- The script saves the entire figure as an image named "Percentage_Used_xx.png".

16. Displaying the Plot:

- Finally, the script displays the plot on the screen.

This code essentially reads, combines, and visualizes data from multiple text files and then saves the resulting
plot as an image file.
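
Since the four chapter scripts differ only in the forecast-cycle suffix ('00', '06', '12', '18') and in the y-axis tick values, they could in principle be collapsed into a single parameterised function. The sketch below is a suggested refactoring along those lines, not the code actually used during the internship; it also unifies the output file name pattern to ahiclr_<cycle>.txt.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

def plot_cycle(cycle):
    # Combine the date, reception, and used files for one cycle, column-wise
    files = ['date.dat', f'reception_{cycle}.txt', f'used_{cycle}.txt']
    combined = pd.concat([pd.read_table(f) for f in files], axis=1)
    combined.to_csv(f'ahiclr_{cycle}.txt', header=None, index=None, sep=' ')

    # Read the combined file back with proper column names and date parsing
    df = pd.read_csv(f'ahiclr_{cycle}.txt', header=None, sep=' ',
                     names=['date', 'reception', 'used'])
    df['date'] = pd.to_datetime(df['date'])
    df['percentage_used'] = df['used'] / df['reception'] * 100

    # Three stacked subplots: reception, used, and percentage used
    fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(10, 8), sharex=True)
    ax1.plot(df['date'], df['reception'], label='Reception', color='blue')
    ax2.plot(df['date'], df['used'], label='Used', color='green')
    ax3.plot(df['date'], df['percentage_used'], label='Percentage Used', color='red')
    ax3.xaxis.set_major_locator(mdates.MonthLocator(interval=6))
    ax3.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
    for ax in (ax1, ax2, ax3):
        ax.legend(loc='upper left')
    ax3.grid(True)

    plt.savefig(f'Percentage_Used_{cycle}.png')
    plt.show()

for cycle in ['00', '06', '12', '18']:
    plot_cycle(cycle)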

Tasks and Activities : Detail the tasks carried out during the project,

specifying how they were completed and who was responsible.

Challenges and Risks: Discuss any challenges or risks encountered during

the project and how they were mitigated or resolved.

Quality Assurance: Explain how quality was ensured throughout the project, including testing, reviews,
and audits.

Project Outcomes : Highlight the final results of the project, emphasizing

whether the initial objectives were met.

Data Analysis: Present relevant data, statistics, and analysis that support the project's outcomes.

Lessons Learned: Summarize key lessons learned from the project, including what worked well and what
could be improved in future projects.

Clarity and Conciseness: Ensure the report is clearly written and concise.

Use plain language where possible, and avoid jargon unless necessary.

Visual Elements: Incorporate visual aids such as charts, graphs, and tables to help convey information
more effectively.

Consistency: Maintain consistent formatting and structure throughout the report, including headings,
font styles, and numbering.

Proofreading and Review: Carefully proofread the report and have it

reviewed by others to ensure accuracy and completeness.



Work Breakdown Structure (WBS): Present a hierarchical breakdown of project tasks.

Risk Management Plan: Provide a detailed list of potential risks, their likelihood, impact, and
mitigation strategies. Include a risk register if applicable.

Gantt Chart or Timeline: Visualize the project timeline, showing key

milestones, task dependencies, and deadlines.

Problem Statement: Describe the problem or opportunity that prompted the project. Explain why the
project was initiated and the context surrounding it.

Project Justification: Outline the reasons for undertaking the project, including expected benefits,
return on investment, or strategic alignment with the organization.
