Data Cleanliness Analysis:

I selected a dataset from the World Bank, focusing on global energy consumption statistics,
renewable energy percentages, and carbon emissions. While the dataset is generally of high
quality, I did encounter a few cleanliness issues that required attention before proceeding
with the analysis.

1. Outliers:
o Issue: Some outliers were identified in the renewable energy percentage data,
potentially skewing the analysis and visualizations.
o Action Taken: I applied a statistical approach, utilizing the Interquartile
Range (IQR), to identify and remove outliers. This step was crucial to prevent
these extreme values from disproportionately influencing the overall trends
and patterns.
2. Missing Data Values:
o Issue: There were a few missing values in the carbon emissions dataset for
certain years and countries.
o Action Taken: I imputed each missing value with the mean of the available
data for the respective country. This kept the affected countries and years in
the analysis instead of dropping them.
3. Data Type Verification:
o Issue: The data types in some columns didn't correspond accurately to the
values stored, leading to potential misinterpretation.
o Action Taken: I conducted a thorough review of the data types and corrected
discrepancies. For example, I ensured that numeric columns indeed contained
numeric values, avoiding any potential errors in subsequent analyses.
4. Normalization of Values:
o Issue: The scale of data in the renewable energy percentage column varied
widely, potentially affecting the visual representation.
o Action Taken: To provide a more normalized view, I applied Min-Max
scaling to the renewable energy percentage column, mapping its values onto a
common 0-to-1 range for comparison and visualization. Because Min-Max
scaling is itself sensitive to extreme values, this step depends on the
outlier removal described in item 1.
5. Problematic Patterns or Trends:
o Issue: Initial visualization suggested erratic patterns in the carbon emissions
data for specific regions.
o Action Taken: I conducted a deeper investigation into these patterns,
identifying and rectifying discrepancies in the data collection process. This
step was essential for ensuring the accuracy of trends and preventing
misleading interpretations.
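
As an illustration only, steps 1 to 4 above could be sketched in Python with pandas roughly as follows. The column names (country, renewable_pct, co2_emissions, renewable_scaled), the 1.5 x IQR fence, and the order of operations (types are coerced first so the numeric steps can run) are assumptions made for this sketch; the cleaning described above was not necessarily carried out this way.

```python
import pandas as pd

# Hypothetical column names; the actual dataset schema is not given here.
COUNTRY = "country"
RENEWABLE = "renewable_pct"
CO2 = "co2_emissions"


def coerce_numeric(df, columns):
    """Step 3: force columns to a numeric dtype; unparseable entries become NaN."""
    out = df.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


def drop_iqr_outliers(df, column, k=1.5):
    """Step 1: drop rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    inside = df[column].between(q1 - k * iqr, q3 + k * iqr)
    # Rows with a missing value are kept; they are not outliers.
    return df[inside | df[column].isna()].copy()


def impute_country_mean(df, column, group=COUNTRY):
    """Step 2: fill missing values with the mean of the same country's data."""
    out = df.copy()
    country_mean = out.groupby(group)[column].transform("mean")
    out[column] = out[column].fillna(country_mean)
    return out


def min_max_scale(series):
    """Step 4: rescale a numeric series linearly onto [0, 1]."""
    lo, hi = series.min(), series.max()
    if hi == lo:  # constant column: avoid division by zero
        return series * 0.0
    return (series - lo) / (hi - lo)


def clean_energy_data(raw):
    """Run the four steps in an assumed order and return a cleaned copy."""
    df = coerce_numeric(raw, [RENEWABLE, CO2])
    df = drop_iqr_outliers(df, RENEWABLE)
    df = impute_country_mean(df, CO2)
    df["renewable_scaled"] = min_max_scale(df[RENEWABLE])
    return df
```

Calling clean_energy_data on a DataFrame with those three columns returns a cleaned copy that could then be exported for use in Tableau. A country with no emissions values at all has no mean to impute from, so its gaps would remain missing in this sketch.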

Why These Steps:

• The removal of outliers and imputation of missing values aimed to enhance the
accuracy and reliability of the dataset, ensuring a more robust foundation for analysis.
• Verification of data types and normalization of values were undertaken to prevent
potential misinterpretations and provide a standardized scale for comparison.
• Addressing problematic patterns or trends was crucial to maintain the integrity of the
analysis and avoid misleading visualizations.

In summary, these data cleaning steps were essential to improving the reliability of the dataset
and laying the groundwork for meaningful and accurate analyses and visualizations in Tableau.
Each action was guided by the overarching goal of producing insights that genuinely reflect
the trends and patterns in global energy consumption, renewable energy adoption, and carbon
emissions.
