Teep Report Quynh

Southern Taiwan University of Technology
Teep Program
_____________________________________________
Harnessing Anomalies in Solar Data for Efficient Energy Generation
Department：Department of Mechanical Engineering
Instructor：Professor Keh-Moh LIN
Student：Le Quynh Tran
INTRODUCTION
1.1 Background of the TEEP Internship Program in Taiwan

The TEEP (Taiwan Experience Education Program) Internship Program in Taiwan is a
three-month initiative that offers international students an exceptional opportunity for
experiential learning and cultural exchange. The program aims to bridge the gap between
academia and industry by immersing interns in cutting-edge research, industry
collaborations, and diverse cultural experiences. As an intern, I am grateful for the
transformative journey that the TEEP program has provided. It has expanded my
knowledge and skills in my field of study while offering insights into Taiwanese culture.
The program's hands-on approach, industry collaborations, and cultural activities have
enriched my experience and fostered a global network of peers. Overall, the TEEP
program has exceeded expectations, instilling a sense of global citizenship and inspiring
future endeavors in engineering.
1.2 Purpose and Objectives of the Report

This report aims to develop a robust method for detecting outliers in solar panel data,
addressing the need for continuous monitoring in solar panel manufacturing plants
to prevent efficiency problems. Objectives include conducting a literature review for
anomaly detection, collecting and pre-processing relevant data, selecting and
implementing a suitable machine learning algorithm, optimizing its parameters, evaluating
model accuracy and implementing this method on a user-friendly website. By achieving
these goals, this study contributes to the field by providing an effective method for
identifying anomalies in solar panel systems. The results of this report have the potential to
help solar panel plants prevent power loss and financial impacts caused by system
anomalies, ultimately improving overall efficiency, potential and profitability of solar
energy production.
1.3 Problem Statement and Significance of the Research
This research addresses the need for continuous monitoring of solar panel systems in
factories to prevent electricity loss and financial implications. It aims to develop a machine
learning-based methodology for detecting outliers in solar panel data, providing an
automated and efficient solution. The significance of this research lies in its potential to
enhance the efficiency and reliability of solar panel systems, contribute to the prevention
of electricity loss, and promote renewable energy technologies. The outcomes can inspire
advancements in anomaly detection techniques for other industrial systems and support the
global shift towards sustainable energy sources. This research has practical implications
for solar panel factories, renewable energy technologies, and the broader goal of
sustainable energy.
LITERATURE REVIEW
2.1 Overview of Solar Power Systems and Data Monitoring

Solar power systems have become increasingly popular as a sustainable energy source,
converting sunlight into electricity through photovoltaic panels. To ensure their optimal
performance, monitoring the data generated by these systems is crucial. By continuously
tracking parameters such as voltage, current, temperature, and irradiance, any deviations
from normal operation can be detected, allowing for timely intervention and preventive
measures.
Data monitoring involves collecting and analyzing real-time data from sensors within the
system, storing it in a centralized database or cloud-based platform for easy access and
analysis. This monitoring serves several purposes. First, it identifies underperforming
components, enabling prompt maintenance or replacement to maximize energy generation.
Second, it detects and diagnoses anomalies such as shading or faulty connections that can
decrease system efficiency. Additionally, long-term data analysis helps assess system
performance trends, identify degradation, and plan for upgrades or improvements.
In summary, data monitoring is essential for optimizing solar power system performance,
detecting anomalies, and extending their lifespan. Effective monitoring techniques and
tools are crucial for reliable operation, whether in residential settings or large-scale solar
farms. By harnessing the power of data, solar power systems can achieve higher efficiency
and contribute to a sustainable energy future.
2.2 Introduction to Machine Learning Algorithms for Anomaly Detection

Machine learning algorithms offer powerful techniques for anomaly detection in various
domains, including solar panel systems. These algorithms utilize historical data patterns to
learn normal behavior and detect deviations or anomalies. Supervised algorithms, such as
Support Vector Machines (SVM) and Random Forests, use labeled data to train a model
that can classify new instances as normal or anomalous. Unsupervised algorithms, such as
Isolation Forest and Gaussian Mixture Models, identify anomalies by learning the
underlying data distribution and detecting instances that deviate significantly. Hybrid
approaches, combining both supervised and unsupervised techniques, offer a versatile
approach to anomaly detection. These machine learning algorithms provide effective tools
for identifying outliers and anomalies in solar panel data.
METHODOLOGY
3.1 Data Collection and Preprocessing

Data collection and preprocessing are vital steps in the process of developing a robust
anomaly detection system for solar panel data. Accurate and reliable data is essential to
train machine learning algorithms and ensure effective outlier detection. This section
provides an overview of the data collection and preprocessing techniques involved in the
research.
3.1.1 Data Collection

The first step in data collection is to identify the key parameters that need to be monitored,
such as irradiance, voltage, current, and temperature. Sensors and instruments are installed
in the solar panel system to continuously measure and record these parameters. The data
collection process involves setting up data acquisition systems that capture real-time data
from the sensors at regular intervals. This ensures a comprehensive and up-to-date dataset
for analysis.
The data described above was collected from the factory and is stored as a CSV file. To
read and process the CSV file, we use the Pandas, NumPy, Matplotlib, and Seaborn
libraries, among others. The analysis can be carried out in either a Colaboratory notebook
or a Jupyter notebook; here I used a Colaboratory notebook.
The data is loaded from the CSV file with the Pandas library.
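The loading step can be sketched roughly as below. This is only an illustration: the real file name is not given in this report, so a small in-memory sample stands in for the factory CSV, and the "timestamp" column name is an assumption. The measurement column names are the ones that appear in Section 3.1.2.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the factory CSV. The real file name and
# the name of the time column ("timestamp") are assumptions for illustration.
csv_text = """timestamp,I_AC,I_DC,U_AC,U_DC,Irradiance,T_MODULE
2023-01-01 08:00,1.2,1.3,229.8,350.1,210.0,18.5
2023-01-01 09:00,2.9,3.1,230.4,352.7,480.0,24.1
2023-01-01 10:00,4.1,4.4,230.1,351.9,690.0,29.8
"""

# With a real file, the first argument would be the path to the CSV file.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

print(df.head())    # first rows, to confirm the columns were read as expected
print(df.dtypes)    # column types, to confirm numbers were not read as text
```

Parsing the time column as a date at load time makes later steps, such as grouping by month or selecting one day, simpler.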
3.1.2 Data Cleaning

Raw data collected from the sensors may contain errors, missing values, or outliers that
can adversely affect the analysis. Data cleaning involves removing or correcting these
anomalies to ensure the integrity and quality of the dataset. Missing values can be imputed
using techniques like mean, median, or interpolation methods. Outliers can be identified
and treated through statistical methods or domain-specific knowledge.
We first check whether the solar data contains missing values. Missing values could be
imputed, but here I simply remove the affected rows because the dataset is large enough.
The check shows that I_AC, I_DC, U_AC, and U_DC are each missing 2533 values, while
Irradiance and T_MODULE are each missing 2298 values. The rows with missing values are
removed with the df.dropna() function.
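A minimal sketch of this check and removal is shown below. The tiny DataFrame is made-up data that only reuses the column names above; the counts it prints are not the counts from the factory dataset.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; only the column names come from the report.
df = pd.DataFrame({
    "I_AC": [1.2, np.nan, 4.1, 3.0],
    "I_DC": [1.3, np.nan, 4.4, 3.2],
    "U_AC": [229.8, np.nan, 230.1, 230.0],
    "U_DC": [350.1, np.nan, 351.9, 351.0],
    "Irradiance": [210.0, 480.0, np.nan, 520.0],
    "T_MODULE": [18.5, 24.1, np.nan, 26.0],
})

# Count the missing values in each column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Drop every row that has at least one missing value, then renumber the rows.
df_clean = df.dropna().reset_index(drop=True)
print(len(df), "rows before,", len(df_clean), "rows after")
```

Dropping rows is the simplest option and is reasonable when only a small share of the data is affected. If many rows were lost, imputation as described above would be worth considering instead.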
3.1.3 Data Normalization

Data normalization is essential to ensure that all features or parameters are on a similar
scale. Normalizing the data helps prevent any particular feature from dominating the
analysis and ensures fair comparisons between different parameters. Common techniques
for data normalization include min-max scaling or z-score normalization.
We can apply the MinMaxScaler directly to the solar dataset to normalize the input
variables.
We use the default configuration, which scales values to the range 0 to 1. First, a
MinMaxScaler instance is defined with default hyperparameters. Once defined, we call
its fit_transform() function and pass our dataset to it to create a transformed version of
the dataset.
Transform for data training:
Transform for data testing:
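The two transforms can be sketched as follows. This assumes the usual scikit-learn pattern in which the scaler is fitted on the training data only and then reused on the test data; the arrays are made-up values, not the factory data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up values: column 0 plays the role of irradiance, column 1 of voltage.
X_train = np.array([[0.0, 200.0], [500.0, 230.0], [1000.0, 260.0]])
X_test = np.array([[250.0, 215.0], [1200.0, 230.0]])

scaler = MinMaxScaler()  # default feature_range is (0, 1)

# Training data: learn each column's min and max, then scale.
X_train_scaled = scaler.fit_transform(X_train)

# Test data: reuse the min and max learned from the training data.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled)
print(X_test_scaled)
```

Because the test data is scaled with the training minimum and maximum, a test value outside the training range (1200 in the first column here) ends up outside the 0 to 1 range. That is expected and can itself be a hint that the reading is unusual.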
3.1.4 Data Analysis

I created a bar chart to visualize the monthly variations in the Irradiance data, allowing
for easy interpretation and comparison of the data.
I also plotted all properties for a single day on the same line chart. This view is also
used on the website, where users can choose the date they want to see.
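The data behind these two charts can be prepared roughly as below. The hourly data is synthetic, and the "timestamp" column name and the selected date are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data for 1 January to 31 March 2023 (90 days).
n_hours = 24 * 90
timestamps = pd.Timestamp("2023-01-01") + pd.to_timedelta(np.arange(n_hours), unit="h")
hour = timestamps.hour.to_numpy()

# Simple daylight curve: zero at night, peaking at noon.
irradiance = 1000 * np.clip(np.sin(np.pi * (hour - 6) / 12), 0, None)

df = pd.DataFrame({
    "timestamp": timestamps,
    "Irradiance": irradiance,
    "T_MODULE": 20 + 0.02 * irradiance,
})

# Monthly view: one average irradiance value per month (input for a bar chart).
monthly = df.groupby(df["timestamp"].dt.month)["Irradiance"].mean()
monthly.index.name = "month"
print(monthly)

# Daily view: all rows of one chosen date (input for a line chart).
selected = pd.Timestamp("2023-02-15")
one_day = df[df["timestamp"].dt.normalize() == selected]
print(one_day.head())
```

From here, monthly.plot.bar() and one_day.plot(x="timestamp") would draw the two charts with Matplotlib.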
The correlation coefficient measures the strength and direction of the linear
relationship between two variables.
● When the correlation is less than 0, the relationship is inverse: when one variable
increases, the other decreases, and vice versa.
● When the correlation is greater than 0, the relationship is positive: when one
variable increases, the other increases, and when one decreases, the other
decreases.
● When the correlation is 0, the two variables have no linear relationship.
The correlation matrix displays the correlation values between all pairs of data
columns. In practice, it is often used to identify pairs of strongly correlated features,
so that one feature of each such pair can be left out when building the model.
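A short sketch of how such a matrix can be computed and scanned for strongly correlated pairs is given below. The data is synthetic, and the 0.9 threshold is an arbitrary value chosen for the example, not one taken from this study.

```python
import numpy as np
import pandas as pd

# Synthetic data: current and module temperature follow irradiance, while the
# AC voltage is independent of it.
rng = np.random.default_rng(0)
irradiance = rng.uniform(0, 1000, 200)
df = pd.DataFrame({
    "Irradiance": irradiance,
    "I_DC": 0.005 * irradiance + rng.normal(0, 0.1, 200),
    "T_MODULE": 20 + 0.02 * irradiance + rng.normal(0, 2, 200),
    "U_AC": 230 + rng.normal(0, 1, 200),
})

# Pearson correlation between every pair of columns.
corr = df.corr()
print(corr.round(2))

# List each pair once (upper triangle) whose absolute correlation is high.
threshold = 0.9  # illustrative cut-off
columns = list(corr.columns)
strong_pairs = [
    (a, b)
    for i, a in enumerate(columns)
    for b in columns[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
print(strong_pairs)
```

The same matrix can be drawn as a heat map with seaborn.heatmap(corr, annot=True), which makes the strongly correlated pairs easy to spot.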
Points that lie far from the linear regression line can be seen as outliers, as shown below:
3.1.5 Data Splitting

To evaluate the performance of the anomaly detection system, the dataset is divided into
training and testing subsets. The training set is used to train the machine learning
algorithm, while the testing set is used to evaluate its performance. The splitting of data
ensures that the model is trained on a representative sample and can generalize well to
unseen data.
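The split can be sketched with scikit-learn as below. The 80/20 ratio and the random seed are assumptions for illustration; the ratio actually used is not stated in this report.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up feature matrix with 20 samples and 2 features.
X = np.arange(40, dtype=float).reshape(20, 2)

# Hold out 20% of the rows for testing; the seed makes the split repeatable.
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)
```

Only the features are split here because OneClassSVM is trained without labels. For time-ordered sensor data, passing shuffle=False keeps the test period strictly after the training period, which may give a more realistic evaluation.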
Data collection and preprocessing lay the foundation for accurate and reliable anomaly
detection in solar panel data. These steps ensure that the dataset is clean, normalized, and
appropriately prepared for training machine learning algorithms. By meticulously
collecting and preprocessing the data, the research can build a robust anomaly detection
system capable of accurately identifying outliers and anomalies in the solar panel system.
3.2 Selection and Implementation of Machine Learning Algorithm
The OneClassSVM algorithm was chosen for anomaly detection in solar panel data because
it is well suited to unsupervised outlier detection. The algorithm uses a kernel function to
map the data into a higher-dimensional space and constructs a hyperplane, defined by
support vectors, that separates the normal data from the origin with the largest possible
margin. Points that fall on the other side of the hyperplane are treated as outliers. With a
non-linear kernel, OneClassSVM can handle non-linear data distributions, and its nu
parameter lets it tolerate a small fraction of noisy points in the training data.
Implementing the algorithm allowed for the identification of anomalies in solar panel data
by detecting deviations from normal patterns. Overall, OneClassSVM provides a powerful
tool for unsupervised anomaly detection, contributing to the development of a reliable
monitoring system for solar panels.
This is my result:
OneClassSVM creates a boundary around the normal instances in a dataset, treating them
as the only class during training. Instances that fall outside the boundary are considered
outliers. How can the boundary be evaluated?
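A minimal sketch of training and using such a model is shown below. The data is synthetic (scaled irradiance and current that normally rise together), and the kernel, nu, and gamma values are illustrative assumptions, not the tuned parameters of this study.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic "normal" operation, already scaled to roughly 0..1:
# the current follows the irradiance with a little noise.
rng = np.random.default_rng(0)
irradiance = rng.uniform(0.1, 1.0, 300)
current = irradiance + rng.normal(0, 0.02, 300)
X_train = np.column_stack([irradiance, current])

# nu is an upper bound on the share of training points treated as outliers.
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
model.fit(X_train)

X_new = np.array([
    [0.55, 0.55],  # current consistent with irradiance
    [0.95, 0.05],  # strong sunlight but almost no current
])

labels = model.predict(X_new)            # +1 = normal, -1 = outlier
scores = model.decision_function(X_new)  # negative = outside the boundary

print(labels)
print(scores)
```

The sign of decision_function matches predict, and its magnitude indicates how far inside or outside the learned boundary a point lies, which is useful for ranking suspicious readings.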
3.3 Evaluating the Effectiveness of the OneClassSVM Model through Data Visualization
To assess the effectiveness of the OneClassSVM model in detecting anomalies in the solar
panel data, data visualization techniques can be employed. By plotting the data and
visualizing the results, the performance of the model can be evaluated. Here's how the
evaluation can be conducted:
Scatter Plot: A scatter plot can be created to visualize the distribution of the solar panel
data. Normal data points can be plotted in one color, while the outliers detected by the
OneClassSVM model can be highlighted in a different color. This allows for a visual
inspection of how well the model identifies and separates anomalies from the normal data
instances.
Decision Boundary Diagram: Another visualization technique involves plotting the
decision boundary generated by the OneClassSVM model. This boundary represents
the region that separates the normal data from the anomalies. To draw the best boundary,
experts in solar monitoring can judge what counts as an anomaly by looking at the scatter
chart above. I set the boundary for this problem as follows:
This is the result:
These data visualization techniques make it possible to better understand how
effectively the OneClassSVM model detects anomalies in the solar panel data. They
support the evaluation of the model's performance and of its potential for real-world
deployment in monitoring solar panel systems.
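The scatter plot and decision boundary described in this section can be sketched as below. This is an illustration on synthetic two-feature data with assumed model parameters; it draws the boundary learned by the model (the zero level of its decision function), not the expert-defined boundary used in this study. The output file name is also only an example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic scaled data: current follows irradiance with some noise.
rng = np.random.default_rng(1)
irradiance = rng.uniform(0.1, 1.0, 300)
X = np.column_stack([irradiance, irradiance + rng.normal(0, 0.03, 300)])

model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X)
labels = model.predict(X)  # +1 = normal, -1 = flagged as outlier

# Evaluate the decision function on a grid that covers the data.
xx, yy = np.meshgrid(np.linspace(0.0, 1.1, 120), np.linspace(-0.1, 1.2, 120))
grid = np.column_stack([xx.ravel(), yy.ravel()])
zz = model.decision_function(grid).reshape(xx.shape)

fig, ax = plt.subplots(figsize=(6, 5))
ax.contourf(xx, yy, zz, levels=20, cmap="Blues", alpha=0.6)       # score map
ax.contour(xx, yy, zz, levels=[0], colors="black", linewidths=2)  # boundary
ax.scatter(X[labels == 1, 0], X[labels == 1, 1], s=8, c="tab:green", label="normal")
ax.scatter(X[labels == -1, 0], X[labels == -1, 1], s=14, c="tab:red", label="flagged")
ax.set_xlabel("Irradiance (scaled)")
ax.set_ylabel("Current (scaled)")
ax.legend()

out_path = os.path.join(tempfile.gettempdir(), "decision_boundary.png")
fig.savefig(out_path, dpi=120)
plt.close(fig)
```

Comparing where the black contour falls with what a domain expert considers abnormal is one practical way to judge whether nu and gamma need adjusting.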
3.4 Design and Deployment of the Anomaly Detection Model on the Website
To make the anomaly detection model accessible and user-friendly, it can be designed and
deployed on a website using Docker and Flask API. This allows users to interact with the
model and obtain anomaly predictions conveniently. Here's an outline of the design and
deployment process:
These are the files in the directory you need:

Model Integration: The trained OneClassSVM model for anomaly detection needs to be
integrated into the Flask web application. This involves loading the model and necessary
dependencies into the application's codebase.
Web Interface Design: Create an intuitive and user-friendly web interface where users can
input their solar panel data for anomaly detection. Design the interface to accept relevant
parameters such as irradiance, voltage, and current values.
Dockerization: Dockerize the Flask application to ensure easy deployment and portability.
Create a Dockerfile that specifies the application's dependencies, configurations, and
runtime environment.
Containerization and Deployment: Build a Docker container using the Dockerfile and
deploy it on a web server or cloud platform of your choice. This allows the web
application, along with the anomaly detection model, to be easily deployed and scaled.
API Development: Implement a Flask API to expose the anomaly detection functionality.
This enables developers or other applications to interact with the model programmatically,
allowing for integration with external systems.
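The model integration and API steps above can be sketched as a small Flask application. Everything specific here is an assumption for illustration: the /predict route, the JSON field names, and the stand-in model trained on synthetic data at start-up. The actual application would load the trained OneClassSVM model and scaler from its own files.

```python
import numpy as np
from flask import Flask, jsonify, request
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import OneClassSVM

# Stand-in model trained on synthetic data so the sketch is self-contained.
rng = np.random.default_rng(0)
irradiance = rng.uniform(100, 1000, 300)
current = 0.005 * irradiance + rng.normal(0, 0.1, 300)
X_train = np.column_stack([irradiance, current])
scaler = MinMaxScaler().fit(X_train)
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(scaler.transform(X_train))

app = Flask(__name__)


@app.route("/predict", methods=["POST"])  # hypothetical route name
def predict():
    payload = request.get_json(silent=True) or {}
    try:
        row = [[float(payload["irradiance"]), float(payload["current"])]]
    except (KeyError, TypeError, ValueError):
        return jsonify(error="expected numeric 'irradiance' and 'current'"), 400
    # Apply the same scaling as in training, then classify the reading.
    label = int(model.predict(scaler.transform(row))[0])
    return jsonify(anomaly=(label == -1))


# Exercise the endpoint in-process, without starting a server.
with app.test_client() as client:
    normal = client.post("/predict", json={"irradiance": 550, "current": 2.75})
    faulty = client.post("/predict", json={"irradiance": 950, "current": 0.7})
    invalid = client.post("/predict", json={"irradiance": "abc"})
    print(normal.get_json(), faulty.get_json(), invalid.status_code)
```

Inside the container, the application would be started with a call such as app.run(host="0.0.0.0", port=8050) or by a production WSGI server, so that it is reachable at the address given in the run steps.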
Steps to run the application:

● Step 1: Run command
$ docker compose up --build -d --force-recreate
● Step 2: Wait about 5 minutes, until the container log shows something like this
● Step 3: Go to http://127.0.0.1:8050/
This is my result:
By following these steps, the anomaly detection model can be seamlessly integrated into a
user-friendly web application. Users can access the website, input their solar panel data,
and receive real-time anomaly predictions. The use of Docker ensures easy deployment
and scalability, while the Flask API enables seamless integration with other systems. This
design and deployment approach improves the accessibility and usability of the anomaly
detection model in monitoring solar panel systems.
CONCLUSION
4.1 Summary of Findings and Contributions

In this research, a comprehensive analysis of anomaly detection in solar panel systems was
conducted using the OneClassSVM machine learning algorithm. The findings highlight the
importance of 24/7 monitoring in solar panel factories to optimize performance and
prevent revenue loss. The application of the OneClassSVM algorithm demonstrated
effective outlier detection, enabling the identification of anomalies in irradiance, voltage,
and power generation efficiency. The design and deployment of the anomaly detection
model on a website using Docker and Flask API provided a user-friendly interface for real-
time anomaly predictions. This research contributes to the development of a reliable and
accessible system for monitoring and maintaining the efficiency of solar panel systems,
ultimately aiding in sustainable energy generation.
4.2 Closing Remarks and Reflection on the TEEP Program

Reflecting on my experience in the TEEP program in Taiwan, I am filled with gratitude for
the invaluable opportunities and experiences it has provided. During my three-month
internship, I had the chance to visit numerous beautiful places in Taiwan, immersing
myself in its rich culture and heritage. The cultural exchanges and school activities
allowed me to broaden my horizons and gain a deeper understanding of Taiwanese
traditions and customs.
Furthermore, the TEEP program facilitated the formation of meaningful connections and
friendships with fellow interns from different parts of the world. Collaborating with them
on projects and sharing our diverse perspectives was truly enriching. The program fostered
a supportive and collaborative environment, promoting personal growth and professional
development.
In addition, the TEEP program equipped me with practical skills and knowledge through
the hands-on experience gained during my internship. The opportunity to work on the solar
panel anomaly detection project enhanced my understanding of machine learning
algorithms and their application in real-world scenarios. The skills acquired will
undoubtedly contribute to my future academic and professional pursuits.
Overall, my participation in the TEEP program has been a transformative journey filled
with valuable experiences, cultural immersion, and lifelong friendships. I am immensely
grateful for the support and guidance provided by the program organizers and mentors.
The TEEP program has undoubtedly shaped my personal and professional growth, leaving
a lasting impact on my life.
4.3 Acknowledgments and Thanks to Teachers
I am extremely grateful to Professor Kemo and English Teacher Cindy for their invaluable
guidance, support, and mentorship throughout my TEEP journey. Professor Kemo's
expertise in solar panel systems and anomaly detection has been instrumental in
overcoming challenges and deepening my understanding of the subject. I am truly thankful
for his patience and encouragement. English Teacher Cindy's unwavering support and
assistance in improving my language skills have greatly contributed to my overall
development. Her feedback and motivation have played a vital role in my growth as a
student and researcher. I feel privileged to have had the opportunity to work with such
exceptional teachers who have inspired me to strive for excellence. Their mentorship has
left a lasting impact on my personal and academic growth. I extend my sincere
appreciation for their contributions to my TEEP experience.
