
AMITY UNIVERSITY

Uttar Pradesh Lucknow Campus


Amity Institute of Information Technology
WEEKLY PROGRESS REPORT
For the week commencing from: 5 June 2023 – 9 June 2023

WPR No. 1

Name of the student: Shaurya Upadhyay

Enrollment Number: A7304822105

Program: Bachelor of Computer Applications (2019-2022) Semester: II

Name of the Non-Teaching Credit Course: Summer Internship – 1 [ETTP100]

Organization Name: AIIT, Amity University Uttar Pradesh, Lucknow

Faculty Guide’s Name: Dr. Ajay Pratap

Project Title: Detecting Fraud Apps Using Sentiment Analysis Using Machine Learning

Targets set for the week:

Here is a breakdown of suggested targets for each day of the week in the project "Detecting Fraud Apps Using Sentiment Analysis Using Machine Learning":

Monday:
1. Gather a comprehensive dataset of app reviews, including both legitimate and fraudulent samples, from various app marketplaces or sources.
2. Preprocess the dataset by cleaning and normalizing the text data, removing irrelevant information, and handling any missing or noisy data.
3. Explore and implement different sentiment analysis techniques, such as lexicon-based approaches, machine learning models, or pre-trained language models, to extract sentiment information from the app reviews.
4. Evaluate the performance of the sentiment analysis models using appropriate metrics and select the most effective approach for further development.

Tuesday:
1. Design and implement a feature extraction pipeline that transforms the preprocessed app reviews into numerical features, considering aspects such as sentiment scores, textual representations, or other relevant information.
2. Explore additional features or metadata that can be incorporated into the feature extraction process to enhance fraud detection, such as user demographics, app ratings, or temporal information.
3. Analyze the feature importance and relevance to fraud detection using techniques like correlation analysis or feature selection algorithms.
4. Prepare the feature dataset, ensuring it is properly formatted and compatible with the selected machine learning algorithms.

Wednesday:
1. Research and select suitable machine learning algorithms for fraud detection, such as logistic regression, support vector machines, random forests, or neural networks.
2. Implement the chosen machine learning algorithms and train them using the prepared feature dataset, applying appropriate data splitting techniques (e.g., cross-validation) to assess their performance.
3. Optimize the hyperparameters of the machine learning models using techniques like grid search, random search, or Bayesian optimization to improve their accuracy and generalization capabilities.
4. Evaluate the trained machine learning models using various performance metrics, including accuracy, precision, recall, F1-score, or area under the ROC curve.

Thursday:
1. Explore techniques to handle class imbalance in the dataset, as fraud cases are typically rare compared to legitimate cases. Consider approaches like oversampling, undersampling, or class-weighted techniques to address this issue.
2. Investigate ensemble learning methods, such as bagging or boosting, to combine the predictions of multiple machine learning models and improve the overall fraud detection performance.
3. Implement and train the ensemble models, evaluating their performance and comparing them against individual models.
4. Conduct further analysis and interpretation of the model results, such as analyzing the model's decision boundaries or generating explanations for fraud detection outcomes.

Friday:
1. Fine-tune the fraud detection models based on the insights and analysis from previous days, making necessary adjustments to improve their performance.
2. Conduct extensive testing and evaluation of the final fraud detection models using separate test datasets or real-world app reviews to assess their effectiveness and robustness.
3. Validate the models against known fraud cases or ground truth labels, analyzing their performance on different types of fraudulent activities.
4. Document the evaluation results, including performance metrics, model comparisons, and any refinements made to the fraud detection models.

Saturday:
1. Prepare a comprehensive project report summarizing the methodology, findings, and conclusions of the project.
2. Include detailed descriptions of the dataset, preprocessing steps, sentiment analysis techniques, feature extraction pipeline, machine learning algorithms, and evaluation metrics.
3. Present the performance results, including accuracy, precision, recall, F1-score, or area under the ROC curve, highlighting the effectiveness of the developed fraud detection system.
4. Reflect on the overall project journey, lessons learned, and potential future improvements or extensions to the fraud detection system.

Remember to adjust the targets based on the specific requirements and timeline of your project. Regularly communicate with your team members, seek guidance from mentors or advisors, and track progress to ensure successful completion of the project.

Achievements for the week:

Here are suggested achievements for each day of the week in the project "Detecting Fraud Apps Using Sentiment Analysis Using Machine Learning":

Monday:
1. Conducted a comprehensive literature review on fraud detection in mobile apps, sentiment analysis techniques, and machine learning algorithms.
2. Defined the project's objectives, scope, and success criteria.
3. Set up the development environment, including necessary libraries and frameworks.
4. Gathered a diverse dataset of app reviews, both legitimate and fraudulent, from various sources or app marketplaces.

Tuesday:
1. Preprocessed the dataset by cleaning and normalizing the text data, removing stop words, and handling missing or noisy data.
2. Performed exploratory data analysis on the dataset, gaining insights into the distribution of ratings, review lengths, and sentiment distribution.
3. Implemented sentiment analysis techniques, such as lexicon-based approaches or machine learning models, and evaluated their performance.
4. Selected the most effective sentiment analysis approach for further development.

Wednesday:
1. Designed and implemented a feature extraction pipeline that transformed the preprocessed app reviews into numerical features, incorporating sentiment scores and other relevant information.
2. Explored additional features or metadata to enhance fraud detection, such as user demographics, app ratings, or temporal information.
3. Analyzed feature importance and relevance using techniques like correlation analysis or feature selection algorithms.
4. Prepared the feature dataset, ensuring it was properly formatted and compatible with the chosen machine learning algorithms.

Thursday:
1. Researched and selected suitable machine learning algorithms for fraud detection, such as logistic regression, support vector machines, random forests, or neural networks.
2. Implemented the chosen machine learning algorithms and trained them using the prepared feature dataset.
3. Optimized the hyperparameters of the machine learning models using techniques like grid search, random search, or Bayesian optimization.
4. Evaluated the trained machine learning models using various performance metrics, such as accuracy, precision, recall, F1-score, or area under the ROC curve.

Friday:
1. Explored techniques to handle class imbalance in the dataset, such as oversampling, undersampling, or class-weighted approaches.
2. Investigated ensemble learning methods, such as bagging or boosting, and implemented them to improve the fraud detection performance.
3. Conducted further analysis and interpretation of the model results, such as analyzing decision boundaries or generating explanations for fraud detection outcomes.
4. Fine-tuned the fraud detection models based on insights and analysis.

Saturday:
1. Conducted extensive testing and evaluation of the final fraud detection models using separate test datasets or real-world app reviews.
2. Validated the models against known fraud cases or ground truth labels, analyzing their performance on different types of fraudulent activities.
3. Documented the evaluation results, including performance metrics, model comparisons, and any refinements made to the fraud detection models.
4. Prepared a comprehensive project report summarizing the methodology, findings, and conclusions of the project.

Remember to adjust the achievements based on the specific progress made in your project. Regularly communicate with your team members, seek guidance from mentors or advisors, and track progress to ensure successful completion of the project.

Future work plans:

Here are suggested future work plans for each day of the week in the project "Detecting Fraud Apps Using Sentiment Analysis Using Machine Learning":

Monday:
1. Explore advanced sentiment analysis techniques, such as deep learning models or transformer-based models, to improve the accuracy and granularity of sentiment analysis.
2. Investigate the use of domain-specific sentiment lexicons or pre-trained language models tailored for app reviews to capture nuances and domain-specific language.
3. Experiment with different text preprocessing techniques, such as stemming, lemmatization, or part-of-speech tagging, to further enhance the quality of the text data.
4. Consider incorporating semantic analysis or entity recognition to extract additional information from the app reviews.

Tuesday:
1. Research and implement advanced feature engineering techniques, such as word embeddings (e.g., Word2Vec, GloVe) or contextualized embeddings (e.g., BERT, ELMo), to capture richer semantic information in the app reviews.
2. Explore the use of topic modeling algorithms, such as Latent Dirichlet Allocation (LDA) or Non-Negative Matrix Factorization (NMF), to identify key topics or themes in the app reviews related to fraud.
3. Investigate the potential impact of temporal features, such as review timestamps or release dates, on fraud detection and incorporate them into the feature extraction pipeline.
4. Consider feature engineering techniques specific to fraud detection, such as extracting linguistic patterns or syntactic structures indicative of fraudulent reviews.

Wednesday:
1. Experiment with different machine learning algorithms, such as gradient boosting machines, deep neural networks, or hybrid models, to improve the fraud detection performance.
2. Explore techniques to address model interpretability, such as SHAP values, LIME, or rule-based approaches, to provide explanations for the predictions made by the machine learning models.
3. Investigate the use of active learning strategies to iteratively label and incorporate additional high-impact data points into the training process, improving the model's performance over time.
4. Consider incorporating external data sources, such as social media feeds or user behavior data, to enrich the fraud detection models and capture additional signals.

Thursday:
1. Experiment with ensemble learning techniques, such as stacking, blending, or model averaging, to combine the predictions of multiple machine learning models and further boost fraud detection performance.
2. Investigate techniques for model calibration to ensure well-calibrated probability estimates, allowing for more reliable decision-making.
3. Explore transfer learning approaches, where knowledge learned from related tasks or domains can be leveraged to improve fraud detection in app reviews.
4. Consider the potential impact of adversarial attacks and explore techniques to enhance the robustness of the fraud detection models against such attacks.

Friday:
1. Evaluate the scalability and efficiency of the fraud detection system, considering factors such as computational resources, response time, and memory requirements.
2. Investigate the potential for deploying the fraud detection system in a real-time or streaming environment, enabling continuous monitoring of app reviews for fraud detection.
3. Conduct rigorous testing and validation of the entire system, considering various scenarios, edge cases, and real-world data.
4. Perform a comparative analysis of the developed system against existing fraud detection methods or commercial solutions, highlighting the strengths and limitations of the proposed approach.

Saturday:
1. Prepare detailed documentation of the project, including the implemented methodologies, findings, challenges faced, and lessons learned.
2. Reflect on the limitations and potential ethical considerations of the developed fraud detection system, addressing issues such as bias, privacy, or fairness.
3. Explore possibilities for collaboration or further research in the field of fraud detection in mobile apps, sentiment analysis, or related areas.
4. Present the project's outcomes and insights to stakeholders, such as the project team, advisors, or potential users, and gather feedback for future improvements.

Remember to adjust the plans based on the specific direction and progress of your project.
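The lexicon-based sentiment scoring and precision/recall evaluation described in the plan can be sketched in plain Python. This is a minimal illustration, not the project's actual implementation: the lexicons, reviews, fraud labels, and threshold below are made-up placeholders.

```python
import re

# Toy lexicons; the plan calls for curated, domain-specific lexicons instead.
POSITIVE = {"great", "love", "excellent", "helpful", "good"}
NEGATIVE = {"scam", "fraud", "fake", "terrible", "stole"}

def tokenize(review):
    """Lowercase a review and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", review.lower())

def sentiment_score(review):
    """Positive-minus-negative lexicon hits, normalized by review length; in [-1, 1]."""
    tokens = tokenize(review)
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

def flag_suspicious(review, threshold=-0.1):
    """Flag a review as potentially fraud-related when sentiment is strongly negative."""
    return sentiment_score(review) < threshold

def precision_recall(predictions, labels):
    """Precision and recall for the positive (fraud) class."""
    tp = sum(p and y for p, y in zip(predictions, labels))
    fp = sum(p and not y for p, y in zip(predictions, labels))
    fn = sum((not p) and y for p, y in zip(predictions, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Illustrative reviews with invented fraud labels (True = fraudulent).
reviews = [
    ("Great app, love the design", False),
    ("Total scam, it stole my money", True),
    ("This is a fake app, terrible", True),
    ("Good and helpful tool", False),
]
preds = [flag_suspicious(text) for text, _ in reviews]
print(precision_recall(preds, [label for _, label in reviews]))  # (1.0, 1.0) on this toy data
```

A real pipeline would replace the threshold rule with a trained classifier (logistic regression, random forests, etc., as listed above) and evaluate on a held-out test split rather than the training examples.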

Signature of the Student: ___Shaurya Upadhyay____________________


(Name of Student)

Signature of the Faculty Guide: __________________


(Name of Guide)
