
Original Work Log

Anish Saripella

Day(s) Time Tasks Accomplished Reflection Next Goals

12/17/2019, 10:30 AM - 11:30 AM (1 hour)
Tasks Accomplished: I talked with my mentor, Mr. Madan, about what the Original Work entails and about my original work idea. The idea was to create a model that would predict whether a person is at risk for developing diabetes. The variables in this model would have included the person's BMI, family history of diabetes, and metabolism rate. We discussed my true passion and identified it as the use of data science in sports. Mr. Madan told me the model predicting diabetes is a great model, but that I would be more motivated and would learn more if I followed an idea that I felt passionate about.
Reflection: I really liked the diabetes idea at the time, but I am glad Mr. Madan offered his advice because I am very excited to apply data science in the sports industry.
Next Goals: Come up with multiple Original Work ideas that will incorporate data science in the sports industry.

02/04/2020, 10:30 AM - 11:30 AM (1 hour)
Tasks Accomplished: We discussed each of the ideas I came up with in great detail. We talked about the variables involved in each model and the limitations and benefits of each idea. We came to the conclusion that I would predict the salary of an NBA player based on the previous year's stats. We chose this idea because the salary idea was more generic compared to the other ideas we discussed, and most of the variables were already identified. The dataset at hand was also already in CSV form and was already cleaned. We also discussed some of the research and skills that would go into completing the project.
Reflection: I learned a lot about the process data scientists go through to see what resources they have and what resources they need to solve the problem at hand. I also decided on the idea, and looking back at it, I believe I made the right choice.
Next Goals: Work on the template provided by Mr. Madan to see what steps I have already completed and analyze the graph to understand the process. https://drive.google.com/file/d/1aGwYlVAKNE61GcFwbCYW6tDvDrxZF7m1/view?usp=sharing

02/08/2020, 2:30 PM - 3:30 PM (1 hour)
Tasks Accomplished: Analyzed and thought about the processes mentioned in the graphic. Built a rough timeline of approximately how long each step will take and decided which steps to skip. Worked on a few steps to get a head start.
Reflection: This graphic helped me visualize how the model was going to be built. It is a great tool, especially for someone who has not built a model before.
Next Goals: Meet with Mr. Madan to create an elaborate plan and a timeline for the Original Work.

02/12/2020, 11:00 AM - 12:15 PM (1 hour 15 min)
Tasks Accomplished: We further discussed the idea itself and highlighted some of my limitations and areas that need improvement. In the second part of the meeting, Mr. Madan and I outlined a plan for me to follow to achieve my goal of completing my Original Work. The plan first laid out the steps that I would need to take. The first step is analyzing the kernels on the Kaggle dataset, which would help me learn about the variables and the impact they have on the algorithm. Then I would have to customize the code to my specifications and choose the correct models and variables. After that, I have to revise the model and fix the code based on the performance indicators. We also identified a timeline that will help me stay on track.
Reflection: A detailed plan is the basis of a successful project. The plan we laid out helps me complete the project with time to spare. We discussed the specifics and what I will have to do to accomplish this project.
Next Goals: Start annotating the various notebooks belonging to the kernels present on the Kaggle dataset.

02/15/2020, 09:30 PM - 10:30 PM (1 hour)
Tasks Accomplished: Annotated the Inferential Statistics Notebook. Learned about some correlations between certain variables and about some functions to build such graphs in R.
Reflection: Learned about some statistical functions and graphing functions in Python. It also focused on the correlations between variables. I also had some questions regarding the coding part of the Notebook.
Next Goals: Continue annotating Inferential Statistics.

02/16/2020, 10:00 PM - 10:45 PM (45 min)
Tasks Accomplished: Continued annotating Inferential Statistics. Analyzed the graphs provided in the notebook to gain a better understanding of the relationship between the salary and the variables.
Reflection: Graph analysis plays a huge role in understanding the relationship between the main object and the surrounding variables.
Next Goals: Complete the annotation of the Inferential Statistics Notebook and get started on the Multiple Regression Notebook.

02/24/2020, 7:30 PM - 8:30 PM (1 hour)
Tasks Accomplished: Completed the annotation of the Inferential Statistics Notebook. Began annotating the Multiple Regression Notebook. Analyzed the creator's comments on the code and graphs provided.
Reflection: By analyzing the creator's comments, I was able to better understand his reasoning for using certain models over others.
Next Goals: Focus on the code and continue annotating the Multiple Regression Notebook.

02/29/2020, 09:00 AM - 10:00 AM (1 hour)
Tasks Accomplished: Annotated the code for the Multiple Regression Notebook. Analyzed why he used the parallel slopes model and the linear regression model.
Reflection: Understood the importance of using the right models and part of the code behind this kernel.
Next Goals: Complete the annotation of Multiple Regression in the next sitting.

03/01/2020, 10:00 AM - 12:00 PM (2 hours)
Tasks Accomplished: Completed the annotations of the code. Researched the statistics of choosing the right variable. The calculated p-value needs to be less than the significance level (alpha) for us to keep the variable. If it is over that value, we cannot reject the variable's null hypothesis, and therefore the variable is no longer a part of the model.
Reflection: Understood some of the stats behind choosing the right variables. I am also done with annotating the code.
Next Goals: The next step is to modify this code to my liking and include the variables I see fit. I also need to decide on the regression I will be using.
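The keep-or-drop rule described in this entry can be sketched in a few lines of Python. This is only an illustration, not the notebook's code: the function name `select_variables`, the alpha of 0.05, and the p-values below are all assumptions made up for the example.

```python
ALPHA = 0.05  # assumed significance level; the log does not say which value was used


def select_variables(p_values, alpha=ALPHA):
    """Split variables into kept and dropped lists by comparing each p-value to alpha."""
    # Keep a variable only when its p-value is below alpha (null hypothesis rejected).
    kept = [name for name, p in p_values.items() if p < alpha]
    # Otherwise we fail to reject the null hypothesis and leave the variable out.
    dropped = [name for name, p in p_values.items() if p >= alpha]
    return kept, dropped


# Hypothetical p-values, invented for illustration only (not results from the dataset).
example_p_values = {"points_per_game": 0.001, "age": 0.03, "jersey_number": 0.62}

kept, dropped = select_variables(example_p_values)
print("keep:", kept)
print("drop:", dropped)
```

In practice the p-values would come from the fitted regression rather than being typed in by hand.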

03/03/2020, 11:20 AM - 12:00 PM (40 min)
Tasks Accomplished: Explored customizing code by deciding on a few variables and the regression models based on the annotations on the Notebook. Researched the parallel slopes model to get a better understanding of it, as it was used in the Multiple Regression Notebook.
Reflection: I plan on implementing both linear regression and multiple regression to see which one better fits the dataset.
Next Goals: Start building my own model.

03/08/2020, 11:00 AM - 12:15 PM (1 hour 15 min)
Tasks Accomplished: Further customized code to my liking. Watched many videos regarding how to accurately input datasets into the Python program. I also explored some of the functions offered by the Pandas software library.
Reflection: It was the first time I was able to code in Python by myself. Learning Java definitely made it somewhat easier to code in this new language.
Next Goals: We need to explore more statistical functions offered by Pandas and also meet with Mr. Madan to discuss the next steps.
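A minimal sketch of what loading a CSV dataset with Pandas and pulling basic statistics might look like is shown here. The inline data, the column names, and the helper `load_stats` are hypothetical stand-ins, not the actual Kaggle file or the code written during this session.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Kaggle CSV; players, columns, and values are invented.
CSV_TEXT = """player,points_per_game,assists_per_game,salary_millions
Player A,25.0,5.0,30.0
Player B,12.0,3.0,8.0
Player C,18.0,7.0,15.0
Player D,8.0,2.0,4.0
"""


def load_stats(source):
    """Read a CSV file path or file-like object into a DataFrame."""
    return pd.read_csv(source)


df = load_stats(io.StringIO(CSV_TEXT))

# Summary statistics (count, mean, std, min, quartiles, max) for the numeric columns.
summary = df.describe()

# Correlation of every numeric column with salary.
correlations = df.select_dtypes(include="number").corr()["salary_millions"]

print(summary)
print(correlations)
```

With a real file, `load_stats` would be given the path to the CSV instead of the in-memory text.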

03/13/2020, 2:00 PM - 3:00 PM (1 hour)
Tasks Accomplished: Watched videos on more Pandas functions to better find the key statistical components needed to analyze the data. I also watched videos and learned how to compose graphs and other statistical figures in R.
Reflection: I learned more about coding in Python and truly understand the importance of code in data science.
Next Goals: Reschedule the meeting with Mr. Madan to discuss the next steps and ask the questions identified.

03/16/2020, 5:00 PM - 6:30 PM (1 hour 30 min)
Tasks Accomplished: Looked at other kernels in the notebook to explore other inferential stats and other models that people used. Also looked at different ways people used the dataset and at the types of functions people used in R and Python, because I was confused about how to use certain functions.
Reflection: The jargon in Python is very different from Java. I need to learn how to truly code in Python and ask for help because I don't truly understand how to write the code. However, I do understand the use of the code and how it affects the project.
Next Goals: Talk with Mr. Madan ASAP to discuss the next steps. I am falling behind schedule and somewhat stuck where I am.
