
ME204: Data Engineering for the Social World


Data science offers exciting possibilities for social scientists, but it's often underestimated how much time is spent on data cleaning and pre-processing.
This course focuses on essential data handling skills, enabling you to extract insights from your data before applying complex algorithms.
By the end, you'll create visual dashboards that showcase your data-wrangling abilities and emphasize the importance of data cleaning in data science.

✍️ Midterm Assignment
⏲️ Due Date:
Tuesday, 18 July 2023 at 23:59:59 UK time (hopefully, it won’t take you more than a few hours to complete).
This is worth 25% of your final grade.
📝 Instructions:
1. Go to the #general channel of our Slack workspace to find a GitHub Classroom link. Do not share this link with anyone else; this is a private assignment.
2. Click the link, sign in to GitHub if needed, and then click the green “Accept this assignment” button.
3. You will be redirected to a new repository created for you, named me204-2023-midterm-assignment--yourusername, where yourusername is your GitHub username. The repository will be private and will contain only a README.md file with the instructions for the assignment.
4. All the instructions will appear there. Don’t edit the README file; just follow the instructions and complete the assignment.
This assignment, a problem set, incorporates the key topics covered in the first week of the course, such as dplyr, xml2, rvest, writing custom functions, using Quarto Markdown to generate HTML files, and working with GitHub.
“How do I submit?”
You don’t need to click anything to submit. Your assignment will be submitted automatically when you commit AND push your changes to GitHub. You can push your changes as many times as you want before the deadline. We will only grade the last version of your assignment.
✔️ How you will be assessed:
1. You can earn a maximum of 100 points for the whole assignment. The number of points each task is worth is shown next to its name.
2. This assessment is worth 25% of your final grade.
3. You will be assessed on correctness, efficiency, and style. (see expectations below)
4. You will lose points if the filenames are not correct or if you do not follow the instructions precisely.
5. You can only expect to get full marks if you have done a pristine job: we can’t find any mistakes in your code, your code is efficient, and your markdown file is so well formatted and well documented (with comments) that it is an absolute pleasure and joy to read.
Marking scheme expectations
Percentage mark    Letter grade equivalent
80+                A+
70-79              A
65-69              A-
60-64              B+
50-59              B
48-49              B-
42-47              C+
40-41              C
39 or less         F
You should expect a mark around B+ or A- (good and very good scores!) if you have followed all instructions correctly but made some inefficient choices in your code or did not format your markdown well. For instance, you did not create custom R functions where they would have made your code more efficient, you did not use appropriate data types, or the layout and aesthetics of your markdown file were not particularly clear and easy to follow.
You should expect a mark closer to an A if, on top of following all instructions to the letter, your code is so neat and organised that we are impressed, and the HTML produced by your code is well formatted and easy to read.
You should expect more than 70/100 (the upper band of A and beyond) only if, on top of being correct, well formatted and efficient, your submission contains some advanced tidyverse operations and functions that are genuinely impressive, well documented and well reasoned!
You should expect less than 55/100 if you did not follow the instructions, did not produce the right output files, or did not write any custom functions or use any of the dplyr/tidyverse functions we have been exploring in class.
You will receive feedback by Friday, 21 July 2023.

🗨️ How to get help?


What we CANNOT accept:
Sharing your entire script with others. It is ok, however, to share small pieces of code when asking for help, like the snippets people share on Stack Overflow.
Asking others to do your work for you (LSE regulations on plagiarism).
If your code is incredibly similar to someone else’s, we will assume that you have copied it. We will not accept this.
If you copied code from a colleague, we will award both of you 0 points and we will notify LSE Student Services.
What is OK to do:
It is okay to ask instructors clarifying questions about the instructions either on Slack or in class.
Yes, you can take advantage of search engines, Stack Overflow and 🤖 AI assistants such as GitHub Copilot and ChatGPT (they can be handy for tackling complex tasks if you know how to write a prompt).
However, I ask that you tell us (in your markdown) how you prompted the AI tools. You can add this either at the bottom of your markdown file or within sections or comments of your code. You won’t be penalised for using ChatGPT, even if you used it heavily.
Even if the AI assistants give you a valid answer, ask yourself whether you understand what the code does. If the solution feels obscure, look up the documentation pages or ask others for help on Slack. This is as valuable a way of learning as searching for solutions on the Web.
I find that these tools can be of little help if you don’t understand the basics of a library (say, dplyr). Try to keep the documentation pages open in other tabs so you can cross-reference the code you are writing against them.
It is okay to team up with your group/class colleagues to work on the problems (conceptually) together.
For example, you can discuss the logic of the code together (“I will first scrape the names of the recipes and then I will scrape the links to the recipes”).
But you should not share your entire code with others.
It is also ok to share prompt-engineering ideas with others. For example: ‘Guys, I find that if you ask ChatGPT “How do I retrieve the class attribute from this HTML tag?”, it gives you a good answer.’
It is ok to use Slack to share links to useful content.
Share things like “Tip: I found this alternative way to convert a CSV to an XML in R that is much faster than the one we saw in class” or “I found this really useful tutorial on how to scrape data from a website using rvest”.
It is also ok to ask generic programming-related questions publicly on Slack. For example, you can ask questions like:
“How do I test if a function works?” or
“Does anyone know how to create a for loop inside a for loop well?” or
“How do I format … in Quarto?”
Last modified: Thursday, 13 July 2023, 12:33 PM
