chegg.com/homework-help/questions-and-answers/two-datasets-tripstxt-records-trip-information-taxistxt-taxi-information-tripstxt-taxistxt-q118382967
Question
You have two datasets: Trips.txt which records trip information, and Taxis.txt which is about
taxi information.
Both Trips.txt and Taxis.txt are stored on HDFS. Complete the following MapReduce
programming tasks
with Python. Note that using any other language like Java will directly lead to a 0 mark on the
assignment.
Also, you are not allowed to use any Python MapReduce library such as mrjob.
A sample of Taxis.txt:
Taxi#, company, model, year
470,0,80,2018
332,11,88,2013
254,10,62,2018
460,4,90,2022
113,6,23,2015
275,16,13,2015
318,14,46,2014
A sample of Trips.txt:
Trip#, Taxi#, fare, distance, pickup_x, pickup_y, dropoff_x, dropoff_y
0,354,232.64,127.23,46.069,85.566,10.355,4.83
1,173,283.7,150.74,5.02,31.765,88.386,27.265
2,8,83.84,43.17,63.269,33.156,92.953,60.647
3,340,259.2,136.3,14.729,13.356,14.304,90.273
4,32,270.07,152.65,27.965,13.37,77.925,62.82
5,64,378.31,202.95,1.145,94.519,98.296,35.469
6,480,235.98,121.23,66.982,66.912,5.02,31.765
7,410,293.16,162.29,2.841,95.636,91.029,16.232
For each taxi, count the number of trips and the average distance per trip by developing
MapReduce programs with Python. The program should implement in-mapper combining
with state preserved across lines. The code must work with 3 reducers. You need to submit a
shell script named task1-run.sh. Running the shell script must perform the task, with the
shell script and all code files in the same folder (no subfolders).
Expert Answer
Step 1/2
To solve this problem using MapReduce in Python, follow the steps below. The solution
consists of Python code for the mapper and the reducer, plus a shell script to run the task.
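The mapper reads lines from Trips.txt and uses in-mapper combining: it accumulates the total
distance and the trip count for each Taxi# in a dictionary that persists across input lines,
then emits one record per taxi (Taxi# as the key, total distance and trip count as the value)
after all input has been read.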
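#!/usr/bin/env python3
import sys
# Dictionary holding, for each Taxi#, its [total distance, trip count]
taxi_data = {}
for line in sys.stdin:
    fields = line.strip().split(',')
    try:
        taxi_id = fields[1]
        distance = float(fields[3])
    except (IndexError, ValueError):
        continue  # Skip blank lines, malformed lines, and a header row
    # In-mapper combining
    if taxi_id in taxi_data:
        taxi_data[taxi_id][0] += distance  # Update distance
        taxi_data[taxi_id][1] += 1  # Update count
    else:
        taxi_data[taxi_id] = [distance, 1]  # Initialize distance and count
# Emit one tab-separated record per taxi once all input has been consumed
for taxi_id, (distance, count) in taxi_data.items():
    print(f"{taxi_id}\t{distance}\t{count}")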
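To make the effect of in-mapper combining concrete, here is a minimal sketch of the same aggregation as a plain function that can be run without Hadoop. The function name combine_in_mapper and the three data rows are illustrative assumptions, not part of the original answer or the real dataset; taxi 354 is repeated on purpose so that two trips collapse into a single output record.

```python
from collections import defaultdict

def combine_in_mapper(lines):
    """Aggregate [total_distance, trip_count] per taxi across all input lines."""
    totals = defaultdict(lambda: [0.0, 0])
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        try:
            distance = float(fields[3])
        except ValueError:
            continue  # skip a header row or a non-numeric distance
        totals[fields[1]][0] += distance
        totals[fields[1]][1] += 1
    # One record per taxi, produced only after every line has been consumed
    return [f"{taxi}\t{dist}\t{count}" for taxi, (dist, count) in totals.items()]

# Illustrative rows (made up for this sketch): taxi 354 appears twice.
rows = [
    "Trip#, Taxi#, fare, distance, pickup_x, pickup_y, dropoff_x, dropoff_y",
    "0,354,232.64,100.0,46.069,85.566,10.355,4.83",
    "1,173,283.7,150.5,5.02,31.765,88.386,27.265",
    "2,354,83.84,50.0,63.269,33.156,92.953,60.647",
]
mapper_output = combine_in_mapper(rows)
for record in mapper_output:
    print(record)
```

Three input trips produce only two output records, which is the point of the technique: less data crosses the network during the shuffle.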
The reducer reads the output from the mapper, aggregates the data, and calculates the
average distance per trip for each taxi.
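#!/usr/bin/env python3
import sys
# Initialize variables
current_taxi = None
total_distance = 0.0
total_count = 0
for line in sys.stdin:
    parts = line.strip().split('\t')
    if len(parts) != 3:
        continue  # Skip blank or malformed lines
    taxi_id, distance, count = parts[0], float(parts[1]), int(parts[2])
    if current_taxi == taxi_id:
        total_distance += distance
        total_count += count
    else:
        # A new key begins: emit the result for the previous taxi first
        if current_taxi is not None:
            avg_distance = total_distance / total_count
            print(f"{current_taxi}\t{total_count}\t{avg_distance}")
        current_taxi = taxi_id
        total_distance = distance
        total_count = count
# Emit the result for the last taxi in the input
if current_taxi is not None:
    avg_distance = total_distance / total_count
    print(f"{current_taxi}\t{total_count}\t{avg_distance}")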
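The following sketch simulates, on a single machine, how mapper output might be split across 3 reducers and then reduced. It is an illustration under stated assumptions, not Hadoop's implementation: the partition function uses zlib.crc32 as a stand-in hash, and the partial sums in mapper_a and mapper_b are hypothetical. What it shows is that every record for a given taxi lands in the same partition, so each reducer can compute a correct count and average on its own.

```python
import zlib
from itertools import groupby

NUM_REDUCERS = 3  # the assignment requires the job to work with 3 reducers

def partition(key, num_reducers=NUM_REDUCERS):
    # Stand-in hash for illustration only; Hadoop applies its own key hash.
    return zlib.crc32(key.encode()) % num_reducers

def reduce_partition(records):
    """records: (taxi_id, distance_sum, count) tuples, sorted by taxi_id."""
    out = []
    for taxi, group in groupby(records, key=lambda r: r[0]):
        total_distance, total_count = 0.0, 0
        for _, dist, count in group:
            total_distance += dist
            total_count += count
        out.append((taxi, total_count, total_distance / total_count))
    return out

# Hypothetical partial sums emitted by two separate mappers
mapper_a = [("354", 150.0, 2), ("173", 150.5, 1)]
mapper_b = [("354", 30.0, 1), ("8", 43.17, 1)]

# Shuffle: route each record to a reducer by key
buckets = [[] for _ in range(NUM_REDUCERS)]
for record in mapper_a + mapper_b:
    buckets[partition(record[0])].append(record)

# Each reducer sees its own records sorted by key
results = []
for bucket in buckets:
    results.extend(reduce_partition(sorted(bucket, key=lambda r: r[0])))

for taxi, count, avg in sorted(results):
    print(f"{taxi}\t{count}\t{avg}")
```

Passing (sum, count) pairs rather than per-mapper averages matters here: taxi 354 has a true average of 180.0 / 3 = 60.0, whereas averaging the two mappers' averages (75.0 and 30.0) would give 52.5.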
Step 2/2
This shell script assumes that Trips.txt and Taxis.txt are stored in HDFS and that the
mapper and reducer Python files are in the same directory.
#!/bin/bash
Make sure to give execute permissions to your Python and shell script files (for example,
with chmod +x) before running the job.
Explanation:
Now, you can run the shell script task1-run.sh to execute the MapReduce job. Make sure to
replace /path/to/Trips.txt with the actual HDFS path to your Trips.txt file.
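As a rough guide to what the Hadoop Streaming invocation inside task1-run.sh might contain, the sketch below assembles one possible command line and prints it. Everything specific is an assumption: the file names mapper.py and reducer.py, the streaming jar location, and the output path are placeholders that depend on your installation, and the helper build_streaming_command is hypothetical. The options used (-D mapreduce.job.reduces, -files, -mapper, -reducer, -input, -output) are standard Hadoop Streaming options, with the generic options placed before the streaming ones.

```python
import shlex

def build_streaming_command(streaming_jar, input_path, output_path,
                            mapper="mapper.py", reducer="reducer.py",
                            num_reducers=3):
    """Return the argv list for a Hadoop Streaming job (all paths are placeholders)."""
    return [
        "hadoop", "jar", streaming_jar,
        "-D", f"mapreduce.job.reduces={num_reducers}",  # request 3 reducers
        "-files", f"{mapper},{reducer}",                # ship the scripts to the cluster
        "-mapper", mapper,
        "-reducer", reducer,
        "-input", input_path,
        "-output", output_path,
    ]

command = build_streaming_command(
    "/path/to/hadoop-streaming.jar",  # hypothetical; depends on the installation
    "/path/to/Trips.txt",             # replace with the actual HDFS path
    "/path/to/output",                # hypothetical; must not already exist on HDFS
)
print(shlex.join(command))  # shlex.join requires Python 3.8 or later
```

The printed line is what would go into the shell script after the #!/bin/bash line, once the placeholder paths are replaced.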