MongoDB Analysis of Kathmandu Post Data

Uploaded by

brishav59

6CS030 Big Data

2022/23
Internal Assignment 2
This worksheet is based on MongoDB.

1. There are three JSON exports from Twitter.

You need to analyze just one of the JSON datasets.
First take your student number and divide it by 3. Use the remainder value (modulus) to
pick one of the following worksheets:
Remainder value | JSON dataset to use | Dataset generated from
0 | kathmandu [Link] | Kathmandu Post
1 | Nepal_Cricket.json | Nepal Cricket
2 | [Link] | Nepal Republic Media

For example, if your student number is 1712345, then 1712345 mod 3 = 2 (because 1712345 = 3 × 570781 + 2), so you would use the [Link] dataset. See the Remainder spreadsheet if you are not sure how to do this.
My university ID gives a remainder of 0, so I used the [Link] (Kathmandu Post) dataset.
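The dataset-selection rule is the remainder of integer division by 3. Below is a minimal Python sketch of that rule. The helper name pick_dataset is hypothetical. The mapping uses the "Dataset generated from" names because the JSON filenames are not legible in this copy.

```python
# Maps (student number mod 3) to the source of each dataset, following the table above.
# pick_dataset is a hypothetical helper used only for illustration.
DATASETS = {
    0: "Kathmandu Post",
    1: "Nepal Cricket",
    2: "Nepal Republic Media",
}


def pick_dataset(student_number: int) -> str:
    """Return the dataset source selected by student_number % 3."""
    return DATASETS[student_number % 3]


print(1712345 % 3)            # 2
print(pick_dataset(1712345))  # Nepal Republic Media
```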

2. Examine your dataset and carry out the following tasks:

Task no Task

a Import the data into your own MongoDB database:

- Show the command to do this

Command (in the mongo shell): use assignment_2

Command (in the system terminal, not inside the mongo shell): mongoimport --db assignment_2 --collection KathmanduPost --file "/Users/zeus/Downloads/Datasets_89718/[Link]"

- Write a command to show how many documents are in your collection

The collection contains 550 documents in total.

Command: [Link]()
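By default, mongoimport expects a stream of JSON documents, typically one per line. A file whose top level is a single JSON array has to be imported with the --jsonArray option. The Python sketch below parses either layout from a string so the documents can be counted locally as a cross-check. The helper name load_documents and the sample data are illustrative assumptions, not part of the original work.

```python
import json


def load_documents(raw: str) -> list[dict]:
    """Parse an export that is either one JSON array or one JSON document per line."""
    stripped = raw.strip()
    if stripped.startswith("["):
        # A single top-level array: the layout mongoimport accepts only with --jsonArray.
        return json.loads(stripped)
    # Otherwise treat the input as newline-delimited JSON, skipping blank lines.
    return [json.loads(line) for line in stripped.splitlines() if line.strip()]


sample = '{"id": 1, "lang": "en"}\n{"id": 2, "lang": "ne"}\n'
docs = load_documents(sample)
print(len(docs))  # 2
```

Applied to the real export file, the length of the returned list should agree with the count reported by the shell, assuming every document was imported without errors.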

b Analyze the data

Write a command to
- Show one document
Command: [Link]()
- Show the unique values in one field

The ‘lang’ field holds the language code of each document. Its distinct values, and what they mean, are:
- da - Danish
- en - English
- hi - Hindi
- in - Indonesian
- ne - Nepali
- nl - Dutch
- und - Undefined or unknown language

Command: [Link]("lang")
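A distinct query returns each value of a field once. On the server, the PyMongo equivalent is collection.distinct("lang"). The sketch below reproduces the result in plain Python over documents already loaded into memory, which can be handy for checking the shell output. It is an in-memory analogue, not the MongoDB API. The helper name and sample tweets are invented, and it assumes the values are mutually comparable (for example, all strings).

```python
# Language codes as listed above; "und" marks tweets whose language is undetermined.
LANG_NAMES = {
    "da": "Danish", "en": "English", "hi": "Hindi", "in": "Indonesian",
    "ne": "Nepali", "nl": "Dutch", "und": "Undefined or unknown language",
}


def distinct_values(docs: list[dict], field: str) -> list:
    """Return the sorted unique values of a top-level field, skipping documents without it."""
    return sorted({doc[field] for doc in docs if field in doc})


tweets = [{"lang": "en"}, {"lang": "ne"}, {"lang": "en"}, {"text": "no lang field"}]
codes = distinct_values(tweets, "lang")
print(codes)                           # ['en', 'ne']
print([LANG_NAMES[c] for c in codes])  # ['English', 'Nepali']
```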

- Show a set of documents based on some criteria. Output just two fields from the document

I selected the documents whose ‘favorite_count’ is greater than 0 and output just two fields from each: ‘text’ and ‘created_at’. MongoDB returns the ‘_id’ field unless it is excluded, so the projection sets _id to 0.

Command: [Link]({ favorite_count: { $gt: 0 } }, { text: 1, created_at: 1, _id: 0 })
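This query combines a filter (favorite_count greater than 0) with a projection that keeps only the named fields. In PyMongo, the same request would be collection.find({"favorite_count": {"$gt": 0}}, {"text": 1, "created_at": 1, "_id": 0}). The sketch below is a plain-Python analogue over in-memory documents, meant only to show what the filter and projection select. The helper name and sample tweets are invented.

```python
def find_favorited(docs: list[dict], fields: tuple[str, ...] = ("text", "created_at")) -> list[dict]:
    """Mimic find({favorite_count: {$gt: 0}}, {text: 1, created_at: 1, _id: 0}) in memory."""
    results = []
    for doc in docs:
        count = doc.get("favorite_count")
        # MongoDB's {$gt: 0} only compares against numbers, so missing, null,
        # string and boolean values never match.
        if isinstance(count, (int, float)) and not isinstance(count, bool) and count > 0:
            # Inclusion projection: keep only the requested fields that are present.
            results.append({f: doc[f] for f in fields if f in doc})
    return results


tweets = [
    {"_id": 1, "text": "Budget passed", "created_at": "Mon Jan 02 10:00:00 +0000 2023", "favorite_count": 5},
    {"_id": 2, "text": "Weather update", "created_at": "Mon Jan 02 11:00:00 +0000 2023", "favorite_count": 0},
]
print(find_favorited(tweets))
# [{'text': 'Budget passed', 'created_at': 'Mon Jan 02 10:00:00 +0000 2023'}]
```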

- Use a regular expression to search for some criteria. The
search should be case insensitive

I searched for the word ‘news’ in the ‘text’ field. The i option makes the search case-insensitive, so it matches regardless of whether the letters are uppercase or lowercase.

Command: [Link]({ text: { $regex: /news/i } })
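An unanchored, case-insensitive pattern matches "news", "News" and "NEWS" anywhere in the string, including inside longer words such as "Newspaper". The Python analogue below uses re.IGNORECASE over in-memory documents to illustrate this. The helper name and sample tweets are invented.

```python
import re

NEWS = re.compile(r"news", re.IGNORECASE)


def search_text(docs: list[dict], pattern: re.Pattern) -> list[dict]:
    """Return documents whose 'text' field is a string containing a match for pattern."""
    return [
        doc for doc in docs
        if isinstance(doc.get("text"), str) and pattern.search(doc["text"])
    ]


tweets = [{"text": "Breaking NEWS from Kathmandu"}, {"text": "Newspaper review"}, {"text": "Cricket score"}]
print([d["text"] for d in search_text(tweets, NEWS)])
# ['Breaking NEWS from Kathmandu', 'Newspaper review']
```

To match only the whole word, the pattern can be wrapped in word boundaries: /\bnews\b/i in the shell, or r"\bnews\b" with re.IGNORECASE in Python.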

c Reshape the collection
Write a command to:
- Update a field within the collection

First, I found the documents in which the “time_zone” field is null. I then set “time_zone” to “unknown” in those documents. Note that the filter { time_zone: null } also matches documents that have no “time_zone” field at all, so the update adds the field to those documents too.

Command: [Link]({ time_zone: null })

Command: [Link]({ time_zone: null }, { $set: { time_zone: "unknown" } })
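In MongoDB, updateMany changes every matching document, while updateOne changes only the first match. The plain-Python sketch below mimics the many-document form over in-memory dicts. It reproduces the rule that a null filter matches both an explicit null and a missing field. The helper name and sample data are hypothetical.

```python
def set_missing_time_zone(docs: list[dict], value: str = "unknown") -> int:
    """Mimic updateMany({time_zone: null}, {$set: {time_zone: value}}) in memory.

    Returns the number of documents modified.
    """
    modified = 0
    for doc in docs:
        # {time_zone: null} matches both an explicit null and an absent field.
        if doc.get("time_zone") is None:
            doc["time_zone"] = value
            modified += 1
    return modified


tweets = [{"time_zone": None}, {}, {"time_zone": "Kathmandu"}]
print(set_missing_time_zone(tweets))  # 2
print(tweets)
# [{'time_zone': 'unknown'}, {'time_zone': 'unknown'}, {'time_zone': 'Kathmandu'}]
```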

- Create a new collection based on a subset of the dataset.

Include a query to show a document from the new collection

For this, an aggregation pipeline is used. It first matches the documents whose “time_zone” equals “unknown”, then writes the results to a new collection called “newCollection”.

Command: [Link]([{ $match: { time_zone: "unknown" } }, { $out: "newCollection" }])

I then viewed a document stored in the “newCollection” collection.
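In an aggregation pipeline, $match filters documents the same way a find filter does. $out must be the last stage: it writes the pipeline's results to the named collection and replaces that collection if it already exists. In the shell, a document from the new collection can then be inspected with db.newCollection.findOne(). The sketch below is a plain-Python analogue that uses a dict of lists as a stand-in for a database. The helper name and data are hypothetical, and only equality criteria are handled.

```python
def match_and_out(db: dict[str, list[dict]], source: str, target: str, criteria: dict) -> None:
    """Mimic aggregate([{$match: criteria}, {$out: target}]) for equality-only criteria."""
    matched = [
        dict(doc) for doc in db[source]
        if all(doc.get(k) == v for k, v in criteria.items())
    ]
    # Like $out, replace the target collection wholesale instead of appending to it.
    db[target] = matched


db = {"KathmanduPost": [
    {"_id": 1, "time_zone": "unknown"},
    {"_id": 2, "time_zone": "Kathmandu"},
]}
match_and_out(db, "KathmanduPost", "newCollection", {"time_zone": "unknown"})
print(db["newCollection"][0])  # {'_id': 1, 'time_zone': 'unknown'}
```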

d Name one advantage to using this approach for handling Big Data
and include a brief explanation of why you think this is an
advantage.

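The main advantage of this approach is that MongoDB's queries and aggregation pipelines run on the database server, close to the data. Filtering, sorting, grouping and reshaping happen inside the database, and only the results are sent back to the client, so large collections can be analyzed without first exporting them. In a sharded cluster, parts of a pipeline can also run on each shard in parallel. This makes it practical to answer analytical questions over large volumes of data quickly, which supports better decision-making.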

e Name one disadvantage to using this approach for handling Big Data
and include a brief explanation of why you think this is a
disadvantage.

The main disadvantage is that aggregation pipelines become harder to design and tune as data volumes grow. Stages such as $group and $sort hold data in memory and are subject to per-stage memory limits. The order of stages determines how much data later stages must process, and a $match that cannot use an index forces a full collection scan. Good performance therefore requires a solid understanding of how the data is organized and careful query design; otherwise, queries slow down noticeably as the data grows.
For this exercise you can either use the Mongo Shell or Python Notebook to carry out the
commands.
Python

If you are using a Python Notebook, you will not be able to run the import command within the notebook; however, you can document in the notebook which command you ran.
You can use the Print Option to create a PDF version of the file and upload it. Do check that it has
printed all the pages and not just the first page (if so, submit the notebook).

Upload

Upload one of the following: A Word Document, ipynb file or PDF version of the Python Notebook
which shows evidence of the above tasks.
