Lecture 1: Set Up Jupyter, Import Data from Web and Select Cases

This document provides instructions for setting up a Jupyter notebook to analyze Twitter data. It discusses importing necessary packages, reading data from a CSV file and the web, exploring and selecting columns of a DataFrame, removing unneeded columns, counting unique Twitter accounts in the data, removing tweets from a specific account, and using basic functions like count, max, min to analyze the DataFrame. It also provides an assignment involving finding the max and average retweets, creating a figure from tweet dates, and reading and selecting financial data for AAPL stock.

Original Description:

big data

Original Title

Lecture BD 01

Uploaded by

Ahmed Elmi


Installing packages

• conda install pandas-datareader

Reading data from web
• import pandas as pd
• import pandas_datareader.data as web
• df = web.DataReader('AAPL', data_source='yahoo', start='1/1/2010', end='3/21/2017')
• df.to_csv('AAPL.csv')
• df.tail()
Reading data
• import pandas as pd
• df = pd.read_csv('d:/CSR_user_timeline_2013.csv')
• print(len(df))
• df.head(2)
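The same calls can be tried without the lecture's CSV file. This is a minimal sketch that assumes a small made-up sample: the column names come from the lecture, but the rows and values are hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-in for the lecture's CSV file; the values are invented.
csv_text = """created_at,from_user_screen_name,retweet_count
2013-01-01 10:00:00,AccountA,3
2013-01-02 11:30:00,AccountB,0
2013-01-03 09:15:00,AccountA,7
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(len(sample))     # number of rows
print(sample.head(2))  # first two rows
```

Reading from an in-memory string keeps the example independent of any drive letter or file location.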
List all the columns in the DataFrame
• df.columns
• len(df.columns)
Data types
• df.dtypes
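To see what these attributes return, here is a small sketch on a hypothetical three-column frame. The name `toy` and its values are assumptions made for illustration and are not part of the lecture data.

```python
import pandas as pd

# Hypothetical frame; only the column names echo the lecture's dataset.
toy = pd.DataFrame({
    "from_user_screen_name": ["AccountA", "AccountB"],
    "retweet_count": [3, 0],
    "truncated": [False, True],
})

column_names = list(toy.columns)  # Index converted to a plain Python list
n_columns = len(toy.columns)      # number of columns (variables)
print(toy.dtypes)                 # one dtype per column

# Type checks that do not depend on the exact dtype name
is_int = pd.api.types.is_integer_dtype(toy["retweet_count"])
is_bool = pd.api.types.is_bool_dtype(toy["truncated"])
```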
Remove Unneeded Columns
• df = df.drop('created_at_text', axis=1)
• df = df.drop('tweet_id', axis=1)
• df = df.drop('withheld_in_countries', axis=1)
• df = df.drop('withheld_scope', axis=1)
• df = df.drop('truncated', axis=1)
• df = df.drop('possibly_sensitive', axis=1)
• len(df.columns)
• df.head(2)
• If you have only a few columns to delete, you can use
the drop command as shown above. If instead you want to
keep only a few columns, you can create a new version of
the dataframe containing just those columns. Note that
double square brackets, "[[...]]", select a list of
columns in pandas and return a dataframe. In the
following example, I create a new dataframe with only
three variables. You can see that this new dataframe has
the same number of tweets but fewer columns (variables).
Creating a new dataframe
• df2 = df[['created_at', 'from_user_screen_name', 'retweet_count']]
• print(len(df2))
• df2.head(2)
• len(df2.columns)
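The two approaches described above, dropping unwanted columns versus keeping wanted ones, can be compared side by side. The sketch below assumes a hypothetical two-row frame (`toy`) with invented values. It also shows `drop(columns=[...])`, which removes several columns in one call instead of one call per column.

```python
import pandas as pd

# Hypothetical frame reusing some of the lecture's column names.
toy = pd.DataFrame({
    "created_at": ["2013-01-01", "2013-01-02"],
    "from_user_screen_name": ["AccountA", "AccountB"],
    "retweet_count": [3, 0],
    "truncated": [False, False],
    "possibly_sensitive": [False, True],
})

# Option 1: drop the columns you do not want (a list removes several at once).
dropped = toy.drop(columns=["truncated", "possibly_sensitive"])

# Option 2: keep only the columns you want.
kept = toy[["created_at", "from_user_screen_name", "retweet_count"]]

# Single brackets return a Series; double brackets return a DataFrame.
one_column_series = toy["retweet_count"]
one_column_frame = toy[["retweet_count"]]

print(len(toy), len(kept))                  # same number of rows
print(len(toy.columns), len(kept.columns))  # fewer columns
```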
View Twitter Accounts Represented in DF
• We can use the unique function to find
how many unique Twitter accounts are
represented in the dataset. First, I'll show
you what the unique function does: it returns
an array of all the distinct screen names of the
Twitter accounts.
• df['from_user_screen_name'].unique()
• len(df['from_user_screen_name'].unique())
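As a quick illustration of what unique returns, the sketch below uses a hypothetical Series of four screen names (the account names are invented). It also shows `nunique` and `value_counts`, two standard pandas methods that give the distinct count and the per-account tweet count directly.

```python
import pandas as pd

# Hypothetical screen names; "AccountA" appears twice.
names = pd.Series(
    ["AccountA", "AccountB", "AccountA", "AccountC"],
    name="from_user_screen_name",
)

accounts = names.unique()            # distinct values, in order of first appearance
n_accounts = names.nunique()         # number of distinct values
per_account = names.value_counts()   # how many rows each account contributes

print(accounts)
print(n_accounts)
print(per_account)
```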
Remove Tweets from One Specific Account
• We want to remove all tweets by TICalculators from the
dataframe. Unlike the other 41 Twitter accounts in the
dataset, this account is not CSR-related. First, we use
the len function with a dataframe query to count the
tweets that were not sent by TICalculators.
• len(df[df['from_user_screen_name'] != 'TICalculators'])
• We should also check how many tweets were sent by
TICalculators: 1,767.
• len(df[df['from_user_screen_name'] == 'TICalculators'])
• We can use Python to do "math." Let's check whether the
two numbers returned in the steps above add up to the
total number of tweets in our dataframe.
• 1767 + 32330
We can also do this another way: the result is 0 if the counts add up.
• (1767 + 32330) - len(df)
Now keep only the tweets that were not sent by TICalculators
• df = df[df['from_user_screen_name'] != 'TICalculators']
• print(len(df))
• df.head(2)
• Now let's check again for all the unique accounts in
the dataframe. As you can see, TICalculators is
gone and there are now 41 accounts.
• df['from_user_screen_name'].unique()
• len(df['from_user_screen_name'].unique())
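The counting and filtering steps in this section all rest on a boolean mask. The sketch below walks through the same logic on a hypothetical five-row frame. The rows, and the names `toy`, `is_ti` and `filtered`, are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical tweets; two of the five rows come from TICalculators.
toy = pd.DataFrame({
    "from_user_screen_name": [
        "AccountA", "TICalculators", "AccountB", "TICalculators", "AccountA",
    ],
    "retweet_count": [3, 1, 0, 2, 7],
})

# Comparing a column to a value gives a boolean Series (one True/False per row).
is_ti = toy["from_user_screen_name"] == "TICalculators"

n_ti = int(is_ti.sum())        # True counts as 1, so the sum is the row count
n_other = int((~is_ti).sum())  # ~ inverts the mask

# The two groups must account for every row.
assert n_ti + n_other == len(toy)

# Indexing with the inverted mask keeps only the non-TICalculators rows.
filtered = toy[~is_ti]
print(len(filtered))
print(filtered["from_user_screen_name"].unique())
```

Storing the mask in a variable avoids repeating the comparison and makes the add-up check a single line.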
Transferring data into figure
• df['retweet_count'].plot(legend=True, figsize=(12, 8), title='user_screen', label='twitter')
Count function
• df.count()
Max function
• df.max()
Min function
• df.min()
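A short sketch of how these functions behave, assuming a hypothetical three-row frame with one missing screen name (all values invented). Called on a dataframe they work column by column; called on a single column they return a single value.

```python
import pandas as pd

# Hypothetical frame; the None marks a missing screen name.
toy = pd.DataFrame({
    "from_user_screen_name": ["AccountA", "AccountB", None],
    "retweet_count": [3, 0, 7],
})

counts = toy.count()                   # non-missing values per column
largest = toy["retweet_count"].max()   # one column in, one value out
smallest = toy["retweet_count"].min()

print(counts)
print(largest, smallest)
```

Note that count skips missing values, so its result can differ between columns and from len(df).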
ASSIGNMENT 1

1. Display the maximum value of retweet_count.
2. Display the average value of retweet_count.
3. Create a figure from the created_at column.
4. Display the unique values in created_at.
5. Read AAPL financial history data from 2012 to 2018.
6. Create a dataframe with the columns rowid, query and tweet_id_str.
