DataInformation Updated
csv
3. Number of Variables: 61
4. Note: several of the variables may be correlated, so you should check for multicollinearity.
5. Variable Information:
0. url: URL of the article
1. timedelta: Days between the article publication and
the dataset acquisition
2. n_tokens_title: Number of words in the title
3. n_tokens_content: Number of words in the content
4. n_unique_tokens: Rate of unique words in the content
5. n_non_stop_words: Rate of non-stop words in the content
6. n_non_stop_unique_tokens: Rate of unique non-stop words in the
content
7. num_hrefs: Number of links
8. num_self_hrefs: Number of links to other articles
published by Mashable
9. num_imgs: Number of images
10. num_videos: Number of videos
11. average_token_length: Average length of the words in the
content
12. num_keywords: Number of keywords in the metadata
13. data_channel_is_lifestyle: Is data channel 'Lifestyle'?
14. data_channel_is_entertainment: Is data channel 'Entertainment'?
15. data_channel_is_bus: Is data channel 'Business'?
16. data_channel_is_socmed: Is data channel 'Social Media'?
17. data_channel_is_tech: Is data channel 'Tech'?
18. data_channel_is_world: Is data channel 'World'?
19. kw_min_min: Worst keyword (min. shares)
20. kw_max_min: Worst keyword (max. shares)
21. kw_avg_min: Worst keyword (avg. shares)
22. kw_min_max: Best keyword (min. shares)
23. kw_max_max: Best keyword (max. shares)
24. kw_avg_max: Best keyword (avg. shares)
25. kw_min_avg: Avg. keyword (min. shares)
26. kw_max_avg: Avg. keyword (max. shares)
27. kw_avg_avg: Avg. keyword (avg. shares)
28. self_reference_min_shares: Min. shares of referenced articles in
Mashable
29. self_reference_max_shares: Max. shares of referenced articles in
Mashable
30. self_reference_avg_sharess: Avg. shares of referenced articles in
Mashable
31. published_day: Which day was the article published?
32. is_weekend: Was the article published on the weekend?
33. LDA_00: Closeness to LDA topic 0
34. LDA_01: Closeness to LDA topic 1
35. LDA_02: Closeness to LDA topic 2
36. LDA_03: Closeness to LDA topic 3
37. LDA_04: Closeness to LDA topic 4
38. global_subjectivity: Text subjectivity
39. global_sentiment_polarity: Text sentiment polarity
40. global_rate_positive_words: Rate of positive words in the content
41. global_rate_negative_words: Rate of negative words in the content
42. rate_positive_words: Rate of positive words among non-neutral
tokens
43. rate_negative_words: Rate of negative words among non-neutral
tokens
44. avg_positive_polarity: Avg. polarity of positive words
45. min_positive_polarity: Min. polarity of positive words
46. max_positive_polarity: Max. polarity of positive words
47. avg_negative_polarity: Avg. polarity of negative words
48. min_negative_polarity: Min. polarity of negative words
49. max_negative_polarity: Max. polarity of negative words
50. title_subjectivity: Title subjectivity
51. title_sentiment_polarity: Title polarity
52. abs_title_subjectivity: Absolute subjectivity level
53. abs_title_sentiment_polarity: Absolute polarity level
54. shares: Number of shares (target)
7. Citation Request: Please include this citation if you plan to use this database:
1. Descriptions: The data are related to red Vinho Verde wine samples, from the north of
Portugal. The goal is to predict wine quality.
4. Note: several of the variables may be correlated, so you should check for multicollinearity.
5. Variable information:
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
12 - quality (score between 0 and 10)
winequalityWhite.csv
1. Descriptions: The data are related to white Vinho Verde wine samples, from the north of
Portugal. The goal is to predict wine quality.
4. Note: several of the variables may be correlated, so you should check for multicollinearity.
5. Variable information:
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
12 - quality (score between 0 and 10)
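One simple first pass at the multicollinearity note is to screen pairs of predictors for high Pearson correlation. The sketch below is only an illustration: the sample values are made up (not taken from the wine files), the 0.8 cutoff is an arbitrary assumption, and the helper names `pearson` and `correlated_pairs` are hypothetical. Pairwise screening also misses relationships that involve three or more variables, so treat it as a starting point rather than a full diagnosis.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Made-up values for illustration only; replace with columns read from the CSV.
sample = {
    "free sulfur dioxide": [11.0, 25.0, 15.0, 17.0, 13.0, 9.0],
    "total sulfur dioxide": [34.0, 67.0, 54.0, 60.0, 40.0, 18.0],
    "pH": [3.51, 3.20, 3.26, 3.16, 3.51, 3.36],
}
flagged = correlated_pairs(sample)
print(flagged)
```

Any pair that is flagged is a candidate for dropping one variable, combining the two, or using a regularized model.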
Facebook.csv
1. Descriptions: This dataset is related to predicting the performance metrics of posts published
on brands' Facebook pages. The dataset contains multiple performance metrics.
2. Variable information:
BikeSharing_Day.csv
1. Descriptions: Bike sharing systems are a new generation of traditional bike rentals in which
the whole process, from membership to rental and return, has become automatic. Through these
systems, a user can easily rent a bike at one location and return it at another. Currently,
there are over 500 bike-sharing programs around the world, comprising over 500 thousand
bicycles. Today, there is great interest in understanding the use of these systems due to their
growth, as well as their role in traffic, environmental, and health issues.
Due to the individualized and electronic nature of bike sharing systems, detailed information,
including the duration of travel and the departure and arrival positions, is explicitly recorded.
The dataset is the two-year historical log for the years 2011 and 2012 from the Capital
Bikeshare system, Washington D.C.
2. Variable information:
1. Descriptions: Bike sharing systems are a new generation of traditional bike rentals in which
the whole process, from membership to rental and return, has become automatic. Through these
systems, a user can easily rent a bike at one location and return it at another. Currently,
there are over 500 bike-sharing programs around the world, comprising over 500 thousand
bicycles. Today, there is great interest in understanding the use of these systems due to their
growth, as well as their role in traffic, environmental, and health issues.
Due to the individualized and electronic nature of bike sharing systems, detailed information,
including the duration of travel and the departure and arrival positions, is explicitly recorded.
The dataset is the two-year historical log for the years 2011 and 2012 from the Capital
Bikeshare system, Washington D.C.
2. Variable information:
1. Descriptions: GoodBelly is trying to boost its sales at grocery stores like Whole Foods Market.
As a small start-up, GoodBelly must optimize the allocation of its limited marketing budget. It
currently promotes through in-person demonstrations in stores, but management is concerned
that these demonstrations are not effective enough to justify the cost. The main task is to
determine whether or not the company should continue its promotional programs.
2. Variable information:
1 - Date: Date.
2 - Region: Region.
3 - UnitsSold: The number of units sold per store per week.
4 - AverageRetailPrice: The average retail price for GoodBelly products per store per week.
5 - SalesRep: 1 if the store had a regional sales rep (face-to-face contact) and 0 if the store
had only the national sales rep (no face-to-face contact).
6 - Endcap: 1 if the store participated in an endcap promotion.
7 - Demo: 1 if the store had a demo in the corresponding week.
8 - Demo1_3: 1 if the store had a demo 1-3 weeks ago.
9 - Demo4_5: 1 if the store had a demo 4-5 weeks ago.
10 - Natural: The number of other natural retailers within 5 miles of each store.
11 - Fitness: The number of fitness centers within 5 miles of each store.
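The three demo indicators can be read as one weekly flag plus two lagged windows of it. The sketch below shows that reading for a single store. It is an assumption about how the columns relate, not a statement of how the file was built; the weekly values are invented and the helper name `lagged_demo_flags` is hypothetical.

```python
def lagged_demo_flags(demo, first_lag, last_lag):
    """For each week t, return 1 if any week from t-last_lag to t-first_lag had a demo."""
    flags = []
    for t in range(len(demo)):
        start = max(0, t - last_lag)       # oldest week in the window
        end = max(0, t - first_lag + 1)    # exclusive upper bound of the window
        flags.append(1 if any(demo[start:end]) else 0)
    return flags


# Invented weekly Demo values for one store, oldest week first.
demo = [0, 1, 0, 0, 0, 0, 0, 0]
demo1_3 = lagged_demo_flags(demo, 1, 3)
demo4_5 = lagged_demo_flags(demo, 4, 5)
print(demo1_3)
print(demo4_5)
```

Under this reading a single demo switches on Demo for one week, Demo1_3 for the next three weeks, and Demo4_5 for the two weeks after that, which lets a regression estimate how quickly the effect of a demo fades.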
FoodTruck.csv
1. Descriptions: In 2014, the owner of a food truck based in Hamilton, Ontario, was looking over
the first year of her operations. In addition to working in Hamilton, she had tried to maximize
her revenues by driving to several other cities and charging various prices for each burger,
depending partly on the fresh ingredients available in each city. Besides location, the owner had
collected data on a few other factors (the weather, the day of the week, the city's population,
and whether a festival was going on) that had had an impact on the demand for her product.
She wondered whether analytics could help her decide where to sell and how much to charge
on a daily basis.
2. Variable information:
1 - Date: Date
2 - QuantitySold
3 - City: Hamilton, Toronto, London, Waterloo
4 - Precipitation: The precipitation probability
5 - Temperature: in Celsius
6 - Festival: 1 if there is a festival on that day and 0 otherwise.
7 - Price: in dollars
8 - Weekday: 1 if the day is a weekday and 0 otherwise.
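The Weekday indicator can be rebuilt from a calendar date, which is handy for checking the column. The sketch below assumes that "weekday" means Monday through Friday and that the Date values have already been parsed into `datetime.date` objects, since the file's date format is not stated here. The helper name `weekday_flag` is hypothetical.

```python
from datetime import date


def weekday_flag(day):
    """Return 1 for Monday-Friday and 0 for Saturday or Sunday."""
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    return 1 if day.weekday() < 5 else 0


friday = date(2014, 6, 6)
saturday = date(2014, 6, 7)
print(weekday_flag(friday), weekday_flag(saturday))
```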