Professional Documents
Culture Documents
Data Scraping From Various Data Resources
Data Scraping From Various Data Resources
INTRODUCTION 'Big Data' is also a data but with a huge size. 'Big Data'
is a term used to describe collection of data that is huge in
Data scraping is a process of extracting data from the size and yet growing exponentially with time. In short,
internet through various methods. The internet is a such a data is so large and complex that none of the
universally accessible resource for millions of people. As traditional data management tools are able to store it or
the usage of internet has commonly increased in process it efficiently.
everywhere there is highly growth in competition between
the organizations in their business. Data scraping helps in Examples of 'Big Data'
automation of Data through the use in various
techniques. The usage of Data scraping have been in Following are some the examples of 'Big Data'-
many fields and beneficial for weather data monitoring,
website change detection, research, web data integration,
The New York Stock Exchange
contact scraping and online price comparison.. In this
generates about one terabyte of new trade data
project we will go through the tools and techniques used
per day.
in scraping and its impact on the social networks.
Statistic shows that 500+terabytes of new data Characteristics Of 'Big Data':
gets ingested into the databases of social media
site Facebook, every day. This data is mainly (i) Volume – One of the characteristics to be considered
generated in terms of photo and video uploads, while dealing with Big Data is Volume. Big Data means a
message exchanges, putting comments etc. large volume of data. Size of data plays very crucial role
Single Jet engine can generate 10+terabytes of in determining value out of data.
data in 30 minutes of a flight time. With many
thousand flights per day, generation of data (ii) Variety – The second characteristic of Big data is Variety.
reaches up to many Petabytes. Variety refers to the type of data i.e. Structured, semi
structured and Unstructured. Structured data refers to
Categories Of 'Big Data': data in the form of text. Semi Structured data is the data
in the form of XML sheets etc. Unstructured data is the
Big data could be found in three forms: data in the form of videos, audio etc.
import
java.io.IOException; REFERENCES
[1] https://www.ibm.com/analytics/hadoop/big-
import org.jsoup.Jsoup;
data-analytics
import [2]
http://www.pearsonitcertification.com/articles/article.as
org.jsoup.nodes.Document;
px?p=2427073&seqNum=2
} [7] https://digitalcommons.georgiasouthern.e
du/c
}
gi/viewcontent.cgi?article=1227&context=hono
Hive engine r s-theses
VI. CONCLUSION
ACKNOWLEDGMENT
SQL- Standard Query Language.