How-45-Successful-Companies-Used-Big-Data-Analytics-To-Deliver-Extraordinary-Results-Wiley-2016 101-108

14

LINKEDIN
How Big Data Is Used To Fuel Social
Media Success

Background
LinkedIn is the world’s largest online professional network, with more
than 410 million members in over 200 countries. LinkedIn connects
professionals by enabling them to build a network of their connections
and the connections of their connections. The site was launched
by Reid Hoffman in 2003, making it one of the oldest social media
networks in the world.

What Problem Is Big Data Helping To Solve?


Competition among social networks is fiercer than ever and what’s
hot one year may not be the next. LinkedIn need to ensure their site
remains an essential tool for busy professionals, helping them become
more productive and successful, whether they’re using the premium
(paid-for) service or the free service. As such, Big Data is at the very
heart of LinkedIn’s operations and decision making, helping them
provide the best possible service for the site’s millions of members.

How Is Big Data Used In Practice?


LinkedIn track every move users make on the site: every click, every
page view, every interaction. With 410 million members, that’s an
awful lot of events to process each day. Data scientists and researchers
at LinkedIn analyse this mountain of data in order to aid decision
making, and design data-powered products and features. I could fill a
whole book on the many ways LinkedIn use Big Data, but here I just
want to look at a few key examples.

Much like other social media networks, LinkedIn use data to make
suggestions for their users, such as “people you may know”. These
suggestions are based on a number of factors, for example if you
click on someone’s profile (in which case, it’s reasonable to assume
you may know them, or someone else by that name), if you worked
at the same company during the same period or if you share some
connections. Also, because users can upload their email contacts,
LinkedIn use this information to make suggestions – not only for the
people you may know on the site but also for people your contacts
may know when they join the site. LinkedIn can also pull data about
users from other sites, such as Twitter, to make suggestions about
people you may know.
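
To make the shared-connections factor concrete, here is a minimal sketch in Python. It is an illustration only, not LinkedIn's actual algorithm: the function name `suggest_people`, the sample members and the idea of ranking purely by the number of shared connections are all assumptions made for this example.

```python
from collections import Counter


def suggest_people(member, connections, top_n=3):
    """Rank people `member` is not yet connected to by shared connections.

    `connections` maps each member name to the set of their direct
    connections (a hypothetical in-memory stand-in for a real graph store).
    """
    direct = connections.get(member, set())
    shared = Counter()
    for friend in direct:
        for candidate in connections.get(friend, set()):
            # Skip the member themselves and anyone already connected.
            if candidate != member and candidate not in direct:
                shared[candidate] += 1
    # Highest shared-connection count first; ties broken by name.
    ranked = sorted(shared.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical sample network.
connections = {
    "ana": {"ben", "cal"},
    "ben": {"ana", "dee", "eli"},
    "cal": {"ana", "dee"},
    "dee": {"ben", "cal"},
    "eli": {"ben"},
}
print(suggest_people("ana", connections))  # [('dee', 2), ('eli', 1)]
```

A real system would presumably blend in the other signals mentioned above, such as profile views, overlapping employment periods and uploaded email contacts.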

LinkedIn use machine-learning techniques to refine their algorithms
and make better suggestions for users. Say, for example, LinkedIn regularly
gave you suggestions for people you may know who work at
Company A (which you worked at eight years ago) and Company B
(which you worked at two years ago). If you almost never click on
the profiles of people from Company A but regularly check out the
suggestions from Company B, LinkedIn will prioritize Company B in
their suggestions going forward. This personalized approach enables
users to build the networks that work best for them.
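
The Company A versus Company B example can be sketched as a simple feedback loop. This is a deliberately simplified stand-in for machine learning, assuming a smoothed click-through rate per suggestion source; the function name `rank_sources`, the smoothing constants and the figures are hypothetical.

```python
def rank_sources(impressions, clicks, prior_clicks=1, prior_views=10):
    """Order suggestion sources by smoothed click-through rate.

    The prior values are assumed smoothing constants that stop a source
    with very few impressions from jumping straight to the top or bottom.
    """
    def rate(source):
        return (clicks.get(source, 0) + prior_clicks) / (
            impressions[source] + prior_views
        )

    return sorted(impressions, key=rate, reverse=True)


# Hypothetical history: suggestions shown and suggestions clicked.
impressions = {"Company A": 40, "Company B": 40}
clicks = {"Company A": 1, "Company B": 12}
print(rank_sources(impressions, clicks))  # ['Company B', 'Company A']
```

Because the user rarely clicks on Company A suggestions, Company B moves to the front of the list, which mirrors the prioritization described above.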

One of the features that set LinkedIn apart from other social media
platforms like Facebook is the way it lets you see who has viewed your
profile. And this feature recently got a lot more detailed: while you
used to be able to see how many had viewed your profile and who
the most recent viewers were, now you can also see what regions and
industries those viewers are from, what companies they work for and
what keywords (if any) brought them to your profile. These insights,
made possible by Big Data, help users increase their effectiveness on
the site.

LinkedIn use stream-processing technology to ensure the most up-to-date
information is displayed when users are on the site – from
information on who’s joined the site and who got a new job to useful
articles that contacts have liked or shared. In a nutshell, the site
is constantly gathering and displaying new data for users. Not only
does this constant streaming of data make the site more interesting
for users, it also speeds up the analytic process. Traditionally, a company
would capture data and store it in a database or data warehouse
to be analysed at a later time. But, with real-time stream-processing
technology, LinkedIn have the potential to stream data direct from
the source (such as user activity) and analyse it on the fly.
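
The difference between store-then-analyse and analysing on the fly can be shown with a small Python sketch. It uses a plain generator as a stand-in for a real stream (no particular stream-processing product is implied), and the event fields and names such as `running_counts` are assumptions made for illustration.

```python
from collections import Counter


def running_counts(events):
    """Yield up-to-date per-type counts after each event arrives.

    Nothing is stored for later analysis: each event updates the running
    totals as soon as it is seen.
    """
    counts = Counter()
    for event in events:
        counts[event["type"]] += 1
        yield dict(counts)


def event_source():
    """Hypothetical stream of user-activity events."""
    yield {"member": "ana", "type": "page_view"}
    yield {"member": "ben", "type": "click"}
    yield {"member": "ana", "type": "page_view"}


for snapshot in running_counts(event_source()):
    print(snapshot)
```

Each printed snapshot reflects the state of the stream at that moment, so the counts are available while events are still arriving rather than only after a later batch job.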

Finally, let’s not forget that LinkedIn need to pull in the revenue,
and they do this through recruitment services, paid membership and
advertising. Big Data has a role to play in increasing revenue as well as
improving the user experience. For example, in advertising – which
accounts for 20–25% of LinkedIn’s annual revenue – analysts work
with LinkedIn’s sales force to understand why members click on certain
ads and not others. These insights are then fed back to advertisers
in order to make their ads more effective.

What Were The Results?


LinkedIn’s success metrics include revenue and number of members,
both of which continue to rise year on year. LinkedIn gained
40 million new members in the first half of 2015 and, at the time of
writing, the company’s most recent quarterly revenue stood at over
$700 million (up from around $640 million in the previous quarter). There’s
no doubt that Big Data plays a large role in the company’s continued
success.

What Data Was Used?


LinkedIn track every move their users make on the site, from everything
liked and shared to every job clicked on and every contact
messaged. The company serve tens of thousands of Web pages every
second of every day. All those requests involve fetching data from
LinkedIn’s backend systems, which in turn handle millions of queries
per second. With permission, LinkedIn also gather data on users’
email contacts.

What Are The Technical Details?


Hadoop forms the core of LinkedIn’s Big Data infrastructure, and
is used for both ad hoc and batch queries. The company have
a big investment in Hadoop, with thousands of machines running
map/reduce jobs. Other key parts of the LinkedIn Big Data jigsaw
include Oracle, Pig, Hive, Kafka, Java and MySQL. Multiple data centres
are incredibly important to LinkedIn, in order to ensure high
availability and avoid a single point of failure. Today, LinkedIn run
out of three main data centres.
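
For readers unfamiliar with map/reduce, the pattern can be sketched in a few lines of plain Python. This runs on a single machine and does not use Hadoop's API; the log format and the function names are assumptions chosen only to show the map, shuffle and reduce steps.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (member, 1) pair for each page-view record."""
    member, _page = record
    yield member, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single total."""
    return key, sum(values)


# Hypothetical page-view log: (member, page visited).
log = [("ana", "/jobs"), ("ben", "/feed"), ("ana", "/profile")]
pairs = [pair for record in log for pair in map_phase(record)]
totals = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(totals)  # {'ana': 2, 'ben': 1}
```

On a cluster, the map and reduce steps run in parallel across many machines, which is what lets a batch job of this shape scale to very large logs.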

LinkedIn have also developed their own open-source tools for Big
Data access and analytics. Kafka started life this way, and other developments
include Voldemort and Espresso (for data storage) and Pinot
(for analytics). Open-source technology like this is important to
LinkedIn because they feel it creates better code (and a better product)
in the long run.

In addition, the company have an impressive team of in-house data
scientists – around 150 at current estimates. Not only do the team
work to improve LinkedIn products and solve problems for members,
they also publish at major conferences and contribute to the
open-source community. In fact, the team are encouraged to actively
pursue research in a number of areas, including computational
advertising, machine learning and infrastructure, text mining and
sentiment analysis, security and spam.

Any Challenges That Had To Be Overcome?


When you think that LinkedIn started with just 2700 members
in their first week, massive data growth is one obvious challenge
LinkedIn continually have to overcome – the company now have to
be able to handle and understand enormous amounts of data every
day. The solution lies in investing in highly scalable systems,
and ensuring that the data is still granular enough to provide useful
insights. Hadoop provides the back-end power and scalability needed
to cope with the volumes of data, and LinkedIn’s user interface allows
their employees to slice and dice the data in tons of different ways.

From a company that employed fewer than 1000 employees five
years ago, LinkedIn have grown to employ almost 9000 people. This
places enormous demand on the analytics team. Perhaps in response
to this, LinkedIn recently reorganized their data science team so
that the decision sciences part (which analyses data usage and key
product metrics) now comes under the company’s chief financial
officer, while the product data science part (which develops the
LinkedIn features that generate masses of data for analysis) is now
part of engineering. As such, data science is now more integrated
than ever at LinkedIn, with analysts becoming more closely aligned
with company functions.

It may come as a surprise to learn that hiring staff is also a challenge,
even for a giant like LinkedIn. Speaking to CNBC.com, LinkedIn’s
head of data recruiting, Sherry Shah, confirmed they were looking
to hire more than 100 data scientists in 2015 (a 50% increase from
2014). But competition for the best data scientists is tough, especially
in California, and Shah admitted that “there is always a bidding war”.
Although more people are entering the field, it’s likely this skills gap –
where demand for data scientists outstrips supply – will continue for
a few years yet.

In addition, LinkedIn haven’t escaped the privacy backlash. In June
2015, the company agreed to pay $13 million to settle a class action
lawsuit resulting from sending multiple email invitations to users’
contact lists. As a result of the settlement, LinkedIn will now explicitly
state that their “Add Connections” tool imports address books,
and the site will allow those who use the tool to select which contacts
will receive automated invitations and follow-up emails.

What Are The Key Learning Points And Takeaways?

As one of the oldest social media networks and still going strong,
LinkedIn provide a lesson to all businesses in how Big Data can lead
to big growth. Their ability to make suggestions and recommendations
to users is particularly enviable (and is also used successfully
by other companies featured in this book, such as Etsy and Airbnb).
But LinkedIn also provide an example of the need for transparency
when using individuals’ data – and the backlash that can occur when
people feel a company isn’t being entirely transparent. I think we can
expect to see more lawsuits like this against companies in future so it’s
important to be crystal clear with your customers about what data you are
gathering and how you intend to use it.

REFERENCES AND FURTHER READING


There’s more on LinkedIn’s use of Big Data at:
https://engineering.linkedin.com/big-data
https://engineering.linkedin.com/architecture/brief-history-scaling-linkedin
http://www.cnbc.com/2015/06/04/big-data-is-creating-big-career-opportunites.html
http://venturebeat.com/2014/10/31/linkedin-data-science-team/
http://www.mediapost.com/publications/article/251911/linkedin-to-pay-13-million-to-settle-battle-over.html
