
Windows Azure™ Marketplace

DataMarket

Published
October 2010

Applies to
Windows Azure Marketplace DataMarket

Summary

Windows Azure Marketplace features DataMarket, a new cloud-based service that provides a global
marketplace for information including data, web services, and analytics. With DataMarket, content providers
can make their datasets available to a wide audience around the world, subscribers can locate a dataset that
addresses their needs through rich discovery, and developers can write code to consume the datasets on any
platform.
Copyright
This is a preliminary document and may be changed substantially prior to final commercial release of the
software described herein.

The information contained in this document represents the current view of Microsoft Corporation on the
issues discussed as of the date of publication. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft
cannot guarantee the accuracy of any information presented after the date of publication.

This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS,
IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting
the rights under copyright, no part of this document may be reproduced, stored in, or introduced into a
retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying,
recording, or otherwise), or for any purpose, without the express written permission of Microsoft
Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property
rights covering subject matter in this document. Except as expressly provided in any written license
agreement from Microsoft, the furnishing of this document does not give you any license to these patents,
trademarks, copyrights, or other intellectual property.

© 2010 Microsoft Corporation. All rights reserved.

Microsoft, Windows Azure Marketplace, Access, Active Directory, Excel, IntelliSense, Microsoft Dynamics,
SharePoint, SQL Azure, SQL Server, Visual Studio, Windows, Windows Live, and Windows Server are
trademarks of the Microsoft group of companies.

All other trademarks are property of their respective owners.

Contents

Introduction
Key Features of DataMarket
    A Global Marketplace for Information
    Extending the Reach and Scale of Your Data
    A Brokerage for Information and Reports
    Disparate Content Types
    Unified Billing Infrastructure
    Robust Security and Availability
    Richer Analytics
    Integration with Information Worker Applications
    A Rich Set of Tools
Typical Scenarios
    Developers
    Data Mash-Ups
    Independent Software Vendors
    Reporting and Analysis
    Mining Data for Trends
    Buying and Selling Information
Architectural Overview
    Data Access Architecture
    Publication Architecture
    Information and Service Quality Bar
    Information Quality Criteria
Conclusion
    Summary
    Explore DataMarket Today!

Introduction
The internet is a source of vast quantities of data, both public and commercial content. Many
organizations publish datasets in a wide variety of disparate formats, to which customers can
subscribe. However, it can be difficult for customers to locate and subscribe to these datasets.
Furthermore, it can be challenging to use these datasets in ways that add value.

Consider a business that has identified a need for a specific type of data, such as customers and
their buying habits, products from suppliers, geographical information, population statistics,
scientific research, political statistics, or entertainment information. An internet search will locate
several competing data suppliers. But how does the customer make a fair and direct comparison
of the dataset features to select the one most suitable?

And this is just the beginning. After the company has located and chosen a suitable dataset, how
do they integrate it into their business? The fact is, data is often available in a wide variety of
formats. For example, many publishers use XML, but define their own schema, and may use
SOAP, REST, or JSON to exchange information. As a result, the business must devote
development time to integrate the dataset into its desktop applications, web sites, cloud
applications, and any other data-consuming software. This issue is multiplied across every single
dataset that the company acquires from various sources.

Only after the dataset has been integrated into the company do users get hands-on experience
with it. Poor-quality data, if present, becomes obvious only at this point, by which time the
purchase and development costs have already been spent. And although many dataset suppliers
promise a certain level of availability through their Service Level Agreements (SLAs), some
suppliers are over-ambitious and may not meet their obligations.

Auditing and billing can also be a problem, as each data publisher is likely to bill using different
criteria, which may not suit the subscriber's use. For example, a monthly subscription may be
expensive if a dataset is used exclusively by a small department. The company may subscribe to it
because the data is essential, even though it pays the same price as a customer who generates
ten times the number of queries. Furthermore, a publisher may not provide statistics regarding
data use. If a company wants to know the dataset usage, it may have to develop its own code for
keeping track.

And finally, using that dataset in conjunction with other data sources can be problematic. Can it
be mashed up and associated with other data? Do its semantics make it easy to augment, flexibly
associate, and correlate the data? All of these problems are multiplied when an organization
subscribes to many different datasets from different suppliers.

Windows Azure Marketplace DataMarket can help resolve these issues because it allows
developers and information workers to easily discover, purchase, and manage premium data
subscriptions on any platform. Essentially, DataMarket is an information marketplace that brings
data, imagery, and real-time web services from leading commercial data providers and
authoritative public data sources together into a single location. It offers a unified provisioning
and billing framework. In addition, DataMarket provides OData APIs for accessing data,
so developers and information workers can consume this premium content using virtually any
platform, application, or business workflow.

This paper describes the most important features of DataMarket and how they address common
business needs. It also outlines the common business scenarios that DataMarket addresses and
describes the system's architecture.

Key Features of DataMarket


The following sections describe the most important features of DataMarket and show how they
may address your business needs.

A Global Marketplace for Information


The principal goal of DataMarket is to provide a global marketplace for information in the form
of data and web services that can power applications. In other words, it is a single location that
data providers can use to publish their valuable information and that customers can use to
subscribe to and query data of all types.

One of the real benefits that DataMarket provides is consistency, from the way datasets are
described to the way subscriptions are managed. It handles all usage tracking and
billing so that providers can easily reach new consumers, and subscribers can view all their usage
in a single location. As a result, billing is more flexible, whether subscribers choose pay-as-you-go
transactions, monthly subscriptions, or even enterprise volume licensing. And when it comes
to integrating data into business applications, subscribers can reuse the same techniques and
similar code with each subsequent subscription, because data is presented consistently and new
proxy classes can be generated automatically.

Figure 1: The DataMarket Catalog

Extending the Reach and Scale of Your Data


One challenge for content providers with rich, high-quality information has been how to
publicize their datasets to the global market. Bear in mind that even localized data may have a
global appeal; for example, a database of United States customer addresses may be of interest to
any company selling into the United States from other countries. When you become a content
partner, you automatically obtain the global reach that DataMarket enjoys.

Furthermore, because DataMarket is built on Windows Azure™ and runs in industry-leading data
centers, you won’t need to make heavy investments in hardware. The service provides almost
unlimited scalability and can guarantee high availability. And when you need to increase the size
of your dataset, DataMarket scales smoothly with your requirements.

A Brokerage for Information and Reports


DataMarket functions as an information marketplace and brokerage business. That is, it provides
all the facilities a content provider needs to monetize the value of carefully created datasets and
web services. You no longer have to provide e-commerce functionality such as shopping baskets,
check-out tools, and invoicing because DataMarket does that for you, with high security and
availability. In addition, subscribers trust the data found in DataMarket, because they know they
will get high-quality data and excellent service. DataMarket can even broker data from any
source, whether it’s found in Windows Azure Storage and SQL Azure™ databases or third-party
clouds and private data centers.

Disparate Content Types


DataMarket is a marketplace for information of all types. For example, you can use it to power
your business by publicizing product catalogs, market research, sales leads, or web services.
However, DataMarket is not exclusively a store for business data, and in fact already includes
datasets well beyond the traditional concerns of business organizations. These include:

• National and international news stories
• Crime statistics
• MLB and NFL historical stats from STATS LLC
• Real estate from Zillow
• Demographics and consumer expenditure from Alteryx
• UNESCO food and agriculture information
• Geographical information of many kinds
• Carbon emissions
• Weather forecasts, and more

The information available in DataMarket will only continue to grow and diversify. Furthermore,
some information sets are published on a commercial basis and others are free, such as public
domain data from federal and state governments and free trials to commercial content.

Unified Billing Infrastructure


DataMarket has a complete and versatile billing infrastructure. In addition, it can be scaled in
ways that match how consumers use data. For instance, small customers can get
occasional access to important data in a cost-effective manner, while heavy data users can obtain
the content they need without steep rises in costs.
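
The trade-off between occasional and heavy use can be illustrated with a minimal sketch that compares a flat monthly subscription with per-query pricing. The prices, plan names, and query volumes below are hypothetical and are not taken from any DataMarket offer.

```python
# Hypothetical prices in cents, for illustration only.
MONTHLY_FEE_CENTS = 10_000     # flat monthly subscription
PRICE_PER_QUERY_CENTS = 2      # pay-per-transaction price


def monthly_cost(queries, plan):
    """Return the monthly cost in cents of `queries` queries under a plan."""
    if plan == "subscription":
        return MONTHLY_FEE_CENTS
    if plan == "per_query":
        return queries * PRICE_PER_QUERY_CENTS
    raise ValueError(f"unknown plan: {plan}")


def cheaper_plan(queries):
    """Pick the cheaper plan for a given monthly query volume."""
    subscription = monthly_cost(queries, "subscription")
    per_query = monthly_cost(queries, "per_query")
    return "per_query" if per_query < subscription else "subscription"


# A small department versus a heavy data user.
for volume in (500, 50_000):
    print(volume, "queries per month:", cheaper_plan(volume))
```

Under these assumed prices, a small department is better served by per-query pricing, while a heavy user is better served by the flat subscription.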

For subscribers, the unified billing infrastructure means that tracking data usage and predicting
bills is simple, even when they use many subscriptions with multiple content partners. Microsoft
handles it all.

At the same time, content partners won’t need custom billing and invoicing systems. Instead,
they get a versatile and powerful system that supports multiple tenants straight out of the box.
Microsoft handles fulfillment, and DataMarket tracks all customer access and provides detailed
reports.

With DataMarket, you can also create several different subscription models for a single dataset.
For example, you could create a free subscription with partial access and a premium subscription
with full access to all data. You can also control how queries are performed on a web service or
structured dataset, and even control what is returned, all from a visual interface with no coding
required. In addition, you can use this visual interface to set pricing, terms of use, marketplace
descriptions, samples, and more.
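
As a rough sketch of the idea of several subscription models over one dataset, the following example trims a query result to what a tier allows. The `Tier` class, the tier names, the column names, and the sample rows are all hypothetical; in DataMarket itself these settings are made in the visual interface, not in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """A hypothetical subscription tier for one dataset."""
    name: str
    columns: tuple   # columns the subscriber may see
    max_rows: int    # maximum rows returned per query


# Hypothetical tiers: free with partial access, premium with full access.
FREE = Tier("free", ("region", "year"), max_rows=2)
PREMIUM = Tier("premium", ("region", "year", "crime_rate"), max_rows=1000)


def apply_tier(rows, tier):
    """Trim a query result to the columns and row count the tier allows."""
    limited = rows[: tier.max_rows]
    return [{col: row[col] for col in tier.columns} for row in limited]


# Made-up sample rows.
rows = [
    {"region": "North", "year": 2009, "crime_rate": 4.1},
    {"region": "South", "year": 2009, "crime_rate": 5.3},
    {"region": "East", "year": 2009, "crime_rate": 3.8},
]

print(apply_tier(rows, FREE))
print(apply_tier(rows, PREMIUM))
```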

Furthermore, we encourage content providers to tag the data and supply semantic hints to
application developers and information workers. These tags and hints let clients logically
combine and join disparate datasets, which extends the power of the datasets.

Robust Security and Availability


Content providers put a lot of time and money into their datasets, which means that security is of
paramount importance. DataMarket has strong security built in, giving subscribers simple
access while denying unauthorized access and preventing denial-of-service and other
attacks. DataMarket runs in industry-leading data centers with specialist firewalls, physical
security, backup systems, and redundancy. All these features are maintained by Microsoft, so data
providers who use the Windows Azure platform to deliver content do not need to build a physical
infrastructure. If content is supplied from outside Windows Azure, an SLA has to be
obtained to ensure availability of the content.

Richer Analytics
DataMarket offers the ability to enrich existing analytics, helping content providers extend the
power of their datasets. In fact, you can become a content partner even if you have no data to
publish, simply by creating reports and analyses of the detailed data from other providers. Or you
can simply build and consume reports for your own purposes. After reports are created, they can
be bought and sold in the same way as datasets, allowing individuals with expertise in particular
domains to deliver rich experiences to consumers and information workers.

Furthermore, you can create mash-ups: reports that analyze data from multiple datasets,
including datasets from other content providers. The ecosystem ensures that content providers
are paid for their assets, while ISVs and report authors generate revenue from
supplying domain knowledge. For example, you could create a report that analyzes your
organization's sales data in the light of weather records, showing how a cold winter affected your
clothing sales and how you can capitalize on such events in the future.
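
A minimal sketch of such a sales-and-weather mash-up joins the two datasets on a shared month key and compares average sales in cold and mild months. The figures, the field names, and the temperature threshold are made up for illustration.

```python
from statistics import mean

# Made-up sample data, keyed by month.
sales = {"2009-11": 120, "2009-12": 180, "2010-01": 210, "2010-02": 150}
avg_temp_c = {"2009-11": 8.0, "2009-12": 1.5, "2010-01": -2.0, "2010-02": 4.0}

COLD_THRESHOLD_C = 2.0  # hypothetical cut-off for a "cold" month


def join_by_month(sales_by_month, temps_by_month):
    """Inner-join the two datasets on their shared month key."""
    shared = sorted(sales_by_month.keys() & temps_by_month.keys())
    return [
        {"month": m, "units": sales_by_month[m], "temp_c": temps_by_month[m]}
        for m in shared
    ]


def average_units(rows, cold):
    """Average units sold in cold months (cold=True) or mild months."""
    picked = [r["units"] for r in rows if (r["temp_c"] < COLD_THRESHOLD_C) == cold]
    return mean(picked)


joined = join_by_month(sales, avg_temp_c)
print("cold months:", average_units(joined, cold=True))
print("mild months:", average_units(joined, cold=False))
```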

Integration with Information Worker Applications
DataMarket integrates with desktop applications smoothly and is an easy way to improve
productivity. For example, a dataset could add information to the Microsoft Office Word
Research task pane. In Microsoft Office Excel®, data from DataMarket could enrich pivot tables
and provide extra insights into business data. Reports in Microsoft Office Access® or Microsoft
SQL Server® can mash up data from the local database with DataMarket information. You can
even use the DataMarket Add-in for Excel to discover, purchase, and use DataMarket datasets
without ever leaving the familiar Excel environment. You can then integrate your data with
PowerPivot for Excel for rich, self-service business intelligence, and with Bing™ Maps to turn
spatial datasets into quick, visual answers.

A Rich Set of Tools


Every time you assess a dataset, you need a detailed view of its data to determine whether it suits
your application. In DataMarket, you can build queries easily by using Service Explorer. A web-
based user interface, Service Explorer lets you preview results in your browser—including what
data is included and how it is structured—before developers write a single line of code.

Service Explorer is especially useful for developers who build cross-platform applications because
it creates URLs that they can copy and paste into their application code to call the web service.
Developers can use OData URIs to connect to the datasets and consume them in their
applications. In addition, they can use the “Add Service Reference” capability in Visual Studio to
generate proxy classes. The secure, REST-based OData APIs provide an abstraction over where the
data resides, whether it is a remote web service, a blob store, a rich SQL database, or content in
the Azure platform.
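
To give a feel for what an OData URI looks like, here is a minimal sketch that assembles one from the standard OData system query options `$filter`, `$select`, and `$top`. The service root, the `Cities` entity set, and the field names are hypothetical; in practice you would copy the URL that Service Explorer generates.

```python
from urllib.parse import quote, urlencode

# Hypothetical service root; a real one would be copied from Service Explorer.
SERVICE_ROOT = "https://api.example.com/ExampleProvider/ExampleDataset"


def build_query_url(entity_set, filter_expr=None, select=None, top=None):
    """Build an OData query URI from standard system query options."""
    options = {}
    if filter_expr:
        options["$filter"] = filter_expr
    if select:
        options["$select"] = ",".join(select)
    if top is not None:
        options["$top"] = str(top)
    url = f"{SERVICE_ROOT}/{quote(entity_set)}"
    if options:
        # Keep "$", "," and "'" readable; spaces are encoded as %20.
        url += "?" + urlencode(options, safe="$,'", quote_via=quote)
    return url


print(build_query_url(
    "Cities",
    filter_expr="State eq 'WA'",
    select=["Name", "Population"],
    top=10,
))
```

Because the query is an ordinary URL, the same string can be used from any platform that can issue an HTTP request.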

Service Explorer also works well for information workers. For instance, they can download a
PowerPivot file that enables rich data analysis within Excel.

Typical Scenarios
DataMarket improves the discovery and acquisition of content in a vast variety of business and
non-commercial scenarios. A few examples are discussed below.

Developers
DataMarket helps developers make the most of the rich data in its catalog at every stage of
development. In the beginning, you can take out a trial subscription to some datasets to identify
the most appropriate content for enabling the application and ensuring that it meets the
customer's needs. Then, you can visually explore the content in the browser-based Service
Explorer tool, submitting queries and previewing results.

When you are sure you have the right dataset, DataMarket assists you as you build your
application. The Service Explorer tool can return results in Atom 1.0 or raw formats for use as
sample data and generates URLs to the queries you run. You can copy and paste these URLs into
your code to call the service. Most importantly, you can also download automatically generated
C# proxy classes. When you import these into your application, you have strongly-typed access
with full IntelliSense support to ease development.
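
On platforms where the generated C# proxy classes are not available, the Atom 1.0 results can be read directly. The sketch below parses a small feed with Python's standard library; the sample feed, its `Name` and `Population` properties, and their values are made up, and it assumes the usual OData Atom layout in which each entry carries its fields in an `m:properties` element.

```python
import xml.etree.ElementTree as ET

# Namespaces used by Atom 1.0 and by OData Atom payloads.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
}

# Made-up sample feed standing in for a result saved from Service Explorer.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
      xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices">
  <title type="text">Cities</title>
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:Name>Alpha</d:Name>
        <d:Population>1000</d:Population>
      </m:properties>
    </content>
  </entry>
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:Name>Beta</d:Name>
        <d:Population>2500</d:Population>
      </m:properties>
    </content>
  </entry>
</feed>"""


def parse_entries(feed_xml):
    """Return one dict per Atom entry, mapping property name to text value."""
    root = ET.fromstring(feed_xml)
    rows = []
    for props in root.findall("atom:entry/atom:content/m:properties", NS):
        row = {}
        for prop in props:
            # Strip the "{namespace}" prefix that ElementTree puts on tags.
            name = prop.tag.split("}", 1)[1]
            row[name] = prop.text
        rows.append(row)
    return rows


for row in parse_entries(SAMPLE_FEED):
    print(row["Name"], row["Population"])
```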

Figure 2: The Service Explorer

DataMarket Application Programming Interfaces (APIs) help developers work with datasets in the
same way on many different platforms. Because the APIs are consistent, you can quickly develop
code to support desktop, Web, mobile, and other clients. And because DataMarket is
built on the REST architecture and its static services feature full support for the Open Data
Protocol (OData) specification, high-quality data is simple to discover.

Data Mash-Ups
A mash-up is any application or visualization that combines data from more than one source to
provide a new experience. On the internet, for example, data from a Web service could be
combined with a mapping tool, such as Microsoft Bing™ maps, to provide a geographical view
that is not possible with the Web service alone. Such a mash-up often makes hidden trends plain,
such as geographical clusters of events that are impossible to spot from zip codes alone.

With DataMarket, data presentation is consistent, which means that creating mash-ups is fast and
only requires a small amount of development time. As you explore the catalog, simple but
insightful possible mash-ups become obvious and as more datasets are added, the possibilities
will multiply. And by featuring tags to aid semantic analysis, DataMarket makes associations and
mash-ups easier than ever.

Figure 3: A Data Mash-Up in Bing Maps

Independent Software Vendors


Independent Software Vendors (ISVs) with valuable datasets have some unique challenges. In
addition to the traditional challenges of publicizing data to customers and assuring them of
quality, ISVs also need to let subscribers reuse data rapidly in mash-ups and
other complex applications, track data use, and bill accordingly. ISVs must also support multiple
tenants for each dataset and ensure that one customer's usage does not contend with that of
other customers.

DataMarket addresses all these issues because it is a unified data marketplace with consistent
APIs, billing infrastructure, and multi-tenant support, and it runs in market-leading data centers
that guarantee robustness. It is a great way to capitalize on the investment you have made in the
quality of your dataset.

Figure 4: An ISV's Weather Content Displayed Geographically

Reporting and Analysis


DataMarket features robust reporting and analysis that can bring in information from a wide
variety of sources. Consider the following example. A developer has been asked to create sales
analysis code for a home security manufacturing company that makes alarm systems. The
company stores its comprehensive sales data in an on-premises database. Instead of merely using
this data, the developer can identify additional datasets in the DataMarket catalog that provide a
rich background to the sales figures, such as crime figures for each region. After the analysis code
has been written, it can be rapidly included in desktop applications, intranet and extranet
dashboards, and mobile phone apps, all of which are supported. As a result, the developer can
deliver much greater insight without large overhead.
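
The analysis code in this example could be as simple as a correlation between regional crime figures and alarm sales. The sketch below computes a Pearson correlation coefficient by hand; the region names and all figures are made up for illustration.

```python
from math import sqrt

# Made-up per-region figures, for illustration only.
alarm_sales = {"North": 120, "South": 200, "East": 90, "West": 160}
crime_rate = {"North": 3.0, "South": 5.0, "East": 2.0, "West": 4.0}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Only compare regions that appear in both datasets.
regions = sorted(alarm_sales.keys() & crime_rate.keys())
r = pearson(
    [alarm_sales[k] for k in regions],
    [crime_rate[k] for k in regions],
)
print(f"correlation between crime rate and alarm sales: {r:.2f}")
```

A coefficient close to 1 would suggest that sales rise with regional crime figures; it does not by itself show cause.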

Mining Data for Trends


DataMarket already includes many large datasets relevant to market trends. These
datasets are provided by internationally respected organizations, such as UNESCO, the United
States Government, and commercial providers such as InfoUSA. This unprecedented resource can
be mined to identify future trends in population, demographics, consumer spending, and more.
Examples include:

• New home owners and new home movers
• Global gender statistics
• Commodity trade statistics
• Demographic statistics
• Global tourism statistics

Buying and Selling Information


Content vendors need to ensure the payment process is seamless and doesn’t impede business.
For instance, customers tend to dislike payment systems from unknown third parties, expired or
incorrect certificates, unclear invoices, and complex payment schemes that do not reflect usage.

Because DataMarket provides a unified and scalable billing infrastructure, vendors do not need
to solve these challenges for themselves. Subscribers can easily compare your data with
competitors' data to determine suitability and quality. They can then find their usage
statistics instantly and predict costs in advance. Ultimately, they have more confidence in the
payment system.

Architectural Overview
The following sections illustrate the design of the DataMarket service.

Data Access Architecture


DataMarket works as an intermediary between the data found in data stores and consumers
using Office applications, custom desktop clients, and web applications.

Figure 5: DataMarket components are shown in blue.

In Figure 5, data consumers of various kinds are found on the left. Notice that DataMarket can be
accessed from many platforms and that Office applications and client-server systems, such as
SQL Server and Microsoft Dynamics® Servers, can use its data. Any operating system or
hardware platform is supported, because data access uses REST and Atom
1.0 standards and is secured with Secure Sockets Layer (SSL).
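
Because access is plain HTTP over SSL, a client on any platform only needs to prepare an HTTPS request and read back an Atom feed. The sketch below builds such a request with Python's standard library without sending it. The URL and account key are hypothetical, and the use of HTTP Basic authentication with the key as the password is an assumption made for illustration; this paper does not describe the credential format.

```python
import base64
from urllib.request import Request

# Hypothetical values; a real URL and key come from your own subscription.
QUERY_URL = "https://api.example.com/ExampleProvider/ExampleDataset/Cities?$top=10"
ACCOUNT_KEY = "replace-with-your-account-key"


def build_request(url, account_key):
    """Prepare (but do not send) an authenticated GET request for an Atom feed."""
    if not url.lower().startswith("https://"):
        raise ValueError("refusing to send credentials over a non-SSL connection")
    # Assumption: HTTP Basic authentication with the key as the password.
    token = base64.b64encode(f":{account_key}".encode("utf-8")).decode("ascii")
    return Request(
        url,
        headers={
            "Authorization": f"Basic {token}",
            "Accept": "application/atom+xml",
        },
        method="GET",
    )


request = build_request(QUERY_URL, ACCOUNT_KEY)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without network access.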

The Front End Windows Azure (FEWA) load balancer is a key component because it ensures full
use of the data center and rapid query response. It also insulates users from changes in the server
infrastructure and makes sure that DataMarket scales seamlessly.

At the bottom of Figure 5, you can see the components of the DataMarket APIs and marketplace.
Notice that users can authenticate with their Windows Live® ID or with Access Control Services
(ACS). ACS includes identity federation and delegation facilities that you could use, for example,
to integrate your Active Directory® with DataMarket. In this way, users can access DataMarket
through their usual Microsoft Windows® user account. Notice also that DataMarket tracks all
access and generates invoices, using its logging and accounts databases.

Publication Architecture
On the right of Figure 6 are the data stores. Because DataMarket runs in the cloud, Windows
Azure and SQL Azure make ideal data stores. However, it is important to note that cloud services
and data centers from third parties can be used just as easily and are supported via proxy layers
that conform to the DataMarket SLA and interfaces.

DataMarket includes Data Access Layers (DALs) that encapsulate all the logic required to query
the data store and remote web services. Please note that load balancers are built into DataMarket
on the publication side as well as the data access side, which provides smooth scaling to large
data centers and heavy traffic.

Information and Service Quality Bar


Microsoft ensures that the datasets found in DataMarket are of the highest quality by working
closely with potential content partners. We also complete a detailed vetting process before a
dataset reaches the catalog.

Windows Azure or SQL Azure data stores are natural choices because of their high availability
and resilience. However, DataMarket can also include datasets that use third-party cloud services
or data centers for storage. To ensure the quality of service in these situations, we investigate
Service Level Agreements, load balancing, availability, bandwidth, failover, and other fault-
tolerance features.

Information Quality Criteria


To build a high-quality, trusted marketplace for premium content, Microsoft has
rigorous standards for the types of datasets and services that can be made available through
DataMarket. Regardless of whether the data is free or commercial, the content provider must
verify that it has a right to publish the data and take responsibility for the content. Furthermore,
the content provider must agree to make the content available for an agreed period of time
so as not to inconvenience application developers and information workers consuming the
content.

For public domain content, the provider must be the authoritative source of the curated content.
For commercial content providers, the organization must have the right to sell the content in
Microsoft's supported markets and be in the top five (by annual sales) of their industry or vertical.
Content providers that are requested by the ISV and Information Worker community will also be
encouraged to apply to on-board their content in DataMarket.

Conclusion
Summary
DataMarket helps simplify all the steps associated with discovering, exploring, and acquiring
information. It helps content providers reduce the challenges of marketing and selling their
high-quality services and datasets, just as it helps consumers ensure they get quality data that is
secure and easy to use.

Content providers get:

• Easy publication of data, whether it is blob data, structured data, or dynamic Web services.
• Developer tooling on the Microsoft platform to ease Visual Studio and .NET development.
• An easy way to get your content to Microsoft’s global developer and information worker
  community.
• A scalable Microsoft cloud computing platform that handles delivery, billing, and reporting.

Developers get:

• Trial subscriptions that let you investigate content and develop applications without paying
  data royalties.
• Simple transaction and subscription models that support pay-as-you-grow access to
  multi-million-dollar datasets.
• Consistent REST-based OData APIs across all datasets that facilitate development on any
  platform.
• The Service Explorer tool, which you can use to visually build and explore APIs and preview
  results.
• Automatic C# proxy classes that eliminate the need to write long XML and Web service
  code.

Information workers get:

• Integration with PowerPivot in Microsoft Excel.
• Simple, predictable licensing models for acquiring content.
• The ability to consume data from SQL Server, SQL Azure Database, and other Microsoft
  Office assets.
• The DataMarket Add-in for Excel, which makes it simple to discover, purchase, and use
  DataMarket datasets without ever leaving the familiar Excel environment.

Explore DataMarket Today!


Explore and subscribe to trusted premium and public domain data at
http://datamarket.azure.com.
