Apache Spark to Azure Cosmos DB Connector
The Apache Spark to Azure Cosmos DB Connector lets you run Spark jobs on the data stored in Azure Cosmos DB. You can use the connector with Azure Databricks or Azure HDInsight, which provide managed Spark clusters on Azure. You can also use it with your own Spark deployment. The connector provides a low-latency data source for Spark that works for both batch and stream processing.

Built-in operational analytics with Apache Spark (in preview)
More recently, we announced a limited preview of built-in operational analytics in Azure Cosmos DB using Apache Spark. This allows you to run analytics from Apache Spark against data stored in an Azure Cosmos account without a connector; instead, Azure Cosmos DB provides native support for Apache Spark jobs. Capabilities also include built-in support for Jupyter notebooks, which run within Azure Cosmos DB accounts.

Built-in support for Apache Spark in Azure Cosmos DB will provide several advantages, beginning with the fastest time to insight for geographically distributed users and data. You can also simplify your analytics architecture and lower its TCO, as the system will have the fewest data processing components and avoid any unnecessary data movement among them. Scalability will be built in, and you’ll have a security, compliance, and auditing boundary that encompasses all the data under management. Finally, you’ll be able to deliver highly available analytics backed by stringent SLAs.

The Azure Cosmos DB documentation provides more information on its built-in support for Apache Spark. Again, it’s currently in limited preview, so if you want to try it out, you’ll need to sign up for it.
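For the connector-based route described at the start of this section, a Spark job typically hands the connector a small set of connection options and then reads a container as a DataFrame. The following is a minimal Python sketch under stated assumptions, not a definitive implementation: the option keys (`Endpoint`, `Masterkey`, `Database`, `Collection`, `query_custom`) and the format string follow the connector's published Python samples for Spark 2.x as best recalled, so check them against the connector version you deploy. The helper names `cosmos_read_config` and `load_collection` are hypothetical.

```python
from typing import Dict, Optional


def cosmos_read_config(endpoint: str, master_key: str, database: str,
                       collection: str,
                       query: Optional[str] = None) -> Dict[str, str]:
    """Build the options dictionary handed to the connector's reader.

    The key names are assumptions taken from the connector's samples;
    verify them against the documentation for your connector version.
    """
    config = {
        "Endpoint": endpoint,
        "Masterkey": master_key,
        "Database": database,
        "Collection": collection,
    }
    if query is not None:
        # Optional query to run against the container, if one is supplied.
        config["query_custom"] = query
    return config


def load_collection(spark, config: Dict[str, str]):
    """Read a container into a DataFrame using an existing SparkSession.

    `spark` is a SparkSession created elsewhere, on a cluster where the
    connector library is already installed.
    """
    return (spark.read
            .format("com.microsoft.azure.cosmosdb.spark")  # assumed format name
            .options(**config)
            .load())
```

On a cluster, `load_collection(spark, cosmos_read_config(...))` would return a DataFrame that can be used like any other Spark data source.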
Operational considerations

Cost optimization with Azure Cosmos DB
The pricing model for Azure Cosmos DB simplifies cost management and planning, in that you pay only for the throughput you’ve provisioned (in RUs) and the storage that you consume. It’s just one of the many reasons why Azure Cosmos DB delivers such a compelling total cost of ownership (TCO).

That said, just because Azure Cosmos DB delivers a great TCO, it doesn’t mean that you shouldn’t try to get the very most out of the resources you’re paying for. The Azure Cosmos DB documentation includes numerous articles to help you optimize TCO, from understanding your bill to optimizing the cost of provisioned throughput. You’ll also find articles on optimizing costs in relation to queries, storage, reads and writes, geographic distribution, development/test, and reserved capacity.

Security
Azure Cosmos DB includes numerous features and capabilities designed to help you prevent, detect, and respond to database breaches. A few are worth calling out here:
• Data encryption. All data is encrypted at rest and during transport, by default and at no additional cost.
• Secure access. With Azure Cosmos DB, data access is secured in several ways. Administrative resources (Azure Cosmos DB accounts, databases, users, and permissions) are secured using master keys. Application resources (containers, documents, attachments, stored procedures, triggers, and UDFs) are secured using resource tokens.
• IP firewall. By default, an Azure Cosmos DB account is accessible from the internet, as long as the request is accompanied by a valid authorization token. Configurable IP-based access controls in Azure Cosmos DB provide an additional layer of security, enabling access only from approved machines and/or cloud services (which still need a valid authorization token).
• Access from virtual networks. You can configure an Azure Cosmos DB account to allow access only from a specific subnet of a virtual network (VNet). When you do this, only requests originating from that subnet will get a valid response; requests originating from any other source will receive a 403 (Forbidden) response.
• Role-based access control. Azure Cosmos DB provides built-in role-based access control (RBAC) for common management scenarios. An individual with a profile in Azure Active Directory can grant or deny access to resources (and operations on Azure Cosmos DB resources) by assigning these RBAC roles to users, groups, service principals, or managed identities. Role assignments are scoped to control-plane access only, which includes access to Azure Cosmos accounts, databases, containers, and offers (throughput).

Online backup and restore
Azure Cosmos DB automatically takes backups of your data at regular intervals, which is done without affecting the performance or availability of database operations. All backups are stored separately in Azure Blob storage, with those backups geographically replicated to protect against regional disasters. These automatic backups can be
helpful if you accidentally delete or update your Azure Cosmos account, database, or container and need to recover that data. Azure Cosmos DB takes snapshots of your data every four hours. At any given time, only the last two snapshots are retained. However, if a container or database is deleted, Azure Cosmos DB retains existing snapshots of that container or database for 30 days.

With Azure Cosmos DB SQL API accounts, you can also maintain and manage your own backups. You can use Azure Data Factory to periodically output any data to any Azure Data Factory-supported storage destination, or you can use the Azure Cosmos DB change feed to read data periodically (for full backups and/or incremental changes) and store that data in an Azure Blob storage account.

The Azure Cosmos DB documentation provides more information on online backup and restore, including options to manage your own backups, backup retention, restoring data from online backups, and migrating restored data to the original Azure Cosmos DB account. (Although it’s possible to use the restored account as the live account, it’s not a recommended option for production workloads.)

Compliance
To help customers meet their own compliance obligations across regulated industries and markets worldwide, Azure maintains the largest compliance portfolio in the industry in terms of both breadth (total number of offerings) and depth (number of customer-facing services in assessment scope). These compliance offerings are grouped into four segments (globally applicable, US Government, industry specific, and region or country/region specific) and are based on various types of assurances, including formal certifications, attestations, validations, authorizations, and assessments produced by independent third-party auditing firms, as well as contractual amendments, self-assessments, and customer guidance documents produced by Microsoft. The Azure Cosmos DB documentation provides a comprehensive list of compliance certifications.
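
The IP firewall and virtual network rules described under Security can be pictured as an allow-list check that applies in addition to token validation. The following is a conceptual Python sketch only, not how Azure Cosmos DB implements it. The function name `check_request`, the example address ranges, and the 401 status for a missing token are assumptions made for illustration; only the 403 (Forbidden) response for sources outside the allowed subnets comes from the text.

```python
import ipaddress
from typing import Iterable, Tuple


def check_request(source_ip: str, allowed_ranges: Iterable[str],
                  has_valid_token: bool) -> Tuple[int, str]:
    """Decide how a request is answered under an allow-list policy.

    allowed_ranges holds CIDR blocks or single addresses (hypothetical
    values); a single address is treated as a one-address network.
    """
    address = ipaddress.ip_address(source_ip)
    in_allowed_range = any(
        address in ipaddress.ip_network(block) for block in allowed_ranges
    )
    if not in_allowed_range:
        # Sources outside the approved ranges are rejected outright.
        return 403, "Forbidden"
    if not has_valid_token:
        # Assumption: being on the allow-list never replaces the token check.
        return 401, "Unauthorized"
    return 200, "OK"


if __name__ == "__main__":
    approved = ["203.0.113.0/24", "198.51.100.25"]
    print(check_request("203.0.113.7", approved, has_valid_token=True))
    print(check_request("192.0.2.10", approved, has_valid_token=True))
```

The point of the sketch is the ordering: network origin and authorization are independent conditions, and a request has to satisfy both.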

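
The snapshot schedule described under Online backup and restore (a snapshot every four hours, the last two retained, and 30 days of retention after a container or database is deleted) can be worked through with a little date arithmetic. This is a minimal Python sketch under stated assumptions: the service manages the actual schedule, so anchoring the schedule at a known first-snapshot time is hypothetical, as are the names `retained_snapshots` and `restorable_until`.

```python
from datetime import datetime, timedelta
from typing import List

SNAPSHOT_INTERVAL = timedelta(hours=4)   # from the text: every four hours
SNAPSHOTS_KEPT = 2                       # from the text: last two retained
DELETED_RETENTION = timedelta(days=30)   # from the text: 30 days after deletion


def retained_snapshots(first_snapshot: datetime, now: datetime) -> List[datetime]:
    """Timestamps of the snapshots still held for a live container at `now`.

    Assumes snapshots fall exactly on the interval starting at
    `first_snapshot`, which is a simplification for illustration.
    """
    if now < first_snapshot:
        return []
    taken = (now - first_snapshot) // SNAPSHOT_INTERVAL + 1
    oldest_kept = max(0, taken - SNAPSHOTS_KEPT)
    return [first_snapshot + i * SNAPSHOT_INTERVAL
            for i in range(oldest_kept, taken)]


def restorable_until(deleted_at: datetime) -> datetime:
    """Last moment the snapshots of a deleted container are still retained."""
    return deleted_at + DELETED_RETENTION


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0, 0)
    for stamp in retained_snapshots(start, datetime(2024, 1, 1, 9, 30)):
        print(stamp.isoformat())
    print(restorable_until(datetime(2024, 1, 1)).date())
```

Under these assumptions, the oldest restorable state of a live container is never more than about eight hours old, which is worth keeping in mind when deciding whether to maintain your own backups as well.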