
Data Migration from Cosmos DB to Another Cosmos DB

a) Using Azure Data Factory

1. Create an Azure Data Factory Instance:


- Navigate to the Azure portal.
- Go to "Create a resource" and search for "Data Factory".
- Follow the steps to create a new Azure Data Factory instance.

2. Create Linked Services:


- Access your Azure Data Factory instance in the Azure portal.
- Click on "Author & Monitor", then "Manage" to access Linked Services.
- Create a Linked Service for your source Cosmos DB instance.
- Similarly, create a Linked Service for your destination Cosmos DB instance.

3. Create Datasets:
- Define datasets for both your source and destination data.
- Specify the format (e.g., JSON, CSV) and schema information for each dataset.
- Configure dataset properties such as folder path, file name pattern, and partition
information.

4. Create Copy Data Activity:


- Navigate to the "Author" tab in your Azure Data Factory instance.
- Create a new pipeline and add a Copy Data activity to the canvas.
- Configure the source dataset, destination dataset, and any transformations or
mappings required during the data transfer.

5. Configure Execution Settings:


- Specify execution settings such as scheduling options and triggers.
- Configure monitoring and alerting settings to track the progress of your data
migration job.

6. Validate and Publish the Pipeline:
- Validate your data pipeline to ensure all settings and connections are correct.
- Publish the pipeline to make it available for execution.

7. Execute the Data Pipeline:


- Trigger the execution of your data pipeline to start the data migration process
(a scripted trigger is sketched after this list).
- Monitor the execution progress and review logs for any issues.

8. Monitor and Verify Data Migration:


- Monitor the data migration process using the Azure Data Factory monitoring
interface or Azure Monitor.
- Verify the data in your destination Cosmos DB instance to ensure accuracy.
- Address any errors or issues encountered during the migration process.
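
As a concrete illustration of steps 7 and 8, the short Python sketch below triggers an
already-published pipeline and polls its run status. It assumes the azure-identity and
azure-mgmt-datafactory packages; the subscription ID, resource group, factory, and pipeline
names are placeholders you would replace with your own.

```python
# Minimal sketch: trigger a published ADF pipeline and poll the run status.
# Assumes `pip install azure-identity azure-mgmt-datafactory`.
# Subscription, resource group, factory, and pipeline names are placeholders.
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<subscription-id>"
resource_group = "my-rg"
factory_name = "my-adf"

client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Step 7: trigger the pipeline that contains the Copy Data activity.
run = client.pipelines.create_run(resource_group, factory_name, "CosmosToCosmosCopy")

# Step 8: poll the run until it reaches a terminal state.
while True:
    status = client.pipeline_runs.get(resource_group, factory_name, run.run_id)
    if status.status not in ("Queued", "InProgress"):
        break
    time.sleep(30)

print(f"Pipeline run {run.run_id} finished with status {status.status}")
```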

b) Using Azure Cosmos DB Data Migration Tool

Steps to Migrate Data Using Azure Cosmos DB Data Migration Tool:

1. Download and Install the Tool:


- Download the Azure Cosmos DB Data Migration Tool from the official Microsoft
website or GitHub repository.
- Follow the installation instructions for your operating system (Windows, Linux, or
macOS).

2. Launch the Tool:


- Open the installed tool on your computer.

3. Connect to Source Cosmos DB Instance:


- Create a new migration project.
- Provide connection details for the source Cosmos DB instance (URI, database name,
authentication credentials).

4. Connect to Destination Cosmos DB Instance:


- Specify the destination Cosmos DB instance.
- Provide connection details (URI, database name, authentication credentials).

5. Select Migration Mode:
- Choose between offline or online migration mode based on your requirements.

6. Configure Data Transfer Options:


- Specify target collection/container, data format, and any transformation/mapping
requirements.

7. Run the Migration Process:


- Initiate the migration project to start transferring data from source to destination.
- Monitor migration progress within the tool's interface.

8. Verify and Validate Migration:


- Check the data in the destination Cosmos DB instance for accuracy (see the
verification sketch after this list).
- Validate the migration results and the integrity of the transferred data.

9. Handle Errors and Exceptions:


- Review error logs to identify and address any issues encountered during migration.
- Retry failed migration tasks or adjust settings as needed.

10. Monitor and Finalize:


- Monitor migration progress and performance metrics.
- Finalize migration project and close the tool once satisfied with results.
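
To make the verification step concrete, the Python sketch below compares item counts between
the source and destination containers using the azure-cosmos SDK. The endpoints, keys, and
database/container names are placeholders; a count comparison is only a first-pass check, so
follow it with spot checks on actual documents.

```python
# Minimal verification sketch: compare item counts in source and destination.
# Assumes `pip install azure-cosmos`; endpoints, keys, and names are placeholders.
from azure.cosmos import CosmosClient

def count_items(endpoint: str, key: str, database: str, container: str) -> int:
    client = CosmosClient(endpoint, credential=key)
    cont = client.get_database_client(database).get_container_client(container)
    # "SELECT VALUE COUNT(1)" returns a single scalar with the total item count.
    result = cont.query_items(
        query="SELECT VALUE COUNT(1) FROM c",
        enable_cross_partition_query=True,
    )
    return next(iter(result))

source_count = count_items("https://source-account.documents.azure.com:443/",
                           "<source-key>", "SourceDb", "SourceContainer")
dest_count = count_items("https://dest-account.documents.azure.com:443/",
                         "<dest-key>", "DestDb", "DestContainer")
print(f"source={source_count} destination={dest_count} match={source_count == dest_count}")
```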

c) Using Azure Cosmos DB Data Migration Tool (Step-by-Step Walkthrough)

Step 1: Prepare for Migration


o Download and install the Azure Cosmos DB Data Migration Tool.
o Ensure you have access to both the source and destination Azure Cosmos
DB accounts.
o Obtain the connection strings for each account.
o Identify the name of the source and destination databases (DB) as well as
the containers within them.

Step 2: Configure Source Account
o Launch the Azure Cosmos DB Data Migration Tool.
o Define the source account by specifying the connection string, database
name, and container name.
o Click on "Verify" to validate the connection to the source account.

Step 3: Configure Destination Account


o Define the destination account in the Azure Cosmos DB Data Migration
Tool.
o Provide the connection string, destination database name (it will be created
if it doesn't exist), container name, and partition key (a scripted way to
pre-create the destination container is sketched at the end of this section).
o Click on "Verify" to ensure the connection to the destination account is
established.

Step 4: Optional Logging Configuration


o Optionally, define a log file in CSV format for logging purposes.

Step 5: Initiate Migration


o Review the migration summary provided by the Azure Cosmos DB Data
Migration Tool.
o Click on "Import" to initiate the data migration process.

Step 6: Repeat for Additional Containers (Optional)


o Repeat Steps 2-5 for each additional container you wish to migrate.

Step 7: Verify Migration


o Monitor the migration process within the Azure Cosmos DB Data Migration
Tool.
o Once completed, verify that the data has been successfully migrated to the
destination Cosmos DB account.

Step 8: Finalize
o Optionally, perform any additional testing or validation to ensure the
integrity of the migrated data.
o Congratulations! You have successfully migrated data from one Cosmos DB
instance to another using the Azure Cosmos DB Data Migration Tool.
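
If you prefer to pre-create the destination database and container yourself (with an explicit
partition key and throughput) before running the import described in Step 3, the Python sketch
below shows one way to do it with the azure-cosmos SDK. The endpoint, key, names, partition
key path, and throughput are placeholders.

```python
# Minimal sketch: pre-create the destination database and container so the
# import lands in a container with the intended partition key and throughput.
# Assumes `pip install azure-cosmos`; all names, keys, and paths are placeholders.
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://dest-account.documents.azure.com:443/",
                      credential="<dest-key>")

database = client.create_database_if_not_exists(id="DestDb")
container = database.create_container_if_not_exists(
    id="DestContainer",
    partition_key=PartitionKey(path="/partitionKey"),  # match your documents
    offer_throughput=400,                               # adjust RU/s as needed
)
print(f"Ready: {database.id}/{container.id}")
```
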
d) Using Azure Cosmos DB Spark Connector

1. Set Up Your Development Environment:

- Ensure you have an Apache Spark environment set up. You can use Azure
Databricks, an Apache Spark cluster on Azure HDInsight, or a standalone Apache
Spark installation.

- Make sure you have the Azure Cosmos DB Spark Connector library added to your
Spark environment. You can add it as a Maven dependency if you're using Maven,
or download the JAR file and include it in your Spark configuration.

2. Configure Access to Cosmos DB Instances:

- Obtain the connection strings or URIs for both the source and destination Cosmos
DB instances.

- Ensure that you have the necessary permissions and access credentials (e.g., master
keys, resource tokens) to read from the source Cosmos DB instance and write to
the destination Cosmos DB instance.

3. Read Data from Source Cosmos DB:

- Use the Azure Cosmos DB Spark Connector to create a Spark DataFrame that reads
data from the source Cosmos DB instance.

- Specify the source Cosmos DB connection options, including the URI, database
name, collection name, and any required authentication credentials.

- You can optionally apply filters, projections, or transformations to the data as
needed before writing it to the destination Cosmos DB instance (see the PySpark
sketch after this list).

4. Write Data to Destination Cosmos DB:

- Use the Spark DataFrame created in the previous step to write data to the
destination Cosmos DB instance.
- Specify the destination Cosmos DB connection options, including the URI, database
name, collection name, and any required authentication credentials.
- Choose the appropriate write mode based on your requirements. You can
overwrite existing data, append new data, or perform other actions based on the
existing data in the destination Cosmos DB collection.

5. Execute the Spark Job:

- Submit your Spark job to the Spark cluster or environment.


- Monitor the job execution and review any logs or error messages to ensure the
data migration process completes successfully.
- Depending on the size of your dataset and the performance of your Spark cluster,
the migration process may take some time to complete.

6. Verify Data Migration:

- After the Spark job completes, verify that the data has been successfully migrated
to the destination Cosmos DB instance.
- Query the destination Cosmos DB collection to ensure that it contains the expected
data from the source Cosmos DB instance.
- Perform any necessary data validation or integrity checks to confirm the accuracy
of the migration.
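
The PySpark sketch below illustrates steps 3 and 4. It assumes a Spark 3 environment with the
Azure Cosmos DB Spark 3 OLTP connector on the classpath; the spark.cosmos.* option names follow
the connector's documented configuration, while the endpoints, keys, and database/container
names are placeholders.

```python
# Minimal sketch: copy a container with the Azure Cosmos DB Spark 3 OLTP
# connector ("cosmos.oltp" format). Endpoints, keys, and names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cosmos-to-cosmos-copy").getOrCreate()

source_cfg = {
    "spark.cosmos.accountEndpoint": "https://source-account.documents.azure.com:443/",
    "spark.cosmos.accountKey": "<source-key>",
    "spark.cosmos.database": "SourceDb",
    "spark.cosmos.container": "SourceContainer",
    # Sample documents to infer a schema (assumes the sample is representative).
    "spark.cosmos.read.inferSchema.enabled": "true",
}
dest_cfg = {
    "spark.cosmos.accountEndpoint": "https://dest-account.documents.azure.com:443/",
    "spark.cosmos.accountKey": "<dest-key>",
    "spark.cosmos.database": "DestDb",
    "spark.cosmos.container": "DestContainer",
    # Upsert documents so the job can be re-run safely.
    "spark.cosmos.write.strategy": "ItemOverwrite",
}

# Step 3: read from the source container (apply filters/projections here if needed).
df = spark.read.format("cosmos.oltp").options(**source_cfg).load()

# Step 4: write the DataFrame to the destination container.
df.write.format("cosmos.oltp").options(**dest_cfg).mode("append").save()
```
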
e) Using Azure Cosmos DB Functions + Change Feed API

1. Set up Azure Functions:


- Create an Azure Functions app in the Azure portal.
- Configure the necessary settings such as runtime stack (e.g., Node.js, C#), trigger
type (e.g., Cosmos DB Trigger), and other required configurations.

2. Configure Cosmos DB Trigger:


- Add a Cosmos DB Trigger binding to your Azure Function.
- Configure the trigger to listen to changes on the source Cosmos DB container using
the Change Feed API.
- Specify the lease collection and lease database for checkpointing purposes, which
helps track the progress of the Change Feed processing.

3. Implement Change Feed Processing Logic:


- Write the logic within your Azure Function to process changes received from the
Change Feed.
- Depending on your requirements, this logic could involve transforming the data,
filtering specific changes, or directly forwarding them to the destination Cosmos DB
instance.

4. Connect to Destination Cosmos DB:


- Establish a connection to the destination Cosmos DB instance within your Azure
Function.
- Use the Cosmos DB SDK or client library appropriate for your chosen programming
language to interact with the destination container.

5. Write Changes to Destination Cosmos DB:


- Once changes are processed from the Change Feed, write them to the destination
Cosmos DB instance (see the Functions sketch after this list).
- Ensure that you handle any data transformation, schema mapping, or other
necessary adjustments to match the destination Cosmos DB's schema or
requirements.

6. Handle Error and Retry Logic:


- Implement error handling and retry logic within your Azure Function to handle
transient failures, network issues, or other exceptions that may occur during data
migration.
- Use retry policies or exponential backoff strategies to manage retries effectively.

7. Monitor and Test:
- Test your Azure Function thoroughly to ensure that it correctly processes changes
from the Change Feed and writes them to the destination Cosmos DB instance.
- Monitor the execution of your Azure Function to identify any performance issues,
errors, or anomalies during data migration.

8. Deploy and Scale:


- Deploy your Azure Function to your Azure Functions app.
- Configure scaling settings to ensure that your function can handle varying
workloads and scale out as needed to accommodate increased processing
demands.
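
The sketch below pulls steps 2 through 5 together using the Python v2 programming model for
Azure Functions: a Cosmos DB trigger receives batches from the source container's Change Feed
and upserts them into the destination with the azure-cosmos SDK. The app setting names
(CosmosDBConnection, DEST_COSMOS_CONNECTION) and the database/container names are placeholders,
and the exact trigger parameter names can vary between versions of the Functions Cosmos DB
extension, so verify them against your runtime.

```python
# function_app.py -- minimal sketch using the Azure Functions Python v2 model.
# "CosmosDBConnection" and "DEST_COSMOS_CONNECTION" are hypothetical app settings;
# database and container names are placeholders.
import os

import azure.functions as func
from azure.cosmos import CosmosClient

app = func.FunctionApp()

# The destination client is created once per worker and reused across invocations.
dest_client = CosmosClient.from_connection_string(os.environ["DEST_COSMOS_CONNECTION"])
dest_container = dest_client.get_database_client("DestDb").get_container_client("DestContainer")

@app.cosmos_db_trigger(
    arg_name="documents",
    database_name="SourceDb",
    container_name="SourceContainer",
    connection="CosmosDBConnection",           # app setting holding the source connection string
    lease_container_name="leases",             # checkpoints Change Feed progress
    create_lease_container_if_not_exists=True,
)
def migrate_changes(documents: func.DocumentList) -> None:
    # Each invocation receives a batch of changed documents from the Change Feed.
    for doc in documents:
        # Upsert keeps the function idempotent if a batch is redelivered.
        dest_container.upsert_item(dict(doc))
```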

f) Using Azure Cosmos DB Live Data Migrator – Steps

1. Set Up Spark Cluster:


- Provision a Spark cluster on Azure Databricks, Azure HDInsight, or any other Spark
service.

2. Install Azure Cosmos DB Connector:


- Install the Azure Cosmos DB Connector for Spark on your Spark cluster by adding
the necessary dependency to your Spark application.

3. Enable Change Feed on Source Cosmos DB:


- Enable the Change Feed feature on your source Cosmos DB instance to capture
changes made to the data in real-time.

4. Configure Spark Job:


- Write a Spark job using Scala, Python, or any other supported language.
- Use the Azure Cosmos DB Connector to read data from the source Cosmos DB
instance and configure it to utilize the Change Feed feature.

5. Write Data to Destination Cosmos DB:
- Use the Azure Cosmos DB Connector to write the data read from the source
Cosmos DB to the destination Cosmos DB instance.
- Ensure that the schema of the data matches the schema expected by the
destination Cosmos DB.

6. Handle Incremental Updates:


- Implement logic in the Spark job to handle incremental updates as new data arrives
in the Change Feed (see the streaming sketch after this list).
- Update or insert records in the destination Cosmos DB based on the changes
received from the Change Feed.

7. Optimize Performance and Scalability:


- Tune the Spark job to optimize performance and scalability by adjusting parameters
such as the number of executors and executor memory.
- Consider partitioning the data appropriately to distribute the workload across
multiple executors for parallel processing.

8. Test and Monitor:


- Thoroughly test the Spark job to ensure it performs as expected under various
conditions.
- Monitor the job's execution using Spark monitoring tools and Azure Cosmos DB
metrics to verify correct data migration.

9. Deploy and Execute:


- Deploy the Spark job to your Spark cluster and execute it.
- Monitor the job's progress and performance during execution to ensure it operates
within performance expectations.

10. Handle Errors and Retries:


- Implement error handling and retry logic in the Spark job to handle transient
failures gracefully.
- Configure appropriate retry mechanisms to retry failed operations and ensure data
migration continues uninterrupted.
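
The PySpark sketch below shows the live-migration loop described in steps 4 through 6 as a
Structured Streaming job: it reads the source container's Change Feed through the Spark 3 OLTP
connector and continuously upserts the changes into the destination. The cosmos.oltp.changeFeed
format and the spark.cosmos.changeFeed.* options follow the connector's documentation, but the
account details, names, and checkpoint location are placeholders; confirm the option names
against the connector version you install.

```python
# Minimal sketch: live migration via the Change Feed with Spark Structured Streaming
# and the Azure Cosmos DB Spark 3 OLTP connector. Names and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cosmos-live-migration").getOrCreate()

change_feed_cfg = {
    "spark.cosmos.accountEndpoint": "https://source-account.documents.azure.com:443/",
    "spark.cosmos.accountKey": "<source-key>",
    "spark.cosmos.database": "SourceDb",
    "spark.cosmos.container": "SourceContainer",
    # Stream the raw document body so all properties are carried over as-is.
    "spark.cosmos.read.inferSchema.enabled": "false",
    "spark.cosmos.changeFeed.mode": "Incremental",     # inserts and updates
    "spark.cosmos.changeFeed.startFrom": "Beginning",  # replay existing data first
}
dest_cfg = {
    "spark.cosmos.accountEndpoint": "https://dest-account.documents.azure.com:443/",
    "spark.cosmos.accountKey": "<dest-key>",
    "spark.cosmos.database": "DestDb",
    "spark.cosmos.container": "DestContainer",
    "spark.cosmos.write.strategy": "ItemOverwrite",    # upsert incoming changes
}

# Read the source container's Change Feed as a stream.
changes = spark.readStream.format("cosmos.oltp.changeFeed").options(**change_feed_cfg).load()

# Continuously write the changes to the destination container. The checkpoint
# location (use a durable path such as DBFS/ADLS) lets the job resume after restarts.
query = (changes.writeStream.format("cosmos.oltp")
         .options(**dest_cfg)
         .option("checkpointLocation", "/tmp/cosmos-migration-checkpoint")
         .outputMode("append")
         .start())

query.awaitTermination()
```
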
g) Using Azure Cosmos DB Live Data Migrator

1. Go to the Azure portal
2. Search for "App registrations" and click on the App registrations result
3. Click New registration to start registering the migration tool app
a. Type in a Name for the application. For instance "tips01".
b. Put in a Redirect URI. This should contain the name, so something
like "https://tips01-ui.azurewebsites.net/signin-oidc"
c. Click Register

(Create an App Registration in the Azure portal)

4. Next, note the Application (client) ID in the overview blade of the App
registration. Copy it to use it later
5. Navigate to the Authentication menu
6. Fill in a Front-channel logout URL. Again, this should contain the name,
like "https://tips01-ui.azurewebsites.net/signout-callback-oidc"
7. Check ID tokens (used for implicit and hybrid flows)
8. Click Save
9. Next, go to the Manifest menu
10. Add the required resource access entries to the requiredResourceAccess node
11. Click Save

(Change the manifest of the App registration)

12. In the manifest, copy the publisherDomain for later

Now that we have an application registration, we can deploy the migration app.

13. Go to this link to start creating the migration app

1. First, select a Resource Group


2. Next, type in a Resource Name Prefix. This should be the name that you used
earlier. So, in my case, it is "tips01"
3. In Default Source Database Account Name, put in a name for the Azure Cosmos
DB connection that will serve as the migration source. This can be anything, and you
can change it later
4. For Default Source Database Account Connection String, type in the connection
string for the source Azure Cosmos DB. You can find this in the Azure portal
5. Type a name for the Default Destination Database Account Name
6. Provide the connection string for the destination Azure Cosmos DB account
in Default Destination Database Account Connection String
7. In the Allowed Users field, provide a user that will use the migration tool. This can
be an objectId or email address of a user that is in the same tenant where the
application will be deployed. You can provide multiple users by separating them with
the "|" character
8. Next, provide the publisher domain from the App Registration Manifest in the Aad
App Registration Publisher Domain field
9. Finally, in the Aad App Registration Client ID field, put in the Application (client)
ID from the App registration
10. Click Review + create and then Create to deploy the migration tool

(Deploy the migration tool)

The migration tool will deploy several resources. This includes an Azure App Service Web App that
runs the UI for the tool. Find the Web App in the Azure portal and open the UI in a browser. The
URL will use the name that you provided earlier. So, in my case, it is
https://tips01-ui.azurewebsites.net

1. To start creating a migration, click Create


2. Now fill in the source and destination details
1. For source and destination, provide the Azure Cosmos DB database name in
the DB field
2. Fill in the database Container name
3. Provide the Partition key for the source and destination
4. Click Create / Start

(Create a new migration)

3. You can watch the progress of any open migrations by clicking on the List menu and
refreshing your browser
4. When all documents are migrated, click Complete to mark the migration as finished

Complete the migration!

Comparison of Data Migration Methods for Cosmos DB

1. Using Azure Data Factory (ADF):


- Complexity: ADF provides a GUI for building data pipelines, simplifying the process.
- Ease of Use: Visual interface makes it relatively straightforward to create pipelines.

2. Using Azure Cosmos DB Data Migration Tool:
- Complexity: Specifically designed for Cosmos DB migrations, simplifying the
process.
- Ease of Use: Typically offers a straightforward user interface for configuration.

3. Using Azure Cosmos DB Functions + Change Feed API:


- Complexity: Requires more development effort compared to GUI-based tools.
- Ease of Use: Offers flexibility but requires more coding and configuration.

4. Using Azure Cosmos DB Spark Connector:


- Complexity: Offers powerful capabilities but requires expertise in Spark
programming.
- Ease of Use: Provides high performance but needs more setup and configuration.

5. Using Azure Cosmos DB Live Data Migrator:


- Complexity: Abstracts complexities but offers less flexibility.
- Ease of Use: Simplifies migration with minimal user intervention.

Conclusion:
a) For simplicity, Azure Data Factory and Azure Cosmos DB Data Migration Tool are user-
friendly GUI-based options.
