
Q. What are the ways to insert data into an existing table in BQ from a GCS bucket?

In BigQuery, you can insert data from a Google Cloud Storage (GCS)
bucket into an existing table using a few methods:

1. Web UI:
Go to BigQuery in the Google Cloud console.
Open your dataset and click "Create table".
Choose "Google Cloud Storage" as the source, enter the existing table's name as the destination, and under "Advanced options" set "Write preference" to "Append to table" or "Overwrite table".

2. Command Line:
Use the bq command-line tool.
Run the bq load command with your dataset and table information,
specifying the GCS path. By default the rows are appended; add --replace to overwrite the table instead.

3. APIs or Client Libraries:
Use BigQuery APIs or client libraries (like Python or Java) to
programmatically load data from GCS to BigQuery.

4. Data Transfer Service:
Set up a BigQuery Data Transfer Service transfer to load data from GCS into
BigQuery automatically on a schedule.

Choose the method that best fits your workflow and preferences.
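For the command-line route, here is a minimal sketch in Python that only assembles a `bq load` invocation and prints it; it does not call BigQuery. The helper `bq_load_command` and the dataset, table, and bucket names are hypothetical, and the defaults assume a CSV source with one header row.

```python
import shlex


def bq_load_command(table, gcs_uri, source_format="CSV", overwrite=False, skip_header=True):
    """Build (but do not run) a `bq load` command for an existing table.

    `table` is "dataset.table"; `gcs_uri` may contain a single * wildcard.
    """
    cmd = ["bq", "load", f"--source_format={source_format}"]
    # --replace erases the table's existing data before loading;
    # --noreplace (the default behaviour) appends to it instead.
    cmd.append("--replace" if overwrite else "--noreplace")
    if source_format == "CSV" and skip_header:
        cmd.append("--skip_leading_rows=1")  # skip the CSV header line
    cmd += [table, gcs_uri]
    return cmd


# Hypothetical names: replace with your own dataset, table, and bucket.
append_cmd = bq_load_command("my_dataset.sales", "gs://my-bucket/sales/*.csv")
overwrite_cmd = bq_load_command("my_dataset.sales", "gs://my-bucket/sales/*.csv", overwrite=True)
print(shlex.join(append_cmd))
```

On a machine with the Google Cloud CLI installed and authenticated, the same list could be passed to `subprocess.run(cmd, check=True)`. Keeping the arguments as a list rather than one shell string avoids quoting problems with the `*` wildcard.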

Q. What are external tables?


External tables in BigQuery are like windows into data stored outside of
BigQuery. Instead of importing the data into BigQuery, you can create a
virtual table that references data in external sources like Google Cloud
Storage or Google Drive. It's like having a view of the data without
physically moving it, making it convenient for analyzing large datasets
without the need for full data duplication. Just imagine it as a direct link
to data living outside BigQuery, allowing you to query and analyze it
seamlessly.
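To make the "direct link" idea concrete, here is a toy model in plain Python, not BigQuery code. The hypothetical `NativeTable` copies a CSV file's rows when it is created, while the hypothetical `ExternalTable` keeps only the file path and re-reads the file on every query, so it sees later changes to the source.

```python
import csv
import os
import tempfile


class NativeTable:
    """Toy native table: copies the file's rows in once, at load time."""

    def __init__(self, path):
        with open(path, newline="") as f:
            self.rows = list(csv.DictReader(f))

    def query(self):
        return self.rows  # later edits to the file are not seen


class ExternalTable:
    """Toy external table: stores only the file path and reads it on every query."""

    def __init__(self, path):
        self.path = path

    def query(self):
        with open(self.path, newline="") as f:
            return list(csv.DictReader(f))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sales.csv")  # hypothetical source file
    with open(path, "w", newline="") as f:
        f.write("id,amount\n1,10\n")
    native, external = NativeTable(path), ExternalTable(path)
    with open(path, "a", newline="") as f:
        f.write("2,25\n")  # the source changes after both tables were defined
    native_count = len(native.query())
    external_count = len(external.query())

print(native_count, external_count)  # 1 2
```

The same design explains the performance trade-off of external tables: every `ExternalTable.query()` pays the cost of reading the source again.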

Q. What are the advantages and disadvantages of external and native tables?

Advantages of External Tables:

1. Cost Efficiency:

Advantage: No need to duplicate data, saving on storage costs as the data remains in the external source.

2. Query Flexibility:

Advantage: Ability to query and analyze data in external storage without importing it, offering flexibility in data access.

3. Real-time Data Access:

Advantage: Changes in the external data source are reflected immediately in queries, providing real-time access to the latest information.

4. Data Lakes Integration:

Advantage: Seamless integration with data lakes and other external storage systems, supporting a unified data architecture.

Disadvantages of External Tables:

1. Query Performance:

Disadvantage: Queries may be slower compared to native tables, especially for large datasets, as data needs to be fetched from external storage.

2. Limited Indexing:

Disadvantage: Fewer indexing and optimization options than native tables, potentially impacting query performance.

3. Complexity in Access Control:

Disadvantage: Managing access controls can be more complex when dealing with external data sources, since permissions may need to be granted on both the BigQuery table and the underlying storage.

4. Data Consistency:

Disadvantage: BigQuery does not guarantee consistency for external data, so results can be unexpected if the source changes while a query is running.

Choosing between external and native tables depends on factors like cost considerations, data access patterns, and performance requirements in your specific use case.

Q. Suppose you want to copy table A to table B. What do you need to take care of?

When copying Table A to Table B in BigQuery, consider these key steps:

1. Schema Matching:

Ensure that the schemas of Table A and Table B match to avoid data type conflicts or missing fields.

2. Permissions:

Verify that you have the necessary permissions to read from Table A and write to Table B.

3. Data Volume:

Be mindful of the data volume, especially if it's a large dataset, to manage costs and query performance.

4. Destination Table:

Check if Table B already exists; if so, decide whether to overwrite it or append the data.

5. Transformation Needs:

A plain table copy moves rows as they are. If you need transformations such as data type conversions or row filtering, copy with a query (for example INSERT INTO ... SELECT) instead.

6. Data Validation:

After copying, perform data validation (for example, compare row counts) to ensure the integrity of the data in Table B.

7. Scheduled Jobs:

For recurring tasks, consider scheduling the copy operation to keep Table B up to date.

8. Error Handling:

Implement error handling mechanisms to address any issues that may arise during the copying process.

By addressing these considerations, you can ensure a smooth and
accurate transfer of data from Table A to Table B in BigQuery.
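Several of these checks (schema matching, append vs. overwrite, error handling, and validation) can be sketched in a few lines. The example uses Python's built-in `sqlite3` module as a local stand-in for BigQuery, so the SQL and the hypothetical `copy_table` helper illustrate the checklist rather than BigQuery's own copy API. Table names are interpolated into the SQL directly and are assumed to be trusted identifiers.

```python
import sqlite3


def column_signature(conn, table):
    # PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column.
    return [(row[1], row[2].upper()) for row in conn.execute(f"PRAGMA table_info({table})")]


def count_rows(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def copy_table(conn, src, dst, mode="append"):
    """Copy all rows of src into an existing dst; mode is 'append' or 'overwrite'."""
    if mode not in ("append", "overwrite"):
        raise ValueError(f"unknown mode: {mode!r}")
    # 1. Schema matching: same column names and declared types, in the same order.
    if column_signature(conn, src) != column_signature(conn, dst):
        raise ValueError(f"schema mismatch between {src} and {dst}")
    expected = count_rows(conn, src) + (0 if mode == "overwrite" else count_rows(conn, dst))
    # 4 + 8. Destination handling and error handling: both statements run in one
    # transaction, which `with conn` commits on success and rolls back on error.
    try:
        with conn:
            if mode == "overwrite":
                conn.execute(f"DELETE FROM {dst}")
            conn.execute(f"INSERT INTO {dst} SELECT * FROM {src}")
    except sqlite3.Error as exc:
        raise RuntimeError(f"copy {src} -> {dst} failed; changes rolled back") from exc
    # 6. Data validation: the destination row count must match what we expect.
    actual = count_rows(conn, dst)
    if actual != expected:
        raise RuntimeError(f"validation failed: expected {expected} rows, found {actual}")
    return actual


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE table_b (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO table_a VALUES (?, ?)", [(1, "x"), (2, "y")])
conn.commit()

print(copy_table(conn, "table_a", "table_b"))                    # 2
print(copy_table(conn, "table_a", "table_b"))                    # 4: appended again
print(copy_table(conn, "table_a", "table_b", mode="overwrite"))  # 2: replaced
```

In BigQuery itself, a copy job or an INSERT INTO ... SELECT query would take the place of the SQL here, while the validation step could compare row counts in the same way.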

Q. Will table constraints, modes, etc. also get copied along with the data?

It depends on how the copy is made. A BigQuery copy job (the console's Copy action or bq cp) copies the table's schema, including column modes, together with the data. A query-based copy such as CREATE TABLE B AS SELECT * FROM A builds Table B from query results instead, so settings like REQUIRED modes, partitioning, and clustering are not guaranteed to carry over unless you declare them in the statement. Either way, check the destination table (Table B) and set up any constraints or configuration it still needs.
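As a local illustration with SQLite (not BigQuery), a query-based copy of the form `CREATE TABLE ... AS SELECT` keeps the rows but not the source table's constraints: SQLite documents that a table created this way has no PRIMARY KEY and no constraints of any kind. The table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO table_a (id, name) VALUES (?, ?)", [(1, "x"), (2, "y")])
# Query-based copy: only the result rows (and column names) are carried over.
conn.execute("CREATE TABLE table_b AS SELECT * FROM table_a")


def constraints(table):
    # PRAGMA table_info rows: (cid, name, type, notnull, default, pk).
    return {row[1]: {"not_null": bool(row[3]), "primary_key": bool(row[5])}
            for row in conn.execute(f"PRAGMA table_info({table})")}


print(constraints("table_a"))  # id is the primary key, name is NOT NULL
print(constraints("table_b"))  # both flags are False for every column
rows_b = conn.execute("SELECT * FROM table_b ORDER BY id").fetchall()
print(rows_b)  # [(1, 'x'), (2, 'y')]: the data itself was copied
```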

Q. What are joins? Explain their types.

Joins in databases combine rows from different tables based on common columns. Imagine tables as spreadsheets, and joins as a way to merge them:

1. Inner Join:

Think of it as an intersection. Only the rows where values in the joined columns match are included in the result.

2. Left Join:

Picture it like a primary list with additional info. All rows from the left (first) table are included, along with matching rows from the right (second) table; where there is no match, the right table's columns are NULL.

3. Right Join:

The mirror image of a left join: all rows from the right (second) table are included, along with matching rows from the left table, and the left table's columns are NULL where there is no match.

4. Full Outer Join:

It's like combining both left and right joins. All rows from both tables are included; matching rows are merged, and unmatched rows get NULLs for the other table's columns.

5. Cross Join:

Imagine it as a Cartesian product. It combines every row from the first table with every row from the second, creating all possible combinations.

Joins help in getting a more complete picture by bringing together
related data from different tables in a database.
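The five join types can be compared on a tiny example. This sketch uses Python's built-in `sqlite3` module with hypothetical `customers` and `orders` tables. Older SQLite versions lack RIGHT and FULL OUTER JOIN, so the sketch writes the right join as a left join with the tables swapped and emulates the full outer join with UNION ALL; in BigQuery you would write RIGHT JOIN and FULL OUTER JOIN directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    -- order 12 points at customer 4, who does not exist
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 4);
""")

left_join_sql = ("SELECT c.name, o.order_id FROM customers c "
                 "LEFT JOIN orders o ON c.id = o.customer_id")
queries = {
    # Only customers that have orders (Ana twice).
    "inner": "SELECT c.name, o.order_id FROM customers c JOIN orders o ON c.id = o.customer_id",
    # Every customer; Ben and Cy get NULL for order_id.
    "left": left_join_sql,
    # Every order; order 12 gets NULL for name (LEFT JOIN with the tables swapped).
    "right": "SELECT c.name, o.order_id FROM orders o LEFT JOIN customers c ON c.id = o.customer_id",
    # Every customer and every order: left join plus orders with no matching customer.
    "full": left_join_sql + " UNION ALL SELECT NULL, o.order_id FROM orders o "
                            "WHERE o.customer_id NOT IN (SELECT id FROM customers)",
    # Every customer paired with every order: 3 x 3 rows.
    "cross": "SELECT c.name, o.order_id FROM customers c CROSS JOIN orders o",
}

counts = {kind: len(conn.execute(sql).fetchall()) for kind, sql in queries.items()}
print(counts)  # {'inner': 2, 'left': 4, 'right': 3, 'full': 5, 'cross': 9}
```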

Q. What is indexing in SQL?

Indexing in SQL is like creating a well-organized reference for a book. Instead of reading the entire book to find a specific piece of information, you can use the index to quickly locate the page. Similarly, in a database, an index is a separate data structure built on one or more columns that points to the rows holding each value, so the database can find matching rows without scanning the whole table. It's like having a roadmap to quickly find the data you're looking for, making
queries faster and more efficient.
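You can watch an index change a query's plan with Python's built-in `sqlite3` module, used here as a stand-in database. The hypothetical `people` table is searched by email before and after an index on that column is created. The exact wording of SQLite's plan varies between versions (for example "SCAN people" vs "SCAN TABLE people"), so the sketch prints the plan rather than relying on one fixed string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO people (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

QUERY = "SELECT name FROM people WHERE email = ?"


def plan(sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))


before = plan(QUERY, ("user500@example.com",))  # a full table scan
conn.execute("CREATE INDEX idx_people_email ON people (email)")
after = plan(QUERY, ("user500@example.com",))   # a lookup through idx_people_email
print(before)
print(after)
print(conn.execute(QUERY, ("user500@example.com",)).fetchone())  # ('User 500',)
```

The speed-up is not free: each index takes extra storage and must be updated on every insert, update, or delete of the indexed column.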

Q. Clustered vs non-clustered index?

Clustered Index:

Think of it like organizing a book where the pages are physically rearranged based on the index. The data in the table is stored in the order of the index key. It's like having the book sorted by topics for faster reading.

Non-clustered Index:

Picture it as a separate index at the back of the book. The data in the table is stored in its own order, and the index provides a reference to where specific information can be found. It's like
having an alphabetical index that guides you to relevant pages
without changing the physical order of the book.
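A toy model in plain Python can make the difference concrete. It is a simplification, not how a database lays out pages: the clustered table is a list kept sorted by the key, so the rows themselves act as the index, while the non-clustered version leaves the rows in arrival order and keeps a separate sorted list of `(key, row position)` pairs. All names are hypothetical.

```python
import bisect

arrivals = [("carol", 31), ("alice", 25), ("bob", 40)]  # (name, age) in insert order

# Clustered: the table itself is stored sorted by the key (name).
clustered = sorted(arrivals)
clustered_keys = [name for name, _ in clustered]

# Non-clustered: rows stay in arrival order; a separate index points to them.
heap = list(arrivals)
index = sorted((name, pos) for pos, (name, _) in enumerate(heap))
index_keys = [name for name, _ in index]


def clustered_lookup(name):
    i = bisect.bisect_left(clustered_keys, name)
    # The matching row is found directly in the sorted table.
    return clustered[i] if i < len(clustered_keys) and clustered_keys[i] == name else None


def nonclustered_lookup(name):
    i = bisect.bisect_left(index_keys, name)
    if i < len(index_keys) and index_keys[i] == name:
        return heap[index[i][1]]  # extra step: follow the pointer to the row
    return None


print(clustered)                   # rows physically in key order
print(heap)                        # rows still in arrival order
print(nonclustered_lookup("bob"))  # ('bob', 40)
```

Because rows can be physically sorted in only one order, a table can have at most one clustered index, but it can have several non-clustered ones, each a separate sorted structure like `index`.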

Q. Bitmap vs B-tree index?

Bitmap Index:

Efficient for low-cardinality data (columns with few distinct values, such as a status flag).
Imagine it like a color-coded map showing where specific information is located: the index keeps one bit-vector per distinct value, with one bit per row, like marking pages in a book with a different color for each topic. Conditions such as status = 'open' OR status = 'pending' become fast bitwise operations on these vectors, but every insert or update has to touch the bitmaps, so bitmap indexes suit read-heavy workloads such as data warehouses.

B-tree Index:

Versatile: works well for high-cardinality data and supports both equality and range lookups.
Picture it as an organized, tree-like structure, much like the index of a book. Keys are kept sorted in a balanced hierarchy, so each lookup narrows the search level by level, similar to quickly narrowing down sections in a book to locate details.
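A bitmap index can be sketched in plain Python using integers as bit-vectors: bit i of the vector for a value is set when row i holds that value, so an OR condition becomes a single bitwise OR. For the B-tree side, a sorted list searched with `bisect` stands in for the tree's sorted leaf level to show a range lookup; it is not a real B-tree. The data and helper names are hypothetical.

```python
import bisect
from collections import defaultdict

statuses = ["open", "closed", "open", "pending", "closed", "open"]  # low cardinality

# Bitmap index: one integer bit-vector per distinct value; bit i represents row i.
bitmaps = defaultdict(int)
for row_id, status in enumerate(statuses):
    bitmaps[status] |= 1 << row_id


def rows_from_bits(bits):
    return [i for i in range(bits.bit_length()) if bits >> i & 1]


# WHERE status = 'open' OR status = 'pending' -> one bitwise OR.
open_or_pending = rows_from_bits(bitmaps["open"] | bitmaps["pending"])
print(open_or_pending)  # [0, 2, 3, 5]

# B-tree stand-in: sorted (key, row_id) pairs support equality and range lookups.
amounts = [120, 5, 87, 43, 250, 99]  # high cardinality: nearly every value distinct
sorted_keys = sorted((amount, row_id) for row_id, amount in enumerate(amounts))
lo = bisect.bisect_left(sorted_keys, (50,))
hi = bisect.bisect_left(sorted_keys, (100,))
in_range = sorted(row_id for _, row_id in sorted_keys[lo:hi])  # 50 <= amount < 100
print(in_range)  # [2, 5]
```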
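
Q. Types of partitions in BigQuery:

Date Partitioning: Divides data based on the values of a DATE column.

Integer Range Partitioning: Splits data into ranges of an INTEGER column's values, defined by a start, an end, and an interval.

Timestamp Partitioning: Similar to date partitioning but based on a TIMESTAMP or DATETIME column.

Time Unit Partitioning: The umbrella term for partitioning on a DATE, TIMESTAMP, or DATETIME column, where you choose the granularity of each partition: hourly (TIMESTAMP and DATETIME only), daily, monthly, or yearly. There is no minute granularity.

Ingestion-Time Partitioning: Assigns rows to partitions based on when BigQuery ingested them, exposed through the _PARTITIONTIME pseudo-column.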
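The idea behind partitioning, and why it cuts the data a query has to scan, can be sketched in plain Python. The helpers below are hypothetical and only mimic BigQuery's behavior under stated assumptions: time-unit partition IDs in the YYYYMMDDHH / YYYYMMDD / YYYYMM / YYYY formats, and integer-range partitions defined by a start, end, and interval, with out-of-range values in `__UNPARTITIONED__` and NULLs in `__NULL__`. Range partitions are keyed here by each range's start value.

```python
from collections import defaultdict
from datetime import datetime

# Partition ID formats per granularity (assumed to mirror BigQuery's naming).
ID_FORMATS = {"HOUR": "%Y%m%d%H", "DAY": "%Y%m%d", "MONTH": "%Y%m", "YEAR": "%Y"}


def partition_by_time(rows, column, granularity="DAY"):
    parts = defaultdict(list)
    for row in rows:
        value = row[column]
        key = "__NULL__" if value is None else value.strftime(ID_FORMATS[granularity])
        parts[key].append(row)
    return dict(parts)


def partition_by_range(rows, column, start, end, interval):
    parts = defaultdict(list)
    for row in rows:
        value = row[column]
        if value is None:
            key = "__NULL__"
        elif start <= value < end:
            key = str(start + (value - start) // interval * interval)  # range start
        else:
            key = "__UNPARTITIONED__"
        parts[key].append(row)
    return dict(parts)


rows = [
    {"id": 1, "customer_id": 7, "ts": datetime(2024, 1, 1, 9)},
    {"id": 2, "customer_id": 42, "ts": datetime(2024, 1, 1, 17)},
    {"id": 3, "customer_id": 250, "ts": datetime(2024, 1, 2, 8)},
    {"id": 4, "customer_id": None, "ts": None},
]

daily = partition_by_time(rows, "ts", "DAY")
print(sorted(daily))  # ['20240101', '20240102', '__NULL__']
hourly = partition_by_time(rows, "ts", "HOUR")
by_customer = partition_by_range(rows, "customer_id", start=0, end=100, interval=10)
print(sorted(by_customer))  # ['0', '40', '__NULL__', '__UNPARTITIONED__']

# Partition pruning: a filter on one day reads only that day's partition.
jan_2 = daily.get("20240102", [])
print([r["id"] for r in jan_2])  # [3]
```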
