BigQuery Interview Q&A

Q. What are the ways to insert data into an existing table in BigQuery (BQ) from a GCS bucket?
In BigQuery, you can insert data from a Google Cloud Storage (GCS)
bucket into an existing table using a few methods:
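1. Web UI (Google Cloud console):
Open BigQuery in the Google Cloud console.
Navigate to your dataset and click "Create table".
Select "Google Cloud Storage" as the source, enter the existing table as the destination, and under the advanced options set the write preference to "Append to table" or "Overwrite table".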
2. Command Line:
Use the bq command-line tool.
Run the bq load command with your dataset and table information,
specifying the GCS path.
3. APIs or Client Libraries:
Use the BigQuery API or a client library (for example, Python or Java) to programmatically start a load job that reads the files from GCS into BigQuery.
4. Data Transfer Service:
Set up a BigQuery Data Transfer Service transfer from Cloud Storage to load data into the table automatically at specified intervals.
Choose the method that best fits your workflow and preferences.
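For the command-line route, the Python sketch below only assembles the argument list for a bq load into an existing table; it does not contact BigQuery. The dataset, table, and bucket names are hypothetical, and the defaults assume CSV files with one header row. The --noreplace flag appends to the table, while --replace overwrites its data.

```python
import shlex


def build_bq_load_command(table, gcs_uri, source_format="CSV",
                          skip_leading_rows=1, overwrite=False):
    """Return the argument list for a `bq load` into an existing table.

    table:   destination as DATASET.TABLE (names in the example are hypothetical)
    gcs_uri: source file(s) in Cloud Storage; wildcards are allowed
    """
    args = [
        "bq", "load",
        f"--source_format={source_format}",
        # --replace overwrites the table's data; --noreplace appends to it
        "--replace" if overwrite else "--noreplace",
    ]
    if source_format == "CSV":
        # Skip the header row; this flag applies to CSV input
        args.append(f"--skip_leading_rows={skip_leading_rows}")
    # Positional arguments: destination table first, then the source URI
    args += [table, gcs_uri]
    return args


if __name__ == "__main__":
    cmd = build_bq_load_command("my_dataset.sales", "gs://my-bucket/sales/*.csv")
    print(shlex.join(cmd))
    # To actually run it against a real project: subprocess.run(cmd, check=True)
```

Keeping the command as a list (rather than one shell string) avoids quoting problems with the wildcard in the URI when the list is passed to subprocess.run.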
Q. What are external tables?

External tables in BigQuery are like windows into data stored outside of
BigQuery. Instead of importing the data into BigQuery, you can create a
virtual table that references data in external sources like Google Cloud
Storage or Google Drive. It's like having a view of the data without
physically moving it, making it convenient for analyzing large datasets
without the need for full data duplication. Just imagine it as a direct link
to data living outside BigQuery, allowing you to query and analyze it
seamlessly.
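As a rough illustration, an external table over Parquet files in Cloud Storage can be declared with a CREATE EXTERNAL TABLE statement. The helper below is a hypothetical sketch that only builds the DDL text; the table name and URI are made up, and it does no escaping, so it is not meant for untrusted input. Parquet is used because the files carry their own schema, so no column list is given.

```python
def external_table_ddl(table, uris, file_format="PARQUET"):
    """Build a CREATE EXTERNAL TABLE statement over files in Cloud Storage.

    No escaping is done on the inputs; this is for illustration only.
    """
    uri_list = ", ".join(f"'{u}'" for u in uris)
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"OPTIONS (\n"
        f"  format = '{file_format}',\n"
        f"  uris = [{uri_list}]\n"
        f")"
    )


if __name__ == "__main__":
    print(external_table_ddl("my_dataset.ext_sales",
                             ["gs://my-bucket/sales/*.parquet"]))
```

Running the generated statement creates only table metadata; the rows stay in the bucket and are read each time the table is queried.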
Q. What are the advantages and disadvantages of external and native tables?
Advantages of External Tables:

1. Cost Efficiency:
Advantage: No need to duplicate data, saving on storage costs as the data remains in the external source.
2. Query Flexibility:
Advantage: Ability to query and analyze data in external storage without importing it, offering flexibility in data access.
3. Real-time Data Access:
Advantage: Changes in the external data source are reflected immediately in queries, providing real-time access to the latest information.
4. Data Lakes Integration:
Advantage: Seamless integration with data lakes and other external storage systems, supporting a unified data architecture.
Disadvantages of External Tables:
1. Query Performance:
Disadvantage: Queries may be slower compared to native tables, especially for large datasets, as data needs to be read from external storage at query time.
2. Limited Optimizations:
Disadvantage: External tables miss many of the optimizations of native BigQuery storage (for example, they cannot be clustered), potentially impacting query performance.
3. Complexity in Access Control:
Disadvantage: Managing access can be more complex, because users typically need permissions both on the BigQuery table and on the external source (such as the Cloud Storage bucket).
4. Data Consistency:
Disadvantage: BigQuery does not control the external data, so results may be inconsistent if the source changes while a query is running.
Choosing between external and native tables depends on factors like
cost considerations, data access patterns, and performance
requirements in your specific use case.
Q. Suppose you want to copy table A to table B. What things need to be taken care of?
When copying Table A to Table B in BigQuery, consider these key points:
1. Schema Matching:
Ensure that the schemas of Table A and Table B match to avoid data
type conflicts or missing fields.
2. Permissions:
Verify that you have the necessary permissions to read from Table A and
write to Table B.
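3. Data Volume:
Be mindful of the data volume, especially for large tables. A copy job within one region is not charged, but Table B adds to your storage costs, and copying through a query is billed for the bytes it scans.
4. Destination Table:
Check if Table B already exists; if so, decide whether to overwrite it or append the data.
5. Transformation Needs:
A plain copy job duplicates the data as-is. If you need transformations such as data type conversions or row filtering, copy with a query instead (for example, INSERT INTO ... SELECT or CREATE TABLE ... AS SELECT).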
6. Data Validation:
After copying, perform data validation to ensure the integrity of the data
in Table B.
7. Scheduled Jobs:
For recurring tasks, consider scheduling the copy operation to keep Table B up-to-date.
8. Error Handling:
Implement error handling mechanisms to address any issues that may arise during the copying process.
By addressing these considerations, you can ensure a smooth and accurate transfer of data from Table A to Table B in BigQuery.
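Points 4 and 7 map onto the flags of the bq cp command. The Python sketch below only builds the argument list for a copy job (nothing is executed); the dataset and table names are hypothetical. The -a flag appends to an existing destination, -f overwrites it without prompting, and -n leaves an existing destination untouched.

```python
def build_bq_cp_command(source, destination, mode="fail"):
    """Return the argument list for a `bq cp` copy job.

    mode: "append"    -> -a, add the rows to an existing destination
          "overwrite" -> -f, replace the destination without prompting
          "fail"      -> -n, do nothing if the destination already exists
    """
    flags = {"append": "-a", "overwrite": "-f", "fail": "-n"}
    if mode not in flags:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["bq", "cp", flags[mode], source, destination]


if __name__ == "__main__":
    print(" ".join(build_bq_cp_command("my_dataset.table_a",
                                       "my_dataset.table_b", "append")))
```

Defaulting to "fail" is a deliberate choice in this sketch: an accidental overwrite of Table B is harder to undo than a copy that refuses to run.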
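Q. Will table constraints, modes, etc. also get copied along with the data?
It depends on how the copy is made. A BigQuery copy job (for example, bq cp) duplicates the table's schema along with the data, so column names, types, and modes (NULLABLE, REQUIRED, REPEATED) carry over. If you instead build Table B from a query (for example, CREATE TABLE ... AS SELECT), the schema comes from the query result, and settings such as partitioning, clustering, and other table options are not inherited from Table A; specify them again on Table B if they are required. In either case, compare the source and destination schemas afterwards rather than assuming every constraint or setting came across.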
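A simple way to do that comparison is to diff the two schemas. The sketch below assumes each schema has already been turned into a list of (name, type, mode) tuples, for example from the JSON printed by bq show --schema --format=prettyjson; that conversion step and the example column names are assumptions, not shown here.

```python
from typing import Dict, List, Tuple

# A column is described as (name, type, mode); mode is NULLABLE, REQUIRED or REPEATED.
Column = Tuple[str, str, str]


def schema_differences(source: List[Column], dest: List[Column]) -> List[str]:
    """List human-readable differences between two table schemas."""
    src: Dict[str, Column] = {c[0]: c for c in source}
    dst: Dict[str, Column] = {c[0]: c for c in dest}
    problems = []
    for name, (_, typ, mode) in src.items():
        if name not in dst:
            problems.append(f"{name}: missing in destination")
            continue
        _, dtyp, dmode = dst[name]
        if typ != dtyp:
            problems.append(f"{name}: type {typ} -> {dtyp}")
        if mode != dmode:
            problems.append(f"{name}: mode {mode} -> {dmode}")
    # Columns present only in the destination
    for name in dst:
        if name not in src:
            problems.append(f"{name}: extra column in destination")
    return problems
```

An empty list means the column names, types, and modes match; table-level settings such as partitioning would need a separate check.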
Q. What are joins? Explain their types.
Joins in databases combine information from different tables based on common columns. Imagine tables as spreadsheets, and joins as a way to merge them:
1. Inner Join:
Think of it as an intersection. Only the rows where values in the joined columns match are included in the result.
2. Left Join:
Picture it like a primary list with additional info. All rows from the left (first) table are included, and matching rows from the right (second) table are added. Where a left row has no match, the right table's columns are NULL.
3. Right Join:
The mirror image of a left join, with the focus on the right (second) table. All rows from the right table are included, and matching rows from the left table are added. Where a right row has no match, the left table's columns are NULL.
4. Full Outer Join:
It's like combining both left and right joins. All rows from both tables are included; where there's a match, the information is merged, and where there isn't, the missing side is filled with NULLs.
5. Cross Join:
Imagine it as a Cartesian product. It combines every row from the first table with every row from the second, creating all possible combinations (a table of m rows crossed with a table of n rows gives m × n rows).
Joins help in getting a more complete picture by bringing together
related data from different tables in a database.
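To make the five join types concrete, here is a small Python sketch that imitates their semantics on lists of dictionaries. It is a naive nested-loop join written for clarity, not how a database engine or BigQuery executes joins, and None stands in for SQL NULL.

```python
from itertools import product


def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`, imitating SQL join semantics.

    how: "inner", "left", "right", "full", or "cross".
    Columns missing on the unmatched side are filled with None (SQL NULL).
    Non-key columns with the same name are merged and the right value wins
    (SQL would keep both), so use distinct column names.
    """
    if how == "cross":
        # Cartesian product: every left row paired with every right row
        return [{**l_row, **r_row} for l_row, r_row in product(left, right)]
    if how not in ("inner", "left", "right", "full"):
        raise ValueError(f"unknown join type: {how!r}")

    left_cols = {c for row in left for c in row}
    right_cols = {c for row in right for c in row}
    result = []
    matched_right = set()  # positions of right rows that found a partner
    for l_row in left:
        found = False
        for i, r_row in enumerate(right):
            # NULL never equals anything in SQL, so None keys never match
            if l_row[key] is not None and l_row[key] == r_row[key]:
                result.append({**l_row, **r_row})
                matched_right.add(i)
                found = True
        if not found and how in ("left", "full"):
            # Keep the unmatched left row, padding right-side columns with None
            result.append({**dict.fromkeys(right_cols), **l_row})
    if how in ("right", "full"):
        for i, r_row in enumerate(right):
            if i not in matched_right:
                # Keep the unmatched right row, padding left-side columns with None
                result.append({**dict.fromkeys(left_cols), **r_row})
    return result
```

For example, with customers [{"cid": 1, "name": "Ana"}, {"cid": 2, "name": "Ben"}] and orders [{"cid": 1, "total": 30}, {"cid": 3, "total": 12}], the inner join returns only Ana's row, the left join adds Ben with total None, and the full join returns three rows.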
Q. What is indexing in SQL?
Indexing in SQL is like creating a well-organized reference for a book. Instead of reading the entire book to find specific information, you can use the index to quickly locate the page. Similarly, in a database, an index is a separate, structured lookup that lets the engine jump to the rows matching a value instead of scanning the whole table. This makes lookups and filtered queries faster and more efficient, at the cost of extra storage and some overhead whenever rows are inserted or updated.
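The contrast between scanning every row and using an index can be shown in a few lines of Python. This sketch uses a dictionary (a hash index) for brevity; many databases use B-tree indexes instead, and the row data in the usage note is made up.

```python
from collections import defaultdict


def build_index(rows, column):
    """Map each value of `column` to the positions of the rows holding it."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[column]].append(pos)
    return index


def full_scan(rows, column, value):
    """Without an index: inspect every row, O(n) work per lookup."""
    return [row for row in rows if row[column] == value]


def index_lookup(rows, index, value):
    """With an index: jump straight to the matching rows, O(1) on average."""
    return [rows[pos] for pos in index.get(value, [])]
```

Given rows for cities "Pune", "Oslo", "Pune", index_lookup(rows, build_index(rows, "city"), "Pune") returns the same two rows as full_scan, but without touching the "Oslo" row. The index must be rebuilt or updated when rows change, which is the write overhead mentioned above.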
Q. Clustered vs. non-clustered index?
Clustered Index:
Think of it like organizing a book where the pages are physically rearranged based on the index. The rows in the table are stored in the order of the index key, like having the book sorted by topics for faster reading. Because rows can be physically ordered only one way, a table can have at most one clustered index.
Non-clustered (Unclustered) Index:
Picture it as a separate index at the back of the book. The rows in the table keep their own order, and the index stores key values together with pointers to where each row can be found. It's like having an alphabetical index that guides you to relevant pages without changing the physical order of the book. A table can have several non-clustered indexes.
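The two layouts can be sketched in Python with the standard bisect module: in the clustered version the rows themselves are sorted by the key, while the non-clustered version leaves the rows alone and keeps a separate sorted list of (key, row position) pointers. The class names are hypothetical, and a sorted Python list stands in for the tree structure a real database would use.

```python
import bisect


class ClusteredTable:
    """The rows themselves are stored sorted by the key (the book is reordered)."""

    def __init__(self, rows, key):
        self.rows = sorted(rows, key=lambda r: r[key])
        self.keys = [r[key] for r in self.rows]  # parallel list for bisect

    def find(self, value):
        lo = bisect.bisect_left(self.keys, value)
        hi = bisect.bisect_right(self.keys, value)
        return self.rows[lo:hi]  # matching rows are physically adjacent


class NonClusteredIndex:
    """Rows stay in their original order; a separate sorted list points at them."""

    def __init__(self, rows, key):
        self.rows = rows  # original order untouched
        self.entries = sorted((r[key], pos) for pos, r in enumerate(rows))
        self.keys = [k for k, _ in self.entries]

    def find(self, value):
        lo = bisect.bisect_left(self.keys, value)
        hi = bisect.bisect_right(self.keys, value)
        # Extra hop: follow each pointer back to the row it refers to
        return [self.rows[pos] for _, pos in self.entries[lo:hi]]
```

The extra hop in NonClusteredIndex.find is the cost of not reordering the data, and it is also why several such indexes can coexist on one table.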
Q. Bitmap vs. B-tree index?
Bitmap Index:
Imagine it like a color-coded map showing where specific information is located, like marking pages in a book with different colors based on topics. The index keeps one bitmap per distinct value, with one bit per row that is set where the row holds that value. This makes it efficient for low-cardinality columns (few distinct values, such as a status or region column), and conditions on several columns can be combined with fast bitwise AND/OR operations. Bitmaps are costly to maintain when rows change often, so they suit read-heavy, analytical workloads.
B-tree Index:
Picture it as an organized, tree-like structure, much like the index of a book. Keys are kept sorted in a balanced tree, so a lookup narrows down from the root to the right leaf in a few steps, similar to quickly narrowing down sections in a book to locate details. It is versatile across data types, handles high-cardinality columns well, and supports both equality and range searches.
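The bitmap idea is easy to sketch in Python because integers can serve as bitsets: bit i of a value's bitmap is 1 when row i holds that value, and combining predicates is a single bitwise AND. This is an illustrative sketch with made-up column values, not how any particular database stores its bitmaps (real systems also compress them). A B-tree is not shown here.

```python
def build_bitmap_index(values):
    """One bitmap per distinct value; bit i is set if row i has that value."""
    bitmaps = {}
    for row_id, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)
    return bitmaps


def rows_from_bitmap(bitmap):
    """Decode a bitmap back into the list of row ids whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


if __name__ == "__main__":
    status = build_bitmap_index(["open", "closed", "open", "open"])
    region = build_bitmap_index(["EU", "EU", "US", "EU"])
    # WHERE status = 'open' AND region = 'EU' becomes one bitwise AND
    print(rows_from_bitmap(status["open"] & region["EU"]))  # [0, 3]
```

With only a handful of distinct values per column, the index stays small, which is why bitmaps favor low-cardinality data.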
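Q. What are the types of partitions in BigQuery?
Date Partitioning: Divides data based on the values of a DATE column.
Integer Range Partitioning: Splits data into ranges of an INTEGER column, defined by a start, an end, and an interval.
Timestamp Partitioning: Similar to date partitioning but based on a TIMESTAMP or DATETIME column.
Time Unit Partitioning: The general form of the two above; partitions can be hourly, daily, monthly, or yearly. Minute-level partitions are not supported, and hourly partitions require a TIMESTAMP or DATETIME column.
Ingestion-time Partitioning: Assigns rows to partitions based on when BigQuery ingests them rather than on a column in the data.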
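The Python sketch below illustrates how a row's value maps to a partition under column-based partitioning, following BigQuery's partition ID conventions as I understand them: YYYYMMDD for daily, YYYYMMDDHH for hourly, YYYYMM for monthly, YYYY for yearly, the bucket's start value for integer ranges (end exclusive), __NULL__ for NULL values, and __UNPARTITIONED__ for integers outside the range. The real assignment happens inside BigQuery; this only models it, and it ignores ingestion-time partitioning and out-of-range dates.

```python
from datetime import date, datetime


def time_partition_id(value, granularity="DAY"):
    """Partition ID for a time-unit partitioned table (HOUR, DAY, MONTH, YEAR)."""
    if value is None:
        return "__NULL__"
    formats = {"HOUR": "%Y%m%d%H", "DAY": "%Y%m%d", "MONTH": "%Y%m", "YEAR": "%Y"}
    if granularity not in formats:
        raise ValueError(f"unsupported granularity: {granularity!r}")
    if granularity == "HOUR" and not isinstance(value, datetime):
        # A DATE column cannot be partitioned by hour
        raise ValueError("HOUR granularity needs a timestamp/datetime value")
    return value.strftime(formats[granularity])


def integer_range_partition_id(value, start, end, interval):
    """Partition ID for integer-range partitioning: the bucket's start value."""
    if value is None:
        return "__NULL__"
    if value < start or value >= end:  # end is exclusive
        return "__UNPARTITIONED__"
    return str(start + (value - start) // interval * interval)
```

In DDL, these layouts correspond to clauses such as PARTITION BY DATE(order_ts), PARTITION BY TIMESTAMP_TRUNC(order_ts, HOUR), or PARTITION BY RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 100, 10)), where the column names are hypothetical.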
