
First Round of Data Engineering Mock Interview on YouTube channel - The Big Data Show

These are some of the questions asked in the first round of a Data Engineering interview.

SQL Questions:

1.
+--------------------+------+
| Column Name | Type |
+--------------------+------+
| seat_id | int |
| free | bool |
+--------------------+------+
seat_id is an auto-increment primary key column for this table.
Each row of this table indicates whether the ith seat is free or not. 1 means free while 0 means
occupied.

Write an SQL query to report all the consecutive available seats in the cinema. Return the result
table ordered by seat_id in ascending order.
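
A possible sketch (the table name is not given above; Cinema is assumed here, along with a
MySQL-style dialect):

SELECT DISTINCT c1.seat_id
FROM Cinema c1
JOIN Cinema c2
  ON ABS(c1.seat_id - c2.seat_id) = 1  -- adjacent seats
WHERE c1.free = 1
  AND c2.free = 1                      -- both seats are free
ORDER BY c1.seat_id;

The self-join pairs each seat with its immediate neighbours, and DISTINCT avoids reporting a
seat twice when both of its neighbours are free.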

2.0

+--------------+---------+
| Column Name | Type |
+--------------+---------+
| player_id | int |
| device_id | int |
| event_date | date |
| games_played | int |
+--------------+---------+
(player_id, event_date) is the primary key of this table.
This table shows the activity of players of some games.
Each row is a record of a player who logged in and played a number of games (possibly 0)
before logging out on some day using some device.

Write an SQL query to report the first login date for each player. Return the result table in any
order.
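
One possible sketch, assuming the table is named Activity (no name is given above):

SELECT player_id,
       MIN(event_date) AS first_login  -- earliest login per player
FROM Activity
GROUP BY player_id;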

2.1

Write an SQL query to report the fraction of players that logged in again on the day after the day
they first logged in, rounded to 2 decimal places. In other words,
you need to count the number of players that logged in for at least two consecutive days starting
from their first login date,
then divide that number by the total number of players.
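
One possible sketch, again assuming an Activity table and MySQL-style date arithmetic:

SELECT ROUND(
         COUNT(DISTINCT a.player_id)
         / (SELECT COUNT(DISTINCT player_id) FROM Activity),
         2) AS fraction
FROM Activity a
JOIN (SELECT player_id, MIN(event_date) AS first_login
      FROM Activity
      GROUP BY player_id) f
  ON a.player_id = f.player_id
 AND a.event_date = DATE_ADD(f.first_login, INTERVAL 1 DAY);

The subquery finds each player's first login; the join keeps only players who also logged in
exactly one day later, and that count is divided by the total number of distinct players.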

------------------
DSA Question:

1.

Given an integer array nums and an integer val, remove all occurrences of val in nums in-place.
The order of the elements may be changed.
Then return the number of elements in nums which are not equal to val.

Input: nums = [3,2,2,3], val = 3
Output: 2, nums = [2,2,_,_]
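
A possible Python sketch of the usual in-place two-pointer approach:

def remove_element(nums, val):
    k = 0                  # next write position for an element to keep
    for x in nums:
        if x != val:
            nums[k] = x
            k += 1
    return k               # nums[:k] holds the elements not equal to val

nums = [3, 2, 2, 3]
print(remove_element(nums, 3), nums[:2])  # 2 [2, 2]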

-------------------

1. What types of file formats have you encountered while extracting data from multiple
sources? If you have worked with Parquet/ORC/CSV, can you explain the reasoning behind
choosing one file format over another?

2. What do you mean by predicate pushdown? What role does predicate pushdown play
when reading data from a columnar file format?

3. Suppose you are reading a Parquet file and applying a filter on a column. What would be
the benefit of predicate pushdown in that case?
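
For questions 2 and 3, a minimal PySpark sketch can make the effect visible (the path and
column names below are only illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pushdown-demo").getOrCreate()

df = spark.read.parquet("/data/events")                  # hypothetical Parquet dataset
filtered = (df
            .filter(df["event_date"] >= "2023-01-01")    # predicate on one column
            .select("player_id", "games_played"))        # column pruning

filtered.explain()

In the physical plan, the Parquet scan lists the condition under PushedFilters: the filter is
handed to the reader and checked against Parquet row-group statistics, so row groups that
cannot contain matching rows are skipped instead of being read and filtered afterwards.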
