
How to recivers matian the offset range

What are Kafka offsets? How will you manage them in streaming technologies?
What are the lifecycle principles of Spark Streaming?
How do you perform transformations in Spark Streaming?
What is the difference between the ORC and Parquet file formats?
What is a Hive transaction? Are there any drawbacks to it?
What is Spark performance tuning?
HBase hotspotting
Why is HBase very fast at writing data?
What are the five design principles of data warehouse design in Big Data?

------------------------------
What are the drawbacks of Kafka?
What is the memory model defined in Spark?
What is the main drawback of the Spark memory model?
What is the main drawback of the Impala memory model?
Spark vs Impala
What is the difference between the ORC and Parquet file formats?
In which scenario do we need to use Spark, and in which scenario do we need to use Parquet?
Different types of compression techniques in Hadoop, Hive, Impala, and Spark
What are the major drawbacks of the Kafka system?

Spark execution memory vs main memory in Spark


Hive DDL commands
Understand the different Hive file formats

If I have a CSV file, and that file is having

=======================================================================================
How to store data in a Parquet file from Spark? Every time you append, a new file is created; how do you append all the files?
What is the biggest file size for a Parquet file? If it is 1 GB, how do you manage the files?
What is the best option to read Kafka topics and store the data?
How does the Kafka direct API work?
Where are you saving your Kafka offset information?
Different types of compression formats

Hive is designed to make batch processing jobs like data preparation and ETL more accessible than raw MapReduce, via a SQL-like language.

How to store Oracle database data in HDFS? Which storage format is better?

Spark SQL is an API within Spark that is designed for Scala or Java developers to embed SQL queries into their Spark programs.

Impala is a modern MPP query engine purpose-built for Hadoop to provide BI and SQL analytics at interactive latencies.

1 KB per message × 1000 messages/sec = 1 MB/sec, i.e. 60 MB/min

3.6 GB per hour

~86 GB of data per day (3.6 GB/hour × 24 hours ≈ 86.4 GB)
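The figures above can be checked with a quick back-of-the-envelope calculation (decimal units, i.e. 1 MB = 1000 KB, as in the note):

```python
# Throughput estimate: 1 KB per message at 1000 messages/sec.
kb_per_msg = 1
msgs_per_sec = 1000

mb_per_sec = kb_per_msg * msgs_per_sec / 1000   # 1.0 MB/sec
mb_per_min = mb_per_sec * 60                    # 60.0 MB/min
gb_per_hour = mb_per_min * 60 / 1000            # 3.6 GB/hour
gb_per_day = gb_per_hour * 24                   # ~86.4 GB/day

print(mb_per_min, gb_per_hour, gb_per_day)
```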

https://vision.cloudera.com/apache-spark-cloudera-search-impala-which-is-best-for-analytics/

https://www.dezyre.com/article/impala-vs-hive-difference-between-sql-on-hadoop-components/180

https://www.youtube.com/watch?v=N6pJhxCPe-Y
=======================================================================================
How do you maintain fault tolerance in Spark?
How are RDDs processed in a Spark program?
If a node fails suddenly, how does fault tolerance work?
Which curried functions have you worked with in Spark?
How do you change a Hive schema dynamically?
What is the .hiverc file in Hive?
At the Hive level, how do you tell a Hive query how much memory it should use?
What happens if a memory exception occurs at the Hive level, and how do you handle it?
If your system has 5 GB of RAM, how does Spark process a file in memory?
Higher-order functions in Spark
Curried functions in Spark
Composite functions in Spark

Lazy evaluation in Spark


Why are partitions immutable in Spark?
Difference between Spark SQL and HiveQL
Difference between Map and ArrayBuffer in Scala
Performance tuning of Spark: on what basis do we need to decide? We need to consider all factors: cluster size, input data, available memory, and cores.
Which SQL standard does Spark SQL support: SQL-91 or SQL-92?
Spark DAG generation
How does parallel execution happen in Spark?
Why do we need to go for Spark? What is the use of it?
How do you iterate over objects in a collection in Spark? (I didn't understand this question; I answered "map".)

How do you extract streaming data from an RDBMS?


How do you maintain transaction properties while writing data into multiple records, if any record fails while writing the data into the target systems?
What is the difference between Flume and Spark Streaming?
HBase
What are region servers?
Block cache vs MemStore
Which region server has the data?
While pushing data from source systems to Kafka, if the complete cluster suddenly goes down, how do you know from which record you need to start again?
How do you get incremental data by timestamp from HBase?
How do you store CDC data in HBase?
Difference between Flume and Kafka with Spark Streaming
Difference between Structured Streaming and Spark Streaming
Which API do you use for writing custom streaming?

In Scala, what does the underscore (_) mean?

Kafka: what is the use of partitions?

-------------------------
1. Nil is an object which is used to represent an empty list. It is defined in "scala.collection.immutable".
2. Null is a type (a final class) in Scala. The Null type is available in the "scala" package as "scala.Null". It has one and only one instance, which is null.
3. Unit is something similar to Java's void, but they have a few differences.
4. Java's void does not have any value; it is nothing. Scala's Unit has one value: ().
() is the one and only value of type Unit in Scala, whereas there are no values of type void in Java.
Java's void is a keyword; Scala's Unit is a final class. Both are used to indicate that a method or function returns nothing.
5. Scala's Int and Java's java.lang.Integer:
Scala's Int class does not implement the Comparable interface.
Java's java.lang.Integer class implements the Comparable interface.
6. Java's Integer is something similar to Scala's Int and RichInt. RichInt is a final class defined in the scala.runtime package, as "scala.runtime.RichInt".
7. The relation between Int and RichInt is that when we use Int in a Scala program, it is automatically converted into RichInt so that all the methods available in that class can be used. We can say that RichInt is an implicit class for Int.
8. Nothing is a type (a final class). It is defined at the bottom of the Scala type system, which means it is a subtype of everything in Scala. There are no instances of Nothing.
9. What is the difference between Null and null in Scala?
Null is a type (a final class) in Scala, available in the "scala" package as "scala.Null". It has one and only one instance, which is null.
We cannot assign other values to references of type Null; they accept only the 'null' value.
Null is a subtype of all reference types and sits at the bottom of the Scala type system for them. As it is NOT a subtype of value types, we cannot assign "null" to a variable of a value type.
10. Does Scala support operator overloading? Does Java support operator overloading?
Java does not support operator overloading; Scala does.
11. What is the difference between Java's "if..else" and Scala's "if..else"?
In Java, "if..else" is a statement, not an expression: it does not return a value, and we cannot assign it to a variable.
In Scala, "if..else" is an expression: it evaluates to a value, and we can assign it to a variable.
ex: val year = if (count == 0) 2014 else 2015
12. Although Scala supports operator overloading, Scala has one and only one true operator, the "=" (equals) operator. Everything else is a method.
DAL :

The DAL process is meant for acquisition of data in multiple source formats from multiple clients. DAL reads all the files placed in the source folder by the user, converts them into JSON-format data streams based on parameters provided by the user, and pushes the data streams for further processing into the respective queues. These queues are created based on project id and client id.
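A minimal sketch of that DAL flow, using plain Python with in-memory lists standing in for the queues (the CSV layout, function name, and queue keying are hypothetical, not the actual implementation):

```python
import csv
import io
import json
from collections import defaultdict

# In-memory stand-ins for the per-(project id, client id) queues described above.
queues = defaultdict(list)

def dal_process(source_text, project_id, client_id):
    """Read one CSV source file, convert each row to a JSON record,
    and push it onto the queue for (project_id, client_id)."""
    reader = csv.DictReader(io.StringIO(source_text))
    for row in reader:
        record = json.dumps({"project_id": project_id,
                             "client_id": client_id,
                             "data": row})
        queues[(project_id, client_id)].append(record)
    return len(queues[(project_id, client_id)])

# Example: one small CSV "file" for project P1 / client C1.
sample = "id,amount\n1,100\n2,250\n"
count = dal_process(sample, "P1", "C1")
print(count)  # 2 records queued
```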

Audit Log: While processing the source files, the respective audit logs are created to track and record the following information:
1. Number of files received for processing
2. Number of valid/invalid files
3. Information counters (number of files processed, status of the current process):
a. No. of files processed: shows the total count of the files processed
b. Status of the current process: shows the current status as Running, Failed, or Completed for each current process.
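The counters above can be sketched as a tiny audit-log helper (plain Python; the class and field names are hypothetical):

```python
class AuditLog:
    """Tracks the DAL audit counters described above for one run."""

    def __init__(self):
        self.files_received = 0   # 1. number of files received
        self.valid_files = 0      # 2. valid files
        self.invalid_files = 0    # 2. invalid files
        self.status = "Running"   # 3b. current process status

    def record_file(self, is_valid):
        self.files_received += 1
        if is_valid:
            self.valid_files += 1
        else:
            self.invalid_files += 1

    def finish(self, ok=True):
        self.status = "Completed" if ok else "Failed"

log = AuditLog()
for valid in (True, True, False):
    log.record_file(valid)
log.finish()
print(log.files_received, log.valid_files, log.invalid_files, log.status)
```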


DIS :

DIS is the service meant for receiving the data stream from the queue and processing it in the following order:

1. Each record from the queue is classified as per the embedded project parameters
2. The model is fetched from the DB as per the embedded project parameters
3. If the model and rules are not already present in memory, they are fetched into memory and retained for future reference
4. Rules are applied on the records, either in a batch or record by record
5. Valid records are stored in the target files, and invalid records are appended to per-client invalid queues for further processing
6. Valid data is passed to the aggregation engine
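The six steps above can be sketched roughly like this (plain Python; the model "DB", rule shape, and queue names are all hypothetical stand-ins):

```python
# Hypothetical "DB" of models/rules keyed by project id.
MODEL_DB = {"P1": {"rules": [lambda r: r.get("amount", 0) > 0]}}
model_cache = {}  # step 3: models retained in memory once fetched

def get_model(project_id):
    if project_id not in model_cache:          # fetch from "DB" only if not cached
        model_cache[project_id] = MODEL_DB[project_id]
    return model_cache[project_id]

def dis_process(record, valid_out, invalid_queues):
    project_id = record["project_id"]          # step 1: classify by project params
    model = get_model(project_id)              # steps 2-3: fetch and cache model
    if all(rule(record["data"]) for rule in model["rules"]):  # step 4: apply rules
        valid_out.append(record)               # step 5: valid -> target files
        return True                            # step 6: would feed the aggregation engine
    client = record["client_id"]               # step 5: invalid -> per-client queue
    invalid_queues.setdefault(client, []).append(record)
    return False

valid, invalid = [], {}
dis_process({"project_id": "P1", "client_id": "C1", "data": {"amount": 10}}, valid, invalid)
dis_process({"project_id": "P1", "client_id": "C1", "data": {"amount": -5}}, valid, invalid)
print(len(valid), len(invalid["C1"]))  # 1 valid, 1 invalid
```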

Audit Logs:
Audit logs are maintained at the REE level to record the following information:
1. The rule-level outcome for each record and each rule
2. Invalid record information, kept in the audit table for future reference

Information Counters:
These show the number of records processed and the number of records that passed, for each rule as well as for all rules:
Rule-level counters
Record-level counters

Aggregation:

The aggregation layer is meant for high-level aggregation of the data based on different dimensions, to be displayed across the dashboards and analytical reports.
It applies the aggregations on the valid data stream from DIS and keeps them continuously updated as new data arrives from DIS.
Every client has its own process for aggregating data in real time. Processed aggregates are stored in the target data sink and sent to the respective client UI.
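A minimal sketch of such continuously updated per-client aggregates (plain Python; the "region" dimension and "amount" field are hypothetical examples):

```python
from collections import defaultdict

# Running aggregates keyed by (client id, dimension value),
# updated incrementally as valid records arrive from DIS.
aggregates = defaultdict(lambda: {"count": 0, "sum": 0})

def update_aggregates(record):
    """Fold one valid record into the running aggregate
    for its client and region dimension."""
    key = (record["client_id"], record["region"])
    agg = aggregates[key]
    agg["count"] += 1
    agg["sum"] += record["amount"]

for rec in [{"client_id": "C1", "region": "EU", "amount": 10},
            {"client_id": "C1", "region": "EU", "amount": 5},
            {"client_id": "C1", "region": "US", "amount": 7}]:
    update_aggregates(rec)

print(aggregates[("C1", "EU")])  # {'count': 2, 'sum': 15}
```

Because each record is folded in as it arrives, the dashboards can read the current aggregate at any time without rescanning the stream.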
