You are on page 1of 1

basic difference b/w sql and dataframe is sql is not distributed whereas dataframe

is distributed.
hive vs impala-impala is more faster then hive
how to show if the file is getting truncated -then( truncate=false)
writing query in spark-finding the highest salary from 3 table
what is dataframe
hadoop nodes
commands in unix
how to find the no of error in log file in unxi-"grep -c slot error.log"
keyword for overwriting the data in pyspark commands basically-use SaveMode

parquet and orc are columanar file format rest(avro,json,cvs,text) are row based
format.
we can use columanar data when we are doing aggregation on the data
we can select row based when we are doing some business analysis and when we want
to scan row to row.

The pass statement is used as a placeholder for future code. When the pass
statement is executed, nothing happens, but you avoid getting an error when empty
code is not allowed. Empty code is not allowed in loops, function definitions,
class definitions, or in if statements.

You might also like