1. Suppose we have CSV files or structured data in an RDBMS, and records get updated or inserted. Those changes should be reflected in the output table in the target. How will you handle real-time data using Databricks?
2. In which cases do we use a broadcast join?
3. What are the other ways to improve the performance of an application?
4. How do you archive files that are more than 10 days old?
5. How will you roll back to an older version if required?
6. How do you remove duplicate records from a table without using distinct?
7. Explain cache and persist.
8. Which types of data are you getting from the client?
9. What is the difference between a Parquet file and a Delta table?
10. Are you storing your tables as managed tables or external tables?
11. What is the difference between a managed table and an external table?
12. What is the difference between dropping a managed table and dropping an external table?
13. How do you union two DataFrames?
14. If the first DataFrame has 4 columns and the second has 5 columns, will the union succeed or not?
15. I want to create a new column and populate it based on some conditions. How can we do that?
16. How do you join two DataFrames?
17. How can we create a new column and add a rank to it using partitionBy?
18. How will you extract the current date in PySpark and Spark SQL?
    Answer: current_date()
19. How can we add 5 days to the current date?
    Answer: df1 = df.withColumn("date_plus_5_days", date_add(df["current_date"], 5)) (this assumes df already has a column named current_date)
20. I want to extract the first day of the current month. How can we do that?
    Answer: the trunc function with the "month" argument
21. I want to extract the last day of the current month. How can we do that?
    Answer: last_day