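// Assumes a SparkSession named `spark`, as in spark-shell.
import spark.implicits._                      // enables .toDF on a local List
import org.apache.spark.sql.functions.col

val a = List(("India","Aus","India"), ("Japan","Aus","Aus"), ("Japan","India","India"))

val df = a.toDF("team1", "team2", "win")
df.show()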
+-----+-----+-----+
|team1|team2|  win|
+-----+-----+-----+
|India|  Aus|India|
|Japan|  Aus|  Aus|
|Japan|India|India|
+-----+-----+-----+
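// Stack team1 and team2 into one column so every appearance counts as a match played.
val df2 = df.select("team1").union(df.select("team2"))
val df3 = df2.groupBy("team1").count().withColumnRenamed("count", "Total_Matches")
df3.show()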
+-----+-------------+
|team1|Total_Matches|
+-----+-------------+
|India|            2|
|  Aus|            2|
|Japan|            2|
+-----+-------------+
// Wins per team; the count column is named "winner" and holds the number of wins.
val df4 = df.groupBy("win").count().withColumnRenamed("count", "winner")
df4.show()
+-----+------+
|  win|winner|
+-----+------+
|India|     2|
|  Aus|     1|
+-----+------+
df3.join(df4, col("team1") === col("win"), "left").drop("win").na.fill(0).withColumn("loss", col("Total_Matches") - col("winner")).show
+-----+-------------+------+----+
|team1|Total_Matches|winner|loss|
+-----+-------------+------+----+
|India|            2|     2|   0|
|  Aus|            2|     1|   1|
|Japan|            2|     0|   2|
+-----+-------------+------+----+
==============================-======

WITH gro AS (SELECT team1 team FROM tri
             UNION
             SELECT team2 FROM tri)
-- NVL wraps total_win itself, so a team with no wins still gets total_loss = total_match.
SELECT gro.team, tm.total_match, NVL(win.total_win, 0) tot_win,
       tm.total_match - NVL(win.total_win, 0) total_loss
  FROM gro,
       (SELECT winner, COUNT(*) total_win FROM tri GROUP BY winner) win,
       (SELECT team1, COUNT(*) total_match
          FROM (SELECT team1 FROM tri UNION ALL SELECT team2 FROM tri) GROUP BY team1) tm
 WHERE gro.team = win.winner(+) AND tm.team1 = gro.team;

==============-=

The query below should give the solution if you use Spark SQL:

WITH teams_cte AS
(SELECT team1 AS team, (CASE WHEN team1 = winner THEN 1 ELSE 0 END) AS won FROM table
UNION ALL
SELECT team2 AS team, (CASE WHEN team2 = winner THEN 1 ELSE 0 END) AS won FROM table)
SELECT team, COUNT(*) AS tot_matches, SUM(won) AS total_won, COUNT(*) - SUM(won) AS total_loss
FROM teams_cte
GROUP BY team;
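
The CTE approach is standard enough SQL that it can be sanity-checked without a Spark cluster. Below is a minimal sketch using Python's built-in sqlite3 module. It assumes a hypothetical table named matches (standing in for the placeholder table name in the query) with columns team1, team2 and winner, loaded with the three sample rows from the top of this page. An ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Sample rows from the top of the page: (team1, team2, winner).
rows = [
    ("India", "Aus", "India"),
    ("Japan", "Aus", "Aus"),
    ("Japan", "India", "India"),
]

# "matches" is a hypothetical table name used only for this check.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (team1 TEXT, team2 TEXT, winner TEXT)")
conn.executemany("INSERT INTO matches VALUES (?, ?, ?)", rows)

# One row per team appearance, flagged 1 if that team won the match.
query = """
WITH teams_cte AS (
    SELECT team1 AS team, CASE WHEN team1 = winner THEN 1 ELSE 0 END AS won
      FROM matches
    UNION ALL
    SELECT team2 AS team, CASE WHEN team2 = winner THEN 1 ELSE 0 END AS won
      FROM matches
)
SELECT team,
       COUNT(*)            AS tot_matches,
       SUM(won)            AS total_won,
       COUNT(*) - SUM(won) AS total_loss
  FROM teams_cte
 GROUP BY team
 ORDER BY team
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because every team appearance becomes its own row, no outer join or null handling is needed: a team that never won simply sums to zero.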

==============-==========
This is a very good question. I was asked it in one of my interviews, but in SQL. Here is my attempt
to solve it using PySpark.

# Import only what is used; importing sum/max/min from pyspark would shadow the Python builtins.
from pyspark.sql.functions import col, count

# Assumes df has columns Team1, Team2 and Winner.
# Stack both team columns so every appearance counts as one match played.
df_t1 = df.select(col("Team1").alias("Team"))
df_t2 = df.select(col("Team2").alias("Team"))
df_all_teams = df_t1.union(df_t2)
df_all_teams_Agg = df_all_teams.groupby("Team").agg(count("Team").alias("Total_matchs"))
df_all_teams_Agg.show(truncate=False)
# Wins per team.
df_winner = df.groupby("Winner").agg(count("Winner").alias("Total_Won_matchs"))
