
Min Cost in Tree

Interview Question
David Wahler (dwahler@)

● ~10 years at Indeed


○ Job Search (intern)
○ Search Quality
○ Data Infrastructure

● ~8 years interviewing

● 6+ years asking min cost in tree


Question: Min Cost in Tree

● https://wiki.indeed.com/display/eng/Minimum+cost+in+a+tree
The Pitch and Setup
Setup
Problem statement:

You’re given a tree in which each edge has a non-negative integer cost.

Locate and return the leaf node whose cost from the root is minimal.
Example

Correct result is node J (cost: 5+0+2 = 7)

Example includes:
● nodes with 0, 1, 2 and 3 children
● paths of different lengths
● edges with cost zero
● optimal leaf is neither first nor last (when using DFS or BFS)
Setup

Essential points to explain:

● Costs are on edges, not nodes


● Costs are non-negative
● Edges point from parent to child
● Input is the root Node
● Expected result is a Node object
Skeleton Code - Java
class Node {
    List<Edge> children; // possibly empty
}

class Edge {
    int cost;    // ≥ 0
    Node target; // non-null
}

public static Node minCostLeaf(Node root) {
    // …
}
Skeleton Code - C++
#include <vector>

struct Node; // forward declaration so Edge can hold a Node*

struct Edge {
    int cost;     // ≥ 0
    Node* target; // non-null
};

struct Node {
    std::vector<Edge> children;
};

Node* minCostLeaf(Node* root) {
    // …
}
Skeleton Code - Python
class Node:
    children = …  # type is [Edge]

class Edge:
    cost = …    # type is int, ≥ 0
    target = …  # type is Node, not None

def minCostLeaf(root):
    # …
    return leaf_node
Common questions
Questions (1/3)
- Q: How big is the tree?
A: The tree fits in memory, and the depth is reasonably small (a few
hundred or so)

- Q: What if there’s a tie?


A: It’s OK to break ties arbitrarily. (If time allows, see if candidate can
suggest other ways of resolving this, e.g. always returning the left-most
leaf, or returning all tied leaves.)
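One of the tie-breaking variants mentioned above, returning all tied leaves, can be sketched as follows. This is only an illustration: the class name `TiedLeaves`, the `name` field, the `add` helper, and the example tree are assumptions that are not part of the original question. Because the traversal is left-to-right, the first element of the result is also the left-most minimum-cost leaf.

```java
import java.util.ArrayList;
import java.util.List;

class TiedLeaves {
    static class Node {
        final String name; // hypothetical label, used only for the demo
        final List<Edge> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        Node add(int cost, Node target) { children.add(new Edge(cost, target)); return this; }
    }

    static class Edge {
        final int cost;
        final Node target;
        Edge(int cost, Node target) { this.cost = cost; this.target = target; }
    }

    // Returns every minimum-cost leaf, in left-to-right order.
    static List<Node> allMinCostLeaves(Node root) {
        List<Node> best = new ArrayList<>();
        long[] bestCost = {Long.MAX_VALUE}; // one-element array acts as a mutable holder
        dfs(root, 0L, bestCost, best);
        return best;
    }

    private static void dfs(Node current, long costSoFar, long[] bestCost, List<Node> best) {
        if (current.children.isEmpty()) {
            if (costSoFar < bestCost[0]) {
                bestCost[0] = costSoFar; // strictly better: discard earlier leaves
                best.clear();
            }
            if (costSoFar == bestCost[0]) {
                best.add(current);
            }
            return;
        }
        for (Edge e : current.children) {
            dfs(e.target, costSoFar + e.cost, bestCost, best);
        }
    }

    // Hypothetical tree: leaves A (cost 1), B (cost 3), D (cost 0 + 1 = 1).
    static Node exampleTree() {
        Node c = new Node("C").add(1, new Node("D"));
        return new Node("root").add(1, new Node("A")).add(3, new Node("B")).add(0, c);
    }

    public static void main(String[] args) {
        for (Node leaf : allMinCostLeaves(exampleTree())) {
            System.out.println(leaf.name);
        }
    }
}
```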
Questions (2/3)

- Q: Can I modify the tree?


A: No, the input should be considered immutable. (If candidate gets stuck,
consider relaxing this.)

- Q: Can I assume the input is a well-formed tree?


A: Yes. (e.g. Node pointers are non-null, no cycles)
Questions (3/3)

- Q: What’s the real-world significance of this problem?


A: This specific problem is a bit artificial, but it’s similar to various kinds of
other search and tree-traversal problems that do come up in practice,
without requiring too much “scaffolding”.

- (Bonus: the linear-time solution to the DAG extension is essentially equivalent to the Viterbi
algorithm, which is used to efficiently decode the error-correcting codes used in WiFi, LTE, etc.)
Brainstorming
Solutions
Key insights

● A greedy algorithm that only looks at the immediate descendants of a node before choosing an edge is wrong.
● Need to traverse the tree and add up costs as we go.
○ Top-down: cost from root to current node
○ Bottom-up: cost from current node to minimum descendant leaf
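To make the first insight concrete, here is a small sketch of a greedy walk failing on a hand-made tree. The tree, the `name` field, the `add` helper, and the class name `GreedyCounterexample` are all assumptions made up for illustration; they do not come from the slides' example figure.

```java
import java.util.ArrayList;
import java.util.List;

class GreedyCounterexample {
    static class Node {
        final String name; // hypothetical label, used only for the demo
        final List<Edge> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        Node add(int cost, Node target) { children.add(new Edge(cost, target)); return this; }
    }

    static class Edge {
        final int cost;
        final Node target;
        Edge(int cost, Node target) { this.cost = cost; this.target = target; }
    }

    // Hypothetical tree: root -1-> A -10-> C and root -2-> B -1-> D.
    static Node exampleTree() {
        Node a = new Node("A").add(10, new Node("C"));
        Node b = new Node("B").add(1, new Node("D"));
        return new Node("root").add(1, a).add(2, b);
    }

    // Incorrect: always follows the cheapest outgoing edge.
    static Node greedy(Node current) {
        while (!current.children.isEmpty()) {
            Edge cheapest = current.children.get(0);
            for (Edge e : current.children) {
                if (e.cost < cheapest.cost) {
                    cheapest = e;
                }
            }
            current = cheapest.target;
        }
        return current;
    }

    // Correct (bottom-up): cost from the current node to its cheapest descendant leaf.
    static int minCost(Node current) {
        if (current.children.isEmpty()) {
            return 0;
        }
        int best = Integer.MAX_VALUE;
        for (Edge e : current.children) {
            best = Math.min(best, e.cost + minCost(e.target));
        }
        return best;
    }

    public static void main(String[] args) {
        Node root = exampleTree();
        System.out.println("greedy picks " + greedy(root).name);  // C, whose path costs 1 + 10 = 11
        System.out.println("true minimum cost " + minCost(root)); // 3, via B to D
    }
}
```

The greedy walk commits to the edge of cost 1 and ends at a leaf costing 11, while the cheapest leaf sits behind the more expensive first edge.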
Common approaches

● Recursive DFS ☺
● Iterative BFS ☺
● Dijkstra’s algorithm ☺
● A* (requires extra information that we don’t have) ☹
Naive solution (Recursive DFS, top-down)

public static Node minCostLeaf(Node root) {
    Result result = new Result();
    dfs(root, 0, result);
    return result.leaf;
}

private static class Result {
    int cost = Integer.MAX_VALUE;
    Node leaf;
}

private static void dfs(Node current, int costSoFar, Result result) {
    if (current.children.isEmpty()) {
        if (costSoFar < result.cost) {
            result.cost = costSoFar;
            result.leaf = current;
        }
    } else {
        for (Edge e : current.children) {
            dfs(e.target, costSoFar + e.cost, result);
        }
    }
}
Naive solution (Recursive DFS, bottom-up)

public static Node minCostLeaf(Node root) {
    return dfs(root).leaf;
}

private static class Result {
    int cost = Integer.MAX_VALUE;
    Node leaf;

    Result(int cost, Node leaf) {
        this.cost = cost;
        this.leaf = leaf;
    }
}

public static Result dfs(Node current) {
    if (current.children.isEmpty()) {
        return new Result(0, current);
    } else {
        Result best = null;
        for (Edge e : current.children) {
            Result candidate = dfs(e.target);
            candidate.cost += e.cost;
            if (best == null || candidate.cost < best.cost) {
                best = candidate;
            }
        }
        return best;
    }
}
Naive solution (Iterative BFS)

public static Node minCostLeaf(Node root) {
    Queue<State> queue = new LinkedList<>();
    State best = null;

    queue.add(new State(0, root));

    while (!queue.isEmpty()) {
        State state = queue.poll();
        if (state.node.children.isEmpty()) {
            if (best == null || state.cost < best.cost) {
                best = state;
            }
        } else {
            for (Edge e : state.node.children) {
                // Constructor argument order is (cost, node).
                queue.add(new State(state.cost + e.cost, e.target));
            }
        }
    }
    return best.node;
}

private static class State {
    int cost;
    Node node;

    State(int cost, Node node) {
        this.cost = cost;
        this.node = node;
    }
}
Optimizations

● Naive approach always examines the entire tree


● Since costs are non-negative, once we’ve seen at least one leaf, we can
prune subtrees that can’t beat the current best leaf
● Some candidates will realize this on their own (+initiative), but others won’t
see it until you give an example that hints at it
● Pruning is much easier with top-down traversal
○ We have to decide whether to visit a node before traversing its children
○ Also possible with bottom-up traversal
■ instead of passing pair of (current cost, current best)...
■ pass difference between current cost and current best
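The bottom-up variant with pruning described in the last two bullets can be sketched as below. This is one possible reading of "pass difference between current cost and current best", not necessarily the intended solution: the `budget` parameter, the class name `PrunedBottomUp`, the `name` field, the `add` helper, and the example tree are all assumptions. `search` returns a result only if the subtree contains a leaf whose cost from `current` is strictly below `budget`, and returns `null` otherwise.

```java
import java.util.ArrayList;
import java.util.List;

class PrunedBottomUp {
    static class Node {
        final String name; // hypothetical label, used only for the demo
        final List<Edge> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        Node add(int cost, Node target) { children.add(new Edge(cost, target)); return this; }
    }

    static class Edge {
        final int cost;
        final Node target;
        Edge(int cost, Node target) { this.cost = cost; this.target = target; }
    }

    static class Result {
        final long cost; // cost from the searched node down to 'leaf'
        final Node leaf;
        Result(long cost, Node leaf) { this.cost = cost; this.leaf = leaf; }
    }

    static Node minCostLeaf(Node root) {
        Result result = search(root, Long.MAX_VALUE);
        return result == null ? null : result.leaf;
    }

    // budget = (best cost found so far) - (cost from the root to 'current').
    static Result search(Node current, long budget) {
        if (budget <= 0) {
            return null; // costs are non-negative, so nothing below can be strictly cheaper
        }
        if (current.children.isEmpty()) {
            return new Result(0, current);
        }
        Result best = null;
        for (Edge e : current.children) {
            Result candidate = search(e.target, budget - e.cost);
            if (candidate != null) {
                // A non-null candidate satisfies e.cost + candidate.cost < budget.
                best = new Result(e.cost + candidate.cost, candidate.leaf);
                budget = best.cost; // later siblings must beat this
            }
        }
        return best;
    }

    // Hypothetical tree with leaf costs C=7, D=5, E=11, G=4.
    static Node exampleTree() {
        Node a = new Node("A").add(3, new Node("C")).add(1, new Node("D"));
        Node f = new Node("F").add(0, new Node("G"));
        Node b = new Node("B").add(9, new Node("E")).add(2, f);
        return new Node("root").add(4, a).add(2, b);
    }

    public static void main(String[] args) {
        System.out.println(minCostLeaf(exampleTree()).name);
    }
}
```

In the example, the subtree under E is cut off immediately because its remaining budget is already negative when it is reached.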
Top-down DFS with pruning

public static Node minCostLeaf(Node root) {
    Result result = new Result();
    dfs(root, 0, result);
    return result.leaf;
}

private static class Result {
    int cost = Integer.MAX_VALUE;
    Node leaf;
}

private static void dfs(Node current, int costSoFar, Result result) {
    if (current.children.isEmpty()) {
        if (costSoFar < result.cost) {
            result.cost = costSoFar;
            result.leaf = current;
        }
    } else if (costSoFar < result.cost) {
        // Prune: only descend if this path can still beat the best leaf.
        for (Edge e : current.children) {
            dfs(e.target, costSoFar + e.cost, result);
        }
    }
}
Optimizations

● Dijkstra’s algorithm: visit nodes in order of cost, stopping at the first leaf
● Does a better job of pruning, but has overhead of maintaining a priority
queue
Dijkstra’s algorithm

public static Node minCostLeaf(Node root) {
    PriorityQueue<State> queue = new PriorityQueue<>();

    queue.add(new State(0, root));

    while (!queue.isEmpty()) {
        State state = queue.poll();

        if (state.node.children.isEmpty()) {
            // The first leaf polled has the minimum cost.
            return state.node;
        } else {
            for (Edge e : state.node.children) {
                // Constructor argument order is (cost, node).
                queue.add(new State(state.cost + e.cost, e.target));
            }
        }
    }

    throw new IllegalStateException("unreachable");
}

private static class State implements Comparable<State> {
    int cost;
    Node node;

    State(int cost, Node node) {
        this.cost = cost;
        this.node = node;
    }

    public int compareTo(State other) {
        return Integer.compare(cost, other.cost);
    }
}
DAG extension

● At this point, I point out that the same Node/Edge data structures can be
used to represent other kinds of graphs (if the candidate hasn’t already said so)
● Extension: solve the same problem on a directed acyclic graph with a
single root.
○ i.e. nodes can have multiple parents, which means multiple paths from the root
○ “Find the leaf node which has the shortest path that connects it to the root”
● Depending on time available, candidates typically don’t write code for a
complete solution to the extension
DAG extension

● The candidate’s tree solution will probably give the correct answer for a
DAG as well, but will be very expensive due to re-visiting nodes. Do they
realize this?
○ Worst-case is typically O(2^V); many candidates hand-wave and suggest O(V^2) or similar
● If not, draw an example, e.g. repeated diamonds
● Pruning helps a bit, but not in the worst case
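The repeated-diamonds example can also be checked in code. The sketch below builds a chain of k diamonds and counts how many calls an unmemoized DFS makes; the class name `DiamondBlowup`, the helper names, and the unit edge costs are assumptions for illustration. With k diamonds the graph has only 3k + 1 nodes, yet the call count doubles with every extra diamond.

```java
import java.util.ArrayList;
import java.util.List;

class DiamondBlowup {
    static class Node {
        final List<Edge> children = new ArrayList<>();
    }

    static class Edge {
        final int cost;
        final Node target;
        Edge(int cost, Node target) { this.cost = cost; this.target = target; }
    }

    // Builds k stacked diamonds: top -> {left, right} -> next top, ending in one leaf.
    static Node diamonds(int k) {
        Node bottom = new Node(); // the single leaf
        for (int i = 0; i < k; i++) {
            Node left = new Node();
            Node right = new Node();
            left.children.add(new Edge(1, bottom));
            right.children.add(new Edge(1, bottom));
            Node top = new Node();
            top.children.add(new Edge(1, left));
            top.children.add(new Edge(1, right));
            bottom = top;
        }
        return bottom;
    }

    // Number of calls a tree-style DFS makes when it does not remember visited nodes.
    static long countVisits(Node current) {
        long visits = 1;
        for (Edge e : current.children) {
            visits += countVisits(e.target);
        }
        return visits;
    }

    public static void main(String[] args) {
        for (int k = 1; k <= 20; k += 1) {
            System.out.println(k + " diamonds, " + (3 * k + 1) + " nodes, "
                    + countVisits(diamonds(k)) + " DFS calls");
        }
    }
}
```

The call count follows c(k) = 3 + 2·c(k−1) with c(0) = 1, which is 2^(k+2) − 3, so 10 diamonds (31 nodes) already take 4093 calls.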
DAG extension

● Incorrect solutions:
○ DFS, skipping already-visited nodes
○ BFS, skipping already-visited nodes (shorter paths may have more edges)
● Correct solutions:
○ Dijkstra’s, skipping already-expanded nodes
○ Dynamic programming:
■ Bottom-up DFS with memoization
■ Topological sort
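The topological-sort option is the one correct approach without code elsewhere in the deck, so here is a hedged sketch using Kahn's algorithm. The class name `TopoMinCostLeaf`, the `name` field, the `add` helper, and the example DAG are assumptions. The example is chosen so that the cheapest path to X uses more edges than the direct one, which is exactly the case where BFS with a visited set goes wrong.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopoMinCostLeaf {
    static class Node {
        final String name; // hypothetical label, used only for the demo
        final List<Edge> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        Node add(int cost, Node target) { children.add(new Edge(cost, target)); return this; }
    }

    static class Edge {
        final int cost;
        final Node target;
        Edge(int cost, Node target) { this.cost = cost; this.target = target; }
    }

    static Node minCostLeaf(Node root) {
        // Pass 1: count incoming edges of every node reachable from the root.
        Map<Node, Integer> inDegree = new HashMap<>();
        Deque<Node> pending = new ArrayDeque<>();
        inDegree.put(root, 0);
        pending.push(root);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            for (Edge e : node.children) {
                // merge returns 1 the first time a target is seen, so each node is pushed once
                if (inDegree.merge(e.target, 1, Integer::sum) == 1) {
                    pending.push(e.target);
                }
            }
        }

        // Pass 2: process nodes in topological order, relaxing path costs.
        Map<Node, Long> cost = new HashMap<>();
        cost.put(root, 0L);
        Deque<Node> ready = new ArrayDeque<>();
        ready.push(root);
        Node bestLeaf = null;
        long bestCost = Long.MAX_VALUE;
        while (!ready.isEmpty()) {
            Node node = ready.pop();
            long c = cost.get(node); // final: all parents were processed already
            if (node.children.isEmpty() && c < bestCost) {
                bestCost = c;
                bestLeaf = node;
            }
            for (Edge e : node.children) {
                cost.merge(e.target, c + e.cost, Long::min);
                if (inDegree.merge(e.target, -1, Integer::sum) == 0) {
                    ready.push(e.target);
                }
            }
        }
        return bestLeaf;
    }

    // Hypothetical DAG: X is reachable directly (cost 10) or via A and B (cost 3).
    static Node exampleDag() {
        Node x = new Node("X").add(1, new Node("L1"));
        Node b = new Node("B").add(1, x);
        Node a = new Node("A").add(1, b);
        return new Node("root").add(10, x).add(1, a).add(5, new Node("L2"));
    }

    public static void main(String[] args) {
        System.out.println(minCostLeaf(exampleDag()).name); // L1 (cost 4) beats L2 (cost 5)
    }
}
```

Each node and edge is handled a constant number of times, so the sketch runs in O(V + E) plus hash-map overhead.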
Bottom-up DFS with memoization (1/2)

public static Node minCostLeaf(Node root) {
    Map<Node, Result> memo = new HashMap<>();
    return dfs(root, memo).leaf;
}

private static Result dfs(Node current, Map<Node, Result> memo) {
    if (memo.containsKey(current)) {
        return memo.get(current);
    }

    Result result;
    if (current.children.isEmpty()) {
        result = new Result(0, current);
    } else {
        // …
Bottom-up DFS with memoization (2/2)

    } else {
        Result best = null;
        for (Edge e : current.children) {
            Result candidate = dfs(e.target, memo);
            // Copy instead of mutating: the memoized Result is shared between parents.
            candidate = new Result(candidate.cost + e.cost,
                                   candidate.leaf);
            if (best == null || candidate.cost < best.cost) {
                best = candidate;
            }
        }
        result = best;
    }

    memo.put(current, result);
    return result;
}
Discussion points

● Test cases
○ Trivial tree (root has no children)
○ Large costs (integer overflow)
○ Tree with large height (stack overflow)
○ Input constraint validation?
■ null pointers
■ negative costs
■ cycles
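Two of the test cases above, large costs and large height, can be handled together. The sketch below is one possible approach, not the reference solution: it replaces recursion with an explicit stack and accumulates path costs in a `long`. The class name `IterativeMinCostLeaf`, the `name` field, the `add` helper, and the `chain` and `overflowExample` builders are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class IterativeMinCostLeaf {
    static class Node {
        final String name; // hypothetical label, used only for the demo
        final List<Edge> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        Node add(int cost, Node target) { children.add(new Edge(cost, target)); return this; }
    }

    static class Edge {
        final int cost;
        final Node target;
        Edge(int cost, Node target) { this.cost = cost; this.target = target; }
    }

    static class State {
        final long cost; // long, so that sums of int edge costs do not wrap around
        final Node node;
        State(long cost, Node node) { this.cost = cost; this.node = node; }
    }

    // Top-down DFS with pruning, using a heap-allocated stack instead of recursion.
    static Node minCostLeaf(Node root) {
        Deque<State> stack = new ArrayDeque<>();
        stack.push(new State(0L, root));
        Node bestLeaf = null;
        long bestCost = Long.MAX_VALUE;
        while (!stack.isEmpty()) {
            State state = stack.pop();
            if (state.cost >= bestCost) {
                continue; // cannot strictly beat the best leaf found so far
            }
            if (state.node.children.isEmpty()) {
                bestCost = state.cost;
                bestLeaf = state.node;
            } else {
                for (Edge e : state.node.children) {
                    stack.push(new State(state.cost + e.cost, e.target));
                }
            }
        }
        return bestLeaf;
    }

    // A single path of the given depth, ending in a leaf with the given name.
    static Node chain(int depth, String leafName) {
        Node node = new Node(leafName);
        for (int i = 0; i < depth; i++) {
            node = new Node("n" + i).add(1, node);
        }
        return node;
    }

    // "far" costs 2 * Integer.MAX_VALUE, which wraps to -2 in int arithmetic.
    static Node overflowExample() {
        Node a = new Node("A").add(Integer.MAX_VALUE, new Node("far"));
        return new Node("root").add(Integer.MAX_VALUE, a)
                               .add(Integer.MAX_VALUE, new Node("near"));
    }

    public static void main(String[] args) {
        System.out.println(minCostLeaf(chain(100_000, "deep")).name);
        System.out.println(minCostLeaf(overflowExample()).name);
    }
}
```

The trade-off is that the explicit stack can hold the untaken siblings of every node on the current path, so its size is bounded by depth times branching factor rather than depth alone.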
Discussion points

● Time complexity
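○ Naive DFS/BFS is O(N) for a tree; on a DAG, memoized DFS or topological sort is O(V + E), whereas the unmemoized tree solution can take exponential time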
○ With pruning, still O(N), but probably performs much better in practice
○ Dijkstra’s: O(N log N) for a tree, O(E log E) for a DAG
■ DAG complexity can be improved to O(E log V) with a custom priority queue
● Space complexity
○ DFS: O(depth)
■ Some candidates store costs in a map and don’t clean them up, which means space
is O(N)
○ BFS: O(width)
○ Dijkstra’s/memoization: O(N)
Common pitfalls (1/2)

● Not understanding variable scopes in recursive calls


○ Conflating references to same variable in different stack frames
○ Assuming pass-by-reference semantics in pass-by-value languages
● Mistakes in computing costs
○ e.g. comparing an edge cost to a path cost
● Code duplication between root and non-root nodes
● Using global or class variables to store state (bad practice)
○ Breaks when called multiple times and/or in multiple threads
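The pass-by-value pitfall is easy to demonstrate in isolation. In the sketch below (class and method names are made up for illustration), the first update silently has no effect on the caller, which is the same mistake as passing a "best cost so far" `int` down a recursive DFS and expecting sibling calls to see the update. The second version uses a one-element array as a mutable holder, playing the same role as the `Result` object in the solutions above.

```java
class PassByValueDemo {
    // Incorrect: 'best' is a copy local to this stack frame; the caller never sees the change.
    static void brokenUpdate(int best, int candidate) {
        if (candidate < best) {
            best = candidate;
        }
    }

    // Works: the array reference is copied, but both copies point to the same array.
    static void workingUpdate(int[] best, int candidate) {
        if (candidate < best[0]) {
            best[0] = candidate;
        }
    }

    static int runBroken() {
        int best = 100;
        brokenUpdate(best, 7);
        return best; // still 100
    }

    static int runWorking() {
        int[] best = {100};
        workingUpdate(best, 7);
        return best[0]; // now 7
    }

    public static void main(String[] args) {
        System.out.println("broken:  " + runBroken());
        System.out.println("working: " + runWorking());
    }
}
```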
Common pitfalls (2/2)

● Mutating input (deliberately or accidentally)


● Inefficiencies:
○ Collecting leaves in one pass, then searching for minimum
○ Finding minimum by sorting
○ Pruning: only tracking local minimum, not global
● (C++) Memory leaks or unsafe pointer dereferences
The bar
The Bar

● No
○ Unable to solve tree
● Weak No
○ Needs significant assistance to solve tree; no workable ideas for DAG
● Yes
○ Correct, efficient code for tree; at least one independent, non-trivial idea for DAG
● Strong Yes
○ Solves tree easily; discusses pros-and-cons of multiple approaches for DAG; no significant
implementation bugs (or finds/fixes them without help)
Conclusion
Pros
● Room to discuss multiple “optimal” solutions
○ including trading worst-case for average-case
● Good at testing ability to mentally model how code behaves
● Tests both analytical and implementation skills

Cons
● Solutions depend on a relatively small number of distinct insights
● Not good for phone screens
● Has been asked a lot over the years
Open Q & A
