Preparing for the technical interview is one of the most daunting tasks of all while trying to get a job in the CS field. There are a lot of platforms like LeetCode, Hackerrank with thousands of questions. I always thought solving the most questions would give me a clear edge during the interview process. But, what I have come to realize is that there will always be a question you haven’t prepared for.

The motive of the interview questions is to see the problem-solving skills in the candidate. Now, there might be hundreds of ways in which the same problem can be solved. Suppose you got lucky and you got the exact question you prepared for. You might go ahead and solve it the way you did it while you practiced because you know exactly what data structures to use, which algorithm to follow. But, that might not be enough to showcase your problem-solving skills. The interviewer not only wants to see if you can solve the problem but also want to know why did you come up with that particular solution.

For example: When I prepare tea, I first let the milk boil for a while before I put any ingredients. Now, that’s not the only way to prepare tea, is it? You can mix the ingredients beforehand and boil later or whatever order you want to do it. The thing is I do it that way because it makes tea silkier and smoother. All I am saying is I have my reason to back my action; similarly, you need yours while solving a problem.

Now, let’s try solving with reason. Here’s the question:

There is a system that runs jobs submitted by users. You have to write some code that will analyze logs to figure out how much processing time each job took. A job is started manually with a root task, then that task can create other tasks (“sub- tasks”). Those sub-tasks can create more sub-sub-tasks, and so on.

The log file you have consists of one line per task. It contains:

  1. the task’s ID
  2. the ID of the task that created it (or null if it was a root task)
  3. the amount of processing time the job took.

Here is an example log file:

Child Job Parent Job Amount of Time
E B 400
G null 10
B A 100
F B 600
D A 20
A null 20
C A 250
H G 20

Write some code that inputs a log file, and then outputs the total amount of time taken for each job: that is, the total time taken by all of the tasks within a job.

Example, where we have two jobs, the one whose root task is A and the one whose root task is G. Explanation of log file:

  1. Task A created task B, which created tasks E and F. Task a also created tasks C and D.
  2. Task G created task H

This should output:

A 1390 G 30

I will use Java to demonstrate my code. Firstly, when you are interviewing on a whiteboard or with an actual person, it would be better if you take your time understanding the question and start with easy steps while you explain your thought process. It gives you time to think about what approach you will be taking and explain why you chose that particular step.

File file = new File("logFile.txt")

The first step to think about is, can you read the data from the file and store it in a data structure that will make your job easy? If you think you can, use it and explain why that’s going to make your job easy. What if I store the data in the list of lists? Will it make it easier to get the sub-tasks of the task? Given task A, we should be able to look up what are the direct sub-tasks of A. Let’s say we have, tasks A to G, and each task can have sub-tasks. We will need a data structure that will take the task id and return us the sub-tasks. Can we use arrays? No. Because the index of the array starts from 0 and continues in the increment of 1. We can’t change the index of an array to user-based. So, we will need a mapping of our user-based index (id of the tasks) that are unique. That means we need some kind of dictionary to look up our tasks and return the sub-tasks. Now, there are various implementations of dictionary in different programming languages.

In Java, you typically use HashMap to store a key-value pair.

HashMap<K,V> dictionary = new HashMap<K,V>();

In python, you have dictionary.

dictionary = {}

Now, we have figured out how we are mapping the tasks with the sub-tasks. Let’s see how we are going to get the amount of process time for each task. Again, we have to look up the processing time given the id of the task. So, it makes sense that we use the dictionary here as well as the index is user-based (id of the task). Another thing to keep in mind is that we have root tasks, and since they aren’t the sub-tasks of any other task we do not store them in the dictionary. But, since they are the root tasks, they can be used to traverse through the dictionary. As we are printing the output of only the root tasks, each root task has to traverse through the dictionary. So, what data structure shall we use? Since we aren’t looking up for value, and we only care about the id of the root task, we can store them using lists or arrays. Our solution would be to iterate over list/array and traverse the dictionary using graph traversal. Now, why are we using graph traversal?

HashMap
A acts as root of the tree

Broadly speaking, we have two methods of graph traversal: Breadth-First Search (BFS) and Depth-First Search (DFS). We can use either of them here. But, I choose DFS since we are searching the entire tree, and DFS implementation is concise and efficient (recursively). DFS can also be implemented using Stack (iteratively).

//string : root task
//hashTasks : HashMap of task and subtasks
//hashProcessing: HashMap of task and process
private int getProcessingTime(String string, HashMap<K,V> hashTasks, HashMap<K,V> hashProcessing)
{  
   int processingTime = 0;
   if(hashProcessing.containsKey(string))
      processingTime = hashProcessing.get(string);
if(!hashTasks.containsKey(string))
      return processingTime;
   for (String sub: hashTasks.get(string))
   {
      processingTime += getProcessingTime(sub, hashTasks, hashProcessing);
   }
   return processingTime;
}

Now, using DFS we have got the processing time for each root task. You can get the complete solution on GitHub.