Round 1
Questions: I was given a log file of conversation like this:
"maria: hi, john: hello hi bye bye, maria: pls dont go"
and a helper function parse_log
that would process the file so I would get a list like this:
[(\'maria\', 1), (\'john\', 4), (\'maria\', 3)]
calculating the word count per message. I had to return top N talkative users.
Candidate's Approach
I used a dictionary to calculate word count per person and then made a min heap with "-" to simulate a max heap. The issue is I made a heap of size P (number of people) instead of capping the heap at N (number of top users we need). I finished coding with 20-25 minutes left and then the interviewer pointed to a portion of my code and said I have a bug there. I figured it out immediately. She pointed to another portion for another bug, I took a while but figured it out and fixed it. It can be considered syntactical. Then she asked me to provide some test cases which I did (ties, empty list from parse_log
, etc). Then she asked me for time complexity which I explained and generalized to O(M log U) where M = messages and U = users.
The biggest issue is that I added all my dict items to the heap (size U) instead of capping it at size N. It was a painfully obvious thing I knew I should do but forgot to as I was coding. So I did not even remember to mention it. The time complexity could have greatly been reduced if I used a heap of size N i.e. O(M log N) which is significant for when N << U. The whole point of a heap is to keep track of top/least/superlative N elements without storing the entire list. But I fumbled it.
Interviewer's Feedback
No feedback provided.