HubSpot interview guidance

9/25/2024

Round 1

Question: Many companies use HubSpot for its calling capabilities. Sales reps use the HubSpot product throughout the day to make phone calls to prospects.

We’ve found that certain customers using HubSpot have a large number of sales reps concurrently making calls with HubSpot, and this puts heavy load on our systems. In response to this, we’d like to bill our customers based on their peak calling load. In other words, we’d like to bill customers based on their maximum number of concurrent calls.

You’re provided with an HTTP GET endpoint that returns phone call records represented as JSON: HubSpot Dataset

Each call looks something like this:

{
  "customerId": 123,
  "callId": "2c269d25-deb9-42cf-927c-543112f7a76b",
  "startTimestamp": 1707314726000,
  "endTimestamp": 1707317769000
}

For the billing team to charge our customers correctly, they need to know the maximum number of concurrent calls for each customer for each day. The billing team has asked you to POST this information to the following endpoint: HubSpot Result.

The POST body must be in the following format:

{
  "results": [
    {
      "customerId": 123,
      "date": "2024-02-07",
      "maxConcurrentCalls": 1,
      "timestamp": 1707314726000,
      "callIds": [
        "2c269d25-deb9-42cf-927c-543112f7a76b"
      ]
    }
  ]
}

Note:

The startTimestamp of a call is inclusive, and the endTimestamp of a call is exclusive.
A single call may span multiple UTC dates, and calls can be arbitrarily long.
If no phone calls occurred during a date, there should be no results entry with that customerId and date combination.
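As a rough illustration of the first two notes, the sketch below splits one call, treated as the half-open interval [startTimestamp, endTimestamp), into one piece per UTC date. The names split_by_utc_date and MS_PER_DAY are hypothetical and not part of the problem statement.

```python
from datetime import datetime, timezone

MS_PER_DAY = 24 * 60 * 60 * 1000  # Unix time has no leap seconds, so every day is this long


def split_by_utc_date(start_ms, end_ms):
    """Split the half-open interval [start_ms, end_ms) into per-UTC-date pieces.

    Returns a list of (date_string, piece_start_ms, piece_end_ms) tuples,
    where each piece is itself half-open.
    """
    pieces = []
    cursor = start_ms
    while cursor < end_ms:
        # Midnight (UTC) that begins the day after the one containing `cursor`.
        next_midnight = (cursor // MS_PER_DAY + 1) * MS_PER_DAY
        piece_end = min(end_ms, next_midnight)
        date = datetime.fromtimestamp(cursor // 1000, tz=timezone.utc).strftime('%Y-%m-%d')
        pieces.append((date, cursor, piece_end))
        cursor = piece_end
    return pieces


# The sample call from the problem statement stays on a single date.
print(split_by_utc_date(1707314726000, 1707317769000))
```

Because the end is exclusive, a call that ends exactly at midnight produces no piece for the following date, which matches the third note.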

My solution to the above question was as follows:

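import requests
from datetime import datetime, timedelta, timezone
from collections import defaultdict

def get_dates_between(start_timestamp, end_timestamp):
    # Timestamps are epoch milliseconds; both endpoints are treated as inclusive.
    # fromtimestamp(..., tz=timezone.utc) replaces utcfromtimestamp, which is
    # deprecated as of Python 3.12.
    start_date = datetime.fromtimestamp(start_timestamp / 1000, tz=timezone.utc).date()
    end_date = datetime.fromtimestamp(end_timestamp / 1000, tz=timezone.utc).date()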

    current_date = start_date
    dates = []

    while current_date <= end_date:
        dates.append(current_date.strftime('%Y-%m-%d'))
        current_date += timedelta(days=1)

    return dates

class PhoneRecords:
    def __init__(self, url):
        self.url = url
        self.records = self.__get_records()

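    def __get_records(self):
        # Fetch the dataset; fail fast on a non-2xx response instead of parsing an error body.
        response = requests.get(self.url)
        response.raise_for_status()
        return response.json()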

    def __process_results(self, result):
        answer = {'results': []}

        for customer_id in result:
            for date in result[customer_id]:
                if result[customer_id][date]['maxConcurrentCalls'] > 0:  # Only include dates with calls
                    current_data = {
                        'customerId': customer_id,
                        'date': date,
                        'maxConcurrentCalls': result[customer_id][date]['maxConcurrentCalls'],
                        'callIds': result[customer_id][date]['callIds'],
                        'timestamp': result[customer_id][date]['timestamp']
                    }
                    answer['results'].append(current_data)

        return answer

    def get_peak_phone_calls(self):
        customer_concurrent_data_by_date = defaultdict(lambda: defaultdict(list))

        # Organize the records
        for record in self.records['callRecords']:
            customer_id = record['customerId']
            call_id = record['callId']
            start_timestamp = record['startTimestamp']  # inclusive
            end_timestamp = record['endTimestamp']      # exclusive

            # An empty interval [start, end) covers no instant, so skip it.
            if end_timestamp <= start_timestamp:
                continue

            # The call's last active millisecond is end_timestamp - 1, so that
            # instant decides the last UTC date the call touches.
            dates_active = get_dates_between(start_timestamp, end_timestamp - 1)
            events_by_date = customer_concurrent_data_by_date[customer_id]

            events_by_date[dates_active[0]].append((start_timestamp, 'start', call_id))
            for date in dates_active[1:]:
                # The call is already active at 00:00:00 UTC on each later date.
                # Subtracting the epoch keeps this in UTC; a naive .timestamp()
                # would use the machine's local timezone.
                midnight = datetime.strptime(date, '%Y-%m-%d') - datetime(1970, 1, 1)
                events_by_date[date].append(
                    (int(midnight.total_seconds() * 1000), 'start', call_id))
            # Keep the exclusive end as the event time: the sort below puts
            # 'end' before 'start' on ties, so back-to-back calls never overlap.
            events_by_date[dates_active[-1]].append((end_timestamp, 'end', call_id))

        # Process each customer call data
        result = {}

        for customer_id, date_events in customer_concurrent_data_by_date.items():
            result[customer_id] = {}

            for date, events in date_events.items():
                # Sort by timestamp; on ties, 'end' events (False) sort before 'start' events (True)
                events.sort(key=lambda x: (x[0], x[1] == 'start'))

                concurrent_calls = 0
                max_concurrent_calls = 0
                max_timestamp = None
                active_calls = set()
                call_ids_at_max_concurrency = []

                # Parse through the events to count concurrent calls for a date
                for timestamp, event_type, call_id in events:
                    if event_type == 'start':
                        concurrent_calls += 1
                        active_calls.add(call_id)
                        if concurrent_calls >= max_concurrent_calls:
                            max_concurrent_calls = concurrent_calls
                            max_timestamp = timestamp
                            call_ids_at_max_concurrency = list(active_calls)
                    else:
                        if call_id in active_calls:
                            concurrent_calls -= 1
                            active_calls.remove(call_id)

                result[customer_id][date] = {
                    'maxConcurrentCalls': max_concurrent_calls,
                    'callIds': call_ids_at_max_concurrency,
                    'timestamp': max_timestamp
                }

        return self.__process_results(result)

    def send_records(self, url):
        payload = self.get_peak_phone_calls()
        print(payload)  # For debugging, to see what will be sent
        response = requests.post(url, json=payload)
        print(f"Response Status Code: {response.status_code}")


if __name__ == '__main__':
    phone_records = PhoneRecords(
        'https://candidate.hubteam.com/candidateTest/v3/problem/dataset?userKey=69bf2d64913a583b62a92f53b50d'
    )

    phone_records.send_records(
        'https://candidate.hubteam.com/candidateTest/v3/problem/test-result?userKey=69bf2d64913a583b62a92f53b50d'
    )

Candidate's Approach

The candidate implemented a solution that retrieves phone call records from a provided URL, processes the records to determine the maximum number of concurrent calls for each customer on each date, and formats the results for submission. The approach involved:

Parsing the JSON response to extract call records.
Using a helper function to get all dates between the start and end timestamps of calls.
Organizing call events by customer and date, sorting them, and counting concurrent calls.
Preparing the final results in the required format for POST submission.
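The sorting and counting step can be sketched on its own as a sweep over start and end events. This is a minimal illustration, not the candidate's exact code: peak_concurrency is a hypothetical helper, its input is assumed to be the calls (or per-date pieces of calls) for one customer on one date, and reporting the earliest instant at which the peak is reached is an assumption, since the problem statement above does not say which instant to report.

```python
def peak_concurrency(calls):
    """Find the peak number of overlapping half-open intervals.

    `calls` is an iterable of (start_ms, end_ms, call_id), each covering
    [start_ms, end_ms). Returns (max_count, timestamp, sorted_call_ids).
    """
    events = []
    for start_ms, end_ms, call_id in calls:
        if end_ms <= start_ms:
            continue  # an empty interval is never active
        events.append((start_ms, 1, call_id))  # 1 marks a start
        events.append((end_ms, 0, call_id))    # 0 marks an end and sorts first on ties
    events.sort(key=lambda event: (event[0], event[1]))

    active = set()
    best = (0, None, [])
    for timestamp, is_start, call_id in events:
        if is_start:
            active.add(call_id)
            # Strict '>' keeps the earliest instant at which the peak occurs.
            if len(active) > best[0]:
                best = (len(active), timestamp, sorted(active))
        else:
            active.discard(call_id)
    return best


# 'a' ends at 10, exactly when 'c' starts, so they are never concurrent.
print(peak_concurrency([(0, 10, 'a'), (5, 15, 'b'), (10, 20, 'c')]))
# (2, 5, ['a', 'b'])
```

Processing ends before starts at the same timestamp is what makes the exclusive endTimestamp work: a call that ends at t has already left the active set when another call starts at t.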

The candidate faced challenges in ensuring the correct handling of timestamps and overlapping calls, which led to incorrect results.
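The write-up does not say which part of the timestamp handling was wrong. One pitfall that commonly affects this kind of solution is converting a date string back to epoch milliseconds: calling .timestamp() on a naive datetime interprets it in the machine's local timezone, so the computed "midnight" shifts on any machine not set to UTC. A minimal sketch of a UTC-safe conversion follows; utc_midnight_ms is a hypothetical helper name.

```python
from datetime import datetime, timezone


def utc_midnight_ms(date_str):
    """Return epoch milliseconds for 00:00:00 UTC on a 'YYYY-MM-DD' date."""
    # Attaching tzinfo makes the datetime aware, so .timestamp() no longer
    # depends on the local timezone of the machine running the code.
    day = datetime.strptime(date_str, '%Y-%m-%d').replace(tzinfo=timezone.utc)
    return int(day.timestamp() * 1000)


print(utc_midnight_ms('2024-02-07'))  # 1707264000000
```

The sample call's startTimestamp, 1707314726000, falls 50,726 seconds after this value, which is consistent with the date "2024-02-07" in the sample result.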

Interviewer's Feedback

No feedback provided.