Round 1
Question: Design a distributed job scheduler with the following requirements:
Functional Requirements:
- Create and schedule a job. Supported job types: i) one-time job, ii) recurring job, iii) priority-based job.
- Check the status of a job.
Non-Functional Requirements:
- Each job should be executed at least once.
- No job should be lost.
- The scheduler should be highly available.
- A job should start executing within 5 seconds of its scheduled start time.
- The system should support scheduling and executing 1M jobs per day.
- A failed job should be retried within 15-20 minutes if the failure was caused by a system crash or a retryable error.
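To get a feel for the 1M jobs/day requirement, a quick back-of-the-envelope calculation helps. The sketch below converts the daily volume into an average rate; the peak factor of 10 is purely an assumption for illustration and was not part of the interview.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_jobs_per_second(jobs_per_day: int) -> float:
    """Average execution rate if jobs were spread evenly over the day."""
    return jobs_per_day / SECONDS_PER_DAY


def peak_jobs_per_second(jobs_per_day: int, peak_factor: float) -> float:
    """Rate at peak, assuming traffic bursts to peak_factor times the average."""
    return average_jobs_per_second(jobs_per_day) * peak_factor


avg = average_jobs_per_second(1_000_000)
peak = peak_jobs_per_second(1_000_000, peak_factor=10)  # assumed burst factor
print(f"average: {avg:.1f} jobs/s")       # average: 11.6 jobs/s
print(f"assumed peak: {peak:.1f} jobs/s")  # assumed peak: 115.7 jobs/s
```

The average load is modest (roughly 12 jobs per second), so the harder constraints are likely the 5-second start latency and the no-job-lost guarantee rather than raw throughput.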
Core Entities:
- Job
  * job_id
  * creation_timestamp
  * created_by
  * schedule_cron_syntax
  * status: [not_enqueued, enqueued, running, failed]
  * isRecurring
  * priority ? [Critical]
  * blob_url
  * execution_start_time
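The Job entity could be modeled roughly as follows. This is a minimal sketch, not the candidate's actual schema: the field types, the defaults, and treating priority as an optional string are all assumptions, since the write-up only lists attribute names.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class JobStatus(Enum):
    # Status values taken from the entity list above.
    NOT_ENQUEUED = "not_enqueued"
    ENQUEUED = "enqueued"
    RUNNING = "running"
    FAILED = "failed"


@dataclass
class Job:
    created_by: str
    blob_url: str                                # location of the job payload/code (assumed meaning)
    execution_start_time: datetime               # when the job should first run
    schedule_cron_syntax: Optional[str] = None   # only meaningful for recurring jobs (assumption)
    is_recurring: bool = False                   # "isRecurring" in the write-up
    priority: Optional[str] = None               # e.g. "Critical"; the type is an assumption
    job_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    creation_timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    status: JobStatus = JobStatus.NOT_ENQUEUED


job = Job(
    created_by="alice",
    blob_url="https://example.com/jobs/report.tar",  # placeholder URL
    execution_start_time=datetime(2030, 1, 1, tzinfo=timezone.utc),
)
print(job.status.value)  # not_enqueued
```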
APIs:
a) Job Creation:
POST: /schedule/job -> job_id
payload:
b) Check Status:
GET: /check/job_status/{job_id} -> status of the job
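The behavior behind the two endpoints can be sketched with an in-memory service. Everything here is hypothetical: the class name, the method names, and the payload shape are illustrative only (the write-up does not specify the payload), and a real system would persist jobs in a durable store rather than a dictionary.

```python
import uuid
from typing import Dict, Optional


class SchedulerService:
    """In-memory stand-in for the handlers behind the two endpoints."""

    def __init__(self) -> None:
        self._jobs: Dict[str, dict] = {}

    def create_job(self, payload: dict) -> str:
        """Backs POST /schedule/job: stores the job and returns its job_id."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = {"payload": payload, "status": "not_enqueued"}
        return job_id

    def check_job_status(self, job_id: str) -> Optional[str]:
        """Backs GET /check/job_status/{job_id}: returns None for unknown ids."""
        job = self._jobs.get(job_id)
        return job["status"] if job else None


service = SchedulerService()
new_id = service.create_job({"blob_url": "https://example.com/job.tar"})  # hypothetical payload
print(service.check_job_status(new_id))  # not_enqueued
```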
Candidate's Approach
The candidate proposed a design built around a Job entity with the attributes listed above and two APIs, one for job creation and one for status checking. The design emphasized reliability and availability: every job is executed at least once, and a job is retried if it fails.
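One common way to get at-least-once execution with automatic retries is a lease (visibility timeout) on claimed jobs. The write-up does not say which mechanism the candidate used, so the sketch below is only an illustration of that technique. The class and method names are hypothetical, a lower priority number is assumed to mean higher priority, and time is passed in explicitly to keep the example deterministic.

```python
import heapq
from typing import List, Optional, Set, Tuple


class LeasedJobQueue:
    """A claimed job becomes claimable again once its lease expires without an ack."""

    def __init__(self, lease_seconds: float) -> None:
        self._lease_seconds = lease_seconds
        # Heap entries: (visible_at, priority, job_id); earliest visible first.
        self._heap: List[Tuple[float, int, str]] = []
        self._done: Set[str] = set()

    def enqueue(self, job_id: str, run_at: float, priority: int = 0) -> None:
        heapq.heappush(self._heap, (run_at, priority, job_id))

    def claim(self, now: float) -> Optional[str]:
        """Return a due job, or None. The job stays queued under a lease."""
        while self._heap and self._heap[0][0] <= now:
            _, priority, job_id = heapq.heappop(self._heap)
            if job_id in self._done:
                continue  # completed earlier; drop the stale lease entry
            # Re-insert with the lease expiry so a crashed worker's job reappears.
            heapq.heappush(self._heap, (now + self._lease_seconds, priority, job_id))
            return job_id
        return None

    def ack(self, job_id: str) -> None:
        """Mark the job as successfully completed."""
        self._done.add(job_id)


queue = LeasedJobQueue(lease_seconds=15 * 60)  # retry after 15 minutes
queue.enqueue("job-a", run_at=0)
first = queue.claim(now=1)        # worker 1 claims the job, then crashes
hidden = queue.claim(now=2)       # lease still active, nothing to claim
retried = queue.claim(now=901)    # lease expired, job is handed out again
queue.ack("job-a")
after_ack = queue.claim(now=5000)
print(first, hidden, retried, after_ack)  # job-a None job-a None
```

Because a job can be handed out more than once, job handlers need to be idempotent under this scheme.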
Interviewer's Feedback
The interviewer appreciated the clarity of the functional and non-functional requirements and suggested also covering load balancing and scaling strategies for handling high job volumes.