Round 1
Questions:
- How do you reduce dimensionality when you have 4000 columns/features and one target variable (discrete or continuous)?
- Which machine learning model do you know best?
- What is non-stationarity? How can we detect and remove it?
- How do you find the values of AR(p), MA(q), and other parameters?
- Given two scenarios for Naive Bayes, when should you use GaussianNB and when should you use BernoulliNB?
- Discuss hyperparameter tuning methods, specifically GridSearchCV.
- Coding Challenge: You are given a comma-separated text file where the first row contains the column titles and the remaining rows contain numbers separated by commas. Compute the row-wise sum of the numbers without importing any standard library.
Coding Challenge Example:
```python
with open("data.txt") as file:
    data = file.readlines()

# Skip the header row (index 0), then sum the values in each data row
for i in range(1, len(data)):
    nums = data[i].split(",")
    total = 0
    for n in nums:
        total += int(n)  # int() tolerates surrounding whitespace and the trailing newline
    print(f"Total sum for row {i} is: {total}")
```
- Experience with GenAI and LLMs, including the authentication process for the OpenAI library and guiding LLMs with good prompting skills (a minimal authentication sketch follows this list).
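A minimal sketch of the authentication step, assuming the v1 openai Python client; the model name and prompt are illustrative, and the key is read from the OPENAI_API_KEY environment variable rather than hard-coded.

```python
import os
from openai import OpenAI  # assumes openai>=1.0

# The client also picks up OPENAI_API_KEY automatically if api_key is omitted
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```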
Candidate's Approach
- Explained dimensionality reduction techniques, including multicollinearity elimination, feature selection, and PCA (a PCA sketch follows this list).
- Discussed time series forecasting in depth, including non-stationarity detection and AR(p)/MA(q) order selection (a stationarity sketch follows this list).
- For Naive Bayes, correctly identified GaussianNB for numerical features and BernoulliNB for categorical features, though a mistake in terminology was noted (a comparison sketch follows this list).
- Explained hyperparameter tuning with GridSearchCV (a grid-search sketch follows this list).
- Successfully completed the coding challenge by reading the file and computing row-wise sums without importing any standard library.
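A minimal sketch of the PCA approach, assuming scikit-learn and a random matrix standing in for the 4000-feature dataset; standardizing first and keeping 95% of the variance are illustrative choices, not details from the interview.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4000))  # toy stand-in for the 4000-feature matrix

# Standardize so PCA is not dominated by large-variance columns
X_scaled = StandardScaler().fit_transform(X)

# Keep as many components as needed to explain 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)
print(X_reduced.shape)  # far fewer than 4000 columns
```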
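A sketch of the stationarity workflow, assuming statsmodels; the random-walk series, the 0.05 cutoff, and first differencing are illustrative assumptions. In practice, the lag where the PACF cuts off suggests the AR order p, and the ACF cutoff suggests the MA order q.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import acf, adfuller, pacf

rng = np.random.default_rng(0)
y = pd.Series(np.cumsum(rng.normal(size=300)))  # random walk: non-stationary

# Augmented Dickey-Fuller test: a p-value above 0.05 fails to reject a unit root
p_value = adfuller(y)[1]
print(f"ADF p-value: {p_value:.3f}")

if p_value > 0.05:
    y = y.diff().dropna()  # first differencing typically removes the stochastic trend

# Inspect PACF/ACF of the (differenced) series to pick the AR(p) and MA(q) orders
print(pacf(y, nlags=10))
print(acf(y, nlags=10))
```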
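A sketch contrasting the two Naive Bayes variants on synthetic data: GaussianNB assumes continuous features that are roughly normal within each class, while scikit-learn's BernoulliNB expects binary (0/1) features such as word presence/absence (CategoricalNB is the variant for general categorical data).

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)  # binary target

# GaussianNB: continuous numeric features
X_cont = rng.normal(size=(200, 5))
print(GaussianNB().fit(X_cont, y).score(X_cont, y))

# BernoulliNB: binary indicator features
X_bin = rng.integers(0, 2, size=(200, 5))
print(BernoulliNB().fit(X_bin, y).score(X_bin, y))
```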
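A minimal GridSearchCV sketch; the iris dataset, the SVC estimator, and the parameter grid are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Exhaustively evaluate every combination in the grid with 5-fold cross-validation
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```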
Interviewer's Feedback
- Advised to focus more on GenAI for the next interview round, as the position demands significant work in that area.