Round 1
Questions:
- How do you reduce dimensionality when you have 4000 columns/features and one target variable (discrete or continuous)?
- Which machine learning model do you know best?
- What is non-stationarity? How can we detect and remove it?
- How do you find the values of AR(p), MA(q), and other parameters?
- Given two scenarios for Naive Bayes, when should you use GaussianNB and when should you use BernoulliNB?
- Discuss hyperparameter tuning methods, specifically GridSearchCV.
- Coding Challenge: You are given a comma-separated text file where the first row contains the column titles and the remaining rows contain numbers separated by commas. Compute the row-wise sum of the numbers without importing any standard library.
Coding Challenge Example:
```python
with open("data.txt") as file:
    data = file.readlines()

# Skip the header row (index 0), then sum the values in each data row
for i in range(1, len(data)):
    nums = data[i].split(",")
    total = 0
    for n in nums:
        total += int(n)  # int() tolerates surrounding whitespace and the trailing newline
    print(f"Total sum for row {i} is: {total}")
```
- Experience with GenAI and LLMs, including the authentication process for the OpenAI library and guiding LLMs with good prompting skills (a minimal authentication sketch follows this list).
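A minimal sketch of the authentication step, assuming the v1 openai Python client; the model name and prompt are illustrative, and the key is read from the OPENAI_API_KEY environment variable rather than hard-coded.

```python
import os
from openai import OpenAI  # assumes openai>=1.0

# The client also picks up OPENAI_API_KEY automatically if api_key is omitted
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```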
Candidate's Approach
- Explained dimensionality reduction techniques, including multicollinearity elimination, feature selection, and PCA (a PCA sketch follows this list).
- Discussed time series forecasting in depth, including non-stationarity detection and AR(p)/MA(q) order selection (a stationarity sketch follows this list).
- For Naive Bayes, correctly identified GaussianNB for numerical features and BernoulliNB for categorical features, though a mistake in terminology was noted (a comparison sketch follows this list).
- Explained hyperparameter tuning with GridSearchCV (a grid-search sketch follows this list).
- Successfully completed the coding challenge by reading the file and computing row-wise sums without importing any standard library.
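A minimal sketch of the PCA approach, assuming scikit-learn and a random matrix standing in for the 4000-feature dataset; standardizing first and keeping 95% of the variance are illustrative choices, not details from the interview.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4000))  # toy stand-in for the 4000-feature matrix

# Standardize so PCA is not dominated by large-variance columns
X_scaled = StandardScaler().fit_transform(X)

# Keep as many components as needed to explain 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)
print(X_reduced.shape)  # far fewer than 4000 columns
```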
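A sketch of the stationarity workflow, assuming statsmodels; the random-walk series, the 0.05 cutoff, and first differencing are illustrative assumptions. In practice, the lag where the PACF cuts off suggests the AR order p, and the ACF cutoff suggests the MA order q.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import acf, adfuller, pacf

rng = np.random.default_rng(0)
y = pd.Series(np.cumsum(rng.normal(size=300)))  # random walk: non-stationary

# Augmented Dickey-Fuller test: a p-value above 0.05 fails to reject a unit root
p_value = adfuller(y)[1]
print(f"ADF p-value: {p_value:.3f}")

if p_value > 0.05:
    y = y.diff().dropna()  # first differencing typically removes the stochastic trend

# Inspect PACF/ACF of the (differenced) series to pick the AR(p) and MA(q) orders
print(pacf(y, nlags=10))
print(acf(y, nlags=10))
```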
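A sketch contrasting the two Naive Bayes variants on synthetic data: GaussianNB assumes continuous features that are roughly normal within each class, while scikit-learn's BernoulliNB expects binary (0/1) features such as word presence/absence (CategoricalNB is the variant for general categorical data).

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)  # binary target

# GaussianNB: continuous numeric features
X_cont = rng.normal(size=(200, 5))
print(GaussianNB().fit(X_cont, y).score(X_cont, y))

# BernoulliNB: binary indicator features
X_bin = rng.integers(0, 2, size=(200, 5))
print(BernoulliNB().fit(X_bin, y).score(X_bin, y))
```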
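A minimal GridSearchCV sketch; the iris dataset, the SVC estimator, and the parameter grid are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Exhaustively evaluate every combination in the grid with 5-fold cross-validation
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```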
Interviewer's Feedback
- Advised to focus more on GenAI for the next interview round, as the position demands significant work in that area.