Tips for Interviewing
Many of the following tips apply to standard coding interviews as well, but they're worth reiterating with examples from the AI space. Skim through them now; throughout our practice interview questions, I'll reference these tips where appropriate.
Anticipate missing details
Start by asking questions about missing information. You cannot skip this step: the problem is often underspecified, and you're expected to recognize this.
Where possible, instead of asking, jump straight to declaring your assumptions for any crucial missing details. For example, say you're asked to code a simple neural network. To make communication more efficient, simply declare your assumptions: you'll assume there are 8 layers, the model's hidden dimension is 64, and so on. To be extra certain, you can ask the interviewer if these assumptions are acceptable for now.
- They may want to correct you, if they want to move the problem in a certain direction. For example, they may declare that the hidden dimension is egregiously large, or that it's insanely tiny. Either way, this could change the challenge in your interview question.
- Alternatively, they may simply accept your assumptions and note that you both anticipated what information was needed and provided reasonable suggestions for those missing details.
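Declaring your assumptions up front might look like the following sketch, assuming the 8-layer, hidden-dimension-64 network from the example above (the input and output sizes here are additional hypothetical assumptions, not from the prompt):

```python
import torch
import torch.nn as nn

# Declared assumptions: 8 layers, hidden dimension 64.
# INPUT_DIM and OUTPUT_DIM are hypothetical; confirm them with the interviewer.
NUM_LAYERS = 8
HIDDEN_DIM = 64
INPUT_DIM = 32
OUTPUT_DIM = 10

# Build an 8-layer MLP: input projection, 6 hidden layers, output projection.
layers = [nn.Linear(INPUT_DIM, HIDDEN_DIM), nn.ReLU()]
for _ in range(NUM_LAYERS - 2):
    layers += [nn.Linear(HIDDEN_DIM, HIDDEN_DIM), nn.ReLU()]
layers.append(nn.Linear(HIDDEN_DIM, OUTPUT_DIM))
model = nn.Sequential(*layers)

x = torch.randn(4, INPUT_DIM)  # a batch of 4 hypothetical inputs
y = model(x)                   # shape: (4, OUTPUT_DIM)
```

Writing the constants at the top makes each assumption visible and trivial to change if the interviewer corrects you.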
Once you have all requirements, then start building the solution.
Ask to simplify
Sometimes, you may be able to take shortcuts in an interview, if you understand the nuances of the tools available to you. For example, say you're asked to quantize a tensor from FP32 to BF16. Naturally, you can ask to use built-in utilities from other libraries: Can I use PyTorch's `.to` to accomplish this? Then, there are two possible responses.
- The interviewer says yes. In that case, your job is done fairly quickly, and you can solve the question in a single line.
  - For the example above, you would just write `tensor.to(torch.bfloat16)` and be done with it, assuming that `tensor.dtype == torch.float32`.
- Alternatively, the interviewer says no. Either explicitly or implicitly, the interviewer notes that you know these tools inside out, since you knew the easiest way to do this outside of an interview.
  - For the particular example above, everyone uses `.to` to convert from one floating-point format to another, or to convert from one integer format to another. However, it is generally not used to convert from floating point to integers or vice versa. If you were asked this in an interview, you would almost certainly be expected to use `.to`.

Either way this goes, you score a win.
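If the interviewer does ask you to implement the conversion by hand, it helps to know why it's cheap: BF16 is simply the top 16 bits of an FP32 value. A minimal NumPy sketch of this bit-level view (using truncation rather than the round-to-nearest behavior a production cast would use) might look like:

```python
import numpy as np

def fp32_to_bf16_bits(x: np.ndarray) -> np.ndarray:
    """Truncate FP32 to BF16 by keeping only the top 16 bits of each value."""
    bits = x.astype(np.float32).view(np.uint32)  # reinterpret bytes, no value change
    return (bits >> 16).astype(np.uint16)        # drop the low 16 mantissa bits

def bf16_bits_to_fp32(b: np.ndarray) -> np.ndarray:
    """Pad BF16 bits back to FP32 by appending 16 zero bits."""
    return (b.astype(np.uint32) << 16).view(np.float32)

x = np.array([1.0, -2.0, 0.5], dtype=np.float32)
roundtrip = bf16_bits_to_fp32(fp32_to_bf16_bits(x))
```

Values exactly representable in BF16 (like those above) survive the round trip unchanged; others lose their low mantissa bits.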
Explain your suggestion
Say you've made an assumption. Ideally, every assumption is followed by some kind of explanation.
- Explain why this suggestion is reasonable. Most often, you'd simply suggest a commonly-used value. This shows that you have relevant experience, given that you have an intuition for what value each hyper-parameter should take. For example, you may suggest 32 transformer layers for a Large Language Model; generally, any power of 2 sounds reasonable[^2]. See Math to memorize.
- Identify the impact of this suggestion on the problem, if any. The suggestion may just make the code more readable, or it may produce a tangibly more efficient implementation. For example, by using FP32, you make conversion to BF16 extremely cheap to run on hardware: to convert between these two floating-point formats, you simply truncate or pad the bit representation.
- If the suggestion simplifies the problem, say this explicitly, and explain why. Maybe you eliminated a complex step, and thus, this simplified version is a good starting point. For example, say you're implementing a convolution and you assume the kernel is always square. This is a fair assumption, given most kernels in the wild are square, and it allows you to simplify your convolution implementation.
This is a stretch goal, but to the best of your ability, include an explanation for your assumption.
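To make the convolution example above concrete, here is one way the square-kernel assumption simplifies the implementation: a single size variable covers both kernel dimensions. This is a hedged sketch of a naive "valid"-mode routine (computing cross-correlation, as is conventional in deep learning), not an optimized implementation:

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive 2D 'valid' convolution, assuming a single-channel square kernel."""
    k = kernel.shape[0]
    assert kernel.shape == (k, k), "simplifying assumption: kernel is square"
    h, w = image.shape
    out = np.empty((h - k + 1, w - k + 1), dtype=image.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Slide the k-by-k window and take the elementwise product-sum.
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

img = np.arange(9, dtype=np.float32).reshape(3, 3)
result = conv2d_valid(img, np.ones((2, 2), dtype=np.float32))
```

With a rectangular kernel you would instead track two sizes and two output extents; stating the square-kernel assumption out loud earns you this simplification explicitly.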