Ace your Computer Vision job interview

Practical tips on how to approach your prep, including understanding the role, useful links, and sample questions

Welcome to my first post on MUJINspire! I’m Tim, and my role is to help people get jobs at Mujin. I’d like to share my insights on how to approach any computer vision interview, including interviews for our open positions. If you’re considering working as a computer vision engineer, you’ll find plenty of good resources and ideas below.

While this blog describes a variety of skills and aspects of the interview itself, it doesn’t cover basic computer science knowledge, or even the fundamentals of computer vision; you can find useful links below to review those. Also, this guide treats computer vision as helping computers understand a real 3D scene, not just image processing.

What job are you interviewing to get?

Let’s get started with our first and most important tip: make sure you understand the job. You need to know the organization, its funding, the team structure, the culture, and your potential teammates, and you need to be prepared to prove you have the skills to do the job. Computer vision requires foundational skills, usually acquired by working with these concepts and tools for more than a few months:

  • Mathematics knowledge, since computer vision reasons about the physical world
  • Programming skills, data structures, and other fundamentals of computer science
  • Learning skills, because new challenges always demand new skills

For computer vision-specific skills, prepare for the tasks at hand, because the techniques you’ll use vary with the application. Here are some common application areas for computer vision:

Self-driving cars, Automated Guided Vehicles (AGVs), and other mobile robots on wheels

These systems often use computer vision for obstacle and people avoidance. When designing an algorithm for detecting people, you can assume the sensors and cameras are aligned with gravity, which opens up more possibilities for the detection algorithm. For example, you won’t need to handle every possible rotation of people walking upright through streets or your warehouse.

Drones, satellites, or other things that can utilize an image taken from a tall height

Objects in these images appear to lie on a flat plane, which makes localization and navigation simpler. Images acquired by the cameras will typically need to be processed quickly and at low resolution. You can narrow down the types of algorithms you’ll use based on these assumptions.

Picking robots, or most robots that manipulate objects

From the viewpoint of the sensors, the objects to be manipulated can sit in any orientation relative to gravity. Inter-reflection is another challenge: shiny objects pick up colors, highlights, and reflections from nearby objects. If the environment is cluttered, you’ll also need to handle occlusions. All of these complicate segmenting scenes and estimating object poses.

Augmented Reality (AR)

AR blends virtual content with the real world, using sensors and cameras to adapt views and scenes. For example, if you’re playing a game on a table, you need sensors (cameras) to detect the environment and then stitch virtual animations into the experience. Most AR jobs require complex computer vision work: object detection, SLAM, and target pose estimation are the most common tasks in building AR applications. Reconstructing scenes is mapping (as in SLAM), and scanning geometry and appearance is how we represent objects.

As these examples show, your computer vision experience will be valued differently depending on the nature of the system and the tasks to be solved. Our first suggestion, therefore, is to tailor your interview strategy to the job you pursue. For example, if you’re interviewing for a role developing ADAS systems, it’s a good idea to brush up on the algorithms and methods used specifically in that area. Similarly, if you’re going to be building a robotic system that picks up varied objects, be sure to study the ins and outs of pose estimation, including rotations, low-light conditions, and other practical considerations. Ultimately, you’ll have a higher chance of success if you understand the job as thoroughly as possible and are prepared to speak about your relevant skills and experience.

How to prepare for the technical interview

Here at Mujin, as at most companies, the skills-focused interview comes early in the selection process. Most interviewers want to probe your applied knowledge of the basics. Our skill check, much like others in the industry, includes a programming test and questions about algorithms and data structures. We also test your understanding of computer vision fundamentals. You’ll need to demonstrate that you know what to achieve and how to achieve it, and that you have the programming skills to implement a solution successfully. As an engineer, it’s important to make things work. Typical problems we solve here are:

  • 3D pose estimation
  • Object detection & classification
  • SLAM
  • Calibration
  • Writing drivers for sensor data capture, streaming, and reconstruction
  • Object and scene reconstruction
  • Evaluating sensor capabilities in real environments

Programming — what to expect

Generally, computer vision roles involve C++ and Python, so be prepared to discuss your experience with both in the interview. The combination is popular because the two languages can call each other in production: prototypes come together quickly in an R&D environment and can later be optimized for performance. You should also be ready to show your portfolio or samples of code you’ve written.

Computer vision-related interview questions

As you know, computer vision is a complex subject within computer science because it draws on interrelated topics from math, physics, and electronics. A few well-chosen questions can probe deeply into a candidate’s knowledge of these subjects.

A few key questions that you may encounter:

Q: How do you project a 3D point to an image?

There is a trick to this question: you’ll need to ask the follow-up question, “In what coordinate frame is the point represented?” You’ll want to know whether that is the camera frame, or whether the point must first be transformed before being projected into the image frame. To transform the point into another frame, you need to know the rigid transformation. To project the point into the image, you need the camera intrinsics, such as the camera matrix and lens distortion.
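To make the answer concrete, here is a minimal numpy sketch (my own illustration, not an official interview solution) of the full chain: a rigid transform takes the point from the world frame into the camera frame, and the intrinsic matrix K projects it to pixel coordinates, ignoring lens distortion.

```python
import numpy as np

def project_point(p_world, R, t, K):
    """Project a 3D world point to pixel coordinates.

    R, t: rigid transformation from the world frame to the camera frame.
    K: 3x3 camera intrinsic matrix (no lens distortion modeled here).
    """
    p_cam = R @ p_world + t          # world frame -> camera frame
    u, v, w = K @ p_cam              # perspective projection (homogeneous)
    return np.array([u / w, v / w])  # normalize by depth

K = np.array([[800.0,   0.0, 320.0],   # fx, skew, cx
              [  0.0, 800.0, 240.0],   # fy, cy
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                          # camera axes aligned with the world
t = np.array([0.0, 0.0, 0.0])

print(project_point(np.array([0.1, -0.05, 2.0]), R, t, K))
```

Note how a point already expressed in the camera frame would skip the first line entirely; that is exactly the ambiguity the follow-up question resolves.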

Q: How do you make 3D measurements using 2D cameras/sensors?

Here you will need to demonstrate an understanding of epipolar geometry and the essential matrix, in addition to the relationship between 2D and 3D points. Once you know the basics that allow you to make measurements in a scene, you can start to use the sensor as a measurement device.
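As one illustration of that 2D-to-3D relationship, the sketch below recovers a 3D point from its projections in two calibrated views using standard linear (DLT) triangulation. This is a generic textbook construction with made-up camera parameters, not any particular production pipeline.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of a 3D point from two pixel
    observations x1, x2 given 3x4 projection matrices P1, P2."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)      # null-space vector = homogeneous point
    X = Vt[-1]
    return X[:3] / X[3]              # dehomogenize

K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
# Two cameras: identity pose, and a 0.2 m baseline along the x-axis.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

X_true = np.array([0.3, -0.1, 2.5])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
print(triangulate(P1, P2, x1, x2))   # recovers X_true
```

With noisy real-world correspondences, the DLT answer is usually refined by minimizing reprojection error, but the geometry above is the core idea behind stereo measurement.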

You can prepare for these types of questions by refreshing your knowledge of camera models and calibration, epipolar geometry, PnP-based pose estimation, and homographies/transformations.

Q: What is the object, and what are its position and orientation relative to a reference coordinate frame?

Advanced computer vision roles go beyond detecting bounding boxes. Since 3D pose estimation concerns the 3D translation and rotation of objects, it’s important to demonstrate that you can generate a rotation matrix. Geometry and appearance can define an object’s origin, and for any advanced role you will be expected to show your understanding of these techniques as well. A typical pipeline combines feature extraction, feature matching, and homography-based pose estimation.
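Generating a rotation matrix from an axis-angle representation via Rodrigues’ formula is a common warm-up for exactly this topic. Here is a generic sketch (not a Mujin test question); a good sanity check in an interview is to verify that the result is orthonormal with determinant 1.

```python
import numpy as np

def axis_angle_to_matrix(axis, theta):
    """Rotation matrix from a rotation axis and angle (Rodrigues' formula)."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)     # formula requires a unit axis
    kx, ky, kz = axis
    K = np.array([[0.0, -kz,  ky],
                  [ kz, 0.0, -kx],
                  [-ky,  kx, 0.0]])        # skew-symmetric cross-product matrix
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

R = axis_angle_to_matrix([0, 0, 1], np.pi / 2)   # 90 degrees about z
print(np.round(R @ np.array([1.0, 0.0, 0.0]), 6))  # x-axis rotates onto y-axis
```

Being able to explain why R.T @ R equals the identity, and how the same rotation could instead be represented as a quaternion or Euler angles, tends to come up in the same conversation.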

Recommended resources for math review

In any computer vision interview you will likely need to demonstrate your applied knowledge of math such as:

  • Linear Algebra
  • Calculus
  • Numerical optimization
  • Probability
  • Geometry

For mathematical optimization, Professors Stephen Boyd and Lieven Vandenberghe wrote a very useful book titled Convex Optimization, which is freely available on the web.
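As a tiny taste of how that linear algebra and optimization show up in practice (a generic warm-up example of my own, not an interview question), calibration and pose refinement ultimately reduce to least-squares problems like this one-line numpy fit:

```python
import numpy as np

# Fit y = a*x + b to sample points by ordinary least squares.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0                        # points exactly on the line y = 2x + 1
A = np.column_stack([x, np.ones_like(x)])  # design matrix [x | 1]
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(a, b)                              # recovers slope 2 and intercept 1
```

Real vision problems are larger and often nonlinear, but being fluent with this small building block makes topics like bundle adjustment much easier to discuss.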

Recommended resources for computer vision review

There are many computer vision materials online. One course that stands out is taught by Srinivasa Narasimhan at Carnegie Mellon University and is called “Computer Vision.” It provides a comprehensive introduction to image processing, the physics of image formation, the geometry of computer vision, and methods for detection and classification.

Other useful resources:
3D Computer Vision from Guido Gerig at the University of Utah
Computer Vision courses at the University of Florida 
Lecture series on 3D sensors from Radu Horaud at INRIA 

Our final piece of advice: if you know the topics above, but not well, take the time to study them. Computer vision is an exciting area, and many technologies will rely on it heavily. Thanks for reading, and we wish you the best on your journey to building an exciting automated future.

Any feedback? Please feel free to reach out to

Stay Connected to Mujin