What is Computer Vision?

Are you reading right now? How? With your eyes, right? And what are your eyes doing? They're seeing! Every day, we rely on our vision to engage with the world around us, whether studying, playing a sport, or even browsing poorly constructed wiki pages on computer vision.

"Computer Vision" is a broad term encompassing how programmers enable robots to perceive the world through cameras, akin to human sight. The goal is to equip computers with the ability to understand their surroundings and make informed decisions. This field spans from facial recognition on smartphones to surveillance systems tracking vehicles. It delves deep into machine learning, intricate algorithms, and decades of research. However, on this page, we'll focus on a simpler technique called Colour Thresholding.

Colour Thresholding involves instructing a computer to identify a specific colour. Given a simple range of colour values, the computer can separate that colour from its surroundings, as demonstrated in the accompanying image. Using this information, together with details about the object and the camera, we can determine the object's position relative to the camera. While the process isn't as straightforward as it sounds, I've developed a Python skeleton solution to help you achieve this functionality. The code can run on any laptop with a camera or on a Raspberry Pi; the only adjustments needed are importing the relevant libraries and modifying certain lines in the CV_Main file. This wiki page also covers the fundamental concepts underlying these scripts, so that you understand how the process works. Feel free to experiment and deviate from the skeleton code; computer vision offers numerous approaches to achieving the same goal. Also, make sure to refresh your Python knowledge before getting started, using the wiki tutorial listed below.

Python - Introduction and basics
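
To make the idea of colour thresholding concrete, here is a minimal sketch in plain Python. It is not the skeleton code: the tiny image, the bounds `lower` and `upper`, and the helper names `threshold` and `centroid` are all made up for illustration. It marks every pixel whose channels fall inside the bounds, then averages the positions of the marked pixels to estimate where the object sits in the image.

```python
# A tiny 3x4 "image": each pixel is an (R, G, B) tuple with values 0-255.
# The pixel values are invented for this illustration.
RED = (200, 30, 40)
GREY = (120, 120, 120)
image = [
    [GREY, GREY, GREY, GREY],
    [GREY, RED, RED, GREY],
    [GREY, RED, RED, GREY],
]

# Hypothetical bounds for "red"; tune these for your own object and lighting.
lower = (150, 0, 0)
upper = (255, 80, 80)


def threshold(img, lo, hi):
    """Return a mask with 1 where every channel lies within [lo, hi], else 0."""
    return [
        [int(all(l <= c <= h for c, l, h in zip(pixel, lo, hi))) for pixel in row]
        for row in img
    ]


def centroid(mask):
    """Return the (x, y) centre of all matching pixels, or None if none match."""
    hits = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not hits:
        return None
    n = len(hits)
    return (sum(x for x, _ in hits) / n, sum(y for _, y in hits) / n)


mask = threshold(image, lower, upper)
for row in mask:
    print(row)
print("Object centre (x, y):", centroid(mask))
```

On real camera frames you would normally not loop over pixels yourself; OpenCV's `cv2.inRange` performs the same per-pixel comparison on a whole frame at once.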

<aside> 💡 If you decide to do this on your own computer, note that you will get much smoother results than on a Pi. This makes it a good place to complete the skeleton before transferring the code across.

</aside>


Setting Up (For Personal Device)

If you are programming this on your personal device (which is recommended), you will need to set up the coding environment. This is a fairly straightforward process, but if you are brand new to this, follow the steps below:

  1. Download Visual Studio Code using the link below. It is a popular IDE because it is easy to use while still offering flexibility through its many extensions. You may need to install a Python extension to run the scripts; VS Code will prompt you to install one when you first run them.

  2. Once set up, download the following files and open the folder using VS Code. Take a moment to familiarise yourself with the code structure and the various files.

    Robotics101_Computer_Vision_Template.zip

  3. Depending on your device, you may need to install some Python libraries using the commands below. First, we will set up a virtual environment so that all the packages stay contained within this project.

    python -m venv env
    
    # On macOS or Linux
    source env/bin/activate
    
    # On Windows
    .\env\Scripts\activate
    

    Following this, we can install the packages using the commands below. Not all of them may be necessary; however, if you get an error saying that a module is missing, simply run pip install followed by the name of that module.

    pip install opencv-python
    pip install numpy
    pip install imutils
    

    If pip does not work, it may not be installed properly. Use the commands below in the VS Code terminal to reinstall it.

    curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
    python get-pip.py
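
Once the installs have finished, a quick check like the one below can confirm that Python can actually find the libraries in the active environment. This is only a sketch: the helper name `missing_modules` is made up, and the import names in `REQUIRED` are assumptions based on the packages listed above (note that `opencv-python` is imported as `cv2`).

```python
import importlib.util

# Import names assumed for the packages installed above.
# opencv-python is imported under the name cv2.
REQUIRED = ["cv2", "numpy", "imutils"]


def missing_modules(names):
    """Return the names that Python cannot find in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

If the script reports a missing module, make sure the virtual environment is activated in your terminal before running pip install again.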
    

HSV for Beginners

When it comes to colour, the most common representation computers use is RGB (Red, Green, Blue). Each channel is given a numeric value between 0 and 255, representing how much of that colour is present in a pixel. We can represent this as a 3D shape, as shown in the accompanying figure. While this works well for outputting a specific colour, it doesn't work well for colour filtering, and I'll explain why.

Imagine, if you will, a red cube. It is very clearly red. What happens if you shine a light on this cube? It is still red, just brighter. What happens if you move the cube into shadow? It is still red, just slightly darker. Our eyes can perceive these changes and still distinguish red from not red. A computer, however, does not see colour; it sees a collection of three numbers that, if we use RGB, change quite drastically across these three scenarios. So how do we set a bound that stays consistent for one specific colour but remains flexible when it comes to lighting and shadows? It is doable with RGB, but it is challenging.
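
The red cube thought experiment can be tried out with Python's built-in `colorsys` module. The sketch below uses three made-up RGB readings for the same cube (the values in `samples` are assumptions, not measurements) and converts each to HSV (Hue, Saturation, Value). The helper `to_hsv_degrees` is a hypothetical name used only here.

```python
import colorsys

# Hypothetical RGB readings of the same red cube under three lighting conditions.
samples = {
    "normal": (200, 40, 40),
    "bright light": (255, 51, 51),
    "shadow": (100, 20, 20),
}


def to_hsv_degrees(rgb):
    """Convert 0-255 RGB to (hue in degrees, saturation %, value %)."""
    r, g, b = (c / 255 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return round(h * 360), round(s * 100), round(v * 100)


for name, rgb in samples.items():
    print(f"{name:>12}: RGB={rgb}  HSV={to_hsv_degrees(rgb)}")
```

For these samples all three RGB numbers change from one scenario to the next, while the hue and saturation stay the same and only the value (brightness) moves. Real camera readings will not be this clean, but it shows why a bound on hue is easier to keep consistent under changing light. Note that OpenCV stores 8-bit HSV on a different scale (hue from 0 to 179, saturation and value from 0 to 255), so these numbers would need rescaling before being used as OpenCV bounds.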