- Think of computer vision as teaching a computer to "see" and understand the world like a human does.
- It's a field of AI that enables computers to interpret and understand visual information from the world, such as images and videos.
- One of the most common applications of computer vision is facial recognition. Let's break down how it works:
- Image Capture: A camera takes a picture of a person's face.
- Feature Extraction: The computer analyzes the image, looking for specific features like the distance between the eyes, the shape of the nose, and the contour of the lips.
- Comparison: The extracted features are compared to a database of known faces.
- Identification: If a match is found, the computer identifies the person in the image.
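The comparison and identification steps above can be sketched in a few lines. This is a minimal illustration, assuming the features have already been extracted and normalised into short numeric vectors; the names `KNOWN_FACES`, `identify` and the threshold value are hypothetical, and real systems use much richer features.

```python
import math

# Hypothetical, already-extracted feature vectors
# (e.g. eye distance, nose width, lip contour length), normalised to 0..1.
KNOWN_FACES = {
    "alice": [0.42, 0.31, 0.77],
    "bob": [0.55, 0.29, 0.61],
}

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(features, database, threshold=0.1):
    """Return the closest known name, or None if nothing is within the threshold."""
    best_name, best_dist = None, float("inf")
    for name, known in database.items():
        dist = euclidean(features, known)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None

print(identify([0.43, 0.30, 0.76], KNOWN_FACES))  # alice
print(identify([0.90, 0.90, 0.10], KNOWN_FACES))  # None
```

The threshold decides how strict the match is: a smaller value means fewer false matches but more faces reported as unknown.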
- Security: Facial recognition is used in airports and other secure locations for identification and access control.
- Social Media: Apps like Facebook use facial recognition to suggest tagging friends in photos.
- Self-driving Cars: Cars use computer vision to detect pedestrians, other vehicles, and traffic signs.
- Medical Imaging: Doctors use computer vision to analyze X-rays, MRIs, and other medical images to detect abnormalities.
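- On some benchmarks, detecting objects in images and labelling them now matches or surpasses human performance: faster than human reaction, with reported accuracy around 99%.
- This is made possible by the large amount of visual data available (images and videos) combined with high processing capabilities.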
- A computer stores everything as 0s and 1s. A pixel is the basic unit of an image.
- Multiple pixels arranged in a matrix (rows and columns) form an image.
- Each pixel of a colour image has 3 channels of 8 bits each. In BGR order: B[255,0,0], G[0,255,0], R[0,0,255].
- The usual channel order is RGB, but OpenCV reads image channels in BGR order.
- A number between 0 and 255 represents the intensity of each colour channel.
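To make the pixel matrix and the channel order concrete, here is a small sketch that uses plain nested lists instead of a real image. The function name `bgr_to_rgb` is made up for this example; with OpenCV and NumPy the same data would be an array of shape (height, width, 3).

```python
# A 2x2 colour image as rows of pixels; each pixel is [B, G, R], 0-255 per channel.
image_bgr = [
    [[255, 0, 0], [0, 255, 0]],      # blue, green
    [[0, 0, 255], [255, 255, 255]],  # red, white
]

def bgr_to_rgb(image):
    """Reverse the channel order of every pixel."""
    return [[pixel[::-1] for pixel in row] for row in image]

image_rgb = bgr_to_rgb(image_bgr)
height = len(image_bgr)
width = len(image_bgr[0])
channels = len(image_bgr[0][0])

print(height, width, channels)  # 2 2 3
print(image_rgb[0][0])          # [0, 0, 255] -> pure blue in RGB order
```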
- Binary: 0 or 1 | 1 bit
- Black and white: Single channel | 8 bits
- Grayscale: Single channel | 8 bits
- RGB colour image: 3 channels | 24 bits (8 bits each channel)
- Black-and-white and grayscale images are different.
- A black-and-white image is composed only of pure black (000) or pure white (fff) pixels.
- A grayscale image combines different shades between black and white.
- A binary image needs 1 bit per pixel for storage, while a black-and-white image needs 8 bits per pixel.
- Because it uses 8 bits per pixel, a black-and-white image is denser than a binary image and should render with better quality.
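The storage figures above can be checked with simple arithmetic. The sketch below assumes uncompressed pixel data with no file header; `storage_bytes`, `to_black_and_white` and `to_binary` are illustrative names, and the threshold of 128 is an arbitrary choice.

```python
# Bits per pixel for each image type listed above.
BITS_PER_PIXEL = {"binary": 1, "black_and_white": 8, "grayscale": 8, "rgb": 24}

def storage_bytes(width, height, image_type):
    """Uncompressed size in bytes (no header, no row padding)."""
    bits = width * height * BITS_PER_PIXEL[image_type]
    return (bits + 7) // 8  # round up to whole bytes

def to_black_and_white(gray, threshold=128):
    """Map every grayscale value to pure black (0) or pure white (255)."""
    return [[255 if v >= threshold else 0 for v in row] for row in gray]

def to_binary(gray, threshold=128):
    """Map every grayscale value to a single bit (0 or 1)."""
    return [[1 if v >= threshold else 0 for v in row] for row in gray]

gray = [[0, 100, 200], [50, 128, 255]]
print(to_black_and_white(gray))  # [[0, 0, 255], [0, 255, 255]]
print(to_binary(gray))           # [[0, 0, 1], [0, 1, 1]]
for kind in BITS_PER_PIXEL:
    print(kind, storage_bytes(640, 480, kind))
```

For a 640x480 image this gives 38,400 bytes for binary, 307,200 bytes for black-and-white or grayscale, and 921,600 bytes for RGB.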
- Techniques for capturing images using various devices like cameras, scanners, and medical imaging equipment.
- Image Filtering: Applying filters to enhance or reduce the noise in images.
- Image Transformation: Manipulating images through geometric transformations.
- Image Extraction: Extract information from the image (Detailed image analysis)
- Image Enhancement: Techniques to improve the image quality and appearance.
- Colour, clarity, size, noise reduction, contrast, rotation, sharpness, orientation, and scale.
- Prepare an image as input for a specific task. Images are converted to and from NumPy arrays.
- Edge Detection: Identifying boundaries between regions with different intensity levels.
- Corner Detection: Locating points of significant intensity change in an image.
- Texture Analysis: Analyzing the spatial arrangement of patterns in an image.
- Color Analysis: Extracting information from the colour content of an image.
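As an illustration of edge detection, the sketch below applies a horizontal Sobel kernel to a tiny grayscale image. Sobel is one common choice picked for this example, not something the notes prescribe; the function name `sobel_x` is illustrative, and border pixels are simply left at 0.

```python
# Horizontal Sobel kernel: responds to left-to-right intensity changes.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

def sobel_x(gray):
    """Return the horizontal gradient of a grayscale image (borders stay 0)."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(
                SOBEL_X[j][i] * gray[y + j - 1][x + i - 1]
                for j in range(3)
                for i in range(3)
            )
    return out

# Dark region on the left, bright region on the right.
gray = [[0, 0, 0, 255, 255] for _ in range(4)]
edges = sobel_x(gray)
print(edges[1])  # [0, 0, 1020, 1020, 0]
```

Large values appear only where the intensity changes, which is exactly the boundary between the two regions.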
- Image-based automatic inspection (Airports and malls entry gate)
- Object + Pattern Recognition.
- Electronic Component Analysis (Industrial Applications)
- The capability of a computer to perceive its environment using multiple digital cameras and sensors placed at multiple locations.
- Cameras with advanced capturing capabilities that cover other electromagnetic wavelengths, such as infrared, ultraviolet or X-ray.
- Object Detection: Identifying the presence and location of objects in an image.
- Object Recognition: Classifying and categorizing detected objects.
- Object Tracking: Following the movement of objects over time.
- Partitioning: Dividing an image into meaningful regions or segments.
- Semantic Segmentation: Assigning semantic labels to each pixel in an image.
- Instance Segmentation: Identifying and segmenting individual instances of objects in an image.
- Autonomous Self-driving Cars 🚗🚙 (Detect objects and humans in front of car, reverse parking)
- Facial Recognition: Identifying individuals based on their facial features 👦🏻🧒🏻 (Attendance, FB friend recognition)
- AR: Augmented Reality and VR: Virtual Reality
- Medical Image Analysis or Health Care: Detecting diseases and anomalies in medical images (X-ray and MRI scans) 👨🏻⚕️💉🩺
- Video Motion Analysis: (Detect faces and objects in videos)
- Security Surveillance: Monitoring and analyzing video feeds for security purposes.
- Image Segmentation (Camera detects the multiple faces in a group selfie)
- Scene Reconstruction (3D model creation in architecture)
- Image Restoration (Deblurring images and removing noise)
- Google, Microsoft, Pinterest, etc.: Search by image (Image.Google) 🖼
- Robotics 🤖: Guiding robots to perform tasks in the real world.
- Converting an image from one colour space to another.
- e.g. BGR to HSV or BGR to grayscale.
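A BGR-to-grayscale conversion can be sketched without any library. The weights below are the standard luma weights (0.299 R + 0.587 G + 0.114 B) that OpenCV documents for its RGB-to-gray conversion; the function name `bgr_to_gray` and the use of plain lists are choices made for this example.

```python
def bgr_to_gray(image):
    """Convert rows of [B, G, R] pixels to single 0-255 grayscale values."""
    return [
        [round(0.114 * b + 0.587 * g + 0.299 * r) for b, g, r in row]
        for row in image
    ]

image_bgr = [
    [[255, 0, 0], [0, 255, 0]],      # blue, green
    [[0, 0, 255], [255, 255, 255]],  # red, white
]
print(bgr_to_gray(image_bgr))  # [[29, 150], [76, 255]]
```

Green contributes most to perceived brightness, so pure green ends up much lighter than pure blue.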
| Dilation | Erosion |
|---|---|
| `cv2.dilate()` | `cv2.erode()` |
| Turn black pixels into white pixels | Turn white pixels into black pixels |
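To show what the two operations do, here is a library-free sketch with a fixed 3x3 square neighbourhood on a 0/255 image. It is a simplified stand-in for `cv2.dilate()` and `cv2.erode()`, not their implementation; neighbours outside the image are simply ignored.

```python
def _neighbourhood(image, y, x):
    """Values in the 3x3 window around (y, x), clipped to the image."""
    h, w = len(image), len(image[0])
    return [
        image[j][i]
        for j in range(max(0, y - 1), min(h, y + 2))
        for i in range(max(0, x - 1), min(w, x + 2))
    ]

def dilate(image):
    """A pixel becomes white if any neighbour is white (white regions grow)."""
    return [[max(_neighbourhood(image, y, x)) for x in range(len(image[0]))]
            for y in range(len(image))]

def erode(image):
    """A pixel stays white only if all neighbours are white (white regions shrink)."""
    return [[min(_neighbourhood(image, y, x)) for x in range(len(image[0]))]
            for y in range(len(image))]

# 5x5 black image with a single white pixel in the centre.
image = [[0] * 5 for _ in range(5)]
image[2][2] = 255

grown = dilate(image)
print(sum(v == 255 for row in grown for v in row))  # 9 white pixels
print(erode(grown) == image)                        # True
```

Eroding the dilated image returns the original single pixel here, while eroding the original directly would remove it entirely.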
- Retail
- In-store cameras help to understand customer behaviour, gender and age.
- Helps shops decide where to place products (item placement) based on customer shopping behaviour.
- Anti-theft: Face recognition and detection of suspicious behaviour (e.g. hiding a product in a bag).
- Inventory Management: Accurately estimate the items available in the store and analyze increases in demand.
- Manufacturing
- Maintenance with the help of robots, sensors and cameras. Packing, unpacking and assembly of cars.
- Healthcare
- Medical image analysis (MRI, CT scan, X-ray) to detect and diagnose anomalies, cancer and tumours.
- Autonomous (Self-driving car)
- Object detection in image and video. Distinguish traffic lights, vehicles, persons etc.
- Insurance
- Accident and collision avoidance, integrated into industrial machinery, cars and drones.
- Image analysis to estimate repair costs.
- Agriculture
- Monitoring grain production, detecting pests, and early diagnosis of problems help farmers take appropriate measures.
- Robots monitor the entire farm, detect where water is needed, spray herbicides and help reduce production costs.
- Sorting fruits, vegetables and even flowers.
- Identify shape, size, colour, weight and texture.
- Defense and Security
- High-security requirements in retail, companies, banks, malls, shopping centres, etc.
- Cargo inspection on ports, surveillance of sensitive places such as embassies, power plants, hospitals, railroads and stadiums.
- Spy on enemy terrain, automatic identification of anomalous behaviour, search and rescue operations.
- Image Classification
- Tourist attractions based on location image.
- Type of vehicle (Mini, Hatchback, Sedan, SUV, MPV)
- Localization
- Find the location of a single object in an image.
- Bounding box around the image, automatic cropping.
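Localization and automatic cropping can be sketched on a tiny mask. This example assumes the object has already been separated from the background as non-zero pixels; `bounding_box` and `crop` are illustrative helpers, and the box format (x, y, width, height) is a choice made for this example.

```python
def bounding_box(mask):
    """Smallest (x, y, width, height) box enclosing all non-zero pixels, or None."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)

def crop(image, box):
    """Cut the region described by box out of the image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
box = bounding_box(mask)
print(box)              # (1, 1, 3, 2)
print(crop(mask, box))  # [[0, 1, 1], [1, 1, 1]]
```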
- Object Detection
- Detect multiple objects in a single image or video.
- Fast detection as compared to the human eye.
- Detect and count total males and females in images.
- Finding objects in an image and the differences between two images.
- Object Tracking
- Motion image object detection (Capture consecutive video frames)
- Motion sensor cameras to capture the number plates of speeding vehicles on expressways.
- Robot goalkeepers and constant football, basketball and cricket monitoring.
- Visual Search Engines
- Google Image, Bing and Yahoo Image Search.
- Google Lens (Detailed Information from the Image)
- Pinterest Lens (Visual Search for Online Shopping)
- Face recognition on Facebook
- Detect face and recognize the person in an image + auto-tagging.
- Recognize facial expressions.
- Amazon Go Stores
- Shopping without queue, easy scan and pay (QR Code)
- Motion and weight sensors, cameras and real-time product price calculation.
- Tesla Autopilot
- Autonomous driving (Eight panoramic cameras give 360-degree visibility up to 250 meters range)
- Ultrasonic sensors are installed to detect objects, traffic and environment.
- Radar sensors to process information about surroundings.
- Adjusts speed depending on traffic conditions, brakes when approaching obstacles, changes lanes and parks.
- Microsoft InnerEye
- Assists radiologists and surgeons who have to work closely with internal body parts.
- The technology presents 3D models of tumours (Measure and track size and spread)
- Create a dataset comprised of annotated images or use existing ones (Annotation can be classes or bounding box)
- Extract features from the image (Face, car, animal, object, person, etc.)
- Train a model based on features.
- Evaluate the model with new unseen images.
However, following a supervised approach is very costly, because every training image has to be annotated.
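The four steps (annotated dataset, features, training, evaluation) can be sketched end to end on toy data. Everything here is hypothetical: the two-number feature vectors stand in for features extracted from real images, and the nearest-centroid classifier is just a very simple model chosen for this illustration.

```python
# Step 1-2: a hypothetical annotated dataset of (feature vector, class label) pairs.
train = [
    ([0.9, 0.1], "car"), ([0.8, 0.2], "car"),
    ([0.1, 0.9], "person"), ([0.2, 0.8], "person"),
]
# Unseen samples kept aside for evaluation.
unseen = [([0.85, 0.15], "car"), ([0.15, 0.95], "person")]

def fit(samples):
    """Step 3: 'train' by averaging the feature vectors of each class."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Pick the class whose centroid is closest to the features."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(features, centroids[label])),
    )

def accuracy(centroids, samples):
    """Step 4: fraction of samples classified correctly."""
    correct = sum(predict(centroids, f) == label for f, label in samples)
    return correct / len(samples)

model = fit(train)
print(predict(model, [0.7, 0.3]))  # car
print(accuracy(model, unseen))     # 1.0
```

The labels in `train` are what makes this supervised: someone has to provide them for every image, which is where the cost comes from.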
- ImageNet: 14 million images annotated according to the WordNet hierarchy (about 1M images include bounding boxes)
- Microsoft COCO (Common Objects in Context): 328,000 images (2.5M labelled instances)
- CelebFaces attributes dataset (CelebA): 200K celebrity images.
- Indoor scene recognition: 15,620 images of indoor objects.
- Plant image analysis: 1M images of plants from 11 different species.
- Digital image feature used in face recognition.
- Features represent edges and lines.
- Face feature allows for capturing important elements (Nose, mouth, eyes and distance between eyebrows)
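These bullets read like a description of Haar-like features; assuming that is what is meant, the sketch below computes a two-rectangle edge feature using an integral image. The function names are illustrative, and the feature is simply "sum of the left half minus sum of the right half" of a window.

```python
def integral_image(gray):
    """ii[y][x] holds the sum of all pixels above and to the left of (x, y)."""
    h, w = len(gray), len(gray[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y), in four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_edge_feature(ii, x, y, w, h):
    """Left half minus right half of the window (w should be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

gray = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
ii = integral_image(gray)
print(rect_sum(ii, 0, 0, 4, 2))               # 840
print(two_rect_edge_feature(ii, 0, 0, 4, 2))  # -760
```

A value far from zero signals a vertical edge inside the window, while a uniform region gives 0.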
- Convolutional Neural Networks (CNNs): A powerful class of neural networks for image analysis and recognition.
- Transfer Learning: Leveraging pre-trained models to improve performance on new tasks.
- Generative Models: Creating new images or videos from existing data.
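To give a feel for what a CNN layer computes, here is a library-free sketch of the three typical building blocks: a convolution, a ReLU activation and 2x2 max pooling. The kernel is hand-written for illustration; in a real CNN the kernel values are learned during training.

```python
def conv2d(image, kernel):
    """'Valid' cross-correlation, which is what CNN convolution layers compute."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(kernel[j][i] * image[y + j][x + i] for j in range(kh) for i in range(kw))
         for x in range(out_w)]
        for y in range(out_h)
    ]

def relu(feature_map):
    """Replace negative responses with 0."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool_2x2(feature_map):
    """Keep the strongest response in each non-overlapping 2x2 block."""
    return [
        [max(feature_map[y][x], feature_map[y][x + 1],
             feature_map[y + 1][x], feature_map[y + 1][x + 1])
         for x in range(0, len(feature_map[0]) - 1, 2)]
        for y in range(0, len(feature_map) - 1, 2)
    ]

image = [[0, 0, 1, 1, 1] for _ in range(5)]  # dark left, bright right
kernel = [[-1, 1], [-1, 1]]                  # hand-written vertical-edge detector

features = relu(conv2d(image, kernel))
print(features[0])             # [0, 2, 0, 0]
print(max_pool_2x2(features))  # [[2, 0], [2, 0]]
```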