Amazon Rekognition is a cloud-based, software-as-a-service computer vision platform that was launched in 2016. It is one of the main competitors to the Google Cloud Vision API. Rekognition provides a number of computer vision capabilities, which can be divided into two categories: algorithms that are pre-trained on data collected by Amazon or its partners, and algorithms that a user can train on a custom dataset. Amazon Rekognition includes a simple, easy-to-use API that can quickly analyze any image or video file stored in Amazon S3, so most of its services require no machine learning expertise to use! In this tutorial, we will focus on three services that Rekognition offers, all of which use pre-trained machine learning and deep learning algorithms to automate image and video analysis.
First, you should create an Amazon Web Services (AWS) account, which gives you free access to all AWS services within the limits of the Free Tier for 12 months. You can create an account by using this portal. Be sure to make a root user account to get unrestricted access to all free demos and services. After filling in the sign-up form, you have to verify both your AWS account and the email address you used to sign up. Feel free to use your Amherst College Gmail if you are working on a personal project or an assignment for one of your courses. You can sign in after Amazon verifies and activates your AWS account.
After signing in as a root user, you can search for different services and features in your AWS Management Console (as shown by the red arrow in the picture).
Search for Amazon Rekognition, and the service will pop up as an option (highlighted in purple).
You can also click on the default Services menu (highlighted in green in the previous picture) to see a list of other types of AWS services that may be applicable depending on your project. Once you use Amazon Rekognition, it will automatically show up in your dashboard under the “Recently visited services” section, so you do not have to look it up again (highlighted in yellow in the previous picture). As you can see in the yellow drop-down on the left side of the screenshot, Rekognition offers a variety of tools for both image and video analysis. For our purposes, we will use three of them: Label Detection, Image Moderation, and Facial Analysis. Let’s get started!
Note: Under the hood, each of the following tools is backed by its own API: Label Detection uses the DetectLabels API, Image Moderation uses DetectModerationLabels, and Facial Analysis uses DetectFaces. DetectLabels automatically identifies thousands of objects, scenes, and concepts and returns a confidence score for each label, and it uses a default confidence threshold of 50.
Description: With Amazon Rekognition Label Detection, you can detect labels in images and videos. A label or a tag is an object, scene, action, or concept found in an image or video based on its contents. You can even detect activities such as “delivering a package” or “playing soccer.” For example, a photo of people on a tropical beach may contain labels such as Palm Tree (object), Beach (scene), Running (action), and Outdoors (concept).
Usage: Using this tool, I uploaded an image of The Powerhouse. You can see the labels in the Results section (pink highlight), including Leaf, Plant, Tree, etc. You can click on “show more” to look at tags with a lower confidence score, meaning the algorithm is less sure that the object, scene, or concept actually appears in the picture. You can either upload an image from the system you are using or simply drag and drop it. You can also copy an image address and paste it into the “Use image URL” section to start the process.
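If you would rather script label detection than use the console demo, you can call the same DetectLabels API from Python with boto3 (the AWS SDK). The sketch below is a minimal example: `labels_above_threshold` is a hypothetical helper (not part of the SDK), the sample response values are made up, and the bucket and file names in the commented-out call are placeholders; only the response shape follows the documented DetectLabels output.

```python
def labels_above_threshold(response, min_confidence=50.0):
    """Return (name, confidence) pairs for labels at or above the threshold.

    Mirrors the default confidence threshold of 50 mentioned above."""
    return [
        (label["Name"], label["Confidence"])
        for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]

# An illustrative response in the documented shape (values are invented):
sample_response = {
    "Labels": [
        {"Name": "Tree", "Confidence": 98.6},
        {"Name": "Plant", "Confidence": 97.1},
        {"Name": "Zebra", "Confidence": 42.0},
    ]
}

print(labels_above_threshold(sample_response))

# With AWS credentials configured, the real call would look like:
#
#   import boto3
#   rekognition = boto3.client("rekognition")
#   response = rekognition.detect_labels(
#       Image={"S3Object": {"Bucket": "my-bucket", "Name": "powerhouse.jpg"}},
#       MaxLabels=10,
#       MinConfidence=50,
#   )
```

Filtering on the client side like this lets you experiment with stricter thresholds than the one you passed to the API without re-running the analysis.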
In the following picture, I ran the tool on a picture of two mammoth fossils in the Beneski Museum of Natural History. Apparently, Amazon Rekognition does not even have a mammoth tag, and it labeled the skeleton as a zebra, which is clearly wrong. However, with Amazon Rekognition Custom Labels (covered briefly below), you can train a model with your own images and tags! In this way, you can extend the service to recognize less common and more specialized labels.
Description: Amazon Rekognition can detect adult, explicit and violent content in images and in stored videos. Developers can use the returned metadata to filter inappropriate content based on their business needs. Beyond flagging an image based on the presence of unsafe content, the API also returns a hierarchical list of labels with confidence scores. These labels indicate specific categories of unsafe content, which enables granular filtering and management of large volumes of user-generated content (UGC). Examples include social and dating sites, photo-sharing platforms, blogs and forums, apps for children, e-commerce sites, entertainment, and online advertising services.
Usage: For this section, you have the same three options for uploading the image of your interest. If the image is flagged as unsafe, it initially appears blurred and almost unrecognizable. In the Results section (pink highlight), you can see the reasoning behind why the content is considered violent or explicit.
For example, in the second photo, where I uploaded a photo from World War II, you can see that the image is blurred because of the amount of violence and destruction it contains. However, if you do not feel disturbed by these types of images, you can click to view the actual content. This is similar to the filtering that Instagram’s Sensitive Content Control applies.
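Moderation can be scripted the same way as label detection, through the DetectModerationLabels API. Below is a minimal sketch: `flag_unsafe` is a hypothetical helper and the sample labels and confidence values are invented for illustration, but the response shape (a list of moderation labels, where second-level labels carry a `ParentName`) follows the documented output.

```python
def flag_unsafe(response, min_confidence=50.0):
    """Return the top-level moderation categories found above the threshold.

    Second-level labels carry a ParentName; top-level labels have an empty
    ParentName, so we fall back to the label's own Name."""
    return sorted({
        label.get("ParentName") or label["Name"]
        for label in response.get("ModerationLabels", [])
        if label["Confidence"] >= min_confidence
    })

# An illustrative response in the documented shape (values are invented):
sample_response = {
    "ModerationLabels": [
        {"Name": "Graphic Violence", "ParentName": "Violence", "Confidence": 91.3},
        {"Name": "Violence", "ParentName": "", "Confidence": 91.3},
        {"Name": "Weapons", "ParentName": "Violence", "Confidence": 35.0},
    ]
}

print(flag_unsafe(sample_response))

# With AWS credentials configured, the real call would look like:
#
#   import boto3
#   rekognition = boto3.client("rekognition")
#   response = rekognition.detect_moderation_labels(
#       Image={"S3Object": {"Bucket": "my-bucket", "Name": "photo.jpg"}},
#   )
```

A filter like this is what lets an application decide, per category, whether to blur, hide, or simply tag a piece of user-generated content.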
Description: Amazon Rekognition can detect facial attributes in images, including gender, age range, emotions (e.g., happy, calm, disgusted), whether the face has a beard or mustache, whether the face has eyeglasses or sunglasses, whether the eyes are open, whether the mouth is open, whether the person is smiling, and the location of several markers such as the pupils and jawline.
Usage: Using this tool, I uploaded a photo of Leonardo DiCaprio starring in the famous movie “The Wolf of Wall Street”. The tool detects that he looks frustrated and is not smiling (a confidence score of 92.2%). It also wrongly detects that his character, Jordan Belfort, has a beard, with a high confidence score of 84.7%, although he clearly does not.
In the second picture, I uploaded the famous meme starring DiCaprio again. (Yes, you guessed it, I am a fan.) Although his mouth is weirdly shaped in the meme, the tool still detects that it is open, though with a lower confidence score of 53.4%. As you can see, the tool also detects age, gender, facial hair, and many other facial features.
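Facial analysis is available programmatically through the DetectFaces API, which returns one `FaceDetails` entry per detected face. The sketch below condenses such an entry into a few readable fields; `summarize_face` is a hypothetical helper, and the sample values loosely echo the DiCaprio example above (not smiling at 92.2%, a beard wrongly reported at 84.7%) rather than being real API output.

```python
def summarize_face(face_detail):
    """Condense one FaceDetails entry into a few human-readable fields."""
    top_emotion = max(face_detail["Emotions"], key=lambda e: e["Confidence"])
    return {
        "smiling": face_detail["Smile"]["Value"],
        "beard": face_detail["Beard"]["Value"],
        "age_range": (face_detail["AgeRange"]["Low"],
                      face_detail["AgeRange"]["High"]),
        "top_emotion": top_emotion["Type"],
    }

# An illustrative FaceDetails entry (values are invented):
sample_face = {
    "Smile": {"Value": False, "Confidence": 92.2},
    "Beard": {"Value": True, "Confidence": 84.7},
    "AgeRange": {"Low": 29, "High": 41},
    "Emotions": [
        {"Type": "ANGRY", "Confidence": 77.9},
        {"Type": "CALM", "Confidence": 10.1},
    ],
}

print(summarize_face(sample_face))

# With AWS credentials configured, the real call would look like:
#
#   import boto3
#   rekognition = boto3.client("rekognition")
#   response = rekognition.detect_faces(
#       Image={"S3Object": {"Bucket": "my-bucket", "Name": "dicaprio.jpg"}},
#       Attributes=["ALL"],  # default is a smaller attribute set
#   )
#   for face in response["FaceDetails"]:
#       print(summarize_face(face))
```

Note that each attribute comes with its own confidence score, so you can ignore low-confidence attributes (like the spurious beard above) instead of trusting every field.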
Note: For gender detection, be sure to carefully read the guidelines on face attributes. Amazon notes that a gender binary (male/female) prediction is based on the physical appearance of a face in a particular image. It does not indicate a person’s gender identity, and you should not use Amazon Rekognition to make such a determination. They do not recommend using gender binary predictions to make decisions that impact an individual’s rights, privacy, or access to services. So, before doing any project and analyzing any form of data, be sure to check the guidelines of all the tools described and carefully consider the ethical aspects of your project.
Video Segment Detection: Detect key segments in videos, such as black frames, start or end credits, slates, color bars, and shots. You can find where the opening and end credits are in a piece of content or break up videos into smaller clips for better indexing.
Custom Labels: Detect custom objects such as brand logos using automated machine learning (AutoML) to train your models with as few as 10 images. With Amazon Rekognition Custom Labels, you can identify the objects and scenes in images that are specific to your business needs. For example, you can find your logo in social media posts, identify your products on store shelves, classify machine parts in an assembly line, distinguish healthy and infected plants, or detect animated characters in videos.
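Once a Custom Labels model has been trained and started, it is queried through the DetectCustomLabels API, which requires the ARN of your trained model version. The sketch below shows a hypothetical helper for picking the best match; the project ARN, bucket, and sample labels are all placeholders, with only the response shape following the documented output.

```python
def best_custom_label(response):
    """Return the highest-confidence custom label, or None if nothing matched."""
    labels = response.get("CustomLabels", [])
    if not labels:
        return None
    top = max(labels, key=lambda label: label["Confidence"])
    return top["Name"], top["Confidence"]

# An illustrative response in the documented shape (values are invented):
sample_response = {
    "CustomLabels": [
        {"Name": "my-logo", "Confidence": 88.2},
        {"Name": "competitor-logo", "Confidence": 12.5},
    ]
}

print(best_custom_label(sample_response))

# With a trained and running model, the real call would look like:
#
#   import boto3
#   rekognition = boto3.client("rekognition")
#   response = rekognition.detect_custom_labels(
#       ProjectVersionArn="arn:aws:rekognition:...:project/my-project/version/...",
#       Image={"S3Object": {"Bucket": "my-bucket", "Name": "shelf.jpg"}},
#   )
```

Unlike the pre-trained APIs, this one bills for the time your model version is running, so remember to stop the model when you are done experimenting.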
There are many other cool services that you can explore!
Depending on the number of your API requests, you may have to pay to use more advanced tools that provide more reliable and accurate insights into your data and models; in this tutorial, only the free demos were shown. Even so, these services are usually low-cost. With Amazon Rekognition, you pay for the images and videos that you analyze and for the face metadata that you store; there are no minimum fees or upfront commitments. You can get started for free and save more as you grow with the Amazon Rekognition tiered pricing model. You can read more about pricing here: Amazon Rekognition – Pricing - AWS.
Final Note: Like many other image recognition technologies, Amazon Rekognition has its own biases and deficiencies, but one thing is for sure: these models change dynamically every day and are trained on more diverse and representative data as time passes. Depending on your project, you might want to use another product, such as Clarifai, the Google Cloud Vision API, the Microsoft Computer Vision API, or another tool that has been launched recently. For a comprehensive review and comparison of these services, you might want to take a look at the following links to choose the one that best suits your needs and aligns with your educational or career goals.
If you have any questions about the service or face any issues implementing these tools, feel free to leave us a comment! You can also check out the Amazon Rekognition FAQs page.