Facebook is one of the largest social networking platforms, and about 350 million new photos are uploaded to it each day. The social networking giant strongly opposes offensive content, but policing every single image posted on the platform is a daunting task.
Analyzing such a huge amount of data is beyond human capacity. The popularity of memes makes things more complex. Memes are now more than funny images; they have become a primary tool for spreading offensive messages indirectly, which makes it more difficult for the website to detect whether a post is sensitive or not.
To make things a bit easier, on Tuesday the social network announced a new artificial intelligence system, codenamed “Rosetta.” Facebook developers say that this new AI tool can read the text in images and videos as well as understand the context of the text and the image together.
“Understanding the text that appears on images is important for improving experiences, such as a more relevant photo search or the incorporation of text into screen readers that make Facebook more accessible for the visually impaired,” Facebook explains, adding that reading text in images is important in identifying “inappropriate or harmful content and keep our community safe.”
Rosetta is built to keep a close eye on inappropriate or harmful content on Facebook and Instagram. Its AI system can sift through an immense amount of data in a very short period of time: it extracts text from more than a billion public Facebook and Instagram images and video frames daily, in real time.
“Taking into account the sheer volume of photos shared each day on Facebook and Instagram, the number of languages supported on our global platform, and the variations of the text, the problem of understanding text in images is quite different from those solved by traditional optical character recognition (OCR) systems, which recognize the characters but don’t understand the context of the associated image,” Facebook said.
Facebook said that the text extracted from the image is being used to improve the photo search in terms of quality and relevance, while automatically pinpointing which content violates the platform’s hate speech policy in different languages.
Facebook said that Rosetta “automatically identifies content that violates our hate-speech policy” and that it’s doing so in multiple languages. The system will also help improve users’ News Feeds by giving them more personalized content, Facebook added.
“The rapid growth of videos as a way to share content, the need to support many more languages, and the increasing number of ways in which people share content make text extraction from images and videos an exciting challenge that helps push the frontiers of computer vision research and applications,” Facebook said in its blog post regarding the future plans for Rosetta.
Rosetta uses a machine-learning algorithm to detect which regions of an image or video likely contain text, then breaks the suspected text into words which it interprets. The algorithm has to be versatile enough to tackle a number of different languages, including languages like Arabic which are written right-to-left.
The process is divided into two steps. First, the system scans the image to detect regions that contain text; then it uses text recognition to identify what the text actually says. Finally, it compiles the data to work out what the text could mean.
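The two-step flow described here can be sketched in a few lines of Python. This is only an illustration, not Facebook's code: `WordBox`, `extract_text`, and the `detect` and `recognize` callables are hypothetical names, and the reading-order rule assumes that words on the same line share the same top coordinate.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class WordBox:
    """A detected word region: left edge, top edge, width, height (pixels)."""
    x: int
    y: int
    w: int
    h: int


def extract_text(
    image: object,
    detect: Callable[[object], Sequence[WordBox]],
    recognize: Callable[[object, WordBox], str],
    right_to_left: bool = False,
) -> str:
    """Step 1 finds word boxes; step 2 transcribes each box (hypothetical sketch)."""
    boxes = list(detect(image))
    # Rough reading order: top to bottom, then along the line.
    # For right-to-left scripts such as Arabic, words further right come first.
    boxes.sort(key=lambda b: (b.y, -b.x if right_to_left else b.x))
    words: List[str] = [recognize(image, b) for b in boxes]
    return " ".join(words)
```

In a real system, `detect` and `recognize` would each be a trained model; here they are plain callables so the control flow can be read and tested on its own.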
For detection, Facebook uses a convolutional neural network (CNN): the platform adopted a text detection system based on Faster R-CNN. Faster R-CNN is a state-of-the-art object detection network that performs detection and recognition at the same time, which would allow Facebook to better understand the text in a meme and determine whether it is offensive.
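Detectors in the Faster R-CNN family typically propose many overlapping candidate boxes and then prune them with non-maximum suppression (NMS). The sketch below shows that pruning step in plain Python. It is a generic illustration rather than Rosetta's implementation; the function names and the 0.5 overlap threshold are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corner coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: List[Box], scores: List[float], iou_threshold: float = 0.5
) -> List[int]:
    """Return indices of boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

For example, given two near-identical boxes around the same word and one box elsewhere, only the higher-scoring box of the overlapping pair survives, along with the separate box.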
“The naive approach of applying image-based text extraction to every single video frame is not scalable, because of the massive growth of videos on the platform, and would only lead to wasted computational resources,” the company says.
“Recently, 3D convolutions have been gaining wide adoption given their ability to model temporal domain in addition to spatial domain. We are beginning to explore ways to apply 3D convolutions for smarter selection of video frames of interest for text extraction.”
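To see why skipping frames saves work, consider a much simpler stand-in for the frame selection Facebook describes: run text extraction only on frames that differ noticeably from the last frame kept. The sketch below uses plain frame differencing, not 3D convolutions, and the names `select_frames` and `mean_abs_diff` and the threshold value are assumptions made for illustration.

```python
from typing import List, Sequence

Frame = Sequence[int]  # flattened grayscale pixel values, 0-255, non-empty


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two same-sized frames."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def select_frames(frames: List[Frame], threshold: float = 10.0) -> List[int]:
    """Return indices of frames worth sending to text extraction."""
    if not frames:
        return []
    selected = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        # Compare against the last kept frame, not merely the previous one,
        # so slow gradual changes are still picked up eventually.
        if mean_abs_diff(frames[i], frames[selected[-1]]) > threshold:
            selected.append(i)
    return selected
```

On a clip where the picture barely changes for long stretches, this keeps only a handful of frames, so the expensive detection and recognition models run far less often than in the naive every-frame approach.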
Rosetta is not perfect, though Facebook wants to get closer to perfection and has a to-do list. Moon said the company plans to keep growing the number of languages it can understand and “to make it better at extracting text from video frames.”
Cohen Coberly in TechSpot wrote, “Rosetta will almost certainly be a controversial tool for certain members of the meme-loving public, but here’s hoping the technology will prove smart enough to distinguish between silly-but-harmless content and truly offensive imagery.”
You can read the study here.