Creation of Coco Text and MLT
Segmentation Datasets

For a complete overview of the datasets generation procedure please refer to our research publications:


Overview

The COCO_TS, the MLT_S and the Incidental Scene Text Segmentation datasets provide pixel–level supervisions for the COCO–Text, the MLT and the Incidental Scene Text dataset, respectively. The supervision is obtained from the available bounding–boxes annotation exploiting a weakly supervised algorithm.



The supervision generation consists of two different steps.

  1. A background–foreground network is trained on synthetic data to extract text from bounding–boxes.
  2. The background–foreground network is then employed to generate pixel–level supervisions for real images of the COCO-Text, the MLT and the Incidental Scene Text datasets.

A background–foreground network is trained to segment the text inside a bounding–box. To train this network 1,000,000 bounding–box crops have been extracted from a synthetic dataset. After the training phase, the background–foreground network is applied on the bounding–boxes extracted from COCO-Text, MLT and Incidental Scene Text datasets. For each image, the pixel–level supervision is obtained combining the probability maps (calculated by the background–foreground network) for all the bounding–boxes inside the image. To provide a more reliable supervision, an uncertainty region is defined, based on the network probability maps. Furtermore, bounding–boxes that are not labeled as legible, machine printed and written in English have been added to the uncertainty region.

Note that images that only contain regions labeled as uncertain are not considered in the supervision generation procedure.

Examples

Some example of the generated supervision of COCO_TS, MLT_S and Incidental Scene Text Segmentation are reported below.


COCO-Text Segmentation Dataset




MLT Segmentation Dataset




Incidental Scene Text Segmentation Dataset