Aadhaar Redaction

Problem Statement

A bank handling over 70,000 new documents every month, along with 1,000,000 legacy documents in its document management system (DMS), faced a major challenge: masking the Aadhaar number on Aadhaar scans and forms, as required by the Supreme Court judgment. Traditional OCR systems failed to identify the Aadhaar scans within a large set of documents and to mask the information in them. The main challenges observed in OCR-based solutions are as follows:

  • Identifying an Aadhaar scan from the entire lot.
  • Handling multiple documents on one page.
  • Poor scan quality, including rotated and skewed input images.
  • Inadequacy of traditional OCR systems in handling different types of Aadhaar documents.

The Solution

  • Cerescope D-tect performs document indexing by accurately identifying an Aadhaar from a range of documents.
  • D-tect's pre-trained cognitive models help in handling multiple scans on one page.
  • D-tect's AI engine is capable of handling noisy data such as rotated, skewed, and poor-quality scans.
  • Aadhaar types handled by D-tect: long and short Aadhaar scans, Aadhaar smart cards, Aadhaar enrollment forms, and Aadhaar on application forms.
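
To make the masking step concrete, here is a minimal Python sketch of masking Aadhaar numbers in text that has already been extracted from a scan. It is not D-tect's method, which is not described here. The function names, the regular expression, and the choice to hide the first eight digits and keep the last four are assumptions made for illustration. The Verhoeff checksum, which Aadhaar numbers carry as their final digit, is used to reduce false matches on other 12-digit numbers.

```python
import re

# Verhoeff multiplication table (dihedral group D5).
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]

# Verhoeff position-dependent permutation table.
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]

# Assumed pattern: 12 digits, optionally grouped 4-4-4 with a space or hyphen,
# first digit 2-9, and not embedded in a longer run of digits.
_AADHAAR_RE = re.compile(
    r"(?<![0-9])([2-9][0-9]{3})([ -]?)([0-9]{4})([ -]?)([0-9]{4})(?![0-9])"
)


def verhoeff_valid(number: str) -> bool:
    """Return True if the digit string passes the Verhoeff checksum."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = _D[c][_P[i % 8][int(ch)]]
    return c == 0


def mask_aadhaar(text: str, require_checksum: bool = True) -> str:
    """Replace the first eight digits of each Aadhaar-like number with X."""

    def _mask(match):
        g1, sep1, g2, sep2, g3 = match.groups()
        if require_checksum and not verhoeff_valid(g1 + g2 + g3):
            return match.group(0)  # leave non-Aadhaar 12-digit numbers untouched
        return "XXXX" + sep1 + "XXXX" + sep2 + g3

    return _AADHAAR_RE.sub(_mask, text)
```

For example, with the checksum test switched off, `mask_aadhaar("Aadhaar: 2345 6789 0123", require_checksum=False)` returns `"Aadhaar: XXXX XXXX 0123"`. Masking a scanned image would additionally require locating the digits on the page and blacking out the corresponding region, which this sketch does not attempt.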


    [Image: Masking information in a scanned Aadhaar image]
  • After implementing Cerescope D-tect, the accuracy in detecting an Aadhaar scan and masking the information in scans and on forms was recorded at over 95%.
  • D-tect offers better compliance than traditional OCR systems.