If you work in tech, you are already aware of Big Data. It's everywhere, across big industry and academia. But what about working with Geez characters in a Big Data scenario?

In our case, the big data is not yet digital: it exists in print. We have many important religious, historical, and revolutionary books and documents that need to be preserved for future generations. Scanning and digitizing Geez books (handwritten and typed), then storing, indexing, and searching them in digital form, is the first step. The question is how to index and search content from such digitized resources and make it accessible to users.

The majority of our resources are written in Geez characters, so content extraction and character recognition need a specialized approach. The usual OCR-based techniques for character recognition are not well suited to Geez characters.

Solution Approach

  • The first step is to scan and digitize all resources available anywhere: historical places, religious institutions, museums, etc.
  • Develop specialized OCR or other character recognition techniques to detect Geez characters, and then extract, index, and search the content.
  • Design generic frameworks so that other application developers can make use of these character recognition techniques.
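
To make the indexing and searching step more concrete, here is a minimal Python sketch. It assumes the character recognition step has already produced Unicode text, and it treats any run of characters in the Ethiopic letter range (U+1200 to U+135F) as one word, so the Ethiopic wordspace "፡" and full stop "።" act as separators. The function names (tokenize, build_index, search) and the sample pages are hypothetical, chosen only for illustration. This is not a full search engine, just one simple way the idea could work.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of Ethiopic letters and combining marks
# (U+1200..U+135F). Ethiopic punctuation such as the wordspace U+1361
# and the full stop U+1362 falls outside this range and splits words.
ETHIOPIC_WORD = re.compile(r"[\u1200-\u135F]+")


def tokenize(text):
    """Return the Ethiopic words found in text, in order."""
    return ETHIOPIC_WORD.findall(text)


def build_index(documents):
    """Build an inverted index: word -> set of document ids containing it.

    documents is a dict mapping a document id to its recognized text.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, word):
    """Return the sorted ids of documents that contain word exactly."""
    return sorted(index.get(word, set()))


if __name__ == "__main__":
    # Hypothetical sample pages standing in for digitized book text.
    pages = {
        "page1": "ሰላም፡ዓለም።",
        "page2": "ሰላም ኢትዮጵያ",
    }
    index = build_index(pages)
    print(search(index, "ሰላም"))
    print(search(index, "ዓለም"))
```

Exact word matching is the simplest choice. A real system for Geez text would probably also need to handle spelling variants and recognition errors, which this sketch does not attempt.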

Such solutions can be developed in academic institutes such as MIT in collaboration with private IT companies. Aelaf Technologies has taken the initiative to organize researchers at MIT to work on these areas in collaboration with other stakeholders in Tigrai.