Prem Natarajan

Thursday 9 September, 13:00-14:00

Room “Rome”, second floor

OCR: A Journey through Advances in the Science, Engineering, and Productization of AI/ML

From the very early years of AI, the problem of optical character recognition (OCR) has captured the imagination of researchers; Selfridge and Neisser presented an approach for OCR of hand-printed characters in 1960. During the last three decades, OCR technology for machine-printed and handwritten text has evolved in significant ways: from script-specific techniques to script-independent methodologies, and from segmentation-based techniques to hidden Markov models to deep learning. In my talk, I will present my perspective on that evolution and its interplay with concomitant advances in speech recognition, natural language processing, and computer vision. The presentation will include a discussion of some practical, even if off-the-beaten-path, applications of OCR technology, including work done in partnership with the census bureau in applying a deep-learning-based OCR framework to census forms. I will also share my views on some of the most interesting open problems in the field of OCR and document processing. The presentation will conclude with a few comments about one of my current areas of research interest: fairness in AI and machine learning.

Short Bio

Prem Natarajan is a Vice President at Amazon, where he leads research and engineering efforts in dialog systems, natural language understanding, and multimodal/multimedia technologies in the Alexa AI organization. Prior to joining Amazon in June 2018, he was with the University of Southern California (USC), where he was Senior Vice Dean of Engineering in the Viterbi School of Engineering, Executive Director of the Information Sciences Institute (a 300-person R&D organization), and Research Professor of computer science with distinction. Prior to that, as Executive VP and Principal Scientist at Raytheon BBN Technologies, he led the speech, language, and multimedia business unit, which included research and development operations and commercial products for real-time multimedia monitoring, document analysis, and information extraction. During his tenure at USC and at BBN, Natarajan directed R&D efforts in speech recognition, natural language processing, computer vision, and other applications of machine learning. While at USC, he led nationally influential DARPA- and IARPA-sponsored research efforts in biometrics/face recognition, OCR, NLP, media forensics, and forecasting. Most recently at Amazon, he helped to launch the Fairness in AI (FAI) program, a collaborative effort between NSF and Amazon for funding fairness-focused research efforts in US universities.

Beáta Megyesi

Friday 10 September, 09:00-10:00

Room “Rome”, second floor

Cracking Ciphers with “AI-in-the-loop”: Transcription and Decryption in a Cross-Disciplinary Field

Accurate transcription of hand-written texts in images is indispensable in many research areas in digital humanities. Manual transcription is error-prone, time-consuming, and expensive to produce. Historical texts with their specific textual qualities require expert knowledge and trained eyes. In recent years, image processing applied to hand-written historical text documents to provide transcription output has shown great opportunities, but also challenges for users. How can users without knowledge of AI in general and HTR in particular transcribe hand-written documents efficiently with “AI-in-the-loop”?
In my talk, I will focus on encrypted manuscripts from Early Modern times with various symbol systems, hand-writing styles, and languages. The point of departure is the DECRYPT project, which aims to create resources and tools for historical cryptology by bringing together the expertise of various disciplines to collect images of ciphers and keys, to transcribe them, and to decrypt and contextualize them. I will give an overview of the project and of the methods we use to solve various problems from transcription to decryption, including historical corpora and natural language processing methods.

Short Bio

Beáta Megyesi is a professor of computational linguistics at Uppsala University, the former head of the Department of Linguistics and Philology, Uppsala University, Sweden, and the current president of the North European Association for Language Technology (NEALT). She specializes in digital philology and natural language processing, with a special interest in the automatic analysis of non-standard, noisy language data, from text produced by language learners to historical texts and encrypted documents. She has participated in ten externally funded, cross-disciplinary research projects and currently serves as the PI of the DECRYPT project, financed by the Swedish Research Council (grant 2018-06074). Bea received her Ph.D. in speech communication from the Royal Institute of Technology (KTH) in Stockholm, Sweden.

Masaki Nakagawa

Wednesday 8 September, 09:15-10:00

Room “Rome”, second floor

Toward automatic recognition and scoring of handwritten descriptive answers

Starting from a brief history of offline and online handwriting recognition, I will talk about my experiences of joint projects with companies, which might be useful for the audience. Then I will present the latest challenge: automating the scoring of handwritten answers to descriptive questions. Descriptive questions can test the deep understanding and problem-solving ability of examinees much better than the selection-type questions used in most computer-based tests (CBTs), and they encourage examinees to think rather than select. Fully automatic recognition and scoring of descriptive answers provides immediate feedback so that examinees can review their answers when they are able to confirm the scoring, while semi-automatic, or computer-assisted, scoring provides reliable scoring when examinees cannot confirm it. Both decrease the time and effort examiners or teachers need to score exams. My dream is to unify online recognition of handwritten answers from tablets and offline recognition from scanners, except for several early-stage layers in the DNN. The same DNN architecture may learn to recognize Japanese, English, and Math answers. The DNN for handwritten answer recognition will output reliable features for clustering answers in semi-automatic scoring. The DNN for handwriting recognition could even be merged with that for automatic scoring and trained end-to-end. An initial attempt on Japanese-language questions for 120,000 examinees shows promising results.

Short Bio

He has been working on handwriting recognition, pen-based user interfaces, and educational applications. Since the 1980s, he has been collaborating with many companies and has contributed to developing handwriting recognizers for real commercial use. In 2011, he founded a start-up, iLabo, which now sells the best handwriting recognizers for touch-based smart phones, tablets, and so on in Japan. In 1990, he also introduced user interfaces for tablet devices and developed several educational applications using various sizes of tablets. His U.S. patents on scrolling the window in proportion to the pen speed, called “touch scroll”, were sold by his university for the highest amount among all Japanese universities in the fiscal year 2010. He is also working on historical document processing to read excavated documents from the Heijo palace (the capital in the 7th century) in Nara, Japan, and to read Chu Nom documents in Vietnam. He received the Minister of Education and Science award of Japan and the Contribution award from the Tokyo Metropolitan Government, both in 2016. He is a fellow of IAPR, IEICE (Institute of Electronics, Information and Communication Engineers, Japan), and IPSJ (Information Processing Society of Japan).

Appointed Professor, Emeritus Professor, Tokyo Univ. of Agri. & Tech.

Mickaël Coustaty

Wednesday 8 September, 10:00-10:30

Room “Rome”, second floor

Complex Document Analysis and Its Impact

Documents are part of our daily life, both personal and professional. Even if they seem easy for a human to handle, analyze, and manipulate, current trends in document analysis address more and more complex information, mixing textual content (typewritten, handwritten), visual content (logos, signatures, pictures), and their semantics. For the textual content, OCR is a common step to process documents and extract their content. Even if its performance keeps improving, errors remain and may impact further steps. Many document image analysis techniques have been proposed over the last 30 years, but they remain unsatisfactory because documents are not composed of visual elements alone. The best current approaches tend to join text and images in order to achieve solutions for multimodal analysis of documents. The presentation will propose some results obtained by combining visual and textual elements, while trying to deal with the need for large annotated datasets. I will also share my thoughts on some ideas that could open up our community, widen our field, and extend our future work on complex document analysis.

Short Bio

Mickaël Coustaty has been a tenured Associate Professor at the L3i laboratory of La Rochelle University since 2015. He specializes in complex document analysis using multimodal approaches that mix textual and visual content. He initially worked on historical documents, then extended his techniques to administrative documents and included NLP approaches. He has always worked in collaboration with experts or end-users in order to extract information, to index the content, and to assess its relevance and trustworthiness. He has participated in ten externally funded, cross-disciplinary research projects and currently serves as the head of a joint private-public lab between L3i and the Yooz Company, co-funded by the French National Research Agency, the Région Nouvelle-Aquitaine, and the Yooz Company. Finally, since 2016, he has been in charge of a Master's degree in digital law, for which he proposed the first French Master's degree mixing computer science classes and law classes, in collaboration with the French National Trusted Third-party association, in order to connect different fields and to bond the academic and industrial worlds.

Copyright © ICDAR 2021 Organizing Committee