Tahsin Mayeesha

Hi! 😊 I'm a researcher focused on NLP, HCI and AI policymaking 🌍 ⚖️ 🕊️ 🧑‍⚖️. Currently, I am working on VPA (Virtual Private Assistant), a joint EBLICT-Dream71 project funded by the Bangladesh government, where we build Llama-based RAG models to make public administration information accessible to citizens.

Prior to this role, I worked for a year as an HCI researcher in the Design, Inclusion & Access Lab (DIAL) at North South University with my supervisor Dr. Nova Ahmed on multiple projects in education and AI policymaking. The key projects are 1) "My Freedom in Light", where we investigated the challenges of women in computing through fictional inquiry and a co-design workshop facilitating participatory design with female students in computer science, and 2) "Designing Accountable and Ethical AI for the Next Billions: Considering the Needs of South Asian Marginalized Communities", where we studied the current state of AI ethics policies and perspectives with a focus on South Asia and Bangladesh. Publications from these projects have been accepted to ICTD 2024, ACM Compass 2024 and Ubicomp 2024.

Simultaneously, I worked on NLP research as a senior research assistant under the supervision of Dr. M. Rashedur Rahman on the project "Bengali NLP: Application in Literature and Natural Language Generation", covering multiple topics including answer-aware question generation from passages and generating questions with or without guidance from images. I also completed a predoctoral fellowship with the Fatima Fellowship on an NLP project with mentor Benjamin Muller, investigating cultural biases such as formality in multilingual generative models. Publications from these projects were accepted to journals and NLP conferences (EMNLP 2023, MM-NLG Workshop held with INLG 2023).

I graduated in Computer Science and Engineering from North South University in 2020. My thesis project was on building deep learning models for question answering systems in Bengali, where I trained multilingual BERT models on synthetic data. During my undergrad, I worked with the TensorFlow Hub team for Google Summer of Code 2019 with mentor Vojtech Bardiovský, with the Berkman Klein Center for Internet & Society for Google Summer of Code 2018 with mentor Hal Roberts, and with Cramstack in 2017.

When I am not immersed in research, I enjoy watching anime, reading manga or books, and taking care of my cats. 🐈‍⬛

Email / LinkedIn / Google Scholar / GitHub / Twitter

Publications/Preprints

See also my Google Scholar profile for the most recent publications as well as the most-cited papers.

AI4Bangladesh: AI Ethics for Bangladesh - Challenges, Risks, Principles, and Suggestions
Authors : Tasmiah Tahsin Mayeesha, Farzana Islam & Nova Ahmed
In The 13th International Conference on Information & Communication Technologies and Development (ICTD 2024), December 09–11, 2024, Nairobi, Kenya. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3700794.3700820

Paper

In recent times, the term AI ethics has caught the attention of academics, legislators, developers, and AI users in the effort to promote ethical AI development. While countries in the North have led the way in discussions about the direction of ethical and responsible artificial intelligence development and deployment, perspectives from developing countries like Bangladesh are underrepresented. Based on 32 qualitative interviews with different stakeholders, including machine learning practitioners, academic researchers, and policymakers in the emerging AI ecosystem in Bangladesh, this work closely examines the ongoing challenges and opportunities to ensure AI ethics in Bangladesh with emerging AI usage. In Bangladesh, the government has not yet fully implemented measures to empower citizens with AI-related skills, policies, resources, and data ethics, and a significant portion of the population lacks knowledge in AI. In this paper, we present the findings of the AI4Bangladesh project, which intends to create a roadmap for ethical AI in Bangladesh. We outline the core challenges, present situation, and risks of AI for Bangladesh; propose seven AI ethics principles; and offer suggestions to ensure a transparent, accountable, and fair AI ecosystem for Bangladesh.

Know Your Users: Towards Explainable AI in Bangladesh
Authors : Farzana Islam, Tasmiah Tahsin Mayeesha & Nova Ahmed
In Companion of the 2024 on ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp '24). Association for Computing Machinery, New York, NY, USA, 890–893. https://doi.org/10.1145/3675094.3679002

Paper

When we talk about explainable AI and try to come out of the black box by going beyond algorithmic transparency, we are ignoring a big user community. Although XAI research has advanced over time, there has not been much study of the development, evaluation, and application of explainability methodologies in the Global South. In this paper, we focus on Bangladesh, which is a part of the South, to understand the AI user community of this region and show how explainability needs differ across users and whom XAI should focus on. Our work reflects on the unique needs and constraints of the region and recommends potential directions for accessible and human-centered explainability research. We argue that before developing technology and systems, human requirements should be assessed and comprehended.

In What Languages are Generative Language Models the Most Formal? Analyzing Formality Distribution across Languages
Authors : Asim Ersoy, Gerson Vizcarra, Tasmiah Tahsin Mayeesha & Benjamin Muller
In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 2650–2666, Singapore. Association for Computational Linguistics. Presented at the 3rd Multilingual Representation Learning (MRL) Workshop, co-located with EMNLP in Singapore, Dec 7, 2023.

Paper

Multilingual generative language models (LMs) are increasingly fluent in a large variety of languages. Trained on the concatenation of corpora in multiple languages, they enable powerful transfer from high-resource languages to low-resource ones. However, it is still unknown what cultural biases are induced in the predictions of these models. In this work, we focus on one language property highly influenced by culture: formality. We analyze the formality distributions of XGLM and BLOOM's predictions, two popular generative multilingual language models, in 5 languages. We classify 1,200 generations per language as formal, informal, or incohesive and measure the impact of the prompt formality on the predictions. Overall, we observe a diversity of behaviors across the models and languages. For instance, XGLM generates informal text in Arabic and Bengali when conditioned with informal prompts, much more than BLOOM. In addition, even though both models are highly biased toward the formal style when prompted neutrally, we find that the models generate a significant amount of informal predictions even when prompted with formal text. We release with this work 6,000 annotated samples, paving the way for future work on the formality of generative multilingual LMs.

Visual Question Generation in Bengali
Authors : Mahmud Hasan, Labiba Islam, Jannatul Ruma, Tasmiah Tahsin Mayeesha & Rashedur Rahman
In Proceedings of the Workshop on Multimodal, Multilingual Natural Language Generation and Multilingual WebNLG Challenge (MM-NLG 2023), pages 10–19, Prague, Czech Republic. Association for Computational Linguistics, 2023.

Paper

The task of Visual Question Generation (VQG) is to generate human-like questions relevant to the given image. As VQG is an emerging research field, existing works tend to focus only on resource-rich languages such as English due to the availability of datasets. In this paper, we propose the first Bengali Visual Question Generation task and develop a novel transformer-based encoder-decoder architecture that generates questions in Bengali when given an image. We propose multiple variants of models - (i) image-only: baseline model of generating questions from images without additional information, (ii) image-category and image-answer-category: guided VQG where we condition the model to generate questions based on the answer and the category of expected question. These models are trained and evaluated on the translated VQAv2.0 dataset. Our quantitative and qualitative results establish the first state-of-the-art models for the VQG task in Bengali and demonstrate that our models are capable of generating grammatically correct and relevant questions. Our quantitative results show that our image-cat model achieves a BLEU-1 score of 33.12 and a BLEU-3 score of 7.56, which is higher than the other two variants. We also perform a human evaluation to assess the quality of the generation tasks. Human evaluation suggests that the image-cat model is capable of generating goal-driven and attribute-specific questions and also stays relevant to the corresponding image.

Transformer Based Answer-Aware Bengali Question Generation.
Authors : Jannatul Ferdous Ruma, Tasmiah Tahsin Mayeesha & Rashedur M. Rahman.
International Journal of Cognitive Computing in Engineering, Volume 4, 2023, Pages 314-326, ISSN 2666-3074.

Model / Paper

Question generation (QG), the task of generating questions from text or other forms of data, a significant and challenging subject, has recently attracted more attention in natural language processing (NLP) due to its vast range of business, healthcare, and education applications through creating quizzes, Frequently Asked Questions (FAQs) and documentation. Most QG research has been conducted in languages with abundant resources, such as English. However, due to the dearth of training data in low-resource languages, such as Bengali, thorough research on Bengali question generation has yet to be conducted. In this article, we propose a system for producing varied and pertinent Bengali questions from context passages in natural language in an answer-aware input format using a series of fine-tuned text-to-text transformer (T5) based models. During our studies with various transformer-based encoder-decoder models and various decoding processes, along with delivering 98% grammatically accurate questions, our fine-tuned BanglaT5 model had the highest 35.77 F-score in RougeL and 38.57 BLEU-1 score with beam search. Our automated and human evaluation results show that our answer-aware QG models can create realistic, human-like questions relevant to the context passage and answer. We also release our code, generated questions, dataset, and models to enable broader question generation research for the Bengali-speaking community.

Making ethics at home in Global CS Education: Provoking stories from the Souths
Authors : Marisol Wong-Villacres, Cat Kutay, Shaimaa Lazem, Nova Ahmed, Cristina Abad, Cesar Collazos, Shady Elbassuoni, Farzana Islam, Deepa Singh, Tasmiah Tahsin Mayeesha, Martin Mabeifam Ujakpa, Tariq Zaman & Nicola J Bidwell
ACM Journal on Computing and Sustainable Societies, 2023, Best Journal Paper Award.

Paper

University courses and curricula on the ethics of computing are increasing, yet there are few studies about how CS programs should account for the diverse ways ethical dilemmas and approaches to ethics are situated in cultural, philosophical and governance systems, religions and languages. This paper seeks to prompt conversations about CS education that accounts for ethics in the Global Souths. We draw on the experiences and insights of 46 university educators and 9 practitioners, in Latin America, South Asia, Africa, the Middle East and Australian First Nations. Our modest study sought to inform revisions of the ACM’s international curricular guidelines for the Society, Ethics and Professionalism knowledge area in undergraduate CS programs. Participants’ responses in surveys and interviews illustrate difficulties in translating regional and local practices, explicit or implicit values and the changing impacts of technologies, into a singular vocabulary about ethics, such as formal ethical Codes of professional conduct. They illustrate opportunities for university teaching, and allied learning activities, to link more closely to students’ priorities, actions and experiences in the Global Souths and enrich students’ education in the Global North.

Deep learning based question answering system in Bengali
Authors : Tasmiah Tahsin Mayeesha, Abdullah Md Sarwar & Rashedur M Rahman
Journal of Information and Telecommunication, 5:2, 145-178, 2021.

Paper / Dataset

Recent advances in the field of natural language processing have improved state-of-the-art performance on many tasks, including question answering for languages like English. The Bengali language is ranked seventh and is spoken by about 300 million people all over the world. But due to a lack of data and active research on QA, similar progress has not been achieved for Bengali. Unlike English, there is no large-scale benchmark QA dataset collected for Bengali, no pretrained language model that can be modified for Bengali question answering, and no human baseline score for QA has been established either. In this work we use state-of-the-art transformer models to train a QA system on a synthetic reading comprehension dataset translated from one of the most popular benchmark datasets in English, SQuAD 2.0. We collect a smaller human-annotated QA dataset from Bengali Wikipedia with popular topics from Bangladeshi culture for evaluating our models. Finally, we compare our models with human children to set up a benchmark score using survey experiments.

Applying Text Mining to Protest Stories as Voice against Media Censorship.
Authors : Tasmiah Tahsin Mayeesha, Zareen Tasneem, Jasmin Jones & Nova Ahmed.
ACM Conference on Computer-Supported Co-operative Work and Social Computing, Solidarity Across Borders Workshop, 2018

Paper

Data-driven activism attempts to collect, analyze and visualize data to foster social change. However, during media censorship it is often impossible to collect such data. Here we demonstrate that data from personal stories can also help us gain insights about protests and activism, which can work as a voice for the activists.

Projects

Bengali Automatic Speech Recognition System
Speech to text model for Bengali language, Huggingface Robust Speech Event, 2022

Huggingface Speech Bench / Model

Fine-tuned a Wav2vec2-xls-r model on the OpenSLR Bangla speech dataset of 200k+ samples; it was recognized as one of the best-performing models for Bangla in the Huggingface Robust Speech Sprint.

Bengali GPT2 model
Part of Huggingface Flax-Jax Event, 2021

Model

OpenAI's large GPT-2 model was proposed in the paper Language Models are Unsupervised Multitask Learners. The original GPT-2 model was a causal (unidirectional) transformer pretrained using language modeling on a very large corpus of ~40 GB of text data. This model has the same configuration but has been pretrained on the Bengali corpus of the mC4 (multilingual C4) dataset. It also features another variant fine-tuned on Bengali song lyrics.

Credit Card Recommender System
Software Engineering Course, 2018 / Code

Developed a similarity-based card recommender model using geolocation and card-specific features with a dataset collected from Bangladeshi banks. Used scikit-learn for modelling and deployed it with a Django web app and a Google Dialogflow based chatbot.

Dobhashi - English Bangla Machine Translation
Natural Language Processing Course, 2018 / Report

Architected English-Bangla machine translation models based on LSTM and transformer architectures, trained on the SUPARA Benchmark Bangla-English corpus. The best-performing model achieves a BLEU score of 46.

News Article Network Visualization on violence against women
Interactive Network

This project explores media coverage of harassment and violence against women, including rape and murder related cases. It was done with the help of KolpoKoushol, an initiative by MIT alumni from Bangladesh to gather people from many fields for learning about interdisciplinary ideas. This project has been featured by Fast.ai. See: Deep Learning, Not just for Silicon Valley.

Blog Posts

Classifying Bangla Fake News with HuggingFace Transformers and Fastai

Google Summer of Code 19 with TensorFlow Hub

Building a Credit Card Recommender

Google Summer of Code 2018 : Network Visualization Of MediaCloud Topic Network

Multi class Fish Classification on Images using Transfer Learning and Keras

Recommending Animes Using Nearest Neighbors

Honors/Awards

Best Journal Paper Award, ACM Compass 2023

Humayun Ahmed Research Fellowship, NSU HCI DIAL Lab, 2023

Weights & Biases Fastai x Huggingface study group blog competition, winning submission, 2020
Code

Secure and Private AI Scholarship Challenge, Udacity-Facebook, 2019

AWS Machine Learning Scholarship, Udacity-Amazon, 2018

Fast.ai International Fellowship, 2018. Featured in the Forbes article Artificial Intelligence Education Transforms The Developing World and in the Fast.ai post Deep Learning, Not just for Silicon Valley.

Udacity Machine Learning Nanodegree, 2017. Capstone project on multi-class image classification on fishery images. Code.

Mentorship

Bengali NLP: Application in Literature and Natural Language Generation Project (2023): Mentoring two graduated research assistants on Bengali Visual Question Generation research.

My Freedom in Light (2023): Mentoring undergraduate research assistants in report writing and literature review.

Posters, Invited Talks & Presentations

Delegate, The Inaugural Conference of the International Association for Safe and Ethical AI, OECD Headquarters, Paris, 2024

Panelist, AI and Social Balance - The Current Landscape, ITU Punjab, Pakistan, 2024

Panelist, AI - Women's Risks and Opportunities in Bangladesh, Naripokhkho, 2023.

Goethe-Institut and HerStory Foundation - Presentation on AI Ethics Articles, 2022.

NLP Reading Group Dhaka- Language Models are Few-Shot Learners Paper Presentation, 2022.

W&B Study Group: fastai w/ Hugging Face Demo Day, 2022

Breaking into research for undergraduate students - Free Schooling Bangladesh, 2021

Udacity School Of Artificial Intelligence Open House, 2020

Template reference - Jon Barron