Tahsin Mayeesha
Hi! I am currently working as a researcher in the NSU HCI DIAL (Design and Inclusion) Lab
on a Google-funded project researching the education-related barriers and challenges faced by South Asian women in computing. Concurrently, I'm also managing
three Bengali NLP projects on generative models for question answering as a senior Research Assistant. I have worked as a predoctoral
fellow in the Fatima Fellowship on an NLP project with mentor Benjamin Muller,
investigating cultural biases such as formality in multilingual generative models.
During my undergraduate studies at North South University (NSU)
in Bangladesh, I got into research
when I was advised by Dr. Nova Ahmed and Prof. Rashedur M Rahman.
I graduated from NSU with a major in Computer Science and Engineering in Fall 2020. My thesis project was on building deep learning models for question answering systems
in Bengali, where I trained multilingual BERT models on synthetic data. My research experience so far spans NLP, AI Ethics/Policy, and HCI.
Previously, I worked with the TensorFlow Hub team for Google Summer of Code 2019 with mentor Vojtech Bardiovský,
with the Berkman Klein Center for Internet & Society with mentor Hal Roberts for Google
Summer of Code 2018, and with Cramstack in 2017.
In my free time, I like to watch anime, read manga or books, and take care of my cats.
Email / LinkedIn / Google Scholar / GitHub / Twitter
|
|
|
In What Languages are Generative Language Models the Most Formal? Analyzing Formality Distribution across Languages
Authors: Asim Ersoy, Gerson Vizcarra, Tasmiah Tahsin Mayeesha & Benjamin Muller
Accepted to Findings of EMNLP 2023. Presented at the 3rd Multilingual Representation Learning Workshop, EMNLP 2023.
Preprint
Multilingual generative language models (LMs) are increasingly fluent in a large variety of languages. Trained on the concatenation
of corpora in multiple languages, they enable powerful transfer from high-resource languages to low-resource ones. However, it is still unknown
what cultural biases are induced in the predictions of these models. In this work, we focus on one language property highly influenced by
culture: formality. We analyze the formality distributions of the predictions of XGLM and BLOOM, two popular generative multilingual language models, in 5 languages.
We classify 1,200 generations per language as formal, informal, or incohesive and measure the impact of the prompt formality on the predictions.
Overall, we observe a diversity of behaviors across the models and languages. For instance, XGLM generates informal text in Arabic and Bengali when conditioned
with informal prompts, much more than BLOOM. In addition, even though both models are highly biased toward the formal style when prompted neutrally, we find that
the models generate a significant amount of informal predictions even when prompted with formal text. With this work, we release 6,000 annotated samples,
paving the way for future work on the formality of generative multilingual LMs.
|
|
Visual Question Generation in Bengali
Authors: Mahmud Hasan, Labiba Islam, Jannatul Ruma, Tasmiah Tahsin Mayeesha & Rashedur Rahman
In Proceedings of the Workshop on Multimodal, Multilingual Natural Language Generation and Multilingual WebNLG Challenge (MM-NLG 2023), pages 10–19, Prague, Czech Republic. Association for Computational Linguistics, 2023
Paper
The task of Visual Question Generation (VQG) is to generate human-like questions relevant to
a given image. As VQG is an emerging research field, existing works tend to focus only on resource-rich languages such as English due
to the availability of datasets. In this paper, we propose the first Bengali Visual Question Generation task and develop a novel transformer-based
encoder-decoder architecture that generates questions in Bengali when given an image. We propose multiple model variants: (i) image-only, a baseline model
that generates questions from images without additional information, and (ii) image-category and image-answer-category, guided VQG models where we condition
the model to generate questions based on the answer and the category of the expected question. These models are trained and evaluated on the
translated VQAv2.0 dataset. Our quantitative and qualitative results establish the first state-of-the-art models for the VQG task in Bengali and
demonstrate that our models are capable of generating grammatically correct and relevant questions. Our quantitative results show that
our image-cat model achieves a BLEU-1 score of 33.12 and a BLEU-3 score of 7.56, higher than the other two variants. We also
perform a human evaluation to assess the quality of the generation tasks. Human evaluation suggests that the image-cat model is capable of
generating goal-driven and attribute-specific questions that also stay relevant to the corresponding image.
|
|
Transformer Based Answer-Aware Bengali Question Generation.
Authors: Jannatul Ferdous Ruma, Tasmiah Tahsin Mayeesha & Rashedur M. Rahman.
International Journal of Cognitive Computing in Engineering, Volume 4, 2023, Pages 314-326, ISSN 2666-3074.
Model / Paper
Question generation (QG), the task of generating questions from text or other forms of data, is a significant and challenging subject that
has recently attracted more attention in natural language processing (NLP) due to its vast range of business, healthcare, and education applications, such as creating quizzes,
Frequently Asked Questions (FAQs), and documentation. Most QG research has been conducted in languages with abundant resources, such as English. However, due to the dearth
of training data in low-resource languages such as Bengali, thorough research on Bengali question generation has yet to be conducted. In this article, we propose a system for
producing varied and pertinent Bengali questions from natural-language context passages in an answer-aware input format, using a series of fine-tuned text-to-text transformer (T5)
based models. In our experiments with various transformer-based encoder-decoder models and decoding strategies, our fine-tuned BanglaT5 model with beam search achieved the highest scores,
a ROUGE-L F-score of 35.77 and a BLEU-1 score of 38.57, while producing 98% grammatically accurate questions. Our automated and human evaluation results show that our answer-aware
QG models can create realistic, human-like questions relevant to the context passage and answer. We also release our code, generated questions, dataset, and models to enable broader question generation research for the Bengali-speaking community.
|
|
Making ethics at home in Global CS Education: Provoking stories from the Souths
Authors: Marisol Wong-Villacres, Cat Kutay, Shaimaa Lazem, Nova Ahmed, Cristina Abad, Cesar Collazos, Shady Elbassuoni, Farzana Islam, Deepa Singh, Tasmiah Tahsin Mayeesha, Martin Mabeifam Ujakpa, Tariq Zaman & Nicola J Bidwell
ACM Journal on Computing and Sustainable Societies, 2023, Best Journal Paper Award.
University courses and curricula on the ethics of computing are increasing, yet there are few studies about how CS programs should
account for the diverse ways ethical dilemmas and approaches to ethics are situated in cultural, philosophical and governance systems,
religions and languages. This paper seeks to prompt conversations about CS education that accounts for ethics in the Global Souths. We
draw on the experiences and insights of 46 university educators and 9 practitioners in Latin America, South Asia, Africa, the Middle East
and Australian First Nations. Our modest study sought to inform revisions of the ACM’s international curricular guidelines for the Society,
Ethics and Professionalism knowledge area in undergraduate CS programs. Participants’ responses in surveys and interviews illustrate
difficulties in translating regional and local practices, explicit or implicit values and the changing impacts of technologies, into a singular
vocabulary about ethics, such as formal ethical Codes of professional conduct. They illustrate opportunities for university teaching, and
allied learning activities, to link more closely to students’ priorities, actions and experiences in the Global Souths and enrich students’
education in the Global North.
|
|
Deep learning based question answering system in Bengali
Authors: Tasmiah Tahsin Mayeesha, Abdullah Md Sarwar & Rashedur M Rahman
Journal of Information and Telecommunication, 5:2, 145-178, 2021
Paper /
Dataset
Recent advances in the field of natural language processing have improved state-of-the-art performance on many tasks, including question answering (QA), for
languages like English. Bengali is the seventh most spoken language in the world, with about 300 million speakers. However, due to a lack of data and active
research on QA, similar progress has not been achieved for Bengali. Unlike English, Bengali has no large-scale benchmark QA dataset, no pretrained
language model that can be adapted for Bengali question answering, and no established human baseline score for QA. In this work, we use state-of-the-art
transformer models to train a QA system on a synthetic reading comprehension dataset translated from SQuAD 2.0, one of the most popular benchmark datasets in English.
We collect a smaller human-annotated QA dataset from Bengali Wikipedia, covering popular topics from Bangladeshi culture, to evaluate our models.
Finally, we compare our models with human children through survey experiments to set up a benchmark score.
|
|
Applying Text Mining to Protest Stories as Voice against Media Censorship.
Authors: Tasmiah Tahsin Mayeesha, Zareen Tasneem, Jasmin Jones & Nova Ahmed.
ACM Conference on Computer-Supported Co-operative Work and Social Computing, Solidarity Across Borders Workshop, 2018
Paper /
Data-driven activism attempts to collect, analyze, and visualize data to foster social change. However, during media
censorship it is often impossible to collect such data. Here we demonstrate that data from personal stories can also help us
gain insights about protests and activism, and can serve as a voice for the activists.
|
|
Credit Card Recommender System
Software Engineering Course, 2018
Code
Developed a similarity-based card recommender model using geolocation and card-specific features, with a dataset collected from Bangladeshi banks. Used scikit-learn for
modelling and deployed the model with a Django web app and a Google Dialogflow-based chatbot.
|
|
Dobhashi - English Bangla Machine Translation
Natural Language Processing Course, 2018
Report
Architected English-Bangla machine translation models based on LSTM and Transformer architectures,
trained on the SUPARA benchmark Bangla-English corpus. The best-performing model achieves a
BLEU score of 46.
|
Honors/Awards
Humayun Ahmed Research Fellowship, NSU HCI DIAL Lab, 2023
Weights & Biases fastai x Hugging Face study group blog competition, winning submission, 2020
Code
Secure and Private AI Scholarship Challenge, Udacity-Facebook, 2019
AWS Machine Learning Scholarship, Udacity-Amazon, 2018
Fast.ai International Fellowship, 2018.
Featured in the Forbes article "Artificial Intelligence Education Transforms The Developing World" and in "Deep Learning, not just for Silicon Valley"
Udacity Machine Learning Nanodegree, 2017. Capstone project on multi-class image classification on fishery images. Code.
|
Mentorship
Bengali NLP: Application in Literature and Natural Language Generation Project (2023): Mentoring two graduated research assistants on Bengali Visual Question Generation research.
My Freedom in Light (2023): Mentoring undergraduate research assistants in report writing and literature reviews.
|
Template stolen from Jon Barron! Thanks for dropping by.
Last updated March 2023.
|