Understanding Bengali-speaking consumers' perspectives through sentiment analysis of Bangla e-commerce product reviews is significant for making better business decisions. In this research, a lightweight transformer architecture, BanglaLiteFormer, is proposed to investigate sentiment analysis on Bangla e-commerce product reviews.
We collected an existing dataset of 1,631 reviews. To overcome dataset size constraints, we performed data augmentation strategies, including template-based generation; neural paraphrasing with pretrained multilingual models (mT5, mBART, and BanglaT5); back-translation techniques; and supervised fine-tuning of LLMs with semantic filtering. Our augmentation pipeline expanded the dataset to 4,370 samples.
BanglaLiteFormer combines token and positional embeddings, multi-head self-attention mechanisms, and hybrid pooling strategies to capture both explicit and implicit attitudes in user-generated content. We achieved 97.00% test accuracy with near-perfect prediction confidence (99.48%), outperforming Bangla-BERT, GRU, LSTM, and transformer ensemble methods.
To overcome data scarcity in the low-resource Bangla setting, we developed a four-stage hybrid synthetic data generation framework:
- **Template-based generation**: polarity-specific lexical markers combined with authentic review fragments for controlled concatenation.
- **Neural paraphrasing**: mT5, mBART, and BanglaT5 with top-k and nucleus sampling for linguistic variability.
- **Back-translation**: Bangla → English → Bangla via NMT models, generating semantically equivalent but linguistically diverse samples.
- **LLM fine-tuning**: Qwen 2.5–7B and Bangla-GPT fine-tuned to generate realistic reviews, filtered by semantic similarity.
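The final stage keeps only generated reviews that stay semantically close to authentic seed reviews. A minimal sketch of such a filter, assuming precomputed sentence embeddings (the toy 3-d vectors, the `filter_by_similarity` helper, and the 0.7 threshold are illustrative, not the paper's exact setup):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def filter_by_similarity(candidates, seed_embeddings, threshold=0.7):
    """Keep generated samples whose embedding is close enough to
    at least one authentic seed review (hypothetical threshold)."""
    kept = []
    for text, emb in candidates:
        if any(cosine_similarity(emb, s) >= threshold for s in seed_embeddings):
            kept.append(text)
    return kept

# Toy 3-d embeddings standing in for real sentence encodings.
seeds = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
candidates = [
    ("good product", [0.9, 0.1, 0.0]),    # close to a seed -> kept
    ("unrelated text", [0.0, 0.0, 1.0]),  # far from all seeds -> dropped
]
print(filter_by_similarity(candidates, seeds))  # ['good product']
```

In practice the embeddings would come from a multilingual sentence encoder, and the threshold would be tuned so that off-topic or sentiment-flipped generations are discarded.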
A domain-specific preprocessing pipeline tailored to Bangla's morphological and orthographic properties, including Unicode normalization, emoji removal, and stopword filtering.
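A minimal sketch of the kind of cleaning such a pipeline performs; the stopword set and emoji ranges here are illustrative placeholders, not the paper's full resources:

```python
import re
import unicodedata

# Illustrative Bangla stopwords; the paper's full list is not reproduced here.
BANGLA_STOPWORDS = {"এবং", "ও", "যে", "এই", "তা"}

# A few common emoji blocks; a production pipeline would cover more ranges.
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001FAFF\U00002700-\U000027BF]",
    flags=re.UNICODE,
)

def preprocess(text):
    """Normalize Unicode to NFC, strip emoji, and drop stopwords."""
    text = unicodedata.normalize("NFC", text)
    text = EMOJI_PATTERN.sub("", text)
    tokens = [t for t in text.split() if t not in BANGLA_STOPWORDS]
    return " ".join(tokens)

print(preprocess("পণ্যটি ভালো এবং দাম কম 😊"))  # -> "পণ্যটি ভালো দাম কম"
```

NFC normalization matters for Bangla because the same grapheme cluster can be encoded with different codepoint sequences, which would otherwise fragment the vocabulary.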
A purposefully lightweight single-block transformer designed to balance representational capacity against overfitting risk on small, domain-specific datasets: roughly 150,000 parameters versus Bangla-BERT's roughly 110M, with a 27× lower generalization error bound.
Combines global average pooling, which captures overall sentiment consistency, with global max pooling, which highlights prominent emotional signals; the hybrid outperforms either strategy alone.
With d_k = 32 and h = 2 heads, a forward pass requires 409,600 attention operations versus 58,982,400 for BERT-base, a 144× reduction that enables real-time deployment.
Theoretically justified by statistical learning theory: the generalization error bound is roughly 27× lower than Bangla-BERT's at our dataset size.
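The single-block design with hybrid pooling can be sketched in NumPy. The d_k = 32 and h = 2 values come from the text; d_model = h × d_k = 64, the sequence length of 80, and the random weights are illustrative assumptions, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_heads, d_k, seq_len = 64, 2, 32, 80  # d_k, h from the paper; rest assumed

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o):
    """Scaled dot-product self-attention over n_heads heads."""
    heads = []
    for h in range(n_heads):
        q, k, v = x @ w_q[h], x @ w_k[h], x @ w_v[h]   # each (seq_len, d_k)
        scores = softmax(q @ k.T / np.sqrt(d_k))        # (seq_len, seq_len)
        heads.append(scores @ v)
    return np.concatenate(heads, axis=-1) @ w_o         # back to (seq_len, d_model)

def hybrid_pool(x):
    """Concatenate global average pooling and global max pooling."""
    return np.concatenate([x.mean(axis=0), x.max(axis=0)])  # (2 * d_model,)

# Random weights stand in for trained parameters.
w_q, w_k, w_v = (rng.normal(size=(n_heads, d_model, d_k)) for _ in range(3))
w_o = rng.normal(size=(n_heads * d_k, d_model))
x = rng.normal(size=(seq_len, d_model))  # token + positional embeddings

features = hybrid_pool(multi_head_self_attention(x, w_q, w_k, w_v, w_o))
print(features.shape)  # (128,)
```

The pooled 128-dimensional feature vector would then feed a small classification head; the real model adds trained embeddings, layer normalization, and a feed-forward sublayer omitted here for brevity.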
Comparison with baselines on the dataset of Ahmed et al.:

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| GRU (Word2Vec) [Ahmed et al.] | 95.72% | 95.74% | 95.72% | 95.72% |
| BiLSTM (Word2Vec) [Ahmed et al.] | 95.24% | 95.26% | 95.24% | 95.24% |
| Bangla-BERT [Ahmed et al.] | 96.20% | 96.26% | 96.20% | 96.20% |
| BanglaLiteFormer (Ours) | 97.00% | 97.00% | 97.00% | 97.00% |
Comparison with the transformer baselines of Hoque et al.:

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| mBERT [Hoque et al.] | 93.83% | 93.90% | 93.83% | 93.83% |
| XLM-R-base [Hoque et al.] | 95.14% | 95.25% | 95.14% | 95.14% |
| BanglaBERT [Hoque et al.] | 96.33% | 96.32% | 96.33% | 96.32% |
| Transformer Ensemble [Hoque et al.] | 96.68% | 96.77% | 96.68% | 96.68% |
| BanglaLiteFormer (Ours) | 97.00% | 97.00% | 97.00% | 97.00% |
Training versus validation accuracy across models:

| Model | Train Acc. | Val Acc. |
|---|---|---|
| GRU [Ahmed et al.] | 98.70% | 96.17% |
| BiLSTM [Ahmed et al.] | 98.39% | 95.64% |
| BanglaBERT [Ahmed et al.] | 99.97% | 96.44% |
| mBERT [Hoque et al.] | 99.93% | 94.52% |
| BanglaBERT [Hoque et al.] | 99.97% | 95.95% |
| XLM-R-base [Hoque et al.] | 99.83% | 95.48% |
| Transformer Ensemble [Hoque et al.] | 99.97% | 96.43% |
| BanglaLiteFormer (Ours) | 100.00% | 98.57% |
An interactive Gradio-powered web application that deploys BanglaLiteFormer for real-time sentiment classification. Type or select a Bangla e-commerce review and watch the model analyze it in milliseconds.
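At its core the demo wraps inference in a function that returns a label, a confidence score, a latency measurement, and an uncertainty flag. A framework-agnostic sketch with a stub classifier standing in for the trained model (the toy heuristic and the 0.6 threshold are assumptions; in the real app this function would be passed to a Gradio interface):

```python
import time

def stub_model(text):
    """Placeholder for the trained BanglaLiteFormer: returns class
    probabilities for (negative, positive). Toy heuristic only."""
    score = 0.95 if "ভালো" in text else 0.10
    return [1.0 - score, score]

def predict(review, threshold=0.6):
    """Classify a review; flag low-confidence outputs as uncertain."""
    start = time.perf_counter()
    probs = stub_model(review)
    latency_ms = (time.perf_counter() - start) * 1000.0
    label = "positive" if probs[1] >= probs[0] else "negative"
    confidence = max(probs)
    return {
        "label": label,
        "confidence": confidence,
        "uncertain": confidence < threshold,
        "latency_ms": latency_ms,
    }

print(predict("পণ্যটি খুব ভালো"))
```

Returning the latency alongside the prediction is what lets the demo display per-request inference time, and the uncertainty flag surfaces borderline reviews rather than silently committing to a label.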
This paper presented BanglaLiteFormer, a comprehensive and efficient framework for sentiment analysis of Bangla e-commerce reviews designed specifically for low-resource settings. By combining multi-strategy data augmentation, domain-specific preprocessing, and a purposefully lightweight transformer architecture, we demonstrated that state-of-the-art performance is achievable even with limited annotated training data.
Our model attained 97.0% test accuracy with balanced F1 scores across both sentiment classes, outperforming well-established architectures including Bangla-BERT, bidirectional GRU, bidirectional LSTM, and a five-model transformer ensemble, all while maintaining approximately 150,000 parameters compared to Bangla-BERT's 110 million. The 99.48% average prediction confidence further validates that the model produces well-calibrated, reliable outputs on unseen user-generated content.
A key theoretical insight of this work is that generalization error scales with the ratio of parameters to training samples. Our single-block architecture achieves a roughly 27× lower generalization error bound than Bangla-BERT on the same dataset, which directly explains its superior practical performance despite fewer parameters. This finding has broad implications for NLP in low-resource language settings beyond Bangla.
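The 27× and 144× figures are quick arithmetic consequences of the numbers above, if one assumes the generalization gap scales as sqrt(P/n) for P parameters and n training samples (a common shape for capacity-based bounds; the dataset size cancels in the ratio):

```python
import math

p_lite, p_bert = 150_000, 110_000_000  # parameter counts from the text
n = 4_370                              # augmented dataset size

# Assumed bound shape: generalization gap ~ sqrt(P / n).
bound_lite = math.sqrt(p_lite / n)
bound_bert = math.sqrt(p_bert / n)
print(round(bound_bert / bound_lite))  # -> 27

# Attention-operation counts from the text.
ops_lite, ops_bert = 409_600, 58_982_400
print(ops_bert // ops_lite)            # -> 144
```

Under this assumption the ratio depends only on the parameter counts, so the 27× advantage holds at any fixed dataset size.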
Hybrid augmentation pipeline — template synthesis, neural paraphrasing (mT5, mBART, BanglaT5), back-translation, and LLM fine-tuning (Qwen 2.5–7B, Bangla-GPT) expanded 1,631 samples to 4,370 while preserving sentiment fidelity.
Domain-specific preprocessing — a custom Bangla pipeline handling Unicode normalization, emoji removal, and stopword filtering tailored to e-commerce register.
BanglaLiteFormer architecture — single transformer block with hybrid avg+max pooling, providing 144× less attention computation than BERT-base while exceeding its performance on this task.
Theoretical justification — statistical learning theory formalization of why lightweight architectures outperform over-parameterized models on small domain-specific datasets.
Deployable Gradio application — an end-to-end real-time inference pipeline with live latency measurement, confidence scoring, and uncertainty flagging.
Binary classification only — the model assigns a single polarity label and cannot capture neutral, mixed, or nuanced multi-class sentiment gradations.
Synthetic augmentation risks — despite semantic filtering, neural paraphrasing may introduce subtle domain inconsistencies or amplify existing biases.
Single domain — trained on e-commerce product reviews; generalization to political text, news, or social media is not validated.
No aspect-level analysis — the model classifies the entire review without identifying which specific aspects (price, quality, delivery) are positive or negative.
Sarcasm and irony — the architecture has no pragmatic reasoning and is susceptible to misclassifying ironic positive-surface text.
Aspect-based sentiment analysis — extending BanglaLiteFormer to identify and score sentiment at the aspect level (product quality, delivery speed, pricing).
Emotion detection — moving beyond binary polarity to multi-class emotion labels (joy, anger, disappointment, surprise) for richer customer insight.
Cross-lingual transfer — leveraging multilingual pre-training to extend the framework to other low-resource South and Southeast Asian languages.
Code-mixed robustness — dedicated handling of Banglish (Bangla–English code-switching) which is pervasive in real social media and e-commerce content.
Larger annotated corpora — multi-source dataset construction with aspect-level and emotion annotations to push performance boundaries further.