Understanding Bengali-speaking consumers' perspectives through sentiment analysis of Bangla e-commerce product reviews is significant for making better business decisions. In this research, a lightweight transformer architecture, BanglaLiteFormer, is proposed to investigate sentiment analysis on Bangla e-commerce product reviews.
We collected an existing dataset of 1,631 reviews. To overcome dataset size constraints, we performed data augmentation strategies, including template-based generation; neural paraphrasing with pretrained multilingual models (mT5, mBART, and BanglaT5); back-translation techniques; and supervised fine-tuning of LLMs with semantic filtering. Our augmentation pipeline expanded the dataset to 4,370 samples.
BanglaLiteFormer combines token and positional embeddings, multi-head self-attention mechanisms, and hybrid pooling strategies to capture both explicit and implicit attitudes in user-generated content. We achieved 97.00% test accuracy with near-perfect prediction confidence (99.48%), outperforming Bangla-BERT, GRU, LSTM, and transformer ensemble methods.
To overcome data scarcity in the low-resource Bangla setting, we developed a four-stage hybrid synthetic data generation framework:
- **Template-based generation**: polarity-specific lexical markers combined with authentic review fragments for controlled concatenation.
- **Neural paraphrasing**: mT5, mBART, and BanglaT5 with top-k and nucleus sampling for linguistic variability.
- **Back-translation**: Bangla → English → Bangla via NMT models, generating semantically equivalent but linguistically diverse samples.
- **LLM fine-tuning**: Qwen 2.5–7B and Bangla-GPT fine-tuned to generate realistic reviews, filtered by semantic similarity.
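The final stage keeps only generated reviews that stay semantically close to authentic seed reviews. A minimal sketch of such a filter, assuming precomputed sentence embeddings (the toy 3-d vectors, the `filter_by_similarity` helper, and the 0.7 threshold are illustrative, not the paper's exact setup):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def filter_by_similarity(candidates, seed_embeddings, threshold=0.7):
    """Keep generated samples whose embedding is close enough to
    at least one authentic seed review (hypothetical threshold)."""
    kept = []
    for text, emb in candidates:
        if any(cosine_similarity(emb, s) >= threshold for s in seed_embeddings):
            kept.append(text)
    return kept

# Toy 3-d embeddings standing in for real sentence encodings.
seeds = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
candidates = [
    ("good product", [0.9, 0.1, 0.0]),    # close to a seed -> kept
    ("unrelated text", [0.0, 0.0, 1.0]),  # far from all seeds -> dropped
]
print(filter_by_similarity(candidates, seeds))  # ['good product']
```

In practice the embeddings would come from a multilingual sentence encoder, and the threshold would be tuned so that off-topic or sentiment-flipped generations are discarded.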
A domain-specific preprocessing pipeline tailored to Bangla's morphological and orthographic properties, including Unicode normalization, emoji removal, and stopword filtering.
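A minimal sketch of the kind of cleaning such a pipeline performs; the stopword set and emoji ranges here are illustrative placeholders, not the paper's full resources:

```python
import re
import unicodedata

# Illustrative Bangla stopwords; the paper's full list is not reproduced here.
BANGLA_STOPWORDS = {"এবং", "ও", "যে", "এই", "তা"}

# A few common emoji blocks; a production pipeline would cover more ranges.
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001FAFF\U00002700-\U000027BF]",
    flags=re.UNICODE,
)

def preprocess(text):
    """Normalize Unicode to NFC, strip emoji, and drop stopwords."""
    text = unicodedata.normalize("NFC", text)
    text = EMOJI_PATTERN.sub("", text)
    tokens = [t for t in text.split() if t not in BANGLA_STOPWORDS]
    return " ".join(tokens)

print(preprocess("পণ্যটি ভালো এবং দাম কম 😊"))  # -> "পণ্যটি ভালো দাম কম"
```

NFC normalization matters for Bangla because the same grapheme cluster can be encoded with different codepoint sequences, which would otherwise fragment the vocabulary.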
A purposefully lightweight single-block transformer designed to balance representational capacity against overfitting risk on small, domain-specific datasets: roughly 150,000 parameters versus Bangla-BERT's roughly 110M, with a 27× lower generalization error bound.
Combines global average pooling, which captures overall sentiment consistency, with global max pooling, which highlights prominent emotional signals; the hybrid outperforms either strategy alone.
With d_k = 32 and h = 2 heads, a forward pass requires 409,600 attention operations versus 58,982,400 for BERT-base, a 144× reduction that enables real-time deployment.
Theoretically justified by statistical learning theory: the generalization error bound is roughly 27× lower than Bangla-BERT's at our dataset size.
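The single-block design with hybrid pooling can be sketched in NumPy. The d_k = 32 and h = 2 values come from the text; d_model = h × d_k = 64, the sequence length of 80, and the random weights are illustrative assumptions, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_heads, d_k, seq_len = 64, 2, 32, 80  # d_k, h from the paper; rest assumed

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o):
    """Scaled dot-product self-attention over n_heads heads."""
    heads = []
    for h in range(n_heads):
        q, k, v = x @ w_q[h], x @ w_k[h], x @ w_v[h]   # each (seq_len, d_k)
        scores = softmax(q @ k.T / np.sqrt(d_k))        # (seq_len, seq_len)
        heads.append(scores @ v)
    return np.concatenate(heads, axis=-1) @ w_o         # back to (seq_len, d_model)

def hybrid_pool(x):
    """Concatenate global average pooling and global max pooling."""
    return np.concatenate([x.mean(axis=0), x.max(axis=0)])  # (2 * d_model,)

# Random weights stand in for trained parameters.
w_q, w_k, w_v = (rng.normal(size=(n_heads, d_model, d_k)) for _ in range(3))
w_o = rng.normal(size=(n_heads * d_k, d_model))
x = rng.normal(size=(seq_len, d_model))  # token + positional embeddings

features = hybrid_pool(multi_head_self_attention(x, w_q, w_k, w_v, w_o))
print(features.shape)  # (128,)
```

The pooled 128-dimensional feature vector would then feed a small classification head; the real model adds trained embeddings, layer normalization, and a feed-forward sublayer omitted here for brevity.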
Comparison with baselines on the dataset of Ahmed et al.:

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| GRU (Word2Vec) [Ahmed et al.] | 95.72% | 95.74% | 95.72% | 95.72% |
| BiLSTM (Word2Vec) [Ahmed et al.] | 95.24% | 95.26% | 95.24% | 95.24% |
| Bangla-BERT [Ahmed et al.] | 96.20% | 96.26% | 96.20% | 96.20% |
| BanglaLiteFormer (Ours) | 97.00% | 97.00% | 97.00% | 97.00% |
Comparison with the transformer baselines of Hoque et al.:

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| mBERT [Hoque et al.] | 93.83% | 93.90% | 93.83% | 93.83% |
| XLM-R-base [Hoque et al.] | 95.14% | 95.25% | 95.14% | 95.14% |
| BanglaBERT [Hoque et al.] | 96.33% | 96.32% | 96.33% | 96.32% |
| Transformer Ensemble [Hoque et al.] | 96.68% | 96.77% | 96.68% | 96.68% |
| BanglaLiteFormer (Ours) | 97.00% | 97.00% | 97.00% | 97.00% |
Training versus validation accuracy across models:

| Model | Train Acc. | Val Acc. |
|---|---|---|
| GRU [Ahmed et al.] | 98.70% | 96.17% |
| BiLSTM [Ahmed et al.] | 98.39% | 95.64% |
| BanglaBERT [Ahmed et al.] | 99.97% | 96.44% |
| mBERT [Hoque et al.] | 99.93% | 94.52% |
| BanglaBERT [Hoque et al.] | 99.97% | 95.95% |
| XLM-R-base [Hoque et al.] | 99.83% | 95.48% |
| Transformer Ensemble [Hoque et al.] | 99.97% | 96.43% |
| BanglaLiteFormer (Ours) | 100.00% | 98.57% |
An interactive Gradio-powered web application that deploys BanglaLiteFormer for real-time sentiment classification. Type or select a Bangla e-commerce review and watch the model analyze it in milliseconds.
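At its core the demo wraps inference in a function that returns a label, a confidence score, a latency measurement, and an uncertainty flag. A framework-agnostic sketch with a stub classifier standing in for the trained model (the toy heuristic and the 0.6 threshold are assumptions; in the real app this function would be passed to a Gradio interface):

```python
import time

def stub_model(text):
    """Placeholder for the trained BanglaLiteFormer: returns class
    probabilities for (negative, positive). Toy heuristic only."""
    score = 0.95 if "ভালো" in text else 0.10
    return [1.0 - score, score]

def predict(review, threshold=0.6):
    """Classify a review; flag low-confidence outputs as uncertain."""
    start = time.perf_counter()
    probs = stub_model(review)
    latency_ms = (time.perf_counter() - start) * 1000.0
    label = "positive" if probs[1] >= probs[0] else "negative"
    confidence = max(probs)
    return {
        "label": label,
        "confidence": confidence,
        "uncertain": confidence < threshold,
        "latency_ms": latency_ms,
    }

print(predict("পণ্যটি খুব ভালো"))
```

Returning the latency alongside the prediction is what lets the demo display per-request inference time, and the uncertainty flag surfaces borderline reviews rather than silently committing to a label.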
This paper presented BanglaLiteFormer, a comprehensive and efficient framework for sentiment analysis of Bangla e-commerce reviews designed specifically for low-resource settings. By combining multi-strategy data augmentation, domain-specific preprocessing, and a purposefully lightweight transformer architecture, we demonstrated that state-of-the-art performance is achievable even with limited annotated training data.
Our model attained 97.0% test accuracy with balanced F1 scores across both sentiment classes, outperforming well-established architectures including Bangla-BERT, bidirectional GRU, bidirectional LSTM, and a five-model transformer ensemble, all while maintaining approximately 150,000 parameters compared to Bangla-BERT's 110 million. The 99.48% average prediction confidence further validates that the model produces well-calibrated, reliable outputs on unseen user-generated content.
A key theoretical insight of this work is that generalization error scales with the ratio of parameters to training samples. Our single-block architecture achieves a roughly 27× lower generalization error bound than Bangla-BERT on the same dataset, which directly explains its superior practical performance despite fewer parameters. This finding has broad implications for NLP in low-resource language settings beyond Bangla.
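The 27× and 144× figures are quick arithmetic consequences of the numbers above, if one assumes the generalization gap scales as sqrt(P/n) for P parameters and n training samples (a common shape for capacity-based bounds; the dataset size cancels in the ratio):

```python
import math

p_lite, p_bert = 150_000, 110_000_000  # parameter counts from the text
n = 4_370                              # augmented dataset size

# Assumed bound shape: generalization gap ~ sqrt(P / n).
bound_lite = math.sqrt(p_lite / n)
bound_bert = math.sqrt(p_bert / n)
print(round(bound_bert / bound_lite))  # -> 27

# Attention-operation counts from the text.
ops_lite, ops_bert = 409_600, 58_982_400
print(ops_bert // ops_lite)            # -> 144
```

Under this assumption the ratio depends only on the parameter counts, so the 27× advantage holds at any fixed dataset size.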
Hybrid augmentation pipeline — template synthesis, neural paraphrasing (mT5, mBART, BanglaT5), back-translation, and LLM fine-tuning (Qwen 2.5–7B, Bangla-GPT) expanded 1,631 samples to 4,370 while preserving sentiment fidelity.
Domain-specific preprocessing — a custom Bangla pipeline handling Unicode normalization, emoji removal, and stopword filtering tailored to e-commerce register.
BanglaLiteFormer architecture — single transformer block with hybrid avg+max pooling, providing 144× less attention computation than BERT-base while exceeding its performance on this task.
Theoretical justification — statistical learning theory formalization of why lightweight architectures outperform over-parameterized models on small domain-specific datasets.
Deployable Gradio application — an end-to-end real-time inference pipeline with live latency measurement, confidence scoring, and uncertainty flagging.
Binary classification only — the model assigns a single polarity label and cannot capture neutral, mixed, or nuanced multi-class sentiment gradations.
Synthetic augmentation risks — despite semantic filtering, neural paraphrasing may introduce subtle domain inconsistencies or amplify existing biases.
Single domain — trained on e-commerce product reviews; generalization to political text, news, or social media is not validated.
No aspect-level analysis — the model classifies the entire review without identifying which specific aspects (price, quality, delivery) are positive or negative.
Sarcasm and irony — the architecture has no pragmatic reasoning and is susceptible to misclassifying ironic positive-surface text.
Aspect-based sentiment analysis — extending BanglaLiteFormer to identify and score sentiment at the aspect level (product quality, delivery speed, pricing).
Emotion detection — moving beyond binary polarity to multi-class emotion labels (joy, anger, disappointment, surprise) for richer customer insight.
Cross-lingual transfer — leveraging multilingual pre-training to extend the framework to other low-resource South and Southeast Asian languages.
Code-mixed robustness — dedicated handling of Banglish (Bangla–English code-switching) which is pervasive in real social media and e-commerce content.
Larger annotated corpora — multi-source dataset construction with aspect-level and emotion annotations to push performance boundaries further.