DATA SCIENTIST | FRAUD DETECTION & NLP

MALEHA ISRAT CHOWDHURY

I build fraud detection & NLP systems · 2 Springer-published papers
MASc Computer Engineering · 2 yrs fraud analytics consulting · 2 Springer Nature publications · Canada · open to remote & relocation · Valid Canadian Work Permit · Available now
Seeking full-time data science roles in fraud detection, NLP, or AI trust & safety, where models go into production and the stakes are real.

I completed my Master's in Computer Engineering, and alongside that I've built strong hands-on experience in machine learning and AI — training models, comparing architectures, and figuring out not just how they work but how to make them work on real, messy data. I also bring 2 years of professional experience in fraud analytics consulting, where I built detection prototypes for banks and financial institutions. The projects I'm proudest of are the two papers I co-authored that got published in Springer Nature.

98.82% Best Model Accuracy
74K+ Articles Classified
2 Springer Publications
2 Years Work Experience

Where I Come From

The Seed Is Planted · Code, Dreams & Dhaka · A New Chapter In Canada · Professional Powerhouse

Hi, I’m Maleha. When I was a kid, I watched many movies, but one that truly stayed with me was Iron Man. I remember watching Tony Stark talk to a machine, a system that guided him, advised him, and helped him build things. I had never seen anything like that before, and I couldn’t stop asking myself: “What is this? How is this even possible?” That single question quietly became the beginning of my journey into technology and artificial intelligence.

In school, curiosity turned into action when I started learning programming through Python. As time passed, I explored new tools, concepts, and ways of solving problems. I pursued a BSc in Computer Science & Engineering at East West University in Dhaka, following that curiosity. Later, I moved to Canada to complete my MASc in Computer Engineering at Memorial University of Newfoundland. The more I learned, the more I realized that the answer to “how does this work?” is never just one thing — it’s a process where every step matters. And that’s when things started becoming truly interesting.

My journey has been shaped by a series of achievements, each one a chapter that taught me something new, challenged me in different ways, and brought me closer to where I am today.

2018 — 2022
BSc Computer Science & Engineering
East West University · Dhaka, Bangladesh
2022 — 2023
Junior Data Scientist
Fraud analytics consulting · R N Trading Ltd., Dhaka
2024
2 Springer Papers
2023 — 2025
MASc Computer Engineering
Memorial University of Newfoundland

Over time, I developed a consistent way of thinking about machine learning problems — not starting with models, but starting with understanding.

HOW I APPROACH A PROBLEM
1
"What is this data actually saying?"
Explore distributions, patterns, and anomalies before writing a single line of model code.
74,428 articles explored before any modelling
2
"What's lying to me here?"
Remove outliers, irrelevant features, and leakage until the signal is clean enough to trust.
5 outlier rows removed — RMSE dropped 18%
3
"Which features survive messy real-world data?"
Choose which parts of the story actually matter. Wrong features make even the best model useless.
TF-IDF features on 1,600 balanced reviews
4
"What does failure actually cost here?"
Choose the model based on what the data says and what failure costs, not what's trending.
Logistic Regression beat Bi-LSTM after clean data
5
"Would this hold up in production tomorrow?"
Align metrics with real-world costs before declaring done. Validation score alone is not enough.
Recall over accuracy — missing fraud costs more

Today, I’m looking for a team where I can continue building practical, trustworthy AI systems — models that don’t just perform well in experiments, but genuinely matter in the real world.

Python · Deep Learning · NLP · Published Research · Fraud Detection · scikit-learn · Django · REST APIs
MEMORIAL UNIVERSITY
Maleha Israt Chowdhury
NAME: Maleha I. Chowdhury
PROGRAM: MASc Comp. Eng.
CLEARANCE: Graduated
STATUS: ● Open for Work

Experience

Jul 2022 — Aug 2023 · 1 yr 2 mos Full-time

Junior Data Scientist

R N Trading Ltd. Dhaka, Bangladesh
  • Delivered financial crime analytics engagements for capital markets clients, translating trade surveillance and AML risk problems into ML solutions.
  • Built trade anomaly detection and transaction monitoring models in Python and scikit-learn to flag spoofing, layering, and wash trading patterns across three client engagements.
  • Monitored trading activity data to identify behavioral trends and anomalous order patterns, delivering actionable intelligence to client risk teams.
  • Led on-site workshops translating ML-driven monitoring techniques and alert thresholds into actionable guidance for non-technical compliance staff.
  • Co-authored surveillance methodology and model validation reports defensible to financial regulators and internal audit teams.
Fraud Detection · Python · scikit-learn · Power BI · SQL · Banking · Consulting
Mar 2022 — Jun 2022 · 4 mos Seasonal

Tech Assistant — Data Migration

East West University · Dhaka, Bangladesh
  • Consolidated and migrated thousands of beneficiary records from unstructured spreadsheets into a normalized SQL database, designing schema mapping to preserve referential integrity.
  • Wrote Python validation scripts to catch formatting errors, duplicates, and missing fields — reducing manual review time and preventing bad data from entering production.
  • Diagnosed and resolved integration issues between OCR scanned form inputs and the database, including encoding mismatches and inconsistent field formats.
  • Created and maintained documentation of the migration pipeline for future team members to reproduce on new data batches.
Python · SQL · Data Migration · OCR · Schema Design · Data Validation
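
A minimal sketch of the kind of checks described above (missing fields, duplicates, formatting errors). The field names (`id`, `name`, `phone`), the phone pattern, and the `validate_records` helper are hypothetical and are not taken from the actual migration scripts.

```python
import re

# Hypothetical schema: field names and the phone format are assumptions.
REQUIRED_FIELDS = ("id", "name", "phone")
PHONE_PATTERN = re.compile(r"^\+?\d{10,14}$")


def validate_records(records):
    """Return a list of (row_index, problem) tuples for rows that fail a check."""
    problems = []
    seen_ids = set()
    for index, record in enumerate(records):
        # Missing-field check: a field counts as missing if absent or blank.
        missing = [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]
        if missing:
            problems.append((index, "missing: " + ", ".join(missing)))
            continue
        # Duplicate check on the (assumed) unique id column.
        if record["id"] in seen_ids:
            problems.append((index, "duplicate id"))
            continue
        seen_ids.add(record["id"])
        # Format check on the phone number.
        if not PHONE_PATTERN.match(str(record["phone"])):
            problems.append((index, "bad phone format"))
    return problems


# Made-up rows for illustration only.
rows = [
    {"id": 1, "name": "Person A", "phone": "+8801712345678"},
    {"id": 1, "name": "Person A", "phone": "+8801712345678"},  # duplicate id
    {"id": 2, "name": "", "phone": "01712345678"},             # blank name
    {"id": 3, "name": "Person C", "phone": "17-123"},          # malformed phone
]
issues = validate_records(rows)
```

Returning row indexes rather than raising on the first error lets a reviewer see every rejected row in one pass before anything is written to the database.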

Published Research

2 Springer Nature publications · 76,000+ data points classified · 98.82% top accuracy

ICDMIS 2024
Oct 2024
✓ Peer-reviewed · Springer Nature 2024

Enhancing Fake News Detection Through Machine Learning and Transfer Learning Methods

An end-to-end ML project to detect fake news using deep learning, transfer learning, and classical ML benchmarks on 74,428 articles.

Best Accuracy: 98.82%
Dataset: 74,428 articles
Models: 3
Pipeline: Bi-LSTM + GloVe

Project Journey

From problem to production — the key decisions and insights.

01
The Problem

94% training accuracy. Tanked on validation. I'd built the wrong thing.

The model hit 94% on training data but tanked on validation — it learned writing style of specific sources, not actual deception signals. I'd built a source detector, not a fake news detector.
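
One common way to check for this kind of source leakage is to hold out entire sources rather than random articles, so validation only contains outlets the model never saw. A minimal sketch, assuming each article carries a `source` field; the `split_by_source` helper and the toy data are hypothetical and are not the paper's evaluation code.

```python
import random


def split_by_source(articles, holdout_fraction=0.25, seed=0):
    """Split so that no source appears in both the train and validation sets."""
    sources = sorted({a["source"] for a in articles})
    rng = random.Random(seed)
    rng.shuffle(sources)
    n_holdout = max(1, int(len(sources) * holdout_fraction))
    holdout = set(sources[:n_holdout])
    train = [a for a in articles if a["source"] not in holdout]
    valid = [a for a in articles if a["source"] in holdout]
    return train, valid


# Toy data: source names and labels are made up for illustration.
articles = [
    {"source": f"outlet_{i % 4}", "text": f"article {i}", "label": i % 2}
    for i in range(20)
]
train, valid = split_by_source(articles)
train_sources = {a["source"] for a in train}
valid_sources = {a["source"] for a in valid}
```

If accuracy drops sharply under a split like this compared with a random split, the model is probably keying on outlet style rather than on the content of the claim.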

02
Failed Experiments

Three approaches. Three failures. Each one narrowed the problem.

  • Log. Reg. on TF-IDF: Learned outlet vocabulary, not deception. Plateaued at 91%.
  • SVM with n-grams: Hit 93% but brittle on out-of-distribution articles.
  • BERT fine-tuning: Dataset too small — overfitting.
03
The Pivot

Bi-LSTM with GloVe transfer learning — 98.82%.

Fake news has temporal patterns. Bi-LSTM reads in both directions, capturing claim escalation patterns. Combined with GloVe embeddings, it hit 98.82% and generalized to unseen sources.

04
What I'd Change

Adversarial inputs. Attention visualization. Deploy as API on day one.

Add adversarial training, attention visualization for explainability, and deploy as API from day one — real articles surface edge cases no test set ever would.

Model Performance
Log. Reg.: 90.62%
SVM: 93.20%
Bi-LSTM: 98.82%
Best Model: Bi-LSTM + GloVe (98.82%)
💡 Key Takeaway

The number — 98.82% — is the least interesting part. The interesting part is what it took to get there: three failed approaches, an audit of what the model was actually learning, and a pivot to an architecture that matched the structure of the problem.

Tech Stack: Python · scikit-learn · PyTorch · NLTK · pandas · TF-IDF · GloVe
LIVE DEMO Fake News Headline Classifier
Browser-based · From this paper's feature logic · Bi-LSTM + GloVe · 98.82%

Heuristic demo from this paper's TF-IDF + Bi-LSTM feature logic. Published model: 98.82% accuracy · 74,428 articles · Springer Nature 2024.

Springer Nature
Jul 2024
✓ Peer-reviewed · Springer Nature 2024

Detection of Deceptive Hotel Reviews Through the Application of Machine Learning Techniques

An end-to-end ML project to detect deceptive hotel reviews using NLP, feature engineering and classical ML models.

Best Accuracy: 87%
Dataset: 1,600 reviews
Models: 5
Pipeline: TF-IDF + ML

Project Journey

From problem to production — the key decisions and insights.

01
The Problem

Fake reviews everywhere — hard to spot at scale. 87% accuracy sounded great until the confusion matrix told a different story.

The model was letting deceptive reviews through while flagging real ones. In fraud, a false negative is a fake review that stays live. Accuracy doesn't capture that asymmetry. Recall does.
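
The gap between accuracy and recall can be seen on a handful of labels. A minimal sketch with made-up predictions (not the paper's data); the `accuracy_and_recall` helper is hypothetical.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Compute accuracy and recall for the positive (deceptive) class."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return accuracy, recall


# Toy labels: 1 = deceptive, 0 = genuine.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
accuracy, recall = accuracy_and_recall(y_true, y_pred)
# accuracy is 0.7, yet recall is 0.25: three of four deceptive reviews stay live.
```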

02
Model Comparison

Random Forest, SVM, Decision Tree — the winner wasn't obvious.

  • Decision Tree: Overfit badly. High training accuracy, poor generalization.
  • SVM: Good precision, but inconsistent recall on deceptive reviews.
  • Random Forest: Won on accuracy (87.19%) and recall balance.
03
Feature Engineering

TF-IDF worked — but choosing what to feed it was the real challenge.

Bigrams captured phrase-level patterns — "highly recommend," "perfect stay" — disproportionately common in deceptive reviews. Small feature engineering decisions moved the needle more than model selection.
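
To make the bigram idea concrete, here is a minimal pure-Python sketch of TF-IDF over word pairs. It uses the plain, unsmoothed formula (term frequency times log of inverse document frequency); scikit-learn's `TfidfVectorizer` applies smoothing and normalization by default, so its numbers will differ. The helper names and toy reviews are assumptions for illustration, not the paper's pipeline.

```python
import math
from collections import Counter


def bigrams(text):
    """Lowercase, split on whitespace, and return adjacent word pairs."""
    words = text.lower().split()
    return [" ".join(pair) for pair in zip(words, words[1:])]


def tfidf_bigrams(documents):
    """Return one {bigram: tf-idf weight} dict per document."""
    tokenized = [bigrams(doc) for doc in documents]
    n_docs = len(documents)
    # Document frequency: number of documents containing each bigram.
    doc_freq = Counter(term for terms in tokenized for term in set(terms))
    weights = []
    for terms in tokenized:
        counts = Counter(terms)
        total = len(terms) or 1
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy reviews written for illustration; not drawn from the paper's dataset.
reviews = [
    "highly recommend this perfect stay",
    "room was small but staff were kind",
    "perfect stay and highly recommend it",
]
weights = tfidf_bigrams(reviews)
```

Phrases that recur across documents, such as "highly recommend" here, receive a lower weight than bigrams unique to one review, which is why the choice of n-gram range and vocabulary filtering changes what a downstream classifier can see.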

04
What I'd Change

More data. Explainability layer. Real-time API.

Add SHAP or LIME for explainability — "flagged because phrases X and Y appear at unusually high frequency" is more useful than a binary score. That's the difference between a research model and a production tool.

Model Performance
Random Forest: 87%
SVM: 84%
Log. Reg.: 81%
Decision Tree: 76%
Naive Bayes: 72%
Best Model: Random Forest (87%)
💡 Key Takeaway

Fraud detection is fundamentally a business problem wearing a technical costume. The model architecture matters less than understanding what kind of error you can't afford to make — and building your entire evaluation strategy around that constraint.

Tech Stack: Python · scikit-learn · NLTK · pandas · TF-IDF · Matplotlib · Seaborn
LIVE DEMO Hotel Review Deception Detector
Browser-based · From this paper's TF-IDF + Random Forest feature logic · 87%

Feature breakdown uses the paper's 4 linguistic dimensions. Published model: 87% accuracy · Random Forest · 1,600 reviews · Springer Nature 2024.

Projects

NLP / Finance
Python · NLP · VADER · Transformers

Real-Time Crypto Sentiment Dashboard

Applies NLP to live Reddit and Twitter data to track market mood shifts in real time — combining sentiment signals with price data to show why the market is moving, not just that it is.

Problem: Crypto traders see price moves but not the sentiment driving them
Method: VADER + fine-tuned transformer on Reddit/Twitter; CoinGecko + Yahoo Finance APIs
Result: In development; targeting sentiment-price divergence as an early signal

VADER + fine-tuned transformer on r/CryptoCurrency and r/wallstreetbets. CoinGecko and Yahoo Finance APIs for live price feeds. Designed to surface sentiment-price divergence as an early signal.

Python · NLP · VADER · Transformers · Reddit API · CoinGecko API

ML / Regression
Python · scikit-learn · Ridge · pandas

House Price Prediction

Predicted property prices from 80+ correlated features — found that feature engineering reduced RMSE more than model selection, a key insight for production ML work.

Problem: Predict sale prices from 80+ noisy, correlated housing features
Method: Linear Regression vs. Ridge Regularization; outlier removal + feature engineering
Result: 18% RMSE drop from data cleaning alone; Ridge outperformed baseline

Kaggle dataset. Handled missing values, encoded categoricals, removed outliers. Compared Linear Regression vs. Ridge regularization — evaluated on RMSE and R².
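
The effect of one outlier on RMSE, and of the ridge penalty on a coefficient, can be sketched on a single feature. This is a toy closed-form example with made-up numbers, not the project's 80+ feature pipeline; `fit_ridge_1d` and `rmse_and_r2` are hypothetical helpers.

```python
import math


def fit_ridge_1d(xs, ys, alpha=0.0):
    """Fit y = slope * x + intercept with an L2 penalty on the slope only.

    alpha = 0 reduces to ordinary least squares.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    return slope, y_mean - slope * x_mean


def rmse_and_r2(xs, ys, slope, intercept):
    """Return (RMSE, R^2) of the fitted line on the given points."""
    residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    y_mean = sum(ys) / len(ys)
    total = sum((y - y_mean) ** 2 for y in ys)
    return math.sqrt(residual / len(ys)), 1 - residual / total


# Toy data (arbitrary units); the last point is a deliberate outlier.
areas = [10, 12, 14, 16, 18, 20, 50]
prices = [20, 24, 28, 32, 36, 40, 30]

ols = fit_ridge_1d(areas, prices)
rmse_all, r2_all = rmse_and_r2(areas, prices, *ols)

# Drop the outlier and refit.
clean_areas, clean_prices = areas[:-1], prices[:-1]
ols_clean = fit_ridge_1d(clean_areas, clean_prices)
rmse_clean, r2_clean = rmse_and_r2(clean_areas, clean_prices, *ols_clean)

# A positive alpha shrinks the slope toward zero.
ridge_clean = fit_ridge_1d(clean_areas, clean_prices, alpha=10.0)
```

In a real comparison both fits would be scored on the same held-out rows; the sketch scores each on its own training points only to keep the example short.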

Python · scikit-learn · Ridge Regression · Feature Engineering · pandas

ML / Forecasting
Python · scikit-learn · KNN · Feature Selection

Dengue Incidence Forecasting

Predicted disease outbreak rates from meteorological data — KNN outperformed Random Forest and GBR after feature selection cut inputs from 16 to 11, dropping MAE from 19.90 to 16.88.

Problem: Forecast dengue outbreaks from weather data across 2 cities
Method: KNN vs. Random Forest vs. GBR; feature selection reduced 16→11 inputs
Result: MAE dropped from 19.90 to 16.88; KNN best after feature selection

1,456 records across 2 cities with 24 weather features. Demonstrates that careful feature selection — not more complex models — drove the biggest accuracy improvement.
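
Distance-based models such as KNN are sensitive to uninformative columns, which is one reason feature selection can help. A minimal pure-Python sketch on made-up data (not the dengue dataset); the helper names and the `evaluate` function are hypothetical.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k nearest training rows (Euclidean distance)."""
    nearest = sorted(
        range(len(train_x)), key=lambda i: math.dist(train_x[i], query)
    )[:k]
    return sum(train_y[i] for i in nearest) / k


def mean_absolute_error(y_true, y_pred):
    """Mean of absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def select_columns(rows, columns):
    """Keep only the chosen feature columns from each row."""
    return [[row[c] for c in columns] for row in rows]


# Toy data: column 0 drives the target, column 1 is noise.
train_x = [[1, 90], [2, 10], [3, 80], [4, 20], [5, 70], [6, 30]]
train_y = [10, 20, 30, 40, 50, 60]
test_x = [[2.5, 75], [4.5, 25]]
test_y = [25, 45]


def evaluate(columns, k=2):
    """MAE on the test rows using only the given feature columns."""
    tx = select_columns(train_x, columns)
    qx = select_columns(test_x, columns)
    preds = [knn_predict(tx, train_y, q, k) for q in qx]
    return mean_absolute_error(test_y, preds)


mae_all = evaluate([0, 1])      # noisy column dominates the distance
mae_selected = evaluate([0])    # informative column only
```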

Python · scikit-learn · KNN · Random Forest · GBR · Feature Selection

Early Projects · BSc work

Web / Django
Python · Django · MySQL · REST APIs

Junction — Job-Finding Platform

Full-stack web app with separate recruiter and applicant flows — built auth, database schema, job listings, and search end to end.

Problem: No unified platform connecting recruiters and applicants with role-based flows
Method: Django + MySQL; relational schema, auth system, search optimization
Result: Full-stack platform with dual user flows, job search, and listing management

Designed relational schema, user authentication, job listing and search. Optimized search performance as the platform's core value depended on it.

Python · Django · MySQL · REST APIs

Web / PHP
PHP · MySQL · CRUD · Data Integrity

Rent-Anything — Rental Marketplace

Database-driven marketplace where users list and book rentals — built booking conflict detection to prevent overlapping reservations.

Problem: Rental platforms need date-overlap logic to prevent double bookings
Method: PHP + MySQL; CRUD with date-overlap validation and data integrity constraints
Result: Working marketplace with booking conflict detection and user profiles

CRUD for listings, profiles, and rental records. Handled data integrity constraints and date-overlap logic for booking validation.
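
The date-overlap rule itself is short: two ranges collide when each starts before the other ends. The sketch below is written in Python for illustration (the project itself used PHP and MySQL) and assumes half-open ranges, where checkout day equals the next guest's check-in day; `overlaps` and `can_book` are hypothetical names.

```python
from datetime import date


def overlaps(start_a, end_a, start_b, end_b):
    """True if two half-open date ranges [start, end) share at least one day."""
    return start_a < end_b and start_b < end_a


def can_book(existing_bookings, start, end):
    """A new booking is allowed only if it overlaps no existing booking."""
    if start >= end:
        raise ValueError("start must be before end")
    return not any(overlaps(start, end, s, e) for s, e in existing_bookings)


# Hypothetical bookings for one listing.
bookings = [
    (date(2024, 5, 1), date(2024, 5, 5)),
    (date(2024, 5, 10), date(2024, 5, 12)),
]

fits_gap = can_book(bookings, date(2024, 5, 5), date(2024, 5, 10))  # back-to-back is fine
clash = can_book(bookings, date(2024, 5, 4), date(2024, 5, 6))      # collides with the first booking
```

Treating the end date as exclusive is a design choice; with inclusive end dates the comparisons would need `<=` instead.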

PHP · MySQL · CRUD · Data Integrity

Electronics / IoT
BJT Circuits · Electronics · IoT · Sensors

Smart Gardening System

Automated plant monitoring system using BJT transistors — sensors trigger irrigation and lighting without manual input.

Problem: Manual plant care doesn't scale; it needs an automated moisture/light response
Method: BJT transistor circuits for threshold-based switching; moisture + light sensors
Result: Automated irrigation and grow-light control triggered by sensor thresholds

Built electronic circuits for moisture and light sensing. BJT used as switches to automate water pump and grow light control based on threshold readings.

BJT Circuits · Electronics · IoT · Sensors

Networking
Networking · Subnetting · VLAN · Cisco

Full-Fledged Network Design

Designed a multi-subnet organizational network with VLANs and inter-VLAN routing — simulated and verified end-to-end in Cisco Packet Tracer.

Problem: Design a segmented enterprise network with full inter-VLAN connectivity
Method: Subnetting, VLAN configuration, routing protocols in Cisco Packet Tracer
Result: End-to-end connectivity verified across all network segments

Configured routers, switches, subnetting, and routing protocols. Full connectivity tested across all segments.
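
A subnet plan of this kind can be drafted and sanity-checked with Python's standard `ipaddress` module before configuring anything in the simulator. The address block, department names, and VLAN numbering below are assumptions chosen for illustration, not the project's actual addressing scheme.

```python
import ipaddress

# Hypothetical address block and department names.
campus = ipaddress.ip_network("192.168.10.0/24")
departments = ["admin", "engineering", "sales", "guest"]

# Split the /24 into four /26 subnets, one per VLAN.
subnets = list(campus.subnets(new_prefix=26))
plan = {
    name: {
        "vlan_id": 10 * (index + 1),
        "network": str(subnet),
        # First usable host is used as the gateway address (a common convention).
        "gateway": str(next(iter(subnet.hosts()))),
        # Subtract the network and broadcast addresses.
        "usable_hosts": subnet.num_addresses - 2,
    }
    for index, (name, subnet) in enumerate(zip(departments, subnets))
}
```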

Networking · Subnetting · VLAN · Cisco Packet Tracer

Embedded / C
C · Microcontroller · IR Sensors

Smart Car Parking System

Microcontroller system that detects slot availability via IR sensors and displays real-time occupancy on an LCD — no manual monitoring needed.

Problem: Parking lots lack real-time slot availability, so drivers waste time searching
Method: IR sensors per slot + microcontroller logic; LCD display for live count
Result: Real-time occupancy detection and display with zero manual input

IR sensors detect car presence per slot. Microcontroller tracks and updates available count, displayed live on LCD screen.

C · Microcontroller · IR Sensors · Embedded Systems

Web / Feedback
HTML/CSS · PHP · MySQL

Online Feedback System

Web platform for collecting and managing structured user feedback — with admin review flow and data integrity controls.

Problem: Unstructured feedback collection leads to inconsistent and hard-to-analyze data
Method: PHP + MySQL; structured form submission with admin review workflow
Result: Clean, consistently formatted feedback with admin moderation pipeline

Built form submission, structured storage, and admin review workflows. Focused on keeping data clean and responses consistently formatted.

HTML/CSS · PHP · MySQL

C++ / OOP
C++ · OOP · Encapsulation

Student Advising Management System

University advising system built with OOP principles — models students, advisors, courses, and appointments with clean entity relationships.

Problem: Advising workflows need structured entity models for students, courses, and appointments
Method: C++ with encapsulation and inheritance; entity relationships designed before implementation
Result: Maintainable advising system with clean OOP architecture and extensible design

Applied encapsulation and inheritance to keep the codebase maintainable and extensible. Entity relationships were designed before implementation.

C++ · OOP · Encapsulation · Inheritance

Skills

Every tool below is backed by a shipped project or published paper.

Languages
Python: 95% (Expert)
SQL: 90% (Expert)
C / C++: 70% (Intermediate)
HTML / CSS: 65% (Intermediate)
PHP: 45% (Intermediate)

ML / AI
scikit-learn: 92% (Expert)
pandas: 90% (Expert)
NumPy: 88% (Expert)
NLP (TF-IDF · NLTK): 80% (Expert)
Random Forest: 80% (Expert)
TensorFlow: 70% (Intermediate)
PyTorch: 68% (Intermediate)
Bi-LSTM: 60% (Intermediate)
LLM Fine-tuning: 45% (Basic)

Data & Analytics
MySQL: 90% (Expert)
Excel: 88% (Expert)
Web Scraping: 80% (Expert)
PostgreSQL: 70% (Intermediate)
Tableau: 65% (Intermediate)
Power BI: 60% (Intermediate)

Tools & Cloud
Git / GitHub: 95% (Expert)
AWS: 70% (Intermediate)
Django: 70% (Intermediate)
REST APIs: 70% (Intermediate)

Technical Writing

Not theory — lessons from projects that failed first.


Let's Work Together

Thanks for taking the time to visit. Even if it's just a quick hello, feel free to reach out 🙂

Currently available for full-time opportunities
