DATA SCIENTIST | FRAUD DETECTION & NLP

MALEHA ISRAT CHOWDHURY

I build fraud detection & NLP systems ◆ 2 Springer-published papers

MASc Computer Engineering · 2 yrs fraud analytics consulting · 2 Springer Nature publications · Canada, open to remote & relocation · Valid Canadian work permit · Available now

Seeking full-time data science roles in fraud detection, NLP, or AI trust & safety, where models go into production and the stakes are real.

I completed my Master's in Computer Engineering, and alongside it I built strong hands-on experience in machine learning and AI: training models, comparing architectures, and figuring out not just how they work but how to make them work on real, messy data. I also bring 2 years of professional experience in fraud analytics consulting, where I built detection prototypes for banks and financial institutions. The work I'm proudest of is the two papers I co-authored, both published by Springer Nature.


98.82% Best Model Accuracy

74K+ Articles Classified

2 Springer Publications

2 Years Work Experience

01 — The Story

Where I Come From

Hi, I’m Maleha. When I was a kid, I watched many movies, but one that truly stayed with me was Iron Man. I remember watching Tony Stark talk to a machine, a system that guided him, advised him, and helped him build things. I had never seen anything like that before, and I couldn’t stop asking myself: “What is this? How is this even possible?” That single question quietly became the beginning of my journey into technology and artificial intelligence.

In school, curiosity turned into action when I started learning programming through Python. As time passed, I explored new tools, concepts, and ways of solving problems. I pursued a BSc in Computer Science & Engineering at East West University in Dhaka, following that curiosity. Later, I moved to Canada to complete my MASc in Computer Engineering at Memorial University of Newfoundland. The more I learned, the more I realized that the answer to “how does this work?” is never just one thing — it’s a process where every step matters. And that’s when things started becoming truly interesting.

My journey has been shaped by a series of achievements, each one a chapter that taught me something new, challenged me in different ways, and brought me closer to where I am today.

2018 — 2022

BSc Computer Science & Engineering

East West University · Dhaka, Bangladesh

2022 — 2023

Junior Data Scientist

Fraud analytics consulting · R N Trading Ltd., Dhaka

2023 — 2025

MASc Computer Engineering

Memorial University of Newfoundland

2024

2 Springer Papers

Over time, I developed a consistent way of thinking about machine learning problems — not starting with models, but starting with understanding.

HOW I APPROACH A PROBLEM

"What is this data actually saying?"

Explore distributions, patterns, and anomalies before writing a single line of model code.

→ 74,428 articles explored before any modelling

"What's lying to me here?"

Remove outliers, irrelevant features, and leakage until the signal is clean enough to trust.

→ 5 outlier rows removed — RMSE dropped 18%

"Which features survive messy real-world data?"

Choose which parts of the story actually matter. Wrong features make even the best model useless.

→ TF-IDF features on 1,600 balanced reviews

"What does failure actually cost here?"

Choose based on what the data says and what failure costs — not what's trending.

→ Logistic Regression beat Bi-LSTM after clean data

"Would this hold up in production tomorrow?"

Align metrics with real-world costs before declaring done. Validation score alone is not enough.

→ Recall over accuracy — missing fraud costs more

Today, I’m looking for a team where I can continue building practical, trustworthy AI systems — models that don’t just perform well in experiments, but genuinely matter in the real world.

Python Deep Learning NLP Published Research Fraud Detection scikit-learn Django REST APIs

MEMORIAL UNIVERSITY

NAME: Maleha I. Chowdhury

PROGRAM: MASc Comp. Eng.

CLEARANCE: Graduated

STATUS: Open for Work

02 — How I Got Here

Experience

Jul 2022 — Aug 2023 · 1 yr 2 mos · Full-time

Junior Data Scientist

R N Trading Ltd. · Dhaka, Bangladesh

Delivered financial crime analytics engagements for capital markets clients, translating trade surveillance and AML risk problems into ML solutions.
Built trade anomaly detection and transaction monitoring models in Python and scikit-learn to flag spoofing, layering, and wash trading patterns across three client engagements.
Monitored trading activity data to identify behavioral trends and anomalous order patterns, delivering actionable intelligence to client risk teams.
Led on-site workshops translating ML-driven monitoring techniques and alert thresholds into practical guidance for non-technical compliance staff.
Co-authored surveillance methodology and model validation reports defensible to financial regulators and internal audit teams.

Fraud Detection Python scikit-learn Power BI SQL Banking Consulting
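The client models themselves aren't described here, so as a generic illustration only: a minimal sketch of flagging anomalous order sizes with a modified z-score (median and median absolute deviation), assuming one account's order sizes as input. The 3.5 cutoff and 0.6745 constant follow the common modified z-score convention; the function names and sample numbers are hypothetical, and real spoofing or wash-trading detection would need far richer features than order size.

```python
from statistics import median

def robust_z_scores(values):
    """Modified z-score per value: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # MAD is zero when most values are identical; this sketch skips scoring then
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]

def flag_anomalies(values, threshold=3.5):
    """Indices whose absolute modified z-score exceeds the threshold."""
    return [i for i, z in enumerate(robust_z_scores(values)) if abs(z) > threshold]

# Hypothetical order sizes for one account; the 5,000-lot order stands out
order_sizes = [100, 120, 95, 110, 105, 5000, 98, 115]
print(flag_anomalies(order_sizes))  # → [5]
```

The median and MAD are used instead of the mean and standard deviation because one huge order inflates the standard deviation and can hide itself.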

Mar 2022 — Jun 2022 · 4 mos · Seasonal

Tech Assistant — Data Migration

East West University · Dhaka, Bangladesh

Consolidated and migrated thousands of beneficiary records from unstructured spreadsheets into a normalized SQL database, designing schema mapping to preserve referential integrity.
Wrote Python validation scripts to catch formatting errors, duplicates, and missing fields — reducing manual review time and preventing bad data from entering production.
Diagnosed and resolved integration issues between OCR-scanned form inputs and the database, including encoding mismatches and inconsistent field formats.
Created and maintained documentation of the migration pipeline for future team members to reproduce on new data batches.

Python SQL Data Migration OCR Schema Design Data Validation
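A minimal sketch of the kind of validation pass described above, assuming records arrive as Python dicts. The schema (`beneficiary_id`, `name`, `phone`) and the phone-number pattern are hypothetical stand-ins, not the actual migration rules.

```python
import re

# Hypothetical schema and format rule, for illustration only
REQUIRED_FIELDS = ("beneficiary_id", "name", "phone")
PHONE_PATTERN = re.compile(r"^\+?\d{10,13}$")

def validate_records(records):
    """Return (row_index, problem) tuples; an empty list means the batch is clean."""
    problems = []
    seen_ids = set()
    for i, rec in enumerate(records):
        # Missing or blank required fields
        for field in REQUIRED_FIELDS:
            value = rec.get(field)
            if value is None or not str(value).strip():
                problems.append((i, f"missing {field}"))
        # Duplicate primary keys
        rid = rec.get("beneficiary_id")
        if rid:
            if rid in seen_ids:
                problems.append((i, f"duplicate beneficiary_id {rid}"))
            seen_ids.add(rid)
        # Formatting: a present phone number must match the assumed pattern
        phone = str(rec.get("phone") or "").strip()
        if phone and not PHONE_PATTERN.match(phone):
            problems.append((i, "malformed phone"))
    return problems

rows = [
    {"beneficiary_id": "B001", "name": "Name A", "phone": "01711000000"},
    {"beneficiary_id": "B001", "name": "Name B", "phone": "0171-100"},
    {"beneficiary_id": "B003", "name": "", "phone": "01811000000"},
]
for row_index, problem in validate_records(rows):
    print(row_index, problem)
```

Returning a list of problems rather than raising on the first error lets a reviewer fix a whole batch in one pass before it is loaded into the database.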

03 — The Proof

Published Research

2 Springer Nature publications · 76,000+ data points classified · 98.82% top accuracy

ICDMIS 2024

Oct 2024

✓ Peer-reviewed · Springer Nature 2024

Enhancing Fake News Detection Through Machine Learning and Transfer Learning Methods

An end-to-end ML project to detect fake news using deep learning, transfer learning, and classical ML benchmarks on 74,428 articles.

Best Accuracy: 98.82%

Dataset: 74,428 articles

Models: 3

Pipeline: Bi-LSTM + GloVe

Project Journey

From problem to production — the key decisions and insights.

The Problem

94% training accuracy. Tanked on validation. I'd built the wrong thing.

The model hit 94% on training data but tanked on validation: it had learned the writing style of specific sources, not actual deception signals. I'd built a source detector, not a fake news detector.
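One common way to catch this failure mode (a sketch, not the protocol documented in the paper) is to split data by source, so no outlet appears in both training and validation; a model that only memorized outlet style then can't score well. The record fields and the `split_by_source` helper are hypothetical.

```python
import random

def split_by_source(articles, val_fraction=0.25, seed=0):
    """Hold out whole sources so every validation outlet is unseen in training."""
    sources = sorted({a["source"] for a in articles})
    rng = random.Random(seed)
    rng.shuffle(sources)
    n_val = max(1, int(len(sources) * val_fraction))
    val_sources = set(sources[:n_val])
    train = [a for a in articles if a["source"] not in val_sources]
    val = [a for a in articles if a["source"] in val_sources]
    return train, val

# Hypothetical records: 4 outlets, 2 articles each
articles = [
    {"source": s, "text": f"article {i} from {s}", "label": i % 2}
    for s in ("outlet_a", "outlet_b", "outlet_c", "outlet_d")
    for i in range(2)
]
train, val = split_by_source(articles)
print(sorted({a["source"] for a in val}))
```

A model that does well on a random split but drops sharply on a source-held-out split is likely keying on outlet style. scikit-learn's `GroupShuffleSplit` and `GroupKFold` implement the same idea through a `groups` argument.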

Failed Experiments

Three approaches. Three failures. Each one narrowed the problem.

Log. Reg. on TF-IDF: Learned outlet vocabulary, not deception. Plateaued at 91%.
SVM with n-grams: Hit 93% but brittle on out-of-distribution articles.
BERT fine-tuning: Dataset too small — overfitting.

The Pivot

Bi-LSTM with GloVe transfer learning — 98.82%.

Fake news has sequential structure. A Bi-LSTM reads text in both directions, capturing how claims escalate across a passage. Combined with pretrained GloVe embeddings, it hit 98.82% and generalized to unseen sources.
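Pretrained GloVe vectors are distributed as plain text, one word per line followed by its vector components. A minimal sketch of turning that format into an embedding matrix for a model's vocabulary, assuming out-of-vocabulary words start as zero vectors (one choice among several); the helper names and tiny sample are hypothetical.

```python
def load_glove(lines, dim):
    """Parse GloVe text lines of the form 'word v1 v2 ... vd' into a dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines, e.g. tokens that contain spaces
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def build_embedding_matrix(vocab, vectors, dim):
    """Row i holds the pretrained vector for vocab[i]; unseen words get zeros."""
    matrix = []
    hits = 0
    for word in vocab:
        vec = vectors.get(word)
        if vec is None:
            matrix.append([0.0] * dim)
        else:
            matrix.append(vec)
            hits += 1
    return matrix, hits

# Tiny hypothetical 3-dimensional file contents
sample = ["the 0.1 0.2 0.3", "claim -0.5 0.4 0.0"]
vectors = load_glove(sample, dim=3)
matrix, hits = build_embedding_matrix(["<pad>", "the", "claim", "hoax"], vectors, dim=3)
print(f"{hits}/4 vocabulary words found")  # → 2/4 vocabulary words found
```

With a real download you would pass an open file handle instead of `sample`, e.g. `open("glove.6B.100d.txt", encoding="utf-8")` with `dim=100` (file name assumed). In PyTorch, `torch.nn.Embedding.from_pretrained(torch.tensor(matrix))` can then initialize the embedding layer feeding an `nn.LSTM(..., bidirectional=True)`.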

What I'd Change

Adversarial inputs. Attention visualization. Deploy as API on day one.

Add adversarial training, attention visualization for explainability, and deploy as API from day one — real articles surface edge cases no test set ever would.

Model Performance

Log. Reg.: 90.62%

SVM: 93.20%

Bi-LSTM: 98.82%

Best Model: Bi-LSTM + GloVe (98.82%)

💡 Key Takeaway

The number — 98.82% — is the least interesting part. The interesting part is what it took to get there: three failed approaches, an audit of what the model was actually learning, and a pivot to an architecture that matched the structure of the problem.

Tech Stack: Python · scikit-learn · PyTorch · NLTK · pandas · TF-IDF · GloVe

LIVE DEMO Fake News Headline Classifier

Browser-based heuristic demo · Built on this paper's feature logic · Published model: Bi-LSTM + GloVe, 98.82%


Heuristic demo from this paper's TF-IDF + Bi-LSTM feature logic. Published model: 98.82% accuracy · 74,428 articles · Springer Nature 2024.

Springer Nature

Jul 2024

✓ Peer-reviewed · Springer Nature 2024

Detection of Deceptive Hotel Reviews Through the Application of Machine Learning Techniques

An end-to-end ML project to detect deceptive hotel reviews using NLP, feature engineering and classical ML models.

Best Accuracy: 87%

Dataset: 1,600 reviews

Models: 5

Pipeline: TF-IDF + ML

Project Journey

From problem to production — the key decisions and insights.

The Problem

Fake reviews everywhere — hard to spot at scale. 87% accuracy sounded great until the confusion matrix told a different story.

The model was letting deceptive reviews through while flagging real ones. In fraud, a false negative is a fake review that stays live. Accuracy doesn't capture that asymmetry. Recall does.
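A small sketch of why accuracy hides the asymmetry, using made-up predictions on a balanced set (not the paper's results). Deceptive reviews are the positive class, so a missed fake review is a false negative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, and true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def summarize(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical balanced set: 50 deceptive (1) and 50 genuine (0) reviews.
y_true = [1] * 50 + [0] * 50
# The model catches 35 deceptive reviews, misses 15, and wrongly flags 5 genuine ones.
y_pred = [1] * 35 + [0] * 15 + [1] * 5 + [0] * 45
print(summarize(y_true, y_pred))
```

Accuracy reads 0.8, yet 15 of 50 deceptive reviews (30%) stay live; recall (0.7) is the number that exposes them. `sklearn.metrics.recall_score` and `confusion_matrix` compute the same quantities.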

Model Comparison

Random Forest, SVM, Decision Tree — the winner wasn't obvious.

Decision Tree: Overfit badly. High training accuracy, poor generalization.
SVM: Good precision, but inconsistent recall on deceptive reviews.
Random Forest: Won on accuracy (87.19%) and recall balance.

Feature Engineering

TF-IDF worked — but choosing what to feed it was the real challenge.

Bigrams captured phrase-level patterns — "highly recommend," "perfect stay" — disproportionately common in deceptive reviews. Small feature engineering decisions moved the needle more than model selection.
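A pure-Python sketch of TF-IDF over unigrams plus bigrams, assuming a simple regex tokenizer. It uses raw-count TF, the smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2 normalization, which match scikit-learn's `TfidfVectorizer` defaults, though the tokenization differs so outputs won't match exactly. The example reviews are invented.

```python
import math
import re
from collections import Counter

def ngrams(text, n_max=2):
    """Lowercased word unigrams plus contiguous n-grams up to n_max."""
    tokens = re.findall(r"[a-z']+", text.lower())
    grams = list(tokens)
    for n in range(2, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams

def tfidf(docs, n_max=2):
    """Raw-count TF times smoothed IDF, then L2-normalize each document vector."""
    counts = [Counter(ngrams(d, n_max)) for d in docs]
    n_docs = len(docs)
    df = Counter(term for c in counts for term in c)  # each doc counts a term once
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in df}
    vectors = []
    for c in counts:
        vec = {t: tf * idf[t] for t, tf in c.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors

reviews = [
    "Highly recommend! A perfect stay, perfect service.",
    "The room was small but clean, and check-in was slow.",
]
vecs = tfidf(reviews)
print(sorted(vecs[0], key=vecs[0].get, reverse=True)[:3])
```

In scikit-learn the equivalent setup is `TfidfVectorizer(ngram_range=(1, 2))`; phrases such as "perfect stay" then become features in their own right instead of two unrelated words.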

What I'd Change

More data. Explainability layer. Real-time API.

Add SHAP or LIME for explainability — "flagged because phrases X and Y appear at unusually high frequency" is more useful than a binary score. That's the difference between a research model and a production tool.

Model Performance

Random Forest: 87%

SVM: 84%

Log. Reg.: 81%

Decision Tree: 76%

Naive Bayes: 72%

Best Model: Random Forest (87%)

💡 Key Takeaway

Fraud detection is fundamentally a business problem wearing a technical costume. The model architecture matters less than understanding what kind of error you can't afford to make — and building your entire evaluation strategy around that constraint.

Tech Stack: Python · scikit-learn · NLTK · pandas · TF-IDF · Matplotlib · Seaborn

LIVE DEMO Hotel Review Deception Detector

Browser-based heuristic demo · Built on this paper's TF-IDF + Random Forest feature logic · Published model: 87%


Feature breakdown uses the paper's 4 linguistic dimensions. Published model: 87% accuracy · Random Forest · 1,600 reviews · Springer Nature 2024.

04 — The Work

Projects

NLP / Finance

Python · NLP · VADER · Transformers

Real-Time Crypto Sentiment Dashboard

Applies NLP to live Reddit and Twitter data to track market mood shifts in real time — combining sentiment signals with price data to show why the market is moving, not just that it is.

Problem: Crypto traders see price moves but not the sentiment driving them

Method: VADER + fine-tuned transformer on Reddit/Twitter; CoinGecko + Yahoo Finance APIs

Result: In development, targeting sentiment-price divergence as an early signal

VADER + fine-tuned transformer on r/CryptoCurrency and r/wallstreetbets. CoinGecko and Yahoo Finance APIs for live price feeds. Designed to surface sentiment-price divergence as an early signal.
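Since the dashboard is still in development and the divergence rule isn't specified, here is one possible definition as a hedged sketch: flag periods where average sentiment and price move in opposite directions. It assumes per-period averages of VADER compound scores (for example from NLTK's `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`, which lies in [-1, 1]) aligned with closing prices; the threshold, function name, and data are hypothetical.

```python
def divergence_points(sentiment, prices, min_sent_move=0.05):
    """Indices t where sentiment and price moved in opposite directions from t-1 to t.

    sentiment: per-period average VADER compound scores, each in [-1, 1]
    prices: per-period closing prices, same length as sentiment
    """
    if len(sentiment) != len(prices):
        raise ValueError("sentiment and prices must be aligned per period")
    flagged = []
    for t in range(1, len(prices)):
        d_sent = sentiment[t] - sentiment[t - 1]
        d_price = prices[t] - prices[t - 1]
        if abs(d_sent) < min_sent_move or d_price == 0:
            continue  # ignore moves too small to count as a direction
        if (d_sent > 0) != (d_price > 0):
            flagged.append(t)
    return flagged

# Hypothetical hourly data: mood keeps rising at t=2 while price dips
sentiment = [0.1, 0.3, 0.5, 0.2]
prices = [100.0, 102.0, 101.0, 99.0]
print(divergence_points(sentiment, prices))  # → [2]
```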

Python · NLP · VADER · Transformers · Reddit API · CoinGecko API

ML / Regression

Python · scikit-learn · Ridge · pandas

House Price Prediction

Predicted property prices from 80+ correlated features — found that feature engineering reduced RMSE more than model selection, a key insight for production ML work.

Problem: Predict sale prices from 80+ noisy, correlated housing features

Method: Linear Regression vs. Ridge regularization; outlier removal + feature engineering

Result: 18% RMSE drop from data cleaning alone; Ridge outperformed the baseline

Kaggle dataset. Handled missing values, encoded categoricals, removed outliers. Compared Linear Regression vs. Ridge regularization — evaluated on RMSE and R².
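A stripped-down sketch of the clean-then-regularize workflow, assuming a single feature for brevity (the project used 80+ features, where scikit-learn's `Ridge` would be the practical choice). Outliers here are defined by Tukey's IQR fences on the target, which is an assumed criterion, and the data is made up, so it does not reproduce the 18% figure.

```python
import math
import statistics

def iqr_filter(x, y, k=1.5):
    """Drop rows whose target falls outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(y, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    kept = [(xi, yi) for xi, yi in zip(x, y) if lo <= yi <= hi]
    return [xi for xi, _ in kept], [yi for _, yi in kept]

def fit_ridge_1d(x, y, alpha=1.0):
    """Single-feature ridge regression; the intercept is not penalized."""
    x_mean, y_mean = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / (sxx + alpha)  # alpha = 0 reduces to ordinary least squares
    return slope, y_mean - slope * x_mean

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical data with one extreme target value at the end
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 100.0]
x_clean, y_clean = iqr_filter(x, y)
for alpha in (0.0, 1.0):
    slope, intercept = fit_ridge_1d(x_clean, y_clean, alpha)
    preds = [slope * xi + intercept for xi in x_clean]
    print(alpha, round(slope, 3), round(rmse(y_clean, preds), 3))
```

Setting `alpha=0` gives ordinary least squares; a positive `alpha` shrinks the slope toward zero, which is what stabilizes Ridge when many features are correlated.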

Python · scikit-learn · Ridge Regression · Feature Engineering · pandas

ML / Forecasting

Python · scikit-learn · KNN · Feature Selection

Dengue Incidence Forecasting

Predicted disease outbreak rates from meteorological data: KNN outperformed Random Forest and Gradient Boosting Regressor (GBR) after feature selection cut inputs from 16 to 11, dropping MAE from 19.90 to 16.88.

Problem: Forecast dengue outbreaks from weather data across 2 cities

Method: KNN vs. Random Forest vs. GBR; feature selection reduced 16→11 inputs

Result: MAE dropped from 19.90 to 16.88; KNN best after feature selection

1,456 records across 2 cities with 24 weather features. Demonstrates that careful feature selection — not more complex models — drove the biggest accuracy improvement.
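A toy sketch of KNN regression evaluated with MAE before and after dropping a column, using hypothetical rows (not the dengue data). The example is constructed so that removing an uninformative, wide-ranging feature lowers MAE; it illustrates the mechanism, not the project's 19.90 to 16.88 result.

```python
import math

def select_columns(rows, keep):
    """Keep only the feature indices in `keep`, e.g. the output of feature selection."""
    return [[row[i] for i in keep] for row in rows]

def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k nearest training rows by Euclidean distance."""
    nearest = sorted(
        (math.dist(row, query), target) for row, target in zip(train_X, train_y)
    )[:k]
    return sum(target for _, target in nearest) / len(nearest)

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical rows: [temperature, humidity, noisy_feature]
train_X = [[28, 80, 5], [29, 82, 90], [22, 60, 7], [21, 58, 88]]
train_y = [40, 44, 10, 12]
test_X, test_y = [[28, 81, 50], [22, 59, 50]], [42, 11]

for keep in ([0, 1, 2], [0, 1]):
    tr, te = select_columns(train_X, keep), select_columns(test_X, keep)
    preds = [knn_predict(tr, train_y, q, k=2) for q in te]
    print(keep, mae(test_y, preds))
```

Because KNN relies on raw distances, an irrelevant feature with a large range can dominate which neighbors are chosen; standardizing features before fitting is usually advisable. scikit-learn's `KNeighborsRegressor(n_neighbors=k)` provides the same estimator.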

Python · scikit-learn · KNN · Random Forest · GBR · Feature Selection

Early Projects · BSc work

Web / Django

Python · Django · MySQL · REST APIs

Junction — Job-Finding Platform

Full-stack web app with separate recruiter and applicant flows — built auth, database schema, job listings, and search end to end.

Problem: No unified platform connecting recruiters and applicants with role-based flows

Method: Django + MySQL; relational schema, auth system, search optimization

Result: Full-stack platform with dual user flows, job search, and listing management

Designed relational schema, user authentication, job listing and search. Optimized search performance as the platform's core value depended on it.

Python · Django · MySQL · REST APIs

Web / PHP

PHP · MySQL · CRUD · Data Integrity

Rent-Anything — Rental Marketplace

Database-driven marketplace where users list and book rentals — built booking conflict detection to prevent overlapping reservations.

Problem: Rental platforms need date-overlap logic to prevent double bookings

Method: PHP + MySQL; CRUD with date-overlap validation and data integrity constraints

Result: Working marketplace with booking conflict detection and user profiles

CRUD for listings, profiles, and rental records. Handled data integrity constraints and date-overlap logic for booking validation.
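The core of booking conflict detection is an interval-overlap test. A minimal sketch, assuming half-open date ranges (a rental returned on Mar 5 can be picked up by someone else that same day); the original project ran this logic in PHP/MySQL, and the function names here are hypothetical.

```python
from datetime import date

def overlaps(start_a, end_a, start_b, end_b):
    """True if half-open ranges [start_a, end_a) and [start_b, end_b) intersect."""
    return start_a < end_b and start_b < end_a

def can_book(existing, start, end):
    """Accept a request only if it overlaps none of the item's existing bookings."""
    if start >= end:
        raise ValueError("end date must be after start date")
    return not any(overlaps(start, end, s, e) for s, e in existing)

existing = [(date(2022, 3, 1), date(2022, 3, 5))]
print(can_book(existing, date(2022, 3, 5), date(2022, 3, 7)))  # → True (back-to-back)
print(can_book(existing, date(2022, 3, 4), date(2022, 3, 6)))  # → False (shares Mar 4)
```

The same test translates to SQL as a lookup for any row where `start_date < :new_end AND end_date > :new_start` for the same item (column and parameter names assumed); if it returns a row, the request conflicts.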

PHP · MySQL · CRUD · Data Integrity

Electronics / IoT

BJT Circuits · Electronics · IoT · Sensors

Smart Gardening System

Automated plant monitoring system using BJT switching circuits: sensors trigger irrigation and lighting without manual input.

Problem: Manual plant care doesn't scale; automated moisture/light response is needed

Method: BJT transistor circuits for threshold-based switching; moisture + light sensors

Result: Automated irrigation and grow-light control triggered by sensor thresholds

Built electronic circuits for moisture and light sensing. BJT used as switches to automate water pump and grow light control based on threshold readings.

BJT Circuits · Electronics · IoT · Sensors

Networking

Networking · Subnetting · VLAN · Cisco

Full-Fledged Network Design

Designed a multi-subnet organizational network with VLANs and inter-VLAN routing — simulated and verified end-to-end in Cisco Packet Tracer.

Problem: Design a segmented enterprise network with full inter-VLAN connectivity

Method: Subnetting, VLAN configuration, routing protocols in Cisco Packet Tracer

Result: End-to-end connectivity verified across all network segments

Configured routers, switches, subnetting, and routing protocols. Full connectivity tested across all segments.
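Subnet planning like this can be checked with Python's standard `ipaddress` module. A sketch, assuming a 192.168.10.0/24 block split into /26 subnets (62 usable hosts each) with one subnet per VLAN and the first usable address as gateway; the block, VLAN names, and gateway convention are all hypothetical.

```python
import ipaddress

def plan_subnets(block, new_prefix, vlan_names):
    """Split an address block into equal subnets and assign one per VLAN."""
    subnets = list(ipaddress.ip_network(block).subnets(new_prefix=new_prefix))
    if len(vlan_names) > len(subnets):
        raise ValueError("not enough subnets for the requested VLANs")
    plan = {}
    for name, subnet in zip(vlan_names, subnets):
        hosts = list(subnet.hosts())
        plan[name] = {
            "network": str(subnet),
            "gateway": str(hosts[0]),  # convention assumed: first usable address
            "usable_hosts": len(hosts),
        }
    return plan

for name, info in plan_subnets("192.168.10.0/24", 26, ["VLAN10", "VLAN20", "VLAN30"]).items():
    print(name, info)
```

In a router-on-a-stick or layer-3 switch design, those gateway addresses are the interfaces where inter-VLAN routing happens.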

Networking · Subnetting · VLAN · Cisco Packet Tracer

Embedded / C

C · Microcontroller · IR Sensors

Smart Car Parking System

Microcontroller system that detects slot availability via IR sensors and displays real-time occupancy on an LCD — no manual monitoring needed.

Problem: Parking lots lack real-time slot availability, so drivers waste time searching

Method: IR sensors per slot + microcontroller logic; LCD display for live count

Result: Real-time occupancy detection and display with zero manual input

IR sensors detect car presence per slot. Microcontroller tracks and updates available count, displayed live on LCD screen.
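The original system ran in C on a microcontroller; this Python sketch only mirrors its counting and display logic, assuming one boolean IR reading per slot and a 16x2 character LCD (both assumptions, as are the function names).

```python
def free_slots(sensor_states):
    """sensor_states[i] is True when slot i's IR sensor detects a car."""
    return [i for i, occupied in enumerate(sensor_states) if not occupied]

def lcd_lines(sensor_states, width=16):
    """Two fixed-width lines for an assumed 16x2 character LCD."""
    free = free_slots(sensor_states)
    line1 = f"Free: {len(free)}/{len(sensor_states)}"
    if free:
        line2 = "Slots: " + ",".join(str(i + 1) for i in free)  # 1-based labels
    else:
        line2 = "Lot full"
    # Truncate or pad so each write overwrites the whole LCD row
    return line1[:width].ljust(width), line2[:width].ljust(width)

# Hypothetical readings: slots 1 and 3 occupied, 2 and 4 empty
for line in lcd_lines([True, False, True, False]):
    print(repr(line))
```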

C · Microcontroller · IR Sensors · Embedded Systems


Web / Feedback

HTML/CSS · PHP · MySQL

Online Feedback System

Web platform for collecting and managing structured user feedback — with admin review flow and data integrity controls.

Problem: Unstructured feedback collection leads to inconsistent, hard-to-analyze data

Method: PHP + MySQL; structured form submission with admin review workflow

Result: Clean, consistently formatted feedback with an admin moderation pipeline

Built form submission, structured storage, and admin review workflows. Focused on keeping data clean and responses consistently formatted.

HTML/CSS · PHP · MySQL

C++ / OOP

C++ · OOP · Encapsulation

Student Advising Management System

University advising system built with OOP principles — models students, advisors, courses, and appointments with clean entity relationships.

Problem: Advising workflows need structured entity models for students, courses, and appointments

Method: C++ with encapsulation and inheritance; entity relationships designed before implementation

Result: Maintainable advising system with clean OOP architecture and extensible design

Applied encapsulation and inheritance to keep the codebase maintainable and extensible. Entity relationships were designed before implementation.

C++ · OOP · Encapsulation · Inheritance

05 — The Toolkit

Skills

Every tool below is backed by a shipped project or a published paper.


Languages

Python: Expert (95%)

SQL: Expert (90%)

C / C++: Intermediate (70%)

HTML / CSS: Intermediate (65%)

PHP: Intermediate (45%)

ML / AI

scikit-learn: Expert (92%)

pandas: Expert (90%)

NumPy: Expert (88%)

TF-IDF · NLTK: Expert (80%)

Random Forest: Expert (80%)

TensorFlow: Intermediate (70%)

PyTorch: Intermediate (68%)

Bi-LSTM: Intermediate (60%)

LLM Fine-tuning: Basic (45%)

Data & Analytics

MySQL: Expert (90%)

Excel: Expert (88%)

Web Scraping: Expert (80%)

PostgreSQL: Intermediate (70%)

Tableau: Intermediate (65%)

Power BI: Intermediate (60%)

Tools & Cloud

Git / GitHub: Expert (95%)

AWS: Intermediate (70%)

Django: Intermediate (70%)

REST APIs: Intermediate (70%)

06 — How I Think

Technical Writing

Not theory — lessons from projects that failed first.


07 — What's Next

Let's Work Together

Thanks for taking the time to visit. Even if it's just a quick hello, feel free to reach out 🙂

Email: malehaickheya@gmail.com · LinkedIn: maleha-ick · GitHub: malehaICK

