Selected work

Real data. Real results.

01
Data Analysis
EDA · Business Insights · Dashboards
02
Data Science
ML · Clustering · Regression
Data Analysis: Choose a Project
DA Project 1
RFM Segmentation
Unlocking Customer Value: RFM Analysis on 1,062,989 real transaction rows
RFM · EDA · Power BI
DA Project 2
Inventory Analysis
Store Performance, Inventory Management and Profitability across 485,875 items
EDA · ABC Analysis · Tableau
DA Project 3
Hotel Booking Demand
Cancellation, Pricing & Demand Patterns across 83,293 hotel bookings
EDA · Cancellation Analysis · Python
DATA ANALYSIS · DA PROJECT 1 · RFM

Unlocking Customer Value Through
RFM Segmentation

Online retail businesses often fail to distinguish loyal customers from nearly inactive ones. This dashboard applies RFM (Recency, Frequency, Monetary) analysis to 1,062,989 raw transaction rows (805,546 after cleaning) to identify high-value customer segments and derive actionable retention strategies.

Dataset: UCI Online Retail
Rows: 805,546 (after cleaning)
Customers: 5,878
Period: 2009–2011
Unique customers: 5,878 (after cleaning & dedup)
RFM segments: 8 (built from R · F · M score combinations)
Top segment: Seg 01, Current Loyal High Spending
Score range: 1–5 per R·F·M (percentile-based quintiles)
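The quintile scoring summarized above can be sketched in pandas. This is a minimal sketch, not the dashboard's actual code: it assumes a cleaned transaction table with customer_id, invoicedate and total_price (names taken from the cleaning table on this page) plus a hypothetical invoice column, and a snapshot date chosen just after the last transaction. Ties are broken with rank(method="first") before pd.qcut, one common way to force five equal-sized quintiles.

```python
import pandas as pd


def rfm_scores(tx: pd.DataFrame, snapshot: pd.Timestamp) -> pd.DataFrame:
    """Aggregate cleaned transactions per customer and score R, F, M from 1 to 5."""
    per_customer = tx.groupby("customer_id").agg(
        last_purchase=("invoicedate", "max"),
        frequency=("invoice", "nunique"),  # "invoice" is a hypothetical column name
        monetary=("total_price", "sum"),
    )
    rfm = pd.DataFrame({
        "recency": (snapshot - per_customer["last_purchase"]).dt.days,
        "frequency": per_customer["frequency"],
        "monetary": per_customer["monetary"],
    })

    def quintile(col: pd.Series, labels: list[int]) -> pd.Series:
        # Ranking first breaks ties, so qcut can always form 5 equal-sized bins.
        return pd.qcut(col.rank(method="first"), 5, labels=labels).astype(int)

    rfm["R"] = quintile(rfm["recency"], [5, 4, 3, 2, 1])  # fewer days since purchase = better
    rfm["F"] = quintile(rfm["frequency"], [1, 2, 3, 4, 5])
    rfm["M"] = quintile(rfm["monetary"], [1, 2, 3, 4, 5])
    return rfm


tx = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C", "D", "E"],
    "invoice": ["1", "2", "3", "4", "5", "6"],
    "invoicedate": pd.to_datetime(
        ["2011-12-01", "2011-12-05", "2011-06-01", "2011-11-20", "2010-12-01", "2011-09-01"]
    ),
    "total_price": [100.0, 50.0, 20.0, 500.0, 5.0, 80.0],
})
rfm = rfm_scores(tx, pd.Timestamp("2011-12-10"))
print(rfm)
```

Ranking before qcut means customers with identical values (for example, many one-invoice customers) can land in different F quintiles; passing duplicates="drop" to qcut instead keeps ties together at the cost of possibly fewer than five bins.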
| Step | Result |
|---|---|
| Check dataset shape | 1,062,989 rows × 10 columns on load |
| Check missing values | 238,625 empty strings found in customer_id; rows removed |
| Check duplicate rows | 26,124 duplicate rows found and removed |
| Convert data types | invoicedate converted to datetime; customer_id to string |
| Remove invalid values | Rows with negative/zero quantity or price removed (20,261 negative qty; 1,820 zero price) |
| Remove cancelled transactions | Cancelled orders excluded using the is_cancelled boolean flag |
| Remove extreme outliers | 3 extreme outlier customers removed (total_price > 25,000), confirmed as anomalies |
| Final dataset shape | 805,546 rows × 10 columns, ready for RFM analysis |
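The cleaning steps in the table map onto a short pandas pipeline. A minimal sketch, assuming the raw table has columns customer_id, invoicedate, quantity, price and is_cancelled (quantity and price are assumed names), and that total_price is derived as quantity × price. The 25,000 threshold is applied per row here, which may differ from the original per-customer check.

```python
import pandas as pd


def clean_transactions(raw: pd.DataFrame, outlier_threshold: float = 25_000.0) -> pd.DataFrame:
    # 1. Missing customer_id: treat NaN and empty/blank strings alike.
    has_id = raw["customer_id"].fillna("").astype(str).str.strip() != ""
    # 2. Exact duplicate rows.
    df = raw[has_id].drop_duplicates()
    # 3. Data types.
    df = df.assign(
        invoicedate=pd.to_datetime(df["invoicedate"]),
        customer_id=df["customer_id"].astype(str),
    )
    # 4. Invalid values: non-positive quantity or price (assumed column names).
    df = df[(df["quantity"] > 0) & (df["price"] > 0)]
    # 5. Cancelled transactions via the boolean flag.
    df = df[~df["is_cancelled"].astype(bool)]
    # 6. Extreme outliers: row-level total_price above the threshold.
    df = df.assign(total_price=df["quantity"] * df["price"])
    df = df[df["total_price"] <= outlier_threshold]
    return df.reset_index(drop=True)


raw = pd.DataFrame({
    "customer_id": ["13085", "", "13085", "13085", "12346", "12347", "12348"],
    "invoicedate": ["2010-12-01 08:26", "2010-12-01 09:00", "2010-12-01 08:26",
                    "2010-12-02 10:00", "2011-01-18 10:01", "2011-02-01 12:00",
                    "2011-03-01 12:00"],
    "quantity": [6, 3, 6, -2, 74215, 4, 10],
    "price": [2.55, 1.0, 2.55, 2.55, 1.04, 0.0, 3.0],
    "is_cancelled": [False, False, False, True, False, False, False],
})
clean = clean_transactions(raw)
print(clean)
```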
Project presentation (Canva): AnthonyDjiadyDjie_DS39+_Final_Project_Analysis
Data Science: Choose a Project
DS Project 1
Customer Clustering
K-Means ML: Segmenting 5,878 customers into 4 actionable clusters
K-Means · Clustering · Scikit-Learn
DS Project 2
Delivery Prediction
Predicting Food Delivery ETA with ML
Regression · ML Pipeline · Scikit-Learn
DS Project 3
Customer Churn
Predicting Customer Churn with ML Classification
SVM · Classification · Recall · Confusion Matrix
DS Project 4
Credit Risk / Loan Default
Classifying loan default risk, a problem directly relevant to the tax and accounting domain
Logistic Regression · XGBoost · ROC-AUC · Feature Importance
NEW
DS Project 5
Hotel Cancellation Prediction
Predicting booking cancellations on 82,721 hotel reservations, evaluated on a chronological (time-based) hold-out set
Random Forest · Gradient Boosting · Time-Based Split · Threshold Tuning
DATA SCIENCE · DS PROJECT 1 · K-MEANS CLUSTERING

Discovering Hidden Customer Segments
with K-Means Clustering

Online retail businesses often struggle to understand how their customers differ. This project applies RFM feature engineering and K-Means clustering to segment 5,878 customers into 4 actionable groups, each with a tailored business strategy.

Dataset: UCI Online Retail
Rows: 1,062,989 (raw)
Customers: 5,878
Clusters: k = 4
Chosen k: 4 clusters (Elbow + Silhouette reviewed; final choice based on business fit)
Best silhouette: 0.32 at k=2 (k=4 chosen for business fit)
Features used: 5 (R · F · M · AOV · Avg Qty)
PCA variance explained: 84.8% (PC1 59.1% · PC2 25.7%)
| Step | Result |
|---|---|
| Check dataset shape | 1,062,989 rows × 10 columns |
| Drop missing customer_id | Null customer_id rows dropped; 824,364 rows remain |
| Convert data types | customer_id → string; invoicedate → datetime |
| Check empty strings | No empty strings found in any categorical column |
| Remove negatives & zeros | 18,744 negative-quantity rows + 71 zero-price rows removed |
| Check duplicates | 26,124 duplicates found; kept (same invoice, different items) |
| Outlier detection (IQR) | Quantity 6.45%, Price 8.36%, Total Price 8.24%; kept as high-value signals |
| Final dataset shape | 805,549 rows × 10 columns, ready for RFM feature engineering |
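This pipeline flags IQR outliers rather than dropping them. A minimal sketch of that check with Tukey's fences; the 1.5 × IQR multiplier is the conventional default and an assumption here, since the page does not state it. It reports the share of rows outside the fences and adds boolean flag columns, so high-value rows stay in the data.

```python
import pandas as pd


def iqr_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


def iqr_report(df: pd.DataFrame, cols: list[str], k: float = 1.5) -> tuple[pd.DataFrame, dict]:
    """Add boolean *_outlier columns (rows are kept) and report % flagged per column."""
    flags = {f"{c}_outlier": iqr_mask(df[c], k) for c in cols}
    shares = {c: round(100 * flags[f"{c}_outlier"].mean(), 2) for c in cols}
    return df.assign(**flags), shares


tx = pd.DataFrame({
    "quantity": [1, 2, 3, 4, 5, 6, 7, 8, 9, 100],
    "price": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5],
})
tx["total_price"] = tx["quantity"] * tx["price"]
flagged, shares = iqr_report(tx, ["quantity", "price", "total_price"])
print(shares)
```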
Features used (5):
Recency: days since last purchase
Frequency: number of unique invoices
Monetary: total spending per customer
AOV: average order value (Monetary / Frequency)
Avg Qty per Invoice: total quantity / Frequency
A log1p transformation is applied to reduce skewness, and StandardScaler is applied before fitting K-Means.
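The five features and the two preprocessing steps can be sketched as follows. Assumptions: a cleaned transaction table with a hypothetical invoice column, a snapshot date for recency, and AOV and Avg Qty computed per customer exactly as the definitions above state (Monetary / Frequency and total quantity / Frequency).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def build_features(tx: pd.DataFrame, snapshot: pd.Timestamp) -> pd.DataFrame:
    """One row per customer with the five clustering features."""
    g = tx.groupby("customer_id").agg(
        last_purchase=("invoicedate", "max"),
        frequency=("invoice", "nunique"),  # hypothetical invoice-number column
        monetary=("total_price", "sum"),
        total_qty=("quantity", "sum"),
    )
    return pd.DataFrame({
        "recency": (snapshot - g["last_purchase"]).dt.days,
        "frequency": g["frequency"],
        "monetary": g["monetary"],
        "aov": g["monetary"] / g["frequency"],
        "avg_qty": g["total_qty"] / g["frequency"],
    })


tx = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C", "C", "C"],
    "invoice": ["1", "2", "3", "4", "5", "6"],
    "invoicedate": pd.to_datetime(
        ["2011-12-01", "2011-12-05", "2011-06-01", "2011-11-01", "2011-11-15", "2011-11-30"]
    ),
    "quantity": [10, 30, 5, 100, 200, 300],
    "total_price": [20.0, 60.0, 15.0, 300.0, 500.0, 700.0],
})
feats = build_features(tx, pd.Timestamp("2011-12-10"))

# log1p compresses the long right tails; StandardScaler then gives each
# feature mean 0 and (population) standard deviation 1.
X = StandardScaler().fit_transform(np.log1p(feats))
print(feats)
```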
Elbow method (inertia vs. number of clusters K, K = 2–10): the auto-detected elbow is at k=5; k=4 was chosen for business fit.
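An inertia curve like this one can be produced by fitting KMeans for each candidate K. The page does not say how the elbow was auto-detected. The sketch below uses a kneedle-style heuristic, picking the point farthest from the line joining the curve's endpoints; that heuristic is an assumption. Synthetic make_blobs data stands in for the scaled customer features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def inertia_curve(X: np.ndarray, ks: list[int], seed: int = 42) -> list[float]:
    """Within-cluster sum of squares (KMeans.inertia_) for each candidate k."""
    return [KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_ for k in ks]


def elbow_max_distance(ks: list[int], inertias: list[float]) -> int:
    """Return the k whose normalised point lies farthest from the straight line
    through the first and last points of the curve (a kneedle-style heuristic)."""
    x = (np.asarray(ks, dtype=float) - ks[0]) / (ks[-1] - ks[0])
    y = np.asarray(inertias, dtype=float)
    y = (y - y.min()) / (y.max() - y.min())
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    # Perpendicular distance of every (x, y) to the line through the endpoints.
    dist = np.abs(dy * x - dx * y + x[-1] * y[0] - y[-1] * x[0]) / np.hypot(dx, dy)
    return ks[int(np.argmax(dist))]


X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
ks = list(range(2, 11))
curve = inertia_curve(X, ks)
print(elbow_max_distance(ks, curve))
```

An automatic elbow is only a hint; as the page notes, the final K can reasonably differ when a neighbouring value gives more actionable segments.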
Silhouette score vs. number of clusters K (K = 2–10): the score peaks at k=2 (0.3204), but two clusters are too broad to act on; k=4 was chosen (0.2560).
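The silhouette comparison can be sketched the same way. On the real features the silhouette peak (k=2) and the chosen k (4) differ. On the synthetic four-blob data used here as a stand-in, they happen to coincide. The random_state and n_init values are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def silhouette_curve(X, ks, seed: int = 42) -> dict[int, float]:
    """Mean silhouette coefficient of a KMeans fit for each candidate k."""
    scores = {}
    for k in ks:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = float(silhouette_score(X, labels))
    return scores


# Four tight, well-separated blobs (synthetic stand-in for the customer features).
X, _ = make_blobs(
    n_samples=200,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.5,
    random_state=0,
)
scores = silhouette_curve(X, range(2, 11))
best_k = max(scores, key=scores.get)
chosen_k = 4  # a business decision; it need not equal best_k
print(best_k, round(scores[best_k], 3))
```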
2D PCA projection (PC1 59.1%, PC2 25.7%; 84.8% of variance explained), colored by cluster:
At Risk Customers (n=1,763)
Bulk Buyers (n=1,142)
Regular Customers (n=1,552)
Dormant Customers (n=1,421)
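The 2D view is a PCA projection of the scaled features, with each axis labelled by its explained-variance ratio. A minimal sketch using scikit-learn's PCA. The rank-2 random matrix is a synthetic stand-in for the log1p + StandardScaler feature matrix, so two components explain essentially all of its variance (unlike the 84.8% on the real data).

```python
import numpy as np
from sklearn.decomposition import PCA


def project_2d(X: np.ndarray):
    """Project X onto its first two principal components and build axis labels."""
    pca = PCA(n_components=2)
    coords = pca.fit_transform(X)
    ratios = pca.explained_variance_ratio_
    labels = [f"PC{i + 1} ({r:.1%})" for i, r in enumerate(ratios)]
    return coords, ratios, labels


rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5))  # synthetic rank-2 data in 5 dimensions
coords, ratios, labels = project_2d(X)
print(labels, f"total {ratios.sum():.1%}")
```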
Cluster profiles and strategies (k=4 segments)
Bulk Buyers · 🏆 Highest Value
Recency: 18 days (median days since last order)
Frequency: 12.5 orders (median unique invoices)
Monetary: £5,599 (median lifetime spend)
AOV: £447.58 (average order value)
Avg Qty / Inv: 267.32 (average quantity per invoice)
Customers: 1,142 (19.4% of the customer base)
Strategy & Actions:
VIP loyalty programs & membership tiers
Early access to new products & restocks
Personalized marketing based on purchase history
Premium product promotions & bundles
Exclusive events & newsletters
Regular Customers · 📈 Growth Potential
Recency: 31 days (median days since last order)
Frequency: 5 orders (median unique invoices)
Monetary: £1,131 (median lifetime spend)
AOV: £231.23 (average order value)
Avg Qty / Inv: 126.40 (average quantity per invoice)
Customers: 1,552 (26.4% of the customer base)
Strategy & Actions:
Upselling: promote higher-value products
Cross-selling complementary items
Product bundling to increase basket size
Loyalty incentives to drive repeat visits
Personalized recommendations engine
At Risk Customers · ⚠️ Reactivate
Recency: 299 days (median days since last order)
Frequency: 2 orders (median unique invoices)
Monetary: £798 (median lifetime spend)
AOV: £393.26 (average order value)
Avg Qty / Inv: 239.00 (average quantity per invoice)
Customers: 1,763 (30.0% of the customer base)
Strategy & Actions:
Reactivation campaigns with "We miss you" messaging
Limited-time discounts to create urgency
Seasonal or event-based promotions
Digital retargeting on social media
Loyalty incentive win-back programs
Dormant Customers · 💤 Low Activity
Recency: 383 days (median days since last order)
Frequency: 1 order (median unique invoices)
Monetary: £189 (median lifetime spend)
AOV: £137.37 (average order value)
Avg Qty / Inv: 68.00 (average quantity per invoice)
Customers: 1,421 (24.2% of the customer base)
Strategy & Actions:
Welcome-back promos & onboarding nudges
First repeat purchase incentives
Follow-up emails with product highlights
Targeted promotions to encourage engagement
Retargeting campaigns via digital ads
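Profiles like the four above come from grouping customers by cluster label and summarizing each feature. This minimal sketch takes medians for every feature, whereas the page labels only Recency, Frequency and Monetary as medians. The toy feature values and labels are illustrative. K-Means label numbers are arbitrary, so segment names such as "Bulk Buyers" are attached after inspecting the profiles; the mapping below is hypothetical.

```python
import pandas as pd


def cluster_profiles(feats: pd.DataFrame, labels) -> pd.DataFrame:
    """Per-cluster medians of each feature plus customer count and % share."""
    grouped = feats.assign(cluster=list(labels)).groupby("cluster")
    profile = grouped.median()
    profile["customers"] = grouped.size()
    profile["share_pct"] = (100 * profile["customers"] / profile["customers"].sum()).round(1)
    return profile


feats = pd.DataFrame(
    {
        "recency": [10, 30, 300, 400],
        "frequency": [12, 8, 2, 1],
        "monetary": [5000.0, 3000.0, 800.0, 150.0],
    },
    index=["c1", "c2", "c3", "c4"],
)
labels = [0, 0, 1, 1]  # e.g. the output of KMeans(...).fit_predict(X)
profile = cluster_profiles(feats, labels)
# Hypothetical mapping from arbitrary cluster IDs to business names.
profile = profile.rename(index={0: "Bulk Buyers", 1: "Dormant Customers"})
print(profile)
```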
Revenue contribution per cluster (% of total):
At Risk Customers: 29.4%
Regular Customers: 27.0%
Bulk Buyers: 25.0%
Dormant Customers: 18.5%
Customer distribution (% of customer base):
At Risk Customers: 30.0%
Regular Customers: 26.4%
Dormant Customers: 24.2%
Bulk Buyers: 19.4%
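Both breakdowns are simple group-level shares: each cluster's summed Monetary value over total revenue, and each cluster's customer count over the customer base. A minimal sketch with assumed column names cluster and monetary. The last lines check that the cluster sizes reported on this page (1,763 / 1,552 / 1,421 / 1,142) reproduce the distribution percentages.

```python
import pandas as pd


def cluster_shares(customers: pd.DataFrame, cluster_col: str = "cluster",
                   value_col: str = "monetary") -> pd.DataFrame:
    """Revenue share and customer share (both in %) per cluster."""
    revenue = customers.groupby(cluster_col)[value_col].sum()
    shares = pd.DataFrame({
        "revenue_pct": 100 * revenue / revenue.sum(),
        "customer_pct": 100 * customers[cluster_col].value_counts(normalize=True),
    })
    return shares.round(1).sort_values("revenue_pct", ascending=False)


toy = pd.DataFrame({
    "cluster": ["Bulk Buyers", "Bulk Buyers", "Regular", "Dormant"],
    "monetary": [50.0, 30.0, 15.0, 5.0],
})
shares = cluster_shares(toy)
print(shares)

# Cluster sizes reported on this page reproduce the distribution percentages.
sizes = pd.Series({"At Risk Customers": 1763, "Regular Customers": 1552,
                   "Dormant Customers": 1421, "Bulk Buyers": 1142})
distribution = (100 * sizes / sizes.sum()).round(1)
print(distribution)  # values: 30.0, 26.4, 24.2, 19.4
```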
1. Retain Bulk Buyers: Protect Your Core Revenue
With only 19.4% of customers but 25% of revenue, Bulk Buyers are the most efficient segment, earning the highest revenue share relative to their share of customers. VIP loyalty programs, early product access, and exclusive communications help protect their retention.
2. Grow Regular Customers Into Higher-Value Buyers
Regular Customers (26.4% of customers) contribute 27% of revenue with solid purchase frequency. Upselling, cross-selling, and product bundling can move them toward the Bulk Buyer tier over time.
3. Reactivate At Risk Customers Before They Churn
At Risk Customers have a median of 299 days since their last purchase. Time-limited "we miss you" campaigns, discounts, and digital retargeting are the priority to win them back before they are lost.
4. Nurture Dormant Customers With Low-Friction Entry
Dormant Customers (24.2% of customers) made a median of just 1 purchase. Welcome-back promos and first-repeat-purchase incentives are low-cost tools to gradually re-engage this group.
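The "most efficient segment" claim in recommendation 1 can be checked with simple arithmetic. Dividing each cluster's revenue share by its customer share gives a revenue index, where values above 1 mean the cluster earns more than its headcount would suggest. The inputs are the percentages reported on this page.

```python
# Percentages reported on this page.
revenue_pct = {"At Risk": 29.4, "Regular": 27.0, "Bulk Buyers": 25.0, "Dormant": 18.5}
customer_pct = {"At Risk": 30.0, "Regular": 26.4, "Bulk Buyers": 19.4, "Dormant": 24.2}

# Revenue index: share of revenue per unit share of customers.
revenue_index = {c: revenue_pct[c] / customer_pct[c] for c in revenue_pct}

for cluster, idx in sorted(revenue_index.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cluster:<12} {idx:.2f}")
# Bulk Buyers  1.29
# Regular      1.02
# At Risk      0.98
# Dormant      0.76
```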
Want the full deep-dive? The GitHub repo and the interactive Streamlit app contain the complete analysis, interactive charts, and full methodology.

Interested in Working Together?

The first consultation is free. Based in Palu, we are ready to help clients throughout Indonesia.

Contact Me
Google Data Analytics — Coursera
Microsoft SQL Server — Coursera
Microsoft Excel Professional — Coursera
Tableau BI Analyst — Coursera
Python for Data Science, AI — IBM / Coursera
Xero Advisor Certified
Data Science Bootcamp — Dibimbing.id
Brevet Pajak A & B — IAI
Tax Practitioner/Attorney (Praktisi/Pengacara Pajak, PKP3), Jimly School of Law
B.S. Business Administration — Biola University
Biola University Bachelor Certificate
Resume — Anthony Djiady Djie