Fraud Investigation with LouieAI Agents
Detect and investigate fraud patterns using specialized agents for graph analysis and anomaly detection.
Topics covered:
- Transaction data analysis
- Network analysis for fraud rings
- Statistical anomaly detection
- Interactive investigation dashboards
- Risk assessment and scoring
Setup and Authentication
In [1]:
from datetime import datetime, timedelta
import numpy as np
import pandas as pd
# For this demonstration, we'll use locally generated data
# To use with LouieAI, ensure you have proper credentials set
print("🔒 Running fraud detection demonstration")
DEMO_MODE = True
🔒 Running fraud detection demonstration
Generate Transaction Data with Fraud Patterns
In [2]:
# Generate sample transaction data with embedded fraud patterns
np.random.seed(42)
base_time = datetime(2024, 1, 1)
# Create normal and fraudulent transactions
transactions = []
for i in range(500):
    is_fraud = np.random.random() < 0.05  # 5% fraud rate
    if is_fraud:
        # Fraudulent patterns
        amount = np.random.choice(
            [
                np.random.uniform(5000, 10000),  # Unusually high
                0.01,  # Testing transaction
                999.99,  # Just under reporting limit
            ]
        )
        merchant = np.random.choice(["ATM_Foreign", "Online_Casino", "Wire_Transfer"])
        # Intended burst size; generated but not stored in the record below
        velocity = np.random.randint(1, 10)
    else:
        # Normal patterns
        amount = np.random.lognormal(3.5, 1.5)
        merchant = np.random.choice(
            ["Grocery_Store", "Gas_Station", "Restaurant", "Online_Shop"]
        )
        velocity = 1
    user_id = f"USER_{np.random.randint(1, 101):03d}"
    # One transaction roughly every 10 minutes, with a few minutes of jitter
    timestamp = base_time + timedelta(minutes=i * 10 + np.random.randint(-5, 5))
    transactions.append(
        {
            "transaction_id": f"T{i + 1:04d}",
            "user_id": user_id,
            "amount": round(amount, 2),
            "merchant": merchant,
            "timestamp": timestamp,
            "is_suspicious": is_fraud,
        }
    )
transactions_df = pd.DataFrame(transactions)
print(
    f"✅ Generated {len(transactions_df)} transactions with {transactions_df['is_suspicious'].sum()} suspicious patterns"
)
print("\nSample data:")
transactions_df.head()
✅ Generated 500 transactions with 28 suspicious patterns

Sample data:
Out[2]:
| | transaction_id | user_id | amount | merchant | timestamp | is_suspicious |
|---|---|---|---|---|---|---|
| 0 | T0001 | USER_087 | 6.25 | Restaurant | 2024-01-01 00:02:00 | False |
| 1 | T0002 | USER_024 | 53.43 | Online_Shop | 2024-01-01 00:07:00 | False |
| 2 | T0003 | USER_064 | 0.01 | Online_Casino | 2024-01-01 00:19:00 | True |
| 3 | T0004 | USER_042 | 34.24 | Restaurant | 2024-01-01 00:27:00 | False |
| 4 | T0005 | USER_064 | 17.43 | Online_Shop | 2024-01-01 00:43:00 | False |
Basic Statistical Analysis
In [3]:
# Analyze transaction patterns
if DEMO_MODE:
    # Local analysis: per-user transaction count, total, mean, and spread
    stats = (
        transactions_df.groupby("user_id")
        .agg({"amount": ["count", "sum", "mean", "std"], "is_suspicious": "sum"})
        .round(2)
    )
    # Find top suspicious users
    suspicious_users = (
        stats[stats[("is_suspicious", "sum")] > 0]
        .sort_values(("is_suspicious", "sum"), ascending=False)
        .head(10)
    )
    print("🔍 Top users with suspicious activity:")
    print(suspicious_users)
else:
    # Use LouieAI for analysis.
    # Note: `lui` is not created anywhere in this notebook; a configured
    # LouieAI client must be available before setting DEMO_MODE = False.
    lui("Analyze transaction patterns and identify suspicious users", transactions_df)
    if lui.df is not None:
        print(f"Analysis complete: {lui.df.shape}")
🔍 Top users with suspicious activity:
         amount                             is_suspicious
          count      sum     mean      std            sum
user_id
USER_081      4  7350.98  1837.74  3520.41              2
USER_062      5  1222.61   244.52   425.04              2
USER_019      6  1621.20   270.20   387.46              1
USER_022      3   135.60    45.20    74.52              1
USER_023      6  9433.40  1572.23  3249.14              1
USER_025      7  1883.85   269.12   606.36              1
USER_031      1     0.01     0.01      NaN              1
USER_038      8   204.72    25.59    40.79              1
USER_006      5    83.78    16.76    23.97              1
USER_018      2  8116.36  4058.18  5627.92              1
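The two-level column header in this output is why the filter has to be written as `stats[("is_suspicious", "sum")]`. A minimal sketch of the same per-user summary using pandas named aggregation, which produces flat column names. The toy frame and the output column names (`txn_count`, `total_amount`, `mean_amount`, `suspicious_count`) are illustrative assumptions, not part of the notebook:

```python
import pandas as pd

# Toy stand-in for transactions_df (values are made up for illustration)
toy_df = pd.DataFrame(
    {
        "user_id": ["USER_001", "USER_001", "USER_002", "USER_002", "USER_003"],
        "amount": [10.0, 30.0, 5.0, 7000.0, 12.5],
        "is_suspicious": [False, False, False, True, False],
    }
)

# Named aggregation: one flat output column per (source column, function) pair
user_stats = (
    toy_df.groupby("user_id")
    .agg(
        txn_count=("amount", "count"),
        total_amount=("amount", "sum"),
        mean_amount=("amount", "mean"),
        suspicious_count=("is_suspicious", "sum"),
    )
    .round(2)
)

# Flat names allow plain-string filtering and sorting
flagged = user_stats[user_stats["suspicious_count"] > 0].sort_values(
    "suspicious_count", ascending=False
)
print(flagged)
```

With flat names, downstream steps can refer to `suspicious_count` directly instead of a tuple key.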
Anomaly Detection with Z-Scores
In [4]:
# Calculate Z-scores for anomaly detection without scipy
# Z-score = (value - mean) / std_dev
mean_amount = transactions_df["amount"].mean()
std_amount = transactions_df["amount"].std()  # sample standard deviation (pandas default, ddof=1)
# Calculate absolute Z-scores for amounts
transactions_df["amount_zscore"] = np.abs(
    (transactions_df["amount"] - mean_amount) / std_amount
)
# Flag outliers (|Z-score| > 3)
transactions_df["is_outlier"] = transactions_df["amount_zscore"] > 3
# Show top anomalies (at most 10)
anomalies = (
    transactions_df[transactions_df["is_outlier"]]
    .sort_values("amount_zscore", ascending=False)
    .head(10)
)
print(f"🚨 Found {transactions_df['is_outlier'].sum()} outlier transactions")
print("\nTop 10 anomalous transactions:")
anomalies[["transaction_id", "user_id", "amount", "amount_zscore", "merchant"]]
🚨 Found 7 outlier transactions

Top 10 anomalous transactions:
Out[4]:
| | transaction_id | user_id | amount | amount_zscore | merchant |
|---|---|---|---|---|---|
| 106 | T0107 | USER_080 | 9849.39 | 9.792456 | Wire_Transfer |
| 206 | T0207 | USER_098 | 9153.27 | 9.083782 | Online_Casino |
| 21 | T0022 | USER_023 | 8182.05 | 8.095047 | Online_Casino |
| 63 | T0064 | USER_087 | 8139.47 | 8.051699 | Online_Casino |
| 5 | T0006 | USER_018 | 8037.72 | 7.948114 | ATM_Foreign |
| 14 | T0015 | USER_081 | 7117.01 | 7.010800 | Wire_Transfer |
| 32 | T0033 | USER_053 | 6725.36 | 6.612087 | Wire_Transfer |
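Every transaction flagged in this table has an amount above 6,000, even though the generator also plants 0.01 and 999.99 amounts as fraud patterns. A minimal sketch on made-up amounts suggests why: an absolute Z-score on amount measures only distance from the mean, so small or mid-sized fraudulent amounts sit well inside the threshold. The toy values below are assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd

# 20 ordinary purchases plus the three fraud-style amounts from the generator
# (a tiny test charge, a just-under-threshold charge, and a very large charge)
normal_amounts = [25, 40, 18, 60, 33, 12, 75, 48, 22, 30,
                  55, 41, 19, 64, 37, 28, 50, 45, 16, 35]
fraud_amounts = [0.01, 999.99, 9000.0]
amounts = pd.Series(normal_amounts + fraud_amounts, dtype=float)

# Same formula as the notebook cell: |value - mean| / sample std
z_scores = np.abs((amounts - amounts.mean()) / amounts.std())
is_outlier = z_scores > 3

result = pd.DataFrame({"amount": amounts, "z": z_scores.round(2), "is_outlier": is_outlier})
print(result.tail(3))

# Only the 9000.0 amount is flagged; 0.01 and 999.99 both score below 0.3
flagged_amounts = amounts[is_outlier].tolist()
print(flagged_amounts)
```

This is one reason the notebook combines the Z-score with velocity, merchant, and risk-score checks rather than relying on amount alone.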
Velocity Analysis
In [5]:
# Detect rapid-fire transactions (velocity attacks)
transactions_df["timestamp"] = pd.to_datetime(transactions_df["timestamp"])
# Calculate time between transactions for each user
velocity_check = []
for user in transactions_df["user_id"].unique():
    user_trans = transactions_df[transactions_df["user_id"] == user].sort_values(
        "timestamp"
    )
    if len(user_trans) > 1:
        user_trans["time_diff"] = (
            user_trans["timestamp"].diff().dt.total_seconds() / 60
        )  # in minutes
        rapid = user_trans[user_trans["time_diff"] < 5]  # Less than 5 minutes
        if len(rapid) > 0:
            velocity_check.append(
                {
                    "user_id": user,
                    "rapid_transactions": len(rapid),
                    "total_rapid_amount": rapid["amount"].sum(),
                }
            )
if velocity_check:
    velocity_df = (
        pd.DataFrame(velocity_check)
        .sort_values("rapid_transactions", ascending=False)
        .head(5)
    )
    print("⚡ Users with rapid transaction patterns:")
    print(velocity_df)
else:
    print("No rapid transaction patterns detected")
⚡ Users with rapid transaction patterns:
user_id rapid_transactions total_rapid_amount
0 USER_036 1 547.65
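The per-user loop above can also be expressed without explicit iteration. A minimal sketch, assuming the same column names as `transactions_df` and the same 5-minute threshold; the toy rows are made up for illustration:

```python
import pandas as pd

# Toy stand-in for transactions_df (timestamps chosen so one gap is under 5 minutes)
toy_df = pd.DataFrame(
    {
        "user_id": ["USER_001", "USER_002", "USER_001", "USER_002", "USER_001"],
        "amount": [20.0, 15.0, 500.0, 30.0, 45.0],
        "timestamp": pd.to_datetime(
            [
                "2024-01-01 00:00",
                "2024-01-01 00:10",
                "2024-01-01 00:03",
                "2024-01-01 02:00",
                "2024-01-01 05:00",
            ]
        ),
    }
)

# Sort within each user, then take the gap to that user's previous transaction
ordered = toy_df.sort_values(["user_id", "timestamp"]).copy()
ordered["time_diff"] = (
    ordered.groupby("user_id")["timestamp"].diff().dt.total_seconds() / 60
)  # minutes; NaN for each user's first transaction

# Keep transactions that follow the previous one by less than 5 minutes
rapid = ordered[ordered["time_diff"] < 5]

velocity_df = (
    rapid.groupby("user_id")
    .agg(rapid_transactions=("amount", "count"), total_rapid_amount=("amount", "sum"))
    .reset_index()
    .sort_values("rapid_transactions", ascending=False)
)
print(velocity_df)
```

Here only USER_001 has a gap under 5 minutes (00:00 to 00:03), so the result holds a single row with one rapid transaction of 500.0. A user's first transaction has no predecessor, so its gap is NaN and it never counts as rapid, matching the loop version.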
Fraud Ring Detection
In [6]:
# Identify potential fraud rings (users sharing suspicious merchants)
fraud_merchants = transactions_df[transactions_df["is_suspicious"]]["merchant"].unique()
fraud_connections = []
for merchant in fraud_merchants:
    users_at_merchant = transactions_df[transactions_df["merchant"] == merchant][
        "user_id"
    ].unique()
    if len(users_at_merchant) > 1:
        fraud_connections.append(
            {
                "merchant": merchant,
                "connected_users": len(users_at_merchant),
                "users": ", ".join(users_at_merchant[:5]),  # Show first 5
            }
        )
if fraud_connections:
    connections_df = pd.DataFrame(fraud_connections).sort_values(
        "connected_users", ascending=False
    )
    print("🕸️ Potential fraud rings detected:")
    print(connections_df)
else:
    print("No fraud rings detected")
🕸️ Potential fraud rings detected:
        merchant  connected_users                                             users
1    ATM_Foreign               12  USER_018, USER_006, USER_057, USER_093, USER_038
0  Online_Casino                9  USER_064, USER_023, USER_087, USER_050, USER_084
2  Wire_Transfer                7  USER_081, USER_053, USER_069, USER_080, USER_062
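The cell above links every user who transacted at a merchant that appears in any suspicious transaction. In the generated data that is harmless, because the three fraud merchants only occur in suspicious transactions, but on real data a busy legitimate merchant could connect all of its customers. A minimal sketch of a stricter variant that links only users whose own transactions at the merchant were suspicious; the toy rows are assumptions for illustration:

```python
import pandas as pd

# Toy stand-in for transactions_df: USER_003 and USER_005 visit "fraud" merchants
# but their transactions are not marked suspicious
toy_df = pd.DataFrame(
    {
        "user_id": ["USER_001", "USER_002", "USER_003", "USER_001", "USER_004", "USER_005"],
        "merchant": [
            "Online_Casino",
            "Online_Casino",
            "Online_Casino",
            "Wire_Transfer",
            "Grocery_Store",
            "Wire_Transfer",
        ],
        "is_suspicious": [True, True, False, True, False, False],
    }
)

# Restrict to suspicious transactions before grouping by merchant
suspicious = toy_df[toy_df["is_suspicious"]]
rings = (
    suspicious.groupby("merchant")
    .agg(
        connected_users=("user_id", "nunique"),
        users=("user_id", lambda s: ", ".join(sorted(s.unique())[:5])),  # first 5
    )
    .reset_index()
)

# A "ring" needs at least two distinct users
rings = rings[rings["connected_users"] > 1].sort_values(
    "connected_users", ascending=False
)
print(rings)
```

In this toy data only Online_Casino qualifies, linking USER_001 and USER_002; Wire_Transfer has a single suspicious user and is dropped.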
Risk Scoring
In [7]:
# Calculate comprehensive risk scores
risk_scores = []
for user in transactions_df["user_id"].unique():
    user_trans = transactions_df[transactions_df["user_id"] == user]
    # Risk factors
    suspicious_count = user_trans["is_suspicious"].sum()
    outlier_count = (
        user_trans["is_outlier"].sum() if "is_outlier" in user_trans.columns else 0
    )
    high_amounts = (user_trans["amount"] > 5000).sum()
    unique_merchants = user_trans["merchant"].nunique()
    # Calculate risk score (0-100), capped at 100.
    # The merchant-diversity term is always 0 on this demo data,
    # which contains only 7 distinct merchants.
    risk_score = min(
        100,
        suspicious_count * 30
        + outlier_count * 20
        + high_amounts * 10
        + max(0, unique_merchants - 10) * 5,
    )
    if risk_score > 0:
        risk_scores.append(
            {
                "user_id": user,
                "risk_score": risk_score,
                "suspicious_transactions": suspicious_count,
                "outliers": outlier_count,
                "high_value_transactions": high_amounts,
            }
        )
if risk_scores:
    risk_df = (
        pd.DataFrame(risk_scores).sort_values("risk_score", ascending=False).head(10)
    )
    print("📊 User Risk Assessment (Top 10):")
    print(risk_df)
    print(
        f"\n✅ Fraud investigation complete! Analyzed {len(transactions_df)} transactions"
    )
else:
    print("✅ No high-risk users identified")
📊 User Risk Assessment (Top 10):
     user_id  risk_score  suspicious_transactions  outliers  high_value_transactions
6   USER_081          90                        2         1                        1
0   USER_087          60                        1         1                        1
8   USER_023          60                        1         1                        1
2   USER_018          60                        1         1                        1
5   USER_062          60                        2         0                        0
3   USER_053          60                        1         1                        1
13  USER_098          60                        1         1                        1
11  USER_080          60                        1         1                        1
1   USER_064          30                        1         0                        0
4   USER_006          30                        1         0                        0
✅ Fraud investigation complete! Analyzed 500 transactions
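The scoring rule used above is 30 points per suspicious transaction, 20 per outlier, 10 per transaction over 5,000, and 5 per distinct merchant beyond ten, capped at 100. A minimal sketch of the same weights computed with a single `groupby` instead of a per-user loop. The toy rows are made up for illustration, and the helper column `is_high_value` is an assumption introduced here:

```python
import pandas as pd

# Toy stand-in for transactions_df after the Z-score step
toy_df = pd.DataFrame(
    {
        "user_id": ["USER_A", "USER_A", "USER_B", "USER_B", "USER_B", "USER_B", "USER_C"],
        "amount": [7000.0, 20.0, 0.01, 999.99, 0.01, 999.99, 35.0],
        "merchant": [
            "Wire_Transfer",
            "Grocery_Store",
            "Online_Casino",
            "ATM_Foreign",
            "Online_Casino",
            "Wire_Transfer",
            "Restaurant",
        ],
        "is_suspicious": [True, False, True, True, True, True, False],
        "is_outlier": [True, False, False, False, False, False, False],
    }
)

# Count each risk factor per user in one pass
per_user = (
    toy_df.assign(is_high_value=toy_df["amount"] > 5000)
    .groupby("user_id")
    .agg(
        suspicious_transactions=("is_suspicious", "sum"),
        outliers=("is_outlier", "sum"),
        high_value_transactions=("is_high_value", "sum"),
        unique_merchants=("merchant", "nunique"),
    )
)

# Same weights as the notebook cell, capped at 100
per_user["risk_score"] = (
    per_user["suspicious_transactions"] * 30
    + per_user["outliers"] * 20
    + per_user["high_value_transactions"] * 10
    + (per_user["unique_merchants"] - 10).clip(lower=0) * 5
).clip(upper=100)

risk_df = (
    per_user[per_user["risk_score"] > 0]
    .sort_values("risk_score", ascending=False)
    .reset_index()
)
print(risk_df[["user_id", "risk_score"]])
```

USER_A scores 30 + 20 + 10 = 60, USER_B's four suspicious transactions would give 120 and are capped at 100, and USER_C scores 0 and is dropped.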
Summary
Investigation Results:
- Total Transactions: 500 records analyzed
- Suspicious Patterns: 28 of 500 transactions (5.6%) were labeled suspicious by the generator, against a 5% target rate; the Z-score check flagged 7 of the 500 as amount outliers
- Detection Methods:
  - Statistical anomaly detection (Z-scores)
  - Velocity analysis (rapid transactions)
  - Network analysis (shared merchants)
  - Comprehensive risk scoring
Next Steps:
- Review high-risk users for manual investigation
- Set up real-time monitoring for flagged patterns
- Adjust detection thresholds based on false positive rates
- Implement automated blocking for confirmed fraud patterns
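The threshold-adjustment step can be made concrete when labeled data is available. A minimal sketch that measures precision and recall of the amount Z-score rule at different thresholds; the function name `evaluate_threshold` and the toy amounts and labels are hypothetical, introduced only for this illustration:

```python
import numpy as np
import pandas as pd


def evaluate_threshold(amounts: pd.Series, labels: pd.Series, threshold: float) -> dict:
    """Score the |Z-score| > threshold rule against known fraud labels."""
    z_scores = np.abs((amounts - amounts.mean()) / amounts.std())
    flagged = z_scores > threshold
    true_pos = int((flagged & labels).sum())
    false_pos = int((flagged & ~labels).sum())
    false_neg = int((~flagged & labels).sum())
    return {
        "threshold": threshold,
        "flagged": int(flagged.sum()),
        "false_positives": false_pos,
        "precision": true_pos / (true_pos + false_pos) if flagged.any() else float("nan"),
        "recall": true_pos / (true_pos + false_neg) if labels.any() else float("nan"),
    }


# 20 ordinary purchases, one large but legitimate purchase, three fraud amounts
normal_amounts = [25, 40, 18, 60, 33, 12, 75, 48, 22, 30,
                  55, 41, 19, 64, 37, 28, 50, 45, 16, 35]
amounts = pd.Series(normal_amounts + [3000.0, 0.01, 999.99, 9000.0], dtype=float)
labels = pd.Series([False] * 21 + [True] * 3)

report = pd.DataFrame([evaluate_threshold(amounts, labels, t) for t in (3.0, 1.0)])
print(report)
```

On this toy data, lowering the threshold from 3.0 to 1.0 adds the legitimate 3,000 purchase as a false positive (precision drops from 1.0 to 0.5) while recall stays at one in three, because the 0.01 and 999.99 amounts are never far from the mean. Results on real data will differ; the point is only that threshold changes should be checked against labeled outcomes.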