Data Science with LouieAI Agents¶
Quick data science workflow showcasing specialized agents for analysis and visualization.
In [1]:
import numpy as np
import pandas as pd

# For demonstration, generate sample data
np.random.seed(42)
demo_data = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=100, freq="D"),
        "product": np.random.choice(["Widget", "Gadget", "Gizmo"], 100),
        "region": np.random.choice(["North", "South", "East", "West"], 100),
        "revenue": np.random.uniform(100, 1000, 100).round(2),
        "units_sold": np.random.randint(1, 50, 100),
    }
)
print("📊 Sample sales dataset created")
print(f"Shape: {demo_data.shape}")
print("\nFirst 5 rows:")
demo_data.head()
📊 Sample sales dataset created
Shape: (100, 5)

First 5 rows:
Out[1]:
| | date | product | region | revenue | units_sold |
|---|---|---|---|---|---|
| 0 | 2024-01-01 | Gizmo | West | 906.48 | 33 |
| 1 | 2024-01-02 | Widget | West | 386.20 | 48 |
| 2 | 2024-01-03 | Gizmo | West | 199.05 | 12 |
| 3 | 2024-01-04 | Gizmo | East | 305.14 | 22 |
| 4 | 2024-01-05 | Widget | East | 484.40 | 22 |
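The sample data above relies on NumPy's global seed. As a minimal sketch, the same kind of frame can be built from a local `numpy.random.default_rng` generator so that each dataset carries its own seed. The helper name `make_sales` is hypothetical and not part of LouieAI, and the values it produces will differ from the outputs shown on this page, because the generator uses a different algorithm than `np.random.seed`.

```python
import numpy as np
import pandas as pd


def make_sales(seed, rows=100):
    """Build a reproducible sample sales frame (hypothetical helper, for illustration)."""
    # A local generator avoids mutating NumPy's global random state
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {
            "date": pd.date_range("2024-01-01", periods=rows, freq="D"),
            "product": rng.choice(["Widget", "Gadget", "Gizmo"], rows),
            "region": rng.choice(["North", "South", "East", "West"], rows),
            "revenue": rng.uniform(100, 1000, rows).round(2),
            # integers() excludes the upper bound, matching np.random.randint
            "units_sold": rng.integers(1, 50, rows),
        }
    )


first = make_sales(42)
second = make_sales(42)
print(first.equals(second))  # the same seed yields the same frame
```

Calling the helper with two different seeds gives two independent datasets without repeating the construction code.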
Analysis with CodeAgent¶
First, create the `sales_data` frame that the rest of this notebook analyzes. It has the same columns and row count as `demo_data` but uses a different random seed:
In [2]:
# Generate the dataset used by the analysis cells
# (same size as demo_data, different seed; values are uniformly random)
np.random.seed(100)
sales_data = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=100, freq="D"),
        "product": np.random.choice(["Widget", "Gadget", "Gizmo"], 100),
        "region": np.random.choice(["North", "South", "East", "West"], 100),
        "revenue": np.random.uniform(100, 1000, 100).round(2),
        "units_sold": np.random.randint(1, 50, 100),
    }
)
print("✅ Created sales data with 100 rows")
print("\nSample of the data:")
sales_data.head()
✅ Created sales data with 100 rows

Sample of the data:
Out[2]:
| | date | product | region | revenue | units_sold |
|---|---|---|---|---|---|
| 0 | 2024-01-01 | Widget | North | 726.55 | 42 |
| 1 | 2024-01-02 | Widget | East | 550.32 | 13 |
| 2 | 2024-01-03 | Widget | East | 744.46 | 9 |
| 3 | 2024-01-04 | Gizmo | North | 573.36 | 1 |
| 4 | 2024-01-05 | Gizmo | West | 101.26 | 31 |
Let's analyze the data patterns:
In [3]:
# Statistical analysis - Calculate revenue statistics by region
summary_stats = (
    sales_data.groupby("region")
    .agg({"revenue": ["count", "sum", "mean", "std"], "units_sold": ["sum", "mean"]})
    .round(2)
)
# Flatten the two-level column names into single strings such as "revenue_sum"
summary_stats.columns = ["_".join(col).strip() for col in summary_stats.columns.values]
print("📈 Revenue Statistics by Region:")
print(summary_stats)
📈 Revenue Statistics by Region:
| region | revenue_count | revenue_sum | revenue_mean | revenue_std | units_sold_sum | units_sold_mean |
|---|---|---|---|---|---|---|
| East | 30 | 17259.35 | 575.31 | 297.01 | 706 | 23.53 |
| North | 25 | 14520.83 | 580.83 | 195.50 | 603 | 24.12 |
| South | 26 | 13039.49 | 501.52 | 238.95 | 603 | 23.19 |
| West | 19 | 11945.79 | 628.73 | 269.07 | 509 | 26.79 |
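The cell above aggregates with a dict and then flattens the resulting two-level column index. As an alternative sketch, pandas named aggregation produces the same flat column names directly. The small `sample` frame below is made up for illustration and is not the notebook's `sales_data`.

```python
import pandas as pd

# Small hand-made frame standing in for sales_data (illustrative values only)
sample = pd.DataFrame(
    {
        "region": ["East", "East", "West", "West"],
        "revenue": [100.0, 300.0, 200.0, 400.0],
        "units_sold": [1, 3, 2, 4],
    }
)

# Named aggregation: each keyword becomes an output column,
# so no separate column-flattening step is needed
named_stats = (
    sample.groupby("region")
    .agg(
        revenue_count=("revenue", "count"),
        revenue_sum=("revenue", "sum"),
        revenue_mean=("revenue", "mean"),
        revenue_std=("revenue", "std"),
        units_sold_sum=("units_sold", "sum"),
        units_sold_mean=("units_sold", "mean"),
    )
    .round(2)
)

print(named_stats)
```

Either form works; named aggregation keeps the output column names next to the function that produces them, which can be easier to review.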
Visualization with PerspectiveAgent¶
In [4]:
# Create visualization data - revenue by region
viz_data = sales_data.groupby("region")["revenue"].sum().sort_values(ascending=False)
print("📊 Revenue by Region (for visualization):")
for region, revenue in viz_data.items():
    bar_length = int(revenue / 1000)  # Scale for display: one block per $1,000
    print(f"{region:10} {'█' * bar_length} ${revenue:,.2f}")
print("\n💡 In a real environment, this would render as an interactive chart")
📊 Revenue by Region (for visualization):
East       █████████████████ $17,259.35
North      ██████████████ $14,520.83
South      █████████████ $13,039.49
West       ███████████ $11,945.79

💡 In a real environment, this would render as an interactive chart
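The text chart above uses a fixed scale of one block per $1,000, which only suits totals in this range. A minimal sketch of a scale-independent variant follows; it sizes each bar relative to the largest value. The helper name `text_bars` and its `width` parameter are hypothetical, and the totals are copied from the output above.

```python
def text_bars(totals, width=40):
    """Return one text bar per label, scaled so the largest value spans `width` cells."""
    if not totals:
        return []
    largest = max(totals.values())
    lines = []
    # Largest totals first, matching the sort order used in the cell above
    for label, value in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        # Guard against a zero maximum to avoid dividing by zero
        cells = round(width * value / largest) if largest > 0 else 0
        lines.append(f"{label:10} {'█' * cells} ${value:,.2f}")
    return lines


# Regional totals taken from the printed output above
region_totals = {"East": 17259.35, "North": 14520.83, "South": 13039.49, "West": 11945.79}
for line in text_bars(region_totals, width=20):
    print(line)
```

Because the bars are normalized, the same helper works whether totals are in the hundreds or the millions.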
Graph Analysis¶
Build relationships for deeper insights:
In [5]:
# Build product-region relationship analysis
product_region = (
    sales_data.groupby(["product", "region"])
    .agg({"revenue": "sum", "units_sold": "sum"})
    .reset_index()
)
# Find the five product-region pairs with the highest total revenue
top_connections = product_region.nlargest(5, "revenue")
print("🔗 Top Product-Region Connections:")
print(top_connections.to_string(index=False))
print("\n📈 These relationships show the strongest sales patterns")
🔗 Top Product-Region Connections:
| product | region | revenue | units_sold |
|---|---|---|---|
| Widget | East | 6902.77 | 316 |
| Gizmo | East | 6125.41 | 220 |
| Gizmo | South | 5843.43 | 319 |
| Gizmo | North | 5823.47 | 176 |
| Gadget | West | 5347.37 | 160 |

📈 These relationships show the strongest sales patterns
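One way to read the product-region table as a graph is to treat each row as a weighted edge between a product node and a region node. The sketch below builds an adjacency mapping and per-node revenue totals with plain pandas. It uses only the five rows printed above, so the totals are partial, and the variable names are illustrative rather than part of any LouieAI API.

```python
import pandas as pd

# Edges copied from the top connections printed above (top five pairs only)
edges = pd.DataFrame(
    {
        "product": ["Widget", "Gizmo", "Gizmo", "Gizmo", "Gadget"],
        "region": ["East", "East", "South", "North", "West"],
        "revenue": [6902.77, 6125.41, 5843.43, 5823.47, 5347.37],
    }
)

# Weighted degree: revenue carried by the listed edges touching each node
product_strength = edges.groupby("product")["revenue"].sum()
region_strength = edges.groupby("region")["revenue"].sum()

# Adjacency mapping: product -> {region: revenue}
adjacency = {
    product: dict(zip(group["region"], group["revenue"]))
    for product, group in edges.groupby("product")
}

print(adjacency["Gizmo"])
print(region_strength.sort_values(ascending=False))
```

Running the same steps on the full `product_region` frame instead of the top five rows would give complete node totals.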
Agent Composition¶
Combine multiple agents for complex workflows:
In [6]:
# Multi-step analysis workflow
# Step 1: Find top products
top_products = sales_data.groupby("product")["revenue"].sum().nlargest(3)
print("🏆 Top 3 Products by Revenue:")
for product, revenue in top_products.items():
    print(f" {product}: ${revenue:,.2f}")

# Step 2: Create forecast (trailing average plus a flat growth assumption)
forecast_data = []
for product in top_products.index:
    product_data = sales_data[sales_data["product"] == product].copy()
    product_data = product_data.sort_values("date")
    # Mean revenue over the product's 30 most recent rows
    # (sales records, not calendar days)
    recent_avg = product_data.tail(30)["revenue"].mean()
    forecast_data.append(
        {
            "product": product,
            "current_avg": recent_avg,
            "forecast_next_month": recent_avg * 1.05,  # 5% growth assumption
        }
    )
forecast_df = pd.DataFrame(forecast_data)
print("\n📈 Revenue Forecast (Next Month):")
print(forecast_df.round(2))
print("\n✅ Complete data science workflow using multiple analysis steps!")
🏆 Top 3 Products by Revenue:
 Gizmo: $21,669.48
 Widget: $17,601.77
 Gadget: $17,494.21

📈 Revenue Forecast (Next Month):
| | product | current_avg | forecast_next_month |
|---|---|---|---|
| 0 | Gizmo | 575.98 | 604.77 |
| 1 | Widget | 544.16 | 571.37 |
| 2 | Gadget | 624.79 | 656.03 |

✅ Complete data science workflow using multiple analysis steps!
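The forecast above averages each product's 30 most recent rows, which can span more than 30 calendar days when a product does not sell every day. As a hedged sketch, the variant below averages over a calendar window instead. The function name `trailing_mean_forecast`, its parameters, and the sample frame are all hypothetical, and the 5% growth factor is carried over from the cell above as an assumption rather than a fitted value.

```python
import pandas as pd


def trailing_mean_forecast(frame, days=30, growth=0.05):
    """Average revenue per product over the last `days` calendar days, then apply flat growth."""
    # Keep only rows dated within `days` days of the most recent date in the frame
    cutoff = frame["date"].max() - pd.Timedelta(days=days)
    recent = frame[frame["date"] > cutoff]
    result = recent.groupby("product")["revenue"].mean().rename("current_avg").to_frame()
    # Flat growth assumption, as in the cell above
    result["forecast_next_month"] = result["current_avg"] * (1 + growth)
    return result.reset_index()


# Tiny illustrative frame: the January row falls outside the 30-day window
sample = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-01", "2024-03-01", "2024-03-10", "2024-03-05", "2024-03-09"]
        ),
        "product": ["Widget", "Widget", "Widget", "Gizmo", "Gizmo"],
        "revenue": [1000.0, 100.0, 300.0, 200.0, 400.0],
    }
)

print(trailing_mean_forecast(sample).round(2))
```

Products with no sales inside the window are absent from the result, which may or may not be the desired behavior for a given report.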