This project simulates business plans using reinforcement learning (RL) and uses a large language model (LLM) to provide feedback and insights. The system trains RL agents to optimize business plans by taking various actions and receiving rewards based on the outcomes.
The project is organized as follows:
- simulation/: Contains the core simulation logic, including environment setup, reward calculation, and LLM client interactions.
  - actions.py: Defines valid action types and validates actions.
  - environment.py: Sets up the simulation environment.
  - llm_client.py: Interacts with the LLM to get feedback based on actions and states.
  - rewards.py: Calculates rewards based on the current state and actions taken.
- notebooks/: Jupyter notebooks for training RL agents, interacting with the LLM, and exploring business plan data.
  - rl_training.ipynb: Notebook for training RL agents.
  - llm_interaction.ipynb: Notebook for prototyping and testing interactions with the LLM.
  - data_exploration.ipynb: Notebook for exploring and visualizing business plan data.
- utils/: Utility functions and helpers.
  - helpers.py: Helper functions for formatting state and logging.
  - logger.py: Custom logging functions.
- config.py: Configuration file for setting up API keys, LLM parameters, and logging settings.
- Simulation Environment: The environment is initialized with a sample business plan. The RL agent interacts with this environment by taking actions and receiving rewards.
- Actions: Actions can be immediate, scheduled, conditional, or reversible. Each action type has specific parameters and is validated before execution (see the validation sketch after the sample plan below).
- Rewards: Rewards are calculated from the current state and the action taken, drawing on metrics such as sales recovery, inventory optimization, and cost reduction (see the reward sketch after the sample plan below).
- LLM Interaction: The LLM client interacts with the language model to get feedback on actions taken by the RL agent. The feedback is used to simulate client responses and improve the business plan (a client sketch follows the configuration settings below).
- Training: The RL agent is trained over multiple episodes to optimize the business plan by maximizing rewards.
- Sample Plan: The sample business plan is defined in JSON format and includes details such as objectives, resources, timeline, constraints, risks, and metrics.
{ "plan_id": "PLAN_2025_02_07_001", "status": "IN_PROGRESS", "objectives": ["Increase revenue", "Reduce costs"], "resources": ["Budget", "Team"], "timeline": "6 months", "constraints": ["Limited budget", "Tight deadline"], "risks": ["Market volatility", "Team turnover"], "metrics": { "sales_recovery": { "target": 1000, "actual": 920, "trend": "IMPROVING" }, "inventory_optimization": { "target_level": 2000, "current_level": 2200, "trend": "DECREASING" }, "cost_reduction": { "target": 20000, "achieved": 18500, "timeline": "ON_TRACK" } }, "next_review": "2025-02-10T14:30:00Z" }
The config.py file contains essential configuration settings:
- OpenAI API Key: API key for accessing the OpenAI LLM.
- LLM Parameters: Parameters such as MAX_TOKENS and TEMPERATURE for controlling the LLM's responses.
- Logging Settings: Settings for logging, including the log file path and log level.
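A minimal sketch of what config.py might contain, assuming the API key is read from an environment variable rather than hard-coded; the specific values, along with the LOG_FILE and LOG_LEVEL names, are illustrative.

```python
import os

# OpenAI API key; assumed to come from the environment for safety.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")

# LLM parameters controlling response length and randomness.
MAX_TOKENS = 256
TEMPERATURE = 0.7

# Logging settings (illustrative names): where to write logs and at what level.
LOG_FILE = "simulation.log"
LOG_LEVEL = "INFO"
```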
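Building on those settings, here is a hedged sketch of how llm_client.py might request feedback through the OpenAI Python SDK (v1-style client); the get_feedback function, the model choice, and the prompt wording are assumptions for illustration.

```python
from openai import OpenAI

from config import MAX_TOKENS, OPENAI_API_KEY, TEMPERATURE

client = OpenAI(api_key=OPENAI_API_KEY)


def get_feedback(state: dict, action: dict) -> str:
    """Ask the LLM to critique a proposed action given the current plan state."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # Illustrative model choice.
        messages=[
            {"role": "system", "content": "You are a business-plan reviewer."},
            {
                "role": "user",
                "content": f"Plan state: {state}\nProposed action: {action}\n"
                           "Give brief feedback on this action.",
            },
        ],
        max_tokens=MAX_TOKENS,
        temperature=TEMPERATURE,
    )
    return response.choices[0].message.content
```

Keeping MAX_TOKENS and TEMPERATURE in config.py means response length and randomness can be tuned without touching the client code.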
Typical workflow:
- Set Up Environment: Initialize the simulation environment with a sample business plan.
- Train RL Agent: Use the rl_training.ipynb notebook to train the RL agent.
- Interact with LLM: Use the llm_interaction.ipynb notebook to prototype and test interactions with the LLM.
- Explore Data: Use the data_exploration.ipynb notebook to explore and visualize business plan data.
The following script ties these pieces together:

```python
from simulation.environment import SimulationEnvironment
from simulation.rl_agent import RLAgent
from utils.logger import log_info, log_error


def main():
    # Sample business plan used to initialize the simulation environment.
    sample_plan = {
        "plan_id": "PLAN_2025_02_07_001",
        "status": "IN_PROGRESS",
        "objectives": ["Increase revenue", "Reduce costs"],
        "resources": ["Budget", "Team"],
        "timeline": "6 months",
        "constraints": ["Limited budget", "Tight deadline"],
        "risks": ["Market volatility", "Team turnover"],
        "metrics": {
            "sales_recovery": {"target": 1000, "actual": 920, "trend": "IMPROVING"},
            "inventory_optimization": {"target_level": 2000, "current_level": 2200, "trend": "DECREASING"},
            "cost_reduction": {"target": 20000, "achieved": 18500, "timeline": "ON_TRACK"},
        },
        "next_review": "2025-02-10T14:30:00Z",
    }

    env = SimulationEnvironment(sample_plan)
    agent = RLAgent()
    num_episodes = 10

    for episode in range(num_episodes):
        state = env.reset()  # Start each episode from the initial plan state.
        done = False
        total_reward = 0
        while not done:
            action = agent.choose_action(state)              # Select an action for the current state.
            next_state, reward, done = env.step(action)      # Apply it and observe the outcome.
            agent.update(state, action, reward, next_state)  # Learn from the transition.
            state = next_state
            total_reward += reward
        log_info(f"Episode {episode + 1}, Total Reward: {total_reward}")


if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        log_error(f"An error occurred: {e}")
```
This example demonstrates how to set up the environment, train the RL agent, and log the results.