贝叶斯模型为时间序列分析提供了一个灵活的框架,其功能超越了传统ARIMA模型的范畴。与假设固定参数结构的ARIMA不同,贝叶斯方法允许动态适应和不确定性量化,这使得它们在复杂和不确定的环境中尤其强大。
让我们来看看将贝叶斯方法应用于时间序列的四种方式:贝叶斯ARIMA、贝叶斯结构时间序列(BSTS)、贝叶斯动态线性模型(BDLM)和贝叶斯神经网络(BNN)。
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pmdarima as pmd
import torch
import torch.nn as nn
import pyro
import pyro.distributions as dist
from pyro.nn import PyroModule, PyroSample
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam
from pyro.infer.autoguide import AutoDiagonalNormal
from tqdm import trange
import pybsts
import pymc as pm
from sklearn.metrics import mean_squared_error
# === Data Loading ===
def load_data(file_name="ercot_load_data.csv"):
df = pd.read_csv(file_name, parse_dates=['date'])
df.set_index('date', inplace=True)
df = df.resample('h').mean().dropna()
return df
# === Modular Visualization Function ===
def plot_forecast(df, forecast_index, forecast_mean, forecast_lower, forecast_upper, model_name):
"""
Plots the historical data, forecast, and confidence intervals for the last 25 points.
Parameters:
- df: DataFrame containing the historical data
- forecast_index: Index for the forecasted values
- forecast_mean: Forecasted mean values
- forecast_lower: Lower bound of the confidence interval
- forecast_upper: Upper bound of the confidence interval
- model_name: Name of the model for the title and filename
"""
# Plot all historical data
plt.figure(figsize=(15, 8))
plt.plot(df.index, df['values'], label='Actual', color='blue')
# Plot the forecast for the last 25 points
plt.plot(forecast_index, forecast_mean, label='Forecast', color='red', linestyle='--')
plt.fill_between(
forecast_index,
forecast_lower,
forecast_upper,
color='red', alpha=0.2, label='95% Confidence Interval'
)
# Add dashed vertical line where holdout set begins
holdout_start = df.index[-len(forecast_index)]
plt.axvline(x=holdout_start, color='black', linestyle='--', label='Holdout Start')
# Customizations
plt.title(f'{model_name} Forecast')
plt.xlabel('Time')
plt.ylabel('Demand')
plt.legend()
ax = plt.gca()
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.tight_layout()
plt.savefig(f'{model_name.lower()}_forecast.png')
plt.show()
贝叶斯结构时间序列(BSTS)模型
贝叶斯结构时间序列(BSTS)模型将时间序列数据分解为不同的组成部分,包括用于长期模式的趋势成分、用于周期性行为的季节性元素、用于额外变量的外部回归因子,以及异常检测功能。贝叶斯框架能够为每个成分提供不确定性估计,从而提高预测准确性。这些模型在零售库存管理、经济政策分析和金融异常检测等领域有着广泛的应用。
BSTS模型可以使用R语言中的bsts库或Python中的PyBSTS库来实现。例如:
# === Simplified Bayesian Structural Time Series (BSTS) ===
def bayesian_sts(df, forecast_horizon=25):
train = df['values'].iloc[:-forecast_horizon].values
# Define and fit the BSTS model
specification = {
"ar_order": 1,
"local_trend": {"local_level": True},
"sigma_prior": np.std(train, ddof=1),
"initial_value": train[0]
}
model = pybsts.PyBsts(
"gaussian",
specification,
{
"ping": 10,
"niter": 100,
"burn": 10,
"forecast_horizon": forecast_horizon,
"seed": 1
}
)
model.fit(train, seed=1)
forecast = model.predict(seed=1)
forecast_mean = np.mean(forecast, axis=0)
forecast_std = np.std(forecast, axis=0)
# Forecast index for the last 25 points
forecast_index = df.index[-forecast_horizon:]
# Use modular visualization
plot_forecast(
df,
forecast_index,
forecast_mean,
forecast_mean - 1.96 * forecast_std,
forecast_mean + 1.96 * forecast_std,
model_name="BSTS"
)
return forecast_mean
贝叶斯动态线性模型(BDLM)
贝叶斯动态线性模型(BDLM)通过贝叶斯推断扩展了状态空间模型,在变量之间的关系随时间演变时特别有价值。这些模型能够适应不断变化的动态,提供完整的参数分布,并通过先验信息融入领域专业知识。BDLM在环境数据分析、资产价格建模和医疗结果预测等方面表现出色,因为这些领域的系统动态经常发生变化。
我们将使用pymc来实现我们的BDLM:
# === Simplified BDLM with Confidence Interval ===
def bayesian_bdlm(df, forecast_horizon=25):
train = df.iloc[:-forecast_horizon]
# Fit BDLM model
with pm.Model() as model:
sigma = pm.HalfNormal('sigma', sigma=1)
trend_sigma = pm.HalfNormal('trend_sigma', sigma=0.1)
seasonal_sigma = pm.HalfNormal('seasonal_sigma', sigma=0.1)
trend = pm.GaussianRandomWalk('trend', sigma=trend_sigma, shape=len(train))
period = 24
seasonal = pm.Normal('seasonal', mu=0, sigma=seasonal_sigma, shape=period)
idx = np.arange(len(train)) % period
mu = trend + seasonal[idx]
y = pm.Normal('y', mu=mu, sigma=sigma, observed=train['values'])
trace = pm.sample(2000, tune=1000, return_inferencedata=False)
# Forecasting
trend_pred = np.mean(trace['trend'], axis=0)
seasonal_pred = np.mean(trace['seasonal'], axis=0)
predictions = trend_pred + seasonal_pred[idx]
# Calculate 95% credible intervals
lower_bound = np.percentile(trace['trend'], 2.5, axis=0) + seasonal_pred[idx]
upper_bound = np.percentile(trace['trend'], 97.5, axis=0) + seasonal_pred[idx]
# Forecast index for the last 25 points
forecast_index = df.index[-forecast_horizon:]
# Use modular visualization
plot_forecast(
df,
forecast_index,
predictions[-forecast_horizon:],
lower_bound[-forecast_horizon:],
upper_bound[-forecast_horizon:],
model_name="BDLM"
)
return predictions[-forecast_horizon:]
贝叶斯神经网络(BNN)
贝叶斯神经网络(BNN)将神经网络架构与贝叶斯原理相结合。它们对网络参数进行概率处理,对预测中的不确定性进行量化,并具备处理复杂非线性模式的能力。在传统线性模型往往力不从心的领域,如电网负荷预测、制造质量控制和经济指标预测中,组织会采用BNN。
注意——这些模型的计算成本很高。
# === Simplified Bayesian Neural Network (BNN) ===
def bayesian_nn(df, forecast_horizon=25):
# Prepare data for BNN
def prepare_data(data, lookback=7):
X, y = [], []
values = data['values'].values
for i in range(len(values) - lookback):
X.append(values[i:i+lookback])
y.append(values[i+lookback])
return torch.FloatTensor(X), torch.FloatTensor(y)
X_train, y_train = prepare_data(df.iloc[:-forecast_horizon])
X_test, y_test = prepare_data(df.iloc[-(forecast_horizon+7):])
# Define BNN model
class TimeSeriesBNN(PyroModule):
def __init__(self, input_dim=7, hidden_dim=32, output_dim=1):
super().__init__()
self.hidden1 = PyroModule[nn.Linear](input_dim, hidden_dim)
self.hidden2 = PyroModule[nn.Linear](hidden_dim, hidden_dim)
self.output = PyroModule[nn.Linear](hidden_dim, output_dim)
self.activation = nn.ReLU()
self.hidden1.weight = PyroSample(dist.Normal(0., 1.).expand([hidden_dim, input_dim]).to_event(2))
self.hidden1.bias = PyroSample(dist.Normal(0., 1.).expand([hidden_dim]).to_event(1))
self.hidden2.weight = PyroSample(dist.Normal(0., 1.).expand([hidden_dim, hidden_dim]).to_event(2))
self.hidden2.bias = PyroSample(dist.Normal(0., 1.).expand([hidden_dim]).to_event(1))
self.output.weight = PyroSample(dist.Normal(0., 1.).expand([output_dim, hidden_dim]).to_event(2))
self.output.bias = PyroSample(dist.Normal(0., 1.).expand([output_dim]).to_event(1))
def forward(self, x, y=None):
x = self.activation(self.hidden1(x))
x = self.activation(self.hidden2(x))
mu = self.output(x).squeeze(-1)
sigma = pyro.sample("sigma", dist.Gamma(1.0, 1.0))
with pyro.plate("data", x.shape[0]):
obs = pyro.sample("obs", dist.Normal(mu, sigma), obs=y)
return mu
# Train the BNN model
model = TimeSeriesBNN()
guide = AutoDiagonalNormal(model)
adam = Adam({"lr": 0.01})
svi = SVI(model, guide, adam, loss=Trace_ELBO())
num_epochs = 1000
for epoch in trange(num_epochs):
svi.step(X_train, y_train)
# Predict using the BNN model
def predict_bnn(model, guide, X_input, n_samples=100):
predictive = pyro.infer.Predictive(model, guide=guide, num_samples=n_samples)
samples = predictive(X_input)
preds = samples['obs'].detach().numpy()
mean_pred = preds.mean(axis=0)
std_pred = preds.std(axis=0)
return mean_pred, std_pred
mean_pred, std_pred = predict_bnn(model, guide, X_test)
# Forecast index for the last 25 points
forecast_index = df.index[-forecast_horizon:]
# Use modular visualization
plot_forecast(
df,
forecast_index,
mean_pred,
mean_pred - 1.96 * std_pred,
mean_pred + 1.96 * std_pred,
model_name="BNN"
)
return mean_pred
贝叶斯ARIMA将传统的ARIMA建模与贝叶斯推断相结合,提供了参数不确定性估计、先验知识的整合以及更稳健的预测区间。贝叶斯框架使这些模型能够量化预测不确定性、适应不断变化的条件、融入领域知识,并有效处理缺失数据。
# ===Bayesian ARIMA ===
def bayesian_arima(df, forecast_horizon=25):
train = df['values'].iloc[:-forecast_horizon].values
# Train ARIMA on the training set
arima = pmd.auto_arima(
train,
start_p=0, start_q=0,
max_p=3, max_q=3,
start_P=0, start_Q=0,
max_P=2, max_Q=2,
m=24,
seasonal=True,
d=None,
D=1,
test='adf',
trace=True,
error_action='ignore',
suppress_warnings=True,
stepwise=True
)
# Forecast the last 25 points
barima_forecast, conf_int = arima.predict(n_periods=forecast_horizon, return_conf_int=True)
forecast_index = df.index[-forecast_horizon:]
# Use modular visualization
plot_forecast(
df,
forecast_index,
barima_forecast,
conf_int[:, 0],
conf_int[:, 1],
model_name="Bayesian ARIMA"
)
return barima_forecast
MSE RMSE MAPE sMAPE
Bayesian ARIMA 90.518746 9.514134 2.311101 2.34394090.518746 9.514134 2.311101 2.343940
BNN with Pyro 748.787444 27.363981 7.818061 7.414179
BDLM 326.776625 18.076964 4.722612 4.531827
BSTS 383.895118 19.593242 5.781497 5.676614
总结
在实施这些方法时,分析人员必须考虑计算需求,评估数据的质量和数量,评估模型假设,并在复杂性与可解释性之间找到平衡。每种方法都为金融、医疗、能源及其他领域中的时间序列分析提供了一个严谨的框架,在这些领域中,对不确定性的理解是决策制定的关键。
这些贝叶斯方法在其量化不确定性、适应新信息以及融入先验知识的能力方面具有共同的优势。方法的选择取决于特定的应用需求、数据特性以及计算限制。通过了解这些方法,分析人员可以为其特定的时间序列挑战选择最合适的工具。
# === Main Function ===
def main():
df = load_data()
barima_forecast = bayesian_arima(df)
bnn_forecast = bayesian_nn(df)
bdlm_forecast = bayesian_bdlm(df)
bsts_forecast = bayesian_sts(df)
y_true = df['values'].values[-25:]
metrics = {
"Bayesian ARIMA": calculate_metrics(y_true, barima_forecast),
"BNN with Pyro": calculate_metrics(y_true, bnn_forecast),
"BDLM": calculate_metrics(y_true, bdlm_forecast),
"BSTS": calculate_metrics(y_true, bsts_forecast)
}
# Convert to DataFrame for better readability
df_metrics = pd.DataFrame(metrics, index=["MSE", "RMSE", "MAPE", "sMAPE"]).T
print(df_metrics)
# Plot comparison
df_metrics.plot(kind='bar', figsize=(15, 8))
plt.title('Model Comparison')
plt.ylabel('Error')
ax = plt.gca()
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.tight_layout()
plt.savefig('model_comparison.png')
plt.show()
if __name__ == "__main__":
main()
Time Series Analysis
Bayesian Statistics
Python
Data Science
46
1
Kyle Jones
Written by Kyle Jones
646 Followers
·
410 Following
I’m a cloud architect, project manager, and analytics enthusiast. Opinions are my own.
Follow
Responses (1)
Alexniu
Alexniu
Cancel
Respond
Also publish to my profile
Bob McRae
Bob McRae
17 hours ago
Thanks for making the time two write this article Kyle. I hope that you can clarify some confusion I am having with the bayesian_arima function. I am not seeing any aspect of Bayesian inference here; it looks like a typical auto_arima. What am I missing? Thanks in advance.
Reply
More from Kyle Jones
H-Step Forecasting with the ARAR Algorithm in Python with statsmodels
Kyle Jones
Kyle Jones
H-Step Forecasting with the ARAR Algorithm in Python with statsmodels
ARAR: a simpler approach to multi-step time series forecasting
Feb 2
27
1
Univariate and Multivariate Time Series Analysis with Python
Kyle Jones
Kyle Jones
Univariate and Multivariate Time Series Analysis with Python
Traditional statistical approaches for time series are univariate, meaning they focus on a single sequence of values.
Dec 20, 2024
5
Dickey-Fuller Test for Stationarity in Time Series with Python
Kyle Jones
Kyle Jones
Dickey-Fuller Test for Stationarity in Time Series with Python
One of the key properties to evaluate in a time series is stationarity. A stationary time series has statistical properties — like mean…
Dec 19, 2024
5
Managing different environments for AWS CDK (Dev, Test, Prod)
Kyle Jones
Kyle Jones
Managing different environments for AWS CDK (Dev, Test, Prod)
How I manage development, testing, and production environments using AWS CDK, including common setup issues and solutions
Oct 1, 2024
1
See all from Kyle Jones
Recommended from Medium
Photo by Pawel Czerwinski on Unsplash
Python in Plain English
In
Python in Plain English
by
Alexzap
Low-Code PyGWalker UI in the Spotlight: Dynamic Data Visualization Like Tableau
Discover the Fastest-Evolving Python Graphing Library Across Different Interfaces
Feb 23
86
The Algorithm That Beats the Market? Decoding Quantitative Momentum ??
InsiderFinance Wire
In
InsiderFinance Wire
by
Nayab Bhutta
The Algorithm That Beats the Market? Decoding Quantitative Momentum ??
Momentum investing has long been a favorite strategy of traders, but Quantitative Momentum (QM) takes it to another level — using data…
5d ago
144
1
Lists
Predictive Modeling w/ Python
20 stories
·
1847 saves
Coding & Development
11 stories
·
1021 saves
Principal Component Analysis for ML
Time Series Analysis
deep learning cheatsheet for beginner
Practical Guides to Machine Learning
10 stories
·
2217 saves
ChatGPT prompts
51 stories
·
2606 saves
Detecting Market Regimes: A New Approach Using Clustering Algorithms
Coding Nexus
In
Coding Nexus
by
Code Pulse
Detecting Market Regimes: A New Approach Using Clustering Algorithms
Imagine you’re a chef trying to figure out the secret ingredients in a stew without a recipe. You taste it, identify patterns, and group…
Feb 21
8
2
The Moving Average Squeeze Strategy: Spotting Hidden Breakout Signals
Kridtapon P.
Kridtapon P.
The Moving Average Squeeze Strategy: Spotting Hidden Breakout Signals
Discover how Exponential Moving Averages reveal price compression and breakout opportunities
Feb 24
66
1
5 Amazing Plotly Visualizations You Didn’t Know You Could Create
Data Science Collective
In
Data Science Collective
by
Amanda Iglesias Moreno
5 Amazing Plotly Visualizations You Didn’t Know You Could Create
Explore Waffle Charts, Calendar Plots, Hexagon Maps, Parliament Diagrams, and Bump Charts for Advanced Data Visualizations in Plotly
Feb 19
440
6
How Python Voila Can Be Your New Killer Visualization Superpower
Data Storytelling Corner
In
Data Storytelling Corner
by
John Loewen, PhD
How Python Voila Can Be Your New Killer Visualization Superpower
Creating public dashboards with Viola and Jupyter Notebooks
6d ago
231
3
See more recommendations
Help
Status
About
Careers
Press
Blog
Privacy
Terms
Text to speech
Teams
4
MSE RMSE MAPE sMAPE
Bayesian ARIMA 90.518746 9.514134 2.311101 2.34394090.518746 9.514134 2.311101 2.343940
BNN with Pyro 748.787444 27.363981 7.818061 7.414179
BDLM 326.776625 18.076964 4.722612 4.531827
BSTS 383.895118 19.593242 5.781497 5.676614
总结
在实施这些方法时,分析人员必须考虑计算需求、评估数据质量和数量、评估模型假设,并在复杂性与可解释性之间找到平衡。每种方法都为金融、医疗、能源以及其他需要理解不确定性以驱动决策的领域提供了严谨的时间序列分析框架。
这些贝叶斯方法在量化不确定性、适应新信息以及融入先验知识方面具有共同的优势。它们的选择取决于特定的应用需求、数据特征和计算限制。通过了解这些方法,分析人员可以为他们特定的时间序列挑战选择最合适的工具。
# === Main Function ===
def main():
df = load_data()
barima_forecast = bayesian_arima(df)
bnn_forecast = bayesian_nn(df)
bdlm_forecast = bayesian_bdlm(df)
bsts_forecast = bayesian_sts(df)
y_true = df['values'].values[-25:]
metrics = {
"Bayesian ARIMA": calculate_metrics(y_true, barima_forecast),
"BNN with Pyro": calculate_metrics(y_true, bnn_forecast),
"BDLM": calculate_metrics(y_true, bdlm_forecast),
"BSTS": calculate_metrics(y_true, bsts_forecast)
}
# Convert to DataFrame for better readability
df_metrics = pd.DataFrame(metrics, index=["MSE", "RMSE", "MAPE", "sMAPE"]).T
print(df_metrics)
# Plot comparison
df_metrics.plot(kind='bar', figsize=(15, 8))
plt.title('Model Comparison')
plt.ylabel('Error')
ax = plt.gca()
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.tight_layout()
plt.savefig('model_comparison.png')
plt.show()
if __name__ == "__main__":
main()