Statistical
OpenAlgo Statistical Indicators Documentation
Statistical indicators analyze price data using mathematical and statistical methods to identify patterns, relationships, and forecast future price movements.
Import Statement
from openalgo import ta
Getting Market Data
from openalgo import api
client = api(api_key='your_api_key_here', host='http://127.0.0.1:5000')
# Fetch historical data
df = client.history(symbol="SBIN",
exchange="NSE",
interval="5m",
start_date="2025-04-01",
end_date="2025-04-08")
Available Statistical Indicators
Linear Regression (LINREG)
Linear Regression calculates the linear regression line for a given period using the least squares method to identify the underlying trend.
Usage
linreg_result = ta.linreg(data, period)
Parameters
data (array-like): Price data (typically closing prices)
period (int, default=14): Period for linear regression calculation
Returns
array: Linear regression values in the same format as input
Example
# Calculate 20-period Linear Regression
linreg_20 = ta.linreg(df['close'], 20)
# Add to DataFrame
df['LINREG_20'] = linreg_20
print(df[['close', 'LINREG_20']].tail())
Linear Regression Slope (LRSLOPE)
Linear Regression Slope measures the rate of change of the linear regression line, indicating the strength and direction of the trend.
Usage
slope_result = ta.lrslope(data, period=100, interval=1)
Parameters
data (array-like): Price data (typically closing prices)
period (int, default=100): Period for linear regression calculation
interval (int, default=1): Interval divisor for slope calculation
Returns
array: Slope values in the same format as input
Example
# Calculate Linear Regression Slope
slope_50 = ta.lrslope(df['close'], period=50)
# Add to DataFrame
df['LR_SLOPE_50'] = slope_50
print(df[['close', 'LR_SLOPE_50']].tail())
Pearson Correlation Coefficient (CORREL)
Correlation measures the statistical relationship between two data series, ranging from -1 (perfect negative correlation) to +1 (perfect positive correlation).
Usage
correlation_result = ta.correlation(data1, data2, period)
Parameters
data1 (array-like): First data series
data2 (array-like): Second data series
period (int, default=20): Period for correlation calculation
Returns
array: Correlation values in the same format as input
Example
# Calculate correlation between close and volume
correlation_20 = ta.correlation(df['close'], df['volume'], 20)
# Add to DataFrame
df['CORREL_CLOSE_VOLUME'] = correlation_20
print(df[['close', 'volume', 'CORREL_CLOSE_VOLUME']].tail())
# Calculate correlation between high and low
correlation_hl = ta.correlation(df['high'], df['low'], 15)
df['CORREL_HIGH_LOW'] = correlation_hl
Beta Coefficient (BETA)
Beta measures the volatility of a security relative to the market, indicating how much the security price moves relative to market movements.
Usage
beta_result = ta.beta(asset, market, period=252)
Parameters
asset (array-like): Asset price data
market (array-like): Market price data (benchmark)
period (int, default=252): Period for beta calculation (typically 1 year = 252 trading days)
Returns
array: Beta values in the same format as input
Example
# Assuming you have market index data
# For demonstration, we'll use another stock as market proxy
market_df = client.history(symbol="NIFTY",
exchange="NSE_INDEX",
interval="5m",
start_date="2025-04-01",
end_date="2025-04-08")
# Calculate 50-period Beta
beta_50 = ta.beta(df['close'], market_df['close'], 50)
# Add to DataFrame
df['BETA_50'] = beta_50
print(df[['close', 'BETA_50']].tail())
Variance (VAR)
Variance measures the dispersion of price data, supporting both logarithmic returns and price modes with smoothing and signal generation.
Usage
variance_result = ta.variance(data, lookback=20, mode="PR", ema_period=20,
filter_lookback=20, ema_length=14, return_components=False)
Parameters
data (array-like): Price data (close prices)
lookback (int, default=20): Variance lookback period
mode (str, default="PR"): Variance mode ("LR" for Logarithmic Returns, "PR" for Price)
ema_period (int, default=20): EMA period for variance smoothing
filter_lookback (int, default=20): Lookback period for variance filter
ema_length (int, default=14): EMA length for z-score smoothing
return_components (bool, default=False): If True, returns all components
Returns
array or tuple: Variance values or (variance, ema_variance, zscore, ema_zscore, stdev) if return_components=True
Example
# Calculate basic variance
variance_20 = ta.variance(df['close'], lookback=20)
df['VARIANCE_20'] = variance_20
# Calculate variance with all components
var_components = ta.variance(df['close'], lookback=20, return_components=True)
variance, ema_var, zscore, ema_zscore, stdev = var_components
df['VARIANCE'] = variance
df['EMA_VARIANCE'] = ema_var
df['VAR_ZSCORE'] = zscore
print(df[['close', 'VARIANCE', 'EMA_VARIANCE', 'VAR_ZSCORE']].tail())
Time Series Forecast (TSF)
Time Series Forecast predicts the next value using linear regression analysis.
Usage
tsf_result = ta.tsf(data, period=14)
Parameters
data (array-like): Price data
period (int, default=14): Period for forecast calculation
Returns
array: Time Series Forecast values in the same format as input
Example
# Calculate 14-period Time Series Forecast
tsf_14 = ta.tsf(df['close'], 14)
# Add to DataFrame
df['TSF_14'] = tsf_14
print(df[['close', 'TSF_14']].tail())
# Compare actual vs forecast
df['TSF_DIFF'] = df['close'] - df['TSF_14']
print("Forecast accuracy (last 10 periods):")
print(df[['close', 'TSF_14', 'TSF_DIFF']].tail(10))
Rolling Median (MEDIAN)
Rolling Median calculates the median value over a rolling window, which is less sensitive to outliers than mean-based indicators.
Usage
median_result = ta.median(data, period=3)
Parameters
data (array-like): Price data (default hl2 in Pine Script)
period (int, default=3): Period for median calculation
Returns
array: Median values in the same format as input
Example
# Calculate 5-period Rolling Median
median_5 = ta.median(df['close'], 5)
# Calculate median of typical price
typical_price = (df['high'] + df['low'] + df['close']) / 3
median_typical = ta.median(typical_price, 7)
# Add to DataFrame
df['MEDIAN_5'] = median_5
df['MEDIAN_TYPICAL'] = median_typical
print(df[['close', 'MEDIAN_5', 'MEDIAN_TYPICAL']].tail())
Median Bands (MEDIAN_BANDS)
Median Bands combine median calculation with ATR-based bands and EMA smoothing for comprehensive analysis.
Usage
median, upper_band, lower_band, median_ema = ta.median_bands.calculate_with_bands(
high, low, close, source=None, median_length=3, atr_length=14, atr_mult=2.0
)
Parameters
high (array-like): High prices
low (array-like): Low prices
close (array-like): Close prices
source (array-like, optional): Source data for median (default: hl2)
median_length (int, default=3): Period for median calculation
atr_length (int, default=14): Period for ATR calculation
atr_mult (float, default=2.0): ATR multiplier for bands
Returns
tuple: (median, upper_band, lower_band, median_ema) arrays
Example
# Calculate Median Bands
median, upper, lower, median_ema = ta.median_bands.calculate_with_bands(
df['high'], df['low'], df['close']
)
# Add to DataFrame
df['MEDIAN'] = median
df['MEDIAN_UPPER'] = upper
df['MEDIAN_LOWER'] = lower
df['MEDIAN_EMA'] = median_ema
print(df[['close', 'MEDIAN', 'MEDIAN_UPPER', 'MEDIAN_LOWER']].tail())
Rolling Mode (MODE)
Rolling Mode calculates the most frequent value over a rolling window using discretization.
Usage
mode_result = ta.mode(data, period=20, bins=10)
Parameters
data (array-like): Price data
period (int, default=20): Period for mode calculation
bins (int, default=10): Number of bins for discretization
Returns
array: Mode values in the same format as input
Example
# Calculate 15-period Rolling Mode
mode_15 = ta.mode(df['close'], period=15, bins=8)
# Add to DataFrame
df['MODE_15'] = mode_15
print(df[['close', 'MODE_15']].tail())
# Calculate mode for volume (often useful for volume analysis)
volume_mode = ta.mode(df['volume'], period=20, bins=12)
df['VOLUME_MODE'] = volume_mode
Complete Example: Statistical Analysis Dashboard
import pandas as pd
from openalgo import api, ta
# Get market data
client = api(api_key='your_api_key_here', host='http://127.0.0.1:5000')
df = client.history(symbol="SBIN",
exchange="NSE",
interval="5m",
start_date="2025-04-01",
end_date="2025-04-08")
# Calculate comprehensive statistical indicators
print("Calculating Statistical Indicators...")
# Trend Analysis
df['LINREG_20'] = ta.linreg(df['close'], 20)
df['LR_SLOPE_20'] = ta.lrslope(df['close'], 20)
df['TSF_14'] = ta.tsf(df['close'], 14)
# Central Tendency
df['MEDIAN_5'] = ta.median(df['close'], 5)
df['MODE_15'] = ta.mode(df['close'], 15)
# Variability Analysis
df['VARIANCE_20'] = ta.variance(df['close'], 20)
# Get variance components for detailed analysis
var_components = ta.variance(df['close'], lookback=20, return_components=True)
variance, ema_var, zscore, ema_zscore, stdev = var_components
df['VARIANCE'] = variance
df['EMA_VARIANCE'] = ema_var
df['VAR_ZSCORE'] = zscore
df['STDEV'] = stdev
# Correlation Analysis
df['CORREL_CLOSE_VOLUME'] = ta.correlation(df['close'], df['volume'], 20)
df['CORREL_HIGH_LOW'] = ta.correlation(df['high'], df['low'], 15)
# Median Bands Analysis
median, upper, lower, median_ema = ta.median_bands.calculate_with_bands(
df['high'], df['low'], df['close'], median_length=5, atr_length=14
)
df['MEDIAN_BANDS'] = median
df['MEDIAN_UPPER'] = upper
df['MEDIAN_LOWER'] = lower
df['MEDIAN_EMA'] = median_ema
# Create analysis summary
analysis_cols = [
'close', 'LINREG_20', 'LR_SLOPE_20', 'TSF_14',
'MEDIAN_5', 'VARIANCE_20', 'VAR_ZSCORE',
'CORREL_CLOSE_VOLUME', 'MEDIAN_BANDS'
]
print("\nStatistical Analysis Summary (Last 10 periods):")
print(df[analysis_cols].tail(10))
# Generate trading signals based on statistical indicators
print("\nGenerating Statistical Trading Signals...")
# Trend Strength Signal (based on Linear Regression Slope)
df['TREND_SIGNAL'] = 'NEUTRAL'
df.loc[df['LR_SLOPE_20'] > 0.5, 'TREND_SIGNAL'] = 'BULLISH'
df.loc[df['LR_SLOPE_20'] < -0.5, 'TREND_SIGNAL'] = 'BEARISH'
# Variance-based Volatility Signal
df['VOLATILITY_SIGNAL'] = 'NORMAL'
df.loc[df['VAR_ZSCORE'] > 1.5, 'VOLATILITY_SIGNAL'] = 'HIGH'
df.loc[df['VAR_ZSCORE'] < -1.5, 'VOLATILITY_SIGNAL'] = 'LOW'
# Price Position relative to Statistical Measures
df['PRICE_VS_LINREG'] = (df['close'] - df['LINREG_20']) / df['LINREG_20'] * 100
df['PRICE_VS_MEDIAN'] = (df['close'] - df['MEDIAN_5']) / df['MEDIAN_5'] * 100
# Forecast Accuracy
df['FORECAST_ERROR'] = abs(df['close'] - df['TSF_14'].shift(1))
df['FORECAST_ACCURACY'] = (1 - df['FORECAST_ERROR'] / df['close']) * 100
print("\nTrading Signals Summary:")
signal_summary = df[['TREND_SIGNAL', 'VOLATILITY_SIGNAL', 'PRICE_VS_LINREG',
'PRICE_VS_MEDIAN', 'FORECAST_ACCURACY']].tail(5)
print(signal_summary)
# Statistical Summary
print("\nStatistical Metrics Summary:")
print(f"Average Correlation (Close vs Volume): {df['CORREL_CLOSE_VOLUME'].mean():.4f}")
print(f"Average Variance: {df['VARIANCE_20'].mean():.4f}")
print(f"Average Forecast Accuracy: {df['FORECAST_ACCURACY'].mean():.2f}%")
print(f"Current Trend Slope: {df['LR_SLOPE_20'].iloc[-1]:.4f}")
# Volatility Analysis
recent_volatility = df['VAR_ZSCORE'].tail(20)
print(f"Recent Volatility Z-Score: {recent_volatility.mean():.2f}")
print(f"Volatility Regime: {df['VOLATILITY_SIGNAL'].iloc[-1]}")
Advanced Statistical Analysis
# Advanced correlation matrix
def calculate_correlation_matrix(df, period=20):
"""Calculate correlation matrix for OHLCV data"""
correlations = {}
price_cols = ['open', 'high', 'low', 'close', 'volume']
for i, col1 in enumerate(price_cols):
for col2 in price_cols[i+1:]:
corr_name = f"CORR_{col1.upper()}_{col2.upper()}"
correlations[corr_name] = ta.correlation(df[col1], df[col2], period)
return correlations
# Calculate all correlations
correlations = calculate_correlation_matrix(df, 20)
for name, values in correlations.items():
df[name] = values
print("\nCorrelation Matrix (Latest Values):")
corr_cols = [col for col in df.columns if col.startswith('CORR_')]
latest_corr = df[corr_cols].iloc[-1]
print(latest_corr)
# Statistical anomaly detection
def detect_statistical_anomalies(df, z_threshold=2.0):
"""Detect statistical anomalies in price data"""
# Price anomalies based on variance z-score
df['PRICE_ANOMALY'] = abs(df['VAR_ZSCORE']) > z_threshold
# Volume anomalies
volume_zscore = (df['volume'] - df['volume'].rolling(20).mean()) / df['volume'].rolling(20).std()
df['VOLUME_ANOMALY'] = abs(volume_zscore) > z_threshold
# Return anomalies
returns = df['close'].pct_change()
returns_zscore = (returns - returns.rolling(20).mean()) / returns.rolling(20).std()
df['RETURN_ANOMALY'] = abs(returns_zscore) > z_threshold
return df
# Detect anomalies
df = detect_statistical_anomalies(df)
# Summary of anomalies
anomaly_summary = df[['PRICE_ANOMALY', 'VOLUME_ANOMALY', 'RETURN_ANOMALY']].sum()
print(f"\nAnomaly Detection Summary:")
print(f"Price Anomalies: {anomaly_summary['PRICE_ANOMALY']}")
print(f"Volume Anomalies: {anomaly_summary['VOLUME_ANOMALY']}")
print(f"Return Anomalies: {anomaly_summary['RETURN_ANOMALY']}")
Performance Tips
Period Selection: Choose appropriate periods based on your analysis timeframe
Data Quality: Ensure clean data for accurate statistical calculations
Correlation Interpretation: Remember correlation doesn't imply causation
Statistical Significance: Consider sample size when interpreting results
Regime Changes: Monitor for changes in statistical relationships over time
Common Use Cases
Trend Analysis: Use Linear Regression and slopes for trend identification
Risk Management: Apply variance and correlation for portfolio risk assessment
Anomaly Detection: Use statistical z-scores to identify unusual market behavior
Forecasting: Combine TSF with other indicators for price prediction
Market Relationships: Analyze correlations between different assets or timeframes
Last updated