Applications of Quantitative Analysis in Finance

Introduction

Quantitative finance applies methods from mathematics, statistics, and computer science to problems in finance. Unlike traditional financial analysis, which relies mainly on experience and intuition, quantitative analysis uses mathematical models, algorithms, and large datasets to inform investment decisions.

History and development

Quantitative finance has developed through several notable stages:

1950s-1960s: Harry Markowitz's Modern Portfolio Theory (1952) and the Capital Asset Pricing Model (CAPM), developed by William Sharpe, John Lintner, and Jan Mossin, laid the foundations of quantitative finance.
1970s: The Black-Scholes-Merton model, published in 1973, revolutionized the pricing of options and other derivatives. Robert Merton and Myron Scholes received the 1997 Nobel Memorial Prize in Economic Sciences for this work.
1980s-1990s: Advances in information technology and computing power made it possible to deploy more complex models. Algorithmic trading began to appear.
2000 to the present: The growth of big data, artificial intelligence, and machine learning has opened new possibilities for quantitative finance. High-frequency trading (HFT) has become widespread, and trading algorithms have grown steadily more sophisticated.

Main application areas

1. Investment banking

Pricing complex derivatives
Risk analysis and capital management
Trading book optimization
Structuring and pricing structured products

2. Investment funds and asset management

Portfolio construction and optimization
Strategic and tactical asset allocation
Risk analysis and management
Performance measurement and fund manager selection

3. Trading and investment

Algorithmic trading and optimal execution
High-frequency trading (HFT)
Statistical trading and arbitrage
Derivatives trading and exploiting price discrepancies

4. Risk management

Market risk measurement and management
Credit risk analysis
Stress testing
Liquidity risk modeling

Analytical tools and techniques

1. Probability theory and statistics

Probability and statistics are the foundation of quantitative analysis: they make it possible to measure and forecast uncertainty in financial markets.

import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Simulate daily stock returns from a normal distribution
np.random.seed(42)
returns = np.random.normal(0.001, 0.02, 1000)  # mean 0.1%, standard deviation 2%

# Test for normality (D'Agostino-Pearson test)
k2, p_value = stats.normaltest(returns)
alpha = 0.05
print("p-value: {0}".format(p_value))
if p_value < alpha:
    print("Normality is rejected at significance level {0}".format(alpha))
else:
    print("Not enough evidence to reject normality")

# Plot the histogram together with the fitted normal density
plt.figure(figsize=(10, 6))
plt.hist(returns, bins=50, density=True, alpha=0.6, color='b')

# Fitted normal density curve
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
pdf = stats.norm.pdf(x, np.mean(returns), np.std(returns))
plt.plot(x, pdf, 'k', linewidth=2)

plt.title('Distribution of stock returns')
plt.xlabel('Daily return')
plt.ylabel('Density')
plt.grid(True)
plt.show()

# One-day historical Value at Risk (VaR) at the 95% confidence level
var_95 = np.percentile(returns, 5)
print("Value at Risk (95%): {0:.2%}".format(-var_95))

2. Time series analysis

Financial time series have distinctive properties such as non-stationarity, autocorrelation, and volatility that changes over time (heteroskedasticity).

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Load the price series
data = pd.read_csv('stock_prices.csv', index_col='Date', parse_dates=True)
prices = data['Close']

# Augmented Dickey-Fuller (ADF) test for stationarity of prices
result = adfuller(prices.dropna())
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])

# Convert prices to returns, which are usually much closer to stationary
returns = prices.pct_change().dropna()

# ADF test on returns
result = adfuller(returns)
print('ADF Statistic (returns): %f' % result[0])
print('p-value (returns): %f' % result[1])

# Plot the ACF and PACF to help choose the ARIMA orders
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))
plot_acf(returns, ax=ax1)
plot_pacf(returns, ax=ax2)
plt.tight_layout()
plt.show()

# Fit an ARIMA(1, 0, 1) model to the returns
model = ARIMA(returns, order=(1, 0, 1))
model_fit = model.fit()
print(model_fit.summary())

# Forecast the next 30 periods
forecast = model_fit.forecast(steps=30)
# 31 business-day dates: the last observed date plus 30 forecast dates, so the
# forecast line starts at the last observation and the lengths of x and y match
forecast_index = pd.date_range(start=returns.index[-1], periods=31, freq='B')
plt.figure(figsize=(10, 6))
plt.plot(returns.index[-100:], returns.values[-100:], label='Historical')
plt.plot(forecast_index, [returns.values[-1]] + list(forecast), label='Forecast')
plt.title('Stock return forecast')
plt.xlabel('Date')
plt.ylabel('Return')
plt.legend()
plt.grid(True)
plt.show()
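
The time-varying volatility mentioned at the start of this section can be illustrated without any data file. The sketch below simulates returns from a GARCH(1,1) process, a standard model for volatility clustering, and compares the lag-1 autocorrelation of returns with that of squared returns. The function names and parameter values (`omega`, `alpha`, `beta`) are illustrative assumptions, not values estimated from real data.

```python
import numpy as np

def simulate_garch11(n, omega=1e-6, alpha=0.10, beta=0.85, seed=0):
    """Simulate n returns from a GARCH(1,1) process with normal shocks."""
    rng = np.random.default_rng(seed)
    shocks = rng.standard_normal(n)
    returns = np.empty(n)
    variance = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    for t in range(n):
        returns[t] = np.sqrt(variance) * shocks[t]
        # Next period's variance reacts to today's squared return
        variance = omega + alpha * returns[t] ** 2 + beta * variance
    return returns

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[:-1] * d[1:]) / np.sum(d ** 2))

simulated = simulate_garch11(20000)
acf_raw = lag1_autocorr(simulated)
acf_sq = lag1_autocorr(simulated ** 2)
print(f"Lag-1 autocorrelation of returns:         {acf_raw:.3f}")
print(f"Lag-1 autocorrelation of squared returns: {acf_sq:.3f}")
```

Under these assumptions the returns themselves should show almost no autocorrelation, while squared returns are clearly positively autocorrelated: large moves tend to follow large moves, even though their direction is hard to predict.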

3. Computational finance

Numerical methods and algorithms are widely used in finance to solve problems that have no closed-form (analytical) solution.

import numpy as np
from scipy.stats import norm

def monte_carlo_option_pricing(S0, K, r, sigma, T, option_type='call', num_simulations=10000):
    """
    Price a European option by Monte Carlo simulation
    S0: initial stock price
    K: strike price
    r: risk-free interest rate
    sigma: volatility
    T: time to maturity (years)
    option_type: 'call' or 'put'
    num_simulations: number of simulated terminal prices
    """
    # Simulate terminal stock prices under geometric Brownian motion
    np.random.seed(42)  # fixed seed so that the result is reproducible
    z = np.random.standard_normal(num_simulations)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)

    # Option payoff at maturity
    if option_type == 'call':
        payoff = np.maximum(ST - K, 0)
    else:
        payoff = np.maximum(K - ST, 0)

    # Discount the average payoff back to today
    option_price = np.exp(-r * T) * np.mean(payoff)

    return option_price

# Closed-form Black-Scholes price, for comparison
def black_scholes(S0, K, r, sigma, T, option_type='call'):
    """
    Price a European option with the Black-Scholes formula
    """
    d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    
    if option_type == 'call':
        price = S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)
    else:
        price = K * np.exp(-r * T) * norm.cdf(-d2) - S0 * norm.cdf(-d1)
    
    return price

# Input parameters
S0 = 100    # current stock price
K = 100     # strike price
r = 0.05    # risk-free interest rate
sigma = 0.2 # volatility
T = 1       # time to maturity (1 year)

# Compare the results
mc_call = monte_carlo_option_pricing(S0, K, r, sigma, T, 'call')
bs_call = black_scholes(S0, K, r, sigma, T, 'call')

mc_put = monte_carlo_option_pricing(S0, K, r, sigma, T, 'put')
bs_put = black_scholes(S0, K, r, sigma, T, 'put')

print(f"Call price (Monte Carlo): {mc_call:.4f}")
print(f"Call price (Black-Scholes): {bs_call:.4f}")
print(f"Put price (Monte Carlo): {mc_put:.4f}")
print(f"Put price (Black-Scholes): {bs_put:.4f}")
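
A Monte Carlo price is an estimate, so it is useful to report how precise it is. The sketch below is a self-contained variant of the comparison above that also returns the standard error of the estimate, which shrinks roughly in proportion to one over the square root of the number of simulations. The helper names `bs_call` and `mc_call_with_error` are introduced here for illustration only.

```python
import numpy as np
from scipy.stats import norm

def bs_call(S0, K, r, sigma, T):
    """Closed-form Black-Scholes price of a European call."""
    d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

def mc_call_with_error(S0, K, r, sigma, T, num_simulations, seed=0):
    """Monte Carlo call price and the standard error of that estimate."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(num_simulations)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    discounted = np.exp(-r * T) * np.maximum(ST - K, 0.0)
    price = discounted.mean()
    # Standard error of the sample mean of the discounted payoffs
    std_error = discounted.std(ddof=1) / np.sqrt(num_simulations)
    return price, std_error

exact = bs_call(100, 100, 0.05, 0.2, 1)
print(f"Black-Scholes: {exact:.4f}")
for n in (10_000, 40_000, 160_000):
    price, se = mc_call_with_error(100, 100, 0.05, 0.2, 1, n)
    print(f"N={n:>7}: price={price:.4f}  std. error={se:.4f}  |error|={abs(price - exact):.4f}")
```

Quadrupling the number of simulations should roughly halve the standard error, which is why high accuracy from plain Monte Carlo is computationally expensive.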

Financial asset pricing models

1. The Black-Scholes model

The Black-Scholes model was a breakthrough in quantitative finance: it gave the first widely adopted closed-form formula for pricing European options. The formula rests on assumptions such as:

The stock price is log-normally distributed (it follows geometric Brownian motion with constant volatility)
There are no arbitrage opportunities
Trading is continuous and there are no transaction costs
The risk-free interest rate is constant
The stock pays no dividends

The Black-Scholes formula for the price $C$ of a European call is:

$$C = S_0 N(d_1) - Ke^{-rT} N(d_2)$$

where $N(\cdot)$ is the standard normal cumulative distribution function and:

$$d_1 = \frac{\ln(S_0/K) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}$$

$$d_2 = d_1 - \sigma\sqrt{T}$$
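
For reference, the price of the corresponding European put follows from the call formula above through put-call parity. The symbol $P$ for the put price is introduced here for illustration; all other symbols are those already used for the call.

$$C - P = S_0 - Ke^{-rT}$$

$$P = Ke^{-rT} N(-d_2) - S_0 N(-d_1)$$

The second line is obtained by substituting the call formula into the first and using $1 - N(x) = N(-x)$. It matches the put branch of the `black_scholes` function shown earlier.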

2. The binomial tree model

The binomial tree is a simpler, discrete-time pricing method: at each time step the stock price moves either up or down by a fixed factor, with given probabilities.

import numpy as np

def binomial_option_pricing(S0, K, r, u, d, T, N, option_type='call'):
    """
    Price a European option with a binomial tree
    S0: initial stock price
    K: strike price
    r: risk-free interest rate
    u: up factor (u > 1)
    d: down factor (d < 1)
    T: time to maturity (years)
    N: number of time steps
    option_type: 'call' or 'put'
    """
    # Model parameters
    dt = T / N
    q = (np.exp(r * dt) - d) / (u - d)  # risk-neutral probability of an up move

    # Stock prices at maturity; node j has had j down moves
    ST = np.zeros(N + 1)
    for j in range(N + 1):
        ST[j] = S0 * (u ** (N - j)) * (d ** j)

    # Option values at maturity
    if option_type == 'call':
        option_values = np.maximum(ST - K, 0)
    else:
        option_values = np.maximum(K - ST, 0)

    # Backward induction: discounted risk-neutral expectation at each node
    for i in range(N - 1, -1, -1):
        for j in range(i + 1):
            option_values[j] = np.exp(-r * dt) * (q * option_values[j] + (1 - q) * option_values[j + 1])

    return option_values[0]

# Example usage
S0 = 100
K = 100
r = 0.05
sigma = 0.2
T = 1
N = 50

# Derive u and d from sigma (Cox-Ross-Rubinstein parameterization)
dt = T / N
u = np.exp(sigma * np.sqrt(dt))
d = 1 / u

call_price = binomial_option_pricing(S0, K, r, u, d, T, N, 'call')
put_price = binomial_option_pricing(S0, K, r, u, d, T, N, 'put')

print(f"Call price (Binomial Tree): {call_price:.4f}")
print(f"Put price (Binomial Tree): {put_price:.4f}")

3. CAPM and multi-factor models

The Capital Asset Pricing Model (CAPM) is one of the most important models in finance. It relates the expected return of an asset to its systematic risk, measured by beta.

$$E(R_i) = R_f + \beta_i [E(R_m) - R_f]$$
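
As a small worked example of the formula above, the sketch below computes an expected return from given inputs and estimates beta as the covariance of the asset with the market divided by the variance of the market. The function names and the numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def capm_expected_return(risk_free, beta, market_return):
    """E(R_i) = R_f + beta_i * (E(R_m) - R_f)."""
    return risk_free + beta * (market_return - risk_free)

def estimate_beta(asset_returns, market_returns):
    """Beta as Cov(R_i, R_m) / Var(R_m), using sample moments."""
    asset_returns = np.asarray(asset_returns, dtype=float)
    market_returns = np.asarray(market_returns, dtype=float)
    cov = np.cov(asset_returns, market_returns, ddof=1)
    return cov[0, 1] / cov[1, 1]

# Hypothetical inputs: 2% risk-free rate, 8% expected market return, beta of 1.2
expected = capm_expected_return(0.02, 1.2, 0.08)
print(f"Expected return: {expected:.2%}")  # 2% + 1.2 * 6% = 9.20%

# A synthetic asset that moves exactly 1.5 times the market has beta 1.5
market = np.array([0.01, -0.02, 0.015, 0.03, -0.01])
asset = 0.001 + 1.5 * market
beta = estimate_beta(asset, market)
print(f"Estimated beta: {beta:.2f}")
```

With real data the relationship is noisy, which is why beta is usually estimated by regression on excess returns.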

Multi-factor models, such as the Fama-French three-factor model, extend CAPM by adding size and value factors.

import pandas as pd
import numpy as np
import statsmodels.api as sm

# Load the data
data = pd.read_csv('stock_returns.csv', index_col='Date', parse_dates=True)
stock_returns = data['AAPL']
market_returns = data['SPY']
risk_free_rate = data['RF']
smb = data['SMB']  # Small Minus Big (size factor)
hml = data['HML']  # High Minus Low (value factor)

# Excess returns over the risk-free rate
excess_returns = stock_returns - risk_free_rate
excess_market = market_returns - risk_free_rate

# CAPM regression
X_capm = sm.add_constant(excess_market)
capm_model = sm.OLS(excess_returns, X_capm).fit()
print("CAPM results:")
print(capm_model.summary())
# params is a labeled Series (constant first, then the market slope), so index by position
print(f"Stock beta: {capm_model.params.iloc[1]:.4f}")

# Fama-French three-factor regression
X_ff = sm.add_constant(pd.DataFrame({'Mkt-RF': excess_market, 'SMB': smb, 'HML': hml}))
ff_model = sm.OLS(excess_returns, X_ff).fit()
print("\nFama-French model results:")
print(ff_model.summary())

Portfolio management and optimization

1. Modern portfolio theory

Harry Markowitz's Modern Portfolio Theory (MPT) formalizes the trade-off between risk and return and the benefit of diversification. MPT looks for efficient portfolios: those that offer the highest expected return for a given level of risk.
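
The diversification benefit can be seen in the simplest case of two assets. The sketch below evaluates the standard two-asset portfolio variance formula for several correlations; the function name and the volatilities (20% and 30%) are hypothetical values chosen for illustration.

```python
import numpy as np

def two_asset_volatility(w1, sigma1, sigma2, rho):
    """Volatility of a portfolio with weight w1 in asset 1 and 1 - w1 in asset 2."""
    w2 = 1.0 - w1
    variance = (w1 * sigma1) ** 2 + (w2 * sigma2) ** 2 + 2 * w1 * w2 * rho * sigma1 * sigma2
    return float(np.sqrt(variance))

# Equal weights in two assets with volatilities of 20% and 30%
for rho in (1.0, 0.2, -0.5):
    vol = two_asset_volatility(0.5, 0.20, 0.30, rho)
    print(f"rho={rho:+.1f}: portfolio volatility = {vol:.4f}")
```

Only with perfect correlation does the portfolio volatility equal the weighted average of 25%. For any lower correlation it is smaller, which is the effect the optimization below exploits across many assets.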

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import minimize

# Load the returns of the selected stocks
returns_data = pd.read_csv('stock_returns.csv', index_col='Date', parse_dates=True)
returns = returns_data[['AAPL', 'MSFT', 'AMZN', 'GOOGL', 'FB']]

# Expected returns and covariance matrix
mean_returns = returns.mean()
cov_matrix = returns.cov()

# Objective: minimize the portfolio standard deviation (risk)
def portfolio_volatility(weights, mean_returns, cov_matrix):
    return np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))

# Constraint: weights sum to 1
constraints = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1})

# Bounds: non-negative weights (no short selling)
bounds = tuple((0, 1) for _ in range(len(mean_returns)))

# Initial guess: equal weights
init_guess = np.array([1/len(mean_returns)] * len(mean_returns))

# Find the minimum-volatility portfolio
optimal_result = minimize(portfolio_volatility, init_guess, 
                         args=(mean_returns, cov_matrix),
                         method='SLSQP', bounds=bounds, constraints=constraints)

min_volatility_weights = optimal_result['x']

# Trace out the efficient frontier
target_returns = np.linspace(min(mean_returns), max(mean_returns), 50)
efficient_portfolios = []

for target in target_returns:
    # Add the target-return constraint
    constraints = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1},
                  {'type': 'eq', 'fun': lambda x: np.sum(x * mean_returns) - target})
    
    result = minimize(portfolio_volatility, init_guess, 
                     args=(mean_returns, cov_matrix),
                     method='SLSQP', bounds=bounds, constraints=constraints)
    
    efficient_portfolios.append({
        'target_return': target,
        'volatility': result['fun'],
        'weights': result['x']
    })

# Plot the efficient frontier
ef_returns = [p['target_return'] for p in efficient_portfolios]
ef_volatility = [p['volatility'] for p in efficient_portfolios]

plt.figure(figsize=(10, 6))
plt.plot(ef_volatility, ef_returns, 'b-', linewidth=2)
plt.scatter(portfolio_volatility(min_volatility_weights, mean_returns, cov_matrix),
           np.sum(min_volatility_weights * mean_returns), 
           marker='*', color='r', s=150, label='Minimum-volatility portfolio')

# Plot the individual stocks (mean_returns is labeled by ticker, so index by position)
for i, stock in enumerate(returns.columns):
    plt.scatter(np.sqrt(cov_matrix.iloc[i, i]), mean_returns.iloc[i], 
               marker='o', label=stock)

plt.title('Efficient frontier')
plt.xlabel('Risk (standard deviation)')
plt.ylabel('Expected return')
plt.legend()
plt.grid(True)
plt.show()

# Print the optimal weights
print("Weights of the minimum-volatility portfolio:")
for i, stock in enumerate(returns.columns):
    print(f"{stock}: {min_volatility_weights[i]:.4f}")

2. Portfolio optimization in practice

In practice, portfolio optimization is usually more involved and includes many additional constraints and objectives:

Allocation limits per sector or region
Liquidity constraints
Rebalancing frequency and transaction costs
Limits on concentration risk
ESG (environmental, social, governance) constraints

# Portfolio optimization with practical constraints

def objective_function(weights, mean_returns, cov_matrix, risk_aversion=3):
    """
    Objective: maximize risk-adjusted return (a mean-variance utility function)
    U = E[R] - 0.5 * λ * σ^2
    The negative utility is returned because scipy's minimize() minimizes.
    """
    portfolio_return = np.sum(weights * mean_returns)
    portfolio_variance = np.dot(weights.T, np.dot(cov_matrix, weights))
    
    return -(portfolio_return - 0.5 * risk_aversion * portfolio_variance)

# Sector allocation constraint (the default cap is 30% per sector; a 50% cap is passed in below)
sector_data = {
    'AAPL': 'Technology',
    'MSFT': 'Technology',
    'AMZN': 'Consumer',
    'GOOGL': 'Technology',
    'FB': 'Technology'
}

def sector_constraint(weights, sector='Technology', max_allocation=0.3):
    sector_indices = [i for i, stock in enumerate(returns.columns) if sector_data[stock] == sector]
    sector_allocation = sum(weights[i] for i in sector_indices)
    return max_allocation - sector_allocation

# Liquidity constraint (example: minimum weighted-average market capitalization)
market_caps = {
    'AAPL': 2000, # hypothetical values, USD billions
    'MSFT': 1800,
    'AMZN': 1600,
    'GOOGL': 1400,
    'FB': 800
}

def liquidity_constraint(weights):
    min_market_cap = 1000  # USD billions
    avg_market_cap = sum(weights[i] * market_caps[stock] for i, stock in enumerate(returns.columns))
    return avg_market_cap - min_market_cap

# All constraints
constraints = [
    {'type': 'eq', 'fun': lambda x: np.sum(x) - 1},  # weights sum to 1
    {'type': 'ineq', 'fun': lambda x: sector_constraint(x, 'Technology', 0.5)},  # at most 50% in technology
    {'type': 'ineq', 'fun': liquidity_constraint}  # liquidity requirement
]

# Optimize
result = minimize(objective_function, init_guess, 
                 args=(mean_returns, cov_matrix),
                 method='SLSQP', bounds=bounds, constraints=constraints)

optimized_weights = result['x']

# Print the optimal weights
print("Optimal weights under the practical constraints:")
for i, stock in enumerate(returns.columns):
    print(f"{stock}: {optimized_weights[i]:.4f}")

# Portfolio return and risk (per period of the input data)
portfolio_return = np.sum(optimized_weights * mean_returns)
portfolio_risk = np.sqrt(np.dot(optimized_weights.T, np.dot(cov_matrix, optimized_weights)))

print(f"Expected return: {portfolio_return:.4f}")
print(f"Risk (standard deviation): {portfolio_risk:.4f}")
# The 2% risk-free rate is annual, so convert it to the period of the returns.
# Assumption: stock_returns.csv holds daily returns (252 trading days per year).
rf_per_period = 0.02 / 252
print(f"Sharpe ratio per period (annual Rf=0.02): {(portfolio_return - rf_per_period) / portfolio_risk:.4f}")

Quantitative risk analysis

1. Value at Risk (VaR) and Expected Shortfall (ES)

Value at Risk (VaR) is a widely used risk measure: it estimates the loss that will not be exceeded over a given horizon with a given confidence level. It is a quantile of the loss distribution, not the worst possible loss. Expected Shortfall (also called Conditional VaR) is the expected loss in the cases where the loss is at least as large as VaR.
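
The difference between the two measures is easiest to see on a tiny deterministic sample. The sketch below uses 100 made-up daily returns, evenly spaced from -5.0% to +4.9%, and computes the 95% historical VaR and ES by hand; the data are purely illustrative.

```python
import numpy as np

# 100 hypothetical daily returns: -5.0%, -4.9%, ..., +4.9%
returns = np.arange(-50, 50) / 1000.0

confidence = 0.95
# VaR: the loss at the 5th percentile of returns (reported as a positive number)
var_95 = -np.percentile(returns, 100 * (1 - confidence))
# ES: the average loss over the returns at or below the VaR threshold
tail = returns[returns <= -var_95]
es_95 = -tail.mean()

print(f"95% VaR: {var_95:.3%}")
print(f"95% ES:  {es_95:.3%}")
print(f"Observations in the tail: {len(tail)}")
```

Here the five worst returns (-5.0% to -4.6%) form the tail, so ES is their average loss of 4.8%, which is larger than the VaR of about 4.5%. ES is always at least as large as VaR because it averages the losses beyond the VaR threshold.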

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Load the portfolio returns
returns = pd.read_csv('portfolio_returns.csv', index_col='Date', parse_dates=True)['Returns']

# Method 1: historical-simulation VaR
def historical_var(returns, confidence_level):
    return -np.percentile(returns, 100 * (1 - confidence_level))

# Method 2: parametric VaR, assuming normally distributed returns
def parametric_var(returns, confidence_level):
    mu = returns.mean()
    sigma = returns.std()
    # Use the lower-tail quantile: ppf(1 - confidence_level) is negative,
    # so the result is a positive loss figure
    return -(mu + sigma * stats.norm.ppf(1 - confidence_level))

# Method 3: Monte Carlo VaR
def monte_carlo_var(returns, confidence_level, n_simulations=10000, horizon=1):
    mu = returns.mean()
    sigma = returns.std()
    
    # Simulate returns over the horizon, assuming i.i.d. normal returns
    # (the mean scales with time, the volatility with the square root of time)
    simulated_returns = np.random.normal(mu * horizon, sigma * np.sqrt(horizon), n_simulations)
    
    # VaR is the loss at the lower-tail percentile
    return -np.percentile(simulated_returns, 100 * (1 - confidence_level))

# Expected Shortfall (ES): average loss over the returns at or below the VaR threshold
def expected_shortfall(returns, confidence_level):
    var = historical_var(returns, confidence_level)
    return -returns[returns <= -var].mean()

# Compute VaR and ES at the 95% and 99% confidence levels
confidence_levels = [0.95, 0.99]
var_results = {}
es_results = {}

for cl in confidence_levels:
    var_results[f'Historical VaR {cl*100}%'] = historical_var(returns, cl)
    var_results[f'Parametric VaR {cl*100}%'] = parametric_var(returns, cl)
    var_results[f'Monte Carlo VaR {cl*100}%'] = monte_carlo_var(returns, cl)
    es_results[f'Expected Shortfall {cl*100}%'] = expected_shortfall(returns, cl)

# Print the results
print("Value at Risk (VaR):")
for method, value in var_results.items():
    print(f"{method}: {value:.4%}")

print("\nExpected Shortfall (ES):")
for method, value in es_results.items():
    print(f"{method}: {value:.4%}")

# Plot the return distribution with the VaR and ES levels
plt.figure(figsize=(10, 6))
plt.hist(returns, bins=50, density=True, alpha=0.6, color='b')

# Add the VaR and ES lines
colors = ['r', 'g']
for i, cl in enumerate(confidence_levels):
    var_value = var_results[f'Historical VaR {cl*100}%']
    plt.axvline(-var_value, color=colors[i], linestyle='--', 
               label=f'VaR {cl*100}%: {var_value:.2%}')
    
    es_value = es_results[f'Expected Shortfall {cl*100}%']
    plt.axvline(-es_value, color=colors[i], linestyle=':',
               label=f'ES {cl*100}%: {es_value:.2%}')

plt.title('Return distribution and risk measures')
plt.xlabel('Return')
plt.ylabel('Density')
plt.legend()
plt.grid(True)
plt.show()

2. Stress testing and scenario analysis

Stress testing and scenario analysis assess how well a portfolio or a financial institution would withstand extreme adverse market conditions.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the portfolio holdings
portfolio = pd.read_csv('portfolio_holdings.csv')
print(portfolio.head())

# Define the stress scenarios.
# The asset-class keys are left untranslated because they must match the
# values in the 'Asset Class' column of portfolio_holdings.csv.
scenarios = {
    '2008 financial crisis': {
        'Cổ phiếu': -0.40,  # equities: down 40%
        'Trái phiếu doanh nghiệp': -0.15,  # corporate bonds: down 15%
        'Trái phiếu chính phủ': 0.05,  # government bonds: up 5% (flight to quality)
        'Hàng hóa': -0.30,  # commodities: down 30%
        'Tiền mặt': 0.0  # cash: unchanged
    },
    'Covid-19 pandemic': {
        'Cổ phiếu': -0.35,  # equities: down 35%
        'Trái phiếu doanh nghiệp': -0.10,  # corporate bonds: down 10%
        'Trái phiếu chính phủ': 0.03,  # government bonds: up 3%
        'Hàng hóa': -0.25,  # commodities: down 25%
        'Tiền mặt': 0.0  # cash: unchanged
    },
    'High inflation': {
        'Cổ phiếu': -0.15,  # equities: down 15%
        'Trái phiếu doanh nghiệp': -0.20,  # corporate bonds: down 20%
        'Trái phiếu chính phủ': -0.25,  # government bonds: down 25%
        'Hàng hóa': 0.30,  # commodities: up 30%
        'Tiền mặt': -0.10  # cash: real value down 10% because of inflation
    }
}

# Initial portfolio value
initial_portfolio_value = portfolio['Market Value'].sum()
print(f"Initial portfolio value: ${initial_portfolio_value:,.2f}")

# Scenario analysis
results = {}

for scenario_name, scenario_impacts in scenarios.items():
    portfolio_copy = portfolio.copy()
    
    # Apply the scenario shock to each asset class
    for asset_class in scenario_impacts:
        impact = scenario_impacts[asset_class]
        portfolio_copy.loc[portfolio_copy['Asset Class'] == asset_class, 'Market Value'] *= (1 + impact)
    
    # Portfolio value after the scenario
    post_scenario_value = portfolio_copy['Market Value'].sum()
    
    # Change in value (negative numbers are losses)
    value_change = post_scenario_value - initial_portfolio_value
    percent_change = value_change / initial_portfolio_value
    
    results[scenario_name] = {
        'Post Scenario Value': post_scenario_value,
        'Value Change': value_change,
        'Percent Change': percent_change
    }

# Print the results
print("\nStress testing results:")
for scenario_name, result in results.items():
    print(f"\n{scenario_name}:")
    print(f"  Value after scenario: ${result['Post Scenario Value']:,.2f}")
    print(f"  Change in value: ${result['Value Change']:,.2f} ({result['Percent Change']:.2%})")

# Comparison chart
plt.figure(figsize=(10, 6))
scenario_names = list(results.keys())
changes = [results[s]['Percent Change'] for s in scenario_names]

plt.bar(scenario_names, changes, color=['r', 'b', 'g'])
plt.axhline(y=0, color='black', linestyle='-', alpha=0.3)
plt.title('Impact of the stress scenarios on the portfolio')
plt.ylabel('Change in portfolio value (negative = loss)')
plt.grid(axis='y', alpha=0.3)

# Add value labels
for i, v in enumerate(changes):
    plt.text(i, v - 0.03 if v < 0 else v + 0.01, f"{v:.1%}", ha='center')

plt.tight_layout()
plt.show()

Algorithmic trading and HFT

1. Basics of algorithmic trading

Algorithmic trading uses algorithms to automate trading decisions, usually either to execute large orders efficiently or to profit from trading opportunities.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta

# Simple moving-average crossover strategy
class MovingAverageCrossover:
    def __init__(self, short_window=20, long_window=50):
        self.short_window = short_window
        self.long_window = long_window
        self.positions = None
    
    def generate_signals(self, data):
        """
        Generate trading signals from moving-average crossovers
        """
        # Signal DataFrame
        signals = pd.DataFrame(index=data.index)
        signals['price'] = data['Close']
        signals['short_mavg'] = data['Close'].rolling(window=self.short_window, min_periods=1).mean()
        signals['long_mavg'] = data['Close'].rolling(window=self.long_window, min_periods=1).mean()
        
        # Signal: 1.0 (hold one share) while the short MA is above the long MA, otherwise 0.0 (flat)
        signals['signal'] = np.where(signals['short_mavg'] > signals['long_mavg'], 1.0, 0.0)
        
        # Trades: change in the signal (+1.0 = buy, -1.0 = sell)
        signals['position'] = signals['signal'].diff()
        
        # Keep the signals for plotting
        self.positions = signals
        
        return signals
    
    def backtest(self, data, initial_capital=100000.0):
        """
        Backtest the strategy and compute its performance
        """
        # Generate the signals
        signals = self.generate_signals(data)
        
        # Performance DataFrame
        portfolio = pd.DataFrame(index=signals.index)
        portfolio['positions'] = signals['signal']
        portfolio['price'] = signals['price']
        # Cash changes only when a share is bought or sold, so accumulate the cost
        # of the trades (changes in the signal), not of the position held each day
        trades = signals['signal'].diff().fillna(signals['signal'])
        portfolio['cash'] = initial_capital - (trades * signals['price']).cumsum()
        portfolio['holdings'] = signals['signal'] * signals['price']
        portfolio['total'] = portfolio['cash'] + portfolio['holdings']
        portfolio['returns'] = portfolio['total'].pct_change()
        
        return portfolio

# Load the data
data = pd.read_csv('stock_prices.csv', index_col='Date', parse_dates=True)

# Create and backtest the strategy
mac = MovingAverageCrossover(short_window=20, long_window=50)
portfolio = mac.backtest(data)

# Plot the results
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8), sharex=True)

# Price and moving averages
ax1.plot(data.index, data['Close'], label='Price')
ax1.plot(mac.positions.index, mac.positions['short_mavg'], label=f'{mac.short_window} Day MA')
ax1.plot(mac.positions.index, mac.positions['long_mavg'], label=f'{mac.long_window} Day MA')

# Mark the buy and sell signals
buy_signals = mac.positions[mac.positions['position'] == 1.0]
sell_signals = mac.positions[mac.positions['position'] == -1.0]
ax1.plot(buy_signals.index, data.loc[buy_signals.index, 'Close'], '^', 
        markersize=10, color='g', label='Buy Signal')
ax1.plot(sell_signals.index, data.loc[sell_signals.index, 'Close'], 'v', 
        markersize=10, color='r', label='Sell Signal')

ax1.set_title('Moving-average crossover strategy')
ax1.set_ylabel('Price')
ax1.legend()
ax1.grid(True)

# Portfolio value
ax2.plot(portfolio.index, portfolio['total'], label='Portfolio Value')
ax2.set_ylabel('Value')
ax2.set_xlabel('Date')
ax2.legend()
ax2.grid(True)

plt.tight_layout()
plt.show()

# Performance metrics (use .iloc for positional access on a date-indexed Series)
total_return = (portfolio['total'].iloc[-1] - portfolio['total'].iloc[0]) / portfolio['total'].iloc[0]
annual_return = total_return / (len(portfolio) / 252)  # simple (non-compounded) annualization, 252 trading days/year
annual_volatility = portfolio['returns'].std() * np.sqrt(252)
sharpe_ratio = annual_return / annual_volatility  # assumes a risk-free rate of 0

print(f"Total return: {total_return:.2%}")
print(f"Annualized return: {annual_return:.2%}")
print(f"Annualized volatility: {annual_volatility:.2%}")
print(f"Sharpe ratio: {sharpe_ratio:.2f}")
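
The other use of algorithmic trading mentioned at the start of this section, executing a large order, can be sketched just as simply. One common approach is a time-weighted average price (TWAP) schedule, which splits the parent order into equal child orders spread over the trading window. The function `twap_schedule` below is a hypothetical helper written for illustration; a real execution algorithm would also react to prices, volumes, and fills.

```python
def twap_schedule(total_quantity, num_slices):
    """Split an order into num_slices near-equal child orders that sum to total_quantity."""
    if total_quantity < 0 or num_slices <= 0:
        raise ValueError("total_quantity must be >= 0 and num_slices must be > 0")
    base, remainder = divmod(total_quantity, num_slices)
    # The first `remainder` slices get one extra share so that nothing is lost to rounding
    return [base + 1 if i < remainder else base for i in range(num_slices)]

schedule = twap_schedule(1000, 3)
print(schedule)       # [334, 333, 333]
print(sum(schedule))  # 1000
```

Spreading the order over time reduces the market impact of any single trade, at the cost of exposure to price moves while the order is being worked.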

2. High-frequency trading (HFT)

High-frequency trading is a form of algorithmic trading that uses advanced technology to execute a large number of trades at very high speed, typically on millisecond or microsecond timescales.

Common HFT strategies include:

Market making: providing liquidity by quoting buy and sell orders at the same time and earning the bid-ask spread.
Statistical arbitrage: exploiting statistical relationships between financial instruments.
Latency arbitrage: exploiting timing differences between markets or data feeds.

# Simple market-making simulation
# (uses numpy, pandas, matplotlib, datetime and timedelta imported in the previous example)

class MarketMaker:
    def __init__(self, spread_threshold=0.05, position_limit=100):
        self.spread_threshold = spread_threshold
        self.position_limit = position_limit
        self.position = 0
        self.trades = []
        self.pnl = 0  # running cash balance from trades
    
    def process_tick(self, timestamp, bid, ask):
        """
        Process one tick and decide whether to trade.
        Simplification: both quotes are assumed to fill immediately at the
        current bid and ask, which is optimistic compared with a real market.
        """
        spread = ask - bid
        mid_price = (bid + ask) / 2
        
        # Trade only if the spread is wide enough to cover trading costs and risk
        if spread > self.spread_threshold:
            # Quote both sides, subject to the inventory limit
            if self.position < self.position_limit:
                # Buy one unit at the bid
                self.position += 1
                self.trades.append({
                    'timestamp': timestamp,
                    'action': 'BUY',
                    'price': bid,
                    'position': self.position
                })
                self.pnl -= bid
            
            if self.position > -self.position_limit:
                # Sell one unit at the ask
                self.position -= 1
                self.trades.append({
                    'timestamp': timestamp,
                    'action': 'SELL',
                    'price': ask,
                    'position': self.position
                })
                self.pnl += ask
        
        return {
            'timestamp': timestamp,
            'bid': bid,
            'ask': ask,
            'position': self.position,
            # Mark-to-market P&L: cash plus inventory valued at the mid price
            'pnl': self.pnl + self.position * mid_price
        }

# Simulate tick data
def generate_tick_data(num_ticks=1000, base_price=100.0, volatility=0.1):
    ticks = []
    price = base_price
    
    start_time = datetime.now()
    
    for i in range(num_ticks):
        # Mid price follows a random walk with normal increments
        price_change = np.random.normal(0, volatility)
        price += price_change
        
        # Random spread
        spread = np.random.uniform(0.01, 0.2)
        
        # Bid and ask around the mid price
        bid = price - spread / 2
        ask = price + spread / 2
        
        # Timestamp: one tick every 100 ms
        timestamp = start_time + timedelta(milliseconds=i * 100)
        
        ticks.append({
            'timestamp': timestamp,
            'bid': bid,
            'ask': ask
        })
    
    return pd.DataFrame(ticks)

# Generate the simulated data
tick_data = generate_tick_data(num_ticks=1000)

# Run the market-making strategy on the simulated ticks
market_maker = MarketMaker(spread_threshold=0.1, position_limit=10)
results = []

for _, tick in tick_data.iterrows():
    result = market_maker.process_tick(tick['timestamp'], tick['bid'], tick['ask'])
    results.append(result)

# Convert the results to a DataFrame
results_df = pd.DataFrame(results)

# Plot the results
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(12, 10), sharex=True)

# Bid and ask prices
ax1.plot(results_df['timestamp'], results_df['bid'], label='Bid', color='g', alpha=0.7)
ax1.plot(results_df['timestamp'], results_df['ask'], label='Ask', color='r', alpha=0.7)
ax1.set_title('Bid-ask prices')
ax1.set_ylabel('Price')
ax1.legend()
ax1.grid(True)

# Position
ax2.plot(results_df['timestamp'], results_df['position'], label='Position', color='b')
ax2.axhline(y=0, color='k', linestyle='-', alpha=0.3)
ax2.set_title('Position')
ax2.set_ylabel('Quantity')
ax2.grid(True)

# P&L
ax3.plot(results_df['timestamp'], results_df['pnl'], label='PnL', color='purple')
ax3.set_title('Profit and loss (P&L)')
ax3.set_ylabel('USD')
ax3.set_xlabel('Time')
ax3.grid(True)

plt.tight_layout()
plt.show()

# Summary statistics
trades_df = pd.DataFrame(market_maker.trades)
if not trades_df.empty:
    num_trades = len(trades_df)
    buys = trades_df[trades_df['action'] == 'BUY']
    sells = trades_df[trades_df['action'] == 'SELL']
    
    avg_buy_price = buys['price'].mean() if not buys.empty else 0
    avg_sell_price = sells['price'].mean() if not sells.empty else 0
    
    print(f"Total number of trades: {num_trades}")
    print(f"Number of buys: {len(buys)}")
    print(f"Number of sells: {len(sells)}")
    print(f"Average buy price: ${avg_buy_price:.2f}")
    print(f"Average sell price: ${avg_sell_price:.2f}")
    print(f"Average spread captured: ${avg_sell_price - avg_buy_price:.2f}")
    print(f"Final P&L: ${results_df['pnl'].iloc[-1]:.2f}")
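
Statistical arbitrage, the second strategy in the list above, can be sketched in a similar spirit. A common textbook version trades the spread between two related instruments when its rolling z-score moves far from zero. The function `zscore_signals`, the window length, and the entry threshold below are illustrative assumptions; the spread here is synthetic noise, not market data.

```python
import numpy as np
import pandas as pd

def zscore_signals(spread, window=20, entry=2.0):
    """
    Return (signals, z) for a spread series.
    signal = -1: spread unusually high, short the spread
    signal = +1: spread unusually low, long the spread
    signal =  0: no position (including the warm-up period)
    """
    spread = pd.Series(spread, dtype=float)
    mean = spread.rolling(window).mean()
    std = spread.rolling(window).std()
    z = (spread - mean) / std
    # Comparisons with NaN are False, so the warm-up period maps to 0
    signal = np.where(z > entry, -1, np.where(z < -entry, 1, 0))
    return pd.Series(signal, index=spread.index), z

# Tiny hand-checkable example: only the last point has a full window
signals, z = zscore_signals([1, -1, 1, -1, 5], window=5, entry=1.5)
print(signals.tolist())
print(f"z-score of the last point: {z.iloc[-1]:.3f}")

# Synthetic spread: count how often each signal would fire
rng = np.random.default_rng(0)
demo_signals, _ = zscore_signals(rng.normal(0.0, 1.0, 500))
print(demo_signals.value_counts().to_dict())
```

In the small example the last value is 4 above the window mean of 1, with a sample standard deviation of about 2.45, so its z-score is about 1.63 and the signal is to short the spread. A real strategy would also need a hedge ratio between the two instruments, exit rules, and transaction costs.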

Machine learning in quantitative finance

1. Price forecasting and trend classification

Machine learning can be used to forecast asset prices or to classify market trends.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, TimeSeriesSplit
from sklearn.metrics import accuracy_score, confusion_matrix, mean_squared_error
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
import seaborn as sns

# Đọc dữ liệu
data = pd.read_csv('stock_data.csv', index_col='Date', parse_dates=True)

# Tạo các đặc trưng kỹ thuật
def create_features(df):
    """
    Tạo các đặc trưng kỹ thuật từ dữ liệu giá
    """
    df = df.copy()
    
    # Lợi suất
    df['Returns'] = df['Close'].pct_change()
    
    # Trung bình động
    for window in [5, 10, 20, 50]:
        df[f'MA_{window}'] = df['Close'].rolling(window=window).mean()
        df[f'MA_ratio_{window}'] = df['Close'] / df[f'MA_{window}']
    
    # Độ biến động
    for window in [5, 10, 20, 50]:
        df[f'Volatility_{window}'] = df['Returns'].rolling(window=window).std()
    
    # RSI (Relative Strength Index)
    def calc_rsi(price, n=14):
        delta = price.diff()
        up, down = delta.copy(), delta.copy()
        up[up < 0] = 0
        down[down > 0] = 0
        roll_up = up.rolling(n).mean()
        roll_down = abs(down.rolling(n).mean())
        rs = roll_up / roll_down
        rsi = 100.0 - (100.0 / (1.0 + rs))
        return rsi
    
    df['RSI_14'] = calc_rsi(df['Close'])
    
    # MACD (Moving Average Convergence Divergence)
    ema12 = df['Close'].ewm(span=12, adjust=False).mean()
    ema26 = df['Close'].ewm(span=26, adjust=False).mean()
    df['MACD'] = ema12 - ema26
    df['MACD_signal'] = df['MACD'].ewm(span=9, adjust=False).mean()
    df['MACD_hist'] = df['MACD'] - df['MACD_signal']
    
    # Bollinger Bands
    df['BB_middle'] = df['Close'].rolling(window=20).mean()
    df['BB_std'] = df['Close'].rolling(window=20).std()
    df['BB_upper'] = df['BB_middle'] + 2 * df['BB_std']
    df['BB_lower'] = df['BB_middle'] - 2 * df['BB_std']
    df['BB_width'] = (df['BB_upper'] - df['BB_lower']) / df['BB_middle']
    
    # Mục tiêu: Phân loại xu hướng (1 nếu giá tăng sau 5 ngày, 0 nếu không)
    df['Target_Direction'] = (df['Close'].shift(-5) > df['Close']).astype(int)
    
    # Mục tiêu: Dự báo % thay đổi sau 5 ngày
    df['Target_Return'] = df['Close'].pct_change(periods=5).shift(-5)
    
    # Xóa các dòng có giá trị NaN
    df.dropna(inplace=True)
    
    return df

# Tạo đặc trưng
feature_data = create_features(data)

# Phân loại xu hướng
# Chia dữ liệu thành tập huấn luyện và kiểm tra theo thời gian
def prepare_data_for_classification(df, test_size=0.2):
    features = [col for col in df.columns if col not in ['Target_Direction', 'Target_Return', 'Open', 'High', 'Low', 'Close', 'Volume']]
    X = df[features]
    y = df['Target_Direction']
    
    split_idx = int(len(df) * (1 - test_size))
    X_train, X_test = X.iloc[:split_idx], X.iloc[split_idx:]
    y_train, y_test = y.iloc[:split_idx], y.iloc[split_idx:]
    
    # Chuẩn hóa dữ liệu
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    return X_train_scaled, X_test_scaled, y_train, y_test, features

# Huấn luyện mô hình phân loại
X_train, X_test, y_train, y_test, features = prepare_data_for_classification(feature_data)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Đánh giá mô hình
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Độ chính xác: {accuracy:.2%}")

# Ma trận nhầm lẫn
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=['Giảm', 'Tăng'], 
            yticklabels=['Giảm', 'Tăng'])
plt.xlabel('Dự đoán')
plt.ylabel('Thực tế')
plt.title('Ma trận nhầm lẫn')
plt.show()

# Tầm quan trọng của các đặc trưng
feature_importance = pd.DataFrame({
    'feature': features,
    'importance': clf.feature_importances_
}).sort_values('importance', ascending=False)

plt.figure(figsize=(10, 6))
sns.barplot(x='importance', y='feature', data=feature_importance.head(15))
plt.title('Tầm quan trọng của các đặc trưng trong dự đoán xu hướng')
plt.show()

# Return forecasting (regression on the 5-day-ahead return, not on the price level)
def prepare_data_for_regression(df, test_size=0.2):
    features = [col for col in df.columns if col not in ['Target_Direction', 'Target_Return', 'Open', 'High', 'Low', 'Close', 'Volume']]
    X = df[features]
    y = df['Target_Return']
    
    split_idx = int(len(df) * (1 - test_size))
    X_train, X_test = X.iloc[:split_idx], X.iloc[split_idx:]
    y_train, y_test = y.iloc[:split_idx], y.iloc[split_idx:]
    
    # Chuẩn hóa dữ liệu
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    return X_train_scaled, X_test_scaled, y_train, y_test, X_test.index

# Huấn luyện mô hình hồi quy
X_train_reg, X_test_reg, y_train_reg, y_test_reg, test_dates = prepare_data_for_regression(feature_data)

reg = RandomForestRegressor(n_estimators=100, random_state=42)
reg.fit(X_train_reg, y_train_reg)

# Dự đoán và đánh giá
y_pred_reg = reg.predict(X_test_reg)
mse = mean_squared_error(y_test_reg, y_pred_reg)
rmse = np.sqrt(mse)
print(f"RMSE: {rmse:.4f}")

# Vẽ biểu đồ dự báo vs thực tế
plt.figure(figsize=(12, 6))
plt.plot(test_dates, y_test_reg.values, label='Thực tế', color='blue')
plt.plot(test_dates, y_pred_reg, label='Dự báo', color='red', linestyle='--')
plt.xlabel('Ngày')
plt.ylabel('Lợi suất sau 5 ngày')
plt.title('Dự báo lợi suất')
plt.legend()
plt.grid(True)
plt.show()
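The single chronological split above evaluates the model on one test window only, and the last few training rows carry 5-day-ahead labels that reach into the test period. A minimal sketch of walk-forward validation with scikit-learn's `TimeSeriesSplit` follows; its `gap` argument leaves out the rows between each training fold and its test fold. The data here is synthetic noise (an assumption, so that the block runs on its own), and `X_demo`, `y_demo`, and `HORIZON` are illustrative names that do not appear in the original code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

HORIZON = 5  # forecast horizon in rows, matching the 5-day target above

# Synthetic stand-in for the feature matrix and direction labels (assumption).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(600, 8))
y_demo = rng.integers(0, 2, size=600)

# gap=HORIZON drops the training rows whose 5-day-ahead label overlaps the test fold.
splitter = TimeSeriesSplit(n_splits=5, gap=HORIZON)

fold_scores = []
for train_idx, test_idx in splitter.split(X_demo):
    # The scaler is refit inside each fold, so it only ever sees training rows.
    model = make_pipeline(
        StandardScaler(),
        RandomForestClassifier(n_estimators=50, random_state=42),
    )
    model.fit(X_demo[train_idx], y_demo[train_idx])
    predictions = model.predict(X_demo[test_idx])
    fold_scores.append(accuracy_score(y_demo[test_idx], predictions))

print("Accuracy per fold:", np.round(fold_scores, 3))
print("Mean accuracy:", round(float(np.mean(fold_scores)), 3))
```

On random labels the accuracy should hover around 50%, which is a useful reminder of what an uninformative model looks like. With real data, `X_demo` and `y_demo` would be replaced by the feature columns and `Target_Direction` produced by `create_features`.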

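2. Machine learning in portfolio management

Instead of relying only on traditional models, machine learning can also support portfolio construction. The example below groups stocks with K-means clustering on return, risk, and correlation features, then runs a mean-variance optimization in two stages: first across clusters, then across the stocks within each cluster.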

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from scipy.optimize import minimize

# Đọc dữ liệu lợi suất
returns_data = pd.read_csv('stock_returns.csv', index_col='Date', parse_dates=True)

# Tính toán thống kê của lợi suất
mean_returns = returns_data.mean()
cov_matrix = returns_data.cov()

# Phân cụm các cổ phiếu theo đặc điểm
def cluster_stocks(returns, n_clusters=5):
    """
    Phân cụm các cổ phiếu dựa trên các đặc trưng lợi suất và rủi ro
    """
    # Tạo các đặc trưng cho mỗi cổ phiếu
    features = pd.DataFrame(index=returns.columns)
    features['mean_return'] = returns.mean()
    features['volatility'] = returns.std()
    features['sharpe'] = features['mean_return'] / features['volatility']
    
    # Add correlation features: each stock's row of the correlation matrix,
    # i.e. its correlation with every stock in the universe.
    # (Assigning a single scalar per pair would create constant columns
    # that carry no information for the clustering.)
    corr_matrix = returns.corr()
    features = features.join(corr_matrix.add_prefix('corr_'))
    
    # Chuẩn hóa dữ liệu
    scaler = StandardScaler()
    scaled_features = scaler.fit_transform(features)
    
    # Phân cụm
    kmeans = KMeans(n_clusters=n_clusters, random_state=42)
    clusters = kmeans.fit_predict(scaled_features)
    
    # Thêm nhãn cụm vào DataFrame
    features['cluster'] = clusters
    
    return features

# Tối ưu hóa danh mục dựa trên kết quả phân cụm
def optimize_clustered_portfolio(returns, features, risk_aversion=3):
    """
    Tối ưu hóa danh mục với trọng số được phân bổ cho từng cụm trước,
    sau đó phân bổ trong cụm
    """
    mean_returns = returns.mean()
    cov_matrix = returns.cov()
    
    # Số lượng cụm
    n_clusters = features['cluster'].nunique()
    
    # Tối ưu hóa trọng số cho mỗi cụm
    def cluster_objective(weights):
        portfolio_return = 0
        portfolio_variance = 0
        
        for i in range(n_clusters):
            # Lấy các cổ phiếu trong cụm i
            stocks_in_cluster = features[features['cluster'] == i].index
            
            # Tính lợi suất và phương sai cho cụm
            cluster_mean = mean_returns[stocks_in_cluster].mean()
            cluster_var = cov_matrix.loc[stocks_in_cluster, stocks_in_cluster].values.mean()
            
            portfolio_return += weights[i] * cluster_mean
            portfolio_variance += weights[i]**2 * cluster_var
        
        # Hàm mục tiêu: Lợi nhuận điều chỉnh theo rủi ro
        return -(portfolio_return - 0.5 * risk_aversion * portfolio_variance)
    
    # Ràng buộc: Tổng trọng số = 1
    constraints = [{'type': 'eq', 'fun': lambda x: np.sum(x) - 1}]
    
    # Giới hạn: Trọng số không âm
    bounds = tuple((0, 1) for _ in range(n_clusters))
    
    # Giá trị khởi tạo: Phân bổ đều
    init_weights = np.array([1/n_clusters] * n_clusters)
    
    # Tối ưu hóa trọng số cụm
    result = minimize(cluster_objective, init_weights, 
                     method='SLSQP', bounds=bounds, constraints=constraints)
    
    cluster_weights = result['x']
    
    # Tối ưu hóa trọng số trong mỗi cụm
    stock_weights = {}
    
    for i in range(n_clusters):
        # Lấy các cổ phiếu trong cụm i
        stocks_in_cluster = features[features['cluster'] == i].index
        
        if len(stocks_in_cluster) == 0:
            continue
        
        # Tối ưu hóa trong cụm
        def stock_objective(weights):
            cluster_returns = mean_returns[stocks_in_cluster]
            cluster_cov = cov_matrix.loc[stocks_in_cluster, stocks_in_cluster]
            
            portfolio_return = np.sum(cluster_returns * weights)
            portfolio_variance = np.dot(weights.T, np.dot(cluster_cov, weights))
            
            return -(portfolio_return - 0.5 * risk_aversion * portfolio_variance)
        
        # Ràng buộc: Tổng trọng số = 1
        constraints = [{'type': 'eq', 'fun': lambda x: np.sum(x) - 1}]
        
        # Giới hạn: Trọng số không âm
        bounds = tuple((0, 1) for _ in range(len(stocks_in_cluster)))
        
        # Giá trị khởi tạo: Phân bổ đều
        init_stock_weights = np.array([1/len(stocks_in_cluster)] * len(stocks_in_cluster))
        
        # Tối ưu hóa
        stock_result = minimize(stock_objective, init_stock_weights, 
                               method='SLSQP', bounds=bounds, constraints=constraints)
        
        # Lưu trọng số cho mỗi cổ phiếu
        for j, stock in enumerate(stocks_in_cluster):
            stock_weights[stock] = stock_result['x'][j] * cluster_weights[i]
    
    return stock_weights

# Thực hiện phân cụm
stock_clusters = cluster_stocks(returns_data, n_clusters=5)
print("Kết quả phân cụm:")
print(stock_clusters[['mean_return', 'volatility', 'sharpe', 'cluster']].sort_values('cluster'))

# Vẽ biểu đồ phân cụm
plt.figure(figsize=(10, 6))
for cluster in range(stock_clusters['cluster'].nunique()):
    # Use a separate name so the cluster_stocks() function is not overwritten
    members = stock_clusters[stock_clusters['cluster'] == cluster]
    plt.scatter(members['volatility'], 
               members['mean_return'], 
               label=f'Cụm {cluster}')

plt.xlabel('Rủi ro (độ lệch chuẩn)')
plt.ylabel('Lợi suất kỳ vọng')
plt.title('Phân cụm cổ phiếu theo đặc trưng rủi ro - lợi nhuận')
plt.legend()
plt.grid(True)
plt.show()

# Tối ưu hóa danh mục đầu tư dựa trên kết quả phân cụm
optimized_weights = optimize_clustered_portfolio(returns_data, stock_clusters)

# Hiển thị trọng số
weights_df = pd.DataFrame(list(optimized_weights.items()), columns=['Stock', 'Weight'])
weights_df = weights_df.sort_values('Weight', ascending=False)

plt.figure(figsize=(12, 6))
plt.bar(weights_df['Stock'], weights_df['Weight'])
plt.xticks(rotation=90)
plt.xlabel('Cổ phiếu')
plt.ylabel('Trọng số')
plt.title('Trọng số tối ưu của danh mục dựa trên phân cụm')
plt.grid(True, axis='y')
plt.tight_layout()
plt.show()

# Performance of the optimized portfolio
# (in-sample: the weights were fitted on this same return history)
portfolio_returns = returns_data.dot(pd.Series(optimized_weights))
cumulative_returns = (1 + portfolio_returns).cumprod()

plt.figure(figsize=(10, 6))
plt.plot(cumulative_returns)
plt.title('Lợi nhuận tích lũy của danh mục tối ưu')
plt.xlabel('Ngày')
plt.ylabel('Giá trị')
plt.grid(True)
plt.show()

# Performance metrics, annualized over 252 trading days
# (the Sharpe ratio below assumes a risk-free rate of 0)
annual_return = portfolio_returns.mean() * 252
annual_volatility = portfolio_returns.std() * np.sqrt(252)
sharpe_ratio = annual_return / annual_volatility

print(f"Lợi nhuận hàng năm: {annual_return:.2%}")
print(f"Rủi ro hàng năm: {annual_volatility:.2%}")
print(f"Tỷ số Sharpe: {sharpe_ratio:.2f}")
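Read off the code above, the two-stage procedure appears to solve the following mean-variance problems. The notation is introduced here for illustration and does not come from the original: $\gamma$ is `risk_aversion`, $\bar{\mu}_k$ is the average mean return of the stocks in cluster $k$, $\bar{\sigma}_k^2$ is the average of all entries of that cluster's covariance block $\Sigma_k$, and $\mu_k$ is the vector of mean returns in cluster $k$.

```latex
\begin{aligned}
\text{Stage 1 (across clusters):}\quad
& \max_{w}\ \sum_{k=1}^{K} w_k \bar{\mu}_k - \frac{\gamma}{2} \sum_{k=1}^{K} w_k^2 \bar{\sigma}_k^2
&& \text{s.t. } \sum_{k=1}^{K} w_k = 1,\ 0 \le w_k \le 1 \\
\text{Stage 2 (within cluster } k\text{):}\quad
& \max_{v^{(k)}}\ \mu_k^{\top} v^{(k)} - \frac{\gamma}{2}\, v^{(k)\top} \Sigma_k\, v^{(k)}
&& \text{s.t. } \mathbf{1}^{\top} v^{(k)} = 1,\ 0 \le v^{(k)}_j \le 1 \\
\text{Final weight of stock } j \text{ in cluster } k:\quad
& x_j = w_k\, v^{(k)}_j
\end{aligned}
```

Two simplifications are worth noting. $\bar{\sigma}_k^2$ equals the variance of an equally weighted portfolio of the cluster's stocks, and the stage 1 variance term leaves out covariances between clusters, so it will generally differ from the true variance of the final portfolio.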

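3. Deep learning in market forecasting

Deep learning models, especially RNN, LSTM, and Transformer architectures, are designed for sequential data and are widely applied to financial time series. The example below trains a stacked LSTM to predict the next day's closing price from the previous 60 closing prices.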

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout

# Đọc dữ liệu
data = pd.read_csv('stock_prices.csv', index_col='Date', parse_dates=True)

# Select the closing prices
prices = data['Close'].values.reshape(-1, 1)

time_steps = 60  # Use 60 days of history to predict the next day

# Chronological 80/20 split, counted in sliding windows
train_size = int((len(prices) - time_steps) * 0.8)

# Fit the scaler on the training period only, so that the price range of the
# test period does not leak into training
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(prices[:train_size + time_steps])
scaled_prices = scaler.transform(prices)

# Build (window, next value) pairs for the LSTM
def create_dataset(data, time_steps=60):
    X, y = [], []
    for i in range(len(data) - time_steps):
        X.append(data[i:(i + time_steps), 0])
        y.append(data[i + time_steps, 0])
    return np.array(X), np.array(y)

X, y = create_dataset(scaled_prices, time_steps)

# Split into training and test sets
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]

# Định dạng lại dữ liệu cho LSTM [samples, time steps, features]
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

# Xây dựng mô hình LSTM
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], 1)))
model.add(Dropout(0.2))
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=50))
model.add(Dropout(0.2))
model.add(Dense(units=1))

model.compile(optimizer='adam', loss='mean_squared_error')

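# Train the model. The test set is passed as validation data only to monitor
# the loss curve; tuning against it would bias the reported test error.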
history = model.fit(X_train, y_train, epochs=20, batch_size=32, 
                   validation_data=(X_test, y_test), verbose=1)

# Vẽ biểu đồ loss
plt.figure(figsize=(10, 6))
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)
plt.show()

# Dự đoán
y_pred = model.predict(X_test)

# Chuyển đổi về giá gốc
y_test_inv = scaler.inverse_transform(y_test.reshape(-1, 1))
y_pred_inv = scaler.inverse_transform(y_pred)

# Tính RMSE
rmse = np.sqrt(mean_squared_error(y_test_inv, y_pred_inv))
print(f'RMSE: {rmse:.2f}')

# Vẽ biểu đồ dự đoán vs thực tế
plt.figure(figsize=(16, 8))
plt.plot(y_test_inv, label='Giá thực tế')
plt.plot(y_pred_inv, label='Giá dự đoán')
plt.title('Dự đoán giá cổ phiếu sử dụng LSTM')
plt.xlabel('Ngày giao dịch')
plt.ylabel('Giá cổ phiếu')
plt.legend()
plt.grid(True)
plt.show()

# Forecast future prices recursively
def predict_future_prices(model, scaled_series, future_days, scaler, time_steps=60):
    future_predictions = []
    # Start from the most recent time_steps observations, so that the first
    # forecast is for the day after the last known price
    input_data = scaled_series[-time_steps:].reshape(1, time_steps, 1)
    
    for _ in range(future_days):
        next_pred = model.predict(input_data, verbose=0)
        future_predictions.append(next_pred[0, 0])
        
        # Slide the window: drop the oldest value and append the new forecast.
        # Both arrays must be 3-D for the concatenation along axis 1 to work.
        input_data = np.append(input_data[:, 1:, :], 
                               next_pred.reshape(1, 1, 1), 
                               axis=1)
    
    # Convert back to the original price scale
    future_predictions = np.array(future_predictions).reshape(-1, 1)
    future_predictions = scaler.inverse_transform(future_predictions)
    
    return future_predictions

# Forecast the next 30 days from the end of the scaled price series
future_days = 30
future_pred = predict_future_prices(model, scaled_prices, future_days, scaler, time_steps)

# Vẽ biểu đồ dự đoán tương lai
plt.figure(figsize=(16, 8))
# Place the forecast right after the last 100 observed prices on the x-axis
recent_prices = y_test_inv[-100:]
plt.plot(recent_prices, label='Giá gần đây')
plt.plot(np.arange(len(recent_prices), len(recent_prices) + future_days), 
         future_pred, label='Dự đoán tương lai')
plt.axvline(x=len(recent_prices) - 1, color='r', linestyle='--')
plt.title('Dự đoán giá cổ phiếu 30 ngày tiếp theo')
plt.xlabel('Ngày giao dịch')
plt.ylabel('Giá cổ phiếu')
plt.legend()
plt.grid(True)
plt.show()
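A price-level forecast like the one above can look accurate on a chart simply because tomorrow's price is usually close to today's. One common sanity check is to compare the model's RMSE with that of a naive persistence forecast, which predicts each price with the previous one. The sketch below runs on a synthetic random walk (an assumption, not the data used above), and `persistence_rmse` is a hypothetical helper name.

```python
import numpy as np

def persistence_rmse(prices):
    """RMSE of forecasting each price with the previous observation."""
    prices = np.asarray(prices, dtype=float)
    errors = prices[1:] - prices[:-1]
    return float(np.sqrt(np.mean(errors ** 2)))

# Synthetic random-walk prices, used only so the block is self-contained.
rng = np.random.default_rng(0)
synthetic_prices = 100 + np.cumsum(rng.normal(0.0, 1.0, size=1000))

baseline_rmse = persistence_rmse(synthetic_prices)
print(f"Persistence baseline RMSE: {baseline_rmse:.3f}")
```

With the variables from the example above, `persistence_rmse(y_test_inv.ravel())` would give a roughly comparable figure for the test period. If the LSTM's `rmse` is not clearly lower than this baseline, the model has learned little beyond copying the last price.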

Challenges and limitations

1. Model limitations

Quantitative models in finance face several important limitations:

Unrealistic assumptions: Many models rest on oversimplified assumptions such as normally distributed returns, whereas real data typically exhibit fat tails and nonlinear behavior.
Model risk: Using an inappropriate or misspecified model can lead to wrong decisions and large losses.
Overfitting: Complex models can fit historical data well yet generalize poorly to new data.
Extreme events: Models usually fail to anticipate "black swan" events, that is, rare and extreme market moves.
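The fat-tail point can be made concrete with a small simulation. The sketch below compares how often a normal sample and a Student-t sample with 3 degrees of freedom land more than four standard deviations from their mean. Both series are synthetic (an assumption, not market data), and `tail_fraction` is a hypothetical helper name.

```python
import numpy as np

def tail_fraction(returns, n_sigma=4.0):
    """Share of observations more than n_sigma sample standard deviations from the mean."""
    returns = np.asarray(returns, dtype=float)
    z_scores = (returns - returns.mean()) / returns.std()
    return float(np.mean(np.abs(z_scores) > n_sigma))

rng = np.random.default_rng(0)
n_obs = 200_000

# Two synthetic return series (assumptions for illustration only).
normal_returns = rng.normal(size=n_obs)                 # thin-tailed
fat_tailed_returns = rng.standard_t(df=3, size=n_obs)   # Student-t, 3 degrees of freedom

normal_tail = tail_fraction(normal_returns)
fat_tail = tail_fraction(fat_tailed_returns)

print(f"Share beyond 4 sigma, normal:    {normal_tail:.5f}")
print(f"Share beyond 4 sigma, Student-t: {fat_tail:.5f}")
```

A risk model calibrated under the normal assumption would treat four-sigma moves as vanishingly rare, while the fat-tailed series produces them far more often.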

2. Systemic risk

When many institutions use similar models and strategies, systemic risk can build up:

Herding: If many traders rely on similar signals, they may act at the same time and amplify market moves.
Positive feedback loops: When algorithms react to price moves, they can set off a chain reaction that ends in a "flash crash" or other extreme volatility.
Sudden loss of liquidity: When many algorithms withdraw from the market simultaneously under stress, liquidity can vanish abruptly.

3. Data challenges

Data quality is a decisive factor for success in quantitative finance:

Noise and errors: Financial data usually contain a great deal of noise and measurement error.
Non-stationarity: Relationships in financial data are not stable over time, so models can become outdated quickly.
Survivorship bias: A dataset can be biased because it includes only the companies that "survived" and omits those that failed or were delisted.
Market regime changes: New regulation, technology, and shifts in behavior can change how the market operates.
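Survivorship bias in particular is easy to demonstrate with a simulation. The sketch below draws returns for a hypothetical universe of firms, marks a firm as delisted once it has lost 70% of its starting value, and compares the average return of the full universe with that of the survivors alone. The return distribution and the delisting rule are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_firms, n_years = 500, 10

# Synthetic annual log returns for a hypothetical universe (assumption).
log_returns = rng.normal(loc=0.05, scale=0.30, size=(n_firms, n_years))
log_value = np.cumsum(log_returns, axis=1)  # log of value relative to the start

# Hypothetical delisting rule: a firm drops out once it has lost 70% of its value.
survived = log_value.min(axis=1) > np.log(0.30)

# For simplicity the failed firms' full simulated paths stay in the universe average.
mean_all = float(log_returns.mean())
# A survivors-only database would show this figure instead.
mean_survivors = float(log_returns[survived].mean())

print(f"Firms surviving: {int(survived.sum())} of {n_firms}")
print(f"Mean annual log return, full universe:  {mean_all:.4f}")
print(f"Mean annual log return, survivors only: {mean_survivors:.4f}")
```

A backtest run on the survivors alone would overstate the return that was actually available to an investor who could not know in advance which firms would fail.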

The future of quantitative finance

1. Emerging trends

Quantitative finance is developing in several new directions:

AI and deep learning: Deep learning models are increasingly applied to detect complex patterns in financial data.
Reinforcement learning: Reinforcement learning is used to optimize trading strategies and portfolio management.
Alternative data: Non-traditional sources such as satellite imagery, social media, and sensor data are being mined for an informational edge.
Quantum computing: Quantum computers may make it possible to solve complex optimization problems faster.

2. Regulation and ethics

As quantitative finance becomes more widespread, regulatory and ethical questions grow in importance:

Transparency and explainability: Regulators increasingly demand that "black box" models be explainable.
Market manipulation: Rules are needed to prevent algorithms from being used to manipulate markets.
Fairness and access: Technological advantages should not lead to severe inequality among market participants.
Systemic stability: Regulators are looking for ways to reduce the systemic risk created by algorithmic trading.

3. Convergence with other fields

Quantitative finance is converging with many other fields:

Data science: Advanced data analysis techniques from data science are being applied to finance.
Computer science: High-performance computing and artificial intelligence are changing how analysis and trading are done.
Physics and mathematics: Models from statistical physics and advanced mathematics are being used to understand market dynamics.
Behavioral finance: Insights into investor psychology and behavior are being integrated into quantitative models.

Conclusion

Quantitative analysis has become an indispensable part of modern finance. From asset pricing and portfolio management to algorithmic trading and risk management, quantitative methods are reshaping how financial markets operate.

However, it is important to recognize the limitations of quantitative models and not to depend on them entirely. Combining quantitative analysis with human judgment, together with a sound understanding of the economic context and of risk, is the most balanced approach.

As technology advances, quantitative finance will continue to evolve, bringing both opportunities and challenges for investors, financial institutions, and regulators. Those who can combine advanced quantitative techniques with practical knowledge of financial markets will be best placed to succeed in this demanding environment.
