The Profitron: Attempting to Build a Scalable Trading Infrastructure

What is this?

I'd like to share some infrastructure I've been building to facilitate the creation and deployment of my own quant trading strategies. This post provides an in-depth overview of the whole thing.

Background

A big motivation for creating my own system and deploying my own strats was reading The Man Who Solved the Market by Gregory Zuckerman. I highly recommend it if you're interested in the quantitative trading space. It particularly got me interested in stat arb, which is the first thing I attempted with this system.

My goals


1. Learning: Implementing this system gives me hands-on exposure to concepts like market microstructure, data processing, strategy logic, etc. that I've otherwise glossed over in the past.
2. Profit: My understanding is that it's pretty difficult to capture alpha without the knowledge, infrastructure, and resources that come with being part of an actual firm, but with enough learning/research, careful planning, and proper execution, I believe it's possible to collect some crumbs (at least on the mid/low-frequency side).

Let's get into it!

Overview of components

Below is a breakdown of the major components of this system:

  • data/: data collection, preprocessing, synchronization, interpolation, etc. If I need any type of historical or live data, no matter the frequency, I get it from here.

  • strategy/: holds the actual strategy logic. each strategy defines how signals are generated, what data it needs, and what actions to take based on those signals.

  • signals/: this provides an abstraction for a signal, which can be generated by a strategy and fed into my execution component when live trading, or my backtesting component when backtesting.

  • brokers/: wraps broker interfaces (like Alpaca). this is where orders actually get submitted in live trading, and where I can plug in paper or mock brokers too.

  • execution/: once a signal is generated, execution handles translating that into actual orders. includes stuff like order sizing, slippage modeling, and basic risk checks.

  • portfolio/: tracks live or simulated positions, cash, and pnl. also handles capital allocation if I'm running multiple strategies.

  • backtest/: runs the full system on historical data so I can test strategies before deploying them live. computes performance metrics like Sharpe, max drawdown, etc.

  • core/: glue code that ties everything together. the main event loop lives here; it fetches data, runs strategies, generates signals, and routes them to execution or backtest depending on the mode.

  • research/: mostly notebooks and scripts I use to do exploratory stuff like testing cointegration, plotting spreads, tuning parameters, looking at the shape of data, etc.

  • utils.py: general utilities like timing, logging (soon), data formatting, etc.

  • main.py: entry point for the whole system. reads YAML config and starts up either live trading or backtesting mode depending on what I’m doing.
    [Figure: trading_diagram.png, a diagram of how these components fit together]


Now I'll go through each component in depth!

Data/

This is where everything starts!! Without clean, synchronized data, nothing else matters. I've read more than a couple of experts in data-centric domains say "shit in, shit out". The data/ module is responsible for pulling in both historical and live price data, sometimes from multiple sources depending on the start and end date and asset class. It also handles all the preprocessing stuff like filling missing values, aligning time indices, and resampling to the frequency that each strategy expects.

Everything is built around a DataManager class, which is kind of like a central hub. It knows which symbols I care about, what frequency to use, and which DataSource to call. Each data source (like Alpaca or Binance) is its own class that implements a standard interface so I can swap them in and out depending on what I'm trading (there's a rough sketch after the list below).

This component is responsible for:

  • pulling raw historical and live data from APIs or local CSVs
  • forward/backfilling and interpolating missing values
  • resampling and aligning timestamps across assets
  • producing clean pandas DataFrames for downstream use
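
To make that concrete, here's a rough sketch of the shape. DataManager and DataSource are the post's own names, but these exact signatures are my illustration rather than the real code:

from abc import ABC, abstractmethod
import pandas as pd

class DataSource(ABC):
    """Standard interface each source (Alpaca, Binance, local CSVs, ...) implements."""

    @abstractmethod
    def fetch(self, symbols: list[str], start: str, end: str, frequency: str) -> pd.DataFrame:
        """Return bars as a DataFrame indexed by timestamp, one column per symbol."""

class DataManager:
    """Central hub: knows the symbols, the frequency, and which source to call."""

    def __init__(self, source: DataSource, symbols: list[str], frequency: str = "1D"):
        self.source = source
        self.symbols = symbols
        self.frequency = frequency

    def get_history(self, start: str, end: str) -> pd.DataFrame:
        raw = self.source.fetch(self.symbols, start, end, self.frequency)
        aligned = raw.resample(self.frequency).last()  # align timestamps across assets
        return aligned.ffill()                         # forward-fill missing values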

Strategy/

This is where the brains of the system live. Every strategy has its own logic for interpreting market data and generating signals. Each one inherits from a base TradingStrategy class and defines three key things:

  1. what data it needs
  2. how it processes that data
  3. when it decides to trade
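
In code, a base class capturing those three things might look like this (my hypothetical decomposition; the real TradingStrategy class surely differs in detail):

from abc import ABC, abstractmethod

class TradingStrategy(ABC):
    @abstractmethod
    def required_data(self) -> dict:
        """(1) what data the strategy needs: symbols, fields, frequency, lookback."""

    @abstractmethod
    def generate_signals(self, bars) -> list:
        """(2) process the new bar and (3) decide whether to trade.
        Called on every new bar; returns zero or more signal objects."""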

I built it so that each strategy is modular. I can run a single one or multiple at the same time, and each will track its own state and signals. Right now my main focus is on equity stat arb, specifically cointegration-based mean reversion.

How the stat arb strat works

The idea is to find pairs of stocks that are cointegrated, meaning some linear combination of their prices is stationary over time even if each price wanders around on its own. If the spread between them diverges from its mean, there's an opportunity to bet on mean reversion.

For a pair of stocks (A, B), I compute the hedge ratio β via OLS, regressing B's prices on A's:

B_t = β A_t + c + ε_t

The spread is then:

s_t = B_t - β A_t

And the z-score is:

z_t = (s_t - μ_t) / σ_t

where μ_t and σ_t are the rolling mean and standard deviation of the spread over the lookback window. I long the spread when z_t drops below -z_entry, and short it when z_t rises above z_entry. The strategy tracks when a trade is opened, monitors the spread, and closes once |z_t| falls back below z_exit, i.e. the spread has mean-reverted back toward zero. I'm trying to optimize these parameters but find very mixed results, and I'm afraid of overfitting if I get too precise.
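
The whole computation is only a few lines in pandas/numpy. This is an illustrative sketch, not my exact implementation; it assumes a and b are already-aligned close-price series:

import numpy as np
import pandas as pd

def pair_zscore(a: pd.Series, b: pd.Series, lookback: int = 60) -> pd.Series:
    """Hedge ratio, spread, and rolling z-score for one pair."""
    beta = np.polyfit(a.values, b.values, deg=1)[0]  # OLS slope of B regressed on A
    spread = b - beta * a
    mu = spread.rolling(lookback).mean()             # rolling mean of the spread
    sigma = spread.rolling(lookback).std()           # rolling std of the spread
    return (spread - mu) / sigma

With lookback 60 and the thresholds from my config (entry_z 1.5, exit_z 0.5), the strategy goes long the spread when the latest z-score is below -1.5, short when it's above +1.5, and flattens once |z| falls under 0.5.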

How it plugs in

Each strategy exposes a .generate_signals() method that is called on every new bar of data. This method returns signal objects that get routed either to the execution engine (if live) or to the backtester (if simulating).

Strategies can also define their own config, like lookback windows, frequency, and which pairs to trade. The system uses this to automatically provision the data each strategy needs. As a matter of fact, most components in this system have their own configs, which I'm not sure is the best decision. I guess I'll find out later.
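
The signal objects themselves can be very small. Something like this hypothetical dataclass is enough for the routing described above:

from dataclasses import dataclass

@dataclass
class Signal:
    symbol: str
    direction: int      # +1 long, -1 short, 0 flat/close
    confidence: float   # 0..1, used later by the executor for sizing
    strategy_id: str    # which strategy produced it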

Brokers/

This module abstracts away the broker-specific stuff so that the rest of the system doesn't have to care whether I'm using Alpaca, a mock broker, or something else entirely. It acts like a plug-and-play interface for submitting orders, checking balances, and syncing my current positions.

Right now I’m using Alpaca for equities, but the design allows for easily swapping in other brokers later without having to rewrite my execution or strategy logic.

This component is responsible for:

  • Handling authentication and API keys
  • Submitting whatever orders I need
  • Tracking fills and updating order status
  • Pulling live account data like cash and open positions

for example:

When the strategy decides to buy 100 shares of AAPL, it doesn’t talk directly to Alpaca. Instead, it sends a standardized order object to the broker module, and the broker handles everything from there.
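
In code, that boundary might look like the sketch below. Order and Broker here are my illustrative names; the Alpaca wrapper would implement the same interface on top of Alpaca's SDK:

from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Order:
    symbol: str
    qty: int                   # positive = buy, negative = sell
    order_type: str = "market"

class Broker(ABC):
    @abstractmethod
    def submit_order(self, order: Order) -> str:
        """Submit a standardized order and return a broker-side order id."""

    @abstractmethod
    def get_cash(self) -> float:
        """Available cash in the account."""

    @abstractmethod
    def get_positions(self) -> dict:
        """Open positions keyed by symbol."""

A mock implementation can just fill every order instantly in memory, which is what makes unit testing easy.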

This separation makes it easier to test things too. I can swap in a paper broker for simulations or even use a mock one for unit tests. (This leads me to ask: does anyone know how paper trading differs from live execution in terms of testing the success of a strategy? It seems that with paper trading, you might not properly factor in slippage and fees, because every order just gets filled immediately. How should I deal with this? Just factor a small slippage coefficient into my pnl?)

Execution/

Once a signal is generated, this component is in charge of actually turning that into orders. It handles the logic for things like order sizing, applying slippage models, checking basic risk limits, and submitting the order to the broker.

The main classes here are the Executor and the OrderManager. The Executor is what receives the signal and decides what kind of order to create. The OrderManager handles tracking those orders, dealing with partial fills, retries, and keeping everything in sync with the broker.

This component is responsible for:

  • determining how much to buy or sell based on the signal and current portfolio
  • creating and formatting orders
  • sending orders to the broker interface
  • handling retries, rejects, and partial fills
  • enforcing basic sanity checks like max position size or max leverage

for example:

Let’s say a strategy sends a long signal for MSFT with a confidence score. The Executor takes that signal, figures out how many shares I can afford to buy given my cash and risk settings, and creates an order. That order is passed to the OrderManager, which sends it to the broker and tracks it until it’s either filled or canceled.
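
A stripped-down version of that sizing step (illustrative; the real risk checks are more involved):

def size_order(signal, cash: float, price: float, max_frac: float = 0.10) -> int:
    """Spend at most max_frac of cash on one position, scaled by signal confidence."""
    budget = cash * max_frac * signal.confidence
    return int(budget // price)

For example, with $10,000 of cash, a 10% per-position cap, confidence 1.0, and MSFT at $400, that works out to int(1000 // 400) = 2 shares.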

This part of the system is critical for live trading. Even if the strategy is perfect, bad execution can ruin everything, so this component really tries to stay on top of that. (I understand these concepts are not new in any way, I'm just documenting my learning process)

Portfolio/

This module keeps track of my positions, cash, and unrealized pnl whether I'm running live or just backtesting.

LivePortfolio connects to the broker and fetches real-time position data. MetaPortfolio balances multiple portfolios across different brokerages.

This component is responsible for:

  • tracking open positions and their entry prices
  • updating position sizes after each trade
  • calculating live or simulated pnl
  • enforcing cash limits and position limits
  • supporting multiple strategy portfolios (MetaPortfolio)

for example:

Say I just bought 100 shares of AAPL. The portfolio gets updated with the position size, the entry price, and starts tracking value based on current prices. If I close the position later, it updates my cash balance.
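
A minimal sketch of that bookkeeping (hypothetical Position/Portfolio shapes, ignoring fees and margin for simplicity):

from dataclasses import dataclass, field

@dataclass
class Position:
    qty: int = 0
    avg_price: float = 0.0

@dataclass
class Portfolio:
    cash: float
    positions: dict = field(default_factory=dict)

    def apply_fill(self, symbol: str, qty: int, price: float) -> None:
        """Update state after a fill; positive qty = buy, negative = sell."""
        pos = self.positions.setdefault(symbol, Position())
        if qty > 0:  # buying: blend the new shares into the average entry price
            pos.avg_price = (pos.avg_price * pos.qty + price * qty) / (pos.qty + qty)
        pos.qty += qty
        self.cash -= qty * price

    def unrealized_pnl(self, prices: dict) -> float:
        """Mark open positions to the latest prices."""
        return sum(p.qty * (prices[s] - p.avg_price) for s, p in self.positions.items())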

This is also where I can start thinking about capital allocation once I implement more strats. For example, later down the line when my strats cover a wider range of market conditions, I could allocate more capital toward volatility-based strategies in times of market turmoil.

Backtest/

This is where I test strategies before going live. The backtester runs the entire system on historical data. It uses the same strategies, the same signal generation, and the same execution logic.

The main components here are the Backtester and PerformanceStats. The backtester handles replaying the data and feeding it into the strategy and execution pipeline. The performance module keeps track of stats like Sharpe ratio, drawdowns, win rate etc.

This component is responsible for:

  • replaying historical data in order
  • running strategies and generating signals at each time step
  • simulating execution and updating the portfolio
  • computing performance metrics and plotting results
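
The metrics themselves are standard. For example, given a series of daily returns and an equity curve:

import numpy as np
import pandas as pd

def sharpe(returns: pd.Series, periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    return np.sqrt(periods_per_year) * returns.mean() / returns.std()

def max_drawdown(equity: pd.Series) -> float:
    """Largest peak-to-trough drop in the equity curve (a negative number)."""
    return (equity / equity.cummax() - 1.0).min()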

Core/

This holds everything together. I think it's pretty self-explanatory so I won't waste anyone's time!!
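
Still, for completeness, live mode boils down to a loop like this (a sketch reusing hypothetical names from the earlier sections; get_latest and execute are my placeholders):

import time

def run_live(data_manager, strategies, executor, poll_interval: int = 60) -> None:
    """Sketch of the event loop: fetch data, run strategies, route signals."""
    while True:
        bars = data_manager.get_latest()                   # data/: newest synchronized bars
        for strat in strategies:
            for signal in strat.generate_signals(bars):    # strategy/ emits signals/
                executor.execute(signal)                   # execution/ turns them into broker orders
        time.sleep(poll_interval)                          # matches poll_interval in the config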

Research/

This is where I pull data, explore it, and try to find some interesting relationship on which I can consistently trade. Let's say I think that when crude oil prices fluctuate, it causes a spike in implied volatility for US airline options. First I'll collect some data to verify that assumption, and maybe run some statistical tests to further develop my ideas. Only after I have found a solid statistical relationship backed by tangible evidence and a sensible hypothesis will I have some real alpha that I can trade with, because otherwise, how can I know when the edge starts and stops?

In order to facilitate this process, I need a strong suite of tools. A sandbox if you will. Mostly Jupyter notebooks, CSVs, and single-use scripts. You get the idea.
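
A typical first pass in one of those notebooks is an Engle-Granger cointegration test, which statsmodels provides out of the box (close here is assumed to be a DataFrame of prices, and AAL/DAL are just placeholder tickers):

from statsmodels.tsa.stattools import coint

# Engle-Granger two-step cointegration test on two price series
tstat, pvalue, _ = coint(close["AAL"], close["DAL"])
if pvalue < 0.05:
    print(f"candidate pair: p-value {pvalue:.3f}")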

Here is an example of what one of my YAML configs looks like:

mode: "backtest"

strategy:
  module: strategy.equities.mean_reversion.PairsTradingStrategyCollection
  class: PairsTradingStrategyCollection
  params:
    pairs_file: "src/strategy/equities/mean_reversion/config/pairs.csv"
    strategy_params:
      lookback: 60
      entry_z: 1.5
      exit_z: 0.5
      frequency: "1d"
      fields: ["close"]
      poll_interval: 60
      hedge_ratio_method: "ols"

backtest:
  params:
    start_date: "2020-04-10"
    end_date: "2025-04-10"
    capital: 10000
    slippage_bps: 5
    commission_per_trade: 1
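
main.py then consumes this with something like the following (a sketch assuming PyYAML; run_backtest and run_live are hypothetical entry points):

import importlib
import yaml

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

# resolve the strategy class named in the config
mod_path = cfg["strategy"]["module"].rpartition(".")[0]  # drop the trailing class name
strategy_cls = getattr(importlib.import_module(mod_path), cfg["strategy"]["class"])
strategy = strategy_cls(**cfg["strategy"]["params"])

if cfg["mode"] == "backtest":
    run_backtest(strategy, **cfg["backtest"]["params"])
else:
    run_live(strategy)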

Soon I'll update with more actual strategies once I iron them out!
