Currently @ DXT Commodities · Stamford, CT

MD AMIR KHAN

Agentic AI Engineer & Quantitative Analyst

AI Engineer and Quantitative Analyst at DXT Commodities building intelligent systems for LNG and natural gas markets. I design and deploy LLM-powered pipelines, automated market intelligence tools, and production ML models — combined with deep quantitative work in statistical modeling, energy market forecasting, and portfolio research. I turn raw market data into real-time trading decisions.


About Me

I am an AI Engineer and Quantitative Analyst at DXT Commodities (Stamford, CT), where I build production-grade AI systems and quantitative models for LNG and natural gas markets. On the AI engineering side, I architect LLM-powered notice parsing pipelines, automated web scraping and alerting systems, FastAPI backends, and full-stack market intelligence tools deployed on AWS EC2. On the quant side, I build statistical production-estimation models (OLS/Ridge regression), energy market forecasting frameworks, and supply-demand balance systems that feed directly into trading desk decisions.

My AI engineering work at DXT includes an LLM-based pipeline that parses unstructured pipeline maintenance notices for capacity impact extraction, a real-time EPNG scraper pushing Force Majeure alerts to Microsoft Teams within minutes of posting, and a full-stack Pipeline Maintenance Calendar covering 23 interstate operators across 6 portal systems with SQL Server storage and Power BI dashboards. My quantitative work includes a US LNG feed gas forecasting system covering 126 export trains achieving <3% MAPE on an 8-month holdout, and a Permian Basin production estimation model bridging the 2-month EIA reporting lag.

Alongside industry work, I contribute to academic research at Stevens Institute of Technology on Robust PCA and Dynamic Factor Portfolios, and have built and validated ML models (LightGBM, neural networks, Ridge) for options pricing, auction price prediction, and credit risk across MBS and structured products.

Core expertise:

  • AI Engineering: LLM pipelines, agentic systems, FastAPI, automated scraping & alerting, AWS EC2
  • Quantitative Analysis: Statistical modeling (OLS/Ridge), time series, energy market forecasting, factor modeling
  • Machine Learning: LightGBM, neural networks, ensemble methods, cross-validation frameworks
  • Energy Markets: LNG, pipeline capacity, Waha/Henry Hub basis, EIA/Genscape/FERC data
  • Quantitative Finance: Portfolio optimization, risk modeling (VaR/CVaR), robust PCA, credit risk

3+ Years Experience

10+ Projects Completed

Education

Master's in Financial Engineering & Analytics

Stevens Institute of Technology

2024 - 2025

Focus: Quantitative Finance, Algorithmic Trading, Risk Analytics, Portfolio Optimization

Key Courses: Stochastic Calculus for Financial Engineering, Applied Probability & Statistics in Finance, Advanced Financial Risk Analytics & Derivatives, Machine Learning in Finance, Pricing & Hedging, Computational Methods in Finance, Portfolio Theory & Applications, Market Microstructure, Algorithmic Trading Strategies, Design Patterns & Derivative Pricing in C++, Optimization in Finance

Bachelor in Business Administration

North South University (NSU)

2018 - 2022

Major: Finance | Minor: Mathematics

Key Courses: Calculus, Linear Algebra, Differential Equations, Corporate Finance, Investment Theory, Financial Derivatives, Applied Statistics

Latest News

New Role

Joined DXT Commodities as Agentic AI Engineer & Quantitative Analyst

Started full-time at DXT Commodities (Stamford, CT) in March 2026, working on LNG and natural gas market intelligence. Building Agentic RAG systems for pipeline notice parsing, LNG feed gas forecasting pipelines, and automated trading desk alerting infrastructure.

Course Completed

Advanced RAG (Retrieval-Augmented Generation) — May 2026

Completed a 10-module Advanced RAG course covering the full LLM pipeline stack — embeddings, vector stores, hybrid retrieval, HyDE, CRAG, Self-RAG, Graph RAG, Agentic RAG with LangGraph, and RAGAS evaluation. Directly applied to production AI systems at DXT Commodities.

Research Paper

Towards a Robust PCA and Dynamic Factor Portfolios Updating

Working on a research paper with Professor Papa Momar NDIAYE. We propose a dynamic tracking algorithm that modifies classical PCA to reduce the instability of risk levels and principal factors, improving the timing of portfolio rebalancing and reducing transaction costs.

The algorithm ensures that the principal factors remain immune to perturbations in the observations and stabilizes them when the covariance matrix is updated. Spectral conditions detect changes in risk clusters that warrant a full reset of the tracking process.

Experience

DXT Commodities

Mar 2026 – Present  ·  Full-time  ·  Stamford, CT (Hybrid)

Agentic AI Engineer

  • Designed and built an Agentic RAG system using LangChain and LangGraph to parse unstructured pipeline maintenance notices across 6 operator portal formats — architecting the full pipeline from document ingestion and chunking strategy through embedding model selection, vector store indexing, hybrid retrieval (dense + BM25), and LLM-based structured field extraction
  • Engineered prompt templates and structured output schemas for capacity impact extraction — iterating on chain-of-thought prompting and output parsers to reliably extract operator, affected capacity (MMcf/d), duration, and constraint type from free-text notices with varying formats
  • Implemented hybrid retrieval (dense embeddings + BM25 ensemble) with contextual compression and MMR re-ranking over a vector store of historical pipeline notices, improving extraction accuracy on ambiguous capacity-constraint language by reducing the irrelevant context passed to the LLM
  • Built an evaluation harness using RAGAS-style metrics (faithfulness, answer relevancy, context precision) to benchmark RAG pipeline quality across notice types and operator formats, enabling systematic prompt and retrieval tuning
  • Deployed full-stack AI pipeline on AWS EC2 — FastAPI backend exposing RAG extraction endpoints, SQL Server persistence layer for parsed notices, and Power BI dashboard for trading desk consumption — covering 23 interstate pipeline operators
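The MMR re-ranking step in the retrieval bullet above trades relevance to the query against redundancy with chunks already chosen. Below is a minimal pure-Python sketch of that idea, not the DXT pipeline code: the function name `mmr_rerank`, the toy 2-D vectors, and the `lambda_mult=0.5` trade-off are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_rerank(query: Sequence[float], docs: List[Sequence[float]],
               k: int, lambda_mult: float = 0.5) -> List[int]:
    """Return the indices of k docs chosen by Maximal Marginal Relevance.

    score(d) = lambda * sim(d, query) - (1 - lambda) * max_{s in selected} sim(d, s)
    """
    relevance = [cosine(query, d) for d in docs]
    selected: List[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy example: docs 0 and 1 are near-duplicates; doc 2 covers a different aspect.
query = [1.0, 0.2]
docs = [[1.0, 0.1], [1.0, 0.12], [0.6, 0.8]]
print(mmr_rerank(query, docs, k=2))  # → [1, 2]
```

Pure similarity ranking would return the two near-duplicates; MMR keeps the more relevant one and swaps in the distinct chunk, which is what shrinks redundant context before the LLM call.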

Quantitative Analyst – Market Fundamentals (LNG & Power)

  • Designed and built a Permian Basin gas market intelligence system modeling daily production (~22–25 Bcf/d), pipeline egress capacity across 4 major interstate pipelines, and supply-demand balance to predict Waha basis pricing — directly supporting trading desk decisions
  • Developed a real-time production estimation model using Genscape pipeline scrape data and EIA 914 reports, training a scaling model (OLS/Ridge regression) with R² and MAE validation to bridge the 2-month EIA reporting lag
  • Built a US LNG feed gas forecasting pipeline covering 126 active export trains across 13 terminals (~22,600 MMcf/d nameplate), with a 3-model validation framework achieving <3% MAPE on an 8-month out-of-sample holdout
  • Built an automated scraping and alerting system for EPNG critical notices — polling Kinder Morgan's EBB portal every 5 minutes, deduplicating notices, and pushing real-time Force Majeure and maintenance alerts to Microsoft Teams via webhook, enabling the trading desk to react within minutes of posting
  • Engineered end-to-end quantitative pipelines in Python and SQL — from raw data ingestion (Genscape, EIA, FERC bulletin boards) through statistical modeling, cross-validation, and automated forecast output — replacing manual workflows across production estimation, capacity tracking, and pricing analysis
Python · LangChain · LangGraph · Agentic RAG · Vector Stores · Hybrid Search · FastAPI · AWS EC2 · SQL · OLS/Ridge Regression · LNG Forecasting · Pipeline Capacity Analysis · EIA/Genscape/FERC · Power BI
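The scaling-model idea from the production-estimation bullet can be sketched as a ridge regression that maps a scraped flow proxy to reported production, scored on a chronological holdout with R², MAE, and MAPE. This is a sketch under assumptions: the synthetic data, the single feature, the 28/8-month split, and `alpha=0.1` are illustrative, and the real model's Genscape and EIA 914 inputs are not reproduced. The printed numbers come from synthetic data and say nothing about the production model's accuracy.

```python
import numpy as np


def fit_ridge(X: np.ndarray, y: np.ndarray, alpha: float = 1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    Centering X and y lets the intercept be recovered after solving
    (Xc^T Xc + alpha * I) beta = Xc^T yc.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


def metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """R-squared, mean absolute error, and MAPE (in percent)."""
    resid = y_true - y_pred
    ss_res = float(resid @ resid)
    ss_tot = float(((y_true - y_true.mean()) ** 2).sum())
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": float(np.abs(resid).mean()),
        "mape_pct": float(100.0 * np.abs(resid / y_true).mean()),
    }


# Synthetic monthly data: "reported production" (Bcf/d) vs. a scraped flow proxy.
rng = np.random.default_rng(0)
flow_proxy = rng.uniform(18.0, 24.0, size=(36, 1))
reported = 1.0 + 1.02 * flow_proxy[:, 0] + rng.normal(0.0, 0.2, size=36)

# Chronological split: fit on the first 28 months, hold out the last 8.
train_idx, hold_idx = slice(0, 28), slice(28, 36)
b0, beta = fit_ridge(flow_proxy[train_idx], reported[train_idx], alpha=0.1)
pred = b0 + flow_proxy[hold_idx] @ beta
print(metrics(reported[hold_idx], pred))
```

Splitting by time rather than at random keeps the holdout honest for a model meant to nowcast months that EIA has not yet reported.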

Stevens Institute of Technology

Jan 2024 – Present  ·  Part-time  ·  Hoboken, NJ (Hybrid)

Quantitative Portfolio Research Analyst

  • Contributing to research on Robust PCA for dynamic factor portfolio updates — improving stability of risk factors and cutting rebalancing costs by 15%
  • Implemented Python-based simulations using corporate bond spread data to study eigenvalue grouping, factor smoothness, and covariance structure robustness
  • Developed algorithms for tracking dynamic factors and detecting data structure ruptures to enhance portfolio timing decisions
  • Developed and validated ML models (Ridge regression, LightGBM, neural networks) for predicting short-term auction price movements, option pricing, and credit default probabilities
  • Researched and implemented advanced risk models including VaR, CVaR, stress testing, and credit risk modeling for MBS, corporate loans, and structured products — covering PD, LGD, and EAD
  • Built and analyzed models for mortgage and loan portfolio performance, incorporating prepayment risk, duration/convexity analysis, and sensitivity to macroeconomic variables
  • Integrated risk parity, covariance shrinkage, and multi-asset volatility estimation techniques to enhance portfolio resilience and improve Sharpe ratios under varying market regimes
Python · Robust PCA · Machine Learning · Risk Modeling · VaR/CVaR · LightGBM · Portfolio Optimization · Credit Risk
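Historical VaR and CVaR, two of the risk measures listed above, fit in a few lines. This sketch assumes simple historical simulation on a list of portfolio returns, reports losses as positive numbers, and uses a hypothetical helper name `historical_var_cvar`; it is not the research code.

```python
from typing import Sequence, Tuple


def historical_var_cvar(returns: Sequence[float], alpha: float = 0.95) -> Tuple[float, float]:
    """Historical-simulation VaR and CVaR (expected shortfall), as positive losses.

    The worst (1 - alpha) share of observations forms the tail: VaR is the
    smallest loss inside that tail and CVaR is the average loss across it.
    """
    losses = sorted((-r for r in returns), reverse=True)  # largest loss first
    n_tail = max(1, round((1.0 - alpha) * len(losses)))
    tail = losses[:n_tail]
    var = tail[-1]
    cvar = sum(tail) / n_tail
    return var, cvar


daily_returns = [-0.05, -0.03, -0.01, 0.0, 0.01, 0.02, 0.02, 0.03, 0.04, 0.05]
var80, cvar80 = historical_var_cvar(daily_returns, alpha=0.8)
print(round(var80, 4), round(cvar80, 4))  # → 0.03 0.04
```

CVaR is always at least as large as VaR at the same level because it averages the whole tail instead of reading off its boundary.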

Projects

Bond Portfolio Optimization and Immunization

August 2025

Comprehensive bond portfolio management system combining quantitative finance with data engineering. Implements duration matching, convexity adjustments, and immunization strategies using real-time data pipelines, automated risk calculations, and scalable portfolio optimization algorithms for fixed income portfolios.

Python · Fixed Income · Duration Matching · Immunization · Interest Rate Risk
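Duration matching and convexity adjustments rest on a few closed-form quantities. A minimal sketch, assuming annual compounding and a flat yield (the project may use another compounding convention or a full yield curve); the function name and the example bond are hypothetical:

```python
from typing import List, Tuple


def price_duration_convexity(cash_flows: List[Tuple[float, float]], y: float):
    """Price, Macaulay duration, modified duration and convexity at flat yield y.

    cash_flows: (time in years, amount) pairs; annual compounding assumed.
    Convexity is (1/P) * d2P/dy2 = (1/P) * sum t (t + 1) CF / (1 + y)^(t + 2).
    """
    price = sum(cf / (1 + y) ** t for t, cf in cash_flows)
    macaulay = sum(t * cf / (1 + y) ** t for t, cf in cash_flows) / price
    modified = macaulay / (1 + y)
    convexity = sum(t * (t + 1) * cf / (1 + y) ** (t + 2) for t, cf in cash_flows) / price
    return price, macaulay, modified, convexity


bond = [(1, 5.0), (2, 5.0), (3, 105.0)]  # 3-year 5% annual coupon, face 100
p, d_mac, d_mod, conv = price_duration_convexity(bond, 0.05)

# Second-order price change for a +100 bp parallel shift vs. exact repricing.
dy = 0.01
approx = p * (-d_mod * dy + 0.5 * conv * dy ** 2)
exact = price_duration_convexity(bond, 0.05 + dy)[0] - p
print(round(p, 4), round(d_mac, 4), round(approx, 3), round(exact, 3))
```

For immunization, the asset portfolio's duration is matched to the liability's, and its convexity is kept at or above the liability's so that parallel shifts do not leave a shortfall.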

Vasicek Bond Pricing Model - Monte Carlo, PDE & Analytical

July 2025

Comprehensive implementation of the Vasicek interest rate model featuring three pricing approaches: analytical solutions, Monte Carlo simulations, and PDE finite difference methods for zero-coupon bonds.

Jupyter Notebook · Vasicek Model · Monte Carlo · PDE · Analytical Solutions
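The analytical leg of such a project is the closed-form Vasicek zero-coupon price, which the Monte Carlo and PDE prices can be checked against. This sketch assumes risk-neutral dynamics dr = a(θ - r)dt + σ dW with a > 0; the parameter names and values are illustrative, not taken from the notebook.

```python
import math


def vasicek_zcb_price(r0: float, a: float, theta: float, sigma: float, tau: float) -> float:
    """Closed-form Vasicek zero-coupon bond price P(0, tau) with unit face value.

    Short rate: dr = a * (theta - r) dt + sigma dW (risk-neutral), a > 0.
    P = A * exp(-B * r0), with
      B    = (1 - exp(-a * tau)) / a
      ln A = (theta - sigma^2 / (2 a^2)) * (B - tau) - sigma^2 * B^2 / (4 a)
    """
    B = (1.0 - math.exp(-a * tau)) / a
    ln_A = (theta - sigma ** 2 / (2.0 * a ** 2)) * (B - tau) - sigma ** 2 * B ** 2 / (4.0 * a)
    return math.exp(ln_A - B * r0)


price = vasicek_zcb_price(r0=0.03, a=0.5, theta=0.04, sigma=0.01, tau=5.0)
yield_5y = -math.log(price) / 5.0  # continuously compounded zero yield
print(round(price, 4), round(yield_5y, 4))
```

A quick sanity check: with σ = 0 and r0 = θ the rate never moves, so the formula collapses to exp(-θτ).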

Portfolio Optimization

July 2025

Strategic asset allocation framework using modern portfolio theory, risk parity, and advanced optimization techniques with Riskfolio-Lib for multi-asset portfolio construction.

Jupyter Notebook · Portfolio Theory · Riskfolio-Lib · Risk Parity · Asset Allocation
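Riskfolio-Lib solves risk parity internally. To show what an equal-risk-contribution (ERC) portfolio is, here is a dependency-free sketch using cyclical coordinate descent on the standard log-barrier formulation (minimize ½yᵀCy - Σ bᵢ ln yᵢ, then normalize). `erc_weights` and `risk_contributions` are hypothetical helpers, not Riskfolio-Lib functions, and the covariance matrix is made up.

```python
import math
from typing import List


def erc_weights(cov: List[List[float]], n_iter: int = 500, tol: float = 1e-12) -> List[float]:
    """Equal-risk-contribution weights via cyclical coordinate descent.

    Solves min 0.5 * y'Cy - sum(b_i * ln y_i) with b_i = 1/n. At the optimum
    y_i * (C y)_i = b_i for every i, so normalizing y equalizes risk contributions.
    """
    n = len(cov)
    b = 1.0 / n
    y = [1.0 / math.sqrt(cov[i][i]) for i in range(n)]  # inverse-volatility start
    for _ in range(n_iter):
        max_change = 0.0
        for i in range(n):
            # c = sum_{j != i} C_ij * y_j; take the positive root of C_ii y^2 + c y - b = 0
            c = sum(cov[i][j] * y[j] for j in range(n) if j != i)
            new = (-c + math.sqrt(c * c + 4.0 * cov[i][i] * b)) / (2.0 * cov[i][i])
            max_change = max(max_change, abs(new - y[i]))
            y[i] = new
        if max_change < tol:
            break
    total = sum(y)
    return [v / total for v in y]


def risk_contributions(w: List[float], cov: List[List[float]]) -> List[float]:
    """Share of portfolio variance from each asset: w_i (C w)_i / w'Cw."""
    cw = [sum(cov[i][j] * w[j] for j in range(len(w))) for i in range(len(w))]
    total = sum(wi * ci for wi, ci in zip(w, cw))
    return [wi * ci / total for wi, ci in zip(w, cw)]


cov = [[0.04, 0.006, 0.002],
       [0.006, 0.09, 0.012],
       [0.002, 0.012, 0.01]]
w = erc_weights(cov)
print([round(x, 4) for x in w], [round(rc, 4) for rc in risk_contributions(w, cov)])
```

With a diagonal covariance matrix the ERC solution reduces to inverse-volatility weights; correlations are what make the iterative solve necessary.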

Vasicek Bond Pricing and Kalman Filtering

June 2025

Multi-method fixed income modeling combining Vasicek interest rate dynamics with Kalman filtering for parameter estimation and state variable tracking in bond pricing applications.

Jupyter Notebook · Kalman Filter · Fixed Income · Parameter Estimation · State Space Models
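The state-tracking half of this project can be illustrated with a scalar Kalman filter on an exactly discretized Vasicek short rate that is observed with noise. This sketch assumes known parameters (`a`, `theta`, `sigma`, measurement noise `obs_sd`, and weekly step `dt` are illustrative). For parameter estimation one would typically maximize the returned log-likelihood over those parameters, which is not shown here.

```python
import math
import random
from typing import List, Tuple


def kalman_vasicek(obs: List[float], a: float, theta: float, sigma: float,
                   obs_sd: float, dt: float, r0: float, p0: float) -> Tuple[List[float], float]:
    """Filter a noisily observed Vasicek short rate; return filtered means and log-likelihood.

    State:       r_k = theta + phi * (r_{k-1} - theta) + eps,  phi = exp(-a * dt)
                 Var(eps) = sigma^2 * (1 - exp(-2 * a * dt)) / (2 * a)
    Observation: y_k = r_k + eta,  Var(eta) = obs_sd^2
    """
    phi = math.exp(-a * dt)
    q = sigma ** 2 * (1.0 - math.exp(-2.0 * a * dt)) / (2.0 * a)
    meas_var = obs_sd ** 2
    m, p = r0, p0
    filtered, loglik = [], 0.0
    for y in obs:
        # Predict step
        m_pred = theta + phi * (m - theta)
        p_pred = phi * phi * p + q
        # Update step
        s = p_pred + meas_var   # innovation variance
        k = p_pred / s          # Kalman gain
        innov = y - m_pred
        m = m_pred + k * innov
        p = (1.0 - k) * p_pred
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + innov * innov / s)
        filtered.append(m)
    return filtered, loglik


# Simulate five years of weekly short rates plus noisy observations.
random.seed(1)
a, theta, sigma, obs_sd, dt = 0.8, 0.04, 0.01, 0.004, 1.0 / 52
phi = math.exp(-a * dt)
q_sd = math.sqrt(sigma ** 2 * (1 - math.exp(-2 * a * dt)) / (2 * a))
true_r, obs, r = [], [], 0.03
for _ in range(260):
    r = theta + phi * (r - theta) + random.gauss(0.0, q_sd)
    true_r.append(r)
    obs.append(r + random.gauss(0.0, obs_sd))

filtered, ll = kalman_vasicek(obs, a, theta, sigma, obs_sd, dt, r0=0.03, p0=1e-4)
rmse_obs = math.sqrt(sum((o - t) ** 2 for o, t in zip(obs, true_r)) / len(obs))
rmse_filt = math.sqrt(sum((f - t) ** 2 for f, t in zip(filtered, true_r)) / len(obs))
print(round(rmse_obs, 5), round(rmse_filt, 5))
```

The filtered path should sit closer to the true rate than the raw observations do, because the filter blends each noisy print with the mean-reverting prediction.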

Data Science Projects

June 2025

Collection of data science applications in finance including statistical analysis, machine learning models, and data visualization for financial time series and market data.

Jupyter Notebook · Data Science · Statistical Analysis · Machine Learning · Financial Data

Trading Strategy Based on MACD Signals

June 2025

Technical analysis-driven trading strategy using MACD (Moving Average Convergence Divergence) indicators for signal generation, backtesting, and performance evaluation.

Jupyter Notebook · Technical Analysis · MACD · Trading Signals · Backtesting
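The textbook MACD construction (12- and 26-period EMAs, 9-period signal line) and a crossover rule can be sketched without dependencies. The periods are the conventional defaults rather than values confirmed for this project, the EMA is seeded with the first price (the same recursion as pandas `ewm(span=..., adjust=False)`), and the long/flat rule and synthetic price path are illustrative assumptions.

```python
from typing import List


def ema(values: List[float], span: int) -> List[float]:
    """Exponential moving average with smoothing 2 / (span + 1), seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd_signals(prices: List[float], fast: int = 12, slow: int = 26, signal: int = 9):
    """Return the MACD line, the signal line, and positions (1 = long, 0 = flat)."""
    macd = [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]
    sig = ema(macd, signal)
    # Long while MACD is above its signal line, flat otherwise.
    positions = [1 if m > s else 0 for m, s in zip(macd, sig)]
    return macd, sig, positions


# Synthetic path: a steady rise followed by a steady decline.
prices = [100 + 0.5 * i for i in range(40)] + [120 - 0.8 * i for i in range(40)]
macd, sig, pos = macd_signals(prices)
crossovers = [i for i in range(1, len(pos)) if pos[i] != pos[i - 1]]
print(crossovers)
```

On this path the rule goes long as soon as the uptrend registers and exits a few bars after the reversal; that lag is the usual cost of EMA-based signals and is what a backtest quantifies.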

Cryptocurrency Forecasting Using ARIMA

June 2025

Time series forecasting application for cryptocurrency price prediction using ARIMA models, stationarity testing, and model selection for optimal forecasting accuracy.

Jupyter Notebook · ARIMA · Time Series · Cryptocurrency · Forecasting

Stock Price Prediction and Trading Strategy Using LSTM

March 2025

Deep learning approach to stock price prediction using LSTM neural networks, combined with algorithmic trading strategy development and performance backtesting.

Jupyter Notebook · LSTM · Deep Learning · Stock Prediction · Neural Networks

Stock Brokerage System Low Level Design

February 2025

High-performance stock brokerage system architecture featuring order matching engine, portfolio management, and real-time market data processing.

C++ · System Design · Order Matching · Low Latency · Trading Systems
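The heart of an order matching engine is price-time priority: best price first, then earliest arrival. The project itself is written in C++; the Python sketch below only illustrates that matching rule with two heaps. `Order`, `OrderBook`, and `submit` are hypothetical names, and the sketch handles limit orders only, with no cancels or market orders.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Order:
    order_id: int
    side: str   # "buy" or "sell"
    price: float
    qty: int


class OrderBook:
    """Hypothetical limit order book with price-time priority (no cancels, no market orders)."""

    def __init__(self) -> None:
        self._bids: List[Tuple[float, int, Order]] = []  # (-price, seq, order): best bid on top
        self._asks: List[Tuple[float, int, Order]] = []  # (price, seq, order): best ask on top
        self._seq = itertools.count()  # arrival counter breaks ties at the same price

    def submit(self, order: Order) -> List[Tuple[int, int, float, int]]:
        """Match an incoming limit order and return fills as (buy_id, sell_id, price, qty)."""
        is_buy = order.side == "buy"
        own, opposite = (self._bids, self._asks) if is_buy else (self._asks, self._bids)
        fills = []
        while order.qty > 0 and opposite:
            resting = opposite[0][2]
            crosses = order.price >= resting.price if is_buy else order.price <= resting.price
            if not crosses:
                break
            traded = min(order.qty, resting.qty)
            buyer, seller = (order, resting) if is_buy else (resting, order)
            # Trades print at the resting order's price.
            fills.append((buyer.order_id, seller.order_id, resting.price, traded))
            order.qty -= traded
            resting.qty -= traded
            if resting.qty == 0:
                heapq.heappop(opposite)
        if order.qty > 0:  # rest any remainder on the incoming order's own side
            key = -order.price if is_buy else order.price
            heapq.heappush(own, (key, next(self._seq), order))
        return fills


book = OrderBook()
book.submit(Order(1, "sell", 101.0, 5))
book.submit(Order(2, "sell", 100.5, 3))
book.submit(Order(3, "sell", 100.5, 4))
fills = book.submit(Order(4, "buy", 101.0, 9))
print(fills)  # → [(4, 2, 100.5, 3), (4, 3, 100.5, 4), (4, 1, 101.0, 2)]
```

The two asks at 100.5 fill first, in arrival order, before the buyer reaches the 101.0 level; the unique sequence number also guarantees the heap never has to compare `Order` objects directly.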

Option Pricing Models

February 2025

Comprehensive options pricing library implementing Black-Scholes, binomial trees, and Monte Carlo methods for European and American options valuation with Greeks calculation.

Jupyter Notebook · Options Pricing · Black-Scholes · Binomial Trees · Greeks
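As a compact illustration of two of the methods listed, here is the Black-Scholes price and delta of a European call next to a Cox-Ross-Rubinstein binomial tree that can also price early exercise. This is a sketch rather than the library's code; it assumes no dividends, and the parameter values are arbitrary.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S: float, K: float, T: float, r: float, sigma: float):
    """Black-Scholes price and delta of a European call (no dividends)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    price = S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
    return price, norm_cdf(d1)


def crr_price(S, K, T, r, sigma, steps=500, call=True, american=False):
    """Cox-Ross-Rubinstein binomial tree; american=True checks early exercise at every node."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up probability
    disc = math.exp(-r * dt)

    def payoff(s: float) -> float:
        return max(s - K, 0.0) if call else max(K - s, 0.0)

    # Terminal payoffs, indexed by number of up moves j.
    values = [payoff(S * u ** j * d ** (steps - j)) for j in range(steps + 1)]
    # Backward induction in place: values[j] at step n uses values[j] and values[j + 1] from step n + 1.
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            cont = disc * (p * values[j + 1] + (1 - p) * values[j])
            values[j] = max(cont, payoff(S * u ** j * d ** (n - j))) if american else cont
    return values[0]


bs, delta = bs_call(100, 100, 1.0, 0.05, 0.2)
print(round(bs, 4), round(delta, 4), round(crr_price(100, 100, 1.0, 0.05, 0.2), 4))
print(round(crr_price(100, 100, 1.0, 0.05, 0.2, call=False, american=True), 4))
```

The European tree price converges to Black-Scholes as `steps` grows, while the American put stays above its European counterpart because early exercise has value when rates are positive.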

SPY Momentum Alpha Backtesting

February 2025

High-frequency momentum trading strategy combining data engineering and quantitative finance. Built robust data pipelines processing 2 years of SPY tick data from Polygon API, implemented real-time signal generation, and achieved 79% total return with comprehensive performance analytics and automated backtesting frameworks.

Jupyter Notebook · Momentum Trading · Polygon API · High Frequency · Backtesting

Pairs Trading Strategy

February 2025

Statistical arbitrage strategy using cointegration analysis and mean reversion. Employed the Euclidean distance method for pair selection, with z-score-based entry/exit signals.

Jupyter Notebook · Pairs Trading · Cointegration · Statistical Arbitrage · Mean Reversion
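The distance-method pair selection and z-score rules described above can be sketched as follows. The spread is assumed to be one leg's normalized price minus the other's. The entry and exit thresholds (2.0 and 0.5), the start-at-1.0 normalization, and the function names are illustrative assumptions, not the project's settings.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def normalize(prices: List[float]) -> List[float]:
    """Rescale a price series so it starts at 1.0 (a cumulative-return index)."""
    return [p / prices[0] for p in prices]


def closest_pair(series: Dict[str, List[float]]) -> Tuple[str, str]:
    """Distance method: choose the pair whose normalized price paths have the
    smallest sum of squared differences (squared Euclidean distance)."""
    norm = {k: normalize(v) for k, v in series.items()}

    def ssd(a: str, b: str) -> float:
        return sum((x - y) ** 2 for x, y in zip(norm[a], norm[b]))

    return min(combinations(sorted(norm), 2), key=lambda ab: ssd(*ab))


def zscore_positions(spread: List[float], formation: List[float],
                     entry: float = 2.0, exit_: float = 0.5) -> List[int]:
    """Spread positions from z-scores: -1 (short) when z > entry, +1 (long) when
    z < -entry, back to 0 once |z| < exit_. Mean and std come from the formation period."""
    mu = sum(formation) / len(formation)
    sd = math.sqrt(sum((s - mu) ** 2 for s in formation) / len(formation))
    pos, out = 0, []
    for s in spread:
        z = (s - mu) / sd
        if pos == 0 and z > entry:
            pos = -1
        elif pos == 0 and z < -entry:
            pos = 1
        elif pos != 0 and abs(z) < exit_:
            pos = 0
        out.append(pos)
    return out


series = {
    "A": [10.0, 10.2, 10.1, 10.4],
    "B": [20.0, 20.4, 20.2, 20.8],
    "C": [5.0, 4.5, 5.5, 4.0],
}
print(closest_pair(series))  # → ('A', 'B')

formation_spread = [0.0] * 9 + [3.0] + [0.0] * 10
print(zscore_positions(formation_spread, formation_spread))
```

Estimating the mean and standard deviation on a separate formation window, then applying them to the trading window, avoids look-ahead bias in the backtest.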

Options Pricing Using Machine Learning

September 2024

Advanced machine learning approach to options pricing combining deep learning with financial engineering. Implemented neural networks, random forests, and ensemble methods with automated feature engineering, model validation pipelines, and real-time pricing systems that outperformed traditional Black-Scholes pricing in complex market conditions.

Jupyter Notebook · Machine Learning · Neural Networks · Options Pricing · Ensemble Methods

Market Analytics Web Application

August 2024

Full-stack data science application for comprehensive market analysis. Built interactive web application with real-time data ingestion pipelines, advanced data visualization dashboards, automated technical indicator calculations, and machine learning-powered trading signal generation with scalable cloud deployment.

Jupyter Notebook · Web Application · Market Analysis · Data Visualization · Technical Indicators

Activities & Awards

Student Membership

CFA Society New York

Student member, actively engaged in professional events

Open Source Contribution

Riskfolio-Lib

Contributing to a leading Python library for portfolio optimization and risk management

Competition Participation

WorldQuant's 2023 International Quant Championship

Competed by designing and testing advanced trading strategies

Certifications & Licenses


Advanced RAG (Retrieval-Augmented Generation)

Self-paced Course

May 2026

10-module course covering the full RAG stack — from document processing, embeddings, and vector stores through advanced retrieval techniques (HyDE, CRAG, Self-RAG, Graph RAG), Agentic RAG with LangGraph, and production deployment. Directly applicable to LLM-powered pipelines at DXT Commodities.

LangChain · LangGraph · Vector Stores · Embeddings · Agentic RAG · RAGAS Evaluation · HyDE / CRAG / Self-RAG · Hybrid Search

Complete Algorithmic Trading Course with Python, ChatGPT, ML

Udemy

July 2025

Comprehensive algorithmic trading course covering Python programming, machine learning integration, and ChatGPT applications for automated trading strategies.

Algorithmic Trading · Python · Machine Learning · ChatGPT

Akuna Capital Options 101

Akuna Capital

July 2025

Professional options trading course from a leading market maker, covering payoff diagrams, volatility, Greeks, and market-making fundamentals.

Options Trading · Greeks · Volatility · Market Making

Complete Data Science, Machine Learning, DL NLP Bootcamp

Udemy

July 2025

Comprehensive bootcamp covering data science fundamentals, machine learning algorithms, deep learning, and natural language processing applications.

Data Science · Machine Learning · Deep Learning · NLP

Taking Python to Production: Professional Onboarding Guide

Udemy

July 2025

Advanced Python course focusing on production deployment, best practices, and professional development workflows for enterprise applications.

Python · Production · DevOps · Enterprise

The Ultimate JSON With Python Course + JSONSchema & JSONPath

Udemy

July 2025

Comprehensive JSON handling in Python including schema validation, path queries, and advanced data manipulation techniques.

JSON · Python · JSONSchema · Data Processing

Master Time Series Analysis and Forecasting with Python 2025

Udemy

June 2025

Advanced time series analysis covering ARIMA, SARIMA, Prophet, LSTM, and modern forecasting techniques for financial and business applications.

Time Series · ARIMA · Prophet · LSTM

Manage Finance Data with Python & Pandas: Unique Masterclass

Udemy

July 2025

Specialized course on financial data management and analysis using Python and Pandas for quantitative finance applications.

Pandas · Financial Data · Data Management · Quantitative Analysis

Master Regression & Prediction with Pandas and Python [2025]

Udemy

July 2025

Advanced regression analysis and prediction modeling using Python and Pandas for financial and statistical applications.

Regression Analysis · Prediction Models · Pandas · Statistical Analysis

Mathematics-Basics to Advanced for Data Science And GenAI

Udemy

July 2025

Comprehensive mathematics foundation covering linear algebra, calculus, probability, and statistics for data science and AI applications.

Linear Algebra · Calculus · Probability · Statistics

Python Object Oriented Programming (OOP): Beginner to Pro

Udemy

July 2025

Advanced Python OOP concepts including inheritance, polymorphism, design patterns, and enterprise-level programming practices.

Python · OOP · Design Patterns · Inheritance · Polymorphism

The Complete SQL Bootcamp (30 Hours): Go from Zero to Hero

Udemy

July 2025

Comprehensive SQL training covering database design, complex queries, optimization, and real-world database management scenarios.

SQL · Database Design · Query Optimization · Data Management

Fixed Income Analytics: Pricing and Risk Management

Udemy

July 2025

Specialized fixed income course covering bond pricing, yield curve analysis, duration, convexity, and interest rate risk management.

Fixed Income · Bond Pricing · Yield Curves · Risk Management

Learn Python Requests

Udemy

July 2025

Specialized Python course focusing on HTTP requests, API integration, and web scraping for financial data collection.

Python · Requests · API Integration · Web Scraping · Data Collection

The Ultimate Pandas Bootcamp: Advanced Python Data Analysis

Udemy

July 2025

Advanced Pandas mastery for complex data manipulation, analysis, and visualization in financial and business contexts.

Pandas · Data Analysis · Data Manipulation · Financial Analysis

FastAPI - The Complete Course 2025 (Beginner + Advanced)

Udemy

July 2025

Covers FastAPI, a modern Python web framework for building high-performance APIs, with applications to financial data services and algorithmic trading platforms.

FastAPI · REST APIs · Web Development · Python

Mathematical Foundations of Machine Learning

Udemy

2025

Deep mathematical foundations covering linear algebra, partial derivatives, calculus, and probability theory for advanced machine learning applications.

Linear Algebra · Calculus · Probability · Machine Learning

Python Data Analysis: NumPy & Pandas Masterclass

Udemy

2025

Advanced data analysis techniques using NumPy and Pandas for quantitative finance and statistical computing applications.

NumPy · Pandas · Data Analysis · Statistical Computing

GSX Verified Certificate for Probability - The Science of Uncertainty and Data

MIT / edX

December 2022

Rigorous probability theory course covering uncertainty quantification, statistical inference, and data analysis fundamentals from MIT.

Probability Theory · Statistical Inference · Uncertainty · Data Analysis

Python and Statistics for Financial Analysis

Coursera

February 2022

Specialized course combining Python programming with statistical methods for financial data analysis and investment decision making.

Python · Financial Statistics · Investment Analysis · Portfolio Management

Technical Skills

Programming Languages

Python · SQL

Data Science & ML Libraries

NumPy · Pandas · SciPy · Scikit-learn · PyTorch · TensorFlow · Keras · Matplotlib · Seaborn · Plotly · OpenBB · Statsmodels · Zipline · PyFolio · Riskfolio-Lib · Vectorbt

Cloud & DevOps

AWS EC2 · Google Cloud · Azure · Azure Data Factory · Azure Synapse Analytics · Azure Data Lake Storage (ADLS) · Docker · Kubernetes · Jenkins · GitHub Actions · Terraform · Ansible · CI/CD · Infrastructure as Code

AI Engineering & LLM

LangChain · LangGraph · RAG (Retrieval-Augmented Generation) · Agentic RAG · Vector Stores · Embeddings · Hybrid Search (BM25 + Dense) · HyDE · CRAG / Self-RAG · Graph RAG · RAGAS Evaluation · LangSmith · Prompt Engineering · LLM Pipelines

Development & Tools

Jupyter Notebook · Git · GitHub · VS Code · FastAPI · Streamlit · Tableau · Power BI · Excel · MLflow · Kubeflow · Apache Superset

Data Science & Analytics

Machine Learning · Deep Learning · Statistical Modeling · Predictive Analytics · Feature Engineering · Model Deployment · A/B Testing · Data Visualization · Time Series Analysis · Natural Language Processing · Computer Vision · Recommendation Systems

Quantitative Finance

Portfolio Optimization · Risk Management · Algorithmic Trading · Derivatives Pricing · Backtesting · VaR & CVaR · Factor Modeling · Market Microstructure · Fixed Income Analytics · Options Pricing · Stress Testing · Performance Attribution

Energy Markets & Infrastructure

LNG Feed Gas Forecasting · Pipeline Capacity Analysis · Waha / Henry Hub Basis · Natural Gas Production Modeling · EIA 914 Data · Genscape Data · FERC Bulletin Boards · Kinder Morgan EBB · Supply-Demand Balancing · Force Majeure Monitoring · MS Teams Webhooks · LLM-Powered Notice Parsing

Get In Touch

Email

amir.khan@dxt.com

Phone

+1 (201) 234-7017

Location

Stamford, Connecticut, USA

LinkedIn

linkedin.com/in/amirkhan2317