MarketMind
completed
A sentiment-driven market signal pipeline: financial news and social media in, predicted directional moves out. Built mostly to find out, the hard way, why it doesn't work.
the problem
If sentiment moved markets cleanly, every news desk in Midtown would be a hedge fund. Sentiment does matter, but the signal is buried under a thick layer of noise: lagging articles, inconsistent terminology, and retail crowds chasing moves that institutions priced in days ago. I wanted to build the pipeline anyway, partly to learn the NLP plumbing and partly to develop the gut-level sense of why this is hard that only comes from staring at your own bad predictions.
the approach
End-to-end pipeline:
- Ingest: scrape financial news (Bloomberg, Reuters, MarketWatch) and social posts (Reddit, Twitter) on a per-ticker basis.
- Process: tokenize, score sentiment (NLTK and spaCy, plus a classifier fine-tuned on text labeled for financial tone), and extract topics.
- Correlate: train a model linking lagged sentiment to next-day directional moves; features include rolling sentiment, sentiment dispersion across sources, and mention volume.
- Backtest: walk-forward validation on historical price and sentiment data, so every prediction is made using only data available before it and the reported performance reflects how the model would actually have done.
- Visualize: Flask dashboard with Plotly charts showing predictions, confidence, and sentiment trends.
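A minimal sketch of how the Correlate features could be built, assuming a hypothetical `posts` table with one row per scored post and columns `date`, `ticker`, `source`, and `score` (sentiment in [-1, 1]). The function name, column names, and window length are illustrative, not the project's code. The key detail is the final shift: features for day t only use data through the previous observed day.

```python
import numpy as np
import pandas as pd


def build_daily_features(posts: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Aggregate per-post sentiment into daily, per-ticker features.

    Hypothetical input columns: 'date' (datetime64), 'ticker',
    'source' (e.g. 'reuters', 'reddit'), 'score' (sentiment in [-1, 1]).
    """
    # Average sentiment per ticker, day, and source.
    per_source = (
        posts.groupby(["ticker", "date", "source"])["score"].mean().reset_index()
    )
    # Across sources: mean level and disagreement (dispersion).
    daily = per_source.groupby(["ticker", "date"]).agg(
        sentiment=("score", "mean"),
        dispersion=("score", "std"),
    )
    # Mention volume counts raw posts, not sources.
    daily["mentions"] = posts.groupby(["ticker", "date"]).size()
    daily["dispersion"] = daily["dispersion"].fillna(0.0)  # single-source days
    daily = daily.reset_index().sort_values(["ticker", "date"])

    # Rolling sentiment per ticker, then log-scaled mention volume.
    daily["sent_roll"] = daily.groupby("ticker")["sentiment"].transform(
        lambda s: s.rolling(window, min_periods=1).mean()
    )
    daily["log_mentions"] = np.log1p(daily["mentions"])

    # Lag by one observed row per ticker so day t sees only data up to t-1.
    feature_cols = ["sent_roll", "dispersion", "log_mentions"]
    daily[feature_cols] = daily.groupby("ticker")[feature_cols].shift(1)
    return daily
```

Lagging inside the feature builder, rather than when joining to prices, makes it harder to leak same-day information by accident.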
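The walk-forward backtest idea can be sketched with scikit-learn's `TimeSeriesSplit`, which yields expanding training windows that always end before the test block begins. `LogisticRegression` is a stand-in classifier here, not necessarily the model the project used, and `walk_forward` is a hypothetical helper; rows must already be in chronological order.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit


def walk_forward(X: np.ndarray, y: np.ndarray, n_splits: int = 5):
    """Expanding-window walk-forward evaluation.

    Each fold trains only on rows that precede its test block, so no future
    data leaks into training. Returns out-of-sample predictions (NaN for rows
    that were never in a test block) and per-fold accuracies.
    """
    preds = np.full(len(y), np.nan)
    fold_acc = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        model = LogisticRegression(max_iter=1000)  # stand-in classifier
        model.fit(X[train_idx], y[train_idx])
        p = model.predict(X[test_idx])
        preds[test_idx] = p
        fold_acc.append(float(np.mean(p == y[test_idx])))
    return preds, fold_acc
```

The earliest rows only ever serve as training data, which is why they come back as NaN; any accuracy figure should be computed on the tested rows alone.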
a hard decision
The most informative moment was deciding not to deploy the model live. Backtesting showed that the predictive edge collapsed under any reasonable transaction-cost model: the model called direction correctly more often than chance, but not by enough to cover slippage. The honest move was to write the failure into the project rather than paper over it. That's the lesson the project actually exists to teach.
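To see how a small directional edge can vanish once costs are charged, here is a sketch of a long/short strategy's net daily returns. The cost model (a flat number of basis points per unit of position change, so a flip from short to long pays twice) is an assumption chosen for simplicity; real slippage depends on size, liquidity, and timing. `net_strategy_returns` is a hypothetical helper.

```python
import numpy as np


def net_strategy_returns(signal, next_day_returns, cost_bps: float = 5.0):
    """Net daily returns of a long/short strategy under a flat cost model.

    signal: +1 (long), -1 (short) or 0 (flat), decided before the return it
    is paired with. next_day_returns: simple returns over the following day.
    cost_bps: assumed cost per unit of position change, in basis points.
    """
    signal = np.asarray(signal, dtype=float)
    r = np.asarray(next_day_returns, dtype=float)
    gross = signal * r
    # Position changes, assuming the strategy starts flat.
    turnover = np.abs(np.diff(signal, prepend=0.0))
    costs = turnover * cost_bps / 1e4
    return gross - costs
```

As illustrative arithmetic (not results from this project): if the model is right with probability p on moves of typical size m, and a wrong call loses about as much as a right call earns, the expected gross return per trade is roughly (2p - 1) * m. A 52% hit rate on 1% moves gives about 0.04%, or 4 basis points, so a cost of a few basis points per trade, doubled on every flip, can consume most or all of it.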
what came out
A working pipeline, a clear story about why sentiment alone isn't enough, and an instinct for which factors might extend it (cross-asset signals, microstructure features, sentiment delta vs. baseline). Treat the repo as an educational artifact, not a trading system — the README says so explicitly.
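One of the extensions named above, sentiment delta vs. baseline, could be sketched as a z-score of today's sentiment against a trailing window. The z-score form, the 20-day default window, and `sentiment_surprise` are assumptions for illustration. The function expects one ticker's daily sentiment series in date order and shifts the baseline so it never includes the current day.

```python
import pandas as pd


def sentiment_surprise(daily_sentiment: pd.Series, baseline_window: int = 20) -> pd.Series:
    """Today's sentiment relative to a trailing baseline (hypothetical feature).

    z_t = (s_t - mean of previous N days) / std of previous N days.
    Returns NaN until N prior days are available.
    """
    past = daily_sentiment.shift(1).rolling(baseline_window, min_periods=baseline_window)
    return (daily_sentiment - past.mean()) / past.std()
```

Measuring sentiment against each ticker's own history targets the gap the project ran into: a stock that is always talked about positively carries little news in another positive day.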
stack
python · scikit-learn · tensorflow · nltk · spacy · pandas · numpy · flask · plotly