Software engineering project built for a biotech hedge fund to automate tracking of clinical trial activity and patient sentiment across online sources, integrating web scraping, APIs, and email alerts. Used daily by the analyst team to accelerate investment research.

AlbertMT8/ctp-tracker

CTP Tracker

Automated pharmaceutical intelligence platform that monitors Reddit discussions and clinical trial changes for drug development insights.

Overview

CTP Tracker automates the monitoring of drug-related Reddit discussions and clinical trial protocol changes. It gives analysts timely input for pharmaceutical research and investment decisions without manual searching.

Problem: Manual monitoring of drug mentions across social media and clinical trial databases is time-consuming, error-prone, and often misses critical developments that could impact investment decisions or regulatory timelines.

Solution: An automated monitoring system that continuously scans Reddit for drug discussions and tracks clinical trial protocol changes, providing stakeholders with timely, actionable intelligence through email alerts.

Outcomes:

  • Real-time detection of drug mentions across 50+ medical subreddits
  • Automated tracking of clinical trial protocol version changes
  • Consolidated email alerts with direct links to source material
  • Persistent tracking to avoid duplicate notifications
  • Configurable drug watchlists with date-based filtering

Demo

Sample Email Alert Output

Subject: Reddit Monitoring Alerts (3 new posts)

Syfovre: New treatment for macular degeneration - https://reddit.com/r/medicine/comments/...
Ozempic: Weight loss discussion in diabetes community - https://reddit.com/r/diabetes/comments/...
Keytruda: Cancer treatment updates - https://reddit.com/r/cancer/comments/...
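
The alert text above follows a simple "drug: title - link" pattern. A minimal sketch of how such a subject and body could be assembled is shown here; the function name `format_reddit_alert` and the dictionary keys `drug`, `title`, and `url` are hypothetical and are not taken from the project's code.

```python
from typing import Dict, List, Tuple


def format_reddit_alert(posts: List[Dict[str, str]]) -> Tuple[str, str]:
    """Build a (subject, body) pair for a batch of new Reddit posts.

    Each post is assumed to be a dict with 'drug', 'title' and 'url' keys
    (hypothetical names chosen for this sketch).
    """
    noun = "post" if len(posts) == 1 else "posts"
    subject = f"Reddit Monitoring Alerts ({len(posts)} new {noun})"
    # One line per post: "<drug>: <title> - <url>"
    lines = [f"{p['drug']}: {p['title']} - {p['url']}" for p in posts]
    return subject, "\n".join(lines)
```

The returned pair can then be handed to whatever email client the project uses; the sketch deliberately stops short of the Brevo call.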

Sample Trial Change Alert

Subject: Daily New Trials & Trial Changes Alert

| Drug | NCT ID | Title | Changes | Compare Link |
|------|--------|-------|---------|--------------|
| Syfovre | NCT06394674 | Study of SYFOVRE... | Updated inclusion criteria | [Compare changes] |

Architecture

flowchart LR
    User[Stakeholders] --> Email[Email Alerts]
    Email --> Brevo[Brevo API]
    
    subgraph "Monitoring System"
        Reddit[Reddit Monitor] --> PRAW[PRAW API]
        Trials[Trial Monitor] --> CT[ClinicalTrials.gov API]
        Trials --> Playwright[Web Scraping]
    end
    
    subgraph "Data Sources"
        PRAW --> RedditData[Reddit Posts]
        CT --> TrialData[Trial Metadata]
        Playwright --> VersionData[Protocol Versions]
    end
    
    subgraph "Storage"
        RedditData --> JSON[redditIDs.json]
        VersionData --> State[trial_versions.json]
        Config[watchlist.xlsx] --> Drugs[Drug Watchlist]
    end
    
    subgraph "Processing"
        Drugs --> Reddit
        Drugs --> Trials
        Reddit --> Email
        Trials --> Email
    end

🚀 Quickstart

Prerequisites

  • Python 3.8+
  • Reddit API credentials (client_id, client_secret, user_agent)
  • Brevo API key for email notifications
  • Playwright for web scraping (installed in step 2 of Installation)

Installation

  1. Clone the repository

    git clone https://github.com/AlbertMT8/ctp-tracker.git
    cd ctp-tracker
  2. Install dependencies

    # openpyxl is needed by pandas to read .xlsx files
    pip install praw pandas openpyxl sib_api_v3_sdk python-dotenv playwright beautifulsoup4 requests
    playwright install chromium
  3. Set up environment variables

    cp .env.example .env
    # Edit .env with your API credentials
  4. Configure watchlist

    # Create watchlist.xlsx with required sheets:
    # - SocialMedia: Drug_Name, Date_added
    # - TrialChange: Drug_Name, Date_added
  5. Run monitoring

    # Monitor Reddit mentions
    python redditMonitoring.py
    
    # Monitor trial changes
    python TrialAlert.py
    
    # Test email functionality
    python test_email.py

Minimal Example

from redditMonitoring import RedditMonitor

# Initialize monitor
monitor = RedditMonitor()

# Run monitoring for past week
results = monitor.monitor_all_drugs(time_filter='week', limit_per_search=10)

# Print results
monitor.print_results(results)

βš™οΈ Configuration

Environment Variables (.env)

# Reddit API (read-only access)
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USER_AGENT=DrugMonitor/1.0 by YourUsername

# Email notifications
BREVO_API_KEY=your_brevo_api_key
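
Since every script depends on these four variables, it can help to fail early when one is missing. The following is a minimal sketch of such a check using only the standard library; `REQUIRED_SETTINGS` and `missing_settings` are hypothetical names, and the project itself may validate its configuration differently.

```python
from typing import List, Mapping

# Variable names as listed in the .env example above.
REQUIRED_SETTINGS = (
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "REDDIT_USER_AGENT",
    "BREVO_API_KEY",
)


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the names of required settings that are absent or blank."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]
```

A script could call `missing_settings(os.environ)` after python-dotenv's `load_dotenv()` has read `.env`, and exit with a clear message if the returned list is not empty.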

Watchlist Configuration (watchlist.xlsx)

SocialMedia Sheet - For Reddit monitoring:

| Drug_Name | Date_added |
|-----------|------------|
| Syfovre | 2024-01-15 |
| Ozempic | 2024-01-10 |

TrialChange Sheet - For clinical trial monitoring:

| Drug_Name | Date_added |
|-----------|------------|
| Syfovre | 2024-01-15 |
| Keytruda | 2024-01-10 |
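
The README does not spell out how `Date_added` is applied. One plausible reading is that items created before the date a drug joined the watchlist are ignored. The sketch below illustrates that assumption on plain dictionaries; `parse_watchlist` and `is_reportable` are hypothetical helpers, and the rows are assumed to carry ISO-formatted date strings rather than whatever type the Excel parser actually returns.

```python
from datetime import date, datetime, timezone
from typing import Dict, Iterable, Mapping


def parse_watchlist(rows: Iterable[Mapping[str, str]]) -> Dict[str, date]:
    """Map lower-cased drug names to the date they were added."""
    return {
        row["Drug_Name"].strip().lower(): date.fromisoformat(row["Date_added"])
        for row in rows
    }


def is_reportable(drug: str, created_utc: float, watchlist: Dict[str, date]) -> bool:
    """True if the item was created on or after the drug's Date_added.

    `created_utc` is a Unix timestamp in seconds, the form Reddit uses
    for submission creation times.
    """
    added = watchlist.get(drug.strip().lower())
    if added is None:
        return False  # drug is not on the watchlist
    created = datetime.fromtimestamp(created_utc, tz=timezone.utc).date()
    return created >= added
```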

Data Files

  • redditIDs.json - Persistent storage of seen Reddit post IDs
  • trial_versions.json - Highest protocol version seen for each NCT ID
  • watchlist.xlsx - Drug watchlist with date filters
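
The two JSON files carry the state that prevents duplicate alerts. As a rough sketch of that idea, assuming `redditIDs.json` holds a list of post IDs and `trial_versions.json` holds a mapping from NCT ID to an integer version (the actual file schemas are not documented here), the logic could look like this. All function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List


def load_state(path: Path, default):
    """Read a JSON state file, falling back to `default` if it does not exist."""
    if not path.exists():
        return default
    with path.open("r", encoding="utf-8") as fh:
        return json.load(fh)


def save_state(path: Path, state) -> None:
    """Write to a temporary file first, then replace the original in one step."""
    tmp = path.with_name(path.name + ".tmp")
    with tmp.open("w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
    tmp.replace(path)


def unseen_posts(post_ids: Iterable[str], seen: List[str]) -> List[str]:
    """Return IDs not yet recorded, keeping their order and dropping repeats."""
    known = set(seen)
    fresh = []
    for post_id in post_ids:
        if post_id not in known:
            known.add(post_id)
            fresh.append(post_id)
    return fresh


def version_increased(nct_id: str, version: int, highest: Dict[str, int]) -> bool:
    """Record `version` if it exceeds the stored maximum; report whether it did."""
    if version > highest.get(nct_id, 0):
        highest[nct_id] = version
        return True
    return False
```

Writing through a temporary file means an interrupted run is less likely to leave a half-written state file behind, which matters because a corrupted file would cause every old post to be reported again.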

πŸ“ Project Layout

ctp-tracker/
├── redditMonitoring.py     # Reddit mention monitoring
├── TrialAlert.py           # Clinical trial change tracking
├── watchlist.py            # Excel watchlist parser
├── test_email.py           # Email functionality testing
├── watchlist.xlsx          # Drug configuration (not in repo)
├── redditIDs.json          # Reddit post tracking (auto-generated)
├── trial_versions.json     # Trial version tracking (auto-generated)
├── .env                    # Environment variables (not in repo)
├── .gitignore              # Git ignore rules
├── docs/
│   └── img/                # Documentation images
└── README.md               # This file

πŸ› οΈ Tech Stack

Core Libraries

  • PRAW - Reddit API wrapper for read-only access
  • Pandas - Excel file processing and data manipulation
  • Playwright - Web scraping for clinical trial version pages
  • BeautifulSoup4 - HTML parsing for trial data extraction
  • Requests - HTTP client for ClinicalTrials.gov API

External APIs

  • Reddit API - Social media monitoring (read-only)
  • ClinicalTrials.gov API - Trial metadata retrieval
  • Brevo (Sendinblue) - Transactional email delivery
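
To make the trial-metadata step more concrete, here is a small sketch of a ClinicalTrials.gov lookup. It assumes the public v2 endpoint and its `query.intr` (intervention) and `pageSize` parameters; the README does not say which API version or query field the project actually uses, and `build_search_url` and `extract_trials` are hypothetical names.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Assumption: the v2 REST endpoint; the project may target a different one.
API_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_search_url(drug: str, page_size: int = 20) -> str:
    """Build a search URL for studies whose intervention matches `drug`."""
    params = {"query.intr": drug, "pageSize": page_size}
    return f"{API_URL}?{urlencode(params)}"


def extract_trials(payload: dict) -> List[Dict[str, str]]:
    """Pull the NCT ID and brief title out of a v2-style JSON response."""
    trials = []
    for study in payload.get("studies", []):
        ident = study.get("protocolSection", {}).get("identificationModule", {})
        if "nctId" in ident:
            trials.append(
                {"nct_id": ident["nctId"], "title": ident.get("briefTitle", "")}
            )
    return trials
```

With Requests, the two pieces would combine as `extract_trials(requests.get(build_search_url("Syfovre"), timeout=30).json())`.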

Data Storage

  • JSON - Persistent state tracking
  • Excel - Configurable drug watchlists
  • CSV - Archival data export

Testing

Run Tests

# Test email functionality
python test_email.py

# Test Reddit monitoring (dry run)
python redditMonitoring.py

# Test trial monitoring
python TrialAlert.py

Code Quality

  • Logging - Comprehensive logging throughout all modules
  • Error Handling - Graceful degradation for API failures
  • Rate Limiting - Respectful API usage with built-in delays
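
The three points above often come together in a single wrapper around each external call. The following sketch shows one way this could look, with logging, a delay between attempts, and a `None` result when every attempt fails; `call_with_retry` and the logger name `ctp_tracker` are hypothetical, and the project's own handling may differ.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger("ctp_tracker")  # hypothetical logger name


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `fn`, retrying on any exception; return None if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            logger.warning("Attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt < attempts:
                # Wait a little longer after each failure (linear back-off).
                sleep(delay_seconds * attempt)
    logger.error("Giving up after %d attempts", attempts)
    return None
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. A caller that receives `None` can skip that drug for the current run instead of aborting the whole batch.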

Deployment

Local Development

# Set up virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
# (if no requirements.txt is present, use the pip command from Installation)
pip install -r requirements.txt

# Run monitoring scripts
python redditMonitoring.py

Production Considerations

  • Cron Jobs - Schedule regular monitoring runs
  • Docker - Containerize for consistent deployment
  • Monitoring - Add health checks and alerting
  • Backup - Regular backup of state files
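
For the backup point, a timestamped copy of the two state files before each scheduled run is usually enough. This is a minimal sketch under that assumption; `backup_state` and the directory layout are hypothetical and not part of the repository.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import List

STATE_FILES = ("redditIDs.json", "trial_versions.json")


def backup_state(project_dir: Path, backup_dir: Path, stamp: str = "") -> List[Path]:
    """Copy each existing state file into `backup_dir` with a timestamp suffix."""
    stamp = stamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    backup_dir.mkdir(parents=True, exist_ok=True)
    copies = []
    for name in STATE_FILES:
        source = project_dir / name
        if source.exists():
            target = backup_dir / f"{source.stem}-{stamp}{source.suffix}"
            shutil.copy2(source, target)  # copy2 also preserves modification times
            copies.append(target)
    return copies
```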

πŸ—ΊοΈ Roadmap

Possible Future Features

  • Google Sheets Integration - Replace Excel with cloud-based watchlists
  • Web Dashboard - Real-time monitoring interface
  • Advanced Filtering - Sentiment analysis and relevance scoring
  • Multiple Social Platforms - Twitter, LinkedIn monitoring
  • Machine Learning - Automated drug mention classification
  • API Endpoints - REST API for external integrations
  • Database Backend - Replace JSON files with proper database
  • Alert Customization - Configurable notification preferences

Technical Improvements

  • Async Processing - Improve performance with async/await
  • Caching Layer - Reduce API calls with intelligent caching
  • Unit Tests - Comprehensive test coverage
  • CI/CD Pipeline - Automated testing and deployment
  • Documentation - API documentation and user guides

Contributing

I welcome contributions! Please follow these guidelines:

Development Setup

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes with proper logging and error handling
  4. Test thoroughly: python test_email.py
  5. Commit with descriptive messages: git commit -m 'Add amazing feature'
  6. Push to your branch: git push origin feature/amazing-feature
  7. Open a Pull Request

Code Style

  • Follow PEP 8 Python style guidelines
  • Add comprehensive docstrings for all functions
  • Include logging statements for debugging
  • Handle exceptions gracefully
  • Add type hints where appropriate

Issue Reporting

  • Use descriptive issue titles
  • Include error logs and stack traces
  • Specify Python version and environment
  • Provide steps to reproduce issues

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Data Sources

  • Reddit - Social media platform for drug discussions
  • ClinicalTrials.gov - Clinical trial registry and database
  • Brevo - Email delivery service

Open Source Libraries

  • PRAW - Reddit API wrapper
  • Pandas - Data manipulation library
  • Playwright - Web automation framework
  • BeautifulSoup4 - HTML parsing library

Research & Development

  • Octagon Invest - Pharmaceutical investment research
  • Medical Community - Reddit medical subreddits and contributors

Note: This system is designed for research and investment intelligence purposes. Always verify information through official sources before making any decisions.
