Add header inspection tools #79

Merged
4 commits merged on Feb 4, 2025

tharropoulos
Contributor

Change Summary

What is this?

This PR introduces tools for debugging request headers. It helps users authenticate against private documentation sites and verify that auth headers are being sent correctly.

Changes

Added Features:

  1. New Middleware in header_inspector_middleware.py:

    • HeaderInspectionMiddleware: Debug middleware that logs outgoing request URLs and headers
    • process_request(): Logs request details at debug level for header inspection
    • Auto-registers with crawler through from_crawler() class method
  2. New Tests in auth_test.py:

    • test_spider_auth_attributes(): Verifies basic auth configuration
    • Added fixtures for config and environment variables testing
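For readers who want a picture of the middleware described above, here is a minimal sketch of what a header-logging downloader middleware can look like. It is not the code merged in this PR: the logger name and log message wording are assumptions, and the class is written without importing Scrapy so it can be run on its own with a stand-in request object.

```python
import logging
from types import SimpleNamespace

# Logger name is an assumption for this sketch.
logger = logging.getLogger("header_inspector")


class HeaderInspectionMiddleware:
    """Debug-only downloader middleware that logs each outgoing request."""

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the middleware instance.
        # This sketch reads nothing from the crawler.
        return cls()

    def process_request(self, request, spider):
        # Log at debug level so the output only appears with LOG_LEVEL=DEBUG.
        logger.debug("Request URL: %s", request.url)
        logger.debug("Request headers: %s", dict(request.headers))
        # Returning None lets Scrapy continue handling the request normally.
        return None


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    # Stand-in for a scrapy.Request; the URL and header value are placeholders.
    fake_request = SimpleNamespace(
        url="https://docs.example.com/",
        headers={"Authorization": "Basic <redacted>"},
    )
    middleware = HeaderInspectionMiddleware.from_crawler(crawler=None)
    middleware.process_request(fake_request, spider=None)
```

Because the headers are logged verbatim, the `Authorization` value ends up in the debug log, so a middleware like this is best kept to troubleshooting sessions.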

Code Changes:

  1. In documentation_spider.py:

    • Reorganized imports for better readability
    • Enhanced environment variable handling for auth credentials
    • Integrated with Scrapy's HttpAuthMiddleware through spider attributes:
      • http_user
      • http_pass
      • http_auth_domain
  2. In index.py:

    • Added HeaderInspectionMiddleware to downloader middleware chain
    • Set middleware priority to 901 (after custom downloader middleware)
    • Removed unused Algolia settings import
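The wiring listed above can be sketched roughly as follows. The environment variable names come from the Demo section of this PR, and `http_user`, `http_pass` and `http_auth_domain` are the spider attributes Scrapy's `HttpAuthMiddleware` reads. The helper name `auth_attributes_from_env` and the dotted module path in the settings dict are hypothetical; the actual code in `documentation_spider.py` and `index.py` may be organized differently.

```python
import os


def auth_attributes_from_env(env=None):
    """Map the DOCSEARCH_* variables to the attributes HttpAuthMiddleware reads.

    Hypothetical helper for illustration; returns an empty dict when no
    credentials are configured.
    """
    env = os.environ if env is None else env
    user = env.get("DOCSEARCH_BASICAUTH_USERNAME")
    password = env.get("DOCSEARCH_BASICAUTH_PASSWORD")
    domain = env.get("DOCSEARCH_AUTH_DOMAIN")
    if not (user and password):
        return {}
    return {
        "http_user": user,
        "http_pass": password,
        "http_auth_domain": domain,
    }


# Scrapy setting that registers the inspector. The dotted path is an assumption.
# Downloader middlewares with higher numbers see the request later, so at 901
# the inspector runs after Scrapy's built-in HttpAuthMiddleware (300 by default)
# has had the chance to add the Authorization header.
DOWNLOADER_MIDDLEWARES = {
    "scraper.src.header_inspector_middleware.HeaderInspectionMiddleware": 901,
}
```

A spider could apply the returned mapping with `setattr(spider, name, value)` for each item before the crawl starts.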

Demo

To test basic authentication:

  1. Set environment variables:
    export DOCSEARCH_BASICAUTH_USERNAME=your_username
    export DOCSEARCH_BASICAUTH_PASSWORD=your_password
    export DOCSEARCH_AUTH_DOMAIN=your_domain
  2. Enable debug logging to verify headers:
    'LOG_LEVEL': 'DEBUG'
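When reading the debug output, it can help to know what the `Authorization` header should contain. The sketch below computes the expected HTTP Basic value for a username and password so it can be compared against the logged header. The function name is hypothetical, and the ISO-8859-1 default encoding is an assumption about how the credentials are encoded; adjust it if your setup differs.

```python
import base64


def expected_basic_auth_header(username, password, encoding="ISO-8859-1"):
    """Return the HTTP Basic Authorization value for the given credentials."""
    # Basic auth is "username:password", base64-encoded, prefixed with "Basic ".
    credentials = f"{username}:{password}".encode(encoding)
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Placeholder credentials matching the export lines in the demo.
print(expected_basic_auth_header("your_username", "your_password"))
```

The header in the log may be shown as bytes or wrapped in a list, but the base64 portion should match the printed value.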

PR Checklist

- Add tests to verify spider's basic auth credentials setup
- Add pytest fixtures for config and environment variables
- Verify http_user, http_pass and http_auth_domain attributes
- Add new `HeaderInspectionMiddleware` for debugging request headers
- Register middleware in crawler pipeline with priority 901
- Add debug logging for outgoing request URLs and headers
@jasonbosco jasonbosco merged commit 678d77b into typesense:master Feb 4, 2025
1 check passed
@jasonbosco
Member

Available in 0.12.0.rc6
