Installation¶
Install the library from the Python Package Index with pipenv:
pipenv install court-scraper
Upon installation, you should have access to the court-scraper tool on the command line. Use the --help flag to view available sub-commands:
court-scraper --help
Note
See the usage docs for details on using court-scraper on the command line and in custom scripts.
Default cache directory¶
By default, files downloaded by the command-line tool will be saved to the .court-scraper folder in the user’s home directory. On Linux/Mac systems, this will be ~/.court-scraper/.
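The default location can be derived from the user’s home directory. A minimal sketch in Python (court-scraper’s internal path handling may differ):

```python
from pathlib import Path

# Default cache directory: a ".court-scraper" folder in the user's home.
# (Illustrative only; not court-scraper's actual internals.)
default_cache_dir = Path.home() / ".court-scraper"

print(default_cache_dir)  # e.g. /home/alice/.court-scraper on Linux
```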
Customize cache directory¶
To use an alternate cache directory, set the below environment variable (e.g. in a ~/.bashrc or ~/.bash_profile configuration file):
export COURT_SCRAPER_DIR=/tmp/some_other_dir
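The environment variable takes precedence over the default location. A hedged sketch of how a script might resolve the active cache directory (the variable name comes from the docs; the fallback logic here is an assumption, not court-scraper’s actual code):

```python
import os
from pathlib import Path

def resolve_cache_dir() -> Path:
    """Return COURT_SCRAPER_DIR if set, else the default ~/.court-scraper."""
    override = os.environ.get("COURT_SCRAPER_DIR")
    if override:
        return Path(override)
    return Path.home() / ".court-scraper"

# Simulate the export shown above:
os.environ["COURT_SCRAPER_DIR"] = "/tmp/some_other_dir"
print(resolve_cache_dir())  # /tmp/some_other_dir
```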
Configuration¶
Many court sites require user credentials to log in or present CAPTCHAs that must be handled using a paid, third-party service (court-scraper uses Anti-captcha).
Sensitive information such as user logins and the API key for a CAPTCHA service should be stored in a YAML configuration file called config.yaml. This file is expected to live inside the default storage location for scraped files, logs, etc. On Linux/Mac, the default location is ~/.court-scraper/config.yaml.
This configuration file must contain credentials for each location, keyed by a Place ID: a snake_case combination of state and county (e.g. ga_dekalb for DeKalb County, GA).
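For illustration, a Place ID can be built by lowercasing the state abbreviation and county name and joining them with an underscore. The helper below is hypothetical, not part of the library’s API:

```python
def make_place_id(state: str, county: str) -> str:
    # Hypothetical helper: snake_case combination of state and county.
    county_slug = county.strip().lower().replace(" ", "_")
    return f"{state.strip().lower()}_{county_slug}"

print(make_place_id("GA", "DeKalb"))       # ga_dekalb
print(make_place_id("NY", "Westchester"))  # ny_westchester
```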
Courts with a common software platform that allow sharing of credentials can inherit credentials from a single entry.
Here’s an example configuration file:
# ~/.court-scraper/config.yaml
captcha_service_api_key: 'YOUR_ANTICAPTCHA_KEY'
platforms:
  # Mark a platform user/pass for reuse in multiple sites
  odyssey_site: &ODYSSEY_SITE
    username: 'user@example.com'
    password: 'SECRET_PASS'
# Inherit platform credentials across multiple courts
ga_chatham: *ODYSSEY_SITE
ga_dekalb: *ODYSSEY_SITE
ga_fulton: *ODYSSEY_SITE
# Or simply set site-specific attributes
ny_westchester:
  username: 'user2@example.com'
  password: 'GREAT_PASSWORD'
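The sharing works through standard YAML anchors and aliases: &ODYSSEY_SITE defines an anchor and *ODYSSEY_SITE references it, so any YAML parser resolves the shared credentials for each aliased court. A quick check with PyYAML (assuming PyYAML is installed; this is not court-scraper’s own loading code):

```python
import yaml  # PyYAML; assumed available (e.g. pipenv install pyyaml)

config_text = """
captcha_service_api_key: 'YOUR_ANTICAPTCHA_KEY'
platforms:
  odyssey_site: &ODYSSEY_SITE
    username: 'user@example.com'
    password: 'SECRET_PASS'
ga_chatham: *ODYSSEY_SITE
ga_dekalb: *ODYSSEY_SITE
"""

config = yaml.safe_load(config_text)
# Each alias resolves to the mapping defined under platforms.
print(config["ga_dekalb"]["username"])  # user@example.com
```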
CAPTCHA-protected sites¶
court-scraper uses the Anti-captcha service to handle sites protected by CAPTCHAs. If you plan to scrape a CAPTCHA-protected site, register with the Anti-captcha service and obtain an API key. Then, add your API key to your local court-scraper configuration file as shown below:
# ~/.court-scraper/config.yaml
captcha_service_api_key: 'YOUR_API_KEY'
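Before scraping a CAPTCHA-protected site, it can help to verify that the key is actually present in the config file. A hedged sketch (the file path and key name come from the docs above; this validation helper is not part of court-scraper):

```python
from pathlib import Path
import tempfile

import yaml  # PyYAML; assumed available

def read_captcha_key(config_path: Path) -> str:
    """Return the Anti-captcha API key, raising a clear error if it is missing."""
    config = yaml.safe_load(config_path.read_text())
    key = (config or {}).get("captcha_service_api_key")
    if not key:
        raise KeyError(f"captcha_service_api_key not set in {config_path}")
    return key

# Example usage, with a temporary file standing in for ~/.court-scraper/config.yaml:
with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as f:
    f.write("captcha_service_api_key: 'YOUR_API_KEY'\n")
print(read_captcha_key(Path(f.name)))  # YOUR_API_KEY
```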
Once configured, you should be able to query CAPTCHA-protected sites currently supported by court-scraper.