Usage

court-scraper provides a command-line tool and underlying Python library that can be used to scrape data about court cases. The command-line tool supports basic search by case numbers. The Python library offers a wider range of options for more advanced use cases (e.g. filtering search results by case type).

Our project focuses on scraping data from platforms used by county-level courts. These platforms vary in features. Some only offer basic search by party or case number, whereas others support advanced search by one or more parameters such as date range and case type.

Wherever possible, court-scraper attempts to provide support for search by:

  • date range - to enable automated discovery of new cases and backfilling of previous cases

  • case type - to enable more targeted scrapes in combination with date range

  • case number - to enable ongoing updates of open cases

The library is currently focused on acquiring raw file artifacts (e.g. HTML and JSON files containing case data). court-scraper does not automate the extraction and standardization of data from these raw files.

Note

We hope to eventually provide tools to help with data extraction and standardization. However, due to the wide variability of case types even within a single platform, this effort remains on our long-term roadmap. We welcome contributions on this front!

Find a court to scrape

Before you can start scraping court records, you must first pinpoint a county of interest and check whether we currently support it.

Use the command-line tool’s info sub-command to list currently supported counties.

If you don’t see the state or county you’re targeting, it’s worth checking out our Issue tracker to see if it’s on the roadmap. In some cases, we may be actively working on adding support for your jurisdiction. We also have a stable of scrapers that were written by others for project-specific purposes and contributed to our project for integration into our more general framework. We can provide access to these “one-off” scrapers for your customization, even if we have not yet integrated them into court-scraper.

Place IDs

court-scraper requires searches to target courts/jurisdictions in specific counties. Every jurisdiction supported by the framework has a so-called Place ID. These unique identifiers are in “snake case” format (i.e. lower case with underscores): <state_postal>_<county_name>.

For example, the Place ID for Tulsa, Oklahoma is ok_tulsa.

Whether working with the Command line or Custom scripts, you’ll need to identify the Place ID for the target jurisdiction. You can use the command-line tool’s info sub-command to find the Place ID for your jurisdiction.
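For example, you could filter the info listing with standard shell tools to locate a county. This is only a sketch; the exact output format may vary, so adjust the pattern as needed:

court-scraper info | grep -i tulsa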

Command line

Note

Before using the command-line tool, check out the install docs and read up on finding a court site to scrape.

The command-line tool helps pinpoint counties currently supported by court-scraper and enables scraping case files by number.

Use the --help flag to view available sub-commands:

court-scraper --help

Info command

The info sub-command lists the currently supported counties:

court-scraper info

Note

See find a site for advice if your jurisdiction is not among those listed.

Browser mode

Scrapers that use Selenium to drive a web browser run in “headless” mode by default (i.e. the browser does not open visibly). To run a Selenium-based scraper with a visible browser, which can be helpful for debugging, use the --with-browser flag:

court-scraper search --with-browser --place-id wi_green_lake --case-number 2021CV000055
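Omit the flag to run the same search headless (the default behavior):

court-scraper search --place-id wi_green_lake --case-number 2021CV000055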

File storage

Files scraped by the search sub-command are saved to a standard – but configurable – location in the user’s home directory, based on the court’s Place ID (~/.court-scraper/cache/<place_id> on Linux/Mac).

For example, HTML files scraped for Tulsa, Oklahoma are stored in ~/.court-scraper/cache/ok_tulsa.
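For example, on Linux/Mac you can list the files scraped for Tulsa, Oklahoma with:

ls ~/.court-scraper/cache/ok_tulsa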

Metadata db

The search sub-command stores basic metadata about scraped cases in a SQLite database located in the standard cache directory: ~/.court-scraper/cases.db.

The database can be helpful for quickly checking which cases have been scraped.

It stores the following fields:

  • created (datetime) - The date and time the case was initially scraped.

  • updated (datetime) - The date and time of the most recent scrape for the case.

  • place_id (str) - The state postal code and county name in “snake case” (e.g. ok_tulsa).

  • number (str) - The case number.

  • filing_date (date) - The filing date of the case (if available).

  • status (str) - Case status (if available).
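As a quick illustration, you can inspect the database with Python’s built-in sqlite3 module. The table name is not documented here, so this sketch lists the tables first and uses a hypothetical cases table in the example query; adjust it to match the name you see:

import sqlite3
from pathlib import Path

# Connect to the metadata database created by the search sub-command
db_path = Path.home() / '.court-scraper' / 'cases.db'
conn = sqlite3.connect(db_path)

# List the tables so you can confirm the actual table name
print(conn.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall())

# Hypothetical query using the documented fields; replace 'cases' with the
# table name printed above
for row in conn.execute("SELECT place_id, number, filing_date, status FROM cases"):
    print(row)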

Custom scripts

court-scraper provides an importable Python package for users who are comfortable creating their own scripts. The Python package provides access to a wider variety of features for added flexibility and more advanced scenarios such as searching by date and filtering by case type.

Note

Court websites offer different search functionality, so it’s important to review the site and its corresponding Site class (and search methods) in this library to get a sense of supported features.

Scrape case details by number

Once you install court-scraper and find a site to scrape, you’re ready to begin using the court_scraper Python package.

Create an instance of Site by passing it the Place ID for the jurisdiction. Then call the search method with one or more case numbers:

from court_scraper import Site
site = Site('ok_tulsa')
case_numbers = ['CJ-2021-1904', 'CJ-2021-1905']
results = site.search(case_numbers=case_numbers)

Note

Site provides a generic interface to simplify import and configuration of platform-specific Site classes, such as court_scraper.platforms.oscn.site.Site. Platform Site classes typically have varying options for initialization and search, so it’s a good idea to review their options when using this generic Site class.
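One way to review a platform class’s options is to inspect its signatures directly. This sketch uses the OSCN class mentioned above and assumes it exposes a search method like the generic Site:

import inspect

from court_scraper.platforms.oscn.site import Site as OscnSite

# Print the supported initialization and search arguments
print(inspect.signature(OscnSite.__init__))
print(inspect.signature(OscnSite.search))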

Scrape by date

Some court sites support date-based search. In such cases, you can use the platform’s Site.search_by_date method to scrape data for one or more days.

By default, search_by_date only gathers the case metadata (e.g. case numbers, filing dates, and status) that typically appears on a results page after performing a search.

Note

See below for details on scraping case detail file artifacts (e.g. HTML, JSON, etc.).

To scrape case metadata for the current day:

from court_scraper import Site
site = Site('ok_tulsa')
results = site.search_by_date()

To search a range of dates, use the start_date and end_date arguments. Their values must be strings of the form YYYY-MM-DD. The below code scrapes metadata for cases filed in Tulsa, Oklahoma during January 2021:

from court_scraper import Site
site = Site('ok_tulsa')
results = site.search_by_date(start_date='2021-01-01', end_date='2021-01-31')

Scrape case details

Court sites typically provide more detailed case information on separate pages devoted to a case. Depending on the site, these pages can include:

  • Case type

  • Case status

  • Litigant information (i.e. names and addresses)

  • Judge name(s)

  • Events related to the case (e.g. filings and decisions)

Links to case detail pages are listed on a results page after conducting a search. The detail pages are typically HTML, but may be JSON or other formats depending on the site.

By default, search_by_date only scrapes metadata from search results pages (as described in Scrape by date).

To scrape case detail files, pass the case_details=True keyword argument:

from court_scraper import Site
site = Site('ok_tulsa')
results = site.search_by_date(
    start_date='2021-01-01',
    end_date='2021-01-31',
    case_details=True # Fetches case detail files
)

Filter by case type

Some court sites support a variety of parameters for more targeted filtering of search results. These filters can be useful for more surgical scrapes, and in scenarios where a site truncates results. If a site limits search results to 500 records, for example, scraping in a more targeted way with filters can help stay under that cap.

To determine if a site supports case-type filtering, you should review the court’s website and the corresponding Site class in court-scraper.

For example, the Wisconsin court system’s Advanced Search page offers a variety of additional search parameters. In court-scraper, the site’s corresponding search_by_date method supports a case_types argument that accepts a list of one or more case types.

Note

For Wisconsin, these case types are two-letter, upper-case codes that can be found by examining the source code for the Case types select menu on the Advanced Search page.

Here’s a sample usage that searches for civil (CV) and small claims (SC) cases on July 1, 2021 in Milwaukee, WI:

from court_scraper import Site
site = Site('wi_milwaukee')
results = site.search_by_date(
    start_date='2021-07-01',
    end_date='2021-07-01',
    case_types=['CV', 'SC'] # Civil and Small Claims case types
)
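Assuming the Wisconsin site’s search_by_date also accepts the case_details flag described earlier, you could combine the two arguments to fetch detail files for only the selected case types; review the corresponding Site class to confirm this before relying on it:

from court_scraper import Site
site = Site('wi_milwaukee')
results = site.search_by_date(
    start_date='2021-07-01',
    end_date='2021-07-01',
    case_types=['CV', 'SC'],
    case_details=True  # Assumes this flag is supported alongside case_types
)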