| Title: | Access and Process 'LOBSTER' High-Frequency Data |
|---|---|
| Description: | Provides tools to authenticate with 'LOBSTER' (Limit Order Book System - The Efficient Reconstruction, <https://app.lobsterdata.com/>), request, download, and process high-frequency limit order book data. Streamlines the end-to-end workflow from data request to analysis-ready datasets. For advanced high-frequency econometric analysis, see the 'highfrequency' package. |
| Authors: | Karol Kulma [aut], Stefan Voigt [aut, cre, cph] |
| Maintainer: | Stefan Voigt <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-06-02 09:37:52 UTC |
| Source: | https://github.com/voigtstefan/lobster |
Fetches information about available datasets in your LOBSTER account archive. This includes details about symbols, date ranges, order book levels, file sizes, and download links for each available dataset.
account_archive(account)
| Argument | Description |
|---|---|
| account | list Output from |
The function navigates to the archive page of the authenticated session, scrapes the archive table, extracts download links, and returns a structured tibble. Only datasets with non-zero file sizes are included — datasets still being processed by LOBSTER will not appear until processing is complete.
A tibble with one row per available dataset and columns:
id: integer. Unique identifier for each dataset.
symbol: character. Stock/ETF ticker symbol (e.g., "SPY", "AAPL").
start_date: Date. First date of data coverage.
end_date: Date. Last date of data coverage.
level: integer. Order book depth (number of price levels).
size: integer. File size in bytes.
download: character. Direct download URL for the dataset.
Rows are ordered by id descending (most recently requested first).
Datasets with zero file size (not yet processed by LOBSTER) are excluded.
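The filtering and ordering rules described above can be illustrated on a mock table. This is a minimal base-R sketch, not the package's implementation: the `mock_archive` data frame, its values, and the example.com URLs are invented for illustration, and the column names are taken from the example output in this section.

```r
# Mock archive table; all values are invented for illustration.
mock_archive <- data.frame(
  id = c(100L, 101L, 102L, 103L),
  symbol = c("SPY", "MSFT", "AAPL", "AAPL"),
  start_date = as.Date(c("2022-12-01", "2023-01-03", "2023-01-03", "2023-01-04")),
  end_date = as.Date(c("2022-12-31", "2023-01-05", "2023-01-03", "2023-01-04")),
  level = c(10L, 2L, 1L, 1L),
  size = c(1048576L, 512000L, 204800L, 0L),  # id 103 is still being processed
  download = paste0("https://example.com/mock/", 100:103),  # placeholder URLs
  stringsAsFactors = FALSE
)

# Keep only datasets that have finished processing (non-zero size).
ready <- mock_archive[mock_archive$size > 0, ]

# Most recently requested first.
ready <- ready[order(ready$id, decreasing = TRUE), ]

ready$id
```

Under these assumptions the dataset with zero size drops out and the remaining rows are sorted from the highest id to the lowest.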
account_login(), data_download()
    ## Not run:
    acct <- account_login(
      login = Sys.getenv("LOBSTER_USER"),
      pwd = Sys.getenv("LOBSTER_PWD")
    )
    archive <- account_archive(acct)
    archive
    #> # A tibble: 3 × 7
    #>      id symbol start_date end_date   level    size download
    #>   <int> <chr>  <date>     <date>     <int>   <int> <chr>
    #> 1   102 AAPL   2023-01-03 2023-01-03     1  204800 https://…
    #> 2   101 MSFT   2023-01-03 2023-01-05     2  512000 https://…
    #> 3   100 SPY    2022-12-01 2022-12-31    10 1048576 https://…

    # Filter to a single symbol before downloading
    data_download(
      requested_data = archive[archive$symbol == "AAPL", ],
      account_login = acct,
      path = "data-lobster"
    )
    ## End(Not run)
Logs into your LOBSTER account and creates a session object for subsequent data requests. This function handles the authentication process with lobsterdata.com and validates the login was successful.
account_login(login, pwd)
| Argument | Description |
|---|---|
| login | character(1) Email address associated with the LOBSTER account. |
| pwd | character(1) Account password. |
The function submits the sign-in form using an AJAX header
(x-requested-with: XMLHttpRequest) and confirms success by checking the
redirect URL. Network connectivity and valid credentials are required.
Store credentials in your .Renviron file (usethis::edit_r_environ()) to avoid hardcoding them in scripts:

    LOBSTER_USER=you@example.com
    LOBSTER_PWD=your-password
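Reading the credentials back from the environment could look like the sketch below. `read_lobster_credentials()` is a hypothetical helper, not a function of this package, and the values set with `Sys.setenv()` are placeholders used only so the example runs on its own.

```r
# Hypothetical helper (not part of the package): read credentials from
# environment variables and fail early if either one is missing.
read_lobster_credentials <- function() {
  login <- Sys.getenv("LOBSTER_USER", unset = "")
  pwd <- Sys.getenv("LOBSTER_PWD", unset = "")
  if (!nzchar(login) || !nzchar(pwd)) {
    stop("Set LOBSTER_USER and LOBSTER_PWD in .Renviron and restart R.")
  }
  list(login = login, pwd = pwd)
}

# Demonstration with placeholder values set for this session only.
Sys.setenv(LOBSTER_USER = "user@example.com", LOBSTER_PWD = "your-password")
creds <- read_lobster_credentials()
names(creds)
```

The returned list has the same argument names as `account_login(login, pwd)`, so its elements can be passed on directly.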
A named list with components:
valid: logical(1). TRUE when authentication succeeded.
rvest session object used for further navigation.
rvest response returned after the sign-in form was submitted.
account_archive(), request_submit()
    ## Not run:
    acct <- account_login(
      login = Sys.getenv("LOBSTER_USER"),
      pwd = Sys.getenv("LOBSTER_PWD")
    )

    if (acct$valid) {
      # Retrieve available datasets in the archive
      archive <- account_archive(acct)

      # Build and submit a new data request
      req <- request_query("AAPL", "2023-01-03", "2023-01-05", level = 1)
      request_submit(acct, req)
    }
    ## End(Not run)
Download one or more files listed in requested_data using the
authenticated session in account_login. Files are written to path.
The file write and optional extraction are performed in a background R
process (via callr::r_bg()). If unzip = TRUE the original .7z archive
is removed after extraction.
data_download(requested_data, account_login, path = ".", unzip = TRUE)
| Argument | Description |
|---|---|
| requested_data | data.frame A tibble with archive metadata that must include at minimum a |
| account_login | list Output from |
| path | character(1) Directory where downloaded files will be written and (if |
| unzip | logical(1) If |
For each row in requested_data the function fetches the file content via
the authenticated session and spawns a background process to write and
optionally extract the file. Because extraction runs in the background, the
function returns before the files are fully written to disk.
Invisibly returns NULL. Files are written to path by background
R processes launched via callr::r_bg(). These processes are not
monitored after launch; verify that the expected files exist in path
before proceeding with analysis.
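Because the background processes are not monitored, one way to follow the advice above is to poll for the expected files before reading them. The sketch below uses only base R; `wait_for_files()` is a hypothetical helper, not part of this package, and a temporary file stands in for a downloaded dataset.

```r
# Hypothetical helper (not part of the package): poll until every expected
# file exists, or give up after `timeout` seconds.
wait_for_files <- function(files, timeout = 60, interval = 1) {
  deadline <- Sys.time() + timeout
  repeat {
    present <- file.exists(files)
    if (all(present)) return(invisible(TRUE))
    if (Sys.time() >= deadline) {
      stop("Timed out waiting for: ", paste(files[!present], collapse = ", "))
    }
    Sys.sleep(interval)
  }
}

# Demonstration with a temporary file standing in for a downloaded dataset.
demo_file <- tempfile(fileext = ".csv")
writeLines("placeholder", demo_file)
ok <- wait_for_files(demo_file, timeout = 5, interval = 0.1)
```

File existence alone does not prove that a write or extraction has finished; checking that file sizes have stopped changing would be a stricter test.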
account_login(), account_archive()
    ## Not run:
    acct <- account_login(Sys.getenv("LOBSTER_USER"), Sys.getenv("LOBSTER_PWD"))
    archive <- account_archive(acct)

    # Download all AAPL files to a local directory
    dir.create("data-lobster", showWarnings = FALSE)
    data_download(
      requested_data = archive[archive$symbol == "AAPL", ],
      account_login = acct,
      path = "data-lobster"
    )

    # Keep the raw .7z archives without extracting
    data_download(
      requested_data = archive,
      account_login = acct,
      path = "data-lobster",
      unzip = FALSE
    )
    ## End(Not run)
Construct a request describing which trading-day files to ask LOBSTER for. For each symbol and date range the function expands the range to one row per calendar day, converts the level to integer, and (optionally) validates the requested days by removing weekends, NYSE holidays and any days already present in the provided account archive.
request_query(symbol, start_date, end_date, level, validate = TRUE, account_archive = NULL, frequency = "1 day")
| Argument | Description |
|---|---|
| symbol | character vector One or more ticker symbols (e.g. |
| start_date | Date-like (Date or character) Start date(s) for the requested range(s). Converted with |
| end_date | Date-like (Date or character) End date(s) for the requested range(s). Converted with |
| level | integer(1) Required order-book snapshot level (e.g. |
| validate | logical(1) If |
| account_archive | data.frame or tibble, optional Archive table as returned by |
| frequency | character(1) Frequency string passed to |
This function performs no network activity. Use request_submit()
to send the generated request to an authenticated LOBSTER session.
A data.frame with one row per period and columns:
symbol: character
start_date: Date — start of the period (equal to end_date for daily
requests)
end_date: Date — end of the period
level: integer
When validate = TRUE and frequency = "1 day", weekend days and NYSE
holidays are silently removed, so the output typically contains fewer rows
than the full calendar span of the requested date range.
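The per-day expansion and weekend removal can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: it builds the four documented columns by hand and applies no holiday calendar, so a market holiday such as 2 January 2023 stays in the result.

```r
# Expand a date range to one entry per calendar day.
days <- seq(as.Date("2023-01-02"), as.Date("2023-01-08"), by = "1 day")

# as.POSIXlt()$wday is 0 for Sunday and 6 for Saturday, independent of locale.
is_weekend <- as.POSIXlt(days)$wday %in% c(0L, 6L)
weekdays_only <- days[!is_weekend]

# One row per remaining day; start_date equals end_date for daily requests.
request <- data.frame(
  symbol = "AAPL",
  start_date = weekdays_only,
  end_date = weekdays_only,
  level = 1L
)

nrow(request)
```

The seven calendar days reduce to five rows because Saturday 7 January and Sunday 8 January 2023 are dropped.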
request_submit(), account_archive(), account_login()
    # Single symbol, one-week range (weekends and holidays removed automatically)
    request_query("AAPL", "2023-01-02", "2023-01-06", level = 1)

    # Multiple symbols with paired date ranges
    request_query(
      symbol = c("AAPL", "MSFT"),
      start_date = c("2023-01-03", "2023-02-01"),
      end_date = c("2023-01-05", "2023-02-03"),
      level = 1
    )

    # Monthly frequency for a large date range (no per-day expansion)
    request_query(
      symbol = "SPY",
      start_date = "2022-01-01",
      end_date = "2022-12-31",
      level = 10,
      frequency = "1 month"
    )

    ## Not run:
    # Exclude days already in the archive to avoid duplicate requests
    acct <- account_login(Sys.getenv("LOBSTER_USER"), Sys.getenv("LOBSTER_PWD"))
    archive <- account_archive(acct)
    req <- request_query(
      symbol = "AAPL",
      start_date = "2023-01-02",
      end_date = "2023-01-31",
      level = 1,
      account_archive = archive
    )
    ## End(Not run)
Send the prepared request rows to lobsterdata.com using the authenticated
session contained in account_login. Each row in request is submitted
as a separate HTTP request. The function performs network side effects and
returns invisibly.
request_submit(account_login, request)
| Argument | Description |
|---|---|
| account_login | list Output from |
| request | data.frame A tibble as returned by |
Invisibly returns NULL. The primary effect is to queue requests on
the LOBSTER server; processing happens server-side and may take some time.
Use account_archive() afterwards to check when files become available.
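Checking which submitted rows have already appeared in the archive could be done along these lines. The sketch uses mock tables with invented values and assumes that archive entries match request rows one-to-one on `symbol`, `start_date`, `end_date`, and `level`; `row_key()` is a hypothetical helper, not part of this package.

```r
# Mock request and archive tables; all values are invented for illustration.
request <- data.frame(
  symbol = "AAPL",
  start_date = as.Date(c("2023-01-03", "2023-01-04", "2023-01-05")),
  end_date = as.Date(c("2023-01-03", "2023-01-04", "2023-01-05")),
  level = 1L
)
archive <- data.frame(
  symbol = "AAPL",
  start_date = as.Date("2023-01-03"),
  end_date = as.Date("2023-01-03"),
  level = 1L
)

# Hypothetical helper: build a comparable key from the four shared columns.
row_key <- function(df) {
  paste(df$symbol, df$start_date, df$end_date, df$level, sep = "|")
}

# Requests without a matching archive row are still being processed.
available <- row_key(request) %in% row_key(archive)
pending <- request[!available, ]

pending$start_date
```

With real data, `archive` would come from a fresh `account_archive()` call, repeated until `pending` has no rows left.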
account_login(), request_query(), account_archive()
    ## Not run:
    acct <- account_login(
      login = Sys.getenv("LOBSTER_USER"),
      pwd = Sys.getenv("LOBSTER_PWD")
    )

    # Build a request and submit it
    req <- request_query("AAPL", "2023-01-03", "2023-01-05", level = 1)
    request_submit(acct, req)

    # LOBSTER processes the request server-side; this may take several minutes.
    # Once done, the files appear in the account archive.
    archive <- account_archive(acct)
    ## End(Not run)