Provides a {skimr}-style summary of an Arrow Dataset with statistics organized by variable type. Computes summary statistics efficiently using Arrow's query engine without loading the full dataset into memory.
Value
A list of class "skim_arrow" containing:
- overview: A tibble with dataset dimensions and column type counts
- numeric: A tibble with statistics for numeric columns (missing_pct, mean, sd, min, max)
- character: A tibble with statistics for character columns (missing_pct, n_unique)
- timestamp: A tibble with statistics for timestamp columns (missing_pct, min, max)
Details
The function classifies columns by type and computes appropriate summary statistics for each:
- Numeric columns: missing percentage, mean, standard deviation, min, max
- Character columns: missing percentage, number of unique values
- Timestamp columns: missing percentage, min, max (as POSIXct objects)
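The classification step can be illustrated with a minimal sketch that reads the Arrow schema and buckets each column by its type string. This is not the function's actual implementation: the example data frame, the `classify()` helper, and the regular expressions are assumptions made for illustration only.

```r
library(arrow)

# Hypothetical example data written to a temporary Parquet dataset
df <- data.frame(
  id = 1:3,
  price = c(1.5, NA, 3.0),
  label = c("a", "b", "a"),
  seen = as.POSIXct(c("2024-01-01", "2024-01-02", "2024-01-03"), tz = "UTC"),
  stringsAsFactors = FALSE
)
path <- tempfile()
write_dataset(df, path, format = "parquet")
ds <- open_dataset(path)

# Read each column's Arrow type from the schema; no data is loaded here
type_strings <- vapply(
  ds$schema$fields,
  function(f) f$type$ToString(),
  character(1)
)
names(type_strings) <- ds$schema$names

# Illustrative helper (an assumption, not part of the package):
# map an Arrow type string to one of the summary groups
classify <- function(t) {
  if (grepl("^(u?int(8|16|32|64)|halffloat|float|double|decimal)", t)) {
    "numeric"
  } else if (grepl("^(large_)?(string|utf8)", t)) {
    "character"
  } else if (grepl("^timestamp", t)) {
    "timestamp"
  } else {
    "other"
  }
}

col_class <- vapply(type_strings, classify, character(1))
print(col_class)
```

Working from the schema alone keeps this step cheap, since only metadata is inspected. Columns that fall into "other" in this sketch would simply be left out of the three per-type tables.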
All computations are performed using Arrow's query engine, making this function efficient even for very large datasets stored in Parquet files.
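As a rough sketch of what pushing the computation into Arrow can look like, the example below uses {dplyr} verbs on a Dataset so that only a one-row result is collected into R. The column names `price` and `label` and the example data are hypothetical, and the function itself may compute these statistics differently.

```r
library(arrow)
library(dplyr)

# Hypothetical example data written to a temporary Parquet dataset
df <- data.frame(
  price = c(1.5, NA, 3.0, 4.5),
  label = c("a", "b", "a", NA),
  stringsAsFactors = FALSE
)
path <- tempfile()
write_dataset(df, path, format = "parquet")
ds <- open_dataset(path)

# Numeric column: aggregates are evaluated by Arrow; collect() returns one row
numeric_stats <- ds %>%
  summarise(
    n = n(),
    n_missing = sum(as.integer(is.na(price))),
    mean = mean(price, na.rm = TRUE),
    sd = sd(price, na.rm = TRUE),
    min = min(price, na.rm = TRUE),
    max = max(price, na.rm = TRUE)
  ) %>%
  collect() %>%
  # The percentage is derived in R from the two collected counts
  mutate(missing_pct = 100 * n_missing / n)

# Character column: missing count and number of distinct non-missing values
character_stats <- ds %>%
  summarise(
    n = n(),
    n_missing = sum(as.integer(is.na(label))),
    n_unique = n_distinct(label, na.rm = TRUE)
  ) %>%
  collect() %>%
  mutate(missing_pct = 100 * n_missing / n)

print(numeric_stats)
print(character_stats)
```

Computing the counts in Arrow and the percentage in R afterwards is a conservative choice here: it avoids relying on arithmetic between aggregates inside `summarise()`, which may not be supported by every version of {arrow}.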