Polars
Misc
Also see Python, Polars
Packages
- {polars}
- {tidypolars} - Allows one to use the tidyverse syntax while using the power of polars
- {polarssql} - Provides a polars backend for DBI and dbplyr.
Resources
- as.vector(<series>) will remove attributes that might be useful. Therefore, it is recommended to use <series>$to_r_vector() instead for usual conversions.
- Coerce an existing dataframe into a polars dataframe: as_polars_df, or as_polars_lf for a lazy dataframe.
- For creating a df inside pl$DataFrame, you need !!!

    pl$DataFrame(!!!data.frame(x = 1, y = "a"))
    #> shape: (1, 2)
    #> ┌─────┬─────┐
    #> │ x   ┆ y   │
    #> │ --- ┆ --- │
    #> │ f64 ┆ str │
    #> ╞═════╪═════╡
    #> │ 1.0 ┆ a   │
    #> └─────┴─────┘

- If you used to pass a vector of column names or a list of expressions, you need to expand it with !!!

    dat <- as_polars_df(head(mtcars, 3))
    my_exprs <- list(pl$col("drat") + 1, "mpg", "cyl")
    dat$select(!!!my_exprs)
    #> shape: (3, 3)
    #> ┌──────┬──────┬─────┐
    #> │ drat ┆ mpg  ┆ cyl │
    #> │ ---  ┆ ---  ┆ --- │
    #> │ f64  ┆ f64  ┆ f64 │
    #> ╞══════╪══════╪═════╡
    #> │ 4.9  ┆ 21.0 ┆ 6.0 │
    #> │ 4.9  ┆ 21.0 ┆ 6.0 │
    #> │ 4.85 ┆ 22.8 ┆ 4.0 │
    #> └──────┴──────┴─────┘

    pl$col(!!!c("foo", "bar"), "baz")
    #> cols(["foo", "bar", "baz"])

  - Same with <lazyframe>$cast() and <lazyframe>$group_by()
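The conversion notes above can be sketched as follows. This is a minimal illustration, assuming a polars version in which as_polars_series() and $to_r_vector() are available; the toy values are made up for the example.

```r
library(polars)

# Series -> plain R vector; to_r_vector() is the conversion recommended above
s <- as_polars_series(c(1.5, 2.5, 3.5))
r_vec <- s$to_r_vector()

# data.frame -> eager DataFrame and lazy LazyFrame
df <- as_polars_df(head(mtcars, 3))
lf <- as_polars_lf(head(mtcars, 3))

# A LazyFrame only describes the query; collect() runs it
mpg_df <- lf$select("mpg")$collect()

# Back to a base R data.frame for use with non-polars code
mpg_r <- as.data.frame(mpg_df)
```

The lazy version is mainly useful when several steps are chained, since nothing is computed until collect() is called.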
Mutate
Example: 10-day and 50-day Moving Average (source)
    # Assumes tibble, dplyr, tidyr, lubridate, and timetk are installed and attached
    moving_average_pl <- long_pl$with_columns(
      pl$
        col("Price")$
        rolling_mean(10)$
        over("Stock")$
        alias("Price_MA10"),
      pl$
        col("Price")$
        rolling_mean(50)$
        over("Stock")$
        alias("Price_MA50")
    )
    moving_average_pl

    # Convert to a tibble and reshape to long format for plotting
    moving_average_pl |>
      as_tibble() |>
      tidyr::pivot_longer(
        cols = Price:Price_MA50
      ) |>
      group_by(Stock) |>
      plot_time_series(as_date(Date),
                       value,
                       .color_var = name,
                       .facet_ncol = 4,
                       .smooth = FALSE)
Summarize
Example: min, max, mean by group
    df <- pl$scan_csv(file_name)$
      group_by("state")$
      agg(
        pl$
          col("measurement")$
          min()$
          alias("min_m"),
        pl$
          col("measurement")$
          max()$
          alias("max_m"),
        pl$
          col("measurement")$
          mean()$
          alias("mean_m")
      )$
      collect()
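The example above depends on a CSV file. As a self-contained sketch of the same group-by aggregation, the version below runs on a small in-memory data frame; the toy data frame and its values are hypothetical and only stand in for the file.

```r
library(polars)

# Hypothetical toy data standing in for the CSV file
toy <- data.frame(
  state = c("a", "a", "b", "b"),
  measurement = c(1, 3, 10, 20)
)

res <- as_polars_lf(toy)$
  group_by("state")$
  agg(
    pl$col("measurement")$min()$alias("min_m"),
    pl$col("measurement")$max()$alias("max_m"),
    pl$col("measurement")$mean()$alias("mean_m")
  )$
  sort("state")$  # group order is not guaranteed, so sort for a stable result
  collect()

res_r <- as.data.frame(res)
```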
Pivoting
Example: pivot_longer (source)

    long_pl <- stock_data_pl$unpivot(
      index = "Date",
      value_name = "Price",
      variable_name = "Stock"
    )
    long_pl

    long_pl |>
      as_tibble() |>
      group_by(Stock) |>
      timetk::plot_time_series(as_date(Date),
                               Price,
                               .facet_ncol = 4,
                               .smooth = FALSE)
SQL
Example: min, max, mean by group
    lf <- pl$LazyFrame(D)

    pl$
      SQLContext(frame = lf)$
      execute(
        "select min(measurement) as min_m,
                max(measurement) as max_m,
                avg(measurement) as mean_m
         from frame
         group by state"
      )$
      collect()