Polars
Misc
Also see Python, Polars
Packages
- {polars}
- {tidypolars} - Allows one to use the tidyverse syntax while using the power of polars
- {polarssql} - Provides a polars backend for DBI and dbplyr.
Resources
as.vector(<series>) will remove attributes that might be useful. Therefore, it is recommended to use <series>$to_r_vector() instead for usual conversions.
Coerce an existing dataframe into a polars dataframe with as_polars_df, or as_polars_lf for a lazy dataframe.
To create a polars dataframe from an existing dataframe inside pl$DataFrame(), you need to splice it with !!!
    pl$DataFrame(!!!data.frame(x = 1, y = "a"))
    #> shape: (1, 2)
    #> ┌─────┬─────┐
    #> │ x   ┆ y   │
    #> │ --- ┆ --- │
    #> │ f64 ┆ str │
    #> ╞═════╪═════╡
    #> │ 1.0 ┆ a   │
    #> └─────┴─────┘
If you used to pass a vector of column names or a list of expressions, you need to expand it with !!!
    dat <- as_polars_df(head(mtcars, 3))
    my_exprs <- list(pl$col("drat") + 1, "mpg", "cyl")
    dat$select(!!!my_exprs)
    #> shape: (3, 3)
    #> ┌──────┬──────┬─────┐
    #> │ drat ┆ mpg  ┆ cyl │
    #> │ ---  ┆ ---  ┆ --- │
    #> │ f64  ┆ f64  ┆ f64 │
    #> ╞══════╪══════╪═════╡
    #> │ 4.9  ┆ 21.0 ┆ 6.0 │
    #> │ 4.9  ┆ 21.0 ┆ 6.0 │
    #> │ 4.85 ┆ 22.8 ┆ 4.0 │
    #> └──────┴──────┴─────┘

    pl$col(!!!c("foo", "bar"), "baz")
    #> cols(["foo", "bar", "baz"])
- Same with <lazyframe>$cast() and <lazyframe>$group_by()
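A minimal sketch of the same splicing pattern on a lazy frame. It assumes the same polars version as the notes above, and that $cast() accepts column-name = dtype pairs through its dots; the data (the first rows of mtcars) and the names new_types, group_cols and mean_mpg are illustrative only.

```r
library(polars)

# Illustrative data: the first six rows of the built-in mtcars data set
lf <- as_polars_lf(head(mtcars, 6))

# A named list of target dtypes, spliced into $cast() with !!!
# (assumed equivalent to lf$cast(cyl = pl$Int32, gear = pl$Int32))
new_types <- list(cyl = pl$Int32, gear = pl$Int32)
lf_cast <- lf$cast(!!!new_types)

# A character vector of grouping columns, spliced into $group_by()
group_cols <- c("cyl", "gear")
result <- lf_cast$
  group_by(!!!group_cols)$
  agg(pl$col("mpg")$mean()$alias("mean_mpg"))$
  collect()

result
```

Passing new_types or group_cols without !!! would hand the whole list or vector over as a single argument, which is the situation the note above warns about.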
Mutate
Example: 10-day and 50-day Moving Average (source)
    # Rolling means are computed within each stock via over("Stock")
    moving_average_pl <- long_pl$with_columns(
      pl$col("Price")$
        rolling_mean(10)$
        over("Stock")$
        alias("Price_MA10"),
      pl$col("Price")$
        rolling_mean(50)$
        over("Stock")$
        alias("Price_MA50")
    )

    moving_average_pl

    # Convert to a tibble and reshape to long format for plotting
    moving_average_pl |>
      as_tibble() |>
      tidyr::pivot_longer(
        cols = Price:Price_MA50
      ) |>
      group_by(Stock) |>
      plot_time_series(as_date(Date), value, .color_var = name, .facet_ncol = 4, .smooth = FALSE)
Summarize
Example: min, max, mean by group
    df <- pl$scan_csv(file_name)$
      group_by("state")$
      agg(
        pl$col("measurement")$
          min()$
          alias("min_m"),
        pl$col("measurement")$
          max()$
          alias("max_m"),
        pl$col("measurement")$
          mean()$
          alias("mean_m")
      )$
      collect()
Pivoting
Example: pivot_longer (source)
    long_pl <- stock_data_pl$unpivot(
      index = "Date",
      value_name = "Price",
      variable_name = "Stock"
    )

    long_pl

    long_pl |>
      as_tibble() |>
      group_by(Stock) |>
      timetk::plot_time_series(as_date(Date), Price, .facet_ncol = 4, .smooth = FALSE)
SQL
Example: min, max, mean by group
    lf <- pl$LazyFrame(D)

    pl$SQLContext(frame = lf)$
      execute(
        "select min(measurement) as min_m, max(measurement) as max_m, avg(measurement) as mean_m from frame group by state"
      )$
      collect()