Drop duplicated rows

Description

Drop duplicated rows from a DataFrame, optionally using only a subset of columns to identify duplicates.

Usage

<DataFrame>$unique(subset = NULL, ..., keep = "any", maintain_order = FALSE)

Arguments

subset A character vector with the names of the column(s) to use to identify duplicates. If NULL (default), use all columns.
... Not used.
keep Which of the duplicate rows to keep:
  • "any" (default): Gives no guarantee about which row is kept. This allows more optimizations.
  • "first": Keep the first row of each set of duplicates.
  • "last": Keep the last row of each set of duplicates.
  • "none": Drop every row that has a duplicate, keeping only rows that occur exactly once.
maintain_order Keep the same order as the original data. Setting this to TRUE makes the operation more expensive to compute and prevents it from running on the streaming engine.

Value

DataFrame

Examples

library(polars)

df = pl$DataFrame(
  x = c(1:3, 1:3, 3:1, 1L),
  y = c(1:3, 1:3, 1:3, 1L)
)
df$height
#> [1] 10
df$unique()$height
#> [1] 5
# use a subset of columns to identify duplicates and keep only the last
# or first row of each set (row order is not guaranteed by default)
df$unique(subset = "x", keep = "last")
#> shape: (3, 2)
#> ┌─────┬─────┐
#> │ x   ┆ y   │
#> │ --- ┆ --- │
#> │ i32 ┆ i32 │
#> ╞═════╪═════╡
#> │ 2   ┆ 2   │
#> │ 1   ┆ 1   │
#> │ 3   ┆ 1   │
#> └─────┴─────┘
df$unique(subset = "x", keep = "first")
#> shape: (3, 2)
#> ┌─────┬─────┐
#> │ x   ┆ y   │
#> │ --- ┆ --- │
#> │ i32 ┆ i32 │
#> ╞═════╪═════╡
#> │ 3   ┆ 3   │
#> │ 2   ┆ 2   │
#> │ 1   ┆ 1   │
#> └─────┴─────┘
# keep only rows that have no duplicates
df$unique(keep = "none")
#> shape: (2, 2)
#> ┌─────┬─────┐
#> │ x   ┆ y   │
#> │ --- ┆ --- │
#> │ i32 ┆ i32 │
#> ╞═════╪═════╡
#> │ 3   ┆ 1   │
#> │ 1   ┆ 3   │
#> └─────┴─────┘
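
The examples above leave maintain_order at its default of FALSE, so the row order of their output is not guaranteed. As a minimal sketch on the same data, combining maintain_order = TRUE with keep = "first" should return the retained rows in their original order, at the extra cost described under Arguments:

```r
library(polars)

df = pl$DataFrame(
  x = c(1:3, 1:3, 3:1, 1L),
  y = c(1:3, 1:3, 1:3, 1L)
)

# keep the first row for each value of x, preserving the original row order
out = df$unique(subset = "x", keep = "first", maintain_order = TRUE)
out$height
#> [1] 3
```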