Join DataFrames

Description

This function can do both mutating joins (adding columns from other based on matching observations, for example with how = "left") and filtering joins (keeping or dropping observations of the left DataFrame based on whether they have a match, without adding columns, for example with how = "semi" or how = "anti").

Usage

<DataFrame>$join(
  other,
  on = NULL,
  how = c("inner", "left", "outer", "semi", "anti", "cross", "outer_coalesce"),
  ...,
  left_on = NULL,
  right_on = NULL,
  suffix = "_right",
  validate = "m:m",
  join_nulls = FALSE,
  allow_parallel = TRUE,
  force_parallel = FALSE
)

Arguments

other DataFrame to join with.
on Either a vector of column names or a list of expressions and/or strings. Use left_on and right_on if the column names to match on are different between the two DataFrames.
how One of the following methods: "inner", "left", "outer", "semi", "anti", "cross", "outer_coalesce".
... Ignored.
left_on, right_on Same as on but only for the left or the right DataFrame. They must have the same length.
suffix Suffix to add to duplicated column names.
validate Checks if join is of specified type:
  • "m:m" (default): many-to-many, doesn't perform any checks;
  • "1:1": one-to-one, checks that join keys are unique in both the left and right DataFrames;
  • "1:m": one-to-many, checks that join keys are unique in the left DataFrame;
  • "m:1": many-to-one, checks that join keys are unique in the right DataFrame.
Note that validation is currently not supported by the streaming engine, and only works when joining on a single column.
join_nulls Join on null values. By default null values will never produce matches.
allow_parallel Allow the physical plan to optionally evaluate the computation of both DataFrames up to the join in parallel.
force_parallel Force the physical plan to evaluate the computation of both DataFrames up to the join in parallel.
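
The arguments left_on, right_on, suffix and join_nulls can be sketched as follows. This is a minimal illustration, not output taken from the package documentation: the data and the names orders, users, left and right are hypothetical, and only arguments listed in the Usage section above are used.

```r
library(polars)

# Hypothetical data: the key is "id" on the left and "user_id" on the right,
# and both DataFrames have a column called "amount".
orders = pl$DataFrame(id = c(1L, 2L, 3L), amount = c(10, 20, 30))
users = pl$DataFrame(user_id = c(1L, 2L, 4L), amount = c(0.5, 0.7, 0.9))

# Left join on differently named keys; the duplicated "amount" column
# coming from `users` receives the suffix "_users".
joined = orders$join(
  other = users,
  left_on = "id",
  right_on = "user_id",
  how = "left",
  suffix = "_users"
)
joined$columns

# Hypothetical data with a null key on both sides.
left = pl$DataFrame(key = c(1L, NA), l = c("a", "b"))
right = pl$DataFrame(key = c(NA, 2L), r = c("x", "y"))

# Default: null keys never match, so this inner join has no rows.
no_null_match = left$join(other = right, on = "key", how = "inner")

# With join_nulls = TRUE, the two null keys are treated as a match.
null_match = left$join(other = right, on = "key", how = "inner", join_nulls = TRUE)
```

Because user_id is unique in users, the left join keeps exactly one row per row of orders; rows without a match get null in the columns taken from users.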

Value

DataFrame

Examples

library(polars)

# inner join by default
df1 = pl$DataFrame(list(key = 1:3, payload = c("f", "i", NA)))
df2 = pl$DataFrame(list(key = c(3L, 4L, 5L, NA_integer_)))
df1$join(other = df2, on = "key")
#> shape: (1, 2)
#> ┌─────┬─────────┐
#> │ key ┆ payload │
#> │ --- ┆ ---     │
#> │ i32 ┆ str     │
#> ╞═════╪═════════╡
#> │ 3   ┆ null    │
#> └─────┴─────────┘
# cross join
df1 = pl$DataFrame(x = letters[1:3])
df2 = pl$DataFrame(y = 1:4)
df1$join(other = df2, how = "cross")
#> shape: (12, 2)
#> ┌─────┬─────┐
#> │ x   ┆ y   │
#> │ --- ┆ --- │
#> │ str ┆ i32 │
#> ╞═════╪═════╡
#> │ a   ┆ 1   │
#> │ a   ┆ 2   │
#> │ a   ┆ 3   │
#> │ a   ┆ 4   │
#> │ b   ┆ 1   │
#> │ …   ┆ …   │
#> │ b   ┆ 4   │
#> │ c   ┆ 1   │
#> │ c   ┆ 2   │
#> │ c   ┆ 3   │
#> │ c   ┆ 4   │
#> └─────┴─────┘
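
The filtering joins listed under how ("semi" and "anti") can be sketched in the same way. The data below is made up for illustration and the output is deliberately not shown.

```r
library(polars)

# Hypothetical data: only key 3 appears in both DataFrames.
df1 = pl$DataFrame(key = 1:3, payload = c("a", "b", "c"))
df2 = pl$DataFrame(key = c(3L, 4L))

# semi join: keep the rows of df1 whose key appears in df2;
# no columns from df2 are added
semi = df1$join(other = df2, on = "key", how = "semi")

# anti join: keep the rows of df1 whose key does not appear in df2
anti = df1$join(other = df2, on = "key", how = "anti")

semi$height
anti$height
```

Both results have the same columns as df1; together they partition its rows into those with a match in df2 and those without.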