DFplyr
DFplyr is an R package available from the Bioconductor repository. It can be installed with BiocManager::install():
BiocManager::install("DFplyr")
DFplyr is inspired by dplyr, which implements a wide variety of common data manipulations (mutate, select, filter) but only operates on objects of class data.frame or tibble (from the CRAN package tibble).
When working with S4Vectors DataFrames, which are frequently used as components of, for example, SummarizedExperiment objects, a common workaround is to convert the DataFrame to a tibble, manipulate the contents with dplyr functions, and then convert the result back to a DataFrame.
This has several drawbacks: tibble does not support row names (and dplyr frequently does not preserve them), tibble does not support S4 columns (e.g. IRanges vectors), and the back-and-forth conversion is required every time a manipulation is desired.
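The round trip described above might look like the following minimal sketch. It assumes that S4Vectors, tibble, and dplyr are installed; the column name `car`, used to carry the row names through the tibble, is a hypothetical choice and not part of any package.

```r
library(S4Vectors)

# A small DataFrame with row names (car models), built from mtcars
d <- as(mtcars[, c("cyl", "hp")], "DataFrame")

# 1. Convert to a tibble; the row names have to be stashed in a column
#    ("car" is a hypothetical name) because tibbles do not keep them
tbl <- tibble::as_tibble(as.data.frame(d), rownames = "car")

# 2. Manipulate the contents with dplyr
tbl <- dplyr::mutate(tbl, newvar = cyl + hp)

# 3. Convert back to a DataFrame and restore the row names by hand
d2 <- as(as.data.frame(tbl), "DataFrame")
rownames(d2) <- d2$car
d2$car <- NULL
```

Every manipulation needs all three steps, and this sketch only works because both columns are plain numeric vectors; S4 columns are not supported by tibble, so they could not take this route.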
To begin with, we create an S4Vectors DataFrame, including some S4 columns:
library(DFplyr)
library(S4Vectors)
m <- mtcars[, c("cyl", "hp", "am", "gear", "disp")]
d <- as(m, "DataFrame")
d$grX <- GenomicRanges::GRanges("chrX", IRanges::IRanges(1:32, width = 10))
d$grY <- GenomicRanges::GRanges("chrY", IRanges::IRanges(1:32, width = 10))
d$nl <- IRanges::NumericList(lapply(d$gear, function(n) round(rnorm(n), 2)))
d
#> DataFrame with 32 rows and 8 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Mazda RX4 6 110 1 4 160 chrX:1-10
#> Mazda RX4 Wag 6 110 1 4 160 chrX:2-11
#> Datsun 710 4 93 1 4 108 chrX:3-12
#> Hornet 4 Drive 6 110 0 3 258 chrX:4-13
#> Hornet Sportabout 8 175 0 3 360 chrX:5-14
#> ... ... ... ... ... ... ...
#> Lotus Europa 4 113 1 5 95.1 chrX:28-37
#> Ford Pantera L 8 264 1 5 351.0 chrX:29-38
#> Ferrari Dino 6 175 1 5 145.0 chrX:30-39
#> Maserati Bora 8 335 1 5 301.0 chrX:31-40
#> Volvo 142E 4 109 1 4 121.0 chrX:32-41
#> grY nl
#> <GRanges> <CompressedNumericList>
#> Mazda RX4 chrY:1-10 1.15,-1.20,-0.67,...
#> Mazda RX4 Wag chrY:2-11 0.94, 0.09,-1.36,...
#> Datsun 710 chrY:3-12 0.40,-1.36,-0.39,...
#> Hornet 4 Drive chrY:4-13 0.64,-1.52, 0.59
#> Hornet Sportabout chrY:5-14 -0.28,-0.83, 0.43
#> ... ... ...
#> Lotus Europa chrY:28-37 -1.84,-2.29,-0.55,...
#> Ford Pantera L chrY:29-38 0.22,1.17,0.07,...
#> Ferrari Dino chrY:30-39 0.71,-0.06, 0.26,...
#> Maserati Bora chrY:31-40 -0.32,-0.17,-1.91,...
#> Volvo 142E chrY:32-41 -0.81,-0.77, 0.68,...
This will appear in RStudio’s environment pane as a Formal class DataFrame (dplyr-compatible) when using DFplyr. No interference with the actual object is required, but this helps identify that dplyr-compatibility is available.
DataFrames can then be used in dplyr-like calls in the same way as data.frame or tibble objects. S4 columns are supported provided that appropriate methods exist for them. Adding multiple columns will result in the new columns being created in alphabetical order. For example, adding a new column newvar which is the sum of the cyl and hp columns:
mutate(d, newvar = cyl + hp)
#> DataFrame with 32 rows and 9 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Mazda RX4 6 110 1 4 160 chrX:1-10
#> Mazda RX4 Wag 6 110 1 4 160 chrX:2-11
#> Datsun 710 4 93 1 4 108 chrX:3-12
#> Hornet 4 Drive 6 110 0 3 258 chrX:4-13
#> Hornet Sportabout 8 175 0 3 360 chrX:5-14
#> ... ... ... ... ... ... ...
#> Lotus Europa 4 113 1 5 95.1 chrX:28-37
#> Ford Pantera L 8 264 1 5 351.0 chrX:29-38
#> Ferrari Dino 6 175 1 5 145.0 chrX:30-39
#> Maserati Bora 8 335 1 5 301.0 chrX:31-40
#> Volvo 142E 4 109 1 4 121.0 chrX:32-41
#> grY nl newvar
#> <GRanges> <CompressedNumericList> <numeric>
#> Mazda RX4 chrY:1-10 1.15,-1.20,-0.67,... 116
#> Mazda RX4 Wag chrY:2-11 0.94, 0.09,-1.36,... 116
#> Datsun 710 chrY:3-12 0.40,-1.36,-0.39,... 97
#> Hornet 4 Drive chrY:4-13 0.64,-1.52, 0.59 116
#> Hornet Sportabout chrY:5-14 -0.28,-0.83, 0.43 183
#> ... ... ... ...
#> Lotus Europa chrY:28-37 -1.84,-2.29,-0.55,... 117
#> Ford Pantera L chrY:29-38 0.22,1.17,0.07,... 272
#> Ferrari Dino chrY:30-39 0.71,-0.06, 0.26,... 181
#> Maserati Bora chrY:31-40 -0.32,-0.17,-1.91,... 343
#> Volvo 142E chrY:32-41 -0.81,-0.77, 0.68,... 113
or doubling the nl column as nl2:
mutate(d, nl2 = nl * 2)
#> DataFrame with 32 rows and 9 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Mazda RX4 6 110 1 4 160 chrX:1-10
#> Mazda RX4 Wag 6 110 1 4 160 chrX:2-11
#> Datsun 710 4 93 1 4 108 chrX:3-12
#> Hornet 4 Drive 6 110 0 3 258 chrX:4-13
#> Hornet Sportabout 8 175 0 3 360 chrX:5-14
#> ... ... ... ... ... ... ...
#> Lotus Europa 4 113 1 5 95.1 chrX:28-37
#> Ford Pantera L 8 264 1 5 351.0 chrX:29-38
#> Ferrari Dino 6 175 1 5 145.0 chrX:30-39
#> Maserati Bora 8 335 1 5 301.0 chrX:31-40
#> Volvo 142E 4 109 1 4 121.0 chrX:32-41
#> grY nl nl2
#> <GRanges> <CompressedNumericList> <CompressedNumericList>
#> Mazda RX4 chrY:1-10 1.15,-1.20,-0.67,... 2.30,-2.40,-1.34,...
#> Mazda RX4 Wag chrY:2-11 0.94, 0.09,-1.36,... 1.88, 0.18,-2.72,...
#> Datsun 710 chrY:3-12 0.40,-1.36,-0.39,... 0.80,-2.72,-0.78,...
#> Hornet 4 Drive chrY:4-13 0.64,-1.52, 0.59 1.28,-3.04, 1.18
#> Hornet Sportabout chrY:5-14 -0.28,-0.83, 0.43 -0.56,-1.66, 0.86
#> ... ... ... ...
#> Lotus Europa chrY:28-37 -1.84,-2.29,-0.55,... -3.68,-4.58,-1.10,...
#> Ford Pantera L chrY:29-38 0.22,1.17,0.07,... 0.44,2.34,0.14,...
#> Ferrari Dino chrY:30-39 0.71,-0.06, 0.26,... 1.42,-0.12, 0.52,...
#> Maserati Bora chrY:31-40 -0.32,-0.17,-1.91,... -0.64,-0.34,-3.82,...
#> Volvo 142E chrY:32-41 -0.81,-0.77, 0.68,... -1.62,-1.54, 1.36,...
or calculating the lengths() of the nl column cells as length_nl:
mutate(d, length_nl = lengths(nl))
#> DataFrame with 32 rows and 9 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Mazda RX4 6 110 1 4 160 chrX:1-10
#> Mazda RX4 Wag 6 110 1 4 160 chrX:2-11
#> Datsun 710 4 93 1 4 108 chrX:3-12
#> Hornet 4 Drive 6 110 0 3 258 chrX:4-13
#> Hornet Sportabout 8 175 0 3 360 chrX:5-14
#> ... ... ... ... ... ... ...
#> Lotus Europa 4 113 1 5 95.1 chrX:28-37
#> Ford Pantera L 8 264 1 5 351.0 chrX:29-38
#> Ferrari Dino 6 175 1 5 145.0 chrX:30-39
#> Maserati Bora 8 335 1 5 301.0 chrX:31-40
#> Volvo 142E 4 109 1 4 121.0 chrX:32-41
#> grY nl length_nl
#> <GRanges> <CompressedNumericList> <integer>
#> Mazda RX4 chrY:1-10 1.15,-1.20,-0.67,... 4
#> Mazda RX4 Wag chrY:2-11 0.94, 0.09,-1.36,... 4
#> Datsun 710 chrY:3-12 0.40,-1.36,-0.39,... 4
#> Hornet 4 Drive chrY:4-13 0.64,-1.52, 0.59 3
#> Hornet Sportabout chrY:5-14 -0.28,-0.83, 0.43 3
#> ... ... ... ...
#> Lotus Europa chrY:28-37 -1.84,-2.29,-0.55,... 5
#> Ford Pantera L chrY:29-38 0.22,1.17,0.07,... 5
#> Ferrari Dino chrY:30-39 0.71,-0.06, 0.26,... 5
#> Maserati Bora chrY:31-40 -0.32,-0.17,-1.91,... 5
#> Volvo 142E chrY:32-41 -0.81,-0.77, 0.68,... 4
Transformations can involve S4-related functions, such as extracting the seqnames(), strand(), and end() of the grX column:
mutate(d,
    chr = GenomeInfoDb::seqnames(grX),
    strand_X = BiocGenerics::strand(grX),
    end_X = BiocGenerics::end(grX)
)
#> DataFrame with 32 rows and 11 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Mazda RX4 6 110 1 4 160 chrX:1-10
#> Mazda RX4 Wag 6 110 1 4 160 chrX:2-11
#> Datsun 710 4 93 1 4 108 chrX:3-12
#> Hornet 4 Drive 6 110 0 3 258 chrX:4-13
#> Hornet Sportabout 8 175 0 3 360 chrX:5-14
#> ... ... ... ... ... ... ...
#> Lotus Europa 4 113 1 5 95.1 chrX:28-37
#> Ford Pantera L 8 264 1 5 351.0 chrX:29-38
#> Ferrari Dino 6 175 1 5 145.0 chrX:30-39
#> Maserati Bora 8 335 1 5 301.0 chrX:31-40
#> Volvo 142E 4 109 1 4 121.0 chrX:32-41
#> grY nl chr end_X strand_X
#> <GRanges> <CompressedNumericList> <Rle> <integer> <Rle>
#> Mazda RX4 chrY:1-10 1.15,-1.20,-0.67,... chrX 10 *
#> Mazda RX4 Wag chrY:2-11 0.94, 0.09,-1.36,... chrX 11 *
#> Datsun 710 chrY:3-12 0.40,-1.36,-0.39,... chrX 12 *
#> Hornet 4 Drive chrY:4-13 0.64,-1.52, 0.59 chrX 13 *
#> Hornet Sportabout chrY:5-14 -0.28,-0.83, 0.43 chrX 14 *
#> ... ... ... ... ... ...
#> Lotus Europa chrY:28-37 -1.84,-2.29,-0.55,... chrX 37 *
#> Ford Pantera L chrY:29-38 0.22,1.17,0.07,... chrX 38 *
#> Ferrari Dino chrY:30-39 0.71,-0.06, 0.26,... chrX 39 *
#> Maserati Bora chrY:31-40 -0.32,-0.17,-1.91,... chrX 40 *
#> Volvo 142E chrY:32-41 -0.81,-0.77, 0.68,... chrX 41 *
The object returned remains a standard DataFrame, and further calls can be piped with %>%, in this case extracting the newly created newvar column:
mutate(d, newvar = cyl + hp) %>%
    pull(newvar)
#> [1] 116 116 97 116 183 111 253 66 99 129 129 188 188 188 213 223 238 70 56
#> [20] 69 101 158 158 253 183 70 95 117 272 181 343 113
Some variants of the dplyr verbs also work, such as transforming the numeric columns using a formula-style (purrr-style) lambda function, in this case squaring them:
mutate_if(d, is.numeric, ~ .^2)
#> DataFrame with 32 rows and 8 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Mazda RX4 36 12100 1 16 25600 chrX:1-10
#> Mazda RX4 Wag 36 12100 1 16 25600 chrX:2-11
#> Datsun 710 16 8649 1 16 11664 chrX:3-12
#> Hornet 4 Drive 36 12100 0 9 66564 chrX:4-13
#> Hornet Sportabout 64 30625 0 9 129600 chrX:5-14
#> ... ... ... ... ... ... ...
#> Lotus Europa 16 12769 1 25 9044.01 chrX:28-37
#> Ford Pantera L 64 69696 1 25 123201.00 chrX:29-38
#> Ferrari Dino 36 30625 1 25 21025.00 chrX:30-39
#> Maserati Bora 64 112225 1 25 90601.00 chrX:31-40
#> Volvo 142E 16 11881 1 16 14641.00 chrX:32-41
#> grY nl
#> <GRanges> <CompressedNumericList>
#> Mazda RX4 chrY:1-10 1.15,-1.20,-0.67,...
#> Mazda RX4 Wag chrY:2-11 0.94, 0.09,-1.36,...
#> Datsun 710 chrY:3-12 0.40,-1.36,-0.39,...
#> Hornet 4 Drive chrY:4-13 0.64,-1.52, 0.59
#> Hornet Sportabout chrY:5-14 -0.28,-0.83, 0.43
#> ... ... ...
#> Lotus Europa chrY:28-37 -1.84,-2.29,-0.55,...
#> Ford Pantera L chrY:29-38 0.22,1.17,0.07,...
#> Ferrari Dino chrY:30-39 0.71,-0.06, 0.26,...
#> Maserati Bora chrY:31-40 -0.32,-0.17,-1.91,...
#> Volvo 142E chrY:32-41 -0.81,-0.77, 0.68,...
or extracting the start of all of the "GRanges" columns:
mutate_if(d, ~ isa(., "GRanges"), BiocGenerics::start)
#> DataFrame with 32 rows and 8 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <integer>
#> Mazda RX4 6 110 1 4 160 1
#> Mazda RX4 Wag 6 110 1 4 160 2
#> Datsun 710 4 93 1 4 108 3
#> Hornet 4 Drive 6 110 0 3 258 4
#> Hornet Sportabout 8 175 0 3 360 5
#> ... ... ... ... ... ... ...
#> Lotus Europa 4 113 1 5 95.1 28
#> Ford Pantera L 8 264 1 5 351.0 29
#> Ferrari Dino 6 175 1 5 145.0 30
#> Maserati Bora 8 335 1 5 301.0 31
#> Volvo 142E 4 109 1 4 121.0 32
#> grY nl
#> <integer> <CompressedNumericList>
#> Mazda RX4 1 1.15,-1.20,-0.67,...
#> Mazda RX4 Wag 2 0.94, 0.09,-1.36,...
#> Datsun 710 3 0.40,-1.36,-0.39,...
#> Hornet 4 Drive 4 0.64,-1.52, 0.59
#> Hornet Sportabout 5 -0.28,-0.83, 0.43
#> ... ... ...
#> Lotus Europa 28 -1.84,-2.29,-0.55,...
#> Ford Pantera L 29 0.22,1.17,0.07,...
#> Ferrari Dino 30 0.71,-0.06, 0.26,...
#> Maserati Bora 31 -0.32,-0.17,-1.91,...
#> Volvo 142E 32 -0.81,-0.77, 0.68,...
Use of tidyselect helpers is limited to within vars() calls and using the _at variants:
mutate_at(d, vars(starts_with("c")), ~ .^2)
#> DataFrame with 32 rows and 8 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Mazda RX4 36 110 1 4 160 chrX:1-10
#> Mazda RX4 Wag 36 110 1 4 160 chrX:2-11
#> Datsun 710 16 93 1 4 108 chrX:3-12
#> Hornet 4 Drive 36 110 0 3 258 chrX:4-13
#> Hornet Sportabout 64 175 0 3 360 chrX:5-14
#> ... ... ... ... ... ... ...
#> Lotus Europa 16 113 1 5 95.1 chrX:28-37
#> Ford Pantera L 64 264 1 5 351.0 chrX:29-38
#> Ferrari Dino 36 175 1 5 145.0 chrX:30-39
#> Maserati Bora 64 335 1 5 301.0 chrX:31-40
#> Volvo 142E 16 109 1 4 121.0 chrX:32-41
#> grY nl
#> <GRanges> <CompressedNumericList>
#> Mazda RX4 chrY:1-10 1.15,-1.20,-0.67,...
#> Mazda RX4 Wag chrY:2-11 0.94, 0.09,-1.36,...
#> Datsun 710 chrY:3-12 0.40,-1.36,-0.39,...
#> Hornet 4 Drive chrY:4-13 0.64,-1.52, 0.59
#> Hornet Sportabout chrY:5-14 -0.28,-0.83, 0.43
#> ... ... ...
#> Lotus Europa chrY:28-37 -1.84,-2.29,-0.55,...
#> Ford Pantera L chrY:29-38 0.22,1.17,0.07,...
#> Ferrari Dino chrY:30-39 0.71,-0.06, 0.26,...
#> Maserati Bora chrY:31-40 -0.32,-0.17,-1.91,...
#> Volvo 142E chrY:32-41 -0.81,-0.77, 0.68,...
This approach also works with other verbs:
select_at(d, vars(starts_with("gr")))
#> DataFrame with 32 rows and 2 columns
#> grX grY
#> <GRanges> <GRanges>
#> Mazda RX4 chrX:1-10 chrY:1-10
#> Mazda RX4 Wag chrX:2-11 chrY:2-11
#> Datsun 710 chrX:3-12 chrY:3-12
#> Hornet 4 Drive chrX:4-13 chrY:4-13
#> Hornet Sportabout chrX:5-14 chrY:5-14
#> ... ... ...
#> Lotus Europa chrX:28-37 chrY:28-37
#> Ford Pantera L chrX:29-38 chrY:29-38
#> Ferrari Dino chrX:30-39 chrY:30-39
#> Maserati Bora chrX:31-40 chrY:31-40
#> Volvo 142E chrX:32-41 chrY:32-41
Importantly, grouped operations are supported. DataFrame does not natively support groups (the same way that data.frame does not), so these are implemented specifically for DFplyr, with group information shown at the top of the printed output:
group_by(d, cyl, am)
#> DataFrame with 32 rows and 8 columns
#> Groups: cyl, am
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Mazda RX4 6 110 1 4 160 chrX:1-10
#> Mazda RX4 Wag 6 110 1 4 160 chrX:2-11
#> Datsun 710 4 93 1 4 108 chrX:3-12
#> Hornet 4 Drive 6 110 0 3 258 chrX:4-13
#> Hornet Sportabout 8 175 0 3 360 chrX:5-14
#> ... ... ... ... ... ... ...
#> Lotus Europa 4 113 1 5 95.1 chrX:28-37
#> Ford Pantera L 8 264 1 5 351.0 chrX:29-38
#> Ferrari Dino 6 175 1 5 145.0 chrX:30-39
#> Maserati Bora 8 335 1 5 301.0 chrX:31-40
#> Volvo 142E 4 109 1 4 121.0 chrX:32-41
#> grY nl
#> <GRanges> <CompressedNumericList>
#> Mazda RX4 chrY:1-10 1.15,-1.20,-0.67,...
#> Mazda RX4 Wag chrY:2-11 0.94, 0.09,-1.36,...
#> Datsun 710 chrY:3-12 0.40,-1.36,-0.39,...
#> Hornet 4 Drive chrY:4-13 0.64,-1.52, 0.59
#> Hornet Sportabout chrY:5-14 -0.28,-0.83, 0.43
#> ... ... ...
#> Lotus Europa chrY:28-37 -1.84,-2.29,-0.55,...
#> Ford Pantera L chrY:29-38 0.22,1.17,0.07,...
#> Ferrari Dino chrY:30-39 0.71,-0.06, 0.26,...
#> Maserati Bora chrY:31-40 -0.32,-0.17,-1.91,...
#> Volvo 142E chrY:32-41 -0.81,-0.77, 0.68,...
Other verbs are similarly implemented, and preserve row names where possible. For example, selecting a limited set of columns using non-standard evaluation (NSE):
select(d, am, cyl)
#> DataFrame with 32 rows and 2 columns
#> am cyl
#> <numeric> <numeric>
#> Mazda RX4 1 6
#> Mazda RX4 Wag 1 6
#> Datsun 710 1 4
#> Hornet 4 Drive 0 6
#> Hornet Sportabout 0 8
#> ... ... ...
#> Lotus Europa 1 4
#> Ford Pantera L 1 8
#> Ferrari Dino 1 6
#> Maserati Bora 1 8
#> Volvo 142E 1 4
Arranging rows according to the ordering of a column:
arrange(d, desc(hp))
#> DataFrame with 32 rows and 8 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Maserati Bora 8 335 1 5 301 chrX:31-40
#> Ford Pantera L 8 264 1 5 351 chrX:29-38
#> Duster 360 8 245 0 3 360 chrX:7-16
#> Camaro Z28 8 245 0 3 350 chrX:24-33
#> Chrysler Imperial 8 230 0 3 440 chrX:17-26
#> ... ... ... ... ... ... ...
#> Fiat 128 4 66 1 4 78.7 chrX:18-27
#> Fiat X1-9 4 66 1 4 79.0 chrX:26-35
#> Toyota Corolla 4 65 1 4 71.1 chrX:20-29
#> Merc 240D 4 62 0 4 146.7 chrX:8-17
#> Honda Civic 4 52 1 4 75.7 chrX:19-28
#> grY nl
#> <GRanges> <CompressedNumericList>
#> Maserati Bora chrY:31-40 -0.32,-0.17,-1.91,...
#> Ford Pantera L chrY:29-38 0.22,1.17,0.07,...
#> Duster 360 chrY:7-16 1.42, 1.09,-0.50
#> Camaro Z28 chrY:24-33 -0.36,-1.43,-0.12
#> Chrysler Imperial chrY:17-26 -0.24, 0.04,-0.22
#> ... ... ...
#> Fiat 128 chrY:18-27 -0.33, 0.08, 0.52,...
#> Fiat X1-9 chrY:26-35 1.10,2.09,0.27,...
#> Toyota Corolla chrY:20-29 -0.99,-2.19,-0.43,...
#> Merc 240D chrY:8-17 -1.24,-0.50,-0.41,...
#> Honda Civic chrY:19-28 0.83,1.04,0.16,...
Filtering to only specific values appearing in a column:
filter(d, am == 0)
#> DataFrame with 19 rows and 8 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Hornet 4 Drive 6 110 0 3 258.0 chrX:4-13
#> Hornet Sportabout 8 175 0 3 360.0 chrX:5-14
#> Valiant 6 105 0 3 225.0 chrX:6-15
#> Duster 360 8 245 0 3 360.0 chrX:7-16
#> Merc 240D 4 62 0 4 146.7 chrX:8-17
#> ... ... ... ... ... ... ...
#> Toyota Corona 4 97 0 3 120.1 chrX:21-30
#> Dodge Challenger 8 150 0 3 318.0 chrX:22-31
#> AMC Javelin 8 150 0 3 304.0 chrX:23-32
#> Camaro Z28 8 245 0 3 350.0 chrX:24-33
#> Pontiac Firebird 8 175 0 3 400.0 chrX:25-34
#> grY nl
#> <GRanges> <CompressedNumericList>
#> Hornet 4 Drive chrY:4-13 0.64,-1.52, 0.59
#> Hornet Sportabout chrY:5-14 -0.28,-0.83, 0.43
#> Valiant chrY:6-15 0.10, 1.21,-1.29
#> Duster 360 chrY:7-16 1.42, 1.09,-0.50
#> Merc 240D chrY:8-17 -1.24,-0.50,-0.41,...
#> ... ... ...
#> Toyota Corona chrY:21-30 1.89, 0.11,-1.54
#> Dodge Challenger chrY:22-31 -0.26,-1.35, 0.85
#> AMC Javelin chrY:23-32 0.18,-0.32, 0.02
#> Camaro Z28 chrY:24-33 -0.36,-1.43,-0.12
#> Pontiac Firebird chrY:25-34 0.65,0.07,0.25
Selecting specific rows by index:
slice(d, 3:6)
#> DataFrame with 4 rows and 8 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Datsun 710 4 93 1 4 108 chrX:3-12
#> Hornet 4 Drive 6 110 0 3 258 chrX:4-13
#> Hornet Sportabout 8 175 0 3 360 chrX:5-14
#> Valiant 6 105 0 3 225 chrX:6-15
#> grY nl
#> <GRanges> <CompressedNumericList>
#> Datsun 710 chrY:3-12 0.40,-1.36,-0.39,...
#> Hornet 4 Drive chrY:4-13 0.64,-1.52, 0.59
#> Hornet Sportabout chrY:5-14 -0.28,-0.83, 0.43
#> Valiant chrY:6-15 0.10, 1.21,-1.29
These also work for grouped objects, and also preserve the row names, e.g. selecting the first two rows from each group of gear:
group_by(d, gear) %>%
    slice(1:2)
#> DataFrame with 6 rows and 8 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Hornet Sportabout 8 175 0 3 360.0 chrX:5-14
#> Merc 450SL 8 180 0 3 275.8 chrX:13-22
#> Mazda RX4 6 110 1 4 160.0 chrX:1-10
#> Mazda RX4 Wag 6 110 1 4 160.0 chrX:2-11
#> Porsche 914-2 4 91 1 5 120.3 chrX:27-36
#> Ford Pantera L 8 264 1 5 351.0 chrX:29-38
#> grY nl
#> <GRanges> <CompressedNumericList>
#> Hornet Sportabout chrY:5-14 -0.28,-0.83, 0.43
#> Merc 450SL chrY:13-22 1.30,0.25,0.19
#> Mazda RX4 chrY:1-10 1.15,-1.20,-0.67,...
#> Mazda RX4 Wag chrY:2-11 0.94, 0.09,-1.36,...
#> Porsche 914-2 chrY:27-36 -1.70, 0.17, 1.36,...
#> Ford Pantera L chrY:29-38 0.22,1.17,0.07,...
rename is itself renamed to rename2 due to conflicts between dplyr and S4Vectors, but works in the dplyr sense of taking new = old replacements with NSE syntax:
select(d, am, cyl) %>%
    rename2(foo = am)
#> Warning in rename2(., foo = am): DFplyr now properly supports rename with NSE
#> syntax
Row names are not preserved when there may be duplicates or when they don’t make sense; otherwise the first label is kept (according to the current de-duplication method; in the case of distinct, this is via BiocGenerics::duplicated). This may have complications for S4 columns.
distinct(d)
#> DataFrame with 32 rows and 8 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Mazda RX4 6 110 1 4 160 chrX:1-10
#> Mazda RX4 Wag 6 110 1 4 160 chrX:2-11
#> Datsun 710 4 93 1 4 108 chrX:3-12
#> Hornet 4 Drive 6 110 0 3 258 chrX:4-13
#> Hornet Sportabout 8 175 0 3 360 chrX:5-14
#> ... ... ... ... ... ... ...
#> Lotus Europa 4 113 1 5 95.1 chrX:28-37
#> Ford Pantera L 8 264 1 5 351.0 chrX:29-38
#> Ferrari Dino 6 175 1 5 145.0 chrX:30-39
#> Maserati Bora 8 335 1 5 301.0 chrX:31-40
#> Volvo 142E 4 109 1 4 121.0 chrX:32-41
#> grY nl
#> <GRanges> <CompressedNumericList>
#> Mazda RX4 chrY:1-10 1.15,-1.20,-0.67,...
#> Mazda RX4 Wag chrY:2-11 0.94, 0.09,-1.36,...
#> Datsun 710 chrY:3-12 0.40,-1.36,-0.39,...
#> Hornet 4 Drive chrY:4-13 0.64,-1.52, 0.59
#> Hornet Sportabout chrY:5-14 -0.28,-0.83, 0.43
#> ... ... ...
#> Lotus Europa chrY:28-37 -1.84,-2.29,-0.55,...
#> Ford Pantera L chrY:29-38 0.22,1.17,0.07,...
#> Ferrari Dino chrY:30-39 0.71,-0.06, 0.26,...
#> Maserati Bora chrY:31-40 -0.32,-0.17,-1.91,...
#> Volvo 142E chrY:32-41 -0.81,-0.77, 0.68,...
Behaviours are ideally the same as those of dplyr wherever possible, for example a grouped tally:
group_by(d, cyl, am) %>%
    tally(gear)
#> DataFrame with 6 rows and 3 columns
#> cyl am n
#> <numeric> <numeric> <numeric>
#> 1 4 0 11
#> 2 4 1 34
#> 3 6 0 14
#> 4 6 1 13
#> 5 8 0 36
#> 6 8 1 10
or a count of the observed combinations of several columns:
count(d, gear, am, cyl)
#> DataFrame with 10 rows and 4 columns
#> gear am cyl n
#> <factor> <Rle> <Rle> <integer>
#> 1 3 0 4 1
#> 2 3 0 6 2
#> 3 3 0 8 12
#> 4 4 0 4 2
#> 5 4 0 6 2
#> 6 4 1 4 6
#> 7 4 1 6 2
#> 8 5 1 4 2
#> 9 5 1 6 1
#> 10 5 1 8 2
Joins attempt to preserve row names and grouping wherever possible:
Da <- as(starwars[, c("name", "eye_color", "height", "mass")], "DataFrame") |>
    head(10) |>
    group_by(eye_color)
Da
#> DataFrame with 10 rows and 4 columns
#> Groups: eye_color
#> name eye_color height mass
#> <character> <character> <integer> <numeric>
#> 1 Luke Skywalker blue 172 77
#> 2 C-3PO yellow 167 75
#> 3 R2-D2 red 96 32
#> 4 Darth Vader yellow 202 136
#> 5 Leia Organa brown 150 49
#> 6 Owen Lars blue 178 120
#> 7 Beru Whitesun Lars blue 165 75
#> 8 R5-D4 red 97 32
#> 9 Biggs Darklighter brown 183 84
#> 10 Obi-Wan Kenobi blue-gray 182 77
Db <- as(starwars[, c("name", "eye_color", "homeworld")], "DataFrame")
Db
#> DataFrame with 87 rows and 3 columns
#> name eye_color homeworld
#> <character> <character> <character>
#> 1 Luke Skywalker blue Tatooine
#> 2 C-3PO yellow Tatooine
#> 3 R2-D2 red Naboo
#> 4 Darth Vader yellow Tatooine
#> 5 Leia Organa brown Alderaan
#> ... ... ... ...
#> 83 Finn dark NA
#> 84 Rey hazel NA
#> 85 Poe Dameron brown NA
#> 86 BB8 black NA
#> 87 Captain Phasma unknown NA
left_join(Da, Db)
#> Joining with `by = c("name", "eye_color")`
#> DataFrame with 10 rows and 5 columns
#> Groups: eye_color
#> name eye_color height mass homeworld
#> <character> <character> <integer> <numeric> <character>
#> 1 Luke Skywalker blue 172 77 Tatooine
#> 2 C-3PO yellow 167 75 Tatooine
#> 3 R2-D2 red 96 32 Naboo
#> 4 Darth Vader yellow 202 136 Tatooine
#> 5 Leia Organa brown 150 49 Alderaan
#> 6 Owen Lars blue 178 120 Tatooine
#> 7 Beru Whitesun Lars blue 165 75 Tatooine
#> 8 R5-D4 red 97 32 Tatooine
#> 9 Biggs Darklighter brown 183 84 Tatooine
#> 10 Obi-Wan Kenobi blue-gray 182 77 Stewjon
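The message `Joining with by = c("name", "eye_color")` suggests that, as in dplyr, the join keys default to the columns shared by both inputs. As a rough base R analogy (not DFplyr's implementation), that key inference can be sketched with intersect() and merge(). The toy data frames `a` and `b` below are illustrative subsets of the values printed above, and their names are hypothetical.

```r
# Toy inputs: `a` plays the role of Da, `b` the role of Db
a <- data.frame(
    name      = c("Luke Skywalker", "C-3PO", "R2-D2"),
    eye_color = c("blue", "yellow", "red"),
    height    = c(172L, 167L, 96L)
)
b <- data.frame(
    name      = c("Luke Skywalker", "C-3PO"),
    eye_color = c("blue", "yellow"),
    homeworld = c("Tatooine", "Tatooine")
)

# Natural-join keys: the column names present in both inputs
by_cols <- intersect(names(a), names(b))

# Left join: keep every row of `a`, filling unmatched columns with NA
left <- merge(a, b, by = by_cols, all.x = TRUE)

# Inner join: keep only rows whose keys appear in both inputs
inner <- merge(a, b, by = by_cols)
```

Unlike the joins shown here, merge() sorts its result by the key columns and knows nothing about groups, so this only illustrates how the keys are chosen and which rows survive.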
right_join(Da, Db)
#> Joining with `by = c("name", "eye_color")`
#> DataFrame with 87 rows and 5 columns
#> Groups: eye_color
#> name eye_color height mass homeworld
#> <character> <character> <integer> <numeric> <character>
#> 1 Luke Skywalker blue 172 77 Tatooine
#> 2 C-3PO yellow 167 75 Tatooine
#> 3 R2-D2 red 96 32 Naboo
#> 4 Darth Vader yellow 202 136 Tatooine
#> 5 Leia Organa brown 150 49 Alderaan
#> ... ... ... ... ... ...
#> 83 BB8 black NA NA NA
#> 84 Captain Phasma unknown NA NA NA
#> 85 San Hill gold NA NA Muunilinst
#> 86 Shaak Ti black NA NA Shili
#> 87 Grievous green, yellow NA NA Kalee
inner_join(Da, Db[1:3, ])
#> Joining with `by = c("name", "eye_color")`
#> DataFrame with 3 rows and 5 columns
#> Groups: eye_color
#> name eye_color height mass homeworld
#> <character> <character> <integer> <numeric> <character>
#> 1 Luke Skywalker blue 172 77 Tatooine
#> 2 C-3PO yellow 167 75 Tatooine
#> 3 R2-D2 red 96 32 Naboo
full_join(Da, Db[1:3, ])
#> Joining with `by = c("name", "eye_color")`
#> DataFrame with 10 rows and 5 columns
#> Groups: eye_color
#> name eye_color height mass homeworld
#> <character> <character> <integer> <numeric> <character>
#> 1 Luke Skywalker blue 172 77 Tatooine
#> 2 C-3PO yellow 167 75 Tatooine
#> 3 R2-D2 red 96 32 Naboo
#> 4 Leia Organa brown 150 49 NA
#> 5 Owen Lars blue 178 120 NA
#> 6 Beru Whitesun Lars blue 165 75 NA
#> 7 Darth Vader yellow 202 136 NA
#> 8 Biggs Darklighter brown 183 84 NA
#> 9 Obi-Wan Kenobi blue-gray 182 77 NA
#> 10 R5-D4 red 97 32 NA
We hope that DFplyr will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!
citation("DFplyr")
#> To cite package 'DFplyr' in publications use:
#>
#> Carroll J (2025). _DFplyr: A `DataFrame` (`S4Vectors`) backend for
#> `dplyr`_. R package version 1.4.0,
#> <https://github.com/jonocarroll/DFplyr>.
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Manual{,
#> title = {DFplyr: A `DataFrame` (`S4Vectors`) backend for `dplyr`},
#> author = {Jonathan Carroll},
#> year = {2025},
#> note = {R package version 1.4.0},
#> url = {https://github.com/jonocarroll/DFplyr},
#> }
#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.5.2 (2025-10-31)
#> os Ubuntu 24.04.3 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate C
#> ctype en_US.UTF-8
#> tz Etc/UTC
#> date 2025-11-16
#> pandoc 3.6.3 @ /usr/local/bin/ (via rmarkdown)
#> quarto 1.8.24 @ /usr/local/bin/quarto
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> BiocGenerics * 0.56.0 2025-10-29 [2] https://bioc-release.r-universe.dev (R 4.5.2)
#> BiocManager 1.30.27 2025-11-14 [2] https://cran.r-universe.dev (R 4.5.2)
#> BiocStyle * 2.38.0 2025-10-29 [2] https://bioc-release.r-universe.dev (R 4.5.2)
#> bslib 0.9.0 2025-01-30 [2] RSPM (R 4.5.0)
#> buildtools 1.0.0 2025-11-11 [3] local (/pkg)
#> cachem 1.1.0 2024-05-16 [2] RSPM (R 4.5.0)
#> cli 3.6.5 2025-04-23 [2] RSPM (R 4.5.0)
#> DFplyr * 1.4.0 2025-10-29 [1] https://bioc-release.r-universe.dev (R 4.5.2)
#> digest 0.6.38 2025-11-12 [2] RSPM (R 4.5.0)
#> dplyr * 1.1.4 2023-11-17 [2] RSPM (R 4.5.0)
#> evaluate 1.0.5 2025-08-27 [2] RSPM (R 4.5.0)
#> fastmap 1.2.0 2024-05-15 [2] RSPM (R 4.5.0)
#> generics * 0.1.4 2025-05-09 [2] RSPM (R 4.5.0)
#> GenomeInfoDb 1.46.0 2025-10-29 [2] https://bioc-release.r-universe.dev (R 4.5.2)
#> GenomicRanges 1.62.0 2025-10-29 [2] https://bioc-release.r-universe.dev (R 4.5.2)
#> glue 1.8.0 2024-09-30 [2] RSPM (R 4.5.0)
#> htmltools 0.5.8.1 2024-04-04 [2] RSPM (R 4.5.0)
#> httr 1.4.7 2023-08-15 [2] RSPM (R 4.5.0)
#> IRanges 2.44.0 2025-10-29 [2] https://bioc-release.r-universe.dev (R 4.5.2)
#> jquerylib 0.1.4 2021-04-26 [2] RSPM (R 4.5.0)
#> jsonlite 2.0.0 2025-03-27 [2] RSPM (R 4.5.0)
#> knitr 1.50 2025-03-16 [2] RSPM (R 4.5.0)
#> lifecycle 1.0.4 2023-11-07 [2] RSPM (R 4.5.0)
#> magrittr 2.0.4 2025-09-12 [2] RSPM (R 4.5.0)
#> maketools 1.3.2 2025-01-25 [3] RSPM (R 4.5.0)
#> pillar 1.11.1 2025-09-17 [2] RSPM (R 4.5.0)
#> pkgconfig 2.0.3 2019-09-22 [2] RSPM (R 4.5.0)
#> R6 2.6.1 2025-02-15 [2] RSPM (R 4.5.0)
#> rlang 1.1.6 2025-04-11 [2] RSPM (R 4.5.0)
#> rmarkdown 2.30 2025-09-28 [2] RSPM (R 4.5.0)
#> S4Vectors * 0.48.0 2025-10-29 [2] https://bioc-release.r-universe.dev (R 4.5.2)
#> sass 0.4.10 2025-04-11 [2] RSPM (R 4.5.0)
#> Seqinfo 1.0.0 2025-10-29 [2] https://bioc-release.r-universe.dev (R 4.5.2)
#> sessioninfo 1.2.3 2025-02-05 [2] RSPM (R 4.5.0)
#> sys 3.4.3 2024-10-04 [2] RSPM (R 4.5.0)
#> tibble 3.3.0 2025-06-08 [2] RSPM (R 4.5.0)
#> tidyselect 1.2.1 2024-03-11 [2] RSPM (R 4.5.0)
#> UCSC.utils 1.6.0 2025-10-29 [2] https://bioc-release.r-universe.dev (R 4.5.2)
#> vctrs 0.6.5 2023-12-01 [2] RSPM (R 4.5.0)
#> withr 3.0.2 2024-10-28 [2] RSPM (R 4.5.0)
#> xfun 0.54 2025-10-30 [2] RSPM (R 4.5.0)
#> yaml 2.3.10 2024-07-26 [2] RSPM (R 4.5.0)
#>
#> [1] /tmp/Rtmp10tcCS/Rinstb2d19f7bd8d
#> [2] /github/workspace/pkglib
#> [3] /usr/local/lib/R/site-library
#> [4] /usr/lib/R/site-library
#> [5] /usr/lib/R/library
#> * ── Packages attached to the search path.
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────