Finding and extracting values and indices

Fred Hasselman

2022-08-16

Finding and extracting values

The function groups insiders, outsiders and extractors provide infix functions that can be used to extract values from vectors.

insiders and outsiders

These functions return values inside or outside a given interval. Inclusion or exclusion of interval endpoints follows the common notation for open and closed intervals: [ and ] mean inclusion, and ( and ) mean exclusion of endpoints.

The syntax is always:

vector infix interval

Depending on which function is called, the return value is either a logical vector indicating which values are inside or outside the interval, or the values themselves (use the variants with a dot between the interval symbols, e.g. %[.]%).

The syntax and functionality are similar to those provided in package DescTools (I did not test whether they give the same results).

x <- 0:9

# Inside open interval
x %()% c(5,9)
>  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE

# Inside closed interval
x %[]% c(5,9)
>  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

# Outside open interval
x %)(% c(5,9)
>  [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

# Outside closed interval
x %][% c(5,9)
>  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE

# All variations left/right open/closed are possible
x %[)% c(5,9)
>  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
x %](% c(5,9)
>  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
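
For readers who want to see what such an operator amounts to, here is a minimal base R sketch. The operator names `%inside_closed%` and `%outside_open%` are hypothetical and chosen only for this illustration; this is not the package's implementation, just the comparison logic that the interval notation describes.

```r
# Hypothetical operator names, used for this sketch only
`%inside_closed%` <- function(x, j) x >= j[1] & x <= j[2]  # inside [a, b]
`%outside_open%`  <- function(x, j) x <  j[1] | x >  j[2]  # outside, endpoints excluded

x <- 0:9
x %inside_closed% c(5, 9)    # TRUE for 5, 6, 7, 8, 9
x[x %outside_open% c(5, 9)]  # 0 1 2 3 4
```

Any name placed between two percent signs can be defined as an infix function in R, which is what makes the bracket-style names possible.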

How to use…

Logical indices are commonly used to extract values. If you add a dot . between the interval symbols, the values themselves are returned.

# Regular indexing works, but is a bit 'wordy'
x[x %[]% c(5,9)]
> [1] 5 6 7 8 9

# Easier to use the special functions
x %[.]% c(5,9)
> [1] 5 6 7 8 9

# Extract the first, middle, or last value of x
x %:% "f"
> [1] 0
x %:% "m"
> [1] 4
x %:% "l"
> [1] 9

# Simulate a sample from a standard normal distribution
set.seed(4321)
Zscore <- rnorm(100)

# Find Z-scores that are 'significant' at alpha = .05
Zscore %).(% c(-1.96,1.96)
> [1]  2.080248 -2.450016 -2.439320

# Regular indexing repeats the vector name several times; so does tidyverse syntax, e.g. filter()
Zscore[Zscore < -1.96 | Zscore > 1.96]
> [1]  2.080248 -2.450016 -2.439320
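
For a symmetric interval around zero, base R offers a slightly shorter form using abs(). The sketch below is an aside, not part of the package; the variable name crit is an assumption, and the critical value is computed with qnorm() instead of being typed as 1.96, so the result can differ from the example above for values very close to the cutoff.

```r
set.seed(4321)
Zscore <- rnorm(100)

# Two-sided critical value at alpha = .05 (approximately 1.96)
crit <- qnorm(1 - 0.05 / 2)

# Values outside the open interval (-crit, crit)
Zscore[abs(Zscore) > crit]
```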

extractors

Extracting a subset of values from the front or rear of a vector is a common task, and the base functions head() and tail() can do this. The infix functions in the extractors group mimic some of this behaviour and add the ability to extract from a specific value to the rear, or from the front up to and including a specific value.
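
As a reminder of the base behaviour being mimicked, this short sketch shows head() and tail() with positive and negative counts.

```r
z <- letters

head(z, 3)    # first three elements: "a" "b" "c"
tail(z, 3)    # last three elements:  "x" "y" "z"
head(z, -23)  # everything except the last 23:  "a" "b" "c"
tail(z, -23)  # everything except the first 23: "x" "y" "z"
```

Both functions select by position only; selecting up to or from a particular value needs an extra lookup step.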

# A character vector
z <- letters

# Extract front by first occurrence of value "n"
z %[f% "n"
>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n"

# Extract first, middle, last of z
z %:% "f"
> [1] "a"
z %:% "m"
> [1] "m"
z %:% "l"
> [1] "z"

# Extract by percentile
seq(1,10,.5) %(q% .5 # infix
> [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
seq(1,10,.5)[seq(1,10,.5) < quantile(seq(1,10,.5),.5)] # regular syntax
> [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

seq(1,10,.5) %q]% .5 # infix
>  [1]  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0  9.5 10.0
seq(1,10,.5)[seq(1,10,.5) >= quantile(seq(1,10,.5),.5)] # regular syntax
>  [1]  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0  9.5 10.0

# Random uniform integers
set.seed(123)
x <- round(runif(100,1,100))

# Extract from the front up to and including index 10
x%[%10 # infix
>  [1] 29 79 41 88 94  6 53 89 56 46
x[1:10] # regular [saves just 1 char]
>  [1] 29 79 41 88 94  6 53 89 56 46

# Extract from index 90 to rear
x%]%90 # infix
>  [1] 18 14 66 35 66 33 20 78 10 47 52
x[90:length(x)] # regular
>  [1] 18 14 66 35 66 33 20 78 10 47 52

# Extract numbers from front to first occurrence of 11
x%[f%11 # infix
>  [1] 29 79 41 88 94  6 53 89 56 46 96 46 68 58 11
x[1:which(x==11)[1]] # regular
>  [1] 29 79 41 88 94  6 53 89 56 46 96 46 68 58 11

# Extract numbers from last occurrence of 11 to rear
x%l]%11 # infix
>  [1] 11 44 99 89 89 18 14 66 35 66 33 20 78 10 47 52
x[which(x==11)[length(which(x==11))]:length(x)] # regular
>  [1] 11 44 99 89 89 18 14 66 35 66 33 20 78 10 47 52

# Extract by indices if an index range is provided
# This is a clear case in which the infix is less sensible to use than regular indexing:
x%]%c(6,10) # infix
> [1]  6 53 89 56 46
x[6:10] # regular
> [1]  6 53 89 56 46

z%[%c(6,10) #infix
> [1] "f" "g" "h" "i" "j"
z[6:10] #regular
> [1] "f" "g" "h" "i" "j"
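
The value-based extractions above (front to first occurrence, last occurrence to rear) can be written a little more compactly in base R with match() and max(which()). The helpers front_to_first() and last_to_rear() below are hypothetical names for this sketch, not the package's implementation. Returning an empty vector when the value is absent is an assumption made here; the package may behave differently.

```r
# Hypothetical helpers for this sketch only
front_to_first <- function(x, value) {
  idx <- match(value, x)        # position of the first occurrence, NA if absent
  if (is.na(idx)) return(x[0])  # assumption: empty result when value is absent
  x[seq_len(idx)]
}

last_to_rear <- function(x, value) {
  idx <- which(x == value)      # positions of all occurrences
  if (length(idx) == 0L) return(x[0])
  x[max(idx):length(x)]
}

x <- c(3, 11, 7, 11, 5, 2)
front_to_first(x, 11)         # 3 11
last_to_rear(x, 11)           # 11 5 2
front_to_first(letters, "n")  # "a" through "n"
```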

Finding and extracting indices

The fINDexers group provides infix functions that can return column and row names based on indices, or, indices based on column and row names. Take for instance data frame d:

    x y txt
ri5 1 6 delta = 5
ri4 2 6 delta = 4
ri3 3 6 delta = 3
ri2 4 6 delta = 2
ri1 5 6 delta = 1
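
The code that creates d is not shown here. One way to build a data frame with the same row names and values, offered as an assumption and not necessarily how the author constructed it, is:

```r
# Reconstruction of d from the printed table (assumed, not the original code)
d <- data.frame(
  x   = 1:5,
  y   = 6,
  txt = paste("delta =", 6 - 1:5),
  row.names = paste0("ri", 5:1),
  stringsAsFactors = FALSE
)
d
```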

We can use the infix functions to get names and indices of d:

# Columns
 "txt"%ci%d # infix
> [1] 3
 which(colnames(d)%in%"txt") # regular
> [1] 3

 2%ci%d # infix
> [1] "y"
 colnames(d)[2] # regular
> [1] "y"
  
# Rows
 "ri4"%ri%d # infix
> [1] 2
 which(rownames(d)%in%"ri4") # regular
> [1] 2
 
 2%ri%d # infix
> [1] "ri4"
 rownames(d)[2] # regular
> [1] "ri4"
 
# Change column name
 colnames(d)["y"%ci%d] <- "Yhat" # infix
 colnames(d)[colnames(d)%in%"y"] <- "Yhat" # regular
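
The two-way lookup (name to index, index to name) can be sketched in base R as a single helper. The function col_lookup() and the data frame df are hypothetical and exist only for this illustration; the sketch uses match(), which returns the first matching position only.

```r
# Hypothetical helper: a name returns its index, an index returns its name
col_lookup <- function(key, d) {
  nms <- colnames(d)
  if (is.character(key)) match(key, nms) else nms[key]
}

df <- data.frame(x = 1:3, y = 6)

col_lookup("y", df)  # 2
col_lookup(2, df)    # "y"
col_lookup(5, df)    # NA, there is no column 5
```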

For 1D list and vector objects %ri% and %ci% return the same value.

 l <- list(a=1:100, b=LETTERS)

 2%ci%l == 2%ri%l
> [1] TRUE
 "a"%ci%l == "a"%ri%l
> [1] TRUE

# Named vector
 v <- c("first" = 1, "2nd" = 1000)

 1%ci%v == 1%ri%v
> [1] TRUE
 "2nd"%ci%v == "2nd"%ri%v
> [1] TRUE

Function %mi% returns row and column names (or indices, when names are supplied) for 2D objects: data frames, matrices, tibbles, etc.

# Data frame d
 c(5,2) %mi% d
> [1] "ri1"  "Yhat"

 list(r="ri1",c=2) %mi% d
> $r
> [1] 5
> 
> $c
> [1] "Yhat"

# matrix row and column indices
(m <- matrix(1:10,ncol=2, dimnames = list(paste0("ri",0:4),c("xx","yy"))))
>     xx yy
> ri0  1  6
> ri1  2  7
> ri2  3  8
> ri3  4  9
> ri4  5 10

 1 %ci% m
> [1] "xx"
 5 %ci% m # no column 5
> [1] NA

 1 %ri% m
> [1] "ri0"
 5 %ri% m
> [1] "ri4"

 c(5,1)%mi%m
> [1] "ri4" "xx"
 c(1,5)%mi%m
> [1] "ri0" NA

Function %ai% is a version of %in% that returns the indices of all occurrences of one or more values in an object.

# get all indices of the number 1 in v
 1 %ai% v
>   nv first
> 1  1     1

# get all indices of the numbers 3 and 6 in d
 c(3,6) %ai% d
>   nv row col
> 1  3   3   1
> 2  6   1   2
> 3  6   2   2
> 4  6   3   2
> 5  6   4   2
> 6  6   5   2

 # Simulate a sample from a standard normal distribution
 set.seed(1234)
 Zscores <- rnorm(100)
 
 Zscores%).(%c(-1.96,1.96) %ai% Zscores # returns a data frame with values and indices
>                  nv  V1
> 1 -2.34569770262935   4
> 2  2.41583517848934  20
> 3 -2.18003964894867  37
> 4  2.54899107071786  62
> 5  2.07027086133094  75
> 6  2.12111710537568 100
 
 which(Zscores%)(%c(-1.96,1.96)) # returns an index vector
> [1]   4  20  37  62  75 100
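
For comparison, a similar table of values and positions can be assembled in base R with which(..., arr.ind = TRUE). The sketch below uses the matrix m defined earlier; the names is_hit and hits are assumptions for this illustration, and the result is a base R approximation, not the output of %ai%.

```r
m <- matrix(1:10, ncol = 2, dimnames = list(paste0("ri", 0:4), c("xx", "yy")))

# Logical matrix marking every cell equal to 3 or 6
is_hit <- matrix(m %in% c(3, 6), nrow = nrow(m))

# arr.ind = TRUE returns one row per hit, with columns "row" and "col"
hits <- which(is_hit, arr.ind = TRUE)

# Combine the matched values with their positions
data.frame(nv = m[hits], hits)
```

Here the value 3 is found at row 3, column 1, and the value 6 at row 1, column 2.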