These functions search through one or more text files, extract lines matching a regular expression pattern, and return a tibble containing the results.
seek()
: Discovers files inside one or more directories (recursively or not), applies optional file name and text file filtering, and searches lines.seek_in()
: Searches inside a user-provided character vector of files.
Usage
seek(
pattern,
path = ".",
...,
filter = NULL,
negate = FALSE,
recurse = FALSE,
all = FALSE,
relative_path = TRUE,
matches = FALSE
)
seek_in(files, pattern, ..., matches = FALSE)
Arguments
- pattern
A regular expression pattern used to match lines.
- path
A character vector of one or more directories where files should be discovered (only for
seek()
).- ...
Additional arguments passed to
readr::read_lines()
, such asskip
,n_max
, orlocale
.- filter
Optional. A regular expression pattern used to filter file paths before reading. If
NULL
, all text files are considered.- negate
Logical. If
TRUE
, files matching thefilter
pattern are excluded instead of included. Useful to skip files based on name or extension.- recurse
If
TRUE
recurse fully, if a positive number the number of levels to recurse.- all
If
TRUE
hidden files are also returned.- relative_path
Logical. If TRUE, file paths are made relative to the path argument. If multiple root paths are provided, relative_path is automatically ignored and absolute paths are kept to avoid ambiguity.
- matches
Logical. If
TRUE
, all matches per line are also returned in amatches
list-column.- files
A character vector of files to search (only for
seek_in()
).
Value
A tibble with one row per matched line, containing:
path
: File path (relative or absolute).line_number
: Line number in the file.match
: The first matched substring.matches
: All matched substrings (ifmatches = TRUE
).line
: Full content of the matching line.
Details
The overall process involves the following steps:
File Selection
seek()
: Files are discovered usingfs::dir_ls()
, starting from one or more directories.seek_in()
: Files are directly supplied by the user (no discovery phase).
File Filtering
Files located inside
.git/
folders are automatically excluded.Files with known non-text extensions (e.g.,
.png
,.exe
,.rds
) are excluded.If a file's extension is unknown, a check is performed to detect embedded null bytes (binary indicator).
Optionally, an additional regex-based path filter (
filter
) can be applied.
Line Reading
Files are read line-by-line using
readr::read_lines()
.Only lines matching the provided regular expression
pattern
are retained.If a file cannot be read, it is skipped gracefully without failing the process.
Data Frame Construction
A tibble is constructed with one row per matched line.
These functions are particularly useful for analyzing source code, configuration files, logs, and other structured text data.
Examples
path = system.file("extdata", package = "seekr")
# Search all function definitions in R files
seek("[^\\s]+(?= (=|<-) function\\()", path, filter = "\\.R$")
#> # A tibble: 6 × 4
#> path line_number match line
#> <fs::path> <int> <chr> <chr>
#> 1 /script1.R 1 add_one add_one <- function(x) {
#> 2 /script1.R 5 capitalize capitalize <- function(txt) {
#> 3 /script1.R 9 say_hello say_hello <- function(name) {
#> 4 /script2.R 2 mean_safe mean_safe <- function(x) {
#> 5 /script2.R 7 sd_safe sd_safe <- function(x) {
#> 6 /script2.R 12 print_vector print_vector <- function(v) {
# Search for usage of "TODO" comments in source code in a case insensitive way
seek("(?i)TODO", path, filter = "\\.R$")
#> # A tibble: 1 × 4
#> path line_number match line
#> <fs::path> <int> <chr> <chr>
#> 1 /script2.R 1 TODO # TODO: optimize this function
# Search for error/warning in log files
seek("(?i)error", path, filter = "\\.log$")
#> # A tibble: 14 × 4
#> path line_number match line
#> <fs::path> <int> <chr> <chr>
#> 1 /server.log 3 ERROR 2025-04-29 21:52:17 ERROR : Starting process
#> 2 /server.log 6 ERROR 2025-04-30 04:15:43 ERROR : Retrying request
#> 3 /server.log 7 ERROR 2025-04-29 13:59:14 ERROR : Failed to authenti…
#> 4 /server.log 10 ERROR 2025-04-30 17:20:48 ERROR : User login failed
#> 5 /server.log 11 ERROR 2025-04-30 11:41:17 ERROR : Starting process
#> 6 /server.log 14 ERROR 2025-04-30 00:15:59 ERROR : Connection success…
#> 7 /server.log 16 ERROR 2025-04-29 20:39:24 ERROR : User login failed
#> 8 /server.log 19 ERROR 2025-04-29 17:51:14 ERROR : Timeout reached
#> 9 /server.log 20 ERROR 2025-04-29 18:27:07 ERROR : Retrying request
#> 10 /server.log 21 ERROR 2025-04-30 16:15:44 ERROR : Disk usage high
#> 11 /server.log 23 ERROR 2025-04-30 17:14:15 ERROR : User login failed
#> 12 /server.log 30 ERROR 2025-04-30 00:03:39 ERROR : Connection success…
#> 13 /server.log 35 ERROR 2025-04-30 17:25:05 ERROR : Connection success…
#> 14 /server.log 40 ERROR 2025-04-29 17:49:55 ERROR : Restart scheduled
# Search for config keys in YAML
seek("database:", path, filter = "\\.ya?ml$")
#> # A tibble: 1 × 4
#> path line_number match line
#> <fs::path> <int> <chr> <chr>
#> 1 /config.yaml 1 database: database:
# Looking for "length" in all types of text files
seek("(?i)length", path)
#> # A tibble: 4 × 4
#> path line_number match line
#> <fs::path> <int> <chr> <chr>
#> 1 /iris.csv 1 Length "\"Sepal.Length\",\"Sepal.Width\",\"Petal.Lengt…
#> 2 /script2.R 3 length " if (length(x) == 0) return(NA)"
#> 3 /script2.R 8 length " if (length(x) <= 1) return(NA)"
#> 4 /script2.R 13 length " print(paste('Vector of length', length(v)))"
# Search for specific CSV headers using seek_in() and reading only the first line
csv_files <- list.files(path, "\\.csv$", full.names = TRUE)
seek_in(csv_files, "(?i)specie", n_max = 1)
#> # A tibble: 1 × 4
#> path line_number match line
#> <fs::path> <int> <chr> <chr>
#> 1 /home/runner/work/_temp/Library/seekr/extdata/iris.csv 1 Spec… "\"S…