TidierStrings.jl
Tidier string transformations in Julia, modeled after the stringr R package.
<img src="/docs/src/assets/TidierStrings_logo.png" align="right" style="padding-left:10px;" width="150"/>
What is TidierStrings.jl
TidierStrings.jl is a 100% Julia implementation of the R stringr package.
TidierStrings.jl has one main goal: to bring stringr's straightforward syntax and ease of use to Julia users. Although the package was developed to work seamlessly with TidierData.jl functions and macros, it also works independently as a standalone package.
Installation
For the stable version:
] add TidierStrings
The ] character starts the Julia package manager. Press the backspace key to return to the Julia prompt.
or
For the development version:
using Pkg
Pkg.add(url = "https://github.com/TidierOrg/TidierStrings.jl.git")
What functions does TidierStrings.jl support?
TidierStrings.jl currently supports:
| Category | Function |
|-------------------|----------------------------------------------------------------------------------------------------|
| Matching | str_count, str_detect, str_locate, str_locate_all, str_replace, str_replace_all, str_remove, str_remove_all, str_split, str_starts, str_ends, str_subset, str_which |
| Concatenation | str_c, str_flatten, str_flatten_comma |
| Characters | str_dup, str_length, str_width, str_trim, str_squish, str_wrap, str_pad |
| Locale | str_equal, str_to_upper, str_to_lower, str_to_title, str_to_sentence, str_unique |
| Other | str_conv, str_like, str_replace_missing, word |
Examples
using Tidier
using TidierStrings
df = DataFrame(
    Names = ["Alice", "Bob", "Charlie", "Dave", "Eve", "Frank", "Grace"],
    City = ["New York 2019-20", "Los \n\n\n\n\n\n Angeles 2007-12 2020-21", "San Antonio 1234567890 ", " New York City", "LA 2022-23", "Philadelphia 2023-24", "San Jose 9876543210"],
    Occupation = ["Doctor", "Engineer", "Final Artist", "Scientist", "Physician", "Lawyer", "Teacher"],
    Description = ["Alice is a doctor in New York",
                   "Bob is is is an engineer in Los Angeles",
                   "Charlie is an artist in Chicago",
                   "Dave is a scientist in Houston",
                   "Eve is a physician in Phoenix",
                   "Frank is a lawyer in Philadelphia",
                   "Grace is a teacher in San Antonio"]
)
7×4 DataFrame
Row │ Names City Occupation Description
│ String String String String
─────┼─────────────────────────────────────────────────────────────────────────────────────────────
1 │ Alice New York 2019-20 Doctor Alice is a doctor in New York
2 │ Bob Los \n\n\n\n\n\n Angeles 2… Engineer Bob is is is an engineer in Los …
3 │ Charlie San Antonio 1234567890 Final Artist Charlie is an artist in Chicago
4 │ Dave New York City Scientist Dave is a scientist in Houston
5 │ Eve LA 2022-23 Physician Eve is a physician in Phoenix
6 │ Frank Philadelphia 2023-24 Lawyer Frank is a lawyer in Philadelphia
7 │ Grace San Jose 9876543210 Teacher Grace is a teacher in San Antonio
str_squish(): Removes leading and trailing whitespace from a string and replaces consecutive whitespace between words with a single space. It also removes newlines.
df = @chain df begin
    @mutate(City = str_squish(City))
end
7×4 DataFrame
Row │ Names City Occupation Description
│ String String String String
─────┼───────────────────────────────────────────────────────────────────────────────────────
1 │ Alice New York 2019-20 Doctor Alice is a doctor in New York
2 │ Bob Los Angeles 2007-12 2020-21 Engineer Bob is is is an engineer in Los …
3 │ Charlie San Antonio 1234567890 Final Artist Charlie is an artist in Chicago
4 │ Dave New York City Scientist Dave is a scientist in Houston
5 │ Eve LA 2022-23 Physician Eve is a physician in Phoenix
6 │ Frank Philadelphia 2023-24 Lawyer Frank is a lawyer in Philadelphia
7 │ Grace San Jose 9876543210 Teacher Grace is a teacher in San Antonio
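For intuition, the squish behavior described above can be approximated in plain Base Julia. This is only a sketch: `squish_sketch` is a hypothetical helper name, not part of TidierStrings, and it makes no claim about how the package implements `str_squish()`.

```julia
# Hypothetical helper (not part of TidierStrings): trim both ends, then
# collapse every run of whitespace, newlines included, into one space.
squish_sketch(s::AbstractString) = replace(strip(s), r"\s+" => " ")

cities = ["Los \n\n\n\n\n\n Angeles 2007-12   2020-21", "  New York City"]
squished = squish_sketch.(cities)
println(squished)
```

The dot in `squish_sketch.(cities)` broadcasts the helper over the vector, which is roughly what happens to a column inside `@mutate`.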
Functions with regex support: str_detect, str_replace, str_replace_all, str_remove, str_remove_all, str_count, str_equal, and str_subset
str_detect()
str_detect() checks whether a pattern exists in a string. It takes a string and a pattern as arguments and returns a Boolean indicating whether the pattern is present. It can be used inside @filter, @mutate, if_else(), and case_when(). str_detect() also supports the logical operators | and &.
if_else() and case_when() with @filter and str_detect()
@chain df begin
    @mutate(Occupation = if_else(str_detect(Occupation, "Doctor | Physician"), "Physician", Occupation))
    @filter(str_detect(Description, "artist | doctor"))
end
Row │ Names City Occupation Description
│ String String String String
─────┼────────────────────────────────────────────────────────────────────────────────
1 │ Alice New York 2019-20 Physician Alice is a doctor in New York
2 │ Charlie San Antonio 1234567890 Final Artist Charlie is an artist in Chicago
@chain df begin
    @mutate(state = case_when(str_detect(City, "NYC | New York") => "NY",
                              str_detect(City, "LA | Los Angeles | San & Jose") => "CA",
                              true => "other"))
end
7×5 DataFrame
Row │ Names City Occupation Description state
│ String String String String String
─────┼───────────────────────────────────────────────────────────────────────────────────────────────
1 │ Alice New York 2019-20 Doctor Alice is a doctor in New York NY
2 │ Bob Los Angeles 2007-12 2020-21 Engineer Bob is is is an engineer in Los … CA
3 │ Charlie San Antonio 1234567890 Final Artist Charlie is an artist in Chicago other
4 │ Dave New York City Scientist Dave is a scientist in Houston NY
5 │ Eve LA 2022-23 Physician Eve is a physician in Phoenix CA
6 │ Frank Philadelphia 2023-24 Lawyer Frank is a lawyer in Philadelphia other
7 │ Grace San Jose 9876543210 Teacher Grace is a teacher in San Antonio CA
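The results above are consistent with reading `|` as "or" and `&` as "and", with the whitespace around each sub-pattern ignored. The Base Julia sketch below encodes that reading. It is an assumption drawn from the example output, not the package source, and `detect_sketch` is a hypothetical name.

```julia
# Hypothetical sketch (not the TidierStrings source): split the pattern on
# `|` into OR groups and each group on `&` into AND terms, trimming the
# whitespace around every term before matching it as a regex.
function detect_sketch(s::AbstractString, pattern::AbstractString)
    any(split(pattern, '|')) do or_group
        all(split(or_group, '&')) do term
            occursin(Regex(String(strip(term))), s)
        end
    end
end

cities = ["San Antonio 1234567890", "San Jose 9876543210", "LA 2022-23"]
in_ca = detect_sketch.(cities, "LA | Los Angeles | San & Jose")
println(in_ca)
```

Under this reading, "San Antonio" fails because `San & Jose` requires both terms, which agrees with the "other" label in row 3 of the table above.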
str_replace()
Replaces the first occurrence of a pattern in a string with the specified text. It takes a string, the pattern to search for, and the replacement text as arguments. It also supports regex and the logical operator |. In contrast, str_replace_all() replaces every occurrence of a match within a string.
@chain df begin
    @mutate(City = str_replace(City, r"\s*20\d{2}-\d{2,4}\s*", " ####-## "))
    @mutate(Description = str_replace(Description, "is | a", "will become "))
end
7×4 DataFrame
Row │ Names City Occupation Description
│ String String String String
─────┼───────────────────────────────────────────────────────────────────────────────────────
1 │ Alice New York ####-## Doctor Alice will become a doctor in Ne…
2 │ Bob Los Angeles ####-## 2020-21 Engineer Bob will become is is an enginee…
3 │ Charlie San Antonio 1234567890 Final Artist Charlie will become an artist in…
4 │ Dave New York City Scientist Dave will become a scientist in …
5 │ Eve LA ####-## Physician Eve will become a physician in P…
6 │ Frank Philadelphia ####-## Lawyer Frank will become a lawyer in Ph…
7 │ Grace San Jose 9876543210 Teacher Grace will become a teacher in S…
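The difference between replacing the first match and replacing every match can be illustrated with Base Julia's `replace`, whose `count` keyword limits the number of substitutions. This is a sketch of the behavior only. It uses no TidierStrings functions and does not model how the package handles `|` in string patterns.

```julia
# Base Julia only: `count = 1` stops after the first match, which mirrors
# the first-match behavior described for str_replace().
city = "Los Angeles 2007-12 2020-21"
first_year = replace(city, r"\s*20\d{2}-\d{2,4}\s*" => " ####-## "; count = 1)

description = "Bob is is is an engineer in Los Angeles"
first_only = replace(description, r"is " => "will become "; count = 1)
every_match = replace(description, r"is " => "will become ")

println(first_year)
println(first_only)
println(every_match)
```

Only the first year range in `city` is masked, matching row 2 of the output above, where `2020-21` survives.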
str_remove and str_remove_all
These remove the first match or all matches, respectively.
@chain df begin
    @mutate(split = str_remove_all(Description, "is"))
end
7×5 DataFrame
Row │ Names City Occupation Description split
│ String String String String String
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ Alice New York 2019-20
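As a rough illustration of first-match versus all-match removal, the same idea can be written in Base Julia by replacing with an empty string. This is a sketch under the assumption that removal behaves like replacement with `""`. It does not call TidierStrings.

```julia
# Base Julia only: removing a match is modeled as replacing it with "".
s = "Bob is is is an engineer"
removed_first = replace(s, "is" => ""; count = 1)  # drops only the first "is"
removed_all = replace(s, "is" => "")               # drops every "is"

println(repr(removed_first))
println(repr(removed_all))
```

The spaces that surrounded each removed match are left behind, so a whitespace cleanup step such as `str_squish()` is a natural follow-up.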
