Analysing the entire Holy Bible in less than a second
UPDATE Jan 2026:
Nushell and the Polars plugin have both evolved a lot since this was written.
The Polars plugin code further down must now be written like so:
def bible [] {
  # Read the whole file, lowercase it, split it into words, load it into a dataframe
  let corp = open --raw bible.txt | str downcase | split words | polars into-df
  let stop = [i a an and are as at be by for from how in is it of on or that the this to was what when where who will with the]
  # Flag stop words in a helper column, keep the unflagged rows, then drop the helper
  let tidy = $corp | polars with-column (polars col 0 | polars is-in $stop | polars as wordin)
    | polars filter ((polars col wordin) == false ) | polars drop wordin
  let freq = $tidy | polars value-counts
  let sort = $freq | polars sort-by count
  $sort
}
The performance is still at the same awesome level.
Word frequency analysis is important in quantitative text analysis.
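As a toy illustration of what that means, here is a minimal sketch in plain Nushell. The sample sentence and the three-word stop list are made up for the example and are not taken from the corpus used in this post:

```nushell
# Hypothetical sample text and a tiny stop-word list, for illustration only
let text = "The LORD is my shepherd; the LORD is my light"
let stop = [the is my]

# Lowercase, split into words, drop stop words, count each word, sort by count
let counts = ($text | str downcase | split words | where $it not-in $stop | uniq --count | sort-by count)
$counts
```

With this input, "lord" should end up with a count of 2 and the other remaining words with a count of 1.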
Nushell has a plugin for Polars support.
Once it is installed, you can analyse a large corpus of text very fast:
def bible [] {
  let corp = open --raw /home/lk/Data/king-james-bible.txt | str downcase | split words | polars into-df
  let stop = [i a an and are as at be by for from how in is it of on or that the this to was what when where who will with the] | polars into-df
  let mask = $corp | polars is-in $stop
  let tidy = $corp | polars filter-with ($mask | polars not)
  let freq = $tidy | polars value-counts
  let sort = $freq | polars sort-by count
  $sort
}
Giving:
╭───────┬────────────┬───────╮
│     # │ 0          │ count │
├───────┼────────────┼───────┤
│     0 │ endow      │     1 │
│     1 │ clappeth   │     1 │
│     2 │ elishaphat │     1 │
│     3 │ muse       │     1 │
│     4 │ makaz      │     1 │
│     5 │ swimmeth   │     1 │
│     6 │ fidelity   │     1 │
│     7 │ jeziah     │     1 │
│     8 │ savours    │     1 │
│     9 │ ashvath    │     1 │
│   ... │ ...        │   ... │
│ 13019 │ all        │  5637 │
│ 13020 │ them       │  6430 │
│ 13021 │ not        │  6624 │
│ 13022 │ him        │  6659 │
│ 13023 │ they       │  7378 │
│ 13024 │ lord       │  7964 │
│ 13025 │ his        │  8473 │
│ 13026 │ unto       │  8997 │
│ 13027 │ shall      │  9840 │
│ 13028 │ he         │ 10422 │
╰───────┴────────────┴───────╯
And it took less than a second.
To be specific: 444ms 413µs 290ns.
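The post does not say how this timing was taken. One plausible way, assuming the `bible` command above is already defined in the current session, is Nushell's built-in `timeit`:

```nushell
# Assumption: `bible` is the custom command defined earlier in this post.
# `timeit` runs the block once and returns the elapsed time as a duration.
timeit { bible }
```

A single run includes reading the file from disk, so repeated runs may give slightly different figures.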
Doing the same in “vanilla” Nushell like so:
def biblenu [] {
  let corp = open --raw /home/lk/Data/king-james-bible.txt | str downcase | split words | wrap corp
  let stop = [i a an and are as at be by for from how in is it of on or that the this to was what when where who will with the] | wrap stop
  let tidy = $corp | where corp in $stop.stop == false
  let freq = $tidy | uniq --count
  let sort = $freq | sort-by count
  $sort | flatten
}
gives
│ 13019 │ all   │  5637 │
│ 13020 │ them  │  6430 │
│ 13021 │ not   │  6624 │
│ 13022 │ him   │  6659 │
│ 13023 │ they  │  7378 │
│ 13024 │ lord  │  7964 │
│ 13025 │ his   │  8473 │
│ 13026 │ unto  │  8997 │
│ 13027 │ shall │  9840 │
│ 13028 │ he    │ 10422 │
├───────┼───────┼───────┤
│     # │ corp  │ count │
╰───────┴───────┴───────╯
So the same result.
But this time it took: 4sec 118ms 703µs 223ns.
So for this kind of task, Nushell with Polars is roughly 9x faster than Nushell without Polars (about 0.44 s versus 4.12 s).