Analysing the entire Holy Bible in less than a second


UPDATE Jan 2026:


Nushell and the Polars plugin have both evolved a lot since this was written.

The Polars plugin code from the original post (further down) must now be written like so:

def bible [] {
    let corp = open --raw bible.txt | str downcase | split words | polars into-df
    let stop = [i a an and are as at be by for from how in is it of on or that the this to was what when where who will with the]
    let tidy = $corp | polars with-column (polars col 0 | polars is-in $stop | polars as wordin)
               | polars filter ((polars col wordin) == false ) | polars drop wordin 
    let freq = $tidy | polars value-counts 
    let sort = $freq | polars sort-by count
    $sort
}

The performance is still at the same awesome level 🚀


Word frequency analysis is important in quantitative text analysis.
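
In its simplest form this means lowercasing the text, splitting it into words, dropping common stop words and counting what is left. As a language-neutral illustration of those steps, here is a minimal Python sketch. It is not the code used in this post: the function name `word_freq`, the regex-based word splitting and the two-sentence sample are assumptions made for the example, and the splitting only approximates what Nushell's `split words` does.

```python
import re
from collections import Counter

# Same stop words as in the Nushell functions in this post.
STOP_WORDS = set(
    "i a an and are as at be by for from how in is it of on or "
    "that the this to was what when where who will with".split()
)

def word_freq(text, stop_words=STOP_WORDS):
    """Return (word, count) pairs sorted by ascending count."""
    # Lowercase, then keep runs of letters as words (an approximation).
    words = re.findall(r"[a-z]+", text.lower())
    # Drop the stop words before counting.
    kept = [w for w in words if w not in stop_words]
    counts = Counter(kept)
    # Ascending sort, so the most frequent words end up at the bottom.
    return sorted(counts.items(), key=lambda pair: pair[1])

# Hypothetical sample input; the post itself reads the whole text file.
sample = ("In the beginning God created the heaven and the earth. "
          "And the earth was without form, and void;")
result = word_freq(sample)
print(result[-1])  # ('earth', 2)
```

The ascending sort mirrors the tables in this post, where the rarest words come first and the most frequent ones last.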

Nushell has a plugin for Polars support.

When installed, you can analyse a large corpus of text very fast:

def bible [] {
    let corp = open --raw /home/lk/Data/king-james-bible.txt | str downcase | split words | polars into-df
    let stop = [i a an and are as at be by for from how in is it of on or that the this to was what when where who will with the] | polars into-df
    let mask = $corp | polars is-in $stop
    let tidy = $corp | polars filter-with ($mask | polars not)
    let freq = $tidy | polars value-counts
    let sort = $freq | polars sort-by count
    $sort
}

Giving:

╭───────┬────────────┬───────╮
│     # │     0      │ count │
├───────┼────────────┼───────┤
│     0 │ endow      │     1 │
│     1 │ clappeth   │     1 │
│     2 │ elishaphat │     1 │
│     3 │ muse       │     1 │
│     4 │ makaz      │     1 │
│     5 │ swimmeth   │     1 │
│     6 │ fidelity   │     1 │
│     7 │ jeziah     │     1 │
│     8 │ savours    │     1 │
│     9 │ ashvath    │     1 │
│   ... │ ...        │ ...   │
│ 13019 │ all        │  5637 │
│ 13020 │ them       │  6430 │
│ 13021 │ not        │  6624 │
│ 13022 │ him        │  6659 │
│ 13023 │ they       │  7378 │
│ 13024 │ lord       │  7964 │
│ 13025 │ his        │  8473 │
│ 13026 │ unto       │  8997 │
│ 13027 │ shall      │  9840 │
│ 13028 │ he         │ 10422 │
╰───────┴────────────┴───────╯

And it took less than a second.

To be specific: 444ms 413µs 290ns.

Doing the same in “vanilla” Nushell like so:

def biblenu [] {
    let corp = open --raw /home/lk/Data/king-james-bible.txt | str downcase | split words | wrap corp
    let stop = [i a an and are as at be by for from how in is it of on or that the this to was what when where who will with the] | wrap stop
    let tidy = $corp | where corp in $stop.stop == false
    let freq = $tidy | uniq --count
    let sort = $freq | sort-by count
    $sort | flatten
}

gives

│ 13019 │ all   │  5637 │
│ 13020 │ them  │  6430 │
│ 13021 │ not   │  6624 │
│ 13022 │ him   │  6659 │
│ 13023 │ they  │  7378 │
│ 13024 │ lord  │  7964 │
│ 13025 │ his   │  8473 │
│ 13026 │ unto  │  8997 │
│ 13027 │ shall │  9840 │
│ 13028 │ he    │ 10422 │
├───────┼───────┼───────┤
│     # │ corp  │ count │
╰───────┴───────┴───────╯

So the same result.

But this time it took: 4sec 118ms 703µs 223ns.

So Nushell with Polars is around 9x faster for this kind of task than Nushell without Polars (about 0.44 s versus 4.1 s).
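
The speedup follows directly from the two timings quoted above. A small Python sketch of that arithmetic, for anyone who wants to redo it with their own measurements (the helper `to_ms` is made up for this example):

```python
def to_ms(sec=0, ms=0, us=0, ns=0):
    """Convert a duration given in parts to milliseconds."""
    return sec * 1000 + ms + us / 1000 + ns / 1_000_000

# Timings as reported in this post.
polars_ms = to_ms(ms=444, us=413, ns=290)           # Nushell with Polars
vanilla_ms = to_ms(sec=4, ms=118, us=703, ns=223)   # "vanilla" Nushell

speedup = vanilla_ms / polars_ms
print(f"{speedup:.1f}x")  # 9.3x
```

These are single runs on one machine, so the ratio is an indication rather than a benchmark.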