Ubuntu Linux setup basics

For username "foo":

$ adduser foo

$ passwd foo

$ sudo usermod -aG sudo foo

$ mkdir -p /home/foo/.ssh

$ cat the_public_key.pem >> /home/foo/.ssh/authorized_keys

$ chown -R foo:foo /home/foo/.ssh

$ chmod 700 /home/foo/.ssh

$ chmod 600 /home/foo/.ssh/authorized_keys

Disable default account:

$ usermod -s /usr/sbin/nologin default_username

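  • Use adduser, not useradd: on Ubuntu, adduser is the interactive wrapper that creates the home directory and prompts for a password.
  • Even when logging in with only an SSH key, the user must have a password. It is used only for sudo commands.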
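
The permission steps above can be double-checked with a short script. This is a minimal sketch in Python, not part of the original setup: the function name check_ssh_permissions is hypothetical, it only inspects mode bits (not ownership), and the demo runs against a throwaway temporary directory rather than a real home directory.

```python
import os
import stat
import tempfile


def check_ssh_permissions(ssh_dir):
    """Return a list of problems with the modes of ssh_dir and its authorized_keys."""
    expected = {
        ssh_dir: 0o700,                                   # chmod 700 ~/.ssh
        os.path.join(ssh_dir, "authorized_keys"): 0o600,  # chmod 600 authorized_keys
    }
    problems = []
    for path, want in expected.items():
        if not os.path.exists(path):
            problems.append(f"{path}: missing")
            continue
        mode = stat.S_IMODE(os.stat(path).st_mode)
        if mode != want:
            problems.append(f"{path}: mode {mode:o}, expected {want:o}")
    return problems


# Demo on a temporary directory standing in for /home/foo (assumption).
with tempfile.TemporaryDirectory() as home:
    ssh_dir = os.path.join(home, ".ssh")
    keys = os.path.join(ssh_dir, "authorized_keys")
    os.mkdir(ssh_dir)
    with open(keys, "w"):
        pass
    os.chmod(ssh_dir, 0o755)
    os.chmod(keys, 0o644)
    before = check_ssh_permissions(ssh_dir)  # both modes too permissive
    os.chmod(ssh_dir, 0o700)
    os.chmod(keys, 0o600)
    after = check_ssh_permissions(ssh_dir)   # no problems left

print(before)
print(after)
```

An empty list means both modes match what sshd expects by default.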

Sudoers basics

Define text editor

$ sudo update-alternatives --config editor

Edit /etc/sudoers file

$ sudo visudo /etc/sudoers

Add a user to the sudo group (either command works)

$ sudo usermod -aG sudo username
$ sudo gpasswd -a username sudo

(CentOS uses the "wheel" group instead)

Default config for Ubuntu 22:

# User privilege specification
root    ALL=(ALL:ALL) ALL

# Members of the admin group may gain root privileges
%admin ALL=(ALL) ALL

# Allow members of group sudo to execute any command
%sudo   ALL=(ALL:ALL) ALL

Remove login for default users:

$ usermod -s /usr/sbin/nologin username


Elasticsearch advanced queries

DQL to match only documents with a non-empty value: Advert.location_query:* (does not work as a filter)
Or in Lucene: Advert.location_query:?*

A search returns at most 10,000 hits (the default index.max_result_window) unless you use the scroll API, which paginates through the full result set and can be sliced to run requests in parallel.

use strict;
use warnings;
use Data::Dumper::Concise;
use Search::Elasticsearch;

# Connect; the proxy is only used when USE_EKK_PROXY is set
my $ekk = Search::Elasticsearch->new(
    client           => '7_0::Direct',
    nodes            => [ 'https://opensearch.example.com/', ],
    send_get_body_as => 'POST',
    $ENV{USE_EKK_PROXY} ? (handle_args => { https_proxy => $ENV{USE_EKK_PROXY} }) : (),
);

my $results = $ekk->search(
    body => {
        "size" => 500,
        "sort" => [
            {
                "timestamp" => {
                    "order"         => "desc",
                    "unmapped_type" => "boolean"
                }
            }
        ],
        "aggs" => {
            "2" => {
                "date_histogram" => {
                    "field"             => "timestamp",
                    "calendar_interval" => "1d",
                    "time_zone"         => "Europe/London",
                    "min_doc_count"     => 1
                }
            }
        },
        "stored_fields"   => ["*"],
        "script_fields"   => {},
        "docvalue_fields" => [
            {
                "field"  => "timestamp",
                "format" => "date_time"
            }
        ],
        "_source" => {
            "excludes" => []
        },
        "query" => {
            "bool" => {
                "must"   => [],
                "filter" => [
                    { "match_all" => {} },
                    { "match_phrase" => { "foo_field" => "bar_value" } },
                    {
                        "range" => {
                            "timestamp" => {
                                "gte"    => "2023-06-28T12:04:15.943Z",
                                "lte"    => "2023-09-28T12:04:15.943Z",
                                "format" => "strict_date_optional_time"
                            }
                        }
                    }
                ],
                "should"   => [],
                "must_not" => []
            }
        },
        "highlight" => {
            "pre_tags"      => ["\@opensearch-dashboards-highlighted-field\@"],
            "post_tags"     => ["\@/opensearch-dashboards-highlighted-field\@"],
            "fields"        => { "*" => {} },
            "fragment_size" => 2147483647
        }
    }
);

print Dumper($results);

MySQL date display format conversion

MySQL date time conversion functions:

  • UNIX_TIMESTAMP(date): date to epoch seconds
  • FROM_UNIXTIME(epoch): epoch seconds to date


select name, from_unixtime(time_added, '%Y-%m-%d %H:%i')
from company
order by date_added desc
limit 10; -- list the most recently added companies

Note %H is the 24-hour hour; %h is the 12-hour hour (01..12), which is ambiguous without %p.

You may also omit the '%Y-%m-%d %H:%i' format string, to get the default format YYYY-MM-DD HH:MM:SS, e.g. `2023-09-07 09:43:51`
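
For checking values outside the database, the same two conversions can be sketched in Python. This is only an illustration under stated assumptions: the helper names from_unixtime and unix_timestamp are made up here, and the sketch always uses UTC, whereas MySQL converts using the session time zone.

```python
from datetime import datetime, timezone

DEFAULT_FORMAT = "%Y-%m-%d %H:%M:%S"  # same shape as MySQL's default YYYY-MM-DD HH:MM:SS


def from_unixtime(epoch_seconds, fmt=DEFAULT_FORMAT):
    """Format epoch seconds as a UTC date string (rough analogue of FROM_UNIXTIME)."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime(fmt)


def unix_timestamp(date_string, fmt=DEFAULT_FORMAT):
    """Parse a UTC date string into epoch seconds (rough analogue of UNIX_TIMESTAMP)."""
    parsed = datetime.strptime(date_string, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


full = from_unixtime(1694079831)                       # default format
short = from_unixtime(1694079831, "%Y-%m-%d %H:%M")    # MySQL: '%Y-%m-%d %H:%i'
epoch = unix_timestamp("2023-09-07 09:43:51")

print(full)   # 2023-09-07 09:43:51
print(short)  # 2023-09-07 09:43
print(epoch)  # 1694079831
```

Note that minutes are %M in Python's strftime but %i in MySQL, where %M means the month name.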

Test2 cheat sheet for Perl

Cheat sheet


Docs entry point

  • Test2::Tools::Compare
    • is like isnt unlike
    • match mismatch validator
    • hash array bag object meta number float rounded within string subset bool
    • in_set not_in_set check_set
    • item field call call_list call_hash prop check all_items all_keys all_vals all_values
    • etc end filter_items
    • event fail_events
    • exact_ref



use Test2::V0;

# Match regex in a hash
like( $some_hash, hash {                        # <-- Must use `like` keyword to make regex below work
    field 'message' => qr/Caught exception/;    # <-- Note `field` keyword and trailing semicolon. Quotes around key optional
    end();                                      # <-- Enforce that no other keys are present, optional
}, 'Logged error correctly');                   # <-- Test description and parentheses () are optional


# Check for defined / existing / missing keys
is(
    { a => 1, b => 'foo' },
    {
        a => D(),   # value is Defined
        b => E(),   # value Exists
        c => DNE(), # key/value Does Not Exist
    },
    'I can haz Test2?'
);

Todo - legacy

use Test::More;

TODO: {
    local $TODO = 'Still working on this';
    ok(0, "work in progress");
}


Todo - Test2

use Test2::Tools::Tiny qw/todo/;

todo 'Still working on this' => sub {
    ok(0, "failing test gonna fail");
}; # <--- remember the semicolon!

See t/regression/todo_and_facets.t

Chrome extensions: Download manager reviews

  • DownThemAll: Queues, but doesn't intercept
  • Thunder Download Manager: Intercepts, but doesn't queue
  • Free Download Manager: Just says "Loading..." (on Ubuntu)
  • Chrono Download Manager: Intercepts and queues! And resumes. Perfect!

curl -o / wget -O

curl --output file
curl -o file

wget --output-document file
wget -O file

Simplest to always use curl: like most tools, it uses lowercase letters for its common options. (Careful: wget -o writes the log to the file, and curl -O saves under the remote file name.)

It's usually already installed as well.

Analyzing HAR files with jq

jq docs

jq playground

Normal mode

Exploring HAR files

export HAR_FILE="/path/to/har/file"

  • Example to dump responses, for a given request URI

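    • export REQUEST_URI="https://www.facebook.com/api/graphql/"; cat $HAR_FILE | jq -r ".log.entries[] | if .request.url | test(\"$REQUEST_URI\") then .response.content else empty end"
      • Note the string passed to jq is in double quotes " so that the $REQUEST_URI is interpolated
      • But jq wants us to use double quotes for test("foo"), therefore they must be escaped like test(\"foo\")
      • The variable must be set before the pipeline starts (hence the semicolon): a VAR=value prefix on cat only reaches cat's environment and is not expanded in the jq argument
      • jq's --arg option avoids the quoting problem altogether: jq -r --arg uri "$REQUEST_URI" '... test($uri) ...'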
  • Another way to do the same thing in bash using single quotes. Quotes can be tricky.

    • export REQUEST_URI="https://www.facebook.com/api/graphql/"; cat $HAR_FILE | jq -r '.log.entries[] | if .request.url | test("'$REQUEST_URI'") then { uri: .request.url, mimeType: .response.content.mimeType, content: .response.content.text | .[0:200] } else empty end'
    • Note the string passed to jq is in three parts:
      • '...etc...test("'
      • $REQUEST_URI
      • '") then...etc...else empty end'
    • The content is truncated to the first 200 characters, to make it more readable
  • Dump the full response content, interpreted as JSON

    • ...todo...
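
When the shell quoting gets too fiddly, the same filter can be written as a short script. This is a minimal sketch in Python rather than jq: the function name matching_responses is hypothetical, and the tiny in-memory HAR is made-up data standing in for the file at $HAR_FILE. It mirrors the second example above: match the request URL against a regex, then keep the URL, MIME type and the first 200 characters of the body.

```python
import json
import re


def matching_responses(har, url_pattern, preview_chars=200):
    """Yield a summary dict for each HAR entry whose request URL matches url_pattern."""
    for entry in har["log"]["entries"]:
        url = entry["request"]["url"]
        if re.search(url_pattern, url):  # like jq's test(): a regex search
            content = entry["response"]["content"]
            yield {
                "uri": url,
                "mimeType": content.get("mimeType"),
                "content": (content.get("text") or "")[:preview_chars],
            }


# Made-up sample; for a real file use: har = json.load(open(path_to_har))
sample_har = {
    "log": {
        "entries": [
            {
                "request": {"url": "https://www.facebook.com/api/graphql/"},
                "response": {"content": {"mimeType": "application/json", "text": '{"data": {}}'}},
            },
            {
                "request": {"url": "https://example.com/other"},
                "response": {"content": {"mimeType": "text/html", "text": "<html></html>"}},
            },
        ]
    }
}

summaries = list(matching_responses(sample_har, r"^https://www\.facebook\.com/api/graphql/"))
print(json.dumps(summaries, indent=2))
```

Because the pattern is an ordinary Python string, no shell escaping is involved at all.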

Streaming mode


Case studies


Goal: Extract URLs of all your playlists

(under development)

  • Go to https://music.youtube.com/library/playlists in browser, scroll slowly down to the bottom
    • Chrome | DevTools | Network tab | Save all as HAR
  • Extract response text for relevant requests
    • cat $HAR_FILE | jq -r '.log.entries[] | select( .request.url | test("^https://music.youtube.com/youtubei") ) | .response.content.text' > $REQS_FILE
  • Approach 1: Loop over lines of file and extract playlistIDs (status: draft -- this gets playlist titles)
    • cat $REQS_FILE | while read line; do echo "$line" | jq '.contents.singleColumnBrowseResultsRenderer.tabs[].tabRenderer.content.sectionListRenderer.contents[].musicCarouselShelfRenderer.contents[].musicTwoRowItemRenderer.title.runs[] | { name: .text, id: .navigationEndpoint.browseEndpoint.browseId }'; done > $PLAYLISTS_FILE
    • Bugs:
      • duplicate values
      • jq errors (possibly because read without -r eats the backslashes in the JSON; try while IFS= read -r line)
      • last 5 entries are irrelevant
      • missing most entries!
  • Approach 2: Scan for all relevant playlist IDs, wherever they are in the document
    • cat playlists.2 | jq -r 'getpath( paths | select(.[-1] == "browseId") ) | select(. | match("^VLPL"))'
    • Bugs:
      • jq error: parse error: Invalid numeric literal at line 11, column 0
      • missing some entries
  • Approach 3: Give up and use Perl regex
    • cat $REQS_FILE | perl -lne'@ids = m/"browseId":"([^"]+)"/g; print $_ foreach map { s/^VL//; $_ } grep { /^VLPL/ && length($_) > 22 } @ids' | sort -u > $PLAYLISTS_FILE
    • Bugs:
      • This was supposed to be a jq cheat sheet, using Perl is cheating!
      • It still misses some playlists from the initial page load.
  • Approach 4: Found another source of data in the page
    • cat $HAR_FILE | jq -r '.log.entries[] | select( .request.url | test("^https://music.youtube.com/library/playlists") ) | .response.content.text' > $SCRIPT_DATA
    • Decode it
      • cat $SCRIPT_DATA | perl -plne's/(\\x[[:xdigit:]]{2})/qq{"$1"}/eeg' > $DECODED_SCRIPT_DATA
    • Maybe a little bit of manual munging :/
    • ...TODO... extract the browseIDs
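
The regex scan from Approach 3 can also be sketched in Python, which makes the de-duplication explicit. This is an illustration, not a fix for the missing playlists: the function name playlist_ids and the sample IDs are made up, and the filter (prefix VLPL, longer than 22 characters, strip the leading VL) is copied from the Perl one-liner.

```python
import re


def playlist_ids(text):
    """Return unique playlist IDs found in "browseId":"VLPL..." values, in first-seen order."""
    seen = {}
    for browse_id in re.findall(r'"browseId":"([^"]+)"', text):
        if browse_id.startswith("VLPL") and len(browse_id) > 22:
            seen.setdefault(browse_id[2:], None)  # drop the leading "VL"; dict keeps order
    return list(seen)


# Made-up response text: one valid ID (twice), one too-short ID, one unrelated browseId.
sample = (
    '{"browseId":"VLPLabcdefghijklmnopqrstuvwxyz012345"},'
    '{"browseId":"FEmusic_home"},'
    '{"browseId":"VLPLshort"},'
    '{"browseId":"VLPLabcdefghijklmnopqrstuvwxyz012345"}'
)

ids = playlist_ids(sample)
print(ids)
```

Unlike a plain uniq, this removes duplicates even when they are not adjacent, and it keeps the order in which playlists first appear.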


Goal: Extract list of alternative software

Fetch JSON

Extract data

  • export REGEX="software/gmail.json"; cat alternativeto.net.har | jq -r ".log.entries[] | if .request.url | test(\"$REGEX\") then .response.content.text else empty end" > page_per_line
    • this results in 9 lines, one for each 'page' you loaded
  • change the [] above to [0] to get one page, and pipe the result through jq again or use the fromjson filter as follows:
    • export REGEX="software/gmail.json"; cat alternativeto.net.har | jq -r ".log.entries[0] | if .request.url | test(\"$REGEX\") then .response.content.text | fromjson else empty end" > one_page_one_line
  • Now browse this JSON data, preferably in an IDE like vscode that can fold up sections easily to discover the following structure:
    • export REGEX="software/gmail.json"; cat alternativeto.net.har | jq -r ".log.entries[] | if .request.url | test(\"$REGEX\") then .response.content.text | fromjson | .pageProps.items[] | { name: .name, cost: .licenseCost, model: .licenseModel, desc: .shortDescriptionOrTagLine } else empty end" > software.json
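
The last jq command above can be approximated in Python, which may be easier to extend once the structure is known. This is a sketch under assumptions: the function name extract_items is hypothetical and the in-memory HAR holds made-up data. The field names (pageProps.items, licenseCost, licenseModel, shortDescriptionOrTagLine) are taken from the jq filter.

```python
import json
import re


def extract_items(har, url_pattern):
    """Collect name/cost/model/desc for every item in the matching JSON responses."""
    rows = []
    for entry in har["log"]["entries"]:
        if not re.search(url_pattern, entry["request"]["url"]):
            continue
        page = json.loads(entry["response"]["content"]["text"])  # jq: fromjson
        for item in page["pageProps"]["items"]:
            rows.append({
                "name": item.get("name"),
                "cost": item.get("licenseCost"),
                "model": item.get("licenseModel"),
                "desc": item.get("shortDescriptionOrTagLine"),
            })
    return rows


# Made-up page standing in for one response recorded in alternativeto.net.har.
page = {
    "pageProps": {
        "items": [
            {
                "name": "Example Mail",
                "licenseCost": "Free",
                "licenseModel": "Open Source",
                "shortDescriptionOrTagLine": "A made-up entry.",
            }
        ]
    }
}
sample_har = {
    "log": {
        "entries": [
            {
                "request": {"url": "https://example.com/software/gmail.json"},
                "response": {"content": {"text": json.dumps(page)}},
            },
            {
                "request": {"url": "https://example.com/style.css"},
                "response": {"content": {"text": "body {}"}},
            },
        ]
    }
}

rows = extract_items(sample_har, r"software/gmail\.json")
print(json.dumps(rows, indent=2))
```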

Sample output

{
  "name": "Mailfence",
  "cost": "Freemium",
  "model": "Proprietary",
  "desc": "Mailfence is a secure and private email service that fights for online privacy and digital freedom."
}
{
  "name": "Proton Mail",
  "cost": "Freemium",
  "model": "Open Source",
  "desc": "Secure email with absolutely no compromises, brought to you by MIT and CERN scientists."
}

Tips, tricks and gotchas

Decode HTML entities

e.g. converts AT&amp;T Webmail to AT&T Webmail

npm install -g he
cat software.json | jq '.name' -r | he --decode
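
If Node isn't available, Python's standard library can do the same decoding. A minimal sketch, assuming the names have already been extracted into a list (the list here is made up apart from the AT&amp;T example above):

```python
import html

names = ["AT&amp;T Webmail", "Proton Mail"]

# html.unescape converts named and numeric character references to plain text.
decoded = [html.unescape(name) for name in names]
print(decoded)
```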


jq input must be valid JSON, so even a simple test string needs two layers of quotes: JSON double quotes inside shell single quotes, i.e. pass "foo" with quotes

echo '"hello"' | jq '.'

Regex. gsub = global substitution. Note the semicolon ; to separate arguments to gsub().

echo '"foo\r\nbar"' | jq -r 'gsub("(\r\n.+)"; "")'
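
For comparison, the same substitution in Python uses re.sub, which is also global by default. A minimal sketch with the same made-up input string:

```python
import re

text = "foo\r\nbar"

# Remove the CRLF together with the text that follows it on the next line.
cleaned = re.sub(r"\r\n.+", "", text)
print(cleaned)
```

As in jq, the dot does not match a newline unless a flag asks for it, so only one following line is removed per match.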

Video processing for fun


  • QuickTime Player
    • Edit menu | Add clip after...
    • 40 videos, total 250 MB = (Not Responding) & Pinwheel of doom
    • 34 videos, total 140 MB = several minutes of editing, then (Not Responding) & Pinwheel of doom
  • iMovie
    • To download for an older version of macOS, open App Store | Purchased | Install






Command-line data processing 2023

Some developer tools with a CLI for processing XML, XHTML, HTML, JSON, YAML, etc.


  • xsh (perl - Choroba)
    • cpm install -g XML::XSH2
    • xsh -P file.xml
    • ls
    • help ls
    • help | less
    • <TAB> autocompletion
  • xmllint
    • xmllint --xpath "//foo" file.xml
    • xmllint --shell file.xml
  • xmlstarlet
  • xq (golang - )
    • apt-get install xq
  • xq (python - jeffbr13)
    • pip install xq


  • jq
    • cat file.json | jq . # format
    • cat file.json | jq '.[]' # extract array


  • fzf
  • ripgrep
  • ag
  • ack
    • Doesn't search "binary" files by default
  • vim - for searching files

Windows 10 robocopy basics

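robocopy c:\temp\source c:\temp\destination /E /DCOPY:DAT /R:10 /W:3

  • /E: copy subdirectories, including empty ones
  • /DCOPY:DAT: copy directory Data, Attributes and Timestamps
  • /R:10: retry failed copies up to 10 times
  • /W:3: wait 3 seconds between retries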

Running Emby on Linux

wget https://github.com/MediaBrowser/Emby.Releases/releases/download/

sudo dpkg -i emby-server-deb_4.5.4.0_amd64.deb

sudo systemctl status emby-server.service