A piggy bank of commands, fixes, succinct reviews, some mini articles and technical opinions from a (mostly) Perl developer.

Quick reference

Ubuntu Linux setup basics

For username "foo":

$ adduser foo

$ passwd foo

$ sudo usermod -aG sudo foo

$ mkdir -p /home/foo/.ssh

$ cat the_public_key.pem >> /home/foo/.ssh/authorized_keys

$ chown -R foo:foo /home/foo/.ssh

$ chmod 700 /home/foo/.ssh

$ chmod 600 /home/foo/.ssh/authorized_keys

Disable default account:

$ usermod -s /usr/sbin/nologin default_username

Notes:
  • Use adduser, not useradd: on Debian/Ubuntu, adduser is the friendlier wrapper that creates the home directory for you.
  • Even when logging in with an SSH key only, the user must still have a password; it will only be used for sudo commands.

Sudoers basics

Define text editor

$ sudo update-alternatives --config editor

Edit the /etc/sudoers file (visudo edits it by default, and checks the syntax before saving)

$ sudo visudo

Add a user to sudo group

$ sudo usermod -aG sudo username
or
$ sudo gpasswd -a username sudo

(but CentOS uses "wheel" group instead)
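
Either way, confirm membership with id (the change takes effect at the next login):

$ id username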

Default config for Ubuntu 22:

# User privilege specification
root    ALL=(ALL:ALL) ALL

# Members of the admin group may gain root privileges
%admin ALL=(ALL) ALL

# Allow members of group sudo to execute any command
%sudo   ALL=(ALL:ALL) ALL

Remove login for default users:

$ usermod -s /usr/sbin/nologin username
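
A common tweak, applied with visudo: allow the sudo group to run commands without a password prompt (useful for key-only accounts):

%sudo   ALL=(ALL:ALL) NOPASSWD: ALL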

Elasticsearch advanced queries

See also ElasticSearch basics.

DQL to filter by non-zero length: Advert.location_query:* (does not work as a filter)
Or in Lucene: Advert.location_query:?*

Results are limited to 10,000 records unless you use the scroll API, which can paginate and also make parallel requests (see the sketch after the gist below).

Gist:
use strict;
use warnings;
use Data::Dumper::Concise;
use Search::Elasticsearch;

my $ekk = Search::Elasticsearch->new(
    client           => '7_0::Direct',
    nodes            => [ 'https://opensearch.example.com/', ],
    send_get_body_as => 'POST',
    $ENV{USE_EKK_PROXY} ? ( handle_args => { https_proxy => $ENV{USE_EKK_PROXY} } ) : (),
);

my $results = $ekk->search(
    body => {
        "size" => 500,
        "sort" => [
            {
                "timestamp" => {
                    "order"         => "desc",
                    "unmapped_type" => "boolean"
                }
            }
        ],
        "aggs" => {
            "2" => {
                "date_histogram" => {
                    "field"             => "timestamp",
                    "calendar_interval" => "1d",
                    "time_zone"         => "Europe/London",
                    "min_doc_count"     => 1
                }
            }
        },
        "stored_fields"   => ["*"],
        "script_fields"   => {},
        "docvalue_fields" => [
            {
                "field"  => "timestamp",
                "format" => "date_time"
            }
        ],
        "_source" => {
            "excludes" => []
        },
        "query" => {
            "bool" => {
                "must"   => [],
                "filter" => [
                    {
                        "match_all" => {}
                    },
                    {
                        "match_phrase" => {
                            "foo_field" => "bar_value"
                        }
                    },
                    {
                        "range" => {
                            "timestamp" => {
                                "gte"    => "2023-06-28T12:04:15.943Z",
                                "lte"    => "2023-09-28T12:04:15.943Z",
                                "format" => "strict_date_optional_time"
                            }
                        }
                    }
                ],
                "should"   => [],
                "must_not" => []
            }
        },
        "highlight" => {
            "pre_tags"  => ["\@opensearch-dashboards-highlighted-field\@"],
            "post_tags" => ["\@/opensearch-dashboards-highlighted-field\@"],
            "fields"    => {
                "*" => {}
            },
            "fragment_size" => 2147483647
        }
    }
);

print Dumper($results);
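
The scroll API mentioned above, sketched with the client's scroll_helper method (the index name, page size and query here are placeholder assumptions):

my $scroll = $ekk->scroll_helper(
    index => 'my_index',        # placeholder index name
    body  => {
        size  => 1000,          # documents per scroll page
        query => { match_all => {} },
    },
);
while ( my $doc = $scroll->next ) {
    print $doc->{_source}{timestamp}, "\n";
}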

MySQL date display format conversion

MySQL date time conversion functions: 

  • UNIX_TIMESTAMP(date) docs
  • FROM_UNIXTIME(epoch) docs

Examples:

select name, from_unixtime(time_added, '%Y-%m-%d %H:%i')
from company
order by time_added desc
limit 10; -- list the most recently added companies

You may also omit the '%Y-%m-%d %H:%i' format string to get the default format YYYY-MM-DD HH:MM:SS, e.g. `2023-09-07 09:43:51`
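
Going the other way with UNIX_TIMESTAMP (note that the session time zone applies to the conversion):

select unix_timestamp();                      -- current epoch seconds
select unix_timestamp('2023-09-07 09:43:51'); -- epoch seconds for a given datetime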


Test2 cheat sheet for Perl

Cheat sheet

Links

Docs entry point

  • Test2::Tools::Compare
    • is like isnt unlike
    • match mismatch validator
    • hash array bag object meta number float rounded within string subset bool
    • in_set not_in_set check_set
    • item field call call_list call_hash prop check all_items all_keys all_vals all_values
    • etc end filter_items
    • T F D DF E DNE FDNE U L
    • event fail_events
    • exact_ref

Summary

use Test2::V0;

# Match regex in a hash
like( $some_hash, hash {                        # <-- Must use `like` keyword to make regex below work
    field 'message' => qr/Caught exception/;    # <-- Note `field` keyword and trailing semicolon. Quotes around key optional
    end();                                      # <-- Enforce that no other keys are present, optional
}, 'Logged error correctly');                   # <-- Test description and parentheses () are optional

Comparisons

is(
    {
        a => 1,
        b => 'foo',
    },
    {
        a => D(),   # value is Defined
        b => E(),   # value Exists
        c => DNE(), # key/value Does Not Exist
    },
    'I can haz Test2?'
);
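
Array checks follow the same pattern; a minimal sketch:

is(
    [ 1, 2, 'extra' ],
    array {
        item 1;   # element 0 must be 1
        item 2;   # element 1 must be 2
        etc();    # allow additional items (use end() to forbid them)
    },
    'array comparison'
);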

Todo - legacy

use Test::More;

TODO: {
    local $TODO = 'Still working on this';
    ok(0, "work in progress");
}

Todo2

use Test2::Tools::Tiny qw/todo/;

todo 'Still working on this' => sub {
    ok(0, "failing test gonna fail");
}; # <--- remember the semicolon!

See t/regression/todo_and_facets.t


Chrome extensions: Download manager reviews


  • DownThemAll: Queues, but doesn't intercept
  • Thunder Download Manager: Intercepts, but doesn't queue
  • Free Download Manager: Just says "Loading..." (on Ubuntu)
  • Chrono Download Manager: Intercepts and queues! And resumes. Perfect!

curl -o / wget -O

curl --output file
curl -o file

wget --output-document file
wget -O file

Best to just always use curl, and remember that its common options are lowercase, as you'd expect.

It's usually already installed as well.
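
One notable exception to the lowercase rule is resuming a partial download:

curl -C - -o file URL
wget -c -O file URL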

Analyzing HAR files with jq

See gist:

jq docs

jq playground

Normal mode

Exploring HAR files

export HAR_FILE="/path/to/har/file"

  • Example to dump responses, for a given request URI

    • REQUEST_URI="https://www.facebook.com/api/graphql/"; cat $HAR_FILE | jq -r ".log.entries[] | if .request.url | test(\"$REQUEST_URI\") then .response.content else empty end"
      • Note the string passed to jq is in double quotes " so that the $REQUEST_URI is interpolated
      • But jq wants us to use double quotes for test("foo"), therefore they must be escaped like test(\"foo\")
  • Another way to do the same thing in bash using single quotes. Quotes can be tricky.

    • REQUEST_URI="https://www.facebook.com/api/graphql/"; cat $HAR_FILE | jq -r '.log.entries[] | if .request.url | test("'$REQUEST_URI'") then { uri: .request.url, mimeType: .response.content.mimeType, content: .response.content.text | .[0:200] } else empty end'
    • Note the string passed to jq is in three parts:
      • '...etc...test("'
      • $REQUEST_URI
      • '") then...etc...else empty end'
    • The content is truncated to the first 200 characters, to make it more readable
  • Dump the full response content, interpreted as JSON (same idea as above, piping the text through the fromjson filter)

    • REQUEST_URI="https://www.facebook.com/api/graphql/"; cat $HAR_FILE | jq -r '.log.entries[] | if .request.url | test("'$REQUEST_URI'") then .response.content.text | fromjson else empty end'

Streaming mode

...todo

Case studies

YouTube

Goal: Extract URLs of all your playlists

(under development)

  • Go to https://music.youtube.com/library/playlists in browser, scroll slowly down to the bottom
    • Chrome | DevTools | Network tab | Save all as HAR
  • Extract response text for relevant requests
    • cat $HAR_FILE | jq -r '.log.entries[] | select( .request.url | test("^https://music.youtube.com/youtubei") ) | .response.content.text' > $REQS_FILE
  • Approach 1: Loop over lines of file and extract playlistIDs (status: draft -- this gets playlist titles)
    • cat $REQS_FILE | while read line; do echo "$line" | jq '.contents.singleColumnBrowseResultsRenderer.tabs[].tabRenderer.content.sectionListRenderer.contents[].musicCarouselShelfRenderer.contents[].musicTwoRowItemRenderer.title.runs[] | { name: .text, id: .navigationEndpoint.browseEndpoint.browseId }'; done > $PLAYLISTS_FILE
    • Bugs:
      • duplicate values
      • jq errors
      • last 5 entries are irrelevant
      • missing most entries!
  • Approach 2: Scan for all relevant playlist IDs, wherever they are in the document
    • cat playlists.2 | jq -r 'getpath( paths | select(.[-1] == "browseId") ) | select(. | match("^VLPL"))'
    • Bugs:
      • jq error: parse error: Invalid numeric literal at line 11, column 0
      • missing some entries
  • Approach 3: Give up and use Perl regex
    • cat $REQS_FILE | perl -lne'@ids = m/"browseId":"([^"]+)"/g; print $_ foreach map { s/^VL//; $_ } grep { /^VLPL/ && length($_) > 22 } @ids' | uniq > $PLAYLISTS_FILE
    • Bugs:
      • This was supposed to be a jq cheat sheet, using Perl is cheating!
      • It still misses some playlists from the initial page load.
  • Approach 4: Found another source of data in the page
    • cat $HAR_FILE | jq -r '.log.entries[] | select( .request.url | test("^https://music.youtube.com/library/playlists") ) | .response.content.text' > $SCRIPT_DATA
    • Decode it
      • cat $SCRIPT_DATA | perl -plne's/(\\x[[:xdigit:]]{2})/qq{"$1"}/eeg' > $DECODED_SCRIPT_DATA
    • Maybe little bit of manual munging :/
    • ...TODO... extract the browseIDs

AlternativeTo

Goal: Extract list of alternative software

Fetch JSON

Extract data

  • export REGEX="software/gmail.json"; cat alternativeto.net.har | jq -r ".log.entries[] | if .request.url | test(\"$REGEX\") then .response.content.text else empty end" > page_per_line
    • this results in 9 lines, one for each 'page' you loaded
  • change the [] above to [0] to get one page, and pipe the result through jq again or use the fromjson filter as follows:
    • export REGEX="software/gmail.json"; cat alternativeto.net.har | jq -r ".log.entries[0] | if .request.url | test(\"$REGEX\") then .response.content.text | fromjson else empty end" > one_page_one_line
  • Now browse this JSON data, preferably in an IDE like VS Code that can fold sections easily, to discover the following structure:
    • export REGEX="software/gmail.json"; cat alternativeto.net.har | jq -r ".log.entries[] | if .request.url | test(\"$REGEX\") then .response.content.text | fromjson | .pageProps.items[] | { name: .name, cost: .licenseCost, model: .licenseModel, desc: .shortDescriptionOrTagLine } else empty end" > software.json

Sample output

{
  "name": "Mailfence",
  "cost": "Freemium",
  "model": "Proprietary",
  "desc": "Mailfence is a secure and private email service that fights for online privacy and digital freedom."
}
{
  "name": "Proton Mail",
  "cost": "Freemium",
  "model": "Open Source",
  "desc": "Secure email with absolutely no compromises, brought to you by MIT and CERN scientists."
}
...etc

Tips, tricks and gotchas

Decode HTML entities

e.g. converts AT&amp;T Webmail to AT&T Webmail

npm install -g he
cat software.json | jq '.name' -r | he --decode
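
A Perl alternative, if you'd rather not install an npm package (assumes the HTML::Entities module from CPAN):

cat software.json | jq -r '.name' | perl -MHTML::Entities -nle 'print decode_entities($_)'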

Debugging

For very simple test examples, you must quote inputs twice, i.e. pass "foo" with the quotes included, so that jq receives a valid JSON string

echo '"hello"' | jq '.'

Regex. gsub = global substitution. Note the semicolon ; to separate arguments to gsub().

echo '"foo\r\nbar"' | jq -r 'gsub("(\r\n.+)"; "")'

Video processing for fun

Mac

  • QuickTime Player
    • Edit menu | Add clip after...
    • 40 videos, total 250MB = (Not Responding) & Pinwheel of doom
    • 34 videos, total 140MB = several minutes of editing, then (Not Responding) & Pinwheel of doom
  • iMovie
    • To download for an older version of MacOS, open App Store | Purchased | Install

Linux

Windows

Web

Android

iPhone


Command-line data processing 2023

Some developer tools with a CLI for processing XML, XHTML, HTML, JSON, YAML, etc.

XML

  • xsh (perl - Choroba)
    • cpm install XML::XSH2
    • xsh -P file.xml
    • ls
    • help ls
    • help | less
    • <TAB> autocompletion
  • xmllint
    • xmllint --xpath "//foo" file.xml
    • xmllint --shell file.xml
  • xmlstarlet
    • xmlstarlet sel -t -v "//foo" file.xml
  • xq (golang - )
    • apt-get install xq
  • xq (python - jeffbr13)
    • pip install xq

JSON

  • jq
    • cat file.json | jq . # format
    • cat file.json | jq '.[]' # unpack array elements

Search

  • fzf
  • ripgrep
  • ag
  • ack
    • Doesn't search "binary" files by default
  • vim - for searching files

Windows 10 robocopy basics

robocopy c:\temp\source c:\temp\destination /E /DCOPY:DAT /R:10 /W:3
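
What those switches do:

  • /E - copy subdirectories, including empty ones
  • /DCOPY:DAT - copy directory Data, Attributes and Timestamps
  • /R:10 - retry each failed copy 10 times (the default is 1 million)
  • /W:3 - wait 3 seconds between retries (the default is 30)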

Running Emby on Linux

wget https://github.com/MediaBrowser/Emby.Releases/releases/download/4.5.4.0/emby-server-deb_4.5.4.0_amd64.deb

sudo dpkg -i emby-server-deb_4.5.4.0_amd64.deb

sudo systemctl status emby-server.service
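
If the service isn't running yet, the usual systemd commands apply:

sudo systemctl enable --now emby-server.service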