jboy.eu

code | media | stuff

Current projects

About

My name is Jeremy, I am a software developer and you probably don’t give a shit about this. I will use this site mostly for notes concering coding and stuff I found on the interwebz. I tend to be forgetful from time to time (:

Attribution

  • This site is powered by Hugo, an awesome static website engine.
  • The logo is derived from a photography by NASA, 1949.

Blog

Parse CSV files with FreeBSD

I recently read a blog post by Dale Scott where he lists several options for parsing CSV files on UNIX systems. Unfortunately, the comment section is closed, so I decided to add some lines here.

In his article, Dale names 6 and describes 4 options for parsing CSV files on FreeBSD, namely:

My personal use case is to parse a CSV file that contains crash reports for an app in Google Play Store. I need a software that accepts a CSV file and a format string (i.e. “App Version %1, Exception %2…”) and fills the placeholders with CSV columns. I tried to achieve this with csvtool first, which is present in the FreeBSD ports tree (textproc/ocaml-csv).

csvtool

I started developing my script using ocaml-csv, which has a very handy program csvtool, which accepts an input file and a format string, just as I wanted:

jb@jb-mbp:~ % cat tmp/sample.csv 
name,type,food
John,nice,pizza
Sue,sweet,taco
jb@jb-mbp:~ % csvtool drop 1 tmp/sample.csv | csvtool format "%1 is a %2 %3 eater\n" -
John is a nice pizza eater
Sue is a sweet taco eater

After installing textproc/ocaml-csv on my FreeBSD box, I figured that csvtool is not installed as part of it. The sources for csvtool can be found here:

root@jb:~ # ls /usr/obj/usr/ports/textproc/ocaml-csv/work/csv-1.2.1/examples/
.depend.ocaml Makefile      csvtool       csvtool.annot csvtool.cmi   csvtool.cmo   csvtool.exe   csvtool.ml    example.annot example.cmi   example.cmo   example.exe   example.ml    example1.csv  example2.csv

I might look into this when I find the time, but I had to get this script working, so I took a look at csvkit instead, which you should find at textproc/py-csvkit. csvkit comes with several tools, but I will only use two of them, csvcut and csvformat.

csvcut

csvcut lets you select columns from your input CSV file:

usage: csvcut [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
      [-p ESCAPECHAR] [-z MAXFIELDSIZE] [-e ENCODING] [-S] [-H] [-v]
      [-l] [--zero] [-n] [-c COLUMNS] [-C NOT_COLUMNS] [-x]
      [FILE]

Filter and truncate CSV files. Like unix "cut" command, but for tabular data.

positional arguments:
  FILE                  The CSV file to operate on. If omitted, will accept
            input on STDIN.

optional arguments:
  -h, --help            show this help message and exit
  -d DELIMITER, --delimiter DELIMITER
            Delimiting character of the input CSV file.
  -t, --tabs            Specifies that the input CSV file is delimited with
            tabs. Overrides "-d".
  -q QUOTECHAR, --quotechar QUOTECHAR
            Character used to quote strings in the input CSV file.
  -u {0,1,2,3}, --quoting {0,1,2,3}
            Quoting style used in the input CSV file. 0 = Quote
            Minimal, 1 = Quote All, 2 = Quote Non-numeric, 3 =
            Quote None.
  -b, --doublequote     Whether or not double quotes are doubled in the input
            CSV file.
  -p ESCAPECHAR, --escapechar ESCAPECHAR
            Character used to escape the delimiter if --quoting 3
            ("Quote None") is specified and to escape the
            QUOTECHAR if --doublequote is not specified.
  -z MAXFIELDSIZE, --maxfieldsize MAXFIELDSIZE
            Maximum length of a single field in the input CSV
            file.
  -e ENCODING, --encoding ENCODING
            Specify the encoding the input CSV file.
  -S, --skipinitialspace
            Ignore whitespace immediately following the delimiter.
  -H, --no-header-row   Specifies that the input CSV file has no header row.
            Will create default headers.
  -v, --verbose         Print detailed tracebacks when errors occur.
  -l, --linenumbers     Insert a column of line numbers at the front of the
            output. Useful when piping to grep or as a simple
            primary key.
  --zero                When interpreting or displaying column numbers, use
            zero-based numbering instead of the default 1-based
            numbering.
  -n, --names           Display column names and indices from the input CSV
            and exit.
  -c COLUMNS, --columns COLUMNS
            A comma separated list of column indices or names to
            be extracted. Defaults to all columns.
  -C NOT_COLUMNS, --not-columns NOT_COLUMNS
            A comma separated list of column indices or names to
            be excluded. Defaults to no columns.
  -x, --delete-empty-rows
            After cutting, delete rows which are completely empty.

csvformat

csvformat lets you change the format of your CSV file. You can also pipe csvcut into csvformat (which I do):

usage: csvformat [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
         [-p ESCAPECHAR] [-z MAXFIELDSIZE] [-e ENCODING] [-S] [-v]
         [-D OUT_DELIMITER] [-T] [-Q OUT_QUOTECHAR] [-U {0,1,2,3}]
         [-B] [-P OUT_ESCAPECHAR] [-M OUT_LINETERMINATOR]
         [FILE]

Convert a CSV file to a custom output format.

positional arguments:
  FILE                  The CSV file to operate on. If omitted, will accept
            input on STDIN.

optional arguments:
  -h, --help            show this help message and exit
  -d DELIMITER, --delimiter DELIMITER
            Delimiting character of the input CSV file.
  -t, --tabs            Specifies that the input CSV file is delimited with
            tabs. Overrides "-d".
  -q QUOTECHAR, --quotechar QUOTECHAR
            Character used to quote strings in the input CSV file.
  -u {0,1,2,3}, --quoting {0,1,2,3}
            Quoting style used in the input CSV file. 0 = Quote
            Minimal, 1 = Quote All, 2 = Quote Non-numeric, 3 =
            Quote None.
  -b, --doublequote     Whether or not double quotes are doubled in the input
            CSV file.
  -p ESCAPECHAR, --escapechar ESCAPECHAR
            Character used to escape the delimiter if --quoting 3
            ("Quote None") is specified and to escape the
            QUOTECHAR if --doublequote is not specified.
  -z MAXFIELDSIZE, --maxfieldsize MAXFIELDSIZE
            Maximum length of a single field in the input CSV
            file.
  -e ENCODING, --encoding ENCODING
            Specify the encoding the input CSV file.
  -S, --skipinitialspace
            Ignore whitespace immediately following the delimiter.
  -v, --verbose         Print detailed tracebacks when errors occur.
  -D OUT_DELIMITER, --out-delimiter OUT_DELIMITER
            Delimiting character of the output CSV file.
  -T, --out-tabs        Specifies that the output CSV file is delimited with
            tabs. Overrides "-D".
  -Q OUT_QUOTECHAR, --out-quotechar OUT_QUOTECHAR
            Character used to quote strings in the output CSV
            file.
  -U {0,1,2,3}, --out-quoting {0,1,2,3}
            Quoting style used in the output CSV file. 0 = Quote
            Minimal, 1 = Quote All, 2 = Quote Non-numeric, 3 =
            Quote None.
  -B, --out-doublequote
            Whether or not double quotes are doubled in the output
            CSV file.
  -P OUT_ESCAPECHAR, --out-escapechar OUT_ESCAPECHAR
            Character used to escape the delimiter in the output
            CSV file if --quoting 3 ("Quote None") is specified
            and to escape the QUOTECHAR if --doublequote is not
            specified.
  -M OUT_LINETERMINATOR, --out-lineterminator OUT_LINETERMINATOR
            Character used to terminate lines in the output CSV
            file.

(To be continued)

09.01.17

Repair sparse bundle for Time Machine

sudo -s
cd /Volumes/TimeMachineBackup
chflags -R nouchg atar.sparsebundle
hdiutil attach -nomount -noverify -noautofsck atar.sparsebundle
/dev/disk2           Apple_partition_scheme
/dev/disk2s1         Apple_partition_map
/dev/disk2s2         Apple_HFSX
fsck_hfs -drfy /dev/disk2s2
23.02.16

Restore Mac OS X 10.11 El Capitan from Time Machine backup on AFP

I recently had difficulties restoring a Time Machine backup which is stored on a netatalk server to a MacBook Pro using the builtin Recovery Mode.

mkdir /Volumes/backup
mount_afp afp://user:pwd@example.org/afpShareName /Volumes/backup
hdid /Volumes/backup/$SPARSBUNDLENAME

This fixes the issue. :-)

22.02.16