[OBM Admin] archive tables and databases

Bán Miklós banm at vocs.unideb.hu
Mon Oct 31 19:43:06 CET 2016


Hi all,

I wrote a simple archive script for OpenBioMaps databases and tables.
I'm using a very similar version on the OBM servers, and I'm now sharing
it with you to encourage you to use it on your gekko servers.

It will be included in the next gekko release!

So, if you need a regular archive, use this or something similar.


A few steps are necessary to prepare it for automatic use:

1) 
Create a .pgpass file in your home directory,
make it readable only by you (chmod 600 ~/.pgpass),
and write your database access information into it in the following format:
localhost:5432:gisdata:gisadmin:YOURPASSWD
localhost:5432:biomaps:gisadmin:YOURPASSWD
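The step above can be sketched as a short shell snippet (the passwords
shown are placeholders, to be replaced with your own):

```shell
# Create ~/.pgpass with restrictive permissions before writing secrets into it
touch ~/.pgpass
chmod 600 ~/.pgpass
cat > ~/.pgpass <<'EOF'
localhost:5432:gisdata:gisadmin:YOURPASSWD
localhost:5432:biomaps:gisadmin:YOURPASSWD
EOF
```

Setting the permissions before writing the file matters: libpq ignores a
.pgpass file that is readable by group or others.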

2)
Make sure that the "local all all" line uses the md5 authentication
method in the pg_hba.conf file (/etc/postgresql/9.x/main/pg_hba.conf).
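For reference, the relevant pg_hba.conf line should look something like
this (the exact column spacing varies between versions):

```
# TYPE  DATABASE  USER  METHOD
local   all       all   md5
```

Remember to reload PostgreSQL (e.g. pg_ctlcluster or systemctl reload)
after editing pg_hba.conf, or the change will not take effect.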

3)
Create a writable destination directory somewhere
(e.g.: sudo mkdir /home/archive; sudo chown $USER /home/archive)

4)
Set up your archive path and table names in the script

5)
Run the script automatically from cron. See the examples in the code


I'm thinking of setting up a volunteer-based network file system on
the gekko servers for cross-archiving data. It would be a completely open
and optional thing. I'm considering some kind of client-to-client system
like MooseFS or torrent... any ideas?



The archive script code is below and here:
https://github.com/OpenBioMaps/archive_scripts

Miki,


#!/bin/bash

#OpenBioMaps archive script by Miki Bán banm at vocs.unideb.hu
#2016-10-31
#feel free to upgrade it!
#please share your improvements:
#administrator at openbiomaps.org
#https://github.com/OpenBioMaps/archive_scripts

# crontab usage examples:
# only tables from Monday to Saturday
#15 04 * * 1-6 /home/banm/archive.sh normal &
# tables and whole databases on every Sunday
#15 04 * * 7 /home/banm/archive.sh full &

# Variables - set them as you need
# Local
date=`date +"%b-%d-%y_%H:%M"`
# tables in gisdata
tables=(templates templates_genetics templates_taxon files file_connect)
dbs=(gisdata biomaps)
archive_path="/home/archives"

# Remote
remote_user=''
remote_site=''
remote_path=''

case "$1" in
normal) echo "dumping tables"

    for i in "${tables[@]}"
    do
        printf "pg_dump -U gisadmin -f %s/gisdata_%s_%s.sql -n public -t %s gisdata\n" $archive_path $i $date $i | bash
    done

echo "."
;;
full) echo "dumping tables and databases"
    
    echo "tables"
    for i in "${tables[@]}"
    do
        printf "pg_dump -U gisadmin -f %s/gisdata_%s_%s.sql -n public -t %s gisdata\n" $archive_path $i $date $i | bash
    done

    echo "databases"
    for i in "${dbs[@]}"
    do
        printf "pg_dump -U gisadmin -f %s/%s_%s.sql -n public %s\n" $archive_path $i $date $i | bash
    done

echo "."
;;
sync) echo "syncing to remote hosts"
    
    pattern="$2"
    if [ "$pattern" = '' ]; then
        # simple copy
        rsync -ave ssh $archive_path/ $remote_user@$remote_site:$remote_path/
    else
        # complex pattern based copy
        find $archive_path -name "$pattern" -print0 | tar --null --files-from=/dev/stdin -cf - | ssh $remote_user@$remote_site tar -xf - -C $remote_path
    fi

echo "."
;;
clean) echo "cleaning: gzipping sql files and deleting old gzip files"
    
    # run it every month
    n1=7
    n2=30
    if [[ ! -z "$2" && -z "$3" ]]; then
        n1="$2"
    elif [ ! -z "$3" ]; then
        n1="$2"
        n2="$3"
    fi
    printf "find %s -type f -name '*.sql' -mtime +$n1 -print -exec gzip {} \;" $archive_path | bash
    printf "find %s -type f -name '*.gz' -mtime +$n2 -print -exec rm {} \;" $archive_path | bash

echo "."
;;

esac
exit 0
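The clean step's rotation logic (gzip .sql files older than n1 days,
delete .gz files older than n2 days) can be tried safely in a throwaway
directory before pointing it at real archives; a small sketch using the
script's default thresholds:

```shell
# Exercise the clean step's rotation logic in a scratch directory
# (n1=7 and n2=30 match the script's defaults)
tmp=$(mktemp -d)
touch "$tmp/fresh.sql"                    # recent dump: left alone
touch -d '10 days ago' "$tmp/old.sql"     # older than 7 days: gzipped
touch -d '40 days ago' "$tmp/ancient.gz"  # older than 30 days: removed
find "$tmp" -type f -name '*.sql' -mtime +7 -print -exec gzip {} \;
find "$tmp" -type f -name '*.gz' -mtime +30 -print -exec rm {} \;
ls "$tmp"
```

Note that gzip preserves the original file's modification time, so a
freshly compressed old.sql.gz is not immediately caught by the second
find; it is only deleted once its original dump date passes n2 days.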


-- 
Miklós Bán, PhD
MTA-DE "Lendület" Behavioural Ecology Research Group
Department of Evolutionary Zoology, University of Debrecen
H-4010 Debrecen, Egyetem tér 1.
Phone:  +36 52 512-900 ext. 62356
http://zoology.unideb.hu/?m=Miklos_Ban
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

