Backend
Getting Started
- Install osm2pgsql >= v1.6.0
sudo apt-get install osm2pgsql
- Install other system dependencies that are necessary to build the system
sudo apt -y install python-is-python3 # for sanity
sudo apt -y install python3-virtualenv
sudo apt -y install libpq-dev # for building psycopg2
- Clone raw-data-api and navigate to the backend directory
git clone https://github.com/hotosm/raw-data-api.git && cd raw-data-api/backend
- Install Requirements
Install psycopg2, osmium, dateutil and wget in your Python environment. You can also install them from requirements.txt:
pip install -r requirements.txt
- Start the Process
You can either export your database parameters as environment variables, pass them to the script as options, or create a .env file and run source .env
export PGHOST=localhost
export PGPORT=5432
export PGUSER=admin
export PGPASSWORD=admin
export PGDATABASE=postgres
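As a rough illustration of how the exported variables above could be consumed, the sketch below collects them into a dictionary of connection parameters. The function name `db_params_from_env` and the fallback defaults are assumptions made for this example, not part of raw_backend. The resulting keys follow libpq keyword names, so the dictionary could be passed to `psycopg2.connect(**params)`.

```python
import os

# Fallback values are illustrative assumptions, not the script's documented defaults.
DEFAULTS = {
    "PGHOST": "localhost",
    "PGPORT": "5432",
    "PGDATABASE": "postgres",
}
REQUIRED = ("PGHOST", "PGPORT", "PGUSER", "PGPASSWORD", "PGDATABASE")


def db_params_from_env(env=None):
    """Collect PG* variables into a dict of connection parameters."""
    env = os.environ if env is None else env
    values = {key: env.get(key, DEFAULTS.get(key)) for key in REQUIRED}
    missing = [key for key, value in values.items() if not value]
    if missing:
        raise KeyError("missing database settings: " + ", ".join(missing))
    return {
        "host": values["PGHOST"],
        "port": int(values["PGPORT"]),
        "user": values["PGUSER"],
        "password": values["PGPASSWORD"],
        "dbname": values["PGDATABASE"],
    }
```

Calling `db_params_from_env()` with no argument reads the current process environment. Passing a plain dict makes the function easy to check without touching real credentials.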
Choose your source for the Import
You can download a Planet .pbf file here, use a Geofabrik .pbf file with full metadata here (tested with .pbf files), or pass a download link to the script itself. Run the script with -h for help.
- Test with the sample data
python raw_backend --insert
This inserts the sample data shipped with the code. You can use it to check your backend setup.
- For example, to import Turkey and enable replication later on
python raw_backend --insert --replication --source https://download.geofabrik.de/europe/turkey-latest.osm.pbf --country 127
Here, the --insert option runs the import after downloading the osm.pbf file from the source. You can also supply the path of an already downloaded file to --source. The --replication option prepares the tables for replication and starts replicating right away until the data is up to date. The --country parameter is read from the fid in /backend/countries and makes sure replication data is maintained only for that country.
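Since --source accepts either a download link or a local file path, the distinction can be sketched as follows. This is a minimal illustration and not the script's actual logic. `classify_source` and `download_target` are hypothetical helper names, and treating only http(s) links as URLs is an assumption.

```python
import os
from urllib.parse import urlparse


def classify_source(source):
    """Return ("url", source) for http(s) links, else ("file", absolute_path)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return ("url", source)
    return ("file", os.path.abspath(source))


def download_target(source, download_dir):
    """Path a downloaded file would get: download_dir plus the URL's file name."""
    name = os.path.basename(urlparse(source).path)
    return os.path.join(download_dir, name)
```

With the Turkey link from the example above, `classify_source` reports a URL, and `download_target` places `turkey-latest.osm.pbf` inside the chosen download directory.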
- Import your OSM file and run replication only for your custom GeoJSON
python raw_backend --replication --boundary pokhara.geojson
Example of GeoJSON:
{ "type": "Feature", "properties": {}, "geometry": { "coordinates": [ [ [ 83.90895770965955, 28.279633888511327 ], [ 83.90895770965955, 28.131383546395526 ], [ 84.10168708213502, 28.131383546395526 ], [ 84.10168708213502, 28.279633888511327 ], [ 83.90895770965955, 28.279633888511327 ] ] ], "type": "Polygon" } }
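Because --boundary takes either a GeoJSON file path or the GeoJSON string itself, a loader for both forms might look like the sketch below. The helper names `load_boundary` and `rings_are_closed` are hypothetical, and the restriction to Polygon and MultiPolygon geometries is an assumption rather than documented behaviour of raw_backend.

```python
import json
import os


def load_boundary(value):
    """Accept a GeoJSON file path or a GeoJSON string and return the geometry dict."""
    if os.path.isfile(value):
        with open(value, encoding="utf-8") as handle:
            data = json.load(handle)
    else:
        data = json.loads(value)
    # Unwrap a Feature; a bare geometry is passed through unchanged.
    geometry = data["geometry"] if data.get("type") == "Feature" else data
    # Assumption: only area geometries make sense as a replication boundary.
    if geometry.get("type") not in ("Polygon", "MultiPolygon"):
        raise ValueError("boundary must be a Polygon or MultiPolygon")
    return geometry


def rings_are_closed(geometry):
    """Check that every linear ring starts and ends on the same coordinate."""
    polygons = (
        [geometry["coordinates"]]
        if geometry["type"] == "Polygon"
        else geometry["coordinates"]
    )
    return all(ring[0] == ring[-1] for polygon in polygons for ring in polygon)
```

Checking ring closure before passing a boundary on is a cheap way to catch a hand-edited polygon whose last coordinate no longer matches the first.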
To Run Replication only
python raw_backend --replication
By default, this command runs replication until the data is up to date and then exits. To run it at your own frequency, schedule it with cron or any other scheduler you prefer, so the script wakes up, does its job and goes back to sleep.
Options to run the script:
  -h, --help            show this help message and exit
  --source SOURCE       Data source link or file path
  --host HOST           DB host
  --port PORT           DB port
  --user USER           DB user
  --password PASSWORD   DB password
  --database DATABASE   DB name
  --include_ref         Include ref in output tables
  --replication         Prepare tables for replication and Runs Replication
  --country COUNTRY     id of the country , if you are loading country , it will filter replication data
  --boundary            Takes geojson file path or geojson string itself to keep replication within the region
  --insert              Run osm2pgsql to insert data , Initial Creation Step
  --update              Run Update on table fields for country info
  --download_dir DOWNLOAD_DIR
                        The directory to download the source file to
  --post_index          Run Post index only on table
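To show how the listing above separates boolean switches from options that take a value, here is a small argparse sketch covering a subset of the flags. It is a hypothetical reconstruction for illustration only, not the parser used by raw_backend, and keeping --country as a string is an assumption.

```python
import argparse


def build_parser():
    """Build a parser for a subset of the options listed above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="raw_backend")
    # Options that take a value.
    parser.add_argument("--source", help="Data source link or file path")
    parser.add_argument("--country", help="id of the country")
    parser.add_argument("--boundary", help="GeoJSON file path or GeoJSON string")
    parser.add_argument("--download_dir", help="Directory to download the source file to")
    # Boolean switches: present means True, absent means False.
    for flag in ("--insert", "--replication", "--update", "--post_index", "--include_ref"):
        parser.add_argument(flag, action="store_true")
    return parser
```

Parsing the Turkey example, `build_parser().parse_args(["--insert", "--replication", "--source", "<link>", "--country", "127"])`, yields `insert` and `replication` set to `True` while `update` stays `False`.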
If you are interested in a manual setup, find the guide here.
Running the backend service via Systemd
- Create a systemd unit file for raw-data-backend service
$ sudo systemctl edit --full --force raw-data-backend.service
[Unit]
Description=Raw Data Backend Service
Documentation=https://github.com/hotosm/raw-data-api/blob/develop/backend/Readme.md
After=network.target syslog.target
Wants=network-online.target systemd-networkd-wait-online.service
StartLimitIntervalSec=500
StartLimitBurst=5
[Service]
Type=simple
User=hotsysadmin
WorkingDirectory=/opt/raw-data-api/backend
ExecStart=/opt/raw-data-api/backend/venv/bin/python raw_backend --replication
Restart=on-failure
EnvironmentFile=/opt/raw-data-api/backend/PGCRED.env
RestartSec=5s
WatchdogSec=43200
[Install]
WantedBy=multi-user.target
$ sudo systemctl start raw-data-backend.service
$ sudo systemctl status raw-data-backend.service
● raw-data-backend.service - Raw Data Backend Service
Loaded: loaded (/etc/systemd/system/raw-data-backend.service; disabled; vendor preset: enabled)
Active: active (running) since Mon 2023-02-13 14:30:03 UTC; 4min 25s ago
Docs: https://github.com/hotosm/raw-data-api/blob/develop/backend/Readme.md
Main PID: 50561 (python)
Tasks: 9 (limit: 4700)
Memory: 94.7M
CPU: 14.996s
CGroup: /system.slice/raw-data-backend.service
├─50561 /opt/raw-data-api/backend/venv/bin/python app --replication --run_minutely
├─50563 python /opt/raw-data-api/backend/replication update -s raw.lua --max-diff-size 10
└─50704 osm2pgsql --append --slim --prefix planet_osm --output=flex --extra-attributes --style raw.lua -d app_backend -U adm_app_backend -H rawdat.postgres.database.azure.com >
Feb 13 14:30:03 raw-data-backend-production systemd[1]: Started Raw Data Backend Service.
Feb 13 14:30:04 raw-data-backend-production python[50562]: 2023-02-13 14:30:04 [INFO]: Initialised updates for service 'https://planet.openstreetmap.org/replication/minute'.
Feb 13 14:30:04 raw-data-backend-production python[50562]: 2023-02-13 14:30:04 [INFO]: Starting at sequence 5348603 (2022-12-06 00:59:10+00:00).
Feb 13 14:30:05 raw-data-backend-production python[50563]: 2023-02-13 14:30:05 [INFO]: Using replication service 'https://planet.openstreetmap.org/replication/minute'. Current sequence 5348603 (2>
Feb 13 14:30:10 raw-data-backend-production python[50704]: 2023-02-13 14:30:10 osm2pgsql version 1.6.0
Feb 13 14:30:10 raw-data-backend-production python[50704]: 2023-02-13 14:30:10 Database version: 14.6
Feb 13 14:30:10 raw-data-backend-production python[50704]: 2023-02-13 14:30:10 PostGIS version: 3.2
- Set up your update timer (the example below triggers the service 10 seconds after boot and then every 5 minutes)
$ sudo systemctl edit --full --force raw-data-backend.timer
[Unit]
Description=Trigger a rawdata database update
[Timer]
OnBootSec=10
OnUnitActiveSec=5min
[Install]
WantedBy=timers.target
- Enable the timer and reload systemd
$ sudo systemctl enable raw-data-backend.timer
$ sudo systemctl daemon-reload