
Backend

Getting Started

  • Install osm2pgsql
sudo apt-get install osm2pgsql
  • Install the other system dependencies needed to build the project
sudo apt -y install python-is-python3 # makes "python" run python3
sudo apt -y install python3-virtualenv
sudo apt -y install libpq-dev # for building psycopg2
  • Clone raw-data-api and navigate to the backend directory
git clone https://github.com/hotosm/raw-data-api.git && cd raw-data-api/backend
  • Install Requirements

Install psycopg2, osmium, dateutil and wget in your Python environment, or install them all from requirements.txt:

pip install -r requirements.txt
  • Start the Process

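You can export your database parameters as environment variables, pass them to the script as options, or put them in a .env file and run source .env (keep the export keyword on each line so the values reach the script):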

export PGHOST=localhost
export PGPORT=5432
export PGUSER=admin
export PGPASSWORD=admin
export PGDATABASE=postgres
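Before starting an import it can be worth confirming that these settings are actually visible to the process. The helper below is a hypothetical preflight check, not part of raw_backend; it only assumes the five PG* variables shown above, which libpq (and therefore psycopg2) reads when no explicit connection parameters are given.

```python
import os

# The libpq environment variables exported above.
PG_VARS = ("PGHOST", "PGPORT", "PGUSER", "PGPASSWORD", "PGDATABASE")


def missing_pg_vars(env=None):
    """Return the names of PG* variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in PG_VARS if not env.get(name)]


def redacted_settings(env=None):
    """Summarise the connection settings without exposing the password."""
    env = os.environ if env is None else env
    return {
        name: ("***" if name == "PGPASSWORD" and env.get(name) else env.get(name))
        for name in PG_VARS
    }
```

Once all five variables are exported, `missing_pg_vars()` should return an empty list, and `redacted_settings()` is safe to print in logs.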

Choose your source for the import

You can download the Planet PBF file here, use a Geofabrik PBF file here (with full metadata), or pass a download link to the script itself. Only .pbf files have been tested. Run the script with -h for help.

  • Test with the sample data

    python raw_backend --insert
    

    This inserts the sample data bundled with the code, so you can use it to check your backend setup.

  • For example, to import Turkey and then enable replication:

    python raw_backend --insert --replication --source https://download.geofabrik.de/europe/turkey-latest.osm.pbf --country 127
    

    Here --insert downloads the .osm.pbf file from --source and imports it; you can also pass the path of an already downloaded file to --source. --replication prepares the tables for replication and starts replicating right away until the data is up to date. --country takes an fid from /backend/countries and ensures that replication data is maintained only for that country.

  • Import your OSM file and run replication only within your custom GeoJSON boundary

      python raw_backend --replication --boundary pokhara.geojson
    
    Example GeoJSON:
      {
        "type": "Feature",
        "properties": {},
        "geometry": {
          "coordinates": [
            [
              [
                83.90895770965955,
                28.279633888511327
              ],
              [
                83.90895770965955,
                28.131383546395526
              ],
              [
                84.10168708213502,
                28.131383546395526
              ],
              [
                84.10168708213502,
                28.279633888511327
              ],
              [
                83.90895770965955,
                28.279633888511327
              ]
            ]
          ],
          "type": "Polygon"
        }
      }
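A malformed boundary is easier to catch before replication starts. The sketch below checks a file shaped like the example above: a single Feature whose Polygon rings are closed, have at least four positions, and use longitude/latitude order, as required by RFC 7946. The function name and the single-Feature assumption are illustrative only; raw_backend may accept other GeoJSON shapes.

```python
import json


def validate_boundary(geojson_text):
    """Validate a single-Feature Polygon boundary and return the parsed dict.

    Assumption: the boundary looks like the example above (one Feature with
    a Polygon geometry). Raises ValueError describing the first problem found.
    """
    data = json.loads(geojson_text)
    if data.get("type") != "Feature":
        raise ValueError("expected a GeoJSON Feature")
    geometry = data.get("geometry") or {}
    if geometry.get("type") != "Polygon":
        raise ValueError("expected a Polygon geometry")
    rings = geometry.get("coordinates") or []
    if not rings:
        raise ValueError("polygon has no rings")
    for ring in rings:
        # RFC 7946: a linear ring has >= 4 positions and first == last.
        if len(ring) < 4 or ring[0] != ring[-1]:
            raise ValueError("each ring must be closed and have at least 4 positions")
        # Positions are [longitude, latitude, ...].
        for lon, lat in (pos[:2] for pos in ring):
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                raise ValueError(f"position out of range: {lon}, {lat}")
    return data
```

For the sample above, `validate_boundary(open("pokhara.geojson").read())` returns the parsed Feature; removing the closing position from the ring makes it raise ValueError.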
    

To Run Replication only

python raw_backend --replication

By default this command runs replication until the data is up to date and then exits. To keep the data current, run it at your own frequency with cron or any other scheduler that wakes the script, lets it do the job, and goes back to sleep.
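If you prefer a long-running process over cron, the wake/run/sleep cycle described above can be sketched in Python. This is a hypothetical wrapper, assuming it is started from the backend directory so that raw_backend resolves and that the current interpreter is the one from your virtualenv; it waits a fixed interval after each pass finishes rather than running on a fixed clock schedule.

```python
import subprocess
import sys
import time

# Assumption: launched from the backend directory with the venv's Python.
REPLICATION_CMD = [sys.executable, "raw_backend", "--replication"]


def run_periodically(interval_seconds, max_runs=None, run=subprocess.run, sleep=time.sleep):
    """Run one replication pass, sleep, and repeat.

    Each pass runs until raw_backend exits (data up to date). `run` and
    `sleep` are injectable so the loop can be exercised without launching
    the real script. Returns the exit codes once `max_runs` passes are done;
    with max_runs=None it loops forever.
    """
    exit_codes = []
    while True:
        result = run(REPLICATION_CMD)
        exit_codes.append(result.returncode)
        if max_runs is not None and len(exit_codes) >= max_runs:
            return exit_codes
        sleep(interval_seconds)
```

For example, `run_periodically(300)` runs a replication pass, sleeps five minutes after it exits, and repeats indefinitely. Cron or a systemd timer achieves the same without keeping a Python process alive.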

Options to run the script:

-h, --help            show this help message and exit
--source SOURCE       Data source link or file path
--host HOST           DB host
--port PORT           DB port
--user USER           DB user
--password PASSWORD   DB password
--database DATABASE   DB name
--include_ref         Include ref in output tables
--replication         Prepare tables for replication and run replication
--country COUNTRY     ID of the country; when loading a country, replication data is filtered to it
--boundary            GeoJSON file path or GeoJSON string; keeps replication within that region
--insert              Run osm2pgsql to insert data (initial creation step)
--update              Update table fields with country info
--download_dir DOWNLOAD_DIR
                        The directory to download the source file to
--post_index          Run only the post-index step on the tables
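When scripting imports, it can help to build the command from these options as a list rather than by string concatenation. The helper below is hypothetical (not part of raw-data-api) and covers only a subset of the flags listed above; the others can be appended in the same way.

```python
import sys


def build_raw_backend_command(source=None, insert=False, replication=False,
                              country=None, boundary=None, download_dir=None,
                              python=sys.executable):
    """Assemble a raw_backend argument list from the documented options.

    Hypothetical helper: only flags from the options list are emitted, in a
    fixed order, and values are passed through unchanged.
    """
    cmd = [python, "raw_backend"]
    if insert:
        cmd.append("--insert")
    if replication:
        cmd.append("--replication")
    if source is not None:
        cmd += ["--source", source]
    if country is not None:
        cmd += ["--country", str(country)]
    if boundary is not None:
        cmd += ["--boundary", boundary]
    if download_dir is not None:
        cmd += ["--download_dir", download_dir]
    return cmd
```

With insert=True, replication=True, country=127 and the Geofabrik Turkey URL as source, it yields the same arguments as the Turkey import example, ready to pass to `subprocess.run`.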

If you are interested in a manual setup, find the guide here.

Running the backend service via systemd

  • Create a systemd unit file for the raw-data-backend service
$ sudo systemctl edit --full --force raw-data-backend.service

[Unit]
Description=Raw Data Backend Service
Documentation=https://github.com/hotosm/raw-data-api/blob/develop/backend/Readme.md
After=network.target syslog.target
Wants=network-online.target systemd-networkd-wait-online.service
StartLimitIntervalSec=500
StartLimitBurst=5

[Service]
Type=simple
User=hotsysadmin
WorkingDirectory=/opt/raw-data-api/backend
ExecStart=/opt/raw-data-api/backend/venv/bin/python raw_backend --replication
Restart=on-failure
EnvironmentFile=/opt/raw-data-api/backend/PGCRED.env
RestartSec=5s
WatchdogSec=43200

[Install]
WantedBy=multi-user.target
  • Start your service and check its status
$ sudo systemctl start raw-data-backend.service
$ sudo systemctl status raw-data-backend.service

● raw-data-backend.service - Raw Data Backend Service
     Loaded: loaded (/etc/systemd/system/raw-data-backend.service; disabled; vendor preset: enabled)
     Active: active (running) since Mon 2023-02-13 14:30:03 UTC; 4min 25s ago
       Docs: https://github.com/hotosm/raw-data-api/blob/develop/backend/Readme.md
   Main PID: 50561 (python)
      Tasks: 9 (limit: 4700)
     Memory: 94.7M
        CPU: 14.996s
     CGroup: /system.slice/raw-data-backend.service
             ├─50561 /opt/raw-data-api/backend/venv/bin/python app --replication --run_minutely
             ├─50563 python /opt/raw-data-api/backend/replication update -s raw.lua --max-diff-size 10
             └─50704 osm2pgsql --append --slim --prefix planet_osm --output=flex --extra-attributes --style raw.lua -d app_backend -U adm_app_backend -H rawdat.postgres.database.azure.com >

Feb 13 14:30:03 raw-data-backend-production systemd[1]: Started Raw Data Backend Service.
Feb 13 14:30:04 raw-data-backend-production python[50562]: 2023-02-13 14:30:04 [INFO]: Initialised updates for service 'https://planet.openstreetmap.org/replication/minute'.
Feb 13 14:30:04 raw-data-backend-production python[50562]: 2023-02-13 14:30:04 [INFO]: Starting at sequence 5348603 (2022-12-06 00:59:10+00:00).
Feb 13 14:30:05 raw-data-backend-production python[50563]: 2023-02-13 14:30:05 [INFO]: Using replication service 'https://planet.openstreetmap.org/replication/minute'. Current sequence 5348603 (2>
Feb 13 14:30:10 raw-data-backend-production python[50704]: 2023-02-13 14:30:10  osm2pgsql version 1.6.0
Feb 13 14:30:10 raw-data-backend-production python[50704]: 2023-02-13 14:30:10  Database version: 14.6
Feb 13 14:30:10 raw-data-backend-production python[50704]: 2023-02-13 14:30:10  PostGIS version: 3.2
  • Set up your update timer (the example below triggers the service 10 seconds after boot and then 5 minutes after each activation)
    $ sudo systemctl edit --full --force raw-data-backend.timer
    
[Unit]
Description=Trigger a rawdata database update

[Timer]
OnBootSec=10
OnUnitActiveSec=5min

[Install]
WantedBy=timers.target
  • Reload systemd, then enable and start the timer
$ sudo systemctl daemon-reload
$ sudo systemctl enable --now raw-data-backend.timer
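For monitoring, a script can ask systemd for the state of the timer and the service. The sketch below shells out to `systemctl is-active`, which prints the unit's state and exits non-zero when the unit is not active; the unit names are the ones created in this section, and the helper itself is hypothetical.

```python
import subprocess

UNITS = ("raw-data-backend.timer", "raw-data-backend.service")


def unit_states(units=UNITS, run=subprocess.run):
    """Return {unit: state} as reported by `systemctl is-active`.

    A non-zero exit code is expected for inactive units, so it is not
    treated as an error; only the printed state is used. `run` is
    injectable for testing without systemd.
    """
    states = {}
    for unit in units:
        result = run(["systemctl", "is-active", unit], capture_output=True, text=True)
        states[unit] = (result.stdout or "").strip() or "unknown"
    return states
```

Because the service exits after each pass, it is expected to report inactive between runs while the timer stays active; a failed service state usually means the last run did not exit cleanly.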