Administrator's guide

Data preparation provides a step-by-step procedure on how to integrate data from six different sources and prepare it for running SmartWalk.

Once data is ready, read Running the app to learn how to get the application up and running in development and production settings.

If something is not working as expected, you might find Troubleshooting helpful before searching for a solution on the Web.

Prerequisites

SmartWalk is essentially cross-platform. However, Unix utilities simplify certain aspects of system maintenance. We assume that the application will run on Unix-like environments, such as Linux or Windows Subsystem for Linux.

Please ensure that the following programs are installed on the target system:

  • Docker
  • .NET SDK v6.0
  • Git
  • GNU Bash, Make, Tar, and Wget
  • Node.js v18.x (install via nvm)

Warning

If mentioned, preserve versions of packages due to library dependencies.

Clone the repository with submodules and navigate to its root folder:

$ git clone --recurse-submodules https://github.com/zhukovdm/smartwalk.git
$ cd ./smartwalk/

ADVICE: Docker-related commands require the current user to be a member of the docker group to avoid using sudo (or similar) repeatedly. See details at Manage Docker as a non-root user.

Data preparation

This section explains how to prepare data for two system components: the Database (entity store and index) and the Routing engine.

Warning

The complexity of extracting and building data structures depends on the size of a particular region and might be time- and resource-consuming, especially when processing OSM dumps.

Initialization

Navigate to the data folder, assuming you are in the root folder of the smartwalk repository:

$ cd ./data/

Decide which part of the world you are interested in. Download the corresponding pbf-file at Geofabrik, and store it in the ./assets/osm-maps/ folder. As an example, the following command makes use of the wget utility to obtain the latest dump of the Czech Republic:

$ wget \
    -O ./assets/osm-maps/czech-republic-latest.osm.pbf \
    https://download.geofabrik.de/europe/czech-republic-latest.osm.pbf

Open Makefile and set the value of REGION_FILE accordingly. Some of the OSM dumps are quite large and additional refinement might be necessary. There are four variables REGION_X, where suffix X can be any of W (West), N (North), E (East), or S (South), defining a bounding box. Entities outside this box are filtered out. To switch off filtering, set W=-180.0, N=85.06, E=180.0, and S=-85.06 (see EPSG3857 for details).

Create folders for storing data, and restore project dependencies:

$ make init-data

Routing engine

Build a data structure for the routing engine:

$ make routing-engine

The command pulls this Docker image and builds the search structure in several consecutive phases. The results are stored in the ./assets/routing-engine/ folder.

ADVICE: An instance of OSRM backend is able to load only one osrm-file at a time. This limitation can be overcome via merging (see osmosis).

ADVICE: It is possible to extract routing data for several regions and keep all files in the same folder as long as the original pbf-files have distinct names. Use environment variables to select a part of the world on engine start.

Entity store and index

Start up a containerized database instance:

$ docker compose up -d

ADVICE: Enter docker container ls repeatedly to print out the list of existing containers. Wait until smartwalk-database is healthy.

Clean up all previous data, create new collections and indexes:

$ make init-database

Obtain the most popular OSM keys from Taginfo and store results in the ./assets/taginfo/ folder:

$ make taginfo

ADVICE: A list of tags can be extended by altering Makefile, although this is not enough to enable their full potential. The constructor of KeywordExtractor shall reflect changes as well. Never remove tags from the list as it may brake things unexpectedly. Modifying tag list is not a typical operation and may require deeper knowledge of the system.

Extract entities from the pbf-file:

$ make database-osm

As part of the procedure, the routine makes several GET requests to the Overpass API. A query is configured to time out after 100 seconds, though the server usually responds within the first 10.

ADVICE: To make queries feasible for the Overpass API, the selected bounding box is divided into smaller pieces. The recipe has two switches --rows and --cols defining the grid.

Create stubs for new entities from the Wikidata knowledge graph:

$ make database-wikidata-create

The script attempts to fetch data from a SPARQL endpoint. Requests may time out after one minute. Large regions are more likely to result in failures. Hence, the numeric constants were specifically chosen for the test setup and may not be suitable for other cases.

ADVICE: The recipe has --rows and --cols switches with functionality similar to database-osm.

Enrich the current dataset with information from Wikidata:

$ make database-wikidata-enrich

Enrich the current dataset with information from the DBPedia knowledge graph:

$ make database-dbpedia

Collect supporting data to aid autocomplete functionality:

$ make advice

Finally, stop the database instance:

$ docker compose down

All relevant files are stored in the ./assets/database/ folder.

Incremental updates

The system supports incremental updates to incorporate new versions of datasets. It is possible to re-run blue-highlighted commands with no impact on data integrity. The programs are designed to update only defined properties without replacing entities, acting as an idempotent function.

advice should be re-generated whenever the database state is altered.

step dependencies

Dumping database

Dump the current database state:

$ make dump

The command creates keyword.txt and place.txt in the ./assets/dump/ folder.

If necessary, archive these files for publishing:

$ cd ./assets/dump/
$ tar -czf smartwalk-[kind]-[timestamp].tar.gz *.txt

Restoring database

Ensure that a database container is up and running, see Entity store and index for more details. Clean up the database and restore the state from files:

$ make init-restore && make init-database && make restore

This procedure expects keyword.txt and place.txt to be in the ./assets/dump/ folder.

ADVICE: Before starting the application in development or production mode, stop the database container and double-check that data for the routing engine is available.

Examples of archived dumps can be found here. Those having prod in their names are the most data-rich covering beautiful Prague.

If necessary, unpack a downloaded archive as follows:

$ cd ./assets/dump/
$ tar -xzf smartwalk-[kind]-[timestamp].tar.gz

Running the app

The purpose of this section is to explain how to start the system in development and production modes. We assume that you are in the root folder of the smartwalk repository, and key datasets have been prepared or restored and are available in their respective folders.

Development environment

This environment is intended primarily for developers and testers. It enables controlling parts of the system independently.

There are four system components involved in the setup: the frontend, backend, database, and routing engine. The first two run directly in the terminal, while the last two are Docker containers. The table below summarizes their roles and the mapping of system ports.

Component Ports Role
database localhost:27017 Entity store and index
routing localhost:5000 Routing engine
backend localhost:5017 Application logic (hot reload)
frontend localhost:3000 Static files (hot reload)

Restore backend and frontend dependencies:

$ make init-dev

Note

For convenience, components can be started and stopped directly from the smartwalk root folder. Please refer to Makefile. Recipe names follow the pattern [component_name]-dev[-stop].

Database

Start and stop an instance of database using database-dev[-stop] from Makefile.

Routing engine

Start and stop an instance of routing engine using routing-dev[-stop] from Makefile.

Set OSRM_REGION_FILE in .env.development to load a region other than the Czech Republic.

Backend

The project is located in ./app/backend/. Run dotnet run from there to start the backend in the terminal, and stop it by pressing Ctrl+C. Read more about other commands in README.md.

Warning

This component requires the database to be up and running. Otherwise, it fails to start.

The source code uses SMARTWALK_MONGO_CONN_STR and SMARTWALK_OSRM_BASE_URL environment variables. Adjust launchSettings.json respectively if you wish to alter the default port configuration.

Frontend

Find the project in ./app/frontend/. Run npm start from there to start the frontend in the terminal, and stop it by pressing Ctrl+C. Learn more about other commands in README.md.

The source code uses REACT_APP_SMARTWALK_API_ORIGIN environment variable. Set its value in the .env.development file if you wish to run backend on another port.

Production environment

This environment is a tightly coupled bundle consisting of four interconnected Docker containers; its detailed schema is shown in the picture below.

docker production setup

Note

The database exposes port 27017 for manual diagnostic and performance testing. Hide it if none of the mentioned reasons is your case.

Start and stop production environment from the root folder of the repo:

$ make prod[-stop]

ADVICE: All containers implement healthcheck, run docker container ls to see their state.

The respective environment variables are defined in the .env.production file.

Troubleshooting

WSL runs out of memory

If your WSL consumes too much memory, Windows might suddenly terminate the entire process without prior notice. Try to mitigate the issue by extending the swap file; set swap=XXGB in the .wslconfig file. For more details, see Example .wslconfig file.

A container starts for too long

If any of the containers is unhealthy or starting for too long (healthcheck has failed repeatedly on the background), replace [container_name] placeholder by the name of a problematic instance and press Enter to find out the reason:

$ docker container ls -a

CONTAINER ID   IMAGE                    ...   NAMES
...            ...                      ...   ...
377fe35d4472   smartwalk/proxy:v1.0.0   ...   smartwalk-proxy
...            ...                      ...   ...

$ docker inspect --format "{{json .State.Health }}" [container_name]

Nothing seems to help

If the system does not work properly and you do not know what to do, clean up files and start from scratch. As the first step, remove SmartWalk images:

$ docker image rm smartwalk/proxy
$ docker image rm smartwalk/backend
$ docker image rm smartwalk/routing

Clean up unused Docker files (cached build files, dangling images and volumes, etc.):

Warning

Use this command with caution as it may introduce undesired changes into your Docker host. Read about side effects here.

$ docker system prune --volumes