Commit 850e124b authored by Dennis Nienhüser

Correct Natural Earth city names at level 7/9 using the GeoNames database.

The Natural Earth data set contains many spelling and encoding errors
in the city names of its detailed data set. Tile creation now downloads
the largest cities database from geonames.org and uses it to correct
these city names.
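Before any matching can happen, the downloaded database has to be turned into city records. The sketch below assumes the standard tab-separated GeoNames dump layout, with geonameid, name, asciiname, comma-separated alternatenames, latitude and longitude as the first six columns. The names GeoNamesCity, parse_geonames_line and load_geonames are hypothetical, and the sample record is made up.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class GeoNamesCity:
    name: str
    latitude: float
    longitude: float
    alternate_names: List[str] = field(default_factory=list)


def parse_geonames_line(line: str) -> GeoNamesCity:
    # Assumed GeoNames dump layout (tab-separated): 0 geonameid, 1 name,
    # 2 asciiname, 3 alternatenames (comma-separated), 4 latitude, 5 longitude, ...
    columns = line.rstrip("\n").split("\t")
    alternates = [n for n in columns[3].split(",") if n]
    return GeoNamesCity(columns[1], float(columns[4]), float(columns[5]), alternates)


def load_geonames(lines: Iterable[str]) -> List[GeoNamesCity]:
    # One city per non-empty line; blank lines are skipped.
    return [parse_geonames_line(line) for line in lines if line.strip()]


# Made-up sample record in the assumed layout.
sample = "1\tMünchen\tMuenchen\tMunich,Monaco di Baviera\t48.13743\t11.57549\tP\tPPLA"
cities = load_geonames([sample, ""])
print(cities[0].name, cities[0].latitude, cities[0].alternate_names)
```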

Each city name is replaced by the name of the GeoNames city that has
the smallest Levenshtein distance to it and lies no more than 5 km away.
The distance threshold must tolerate some noise, because Natural Earth
and GeoNames do not agree on a common city center point.
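The matching rule can be sketched as follows. Candidates farther than 5 km are discarded first, and the nearby candidate with the smallest Levenshtein distance wins. The haversine distance on a spherical Earth, the function names, and the choice to keep the original name when no candidate is within range are assumptions made for illustration. Candidate coordinates are approximate sample values.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0
Candidate = Tuple[str, float, float]  # (name, latitude, longitude)


def levenshtein(a: str, b: str) -> int:
    # Dynamic-programming edit distance, keeping only the previous row.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Great-circle distance via the haversine formula (spherical Earth assumed).
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def correct_name(name: str, lat: float, lon: float,
                 candidates: List[Candidate], max_distance_km: float = 5.0) -> str:
    # Keep only candidates within the distance threshold ...
    nearby = [c for c in candidates
              if distance_km(lat, lon, c[1], c[2]) <= max_distance_km]
    if not nearby:
        return name  # assumption: leave the name unchanged if nothing is close enough
    # ... then pick the one whose name is closest in edit distance.
    return min(nearby, key=lambda c: levenshtein(name, c[0]))[0]


candidates = [("München", 48.13743, 11.57549), ("Augsburg", 48.37154, 10.89851)]
print(correct_name("Munchen", 48.137, 11.575, candidates))
```

Filtering by distance before comparing names keeps a similarly spelled city in another region from being picked.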

Internally, the cities database is stored in a tile hash for fast
lookup. Candidates are retrieved from the matching tile and, because of
spatial noise in the city center coordinates, from its neighboring tiles.
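A tile hash with a neighbor lookup might look like the sketch below. The tiling scheme is an assumption: a plain latitude/longitude grid with 0.25° tiles, not necessarily the tile layout the tile creator uses. CityTileHash and its methods are hypothetical names. Searching the 3×3 block of tiles around the query only covers the 5 km radius while a tile edge is longer than 5 km, which holds for this grid away from the poles.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

City = Tuple[str, float, float]  # (name, latitude, longitude)


class CityTileHash:
    def __init__(self, tile_size_deg: float = 0.25):
        self.tile_size = tile_size_deg
        self.columns = math.ceil(360.0 / tile_size_deg)
        self.tiles: Dict[Tuple[int, int], List[City]] = defaultdict(list)

    def tile_of(self, lat: float, lon: float) -> Tuple[int, int]:
        # Column wraps around the antimeridian; row counts up from the south pole.
        x = int(math.floor((lon + 180.0) / self.tile_size)) % self.columns
        y = int(math.floor((lat + 90.0) / self.tile_size))
        return x, y

    def insert(self, name: str, lat: float, lon: float) -> None:
        self.tiles[self.tile_of(lat, lon)].append((name, lat, lon))

    def candidates(self, lat: float, lon: float) -> List[City]:
        # Collect cities from the matching tile and its eight neighbors.
        x, y = self.tile_of(lat, lon)
        result: List[City] = []
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                key = ((x + dx) % self.columns, y + dy)
                result.extend(self.tiles.get(key, []))  # .get avoids creating empty tiles
        return result


index = CityTileHash()
index.insert("München", 48.13743, 11.57549)
index.insert("Augsburg", 48.37154, 10.89851)
print([c[0] for c in index.candidates(48.137, 11.575)])
```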

The correction runs very fast, but its parameters may still need minor
tweaking to reduce both false negatives and false positives. Checking
the alternative city names from geonames.org might also help with
handling translations.
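One way to use alternative names is to score each candidate by its closest name, primary or alternate, while still returning the primary name. This is a hypothetical extension sketched from the suggestion above, not something this commit implements. The function names and sample data are illustrative.

```python
from typing import List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    # Dynamic-programming edit distance, keeping only the previous row.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def name_distance(query: str, name: str, alternate_names: Sequence[str]) -> int:
    # Smallest edit distance between the query and the primary or any alternate name.
    return min(levenshtein(query, n) for n in [name, *alternate_names])


def pick_candidate(query: str, candidates: List[Tuple[str, List[str]]]) -> str:
    # Return the primary name of the candidate whose best-matching name is closest.
    return min(candidates, key=lambda c: name_distance(query, c[0], c[1]))[0]


nearby = [("München", ["Munich", "Monaco di Baviera"]), ("Augsburg", [])]
print(pick_candidate("Munich", nearby))
```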
parent 9e11b578