Correct Natural Earth city names at level 7/9 with the geonames.org database.
The detailed Natural Earth data set contains many spelling and encoding errors in city names. During tile creation, the largest cities database from geonames.org is now downloaded and used to correct them.

Each city name is replaced by the name of the geonames.org city that has the smallest Levenshtein distance and lies no more than 5 km away. The distance check has to tolerate some noise because Natural Earth and geonames.org do not agree on a common city center point.

Internally, the cities database is stored in a tile hash to provide fast lookups. Candidates are retrieved from the matching tile and, because of the spatial noise in the city center coordinates, from its neighbors.

The correction runs fast, but the parameters may still need minor tweaking to avoid both false negatives and false positives. Checking the alternative city names from geonames.org might also help to deal better with translations.
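
The lookup described above can be sketched roughly as follows. This is an illustrative Python sketch, not the code from this commit: the names `TILE_DEG`, `tile_key`, `build_index` and `correct_name`, the 0.25 degree tile size and the `(name, lat, lon)` tuple layout are all assumptions. Only the 5 km limit, the Levenshtein criterion and the "matching tile plus neighbors" lookup come from the description.

```python
import math
from collections import defaultdict

TILE_DEG = 0.25  # assumed tile size in degrees (hypothetical value)
MAX_KM = 5.0     # maximum distance between the two city center points


def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def tile_key(lat, lon):
    """Hash key of the tile that contains the given coordinate."""
    return (math.floor(lat / TILE_DEG), math.floor(lon / TILE_DEG))


def build_index(cities):
    """Group (name, lat, lon) tuples by tile key."""
    index = defaultdict(list)
    for name, lat, lon in cities:
        index[tile_key(lat, lon)].append((name, lat, lon))
    return index


def correct_name(index, name, lat, lon, max_km=MAX_KM):
    """Return the closest-spelled candidate within max_km, else the input name.

    Searches the matching tile and its 8 neighbors. Wrap-around at the
    antimeridian and very narrow tiles near the poles are not handled here.
    """
    ty, tx = tile_key(lat, lon)
    best = None
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            for cand, clat, clon in index.get((ty + dy, tx + dx), ()):
                if haversine_km(lat, lon, clat, clon) > max_km:
                    continue
                d = levenshtein(name.lower(), cand.lower())
                if best is None or d < best[0]:
                    best = (d, cand)
    return best[1] if best else name


index = build_index([("Zürich", 47.3769, 8.5417),
                     ("Winterthur", 47.5, 8.75)])
print(correct_name(index, "Zurich", 47.38, 8.54))
```

Note that this sketch accepts the best candidate within 5 km regardless of how large its edit distance is. An upper bound on the Levenshtein distance would be one of the parameters that could be tuned to reduce false positives.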
parent
9e11b578