What are the ways to feature engineer a city name?
A few months back, I was building a recommendation engine. While putting together the dataset for that project, I thought of using the city name as one of the features. I searched for articles on how to feature engineer a city name and couldn't find one. After some research and asking for help, though, I ended up with a list of ways to do it.
Below is a list of different ways to turn a city name into features:
- Look up the latitude and longitude and use them as 2 features
- Climatic features such as average temperature and precipitation
- Economic features such as GDP
- Find the 3 nearest cities to a city and use them as features
- Take the latitude and longitude of those 3 nearest cities, which adds 6 features
- One-hot encode the original cities
- One-hot encode the neighbor cities
- Add the population of each city as a feature
- Cluster nearby cities
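To make a few of these concrete, here is a minimal sketch of the coordinate, nearest-neighbor, and one-hot ideas from the list above, using only the Python standard library. It is an illustration, not the code from my project: the `CITY_COORDS` table, the function names, and the feature names (`nn1_lat`, `is_Paris`, and so on) are all assumptions made up for this example, and the coordinates are approximate. In practice you would load coordinates from a geocoding service or a gazetteer dataset.

```python
import math

# Approximate (latitude, longitude) in degrees; a tiny illustrative lookup table.
CITY_COORDS = {
    "Paris": (48.8566, 2.3522),
    "London": (51.5074, -0.1278),
    "Berlin": (52.5200, 13.4050),
    "Madrid": (40.4168, -3.7038),
    "Rome": (41.9028, 12.4964),
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def nearest_cities(city, k=3):
    """Return the k cities closest to `city`, nearest first."""
    others = [c for c in CITY_COORDS if c != city]
    others.sort(key=lambda c: haversine_km(CITY_COORDS[city], CITY_COORDS[c]))
    return others[:k]


def city_features(city, k=3):
    """Build a flat feature dict for one city."""
    features = {}
    # 2 features: the city's own coordinates
    features["lat"], features["lon"] = CITY_COORDS[city]
    # 2 * k features: coordinates of the k nearest cities
    neighbors = nearest_cities(city, k)
    for i, name in enumerate(neighbors, start=1):
        features[f"nn{i}_lat"], features[f"nn{i}_lon"] = CITY_COORDS[name]
    # One-hot encoding of the city itself and of its neighbor set
    for c in sorted(CITY_COORDS):
        features[f"is_{c}"] = int(c == city)
        features[f"nn_is_{c}"] = int(c in neighbors)
    return features


print(nearest_cities("Paris"))
print(city_features("Paris"))
```

Climatic, economic, and population features would be joined in the same way, as extra columns keyed by city name from an external table.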
Other noteworthy ways:
Embed Wikipedia page for city — Julien Chaumond
I also use external datasets to find distance from sea, elevation, whether it’s a capital and so on. — Ishan Dutta
Top landmarks with city name, based on use case. Major highways. Index of stats relative to other cities: weather, crime rate, cost of living, outdoor activities, schools, employment, tax. — Gold finger
City names get mentioned in news articles or wikipedia entries, which might give context or semantic similarity information. For generating embeddings, you can try this code — Nitin Kishore
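The "index of stats relative to other cities" suggestion above can be read in more than one way. One possible interpretation, sketched below, is a rank-based score between 0 and 1 for each statistic. The function name `relative_index`, the city names, and the numbers are hypothetical placeholders, not real data.

```python
def relative_index(values):
    """Map each city's raw value to a 0-1 rank relative to the other cities.

    0.0 means the lowest value in the set, 1.0 the highest.
    """
    n = len(values)
    if n < 2:
        # With a single city there is nothing to compare against.
        return {city: 0.0 for city in values}
    return {
        city: sum(other < v for other in values.values()) / (n - 1)
        for city, v in values.items()
    }


# Made-up cost-of-living numbers, purely for illustration.
cost_of_living = {"city_a": 10.0, "city_b": 20.0, "city_c": 30.0}
print(relative_index(cost_of_living))
# {'city_a': 0.0, 'city_b': 0.5, 'city_c': 1.0}
```

A rank like this keeps different statistics (weather, crime rate, cost of living) on a comparable scale, though it throws away how far apart the raw values are.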
I was amazed to see how many ways there are to represent a city_name as a feature for our ML/DL models. If you have any questions about how to implement any of them, leave them in the comments and I will try to help you find a solution.
Enjoy building machine learning solutions!