What are the ways to feature engineer a city name?


A few months back, I was working on a recommendation engine. While building the dataset for that project, I considered using city name as one of the features. I searched for articles on how to feature engineer a city name but couldn’t find one. After more research and asking for help, I ended up with a list of ways to do it.

Below are different ways to represent a city name as a feature:

  • Look up the city's latitude and longitude and use them as 2 features
  • Climatic features such as average temperature, precipitation, etc.
  • Economic features such as GDP
  • Find the 3 nearest cities to a city and use them as features
  • Take the latitude and longitude of those 3 nearest cities, which adds 6 more features
  • One-hot encode the original cities
  • One-hot encode the neighbor cities
  • Add the population of each city as a feature
  • Cluster nearby cities
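Several of the ideas in the list above (latitude and longitude, the 3 nearest cities and their coordinates, one-hot encoding of the city and its neighbors) can be sketched together in plain Python. This is only a minimal illustration: the `CITY_COORDS` table, its approximate coordinates, and the function names are assumptions made for the example, and in a real project the coordinates would come from a geocoding source of your choice.

```python
import math

# Approximate (latitude, longitude) in degrees. Illustrative lookup table only.
CITY_COORDS = {
    "Kathmandu": (27.72, 85.32),
    "Pokhara": (28.21, 83.99),
    "Delhi": (28.61, 77.21),
    "Lucknow": (26.85, 80.95),
    "Patna": (25.59, 85.14),
}
CITIES = sorted(CITY_COORDS)  # fixed vocabulary for one-hot encoding


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_cities(city, k=3):
    """Names of the k cities closest to `city`, nearest first."""
    origin = CITY_COORDS[city]
    others = [c for c in CITY_COORDS if c != city]
    return sorted(others, key=lambda c: haversine_km(origin, CITY_COORDS[c]))[:k]


def one_hot(names, vocabulary=CITIES):
    """1 for every vocabulary entry present in `names`, else 0."""
    names = set(names)
    return [1 if c in names else 0 for c in vocabulary]


def city_features(city, k=3):
    """Collect the geographic and one-hot features for one city."""
    lat, lon = CITY_COORDS[city]
    neighbors = nearest_cities(city, k)
    return {
        "lat_lon": [lat, lon],                                   # 2 features
        "neighbor_lat_lon": [x for n in neighbors for x in CITY_COORDS[n]],  # 2 * k features
        "city_one_hot": one_hot([city]),
        "neighbor_one_hot": one_hot(neighbors),
    }
```

Calling `city_features("Kathmandu")` returns a dictionary whose `neighbor_lat_lon` entry holds six numbers, matching the "6 more features" item in the list. With many cities, one-hot vectors get wide, which is one reason the coordinate and clustering options can be attractive.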

Other noteworthy ways:

Embed Wikipedia page for city — Julien Chaumond

I also use external datasets to find distance from sea, elevation, whether it’s a capital and so on. — Ishan Dutta
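Joining an external dataset onto the city name might look like the sketch below. The table contents, the made-up city names, and the `external_features` helper are all hypothetical placeholders, not real measurements; the point is only the lookup pattern, including a flag for cities missing from the external source.

```python
# Hypothetical external table keyed by city name. Values are placeholders.
EXTERNAL_CITY_DATA = {
    "Alphaville": {"elevation_m": 1400.0, "distance_to_sea_km": 650.0, "is_capital": True},
    "Betatown": {"elevation_m": 12.0, "distance_to_sea_km": 3.0, "is_capital": False},
}
FIELDS = ("elevation_m", "distance_to_sea_km", "is_capital")


def external_features(city, table=EXTERNAL_CITY_DATA):
    """Look up external attributes for a city; unknown cities get None values."""
    row = table.get(city)
    features = {"has_external_data": int(row is not None)}
    for field in FIELDS:
        value = row[field] if row is not None else None
        # Booleans become 1.0 / 0.0 so every known value is numeric.
        features[field] = float(value) if value is not None else None
    return features
```

The `has_external_data` flag lets a model distinguish "value unknown" from a genuine zero once the `None` entries are imputed.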

Top landmarks with city name, based on use case. Major highways. Index of stats relative to other cities: weather, crime rate, cost of living, outdoor activities, schools, employment, tax. — Gold finger
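One possible reading of "index of stats relative to other cities" is a z-score: how far a city's value sits from the average of all cities, measured in standard deviations. The sketch below assumes that interpretation; the function name and the sample numbers are made up for illustration.

```python
from statistics import mean, pstdev


def relative_index(values):
    """Map {city: raw_value} to {city: z-score relative to all cities}."""
    mu = mean(values.values())
    sigma = pstdev(values.values())
    if sigma == 0:
        # Every city has the same value, so none stands out.
        return {city: 0.0 for city in values}
    return {city: (v - mu) / sigma for city, v in values.items()}


# Placeholder cost-of-living figures for three made-up cities.
cost_of_living = {"Alphaville": 10.0, "Betatown": 20.0, "Gammaport": 30.0}
cost_index = relative_index(cost_of_living)
```

The same function can be applied to each statistic (weather, crime rate, employment, and so on) to get one comparable column per statistic.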

City names get mentioned in news articles or wikipedia entries, which might give context or semantic similarity information. For generating embeddings, you can try this code — Nitin Kishore

This way you can have city embeddings based on number of mentions in major news, crime rate, notable events, significant people, home city for celebrities etc.
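The code referred to in the quote is not reproduced here. As a rough stand-in, the sketch below builds a very simple "embedding" for each city by counting the words that appear alongside it in text snippets, then compares cities with cosine similarity. The snippets, the made-up city names, and both function names are assumptions for illustration, and the approach only handles single-word city names.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical text snippets standing in for news or Wikipedia sentences.
DOCS = [
    "Alphaville hosts a film festival and many celebrities",
    "Betatown hosts a film festival every spring",
    "Gammaport reports rising port traffic and shipping",
]
CITY_NAMES = ["Alphaville", "Betatown", "Gammaport"]


def context_vectors(docs, cities):
    """Count the words that co-occur with each city in the same snippet."""
    vectors = defaultdict(Counter)
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        for city in cities:
            if city.lower() in tokens:
                vectors[city].update(t for t in tokens if t != city.lower())
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


vectors = context_vectors(DOCS, CITY_NAMES)
festival_similarity = cosine(vectors["Alphaville"], vectors["Betatown"])
port_similarity = cosine(vectors["Alphaville"], vectors["Gammaport"])
```

In this toy data the two festival cities come out more similar to each other than either is to the port city. A real pipeline would likely remove stop words and use a much larger corpus or a pretrained text model.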

Final words

I was amazed to see how many ways we can represent a city_name as a feature for our ML/DL models. If you have any questions about how to implement any of them, leave them in the comments and I will try to help you find a solution.

Enjoy building machine learning solutions!




Machine and Deep Learning Practitioner — NLP & MLOps | Applied ML. Let’s connect! https://www.linkedin.com/in/amittimalsina/

Amit Timalsina
