Using Python Pandas to turn ISO Country Codes into a string to use as values for a SQL Query

Summary, code, resources

Problem

While querying a table (based on this dataset) with SQL, you realize one of your columns uses 3-character ISO Country Codes. However, some of these 3-character codes aren’t countries but geographical regions or groups of countries, in addition to the actual country codes. How can you filter out rows so you are left with the countries only?

Answer

Use the Python Pandas library to scrape ISO country codes and convert the values to one single string. Then use that string as values for a subsequent SQL query (possibly something like this):

SELECT * FROM [your_table]
WHERE country_code IN ([values from the generated list-as-string separated by commas and encased by single / double quotes]);

Code

# Libraries used in this code
from bs4 import BeautifulSoup
import requests
import csv 
import pandas as pd 

# I found these ISO country codes on the below URL. Pandas makes it easy to read HTML and manipulate it. Very cool!
iso_codes = pd.read_html("https://www.iban.com/country-codes")

# I create a data frame, starting at an index of 0.
df = iso_codes[0]

# But really, all I care about is the 3-digit country code. So I'll make that the df (dataframe) and strip out the index
df = df['Alpha-3 code'].to_string(index=False)

# From here, I'll save this little guy as a text file.
with open("./countries.txt", "w") as f:
    f.write(df)

# I'll set up a list. *** This was my approach, but if you find a better way, feel free to comment or adjust. ***
my_list = []

# Then I'll open that text file and read it in.
file = open("./countries.txt", "r")
countries = file.read()

# I need to remove the "new line" identifiers, so I'm doing that here. 
my_list = countries.split('\n')

# Once I do that, I can create two new strings. I do this with f-Strings. Great article on using them here: https://realpython.com/python-f-strings/ 

# I have two options here: one where the codes are contained by single quotes, the other with double quotes. Oracle Autonomous Database likes single quotes, but your DB may differ.

countries_string_single_quotes = ','.join(f"'{x}'" for x in my_list)

countries_string_double_quotes = ','.join(f'"{x}"' for x in my_list)

# From here, I take those strings and save them in a text file. You don't have to do this; you can print and copy/paste the string. But this might be an excellent addition if you want to refer to these later without running all the code. 

with open("./countries_as_list_single_quotes.txt", "a") as f:
    f.write(countries_string_single_quotes)

with open("./countries_as_list_double_quotes.txt", "a") as f:
    f.write(countries_string_double_quotes)

GitHub repo details

You can find the code from this post in my GitHub repository. The repository consists of the following:

  • The Python code I created for solving this problem
  • A countries.txt file, which is produced midway through the code (temporary placeholder for later processing)
  • ‘Single quotes’ .txt file – the 3-character ISO Country Codes are formatted as a string. The values are enclosed by single quotes; commas throughout
  • “Double quotes” .txt file – the 3-character ISO Country Codes are formatted as a string. The values are enclosed by double quotes; commas throughout

Resources


The in-depth walkthrough

Backstory

A few days ago, I stumbled upon this question in the r/datasets subreddit:

I spent most of the morning figuring out how I would go about this, and after some trial and error, I devised a plan. I decided to take the list of ISO Country Codes (which I found here) and use them as values for filtering in a SQL statement (later on in Oracle SQL Developer Web).

After some research, I figured out the proper SQL syntax for a successful query.

SELECT * FROM [your_table]
WHERE country_code IN ([values from the generated list-as-string separated by commas and encased by single / double quotes]);

From there, I knew I needed to work backward on those ISO Country Codes. Meaning I needed to take something that looked like this:

And turn it into something more workable. It turns out that grabbing this was pretty straightforward. I’m using Pandas primarily for this exercise, but first, I need to import some libraries:

# Libraries used in this code
from bs4 import BeautifulSoup
import requests
import csv 
import pandas as pd 

Next, I’ll use Pandas’ read_html function (this feels like cheating, but it’s incredible) to read in the table.

# I found these ISO country codes on the below URL. Pandas makes it easy to read HTML and manipulate it. Very cool!
iso_codes = pd.read_html("https://www.iban.com/country-codes")

# I create a data frame, starting at an index of 0.
df = iso_codes[0]

This is wild, but this is what the printout looks like:

pandas to_html printout of ISO codes table, chris hoina, senior product manager, oracle rest apis, oracle autonomous database, sql developer web, ORDS
The Pandas read_html() the function is powerful.

If you squint, you can see an “Alpha-2 code” and an “Alpha-3 code” column in the image. From here, I need to isolate the 3-code column. So I reshaped the data frame by making it a single column; dropping the index (this is optional, you could keep the index if you needed it; perhaps you wanted to create a separate table in your database).

# But really, all I care about is the 3-digit country code. So I'll make that the df (dataframe) and strip out the index
df = df['Alpha-3 code'].to_string(index=False)

I’ll save this data frame as a .txt file.

# From here, I'll save this little guy as a text file.
with open("./countries.txt", "w") as f:
    f.write(df)

This is only temporary (FYI: this is the only way I could figure out how to do this). It’ll look like this:

country codes as a temporary text file, chris hoina, senior product manager, oracle rest apis, oracle autonomous database, sql developer web, ORDS
The temporary .txt file of 3-character ISO Country Codes.

Next, I take that temporary text file and read it in. I’m going to add it to a list, so I’ll first create the empty list (aptly named “my_list“). I also need to remove the newline characters from the list; otherwise, if I don’t, then when I create my string of values (that comes in the final step), the string will look like this:

countries string with newline characters, chris hoina, senior product manager, oracle rest apis, oracle autonomous database, sql developer web, ORDS
The “countries” string with “\n” characters.

I remove the newline characters with this piece of code:

# I need to remove the "new line" identifiers, so I'm doing that here. 
my_list = countries.split('\n')

The almost string of values will look like this:

viewing ouput of my_list, chris hoina, senior product manager, oracle rest apis, oracle autonomous database, sql developer web, ORDS
New line characters have now been removed.

I use F-Strings to create the following two strings; countries_strings_single_quotes and countries_strings_double_quotes, respectively. Need to learn about F-Strings (or, more formally, Literal String Interpolation)? No problemo! Check out these three resources:

The code for the F-Strings is below. I loop through my_list and separate the x (the things I’m iterating over) with commas (that’s the join).

# Once I do that, I can create two new strings. I do this with f-Strings. Great article on using them here: https://realpython.com/python-f-strings/ 

# I have two options here: one where the codes are contained by single quotes, the other with double
# quotes. Oracle Autonomous Database likes single quotes, but your DB may differ.

countries_string_single_quotes = ','.join(f"'{x}'" for x in my_list)

countries_string_double_quotes = ','.join(f'"{x}"' for x in my_list)

And now that I have these two objects (are they called objects??). I’ll save them each as a text file. One file has the 3-character codes surrounded by single quotes, the other with double quotes. The code:

# From here, I take those strings and save them in a text file. You don't have to do this; you can print
# and copy/paste the string. But this might be a nice addition if you want to refer to these later 
# without running all the code. 

with open("./countries_as_list_single_quotes.txt", "a") as f:
    f.write(countries_string_single_quotes)

with open("./countries_as_list_double_quotes.txt", "a") as f:
    f.write(countries_string_double_quotes)

The text files look like this now:

side by side comparison of newly created text files, chris hoina, senior product manager, oracle rest apis, oracle autonomous database, sql developer web, ORDS
The country codes are now presented in one long string. Pretty cool, eh?

SQL time

We have arrived! Let me show you what I can do now!

I took the CSV data from the World Bank and loaded it into my Autonomous Database. Our returning intern Layla put together a video of how to do this; you can check it out here:

Once my table was created, I did a SELECT [columns] FROM. Here you can see my “beginning state”.

There are 266 entries; some are countries, and others are not. And if you recall, the original question asked how somebody could filter out the non-countries. Onto that next!

This is the best part. I can take the string I made and use that in a SQL query such as this:

SELECT * from ADMIN.REDDIT_TABLE
WHERE COUNTRY_CODE IN('AFG','ALA','ALB','DZA','ASM','AND','AGO','AIA','ATA',
'ATG','ARG','ARM','ABW','AUS','AUT','AZE','BHS','BHR','BGD','BRB','BLR','BEL',
'BLZ','BEN','BMU','BTN','BOL','BES','BIH','BWA','BVT','BRA','IOT','BRN','BGR',
'BFA','BDI','CPV','KHM','CMR','CAN','CYM','CAF','TCD','CHL','CHN','CXR','CCK',
'COL','COM','COD','COG','COK','CRI','CIV','HRV','CUB','CUW','CYP','CZE','DNK',
'DJI','DMA','DOM','ECU','EGY','SLV','GNQ','ERI','EST','SWZ','ETH','FLK','FRO',
'FJI','FIN','FRA','GUF','PYF','ATF','GAB','GMB','GEO','DEU','GHA','GIB','GRC',
'GRL','GRD','GLP','GUM','GTM','GGY','GIN','GNB','GUY','HTI','HMD','VAT','HND',
'HKG','HUN','ISL','IND','IDN','IRN','IRQ','IRL','IMN','ISR','ITA','JAM','JPN',
'JEY','JOR','KAZ','KEN','KIR','PRK','KOR','KWT','KGZ','LAO','LVA','LBN','LSO',
'LBR','LBY','LIE','LTU','LUX','MAC','MKD','MDG','MWI','MYS','MDV','MLI','MLT',
'MHL','MTQ','MRT','MUS','MYT','MEX','FSM','MDA','MCO','MNG','MNE','MSR','MAR',
'MOZ','MMR','NAM','NRU','NPL','NLD','NCL','NZL','NIC','NER','NGA','NIU','NFK',
'MNP','NOR','OMN','PAK','PLW','PSE','PAN','PNG','PRY','PER','PHL','PCN','POL',
'PRT','PRI','QAT','REU','ROU','RUS','RWA','BLM','SHN','KNA','LCA','MAF','SPM',
'VCT','WSM','SMR','STP','SAU','SEN','SRB','SYC','SLE','SGP','SXM','SVK','SVN',
'SLB','SOM','ZAF','SGS','SSD','ESP','LKA','SDN','SUR','SJM','SWE','CHE','SYR',
'TWN','TJK','TZA','THA','TLS','TGO','TKL','TON','TTO','TUN','TUR','TKM','TCA',
'TUV','UGA','UKR','ARE','GBR','UMI','USA','URY','UZB','VUT','VEN','VNM','VGB',
'VIR','WLF','ESH','YEM','ZMB','ZWE')
ORDER BY COUNTRY_CODE ASC;

Once I execute that SQL statement, I’m left with the countries from that list. I opened up the results in another window so you can see a sample.

The end

So yeah, that’s it! I don’t know if this was the best way to go about this, but it was fun. I’m curious (if you’ve made it this far), what do you think? How would you go about it? Let me know.

And two more things: remember to share this and…

Leave a Comment