You are able to request an access token from OCI IAM. Yet, when you issue the subsequent request to your target resource (an ORDS endpoint), you receive the following message (error="invalid_token"):
WWW-Authenticate: Bearer realm="Oracle REST Data Services", error="invalid_token"
Actions you’ve taken
You’ve done the following in OCI:
Registered an Integrated Application with Oracle Identity and Access Management (IAM)
Created a Primary Audience & Scope
Obtained your Client ID and Client Secret
Configured your networking correctly (or at least have high confidence it’s configured correctly)
Created the JWT Role and Privilege (which should be the same as the OCI Scope name)
And protected your target resource (aka ORDS API)
You’ve placed everything where it should be in your choice of API testing tool (cURL, Postman, Insomnia, etc.).
YET…you still receive this error="invalid_token" message. If so, it is quite possible that you have not made the JWK URL publicly accessible in OCI IAM.
Solution
Here is how you can verify and resolve this issue. First, navigate to your domain, then select Settings.
If this Configure client access box is unchecked, it is likely the culprit. Check it, then select Save Changes (the button at the bottom of the screen).
This box must be checked so your application can automatically access a JWK URL (to be used for decoding a JWT) without having to sign in to the OCI tenancy.
Then, once you re-attempt your HTTP request, ORDS will be able to:
Access the JWK URL (which you’ve included when you created your JWT Profile)
Verify the authenticity of your JWT, and
Respond with the results from your resource (ORDS endpoint)
Et voilà! And that’s it, you’re back in business!
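To confirm the fix from the client side, one option is to request the JWK URL with no credentials at all and check that a key set comes back. This is a minimal sketch, not an official check: the function names are hypothetical, and you would pass in the same JWK URL you used in your ORDS JWT Profile.

```python
import json
import urllib.request


def looks_like_jwks(payload):
    """Return True if the decoded JSON resembles a JWK Set (a non-empty "keys" list)."""
    keys = payload.get("keys") if isinstance(payload, dict) else None
    return isinstance(keys, list) and len(keys) > 0


def jwk_url_is_public(url, timeout=10):
    """Fetch the JWK URL without any Authorization header.

    Returns True only when the server answers with something shaped like a JWK Set.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return looks_like_jwks(json.loads(response.read().decode("utf-8")))
    except (OSError, ValueError):
        # HTTP errors (401/403/404), network failures, or a non-JSON body
        # all mean ORDS would not be able to use this URL either.
        return False
```

Calling jwk_url_is_public() with your domain's JWK URL should return False while the box is unchecked and True after you save the change.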
To-do list
I think we have some action items, too:
Investigate this error message and see if we can improve the message to the user (we’ve already filed an enhancement request on this)
Update the docs to be more specific on this setting and where to find it (a documentation bug has already been filed for this)
Determine if this is a good candidate for adding to the troubleshooting section of our guide
ALERT: This is going to seem extremely out of context! But this post actually loosely relates to the ORDS Pre-hook functions section of our docs. I'm in the process of (1) working on a presentation and (2) updating this section of the docs as well (productivity trifecta for the win!), hence why we are here.
Hypothetical scenario
Hypothetically speaking, let’s say you were interested in learning more about Common Gateway Interface (CGI) Environment variables, what they are, and how to use ORDS to REST-enable a function to produce these variables. If that is the case, you are in luck, my friend!
What follows is a quick way for you to learn more about these variables (as they relate to the Oracle database) and use ORDS in the process!
An excerpt from another “work in progress”
For this example, we’ll rely on the OWA_UTIL PL/SQL package, specifically the PRINT_CGI_ENV procedure (an HTML utility; one of three utility subprograms in the OWA_UTIL package). First, create a Resource Module and Template. Then, when creating a Handler, choose plsql/block as the Source Type and use the PRINT_CGI_ENV procedure in the Handler code.
Like this:
Begin
  OWA_UTIL.PRINT_CGI_ENV;
End;
I created this Resource Module on my “localhost”; your URI will differ in an Autonomous Database – Always Free account (sign-up here).
Remember, you can export the PL/SQL definitions in case you want to reproduce the Resource Module.
From there, either copy and paste this Handler’s URI (in the above example, that is https://localhost:8443/ords/ordstest/v1/api) into a new terminal session (if using a tool like curl) or into Postman (or a similar testing tool), or navigate to the URI in a new browser tab or window. You’ll see all the CGI Environment variables that are sent back (in an unauthenticated server response) to you, a client, or an application. Pretty neat trick, eh?
Here is an example of the response from an Autonomous Database – Always Free tenancy:
Here is a curl command response from a development configuration (i.e., a locally installed ORDS instance running in Standalone mode and a 23ai database in a Podman container).
A note about the -k (--insecure) option in this curl command: it tells curl to skip TLS certificate verification, so use it for development purposes only.
As you can see, there is tons of data to work with; something to remember if you want to use CGI Environment variables with your ORDS pre-hook (YOU DO NOT HAVE TO; I’m just showing you an example of one of the countless possibilities!).
Start small
You might want to start small by implementing a security policy using something as simple as the QUERY_STRING variable (e.g., where perhaps your ORDS pre-hook function calls upon an underlying function or procedure that uses a query string as a parameter). Our pre-hook example does something like this, actually 😀.
Check this out; look what happens when I append ?chris to the end of this URI:
And like magic, the QUERY_STRING CGI Environment variable now has a value assigned to it! See how simple and automatic this is?
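If you'd rather work with these variables programmatically, the response body can be turned into a dictionary. The sketch below assumes the handler returns one NAME = value pair per line, possibly followed by an HTML line break; that layout and the sample text are assumptions for illustration, so compare them with your own response first.

```python
import re


def parse_cgi_env(body):
    """Parse 'NAME = value' lines (optionally ending in <br>) into a dict."""
    variables = {}
    # Treat HTML line breaks as newlines so both layouts are handled the same way.
    text = re.sub(r"<br\s*/?>", "\n", body, flags=re.IGNORECASE)
    for line in text.splitlines():
        name, separator, value = line.partition("=")
        if separator and name.strip():
            variables[name.strip()] = value.strip()
    return variables


# Hypothetical sample, shaped like the output described above.
sample = "REQUEST_METHOD = GET<br>\nQUERY_STRING = chris<br>\nSERVER_PORT = 8443<br>\n"
env = parse_cgi_env(sample)
print(env["QUERY_STRING"])
```

With a dictionary in hand, checking a single variable such as QUERY_STRING becomes a plain key lookup.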
Something to think about: even if you don’t care about CGI Environment variables today, I guarantee this will be useful in the future. I bet you’ve been in a position where at least some of this is relevant to you on any given week. So, if nothing else, maybe REST-enable this PRINT_CGI_ENV procedure, so you have it ready whenever you need it!
The end
That’s all for now, folks. This is a quick post that hopefully will come in handy one day 😎. Until next time, keep calm and query on.
Follow
And don’t forget to follow, like, subscribe, share, taunt, troll, or stalk me!
There’s plenty to talk about in this release. However, I’m most excited about the performance improvements, ORDS sample applications, and documentation changes. Read on to get the whole story.
Enhancements
API Performance
REST API responses from either AutoREST or customer-based modules could see as much as a 30% improvement in response times.
About a year ago, we introduced (we owe all our progress to Orla though) an internal program to track performance changes/improvements across ORDS APIs quantitatively. I can’t go into too much detail, but here is what I can divulge:
Although we use k6 for our performance testing, we are not promoting its use over any other available performance testing solution. There are other great tools available (e.g., Artillery, JMeter, and, of course, k6).
Testing is performed nightly against a 23ai database (installed in a PDB); we also include APEX in these tests.
For the tests, 250 schemas are created and then populated with various database objects (e.g., Functions, Materialized Views, PLSQL Packages, Sequences, Tables, Triggers, JSON Relational Duality Views, etc.)
These schemas are then absolutely hammered with Virtual Users. Users perform actions such as auto-REST enabling objects, creating custom Resource Modules, creating JSON Relational Duality Views, interrogating ORDS Metadata, and performing bulk inserts (BATCHLOAD), GETs, POSTs, etc.
We use these results to track ORDS performance quantitatively over time.
So, that’s what we mean by “performance improvements.” Pretty cool, eh?
NOTE: I don't know if that 30% average is mean or median. So, for all you stat nerds, don't ask 🤣!
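For the stat nerds anyway: whether an improvement is reported as a mean or a median can change the headline number. Here is a small sketch with made-up response times (these figures are purely illustrative and are not ORDS measurements):

```python
from statistics import mean, median


def percent_improvement(before, after):
    """Percentage reduction going from 'before' to 'after'."""
    return (before - after) * 100 / before


# Hypothetical response times in milliseconds; one slow outlier in each run.
old_release = [100, 110, 120, 130, 540]
new_release = [80, 85, 90, 95, 350]

by_mean = percent_improvement(mean(old_release), mean(new_release))
by_median = percent_improvement(median(old_release), median(new_release))
print(f"mean: {by_mean:.1f}%  median: {by_median:.1f}%")
```

The same two runs give a different percentage depending on which summary statistic is used, which is why it is worth knowing which one a published figure refers to.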
ORDS Sample applications
We have not one but TWO sample ORDS applications for you 😍!
Flask/Python
The first is a fully contained LiveLabs sandbox workshop, which can be found here. But if you want to remix the code, check out my repo here (everything is heavily commented; hopefully, this will ease your pain).
Node.js/React
Secondly, our development team has created a brand new advanced application. Details are here.
NOTE: We'll continue to iterate and improve on both, so please share with us your feedback!
OAuth2.0 changes
A consolidation and streamlining of the OAUTH and OAUTH_ADMIN PL/SQL packages. The details:
We’ve consolidated those mentioned above into these two new packages:
ORDS_SECURITY
ORDS_SECURITY_ADMIN
The OAUTH and OAUTH_ADMIN PL/SQL Packages have been deprecated by royal decree. However, they’ll still be included until ORDS version 25.3 (this time next year).
Creating a client and receiving your Client ID and Client Secret is now streamlined, and Client Secrets can now be rotated (by supporting two active Client Secrets while in rotation).
Locating the new PL/SQL Packages:
Finding PL/SQL Packages in Database Actions.
23ai Boolean
ORDS now returns BOOLEAN types as JSON true|false values instead of 0|1.
What this looks like in practice with various configurations1:
Oracle Database 23ai + ORDS 24.3
Oracle Database 23ai + ORDS 24.2
Oracle DB 21c Enterprise Edition + ORDS 24.3
Not possible. #RedHerring
1. Thank you, internet stranger, for providing us with this juicy bit of code.
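If you have client code that reads these payloads, the change matters: a JSON true decodes to a real Boolean, while 1 decodes to an integer. A minimal sketch of a client that tolerates both shapes (the is_active property name and both payloads are hypothetical):

```python
import json


def as_bool(value):
    """Accept either a JSON boolean (newer behavior) or a 0|1 integer (older behavior)."""
    if isinstance(value, bool):
        return value
    if value in (0, 1):
        return bool(value)
    raise ValueError(f"not a boolean-like value: {value!r}")


newer_payload = json.loads('{"is_active": true}')
older_payload = json.loads('{"is_active": 1}')

print(type(newer_payload["is_active"]).__name__, as_bool(newer_payload["is_active"]))
print(type(older_payload["is_active"]).__name__, as_bool(older_payload["is_active"]))
```

Normalizing at the edge like this keeps the rest of the client unaware of which ORDS version produced the response.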
Mong[ooohhh, no, you didn’t?!] DB API
Support for even more Database Administration commands:
listIndexes
dropIndexes, and
optional parameter expireAfterSeconds (which applies to the createIndexes command)
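For reference, these commands are plain documents. The sketch below only builds the command documents as Python dictionaries, following MongoDB's documented command shapes; the collection name, field, and index name are hypothetical, and nothing here connects to a database.

```python
def create_ttl_index_command(collection, field, seconds, index_name):
    """Build a createIndexes command that uses the optional expireAfterSeconds parameter."""
    return {
        "createIndexes": collection,
        "indexes": [
            {"key": {field: 1}, "name": index_name, "expireAfterSeconds": seconds}
        ],
    }


def list_indexes_command(collection):
    """Build a listIndexes command."""
    return {"listIndexes": collection}


def drop_indexes_command(collection, index_name):
    """Build a dropIndexes command for a single named index."""
    return {"dropIndexes": collection, "index": index_name}


# Hypothetical names, for illustration only.
print(create_ttl_index_command("sessions", "created_at", 3600, "sessions_ttl"))
```

With a driver such as PyMongo, documents like these would typically be passed to the database object's command() method.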
The following MongoDB Aggregation Stages are now supported:
UPDATE: MongoDB API update article (October 10, 2024)
A brand new article about the latest MongoDB API updates just dropped! Thanks to Hermann for publishing and sharing the latest. Details are here.
Documentation
Introduced the following new sections:
6.2.4 Using OCI Monitoring Service with Oracle REST Data Services
This new section details the configuration of the recently added ords-metrics utility. You can find details on how to set up this monitoring service (to communicate with OCI) here.
3.2 Deploying ORDS with Central Configuration Server
Along with the updated docs, we’ve included the OpenAPI spec for creating the endpoints required for a central configuration server (and a special video clip of me retrieving the PL/SQL definitions and the OpenAPI spec in Database Actions).
In our Release Notes, we claim support for the following JDKs:
Oracle Java 11, 17, or 21
Oracle GraalVM Enterprise Edition for Java 11
Oracle GraalVM Enterprise Edition for Java 17
Oracle GraalVM Enterprise Edition for Java 21
However, this may be confusing regarding Oracle GraalVM Enterprise Editions. You should know that there are currently TWO Oracle GraalVM Enterprise Edition JDKs:
Oracle GraalVM Enterprise Edition 20
Oracle GraalVM Enterprise Edition 21
Here is another, cleaner way to present these JDKs:
Oracle GraalVM Enterprise Edition 20
Linux (x86-64): Java 8, 11
macOS (x86-64): Java 8, 11
Windows (x86-64): Java 8, 11
Oracle GraalVM Enterprise Edition 21
Linux (x86-64 and aarch64): Java 8, 11, 17
macOS (x86-64 only): Java 8, 11, 17
Windows (x86-64 only): Java 8, 11, 17
Oracle GraalVM Enterprise Edition details
So when you are choosing your JDK (to use with ORDS), make sure you consider your platform and use cases. Details on using GraalVM with ORDS here.
fin
This concludes the release notes supplement.
This space ⬇️ left intentionally blank.
If you are coming from the previous related post, then you’ll recall I used the following SQL query:
Remember, this SQL is querying the Movie View.
My next step is to take this SQL and bring it to the REST Workshop, where I’ll turn it into an API.
REST Workshop
There are several ways you can navigate to the REST Workshop. Typically, I return to the Database Actions LaunchPad. From there, I select REST.
The Handler code
I've already created my Resource Module, Template, and Handler. I kept everything default, with no authentication enabled.
The only thing I changed was the SQL query. I removed the final line, fetching the first 10 only. I want to be able to control the pagination of the API. If I were to keep that last line, this eventual endpoint would always only return the first 10 rows. And what if I want the next ten rows thereafter? Well, if I hard-code this, then I can’t really make that work. So, I chose to leave it open-ended.
Technically, it is NOT open-ended because I retained the default pagination of 25. But, by removing that fetch first 10 rows condition, I can now fetch ALL rows that fit those parameters (in increments of 25).
If I visit this new endpoint, it will appear like this:
And if I collapse the items, you’ll see something that is EXTREMELY confusing. If I removed that fetch first 10 rows condition in the original SQL query, then why do we see a limit and offset of 10?
The answer is because I actually set the Items Per Page equal to 10 (in the Resource Handler). This is the REST equivalent of a dirty joke. Consider yourself roasted…
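To see how that paging plays out for a client, here is a rough sketch that walks an ORDS collection page by page using the limit and offset query parameters and the hasMore flag in each response. The fetch_json helper and the function names are hypothetical; swap in whatever HTTP client you prefer.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(url):
    """Hypothetical helper: GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def iter_items(base_url, limit=10, fetch=fetch_json):
    """Yield every item from a paginated ORDS collection, one page at a time."""
    offset = 0
    while True:
        query = urllib.parse.urlencode({"limit": limit, "offset": offset})
        page = fetch(f"{base_url}?{query}")
        yield from page.get("items", [])
        if not page.get("hasMore"):
            break
        offset += limit
```

Passing your own endpoint URI to iter_items() with limit=10 would request the first ten rows, then the next ten, and so on until the response reports that no more rows remain.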
JavaScript
With that endpoint live, I can take the API and drop it into some sample JavaScript and HTML code.
JavaScript and HTML
I learned a great deal about this JavaScript by reviewing this YouTube video. That is where I learned how to map through the items of my ORDS payload. And there was a refresher on JavaScript string interpolation (with template literals) too!
PAUSE: Don't be too intimidated by string interpolation and template literals! Read the link I included, and take your time. If you are coming from Python, it's similar to Jinja (when using Flask) and f-string literals 🙃.
You can see that I’m using the map() method to iterate through all the data in my JSON payload. Remember, this was the payload in the items portion of my endpoint!
I believe the item in list.map((item) is a reference to an individual item in line 4’s data.items. I think this because if I change the items in lines 7-10 in my JavaScript to something random, like the name bobby, things start to break:
However, if I change everything back to item, and start the live server in VS Code, I’ll be met with the following rendering:
That’s it, though. Combining the ORDS API, the Fetch API, simple JavaScript, and HTML will allow you to create this straightforward web page.
Reviewing Inspector, Console, Network
I also have a few more screenshots, one each for the HTML Inspector, Console Log, and Client/Server Network. All of these show what is happening under the covers but in different contexts.
Inspector
In the Inspector, you can see how the JavaScript map() method plus the document.querySelector() in line 18 of the JavaScript code work in tandem with line 12 of the HTML script to display contents on the page:
Console
Here, you can see the items in the Console. This is because we added console.log(item) in line 19 of the JavaScript code.
Network
Finally, you can see the 200 GET request from our ORDS API. Then, on the far right of the screen, you can see the JSON payload coming from that same ORDS endpoint.
Admittedly, the way the “Cast” is displayed is not correct. That is yet another array of cast members. And I’ve yet to learn how to structure that correctly. So, if you are reading this, and you know, let me know!
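On the cast question, one possible approach (sketched here in Python rather than JavaScript, in the spirit of the f-string comparison earlier) is to treat the cast as its own list and join the names before dropping them into the markup. This assumes the cast value arrives either as a list or as a JSON-encoded string of names; the sample item and the function names are made up.

```python
import html
import json


def cast_names(cast):
    """Return the cast as a list of names, whether it arrives as a list or a JSON string."""
    if cast is None:
        return []
    if isinstance(cast, str):
        cast = json.loads(cast)
    return [str(name) for name in cast]


def render_item(item):
    """Build one block of HTML for a single item from the payload's items array."""
    names = ", ".join(html.escape(name) for name in cast_names(item.get("cast")))
    return (
        f"<div><h3>{html.escape(item['title'])} ({item['year']})</h3>"
        f"<p>Cast: {names}</p></div>"
    )


# Hypothetical item, shaped like one entry of the items array.
sample = {"title": "Example Movie", "year": 2020, "cast": '["Actor One", "Actor Two"]'}
print(render_item(sample))
```

The same idea carries over to the JavaScript version: parse the cast into an array if it is a string, then join the names inside the template literal.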
In this example, we call this collection Movie_Collection.
-- create and load movie json collection from a public bucket on object storage
begin
    dbms_cloud.copy_collection (
        collection_name => 'MOVIE_COLLECTION',
        file_uri_list   => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/c4u04/b/moviestream_gold/o/movie/movies.json',
        format          => '{ignoreblanklines:true}'
    );
end;
/
👆🏻 This is the code I used to create, copy, and ingest this collection into my Autonomous Database.
README: As far as I know, the link above (the one in the code example) will remain stable for the foreseeable future. I'm pretty sure we use it in one of many of our LiveLabs. A lot of what I'm covering here is actually in Task 7 of this LiveLab.
Notice the “PL/SQL procedure successfully completed.” message.
Create a view
With that JSON collection in place, I can then create (aka I’ll continue stealing this code from the same LiveLab) a View of it using the following SQL code:
/* Create a view over the collection to make queries easy */
create or replace view movie as
select
    json_value(json_document, '$.movie_id' returning number) as movie_id,
    json_value(json_document, '$.title') as title,
    json_value(json_document, '$.budget' returning number) as budget,
    json_value(json_document, '$.list_price' returning number) as list_price,
    json_value(json_document, '$.gross' returning number) as gross,
    json_query(json_document, '$.genre' returning varchar2(400)) as genre,
    json_value(json_document, '$.sku' returning varchar2(30)) as sku,
    json_value(json_document, '$.year' returning number) as year,
    json_value(json_document, '$.opening_date' returning date) as opening_date,
    json_value(json_document, '$.views' returning number) as views,
    json_query(json_document, '$.cast' returning varchar2(4000)) as cast,
    json_query(json_document, '$.crew' returning varchar2(4000)) as crew,
    json_query(json_document, '$.studio' returning varchar2(4000)) as studio,
    json_value(json_document, '$.main_subject' returning varchar2(400)) as main_subject,
    json_query(json_document, '$.awards' returning varchar2(4000)) as awards,
    json_query(json_document, '$.nominations' returning varchar2(4000)) as nominations,
    json_value(json_document, '$.runtime' returning number) as runtime,
    json_value(json_document, '$.summary' returning varchar2(10000)) as summary
from movie_collection;
Here is what the code looks like in the SQL Worksheet (a part of Database Actions).
Accomplished in the SQL Worksheet.
With that View created, you could go one step further and query with even more specific SQL. In this case, I’ll query the View but exclude any entries where a movie cast does not exist:
Select title, year, gross, cast
from movie
Where cast is not null
Order By 3 DESC nulls last
Fetch first 10 rows only;
Here is the SQL, with the Script Output below:
Notice the output of the executed SQL.
ORDSify it®
With ORDS, we can REST-enable pretty much any database object we want.
I have objects, Greg. Can you REST-enable me?
But after spending a few minutes with this collection, I found the MOVIE View to be the easiest, most sensible object to highlight. It’s a straightforward process, with primarily right-mouse clicks.
From the Navigator Panel, select Views from the list of available database objects.
From the Navigator Tab, select Views from the drop-down menu.
Then, right-click on the Movie View, and select REST, then Enable.
Right-click on the Movie View and select REST then Enable.
A slider will appear. For this example, I’ll keep everything default and click Enable (no authentication either, I’m being lazy).
This slider will appear; since you are auto-REST enabling, you can accept the default settings and click Enable.
A Confirmation notification will appear in the upper right-hand corner of the browser 👍🏻.
You’ll know it worked because you’ll see this confirmation message appear.
Navigate back to the Movie View, and select the cURL command option. Twasn’t there before, tis now!
Go back to the Movie view and select the [new] cURL command option.
Select the GET ALL endpoint, and copy the URI. JUST the URI portion!
From the list of available auto-magically created REST endpoints, copy the GET ALL endpoint.
Open a new browser tab, or window, and navigate to the URI. You’ll see everything in the Movie View now!
That endpoint (URI) in the browser looks like this 👆🏻
Why a view?
Yeah, good question. There is probably a performance improvement with using Views. Based on what I’m finding, I can’t definitively say, but they require zero storage, so that’s at least a savings on some resource somewhere. Additionally, I think the customization is pretty compelling, too. What do I mean by that? Well, allow me to elucidate:
Views can provide a different representation (such as subsets or supersets) of the data that resides within other tables and views. Views are very powerful because they allow you to tailor the presentation of data to different types of users.
In this case, the Movie View returns everything found in the collection. I can subset this even further though; by taking that SQL query and REST-enabling it, too. I will, but in a future post.
For now, I’ve left you with an easy way to REST-enable a View (In this case, based on a JSON Collection) that resides in your Autonomous Database.
If you want to try the LiveLab (which you should, as it’s easy and VERY informative), go here. You’ll need an Always Free OCI account, too (so you can provision an Autonomous Database). You can sign up here.
Oh, and we are entering into the season of Cloud World 2024, so mark your calendars 🤪!
That’s all for now 😘.
The plan was to create an ORACLE REST endpoint and then POST a CSV file to that auto-REST enabled table (you can see how I did that here, in section two of my most recent article). But, instead of doing this manually, I wanted to automate this POST request using Apple’s Automator application…
Me…two paragraphs from now
Follow along with the video
The plan
I did it. I went so far down the rabbit hole, I almost didn’t make it back alive. I don’t know when this ridiculous idea popped into my head, but it’s been well over a year. Until now, I either hadn’t had the time or the confidence to really tackle it.
The plan was to create an ORACLE REST endpoint and then POST a CSV file to that auto-REST enabled table (you can see how I did that here, in section two of my most recent article). But, instead of doing this manually, I wanted to automate this POST request using Apple’s Automator application.
The use case I made up was one where a person would need to periodically feed data into a table. The data doesn’t change, nor does the target table. Here is an example of the table I’m using:
The basic structure of the Bank Transfers table
And the DDL, should there be any interest:
CREATE TABLE ADMIN.BANK_TRANSFERS
(TXN_ID NUMBER ,
SRC_ACCT_ID NUMBER ,
DST_ACCT_ID NUMBER ,
DESCRIPTION VARCHAR2 (4000) ,
AMOUNT NUMBER
)
TABLESPACE DATA
LOGGING
;
Once this table was created, I auto-REST enabled the table and retrieved the complete cURL Command for performing a Batch Load request. Remember, we have three examples for cURL Commands now; I chose Bash since I’m on a Mac:
Retrieving the Batch Load cURL Command
Once I grabbed the cURL Command, I would temporarily save it to a clipboard (e.g. VS Code, TextEdit, etc.). I’d then create a new folder on my desktop.
The newly created ords_curl_post folder
How I actually did it
I’d then search via Spotlight for the Automator application. Once there, I’d choose Folder Action.
Choosing Folder Action for this automation
HEY!! README: I'm going to breeze through this. And it may seem like I am well-acquainted with this application. I am not. I spent hours debugging, reading through old StackExchange forums, and Apple documentation so I could share this with you. There is a ton more work to do. But bottom line, this thing works, and it's something that is FREE and accessible for a lot of people. You could have a TON of fun with this stuff, so keep reading!
There’s no easy way to get around this, but to get really good at this, you’ll just need to tinker. Luckily, most of these automation modules are very intuitive. And there is a ton of information online on how to piece them all together.
Automator 🤖
All of these modules are drag-and-drop, so it makes it easy to create an execution path for your Folder Action application. Eventually, I ended up with this (don’t worry, I’ll break it down some, a video is in the works for a more detailed overview):
Complete Folder Action automation for the ORDS Batch Load request
The modules
The modules I’m using are:
Get Specified Finder Items
Get Folder Contents
Run Shell Script (for a zsh shell, the default for this MacBook)
Set Value of Variable
Get Value of Variable
Display Notification
You can see at the very top, that I have to choose a target folder since this is a folder action. I chose the folder I created; ords_curl_post.
Get Specified Finder Items and Get Folder Contents
The first two modules are pretty straightforward. You get the specified finder items (from that specific folder). And then get the contents from that folder (whatever CSV file I drop in there). That will act as a trigger for running the shell script (where the filename/s serve as the input for the cURL Command).
PAUSE: I must confess, I had essentially ZERO experience in shell scripting prior to this, and I got it to work. It's probably not the prettiest, but damn if I'm not stoked that this thing actually does what it is supposed to do.
The only main considerations on this shell script are that you’ll want to stay with zsh and you’ll want to choose “as arguments” in the “Pass input” dropdown menu. Choosing “as arguments” allows you to take that file name and apply it to the For Loop in the shell script. I removed the echo "$f" because all it was doing was printing out the file name (which makes sense since it was the variable in this script).
Choosing “as arguments“
The Shell Script
That cURL Command I copied from earlier looks like this:
I made some modifications though. I made sure Content-Type was text/csv. And then I added some fancy options for additional information (more details on this here, go nuts) when I get a response from the database.
REMINDER: I didn't know how to do this until about 30 mins before I got it to work. I'm emphasizing this because I want to drive home the point that with time and some trial-and-error, you too can get something like this to work!
With my changes, the new cURL Command looks like this:
What a mess…That -w option stands for write-out. When I receive the response from the Batch Load request, I’ll want the following information:
Response Code (e.g. like a 200 or 400)
Total Upload Time
Upload Speed
Upload Size
All of that is completely optional. I just thought it would be neat to show it. Although, as you’ll see in a little bit, Apple notifications have some weird behavior at times, so you don’t really get to see all of the output.
I then applied the cURL command to the shell script, (with some slight modifications to the For Loop), and it ended up looking like this:
New shell script with updated cURL command
Here is what the output looked like when I did a test run (with a sample CSV):
Success on the cURL command
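For anyone who would rather script this outside of Automator, the same request can be sketched in Python. This is only an approximation of the shell script: the function names are hypothetical, the batchload URL is whatever your auto-REST enabled table exposes, and it reports just the response code and elapsed time rather than every write-out value.

```python
import time
import urllib.request
from pathlib import Path


def build_batchload_request(batchload_url, csv_bytes):
    """Prepare a POST that sends CSV bytes with a Content-Type of text/csv."""
    return urllib.request.Request(
        batchload_url,
        data=csv_bytes,
        headers={"Content-Type": "text/csv"},
        method="POST",
    )


def post_folder(batchload_url, folder):
    """POST every CSV file in the folder; return (file name, status, seconds) tuples."""
    results = []
    for csv_path in sorted(Path(folder).glob("*.csv")):
        request = build_batchload_request(batchload_url, csv_path.read_bytes())
        started = time.perf_counter()
        with urllib.request.urlopen(request) as response:
            elapsed = time.perf_counter() - started
            results.append((csv_path.name, response.status, elapsed))
    return results
```

Calling post_folder() with the Batch Load URI and the path to the ords_curl_post folder would play the role of the For Loop: one request per CSV file found in the folder.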
Set Value of Variable
All of that output, referred to as “Results”, will then be set as a variable. That variable will be henceforth known as the responseOutput (Fun fact: that is called Camel casing…I learned that like 3-4 months ago). You’ll first need to create the variable, and once you run the folder action, it’ll apply the results to that variable. Like this:
Creating a new variable
Results from cURL command applied to variable
Get Value of Variable and Display Notification
Those next two modules simply “GET” the value of the variable/results and then send that value to the Display Notification module. This section is unremarkable, moving on.
And at this point, I was done. All I needed to do was save the script and then move on to the next step.
Folder Actions Setup
None of this will really work as intended until you perform one final step. I’ll right-click the target folder and select “Folder Actions Setup.” From there a dialog will appear; you’ll want to make sure both the folder and the script are checked.
Selecting Folder Actions Setup
Double checking that everything is enabled
Trying it out
Next, I emptied the folder. Then I dropped in a 5000-row CSV file and let Folder Actions do its thing. This entire process is quick! I’m loving the notification, but the “Show” button simply does not work (I think that is a macOS quirk though). However, when I go back to my Autonomous Database, I can 100% confirm that this ORDS Batch Load worked.
Successful Batch Load
Double checking the Autonomous Database
Final thoughts
This was relatively easy to do. In total, it took me about 3-4 days of research and trial and error to get this working. There is a lot I do not know about shell scripting. But even with a rudimentary understanding, you too can get this to work.
Next, I’d like to create a dialog window for the notification (the output from the cURL Command). I believe you can do that in AppleScript; I just don’t know how yet.
If you are reading this and can think of anything, please leave a message! If you want to try it out for yourself, I’ve shared the entire workbook on my GitHub repo, which can be found here.
I’ll also be doing an extended video review of this, where I’ll recreate the entire automation from start to finish. Be on the lookout for that too!
Part I
Overview and connecting with the python-oracledb library
Part II
Connecting with Oracle REST APIs unauthenticated
Part III
Custom Oracle REST APIs with OAuth2.0 Authorization
Welcome back
I finally had a break in my PM duties to share a small afternoon project [I started a few weeks ago]. I challenged myself to a brief Python coding exercise. I wanted to develop some code that allowed me to connect to my Autonomous Database using either our python-oracledb driver (library) or with Oracle REST Data Services (ORDS).
I undertook this effort as I also wanted to make some comparisons and maybe draw some conclusions from these different approaches.
NOTE: If you don't feel like reading this drivel, you can jump straight to the repository where this code lives. It's all nicely commented and has everything you need to get it to work. You can check that out here.
The test files
Reviewing the code, I’ve created three Python test files. test1.py relies on the python-oracledb library to connect to an Oracle Autonomous database while test2.py and test3.py rely on ORDS (test3.py uses OAuth2.0, but more on that later).
test1.py using the python-oracledb library
test2.py relies on an unsecured ORDS endpoint
test3.py with ORDS, secured with OAuth2
Configuration
Configuration directory
I set up this configuration directory (config_dir) to abstract sensitive information from the test files. My ewallet.pem and tnsnames.ora files live in this config_dir. These are both required for Mutual TLS (mTLS) connection to an Oracle Autonomous database (you can find additional details on mTLS in the docs here).
ewallet.pem and tnsnames.ora files
Other files
OAuth2.0, Test URLs, and Wallet Credential files
Other files include oauth2creds.py, testurls.py, and walletcredentials.py. Depending on the test case, I’ll use some or all of these files (you’ll see that shortly).
NOTE: If not obvious to you, I wouldn't put any sensitive information into a public git repository.
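One way to keep such a file free of secrets is to have it read its values from environment variables. The sketch below shows what a walletcredentials.py along those lines might look like; the keys it returns match the names imported by test1.py, but the environment variable names and the function itself are hypothetical.

```python
import os


def load_wallet_credentials(environ=os.environ):
    """Collect the connection settings from environment variables."""
    names = {
        "uname": "ADB_USER",
        "pwd": "ADB_PASSWORD",
        "cdir": "ADB_CONFIG_DIR",
        "wltloc": "ADB_WALLET_LOCATION",
        "wltpwd": "ADB_WALLET_PASSWORD",
        "dsn": "ADB_DSN",
    }
    missing = [env_name for env_name in names.values() if env_name not in environ]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {key: environ[env_name] for key, env_name in names.items()}
```

Failing loudly when a variable is missing is a deliberate choice here; it is easier to debug than a connection attempt with an empty password.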
Connecting with python-oracledb
One approach to connecting to your Oracle database is with the python-oracledb driver (library). An Oracle team created this library (people much more experienced and wiser than me), and it makes connecting with Python possible.
FYI: I’m connecting to my Autonomous Database. If you want to try this, refer to the documentation for using this library and the Autonomous database. You can find that here.
The Python code that I came up with to make this work:
#Connecting to an Oracle Autonomous Database using the Python-OracleDB driver.

import oracledb

# A separate python file I created and later import here. It contains my credentials, so as not to show them in this script here.

from walletcredentials import uname, pwd, cdir, wltloc, wltpwd, dsn

# Requires a config directory with ewallet.pem and tnsnames.ora files.

with oracledb.connect(user=uname, password=pwd, dsn=dsn, config_dir=cdir, wallet_location=wltloc, wallet_password=wltpwd) as connection:

    with connection.cursor() as cursor:

        # SQL statements should not contain a trailing semicolon (“;”) or forward slash (“/”).
        sql = """select * from BUSCONFIND where location='ZAF'
        order by value ASC """

        for r in cursor.execute(sql):
            print(r)
In Line 7, you can see how I import the wallet credentials from the walletcredentials.py file. Without that information, this code wouldn’t work. I also import the database username, password, and configuration directory (which includes the ewallet.pem and tnsnames.ora files).
From there, the code is pretty straightforward. Some library-specific syntax is required (the complete details are in the docs, found here), but aside from that, nothing is too complicated. You’ll see the SQL statement assigned to the sql variable; the proper SQL format looks like this:
SELECT * FROM busconfind WHERE location='ZAF'
ORDER BY value ASC;
And here is an example of this SQL output in a SQL Worksheet (in Database Actions):
Reviewing the SQL in Database Actions
FYI: This is a Business Confidence Index data-set, in case you were curious (retrieved here).
That SQL allows me to filter on a Location and then return those results in ascending order according to the Value column. When I do this using the python-oracledb driver, I should expect to see the same results.
NOTE: You've probably noticed that the SQL in the Python file differs from that seen in the SQL Worksheet. That is because the statement is passed to the driver as a Python string (triple-quoted here, so the single quotes surrounding ZAF can stay as they are), and the trailing semicolon has to be removed. It's all in the python-oracledb documentation; you just have to be aware of this.
Once I have all the necessary information in my walletcredentials.py file, I can import that into the test1.py file and execute the code. I chose to run this in an Interactive Window (I’m using VS Code), but you can also do this in your Terminal. In the images (from left to right), you’ll see the test1.py file, then a summary of the output from that SQL query (contained in the test1.py code), and finally, the detailed output (in a text editor).
Executing the Python code in an Interactive Window
Summary output from test1.py
Detailed output from test1.py
Wrap-up
For those that have an existing Free Tier tenancy, this could be a good option for you. Of course, you have to do some light administration. But if you have gone through the steps to create an Autonomous database in your cloud tenancy, you probably know where to look for the tnsnames.ora and other database wallet files.
I’m not a developer, but I think it would be nice to simplify the business logic found in this Python code, or maybe better, to abstract it completely. For prototyping an application (perhaps one that isn’t microservices-oriented) or for data and business analysts, this could do the trick. In fact, the data is returned to you in rows of tuples, so turning this into a CSV or reading it into a data analysis library (such as pandas) should be fairly easy!
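To give a rough idea of that last point, here is a minimal sketch that turns rows of tuples into CSV text using only Python's standard csv module. The column names and sample rows are made up for illustration (they are not the real BUSCONFIND data), and rows_to_csv is a hypothetical helper name. With python-oracledb, I'd expect you could take the column names from cursor.description (the first element of each entry) and the rows from cursor.fetchall().

```python
import csv
import io


def rows_to_csv(column_names, rows):
    """Return CSV text: a header row followed by one line per tuple."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(column_names)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical column names and rows, shaped like the tuples the driver returns.
sample_columns = ["LOCATION", "TIME", "VALUE"]
sample_rows = [("ZAF", "2021-01", 98.7), ("ZAF", "2021-02", 99.1)]

csv_text = rows_to_csv(sample_columns, sample_rows)
print(csv_text)
```

From there, writing csv_text to a file (or handing the same rows to pandas) is a short step.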
Connecting via ORDS: sans OAuth2.0
Auto-REST and cURL
I’m still using the “devuser” (although this may be unnecessary, as any unsecured REST-enabled table would do). I’m using the same table as before; the only change I’ve made is to auto-REST enable the BUSCONFIND table for the test2.py code.
In the following images, I’m retrieving the cURL command for performing a GET request on this table.
NOTE: In a recent ORDS update, we made available different shell variations (this will depend on your OS); I've selected Bash.
From there, I take the URI (learn more on URIs) portion of the cURL command and place it into my browser. Since this table is auto-REST enabled, I’ll only receive 25 rows from this table.
NOTE: The ORDS default pagination is limit = 25.
Getting the cURL command from an already ORDS REST-enabled table
Selecting the GET request for Bash
GET response in JSON
The raw JSON, pretty printed
The code
And the code for this test2.py looks like this:
# Auto-REST enabled with ORDS; in an Oracle Autonomous Database with query parameters.
import requests
import pprint
# Importing the base URI from this python file.
from testurls import test2_url
# An unprotected endpoint that has been "switched on" with the ORDS Auto-REST enable feature.
# Query parameters can be added/passed to the Base URI for GET-ing more discrete information.
url = (test2_url + '?q={"location":"ZAF","value":{"$gt":100},"$orderby":{"value":"asc"}}')
# For prototyping an application, in its earlier stages, this could really work. On your front end, you
# expect the user to make certain selections, and you'll still pass those as parameters.
# But here, you do this as a query string. In later stages, you may want to streamline your application
# code by placing all this into a PL/SQL or SQL statement. Thereby separating application
# logic and business logic. You'll see this approach in the test3.py file.
# This works, but you can see how it gets verbose, quick. It's a great jumping-off point.
responsefromadb = requests.get(url)
pprint.pprint(responsefromadb.json())
Two lines are worth focusing on in this example: the from testurls import test2_url statement and the url assignment. With the import, I bring in my URL from the testurls.py file (again, abstracting it, so it’s not in the main body of the code).
The test2.py and testurls.py files
And then, in the url assignment, I appended a query string to the end of that URL. ORDS expects the query parameters to be a JSON object with the following syntax:
[ORDS Endpoint]/?q={"JSON Key": "JSON Value"}
The new, complete query string below requests the same information as was requested in the test1.py example:
This string begins with that same BASE URI for the ORDS endpoint (the auto-REST enabled BUSCONFIND table) and then applies the query string prefix “?q=” followed by the following parameters:
Filter by the location "ZAF"
Limit the search of these locations to values (in the Value column) greater than ($gt) 100
Return these results in ascending order (asc) of the Value column
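Hand-typing that JSON inside a Python string makes it easy to drop or double up a brace, so here is one possible way to build the same filter from a dictionary instead. Treat it as a sketch: the base URI is a placeholder standing in for test2_url, and build_filtered_url is a hypothetical helper name.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_filtered_url(base_uri, location, min_value):
    """Append an ORDS-style ?q= filter object to the base URI."""
    query_filter = {
        "location": location,
        "value": {"$gt": min_value},
        "$orderby": {"value": "asc"},
    }
    # json.dumps keeps braces and quotes balanced; urlencode escapes the result.
    encoded = urlencode({"q": json.dumps(query_filter, separators=(",", ":"))})
    return base_uri + "?" + encoded


# Placeholder base URI (an assumption, not my real endpoint).
example_url = build_filtered_url(
    "https://example.com/ords/devuser/busconfind/", "ZAF", 100
)
print(example_url)

# Decoding the query string again shows the filter survives the round trip.
print(parse_qs(urlsplit(example_url).query)["q"][0])
```

The resulting URL can be passed to requests.get() just like the hand-built one.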
NOTE: You can manipulate the offsets and limits in the python-oracledb driver too. More info found here. And filtering in queries with ORDS can be found here.
And if I run the test2.py code in the VS Code Interactive Window, I’ll see the following summary output.
Summary output from the response in test2.py
Here is a more detailed view in the VS Code text editor:
Detailed output with helpful links
Wrap-up
A slightly different approach, right? The data is all there, similar to what you saw in the test1.py example. There are a few things to note, though:
The consumer of this ORDS REST API doesn’t need access to the database (i.e. you don’t need to be an admin or have a schema); you can perform GET requests on this URI.
The response body is in JSON (ubiquitous across the web and web applications)
Also, language and framework agnostic (the JSON can be consumed/used widely, and not just with Python)
You are provided a URI for each item (i.e. entry, row, etc.)
No need for SQL; just filter with the JSON query parameters
No business logic in the application code
Needless to say, no ORMs or database modeling is required for this approach
However…security is, ahem…nonexistent. That is a problem and flies in the face of what we recommend in our ORDS Best Practices.
Connecting via ORDS: secured with OAuth2
Note: This is an abbreviated explanation, I'll be posting an expanded write-up on this example post haste!
Since this is what I’m considering “advanced” (it’s not difficult, there are just many pieces) I’m going to keep this section brief. Long story short, I’ll take those query parameters from above and place them into what is referred to as a Resource Handler.
TIME-OUT: Auto-REST enabling a database object (the BUSCONFIND table in this case) is simple in Database Actions. It's a simple left-click > REST-enable. You saw that in the previous example. You are provided an endpoint and you can use the query parameters (i.e. the JSON {key: value} pairs) to access whatever you need from that object.
However, creating a custom ORDS REST endpoint is a little different. First you create a Resource Module, next a (or many) Resource Template/s, and then a (or many) Resource Handler/s. In that Resource Handler, you'll find the related business logic code for that particular HTTP operation (the menu includes: GET, POST, PUT, and DELETE).
The Resource Module
The process of creating a custom ORDS API might be difficult to visualize, so I’ll include the steps I took along with a sample query (in that Resource Handler) to help illustrate.
Creating the Resource Module in the ORDS REST Workshop
Creating the Resource Template
Reviewing the available operations for the Resource Template
The newly created Resource GET Handler
Placing the SQL directly into the Resource Handler
Testing out the code to simulate a GET request using "ZAF" as the location
Reviewing the output of that SQL query, in a table format
Chances are you may be the administrator of your Always Free tenancy, so you have full control over this. Other times, you might be provided the REST endpoint. In that case, you may not ever have to worry about these steps. Either way, this final example (test3.py) runs the same query as before, except the business logic is now abstracted away and kept in the database.
Security
The OAuth 2.0 authorization framework enables a third-party application to obtain limited access to an HTTP service, either on behalf of a resource owner by orchestrating an approval interaction between the resource owner and the HTTP service, or by allowing the third-party application to obtain access on its own behalf.
RFC 6749: The OAuth 2.0 Authorization Framework
I’ll keep this section brief, but I’m protecting this resource through the aid of an ORDS OAuth2.0 client. I’ve created one here:
After creating a client you can use the provided URL for requesting a new Bearer Token
And, as you’ll see shortly, I’ll rely on some Python libraries for requesting an access token with the related Client ID and Client Secret. If you want to nerd out on the OAuth2.0 framework, I challenge you to read this.
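If you are curious what a token request roughly looks like under the hood, here is a simplified sketch of the client-credentials request described in RFC 6749 (sections 2.3.1 and 4.4.2): the Client ID and Client Secret go into an HTTP Basic Authorization header, and the form body asks for the client_credentials grant. The credential values are placeholders, client_credentials_request is a hypothetical helper name, and I'm not claiming this is exactly what the libraries send on the wire.

```python
import base64


def client_credentials_request(client_id, client_secret):
    """Return headers and form body for a client-credentials token request."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = {"grant_type": "client_credentials"}
    return headers, body


# Placeholder values; real ones come from the OAuth client in Database Actions.
headers, body = client_credentials_request("my_client_id", "my_client_secret")
print(headers["Authorization"])
print(body)
```

You could POST these to the token URL yourself (for example with requests.post(token_url, headers=headers, data=body)), but the libraries in the next example take care of it for you.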
test3.py example
NOTE: Remember, I'm keeping this section intentionally brief. It deserves a slightly deeper dive, and class is almost over (so I'm running out of time).
The code for this example:
# Custom ORDS Module in an Oracle Autonomous Database.
import requests
from requests_oauthlib import OAuth2Session
from oauthlib.oauth2 import BackendApplicationClient
import pprint
import json
# Importing the base URI from this python file.
from testurls import test3_url
# A separate python file I created and later import here. It contains my credentials,
# so as not to show them in this script here.
from oauth2creds import token_url, client_id, client_secret
client = BackendApplicationClient(client_id=client_id)
oauth = OAuth2Session(client=client)
token = oauth.fetch_token(token_url, client_id=client_id, client_secret=client_secret)
bearer_token = token['access_token']
# Location can be anything from the table. Now, only the single variable needs to be passed. Business logic has been abstracted somewhat; as it now resides within
# ORDS. This could make your application more portable (to other languages and frameworks, since there are fewer idiosyncrasies and dependencies):
location = "ZAF"
# print(location)
# ------------------------------------------------------------------------------ #
# In Database Actions, we:
# 1. Create an API Module
# 2. Then create a Resource Template
# 3. Finally, a GET Resource Handler that consists of the code from test1.py:
# select * from BUSCONFIND where location= :id
# order by value ASC
# ------------------------------------------------------------------------------ #
url = (test3_url + location)
# print(url)
responsefromadb = requests.get(url, headers={'Authorization': 'Bearer ' + bearer_token}).json()
# This step isn't necessary; it simply prints out the JSON response object in a more readable format.
pprint.pprint(responsefromadb)
The two import statements (from testurls and from oauth2creds) deserve some attention here. The URL comes from the testurls.py file, seen in the previous example, and the token URL, Client ID, and Client Secret come from the oauth2creds.py file. Here are the files, side-by-side:
The test3.py, testurls.py, and oauth2creds.py files
As you can see in the testurls.py file, I’m relying on the test3_url for this example. And the OAuth2.0 information you see comes directly from the OAuth Client I created in Database Actions:
In this image, you can see the Client ID and Client Secret
If I put that all together, I can execute the code in test3.py and “pretty print” the response in my Interactive Window. But first I need to adjust the Resource Handler’s URI (the one I copied and pasted from the “REST Workshop”). It retains the “:id” bind parameter. But the way I have this Python code set up, I need to remove it. It ends up going from this:
With that out of the way, I can run this code and review the output.
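As an aside on that “:id” adjustment: instead of trimming the copied URI by hand, a small helper could swap the placeholder for the value you want to pass. This is only a sketch; the URI below is a made-up placeholder rather than my actual endpoint, and fill_uri_template is a hypothetical helper name.

```python
from urllib.parse import quote


def fill_uri_template(template_uri, location):
    """Replace the trailing :id placeholder with a URL-safe location value."""
    placeholder = ":id"
    if not template_uri.endswith(placeholder):
        raise ValueError("expected the URI to end with the :id bind parameter")
    return template_uri[: -len(placeholder)] + quote(location, safe="")


# Hypothetical URI, still carrying the :id placeholder from the REST Workshop.
template = "https://example.com/ords/devuser/busconfind/location/:id"
print(fill_uri_template(template, "ZAF"))
```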
Running the test3.py code in the Interactive Window
Reviewing the summary output – a JSON array
Reviewing the detailed view of the “items“
Scrolling to the bottom of the GET response body to see the available links for additional items
From top-to-bottom, left-to-right you’ll see I first execute the code in the Interactive Window. From there I can review a summary of the response to my GET request. That pretty print library allows us to see the JSON array in a more readable format (one that has indentation and nesting); which you can see in the second image. The third image is a more detailed view of the first half of this response. And I include the final image to highlight the helpful URLs that are included in the response body.
Since I know my limit = 25, and the 'hasMore': True (seen in the output in that third image) exists, I know there are more items. You can adjust the limit and offset in subsequent requests, but I’ll save that for another day.
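In the meantime, here is a rough sketch of how walking through the pages could look. It keeps asking for the next page until hasMore is no longer true. To keep it runnable without a database, the page fetcher is passed in as a function and I use a fake one here; fetch_all_items and fake_fetch_page are hypothetical names, and the fake rows are invented.

```python
def fetch_all_items(fetch_page, limit=25):
    """Collect items page by page until the response stops reporting hasMore."""
    items = []
    offset = 0
    while True:
        page = fetch_page(offset=offset, limit=limit)
        items.extend(page["items"])
        # Stop when there is nothing more (or a page unexpectedly comes back empty).
        if not page.get("hasMore") or not page["items"]:
            return items
        offset += limit


# Stand-in for the database: 60 fake rows served in ORDS-shaped pages.
fake_rows = [{"id": n} for n in range(60)]


def fake_fetch_page(offset, limit):
    chunk = fake_rows[offset:offset + limit]
    return {"items": chunk, "hasMore": offset + limit < len(fake_rows)}


all_items = fetch_all_items(fake_fetch_page)
print(len(all_items))
```

Against a real endpoint, I'd expect fetch_page to be a thin wrapper around requests.get() that passes offset and limit as query parameters (plus the Bearer token header) and returns the parsed JSON.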
Wrap-up
You can probably tell, but this is like an expansion of the previous example. But instead of relying on the auto-REST enabling, you are in full control of the Resource Module. And while you don’t need to use OAuth2.0 it’s good practice to use it for database authentication. You can see how the response comes through a little differently, compared to the previous example, but still very similar.
In this example, I did all the work, but that might not be the case for you; much of it might be handled for you. The main thing I like about this example is that we rely on stable and popular Python libraries: requests, requests_oauthlib, and oauthlib.
The fact that this is delivered as a JSON object is helpful as well (for the same reasons mentioned in the second example). And finally, I enjoy the fact that you only need to pass a single parameter from your (assumed) presentation layer to your application layer; an example might be a selection from an HTML form or drop-down menu item.
The end
We’re at the end of this fun little exercise. As I mentioned before, I will expand on this third example. There are so many steps, and I think it would be helpful for people to see a more detailed walk-through.
And be on the lookout (BOLO) for a video. There’s no way around this, but a video needs to accompany this post.
And finally, you can find all the code I review in this post in my new “blogs” repository on GitHub. I encourage you to clone, fork, spoon, ladle, knife, etc…
While querying a table (based on this dataset) with SQL, you realize one of your columns uses 3-character ISO Country Codes. However, some of these 3-character codes aren’t countries but geographical regions or groups of countries, in addition to the actual country codes. How can you filter out rows so you are left with the countries only?
Answer
Use the Python Pandas library to scrape ISO country codes and convert the values to one single string. Then use that string as values for a subsequent SQL query (possibly something like this):
SELECT * FROM [your_table]
WHERE country_code IN ([values from the generated list-as-string separated by commas and encased by single / double quotes]);
Code
# Libraries used in this code
from bs4 import BeautifulSoup
import requests
import csv
import pandas as pd
# I found these ISO country codes on the below URL. Pandas makes it easy to read HTML and manipulate it. Very cool!
iso_codes = pd.read_html("https://www.iban.com/country-codes")
# I create a data frame, starting at an index of 0.
df = iso_codes[0]
# But really, all I care about is the 3-character country code. So I'll make that the df (dataframe) and strip out the index
df = df['Alpha-3 code'].to_string(index=False)
# From here, I'll save this little guy as a text file.
with open("./countries.txt", "w") as f:
    f.write(df)
# I'll set up a list. *** This was my approach, but if you find a better way, feel free to comment or adjust. ***
my_list = []
# Then I'll open that text file and read it in (the "with" block closes the file when done).
with open("./countries.txt", "r") as file:
    countries = file.read()
# I need to remove the "new line" identifiers, so I'm doing that here.
my_list = countries.split('\n')
# Once I do that, I can create two new strings. I do this with f-Strings. Great article on using them here: https://realpython.com/python-f-strings/
# I have two options here: one where the codes are contained by single quotes, the other with double quotes. Oracle Autonomous Database likes single quotes, but your DB may differ.
countries_string_single_quotes = ','.join(f"'{x}'" for x in my_list)
countries_string_double_quotes = ','.join(f'"{x}"' for x in my_list)
# From here, I take those strings and save them in a text file. You don't have to do this; you can print and copy/paste the string. But this might be an excellent addition if you want to refer to these later without running all the code.
with open("./countries_as_list_single_quotes.txt", "a") as f:
    f.write(countries_string_single_quotes)
with open("./countries_as_list_double_quotes.txt", "a") as f:
    f.write(countries_string_double_quotes)
GitHub repo details
You can find the code from this post in my GitHub repository. The repository consists of the following:
The Python code I created for solving this problem
A countries.txt file, which is produced midway through the code (temporary placeholder for later processing)
‘Single quotes’ .txt file – the 3-character ISO Country Codes are formatted as a string. The values are enclosed by single quotes; commas throughout
“Double quotes” .txt file – the 3-character ISO Country Codes are formatted as a string. The values are enclosed by double quotes; commas throughout
I spent most of the morning figuring out how I would go about this, and after some trial and error, I devised a plan. I decided to take the list of ISO Country Codes (which I found here) and use them as values for filtering in a SQL statement (later on in Oracle SQL Developer Web).
After some research, I figured out the proper SQL syntax for a successful query.
SELECT * FROM [your_table]
WHERE country_code IN ([values from the generated list-as-string separated by commas and encased by single / double quotes]);
From there, I knew I needed to work backward on those ISO Country Codes. Meaning I needed to take something that looked like this:
The country code column I’m interested in.
Reviewing the HTML for this table, I’m interested in the elements.
And turn it into something more workable. It turns out that grabbing this was pretty straightforward. I’m using Pandas primarily for this exercise, but first, I need to import some libraries:
# Libraries used in this code
from bs4 import BeautifulSoup
import requests
import csv
import pandas as pd
Next, I’ll use Pandas’ read_html function (this feels like cheating, but it’s incredible) to read in the table.
# I found these ISO country codes on the below URL. Pandas makes it easy to read HTML and manipulate it. Very cool!
iso_codes = pd.read_html("https://www.iban.com/country-codes")
# I create a data frame, starting at an index of 0.
df = iso_codes[0]
This is wild, but this is what the printout looks like:
The Pandas read_html() function is powerful.
If you squint, you can see an “Alpha-2 code” and an “Alpha-3 code” column in the image. From here, I need to isolate the “Alpha-3 code” column. So I reshaped the data frame by making it a single column and dropping the index (this is optional; you could keep the index if you needed it, perhaps to create a separate table in your database).
# But really, all I care about is the 3-character country code. So I'll make that the df (dataframe) and strip out the index
df = df['Alpha-3 code'].to_string(index=False)
I’ll save this data frame as a .txt file.
# From here, I'll save this little guy as a text file.
with open("./countries.txt", "w") as f:
    f.write(df)
This is only temporary (FYI: this is the only way I could figure out how to do this). It’ll look like this:
The temporary .txt file of 3-character ISO Country Codes.
Next, I take that temporary text file and read it in. I’m going to add it to a list, so I’ll first create the empty list (aptly named “my_list“). I also need to remove the newline characters; otherwise, when I create my string of values (that comes in the final step), the string will look like this:
The “countries” string with “\n” characters.
I remove the newline characters with this piece of code:
# I need to remove the "new line" identifiers, so I'm doing that here.
my_list = countries.split('\n')
The almost-finished string of values will look like this:
New line characters have now been removed.
I use F-Strings to create the following two strings: countries_string_single_quotes and countries_string_double_quotes, respectively. Need to learn about F-Strings (or, more formally, Literal String Interpolation)? No problemo! Check out these three resources:
The code for the F-Strings is below. I loop through my_list and separate the x (the things I’m iterating over) with commas (that’s the join).
# Once I do that, I can create two new strings. I do this with f-Strings. Great article on using them here: https://realpython.com/python-f-strings/
# I have two options here: one where the codes are contained by single quotes, the other with double
# quotes. Oracle Autonomous Database likes single quotes, but your DB may differ.
countries_string_single_quotes = ','.join(f"'{x}'" for x in my_list)
countries_string_double_quotes = ','.join(f'"{x}"' for x in my_list)
The new single quote string.
The new double quote string.
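If you would rather skip the temporary text file, one possible alternative is to quote and join the codes straight from a Python list. This is a sketch, not what I actually ran: quote_and_join is a hypothetical helper, the sample codes are just a few values for illustration, and it assumes every code is a string. In the script above, the list could come from df['Alpha-3 code'].tolist(), called before the column is converted with to_string().

```python
def quote_and_join(codes, quote="'"):
    """Wrap each non-empty code in quotes and join them with commas."""
    cleaned = [code.strip() for code in codes if code and code.strip()]
    return ",".join(f"{quote}{code}{quote}" for code in cleaned)


# Sample values only, including stray whitespace and an empty entry.
sample_codes = ["AFG", "ALA", " ALB", ""]

print(quote_and_join(sample_codes))
print(quote_and_join(sample_codes, quote='"'))
```

Stripping whitespace and dropping empty entries guards against a stray blank line ending up as '' in the IN list.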
And now that I have these two objects (are they called objects??), I’ll save them each as a text file. One file has the 3-character codes surrounded by single quotes, the other with double quotes. The code:
# From here, I take those strings and save them in a text file. You don't have to do this; you can print
# and copy/paste the string. But this might be a nice addition if you want to refer to these later
# without running all the code.
with open("./countries_as_list_single_quotes.txt", "a") as f:
    f.write(countries_string_single_quotes)
with open("./countries_as_list_double_quotes.txt", "a") as f:
    f.write(countries_string_double_quotes)
The text files look like this now:
The country codes are now presented in one long string. Pretty cool, eh?
SQL time
We have arrived! Let me show you what I can do now!
I took the CSV data from the World Bank and loaded it into my Autonomous Database. Our returning intern Layla put together a video of how to do this; you can check it out here:
Once my table was created, I did a SELECT [columns] FROM. Here you can see my “beginning state”.
At first glance this looks fine.
But once you scroll down, you can see all the non-countries and regions.
There are 266 entries; some are countries, and others are not. And if you recall, the original question asked how somebody could filter out the non-countries. Onto that next!
This is the best part. I can take the string I made and use that in a SQL query such as this:
SELECT * from ADMIN.REDDIT_TABLE
WHERE COUNTRY_CODE IN('AFG','ALA','ALB','DZA','ASM','AND','AGO','AIA','ATA',
'ATG','ARG','ARM','ABW','AUS','AUT','AZE','BHS','BHR','BGD','BRB','BLR','BEL',
'BLZ','BEN','BMU','BTN','BOL','BES','BIH','BWA','BVT','BRA','IOT','BRN','BGR',
'BFA','BDI','CPV','KHM','CMR','CAN','CYM','CAF','TCD','CHL','CHN','CXR','CCK',
'COL','COM','COD','COG','COK','CRI','CIV','HRV','CUB','CUW','CYP','CZE','DNK',
'DJI','DMA','DOM','ECU','EGY','SLV','GNQ','ERI','EST','SWZ','ETH','FLK','FRO',
'FJI','FIN','FRA','GUF','PYF','ATF','GAB','GMB','GEO','DEU','GHA','GIB','GRC',
'GRL','GRD','GLP','GUM','GTM','GGY','GIN','GNB','GUY','HTI','HMD','VAT','HND',
'HKG','HUN','ISL','IND','IDN','IRN','IRQ','IRL','IMN','ISR','ITA','JAM','JPN',
'JEY','JOR','KAZ','KEN','KIR','PRK','KOR','KWT','KGZ','LAO','LVA','LBN','LSO',
'LBR','LBY','LIE','LTU','LUX','MAC','MKD','MDG','MWI','MYS','MDV','MLI','MLT',
'MHL','MTQ','MRT','MUS','MYT','MEX','FSM','MDA','MCO','MNG','MNE','MSR','MAR',
'MOZ','MMR','NAM','NRU','NPL','NLD','NCL','NZL','NIC','NER','NGA','NIU','NFK',
'MNP','NOR','OMN','PAK','PLW','PSE','PAN','PNG','PRY','PER','PHL','PCN','POL',
'PRT','PRI','QAT','REU','ROU','RUS','RWA','BLM','SHN','KNA','LCA','MAF','SPM',
'VCT','WSM','SMR','STP','SAU','SEN','SRB','SYC','SLE','SGP','SXM','SVK','SVN',
'SLB','SOM','ZAF','SGS','SSD','ESP','LKA','SDN','SUR','SJM','SWE','CHE','SYR',
'TWN','TJK','TZA','THA','TLS','TGO','TKL','TON','TTO','TUN','TUR','TKM','TCA',
'TUV','UGA','UKR','ARE','GBR','UMI','USA','URY','UZB','VUT','VEN','VNM','VGB',
'VIR','WLF','ESH','YEM','ZMB','ZWE')
ORDER BY COUNTRY_CODE ASC;
Once I execute that SQL statement, I’m left with the countries from that list. I opened up the results in another window so you can see a sample.
SQL query in action – with the new values-as-a-string.
Results of the SQL query in another window.
The end
So yeah, that’s it! I don’t know if this was the best way to go about this, but it was fun. I’m curious (if you’ve made it this far), what do you think? How would you go about it? Let me know.
That’s right; I’m back again for yet another installment of this ongoing series dedicated to working with Medium.com story stats. I first introduced this topic in a previous post. Maybe you saw it. If not, you can find it here.
Recap
My end goal was to gather all story stats from my Medium account and place them into my Autonomous Database. I wanted to practice my SQL and see if I could derive insights from the data. Unfortunately, gathering said data is complicated.
Pulling the data down was a breeze once I figured out where to look for these story statistics. I had to decipher what I was looking at in the Medium REST API (I suppose that was somewhat tricky). My search was mostly an exercise in patience (there was a lot of trial and error).
I uploaded a quick video in the previous post. But I’ll embed it here so you can see the process for how I found the specific JSON payload.
Obtaining the raw JSON
Once I found that URL, I saved this JSON as a .json file. The images below show remnants of a JavaScript function captured with the rest of the JSON. I’m no JavaScript expert, so I can’t tell what this function does. But before I load this into my Autonomous Database (I’m using an OCI Free Tier account, you can check it out here if you are curious), it needs to go.
JSON response error
Much nicer JSON presentation
README
I am pointing out a few things that may seem convoluted and unnecessary here. Please take the time to read this section so you can better understand my madness.
FIRST: Yes, you can manually remove the [presumably] JavaScript saved along with the primary JSON payload (see above paragraphs). I'm showing how to do this in Python as a practical exercise. But I'm also leaving open the opportunity for future automation (as it pertains to cleaning data).
SECOND: When it comes to the Pandas data frame steps, of course, you could do all this in Excel, Numbers, or Sheets! Again, the idea here is to show you how I can clean and process this in Python. Sometimes doing things like this in Excel, Numbers, and Sheets is impossible (thinking about enterprise security here).
THIRD: Admittedly, the date-time conversion is hilarious and convoluted. Of course, I could do this in a spreadsheet application. That's not the point. I was showing the function practically and setting myself up for potential future automation.
FOURTH: I'll be the first to admit that the JSON > TXT > JSON > CSV file conversion is comical. So if you have any suggestions, leave a comment here or on my GitHub repository (I'll link below), and I'll attribute you!
The code
Explaining the code in context, with embedded comments, will be most illuminating.
I’ve named everything in the code as literally as possible. In production, this feels like it might be impractical; however, there is no question about what the hell the code is doing! Being more literal is ideal for debugging and code maintenance.
Here is the entire code block (so CTRL+C/CTRL+V to your heart’s content 😘). I’ll still break this down into discrete sections and review them.
import csv
import json
import pandas as pd
import datetime
from pathlib import Path
# You'll first need to sign in to your account, then you can access this URL without issues:
# https://medium.com/@chrishoina/stats/total/1548525600000/1668776608433
# NOTES:
# Replace the "@chrishoina" with your username
# The two numbers you see are Unix Epochs; you can modify those as needed; in my case, I
# wanted to see the following:
# * 1548525600000 - At the time of this post, this seems to be
# whenever your first post was published or when
# you first created a Medium account. In this case, for me, this
# was Sat, Jan/26/2019, 6:00:00PM - GMT
# * 1665670606216 - You shouldn't need to change this since it will just default to the current date.
# For the conversion, I used an Epoch Converter tool I found online: https://www.epochconverter.com/
# Step 1 - Convert this to a text (.txt) file
p = Path("/Users/choina/Documents/socialstats/1668776608433.json")
p.rename(p.with_suffix('.txt'))
# Step 2 - "read" in that text file, and remove those pesky
# characters/artifacts from position 0 through position 15.
# I'm only retaining the JSON payload from position 16 onward.
with open("/Users/choina/Documents/socialstats/1668776608433.txt", "r") as f:
    stats_in_text_file_format = f.read()
# This [16:] essentially means grabbing everything in this range. Since
# there is nothing after the colon; it will just default to the end (which is
# what I want in this case).
cleansed_stats_from_txt_file = stats_in_text_file_format[16:]
print(cleansed_stats_from_txt_file)
# This took me a day to figure out, but this text file needs to be encoded
# properly, so I can save it as a JSON file (which is about to happen). I
# always need to remind myself of this, but json.dumps = dump
# string, whereas json.dump = dump object. There is a difference, I'm not
# the expert, but the docs were helpful.
json.dumps(cleansed_stats_from_txt_file)
# Step 3 - Here, I create a new file, then indicate we will "w"rite to it. I take the
# progress from Step 2 and apply it here.
with open('medium_stats_ready_for_pandas.json', 'w') as f:
    f.write(cleansed_stats_from_txt_file)
# Step 4 - Onto Pandas! We've already imported the pandas library as "pd."
# We first create a data frame and name the columns. I kept the names
# very similar to avoid confusion. I feared that timestampMs might be a
# reserved word in Oracle DB or too close, so I renamed it.
df = pd.DataFrame(columns=['USERID', 'FLAGGEDSPAM', 'STATSDATE', 'UPVOTES', 'READS', 'VIEWS', 'CLAPS', 'SUBSCRIBERS'])
with open("/Users/choina/Documents/socialstats/medium_stats_ready_for_pandas.json", "r") as f:
    data = json.load(f)
data = data['payload']['value']
print(data)
for i in range(0, len(data)):
    df.loc[i] = [data[i]['userId'], data[i]['flaggedSpam'], data[i]['timestampMs'], data[i]['upvotes'], data[i]['reads'], data[i]['views'], data[i]['claps'], data[i]['updateNotificationSubscribers']]
df['STATSDATE'] = pd.to_datetime(df['STATSDATE'], unit="ms")
print(df.columns)
# Step 5 - use the Pandas' df.to_csv function and save the data frame as
# a CSV file
with open("medium_stats_ready_for_database_update.csv", "w") as f:
    df.to_csv(f, index=False, header=True)
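A quick aside on those Unix Epoch values in the stats URL: they are in milliseconds, which is why the script passes unit="ms" to pd.to_datetime. If you'd rather not rely on an online converter, the standard library can do the same conversion. This is a small sketch with hypothetical helper names (epoch_ms_to_utc and utc_to_epoch_ms), using the start value from the comments above.

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(epoch_ms):
    """Convert a Unix epoch in milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


def utc_to_epoch_ms(moment):
    """Convert a timezone-aware datetime back to epoch milliseconds."""
    return round(moment.timestamp() * 1000)


start = epoch_ms_to_utc(1548525600000)
print(start.isoformat())  # 2019-01-26T18:00:00+00:00
print(utc_to_epoch_ms(start))
```

Going the other way (utc_to_epoch_ms) is handy if you want to build the stats URL for a specific date range.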
I used several Python libraries for this script:
p = Path("/Users/choina/Documents/socialstats/1668776608433.json")
p.rename(p.with_suffix('.txt'))
Pathlib allows you to assign the file’s path to “p”. From there, I changed the .json file extension to a .txt extension.
Note: Again, I'm sure there is a better way to do this, so if you're reading, leave a comment here or on my GitHub repository so I can attribute it to you 🙃.
The before and after of this step looks like this:
JSON before, TXT after
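Here is a minimal sketch of that rename step, run against a throwaway file in a temporary folder so nothing real gets touched. The helper name rename_to_txt is made up for this example, and the file name simply mirrors the one used above.

```python
import tempfile
from pathlib import Path


def rename_to_txt(path: Path) -> Path:
    """Rename a file so it ends in .txt and return the new path."""
    # with_suffix() only builds the new path; rename() changes the file on disk.
    # Path.rename() returns the new Path on Python 3.8 and later.
    return path.rename(path.with_suffix(".txt"))


# Try it on a throwaway file inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "1668776608433.json"
    original.write_text("{}")
    renamed = rename_to_txt(original)
    renamed_name = renamed.name
    original_still_exists = original.exists()
```

Only the extension changes; the contents of the file stay exactly the same.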
With that out of the way, I needed to remove that JavaScript “prefix” in the file. I do this in Step 2 (I got so fancy that I probably reached diminishing returns). My approach works, and I can repurpose this for other applications too!
Step 2:
# Step 2 - "read" in that text file, and remove those pesky
# characters/artifacts from position 0 through position 15. Or in other
# words, you'll retain everything from position 16 onward because that's
# where the actual JSON payload is.
with open("/Users/choina/Documents/socialstats/1668776608433.txt", "r") as f:
    stats_in_text_file_format = f.read()
# This [16:] essentially means grabbing everything in this range. Since
# there is nothing after the colon, it will just default to the end (which is
# what I want in this case).
cleansed_stats_from_txt_file = stats_in_text_file_format[16:]
print(cleansed_stats_from_txt_file)
# This took me a day to figure out, but this text file needs to be
# appropriately encoded to save as a JSON file (which is about to
# happen). I always forget the difference between "dump" and "dumps";
# json.dumps = dump to a string, whereas json.dump = dump to a file
# object. There is a difference, I'm not the expert, but the docs were
# helpful (you should read them).
# Note: the result of this call isn't assigned to anything, so Step 3
# writes the sliced text exactly as it is.
json.dumps(cleansed_stats_from_txt_file)
I needed to remove these remnants from the Medium JSON response
While this initially came through as a JSON payload, those first 0-15 characters had to go.
FULL DISCLAIMER: I couldn't figure out how to get rid of this while it was still a JSON file, which is why I converted it to a text file (this was the only way I could figure it out).
I captured position 16 to infinity (or the end of the file, whichever occurs first), then I re-encoded the file as JSON (I interpreted this as “something the target machine can read and understand as JSON“).
OPEN SEASON: CompSci folks, please roast me in the comments if I'm wrong.
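For what it's worth, here is one possible way to skip the text-file detour. It is only a sketch, and it assumes the file is some non-JSON prefix followed by a single JSON object. Python reads a .json file as plain text just fine, so you can look for the first "{" and parse from there. The sample string and the helper name parse_prefixed_json are hypothetical stand-ins, not the actual Medium response.

```python
import json


def parse_prefixed_json(raw: str) -> dict:
    """Drop whatever comes before the first '{' and parse the rest as JSON."""
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found in the input")
    return json.loads(raw[start:])


# Hypothetical sample: a 16-character prefix followed by a tiny payload.
sample = '])}while(1);</x>{"success": true, "payload": {"value": []}}'
parsed = parse_prefixed_json(sample)
prefix_length = sample.find("{")
```

Searching for the first brace instead of hard-coding 16 means the code would still work if the prefix ever changed length.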
Step 3
# Step 3 - I create a new file, then I'll "w"rite to it. I took the result from Step 2 and applied it here.
with open('medium_stats_ready_for_pandas.json', 'w') as f:
    f.write(cleansed_stats_from_txt_file)
I’m still at the data-wrangling portion of this journey, but I’m getting close to the end. I’ll create a new JSON file, take the parts of the (freshly encoded) text file I need, and then save them as that new JSON file.
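Since "dump" versus "dumps" keeps coming up, here is a small sketch of the difference using only the standard library. The payload is a made-up sample, and io.StringIO stands in for a real file so the example doesn't write anything to disk.

```python
import io
import json

# A made-up sample payload.
payload = {"success": True, "payload": {"value": [{"views": 10}]}}

# json.dumps returns a str; nothing is written anywhere.
as_text = json.dumps(payload)

# json.dump writes to a file-like object and returns None.
buffer = io.StringIO()  # stands in for a file opened with open(..., "w")
result = json.dump(payload, buffer)

# Both routes produce the same JSON text.
same_text = buffer.getvalue() == as_text

# Calling json.dumps on text that is already JSON wraps it in quotes and
# escapes the inner quotes: the result is a JSON *string*, not an object.
double_encoded = json.dumps('{"a": 1}')
```

That last line is why text that is already valid JSON can be written to a file as-is, with no extra encoding step.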
Step 4
# Step 4 - Onto Pandas! We've already imported the pandas library as "pd"
# I first create a data frame and name the columns. I kept the names
# similar to avoid confusion. I feared that timestampMs might be a
# reserved word in Oracle DB or too close, so I renamed it.
df = pd.DataFrame(columns=['USERID', 'FLAGGEDSPAM', 'STATSDATE', 'UPVOTES', 'READS', 'VIEWS', 'CLAPS', 'SUBSCRIBERS'])
with open("/Users/choina/Documents/socialstats/medium_stats_ready_for_pandas.json", "r") as f:
    data = json.load(f)
data = data['payload']['value']
print(data)
for i in range(0, len(data)):
    df.loc[i] = [data[i]['userId'], data[i]['flaggedSpam'], data[i]['timestampMs'], data[i]['upvotes'],
                 data[i]['reads'], data[i]['views'], data[i]['claps'], data[i]['updateNotificationSubscribers']]
df['STATSDATE'] = pd.to_datetime(df['STATSDATE'], unit="ms")
print(df.columns)
I won’t teach Pandas (and honestly, you do NOT want me to be the one to teach you Pandas), but I’ll do my best to explain my process. I first created the structure of my data frame (“df” in this case). And then, I named all the column headers (these can be anything, but I kept them very close to the ones found in the original JSON payload).
I then opened the newly-saved JSON file and extracted what I needed.
NOTE: I got stuck here for about a day and a half, so let me explain this part.
The data['payload']['value'] part first looks up the "payload" key, then the "value" key nested inside it. This approach allowed me to grab everything stored under “value“ (a list with one entry per day of stats). This image explains what I started with (on the left) and what I ended up with (on the right).
The before and after JSON payload
You’ll notice a {"success": true} key: value pair. With this method, I removed that pair and shed others at the end of the JSON payload.
Removing a great deal of trash
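A tiny sketch of that lookup, using a made-up payload shaped like the one described above. The "references" key and all the values are hypothetical; only the "success", "payload", and "value" keys come from the real file.

```python
import json

# Made-up payload; "references" stands in for the extra parts that get shed.
raw = """
{
  "success": true,
  "payload": {
    "value": [
      {"userId": "abc123", "views": 12, "reads": 5},
      {"userId": "abc123", "views": 7, "reads": 2}
    ],
    "references": {}
  }
}
"""

data = json.loads(raw)

# Two lookups in a row: first the "payload" object, then the "value" list in it.
rows = data["payload"]["value"]
```

Everything outside that list, including the "success" pair, is simply left behind.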
I can’t take credit for organically coming up with this next part; Kidson on YouTube is my savior. I’d watch this video to fully understand what is happening in this piece of code:
for i in range(0, len(data)):
    df.loc[i] = [data[i]['userId'], data[i]['flaggedSpam'], data[i]['timestampMs'], data[i]['upvotes'],
                 data[i]['reads'], data[i]['views'], data[i]['claps'], data[i]['updateNotificationSubscribers']]
In short, you take the values from the columns in the JSON file (above) and then put them into the column locations named in this piece of code:
For instance, the "userId" values in the JSON file will all go into the 'USERID' column in the Pandas data frame. And the same thing will happen for the other values and associated (Pandas data frame) columns.
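To make the mapping concrete, here is a sketch of a different route to the same data frame. It swaps the row-by-row loop for the DataFrame constructor plus a rename, so it is not the method from the script. The two rows are made-up sample data, and column_map is a name invented for this example.

```python
import pandas as pd

# Made-up rows shaped like the items in data['payload']['value'].
data = [
    {"userId": "abc123", "flaggedSpam": 0, "timestampMs": 1668776608433,
     "upvotes": 1, "reads": 5, "views": 12, "claps": 3,
     "updateNotificationSubscribers": 0},
    {"userId": "abc123", "flaggedSpam": 0, "timestampMs": 1668863008433,
     "upvotes": 0, "reads": 2, "views": 7, "claps": 0,
     "updateNotificationSubscribers": 1},
]

# JSON key -> data frame column, in the order the columns should appear.
column_map = {
    "userId": "USERID",
    "flaggedSpam": "FLAGGEDSPAM",
    "timestampMs": "STATSDATE",
    "upvotes": "UPVOTES",
    "reads": "READS",
    "views": "VIEWS",
    "claps": "CLAPS",
    "updateNotificationSubscribers": "SUBSCRIBERS",
}

# Build the frame from the list of dicts, keep only the mapped keys, rename them.
df = pd.DataFrame(data)[list(column_map)].rename(columns=column_map)

# Same conversion as in the script: epoch milliseconds to a readable timestamp.
df["STATSDATE"] = pd.to_datetime(df["STATSDATE"], unit="ms")
```

Either way, each JSON key ends up in its matching column; this version just lets Pandas do the looping.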
Finally, I converted the date (which, if you recall, is still in this Epoch format, in milliseconds) to a friendlier, readable date with Pandas' to_datetime function. Using this code:
df['STATSDATE'] = pd.to_datetime(df['STATSDATE'], unit="ms")
Step 5
with open("medium_stats_ready_for_database_update.csv", "w") as f:
    df.to_csv(f, index=False, header=True)
I’m at the home stretch now. I take everything I’ve done in Pandas and save it as a CSV file. I wanted to keep the headers but ditch any indexing. The clean CSV file will look like this:
Cleaned, tidy CSV ready for Data Load via SQL Developer Web
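If you're curious what index=False actually changes, this small sketch compares the two outputs. The one-row data frame is made up for the example; calling to_csv without a file path just returns the CSV text.

```python
import pandas as pd

# A made-up, one-row data frame.
df = pd.DataFrame({"USERID": ["abc123"], "VIEWS": [12]})

# With no path given, to_csv returns the CSV as a string.
with_index = df.to_csv()                 # the index is written by default
without_index = df.to_csv(index=False)   # headers kept, index column dropped

header_with_index = with_index.splitlines()[0]
lines_without_index = without_index.splitlines()
```

With the index left in, the header row starts with an empty, unnamed column, which would be one more thing to clean up during the data load.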
Step 6
Lastly, I logged into SQL Developer Web and clicked the new Data Load button (introduced in Oracle REST Data Services version 22.3) to upload the CSV file into a new table. The Autonomous Database automatically infers column names and data types. I slightly modified the "statsdate" column (honestly, I could have left it alone, but it was easy enough to change).
Before and After
And that’s it! Once uploaded, I can compare what I did previously to what I have achieved most recently. And both ways are correct. For instance, depending on your requirements, you can retain the JSON payload as a CLOB (as seen in the first image) or a more traditional table format (as seen in the second image).
Medium stats as a CLOB
Medium stats in a typical table format
Wrap up
If you’ve made it this far, congrats! You should now have two ways to store Medium stats data in a table (that lives in the Oracle Autonomous Database) either as:
a CLOB
an OG table
And if you’d like to review the code, you can find it here.
I feel so silly for posting this because you’ll quickly realize that I will have to leave things unfinished for now. But I was so excited that I got something to work, that I had to share!
If you’ve been following along, you know you can always find me here. But I do try my best to cross-post on other channels as well:
But given that everything I do supports the development community, audience statistics are always crucial to me. Because of this, I’ll periodically review my stats on this site and the others to get a feel for the most popular topics.
I even did a RegEx post a while back that was pretty popular too. Thankfully it wasn’t that popular, as it pained me to work through Regular Expressions.
I can quickly review site statistics on this blog, but other places, like Medium, are more challenging to decipher. Of course, you can download your Audience stats, but sadly not your Story stats 😐.
Audience stats download, but no Story stats download.
Undeterred, I wanted to see if it was somehow possible to acquire my Story stats. And it is possible, in a way…
Show and tell
After you log into your Medium account, navigate to your stats page, open the developer tools in your browser, and go to your “Console.” From there, reload the page and simply observe all the traffic.
You’ll see a bunch of requests:
GET
POST
OPTIONS (honestly, I’ve no idea what this is, but I also haven’t looked into it yet)
My thought was that the stats content was produced through (or by) one of these API requests. So yes, I (one at a time) expanded every request and reviewed the Response Body of each request. I did that until I found something useful. And after a few minutes, there it was:
The magic GET request.
I confirmed I had struck gold by taking this URL, placing it in a new browser window, and hitting Enter. And after selecting “Raw Data,” I saw this:
Double-checking the raw JSON.
Indeed, we see my Story stats. But the final two paths in the URL made no sense to me.
The paths looked similar; I had no choice but to activate Turing Mode™.
I could see these numbers were similar, so I lined them up in my text editor and saw that they shared the same 166 prefix. I don’t know much about machine-readable code, but since what was appearing on my screen was the last 30 days, I thought this might be some sort of date. But I’d never seen anything like this, so I wasn’t 100% sure.
Unix Time Stamps
After about 20 mins of searching and almost giving up, I found something in our Oracle docs (a MySQL reference guide of all places) that referenced Unix Time Stamps. Eureka!
About Unix time stamps in the Oracle MySQL docs.
Success, I’d found it. So I searched for a “Unix time stamp calculator” and plugged in the numbers. My hunch was correct; it was indeed the last thirty days!
Verifying the Unix Time Stamp.
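You can do the same check without an online calculator. This is a minimal sketch using only Python's standard library; the sample number is the one from the stats file name used elsewhere on this blog, and it assumes the value is in milliseconds and should be read as UTC.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch, as a timezone-aware datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_ms(ms: int) -> datetime:
    """Convert milliseconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


stamp = from_epoch_ms(1668776608433)
readable = stamp.isoformat()
```

For that sample value, the result lands on 18 November 2022, shortly after 13:00 UTC.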
So now I’m wondering: if I change that leading date in the GET request, will it allow me to grab all my story statistics from January 2022 till now? Oh, hell yeah, it will!
All my Story stats from Jan 2022 to the present.
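Going the other way, here is a sketch of how you could work out the number to swap into the request for a start date of 1 January 2022. It only computes the value; the URL itself isn't reproduced here. Treating the date as midnight UTC is an assumption, and to_epoch_ms is a name made up for this example.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch, as a timezone-aware datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the epoch."""
    # Floor-dividing one timedelta by another gives an exact integer.
    return (moment - EPOCH) // timedelta(milliseconds=1)


# Midnight UTC on 1 January 2022 (assumed start of the range).
start_ms = to_epoch_ms(datetime(2022, 1, 1, tzinfo=timezone.utc))
```

The same helper works for the end of the range; pass it whatever "now" you want the stats to run up to.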
End of the line
Right, so here is where I have to leave it open-ended. I had a finite amount of time to work on this today, but what I’d like to do is see if I can authenticate with Basic Authentication into my Medium account. And at least get a 200 Response Code. Oh wait, I already did that!?
Getting that sweet, sweet 200 Response Code.
And now the Python code!
import requests
import json
from requests.auth import HTTPBasicAuth
url = "https://medium.com/m/signin"
# I found this to work even though I typically sign in through
# Google single sign-on. I just used the same email/password
# I use when I log in directly to Google (Gmail).
user = "[Your login/email]"
password = "[Your password]"
r = requests.get(url, auth=HTTPBasicAuth(user, password))
print(r)
# I found this URL in the console but then removed everything after
# the query string (the "?"), and used that for the requests URL
# "/m/signin?operation=login&redirect=https%3A%2F%2Fmedium.com%2F&source=--------------------------lo_home_nav-----------"
You’re probably wondering how I found the correct URL for the Medium login page. Easy, I trolled the Console until I found the correct URL. This one was a little tricky, but I got it to work after some adjusting. I initially found this:
And since I thought everything after that “?” was an optional query string, I just removed it and added the relevant parts to Medium’s base URL to get this:
If I want to keep it as is, I know I can load the JSON with a cURL command and an ORDS Batch Load API with ease. I dropped this into my Autonomous Database (Data Load) to see what it would look like:
My CLOB.
We do something very similar in the Oracle LiveLabs workshop (I just wrote about it here). You can access the workshop here!
I’ll have a follow-up to this. But for now, this is the direction I am headed. If you are reading this, and want to see more content like this, let me know! Leave a comment, retweet, like, whatever. So that I know I’m not developing carpal tunnel for no reason 🤣.