I stumbled upon a new [to me] python library called Folium. It’s a mapping tool that enables python developers (or is it programmers, which is less offensive?) to visualize data on a Leaflet map.
About folium
Folium makes it easy to visualize data that’s been manipulated in Python on an interactive leaflet map. It enables both the binding of data to a map for choropleth visualizations as well as passing rich vector/raster/HTML visualizations as markers on the map.
The library has a number of built-in tilesets from OpenStreetMap, Mapbox, and Stamen, and supports custom tilesets with Mapbox or Cloudmade API keys. Folium supports Image/Video, GeoJSON and TopoJSON overlays.
Leaflet.js is the leading open-source JavaScript library for mobile-friendly interactive maps. Out-of-the-box features include: tile layers/WMS, markers/popups, vector layers (polylines, polygons, circles, rectangles), image overlays, and GeoJSON.
I highlighted the tilesets above, and I’m not sure if this is a GIS or a leaflet term, but it seems that tilesets refer to different map renderings. Folium includes the following:
OpenStreetMap
Stamen Terrain
Stamen Toner
Stamen Watercolor
At the time of writing, I didn’t have API credentials for Mapbox Bright or Mapbox Control Room. But I’m assuming they still work if you have the appropriate API credentials.
Excluding the Mapbox tiles, those packaged by default are more than sufficient:
Tip: Read more on Stamen here, s'il vous plaît. They seem to be the original creators of these Stamen tiles.
Python + Folium + ORDS = Happiness
Folium installation and quick start pages are straightforward – requiring minimal steps and effort. The same goes for installing the Live Server extension too. Once installed, you can right-click on a .html file to load it onto a local development server (so it displays in a new browser window).
Let me jump right into how I set up Folium to work with ORDS. If you recall, most of the tables I have in my Autonomous Databases are REST-enabled (previous post working with ORDS).
An ORDS Review
1. Once in Database Actions, right-click on an object and select REST > Enable.
2. Choose an alias (good security measure) along with Authentication (also good practice).
3. If you forget the URI, you can review the cURL command.
4. There you can copy/paste directly into a Command-Line or copy the URI portion and place into your browser’s address bar.
…and now back to your regularly scheduled blog post
Since I needed latitude and longitude coordinates for Folium, I decided to work with my Restaurant table. Put the REST endpoint (URI) directly into the browser; and you’ll see this:
Afterward, I looked at the JSON object to see how it was structured (there are too many lists, dictionaries, arrays, etc. to remember amirite?!) because I’d need to iterate through all the items in the GET Response – to tease out what Folium needs.
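To make that structure concrete, here is a stripped-down sketch of what an ORDS GET response looks like once decoded into a python dictionary (the restaurant names and coordinates below are made up), along with the kind of iteration Folium needs:

```python
# A stripped-down, hypothetical ORDS response: the rows live in "items",
# and each row is a dictionary keyed by column name.
response = {
    "items": [
        {"name": "Wendy's", "x": -78.781, "y": 35.787},
        {"name": "Bojangles", "x": -78.790, "y": 35.792},
    ],
    "hasMore": False,
    "limit": 25,
}

# Folium needs the name plus latitude ("y") and longitude ("x") per row
coords = [(row["name"], row["y"], row["x"]) for row in response["items"]]
print(coords[0])  # ("Wendy's", 35.787, -78.781)
```

The real response includes more metadata (links, offsets, etc.), but "items" is the part to iterate over.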
The play-by-play
Printing the response (to see what it looks like)
I’m covering all my bases here; reviewing the ORDS-provided request/object from a table in my Autonomous Database
Using the .json() method to decode the response into a python dictionary
Please pay attention to the folium.Map() line, where I actually create the base map
Review of what Folium needs
Looping through the python dictionary, pulling out what I need for the next step
Including the folium.Marker() block to create the pop-up markers.
The ('<i>{}</i>'.format(Resty)) passes each restaurant name into the string (the stuff in quotes, with {} as the placeholder). The HTML italics tags are optional.
In this case, ‘x’ and ‘y’ are the coordinates
The m.save() line at the end writes out a .html file (which uses Bootstrap!)
Newly rendered .HTML
Using the Live Server extension, I can review the Folium map plus the pop-up markers!
NOTE: This isn't the focus, but spoiler alert, I'll be putting this into a Flask application. I'll also build this out, so the individual markers/popups have charts/graphs for each restaurant. Again, all of which will be provided by ORDS-enabled tables (not so shameless plug).
Summary
After it is all said and done, it took me less than an afternoon to figure out how to make this work. So if you have a table with latitude and longitude, what are you waiting for?! REST-enable it with ORDS, and you can come up with a very quick demo on your own!
Also, I didn’t have to create any database models, the dependencies are minimal (a few python libraries), and an Always Free OCI account is really all you need to get started.
Code
import requests
import folium
#Create the base map, centered near Cary/Raleigh, NC
m = folium.Map(location=[35.78742648626059, -78.78122033558192], zoom_start=12, tiles="Stamen Watercolor")
#GET the ORDS endpoint and decode the JSON response into a python dictionary
response = requests.get('your ORDS endpoint/URI').json()
# print(type(response))
# print(response)
#Loop through the items in the response; 'y' and 'x' hold latitude and longitude
for entry in response['items']:
    Resty = entry['name']
    Lat = entry['y']
    Long = entry['x']
    folium.Marker(
        location=[Lat, Long],
        popup=folium.Popup('<i>{}</i>'.format(Resty))
        #this line is optional/an alternative to the above line
        # popup=folium.Popup('<i>{}</i>'.format(entry['name']))
    ).add_to(m)
m.save("index.html")
[I did not] reinvent the wheel
Lastly, much of what I learned about Folium came from the following two videos (I recommend bookmarking them for reference):
I log into Database Actions as my newly created “Python Developer” and navigate directly to the “Data Load” page (found under the Data Tools section of the Launchpad). I choose to “Load Data” from a “Local File.” I click “next,” click the pencil icon (see the arrow in the image), and navigate to the “File” tab. I scroll to the “RESTAURANTOPENDATE” column and see this:
In a previous post (if you’re new, you can read the end-to-end process), I discussed how this time (“04:00:00+00” or “05:00:00+00”) wasn’t necessary for me. At that time, I used the “Find and Replace” function in Excel (I’m sure you can do the same with Numbers, Sheets, or Calc) to replace all occurrences of time with “” (i.e., nothing).
But in the spirit of doing things in five days, when they could have taken but five minutes, I opted to see if I could achieve a similar result in python.
Goal
Create a python script that will allow you to remove part of a value in a .CSV file.
WARNING: I thought this would be a simple task. I should have known better. My approach may not be ideal for your situation, but hopefully you’ll learn something. Or at the very least maybe you can bookmark this post, along with the resources (at the end of the post) I'm including for later use.
Regular Expressions
I am confident there is a way to achieve this goal with python's built-in csv library; there is probably a way to do this with python out of the box. I couldn't figure it out.
I’m also reasonably confident that my approach is on the verge of ridiculous. Nevertheless, Regular Expressions, and the Pandas library, in python are what worked for me.
What are Regular Expressions?
Good question. I still don’t know, but here is what I found on Wikipedia:
“A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually, such patterns are used by string-searching algorithms for “find” or “find and replace” operations on strings, or for input validation. It is a technique developed in theoretical computer science and formal language theory.”
That is so painful to read, but the way I understand it is that we use Regular Expressions in pattern-matching. Essentially you create a pattern and then tell your application (or script) to search for it. From there, you can include more code to perform more actions. In my case, those actions would be to find a pattern and replace that pattern with nothing.
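As a bare-bones illustration of that idea (using python's built-in re module and a made-up string), a find-and-replace-with-nothing looks like this:

```python
import re

# Replace any run of digits with nothing, i.e. "find and replace with ''"
cleaned = re.sub(r'\d+', '', 'Wake County 2011')
print(cleaned)  # "Wake County " (note the trailing space survives)
```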
So what is the pattern?
One of the benefits of having zero formal training in application development (in this case, computer science and formal language theory) is that occasionally, you might take an approach that, while unintuitive, works well enough.
And after many, many hours of trial and error, parsing through Stack Overflow, reviewing hours of YouTube, reading pages of blogs, and occasional use of the “I’m Feeling Lucky” button on Google, it occurred to me that my pattern was:
After reviewing more documentation and other various resources (I have an entire section at the end), I more clearly identified a pattern:
Coming up with a clearly-defined pattern helped inform me as to how I could create the following Regular Expression:
'\s\d+:\d+:\d+[^a-c6]\d+$'
I then did some more stuff, and that was it! Follow me for more…
I’m kidding.
Deep Dive
I can’t teach you everything, because I’m probably only ahead of you by a week at this point. But I can explain, in the hopes that it will make sense. Let me continue with the above Regular Expression (again, you’ll want to spend some time in the resources section I’ve included to understand better how this all fits together).
But the above Regular Expression can be defined like this:
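Here is that same pattern annotated piece by piece in code, and run against a sample value shaped like the ones in my .CSV (the date itself is made up):

```python
import re

# \s          one whitespace character (the space before the time)
# \d+:\d+:\d+ digits separated by colons (the 04:00:00 portion)
# [^a-c6]     one character that is NOT a, b, c, or 6 (here it matches the "+")
# \d+         more digits (the "00" offset)
# $           anchored to the end of the string
pattern = r'\s\d+:\d+:\d+[^a-c6]\d+$'

sample = '2011/10/18 04:00:00+00'
print(re.sub(pattern, '', sample))  # "2011/10/18"
```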
I then took the above and plugged it into this short python script:
import pandas as pd
#You need pandas to do anything related to dataframes (i.e. "df");
# if you're a data scientist and you use python, you may already
# be familiar with this
d = '/Users/choina/Documents/untitled folder/Restaurants_in_Wake_County.csv'
#I'm assigning the file to "d"
df = pd.read_csv(d)
#df means dataframe. Here I'm telling pandas to read the .CSV file
# I just assigned ("d"), and consider that the "dataframe".
print(df.dtypes)
#This is optional, but when I do this I can see what the various
# datatypes are (according to pandas). The RESTAURANTOPENDATE column
# is an 'object'.
df['RESTAURANTOPENDATE'] = df['RESTAURANTOPENDATE'].str.replace(r'\s\d+:\d+:\d+[^a-c6]\d+$', '', regex=True)
#Consider the values in the RESTAURANTOPENDATE column as strings,
# find anything matching the Regular Expression, and replace it
# with NOTHING. Then take that column, along with the other
# columns and...
df.to_csv('newrest.csv')
#Save it to a new .CSV file called "newrest.csv"
Code as an image for reference:
In short (I've left detailed comments for reference), this script searches for the pattern and replaces it with nothing wherever it's found. We then take these results (including the rest of the .CSV file) and save them to a new file called "newrest.csv."
Please, reference this script. Tweak it as needed. Or drop me a note if you need an outside perspective. Good luck!
One more thing
Only Pandas is actually required for this script to work.
Note: An earlier version of this script also had "from pickle import NONE, TRUE" and "import re" at the top. Pickle is a library that deals with the serialization of objects in python; I've no idea what that means, and reading about it makes my brain melt. That "Pickles" import just appeared at the very top on its own (my editor seems to have auto-inserted it), and the script never actually uses it. Nor is the re library needed, since pandas handles the Regular Expression itself. If this has happened to you, please leave a comment, because I'm scared there may be an actual ghost in my machine.
Moving on
The rest is pretty straightforward.
I then went back into Database Actions (it had been so long that I’d expected a new version had been released already), loaded my data, and inspected the column in question:
Looks good to me!
Magic
Something I noticed was that my Autonomous Database automatically recognizes the “RESTAURANTOPENDATE” column as a data type of “Date.” I didn’t have to do anything special here, and I thought this was pretty cool. Take a look:
Addendum to the addendum
Once imported, you can also use Regular Expressions in your database (with SQL). But if you’d like to clean your data before it reaches your ADB, the approach I’m writing about here is also a great option!
Resources
As promised, here are all the resources I’ve curated that might assist you should you choose to take the road less traveled…
Stack Overflow – this is a series of "bookmarked" questions about Python and Regular Expressions that helped me. They might be of use. They aren't intended to teach you, but rather to aid you in gaining a better perspective.
YouTube – a playlist I created that might be worth reviewing (at a minimum watch Case Digital’s “How to Remove Characters From a Pandas Dataframe In Python” video).
Regular Expression Editors:
Pythex (make sure you click the “Regular expression cheatsheet” button to expand/see a valuable quick reference – also, apparently a more complete version can be found here)
Regular Expressions 101 – this includes an enormous library of previously created Regular Expressions. There is a “debugger” too!
An even simpler solution
I discovered an even simpler solution to this problem. As I was reviewing the Pandas documentation I noticed this functionality is built directly into Pandas, via the "to_datetime" function.
And you can take it one step further with what is referred to as a ".dt accessor" (you'll see this in the updated code I'm including). This accessor allows you to manipulate a Series (aka a column) within a Pandas dataframe (df).
There are quite a few time-date properties you can manipulate in Pandas, all of which can be found here. The "Time Zone Operations" section is where the ".dt accessor" is briefly mentioned. For more context, I'd recommend reviewing this section on the .dt accessor and how it interplays with a Pandas Series.
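As a quick sketch of the idea before the full script (a one-row Series with a made-up date, shaped like my RESTAURANTOPENDATE values):

```python
import pandas as pd

# A one-row Series shaped like my RESTAURANTOPENDATE values
s = pd.Series(['2011-10-18 04:00:00+00:00'])

# First "treat" it with to_datetime, then strip the time with .dt.date
dates = pd.to_datetime(s, utc=True).dt.date
print(dates[0])  # 2011-10-18
```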
Don’t stop there though, the main python documentation discusses the basics of date and time operations. Without further ado, on to the new code:
import pandas as pd
#You don't have to do it like this. You could have a single "d" and then just comment/uncomment each line as needed.
d1 = '/Users/choina/Downloads/Food_Inspections.csv'
d2 = '/Users/choina/Downloads/Food_Inspection_Violations copy.csv'
d3 = '/Users/choina/Downloads/Restaurants_in_Wake_County copy.csv'
#The same thing here, you could just have a single "d" and then just comment out the other two, when not needed.
df_1 = pd.read_csv(d1)
df_2 = pd.read_csv(d2)
df_3 = pd.read_csv(d3)
#Same goes for this too. This is either very slick of me, or incredibly lazy.
df_1['DATE_'] = pd.to_datetime(df_1['DATE_']).dt.date
df_2['INSPECTDATE'] = pd.to_datetime(df_2['INSPECTDATE']).dt.date
df_3['RESTAURANTOPENDATE'] = pd.to_datetime(df_3['RESTAURANTOPENDATE']).dt.date
#Same, same here as well. You could do one at a time. But it works, I double-checked my work.
df_1.to_csv('newninspect.csv')
df_2.to_csv('newviolate.csv')
df_3.to_csv('newrest.csv')
About the update
This code is certified fresh – meaning, it works. I triple-checked. The only thing I've noticed is that the three conversion lines must have ".dt.date" at the end. From what I'm reading (and inferring from the documentation), it looks as though you need to first "treat" the Series with the "to_datetime" function. After that, the entire dataframe is in limbo (not quite written out to a new .CSV), waiting. Before the next step, we can strip the time portion out using the .dt accessor (i.e. the ".dt.date" portion at the end of those three conversion lines).
From there it is simply a matter of writing out these updated dataframes to the three new expectant .CSV files.
Find Me
Please come back for more. I’m going to be working on views in my Autonomous Database, and then later, I’ll REST-enable them with the help of ORDS!
I’m not a Database Administrator, but I feel as though I may be slowly evolving into one…
In a previous post, I left myself the task of correcting the html on a proposed Python/Flask/ORDS/Autonomous Database application. My then next step was to add a search bar (which I did):
Search bar added to the html for searching my Autonomous Database.
Although this was a small step forward, it led to some serious questions. While having a search feature like this is good in theory, in practice it is totally unusable.
The Problem
If a user wants to search for a specific restaurant, they’d need to know a lot of information beforehand. At a minimum, the name of their target restaurant. But even that isn’t enough. Look at when I query for a single restaurant (randomly chosen – I’m not sponsored by Wendy’s btw):
SQL statement and the output (a single restaurant in this case).
Some of these restaurants come back with additional store numbers (probably a reference to either a corporate- or franchise-owned property). So a search like this simply will not do.
That's problem number one. The second problem is that, while HSISID and Permit ID (read about them here) are unique in the Restaurants table, that isn't the case with the Violations and Inspections tables. There are a TON of entries for each restaurant in these tables, and the HSISID and Permit IDs are reused for subsequent inspections. Just take a look at this example (they're all like that though!):
Using the HSISID over and over again makes it challenging to gather the necessary information across all tables.
A Closer look
Above is the Violations table, but the Inspections table looks similar. From what I observe, each time a violation or inspection is recorded, that same HSISID is used (Permit IDs too). But in these two tables, an Object ID is used as the unique identifier. I know the image is greyed out, but if you look closely you can see the "Object ID" in that screenshot.
But these Object IDs don’t mean anything outside of this table. And it underscores the point that my original approach to searching (i.e search bar) using a single identifier won’t work. Across the three tables these restaurants are uniquely identified in different ways.
At a minimum, these tables should be linked so that when an HSISID is searched for in the Restaurant table, it gathers all associated data from the Violations and Inspections table too. Meaning, if I search for a specific “Wendy’s” only that target restaurant comes back along with all historical inspection and inspection violation information as well.
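This is the classic parent/child (one-to-many) pattern. Just to illustrate the idea outside the database, here is a toy pandas merge using made-up, miniature versions of my tables:

```python
import pandas as pd

# Made-up miniatures of my tables
restaurants = pd.DataFrame({
    'HSISID': [1001, 1002],          # unique per restaurant (Primary Key)
    'NAME': ["Wendy's", 'Bojangles'],
})
inspections = pd.DataFrame({
    'OBJECTID': [1, 2, 3],           # unique per row, like the Object ID
    'HSISID': [1001, 1001, 1002],    # reused for repeat inspections (Foreign Key)
    'SCORE': [95.5, 97.0, 93.0],
})

# One restaurant row fans out to many inspection rows
merged = restaurants.merge(inspections, on='HSISID')
print(len(merged[merged['NAME'] == "Wendy's"]))  # 2
```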
Now that we understand the problem, I’ll attempt a solution. For my next feat, I’ll relate the three tables to one another in a logical way. Continue on dear reader…
Navigating
Navigate to your Autonomous Database, then Database Actions. Once there, look under the “Development” section, and choose “SQL”.
NOTE: I'm sure there are other ways to accomplish this, but this is what worked for me.
Once in the SQL Worksheet, look at the left side of your browser and you'll see the Navigator tab. Of the two drop-down lists, look at the second one. That is the "Object" selector; for my purposes I made sure that Tables was selected.
Navigator tab in Oracle Database Actions in my Autonomous Database Always Free OCI account.
I have three tables; two of them (Inspections and Violations) need Foreign Keys, while the third (Restaurants) needs a Primary Key.
My three tables in Oracle Database Actions: Inspections, Restaurants, Violations
TL;DR - The Restaurant table is like my Parent table, whereas the other two tables are like children (or dependents). Establishing these Primary and Foreign keys is a way for me to easily establish interdependence/relation among these three tables.
Establishing a Primary Key
I’ll use the Restaurant table as an example. Here is the step-by-step…
Right-click on the table (Restaurants), then select “Edit”.
If a Primary Key has already been established, then you’ll see a Column (which is actually displayed as a row in this Table Properties table) that already has a check in the “PK” column.
If there isn’t one, continue by clicking the Primary Key option (see the left-most red arrow).
Once in the Primary key option, you’ll see the following boxes with checks:
Enabled
Initially Immediate
Validate
I’m using the HSISID (found in the “Available Columns” list) as my Primary Key. Once I select that, the “Add Selected Columns” arrow will illuminate (from white to gray). Click it once to move your selection over to the right in the “Selected Columns” list.
Once your column has been moved, you can click “Apply”.
You’ll see this “Output” screen appear. Here you can see that my table was successfully altered. If there are errors, it’ll let you know, and there will be A LOT of red (trust me, I saw tons of red prior to writing this post).
Bonus points, you can always find the Data Definition Language (DDL) in the DDL option (read up on this and the other types of SQL statements here).
As I see it, it looks like everything since the inception of this table has been recorded. I would imagine you could take this and build some sort of automation/orchestration script with it (along with the DDL from the other tables too).
Sample Code
Table as a Primary Key
CREATE TABLE ADMIN.RESTAURANTS
(
OBJECTID NUMBER ,
HSISID NUMBER ,
NAME VARCHAR2 (4000) ,
ADDRESS1 VARCHAR2 (4000) ,
ADDRESS2 VARCHAR2 (4000) ,
CITY VARCHAR2 (4000) ,
STATE VARCHAR2 (4000) ,
POSTALCODE VARCHAR2 (4000) ,
PHONENUMBER VARCHAR2 (4000) ,
RESTAURANTOPENDATE DATE ,
FACILITYTYPE VARCHAR2 (4000) ,
PERMITID NUMBER ,
X NUMBER ,
Y NUMBER ,
GEOCODESTATUS VARCHAR2 (4000)
)
TABLESPACE DATA
LOGGING
;
CREATE UNIQUE INDEX ADMIN.RESTAURANTS_PK ON ADMIN.RESTAURANTS
(
HSISID ASC
)
TABLESPACE DATA
LOGGING
;
ALTER TABLE ADMIN.RESTAURANTS
ADD CONSTRAINT RESTAURANTS_PK PRIMARY KEY ( HSISID )
USING INDEX ADMIN.RESTAURANTS_PK ;
That is literally all I did to set this up. Very straightforward. Next I'll walk through how to set up Foreign Keys, and then I'll show you what it looks like in the Data Modeler so you can visually see the newly-established relationships between the tables.
Establishing a Foreign Key
We’re still in the SQL worksheet (found in Database Actions). Both Inspections and Violations tables need a Foreign key; I’ll demonstrate with Inspections.
Right-click on the table and select “Edit”, just like I did when establishing a Primary Key. Next, navigate to the “Foreign Keys” option in the Table Properties.
It’ll be empty, but you’ll want to click the “+” key (see the arrow). You’ll see “NEW_FK_1” appear under the Foreign Key table. If you click on that you can change the name to whatever your heart desires.
I then changed the Foreign Key name to something that I could remember (and something that would make sense at a later date); “Inspections_FK”. I then selected “HSISID” as the Local Column.
The schema automatically recognizes the Primary Key I previously established in the Restaurants table. I didn't do a thing; that was all done for me. Very cool.
After clicking "Apply", you'll see the output from this change. I can then return to the Foreign Keys option and voilà, the Foreign Key name has been changed. I'll proceed to do the same to my "Violations" table as well.
Sample Code
Table as a Foreign Key
CREATE TABLE ADMIN.INSPECTIONS
(
OBJECTID NUMBER ,
HSISID NUMBER ,
SCORE NUMBER ,
DATE_ DATE ,
DESCRIPTION VARCHAR2 (4000) ,
TYPE VARCHAR2 (4000) ,
INSPECTOR VARCHAR2 (4000) ,
PERMITID NUMBER
)
TABLESPACE DATA
LOGGING
;
ALTER TABLE ADMIN.INSPECTIONS
ADD CONSTRAINT INSPECTIONS_FK FOREIGN KEY
(
HSISID
)
REFERENCES ADMIN.RESTAURANTS ( HSISID )
ON DELETE CASCADE
NOT DEFERRABLE
;
Data Modeler Primer
This isn’t a “How-to” on the Data Modeler (Documentation here though), but rather just me taking a minute to share how you can visually see the relationships between tables. It definitely helped me better understand why I needed to establish Primary and Foreign Keys.
Andiamo!
From the Database Actions Launchpad, select “Data Modeler”.
Once there, you’ll see two drop-down menus. You’ll see your current Schema (Admin for me, in this case). In the second drop-down menu, you’ll see a list of available “Objects”. I’ll want to make sure that I have “Tables” selected.
You can drag these over one at a time, or select them all and drag all at once. They’ll appear in the Diagram Editor, with the relationships already included (since I already set up the Primary and Foreign Keys).
Expand these diagrams and you’ll reveal additional information. In this image, I’ve expanded the Inspections table to reveal Foreign Key details. I’ll do the same by expanding the bottom of the Violations table to reveal its Foreign Key. That little red arrow signifies that there is additional information hidden.
Okay, but what is this?
What we are looking at is a visual representation of the relationships between these tables. Everything is “linked” through the HSISID. My interpretation of this is that everything begins with the Restaurant table, and then the other two tables act as subordinates.
For a single Restaurant (which has a unique HSISID) there are many entries in both Inspections and Violations that use the HSISID repeatedly. In those cases, that HSISID is no longer unique. It is still important, necessary information.
And now that everything is linked, I’ll want to create a single query (SQL presumably) that when executed will return all relevant information from a single entry in the Restaurant table with all associated (HSISID) entries from the Inspections and Violations table, nested underneath it.
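To picture the end result I'm after in python terms (made-up data, and just a sketch; the real work will happen in SQL/ORDS): a single restaurant with its related entries nested underneath.

```python
# Made-up rows, keyed the way my tables are
restaurant = {'HSISID': 1001, 'NAME': "Wendy's"}
inspections = [
    {'HSISID': 1001, 'SCORE': 95.5},
    {'HSISID': 1001, 'SCORE': 97.0},
    {'HSISID': 1002, 'SCORE': 93.0},  # belongs to a different restaurant
]

# Nest every inspection sharing the restaurant's HSISID under it
result = dict(restaurant)
result['inspections'] = [i for i in inspections if i['HSISID'] == restaurant['HSISID']]
print(len(result['inspections']))  # 2
```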
Clear as mud? Good. Hope this helps more than it hurts.
Now what
Next I'll need to come up with something (in SQL perhaps) that I believe I'll use in tandem with ORDS. From there I'll work on building the html portion of this python + flask application so that the search is more intuitive (to the user) and elegant (for the Autonomous Database). Believe it or not, it took about a week to wrap my head around all this key business. Needless to say, I'm slightly behind schedule. But expect more updates as I progress.
Creating an additional user in your Oracle Cloud Infrastructure (OCI) is not the same thing as creating an additional user in your Autonomous Database (ADB). I spent about two days (on and off) last week, wrapping my head around this. Maybe you knew this…I did not.
Chris Hoina – 2022
If, however, you want to create an additional OCI user (one that would be able to access various resources in OCI), then please bookmark this page. That is all you need to get started.
However, you may not even need to create this additional user (as was the case with me). While I followed the steps in that documentation (very simple and straightforward, actually), it wasn't until after I finished that I realized doing so was completely unnecessary. What I was really trying to do was create a new database user, not an OCI user. The steps are different, and simpler.
Federations
Another area where I got really hung up was "Federations". This was a relatively new concept for me, and I'm not sure I should even do a deep dive on how Federations work. However, let it be known: if I receive even one (legitimate) comment on this subject, I will put together a treatise on Federations in OCI.
It took me about a day and a half to educate myself on how Federations and Identity Providers work in OCI. So, if you are reading this, and you think you’d benefit from a standalone article, then let me know. Otherwise, this page and this page are both very helpful for learning more about Federations in OCI.
Are you me?
If so, then you are acting as an administrator in your Always Free OCI account, as well as the database administrator. Being both is confusing for me, and maybe it is for you too. Since I don’t consider myself a traditional user (I’m blurring the lines when it comes to the different roles), I’m tasked with managing my OCI tenancy and also setting up my development environment. All this so I can begin to work on some applications/proofs of concepts (POCs) for interacting with my Autonomous Database via ORDS (Oracle REST Database Services aka our REST APIs).
Don’t be like me and allow yourself to get too weighed down with all the technical jargon thrown your way. Naturally, our docs read like they are geared toward the System or Database Administrator. And this makes sense; remember when you are in your tenancy you’re the de-facto admin for everything. At a larger organization/enterprise a developer would probably never do any of this setup. You’d just sign in to the database directly or connect via a command line.
Workflow for the “Every Person”
But for me (with my limited experience), the workflow looks something like this:
Start here
Sign-up for an Oracle Cloud Account. You’ll be provided a Cloud Account Name* and credentials.
*You’ll also hear this account being referred to as your Tenancy name (you can modify the name later if you want).
Next, create an Autonomous Database and create the “ADMIN” credentials* for said database. You can see how to do that in Lab 1 of this workshop.
REMINDER: This administrator is different from the OCI Tenancy administrator.
Add a new user by:
Navigating to your Autonomous Database under “Oracle Database”. It might be in your recently viewed.
After you click the database, you’ll see this screen. Click “Database Actions”. You’ll link out to the Database Actions dashboard.
Select the Database Users option in the Administration section.
Select the “Create User” option.
At a minimum, you’ll want to enable “Web Access” for this user. This will automatically grant two roles for the user:
CONNECT
RESOURCE
Both are needed for the developer though.
If you want to take a look at the different roles available, click this tab. Scrolling through, you can see the CONNECT and RESOURCE roles are checked.
HINT: You can always go back and edit the roles if more roles need to be granted.
Finish (kind of)
Once complete, you’ll see a URL in that user’s newly-created tile. This URL will link you out to a login page, for accessing the Database Actions console. Since you are the only user, you’ll just want to document the URL. Otherwise, as the admin, I’m assuming you’d share this with the respective recipient.
Next Up
From here on out, I’ll do all my development work with this account (I called mine “appdev”), to mimic what a typical user might encounter in a practical setting.
Right now, I'm wrapping up a python + flask + database course on LinkedIn Learning. So far, it's been pretty informative. If you are interested, you can take a look as well (to see the direction I'm headed). My goal will be to use the templates in this course as a resource for connecting to my database, but with ORDS.
I hope to have something small in the next week or so. And I’ll be sharing here, and on my GitHub as well. So stay tuned.
Helpful resources
What I did here was very simplified, a distillation for creating a new user in your Autonomous Database. But I’ll include some of the resources that helped get me to this level of understanding:
With my Autonomous Database up and running, I needed to find some data to upload. After some quiet reflection, I decided upon three datasets that should be plenty fun.
A bit of context; I currently live in Wake County, North Carolina. And it turns out that we’ve been very progressive with providing open, accessible data to our citizens. I really wanted to do something with food, so I was surprised when I found the following three datasets (among the nearly 300 available):
Upon reading the documentation, I discovered that these datasets could all be connected through their HSISID* and Permit ID fields. This should come in handy later, as it could make for some very interesting (if not amusing) revelations.
*I’m not sure of the acronym, but the documentation defines the HSISID as a, “State code identifying the restaurant (also the primary key to identify the restaurant).”
To the data loading
I’m sticking with the Oracle Database Actions* web-based interface for now.
*Oracle Database Actions was previously known as Oracle SQL Developer Web.
The Database Actions Launchpad offers several actions for users, including development, data loading, monitoring, and administration.
And from what I learned in the Autonomous Database workshop, there are two ways to load data into your database (there may be other methods; mea culpa if so):
SQL worksheet (found in the Development section of your Launchpad)
Data Load Tool (found in the Data Tools section)
And I opted for the SQL worksheet method since I need to practice and learn this method regardless.
Obstacles
After choosing my method of data loading, I encountered some obstacles. With no regard for best practices, I proceeded to load all three datasets into my Autonomous Database only to discover two glaring issues.
First
Unfortunately, in my restaurant “violations” table, I encountered two rows that failed to load.
ORA-12899 error in Oracle Autonomous Database on Oracle Cloud Infrastructure.
These values were loaded as a VARCHAR2 data type, which by default allows a maximum size of 4,000 bytes. Since these rows exceeded 4,000 bytes in that specific column, they failed. Fortunately, it was easy to increase the column to 10,000 bytes.
While you could increase it to 32,767 bytes, that would be overkill. VARCHAR2 columns can have their maximum string size set to ‘STANDARD’ or ‘EXTENDED’ (i.e., 4,000 bytes vs. 32,767 bytes). I’m assuming the Autonomous Database is set to EXTENDED by default, since I was able to widen this column and re-upload with zero failed rows. You can read up on VARCHAR2 data types here.
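For next time, a rough pre-flight check could catch these rows before the load even starts. A minimal sketch using the standard-library csv module; the column names and the in-memory sample are made up for illustration (in practice you’d open the real .CSV file):

```python
import csv
import io

# Hypothetical in-memory stand-in for one of the .CSV files.
sample = io.StringIO(
    "HSISID,COMMENT\n"
    '1001,"short remark"\n'
    '1002,"' + ("x" * 4100) + '"\n'
)

LIMIT = 4000  # default VARCHAR2 maximum size, in bytes

reader = csv.DictReader(sample)
# Collect (row key, field name, byte size) for any value that would
# trip ORA-12899 on a default VARCHAR2 column.
oversized = [
    (row["HSISID"], field, len(value.encode("utf-8")))
    for row in reader
    for field, value in row.items()
    if len(value.encode("utf-8")) > LIMIT
]
print(oversized)
```

Note the check measures encoded bytes, not characters, since VARCHAR2 limits default to byte semantics.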
Second
The second obstacle took me about a day (on and off) to figure out. The dates in all three of these tables were formatted like this:
Adjusting for the TIMESTAMPS in the Oracle Autonomous Database.
At first glance, I thought I was looking at a time zone offset. Dates were either 04:00:00+00 or 05:00:00+00, possibly due to daylight saving time. This seemed like a reasonable assumption, since the early-November to mid-March dates were an hour later than the mid-March to early-November dates (i.e., UTC-5 in winter, UTC-4 the rest of the year).
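As a quick, hedged sanity check of that offset reasoning: if a local midnight in Eastern Time is stored as UTC, the winter timestamps land an hour later than the summer ones. A small Python sketch (assuming Eastern Time; the sample dates are arbitrary):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

eastern = ZoneInfo("America/New_York")

# Midnight local time on a winter and a summer date, converted to UTC.
winter = datetime(2020, 1, 15, 0, 0, tzinfo=eastern).astimezone(timezone.utc)
summer = datetime(2020, 7, 15, 0, 0, tzinfo=eastern).astimezone(timezone.utc)

print(winter.time())  # 05:00:00 (UTC-5, standard time)
print(summer.time())  # 04:00:00 (UTC-4, daylight saving time)
```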
My first thought was to modify the affected columns with something like this parameter:
TIMESTAMP [(fractional_seconds_precision)] WITH TIME ZONE
or with…
TIMESTAMP '1997-01-31 09:26:56.66 +02:00'
But I’d never seen this date/time format before (the ‘esriFieldTypeDate’), so I had to investigate before I did anything (also, I had no idea how to update a column with a parameter like this, so it would have taken additional time).
It turns out this data originates from ArcGIS Online. This field data type maps directly to Oracle’s ‘TIMESTAMP’ data type, as seen here in the documentation:
In the ArcGIS documentation, the ‘esriFieldTypeDate’ appears to map to Oracle’s ‘TIMESTAMP’ datatype.
I later tinkered with an API Explorer on the Wake County datasets site and reviewed a sample JSON response for this data set. The response also confirmed the field type:
Wake County has an API Explorer on its datasets site, where you can sample a query to review the JSON that is returned.
It’s funny because, after all this hunting, it looks like I won’t be able to include time zones anyway. That’s somewhat inconsequential since I’m only interested in Wake County, but for some, it could be unnerving.
I wish I could say I came up with a cool Python script to automate the cleaning of these columns (I would have had to do this for all three of my .CSV files), but I did not. It would have taken me an afternoon or longer, since I haven’t toyed with the csv module (in the Python standard library), pandas, or NumPy in quite some time. In the end (and it pains me to say this, embarrassingly, really), I used the ‘Find & Replace’ feature in Numbers to remove the time/time zone portion of the dates.
I’m pathetic, I know. But the idea of creating a dedicated Python script does seem like a fun weekend project. Who knows, maybe I’ll post to GitHub. Regardless, my Autonomous Database is clean and primed for ORDS.
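For what it’s worth, that weekend-project script doesn’t need to be fancy. A minimal sketch with the standard-library csv module; the INSPECTDATE column name and sample rows are assumptions for illustration, and it simply drops everything after the date portion:

```python
import csv
import io

# Hypothetical stand-in for one of the three .CSV files; in practice
# you'd read from and write to real files instead of StringIO buffers.
raw = io.StringIO(
    "HSISID,INSPECTDATE\n"
    "1001,2020-01-15 05:00:00+00\n"
    "1002,2020-07-15 04:00:00+00\n"
)
cleaned = io.StringIO()

reader = csv.DictReader(raw)
writer = csv.DictWriter(cleaned, fieldnames=reader.fieldnames)
writer.writeheader()
for row in reader:
    # Keep only the date portion, dropping the time/offset suffix.
    row["INSPECTDATE"] = row["INSPECTDATE"].split(" ")[0]
    writer.writerow(row)

print(cleaned.getvalue())
```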
In closing
My advice is as such:
Always inspect your data before you do anything major; always make sure you understand the inputs and resultant outputs of the data loading process; and always refer to the documentation FIRST, so you can understand how data is being recorded, coded, displayed, etc.
Chris Hoina, 2022
Oh, and it wouldn’t hurt to think about how you might use this data in the future. I’m guessing that could help you avoid a lot of rework (i.e., technical debt).
Take me as an example. Since I’m focused on ORDS and REST enabling my database, I may want to consider things like:
What am I going to expose (via REST APIs)? or
What fields might the presentation layer of my application include?
In my case, a lot of this data will lay dormant, as I might choose to focus on only a few key fields. Of course, anything I create will be simple and basic. But that might not be the case for you.
I’m off to REST…enable this database now, but in the meantime, if you want to learn more about ORDS, check out:
Recently, I traveled to Colorado. While flying, I finished up an Autonomous Database (ADB) workshop. What I’m saying is…I was in the clouds, on the cloud.
In my new role, much of my work will be focused on Oracle’s REST Data Services (ORDS). But I’m starting out with the ADB.
After completing the workshop, I thought I’d share initial impressions and lessons learned. But first…a crash course on our Cloud Free Tier accounts and Always Free cloud services. Both are relevant for this workshop.
Oracle Cloud Free Tier
When you sign up for an Oracle Cloud Infrastructure (OCI) account, you are automatically entitled to Always Free services. You’re also given a 30-day trial with $300 USD of free credits to use on all eligible OCI services (30 days or $300, whichever comes first). But the Always Free services are available for an unlimited period. This workshop uses Always Free services (lucky you).
Always Free
Errthang you get with Always Free…we are most concerned with the database options though.
On Databases
Credits can be used on many items (see above), including various databases. In the workshop, you get to provision an Autonomous Data Warehouse (we’re still Always Free).
The Autonomous Database (ADB)
Overview
The ADB calls OCI home. It’s a fully managed, pre-configured database environment. Four workload types are available:
Autonomous Transaction Processing
Autonomous Data Warehouse
Autonomous JSON Database
Oracle APEX Application Development
NOTE: Always Free entitles you to two of the above databases, each with 1 OCPU* and 20 GB of storage. There is also an option to choose a NoSQL database with 133 million reads/writes per month and 25 GB of storage per table (up to 3 tables).
ALSO: An “OCPU” is an Oracle Compute Unit. One OCPU on the x86 (AMD/Intel) architecture is equivalent to two virtual processor cores (vCPUs), whereas on the ARM architecture the ratio is 1:1 (i.e., 1 OCPU = 1 vCPU).
Automation
Zero hardware configuration/management or software installation/configuration is required! Once your database is created, nearly everything is automated:
Backups
Patching
Upgrading
Performance tuning
You’re a Database Administrator (DBA) and you didn’t even know it!
Scaling
The ADB shines here. CPU and storage can be scaled automatically. Auto-scaling can add more CPU cores to the base number during periods of high demand, then reduce the count back to the base as demand decreases (auto-scaling allows up to three times the current base number of CPU cores at any time).
DISCLAIMER: Auto-scaling isn’t an Always Free cloud service, but it’s still cool and worth talking about. And should you choose to upgrade in the future, it’s very easy to do:
To the Workshop!
The FREE workshop consists of five labs (and an optional sixth):
Lab 1: Provision an Autonomous Database
Lab 2: Work with Free Sample Data Sets
Lab 3: Load Data
Lab 4: Query External Data
Lab 5: Visualize Your Data
Once you’ve requested an Oracle Cloud account (everything can be done with the Free Tier option), you’ll then create an Autonomous Database. The instructions for requesting your account are simple and can be found here (but also on the workshop’s landing page).
Lab 1
Just two steps here, simple. Everything is done through the Oracle Cloud dashboard. My two takeaways:
WRITE DOWN all your login/credential information! I hadn’t considered that, as an Oracle employee, I’d have more than one login (it can get confusing). Not to mention that later, you’ll be creating an ADMIN password for the actual database (separate from your tenancy credentials).
Initial database configuration. I really appreciated that when you first configure your Autonomous Database, you are automatically provided 1 OCPU and 20 GB of storage. Should you require more CPU or storage, everything can be adjusted with simple drop-down menus. It’s nice that a user (like a developer, or a non-System/Database Administrator role) can provision their own free playground in a matter of minutes.
Lab 2
Lab 2 reinforces how simple this workshop is. At this point, I’m still in the web browser, but of course you could use the desktop option, Oracle SQL Developer.
Things seemed fast. And that’s saying something, because for part of this workshop (Lab 3, actually) I was flying from Denver, Colorado to Raleigh, North Carolina on airline wi-fi, and objects were actually loading into their OCI buckets (I write about this later).
My second takeaway was caching. When possible, the Autonomous Database caches query results, meaning that for subsequent queries, results are just…there. That’s a big deal to me, because in my previous role I was a Product Manager for an application performance tool, and there are a couple of things that absolutely kill an application’s response time and/or throughput:
CPU intensive tasks (often repetitive ones)
I/O processes
TL;DR – we like fast. Fast is good 🙂
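As a loose analogy only (this is not how the Autonomous Database’s result cache works internally), Python’s functools.lru_cache shows why serving repeated queries from a cache matters: the expensive work runs once, and repeat requests are answered instantly.

```python
from functools import lru_cache
import time

calls = {"count": 0}

@lru_cache(maxsize=None)
def expensive_query(sql):
    # Stand-in for a slow, CPU- or I/O-heavy query.
    calls["count"] += 1
    time.sleep(0.01)
    return f"results for: {sql}"

expensive_query("SELECT * FROM inspections")
expensive_query("SELECT * FROM inspections")  # served from the cache

print(calls["count"])  # the underlying work ran only once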
Lab 3
You create a “Bucket” in OCI and then load Objects (approximately 45 floppy disks’ worth of .CSV and .DAT files) into that Bucket. Again, this is all point-and-click, done in-browser. A wizard guides you through everything. For someone who isn’t familiar with System or Database Administration, this was helpful; it made approaching “infrastructure” much more accessible and less daunting.
I should note that this might not be your typical workflow. You might not create a Bucket and then load Objects every time. But you will still need a URI, an authorization token, and credentials to load this data into the specified database.
This lab was still a good exercise, even for the non-System or Database Administrator. Maybe you’re a Developer or a Business Intelligence Analyst. I don’t suspect many will perform all of these tasks, but it’s important to understand how teams, roles, and individual members are connected (from end to end).
One final thing I appreciated about this lab was that there were two options for loading data into their respective locations:
the DATA LOAD tools option from within the Database Actions dashboard, or
the PL/SQL package (DBMS_CLOUD)
Lab 4
Up until this point, I felt like I was an active participant. But here, I felt like I was cheating. The code snippets you are provided allow you to create an external table directly from your cloud object store (an excerpt of the code is below; click the image to go to our GitHub).
/* Specify the URL that you copied from your files in OCI Object Storage in the define file_uri_base line below */
set define on
define file_uri_base = '<bucket URI>'
begin
dbms_cloud.create_external_table(
table_name =>'CHANNELS_EXT',
credential_name =>'OBJ_STORE_CRED',
file_uri_list =>'&file_uri_base/chan_v3.dat',
format => json_object('ignoremissingcolumns' value 'true', 'removequotes' value 'true'),
column_list => 'CHANNEL_ID NUMBER,
CHANNEL_DESC VARCHAR2(20),
CHANNEL_CLASS VARCHAR2(20),
CHANNEL_CLASS_ID NUMBER,
CHANNEL_TOTAL VARCHAR2(13),
CHANNEL_TOTAL_ID NUMBER'
);
dbms_cloud.create_external_table(
table_name =>'COUNTRIES_EXT',
credential_name =>'OBJ_STORE_CRED',
file_uri_list =>'&file_uri_base/coun_v3.dat',
format => json_object('ignoremissingcolumns' value 'true', 'removequotes' value 'true'),
column_list => 'COUNTRY_ID NUMBER,
COUNTRY_ISO_CODE CHAR(2),
COUNTRY_NAME VARCHAR2(40),
COUNTRY_SUBREGION VARCHAR2(30),
COUNTRY_SUBREGION_ID NUMBER,
COUNTRY_REGION VARCHAR2(20),
COUNTRY_REGION_ID NUMBER,
COUNTRY_TOTAL VARCHAR2(11),
COUNTRY_TOTAL_ID NUMBER,
COUNTRY_NAME_HIST VARCHAR2(40)'
);
end;
/
With those snippets in place, you can then query that external table from outside the database. Without them, I would have struggled to do this on my own (or at least, not without some research). That’s more of a “me” problem though.
I appreciate that this method of querying is available, especially if you are only concerned with a small subset of your data. Keeping this data closer to you can certainly make subsequent requests quicker.
Lab 5
Here we try our hand at business intelligence. You begin by downloading Oracle Analytics Desktop. The tool reminded me of Tableau or one of the SAS data visualization products that are available.
Consider this: Biases aside, Oracle Analytics Desktop is free. And you can connect to your database within a matter of minutes.
If it’s forecasting and visualizations you’re looking for, this thing looks plenty powerful.
Much like before, everything is GUI-based, but you can also do much of the setup in Oracle SQL Developer, or from within the browser. With some provided SQL, I defined a table from the data I imported in a previous lab. From there, you get some practice establishing a secure connection to your database, and then you import that table into Oracle Analytics Desktop.
I don’t know if I should spend too much time here, since my current focus isn’t data analysis/business intelligence. But once you establish a secure connection to your database, you can easily drag and drop data into the tool’s visualization pane and later manipulate the visualizations in countless ways.
My biggest takeaway here is that once you have created a wallet from within your OCI dashboard and established a connection to your data warehouse (from within Oracle Analytics Desktop), things are very intuitive.
Everything is drag and drop. And since you’ll pre-define and save a subset of your data, you can’t break anything; you can just start over by pulling down the data again. Truthfully, if your focus isn’t Business Intelligence, you can (and some are going to hate this) skim this lab to get the gist.
The End
There is a totally optional Lab 6, where you export your newly created project to a .DVA file. But I skimmed that and got the idea.
Like I mentioned before, this content will become increasingly technical. I’ll be doing some exploring with the APIs in ORDS, and I’ll see what I can do with flask + python + ORDS. I’ve found some good references on GitHub, and of course I’ll be sharing the [heavily commented] code here and on GitHub too!