Categories
Autonomous Database Python

Clean up a .CSV file with Regular Expressions, Pandas, and Python

Let us begin

I log into Database Actions as my newly created “Python Developer” and navigate directly to the “Data Load” page (found under the Data Tools section of the Launchpad). I choose to “Load Data” from a “Local File.” I click “next,” click the pencil icon (see the arrow in the image), and navigate to the “File” tab. I scroll to the “RESTAURANTOPENDATE” column and see this:

CSV file in data load, Database Actions, Oracle Autonomous Database, Chris Hoina, Senior Product Manager, ORDS, Database Actions

In a previous post (if you’re new, you can read the end-to-end process), I discussed how this time (“04:00:00+00” or “05:00:00+00”) wasn’t necessary for me. At that time, I used the “Find and Replace” function in Excel (I’m sure you can do the same with Numbers, Sheets, or Calc) to replace all occurrences of time with “” (i.e., nothing).

But in the spirit of doing things in five days, when they could have taken but five minutes, I opted to see if I could achieve a similar result in python.

Goal

Create a python script that will allow you to remove part of a value in a .CSV file.

WARNING: I thought this would be a simple task. I should have known better. My approach may not be ideal for your situation, but hopefully you’ll learn something. Or at the very least maybe you can bookmark this post, along with the resources (at the end of the post) I'm including for later use.

Regular Expressions

I am confident there is a way to achieve this goal with the .CSV library in python. There is probably a way to do this with python out of the box. I couldn’t figure it out.

I’m also reasonably confident that my approach is on the verge of ridiculous. Nevertheless, Regular Expressions, and the Pandas library, in python are what worked for me.

What are Regular Expressions?

Good question. I still don’t know, but here is what I found on Wikipedia:

“A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually, such patterns are used by string-searching algorithms for “find” or “find and replace” operations on strings, or for input validation. It is a technique developed in theoretical computer science and formal language theory.”

That is so painful to read, but the way I understand it is that we use Regular Expressions in pattern-matching. Essentially you create a pattern and then tell your application (or script) to search for it. From there, you can include more code to perform more actions. In my case, those actions would be to find a pattern and replace that pattern with nothing.

So what is the pattern?

One of the benefits of having zero formal training in application development (in this case, computer science and formal language theory) is that occasionally, you might take an approach that, while unintuitive, works well enough.

And after many, many hours of trial and error, parsing through Stack Overflow, reviewing hours of YouTube, reading pages of blogs, and occasional use of the “I’m Feeling Lucky” button on Google, it occurred to me that my pattern was:

Pattern for Regular Expressions, Database Actions, Oracle Autonomous Database, Chris Hoina, Senior Product Manager, ORDS, Database Actions

After reviewing more documentation and other various resources (I have an entire section at the end), I more clearly identified a pattern:

A more formal pattern for Regular Expressions, Database Actions, Oracle Autonomous Database, Chris Hoina, Senior Product Manager, ORDS, Database Actions

Coming up with a clearly-defined pattern helped inform me as to how I could create the following Regular Expression:

'\s\d+:\d+:\d+[^a-c6]\d+$'

I then did some more stuff, and that was it! Follow me for more…

I’m kidding.

Deep Dive

I can’t teach you everything, because I’m probably only ahead of you by a week at this point. But I can explain, in the hopes that it will make sense. Let me continue with the above Regular Expression (again, you’ll want to spend some time in the resources section I’ve included to understand better how this all fits together).

But the above Regular Expression can be defined like this:

Regular Expression definition for this script, Database Actions, Oracle Autonomous Database, Chris Hoina, Senior Product Manager, ORDS, Database Actions

I then took the above and plugged it into this nine (12 with sections, slightly more with comments) line script:

from pickle import NONE, TRUE
#This doesn't seem work without pickle, still not sure why
import pandas as pd
#You need this to do anything related to dataframes (i.e. "df"); 
# if you're a data scientist and you use python, you may already 
# be familiar with this
import re 
#You need this for Regular Expressions to work

d = '/Users/choina/Documents/untitled folder/Restaurants_in_Wake_County.csv'
#I'm assigning the file to "d"
df = pd.read_csv(d)
#df means dataframe. Here I'm telling panda to read the .CSV file 
# that I just created ("d"), and consider that the "dataframe".

print(df.dtypes)
#This is optional, but when I do this I can see what the various 
# datatypes are (according to pandas). The RESTAURANTOPENDATE column 
# is an 'object'.
for RESTAURANTOPENDATE in df:
    df['RESTAURANTOPENDATE'] = df['RESTAURANTOPENDATE'].str.replace('\s\d+:\d+:\d+[^a-c6]\d+$', '', regex=TRUE)
#This is a "For Loop" that says set the RESTAURANTOPENDATE column 
# contents equal to the following: FIRST, consider the values in the 
# column as string and replace the contents using this process: 
# find a pattern that matches this Regular Expression, replace
# it with NOTHING, then take that column,along with the other 
# columns and...
df.to_csv('newrest.csv')
#Save it to a new .CSV file called "newrest.csv"

Code as an image for reference:

Python script to clean CSV file for data load, Database Actions, Oracle Autonomous Database, Chris Hoina, Senior Product Manager, ORDS, Database Actions

In short, (I’ve left detailed comments for reference), these nine lines of code search for a pattern and replace it with nothing when found. We then take these results (including the rest of the .CSV file) and save them to a new file called “newrest.csv.”

Please, reference this script. Tweak it as needed. Or drop me a note if you need an outside perspective. Good luck!

One more thing

Pickle, Pandas, and RE are all required for this script to work.

Note: While I understand Pickle is a library that deals with the serialization of objects in python. I've no idea what that means, and reading about it makes my brain melt. Additionally, I’m not sure if this is expected behavior, but when I ran an earlier version of this script, this "Pickles" library just appeared at the very top. If this has happened to you, please leave a comment, because I'm scared there may be an actual ghost in my machine.

Moving on

The rest is pretty straightforward.

I then went back into Database Actions (it had been so long that I’d expected a new version had been released already), loaded my data, and inspected the column in question:

Updated CSV file in data load, Database Actions, Oracle Autonomous Database, Chris Hoina, Senior Product Manager, ORDS, Database Actions

Looks good to me!

Magic

Something I noticed was that my Autonomous Database automatically recognizes the “RESTAURANTOPENDATE” column as a data type of “Date.” I didn’t have to do anything special here, and I thought this was pretty cool. Take a look:

Database Actions automatically recognizes datatypes Oracle Autonomous Database, Chris Hoina, Senior Product Manager, ORDS, Database Actions

Addendum to the addendum

Once imported, you can also use Regular Expressions in your database (with SQL). But if you’d like to clean your data before it reaches your ADB, the approach I’m writing about here is also a great option!

Resources

As promised, here are all the resources I’ve curated that might assist you should you choose to take the road less traveled…

  • Important/relevant documentation:
  • Stack Oveflow – this is a series of “bookmarked” questions about Python and Regular Expressions, that helped me. They might be of use. They aren’t intended to teach you, but rather aide you in gaining a better perspective.
  • YouTube – a playlist I created that might be worth reviewing (at a minimum watch Case Digital’s “How to Remove Characters From a Pandas Dataframe In Python” video).
  • Regular Expression Editors:
    • Pythex (make sure you click the “Regular expression cheatsheet” button to expand/see a valuable quick reference – also, apparently a more complete version can be found here)
    • Regular Expressions 101 – this includes an enormous library of previously created Regular Expressions. There is a “debugger” too!

An even simpler solution

I discovered an even simpler solution to this problem. As I was reviewing the Pandas documentation I noticed there was functionality built directly into Pandas. This can be done with the “datatime” function.

And you can take it one step further with what is referred to as a “.dt accessor.” (you’ll see this in the updated code I’m including). This accessor allows you to manipulate a Series (aka a column) within a Pandas dataframe (df).

There are quite a few time-date properties you can manipulate in Pandas, all of which can be found here. The “Time Zone Operations” section is where the “.dt accessor”, is briefly mentioned. For more context, I’d recommend reviewing this section on the .dt accessor and how they interplay with a Pandas Series.

Don’t stop there though, the main python documentation discusses the basics of date and time operations. Without further ado, on to the new code:

import pandas as pd 
import numpy as np 

#You don't have to do it like this. You could have a single "d" and then just comment/uncomment each line as needed. 

d1 = '/Users/choina/Downloads/Food_Inspections.csv'
d2 = '/Users/choina/Downloads/Food_Inspection_Violations copy.csv'
d3 = '/Users/choina/Downloads/Restaurants_in_Wake_County copy.csv'

#The same thing here, you could just have a single "d" and then just comment out the other two, when not needed.

df_1 = pd.read_csv(d1)
df_2 = pd.read_csv(d2)
df_3 = pd.read_csv(d3)

#Same goes for this too. This is either very slick of me, or incredibly lazy. 

df_1['DATE_'] = pd.to_datetime(df_1['DATE_']).dt.date
df_2['INSPECTDATE'] = pd.to_datetime(df_2['INSPECTDATE']).dt.date
df_3['RESTAURANTOPENDATE'] = pd.to_datetime(df_3['RESTAURANTOPENDATE']).dt.date

#Same, same here as well. You could do one at a time. But it works, I double-checked my work. 

df_1.to_csv('newninspect.csv')
df_2.to_csv('newviolate.csv')
df_3.to_csv('newrest.csv')
New python script to change date time in csv files, Chris Hoina, Senior Product Manager, ORDS, Database Tools, Oracle Autonomous Database

About the update

This code is certified fresh – meaning, it works. I triple-checked. The only thing I’ve noticed is that lines 12-14 must-have “.dt.date” at the end. From what I am reading (and am inferring from the documentation), it looks as though you need to first “treat” the Series with the “to_datatime” function. After that, the entire dataframe is in limbo (not quite written out to a new .CSV), waiting. Before the next step, we can strip the time portion out using the .dt accessor (i.e. the “.dt.date” portion located at the end of lines 12-14).

From there it is simply a matter of writing out these updated dataframes to the three new expectant .CSV files.

Find Me

Please come back for more. I’m going to be working on views in my Autonomous Database, and then later, I’ll REST-enable them with the help of ORDS!

Categories
Autonomous Database

Add Primary and Foreign Keys to your Oracle Autonomous Database

I’m not a Database Administrator, but I feel as though I may be slowly evolving into one…

In a previous post, I left myself the task of correcting the html on a proposed Python/Flask/ORDS/Autonomous Database application. My then next step was to add a search bar (which I did):

Search bar added to application for python, flask, ords, oracle autonomous database application. Chris Hoina, Oracle Senior Product Manager.
Search bar added to the html for searching my Autonomous Database.

Although this was a small step forward it led to some serious questions. While having a search feature like this is good in theory, in practice it is totally unusable.

The Problem

If a user wants to search for a specific restaurant, they’d need to know a lot of information beforehand. At a minimum, the name of their target restaurant. But even that isn’t enough. Look at when I query for a single restaurant (randomly chosen – I’m not sponsored by Wendy’s btw):

SELECT FROM LIKE SQL statement showing a single restaurant name. Chris Hoina, ORDS, Oracle Autonomous Database
SQL statement and the output (a single restaurant in this case).

Some of these restaurants come back with additional store numbers (probably a reference to either a corporate- or franchise-owned property). So a search like this simply will not do.

That’s problem number one. The second problem is that in the original datasets, while HSISID and Permit ID (read about them here) are unique in the Restaurants table. That isn’t the case with the Violations and Inspections tables. There are a TON of entries for each restaurant in these tables, and the HSISID and Permit IDs are used repeatedly for subsequent inspections. Just take a look at this example (they’re all like that though!):

Viewing the HSISID in my table; unique identifier complications, Chris Hoina, ORDS, Oracle Autonomous Database
Using the HSISID over and over again make it challenging to gather necessary information across all tables.

A Closer look

Above, is the Violations table, but the Inspection table looks similar. From what I observer, each time a violation or inspection is recorded that same HSISID is used (Permit IDs too). But in these two tables an Entry ID is used as the unique identifier. I know the image is greyed out, but if you look closely you can see the “Object ID” in that screenshot.

But these Object IDs don’t mean anything outside of this table. And it underscores the point that my original approach to searching (i.e search bar) using a single identifier won’t work. Across the three tables these restaurants are uniquely identified in different ways.

At a minimum, these tables should be linked so that when an HSISID is searched for in the Restaurant table, it gathers all associated data from the Violations and Inspections table too. Meaning, if I search for a specific “Wendy’s” only that target restaurant comes back along with all historical inspection and inspection violation information as well.

Now that we understand the problem, I’ll attempt a solution. For my next feat, I’ll relate the three tables to one another in a logical way. Continue on dear reader…


Navigating

Navigate to your Autonomous Database, then Database Actions. Once there, look under the “Development” section, and choose “SQL”.

Database Actions Development SQL Worksheet Oracle Autonomous Database, Chris Hoina, Senior Product Manager, Database Tools, ORDS
NOTE: I'm sure there are other ways to accomplish this, but this is what worked for me. 

Once in that SQL Worksheet, look at the the left side of your browser, you’ll see the Navigator tab. Of the two drop-down lists, look at the second one. That is the “Object” selector; for my purposes I made sure that Tables was selected.

Navigator tab in Oracle Database Actions, Chris Hoina, Senior Product Manager, Database Tools, OCI, Autonomous Database
Navigator tab in Oracle Database Actions in my Autonomous Database Always Free OCI account.

I have the three tables; two of which (Inspections and Violations) need Foreign Keys while the third table (Restaurants) will need a Primary Key.

My three tables in Oracle Database Actions, Chris Hoina, Senior Product Manager, Database Tools, OCI, Autonomous Database
My three tables in Oracle Database Actions: Inspections, Restaurants, Violations
TL;DR - The Restaurant table is like my Parent table, whereas the other two tables are like children (or dependents). Establishing these Primary and Foreign keys is a way for me to easily establish interdependence/relation among these three tables. 

Establishing a Primary Key

I’ll use the Restaurant table as an example. Here is the step-by-step…

Editing my table in Oracle Database Actions, Chris Hoina, Senior Product Manager, Database Tools, OCI, Autonomous Database

Right-click on the table (Restaurants), then select “Edit”.

Table Properties my table in Oracle Database Actions, Chris Hoina, Senior Product Manager, Database Tools, OCI, Autonomous Database

If a Primary Key has already been established, then you’ll see a Column (which is actually displayed as a row in this Table Properties table) that already has a check in the “PK” column.

If there isn’t one, continue by clicking the Primary Key option (see the left-most red arrow).

Primary Keys section in Database Actions SQL Worksheet Oracle Autonomous Database, Chris Hoina, Senior Product Manager, Database Tools, ORDS

Once in the Primary key option, you’ll see the following boxes with checks:

  • Enabled
  • Initially Immediate,
  • Validate
Choosing HSISID as the Primary Key in Database Actions SQL Worksheet Oracle Autonomous Database, Chris Hoina, Senior Product Manager, Database Tools, ORDS

I’m using the HSISID (found in the “Available Columns” list) as my Primary Key. Once I select that, the “Add Selected Columns” arrow will illuminate (from white to gray). Click it once to move your selection over to the right in the “Selected Columns” list.

Once your column has been moved, you can click “Apply”.

Output from establishing the Primary Key in Database Actions SQL Worksheet Oracle Autonomous Database, Chris Hoina, Senior Product Manager, Database Tools, ORDS

You’ll see this “Output” screen appear. Here you can see that my table was successfully altered. If there are errors, it’ll let you know, and there will be A LOT of red (trust me, I saw tons of red prior to writing this post).

The DDL from a table in Database Actions SQL Worksheet Oracle Autonomous Database, Chris Hoina, Senior Product Manager, Database Tools, ORDS

Bonus points, you can always find the Data Definition Language (DDL) in the DDL option (read up on this and the other types of SQL statements here).

As I see it, it looks like everything since the inception of this table has been recorded. I would imagine you could take this and build some sort of automation/orchestration script with it (along with the DDL from the other tables too).

Sample Code

Table as a Primary Key

CREATE TABLE ADMIN.RESTAURANTS 
    ( 
     OBJECTID           NUMBER , 
     HSISID             NUMBER , 
     NAME               VARCHAR2 (4000) , 
     ADDRESS1           VARCHAR2 (4000) , 
     ADDRESS2           VARCHAR2 (4000) , 
     CITY               VARCHAR2 (4000) , 
     STATE              VARCHAR2 (4000) , 
     POSTALCODE         VARCHAR2 (4000) , 
     PHONENUMBER        VARCHAR2 (4000) , 
     RESTAURANTOPENDATE DATE , 
     FACILITYTYPE       VARCHAR2 (4000) , 
     PERMITID           NUMBER , 
     X                  NUMBER , 
     Y                  NUMBER , 
     GEOCODESTATUS      VARCHAR2 (4000) 
    ) 
    TABLESPACE DATA 
    LOGGING 
;


CREATE UNIQUE INDEX ADMIN.RESTAURANTS_PK ON ADMIN.RESTAURANTS 
    ( 
     HSISID ASC 
    ) 
    TABLESPACE DATA 
    LOGGING 
;

ALTER TABLE ADMIN.RESTAURANTS 
    ADD CONSTRAINT RESTAURANTS_PK PRIMARY KEY ( HSISID ) 
    USING INDEX ADMIN.RESTAURANTS_PK ;

That is literally all did to set this up. Very straightforward, next I’ll walk through how to set up Foreign keys. Then I’ll show you what it looks like in the Data Modeler so you can visually see the newly-established relationships of the tables.


Establishing a Foreign Key

We’re still in the SQL worksheet (found in Database Actions). Both Inspections and Violations tables need a Foreign key; I’ll demonstrate with Inspections.

Foreign Key in Database Actions SQL Worksheet Oracle Autonomous Database, Chris Hoina, Senior Product Manager, Database Tools, ORDS

Right-click on the table and select “Edit”, just like I did when establishing a Primary Key. Next, navigate to the “Foreign Keys” option in the Table Properties.

Adding a Foreign Key in Database Actions SQL Worksheet Oracle Autonomous Database, Chris Hoina, Senior Product Manager, Database Tools, ORDS

It’ll be empty, but you’ll want to click the
“+” key (see the arrow). You’ll see “NEW_FK_1” appear under the Foreign Key table. If you click on that you can change the name to whatever your heart desires.

I then changed the Foreign Key name to something that I could remember (and something that would make sense at a later date); “Inspections_FK”. I then selected “HSISID” as the Local Column.

And automatically, the schema recognizes the Primary Key I previously established in the Restaurants table. I didn’t do that, that was all done for me automatically! Very cool.

Output of the Foreign Key name change in Database Actions SQL Worksheet Oracle Autonomous Database, Chris Hoina, Senior Product Manager, Database Tools, ORDS

After I click “Apply” you’ll see the output from this change. I can then return to the Foreign Keys option and voilà, the Foreign Key name has been changed. I’ll proceed to do the same to my “Violations” table as well.

Sample Code

Table as a Foreign Key

CREATE TABLE ADMIN.INSPECTIONS 
    ( 
     OBJECTID    NUMBER , 
     HSISID      NUMBER , 
     SCORE       NUMBER , 
     DATE_       DATE , 
     DESCRIPTION VARCHAR2 (4000) , 
     TYPE        VARCHAR2 (4000) , 
     INSPECTOR   VARCHAR2 (4000) , 
     PERMITID    NUMBER 
    ) 
    TABLESPACE DATA 
    LOGGING 
;

ALTER TABLE ADMIN.INSPECTIONS 
    ADD CONSTRAINT INSPECTIONS_FK FOREIGN KEY 
    ( 
     HSISID
    ) 
    REFERENCES ADMIN.RESTAURANTS ( HSISID ) 
    ON DELETE CASCADE 
    NOT DEFERRABLE 
;

Data Modeler Primer

This isn’t a “How-to” on the Data Modeler (Documentation here though), but rather just me taking a minute to share how you can visually see the relationships between tables. It definitely helped me better understand why I needed to establish Primary and Foreign Keys.

Andiamo!

Data Modeler in Database Actions in Oracle Autonomous Database, Chris Hoina, Senior Product Manager, Database Tools, ORDS

From the Database Actions Launchpad, select “Data Modeler”.

Navigator Tab in Data Modeler in Database Actions in Oracle Autonomous Database, Chris Hoina, Senior Product Manager, Database Tools, ORDS

Once there, you’ll see two drop-down menus. You’ll see your current Schema (Admin for me, in this case). In the second drop-down menu, you’ll see a list of available “Objects”. I’ll want to make sure that I have “Tables” selected.

Adding Objects to the Diagram Editor in Data Modeler in Database Actions in Oracle Autonomous Database, Chris Hoina, Senior Product Manager, Database Tools, ORDS

You can drag these over one at a time, or select them all and drag all at once. They’ll appear in the Diagram Editor, with the relationships already included (since I already set up the Primary and Foreign Keys).

New Diagram of tables in the Data Modeler Diagram Editor in Database Actions in Oracle Autonomous Database, Chris Hoina, Senior Product Manager, Database Tools, ORDS

Expand these diagrams and you’ll reveal additional information. In this image, I’ve expanded the Inspections table to reveal Foreign Key details. I’ll do the same by expanding the bottom of the Violations table to reveal its Foreign Key. That little red arrow signifies that there is additional information hidden.

Okay, but what is this?

What we are looking at is a visual representation of the relationships between these tables. Everything is “linked” through the HSISID. My interpretation of this is that everything begins with the Restaurant table, and then the other two tables act as subordinates.

For a single Restaurant (which has a unique HSISID) there are many entries in both Inspections and Violations that use the HSISID repeatedly. In those cases, that HSISID is no longer unique. It is still important, necessary information.

And now that everything is linked, I’ll want to create a single query (SQL presumably) that when executed will return all relevant information from a single entry in the Restaurant table with all associated (HSISID) entries from the Inspections and Violations table, nested underneath it.

Clear as mud? Good. Hope this helps more than it hurts.

Now what

Next I’ll need to come up with something (in SQL perhaps) that I believe I’ll use in tandem with ORDS. From there I’ll work on building the html portion of this python + flask application so that the search is more intuitive (to the user) and elegant (for the Autonomous Database). Believe it or not, but it took about a week to wrap my head around all this key business. Needless to say, I’m slightly behind schedule. But expect more updates as I progress.

Hope you learned something.

Find me

HMU on:

Categories
Autonomous Database

Create a user in Oracle Autonomous Database [for Dummies]

An OCI user does not ≠ ADB user

Creating an additional user in your Oracle Cloud Infrastructure (OCI) is not the same thing as creating an additional user in your Autonomous Database (ADB). I spent about two days (on and off) last week, wrapping my head around this. Maybe you knew this…I did not.

Chris Hoina – 2022

If however, you want to create an additional OCI user; one that would be able to access various resources in OCI, then please bookmark this page. This is all you need to get started.

However, you may not even need to create this additional user (as was the case with me). While I followed the steps in that documentation (very simple/straightforward actually), it wasn’t till after I finished, did I realize that doing it was completely unnecessary. What I was really trying to do was create a new database user, not an OCI user. The steps are different, simpler.

Federations

Another area that I got really hung up was with “Federations”. This was a relatively new concept for me. And I’m not sure if I should even do a deep dive on how Federations work. However, let it be known, if I receive even one (legitimate) comment on this subject, I will put together a treatise on Federations in OCI.

It took me about a day and a half to educate myself on how Federations and Identity Providers work in OCI. So, if you are reading this, and you think you’d benefit from a standalone article, then let me know. Otherwise, this page and this page are both very helpful for learning more about Federations in OCI.

Are you me?

If so, then you are acting as an administrator in your Always Free OCI account, as well as the database administrator. Being both is confusing for me, and maybe it is for you too. Since I don’t consider myself a traditional user (I’m blurring the lines when it comes to the different roles), I’m tasked with managing my OCI tenancy and also setting up my development environment. All this so I can begin to work on some applications/proofs of concepts (POCs) for interacting with my Autonomous Database via ORDS (Oracle REST Database Services aka our REST APIs).

Don’t be like me and allow yourself to get too weighed down with all the technical jargon thrown your way. Naturally, our docs read like they are geared toward the System or Database Administrator. And this makes sense; remember when you are in your tenancy you’re the de-facto admin for everything. At a larger organization/enterprise a developer would probably never do any of this setup. You’d just sign in to the database directly or connect via a command line.

Workflow for the “Every Person”

But for me (with my limited experience), the workflow looks something like this:

Start here

Sign-up for an Oracle Cloud Account. You’ll be provided a Cloud Account Name* and credentials.

*You’ll also hear this account being referred to as your Tenancy name (you can modify the name later if you want).

When you sign in, you can quickly tell if you are the administrator by seeing if the "Service User Console" option is available. This is the administrator for the Oracle Cloud Infrastructure. Your tenancy.

Next, create an Autonomous Database and create the “ADMIN” credentials* for said database. You can see how to do that in Lab 1 of this workshop

REMINDER: This administrator is different that the OCI Tenancy administrator.

Add a new user by:

Navigating to your Autonomous Database under "Oracle Database".  It might be in your recently viewed.

Navigating to your Autonomous Database under “Oracle Database”. It might be in your recently viewed.

Once you've created your Autonomous Database, navigate to Database Actions. This is where you need to go to create a new user.

After you click the database, you’ll see this screen. Click “Database Actions”. You’ll link out to the Database Actions dashboard.

Navigate to the Administration tile, and click it. It will take you to a dashboard of all current users, along with an option to create a new user.

Select the Database Users option in the Administration section.

Create a new user. Afterward, you'll be provided an URL that allows a user to directly log into the Oracle Autonomous Database Actions Launchpad.


Select the “Create User” option.

Toggling the Web Access switch will automatically grant the user the CONNECT and RESOURCE roles.

At a minimum, you’ll want to enable “Web Access” for this user. This will automatically grant two roles for the user:

  1. CONNECT
  2. RESOURCE

Both are needed for the developer though.

Here you can see all the available roles. The CONNECT and RESOURCE roles are already checked because you will have selected "Web Access" in the previous tab.

If you want to take a look at the different roles available, click this tab. Scrolling through, you can see the CONNECT and RESOURCE roles are checked.

HINT: You can always go back and edit the roles if more roles need to be granted. Both are needed for the developer though.

Finish (kind of)

You can see the URLs in this image.  This, along with the login credentials are what should be provided to the user.  They can log in to the Database Actions console directly.

Once complete, you’ll see a URL in that user’s newly-created tile. This URL will link you out to a login page, for accessing the Database Actions console. Since you are the only user, you’ll just want to document the URL. Otherwise, as the admin, I’m assuming you’d share this with the respective recipient.

Next Up

From here on out, I’ll do all my development work with this account (I called mine “appdev”), to mimic what a typical user might encounter in a practical setting.

Right now, I’ m wrapping up a python + flask + database course on LinkedIn Learning. So far, its been pretty informative. If you are interested, you can take a look as well (to see the direction I’m headed). My goal will be to use the templates in this course as a resource for connecting to my database, but with ORDS.

I hope to have something small in the next week or so. And I’ll be sharing here, and on my GitHub as well. So stay tuned.

Helpful resources

What I did here was very simplified, a distillation for creating a new user in your Autonomous Database. But I’ll include some of the resources that helped get me to this level of understanding:

Find me

And that’s it for now. But if you want to follow along then check me out at these places…

Categories
Autonomous Database

Table Prep: Data loads and time zones

Next episode

My Autonomous Database in Oracle Cloud Infrastructure

With my Autonomous Database up and running,I needed to find some data to upload. After some quiet reflection, I decided upon three datasets that should be plenty fun.

Restaurants in Wake County, North Carolina. Wake County is where I, Chris Hoina, live.

A bit of context; I currently live in Wake County, North Carolina. And it turns out that we’ve been very progressive with providing open, accessible data to our citizens. I really wanted to do something with food, so I was surprised when I found the following three datasets (among the nearly 300 available):

  1. Restaurants in Wake County
  2. Restaurant Food Inspections/Grades
  3. Restaurant Food Inspection Violation History

Always read the docs

Upon reading the documentation, I discovered that these datasets could be all connected through their HSISID* Permit ID fields. Which should come in handy later as it could make for some very interesting (if not amusing) revelations.

*I’m not sure of the acronym, but the documentation defines the HSISID as a, “State code identifying the restaurant (also the primary key to identify the restaurant).”

To the data loading

I’m sticking with the Oracle Database Actions* web-based interface for now.

*Oracle Database Actions was previously known as Oracle SQL Developer Web.
In the Database Actions Launchpad, there are several actions for users. This includes development, data-loading, monitoring, and administration.
In the Database Actions Launchpad, there are several actions for users. This includes development, data-loading, monitoring, and administration.

And from what I learned in the Autonomous Database workshop, there are two ways to load data into your database (there may be other methods; mea culpa if so):

  1. SQL worksheet (found in the Development section, of your launchpad)
  2. Data Load Tool (found in the Data Tools section)

And I opted for the SQL worksheet method, since I need to practice and learn this method regardless.

Obstacles

After choosing my method of data loading, here is where I encountered some obstacles. With complete lack of regard for best practices, I proceeded to load all three datasets into my Autonomous Database only to discover two glaring issues.

First

Unfortunately, in my restaurant “violations” table, I encountered two rows that failed to load.

ORA-12899 error in Oracle Autonomous Database on Oracle Cloud Infrastructure.
ORA-12899 error in Oracle Autonomous Database on Oracle Cloud Infrastructure.

These values were loaded as a VARCHAR2 datatype; which by default, allows for a maximum byte size of 4000 bytes. Since these rows exceeded 4000 bytes (in that specific column) they failed. Fortunately, it was easy to increase the bytes to 10,000.

Apparently, I can increase to 32,767 bytes, but that is overboard. I also learned that the VARCHAR2 data type can have their max string sizes set to ‘STANDARD’ or ‘EXTENDED’ (i.e 4000 bytes vs 32,767 bytes). I’m assuming the Autonomous Database is set to EXTENDED by default, since I was able to increase this column, and re-upload with zero failed rows. You can read up on VARCHAR2 data types here.

Second

The second obstacle I ran into, took me about a day (on and off) to figure out. The dates in all three of these tables were constructed like this:

Timestamps in Oracle Autonomous Database.
Adjusting for the TIMESTAMPS in the Oracle Autonomous Database.

And at first glance, I thought I was looking at a time zone offset. Dates were either 04:00:00+00 hrs or 05:00:00+00 hrs; possibly due to daylight savings time. Which seemed like a reasonable assumption since the early-November — mid-March dates were an hour later than the mid-March — early-November dates (i.e. UTC-4hrs in Winter, UTC-5hrs all else).

My first thought was to modify the effected columns with something like this parameter:

TIMESTAMP [(fractional_seconds_precision)] WITH TIME ZONE

or with…

TIMESTAMP '1997-01-31 09:26:56.66 +02:00'

But, I’d never seen this date/time stamp before (the ‘esriFieldTypeDate’), so I had to investigate before I did anything (also, I’ve no idea how to update a column with a parameter like this, so it would have added additional time).

Turns out this data originates from ArcGIS Online. This field date type maps directly to Oracle’s ‘TIMESTAMP’ data type. Seen here in the documentation:

ArcGIS datatype mapping to Oracle Database.
In the ArcGIS documentation, it looks like the ‘esriFieldTypeDate’ maps to Oracle’s ‘TIMESTAMP’ datatype.

I later tinkered with an API Explorer on the Wake County datasets site and reviewed a sample JSON response for this data set. The response also confirmed the field type:

Wake County has an API Explorer on their datasets site, which you can sample query to review the JSON that is returned.
Wake County has an API Explorer on their datasets site, which you can sample query to review the JSON that is returned.

It’s funny, because after all this hunting, it looks like I won’t be able to include time zones anyways. It’s somewhat inconsequential as I’m only interested in Wake County, but for some that could be unnerving.

I wish I could say that I came up with a cool Python script to automate the cleaning of these columns (I would have had to do this for all three of my .CSV files), but I did not. It would have taken me an afternoon or longer to work through this since I haven’t toyed with .CSV Reader (in the Python standard library), Pandas, or NumPy in quite some time. In the end, (and it pains me to say this; embarrassingly really), I used the ‘Find + Replace’ feature in Numbers to remove the time/time zone portion of the dates.

I’m pathetic, I know. But the idea of creating a dedicated Python script does seem like a fun weekend project. Who knows, maybe I’ll post to GitHub. Regardless, my Autonomous Database is clean and primed for ORDS.

In closing

In closing

My advice is as such:

Always inspect your data before you do anything major, always make sure you understand the inputs and the resultant outputs of the data loading process, and always refer to the documentation FIRST so you can understand how data is being recorded, coded, displayed, etc..

Chris Hoina, 2022

Oh, and it wouldn’t hurt to think about how you might use this data in the future. I’m guessing that could help you avoid a lot of rework (i.e. technical debt).

Take me as an example. Since I’m focused on ORDS and REST enabling my database, I may want to consider things like:

  • What am I going to expose (via REST APIs)? or
  • What fields might the presentation layer of my application include?

In my case, a lot of this data will lay dormant, as I might choose to focus on only a few key fields. Of course, anything I create will be simple and basic. But that might not be the case for you.

Koala RESTing, Photo by Jordan Whitt on Unsplash

I’m off to REST…enable this database now, but in the meantime if you want to learn more about ORDS, check out:

Find me

Find me

And as always, catch me on:

Categories
Autonomous Database

An Oracle Autonomous Database workshop: here is what I learnt

Pop Quiznos

Recently, I traveled to Colorado. While flying, I finished up an Autonomous Database (ADB) workshop. What I’m saying is…I was in the clouds, on the cloud.

In my new new role, much of my work will be focused on Oracle’s REST Data Services (ORDS). But I’m starting out with the ADB.

After completing the workshop, I thought I’d share initial impressions and lessons learned. But first…a crash course on our Cloud Free Tier accounts and Always Free cloud services. Both are relevant for this workshop.  

Oracle Cloud Free Tier

When you sign up for an Oracle Cloud Infrastructure (OCI) account, you are automatically entitled to Always Free services. You’re also given a 30-day/$300 USD worth of free credits trial to use on all eligible OCI services (30 days or $300, whichever comes first). But the Always Free services are available for an unlimited period. This workshop uses Always Free services (lucky you).

Always Free

Errthang you get with Always Free…we are most concerned with the database options though.

On Databases

Credits can be used on many items (see above). And that includes various databases. In the workshop you get to provision an Autonomous Data Warehouse (we’re still Always Free).  

The Autonomous Database (ADB)

Overview

The ADB calls OCI home. It’s a fully managed, pre-configured database environment. Four workload types are available:

  • Autonomous Transaction Processing
  • Autonomous Data Warehouse
  • Autonomous JSON Database
  • Oracle APEX Application Development
NOTE: Always Free entitles you to two of the above databases; each with 1 OCPU* and 20 GB storage. There is also an option to choose a noSQL Database with 133 million reads/writes per month, 25 GB storage per table (up to 3 tables).
ALSO: An “OCPU” is an Oracle Compute Unit. 1 OCPU on the x86 (AMD/Intel) architecture is equivalent to 2 virtual processor cores (vCPUs). Whereas on the ARM architecture, the ratio is 1:1 (i.e., 1 OCPU = 1 vCPU).

Automation

Zero hardware configuration/management or software installation/configuration is required! Once your database is created, nearly everything is automated:  

  • Backups
  • Patching
  • Upgrading
  • Performance tuning

You’re a Database Administrator (DBA) and you didn’t even know it!

Scaling

The ADB shines here. CPU and storage can be scaled automatically. Auto-scaling can automatically add more CPU cores to the base number of cores during periods of high demand, and then automatically reduce the number of cores back to the base number as demand decreases (auto-scaling also allows up to three times the current base number of CPU cores at any time).

DISCLAIMER: Auto-scaling isn’t an Always Free cloud service, but it’s still cool and worth talking about. However, should you choose to upgrade in the future, it’s very easy to do:

To the Workshop!

The FREE workshop consists of five labs (and an optional sixth):

  • Lab 1: Provision an Autonomous Database
  • Lab 2: Work with Free Sample Data Sets
  • Lab 3: Load Data
  • Lab 4: Query External Data
  • Lab 5: Visualize Your Data

Once you’ve requested an Oracle Cloud account (everything can be done with the Free Tier option), you’ll then create an Autonomous Database. The instructions for requesting your account are simple and can be found here (but also on the workshop’s landing page).

Lab 1

Just two steps here, simple. Everything is done through the Oracle Cloud dashboard. My two takeaways:

  1. WRITE DOWN all your login/credential information! I hadn’t considered that as an Oracle employee I’d have more than one login (it can be confusing). Not to mention later, you’ll be creating an ADMIN password for the actual database (separate from your tenancy credentials).
  2. Initial Database configuration. I really appreciated that when you first configure your Autonomous Database, you are automatically provided 1 CPU and 20 GB of storage. But should you require more CPU or storage, everything can be adjusted with simple drop-down menus. Is nice that a user (like a developer or non-System/Database Administrator role) can provision their own free playground in a matter of minutes.

Lab 2

Oracle Database Web Browser.  My Autonomous Database.

Lab 2 reinforces how simple this workshop is. At this point, I’m still in the web browser. But of course, you could use the desktop option, Oracle SQL Developer.

  1. Things seemed fast. And that is saying something, because for part of this workshop (Lab 3 actually), I was in the air flying from Denver, Colorado to Raleigh, North Carolina using the airline wi-fi and objects were actually loading into their OCI buckets (I write about this later).
  2. Second takeaway was caching. When possible, the autonomous database caches query results. Meaning, for subsequent queries, results are just…there. That is a big deal to me, because in my previous role I was a Product Manager for an application performance tool. And there are a couple of things that just absolutely kill an application’s response time and/or throughput:
  • CPU intensive tasks (often repetitive ones)
  • I/O processes

TL;DR – we like fast. Fast is good 🙂

Lab 3

Floppy discs to represent data in the Oracle Cloud Autonomous Database

You’re creating a “Bucket” in the OCI and then loading Objects (approximately 45 floppy discs worth of .CSV and .DAT files) into that Bucket. Again, this is all point and click, and done in-browser. There is a wizard that guides you through everything. For someone who isn’t familiar with System or Database Administration, this was helpful and it made approaching “infrastructure” so much more accessible and seemingly less daunting.

I should remind that this might not be your typical workflow. You might not create a Bucket and then load Objects every time. But you will still need a URI, authorization token, and credentials to load this data into the specified database.

This lab was still a good exercise even for the non-Systems or Database Administrator. Maybe you’re a Developer or Business Intelligence Analyst. I don’t suspect many will perform all these tasks. But its important to understand how teams/roles/individual members are connected (from end-to-end).

One final thing I appreciated about this lab, was that there were two options for loading data into respective locations:

  1. the DATA LOAD tools option from within the Database Actions dashboard, or
  2. the PL/SQL package (DBMS_CLOUD)

Lab 4

Up until this point I felt like I was an active participant. But here I felt like I was cheating. he code snippets you are provided allow you to create an external table directly from your cloud object store (excerpt of the code; click the image to go to our GitHub). 

/* Specify the URL that you copied from your files in OCI Object Storage in the define base_URL line below*/

set define on
/* Specify the URL that you copied from your files in OCI Object Storage in the define base_URL line below*/
 
set define on
define &file_uri_base = '<bucket URI>'
 
begin
 dbms_cloud.create_external_table(
    table_name =>'CHANNELS_EXT',
    credential_name =>'OBJ_STORE_CRED',
    file_uri_list =>'&file_uri_base/chan_v3.dat',
    format => json_object('ignoremissingcolumns' value 'true', 'removequotes' value 'true'),
    column_list => 'CHANNEL_ID NUMBER,
        CHANNEL_DESC VARCHAR2(20),
        CHANNEL_CLASS VARCHAR2(20),
        CHANNEL_CLASS_ID NUMBER,
        CHANNEL_TOTAL VARCHAR2(13),
        CHANNEL_TOTAL_ID NUMBER'
 );
 
 dbms_cloud.create_external_table(
    table_name =>'COUNTRIES_EXT',
    credential_name =>'OBJ_STORE_CRED',
    file_uri_list =>'&file_uri_base/coun_v3.dat',
    format => json_object('ignoremissingcolumns' value 'true', 'removequotes' value 'true'),
    column_list => 'COUNTRY_ID NUMBER ,
        COUNTRY_ISO_CODE CHAR(2) ,
        COUNTRY_NAME VARCHAR2(40) ,
        COUNTRY_SUBREGION VARCHAR2(30) ,
        COUNTRY_SUBREGION_ID NUMBER ,
        COUNTRY_REGION VARCHAR2(20) ,
        COUNTRY_REGION_ID NUMBER ,
        COUNTRY_TOTAL VARCHAR2(11) ,
        COUNTRY_TOTAL_ID NUMBER ,
        COUNTRY_NAME_HIST VARCHAR2(40)'
 );

The code snippets you are provided allow you to create an external table directly from your cloud object store. Now you can then query from that external table, outside of the database. Without that code snippet, I would have struggled to do this on my own (or at least, not without some research). That is more of a “me” problem though.

I appreciate that this method of querying is available, especially if you are only concerned with a small subset of your data. Keeping this data closer to you can certainly make subsequent requests quicker.

Lab 5

Here we try our hand at business intelligence. You begin by downloading Oracle Analytics Desktop. The tool reminded me of Tableau or one of the SAS data visualization products that are available. 

Consider this: Biases aside, Oracle Analytics Desktop is free. And you can connect to your database within a matter of minutes. 

If its forecasting and visualizations you are looking for, this thing looks plenty powerful.

Much like before, everything is GUI-based. But you can also do much of the set-up in Oracle’s SQL Developer, or from within the browser. With some provided SQL, I defined a table from the data I imported in a previous Lab. From there you get some practice establishing a secure connection to your database, and then import that table into Oracle Analytics Desktop.

I don’t know if I should spend too much time here, because currently my focus isn’t data analysis/business intelligence. But once you establish a secure connection to your database, you can easily drop and drag data into the tool’s visualization pane and later manipulate the visualizations in countless ways.

My biggest takeaway here, is that once you have created a wallet from within your OCI dashboard, and established a connection to your data warehouse (from within Oracle’s Analytics Desktop), things are very intuitive. 

Everything is drag and drop. And since you’ll pre-define and save a subset of your data, you can’t break anything, you can just start over by pulling down subsequent data. Truthfully, if your focus isn’t Business Intelligence, then you can (and some are going to hate this) skim this lab to get the gist.

The End

There is a totally optional Lab 6, where you export your newly created project to a .DVA file. But I skimmed that and got the idea.

Like I mentioned before, this content will become increasingly more technical. I’ll be doing some exploring with the APIs in ORDS. And I’ll see what I can do with flask + python + ORDS. I’ve found some good references on Github and of course I’ll be share the [heavily commented] code here and on Github too!

And as always, you can find me here:

Until next time. Love, peace, and chicken grease.