Category: Autonomous Database

  • Table Prep: Data loads and time zones



    My Autonomous Database in Oracle Cloud Infrastructure

    With my Autonomous Database up and running, I needed to find some data to upload. After some quiet reflection, I decided upon three datasets that should be plenty fun.

    A bit of context: I currently live in Wake County, North Carolina. And it turns out that we’ve been very progressive about providing open, accessible data to our citizens. I really wanted to do something with food, so I was pleased when I found the following three datasets (among the nearly 300 available):

    1. Restaurants in Wake County
    2. Restaurant Food Inspections/Grades
    3. Restaurant Food Inspection Violation History
    Restaurants in Wake County, North Carolina. Wake County is where I, Chris Hoina, live.

    Always read the docs

    Upon reading the documentation, I discovered that these datasets could all be connected through their HSISID* Permit ID fields, which should come in handy later, as it could make for some very interesting (if not amusing) revelations.

    *I’m not sure of the acronym, but the documentation defines the HSISID as a “State code identifying the restaurant (also the primary key to identify the restaurant).”
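
    Just to make that relationship concrete, here is a rough sketch of how the three tables might be joined once they’re loaded. Aside from HSISID, the table and column names are placeholders I made up, not the datasets’ actual field names:

    -- Placeholder table/column names; HSISID is the shared permit ID across all three datasets.
    SELECT r.restaurant_name,
           i.inspection_date,
           i.score,
           v.violation_description
      FROM restaurants           r
      JOIN food_inspections      i ON i.hsisid = r.hsisid
      JOIN inspection_violations v ON v.hsisid = r.hsisid;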

    To the data loading

    I’m sticking with the Oracle Database Actions* web-based interface for now.

    *Oracle Database Actions was previously known as Oracle SQL Developer Web.
    The Database Actions Launchpad offers several actions for users, including development, data loading, monitoring, and administration.

    And from what I learned in the Autonomous Database workshop, there are two ways to load data into your database (there may be other methods; mea culpa if so):

    1. SQL worksheet (found in the Development section of your launchpad)
    2. Data Load Tool (found in the Data Tools section)

    I opted for the SQL worksheet method since I need to practice and learn it regardless.

    Obstacles

    After choosing my method of data loading, I encountered some obstacles. With no regard for best practices, I proceeded to load all three datasets into my Autonomous Database only to discover two glaring issues.

    First

    Unfortunately, in my restaurant “violations” table, I encountered two rows that failed to load.

    ORA-12899 error in Oracle Autonomous Database on Oracle Cloud Infrastructure.

    These values were loaded as a VARCHAR2 data type, which, by default, allows a maximum size of 4,000 bytes. Since these rows exceeded 4,000 bytes in that specific column, they failed. Fortunately, it was easy to increase the column to 10,000 bytes.

    While you could increase it to 32,767 bytes, that seemed overboard. The maximum VARCHAR2 string size can be set to STANDARD or EXTENDED (i.e., 4,000 bytes versus 32,767 bytes). I’m assuming the Autonomous Database is set to EXTENDED by default, since I was able to increase this column and re-upload with zero failed rows. You can read up on VARCHAR2 data types here.
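
    For reference, the fix was essentially a one-liner; the table and column names below are placeholders rather than my actual schema:

    -- Placeholder names: widen the offending VARCHAR2 column so the oversized rows load.
    ALTER TABLE inspection_violations
      MODIFY (violation_description VARCHAR2(10000));

    -- Optional (privileges permitting): confirm the database allows extended string sizes.
    SELECT value FROM v$parameter WHERE name = 'max_string_size';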

    Second

    The second obstacle I ran into took me about a day (on and off) to figure out. The dates in all three of these tables were constructed like this:

    Adjusting for the TIMESTAMPs in the Oracle Autonomous Database.

    At first glance, I thought I was looking at a time zone offset. Times were either 04:00:00+00 or 05:00:00+00, possibly due to daylight saving time. This seemed like a reasonable assumption, since the early-November to mid-March dates were an hour later than the mid-March to early-November dates (i.e., UTC-5 in winter, UTC-4 the rest of the year).

    My first thought was to modify the affected columns to use something like this data type:

    TIMESTAMP [(fractional_seconds_precision)] WITH TIME ZONE

    or with…

    TIMESTAMP '1997-01-31 09:26:56.66 +02:00'

    But I’d never seen this kind of date/time field before (the ‘esriFieldTypeDate’), so I had to investigate before I did anything (also, I had no idea how to convert a column to a data type like this, so it would have added time).
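
    For what it’s worth, a conversion along those lines might look something like the sketch below. I didn’t actually run this, and the table and column names are placeholders:

    -- Placeholder names: convert a plain TIMESTAMP column to TIMESTAMP WITH TIME ZONE,
    -- treating the stored values as UTC.
    ALTER TABLE food_inspections
      ADD (inspection_ts_tz TIMESTAMP WITH TIME ZONE);

    UPDATE food_inspections
       SET inspection_ts_tz = FROM_TZ(inspection_ts, 'UTC');

    ALTER TABLE food_inspections DROP COLUMN inspection_ts;
    ALTER TABLE food_inspections RENAME COLUMN inspection_ts_tz TO inspection_ts;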

    It turns out this data originates from ArcGIS Online, and that field type maps directly to Oracle’s ‘TIMESTAMP’ data type, as seen here in the documentation:

    In the ArcGIS documentation, the ‘esriFieldTypeDate’ appears to map to Oracle’s ‘TIMESTAMP’ data type.

    I later tinkered with an API Explorer on the Wake County datasets site and reviewed a sample JSON response for this data set. The response also confirmed the field type:

    Wake County has an API Explorer on its datasets site, where you can run a sample query and review the JSON that is returned.

    It’s funny because, after all this hunting, it looks like I won’t be able to include time zones anyway. It’s somewhat inconsequential since I’m only interested in Wake County, but for some, that could be unnerving.

    I wish I could say that I came up with a cool Python script to automate the cleaning of these columns (I would have had to do this for all three of my .CSV files), but I did not. It would have taken me an afternoon or longer to work through, since I haven’t toyed with the csv reader (in the Python standard library), pandas, or NumPy in quite some time. In the end (and it pains me to say this, embarrassingly, really), I used the ‘Find + Replace’ feature in Numbers to remove the time/time zone portion of the dates.
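
    Another route I considered (but didn’t take) would have been to load the raw timestamps as-is and strip the time portion inside the database. Something like this sketch, with placeholder table and column names:

    -- Placeholder names: keep only the calendar date, discarding the time/offset portion.
    ALTER TABLE food_inspections ADD (inspection_day DATE);

    UPDATE food_inspections
       SET inspection_day = TRUNC(CAST(inspection_ts AS DATE));

    COMMIT;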

    I’m pathetic, I know. But the idea of creating a dedicated Python script does seem like a fun weekend project. Who knows, maybe I’ll post it to GitHub. Regardless, my Autonomous Database is clean and primed for ORDS.

    In closing


    My advice is this:

    Always inspect your data before you do anything major; always make sure you understand the inputs and the resulting outputs of the data-loading process; and always refer to the documentation FIRST so you can understand how data is being recorded, coded, displayed, etc.

    Chris Hoina, 2022

    Oh, and it wouldn’t hurt to think about how you might use this data in the future. I’m guessing that could help you avoid a lot of rework (i.e., technical debt).

    Take me as an example. Since I’m focused on ORDS and REST enabling my database, I may want to consider things like:

    • What am I going to expose (via REST APIs)? or
    • What fields might the presentation layer of my application include?

    In my case, a lot of this data will lie dormant, as I might choose to focus on only a few key fields. Of course, anything I create will be simple and basic. But that might not be the case for you.

    Koala RESTing, Photo by Jordan Whitt on Unsplash

    I’m off to REST…enable this database now, but in the meantime, if you want to learn more about ORDS, check out:

    Find me

    And as always, catch me on:

  • An Oracle Autonomous Database workshop: here is what I learnt


    Pop Quiznos

    Recently, I traveled to Colorado. While flying, I finished up an Autonomous Database (ADB) workshop. What I’m saying is…I was in the clouds, on the cloud.

    In my new role, much of my work will be focused on Oracle REST Data Services (ORDS). But I’m starting out with the ADB.

    After completing the workshop, I thought I’d share initial impressions and lessons learned. But first…a crash course on our Cloud Free Tier accounts and Always Free cloud services. Both are relevant for this workshop.  

    Oracle Cloud Free Tier

    When you sign up for an Oracle Cloud Infrastructure (OCI) account, you are automatically entitled to Always Free services. You’re also given a 30-day trial with $300 USD in free credits to use on all eligible OCI services (30 days or $300, whichever comes first). But the Always Free services are available for an unlimited period. This workshop uses Always Free services (lucky you).

    Always Free

    Errthang you get with Always Free…we are most concerned with the database options though.

    On Databases

    Credits can be used on many items (see above). And that includes various databases. In the workshop you get to provision an Autonomous Data Warehouse (we’re still Always Free).  

    The Autonomous Database (ADB)

    Overview

    The ADB calls OCI home. It’s a fully managed, pre-configured database environment. Four workload types are available:

    • Autonomous Transaction Processing
    • Autonomous Data Warehouse
    • Autonomous JSON Database
    • Oracle APEX Application Development
    NOTE: Always Free entitles you to two of the above databases, each with 1 OCPU* and 20 GB of storage. There is also an option to choose a NoSQL Database with 133 million reads/writes per month and 25 GB of storage per table (up to 3 tables).
    ALSO: An “OCPU” is an Oracle Compute Unit. On the x86 (AMD/Intel) architecture, 1 OCPU is equivalent to 2 virtual processor cores (vCPUs), whereas on the Arm architecture the ratio is 1:1 (i.e., 1 OCPU = 1 vCPU).

    Automation

    Zero hardware configuration/management or software installation/configuration is required! Once your database is created, nearly everything is automated:  

    • Backups
    • Patching
    • Upgrading
    • Performance tuning

    You’re a Database Administrator (DBA) and you didn’t even know it!

    Scaling

    The ADB shines here. CPU and storage can be scaled automatically. Auto-scaling adds CPU cores beyond the base number during periods of high demand and then reduces them back to the base number as demand decreases (auto-scaling allows up to three times the base number of CPU cores at any time).

    DISCLAIMER: Auto-scaling isn’t an Always Free cloud service, but it’s still cool and worth talking about. However, should you choose to upgrade in the future, it’s very easy to do:

    To the Workshop!

    The FREE workshop consists of five labs (and an optional sixth):

    • Lab 1: Provision an Autonomous Database
    • Lab 2: Work with Free Sample Data Sets
    • Lab 3: Load Data
    • Lab 4: Query External Data
    • Lab 5: Visualize Your Data

    Once you’ve requested an Oracle Cloud account (everything can be done with the Free Tier option), you’ll then create an Autonomous Database. The instructions for requesting your account are simple and can be found here (but also on the workshop’s landing page).

    Lab 1

    Just two steps here, simple. Everything is done through the Oracle Cloud dashboard. My two takeaways:

    1. WRITE DOWN all your login/credential information! I hadn’t considered that as an Oracle employee I’d have more than one login (it can be confusing). Not to mention later, you’ll be creating an ADMIN password for the actual database (separate from your tenancy credentials).
    2. Initial database configuration. I really appreciated that when you first configure your Autonomous Database, you are automatically provided 1 OCPU and 20 GB of storage. But should you require more CPU or storage, everything can be adjusted with simple drop-down menus. It’s nice that a user (like a developer, or someone outside a System/Database Administrator role) can provision their own free playground in a matter of minutes.

    Lab 2

    My Autonomous Database, viewed in the web browser.

    Lab 2 reinforces how simple this workshop is. At this point, I’m still in the web browser. But of course, you could use the desktop option, Oracle SQL Developer.

    1. Things seemed fast. And that is saying something because, for part of this workshop (Lab 3, actually), I was in the air flying from Denver, Colorado to Raleigh, North Carolina on airline Wi-Fi, and objects were actually loading into their OCI buckets (I write about this later).
    2. My second takeaway was caching. When possible, the Autonomous Database caches query results, meaning that for subsequent queries the results are just…there (see the small illustration below). That is a big deal to me because, in my previous role, I was a Product Manager for an application performance tool. And there are a couple of things that absolutely kill an application’s response time and/or throughput:
    • CPU intensive tasks (often repetitive ones)
    • I/O processes

    TL;DR – we like fast. Fast is good 🙂
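
    To make the caching point a little more concrete: the RESULT_CACHE hint below is standard Oracle SQL, and on the Autonomous Database result caching is on by default, so treat this purely as an illustration (the SH sample schema it queries ships with the Autonomous Database):

    -- Illustration only: run it twice and the second execution should come straight
    -- from the result cache.
    SELECT /*+ RESULT_CACHE */ channel_class, COUNT(*) AS channel_count
      FROM sh.channels
     GROUP BY channel_class;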

    Lab 3

    Floppy discs to represent data in the Oracle Cloud Autonomous Database

    You’re creating a “Bucket” in OCI and then loading Objects (approximately 45 floppy discs’ worth of .CSV and .DAT files) into that Bucket. Again, this is all point-and-click and done in-browser. There is a wizard that guides you through everything. For someone who isn’t familiar with System or Database Administration, this was helpful; it made approaching “infrastructure” much more accessible and far less daunting.

    I should note that this might not be your typical workflow. You might not create a Bucket and then load Objects every time. But you will still need a URI, an authorization token, and credentials to load this data into the specified database.

    This lab was still a good exercise, even for the non-System or Database Administrator. Maybe you’re a Developer or a Business Intelligence Analyst. I don’t suspect many will perform all these tasks, but it’s important to understand how teams/roles/individual members are connected (from end to end).

    One final thing I appreciated about this lab was that there were two options for loading data into their respective locations:

    1. the DATA LOAD tools option from within the Database Actions dashboard, or
    2. the PL/SQL package, DBMS_CLOUD (see the sketch just below)
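
    Here’s a rough sketch of what that second route can look like. The credential name, username, auth token, bucket URI, and table name are all placeholders; the workshop (and the Data Load wizard) supply the real values:

    -- All values below are placeholders.
    BEGIN
      -- Store the OCI auth token as a database credential.
      DBMS_CLOUD.CREATE_CREDENTIAL(
        credential_name => 'OBJ_STORE_CRED',
        username        => 'cloud_user@example.com',
        password        => '<auth token>'
      );

      -- Copy objects from the Bucket into an existing table.
      DBMS_CLOUD.COPY_DATA(
        table_name      => 'SALES',
        credential_name => 'OBJ_STORE_CRED',
        file_uri_list   => '<bucket URI>/sales.csv',
        format          => json_object('type' value 'csv', 'skipheaders' value '1')
      );
    END;
    /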

    Lab 4

    Up until this point, I felt like I was an active participant. But here I felt like I was cheating. The code snippets you are provided allow you to create an external table directly from your cloud object store (an excerpt of the code is below; the full version is on our GitHub).

    /* Specify the URL that you copied from your files in OCI Object Storage in the define file_uri_base line below */

    set define on
    define file_uri_base = '<bucket URI>'
     
    begin
     dbms_cloud.create_external_table(
        table_name =>'CHANNELS_EXT',
        credential_name =>'OBJ_STORE_CRED',
        file_uri_list =>'&file_uri_base/chan_v3.dat',
        format => json_object('ignoremissingcolumns' value 'true', 'removequotes' value 'true'),
        column_list => 'CHANNEL_ID NUMBER,
            CHANNEL_DESC VARCHAR2(20),
            CHANNEL_CLASS VARCHAR2(20),
            CHANNEL_CLASS_ID NUMBER,
            CHANNEL_TOTAL VARCHAR2(13),
            CHANNEL_TOTAL_ID NUMBER'
     );
     
     dbms_cloud.create_external_table(
        table_name =>'COUNTRIES_EXT',
        credential_name =>'OBJ_STORE_CRED',
        file_uri_list =>'&file_uri_base/coun_v3.dat',
        format => json_object('ignoremissingcolumns' value 'true', 'removequotes' value 'true'),
        column_list => 'COUNTRY_ID NUMBER ,
            COUNTRY_ISO_CODE CHAR(2) ,
            COUNTRY_NAME VARCHAR2(40) ,
            COUNTRY_SUBREGION VARCHAR2(30) ,
            COUNTRY_SUBREGION_ID NUMBER ,
            COUNTRY_REGION VARCHAR2(20) ,
            COUNTRY_REGION_ID NUMBER ,
            COUNTRY_TOTAL VARCHAR2(11) ,
            COUNTRY_TOTAL_ID NUMBER ,
            COUNTRY_NAME_HIST VARCHAR2(40)'
     );
    end;
    /

    You can then query that external table as if it were a regular table, even though the data itself stays outside the database, in object storage. Without that code snippet, I would have struggled to do this on my own (or at least, not without some research). That is more of a “me” problem, though.

    I appreciate that this method of querying is available, especially if you are only concerned with a small subset of your data. Leaving the bulk of the data in object storage and pulling in only what you need can certainly make subsequent work quicker.
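
    For example, once the CHANNELS_EXT table from the excerpt above exists, querying it looks just like querying a regular table, even though the data never leaves object storage. You can also have DBMS_CLOUD validate the definition first (the filter value below is only illustrative):

    -- Optional sanity check: confirm the external table definition matches the source files.
    BEGIN
      DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE(table_name => 'CHANNELS_EXT');
    END;
    /

    -- The data stays in the Bucket; only the results come back.
    SELECT channel_id, channel_desc, channel_class
      FROM channels_ext
     WHERE channel_class = 'Direct';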

    Lab 5

    Here we try our hand at business intelligence. You begin by downloading Oracle Analytics Desktop. The tool reminded me of Tableau or one of the SAS data visualization products that are available. 

    Consider this: Biases aside, Oracle Analytics Desktop is free. And you can connect to your database within a matter of minutes. 
    
    If it’s forecasting and visualizations you are looking for, this thing looks plenty powerful.

    Much like before, everything is GUI-based. But you can also do much of the setup in Oracle SQL Developer, or from within the browser. With some provided SQL, I defined a table from the data I imported in a previous lab (a hypothetical stand-in is sketched below). From there, you get some practice establishing a secure connection to your database and then importing that table into Oracle Analytics Desktop.
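
    I won’t reproduce the provided SQL here, but the general shape of “define a table from previously imported data” is something like this stand-in (not the workshop’s actual script):

    -- Hypothetical stand-in: materialize a regular table from the external table
    -- so the analytics tool has a concrete object to import.
    CREATE TABLE channels AS
      SELECT * FROM channels_ext;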

    I don’t know if I should spend too much time here, because currently my focus isn’t data analysis/business intelligence. But once you establish a secure connection to your database, you can easily drag and drop data into the tool’s visualization pane and later manipulate the visualizations in countless ways.

    My biggest takeaway here is that once you have created a wallet from within your OCI dashboard and established a connection to your data warehouse (from within Oracle Analytics Desktop), things are very intuitive.

    Everything is drag and drop. And since you’ll pre-define and save a subset of your data, you can’t break anything; you can just start over by pulling the data down again. Truthfully, if your focus isn’t Business Intelligence, then you can (and some are going to hate this) skim this lab to get the gist.

    The End

    There is a totally optional Lab 6, where you export your newly created project to a .DVA file. But I skimmed that and got the idea.

    Like I mentioned before, this content will become increasingly technical. I’ll be doing some exploring with the APIs in ORDS. And I’ll see what I can do with Flask + Python + ORDS. I’ve found some good references on GitHub, and of course I’ll share the [heavily commented] code here and on GitHub too!

    And as always, you can find me here:

    Until next time. Love, peace, and chicken grease.