Automating Typical Tableau Tasks With Python: Part 1
Tableau Server exposes a REST API that allows you to control and interact with almost all aspects of the server. The Tableau Server Client (TSC) library for Python can be used to automate many of the housekeeping tasks that Tableau developers and support teams encounter.
Authentication
Let's start building a library of useful Tableau automations by creating a TableauHelper class and adding a method to handle authentication on the server.
The Tableau API is evolving, so it is highly recommended to expose a version parameter that lets you pick the API version your Tableau server supports.
More details on API versions and supported features here.
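A minimal sketch of such a class is shown below. It assumes username and password authentication through TSC; the constructor parameters (server_url, site, project, api_version) and the sign_out method are illustrative names, not something the library prescribes.

```python
class TableauHelper:
    """Small wrapper around Tableau Server Client (TSC) for routine automation."""

    def __init__(self, server_url, username, password, site="", project=None, api_version=None):
        self.server_url = server_url
        self.username = username
        self.password = password
        self.site = site                # content URL of the site; "" means the default site
        self.project = project          # optional project name used to narrow searches
        self.api_version = api_version  # e.g. "3.9"; None lets TSC ask the server
        self.server = None

    def sign_in(self):
        """Sign in and keep the authenticated server object on the instance."""
        # Imported here so the class can be defined without the package installed;
        # install it with `pip install tableauserverclient`.
        import tableauserverclient as TSC

        auth = TSC.TableauAuth(self.username, self.password, site_id=self.site)
        server = TSC.Server(self.server_url)
        if self.api_version:
            server.version = self.api_version  # pin the REST API version explicitly
        else:
            server.use_server_version()        # otherwise use the version the server reports
        server.auth.sign_in(auth)
        self.server = server
        return server

    def sign_out(self):
        """End the session if one is open."""
        if self.server is not None:
            self.server.auth.sign_out()
            self.server = None
```

Pinning api_version keeps the behaviour predictable when the server is upgraded, while leaving it as None is convenient for quick scripts.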
Refresh Data Source
Now we can add something actually valuable to the class by implementing a method to handle the data source refresh.
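A sketch of what such a method could look like follows. To keep the example self-contained it is written as a plain function that takes an already signed-in TSC server object; in the class it would be a method using self.server. The function name and the project_name parameter are assumptions made for illustration.

```python
def refresh_datasource(server, datasource_name, project_name=None):
    """Trigger a refresh for the data source with the given name."""
    # datasources.get() returns (items, pagination_item); only the first page
    # of results is inspected in this sketch.
    all_datasources, _ = server.datasources.get()

    # Filter by name and, optionally, by project to reduce the chance of
    # picking up a different object that happens to share the name.
    matches = [
        ds for ds in all_datasources
        if ds.name == datasource_name
        and (project_name is None or ds.project_name == project_name)
    ]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one data source named {datasource_name!r}, found {len(matches)}"
        )

    # refresh() queues a job on the server and returns a job item.
    job = server.datasources.refresh(matches[0])
    return {"data_source": matches[0].name, "job_id": job.id}
```

Raising an error when the name is ambiguous is a deliberate choice here: it is safer to stop than to refresh the wrong object.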
So, what’s going on in the code above?
In order to call datasources.refresh we need to provide a DatasourceItem object. There are two ways of retrieving that for the desired data source:
- use the datasources.get_by_id method, which requires you to know the ID of the data source
- or retrieve the list of all data sources available on the server with datasources.get and then filter it
The second option can be used with data source names, which makes it more user-friendly than the ID option. But it comes with a small risk that a refresh might be triggered for the wrong object. Setting the site and project parameters in the TableauHelper class instance narrows the scope in which objects are searched, hence minimising the risk.
When the datasources.refresh method is called, it creates a new refresh job on the server and returns a job object, so the method can report the job ID:
{
'data_source': 'Datasource_Name',
'job_id': 'xxxxxxxx-yyyy-zzzz-1111-123456789012'
}
The above example can easily be repurposed to trigger a workbook refresh by creating a new method that calls workbooks.get and workbooks.refresh.
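As a rough illustration, the workbook variant could look like this. It again assumes a signed-in TSC server object is passed in, and the function name and project_name parameter are hypothetical.

```python
def refresh_workbook(server, workbook_name, project_name=None):
    """Trigger a refresh for the workbook with the given name."""
    # workbooks.get() returns (items, pagination_item); first page only here.
    all_workbooks, _ = server.workbooks.get()
    matches = [
        wb for wb in all_workbooks
        if wb.name == workbook_name
        and (project_name is None or wb.project_name == project_name)
    ]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one workbook named {workbook_name!r}, found {len(matches)}"
        )

    # The workbook ID is passed to refresh(), which returns a job item.
    job = server.workbooks.refresh(matches[0].id)
    return {"workbook": matches[0].name, "job_id": job.id}
```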
Get Job Status
Sometimes it makes sense not only to create a job on the Tableau server but also to wait for this job to complete and retrieve its status.
We can add a new method to the TableauHelper class to facilitate that.
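One possible shape for that method is sketched below, again as a standalone function over a signed-in TSC server object. The mapping of finish codes to labels follows the REST API convention (0 for success, 1 for error, 2 for cancelled); the function name, the poll_interval parameter and the date format are assumptions.

```python
import time

# Finish codes reported by the REST API once a job has completed.
FINISH_CODES = {0: "Success", 1: "Failed", 2: "Cancelled"}


def _format_time(moment):
    """Render a datetime like '2021-01-25 17:58:53 UTC'; pass None through."""
    return moment.strftime("%Y-%m-%d %H:%M:%S %Z") if moment else None


def get_job_status(server, job_id, poll_interval=1.0):
    """Poll the server until the job completes, then summarise its outcome."""
    job = server.jobs.get_by_id(job_id)
    while job.completed_at is None:  # the job is still queued or running
        time.sleep(poll_interval)
        job = server.jobs.get_by_id(job_id)

    finish_code = int(job.finish_code)
    return {
        "job_id": job.id,
        "job_status": FINISH_CODES.get(finish_code, "Unknown"),
        "finish_code": finish_code,
        "started_at": _format_time(job.started_at),
        "completed_at": _format_time(job.completed_at),
    }
```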
The code above polls the server for the job status every second until the job is completed.
And now the refresh_datasources method can be altered slightly to provide more information about job execution:
{
'completed_at': '2021-01-25 17:58:53 UTC',
'data_source': 'Datasource_Name',
'finish_code': 0,
'job_id': 'xxxxxxxx-yyyy-zzzz-1111-123456789012',
'job_status': 'Success',
'started_at': '2021-01-25 17:58:22 UTC'
}
When the above functionality is implemented, refreshing a Tableau data source and getting the refresh status will be as easy as:
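The following is a compact usage sketch rather than the exact code behind the article. It folds the refresh and the status polling into one hypothetical function, refresh_and_wait, and the server URL, credentials, site and API version in main are placeholders.

```python
import time


def refresh_and_wait(server, datasource_name, poll_interval=1.0):
    """Refresh a data source by name and block until the job completes."""
    datasources, _ = server.datasources.get()  # first page of results only
    target = next((ds for ds in datasources if ds.name == datasource_name), None)
    if target is None:
        raise LookupError(f"data source {datasource_name!r} not found")

    job = server.datasources.refresh(target)
    while job.completed_at is None:  # keep polling until the server marks it done
        time.sleep(poll_interval)
        job = server.jobs.get_by_id(job.id)

    return {
        "data_source": target.name,
        "job_id": job.id,
        "finish_code": int(job.finish_code),
    }


def main():
    # Requires `pip install tableauserverclient`; all values below are placeholders.
    import tableauserverclient as TSC

    auth = TSC.TableauAuth("username", "password", site_id="my_site")
    server = TSC.Server("https://tableau.example.com")
    server.version = "3.9"
    with server.auth.sign_in(auth):  # signs out automatically on exit
        print(refresh_and_wait(server, "Datasource_Name"))
```

Calling main() against a real server would print a dictionary similar to the one shown earlier.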
The above examples give a general idea of how to authenticate on the Tableau server, trigger a data source refresh and retrieve the job status from the server.
There are a few possible improvements that come to mind:
- trigger refreshes and collect job statuses asynchronously. This is not very valuable if you just want to trigger a refresh and forget about it, but job execution can take some time, and you don't want your code to wait for one job to finish before moving on to the next one
- add some kind of polling timeout for long-running jobs.
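Both ideas can be approximated without threads: trigger all refreshes first, then poll the pending jobs together and give up after a deadline. The sketch below takes that approach; wait_for_jobs, poll_interval and timeout are hypothetical names, and the server argument is assumed to be a signed-in TSC server object.

```python
import time


def wait_for_jobs(server, job_ids, poll_interval=1.0, timeout=600.0):
    """Poll several jobs together; return {job_id: finish_code} once all are done."""
    deadline = time.monotonic() + timeout
    pending = list(job_ids)
    finished = {}

    while pending:
        still_pending = []
        for job_id in pending:
            job = server.jobs.get_by_id(job_id)
            if job.completed_at is None:
                still_pending.append(job_id)  # not finished yet, check again next round
            else:
                finished[job_id] = int(job.finish_code)
        pending = still_pending

        if not pending:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"jobs still running after {timeout} s: {pending}")
        time.sleep(poll_interval)

    return finished
```

Because every round checks all outstanding jobs, a slow refresh no longer blocks the status collection for the others, and the deadline bounds the total wait.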