Tuesday, October 1 • 4:30pm - 5:30pm
Python Pipeline Primer: Data Engineering with Azure DataBricks

Azure DataBricks brings a Platform-as-a-Service offering of Apache Spark, which allows for blazing-fast data processing, interactive querying and the hosting of machine learning models all in one place! But most of the buzz is around what it means for Data Science & AI - what about the humble data engineer who wants to harness the in-memory processing power within their ETL pipelines? How does it fit into the Modern Data Warehouse? What does data preparation look like in this new world?

This session will run through the best practices of implementing Azure DataBricks as your data ingestion, transformation and curation tool of choice. We will:
• Introduce the Azure DataBricks service
• Introduce Python and why it is the language of choice for Data Engineering on DataBricks
• Discuss the various hosting & compute options available
• Demonstrate a sample data processing task
• Compare and contrast against alternative approaches using SSIS, U-SQL and HDInsight
• Demonstrate how to manage and orchestrate your processing pipelines
• Review the wider architectures and additional extension patterns

The session is aimed at Data Engineers & BI Professionals seeking to put the Azure DataBricks technology in the right context and learn how to use the service. We will not be covering the Python programming language in detail.

Speakers
Simon Whiteley

Coming from a world of traditional BI structures, Simon's now obsessed with utilising cloud technologies to revolutionise these traditions. Does Kimball translate to a serverless lambda architecture? Does rapidly evolving data interaction change our approaches? Is what was right six...


Tuesday October 1, 2019 4:30pm - 5:30pm CEST
Room 4