Devrel

This repository contains the notebooks and presentations we use for our Databricks Tech Talks

tech-talks

You can find links to the tech talks below as well as the notebooks for these sessions directly in the repo.

Sections

<a name="Upcoming-Tech-Talks"/>

Upcoming-Tech-Talks

<img src="./images/Machine_learning_Black-01.png" width="32"/> 2020-04-29 - Workshop | Introduction to Data Analysis for Aspiring Data Scientists: Introduction to Apache Spark

<blockquote> This workshop covers the fundamentals of Apache Spark, the most popular big data processing engine. In this workshop, you will learn how to ingest data with Spark, analyze the Spark UI, and gain a better understanding of distributed computing. We will be using data released by the <a href="https://github.com/CSSEGISandData/COVID-19" target="_blank">Johns Hopkins Center for Systems Science and Engineering (CSSE) Novel Coronavirus (COVID-19)</a>. Prior basic Python experience is recommended. </blockquote><br/> <img src="./images/introduction-to-data-analysis-for-aspiring-data-scientists-part-4.jpg" width="800"/><br/>

<img src="https://pages.databricks.com/rs/094-YMS-629/images/delta-lake-tiny-logo.png"> 2020-04-30 - Using Delta as a Change Data Capture Source

<blockquote> It is common to use Delta Lake as a sink for change data captured from traditional data sources, but customers are increasingly asking how to use Delta tables as a source for a change data capture (CDC) process. Put differently, how can we read a stream of changes from a Delta table so that they can be propagated downstream? In each of these cases, we want to capture a change stream from a Delta table and send it somewhere for further processing. In this session, we will discuss the architecture, use cases, and solutions. </blockquote> <img src="./images/using-delta-as-a-change-data-capture-source.jpeg" width="800"/><br/> <a name="Featured"/>

Featured

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/7/7e/Health_-_The_Noun_Project.svg/240px-Health_-_The_Noun_Project.svg.png" width="32"/> Notebook | Johns Hopkins CSSE COVID-19 Analysis

<blockquote> This notebook processes and runs a quick analysis of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE (https://github.com/CSSEGISandData/COVID-19). The data is updated regularly in the `/databricks-datasets/COVID/CSSEGISandData/` location, so you can access it directly. The following animated GIF shows COVID-19 confirmed cases and deaths per 100K people, per the Johns Hopkins CSSE dataset, from March 22 to April 14, 2020. </blockquote><br/> <img src="./images/covid-19_jhu_v3.gif" width="800" style="border:1px solid black"/><br/>
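The per-100K figures shown in the animation normalize raw counts by population. A minimal sketch of that arithmetic follows. The region names, counts, and populations are hypothetical and do not come from the Johns Hopkins dataset, and the notebook itself may compute the rate differently.

```python
def per_100k(count: int, population: int) -> float:
    """Scale a raw count to a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# Hypothetical rows: (region, confirmed cases, deaths, population).
rows = [
    ("Region A", 500, 10, 1_000_000),
    ("Region B", 500, 10, 250_000),
]

rates = {
    region: (per_100k(confirmed, population), per_100k(deaths, population))
    for region, confirmed, deaths, population in rows
}

for region, (confirmed_rate, death_rate) in rates.items():
    print(region, confirmed_rate, death_rate)
```

Both hypothetical regions report the same raw counts, but the smaller region has a rate four times higher, which is why per-capita rates are easier to compare across regions than raw totals.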

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/7/7e/Health_-_The_Noun_Project.svg/240px-Health_-_The_Noun_Project.svg.png" width="32"/> Notebook | NY Times COVID-19 Analysis

<blockquote> This notebook processes and runs a quick analysis of the NY Times COVID-19 dataset (https://github.com/nytimes/covid-19-data). The data is updated regularly in the `/databricks-datasets/COVID/covid-19-data/` location, so you can access it directly. The following animated GIFs show COVID-19 confirmed cases and deaths per 100K people from the NY Times dataset, spanning a two-week window around when educational facilities were closed in Washington (3/13) and New York (3/18) states. </blockquote><br/> <table border=0 cellpadding=0 cellspacing=0> <tr> <td><img src="./images/covid-19_nyt_wa_edu.gif" width="400"/></td> <td><img src="./images/covid-19_nyt_ny_edu.gif" width="400"/></td> </tr> </table> <br/> <a name="Previous-Tech-Talks"/>

Previous-Tech-Talks

<img src="https://pages.databricks.com/rs/094-YMS-629/images/delta-lake-tiny-logo.png"> 2020-04-23 - Predictive Maintenance (PdM) on IoT Data for Early Fault Detection w/ Delta Lake

<blockquote> Predictive Maintenance (PdM) differs from routine or time-based maintenance approaches: it combines various sensor readings with sophisticated analytics on thousands of logged events in near real time. Because tasks are performed only when warranted, it promises severalfold improvements in cost savings. The collaborative Data and Analytics platform from Databricks is a great technology fit for these use cases, providing a single unified platform to ingest the sensor data, perform the necessary transformations and exploration, run ML, and generate valuable insights. </blockquote><br/>

<img src="./images/Machine_learning_Black-01.png" width="32"/> 2020-04-22 - Workshop | Introduction to Data Analysis for Aspiring Data Scientists: Machine Learning with scikit-learn

<blockquote> scikit-learn is one of the most popular open-source machine learning libraries among data science practitioners. This workshop will walk through what machine learning is, the different types of machine learning, and how to build a simple machine learning model. This workshop focuses on the techniques of applying and evaluating machine learning methods, rather than the statistical concepts behind them. We will be using data released by the <a href="https://github.com/CSSEGISandData/COVID-19" target="_blank">Johns Hopkins Center for Systems Science and Engineering (CSSE) Novel Coronavirus (COVID-19)</a>. Prior basic Python experience is recommended. </blockquote><br/>

<img src="https://pages.databricks.com/rs/094-YMS-629/images/delta-lake-tiny-logo.png"> 2020-04-16 - Diving into Delta Lake: DML Internals

<blockquote> In the earlier sessions of the Delta Lake Internals webinar series, we described how the Delta Lake transaction log works. In this session, we will dive deeper into how commits, snapshot isolation, and partitions and files change when performing deletes, updates, merges, and structured streaming. </blockquote><br/>
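To make the idea of files changing under a delete concrete, here is a simplified copy-on-write sketch. It is a toy model and not Delta Lake's implementation: the `delete_where` helper, the in-memory file dictionary, and the file naming are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Row = dict
Files = Dict[str, List[Row]]  # file name -> rows stored in that immutable file


def delete_where(
    files: Files, predicate: Callable[[Row], bool], version: int
) -> Tuple[Files, List[dict]]:
    """Return (new_files, actions) for a copy-on-write delete.

    Files are never edited in place. A file that contains matching rows is
    marked as removed and, if any rows survive, rewritten under a new name.
    """
    new_files: Files = {}
    actions: List[dict] = []
    for name, rows in files.items():
        kept = [row for row in rows if not predicate(row)]
        if len(kept) == len(rows):
            new_files[name] = rows  # untouched file is carried over as-is
            continue
        actions.append({"remove": name})
        if kept:
            new_name = f"part-v{version}-{name}"  # hypothetical naming scheme
            new_files[new_name] = kept
            actions.append({"add": new_name})
    return new_files, actions


files = {
    "f1": [{"id": 1}, {"id": 2}],
    "f2": [{"id": 3}],
}
new_files, actions = delete_where(files, lambda row: row["id"] == 2, version=1)
print(actions)
print(sorted(new_files))
```

Because the original `files` mapping is left untouched, a reader that started before the delete still sees a consistent older snapshot, which is the intuition behind snapshot isolation in this toy model.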

<img src="./images/Machine_learning_Black-01.png" width="32"/> 2020-04-15 - Workshop | Introduction to Data Analysis for Aspiring Data Scientists: Data Analysis with Pandas

<blockquote> This workshop is on pandas, a powerful open-source Python package for data analysis and manipulation. In this workshop, you will learn how to read data, compute summary statistics, check data distributions, conduct basic data cleaning and transformation, and plot simple visualizations. We will be using data released by the <a href="https://github.com/CSSEGISandData/COVID-19" target="_blank">Johns Hopkins Center for Systems Science and Engineering (CSSE) Novel Coronavirus (COVID-19)</a>. Prior basic Python experience is recommended. </blockquote><br/>

<img src="./images/Machine_learning_Black-01.png" width="32"/> 2020-04-08 - Workshop | Introduction to Data Analysis for Aspiring Data Scientists: Introduction to Python on Databricks

<blockquote> Python is a popular programming language because of its wide applications including but not limited to data analysis, machine learning, and web development. This workshop covers major foundational concepts necessary for you to start coding in Python, with a focus on data analysis. You will learn about different types of variables, for loops, functions, and conditional statements. No prior programming knowledge is required. </blockquote><br/>
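As a small taste of the concepts listed above, the following sketch combines variables of different types, a function, a conditional statement, and a for loop. The names and values are made up for illustration and are not taken from the workshop notebooks.

```python
# Variables of different types.
workshop_title = "Introduction to Python"  # str
daily_changes = [-2, 0, 5]                 # list of int
is_beginner_friendly = True                # bool


def classify(change: int) -> str:
    """Describe a day-over-day change using a conditional statement."""
    if change < 0:
        return "decrease"
    elif change == 0:
        return "no change"
    else:
        return "increase"


# A for loop applies the function to every value in the list.
labels = []
for change in daily_changes:
    labels.append(classify(change))

print(workshop_title, labels, is_beginner_friendly)
```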

<img src="https://pages.databricks.com/rs/094-YMS-629/images/delta-lake-tiny-logo.png"> 2020-04-02 - Diving into Delta Lake: Enforcing and Evolving Schema

<blockquote> As business problems and requirements evolve over time, so too does the structure of your data. With Delta Lake, as the data changes, incorporating new dimensions is easy. Users have access to simple semantics to control the schema of their tables. These tools include schema enforcement, which prevents users from accidentally polluting their tables with mistakes or garbage data, as well as schema evolution, which enables them to automatically add new columns of rich data when those columns belong. In this webinar, we’ll dive into the use of these tools. </blockquote><br/>
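The difference between schema enforcement and schema evolution can be sketched with a toy in-memory table. This is a conceptual illustration only: `ToyTable`, `SchemaMismatch`, and the `merge_schema` flag are hypothetical names for this sketch, loosely modeled on the behavior described above, and are not Delta Lake's API.

```python
class SchemaMismatch(Exception):
    """Raised when a write does not match the table schema."""


class ToyTable:
    def __init__(self, schema):
        self.schema = dict(schema)  # column name -> Python type
        self.rows = []

    def append(self, rows, merge_schema=False):
        new_columns = {}
        # Validate every row before writing anything, so a rejected batch
        # leaves the table unchanged.
        for row in rows:
            for column, value in row.items():
                expected = self.schema.get(column) or new_columns.get(column)
                if expected is None:
                    if not merge_schema:
                        raise SchemaMismatch(f"unexpected column: {column}")
                    new_columns[column] = type(value)  # schema evolution
                elif not isinstance(value, expected):
                    raise SchemaMismatch(
                        f"{column}: expected {expected.__name__}"
                    )
        self.schema.update(new_columns)
        self.rows.extend(rows)


table = ToyTable({"id": int, "cases": int})
table.append([{"id": 1, "cases": 10}])

# Enforcement: a write with an unknown column is rejected by default.
try:
    table.append([{"id": 2, "cases": 7, "state": "WA"}])
    rejected = False
except SchemaMismatch:
    rejected = True

# Evolution: the same write succeeds when the caller opts in.
table.append([{"id": 2, "cases": 7, "state": "WA"}], merge_schema=True)
print(rejected, sorted(table.schema), len(table.rows))
```

Making evolution opt-in keeps accidental columns out of the table while still letting intentional changes through.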

<img src="https://pages.databricks.com/rs/094-YMS-629/images/delta-lake-tiny-logo.png"> 2020-03-26 - Diving into Delta Lake: Unpacking the Transaction Log

<blockquote> The transaction log is key to understanding Delta Lake because it is the common thread that runs through many of its most important features, including ACID transactions, scalable metadata handling, time travel, and more. In this session, we’ll explore what the Delta Lake transaction log is, how it works at the file level, and how it offers an elegant solution to the problem of multiple concurrent reads and writes. </blockquote><br/>
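A rough way to picture how the log works at the file level is an ordered series of commit files, each listing files added to or removed from the table. Replaying the commits in order yields the set of live data files, and stopping early yields an older version. The sketch below is a simplified model with hypothetical commit contents; the real log records more action types and metadata than shown here.

```python
import json

# Hypothetical commits: commit file name -> newline-delimited JSON actions.
log = {
    "00000000000000000000.json": (
        '{"add": {"path": "part-0.parquet"}}\n'
        '{"add": {"path": "part-1.parquet"}}'
    ),
    "00000000000000000001.json": (
        '{"remove": {"path": "part-0.parquet"}}\n'
        '{"add": {"path": "part-2.parquet"}}'
    ),
}


def snapshot(log, version=None):
    """Replay commits in order and return the live data files.

    Passing a version stops the replay early, which is the idea behind
    reading an older state of the table.
    """
    live = set()
    for name in sorted(log):
        commit_version = int(name.split(".")[0])
        if version is not None and commit_version > version:
            break
        for line in log[name].splitlines():
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return sorted(live)


latest = snapshot(log)
version_0 = snapshot(log, version=0)
print(latest)
print(version_0)
```

Since each commit is a separate, ordered file, every reader that replays the same commits arrives at the same table state.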

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/7/7e/Health_-_The_Noun_Project.svg/240px-Health_-_The_Noun_Project.svg.png" width="32" /> 2020-03-19 - Analyzing COVID-19: Can the Data Community Help?

<blockquote> With the current concerns over SARS-CoV-2 and COVID-19, there are now various COVID-19 datasets on Kaggle and GitHub, competitions such as the <a href="https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge" target="_blank">COVID-19 Open Researc