aws / Aws SDK Pandas: pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
aws-samples / Aws Glue Samples: AWS Glue code samples
awslabs / Aws Glue Libs: AWS Glue Libraries are additions and enhancements to Spark for ETL operations.
dgomesbr / Awesome Aws Workshops: (Unofficial) curated list of awesome workshops found around the internet. As we have all been there, finding the workshop that you have just attended shouldn't be hard. The idea is to provide an easy central repository, in a collaborative way.
tokern / Piicatcher: Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub.
streamthoughts / Jikkou: The open source Resource as Code framework for Apache Kafka. Jikkou helps you implement GitOps for Kafka at scale!
data-dot-all / Dataall: A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
awslabs / Aws Glue Data Catalog Client For Apache Hive Metastore: The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog as a central repository to store structural and operational metadata for their data. AWS Glue provides out-of-the-box integration with Amazon EMR that enables customers to use the AWS Glue Data Catalog as an external Hive Metastore. This is an open-source implementation of the Apache Hive Metastore client on Amazon EMR clusters that uses the AWS Glue Data Catalog as an external Hive Metastore. It serves as a reference implementation for building a Hive Metastore-compatible client that connects to the AWS Glue Data Catalog. It may be ported to other Hive Metastore-compatible platforms such as other Hadoop and Apache Spark distributions.
airscholar / RedditDataEngineering: This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.
awsdocs / Aws Glue Developer Guide: The open source version of the AWS Glue docs. You can submit feedback & requests for changes by submitting issues in this repo or by making proposed changes & submitting a pull request.
aws-samples / Data Lake As Code: Data Lake as Code, featuring ChEMBL and OpenTargets
awslabs / Aws Glue Schema Registry: AWS Glue Schema Registry Client library provides serializers / de-serializers for applications to integrate with AWS Glue Schema Registry Service. The library currently supports Avro, JSON and Protobuf data formats. See https://docs.aws.amazon.com/glue/latest/dg/schema-registry.html to get started.
awslabs / Athena Glue Service Logs: Glue scripts for converting AWS Service Logs for use in Athena
aws-samples / Amazon Deequ Glue: Automated data quality suggestions and analysis with Deequ on AWS Glue
aws-samples / Cloud Experiments: Open innovation with 60-minute cloud experiments on AWS
aws-samples / Streamlit Application Deployment On Aws: Streamlit EDA Dashboard Powered by AWS Cloud
aws-samples / Aws Glue Data Catalog Replication Utility: Replication utility for AWS Glue Data Catalog
Ditectrev / Amazon Web Services Certified AWS Certified Machine Learning MLS C01 Practice Tests Exams Question: ⛳️ PASS: Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.
awslabs / Aws Glue Blueprint Libs: No description available
aws-samples / Aws Ml Data Lake Workshop: As customers move from building data lakes and analytics on AWS to building machine learning solutions, one of their biggest challenges is getting visibility into their data for feature engineering and data format conversions for using AWS SageMaker. In this workshop, we demonstrate best practices and build data pipelines for training data using Amazon Kinesis Data Firehose, AWS Glue, and Amazon SageMaker, and then we use Amazon SageMaker for inference.