172 skills found · Page 4 of 6
mrezanejad / LineDrawingExtraction
Automatic line drawing extraction from scenes/images using logical/linear operators framework
alexellis / Mkaas
mkaas: minikube on Kubernetes with CRDs
ultranet1 / APACHE AIRFLOW DATA PIPELINES
Project Description: A music streaming company wants to introduce more automation and monitoring to their data warehouse ETL pipelines, and they have concluded that the best tool to achieve this is Apache Airflow. As their Data Engineer, I was tasked with creating a reusable, production-grade data pipeline that incorporates data quality checks and allows for easy backfills. Several analysts and data scientists rely on the output generated by this pipeline, and it is expected to run daily on a schedule, pulling new data from the source and storing the results in the destination.
Data Description: The source data resides in S3 and needs to be processed in a data warehouse in Amazon Redshift. The source datasets consist of JSON logs describing user activity in the application and JSON metadata about the songs the users listen to.
Data Pipeline Design: At a high level the pipeline performs the following tasks:
- Extract data from multiple S3 locations.
- Load the data into a Redshift cluster.
- Transform the data into a star schema.
- Perform data validation and data quality checks.
- Calculate the most played songs for the specified time interval.
- Load the result back into S3.
[Image: Structure of the Airflow DAG]
Design Goals: Based on the requirements of our data consumers, the pipeline is required to adhere to the following guidelines:
- The DAG should not have any dependencies on past runs.
- On failure, the task is retried 3 times.
- Retries happen every 5 minutes.
- Catchup is turned off.
- Do not email on retry.
Pipeline Implementation: Apache Airflow is a Python framework for programmatically creating workflows as DAGs, e.g. ETL processes, generating reports, and retraining models on a daily basis. The Airflow UI automatically parses our DAG and creates a natural representation of the movement and transformation of data. A DAG is simply a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies. A DAG describes how you want to carry out your workflow, and Operators determine what actually gets done. By default, Airflow comes with simple built-in operators such as PythonOperator, BashOperator and DummyOperator; however, Airflow also lets you extend BaseOperator to create custom operators. For this project, I developed several custom operators. The description of each of these operators follows (a minimal sketch of one of them is shown after this list):
- StageToRedshiftOperator: Stages data to a specified Redshift cluster from a specified S3 location. The operator uses templated fields to handle partitioned S3 locations.
- LoadFactOperator: Loads data into the given fact table by running the provided SQL statement. Supports delete-insert and append style loads.
- LoadDimensionOperator: Loads data into the given dimension table by running the provided SQL statement. Supports delete-insert and append style loads.
- SubDagOperator: Two or more operators can be grouped into one task using the SubDagOperator. Here, I group the tasks of checking that a given table has rows and then running a series of data quality SQL commands.
- HasRowsOperator: Data quality check to ensure that the specified table has rows.
- DataQualityOperator: Performs data quality checks by running SQL statements to validate the data.
- SongPopularityOperator: Calculates the top ten most popular songs for a given interval. The interval is dictated by the DAG schedule.
- UnloadToS3Operator: Stores the analysis result back to the given S3 location.
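To make the custom-operator pattern concrete, here is a minimal sketch of what the HasRowsOperator described above might look like. The class name comes from the list above, but the constructor arguments, connection id and import paths (which differ between Airflow 1.x and 2.x) are assumptions rather than the project's actual code:

    from airflow.models import BaseOperator
    from airflow.hooks.postgres_hook import PostgresHook  # airflow.providers.postgres.hooks.postgres in Airflow 2.x

    class HasRowsOperator(BaseOperator):
        """Data quality check: fail the task if the target table has no rows."""

        def __init__(self, redshift_conn_id="redshift", table="", *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.redshift_conn_id = redshift_conn_id
            self.table = table

        def execute(self, context):
            # Redshift speaks the Postgres protocol, so a Postgres hook can run the check.
            redshift = PostgresHook(postgres_conn_id=self.redshift_conn_id)
            records = redshift.get_records(f"SELECT COUNT(*) FROM {self.table}")
            if not records or not records[0] or records[0][0] < 1:
                raise ValueError(f"Data quality check failed: {self.table} contains no rows")
            self.log.info("Data quality check on %s passed with %s records", self.table, records[0][0])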
Code for each of these operators is located in the plugins/operators directory.
Pipeline Schedule and Data Partitioning: The events data residing on S3 is partitioned by year (2018) and month (11). Our task is to incrementally load the event JSON files and run them through the entire pipeline to calculate song popularity and store the result back into S3. In this manner, we can obtain the top songs per day in an automated fashion using the pipeline. Please note that this is a trivial analysis, but you can imagine other complex queries that follow a similar structure.
S3 input events data: s3://<bucket>/log_data/2018/11/
    2018-11-01-events.json
    2018-11-02-events.json
    2018-11-03-events.json
    ..
    2018-11-28-events.json
    2018-11-29-events.json
    2018-11-30-events.json
S3 output song popularity data: s3://skuchkula-topsongs/
    songpopularity_2018-11-01
    songpopularity_2018-11-02
    songpopularity_2018-11-03
    ...
    songpopularity_2018-11-28
    songpopularity_2018-11-29
    songpopularity_2018-11-30
The DAG can be configured by giving it default_args, which specify the start_date, end_date and the other design choices mentioned above:
    default_args = {
        'owner': 'shravan',
        'start_date': datetime(2018, 11, 1),
        'end_date': datetime(2018, 11, 30),
        'depends_on_past': False,
        'email_on_retry': False,
        'retries': 3,
        'retry_delay': timedelta(minutes=5),
        'catchup_by_default': False,
        'provide_context': True,
    }
How to run this project?
Step 1: Create an AWS Redshift cluster using either the console or the notebook provided in create-redshift-cluster. Run the notebook to create the cluster and make a note of:
    DWN_ENDPOINT :: dwhcluster.c4m4dhrmsdov.us-west-2.redshift.amazonaws.com
    DWH_ROLE_ARN :: arn:aws:iam::506140549518:role/dwhRole
Step 2: Start Apache Airflow. Run docker-compose up from the directory containing docker-compose.yml. Ensure that you have mapped the volume to point to the location where you have your DAGs. NOTE: You can find details of how to manage Apache Airflow on a Mac here: https://gist.github.com/shravan-kuchkula/a3f357ff34cf5e3b862f3132fb599cf3
Step 3: Configure the Apache Airflow hooks. On the left is the S3 connection; the login and password are the IAM user's access key and secret key that you created. Using these credentials, we are able to read data from S3. On the right is the Redshift connection; these values can be gathered from your Redshift cluster.
Step 4: Execute the create-tables-dag. This DAG creates the staging, fact and dimension tables. We trigger it manually because we want to keep table creation out of the main DAG. Normally, creating tables can be handled by simply triggering a script, but for the sake of illustration I created a DAG for this and had Airflow trigger it. You can turn off the DAG once it has completed. After running this DAG, you should see all the tables created in AWS Redshift.
Step 5: Turn on the load_and_transform_data_in_redshift DAG. As the execution start date is 2018-11-01 with a schedule interval of @daily and the execution end date is 2018-11-30, Airflow will automatically trigger and schedule the DAG runs once per day, 30 times. The 30 DAG runs, ranging from start_date to end_date, are each triggered by Airflow once per day.
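To show how the default_args and the @daily schedule come together, here is a minimal sketch of the DAG definition. The dag_id matches the name used above, but the task wiring is illustrative only; the real pipeline chains the staging, load, quality-check and unload operators described earlier:

    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator  # import path varies by Airflow version

    default_args = {
        'owner': 'shravan',
        'start_date': datetime(2018, 11, 1),
        'end_date': datetime(2018, 11, 30),
        'depends_on_past': False,
        'email_on_retry': False,
        'retries': 3,
        'retry_delay': timedelta(minutes=5),
    }

    dag = DAG(
        'load_and_transform_data_in_redshift',
        default_args=default_args,
        schedule_interval='@daily',  # one run per day between start_date and end_date
        catchup=False,               # per the design goals above
    )

    start = DummyOperator(task_id='begin_execution', dag=dag)
    end = DummyOperator(task_id='stop_execution', dag=dag)

    # In the real pipeline, StageToRedshiftOperator, LoadFactOperator, LoadDimensionOperator,
    # the data quality checks, SongPopularityOperator and UnloadToS3Operator would be chained
    # between these two marker tasks.
    start >> end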
openshift / Operator Framework Olm
A management framework for extending Kubernetes with Operators
cen-ngc5139 / Zookeeper Operator
Kubernetes operator for deploying and managing ZooKeeper; implements the OAM framework
IBM / Oper8
Oper8 is a framework for writing Kubernetes operators in Python. It implements many common patterns used by large cloud applications that are reusable across many operator design patterns.
BackAged / Tdset Operator
Kubernetes operator
bitflow-stream / Go Bitflow
Bitflow's dataflow engine, a lightweight framework for real-time processing of data streams through a graph of operators
blaqkube / Mysql Operator
A Kubernetes Operator for MySQL Community Server
J0LGER / Venom
Venom is a collaborative C2 framework used by Red Team operators, providing an interactive web GUI written in Python and PowerShell.
damienpontifex / LinearAlgebraExtensions
Extensions to simplify the use of the LinearAlgebra framework on iOS 8 and OS X 10.10, utilising Swift operator overloading
jgaskins / Kubernetes
Kubernetes API client in Crystal, providing a framework for writing controllers/operators
tomoleary / Dino
Derivative-Informed Neural Operator: An Efficient Framework for High-Dimensional Parametric Derivative Learning
Ortec-Finance / Sailfish Hpc
Sailfish is an HPC framework that works natively on Kubernetes. This project is an implementation of the Sailfish Framework in RH OpenShift using the RH supported Operators: AMQ, CustomMetricAutoscaler, OpenShift Serverless and OpenShift Monitoring
Parity-LRX / FSCTEP
A Multi-Operator Equivariant Framework for High-Performance Machine Learning Force Fields, supporting External Fields embedding and Physical Tensors prediction.
Neural-Mechanics-Lab / Folax
Folax (Finite Operator Learning with JAX) is a framework for solving and optimizing PDEs by integrating machine learning with numerical methods in computational mechanics.
reachjason / Web3 Operator Handbook
Actionable and opinionated no-bs ideas, frameworks and resources from successful operators in crypto to help build, grow and scale web3 products
bhaveshjaggi / PestDetection
PEST DETECTION USING IMAGE PROCESSING. The principal idea which empowered us to work on this project is to ensure improved and better farming techniques for farmers.
Our Solution: The techniques of image analysis are extensively applied to agricultural science. They provide maximum protection to crops with much less use of pesticides, which can ultimately lead to better crop management and production.
The following software is required for the project: OpenCV with C++/Python, a library designed for computational efficiency with a strong focus on real-time applications.
Pest Detection System: the following image processing steps are used in the proposed system (a rough OpenCV sketch of the pipeline follows this description).
- Color image to gray image conversion: images are converted into grayscale so that they can be handled easily and require less storage. The following equation shows how images are converted into grayscale: I(x,y) = 0.2989*R + 0.5870*G + 0.1140*B
- Image filtering: the PSNR value is calculated for both the average-filtered and median-filtered images. The average filter gives a better result than the median filter, so the average filter is used for further processing.
- Image segmentation: to detect the pests in the images, the image background is estimated using morphological operators and then subtracted from the original image, so the resulting image contains only the objects of interest with pixel value 1 and background pixels with value 0.
- Noise removal: noise includes dew drops, dust and other visible parts of leaves. As only the object of interest should remain visible, the aim is to remove the noise to get better and more effective results. The erosion algorithm is used to remove isolated noisy pixels and to smooth object boundaries. After noise removal, the detected pests are enhanced using the dilation algorithm.
- Feature extraction: different properties of the images are calculated, and the image is classified on the basis of those attributes. For image properties, the gray-level co-occurrence matrix and regional properties of the images are calculated. These properties are used to train a support vector machine to classify images.
- Counting the pests on the leaves is the main purpose, so that it gives an idea of how many pests are on a leaf. It uses the Moore neighborhood tracing algorithm and Jacob's stopping criterion.
Feasibility: The present framework of pest detection is quite tedious and laborious for farmers, as they have to carry out acre-by-acre surveys themselves, which requires a lot of vigorous effort. Image analysis provides a realistic opportunity for the automation of insect pest detection. Through this system, crop technicians can easily count the pests from the collected specimens, and the right pest management can be applied to increase both the quantity and quality of production. Using the automated system, crop technicians can make the monitoring process easier. In order to enhance the existing system, we came up with a more productive and well-organised system with our idea. Due to the automation applied, profitability increases and labour is reduced.
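As a rough illustration of the processing steps listed above (grayscale conversion, average filtering, morphological background subtraction, erosion/dilation and counting), here is a minimal OpenCV sketch. The file name and kernel sizes are assumptions, and the connected-components count stands in for the project's Moore neighborhood tracing with Jacob's stopping criterion:

    import cv2
    import numpy as np

    # Load the leaf image (the path is a placeholder).
    image = cv2.imread("leaf.jpg")

    # 1. Color to grayscale conversion (OpenCV applies the standard luminance weights).
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # 2. Image filtering with an average (box) filter, as chosen over the median filter.
    filtered = cv2.blur(gray, (5, 5))

    # 3. Segmentation: estimate the background with a morphological opening and
    #    subtract it, leaving the foreground objects (potential pests).
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
    background = cv2.morphologyEx(filtered, cv2.MORPH_OPEN, kernel)
    foreground = cv2.subtract(filtered, background)
    _, binary = cv2.threshold(foreground, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # 4. Noise removal: erosion removes isolated noisy pixels,
    #    dilation enhances the detected pest regions.
    small_kernel = np.ones((3, 3), np.uint8)
    cleaned = cv2.erode(binary, small_kernel, iterations=1)
    enhanced = cv2.dilate(cleaned, small_kernel, iterations=1)

    # 5. A simple pest count via connected components (illustrative substitute
    #    for Moore neighborhood tracing with Jacob's stopping criterion).
    num_labels, _ = cv2.connectedComponents(enhanced)
    print("Approximate pest count:", num_labels - 1)  # subtract the background label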
janus-idp / Operator
Deprecated - Operator for Backstage, based on the Operator SDK framework - see https://github.com/redhat-developer/rhdh-operator
meta-pytorch / FACTO
Framework for Algorithmic Correctness Testing of Operators