yangyang14641 / ParallelProgrammingCourse: Parallel programming course at Peking University
arneish / Parallel PCA Openmp: A parallelized implementation of Principal Component Analysis (PCA) using Singular Value Decomposition (SVD), written in C with OpenMP. The procedure uses the Modified Gram-Schmidt algorithm; a Classical Gram-Schmidt variant is also available.
Dainerx / Parallel Distributed Computing C: Sequential and parallel implementations of several algorithms, showing the speedup achieved in each case.
FurongHuang / ConvDicLearnTensorFactor: Tensor methods have emerged as a powerful paradigm for consistent learning of many latent variable models such as topic models, independent component analysis and dictionary learning. Model parameters are estimated via CP decomposition of the observed higher order input moments. However, in many domains, additional invariances such as shift invariances exist, enforced via models such as convolutional dictionary learning. In this paper, we develop novel tensor decomposition algorithms for parameter estimation of convolutional models. Our algorithm is based on the popular alternating least squares method, but with efficient projections onto the space of stacked circulant matrices. Our method is embarrassingly parallel and consists of simple operations such as fast Fourier transforms and matrix multiplications. Our algorithm converges to the dictionary much faster and more accurately compared to the alternating minimization over filters and activation maps.
Hossamomar / EM070 New FPGA Family For CNN Architectures High Speed Soft Neuron Design: Who doesn't dream of a new FPGA family that provides embedded hard neurons in its silicon fabric instead of conventional DSP and multiplier blocks? An optimized hard neuron would let software and hardware designers create and test deep learning architectures, especially convolutional neural networks (CNNs), more easily and quickly than on any FPGA family currently on the market. The idea behind this project is to open the way for a new, precisely tailored generation of FPGA families that avoids the wasted logic resources and unneeded bus widths of conventional DSP blocks. The project focuses on the anchor point of any deep learning architecture: an optimized high-speed neuron block that replaces the conventional DSP blocks and avoids the drawbacks designers face when fitting a CNN architecture to them. The proposed neuron takes parallel operation as its primary keystone, alongside minimizing the logic elements needed to build the neuron cell. The target is a resource usage of no more than 500 ALMs and an expected maximum operating frequency of 834.03 MHz per neuron. In this project, ultra-fast, adaptive, parallel modules such as parallel multiplier-accumulators (MACs) and a ReLU activation function are designed as soft blocks in VHDL, opening a new horizon for FPGA designers to build their own CNNs. We can't stop imagining Intel Altera leading the market by making the proposed CNN block part of the fabric of a separate new logic family soon.
Users of the proposed CNN blocks will be impressed by the number of operations per second available to them when designing their own CNN architectures. For instance, according to the first coding trial, a single MAC unit can reach 3.5 giga operations per second (GOPS) and can multiply up to 4 different inputs with a common weight value. Because the blocks can also work in parallel, the data throughput of the proposed project can rise to about 16 tera operations per second (TOPS), which would transform FPGA capabilities for deep learning algorithms. Finally, we believe that this proposed CNN block is just the first step toward leaving conventional CPUs and GPUs no room to compete, thanks to its speed and the flexible scalability that comes from the parallel operation of such FPGA-based CNN blocks.
jainsee24 / Parallel Face Detection: Image segmentation is the process of dividing an image into multiple parts. It is typically used to identify objects or other relevant information in digital images. There are many ways to perform image segmentation, including thresholding methods, color-based segmentation, and transform methods, among others. Alternatively, edge detection can be used for image segmentation and data extraction in areas such as image processing, computer vision, and machine vision. Image thresholding is a simple yet effective way of partitioning an image into foreground and background. This image analysis technique is a type of image segmentation that isolates objects by converting grayscale images into binary images. Image thresholding is most effective on images with high contrast. Otsu's method, named after Nobuyuki Otsu, is one such thresholding technique: it iterates through all possible threshold values and calculates a measure of spread for the pixel levels on each side of the threshold, i.e. the pixels that fall in either the foreground or the background. The aim is to find the threshold value at which the sum of the foreground and background spreads is at its minimum. Edge detection is an image processing technique for finding the boundaries of objects within images. It works by detecting discontinuities in brightness. An image can have horizontal, vertical or diagonal edges. The Sobel operator detects two kinds of edges in an image using derivative masks, one for horizontal edges and one for vertical edges. 1. Introduction: Face detection is a computer technology, used in a variety of applications, that identifies human faces in digital images. Face detection also refers to the psychological process by which humans locate and attend to faces in a visual scene. Face detection can be regarded as a specific case of object-class detection.
In object-class detection, the task is to find the locations and sizes of all objects in an image that belong to a given class. Examples include upper torsos, pedestrians, and cars. Face-detection algorithms focus on the detection of frontal human faces. This is analogous to image detection, in which the image of a person is matched bit by bit against the images stored in a database. Any change to the facial features in the database will invalidate the matching process. 2. Needs/Problems: Face recognition systems have been widely researched and applied. They are commonly used for video surveillance, human-computer interaction, robot navigation, and so on. As their use grows, so does the need for faster system response, for example in robot navigation or public-safety applications. A number of classification algorithms have been applied to face recognition, but computing time remains a problem. In this system, the computing time of classification and feature extraction is an important concern. To improve the algorithmic efficiency of face detection, we combine the eigenface method, using Haar-like features to detect both the eyes and the face, with the Roberts cross edge detector to locate the position of the human face. Roberts cross uses the integral image representation and simple rectangular features to eliminate the need for expensive calculation of a multi-scale image pyramid. 3. Objectives: The techniques used in this application are: 1. Eigenface technique 2. KLT algorithm 3. Parallel for loops in OpenMP 4. OpenCV for face detection 5. Further uses of the techniques
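The Otsu thresholding step described in the entry above can be sketched as follows. This is a minimal, unoptimized illustration and not code from the jainsee24 repository: the function name `otsu_threshold`, the `levels` parameter, and the choice of variance as the "measure of spread" are assumptions made for the example, and pixels are assumed to be integers in the range 0 to `levels - 1`.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that minimises the weighted within-class variance.

    Hypothetical helper for illustration: pixels with value < t are treated
    as background, the rest as foreground.
    """
    # Histogram of grey levels.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)

    best_t, best_spread = 0, float("inf")
    for t in range(1, levels):
        w_bg = sum(hist[:t])          # number of background pixels
        w_fg = total - w_bg           # number of foreground pixels
        if w_bg == 0 or w_fg == 0:
            continue                  # one side is empty, nothing to compare
        mean_bg = sum(i * hist[i] for i in range(t)) / w_bg
        mean_fg = sum(i * hist[i] for i in range(t, levels)) / w_fg
        var_bg = sum(hist[i] * (i - mean_bg) ** 2 for i in range(t)) / w_bg
        var_fg = sum(hist[i] * (i - mean_fg) ** 2 for i in range(t, levels)) / w_fg
        # Sum of the two spreads, each weighted by its share of the pixels.
        spread = (w_bg * var_bg + w_fg * var_fg) / total
        if spread < best_spread:
            best_t, best_spread = t, spread
    return best_t


# Toy "image": half dark pixels, half bright pixels.
image = [10] * 50 + [200] * 50
t = otsu_threshold(image)
binary = [p >= t for p in image]  # True marks foreground
```

Each candidate threshold is evaluated independently of the others, so the outer loop is the kind of loop that could be split across threads, for instance with an OpenMP parallel for in a C version.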
markwkm / Quicksort: Parallel quicksort algorithms
spectre900 / Parallel Strassen Algorithm: Parallelizing Strassen's matrix multiplication using OpenMP, MPI and CUDA.
HanjieLuo / EDLine Parallel: A parallel implementation of the EDLine algorithm.
uma-pi1 / DSGDpp: Implementations of various parallel algorithms for matrix factorization (including DSGD++)
LIS-Laboratory / Cupc: cuPC, a CUDA-based Parallel PC Algorithm for Causal Structure Learning on GPU
purtroppo / PageRank: C implementation of the PageRank algorithm, with and without parallelization.
weifengliu-ssslab / Benchmark SpTRSM Using CSC: Fast Synchronization-Free Algorithms for Parallel Sparse Triangular Solves with Multiple Right-Hand Sides (SpTRSM)
zhangxiaoya / CCL: An implementation of the algorithm from "Parallel graph component labelling with GPUs and CUDA".
abarankab / Parallel Boruvka: An implementation of parallel Boruvka's algorithm written in C++ using OpenMP.
GonguanLu / Lagrangian Relaxation Algorithm For RRSLRP: To solve the RRS-LRP problem on a resource-space-time (RST) network, we developed a Lagrangian relaxation framework that decomposes the original problem into a classic knapsack subproblem and a vehicle routing problem with recharging station (VRP-RS). The knapsack problem is solved by a dynamic programming algorithm, and a second dynamic programming algorithm on the RST network is developed to solve the VRP-RS. The dual problem of adjusting the Lagrangian multipliers is solved by a subgradient ascent method. Because of its decomposition structure, the framework is naturally suited to parallel and distributed computing techniques.
legalaspro / Maddpg Zoo Torch: A modern implementation of MADDPG and MADDPG-Approx algorithms using PyTorch and PettingZoo environments. This project provides a clean, modular framework for multi-agent reinforcement learning research, featuring parallel training capabilities, comprehensive visualization tools, and support for various cooperative and competitive scenarios.
ZPGuiGroupWhu / Spark Based DBSCAN Algorithms: A parallel algorithm package for DBSCAN based on Apache Spark, including KDBSCAN, KDSG and other optimized DBSCAN algorithms. The framework consists of three parts: front-end web visualization components, a web service API component, and back-end Spark-based algorithm packages.
Brionengine / Brion Quantum A.I. General System: Brion is the world’s first quantum AI model with the deepest integrations, combining the power of quantum computing and artificial intelligence. Built on cutting-edge quantum algorithms and secure protocols, it is designed for unmatched high-speed, parallel computation that solves complex problems possibly infinitely faster.
vollmerm / Racket Ga: A parallel genetic algorithm implementation in Racket Scheme