mirzayasirabdullahbaig07 / Ethereum ETH USDT Price Forecasting Using ARIMA: A time series forecasting project that applies the ARIMA (AutoRegressive Integrated Moving Average) model to predict the future price of Ethereum (ETH) against USDT. This project involves data preprocessing, trend analysis, model tuning, and evaluation using metrics like RMSE and AIC.
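The "Integrated" part of ARIMA means you model the differenced series, then add the forecast changes back onto the last observed price. The plain-Python sketch below shows that idea together with RMSE. It makes some assumptions and is not the project's code:

- It fits an ARIMA(1,1,0) model, meaning one autoregressive term on first differences and no moving-average term.
- It uses simple least squares with no intercept, not the maximum-likelihood fitting a library such as statsmodels performs.
- The price series is a made-up toy example, not ETH/USDT data.

```python
import math

def difference(series):
    """First differences: the 'I' step of ARIMA with d = 1."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_ar1(x):
    """Least-squares phi for x[t] = phi * x[t-1] + noise (no intercept)."""
    num = sum(prev * cur for prev, cur in zip(x, x[1:]))
    den = sum(prev * prev for prev in x[:-1])
    return num / den

def forecast_arima110(series, steps):
    """Forecast differences with AR(1), then integrate them back into price levels."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    level, diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        diff = phi * diff          # next predicted change
        level = level + diff       # undo the differencing
        forecasts.append(level)
    return forecasts

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Toy series whose changes halve each step (8, 4, 2, 1), so phi = 0.5
prices = [0, 8, 12, 14, 15]
print(forecast_arima110(prices, 2))   # [15.5, 15.75]
```

A real ARIMA workflow would also compare candidate (p, d, q) orders, for example by AIC as the project description mentions. This sketch fixes the order to keep the arithmetic visible.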
jddeguia / Compare Forecast Models: Energy production of a photovoltaic (PV) system is heavily influenced by solar irradiance. Accurate prediction of solar irradiance enables optimal dispatching of available energy resources and anticipation of end-user demand. However, such prediction is difficult because of the fluctuating nature of weather patterns. In the study, neural network models were defined to predict solar irradiance values based on weather patterns. The models included in the study are an artificial neural network, a convolutional neural network, a bidirectional long short-term memory (LSTM) network, and a stacked LSTM. Preprocessing methods such as data normalization and principal component analysis were applied before model training. Regression metrics such as mean squared error (MSE), maximum residual error (max error), mean absolute error (MAE), explained variance score (EVS), and the R2 score (coefficient of determination) were used to evaluate the models' predictions. Plots such as prediction curves, learning curves, and histograms of the error distribution were also used to analyze model performance further. All models proved capable of predicting unseen values; however, the stacked LSTM had the best results, with max error, R2, MAE, MSE, and EVS values of 651.536, 0.953, 41.738, 5124.686, and 0.946, respectively.
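The five evaluation metrics named above can be written out directly. The sketch below is plain Python and follows the usual definitions, as used by scikit-learn's max_error, mean_absolute_error, mean_squared_error, explained_variance_score, and r2_score. It is not the study's evaluation code, and the sample values are made up for illustration.

```python
def regression_metrics(y_true, y_pred):
    """Max error, MAE, MSE, explained variance score and R2 for two equal-length sequences."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    mean_res = sum(residuals) / n
    ss_res = sum(r * r for r in residuals)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)     # total sum of squares
    var_res = sum((r - mean_res) ** 2 for r in residuals) / n
    return {
        "max_error": max(abs(r) for r in residuals),
        "mae": sum(abs(r) for r in residuals) / n,
        "mse": ss_res / n,
        # EVS ignores a constant bias in the residuals; R2 does not
        "evs": 1 - var_res / (ss_tot / n),
        "r2": 1 - ss_res / ss_tot,
    }

metrics = regression_metrics([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
print(metrics["max_error"], metrics["mae"], metrics["mse"])   # 1 0.5 0.375
```

EVS and R2 coincide when the residuals have zero mean. Reporting both, as the study does, shows whether a model is systematically biased.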
Rushikesh8983 / MastersDataScience Deep Learning Project: Language Translation

In this project, you’re going to take a peek into the realm of neural network machine translation. You’ll be training a sequence-to-sequence model on a dataset of English and French sentences that can translate new sentences from English to French.

Get the Data

Since training on the whole of the English and French languages would take a long time, we have provided you with a small portion of the English corpus.

```python
""" DON'T MODIFY ANYTHING IN THIS CELL """
import helper
import problem_unittests as tests

source_path = 'data/small_vocab_en'
target_path = 'data/small_vocab_fr'
source_text = helper.load_data(source_path)
target_text = helper.load_data(target_path)
```

Explore the Data

Play around with view_sentence_range to view different parts of the data.

```python
view_sentence_range = (0, 10)

""" DON'T MODIFY ANYTHING IN THIS CELL """
import numpy as np

print('Dataset Stats')
print('Roughly the number of unique words: {}'.format(len({word: None for word in source_text.split()})))

sentences = source_text.split('\n')
word_counts = [len(sentence.split()) for sentence in sentences]
print('Number of sentences: {}'.format(len(sentences)))
print('Average number of words in a sentence: {}'.format(np.average(word_counts)))

print()
print('English sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(source_text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))
print()
print('French sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(target_text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))
```

```
Dataset Stats
Roughly the number of unique words: 227
Number of sentences: 137861
Average number of words in a sentence: 13.225277634719028

English sentences 0 to 10:
new jersey is sometimes quiet during autumn , and it is snowy in april .
the united states is usually chilly during july , and it is usually freezing in november .
california is usually quiet during march , and it is usually hot in june .
the united states is sometimes mild during june , and it is cold in september .
your least liked fruit is the grape , but my least liked is the apple .
his favorite fruit is the orange , but my favorite is the grape .
paris is relaxing during december , but it is usually chilly in july .
new jersey is busy during spring , and it is never hot in march .
our least liked fruit is the lemon , but my least liked is the grape .
the united states is sometimes busy during january , and it is sometimes warm in november .

French sentences 0 to 10:
new jersey est parfois calme pendant l' automne , et il est neigeux en avril .
les états-unis est généralement froid en juillet , et il gèle habituellement en novembre .
california est généralement calme en mars , et il est généralement chaud en juin .
les états-unis est parfois légère en juin , et il fait froid en septembre .
votre moins aimé fruit est le raisin , mais mon moins aimé est la pomme .
son fruit préféré est l'orange , mais mon préféré est le raisin .
paris est relaxant en décembre , mais il est généralement froid en juillet .
new jersey est occupé au printemps , et il est jamais chaude en mars .
notre fruit est moins aimé le citron , mais mon moins aimé est le raisin .
les états-unis est parfois occupé en janvier , et il est parfois chaud en novembre .
```

Implement Preprocessing Function

Text to Word Ids

As you did with other RNNs, you must turn the text into numbers so the computer can understand it. In the function text_to_ids(), you'll turn source_text and target_text from words to ids. However, you need to add the <EOS> word id at the end of each sentence in target_text. This will help the neural network predict when the sentence should end.

You can get the <EOS> word id by doing:

target_vocab_to_int['<EOS>']

You can get other word ids using source_vocab_to_int and target_vocab_to_int.
```python
def text_to_ids(source_text, target_text, source_vocab_to_int, target_vocab_to_int):
    """
    Convert source and target text to proper word ids
    :param source_text: String that contains all the source text.
    :param target_text: String that contains all the target text.
    :param source_vocab_to_int: Dictionary to go from the source words to an id
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :return: A tuple of lists (source_id_text, target_id_text)
    """
    # One list of word ids per source sentence (sentences are newline-separated)
    source_id_text = [[source_vocab_to_int[word] for word in sentence.split()]
                      for sentence in source_text.split('\n')]
    # Same for the target, with the <EOS> id appended to every sentence
    target_id_text = [[target_vocab_to_int[word] for word in sentence.split()] + [target_vocab_to_int['<EOS>']]
                      for sentence in target_text.split('\n')]
    return source_id_text, target_id_text

""" DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE """
tests.test_text_to_ids(text_to_ids)
```

Tests Passed

Preprocess all the data and save it

Running the code cell below will preprocess all the data and save it to file.

```python
""" DON'T MODIFY ANYTHING IN THIS CELL """
helper.preprocess_and_save_data(source_path, target_path, text_to_ids)
```

Check Point

This is your first checkpoint. If you ever decide to come back to this notebook or have to restart the notebook, you can start from here. The preprocessed data has been saved to disk.
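text_to_ids() relies on source_vocab_to_int and target_vocab_to_int. In this notebook those lookups are built inside helper.preprocess_and_save_data, which is not shown. The sketch below is a hypothetical stand-in for that step:

- It builds a word-to-id table and its inverse.
- It reserves the first ids for special tokens.
- It then applies the same conversion text_to_ids() uses on the target side.

The token names, their order, and the name build_vocab are assumptions, not necessarily what helper does.

```python
# Assumed special tokens and ids; the real helper may differ.
SPECIAL_TOKENS = ['<PAD>', '<EOS>', '<UNK>', '<GO>']

def build_vocab(text):
    """Hypothetical helper: assign an integer id to every word in newline-separated text."""
    vocab_to_int = {token: i for i, token in enumerate(SPECIAL_TOKENS)}
    for word in sorted(set(text.split())):   # sorted for reproducible ids
        if word not in vocab_to_int:
            vocab_to_int[word] = len(vocab_to_int)
    int_to_vocab = {i: word for word, i in vocab_to_int.items()}
    return vocab_to_int, int_to_vocab

sample_text = "il est calme .\nparis est chaud ."
vocab_to_int, int_to_vocab = build_vocab(sample_text)
# Same conversion as text_to_ids() applies to the target side
sample_ids = [[vocab_to_int[w] for w in s.split()] + [vocab_to_int['<EOS>']]
              for s in sample_text.split('\n')]
print(sample_ids)   # [[8, 7, 5, 4, 1], [9, 7, 6, 4, 1]]
```

The inverse table int_to_vocab is what turns predicted ids back into words after translation.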
```python
import problem_unittests as tests
""" DON'T MODIFY ANYTHING IN THIS CELL """
import numpy as np
import helper

(source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), _ = helper.load_preprocess()
```

Check the Version of TensorFlow and Access to GPU

This will check to make sure you have the correct version of TensorFlow and access to a GPU.

```python
""" DON'T MODIFY ANYTHING IN THIS CELL """
from distutils.version import LooseVersion
import warnings
import tensorflow as tf
from tensorflow.python.layers.core import Dense

# Check TensorFlow Version
assert LooseVersion(tf.__version__) >= LooseVersion('1.1'), 'Please use TensorFlow version 1.1 or newer'
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
if not tf.test.gpu_device_name():
    warnings.warn('No GPU found. Please use a GPU to train your neural network.')
else:
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
```

```
TensorFlow Version: 1.1.0
Default GPU Device: /gpu:0
```

Build the Neural Network

You'll build the components necessary for a Sequence-to-Sequence model by implementing the following functions:

- model_inputs
- process_decoder_input
- encoding_layer
- decoding_layer_train
- decoding_layer_infer
- decoding_layer
- seq2seq_model

Input

Implement the model_inputs() function to create TF Placeholders for the Neural Network. It should create the following placeholders:

- Input text placeholder named "input" using the TF Placeholder name parameter with rank 2.
- Targets placeholder with rank 2.
- Learning rate placeholder with rank 0.
- Keep probability placeholder named "keep_prob" using the TF Placeholder name parameter with rank 0.
- Target sequence length placeholder named "target_sequence_length" with rank 1.
- Max target sequence length tensor named "max_target_len" getting its value from applying tf.reduce_max on the target_sequence_length placeholder. Rank 0.
- Source sequence length placeholder named "source_sequence_length" with rank 1.

Return the placeholders in the following tuple: (input, targets, learning rate, keep probability, target sequence length, max target sequence length, source sequence length)

```python
def model_inputs():
    """
    Create TF Placeholders for input, targets, learning rate, and lengths of source and target sequences.
    :return: Tuple (input, targets, learning rate, keep probability, target sequence length,
    max target sequence length, source sequence length)
    """
    inputs = tf.placeholder(tf.int32, [None, None], name='input')
    targets = tf.placeholder(tf.int32, [None, None])
    learning_rate = tf.placeholder(tf.float32, [])
    keep_prob = tf.placeholder(tf.float32, [], name='keep_prob')
    target_sequence_length = tf.placeholder(tf.int32, [None], name='target_sequence_length')
    # The specification asks for this tensor to be named "max_target_len"
    max_target_len = tf.reduce_max(target_sequence_length, name='max_target_len')
    source_sequence_length = tf.placeholder(tf.int32, [None], name='source_sequence_length')
    return inputs, targets, learning_rate, keep_prob, target_sequence_length, max_target_len, source_sequence_length

""" DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE """
tests.test_model_inputs(model_inputs)
```

Tests Passed

Process Decoder Input

Implement process_decoder_input by removing the last word id from each sequence in target_data and concatenating the GO id to the beginning of each sequence.
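The slice-and-prepend step described for process_decoder_input can be illustrated outside TensorFlow. This NumPy sketch mirrors what the tf.strided_slice and tf.concat calls do for a small batch. The ids (GO = 3, EOS = 1, PAD = 0) and the function name shift_right_with_go are illustrative assumptions.

```python
import numpy as np

def shift_right_with_go(target_ids, go_id):
    """Drop the last id of each row and prepend go_id, keeping the [batch, time] shape."""
    target_ids = np.asarray(target_ids)
    go_column = np.full((target_ids.shape[0], 1), go_id, dtype=target_ids.dtype)
    return np.concatenate([go_column, target_ids[:, :-1]], axis=1)

# Two padded target sequences ending in EOS (1), the second padded with 0
batch = np.array([[10, 11, 12, 1],
                  [20, 21, 1, 0]])
decoder_input = shift_right_with_go(batch, go_id=3)
print(decoder_input.tolist())   # [[3, 10, 11, 12], [3, 20, 21, 1]]
```

Each position t of the decoder input now holds the word the decoder should have seen just before predicting target position t. This is what lets the training helper feed ground-truth inputs.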
```python
def process_decoder_input(target_data, target_vocab_to_int, batch_size):
    """
    Preprocess target data for decoding
    :param target_data: Target Placeholder
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :param batch_size: Batch Size
    :return: Preprocessed target data
    """
    # [batch_size, 1] column filled with the <GO> id
    go = tf.constant([[target_vocab_to_int['<GO>']]] * batch_size)
    # Keep every row but drop the last column (the final word id of each sequence)
    end = tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1])
    return tf.concat([go, end], 1)

""" DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE """
tests.test_process_encoding_input(process_decoder_input)
```

Tests Passed

Encoding

Implement encoding_layer() to create an Encoder RNN layer:

- Embed the encoder input using tf.contrib.layers.embed_sequence
- Construct a stacked tf.contrib.rnn.LSTMCell wrapped in a tf.contrib.rnn.DropoutWrapper
- Pass cell and embedded input to tf.nn.dynamic_rnn()

```python
from imp import reload
reload(tests)

def encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob,
                   source_sequence_length, source_vocab_size,
                   encoding_embedding_size):
    """
    Create encoding layer
    :param rnn_inputs: Inputs for the RNN
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param keep_prob: Dropout keep probability
    :param source_sequence_length: a list of the lengths of each sequence in the batch
    :param source_vocab_size: vocabulary size of source data
    :param encoding_embedding_size: embedding size of source data
    :return: tuple (RNN output, RNN state)
    """
    # Word ids [batch, time] -> embeddings [batch, time, encoding_embedding_size]
    embed = tf.contrib.layers.embed_sequence(rnn_inputs, source_vocab_size, encoding_embedding_size)

    def lstm_cell():
        # Build a new cell for every layer so each layer gets its own weights
        lstm = tf.contrib.rnn.BasicLSTMCell(rnn_size)
        return tf.contrib.rnn.DropoutWrapper(lstm, input_keep_prob=keep_prob)

    stacked_lstm = tf.contrib.rnn.MultiRNNCell([lstm_cell() for _ in range(num_layers)])
    # sequence_length stops the RNN at each sequence's true length
    return tf.nn.dynamic_rnn(stacked_lstm, embed,
                             sequence_length=source_sequence_length,
                             dtype=tf.float32)

""" DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE """
tests.test_encoding_layer(encoding_layer)
```

Tests Passed

Decoding - Training

Create a training decoding layer:

- Create a tf.contrib.seq2seq.TrainingHelper
- Create a tf.contrib.seq2seq.BasicDecoder
- Obtain the decoder outputs from tf.contrib.seq2seq.dynamic_decode

```python
def decoding_layer_train(encoder_state, dec_cell, dec_embed_input,
                         target_sequence_length, max_summary_length,
                         output_layer, keep_prob):
    """
    Create a decoding layer for training
    :param encoder_state: Encoder State
    :param dec_cell: Decoder RNN Cell
    :param dec_embed_input: Decoder embedded input
    :param target_sequence_length: The lengths of each sequence in the target batch
    :param max_summary_length: The length of the longest sequence in the batch
    :param output_layer: Function to apply the output layer
    :param keep_prob: Dropout keep probability
    :return: BasicDecoderOutput containing training logits and sample_id
    """
    # TrainingHelper reads the ground-truth decoder inputs at each time step
    helper = tf.contrib.seq2seq.TrainingHelper(dec_embed_input, target_sequence_length)
    decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, helper, encoder_state, output_layer)
    # TensorFlow 1.1: dynamic_decode returns (outputs, final_state)
    dec_train_logits, _ = tf.contrib.seq2seq.dynamic_decode(decoder, maximum_iterations=max_summary_length)
    # For TensorFlow 1.2 (also returns the final sequence lengths):
    # dec_train_logits, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder, maximum_iterations=max_summary_length)
    return dec_train_logits  # keep_prob/dropout not used?

""" DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE """
tests.test_decoding_layer_train(decoding_layer_train)
```

Tests Passed

Decoding - Inference

Create inference decoder:

- Create a tf.contrib.seq2seq.GreedyEmbeddingHelper
- Create a tf.contrib.seq2seq.BasicDecoder
- Obtain the decoder outputs from tf.contrib.seq2seq.dynamic_decode

```python
def decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id,
                         end_of_sequence_id, max_target_sequence_length,
                         vocab_size, output_layer, batch_size, keep_prob):
    """
    Create a decoding layer for inference
    :param encoder_state: Encoder state
    :param dec_cell: Decoder RNN Cell
    :param dec_embeddings: Decoder embeddings
    :param start_of_sequence_id: GO ID
    :param end_of_sequence_id: EOS Id
    :param max_target_sequence_length: Maximum length of target sequences
    :param vocab_size: Size of decoder/target vocabulary
    :param output_layer: Function to apply the output layer
    :param batch_size: Batch size
    :param keep_prob: Dropout keep probability
    :return: BasicDecoderOutput containing inference logits and sample_id
    """
    # One <GO> token per sequence in the batch
    start_tokens = tf.constant([start_of_sequence_id] * batch_size)
    # Feeds back the embedding of each step's argmax prediction until <EOS>
    helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(dec_embeddings, start_tokens, end_of_sequence_id)
    decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, helper, encoder_state, output_layer)
    # TensorFlow 1.1: dynamic_decode returns (outputs, final_state)
    dec_infer_logits, _ = tf.contrib.seq2seq.dynamic_decode(decoder, maximum_iterations=max_target_sequence_length)
    # For TensorFlow 1.2:
    # dec_infer_logits, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder, maximum_iterations=max_target_sequence_length)
    return dec_infer_logits

""" DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE """
tests.test_decoding_layer_infer(decoding_layer_infer)
```
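The difference between the training and inference decoders can be seen without TensorFlow. In this sketch, a hypothetical lookup table stands in for the decoder cell plus output layer, and the ids GO = 3 and EOS = 1 are illustrative.

- **Training-style decoding** (what TrainingHelper does) feeds the ground-truth previous token at every step. This is known as teacher forcing.
- **Greedy decoding** (what GreedyEmbeddingHelper does) feeds back the model's own argmax prediction. It starts from GO and stops at EOS or at a length cap.

```python
GO, EOS = 3, 1  # illustrative ids

# Hypothetical toy "decoder": next-token scores given only the previous token
TRANSITIONS = {
    GO: {10: 0.9, 11: 0.1},
    10: {11: 0.8, EOS: 0.2},
    11: {12: 0.6, EOS: 0.4},
    12: {EOS: 0.7, 10: 0.3},
}

def step(prev_token):
    """Return next-token scores; unknown tokens predict EOS."""
    return TRANSITIONS.get(prev_token, {EOS: 1.0})

def argmax(scores):
    return max(scores, key=scores.get)

def teacher_forced_predictions(target_ids):
    """Training style: inputs are GO + target[:-1], whatever the model predicts."""
    decoder_inputs = [GO] + target_ids[:-1]
    return [argmax(step(token)) for token in decoder_inputs]

def greedy_decode(max_len):
    """Inference style: feed back each argmax prediction until EOS (kept) or max_len."""
    outputs, prev = [], GO
    for _ in range(max_len):
        prev = argmax(step(prev))
        outputs.append(prev)
        if prev == EOS:
            break
    return outputs

print(teacher_forced_predictions([11, 12, EOS]))  # [10, 12, 1]
print(greedy_decode(max_len=10))                  # [10, 11, 12, 1]
```

Under teacher forcing, the second prediction is conditioned on the true token 11 even though the model's own first guess was 10. At inference time no ground truth exists, so each prediction depends on the previous guess. The maximum-length cap plays the role of maximum_iterations in dynamic_decode.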
SagarGaniga / Data Preprocessing: Data preprocessing is a data mining technique that involves transforming raw data into an understandable format.
FuyixueWang / EPTI: EPTI (Echo Planar Time-resolved Imaging) raw-data processing, image reconstruction, and data preprocessing.
CaiqueCoelho / Preprocessing Dataset Template: A template for preprocessing your golden dataset before feeding it to your best model.
gallantlab / Realtimefmri: Real-time collection, preprocessing, and analysis of fMRI data in Python.
ZBigFish / AffectNet Dataset Preprocessing Tool: A Python tool for preprocessing the AffectNet dataset into a structure that can be read directly by PyTorch's ImageFolder class.
JoPfeiff / Nlp Data Loading Framework: We are trying to define a framework for NLP tasks that easily maps any kind of word embedding dataset to any kind of text dataset. The framework should reduce the amount of additional code needed to work on different NLP tasks. We have found that many NLP tasks need similar preprocessing steps.
Quantmetry / Pipeasy Spark: An easy way to define data preprocessing pipelines (similar to sklearn-pandas, but for Spark ML).
shridhar1504 / Sales Forecasting Datascience Project: Develop a data science project using historical sales data to build a regression model that accurately predicts future sales. Preprocess the dataset, conduct exploratory analysis, select relevant features, and employ regression algorithms for model development. Evaluate model performance, optimize hyperparameters, and provide actionable insights.
james77777778 / Keras Aug: A library of Keras 3 preprocessing and augmentation layers that supports various data types such as images, labels, bounding boxes, segmentation masks, and more.
rohanmistry231 / Scikit Learn Interview Preparaion: A targeted resource for mastering Scikit-Learn, featuring practice problems, code examples, and interview-focused machine learning concepts in Python. Covers model building, evaluation, and preprocessing techniques to excel in data science interviews.
cnchi / HappyML: HappyML is a machine learning library for educational purposes. It simplifies many aspects of machine learning, including preprocessing, model creation, and data visualization. The library is experimental and not recommended for production use.
MRYingLEE / Time Series Preprocessing Studio In Jupyter: Time-series data preprocessing studio in a Jupyter notebook.
saigerutherford / Fetal Code: Automated Preprocessing Pipeline for Fetal Resting-State fMRI Data
anne-urai / Pupil Preprocessing Tutorial: Preprocess EyeLink pupillometry data using the FieldTrip toolbox
shuzhao-li-lab / PythonCentricPipelineForMetabolomics: Python pipeline for metabolomics data preprocessing, QC, standardization, and annotation
krypticmouse / 10 Days Of Statistics And Data Preprocessing: List of all the resources I used during 10 days of statistics and data preprocessing.
yangvnks / Titanic Classification: A classification approach to the Titanic survival machine learning challenge on Kaggle. Data visualisation, data preprocessing, and different algorithms are tested and explained in Jupyter notebooks.