Adaptation of API for multimodal data (#663)
* API run of multimodal cases
  - added checking of text and image features in data while building preprocessing
  - multi_modal_pipeline.py now runs via API
  - preprocessing builder can now process each source of multimodal data separately
* - improved data_has_text_features function
* - changed data_has_categorical_features function to follow the new mechanics of the multimodal assumptions builder
  - added new test for text data
* - fixed failure of test_assumptions_builder_for_multimodal_data
  - data_preprocessing now works with unimodal data only; multimodal data is supported by preprocessing each data source node iteratively
* - multimodal data strategy now defines data sources based on data type
* - added text+data multimodal example with dataset
  - fixed multimodal data preprocessing bug
  - removed duplicated categorical encoding from preprocessor
* - fixed CNN initial assumption bug
  - updated CNN tests
* - added text vectorization operation using pretrained word2vec models
* - added test for data with empty text fields
* - changed multimodal example and case to run via API
* - CNN node is now added by processing_builder, not by preprocessing
* - added test for new functionality of MultimodalStrategy data definition
* - modified test_text_features_processed_correctly
* - modified test_correct_api_dataset_with_text_preprocessing
* - added test checking DataDefiner on multimodal data
  - added some docstrings
  - removed multi_modal_genre_prediction.py case
* - added default_operation_params for word2vec_pretrained and tfidf
  - added search_space params for word2vec_pretrained and tfidf
* - added tests for various text data
  - modified data_has_text_features function in preprocessing builder
* - changed phrase vectorization method from sum to average
  - text vectorizer params initialisation moved to the class's init
* - embeddings download info is now written to the logger
* - fixed bug with tuner not working on pipeline with Keras CNN
* - MultiModalAssumptionsBuilder refactoring
* - MultiModalAssumptionsBuilder refactoring [2]
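The commit switches phrase vectorization from summing word vectors to averaging them. The sketch below illustrates that idea under stated assumptions; it is not the project's code.

- A plain dict stands in for a pretrained word2vec model. A real one would be loaded by a library, not written inline.
- Tokenization is a lowercase whitespace split.
- The function name `average_phrase_vector` is hypothetical and not the library's API.

Averaging keeps the vector scale independent of phrase length. Returning a zero vector keeps rows with empty text fields at a fixed feature width.

```python
from typing import Dict, List

Vector = List[float]


def average_phrase_vector(phrase: str, embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Hypothetical helper: mean of the vectors of in-vocabulary words in `phrase`.

    `embeddings` is an assumed stand-in for a pretrained word2vec model.
    Out-of-vocabulary words are skipped; an empty or fully unknown phrase
    returns a zero vector of length `dim`.
    """
    # Assumed tokenization: lowercase + whitespace split.
    known = [embeddings[word] for word in phrase.lower().split() if word in embeddings]
    if not known:
        # Empty text field (or no known words): fixed-length zero vector.
        return [0.0] * dim
    # Element-wise mean; a plain sum would grow with the number of words.
    return [sum(component) / len(known) for component in zip(*known)]


if __name__ == "__main__":
    toy_embeddings = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}
    print(average_phrase_vector("good movie", toy_embeddings, 2))  # [0.5, 0.5]
    print(average_phrase_vector("", toy_embeddings, 2))            # [0.0, 0.0]
```

With summing, "good good movie" and "good movie" would differ only in magnitude. With averaging, their vectors stay on a comparable scale.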