pauldevos edited this page Jul 4, 2018 · 1 revision

Welcome to the tools-repository wiki!

tools-repository

A repository of tools used for Data Science, Machine Learning, Deep Learning, Data Engineering, and DataOps, with a focus on reproducibility and compatibility across operating systems.

Tools

Repositories

  • GitHub
  • Bitbucket
  • GitLab
    • Why I choose GitHub: popularity and third-party extensions

Python Environment & Package Managers

  • Conda
  • Pyenv
  • venv
  • virtualenv
  • pip
  • ?

CI/CD

  • CircleCI
  • Travis CI

JupyterHub

Docker Hub

Kubernetes

Ansible

Web Development

  • Flask
  • Django
  • Tornado

Blog/CMS

  • Pelican
  • Wagtail

Spark

  • PySpark
  • Scala
  • Performance Tuning

Unit Testing

  • pytest
  • unittest
  • nose

Text Editors

  • Sublime
  • Atom
  • Nano
  • Vi/Vim/Emacs

IDEs

  • PyCharm

AWS

  • CloudFormation
  • S3
  • EC2
  • RDS
  • Redshift
  • Aurora

GCP

  • Cloud Storage
  • Cloud SQL
  • Compute Engine
  • BigQuery
  • Cloud Spanner

Airflow

  • Redis
  • Postgres
  • Celery

Machine Learning Algorithms

  • Linear Regression
  • Logistic Regression
  • Decision Trees
  • K-Nearest Neighbors (KNN)
  • K-Means
  • Support Vector Machines (SVM)
  • Principal Component Analysis (PCA)
  • Gradient Boosting
  • Neural Networks
    • CNN
    • GNN

These might go in a separate repo README (and folder), or this folder could simply remain the "link farm".

Artificial Intelligence Fundamentals:

  • Algebraic concepts
  • Linear Algebra
  • Optimization Methods
  • Numerical Algorithms
  • Feature Selection
  • Project Structure
    • Vectorization of Data
    • Feature Engineering
    • Handling missing data and outliers, imputation
    • Train/Validation/Test
    • Metrics
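
As a rough illustration of the Train/Validation/Test and Metrics items above, here is a minimal sketch using only the Python standard library. The function names (`train_val_test_split`, `accuracy`), the default fractions, and the seed are assumptions made for this example and are not taken from any tool listed on this page; a real project would more likely use a library helper.

```python
import random


def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle rows and split them into train, validation, and test lists.

    Hypothetical helper for illustration; the fractions and seed are arbitrary defaults.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```

The idea is to fit on `train`, tune on `val`, and report a metric such as `accuracy` once on `test`, so that the reported number is not biased by tuning.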

Deep Learning

Databases (non-cloud)

  • Elasticsearch
  • Postgres
  • Redis
  • TimescaleDB
  • MongoDB
  • Cassandra
  • Neo4j

Python

  • OOP (practices)

    • Types of classes and methods: static methods, class methods, data classes, etc.
    • Decorators
    • 4 Pillars
    • etc
  • FP (practices)

  • Security

    • SSH
    • VPN
    • Ports
    • PHI/PII
    • HITRUST
    • other?

Each item above would link to a section in the format below.

GitHub

Covers concepts, tutorials, etc.:

  • Branching, merging, pull, push, etc.
  • Multiple accounts on the same computer

Cheatsheet of commands:

  • command_1
  • command_2
  • command_3

Other helpful resources:

  • subLink #1
  • subLink #2
  • subLink #3

Python Environment Management

  • Link #1
  • Link #2
  • Link #3
  • Cheatsheet of commands:
    • command_1
    • command_2
    • command_3

Python Package Management

  • tool #1
  • tool #2
  • tool #3