Dask
- (rev. 2)
- Hyungyong Kim
Structured data
- About
- Parallel computing
- Code Repository
- https://github.com/dask/dask
- Programming Language
- Python
- URL
- https://dask.pydata.org/en/latest/
Dask is a flexible parallel computing library for analytic computing.
Dask is composed of two components:
- Dynamic task scheduling optimized for computation. This is similar to Airflow, Luigi, Celery, or Make, but optimized for interactive computational workloads.
- “Big Data” collections like parallel arrays, dataframes, and lists that extend common interfaces like NumPy, pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of the dynamic task schedulers.
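The task-scheduling component can be illustrated with `dask.delayed`. The following is a minimal sketch, assuming Dask is installed; the helper functions `inc` and `add` are hypothetical and exist only for illustration. Wrapping a call in `delayed` records it as a task in a graph rather than running it immediately, and `compute()` hands the graph to a scheduler.

```python
from dask import delayed


def inc(x):
    # Hypothetical helper: stands in for any expensive step.
    return x + 1


def add(x, y):
    # Hypothetical helper: combines two intermediate results.
    return x + y


# Nothing is executed here; each call only adds a task to the graph.
a = delayed(inc)(1)
b = delayed(inc)(2)
total = delayed(add)(a, b)

# compute() runs the graph. The two inc tasks do not depend on each
# other, so a scheduler is free to run them in parallel before add.
result = total.compute()
print(result)
```

The parallel collections build the same kind of graph automatically, one task per chunk or partition, so they do not normally require writing `delayed` calls by hand.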
Dask emphasizes the following virtues:
- Familiar: Provides parallelized NumPy array and pandas DataFrame objects.
- Flexible: Provides a task scheduling interface for more custom workloads and integration with other projects.
- Native: Enables distributed computing in pure Python with access to the PyData stack.
- Fast: Operates with the low overhead, low latency, and minimal serialization necessary for fast numerical algorithms.
- Scales up: Runs resiliently on clusters with thousands of cores.
- Scales down: Trivial to set up and run on a laptop in a single process.
- Responsive: Designed with interactive computing in mind, it provides rapid feedback and diagnostics to aid humans.
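The "Familiar" and "Scales down" points can be sketched together. The example below assumes Dask and NumPy are installed; the array size and chunk size are arbitrary values chosen for illustration. It runs the same expression on a NumPy array and on a chunked Dask array, using the single-process synchronous scheduler that suits a laptop or a debugging session.

```python
import numpy as np
import dask.array as da

# Plain NumPy array, fully in memory.
x_np = np.arange(1_000_000, dtype="float64")

# The same data as a Dask array split into 4 chunks of 250,000 elements.
x_da = da.from_array(x_np, chunks=250_000)

# The expression is written the same way for both libraries.
mean_np = (x_np ** 2).mean()
lazy_mean = (x_da ** 2).mean()  # lazy: only builds a task graph

# Run the graph in the current process, one task at a time.
mean_da = lazy_mean.compute(scheduler="synchronous")

print(x_da.numblocks, np.isclose(mean_np, mean_da))
```

Omitting the `scheduler` argument makes Dask arrays use the default threaded scheduler; the calling code is otherwise unchanged.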
Related information
- Data Pre-Processing in Python: How I learned to love parallelized applies with Dask and Numba: running apply in parallel with Numba