Skip to content

Dask #
Find similar titles

Structured data

About
Parallel computing
Code Repository
Programming Language
Python
URL

Dask is a flexible parallel computing library for analytic computing.

Dask is composed of two components:

  1. Dynamic task scheduling optimized for computation. This is similar to Airflow, Luigi, Celery, or Make, but optimized for interactive computational workloads.
  2. “Big Data” collections like parallel arrays, dataframes, and lists that extend common interfaces like NumPy, pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of the dynamic task schedulers.

Dask emphasizes the following virtues:

  • Familiar: Provides parallelized NumPy array and Pandas DataFrame objects
  • Flexible: Provides a task scheduling interface for more custom workloads and integration with other projects.
  • Native: Enables distributed computing in Pure Python with access to the PyData stack.
  • Fast: Operates with low overhead, low latency, and minimal serialization necessary for fast numerical algorithms
  • Scales up: Runs resiliently on clusters with 1000s of cores
  • Scales down: Trivial to set up and run on a laptop in a single process
  • Responsive: Designed with interactive computing in mind it provides rapid feedback and diagnostics to aid humans

관련정보

Incoming Links #

Related Codes #

Suggested Pages #

web biohackers.net
0.0.1_20140628_0