Data Science & AI

Data science, analytics, and AI/ML model development

30 skills in this category

ml-engineer 39k

Build production ML systems with PyTorch 2.x, TensorFlow, and modern ML frameworks. Implements model serving, feature engineering, A/B testing, and monitoring.

sickn33 2026-05-30
airflow-dag-patterns 36k

Build production Apache Airflow DAGs with best practices for operators, sensors, testing, and deployment. Use when creating data pipelines, orchestrating workflows, or scheduling batch jobs.

wshobson 2026-05-29
data-quality-frameworks 36k

Implement data quality validation with Great Expectations, dbt tests, and data contracts. Use when building data quality pipelines, implementing validation rules, or establishing data contracts.

wshobson 2026-05-29
dbt-transformation-patterns 36k

Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.

wshobson 2026-05-29
spark-optimization 36k

Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.

wshobson 2026-05-29
ray-data 28k

Scalable data processing for ML workloads. Streaming execution across CPU/GPU; supports Parquet, CSV, JSON, and images. Integrates with Ray Train, PyTorch, and TensorFlow. Scales from a single machine to hundreds of nodes. Use for batch inference, data preprocessing, multi-modal data loading, or distributed ETL pipelines.

davila7 2026-05-30
ml-engineer 28k

Build production ML systems with PyTorch 2.x, TensorFlow, and modern ML frameworks. Implements model serving, feature engineering, A/B testing, and monitoring.

davila7 2026-05-30
cocoindex 28k

Comprehensive toolkit for developing with the CocoIndex library. Use when users need to create data transformation pipelines (flows), write custom functions, or operate flows via CLI or API. Covers building ETL workflows for AI data processing, including embedding documents into vector databases, building knowledge graphs, creating search indexes, or processing data streams with incremental updates.

davila7 2026-05-30
jupyter-notebook 28k

Use when the user asks to create, scaffold, or edit Jupyter notebooks (`.ipynb`) for experiments, explorations, or tutorials; prefer the bundled templates and run the helper script `new_notebook.py` to generate a clean starting notebook.

davila7 2026-05-30
netlify-deploy 28k

Deploy web projects to Netlify using the Netlify CLI (`npx netlify`). Use when the user asks to deploy, host, publish, or link a site/repo on Netlify, including preview and production deploys.

davila7 2026-05-30
senior-data-engineer 28k

World-class data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, and modern data stack. Includes data modeling, pipeline orchestration, data quality, and DataOps. Use when designing data architectures, building data pipelines, optimizing data workflows, or implementing data governance.

davila7 2026-05-30
senior-data-scientist 28k

World-class data science skill for statistical modeling, experimentation, causal inference, and advanced analytics. Expertise in Python (NumPy, Pandas, Scikit-learn), R, SQL, statistical methods, A/B testing, time series, and business intelligence. Includes experiment design, feature engineering, model evaluation, and stakeholder communication. Use when designing experiments, building predictive models, performing causal analysis, or driving data-driven decisions.

davila7 2026-05-30
spreadsheet 28k

Use when tasks involve creating, editing, analyzing, or formatting spreadsheets (`.xlsx`, `.csv`, `.tsv`) using Python (`openpyxl`, `pandas`), especially when formulas, references, and formatting need to be preserved and verified.

davila7 2026-05-30
dask 28k

Parallel/distributed computing. Scale pandas/NumPy beyond memory, parallel DataFrames/Arrays, multi-file processing, task graphs, for larger-than-RAM datasets and parallel workflows.

davila7 2026-05-30
geopandas 28k

Python library for working with geospatial vector data including shapefiles, GeoJSON, and GeoPackage files. Use when working with geographic data for spatial analysis, geometric operations, coordinate transformations, spatial joins, overlay operations, choropleth mapping, or any task involving reading/writing/analyzing vector geographic data. Supports PostGIS databases, interactive maps, and integration with matplotlib/folium/cartopy. Use for tasks like buffer analysis, spatial joins between datasets, dissolving boundaries, clipping data, calculating areas/distances, reprojecting coordinate systems, creating maps, or converting between spatial file formats.

davila7 2026-05-30
labarchive-integration 28k

Electronic lab notebook API integration. Access notebooks, manage entries/attachments, backup notebooks, integrate with Protocols.io/Jupyter/REDCap, for programmatic ELN workflows.

davila7 2026-05-30
flux-pipeline 2.3k

Build a data pipeline — ETL/ELT with extraction, transformation, loading, error handling, and scheduling. Use when asked to "build ETL", "data pipeline", "move data from X to Y", or "sync data".

jeremylongshore 2026-05-30
building-automl-pipelines 2.3k

Build automated machine learning pipelines with feature engineering,

jeremylongshore 2026-05-30
databricks-core-workflow-a 2.3k

Execute Databricks primary workflow: Delta Lake ETL pipelines.

jeremylongshore 2026-05-30
lokalise-deploy-integration 2.3k

Deploy Lokalise integrations to Vercel, Netlify, and Cloud Run platforms.

jeremylongshore 2026-05-30
maintainx-data-handling 2.3k

Data synchronization, ETL patterns, and data management for MaintainX.

jeremylongshore 2026-05-30
stackblitz-deploy-integration 2.3k

Deploy WebContainer apps to Vercel and Netlify with proper COOP/COEP headers.

jeremylongshore 2026-05-30
data-transform 1.0k

Transform, clean, reshape, and preprocess data using pandas and numpy. Works with ANY LLM provider (GPT, Gemini, Claude, etc.).

Starlitnightly 2026-05-30
architecture-paradigm-pipeline 294

Applies pipes-and-filters for sequential data transformations. Use when data flows through discrete stages like ETL, streaming analytics, or CI/CD pipelines.

athola 2026-05-30
etl-core-patterns 39

Core ETL reliability patterns including idempotency, checkpointing, error handling, chunking, retry logic, and logging.

majesticlabs-dev 2026-05-13
etl-incremental-patterns 39

Incremental data loading patterns including backfill strategies, CDC, timestamp-based loads, and pipeline orchestration.

majesticlabs-dev 2026-05-13
etl-patterns 39

Production ETL patterns orchestrator. Routes to core reliability patterns and incremental load strategies.

majesticlabs-dev 2026-05-13
pandas-coder 39

DataFrame manipulation with chunked processing, memory optimization, and vectorized operations.

majesticlabs-dev 2026-05-13
test-fixture-generator 39

Generate synthetic test data with edge cases for ETL pipeline testing.

majesticlabs-dev 2026-05-13
testing-patterns 39

Pytest templates and patterns for ETL pipeline testing - unit, integration, data quality.

majesticlabs-dev 2026-05-13