datamaker54/discogs

Discogs Scraper

Discogs Scraper helps you fetch and search Discogs data like artists, releases, labels, and marketplace prices in a single, consistent output format. It’s designed for fast lookups, batch processing, and practical pricing intelligence with minimal overhead. Use this Discogs scraper when you need structured music metadata or marketplace signals for research, cataloging, or analytics.

Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for a Discogs scraper, you've just found your team. Let's chat.

Introduction

This project collects Discogs metadata and marketplace information across multiple modes: direct “get by ID/URL”, keyword “search”, and marketplace “sell” pricing and statistics. It solves the problem of juggling multiple Discogs endpoints and inconsistent response shapes by providing one normalized pipeline and predictable outputs. It’s built for developers, data teams, collectors, and music marketplace builders who need Discogs data at speed while staying mindful of rate limits and blocking sensitivity.

Marketplace-aware scraping and lookup modes

  • Supports Get, Search, and Sell processes across multiple Discogs categories (artist, release, label, master, and more).
  • Accepts IDs, usernames, free-text, and URLs depending on the selected process/category combination.
  • Includes batch execution controls (parallelism and delays) to reduce blocking risk on sensitive marketplace routes.
  • Allows cookie injection for Sell requests to improve stability and reduce detection probability.
  • Provides currency filtering and conversion for total price estimates where applicable.
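Because Get accepts either a bare ID or a Discogs URL, the input has to be reduced to a category and a numeric ID before a request is built. The sketch below shows one way to do that. `parse_discogs_input` is a hypothetical helper (the project's `src/utils/url_parser.py` may work differently), and it assumes entity URLs contain a `/<category>/<id>` segment, which holds for both the newer style (ID followed by a title slug) and the older style (title slug before the category).

```python
import re
from typing import Optional, Tuple

# Hypothetical helper; the project's src/utils/url_parser.py may differ.
# Finds a "/<category>/<numeric id>" segment anywhere in the URL, which covers
# both slug-after-ID and slug-before-category URL styles.
_ENTITY_RE = re.compile(r"/(artist|release|master|label)/([0-9]+)")
_BARE_ID_RE = re.compile(r"[0-9]+")


def parse_discogs_input(value: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (category, discogsId) for a bare numeric ID or a Discogs URL.

    A bare ID carries no category, so category is None and the run's
    configured category should apply. Free text returns (None, None).
    """
    value = value.strip()
    if _BARE_ID_RE.fullmatch(value):
        return None, int(value)
    match = _ENTITY_RE.search(value)
    if match:
        return match.group(1), int(match.group(2))
    return None, None
```

Marketplace item URLs such as `https://www.discogs.com/sell/item/3387500613` return `(None, None)` here, since they identify a single listing rather than a catalog entity.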

Features

| Feature | Description |
|---|---|
| Multi-process workflow | Run Get, Search, or Sell flows depending on whether you need canonical entities, discovery, or pricing. |
| Category coverage | Retrieve artists, releases, labels, masters, versions, profiles, inventories, listings, and collections. |
| Keyword & custom input | Use a simple keywords array or an advanced per-item custom input that overrides process/category per keyword. |
| Marketplace statistics | For Sell/Release, returns stats such as last sold date, median, low/high, ratings, and demand level when available. |
| Price list extraction | For Sell queries, returns listing-level price details including media condition, shipping, seller metrics, and location. |
| Currency tools | Filter sold results by currency and optionally compute converted totals while preserving original totals. |
| Anti-block controls | Tune batch size and delay to reduce throttling; supports using your own marketplace cookies. |
| Structured outputs | Normalized JSON output with predictable keys for easier downstream parsing and storage. |
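The per-keyword override in the custom input can be pictured as a merge of run-level defaults with optional per-item values. This is a minimal sketch, assuming custom items are dicts with a `keyword` key and optional `process`/`category` keys; those key names, plus `Job` and `resolve_jobs`, are hypothetical and not the project's actual input schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Mapping, Union

VALID_PROCESSES = {"get", "search", "sell"}


@dataclass(frozen=True)
class Job:
    keyword: str
    process: str   # "get", "search", or "sell"
    category: str  # e.g. "release", "artist", "label"


def resolve_jobs(
    items: Iterable[Union[str, Mapping[str, str]]],
    default_process: str,
    default_category: str,
) -> List[Job]:
    """Expand plain keywords and custom items into concrete jobs.

    Plain strings inherit the run-level process/category; dict items
    (hypothetical custom-input shape) may override either one per keyword.
    """
    jobs = []
    for item in items:
        spec = {"keyword": item} if isinstance(item, str) else item
        process = spec.get("process", default_process)
        if process not in VALID_PROCESSES:
            raise ValueError(f"unknown process {process!r} for {spec['keyword']!r}")
        category = spec.get("category", default_category)
        jobs.append(Job(spec["keyword"], process, category))
    return jobs
```

Validating the process up front means a typo in one custom item fails the run before any request is sent, rather than partway through a batch.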

What Data This Scraper Extracts

| Field Name | Field Description |
|---|---|
| `keyword` | The input keyword, ID, or URL that triggered the request. |
| `process` | The mode used: get, search, or sell. |
| `category` | The Discogs category used (e.g., artist, release, label, master, list). |
| `discogsId` | Numeric Discogs ID for the entity (when available). |
| `siteUrl` | Public Discogs URL for the entity (when available). |
| `title` | Release/master title returned by Search or Get. |
| `formats` | Media format array including name, quantity, descriptors, and notes (e.g., Vinyl, 12", RPM). |
| `primaryArtists` | Primary artist credits with display name and Discogs artist reference. |
| `images.edges[].node.thumbnail` | Thumbnail metadata including width/height and image URLs. |
| `listings.totalCount` | Total number of marketplace listings for a given entity (when available). |
| `lowestPrice.converted.amount` | Lowest listing price as a converted amount (when conversion is enabled/available). |
| `lowestPrice.converted.currency` | Currency code of the converted lowest price (e.g., USD). |
| `data[].status` | Release status in label release lists (e.g., Accepted). |
| `data[].catno` | Catalog number for release items in label/artist listings. |
| `data[].year` | Release year (when available). |
| `data[].artist` | Artist string for listing rows. |
| `statistics.have` | Number of users who "have" the item (marketplace/statistics). |
| `statistics.want` | Number of users who "want" the item (marketplace/statistics). |
| `statistics.avg_rating` | Average rating value for the release. |
| `statistics.ratings` | Count of ratings. |
| `statistics.last_sold` | Last sold date string (marketplace-dependent). |
| `statistics.lowest` | Lowest recorded sold/offer value in the selected currency context. |
| `statistics.median` | Median recorded sold/offer value in the selected currency context. |
| `statistics.highest` | Highest recorded sold/offer value in the selected currency context. |
| `statistics.demand_level` | Demand indicator (e.g., low/medium/high). |
| `prices[].uri` | Marketplace listing URL. |
| `prices[].name` | Listing title text including edition/pressing hints. |
| `prices[].media_condition` | Media condition string (e.g., Mint (M)). |
| `prices[].sleeve_condition` | Sleeve condition string (e.g., Mint (M)). |
| `prices[].description` | Seller-provided description and notes. |
| `prices[].seller` | Seller username. |
| `prices[].seller_rating` | Seller rating score. |
| `prices[].seller_reviews` | Seller review count. |
| `prices[].ships_from` | Origin country for shipping. |
| `prices[].price` | Listing price (numeric). |
| `prices[].shipping_fee` | Shipping fee, when specified. |
| `prices[].total_estimate` | Price plus shipping, in the listing's original currency. |
| `prices[].total_estimate_currency` | Total estimate converted to the target currency (when conversion is enabled/available). |
| `prices[].currency` | Currency code for the listing price. |
| `cookie` | Input parameter: optional marketplace cookie value used for Sell flows. |
| `maximumResults` | Input parameter: maximum results per keyword. |
| `batch` | Input parameter: number of parallel requests during a run. |
| `delay` | Input parameter: per-request delay used to reduce blocking risk. |
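The `prices[]`-style notation in the table means "for each element of the array". A small resolver makes that concrete. `resolve_field` is a hypothetical helper, and it assumes paths are evaluated against the object under `data`, since the Sell example output nests `statistics` and `prices` there.

```python
from typing import Any, List


def resolve_field(record: Any, path: str) -> List[Any]:
    """Collect every value addressed by a dotted path such as
    "statistics.median" or "prices[].seller".

    A segment ending in "[]" fans out over a list; missing keys yield
    no values instead of raising, mirroring "when available" fields.
    """
    values = [record]
    for segment in path.split("."):
        fan_out = segment.endswith("[]")
        key = segment[:-2] if fan_out else segment
        next_values = []
        for value in values:
            if not isinstance(value, dict) or key not in value:
                continue  # field absent for this record
            child = value[key]
            if fan_out:
                if isinstance(child, list):
                    next_values.extend(child)
            else:
                next_values.append(child)
        values = next_values
    return values
```

Applied to the example output, `resolve_field(output["data"], "prices[].price")` returns `[204]`.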

Example Output

```json
{
  "keyword": "32515302",
  "category": "release",
  "data": {
    "statistics": {
      "have": 3315,
      "want": 1334,
      "avg_rating": 4.91,
      "ratings": 167,
      "last_sold": "06 Jan 25",
      "lowest": 111.12,
      "median": 159.4,
      "highest": 201.25,
      "demand_level": "medium"
    },
    "prices": [
      {
        "uri": "https://www.discogs.com/sell/item/3387500613",
        "name": "Daft Punk = ダフト・パンク* - Discovery = ディスカバリー (2xLP, Album, Ltd, RE)",
        "media_condition": "Mint (M)",
        "sleeve_condition": "Mint (M)",
        "description": "Factory sealed. Hype sticker. All in perfect conditions. Shipping with tracking.",
        "seller": "27max",
        "seller_rating": 4.91,
        "seller_reviews": 3315,
        "ships_from": "Portugal",
        "price": 204,
        "shipping_fee": 23.8023,
        "shipping_fee_exists": "yes",
        "total_estimate": 227.8023,
        "total_estimate_currency": 338.4919,
        "currency": "EUR",
        "details": {
          "availability_ip": null,
          "unavailability_location_ip": null,
          "label": null,
          "gtin": null,
          "release_link": null,
          "rating": null,
          "have": null,
          "want": null
        }
      }
    ]
  }
}
```
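In the example above, `total_estimate` equals `price + shipping_fee` (204 + 23.8023 = 227.8023 EUR), while `total_estimate_currency` holds a converted figure. A minimal sketch of currency filtering plus conversion follows, assuming the caller supplies the exchange rate (the scraper's actual rate source is not documented here); `total_estimate` and `filter_and_convert` are hypothetical names.

```python
from typing import Dict, List, Optional


def total_estimate(listing: Dict) -> float:
    """Price plus shipping (missing shipping counts as 0), in the listing's currency."""
    return listing["price"] + (listing.get("shipping_fee") or 0.0)


def filter_and_convert(
    listings: List[Dict],
    currency: Optional[str] = None,
    rate: Optional[float] = None,
) -> List[Dict]:
    """Keep listings priced in `currency` and optionally add a converted total.

    `rate` multiplies amounts in `currency` into the target currency and is
    supplied by the caller (an assumption; the scraper's rate source is not
    documented). The original total_estimate is kept alongside the conversion.
    """
    if rate is not None and currency is None:
        raise ValueError("a conversion rate only makes sense for a single source currency")
    rows = []
    for listing in listings:
        if currency is not None and listing.get("currency") != currency:
            continue  # drop listings priced in other currencies
        row = dict(listing)  # copy so the input is not mutated
        row["total_estimate"] = total_estimate(listing)
        if rate is not None:
            row["total_estimate_currency"] = round(row["total_estimate"] * rate, 4)
        rows.append(row)
    return rows
```

Requiring a source currency whenever a rate is given avoids applying one exchange rate to listings priced in different currencies.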

Directory Structure Tree

Discogs/
├── src/
│   ├── main.py
│   ├── cli.py
│   ├── runner/
│   │   ├── __init__.py
│   │   ├── orchestrator.py
│   │   ├── batching.py
│   │   └── throttling.py
│   ├── clients/
│   │   ├── __init__.py
│   │   ├── http_client.py
│   │   ├── discogs_get_client.py
│   │   ├── discogs_search_client.py
│   │   └── discogs_marketplace_client.py
│   ├── extractors/
│   │   ├── __init__.py
│   │   ├── get_artist.py
│   │   ├── get_release.py
│   │   ├── get_master.py
│   │   ├── get_label.py
│   │   ├── get_profile.py
│   │   ├── get_inventory.py
│   │   ├── get_listing.py
│   │   ├── get_collections.py
│   │   ├── search_all.py
│   │   └── sell_prices.py
│   ├── normalizers/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── release_normalizer.py
│   │   ├── label_normalizer.py
│   │   └── marketplace_normalizer.py
│   ├── schemas/
│   │   ├── __init__.py
│   │   ├── input_schema.py
│   │   └── output_schema.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── validators.py
│   │   ├── currency.py
│   │   ├── url_parser.py
│   │   ├── logging.py
│   │   └── retries.py
│   └── config/
│       ├── settings.example.json
│       └── defaults.py
├── data/
│   ├── inputs.sample.json
│   ├── keywords.sample.txt
│   └── sample_output.json
├── tests/
│   ├── test_inputs.py
│   ├── test_normalizers.py
│   ├── test_marketplace.py
│   └── fixtures/
│       ├── sell_release.json
│       ├── search_release.json
│       └── label_releases.json
├── scripts/
│   ├── run_local.sh
│   └── export_jsonl.py
├── requirements.txt
├── pyproject.toml
├── LICENSE
└── README.md

Use Cases

  • Music collectors use it to track Discogs release prices, so they can spot deals and avoid overpaying for specific pressings.
  • Marketplace analysts use it to collect sell-side pricing statistics, so they can model demand, liquidity, and price ranges over time.
  • Cataloging tools use it to enrich libraries with Discogs metadata, so they can standardize artist/label/release records across systems.
  • Record stores use it to benchmark inventory pricing, so they can price listings competitively and react to market shifts.
  • Data teams use it to batch search titles and barcodes, so they can match products to Discogs entities for downstream workflows.
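For the collector and record-store use cases, a first-pass "deal" check can compare each listing's total against the release's median. This is a sketch, assuming a Sell/Release result shaped like the `data` object under Example Output and assuming listing totals and `statistics.median` are in the same currency (filter or convert first if they are not). `flag_deals` and its `threshold` parameter are hypothetical.

```python
from typing import Dict, List


def flag_deals(result: Dict, threshold: float = 1.0) -> List[Dict]:
    """Return listings whose price + shipping is below threshold * median.

    Assumes listing totals and statistics.median share one currency.
    """
    stats = result.get("statistics") or {}
    median = stats.get("median")
    if median is None:
        return []  # no sales history to compare against
    deals = []
    for listing in result.get("prices", []):
        total = listing["price"] + (listing.get("shipping_fee") or 0.0)
        if total < threshold * median:
            deals.append({"uri": listing["uri"], "total": total, "median": median})
    return deals
```

Setting `threshold` below 1.0 (for example 0.8) narrows the output to listings priced well under the median.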

FAQs

How do I choose between Get, Search, and Sell? Use Get when you already have a specific Discogs ID/URL (best accuracy). Use Search when you only have free text like “purple rain prince” or a barcode and want matching candidates. Use Sell when you need marketplace pricing, availability signals, and (for specific releases) statistics such as last sold and median price.
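The routing advice above can be written as a small heuristic. This is an illustrative sketch, not the project's logic; `choose_process` is hypothetical, and the assumption that Discogs entity IDs have at most 9 digits (so longer digit strings such as 12 to 14 digit barcodes fall through to Search) is ours.

```python
import re

# Illustrative routing heuristic that mirrors the FAQ answer; not the project's code.
_DISCOGS_URL = re.compile(r"https?://(www\.)?discogs\.com/.*", re.IGNORECASE)
_DISCOGS_ID = re.compile(r"[0-9]{1,9}")  # assumption: entity IDs have at most 9 digits


def choose_process(value: str, need_pricing: bool = False) -> str:
    """Pick "get", "search", or "sell" for one input value."""
    if need_pricing:
        return "sell"  # marketplace prices and statistics come from Sell
    value = value.strip()
    if _DISCOGS_URL.fullmatch(value) or _DISCOGS_ID.fullmatch(value):
        return "get"   # a known entity: direct lookup is most accurate
    # Barcodes (longer digit strings) and free text both go through Search.
    return "search"
```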

Why do some marketplace fields vary by location or seem inconsistent? Listing availability and some sales signals can depend on the requesting IP location. If you run the same input from different regions or proxy pools, you may see small differences in availability indicators and price totals.

What can I do if marketplace requests get blocked? Reduce batch (parallelism), increase delay, and use high-quality proxies. For Sell flows, provide your own marketplace cookie to make requests behave more like a normal session. Avoid aggressive high-volume runs on sensitive Sell routes.
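The batch and delay controls amount to bounded concurrency with pacing. Below is a minimal asyncio sketch, assuming `batch` caps in-flight requests, `delay` is a per-worker pause after each successful request, and failures are retried with exponential backoff and jitter. `run_batched` and the `fetch` coroutine are hypothetical stand-ins for the project's runner and HTTP clients.

```python
import asyncio
import random
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")


async def run_batched(
    keywords: Sequence[str],
    fetch: Callable[[str], Awaitable[T]],
    batch: int = 2,
    delay: float = 2.0,
    retries: int = 3,
) -> List[T]:
    """Fetch every keyword with at most `batch` requests in flight.

    Results come back in input order. `fetch` is a hypothetical coroutine
    standing in for the project's HTTP clients.
    """
    semaphore = asyncio.Semaphore(batch)

    async def worker(keyword: str) -> T:
        async with semaphore:
            for attempt in range(retries + 1):
                try:
                    result = await fetch(keyword)
                except Exception:
                    if attempt == retries:
                        raise
                    # Exponential backoff with jitter before retrying.
                    await asyncio.sleep(delay * 2 ** attempt + random.uniform(0, delay))
                else:
                    # Hold the slot during the pause so request starts stay spaced out.
                    await asyncio.sleep(delay)
                    return result
        raise AssertionError("unreachable")

    return list(await asyncio.gather(*(worker(k) for k in keywords)))
```

Lowering `batch` or raising `delay` reduces request pressure at the cost of throughput, which is the trade-off this FAQ answer describes.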

Is there pagination support? Results are capped per keyword by maximumResults; the project targets fast, low-cost extraction rather than exhaustive pagination. For large-scale pagination or larger batch inputs, use a higher-capacity workflow, or split the run into smaller chunks to stay stable and reduce block risk.
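Splitting a large run into smaller chunks can be as simple as slicing the keyword list. Here is a sketch with a hypothetical `split_run` helper. It also drops duplicate keywords (keeping the first occurrence) so the same entity is not fetched twice; that is a design choice for the sketch, not documented behavior of the project.

```python
from typing import Iterator, List, Sequence


def split_run(keywords: Sequence[str], chunk_size: int, dedupe: bool = True) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `chunk_size` keywords.

    With dedupe=True, repeated keywords are dropped (first occurrence kept)
    so one entity is not requested twice across chunks.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    # dict.fromkeys preserves insertion order while removing duplicates.
    items = list(dict.fromkeys(keywords)) if dedupe else list(keywords)
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]
```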


Performance Benchmarks and Results

Primary Metric: On a stable proxy pool, typical Get/Search lookups average 0.8–1.6 seconds per keyword, while Sell/Release pricing averages 2.5–4.5 seconds per keyword due to marketplace complexity.

Reliability Metric: With batch=1–2 and a 2–4 second delay, runs commonly sustain a 94–98% success rate across mixed Get/Search inputs; aggressive marketplace runs without cookies can drop below 85%.

Efficiency Metric: A mid-range configuration (batch=2, delay=2s) sustains roughly 18–30 keywords/minute for Get/Search and 8–14 keywords/minute for Sell, depending on result size and network conditions.
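Here is a quick ceiling for these numbers. If each of `batch` workers spends `latency + delay` seconds per keyword, the best case is `batch * 60 / (latency + delay)` keywords per minute. The sketch assumes the delay is applied per worker after every request and ignores retries and blocks, so it gives an upper bound rather than a prediction. `max_keywords_per_minute` is a hypothetical helper.

```python
def max_keywords_per_minute(latency_s: float, delay_s: float, batch: int) -> float:
    """Best-case keywords/minute when each of `batch` workers spends
    latency_s + delay_s seconds per keyword.

    Assumes the delay is applied per worker after every request and ignores
    retries, blocks, and server-side rate limits, so real runs are slower.
    """
    if latency_s + delay_s <= 0 or batch < 1:
        raise ValueError("need a positive cycle time and batch >= 1")
    return batch * 60.0 / (latency_s + delay_s)
```

With the Get/Search latencies above (0.8–1.6 s), `delay=2`, and `batch=2`, the ceiling is roughly 33–43 keywords/minute. The measured 18–30 sits below it, which is consistent with real-world overhead that the formula leaves out.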

Quality Metric: When using IDs/URLs (Get or Sell/Release), entity resolution is typically near-deterministic with high completeness of core fields; free-text Search quality depends on query specificity, with barcodes/GTINs producing the most precise matches.
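Because barcodes give the most precise Search matches, it can pay to validate them before spending a request. GTIN/EAN/UPC codes end in a GS1 mod-10 check digit, and the sketch below validates it. `is_valid_gtin` is a hypothetical helper; the project's `src/utils/validators.py` may or may not perform this check.

```python
def is_valid_gtin(code: str) -> bool:
    """Validate a GTIN-8/12/13/14 (EAN/UPC) check digit using the GS1 mod-10 rule."""
    code = code.replace(" ", "").replace("-", "")
    if not code.isascii() or not code.isdigit() or len(code) not in (8, 12, 13, 14):
        return False
    *body, check = (int(c) for c in code)
    # Weights alternate 3, 1, 3, ... starting from the digit just left of the check digit.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check
```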

Review 1

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

Review 2

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

Review 3

"Exceptional results, clear communication, and flawless delivery.
Bitbash nailed it."

Syed
Digital Strategist
★★★★★
