Discogs Scraper helps you fetch and search Discogs data like artists, releases, labels, and marketplace prices in a single, consistent output format. It’s designed for fast lookups, batch processing, and practical pricing intelligence with minimal overhead. Use this Discogs scraper when you need structured music metadata or marketplace signals for research, cataloging, or analytics.
Created by Bitbash to showcase our approach to scraping and automation.
This project collects Discogs metadata and marketplace information across multiple modes: direct “get by ID/URL”, keyword “search”, and marketplace “sell” pricing and statistics. It solves the problem of juggling multiple Discogs endpoints and inconsistent response shapes by providing one normalized pipeline with predictable outputs. It’s built for developers, data teams, collectors, and music marketplace builders who need Discogs data quickly while respecting rate limits and minimizing the risk of being blocked.
- Supports Get, Search, and Sell processes across multiple Discogs categories (artist, release, label, master, and more).
- Accepts IDs, usernames, free-text, and URLs depending on the selected process/category combination.
- Includes batch execution controls (parallelism and delays) to reduce blocking risk on sensitive marketplace routes.
- Allows cookie injection for Sell requests to improve stability and reduce detection probability.
- Provides currency filtering and conversion for total price estimates where applicable.
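The batch controls (parallelism plus a delay between requests) can be pictured as a bounded worker pool. The sketch below is an illustration under stated assumptions, not code taken from `runner/batching.py` or `runner/throttling.py`: it assumes an asyncio-based fetcher, and `run_batched` and `fake_fetch` are hypothetical names. At most `batch` requests are in flight, and each worker holds its slot for `delay` seconds after its request finishes.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def run_batched(
    keywords: Iterable[str],
    fetch: Callable[[str], Awaitable[dict]],
    batch: int = 2,
    delay: float = 2.0,
) -> list[dict]:
    """Run fetch() over keywords with at most `batch` requests in flight.

    Hypothetical helper: each worker sleeps `delay` seconds after its request
    before releasing its slot, which spaces out traffic to sensitive routes.
    """
    semaphore = asyncio.Semaphore(batch)

    async def worker(keyword: str) -> dict:
        async with semaphore:
            try:
                return await fetch(keyword)
            finally:
                # Keep holding the slot during the cool-down period.
                await asyncio.sleep(delay)

    # gather() returns results in the same order as the input keywords.
    return await asyncio.gather(*(worker(k) for k in keywords))


async def fake_fetch(keyword: str) -> dict:
    # Hypothetical stand-in for a real HTTP call to Discogs.
    await asyncio.sleep(0.01)
    return {"keyword": keyword, "ok": True}


if __name__ == "__main__":
    results = asyncio.run(
        run_batched(["32515302", "daft punk"], fake_fetch, batch=1, delay=0.1)
    )
    print(results)
```

Lowering `batch` and raising `delay` trades throughput for a gentler request pattern, which is the same lever the anti-block advice in this README describes.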
| Feature | Description |
|---|---|
| Multi-process workflow | Run Get, Search, or Sell flows depending on whether you need canonical entities, discovery, or pricing. |
| Category coverage | Retrieve artists, releases, labels, masters, versions, profiles, inventories, listings, and collections. |
| Keyword & custom input | Use a simple keywords array or an advanced per-item custom input that overrides process/category per keyword. |
| Marketplace statistics | For Sell/Release, returns stats like last sold date, median, low/high, ratings, and demand level when available. |
| Price list extraction | For Sell queries, returns listing-level price details including media condition, shipping, seller metrics, and location. |
| Currency tools | Filter sold results by currency and optionally compute converted totals while preserving original totals. |
| Anti-block controls | Tune batch size and delay to reduce throttling; supports using your own marketplace cookies. |
| Structured outputs | Normalized JSON output with predictable keys for easier downstream parsing and storage. |
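The keyword vs. custom-input behavior in the table above can be sketched as merging run-level defaults with per-item overrides. The keys `keywords`, `process`, `category`, and `maximumResults` mirror names used in this README. `customInput`, `Job`, `resolve_jobs`, and the default values are hypothetical, and the project's `schemas/input_schema.py` may work differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    keyword: str
    process: str   # "get", "search", or "sell"
    category: str  # e.g. "artist", "release", "label", "master"
    maximum_results: int


VALID_PROCESSES = {"get", "search", "sell"}


def resolve_jobs(run_input: dict) -> list[Job]:
    """Expand a run input into one Job per keyword.

    Items in the hypothetical `customInput` list override the run-level
    process/category for their own keyword only.
    """
    # Hypothetical defaults, chosen only for this illustration.
    default_process = run_input.get("process", "search")
    default_category = run_input.get("category", "release")
    max_results = int(run_input.get("maximumResults", 10))

    jobs = [
        Job(kw, default_process, default_category, max_results)
        for kw in run_input.get("keywords", [])
    ]
    for item in run_input.get("customInput", []):
        jobs.append(
            Job(
                item["keyword"],
                item.get("process", default_process),
                item.get("category", default_category),
                max_results,
            )
        )

    # Reject unknown modes early instead of failing mid-run.
    for job in jobs:
        if job.process not in VALID_PROCESSES:
            raise ValueError(f"unknown process {job.process!r} for {job.keyword!r}")
    return jobs


if __name__ == "__main__":
    example = {
        "process": "search",
        "category": "release",
        "maximumResults": 5,
        "keywords": ["purple rain prince"],
        "customInput": [{"keyword": "32515302", "process": "sell", "category": "release"}],
    }
    for job in resolve_jobs(example):
        print(job)
```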
| Field Name | Field Description |
|---|---|
| keyword | The input keyword/ID/URL that triggered the request. |
| process | The mode used: get, search, or sell. |
| category | The Discogs category used (e.g., artist, release, label, master, list). |
| discogsId | Numeric Discogs ID for the entity (when available). |
| siteUrl | Public Discogs URL for the entity (when available). |
| title | Release/master title returned by search/get. |
| formats | Media format array including name, quantity, descriptors, and notes (e.g., Vinyl, 12", RPM). |
| primaryArtists | Primary artist credits with display name and Discogs artist reference. |
| images.edges[].node.thumbnail | Thumbnail metadata including width/height and image URLs. |
| listings.totalCount | Total number of marketplace listings for a given entity (when available). |
| lowestPrice.converted.amount | Lowest price converted amount (when conversion is enabled/available). |
| lowestPrice.converted.currency | Converted currency code (e.g., USD). |
| data[].status | Release status in label release listings (e.g., Accepted). |
| data[].catno | Catalog number for release items in label/artist listings. |
| data[].year | Release year (when available). |
| data[].artist | Artist string for listing rows. |
| statistics.have | Number of users who “have” the item (marketplace/statistics). |
| statistics.want | Number of users who “want” the item (marketplace/statistics). |
| statistics.avg_rating | Average rating value for the release. |
| statistics.ratings | Count of ratings. |
| statistics.last_sold | Last sold date string (marketplace-dependent). |
| statistics.lowest | Lowest recorded sold/offer value in the selected currency context. |
| statistics.median | Median recorded sold/offer value in the selected currency context. |
| statistics.highest | Highest recorded sold/offer value in the selected currency context. |
| statistics.demand_level | Demand indicator (e.g., low/medium/high). |
| prices[].uri | Marketplace listing URL. |
| prices[].name | Listing title text including edition/pressing hints. |
| prices[].media_condition | Media condition string (e.g., Mint (M)). |
| prices[].sleeve_condition | Sleeve condition string (e.g., Mint (M)). |
| prices[].description | Seller-provided description and notes. |
| prices[].seller | Seller username. |
| prices[].seller_rating | Seller rating score. |
| prices[].seller_reviews | Seller review count. |
| prices[].ships_from | Origin country for shipping. |
| prices[].price | Listing price (numeric). |
| prices[].shipping_fee | Shipping fee when specified. |
| prices[].total_estimate | Estimated total (price + shipping fee) in the listing's original currency. |
| prices[].total_estimate_currency | Estimated total converted to the target currency (when conversion is enabled/available). |
| prices[].currency | Currency code for the listing price. |
| cookie | Run setting: optional marketplace cookie value used for Sell flows. |
| maximumResults | Run setting: maximum number of results per keyword. |
| batch | Run setting: number of parallel requests during a run. |
| delay | Run setting: delay applied per request to reduce blocking risk. |
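The relationship between `price`, `shipping_fee`, `total_estimate`, and `total_estimate_currency` can be sketched as follows. This assumes the caller supplies an exchange rate; how the scraper actually obtains rates is not documented here, and `total_estimate`, `convert_totals`, and `filter_by_currency` are hypothetical helpers, not the contents of `utils/currency.py`.

```python
from typing import Optional


def total_estimate(price: float, shipping_fee: Optional[float]) -> float:
    """Price plus shipping in the listing's own currency (shipping may be absent)."""
    return round(price + (shipping_fee or 0.0), 4)


def convert_totals(listing: dict, rate: Optional[float]) -> dict:
    """Return a copy with total_estimate set and, if a rate is given,
    total_estimate_currency holding the converted total.

    The original-currency total is preserved next to the converted one,
    and the input dict is left unmodified.
    """
    out = dict(listing)
    out["total_estimate"] = total_estimate(listing["price"], listing.get("shipping_fee"))
    out["total_estimate_currency"] = (
        round(out["total_estimate"] * rate, 4) if rate is not None else None
    )
    return out


def filter_by_currency(listings: list[dict], currency: str) -> list[dict]:
    """Keep only listings priced in the requested currency code (e.g. "EUR")."""
    return [item for item in listings if item.get("currency") == currency.upper()]
```

With the sample listing below, `price` 204 plus `shipping_fee` 23.8023 gives the `total_estimate` of 227.8023 shown in the output.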
```json
{
  "keyword": "32515302",
  "category": "release",
  "data": {
    "statistics": {
      "have": 3315,
      "want": 1334,
      "avg_rating": 4.91,
      "ratings": 167,
      "last_sold": "06 Jan 25",
      "lowest": 111.12,
      "median": 159.4,
      "highest": 201.25,
      "demand_level": "medium"
    },
    "prices": [
      {
        "uri": "https://www.discogs.com/sell/item/3387500613",
        "name": "Daft Punk = ダフト・パンク* - Discovery = ディスカバリー (2xLP, Album, Ltd, RE)",
        "media_condition": "Mint (M)",
        "sleeve_condition": "Mint (M)",
        "description": "Factory sealed. Hype sticker. All in perfect conditions. Shipping with tracking.",
        "seller": "27max",
        "seller_rating": 4.91,
        "seller_reviews": 3315,
        "ships_from": "Portugal",
        "price": 204,
        "shipping_fee": 23.8023,
        "shipping_fee_exists": "yes",
        "total_estimate": 227.8023,
        "total_estimate_currency": 338.4919,
        "currency": "EUR",
        "details": {
          "availability_ip": null,
          "unavailability_location_ip": null,
          "label": null,
          "gtin": null,
          "release_link": null,
          "rating": null,
          "have": null,
          "want": null
        }
      }
    ]
  }
}
```
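A downstream script might load a record like the example above and pick the cheapest listing in a given currency. This is a minimal sketch using only the standard library; `cheapest_listing` is a hypothetical helper, and the inline JSON is a trimmed copy of the sample output.

```python
import json
from typing import Optional

# Trimmed copy of the sample output above.
SAMPLE = """
{
  "keyword": "32515302",
  "category": "release",
  "data": {
    "statistics": {"lowest": 111.12, "median": 159.4, "highest": 201.25},
    "prices": [
      {"seller": "27max", "price": 204, "shipping_fee": 23.8023,
       "total_estimate": 227.8023, "currency": "EUR"}
    ]
  }
}
"""


def cheapest_listing(record: dict, currency: str) -> Optional[dict]:
    """Return the listing with the lowest total_estimate in `currency`, or None."""
    prices = record.get("data", {}).get("prices", [])
    candidates = [p for p in prices if p.get("currency") == currency]
    return min(candidates, key=lambda p: p["total_estimate"], default=None)


if __name__ == "__main__":
    record = json.loads(SAMPLE)
    best = cheapest_listing(record, "EUR")
    print(best["seller"], best["total_estimate"])  # 27max 227.8023
```

Ranking on `total_estimate` rather than `price` keeps shipping in the comparison, which matters when sellers ship from different countries.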
```text
Discogs/
├── src/
│   ├── main.py
│   ├── cli.py
│   ├── runner/
│   │   ├── __init__.py
│   │   ├── orchestrator.py
│   │   ├── batching.py
│   │   └── throttling.py
│   ├── clients/
│   │   ├── __init__.py
│   │   ├── http_client.py
│   │   ├── discogs_get_client.py
│   │   ├── discogs_search_client.py
│   │   └── discogs_marketplace_client.py
│   ├── extractors/
│   │   ├── __init__.py
│   │   ├── get_artist.py
│   │   ├── get_release.py
│   │   ├── get_master.py
│   │   ├── get_label.py
│   │   ├── get_profile.py
│   │   ├── get_inventory.py
│   │   ├── get_listing.py
│   │   ├── get_collections.py
│   │   ├── search_all.py
│   │   └── sell_prices.py
│   ├── normalizers/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── release_normalizer.py
│   │   ├── label_normalizer.py
│   │   └── marketplace_normalizer.py
│   ├── schemas/
│   │   ├── __init__.py
│   │   ├── input_schema.py
│   │   └── output_schema.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── validators.py
│   │   ├── currency.py
│   │   ├── url_parser.py
│   │   ├── logging.py
│   │   └── retries.py
│   └── config/
│       ├── settings.example.json
│       └── defaults.py
├── data/
│   ├── inputs.sample.json
│   ├── keywords.sample.txt
│   └── sample_output.json
├── tests/
│   ├── test_inputs.py
│   ├── test_normalizers.py
│   ├── test_marketplace.py
│   └── fixtures/
│       ├── sell_release.json
│       ├── search_release.json
│       └── label_releases.json
├── scripts/
│   ├── run_local.sh
│   └── export_jsonl.py
├── requirements.txt
├── pyproject.toml
├── LICENSE
└── README.md
```
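The layout includes `utils/url_parser.py`, which suggests URL inputs are resolved to a category and numeric ID before fetching. The following is a hypothetical sketch of that step, not the module's actual code. `parse_discogs_url` is an illustrative name, and the regex assumes the common `https://www.discogs.com/<category>/<id>-<slug>` shape with an optional two-letter language prefix, which may not cover every URL variant Discogs serves.

```python
import re
from typing import Optional, Tuple

# Assumed URL shapes (hypothetical coverage):
#   https://www.discogs.com/release/<id>-<slug>
#   https://www.discogs.com/<lang>/label/<id>-<slug>   (localized prefix)
_ENTITY_URL = re.compile(
    r"^https?://(?:www\.)?discogs\.com/(?:[a-z]{2}/)?"
    r"(artist|release|master|label)/(\d+)(?:[-/?#]|$)",
    re.IGNORECASE,
)


def parse_discogs_url(url: str) -> Optional[Tuple[str, int]]:
    """Return (category, numeric id) for a recognized entity URL, else None."""
    match = _ENTITY_URL.match(url.strip())
    if match is None:
        return None
    return match.group(1).lower(), int(match.group(2))
```

Marketplace item URLs such as `/sell/item/...` deliberately return `None` here, since they identify a listing rather than a catalog entity.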
- Music collectors use it to track Discogs release prices, so they can spot deals and avoid overpaying for specific pressings.
- Marketplace analysts use it to collect sell-side pricing statistics, so they can model demand, liquidity, and price ranges over time.
- Cataloging tools use it to enrich libraries with Discogs metadata, so they can standardize artist/label/release records across systems.
- Record stores use it to benchmark inventory pricing, so they can price listings competitively and react to market shifts.
- Data teams use it to batch search titles and barcodes, so they can match products to Discogs entities for downstream workflows.
How do I choose between Get, Search, and Sell? Use Get when you already have a specific Discogs ID/URL (best accuracy). Use Search when you only have free text like “purple rain prince” or a barcode and want matching candidates. Use Sell when you need marketplace pricing, availability signals, and (for specific releases) statistics such as last sold and median price.
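The rule of thumb above can be expressed as a small routing function. This is a hypothetical sketch: `choose_process` is not part of this project, and treating an all-digit string as a Discogs ID is an assumption. Because barcodes are also all digits, the caller has to say which kind of number it has.

```python
import re

_DISCOGS_URL = re.compile(r"^https?://(?:www\.)?discogs\.com/", re.IGNORECASE)


def choose_process(query: str, need_prices: bool = False, is_barcode: bool = False) -> str:
    """Pick "get", "search", or "sell" following the FAQ guidance (hypothetical helper).

    - need_prices: marketplace pricing/statistics were requested -> "sell"
    - a Discogs URL, or a bare numeric ID that is not a barcode -> "get"
    - anything else (free text, barcodes) -> "search"
    """
    query = query.strip()
    if need_prices:
        return "sell"
    if _DISCOGS_URL.match(query):
        return "get"
    if query.isdigit() and not is_barcode:
        return "get"
    return "search"
```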
Why do some marketplace fields vary by location or seem inconsistent? Marketplace availability and some sales signals can depend on the location of the requesting IP. If you run the same input from different regions or proxy pools, you may see small differences in availability indicators and price totals.
What can I do if marketplace requests get blocked? Reduce batch (parallelism), increase delay, and use high-quality proxies. For Sell flows, provide your own marketplace cookie to make requests behave more like a normal session. Avoid aggressive high-volume runs on sensitive Sell routes.
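Backing off when requests are blocked can also be automated with exponential backoff and random jitter. This is a minimal sketch that assumes blocks surface as a hypothetical `BlockedError` (for example, raised by your HTTP layer on a 403 or 429 response). `with_backoff` is illustrative and is not the contents of `utils/retries.py`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class BlockedError(Exception):
    """Hypothetical error raised when a request looks throttled or blocked."""


def with_backoff(
    call: Callable[[], T],
    retries: int = 4,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()`, retrying on BlockedError with exponential backoff and jitter.

    Before retry n (0-based) it waits base_delay * 2**n seconds, capped at
    max_delay and scaled by a random factor in [0.5, 1.0] so parallel
    workers don't retry in lockstep.
    """
    for attempt in range(retries + 1):
        try:
            return call()
        except BlockedError:
            if attempt == retries:
                raise  # out of retries: surface the block to the caller
            wait = min(max_delay, base_delay * 2 ** attempt)
            sleep(wait * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the retry schedule can be tested without actually waiting.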
Is there pagination support? Only up to a per-keyword cap: the project targets fast, low-cost extraction, so results per keyword are limited by maximumResults. For large-scale pagination or larger batch inputs, use a higher-capacity workflow, or split the run into smaller chunks to stay stable and reduce blocking risk.
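Splitting a large keyword list into smaller runs is easy to script. This is a sketch with a hypothetical `chunked` helper. Each chunk would then be submitted as its own run, ideally with a pause between runs.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    iterator = iter(items)
    # islice pulls up to `size` items; an empty list means the input is exhausted.
    while chunk := list(islice(iterator, size)):
        yield chunk


if __name__ == "__main__":
    keywords = ["32515302", "purple rain prince", "daft punk discovery"]
    for run_number, chunk in enumerate(chunked(keywords, 2), start=1):
        print(run_number, chunk)
```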
Primary Metric: On a stable proxy pool, typical Get/Search lookups average 0.8–1.6 seconds per keyword, while Sell/Release pricing averages 2.5–4.5 seconds per keyword due to marketplace complexity.
Reliability Metric: With batch=1–2 and a 2–4 second delay, runs commonly sustain a 94–98% success rate across mixed Get/Search inputs; aggressive marketplace runs without cookies can drop below 85%.
Efficiency Metric: A mid-range configuration (batch=2, delay=2s) sustains roughly 18–30 keywords/minute for Get/Search and 8–14 keywords/minute for Sell, depending on result size and network conditions.
Quality Metric: When using IDs/URLs (Get or Sell/Release), entity resolution is typically near-deterministic with high completeness of core fields; free-text Search quality depends on query specificity, with barcodes/GTINs producing the most precise matches.
