TL;DR
What it does: Search 2.5+ million scholarly papers across physics, math, computer science, and more. Returns title, authors, abstract, categories — no API key needed.
Quick start: http://export.arxiv.org/api/query?search_query=all:electron&start=0&max_results=1
Overview
arXiv is a free distribution service and open-access archive for 2.5+ million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering, and economics. The arXiv API lets you search papers programmatically, returning metadata like title, authors, abstract, publication date, and subject categories. The API returns Atom XML (not JSON) and requires no authentication.
Live Example
Here's the exact URL to call and the real response you'll get:
The URL to call:
http://export.arxiv.org/api/query?search_query=all:electron&start=0&max_results=1
The actual response you get (simplified):
Note: arXiv returns Atom XML, not JSON. Here's what the response looks like:
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>http://arxiv.org/api/...</id>
  <title>arXiv Query: search_query=all:electron</title>
  <entry>
    <id>http://arxiv.org/abs/cond-mat/0011267v1</id>
    <title>The electronic structure of cuprates from high energy spectroscopy</title>
    <published>2000-11-15T16:19:15Z</published>
    <summary>We report studies of the electronic structure...</summary>
    <author>
      <name>A. Author</name>
    </author>
    <link href="https://arxiv.org/abs/cond-mat/0011267v1" rel="alternate" type="text/html"/>
    <category term="cond-mat.supr-con"/>
  </entry>
</feed>
What does this data mean?
The arXiv API returns Atom XML. Each paper is inside an <entry> element. Here's what each field means:
- <feed> > <title>
- Shows the query you searched for (e.g., "arXiv Query: search_query=all:electron")
- <entry> > <id>
- The paper's URL on arxiv.org (e.g., http://arxiv.org/abs/cond-mat/0011267v1)
- <entry> > <title>
- The full paper title
- <entry> > <published>
- Publication date in ISO 8601 format (e.g., 2000-11-15T16:19:15Z)
- <entry> > <summary>
- The paper abstract (may contain HTML entities and LaTeX math notation)
- <entry> > <author> > <name>
- Author name. Each entry can have multiple <author> elements.
- <entry> > <link>
- Links to the paper. Look for rel="alternate" for the abstract page. The PDF link has title="pdf", or you can replace /abs/ with /pdf/ in the abstract URL.
- <entry> > <category>
- Subject category (e.g., cond-mat.supr-con = Superconductivity). The term attribute contains the category code. An entry can carry more than one <category> element.
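The field descriptions above can be tried offline. The sketch below parses a hand-written sample entry modelled on the simplified response shown earlier; the second author, the PDF link, and the second category are illustrative assumptions added to exercise the parser, not real API output. `parse_entry` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hand-written sample for illustration; not a verbatim API response.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/cond-mat/0011267v1</id>
    <title>The electronic structure of cuprates
      from high energy spectroscopy</title>
    <published>2000-11-15T16:19:15Z</published>
    <summary>We report studies of the electronic structure...</summary>
    <author><name>A. Author</name></author>
    <author><name>B. Author</name></author>
    <link href="http://arxiv.org/abs/cond-mat/0011267v1" rel="alternate" type="text/html"/>
    <link title="pdf" href="http://arxiv.org/pdf/cond-mat/0011267v1" rel="related" type="application/pdf"/>
    <category term="cond-mat.supr-con"/>
    <category term="cond-mat.str-el"/>
  </entry>
</feed>"""


def parse_entry(entry):
    """Turn one <entry> element into a plain dict (hypothetical helper)."""

    def text(tag):
        node = entry.find(f"atom:{tag}", ATOM_NS)
        # Collapse line breaks and runs of spaces inside titles and abstracts.
        return " ".join(node.text.split()) if node is not None and node.text else ""

    links = entry.findall("atom:link", ATOM_NS)
    # The PDF link is the <link> whose title attribute is "pdf"; None if absent.
    pdf_url = next(
        (link.get("href") for link in links if link.get("title") == "pdf"), None
    )
    return {
        "id": text("id"),
        "title": text("title"),
        "published": text("published"),
        "authors": [
            " ".join(name.text.split())
            for name in entry.findall("atom:author/atom:name", ATOM_NS)
            if name.text
        ],
        "categories": [c.get("term") for c in entry.findall("atom:category", ATOM_NS)],
        "pdf_url": pdf_url,
    }


root = ET.fromstring(SAMPLE_FEED)
papers = [parse_entry(e) for e in root.findall("atom:entry", ATOM_NS)]
print(papers[0]["title"])
print(papers[0]["pdf_url"])
```

Returning every category as a list, rather than only the first, keeps cross-listed papers from losing information.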
How to use this API
Important: The arXiv API returns Atom XML, not JSON. You need to parse the XML response. Here's how:
JavaScript Example (with DOMParser)
// Fetch three results and parse the Atom XML response.
const url = 'https://export.arxiv.org/api/query?search_query=all:electron&start=0&max_results=3';
fetch(url)
  .then(res => {
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return res.text();
  })
  .then(str => {
    const parser = new DOMParser();
    const xml = parser.parseFromString(str, 'text/xml');
    const entries = xml.querySelectorAll('entry');
    entries.forEach(entry => {
      const title = entry.querySelector('title').textContent.trim();
      const id = entry.querySelector('id').textContent.trim();
      const summary = entry.querySelector('summary').textContent.trim();
      const authors = [...entry.querySelectorAll('author name')]
        .map(a => a.textContent);
      // First <category> only; an entry can list several.
      const category = entry.querySelector('category').getAttribute('term');
      // Derive the PDF URL from the abstract URL.
      const pdfLink = id.replace('/abs/', '/pdf/');
      console.log({ title, id, summary, authors, category, pdfLink });
    });
  })
  .catch(err => console.error('arXiv request failed:', err));
Python Example (with xml.etree.ElementTree)
import requests
import xml.etree.ElementTree as ET

url = "http://export.arxiv.org/api/query"
params = {
    "search_query": "all:electron",
    "start": 0,
    "max_results": 3,
}
response = requests.get(url, params=params, timeout=30)
response.raise_for_status()  # stop early on HTTP errors

# Define the Atom namespace
ns = {'atom': 'http://www.w3.org/2005/Atom'}
root = ET.fromstring(response.content)
entries = root.findall('atom:entry', ns)

for entry in entries:
    title = entry.find('atom:title', ns).text.strip()
    paper_id = entry.find('atom:id', ns).text.strip()
    summary = entry.find('atom:summary', ns).text.strip()
    published = entry.find('atom:published', ns).text
    # First <category> only; an entry can list several
    category = entry.find('atom:category', ns).get('term')
    authors = [a.find('atom:name', ns).text
               for a in entry.findall('atom:author', ns)]
    # PDF link: replace '/abs/' with '/pdf/' in the ID
    pdf_link = paper_id.replace('/abs/', '/pdf/')
    print(f"Title: {title}")
    print(f"Authors: {', '.join(authors)}")
    print(f"Published: {published}")
    print(f"Category: {category}")
    print(f"PDF: {pdf_link}")
    print("---")
Frequently Asked Questions
- Do I need an API key?
- No! The arXiv API is completely free and requires no API key or authentication. Just call the URL with your search query.
- Is there a rate limit?
- arXiv does not publish a hard numeric quota, but its API documentation asks clients that make repeated calls to pause about 3 seconds between requests. For large result sets, page through with start and max_results instead of sending requests back to back.
- How do I search with AND / OR?
- Use AND and OR in your search query. For example: search_query=all:electron AND all:quantum or search_query=cat:cond-mat.supr-con. Use au: for author, ti: for title, abs: for abstract, cat: for category. When the query goes into a URL, encode the spaces as + or %20.
- Can I get PDF links?
- Yes! Each paper's <id> is the abstract URL (e.g., http://arxiv.org/abs/cond-mat/0011267v1). Replace abs with pdf to get the PDF: http://arxiv.org/pdf/cond-mat/0011267v1.
- The API returns XML. How do I parse it?
- See the code examples above. In JavaScript, use DOMParser after getting the response as text. In Python, use xml.etree.ElementTree with the Atom namespace http://www.w3.org/2005/Atom.
- What's the maximum number of results I can get?
- The maximum is 30,000 results per query, and arXiv's API manual asks that they be fetched in slices of at most 2,000 per request (max_results parameter). Use the start parameter for pagination. The default is 10 results if max_results is not specified.
- Can I use this commercially?
- Yes, arXiv is an open-access archive and the API is free for personal and commercial use. Check the official arXiv documentation for any specific terms.
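The field prefixes and boolean operators from the FAQ above can be combined into a properly encoded URL. This is a minimal sketch using only the standard library; `build_query_url` is a hypothetical helper name and the search terms are illustrative.

```python
from urllib.parse import urlencode

BASE_URL = "http://export.arxiv.org/api/query"


def build_query_url(search_query, start=0, max_results=10):
    """Return a full query URL with the parameters percent-encoded."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    # urlencode turns spaces into '+' and ':' into '%3A'; both decode back
    # to the original query string on the server side.
    return f"{BASE_URL}?{urlencode(params)}"


# Title contains "electron" AND the paper is in the superconductivity category.
url = build_query_url("ti:electron AND cat:cond-mat.supr-con", max_results=5)
print(url)
```

Building the URL through `urlencode` avoids hand-escaping the spaces around AND and OR.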
API Details
- Base URL
- http://export.arxiv.org/api/query
- Documentation
- https://arxiv.org/help/api
- Category
- Academic
- Authentication
- None required - completely free
- Rate Limit
- No hard quota published; arXiv asks for roughly a 3-second pause between repeated requests
- Geographic Coverage
- Global
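Combining the base URL above with the start and max_results parameters, a paged fetch might be sketched as follows. The page size of 100 and the 3-second pause are conservative assumptions, and `page_offsets` and `fetch_pages` are hypothetical helper names. Only the offset calculation runs here; no request is sent until `fetch_pages` is iterated.

```python
import time
import urllib.parse
import urllib.request

BASE_URL = "http://export.arxiv.org/api/query"


def page_offsets(total, page_size):
    """Return (start, max_results) pairs that together cover `total` results."""
    if total < 0 or page_size <= 0:
        raise ValueError("total must be >= 0 and page_size must be > 0")
    return [
        (start, min(page_size, total - start))
        for start in range(0, total, page_size)
    ]


def fetch_pages(search_query, total=250, page_size=100, delay_seconds=3.0):
    """Yield the raw Atom XML (bytes) of each page, pausing between requests."""
    for i, (start, count) in enumerate(page_offsets(total, page_size)):
        if i > 0:
            time.sleep(delay_seconds)  # space out consecutive calls
        params = urllib.parse.urlencode(
            {"search_query": search_query, "start": start, "max_results": count}
        )
        with urllib.request.urlopen(f"{BASE_URL}?{params}", timeout=30) as resp:
            yield resp.read()


# Offsets only; nothing is downloaded by this line.
print(page_offsets(250, 100))
```

Each yielded page is a complete Atom feed, so it can be handed to an XML parser on its own.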