An Automated System for arXiv Article Downloading, Renaming, and Organizing into Specified Directories

Posted on 2025-05-19

Introduction

One problem has long bothered me: papers downloaded from arXiv have unreadable file names (e.g., 1210.2368v3).
I therefore want to automatically rename papers downloaded from arXiv and move them to a specified folder instead of the system's default download folder.

Tampermonkey

Search for Tampermonkey in the Microsoft Edge Add-ons store and install it.
If you prefer Chrome or Firefox, you can find it in their extension stores as well.
Open Tampermonkey and create a new script with the following content:

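// ==UserScript==
// @name         arXiv auto-rename download script
// @match        https://arxiv.org/pdf/*
// @grant        GM_download
// @grant        GM_xmlhttpRequest
// @connect      export.arxiv.org
// ==/UserScript==
// Target folder, for reference only: GM_download always saves into the browser's
// download folder, and the Python script moves the file from there (keep in sync with TARGET_DIR)
const TARGET_FOLDER = "D:/MyDrive/"; // can be changed to a specific path, e.g. "D:/Research/ArXiv"

// Extract the arXiv ID from the URL (e.g. "1210.2368" from .../pdf/1210.2368v3)
const idMatch = window.location.href.match(/\/pdf\/(\d+\.\d+)/);
if (!idMatch) {
    // Old-style IDs such as hep-th/9901001 are not handled by this script
    throw new Error("No new-style arXiv ID found in the URL");
}
const arxivId = idMatch[1];

GM_xmlhttpRequest({
    method: "GET",
    url: `https://export.arxiv.org/api/query?id_list=${arxivId}`,
    onload: function(response) {
        const parser = new DOMParser();
        const xml = parser.parseFromString(response.responseText, "application/xml");

        // Namespace resolver for the Atom feed
        const nsResolver = function(prefix) {
            return prefix === "atom" ? "http://www.w3.org/2005/Atom" : null;
        };

        // Query the title with XPath (namespace-aware)
        const titleNode = xml.evaluate(
            "//atom:entry/atom:title",
            xml,
            nsResolver,
            XPathResult.FIRST_ORDERED_NODE_TYPE,
            null
        ).singleNodeValue;

        // Use the title, or fall back to the arXiv ID if it is missing.
        // Long titles in the feed may contain line breaks, so collapse whitespace.
        const title = titleNode
            ? titleNode.textContent.replace(/\s+/g, " ").trim()
            : arxivId.replace(".", "_");
        console.log("Extracted title:", title);

        // Replace characters that are illegal in file names, cap the length well below
        // the 255-character limit, and drop trailing dots/spaces (Windows strips them)
        const safeTitle = title.replace(/[\\/:*?"<>|]/g, "_").substring(0, 150).replace(/[. ]+$/, "");
        const fileName = `${safeTitle}.pdf`;

        // Download the file
        GM_download({
            url: window.location.href,
            name: fileName,
            // Show the Save dialog (forces user interaction so the download completes);
            // keep the default download folder so the Python script can find the file
            saveAs: true,
            onload: function() {
                console.log("onload callback fired - download finished, file name:", fileName);

                // Ask the Flask backend to move the file
                fetch("http://localhost:5000/process-arxiv-file", {
                    method: "POST",
                    headers: { "Content-Type": "application/json" },
                    body: JSON.stringify({
                        filename: fileName,
                        title: safeTitle,
                        arxiv_id: arxivId
                    })
                })
                    .then(response => response.json())
                    .then(data => console.log("Server response:", data))
                    .catch(error => console.error("Request failed:", error));
            },
            onprogress: function(progress) {
                console.log("Download progress:", progress.percent);
            },
            onerror: function(error) {
                console.error("Download error:", error);
            }
        });
    },
    onerror: function(error) {
        console.error("API request failed:", error);
        // Fall back to downloading under the arXiv ID (the file is not moved in this case)
        const fileName = `${arxivId.replace(".", "_")}.pdf`;
        GM_download({ url: window.location.href, name: fileName });
    }
});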

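This script renames the PDF downloaded from arXiv after the paper's title and then asks the local Flask server described below to move it.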
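To check the renaming logic outside the browser, here is a minimal Python sketch of the three steps the user script performs: pulling the arXiv ID out of the PDF URL, reading the title from the Atom feed returned by the export.arxiv.org API, and turning that title into a safe file name. The function names and the 150-character cap are illustrative choices, not part of the script above; like the user script's regex, this one only recognizes new-style IDs. It also collapses the line breaks that the API may insert into long titles.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# New-style IDs such as 1210.2368 or 2401.12345; a version suffix like "v3" is ignored
NEW_STYLE_ID = re.compile(r"/pdf/(\d{4}\.\d{4,5})")


def extract_arxiv_id(url: str) -> Optional[str]:
    """Return the new-style arXiv ID in a /pdf/ URL, or None if there is none."""
    match = NEW_STYLE_ID.search(url)
    return match.group(1) if match else None


def title_from_atom(xml_text: str) -> Optional[str]:
    """Return the title of the first <entry> in an arXiv API Atom feed, or None."""
    root = ET.fromstring(xml_text)
    node = root.find("atom:entry/atom:title", ATOM_NS)
    if node is None or not node.text:
        return None
    # Collapse every run of whitespace (including line breaks) into a single space
    return " ".join(node.text.split())


def safe_file_name(title: str, max_len: int = 150) -> str:
    """Replace characters Windows forbids in file names, cap the length, add .pdf."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title)
    # Windows silently drops trailing dots and spaces, so remove them up front
    return cleaned[:max_len].rstrip(" .") + ".pdf"
```

The cap leaves room for the extension and for the _<arXiv ID> suffix that the Python script appends when it moves the file.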

Python script

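Write a Python script (a small Flask server) that moves the file. Make sure the flask and flask-cors packages are installed (e.g., pip install flask flask-cors); os, shutil, and time are part of the Python standard library.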

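from flask import Flask, request, jsonify
from flask_cors import CORS
import os
import shutil
import time

app = Flask(__name__)
CORS(app)  # enable CORS so the arxiv.org page may call this server

# Configuration
EDGE_DOWNLOAD_DIR = "C:/Users//Downloads"  # your Edge download directory (fill in your user name)
TARGET_DIR = "D:/MyDrive"  # the directory the renamed papers are moved to
os.makedirs(TARGET_DIR, exist_ok=True)


def wait_for_file(path, timeout=10.0, interval=0.5):
    """Poll until the finished download appears (the browser may still be writing it)."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(interval)
    return os.path.exists(path)


@app.route('/process-arxiv-file', methods=['POST'])
def process_file():
    try:
        data = request.get_json(silent=True) or {}
        title = data.get('title')
        arxiv_id = data.get('arxiv_id')
        if not title or not arxiv_id:
            return jsonify({"status": "error", "message": "missing title or arxiv_id"}), 400

        # Reject path components so a request cannot escape the download directory
        if os.path.basename(title) != title:
            return jsonify({"status": "error", "message": "invalid title"}), 400

        # The user script saves the PDF as "<title>.pdf" in the download directory
        filename = os.path.join(EDGE_DOWNLOAD_DIR, f"{title}.pdf")

        # Key debugging information
        print(f"Request received: {filename}, title: {title}")

        # Wait for the file to be released by the browser
        if not wait_for_file(filename):
            print(f"File not found: {filename}")
            return jsonify({"status": "error", "message": "file not found"}), 404

        # Build the target path
        new_filename = f"{title}_{arxiv_id}.pdf"
        target_path = os.path.join(TARGET_DIR, new_filename)
        print(f"Target path: {target_path}")

        # Handle duplicates: keep the existing copy and delete the new download
        if os.path.exists(target_path):
            os.remove(filename)
            print(f"File {title} already exists")
            return jsonify({"status": "duplicate", "message": f"{target_path} already exists"})

        # Move the file
        shutil.move(filename, target_path)
        print(f"Moved file: {filename} -> {target_path}")
        return jsonify({"status": "success", "message": f"file moved to: {target_path}"})

    except Exception as e:
        print(f"Error: {e}")
        return jsonify({"status": "error", "message": str(e)}), 500


# Test route to check that the server is running
@app.route('/test', methods=['GET'])
def test_route():
    print("Test route accessed!")
    return "Hello from Flask!"


if __name__ == '__main__':
    # Bind to localhost only: debug mode exposes an interactive debugger
    app.run(host='127.0.0.1', port=5000, debug=True)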
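Because the server only reads the title and arxiv_id fields of the request, it can be exercised without Tampermonkey. The following standard-library sketch builds the same JSON POST request the user script sends; build_request and send are hypothetical helper names, and the sketch assumes the server is listening on localhost:5000.

```python
import json
import urllib.error
import urllib.request

# Assumption: the Flask server is listening on localhost:5000
ENDPOINT = "http://localhost:5000/process-arxiv-file"


def build_request(title: str, arxiv_id: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build the same JSON POST request that the user script sends after a download."""
    body = json.dumps({"title": title, "arxiv_id": arxiv_id}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the server's JSON reply."""
    try:
        with urllib.request.urlopen(req, timeout=15) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # 400/404/500 replies still carry the server's JSON message in the body
        return json.loads(err.read().decode("utf-8"))
```

With the server running and a file named Some Title.pdf in the download directory, send(build_request("Some Title", "1210.2368")) should return the server's JSON reply, which reports where the file was moved.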

Usage

The Python script must be running while you use Tampermonkey; otherwise the downloaded files are renamed but not moved. If the Python script is named your_script.py, create a file run_script.bat in the same directory:

@echo off
python %~dp0\your_script.py %*

Add the directory containing run_script.bat to your PATH environment variable. Then, before downloading papers from arXiv, open cmd and type run_script to start the server.
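Before downloading, it can help to confirm that the server started by run_script is actually reachable. A minimal standard-library sketch that queries the /test route of the Python script; server_is_up is a hypothetical helper name, and it assumes the default port 5000.

```python
import urllib.request


def server_is_up(url: str = "http://localhost:5000/test", timeout: float = 2.0) -> bool:
    """Return True if the /test route answers with the expected greeting."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and resp.read().decode("utf-8") == "Hello from Flask!"
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses:
        # connection refused or no answer means the server is not usable
        return False
```

For example, print(server_is_up()) prints False until run_script has started the Flask server.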

Recent Targets

Posted on 2024-12-06, updated on 2024-12-07

Here are some goals I want to accomplish in my current research:

Orbifold Fundamental Groups of 2-Orbifolds

An orbifold is a natural generalization of a manifold. Just as for a manifold, one can define the orbifold fundamental group of an orbifold.
Although almost half a century has passed since Thurston introduced this concept, as far as I know no one has given a complete description of the fundamental groups of all 2-orbifolds.

I believe I can compute them in the coming days.
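For orientation, here is the standard presentation in the simplest case: a closed orientable 2-orbifold whose underlying surface has genus g and whose only singular points are k cone points of orders m_1, …, m_k, written Σ_g(m_1, …, m_k) here (the notation is chosen for illustration). Reflector curves, corner reflectors, and non-orientable or bounded underlying surfaces require further generators and relations.

```latex
\pi_1^{\mathrm{orb}}\bigl(\Sigma_g(m_1,\dots,m_k)\bigr)
  = \Bigl\langle a_1, b_1, \dots, a_g, b_g,\, x_1, \dots, x_k \;\Bigm|\;
      x_1^{m_1} = \dots = x_k^{m_k} = 1,\;
      \prod_{j=1}^{g} [a_j, b_j]\; x_1 x_2 \cdots x_k = 1 \Bigr\rangle
```

For instance, the sphere with three cone points of orders 2, 3 and 7 gives the (2,3,7) triangle group ⟨x_1, x_2, x_3 | x_1^2 = x_2^3 = x_3^7 = x_1 x_2 x_3 = 1⟩.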

Hurewicz Theorem for the Orbifold Fundamental Group and Weighted Homology

My motivation for computing the orbifold fundamental groups of all 2-orbifolds is that I want to prove that there is a Hurewicz isomorphism between the orbifold fundamental group and the weighted homology group.
There have been many attempts to build a homology theory for orbifolds that reflects information about their singular points; weighted homology is one of them. Establishing connections with classical concepts would show that weighted homology is a suitable tool for studying orbifolds.
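The classical statement such a result would generalize is the degree-one Hurewicz theorem: for a path-connected space X, the Hurewicz map onto first homology is surjective and its kernel is the commutator subgroup. The second line below is only one natural shape the orbifold statement could take, written out for illustration; H_1^w is placeholder notation for weighted homology, not an established result.

```latex
% Classical Hurewicz theorem in degree 1 (X path-connected):
h\colon \pi_1(X, x_0) \twoheadrightarrow H_1(X;\mathbb{Z}), \qquad
\ker h = [\pi_1(X, x_0), \pi_1(X, x_0)], \qquad
\pi_1(X, x_0)^{\mathrm{ab}} \cong H_1(X;\mathbb{Z}).

% Hoped-for orbifold analogue for an orbifold Q (placeholder notation H_1^{w}):
\pi_1^{\mathrm{orb}}(Q)^{\mathrm{ab}} \overset{?}{\cong} H_1^{w}(Q)
```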

Which Cohomology Classes Can Be Realized as Euler Classes of Vector Bundles

The KO-groups and cohomology groups of toric manifolds are well understood. I want to determine which cohomology classes can be realized as Euler classes of vector bundles.
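Two standard facts from the theory of characteristic classes show where the question becomes nontrivial; they are general background, not results specific to toric manifolds. In rank 2, oriented real plane bundles are the same as complex line bundles and the Euler class equals the first Chern class, which classifies them, so every class in H^2 is an Euler class; in odd rank the Euler class is always 2-torsion.

```latex
% E \to X an oriented real vector bundle of rank n over a CW complex X:
e(E) \in H^{n}(X;\mathbb{Z}).

% Rank 2: e = c_1 is a bijection from iso classes of oriented plane bundles onto H^2:
\{\text{oriented rank-2 bundles over } X\}/\!\cong \;\xrightarrow[\ \cong\ ]{\;e \,=\, c_1\;}\; H^{2}(X;\mathbb{Z}).

% Odd rank: fibrewise multiplication by -1 reverses orientation, so e(E) = -e(E):
n \text{ odd} \;\Longrightarrow\; 2\,e(E) = 0.
```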

Build a Class of Non-positively Curved Spaces

I am not sure whether I can work this out, but I think it is an interesting topic.

I Did It!

Posted on 2024-10-10, updated on 2024-10-11

Heading 1

Heading 2

This is the second sentence.

ans = 0
for i in range(10):
    ans = ans + i

https://zhihu.com

Heading 3

This is the third sentence.