Author: Jingjing Lin
In a previous tutorial I introduced how to rename PDF files using Zotero and Zotfile. The same tutorial video was uploaded to YouTube, and a comment left by a user there prompted this follow-up tutorial.
For researchers, dealing with a pile of PDF files that you downloaded over time but did not rename right away is a constant battle. In the long run, the folder that holds them gradually becomes a graveyard of ill-named PDF files. So how do you rename a whole folder of PDF files into a consistent format that helps you find what you need? The following screenshot shows the result of renaming, and in the following paragraphs I will show you how to reach this goal.
First of all, you need all PDF files that you want to rename in one folder (e.g., Folder A).
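If your PDF files are scattered across several directories, a small script can gather them into Folder A for you. The following is only a minimal sketch: the function name collect_pdfs and the "_1", "_2" suffix used for clashing file names are my own assumptions, not part of Zotero. It copies rather than moves the files, so the originals stay untouched.

```python
import shutil
from pathlib import Path


def collect_pdfs(source_dirs, dest_dir):
    """Copy every PDF found under source_dirs into dest_dir (Folder A).

    Hypothetical helper for illustration. Returns the list of copied paths.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    dest = dest.resolve()
    copied = []
    for source in source_dirs:
        # Walk each source directory recursively, in a stable order.
        for pdf in sorted(Path(source).resolve().rglob("*")):
            if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
                continue
            if dest in pdf.parents:
                continue  # skip files that already live in Folder A
            target = dest / pdf.name
            counter = 1
            # Assumed convention: append _1, _2, ... when a name is taken.
            while target.exists():
                target = dest / f"{pdf.stem}_{counter}{pdf.suffix}"
                counter += 1
            shutil.copy2(pdf, target)
            copied.append(target)
    return copied
```

You would call it with your own paths, for example collect_pdfs(["Downloads", "Desktop"], "FolderA"), where the directory names are placeholders.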
Second, create a new folder in Zotero (e.g., Folder B). Zotero itself calls these folders "collections". In the following figure I highlighted in red the button to “add a folder” and the area where all your folders are listed in Zotero, so that you know what a folder looks like and how to create one.
Now drag and drop all PDF files from Folder A on your local disk into Folder B in Zotero.
“By default, Zotero will automatically retrieve metadata for each PDF, create an appropriate parent item, and rename the associated file based on the metadata.” (Read more from Zotero guide: Retrieve PDF Metadata) Many beginners do not know about this automatic function of Zotero, but it is truly convenient. The software will try to retrieve the metadata of each PDF file that you feed it. Be aware, though, that Zotero cannot find metadata for every single PDF, so there might be some manual work to do as a follow-up after this automatic process.
But again, the purpose of automating work is not to completely eliminate human work. Instead, it is to save time so that humans can work in a better and more meaningful way. Algorithms are also still far from perfect, so a “human in the loop” is needed to increase precision. Human in the loop, or HITL, is a concept that I borrowed from machine learning. It refers to systems that allow humans to give direct feedback to a model on predictions that fall below a certain level of confidence. (Read more from Human-in-the-loop in machine learning: What is it and how does it work?)
If you have a large folder with many PDF files, this process may take a while. You can leave your desk and do something else, or, if your PC is powerful enough that the process does not slow it down, continue with other tasks. The screenshot below shows the processing window, the processed files, and the files still to be processed. As you can see in the processed files, a parent citation record was automatically generated, and the PDF file was attached under it with its file name already renamed!
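To make "renamed based on the metadata" more concrete, here is a rough sketch of how a file name can be assembled from an item's metadata. It is not Zotero's own code. The function build_filename and the "author - year - title" pattern are assumptions chosen for illustration, and the format Zotero actually applies depends on your settings.

```python
import re


def build_filename(creators, year, title, max_title_length=80):
    """Build a PDF file name from metadata (illustrative pattern only)."""
    # Summarize the creators: one name, two names, or "et al.".
    if not creators:
        author = "Unknown"
    elif len(creators) == 1:
        author = creators[0]
    elif len(creators) == 2:
        author = f"{creators[0]} and {creators[1]}"
    else:
        author = f"{creators[0]} et al."
    # Remove characters that are not allowed in file names on common systems.
    clean_title = re.sub(r'[\\/:*?"<>|]', "", title)
    # Collapse runs of whitespace and cap the title length.
    clean_title = re.sub(r"\s+", " ", clean_title).strip()
    clean_title = clean_title[:max_title_length].rstrip()
    year_part = str(year) if year else "n.d."
    return " - ".join([author, year_part, clean_title]) + ".pdf"
```

A PDF without usable metadata gives such a function nothing to work with, which is why some files still need your manual attention afterwards.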
During this process, duplicates may be generated in Zotero. Zotero has no function to “merge files in bulk”, so an additional tool can be used to automate the manual clicking needed to merge duplicates. I recommend GhostMouse. It is free of charge and records a script from your demonstration of mouse clicks or keyboard typing. First record your actions as a script. Then stay on the same interface and click to play the script back. You can check another tutorial of mine here in ResearchIC: Automatically remove duplicates in Zotero.
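Some duplicates can also be caught before you drag anything into Zotero, namely files that are exact copies of each other saved under different names. The sketch below is a minimal example under that assumption; find_identical_pdfs is a hypothetical helper of my own. It only detects byte-for-byte identical files, so two different downloads of the same paper may still slip through and need merging in Zotero.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_identical_pdfs(folder):
    """Group PDF files in folder whose contents are byte-for-byte identical.

    Returns a list of groups; each group is a list of file names.
    """
    groups = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() != ".pdf":
            continue
        # Files with the same SHA-256 digest are treated as identical.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        groups[digest].append(path.name)
    # Keep only digests shared by more than one file.
    return [names for names in groups.values() if len(names) > 1]
```

Each returned group lists file names with the same content, so you can keep one file from the group and leave the rest out of the import.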