worldarxiv.org scrapes the "new" page of the scientific preprint archive arXiv.org every day and places a pin at the originating institution of each paper. For example, for arXiv's astro-ph list, the papers that worldarxiv.org displays always come from arxiv.org/list/astro-ph/new.
How is the originating location of a paper decided?
A paper's marker is placed on the world at the location that corresponds to the institution the first author is affiliated with. If no affiliation data is available for the first author, the second author is used, and so on.
Affiliations are determined as well as possible from whatever data is available. The site first looks for affiliation info directly from arXiv. If none is found, it looks up the author's papers in databases like Harvard ADS and checks them for associated affiliation information. If no affiliation is found for the first author on a paper, the site attempts to find one for the second author, then the third author, and so on. This strategy leads to several data problems:
- Authors who have recently changed institutions, and whose new papers are not yet in such databases, may be shown at their old institution.
- Authors can be confused with other authors who share the same last name, leading to an entirely incorrect affiliation. This is especially a problem when no first name is provided or the author's name is common.
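The lookup order described above (arXiv first, then external databases, moving to the next author only when nothing is found) can be sketched roughly as follows. This is a minimal illustration and not the site's actual code: the function name `resolve_affiliation` and the dictionary-backed lookups are hypothetical stand-ins for real arXiv and database queries.

```python
from typing import Callable, Optional, Sequence, Tuple

# A lookup takes an author name and returns an affiliation, or None if unknown.
AffiliationLookup = Callable[[str], Optional[str]]


def resolve_affiliation(
    authors: Sequence[str],
    lookups: Sequence[AffiliationLookup],
) -> Optional[Tuple[str, str]]:
    """Return (author, affiliation) for the first author with a known affiliation.

    Authors are tried in byline order. For each author, the lookups are tried
    in the order given (for example arXiv metadata, then an external database).
    """
    for author in authors:
        for lookup in lookups:
            affiliation = lookup(author)
            if affiliation:
                return author, affiliation
    return None  # no affiliation found for any author


# Hypothetical data: plain dictionaries standing in for the real sources.
arxiv_metadata = {"B. Second": "Example University"}
database_records = {"C. Third": "Sample Institute"}
lookups = [arxiv_metadata.get, database_records.get]

print(resolve_affiliation(["A. First", "B. Second", "C. Third"], lookups))
# prints ('B. Second', 'Example University')
```

Because the sketch matches on the name string alone, it has the same weaknesses listed above: a stale database record or a shared name produces a confident but wrong answer.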
We can add more databases and create more complicated data checks when collecting papers, but the real solution to these data problems lies with you, the author. When submitting a paper to arXiv, including the affiliation of at least the first author will ensure the marker is placed at the correct location. Another way to identify yourself (especially if you have a common name) is to register for a unique ORCID iD and associate it with your arXiv account. This will enable arXiv, and every third-party app built on arXiv's data like this one, to present firm connections between an author and their affiliation. If this problem interests you, you can read more here.
All the affiliation info from worldarxiv.org is at best a reasonable guess and is not always correct. The "correct" affiliation is almost always in the PDF of the actual paper.
How far back is data available?
Seven days. Arbitrarily.
Can I help? Submit a bug report? Request a feature?
Yes! The code for this site is hosted at github.com/mef51/worldarxiv. Bugs can be reported and features requested on the issues page.