If you want to run a bot on Commons, you must get permission first. To do so, file a request following the instructions below.
Please read Commons:Bots before making a request for bot permission.
Create a user account (while logged in to your normal account) and user page for the bot
On the bot's userpage, add {{Bot}}, which automatically adds the page to Category:Commons bots.
Then add the following information to the bot's userpage (all this is mandatory):
Operator: Who the creator/operator is and how they can best be contacted
Tasks: Details of the bot's task or tasks
Operation: Whether the bot is manually assisted or runs automatically
When: When it operates (continuously, intermittently, or at specified intervals)
Maximum edit rate: The bot's maximum edit rate (e.g. edits per minute)
Language: The language and/or program that it is running
Create your bot request:
Replace "YourBotName" with your bot's username in the box below and click the button.
Complete the questions on the resulting page and save it. Remember that your bot will be allowed to operate only for those tasks that you have specified in your request.
Add your bot request to the list here:
Edit the following request list, adding the following text to the top of the appropriate section (replacing "YourBotName" with your bot's name):
Test run
You may be asked to make a short test run with your bot account (30–50 edits/uploads) to allow other users to review your bot's tasks. Unauthorized test runs are not allowed.
Waiting for approval.
You now need to wait for community approval. A bureaucrat will close the request and will also grant a bot flag, where necessary. Closed requests are moved to Commons:Bots/Archive.
Before making a bot request, please read the new version of the Commons:Bots page. Read Commons:Bots#Information on bots and make sure you have added the required details to the bot's page. A good example can be found here.
Any user may comment on the merits of the request to run a bot. Please give reasons, as that makes it easier for the closing bureaucrat. Read Commons:Bots before commenting.
I corrected all the uploads with the respective PD licenses, dates, and categories; the corrections are output from the bot (as if it were run to upload). Thus, the bot is manually assisted and can correct fields on demand. There are some quirks here and there, but I expect to update the bot logic as they're encountered.
I'm not sure what you meant by using language ta for department field. Could you explain?
For example, {{en|UCLA Digital Library – AIIS Center for Art & Archaeology – Negatives & Slides Collection}}. Is it possible to use a proper copyright tag based on the part of the collection, e.g. where all photos are 2D reproductions of public domain art? --EugeneZelenko (talk) 16:55, 15 January 2024 (UTC)
I used the {{PD-Art|PD-old-100}} inside the {{Photograph}} template and {{PD-old-100}}{{PD-US-expired}}{{PD-country}} for the original object {{Artwork}} template. If I understand correctly what you meant, the PD tags highlighted in green would not be necessary - when the original art work/collection is in PD and the photo is its 2D repro. Did I get it right? -- DaxServer (talk) 20:36, 15 January 2024 (UTC)
Bot's tasks for which permission is being sought: To upload, categorize, rename, and provide a generated index for Sanborn Fire Insurance Maps downloaded from the Library of Congress and, occasionally, other places.
Automatic or manually assisted: manual
Edit type (e.g. Continuous, daily, one time run): intermittently for a few weeks
Maximum edit rate (e.g. edits per minute): 100 when uploading
Here is a proposal for the index, which I would add to the category page of a town. This category would then also have subcategories for each (year, volume) combination of a map, but the index should suffice to access all the plates.
The code to add the index is not yet written. I will add comments above and below the generated content so that automatic edits can easily be undone.
For the moment I just need to upload the images. I can ask for additional permissions when I have the code ready.
Additional permissions will almost certainly also include the renaming of old files that were uploaded by somebody else with a bad naming scheme.
In any case, all accesses of the bot will be restricted to files and categories that start with "Sanborn Fire Insurance Map From". In other words, it may make more sense to just look at the end result as this is really just an upload job for new content with some alteration to old content in the same "namespace". Nowakki (talk) 16:38, 25 December 2023 (UTC)
OK, I am now using in the description the same string as in the filename (File:ALLTHIS.jpg), with a language tag. I will delay editing existing pages in case something else comes up. Nowakki (talk) 05:01, 27 December 2023 (UTC)
@EugeneZelenko: So what is the ruling here? Can I run the bot or not? If yes, can you transfer the autopatrol rights from my main account to the bot account? Or do I have to make a separate request for that? I need that to get rid of the rate limit. Nowakki (talk) 03:55, 28 December 2023 (UTC)
This is an ongoing process. Functionality is continuously added. However uploads can proceed now. There are now a few more uploads for you to verify.
I will then finalize the code and bring a few example towns to the intended end state, so you can approve of the old file rename and the generated index features when the time comes. SanbornMapBot (talk) 11:55, 29 December 2023 (UTC)
Today I recategorized all map files from Wyoming (954 files) and tagged them with {{rename}}. There are 218,807 files still to be renamed.
As it stands, the newly renamed files have a different structure, with different metadata as provided by the original uploader, from all the plates that I uploaded.
Perhaps it would be a good idea to include at least one wikilink with the newly uploaded files, pointing to a page that explains everything about the files in general. SanbornMapBot (talk) 17:48, 2 January 2024 (UTC)
A request to rename 200,000 files found no support after a recent poll failed.
I am planning to add 500,000 redirects to Commons and build an alternative tree of categories to hold the redirects instead. I am requesting permission to let the bot do that.
To make this possible, all names in this tree will be of the form "Sanborn Fire Insurance Map of" instead of "Sanborn Fire Insurance Map from".
An advertisement/notification for the "fixed and consistent with Sanborn original indexing practice" filenames will be placed in c:Category:Sanborn maps of the United States by state and in each of the 50 state categories, because 50 is not a big number. That should be sufficient for it to get noticed by anyone involved enough (i.e. using the files often) to benefit from the fix. Nowakki (talk) 16:27, 8 January 2024 (UTC)
@EugeneZelenko I made a mistake during upload. A few thousand files were uploaded with a different naming scheme from the others: the volume and the year in the filename switched places. Since these files are barely a week old, they should be renamed without leaving redirects. I request permission to take care of the problem. Nowakki (talk) 09:45, 10 January 2024 (UTC)
Answers like "I don't know", "I am not authorized", or "I'd rather not" are totally acceptable. I am an adult and will find ways to deal with it. Nowakki (talk) 13:42, 11 January 2024 (UTC)
Please do not use a bot account for manual edits, and especially not for discussions. We discuss with bot operators, not with bot accounts. --Krd 12:04, 6 January 2024 (UTC)
I have abandoned further attempts to deal with the incompetence of other people in this matter. In a properly run organization, heads would have rolled and people would have a reason to attempt not to fail so badly. To quote Carl Sagan: In order to upload Sanborn maps to commons, you first have to fix the bugs and fix the procedures and retrain the people. Nowakki (talk) 14:52, 26 January 2024 (UTC)
Do I get it right? Someone adds an image to osmapp and your bot will transfer it to Commons. If so, how is it ensured that the license of that image a) is valid and b) meets Commons' requirements? --Achim55 (talk) 19:50, 19 November 2023 (UTC)
@Achim55 Yes, I think you got it right. As OpenStreetMap is also an open data project, we aim to have open licenses. Please see the design of the upload dialog. I tried to write it the best I can, but I welcome any suggestions. It will add images with a direct link to the OSM feature, which also means proper map coordinates and category (e.g. castle, guidepost, school, bridge etc.). Zbytovsky (talk) 07:35, 20 November 2023 (UTC)
@EugeneZelenko Well, I didn't think of that, thanks for bringing it to my attention :-) As we are a map application, it is pretty easy to inform users based on country of the object. I created a mockup here - it would show up for the "NO" countries. Do you think it is sufficient for the beginning? I don't expect many users soon, but if it turns out to be an issue, it is quite easy to be more restrictive, or e.g. check if there is a building within 1 km, etc. Zbytovsky (talk) 20:22, 21 November 2023 (UTC)
It'll be OK for the beginning, but it would be a good idea to extend the database (Wikidata is the perfect place to share with WLM, if the organizers finally comprehend the need to do so) to include information about sculptors/architects, so that it becomes possible to allow what is in the public domain. There are also countries with partial freedom of panorama, where photos of buildings are allowed, but not of works of art. --EugeneZelenko (talk) 16:00, 22 November 2023 (UTC)
I think the FOP aspect should be addressed before it becomes an issue. At least all countries with partial FOP should be excluded; better still, all images should be manually reviewed, as even in FOP countries there will be images of works not permanently in public space. Krd 14:01, 30 December 2023 (UTC)
Good question! My general approach with these things is to be extremely conservative – imo the V1 bot should be purely additive, and any conflicts should be flagged for manual inspection.
Then a couple of things might happen:
The existing SDC looks wrong, so I make a manual edit from my account to fix it. e.g. I’ve already been looking at the use of source of file (P7482) for Flickr photos in the SDC snapshots, and I found ~200 cases where the URL points to the Flickr user’s profile (/photos/{username}) rather than the photo itself (/photos/{username}/{photo_id}). Those got dropped on a queue and I’ve been gradually tidying them up by hand – opening the files in question and making a manual edit from my account to point to the more specific URL.
The existing SDC looks right, so I work out why the bot is disagreeing. Is it a bug in my code? Have I interpreted the data mapping wrong? Is the data mapping at odds with the community approach to SDC? Is the bot missing some bit of info on the Flickr photo? But the bot won't do anything on its own.
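The profile-versus-photo URL check mentioned in the first case could be sketched roughly as follows. This is only an illustration and not the bot's actual code: the function name `classify_flickr_url`, the regular expressions, and the example URLs are assumptions, and real Flickr URLs come in more variants than the two shapes handled here.

```python
import re

# Hypothetical patterns covering only the two URL shapes named in the discussion:
#   /photos/{username}             -> a user's profile
#   /photos/{username}/{photo_id}  -> a single photo
_PROFILE = re.compile(r"^https?://(?:www\.)?flickr\.com/photos/[^/]+/?$")
_PHOTO = re.compile(r"^https?://(?:www\.)?flickr\.com/photos/[^/]+/\d+/?$")


def classify_flickr_url(url: str) -> str:
    """Return 'photo', 'profile', or 'other' for a source-of-file URL."""
    if _PHOTO.match(url):
        return "photo"
    if _PROFILE.match(url):
        return "profile"
    return "other"


def needs_manual_review(url: str) -> bool:
    """Queue anything that does not point at a specific photo."""
    return classify_flickr_url(url) != "photo"
```

A statement whose URL is classified as "profile" would go onto the manual queue rather than being edited automatically.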
There might also be cases where the existing SDC is wrong in large numbers and we'd want to write an automated fix, but that's somewhat risky and I’d want to be extremely careful before doing that. Two possible examples spring to mind:
License versions. Flickr photos use CC 2.0 licenses, so that's what the bot will write into the SDC. But what if it finds a Wikimedia Commons file which links to the 4.0 version of the CC license? That sounds like an easy candidate for a fix buuuut I think there are Flickr users who leave descriptions on their photos saying "I license this as CC 4.0". A human copying their photo across would notice that; the bot might not. So in this case the bot would likely leave it as-is to avoid deleting info.
Date granularity. Flickr has different levels of granularity for "date taken". Most photos are DDMMYY, but there are some which are MMYY or YY or "Circa YY". If there are lots of cases where there's an imprecise date but the SDC claims it's a full DDMMYY, we might consider automating that. (It's pretty obvious when this has happened – Flickr always returns a full timestamp from its API, but it sets all the unknown values to 0/1. So a YYYY becomes taken="1950-01-01 00:00:00" takengranularity="6".) The bot could be written to fix these. But I don't know if that's a widespread issue in practice.
If/when the bot does start editing existing SDC claims, I'll make sure we document those with examples – and if there are cases that seem contentious, I'll bring them back for community discussion before actually implementing them. Alexwlchan (talk) 08:13, 2 November 2023 (UTC)
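The date-granularity point can be illustrated with a small sketch. It assumes Flickr's documented granularity codes (0 for a full timestamp, 4 for year and month, 6 for year only, 8 for "circa"); the function names and the returned dictionary shape are hypothetical and are not taken from the bot's code.

```python
from datetime import datetime

# Flickr "takengranularity" codes mapped to a human-readable level.
FLICKR_GRANULARITY = {
    "0": "day",    # full Y-m-d H:i:s timestamp
    "4": "month",  # only year and month are known
    "6": "year",   # only the year is known
    "8": "circa",  # approximate year
}

# Ordered from least to most precise, for comparisons.
_PRECISION_ORDER = ["circa", "year", "month", "day"]


def describe_taken_date(taken: str, takengranularity: str) -> dict:
    """Trim Flickr's padded timestamp down to the part that is actually known."""
    level = FLICKR_GRANULARITY.get(takengranularity)
    if level is None:
        raise ValueError(f"unknown granularity: {takengranularity!r}")
    ts = datetime.strptime(taken, "%Y-%m-%d %H:%M:%S")
    if level == "day":
        value = f"{ts.year:04d}-{ts.month:02d}-{ts.day:02d}"
    elif level == "month":
        value = f"{ts.year:04d}-{ts.month:02d}"
    else:  # "year" or "circa": only the year is meaningful
        value = f"{ts.year:04d}"
    return {"value": value, "granularity": level}


def sdc_overstates_precision(sdc_level: str, flickr_level: str) -> bool:
    """True if the existing statement claims more precision than Flickr has."""
    return _PRECISION_ORDER.index(sdc_level) > _PRECISION_ORDER.index(flickr_level)
```

With the example from the comment, `describe_taken_date("1950-01-01 00:00:00", "6")` keeps only the year, and a day-level SDC statement for that photo would be reported by `sdc_overstates_precision("day", "year")`.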
To return to this question of "how does the bot handle conflicting edits":
Right now the bot will flag any conflicts as "unknown", not make any edits, and put them in a manual queue for review. I’ll look at them and decide if we need to update the bot code, do a manual edit to the SDC, or leave it be.
This confuses the bot, because it wants to write a different SDC statement to what’s currently in Commons – so it flags it as “unknown”.
I went and had a look at it, and I can see that the license has changed since the initial upload – there’s a license history feature on Flickr, and it was changed from CC BY 2.0 in April 2014, a year after it was uploaded to Commons.
(And now I'm going to look at tweaking the bot code so it gets the license from when the photo was uploaded to Commons, and uses that rather than whatever the license is now. But license is a pretty well-populated field, so I may not need this in practice.) Alexwlchan (talk) 08:22, 13 December 2023 (UTC)
Brief addendum to this: I’m going to take license out of the bot for now.
1. Licenses are already pretty well-populated in SDC, so the potential gain here is less.
2. I’m encountering a lot of cases where Flickr users have changed their license after the fact, which makes the bot unhappy.
It is possible to see license history on Flickr as far back as 2008, or I could inspect the Wikitext, but I’m going to leave it for now. I can come back later and see how many Flickr photos are actually missing a license in practice. Alexwlchan (talk) 14:45, 13 December 2023 (UTC)
To add another example to this:
If the bot encounters conflicting information in the "date taken" field, it flags a warning but doesn’t do anything.
e.g. File:STS059-238-074 Strait of Gibraltar.jpg is a photo which was posted to both Flickr and a NASA website. On Flickr the taken date is "April 1994", but on NASA's website we get the more precise date "17 April 1994", which is what's used in the SDC.
Flickypedia would write a statement "April 1994" if it was copying the photo fresh from Flickr, but it doesn't overwrite the existing, more-precise statement when it does the backfill. Alexwlchan (talk) 11:02, 15 December 2023 (UTC)
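The "purely additive" behaviour described in these comments (add what is missing, never overwrite, and queue disagreements for a human) might look something like the sketch below. The function name `plan_backfill`, the dictionary keys, and the sample values are illustrative assumptions, not the bot's real data model.

```python
def plan_backfill(existing: dict, proposed: dict):
    """Split proposed statements into safe additions and conflicts.

    existing: statements already on the file, keyed by a property label
    proposed: statements the bot would write from Flickr data
    Returns (to_add, review_queue); nothing in `existing` is ever changed.
    """
    to_add = {}
    review_queue = []
    for prop, new_value in proposed.items():
        if prop not in existing:
            # Nothing there yet: adding is purely additive.
            to_add[prop] = new_value
        elif existing[prop] != new_value:
            # Disagreement: flag as "unknown" for manual review, do not edit.
            review_queue.append((prop, existing[prop], new_value))
        # Identical values need no action.
    return to_add, review_queue


# Hypothetical values modelled on the Strait of Gibraltar example.
existing = {"date taken": "1994-04-17"}
proposed = {
    "date taken": "1994-04",
    "source of file": "https://www.flickr.com/photos/example/12345",
}
to_add, review_queue = plan_backfill(existing, proposed)
```

Here the missing source statement would be added, while the less precise Flickr date only produces a queue entry and leaves the existing statement untouched.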
I know it’s been a couple of weeks and nothing has happened on this.
I am planning to get back to this bot eventually, but right now I’m prioritising getting the “uploader” part of Flickypedia working. Once that’s done, I’ll come back to the Backfillr bot. Alexwlchan (talk) 09:47, 23 November 2023 (UTC)
If you use a JSON data specification this can be done by simply merging all the different claims.
Please tag the edits with "BotSDC", as lots of users use this tag to filter out SDC edits.
If you use a JSON post request this can be done by adding { "tags", "BotSDC" }
Please make sure you specify a maxlag for your edits to avoid database overload; not doing so got me into trouble once.
If you use a JSON post request this can be done by adding { "maxlag", "2" }
In the edit summary, please link the phrase structured data to [[Commons:Structured data|structured data]] or this bot request so users can find out more if needed.
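Taken together, these suggestions could translate into request parameters along the following lines. This is a hedged sketch that only builds the parameter dictionary for the MediaWiki Action API's `wbeditentity` module and sends nothing; the helper name `build_edit_params`, the placeholder claims, the media ID, the token, and the summary wording are all assumptions for illustration.

```python
import json


def build_edit_params(media_id: str, claims: list, csrf_token: str) -> dict:
    """Build POST parameters for a single edit that writes all claims at once."""
    return {
        "action": "wbeditentity",
        "format": "json",
        "id": media_id,
        # All claims are merged into one JSON data specification.
        "data": json.dumps({"claims": claims}),
        # Link the phrase "structured data" so users can find out more.
        "summary": "Adding [[Commons:Structured data|structured data]] from Flickr",
        "tags": "BotSDC",  # lets users filter out SDC bot edits
        "maxlag": "2",     # back off when database replication lag is high
        "bot": "1",
        "token": csrf_token,
    }


# Placeholder claims; real claims use the full Wikibase statement structure.
claims = [
    {"placeholder": "source of file statement"},
    {"placeholder": "date taken statement"},
]
params = build_edit_params("M12345", claims, "hypothetical-token")
```

When the API rejects a request because of maxlag, the usual approach is to wait and retry rather than resend immediately.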
Thanks for the quick feedback! I’ve addressed all four of your suggestions.
1. Done, duh. For some reason I got it into my head that you can’t modify multiple properties at once, but I think that’s just a limitation of the visual editor? API seems fine with it, so that’s changed.
2. Done.
3. Done. I’m also planning to drop a note to somebody who works on the structured data team before I start running the bot at large scale, as a courtesy – backfilling Flickr data means 10s of millions of new statements, and I figure it’ll be easier if they have a direct line to the person adding database load.
4. Done. I’ve also added the property IDs, which I figured might be useful.
Thanks. No further comments from my end. My database issue was described here [2] and, as I learned, as long as we respect maxlag it should be fine. As I've myself added 100s of millions of statements, I would not be too concerned about this request. On the contrary, I think it is an excellent addition to improving SDC use. --Schlurcher (talk) 13:27, 13 December 2023 (UTC)
Hi @Krd – sorry for the delay, I took a couple of weeks break from working on that. Getting back into it now, hope to wrap my head around what’s still needed soon! Alexwlchan (talk) 15:37, 22 January 2024 (UTC)
Just to clarify: Is this to upload as a new version, or to overwrite? If the latter, is there a consensus to do so? I see that those borders include photo credits to the individual photographers, and these are from a respected archive, so I'd just want to make sure that there is agreement that this is desired; I've seen similar situations go either way. Clearly more useful in Wikipedia articles without the borders, but it's not clear to me that we don't want also to host a version with the credit line on the image. - Jmabel ! talk 23:58, 1 October 2023 (UTC)
My thought is to overwrite. I've not seen any written consensus on the matter, but in practice that's what has been done for years in this category. I think that implies a silent consensus, considering these captions have been digitally added by the archive and provide no additional information not already in the description. Beao (talk) 08:36, 2 October 2023 (UTC)
The "Images with watermarks" category is very big, so the retrieval of file usage statistics is batched to a fixed number of images every hour to avoid performance spikes, and I update the gallery after every batch. Is updating gallery pages too often problematic? I could do it less often (I'm thinking if images are not removed from the category), and also avoid doing it when nothing changes. Beao (talk) 15:53, 20 October 2023 (UTC)
It appears to me that there are still too many edits on the statistics pages. (Or is there any relevant work done on these maintenance categories?) Krd 14:31, 10 November 2023 (UTC)
I've updated the code a couple of days ago and did some extra runs to confirm that it worked, and since then the non-changing categories haven't updated. But yeah, I'm also removing watermarks! Beao (talk) 07:27, 17 November 2023 (UTC)
Not extremely useful, not completely useless. But okay, I will limit the updates more by rounding the stats to the nearest 100 or 1000. Beao (talk) 14:11, 30 December 2023 (UTC)
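The idea of rounding the statistics and skipping edits when nothing visible changes could be sketched like this. The function names and the default step are hypothetical; the actual bot code is not shown in this discussion.

```python
def round_to_nearest(value: int, step: int) -> int:
    """Round a non-negative count to the nearest multiple of `step` (halves round up)."""
    return ((value + step // 2) // step) * step


def should_update(previous: int, current: int, step: int = 100) -> bool:
    """Only edit the statistics page if the rounded figure actually changed."""
    return round_to_nearest(previous, step) != round_to_nearest(current, step)
```

With a step of 100, a count moving from 1210 to 1240 would not trigger an edit, since both round to 1200, while a move from 1240 to 1260 would.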