There are a number of elements which come to bear on the issue:
- Screen physical size.
- Display pixel count.
- Distance between eyes and screen.
A first consideration in terms of 'realism' of image scale is to calculate the FoV setting that produces an image whose scale matches reality.
The simplest example requires no math. Assuming a flat screen, if your eyes are as far from the screen as 1/2 the screen width, said screen will subtend an apparent angular width of 90 degrees. And so setting the in-game FoV to 90 (which is tied to the horizontal dimension) would result in your view having a 1:1 correspondence with the real world view. If your screen is, say, 90cm wide, when your eyes are 45cm from the screen this condition is met.
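For screen sizes and distances that don't fall into the no-math case, the same geometry can be worked out with a bit of trigonometry. Here is a minimal sketch, assuming a flat screen viewed head-on from its horizontal center; the function name `matching_fov_deg` is just an illustrative choice, and both arguments must be in the same unit.

```python
import math

def matching_fov_deg(screen_width, eye_distance):
    """Horizontal angle (in degrees) subtended by a flat screen.

    Assumes the eyes are centered on the screen and looking at it
    head-on. screen_width and eye_distance share the same unit.
    """
    half_angle = math.atan((screen_width / 2) / eye_distance)
    return math.degrees(2 * half_angle)

# The example above: a 90 cm wide screen with the eyes 45 cm away.
print(matching_fov_deg(90, 45))
```

Setting the in-game horizontal FoV to the value this returns should give the 1:1 scale described above, provided the game's FoV setting really is the horizontal angle.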
Most gamers don't sit this close to their monitors; a more typical viewing distance gives an apparent screen width of perhaps 45-ish degrees. Setting a correspondingly smallish FoV would be pretty restrictive, costing a lot in terms of situational awareness.
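To put a number on "typical viewing distance", the relationship can be run in reverse. This is a rough sketch under the same flat-screen, centered-eyes assumption; `viewing_distance` is a made-up helper name.

```python
import math

def viewing_distance(screen_width, apparent_width_deg):
    """Eye-to-screen distance at which a flat screen of the given width
    subtends the given horizontal angle (degrees). The result is in the
    same unit as screen_width.
    """
    half_angle = math.radians(apparent_width_deg) / 2
    return (screen_width / 2) / math.tan(half_angle)

# A 90 cm wide screen appearing about 45 degrees wide.
print(round(viewing_distance(90, 45), 1))
```

For that 90 cm screen, an apparent width of 45 degrees works out to sitting roughly 109 cm back, which is more than twice the 45 cm needed for 90 degrees.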
The pixel count plays a role in the perception of detail. To keep it simple, if the pixel density (number of pixels per unit distance) is low enough that individual pixels can be resolved at all, the display is presenting less detail than the eye could otherwise see. A large display running at a lower resolution and viewed from a close distance increases the chance of resolving pixels. In such a case, zooming in at least somewhat is no cheat at all.
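One way to gauge whether pixels are likely to be resolvable is to estimate how many of them fall within each degree of view. The sketch below computes an average across the whole screen width, again assuming a flat screen viewed from its center. The comparison figure of about 60 pixels per degree is not from the discussion above; it is the commonly quoted rule of thumb for 20/20 vision (about one arcminute of resolution) and should be treated as an assumption.

```python
import math

# Assumed rule of thumb: 20/20 vision resolves about 1 arcminute,
# i.e. roughly 60 pixels per degree. Not a hard limit.
ACUITY_PIXELS_PER_DEGREE = 60

def pixels_per_degree(horizontal_pixels, screen_width, eye_distance):
    """Average number of pixels per degree of horizontal view.

    screen_width and eye_distance must use the same unit.
    """
    apparent_width_deg = math.degrees(
        2 * math.atan((screen_width / 2) / eye_distance)
    )
    return horizontal_pixels / apparent_width_deg

# A 1920-pixel-wide image on a 90 cm screen viewed from 45 cm.
ppd = pixels_per_degree(1920, 90, 45)
print(round(ppd, 1), ppd < ACUITY_PIXELS_PER_DEGREE)
```

In this example the screen spans 90 degrees, so the 1920 pixels average out to a little over 21 per degree, well under the assumed 60, which is the situation where some zoom only restores detail the eye would have in reality.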
I could go much further on this topic, but will leave it at that, lest I induce a coma in the reader.