Posts Tagged ‘caching’

Reduce traverse time of large directory trees on Linux

Tuesday, December 14th, 2021

Every Linux admin is familiar with the long time running through a large directory tree (with hundred of thousands of files and more) can take. Most are aware that if you re-run the same run-through, it will be shorter.

This is caused by a short-valid filesystem cache, where the memory is allocated to other tasks, or the metadata required cache exceeds the available for this task.

If the system is focused on files, meaning that its prime task is holding files (like NFS server, for example) and the memory is largely available, a certain tunable can reduce recurring directory dives (like the ‘find’ or ‘rsync’ commands, which run huge amounts of attribute queries):

sysctl vm.vfs_cache_pressure=10

The default value is 100. Lower values will cause the system to prefer keeping this cache. A quote from kernel’s memory tunables page:

This percentage value controls the tendency of the kernel to reclaim the memory which is used for caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to reclaim dentries and inodes at a “fair” rate with respect to pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will never reclaim dentries and inodes due to memory pressure and this can easily lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100 causes the kernel to prefer to reclaim dentries and inodes.

Increasing vfs_cache_pressure significantly beyond 100 may have negative performance impact. Reclaim code needs to take various locks to find freeable directory and inode objects. With vfs_cache_pressure=1000, it will look for ten times more freeable objects than there are.

Transparently Routing / Proxying Information

Monday, May 15th, 2006

I was required to utilize a transparent proxy. The general idea was to follow a diagram as the one here:

The company did not want any information (http, https, ftp, whatever) to pass directly through the firewall from the internal network to the external network. If we can move it all via some sort of proxy, the general idea says, the added security is well worth it.


Getting an initial configuration to work is rather simple. For port 80 (HTTP), all need not do more than install squid with transparent directives included (can be found here, for example, and on dozens of other web sites), and make sure the router redirects all outbound HTTP traffic to the Linux proxy.


It worked like a charm. Few minor tweeks, and caching was done well.


It didn’t work when it came to other protocols. It appreas Squid cannot transparently redirect (I did not expect it to actually cache the information) SSL requests. The whole idea of SSL is to prevent the possibility of “A-Man-in-the-Middle” attack, so Squid cannot be part of the point-to-point communication, unless directed to do so by the browser, with the CONNECT command. This command can be assigned ONLY if the client is aware of the fact that there is a proxy on the way, aka, configured to use it, which is in contrast to the whole idea of Transparent Proxy.


When it failed, I’ve came up with the next idea – let the Linux machine route onwards the forwarded packets, by acting as a self-sustained NAT server. If it can translate all requests as comming from it, I will be able to redirect all traffic through it. It did not work, and working hard into IPTables chains, and adding logging (iptables -t nat -I PREROUTING -j LOG –log-prefix “PRERouting: “) into it, I’ve discovered that although the PREROUTING chain accepted the packets, they never reached the FORWARD or POSTROUTING chains…


The general conclusion was that the packets were destinated to the Linux machine. The Firewall/Router has redirected all packets to the Linux server not by altering the routing table to point at the Linux server as the next hop, but by altering the destination of the packets themselves. It meant that all redirected packets were to go to the Linux machine.


Why did HTTP succeed in passing the transparent proxy? Because HTTP packets contain the target name (web address) in their data, and not only in their headers. This allows for “Name based shared hosting”, and thus the transparent proxy can actually exist.


There is no such luck with other protocols, I’m afraid.


The solution in this case can be achieved via few methods:


1. Use non-transparent proxy. Set the clients to use it via some script, which will enable them to avoid using it when outside the company. Combined with transparent HTTP proxy, it can block unwanted access.


2. Use stateful inspection on any allowed outbound packets, except HTTP, which will be redirected to the proxy server transparently.


3. Set the Linux machine in the direct path outside, as an additional line of defence.

4. If the firewall/Router is capable of it, set a protocol-based routing. If you only route differently packets outbound for some port, you do not rewrite the packet destination.


I tend to chose option 1, as it allows for access to work silently when using HTTP, and prevents unconfigured clients from accessing disallowed ports. Such a set of rules could look something like (the proxy listens on port 80):


1. From *IN* to *LINUX* outbound to port 80, ALLOW

2. From *IN* to *INTERNET* outbound to port 80 REDIRECT to Linux:80

3. From *IN* to *INTERNET* DENY


Clients with defined proxy settings will work just alright. Clients with undefined proxy settings, will not be able to access HTTPS, FTP, etc, but will still be able to browse the regular web.


In all these cases, control over the allowed URLs and destinations is in the hands of the local IT team.


NT4 Server English, BDC, and problems

Sunday, December 4th, 2005

In the long forgotten days of NT4, there was not Unicode. In these older days, one could use English, and the language the server machine was predefined for. In our poor and sad case, English alone.

This is a story of a poor filer, member of a multi-site NT4 domain, which, due to latency and delay in the creation of the VPN connection between sites, had to become BDC.

Now this server acts as BDC, and all is fine, but as a filer, although Hebrew files can be saved and recalled to/from it (Hebrew names), it cannot access them interally. It cannot backup/restore Hebrew named files, it cannot copy/move them either. We’re stuck.

I’ve got two possible directions: The first would be to "upgrade" or "reinstall" the server using an Hebrew Enabled version of NT4 Server (and with luck, much luck, it might work), or to silently replace it by a Linux acting as NT Domain BDC, caching accesses, etc. Maybe it will work, donno. Worth a try.