6.80 RC4 Header Storage

Postby Ortiz » Sat Feb 03, 2018 6:36 pm

The changelog for 6.80 RC4 lists the following item:
Header downloads no longer save the headers as GZ files. Instead they're saved as text. Leaving it up to the user to enable folder compression if they want compression.

This is a bit concerning to me. I have seen Newsbin data folders as large as 130 GB. I cannot imagine how large this would be if the data was not compressed.

"Leaving it up to the user to enable folder compression" sounds like it is a suggestion to enable NTFS compression. I made the mistake a few years ago of enabling NTFS folder compression on a file server that I managed. It crippled performance because NTFS compression results in very bad fragmentation.

https://arstechnica.com/civis/viewtopic ... 0&t=973465

Additionally, NTFS folder compression is not going to compress as well as gzip.

https://stackoverflow.com/questions/328 ... ndows-2012

Please reconsider this decision. Fully uncompressed would be a big waste of space and NTFS folder compression is best avoided.

Re: 6.80 RC4 Header Storage

Postby Quade » Sat Feb 03, 2018 7:21 pm

Ortiz wrote:
This is a bit concerning to me. I have seen Newsbin data folders as large as 130 GB. I cannot imagine how large this would be if the data was not compressed.


They're temporary files. Written and then removed after processing.

You're just turning compression on for the folder. Not the whole drive. If you're using an SSD, fragmentation isn't a thing anymore.

Re: 6.80 RC4 Header Storage

Postby Ortiz » Sun Feb 04, 2018 11:24 am

Oh. It seems I misunderstood. Which folder are the temporary files typically stored in?

Re: 6.80 RC4 Header Storage

Postby tl » Sun Feb 04, 2018 11:29 am

Quade wrote:
Ortiz wrote:
This is a bit concerning to me. I have seen Newsbin data folders as large as 130 GB. I cannot imagine how large this would be if the data was not compressed.

You're just turning compression on for the folder. Not the whole drive. If you're using an SSD, fragmentation isn't a thing anymore.

Even with folder compression enabled, this change will cause a significant increase in (temporary) disk space requirements.

The problem is that folder/file compression is much less efficient than normal compression because it's done independently on each 64 kB block of the file to allow random access, while gzip compresses the whole file as one stream, which results in much better compression.
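To illustrate the per-block point, here is a minimal sketch that deflates the same data once as a single stream and once as independent 64 kB blocks. It is only an analogy: it uses zlib for both cases, whereas NTFS uses its own, simpler algorithm (LZNT1), and the sample text from `make_sample` is invented, not real header data. Because deflate only looks back 32 kB anyway, the gap this sketch shows will be modest and will not reproduce real-world ratios.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # the 64 kB unit mentioned in the post


def whole_stream_size(data: bytes) -> int:
    """Compressed size when the data is deflated as one stream (gzip-like)."""
    return len(zlib.compress(data, 6))


def per_block_size(data: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Total compressed size when every block is deflated independently."""
    return sum(
        len(zlib.compress(data[i:i + block_size], 6))
        for i in range(0, len(data), block_size)
    )


def make_sample(lines: int = 20000) -> bytes:
    """Invented, repetitive header-like text (hypothetical format)."""
    return "".join(
        f"{n}\tSome.Post.Title [{n % 50}/50] file.part{n % 50:02d}.rar"
        f"\tposter@example.invalid\t<{n}@example.invalid>\n"
        for n in range(lines)
    ).encode("ascii")


if __name__ == "__main__":
    sample = make_sample()
    print("raw bytes:       ", len(sample))
    print("one stream:      ", whole_stream_size(sample))
    print("independent 64kB:", per_block_size(sample))
```

Each independent block has to start over with no history and its own stream overhead, which is why the per-block total comes out larger.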

I grabbed 1.61GB of gz files from 6.80RC3 to check how large they were uncompressed (11.9GB) and what Windows 10 Folder Compression would bring that down to (4.65GB). To put it differently, in the case I tested, disabling gzip increases the disk space requirement by a factor of 7.4, and enabling Folder Compression brings it down to "only" 2.9 times as much disk space. Obviously different groups may compress differently, but I expect these will be reasonable numbers for most cases.

We can thus guesstimate that the 130GB of gzip'd headers would require ~964GB uncompressed or 374GB with Folder Compression enabled. This wasn't quite as bad as I had expected, but "near 3x" is definitely noticeable.
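For anyone who wants to redo the estimate with their own numbers, the arithmetic can be sketched as below. The three measured sizes come from the test described above; the function name `scale` is just illustrative. With the rounded inputs it gives roughly 961GB and 375GB for a 130GB gzip'd folder, in line with the guesstimate above.

```python
# Sizes measured in the test above, in GB.
GZ_GB = 1.61     # gzip'd header files
RAW_GB = 11.9    # the same files uncompressed
NTFS_GB = 4.65   # uncompressed files with Folder Compression enabled

raw_factor = RAW_GB / GZ_GB     # about 7.4x the gzip'd size
ntfs_factor = NTFS_GB / GZ_GB   # about 2.9x the gzip'd size


def scale(gz_gb: float) -> tuple[float, float]:
    """Estimate (uncompressed, folder-compressed) GB for a gzip'd size in GB."""
    return gz_gb * raw_factor, gz_gb * ntfs_factor


if __name__ == "__main__":
    raw, ntfs = scale(130)
    print(f"factors: {raw_factor:.1f}x raw, {ntfs_factor:.1f}x folder-compressed")
    print(f"130GB gzip'd -> ~{raw:.0f}GB raw, ~{ntfs:.0f}GB folder-compressed")
```

The factors only hold to the extent that other groups compress like the sample did.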

It will also cripple sequential access speed (which is how NBPro accesses the data) on spinning hard disks, so it's a big hit unless you store the data folders on an SSD. To be fair, many users (I'm not sure about "most") probably do, but SSD space is far more expensive than hard disk space, so it's not uncommon to be short on space even on an SSD. Using 3x as much space, even temporarily, may well be a serious issue for many.

Re: 6.80 RC4 Header Storage

Postby dexter » Sun Feb 04, 2018 3:42 pm

Ortiz wrote:Oh. It seems I misunderstood. Which folder are the temporary files typically stored in?


The header data is stored in the Import folder under the Newsbin Data folder. Each file is removed as it is processed into the header database for the group. There is one header database for each group under the spool_v6 folder. It'll only backlog if you have a ton (i.e., hundreds) of groups or you are doing a "download all" on a very high traffic group. If you are just topping off headers every day, it really shouldn't accumulate much data. The number of header data files waiting to be imported is reflected in the cache display at the bottom of the Newsbin window. So if it says "Cache 400/400(10)", there will be 10 data files sitting in the Import folder.
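As a rough way to check the backlog from outside Newsbin, the two readings described above can be sketched like this. Both helpers are hypothetical: `pending_from_cache` only pulls the number in parentheses out of a string shaped like the example, and `count_import_files` assumes you pass in the path to your own Newsbin Data folder, which is not given here.

```python
import re
from pathlib import Path
from typing import Optional

# Matches text shaped like the example "Cache 400/400(10)".
CACHE_PATTERN = re.compile(r"Cache\s+\d+/\d+\((\d+)\)")


def pending_from_cache(display: str) -> Optional[int]:
    """Return the parenthesised count from the cache display, or None."""
    match = CACHE_PATTERN.search(display)
    return int(match.group(1)) if match else None


def count_import_files(data_folder: Path) -> int:
    """Count files currently sitting in <data_folder>/Import."""
    import_dir = data_folder / "Import"
    if not import_dir.is_dir():
        return 0
    return sum(1 for entry in import_dir.iterdir() if entry.is_file())


if __name__ == "__main__":
    print(pending_from_cache("Cache 400/400(10)"))
```

If the two numbers disagree for long, the import is probably still catching up.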

Re: 6.80 RC4 Header Storage

Postby Ortiz » Tue Feb 06, 2018 12:29 am

dexter wrote:
The header data is stored in the Import folder under the Newsbin Data folder...

Cool. Thanks. Is there some sort of advantage to having these files be uncompressed vs gzip? Does it assist with performance or something?

Re: 6.80 RC4 Header Storage

Postby dexter » Tue Feb 06, 2018 12:39 am

Yes, we found a performance improvement by not compressing/decompressing all the time. Also it helps avoid conflicts with antivirus software trying to scan the binary data.

