
Header dl/import issue 6.55

Posted: Sun Nov 19, 2017 8:58 pm
by epoepo
It is a well-known problem that importing downloaded headers sometimes takes forever. Currently my system has been sitting for hours on a 130MB file, whereas header files are normally considerably smaller and mostly take 15-40 seconds (for 1 million headers) to process. I checked the contents of one such file, and the headers inside look like this:

1243093807 [PRiVATE] \ce34cfb5f2\::eafc23e40deec7.eba1a60c3d06b6fbf7f8e0c8ae4c8e.cd0b0754::/ab634679c128/ [newzNZB] [1/3] - yEnc (98/1869) 1339535556 nEwZ[NZB] <pr3d@NET.world> Mon, 30 Jan 2017 09:12:48 GMT <XsHkTlMfJmXwPpInRtEeKhQr-newzNZB-1485767568443@PRIVATE> 739480 5681 Xref: number.nntp.giganews.com alt.binaries.moovee:483384344 alt.binaries.sounds:193527395 alt.binaries.erotica:5102233638 alt.binaries.teevee:1243093807 alt.binaries.inner-sanctum:1021956698

So, the header file is filled mostly with cross-posted spam. How NB processes it, I obviously don't know, but something about it seems to cause problems.
Actually, I would like to get rid of all such cross-posted spam already at the download stage, but that is a slightly different story.
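
For anyone who wants to check how much of a saved header file is this kind of cross-post, here is a minimal Python sketch. It assumes each header line ends with an `Xref:` field holding a server name followed by space-separated `group:number` pairs, as in the sample above. The function names and the `max_groups` threshold are hypothetical and have nothing to do with how Newsbin itself processes the file.

```python
import re

# Assumption: "group:article_number" pairs follow the "Xref:" marker,
# as in the sample header line quoted in this post.
XREF_PAIR = re.compile(r"(\S+):(\d+)")

def crossposted_groups(header_line):
    """Return the newsgroup names listed in the Xref field of one header line."""
    # Xref is the last field in the sample, so split on its last occurrence.
    _, sep, xref = header_line.rpartition("Xref:")
    if not sep:
        return []  # no Xref field on this line
    return [group for group, _ in XREF_PAIR.findall(xref)]

def looks_crossposted(header_line, max_groups=3):
    """Hypothetical rule of thumb: flag headers posted to more than max_groups groups."""
    return len(crossposted_groups(header_line)) > max_groups
```

Running `looks_crossposted` over every line of an unpacked header file would give a rough count of how many headers are spread across many groups.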

Re: Header dl/import issue 6.55

Posted: Sun Nov 19, 2017 9:58 pm
by Quade
If you're using the current beta, you can filter out headers you don't like. Some of the older versions (though not as old as 6.55) can do it too.

Re: Header dl/import issue 6.55

Posted: Tue Nov 21, 2017 8:43 am
by epoepo
I experimented with 6.73 without good results. Well, I didn't do exactly what the instructions say, though. When is the filter applied? On downloading or on importing?

Re: Header dl/import issue 6.55

Posted: Tue Nov 21, 2017 10:12 am
by Quade
On import. It's possible to save the GZ files. I wanted a complete GZ import file so I could experiment with different filtering and re-feed the GZ files to the group.

[<Group Name>]
DownloadFilter=<Filter Profile>

You can do this per group or per topic.

[<Topic Name>]
DownloadFilter=<Filter Profile>

This will cover all the groups in the topic.

You can filter on subject and poster. Keep in mind that "Poster" is AND while all the subject type filters are OR.
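
To make the AND/OR remark concrete, here is a small Python sketch of one possible reading: a header is dropped when any subject pattern matches and, if a poster pattern is set, the poster matches as well. This is only an interpretation of the sentence above, not Newsbin's documented behavior. The function `is_filtered` and its parameters are hypothetical.

```python
import re

def is_filtered(subject, poster, subject_patterns, poster_pattern=None):
    """Return True if a header would be dropped under this reading of the rule.

    Assumption: subject patterns are OR-ed together, and the poster
    pattern (if any) is AND-ed with that result.
    """
    if not subject_patterns and poster_pattern is None:
        return False  # nothing configured, nothing filtered

    # Subject filters: any single match is enough (OR).
    subject_hit = (
        any(re.search(p, subject) for p in subject_patterns)
        if subject_patterns else True
    )
    # Poster filter: must also match when it is configured (AND).
    poster_hit = (
        re.search(poster_pattern, poster) is not None
        if poster_pattern is not None else True
    )
    return subject_hit and poster_hit
```

Under this reading, adding a poster pattern narrows a filter profile, while adding more subject patterns widens it.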

Pretty sure you're going to need to use 6.80B11 to ensure this works.

[SETTINGS]
SaveGZ=1

If you add this to the NBI file, Newsbin will save the GZ files into a processed folder instead of deleting them. That way you can copy them back to the import folder as many times as you want while testing the filters.
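
If you end up re-feeding the files often, the copy step can be scripted. The following Python sketch copies every saved GZ file from the processed folder back into the import folder. Both folder paths are placeholders you would have to supply yourself, the `.gz` extension is an assumption, and `refeed_gz` is a hypothetical helper, not something Newsbin provides.

```python
import shutil
from pathlib import Path

def refeed_gz(processed_dir, import_dir):
    """Copy saved GZ header files back into the import folder.

    Assumption: both folders already exist and the saved files end in ".gz".
    Returns the names of the files that were copied.
    """
    processed = Path(processed_dir)
    target = Path(import_dir)
    copied = []
    for gz in sorted(processed.glob("*.gz")):
        # copy2 leaves the original in place, so it can be re-fed again later
        shutil.copy2(gz, target / gz.name)
        copied.append(gz.name)
    return copied
```

Copying rather than moving keeps the processed folder intact, so the same set of files can be imported again after each filter change.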

Re: Header dl/import issue 6.55

Posted: Tue Nov 21, 2017 12:59 pm
by epoepo
Thanks. It looked good in the sandbox. It is good that it is per group/topic, as there are some groups where I actually tolerate cross-posting.