Help fixing groups list

Tips on writing regular expressions for searching the post list

Moderators: Quade, dexter

Postby The Exorcist » Thu Dec 30, 2004 5:15 pm

I know this is not exactly where this message belongs, but it's the closest forum I can find, since I know I will have to use regular expressions to do what I'm asking for.
I have a unique situation I am working on. I have group lists from 4 servers, which I have combined into one list. There are duplicate lines from each server, with different numbers at the end of each line that seem to represent the number of posts in the group. Here is what I want to do: I want to use a Unix-like utility to automatically remove the numbers at the end of each line, but I don't know which utility would be right for the job. Any suggestions on which one to use, and maybe an example of how to use it to remove the numbers, would be ever so greatly appreciated. After that I can easily use uniq.exe to get rid of the duplicate lines.
I'm far from a Unix expert, but I am somewhat familiar with many of these fine utilities.

I need to automatically process this file with a utility, as there seem to be millions of lines in my groups list and it would take far too long to edit line by line in a text editor.

These lines represent an example of the data I'm working with in this file.
I want to delete the space and the number at the end of each line.

alt.binaries.ebook 220
alt.binaries.ebook 254
alt.binaries.ebook 292
The Exorcist
Occasional Contributor
Posts: 37
Joined: Tue Dec 23, 2003 5:24 am

Registered Newsbin User since: 12/22/03

Postby richy99 » Thu Dec 30, 2004 11:17 pm

I'm not sure what you are asking for. From what you have said, do you want to delete the number of entries for each group, or do you want to remove the number of lines that make up a binary post?
richy99
Elite NewsBin User
Posts: 6353
Joined: Fri Nov 21, 2003 8:04 pm
Location: Wales

Registered Newsbin User since: 12/31/03

Postby itimpi » Fri Dec 31, 2004 6:50 am

I think that you can use the 'cut' command to do what is wanted. Note that cut splits on tabs by default, so for space-separated fields like yours you need to specify the delimiter. A command line something like:

cut -d ' ' -f 1

will take only the first field from each line.
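For example, against the three sample lines from your first post, saved in a file called groups.txt (the name is just for illustration):

cut -d ' ' -f 1 groups.txt

would print:

alt.binaries.ebook
alt.binaries.ebook
alt.binaries.ebook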
The Newsbin Online documentation
The Usenettools for tutorials, useful information and links
itimpi
Elite NewsBin User
Posts: 12604
Joined: Sat Mar 16, 2002 7:11 am
Location: UK

Registered Newsbin User since: 03/28/03

Postby The Exorcist » Sat Jan 01, 2005 9:30 pm

I want to cut / remove the numbers at the end of each line. I would prefer to remove the space preceding the numbers as well.
I know this will make it look like there are no posts in each group, but that is OK with me. I don't see any realistic way to combine the lists from all the servers and keep any information about the number of posts that were in each group at the time. My goal is just to have a groups list containing all possible group names, which may or may not be on any one server at a given time.
I will, however, experiment with that cut command and see what it does. I really should know more about these Unix-like utilities, as they have powers beyond comprehension. 8)
Once I manage to get rid of the numbers, I will run uniq.exe on the file to remove the duplicate lines. The numbers just have to be removed first.
Maybe it would be best to find the first "space" in each line and delete the rest of the line. I don't know exactly what kind of expression would do that, but perhaps something like the sed substitution below would work.
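If I understand sed correctly, a substitution that matches the first space and everything after it should do the trick (groups.txt here is just a stand-in for my real file):

sed 's/ .*//' groups.txt

The ' .*' pattern matches from the first space to the end of the line, and replacing it with nothing leaves just the group name.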
Many thanks for the ideas and help.
The Exorcist
Occasional Contributor
Posts: 37
Joined: Tue Dec 23, 2003 5:24 am

Registered Newsbin User since: 12/22/03

Postby Quade » Sat Jan 01, 2005 11:28 pm

If you program, it's pretty easy. If you don't, you might want to try loading it into Excel or something as a space-delimited list, then removing the second column and removing the dups (Excel can do that, can't it?). Or maybe use Access.

Then re-export the list as text.
Quade
Eternal n00b
Posts: 44867
Joined: Sat May 19, 2001 12:41 am
Location: Virginia, US

Registered Newsbin User since: 10/24/97

Postby The Exorcist » Sun Jan 02, 2005 4:30 pm

Awesome, that cut command is just what I needed to know about.
It saved the first fields to a file, which I ran uniq on, and now it's all clean.
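For anyone who finds this thread later, the whole job boiled down to two steps (the file names are just examples):

cut -d ' ' -f 1 groups.txt > names.txt
sort names.txt | uniq > groups-clean.txt

Keep in mind that uniq only removes adjacent duplicate lines, which is why the list should be sorted first.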
I knew someone here had to know the answer to this problem.
Many thanks! :)
The Exorcist
Occasional Contributor
Posts: 37
Joined: Tue Dec 23, 2003 5:24 am

Registered Newsbin User since: 12/22/03

