xisto Community
toby

File Line Checker (no idea of its proper name)

Can't really Google what I can't name ;) I have a massive file of URLs, and I want to make sure each one appears only once. I'm not really bothered what the tool is written in, even a JS or PHP script, or just something to Google would be brilliant.


So you are looking for duplicate lines in the same file? A very time-consuming method would be to open the links in a text editor or word processor (Notepad or Word on Windows, OpenOffice or gedit on Linux) and search for each URL; if there is a second copy, remove it.

Many of the duplicate finders on the internet (search Google) deal with actual files, not lines within a file.


Well, there are a couple of ways to do this.

The first is to write a script that reads the file, puts the contents in an array (separated by line), and then does a duplicate-value check on the array. Then rewrite the file from the cleaned array.

The second is to copy the contents of your file and paste them into a spreadsheet. Sort the spreadsheet and then evaluate the data after it is sorted. You can check manually for duplicates, since any copy will sit right below the original. You could automate the search with a spreadsheet formula that copies an entry into the next column only if the entry above it is not the same. Your new column of data would then be free of duplicates, but it might have a hole or two where a duplicate entry wasn't copied over. I use this method frequently for various lists.

Also, instead of copying the entry to the next column when there isn't a match, you could print a message when the entry matches the row above, then just delete the duplicate line.

Hope this helps.

vujsa
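The script approach described above (read the file, hold the lines in a list, drop repeats, rewrite the file) could look roughly like the Python sketch below. It is only an illustration, not a finished tool: the function names `dedupe_lines` and `dedupe_file` are made up for this example, and it assumes one URL per line, a UTF-8 file, and that lines differing only in surrounding whitespace count as the same URL.

```python
from pathlib import Path


def dedupe_lines(lines):
    """Return the lines with duplicates removed, keeping the first occurrence."""
    seen = set()
    unique = []
    for line in lines:
        key = line.strip()  # assumption: surrounding whitespace is not significant
        if not key or key in seen:
            continue  # skip blank lines and anything already seen
        seen.add(key)
        unique.append(key)
    return unique


def dedupe_file(path):
    """Rewrite the file at `path` in place without duplicate lines.

    Returns how many lines were dropped (duplicates plus blank lines).
    """
    file_path = Path(path)
    lines = file_path.read_text(encoding="utf-8").splitlines()
    unique = dedupe_lines(lines)
    file_path.write_text("".join(u + "\n" for u in unique), encoding="utf-8")
    return len(lines) - len(unique)
```

Calling `dedupe_file("urls.txt")` (a hypothetical file name) would overwrite that file, so it is worth running it on a copy first. Using a set for the "already seen" check keeps the original order of the URLs and avoids comparing every line against every other line, which matters for a massive file.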


The second is to copy the contents of your file and paste them into a spreadsheet. Sort the spreadsheet and then evaluate the data after it is sorted. You can check manually for duplicates, since any copy will sit right below the original. You could automate the search with a spreadsheet formula that copies an entry into the next column only if the entry above it is not the same. Your new column of data would then be free of duplicates, but it might have a hole or two where a duplicate entry wasn't copied over. I use this method frequently for various lists. Also, instead of copying the entry to the next column when there isn't a match, you could print a message when the entry matches the row above, then just delete the duplicate line.


Actually, this is one way to do it, and it's a great way. And if you are using Microsoft Excel, you can use the Advanced Filter function, which will filter out all duplicates, so you don't have to look through your data manually. The steps are:

1. Sort all the data.
2. Trim the data so that no entry has spaces at the end.
3. Click on Data -> Filter -> Advanced Filter.
4. Under Action, select "Filter the list, in-place".
5. Check "Unique records only".
6. Click OK.

That will hide the duplicate records, leaving one of each. Hope this helps too. And if you are using another spreadsheet, such as OpenOffice, you can do that too, but I don't know the steps. ;) Cheers
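For anyone without a spreadsheet to hand, the same trim, sort, and compare-with-the-row-above idea can be sketched in Python. This is a rough illustration only: `sorted_unique` and `duplicated_entries` are names invented here, and it assumes the entries are plain strings where trailing spaces should be ignored.

```python
def _cleaned(entries):
    """Trim each entry, drop empty ones, and sort the rest."""
    return sorted(e.strip() for e in entries if e.strip())


def sorted_unique(entries):
    """Keep an entry only if it differs from the entry just before it."""
    result = []
    for entry in _cleaned(entries):
        if not result or entry != result[-1]:
            result.append(entry)
    return result


def duplicated_entries(entries):
    """List each entry that appears more than once, reported a single time."""
    cleaned = _cleaned(entries)
    dupes = []
    # After sorting, any duplicate sits directly below its original.
    for previous, current in zip(cleaned, cleaned[1:]):
        if current == previous and (not dupes or dupes[-1] != current):
            dupes.append(current)
    return dupes
```

`sorted_unique` mirrors the "unique records only" result, while `duplicated_entries` corresponds to the "print a message if there is a match in the row above" variant, which is handy if you want to see what was duplicated before deleting anything. Unlike a set-based check, this version does not preserve the original order of the list.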

