How can we get the first two sentences out of many sentences and/or paragraph(s) in PHP?

stndn

Golden Member
Mar 10, 2001
1,886
0
0
Hello,

This may sound more complicated than it is, but I'll try my best to explain the question.
Suppose I have five input strings:

String one:
This is a string of four sentences. The second sentence is this one. All four sentences have periods at the end. Should be simple.

String two:
This is a string of four lines.
The second line / sentence is this one, but it doesn't end with a period
All four lines have periods and newline at the end.
This is getting complicated.

String three:
This is a string of four sentences. The second sentence ends with two exclamation marks!! How about the third sentence? Oh my!

String four:
Is this a string of four sentences or four lines?
I'm not sure anymore. Second line contains two sentences, which makes things complicated.
I think I'm going crazy now

String five:
What if the string is only one sentence?


Across all five sample strings, what is the easiest way to get only the first two sentences from each string?

Our current solution is in the attached code. As you can see from the output, it works to an extent. The second string (String 1 in the output) is wrong because it shows three sentences (the second one doesn't end with a period).
One of the problems we have is when the sentences are separated line by line but don't end with a period, exclamation mark, or question mark. Then our regex fails. We can't use ^\w because that will also grab punctuation like commas, colons, etc.
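
One way to handle the line-by-line case is to treat a line break as a sentence boundary alongside '.', '!' and '?'. The following is only a minimal sketch under that assumption, not the attached code; the function name firstSentences and its parameters are made up for illustration, and it makes no attempt to handle decimals such as 5.5 or abbreviations.

```php
<?php
// Hypothetical helper (not the attached code): returns up to $limit sentences,
// treating a line break as a sentence boundary in addition to . ! ?
function firstSentences(string $text, int $limit = 2): string
{
    // A sentence is a run of characters that are neither terminators nor
    // line breaks, followed by one or more of . ! ? or by a line break
    // or the end of the input.
    preg_match_all('/[^.!?\r\n]+(?:[.!?]+|(?=[\r\n]|$))/', $text, $matches);

    // Trim each piece and drop pieces that were only whitespace.
    $sentences = array_values(array_filter(array_map('trim', $matches[0]), 'strlen'));

    return implode(' ', array_slice($sentences, 0, $limit));
}

// Example with a shortened version of string two:
echo firstSentences("This is a string of four lines.\nThe second line has no period\nThird line."), "\n";
```

With this rule the second line counts as its own sentence even without a period, so string two is cut after two sentences instead of three.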

The other problem is that we have a feeling there's an easier way to do this than what we currently have.

For context, these strings will be reviews submitted by users to a site. We want to show only up to two sentences from whatever the user enters.

So, are we on the right track? Or is there an easier way to accomplish this in PHP? Any suggestions for improvements?

I'd like to say 'scrap it and just do review cutoff by strlen!!', but hey ... I'm just a grunt :(

thank you.
-stndn.


Sample output:
before:

String 0: This is a string of four sentences. The second sentence is this one. All four sentences have periods at the end. Should be simple.

String 1: This is a string of four lines.
The second line / sentence is this one, but it doesn't end with a period
All four lines have periods and newline at the end.
This is getting complicated.

String 2: This is a string of four sentences. The second sentence ends with two exclamation marks!! How about the third sentence? Oh my!

String 3: Is this a string of four sentences or four lines?
I'm not sure anymore. Second line contains two sentences, which makes things complicated.
I think I'm going crazy now

String 4: What if the string is only one sentence?

after:

String 0: This is a string of four sentences. The second sentence is this one. -----

String 1: This is a string of four lines. The second line / sentence is this one, but it doesn't end with a period All four lines have periods and newline at the end. -----

String 2: This is a string of four sentences. The second sentence ends with two exclamation marks!! -----

String 3: Is this a string of four sentences or four lines? I'm not sure anymore. -----

String 4: What if the string is only one sentence? -----
 

stndn

Golden Member
Mar 10, 2001
1,886
0
0
Maybe a Monday morning exercise would help your brain start up? :p


-stndn.
 

Aikouka

Lifer
Nov 27, 2001
30,383
912
126
Well, I'm not going to look at your code from a coding standpoint, because it's been waaaaay too long since I've worked with PHP from a proficiency standpoint, and unless I'm at home, I can't peruse my own code to get back in the groove. So I'll look at it from a method standpoint.

It seems your problem is that you're working with input you cannot guarantee will be well-formed, yet the code assumes it will be. From what I can see, unless you can guarantee patterns, do not use regular expressions. In some cases, you may never be able to perfectly decipher what a user enters because they may literally just be that unclear.

Here's an example of that:

Hi to you I am reviewing this product it was good but not what I expected do not buy!!

This bad boy is a serious run-on sentence. Unless you design a function to separate sentences via grammar modules (essentially designing a CFG for the English language), you'll never be able to separate these.

Just something to think about; cutting off by length instead of by sentences would make the whole thing soooo much nicer.
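
A length-based cutoff could look roughly like the sketch below. It is only an illustration: the function name truncateReview, the default of 100, and the '...' suffix are assumptions, and it counts bytes with strlen, so multibyte (UTF-8) reviews would need the mb_* equivalents.

```php
<?php
// Hypothetical helper: cut $text to at most $maxLength bytes of content,
// backing up to a word boundary, then append $suffix.
function truncateReview(string $text, int $maxLength = 100, string $suffix = '...'): string
{
    // Collapse runs of whitespace (including newlines) into single spaces.
    $text = trim(preg_replace('/\s+/', ' ', $text));

    if (strlen($text) <= $maxLength) {
        return $text; // Short enough, nothing to cut.
    }

    $cut = substr($text, 0, $maxLength);

    // If the cut landed in the middle of a word, back up to the last space.
    if ($text[$maxLength] !== ' ') {
        $lastSpace = strrpos($cut, ' ');
        if ($lastSpace !== false) {
            $cut = substr($cut, 0, $lastSpace);
        }
    }

    // Drop dangling separators before adding the suffix.
    return rtrim($cut, ' ,;:') . $suffix;
}

echo truncateReview('Hi to you I am reviewing this product it was good but not what I expected do not buy!!', 20), "\n";
```

This works on the run-on example as well, since it never needs to know where a sentence ends.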
 

stndn

Golden Member
Mar 10, 2001
1,886
0
0
Hmm... I see what you're trying to say.
And yes, it does make sense.

We'll try to discuss this problem again with our team members and hopefully we can agree on changing the way we handle user input.

thanks :)


-stndn.
 

kamper

Diamond Member
Mar 18, 2003
5,513
0
0
Yeah, pretty much what Aikouka said. There's a point at which not even a human can reliably chop up sentences, and you certainly don't want to waste time bringing a computer up to even your level of language proficiency (at least not in PHP or in a business environment). You have to draw a line and say that at some point, it's the user's fault if stuff doesn't get chopped up correctly. I would personally go strictly with sentence-ending punctuation like '.', '?' and '!'. Maybe add special cases for '.'s that are in a number like '5.5', and for double quotes following a punctuation character, à la you said: "i see what you're trying to say."
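
The punctuation-only approach could be sketched as follows. This is an illustration under assumptions, not a definitive implementation: the function names splitSentences and firstTwoSentences are made up, and a boundary is taken to be whitespace that follows '.', '?' or '!' (optionally with a closing double quote). Because a boundary requires whitespace, a number like 5.5 is left alone without a separate rule; abbreviations such as "e.g." would still be split.

```php
<?php
// Hypothetical helper: split on whitespace that follows sentence-ending
// punctuation, optionally followed by a closing double quote.
function splitSentences(string $text): array
{
    $parts = preg_split('/(?<=[.!?]|[.!?]")\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    return array_map('trim', $parts);
}

// Hypothetical helper: keep at most the first two sentences.
function firstTwoSentences(string $text): string
{
    return implode(' ', array_slice(splitSentences($text), 0, 2));
}

echo firstTwoSentences('It weighs 5.5 pounds. You said: "i see what you\'re trying to say." Thanks!'), "\n";
```

Input without any sentence-ending punctuation comes back as a single "sentence", which is where the "user's fault" line gets drawn.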
 