Program for extracting DVD subtitles. Ripping subtitles

DVD subtitles are stored in graphic format, i.e. as pictures. We need to get subtitles as text with timing - SRT. The conversion process is similar to recognizing scanned text in a program such as FineReader.

Launch SubRip and select the menu File > Open VOB(s). In the window that opens, click the Open IFO button.

Select the IFO file corresponding to the first segment of the main movie on the DVD.

All corresponding VOB files are loaded, checkboxes appear next to them (only the checked files will be processed), and the Language Stream drop-down list shows which subtitles this segment contains. In this case, we have only one subtitle stream (Russian).

Keep in mind that languages are sometimes labeled incorrectly (for example, the stream is marked as French but is actually Chinese). This is especially true of pirated discs and Chinese video products.

Closed captions.

Closed Captions (CC) are subtitles embedded in the video stream. They were originally invented for captioning television programs for the deaf and hard of hearing, and viewing them required a special decoder circuit in the television. Closed captions are no longer as relevant as they were 15-20 years ago, but discs with them are still produced.

Ripping closed captions used to be a bit of a problem, but fortunately SubRip has now learned to work with them. When you open the IFO file, it happily reports “Closed captions detected” and includes the closed captions in the general list.

Rip subtitles.

So, the necessary files are checked and the subtitle stream is selected. Now all you have to do is press the Start button, after which the actual process, known as “ripping subtitles”, begins.

Since subtitles come in a wide variety of fonts, the text recognizer in SubRip is implemented as self-learning. That is, whenever it encounters an unfamiliar character, the program asks you to enter the corresponding character from the keyboard.

Enter the character and press OK (or the Enter key), and so on, until the very end of the film.

Please, no mistakes! An incorrectly entered character will then have to be corrected throughout the text.

Generally speaking, if you make a mistake, it's easier to start over from the beginning, and be more careful next time.

It will be difficult only for the first five minutes, then SubRip recognizes almost all text very quickly, only occasionally asking for unfamiliar characters.

Let's look at some problems and situations that may arise during the rip process:

1. English "l" and "I".

When recognizing English subtitles, a problem usually arises with the letters “l” (lowercase L) and “I” (uppercase i), since these characters look almost identical in most fonts. This problem can only be solved by running the result through a spell checker with an English dictionary afterwards.

2. Letters sticking together.

When recognized, some characters “stick together” in groups of two, three, or four. This, strictly speaking, is not a problem; we simply enter all these characters in the text field.

3. Letter "Ы".

When recognizing Russian subtitles, a problem arises with the letter “Ы” (both uppercase and lowercase): SubRip recognizes it as two separate characters. When the first character is highlighted, enter a soft sign (“ь”); when the second is highlighted, enter the Latin letter “i”. The sequence of these two characters can then be replaced throughout the text with “ы”. We will return to this issue later.

4. Unrecognizable subtitles.

Sometimes SubRip finds it difficult to recognize characters in subtitles and displays the following window, asking you to enter their full text:

In 99% of cases, this happens in multi-line subtitles, when the line spacing is small, and there is a capital letter “Y” in the second line.

Just enter the full text of the subtitle in the field and press the Done button.

5. Stitching of lines.

Occasionally SubRip “glues” together characters located on different lines. In this case, you can try to choose the appropriate combination using the buttons in the Best guess block, but in my opinion it is much faster and, most importantly, more reliable to press the Enter Manually button and type in the entire subtitle text.

6. Tags and formatting.

To the right of the OK button there are three checkboxes for formatting text: Bold, Italic and Underline. There is nothing complicated here: after entering a character in the text field, check the boxes you need, and in the external subtitles they turn into the tags <b>, <i>, <u> (exactly the same as in HTML). These checkboxes keep their state between subtitle lines, so you need to remember to turn them off afterwards.

Although SubRip supports italic, bold and underlined text, in practice only italics are used out of all this splendor (they mark lines spoken by someone who is not visible in the frame). And even then, many question whether this is necessary. Decide for yourself whether to use it, but do not forget that some “hardware” players do not understand tags.
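For players that do not understand tags, the formatting can be stripped from the finished file. The sketch below is only an illustration in Python, not something SubRip itself does. The function name strip_formatting_tags is made up for this example, and it assumes the tags are plain <b>, <i> and <u> pairs with no attributes.

```python
import re

# Matches opening and closing bold, italic and underline tags, e.g. <i> or </B>.
TAG_RE = re.compile(r"</?[biu]>", re.IGNORECASE)


def strip_formatting_tags(srt_text: str) -> str:
    """Return the subtitle text with <b>, <i>, <u> tags removed."""
    return TAG_RE.sub("", srt_text)


if __name__ == "__main__":
    print(strip_formatting_tags("<i>Voice on the phone</i>"))
```

Timing lines and subtitle numbers contain no such tags, so they pass through unchanged.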

Saving subtitles.

As soon as the slider at the top of the window reaches the end and shows 100%, the process is complete. All that remains is to save the subtitles in a file.

In the text window (bottom), click the diskette button (or select the menu File > Save As).

If the subtitles are in Russian, then most likely the following question will appear:

This means that the subtitles contain characters other than Latin letters and digits (Russian letters, in our case), and they can be saved either in one of the national encodings or in Unicode. You can, of course, choose Unicode, but then someone will have to convert the subtitles back to Windows-1251 encoding before adjusting them in Subtitle Workshop, so it's best to do this right away. Select No.

Now select 1251: ANSI - Cyrillic in the CodePage drop-down list and press the Save button. All that remains is to enter the file name and save the subtitles.
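If the subtitles were saved as Unicode anyway, the conversion back to Windows-1251 can be scripted. This is a minimal Python sketch, assuming the Unicode file is UTF-16 with a byte order mark (an assumption about what SubRip writes, so check your file). The helper names to_cp1251 and convert_file are hypothetical.

```python
from pathlib import Path


def to_cp1251(data: bytes, source_encoding: str = "utf-16") -> bytes:
    """Re-encode subtitle bytes from a Unicode encoding to Windows-1251."""
    # Assumption: the source is UTF-16 with a byte order mark; pass
    # "utf-8-sig" instead if the file was saved as UTF-8.
    text = data.decode(source_encoding)
    # Raises UnicodeEncodeError if a character has no Windows-1251 equivalent.
    return text.encode("cp1251")


def convert_file(src: str, dst: str, source_encoding: str = "utf-16") -> None:
    """Read src, convert it, and write the Windows-1251 result to dst."""
    Path(dst).write_bytes(to_cp1251(Path(src).read_bytes(), source_encoding))
```

Writing to a separate destination file keeps the original intact in case the guessed source encoding turns out to be wrong.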

Saving matrices.

If you often need to extract subtitles from DVDs or plan to make a whole series of discs (for example, a TV series), it makes sense to save a matrix (this is a set of correspondences between graphic images and text characters).

1. After saving the subtitles, select the menu Character Matrix > Save Character Matrix File As.

2. Enter a file name (arbitrary, for example “001”) and save the matrix in the ChMatrix directory (default extension - *.sum).

When ripping the next disc, just press the Search for match button, and SubRip, having looked through all SUM files in the ChMatrix directory, will select the most suitable matrix for the current font. If the matrix is chosen successfully, the rip will go much faster, since SubRip already knows most of the characters and will only ask you about new ones.

After finishing the rip, save the matrix (with new symbols added) under the same name (or under a new one). This way you can collect a whole library of matrices, which will significantly reduce the labor costs for ripping subtitles, if, of course, you have to do this often enough.
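The idea of a matrix can be pictured as a lookup table from glyph images to text. The Python sketch below is a toy model only: the class CharacterMatrix, the function rip and the tuple-of-rows glyph representation are invented for illustration and say nothing about how SubRip compares images or what the SUM file format looks like.

```python
from typing import Callable, Dict, Iterable, Tuple

# A glyph is modeled as a tuple of bitmap rows, e.g. ("#.", ".#").
Glyph = Tuple[str, ...]


class CharacterMatrix:
    """Toy character matrix: remembers which text each glyph image stands for."""

    def __init__(self) -> None:
        self.known: Dict[Glyph, str] = {}

    def recognize(self, glyph: Glyph, ask: Callable[[Glyph], str]) -> str:
        """Return the text for a glyph, asking the user only the first time."""
        if glyph not in self.known:
            self.known[glyph] = ask(glyph)
        return self.known[glyph]


def rip(glyphs: Iterable[Glyph], matrix: CharacterMatrix,
        ask: Callable[[Glyph], str]) -> str:
    """Recognize a sequence of glyphs and join the results into text."""
    return "".join(matrix.recognize(g, ask) for g in glyphs)
```

Because the table maps a glyph to a string rather than a single character, letters that "stick together" can be stored as one entry, and reusing the same CharacterMatrix object for the next disc means only new glyphs trigger a question.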

Postprocessing.

The external subtitles obtained from the rip can be considered an almost finished product. To polish them, you need to perform a few more simple steps.

Replacing characters.

This stage of subtitle processing is performed in Notepad (although in principle you can use any editor that works with plain text files). Load the subtitles into the editor as a regular text file.

Our task is to make some replacements in the text (remember: replace the soft sign followed by “i” with the letter “ы”, and some others). In Notepad this is done with Ctrl-H (or the menu Edit > Replace):

Enter the text to find and the replacement text, then click the Replace All button.

Usually it is necessary to make the following replacements:

Save and close the file.
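The same Replace All step can be done in a script when there are many files. The Python sketch below is a hedged illustration: the only replacement it contains is the soft sign plus Latin “i” pair mentioned earlier (standing in for the Russian letter “ы”). Any other pairs have to be added by hand, and the function names and the Windows-1251 file encoding are assumptions.

```python
from pathlib import Path
from typing import Dict

# Only the pair named in the text; extend this table with your own pairs.
REPLACEMENTS: Dict[str, str] = {"ьi": "ы"}


def apply_replacements(text: str, replacements: Dict[str, str]) -> str:
    """Replace every occurrence of each key with its value, like Replace All."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text


def fix_file(path: str, encoding: str = "cp1251") -> None:
    """Apply REPLACEMENTS to a subtitle file in place (assumes Windows-1251)."""
    p = Path(path)
    fixed = apply_replacements(p.read_text(encoding=encoding), REPLACEMENTS)
    p.write_text(fixed, encoding=encoding)
```

The replacements are applied in dictionary order, so if one pair's output could match another pair's input, list them in the order you would run them in Notepad.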

Spell-check.

We load the subtitles into a text editor and check the spelling. If possible, it is also a good idea to proofread the text and correct any inaccuracies in the translation.

Correction in Subtitle Workshop.

Load the subtitles into Subtitle Workshop and check them (Ctrl-I). SubRip often produces overlapping subtitles, lines that are too long, extra spaces, and, somewhat less often, subtitles with too short a duration. Subtitle Workshop will help correct all these minor defects.
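To get a feel for what such a check looks for, here is a small Python sketch that finds overlapping and very short subtitles in SRT text. It is not how Subtitle Workshop works internally. The function names are invented, and the 1000 ms minimum duration is an arbitrary threshold chosen for the example.

```python
import re
from typing import List, Tuple

# SRT timing line: 00:01:02,345 --> 00:01:04,000
TIME_RE = re.compile(
    r"(\d+):(\d\d):(\d\d),(\d{3}) --> (\d+):(\d\d):(\d\d),(\d{3})"
)


def parse_times(srt_text: str) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) for every timing line, in file order."""
    cues = []
    for match in TIME_RE.finditer(srt_text):
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
        start = ((h1 * 60 + m1) * 60 + s1) * 1000 + ms1
        end = ((h2 * 60 + m2) * 60 + s2) * 1000 + ms2
        cues.append((start, end))
    return cues


def find_problems(cues: List[Tuple[int, int]],
                  min_duration_ms: int = 1000) -> List[Tuple[int, str]]:
    """List (subtitle_number, problem) pairs; numbering starts at 1."""
    problems = []
    for i, (start, end) in enumerate(cues):
        if end - start < min_duration_ms:
            problems.append((i + 1, "too short"))
        if i + 1 < len(cues) and end > cues[i + 1][0]:
            problems.append((i + 1, "overlaps next"))
    return problems
```

A report like this only points at suspicious subtitles; fixing the timings is still easier in Subtitle Workshop.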

Note: To load subtitles as external ones in Media Player Classic, the subtitle file must have the same name as the movie. Then the player will load them automatically. For example:

00001.ts - movie
00001.srt - subtitles

You can enable them in the Navigate > Subtitle Language menu.
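The naming rule from the note is easy to check in a script before opening the player. A minimal Python sketch with hypothetical helper names; it only compares file names and knows nothing about the player itself.

```python
from pathlib import Path
from typing import Iterable


def subtitle_name_for(video: str) -> str:
    """Return the subtitle file name the player would look for."""
    # Only the last extension is swapped, so "movie.part1.mkv" gives
    # "movie.part1.srt".
    return Path(video).with_suffix(".srt").name


def has_matching_subtitles(video: str, folder_listing: Iterable[str]) -> bool:
    """True if the folder listing contains a same-named .srt file."""
    return subtitle_name_for(video) in set(folder_listing)
```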

Didn't find the subtitles you need on the Internet? No problem: they can be “pulled” from a DVD Video or Blu-ray disc. However, subtitles in that form cannot be used in media containers. Today let's talk about how to make them compatible with most video formats.

First I will describe the task. Many cinephiles prefer to watch films in the original language with subtitles, even if there is a translation. There are several reasons for this, but discussing them is beyond the scope of this article (I’ll just say that I often do this myself). It is also no secret to our readers that optical media are gradually becoming a thing of the past. Those who have a home media server are either already converting their movie collection to a discless format, or at least have begun to think about it. Most often, MKV files are used for home storage.

Figure: extracting subtitles with HD-DVD/Blu-Ray Stream Extractor. All streams can be extracted from the container.

There are many tools for creating them, for example HandBrake (handbrake.fr), which I recently described in the article “Omnivorous Generalist” (see UPgrade #15-16 (570-571)). The only really serious problem for those who make their own rips is obtaining subtitles. The fact is that DVD Video and Blu-ray usually use so-called pre-rendered subtitles: each one is a ready-made picture that is simply superimposed on the frame (more about the types of subtitles: ru.wikipedia.org/wiki/Subtitles).

But only text subtitles can be “sewn” into Matroska containers (in fact, there is a way to put subtitles “torn” from an optical disc into MKV files, but it is strongly discouraged because of compatibility problems: many players simply will not see them). On the other hand, searching the Internet for subtitles in text form (SRT / SMI) does not always give the desired result, especially for editions like an “extended director's cut” or for films that are not very popular with the average viewer. So you need to extract the subtitles from the disc somehow, convert them to the required format, and then feed them to the converter.

To the untrained eye, the task boils down to ordinary text recognition. And indeed, if you type the phrase “FineReader Blu-ray” into the Google search bar, the first five results will include a link to fairly detailed instructions in Russian on how to do this. But, firstly, you will have to use commercial software, and secondly, the process turns out to be quite labor-intensive. In general, that is not our choice: we will keep the effort to a minimum and make do with free software.

First, these same pre-rendered subtitles must be obtained somehow. The exact method depends on the source format. Let me just say that in any case you will need a copy of the film on your hard drive. But since circumventing copy protection is illegal, we will have to refrain from describing it. I think anyone can easily find a manual on the Internet.

In the case of Blu-ray, we take the console utility eac3to (madshi.net/eac3to.zip). By the way, you can “attach” some kind of graphical shell to it, of which there are quite a few. Personally, I liked HD-DVD/Blu-Ray Stream Extractor (code.google.com/p/hdbrstreamextractor), which I recommend. All you need to do is unpack the resulting archive into the same directory where eac3to lives, and then run HdBrStreamExtractor.exe. Now you should click the button to the right of the Input field (Select Input File(s) tooltip), select the largest file in the STREAM folder with the *.m2ts extension, and then specify the destination directory in the Output field.

All you have to do is click on Feature(s) and wait until the program finishes reading the container. After this, a list of detected streams will appear in the Stream(s) section. If you want, extract everything, if you want, just subtitles: just check the box next to what you want to extract, and then start the process with the Extract button.

After some time (which depends mainly on the speed of the PC; on modern machines, extracting subtitles from Blu-ray discs usually takes a little more than an hour), the output folder will contain files with the extension *.sup and uninformative names like 1_7_subtitle (the second number, 7 in this case, is the stream number). These are our subtitles “in pictures”, which now need to be recognized.

If we are talking about DVD Video, you need to take a utility called VobSub Ripper Wizard from the well-known Gabest. It is included in Gordian Knot and other similar packages. However, it is not at all necessary to litter your hard drive with unnecessary software: VSRip lives at: sourceforge.net/projects/guliverkli/files/VSRip. There is a ZIP archive, inside of which there is a single executable file. The interface of the program (which, by the way, was released back in 2003, but works perfectly in Windows 7 x64) is primitive.

The first thing you need to do is open the file with service information (Load IFO... button), which corresponds to the containers in which the movie itself is stored. Determining which one is quite easy: you need to go to the VIDEO_TS folder and find any 1 GB VOB file in it.

Suppose it is called VTS_01_1 - then the ripper needs to be “fed” VTS_01_0.IFO (in other words, the two-digit number right after “VTS_” must match). What to enter in the Save to... field you can guess for yourself (smile). Next, click Next and, at the next stage of the wizard, select subtitles in the required languages in the Languages list. It is important here that all positions in the Vob/Cell IDs column are selected, otherwise the subtitles will be extracted with gaps. Click Next again and the output will be two files with the same name and the extensions *.idx and *.sub. They are what we need.

The internal structure of subtitles pulled from DVD Video and Blu-ray differs. For the latter, you can use the SupRip recognition engine (exar.ch/suprip), which generally copes with its task, although I can't say it's perfect. It “deciphers” English much better than Russian (but if you want, try it, no one forbids it). It does not handle DVD Video subtitles. For those, people recommend SubRip (zuggy.wz.cz); by the way, note that the names differ by only one letter, but these are different programs.

Somehow it didn’t work out for me with the second one: I was never able to make it work on 64-bit Windows 7. Your humble servant has never complained about crooked hands before, although, of course, anything can happen. Maybe the truth, as in the famous television series, is out there somewhere, but I was not able to discover it. Then I tried several more similar utilities, but did not get along with any of them. I am writing about this only so that you do not repeat my mistakes.

There is a solution, and a universal one, and it’s called Subtitle Edit (www.nikse.dk/SubtitleEdit). What I liked about this editor is that it is trained to recognize pre-rendered subtitles of both types and more. It is distributed both in the form of an archive that does not require installation, and in the form of an installer; in essence, they are no different. The interface is perfectly Russified (Options > Choose Language), and there is also quite detailed online help in the language of - forgive the banality - Pushkin and Dostoevsky. The open source Tesseract OCR engine is used for recognition (code.google.com/p/tesseract-ocr).

However, before the program is able to work with Russian subs, something needs to be done. First, go to the website of the engine mentioned above, look for the file rus.traineddata.gz in the downloads section, download it and put it in the Tesseract\tessdata folder in the Subtitle Edit program directory. Then we restart the editor, open the “Spelling” menu > “Get dictionaries...” and in the drop-down list select the item called Russian Spelling, Hyphenation, Thesaurus, and then click “Download” (I think no special explanation is required here). Now everything is ready.

To recognize subtitles obtained from DVD Video, use the “Import/OCR VobSub (sub/idx) subtitles...” item in the “File” menu; for the same operation with subtitles from optical discs that store HD video, use “Import/OCR Blu-ray sup file...”.

In the first case, you will be asked to select a stream with the required language (if there are several of them), in the second, the wizard will be launched immediately. The further procedure does not differ. In the “OCR Method” field, you need to leave the “OCR using Tesseract” item, in the “Language” field, select the one that corresponds to the subtitle language, and in the “OCR/Spelling Correction” list, the smart program itself will select the appropriate option if the appropriate dictionaries are installed. I also advise you to enable the “Query for unknown words” option - then you will be asked to manually correct a word unknown to the spell checker.

With some training, it takes 30-40 minutes to process Russian subs of a 2-hour film. With English things go even faster. When the process is completed, text subtitles must be written to disk (“File” > “Save”, leave SubRip in the “File type” field). Subtitle Edit also has a very useful “Synchronization” function, which will help you recalculate time codes from one frame rate to another (useful if you want to attach subs obtained from DVD Video to a BD rip).
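The arithmetic behind a frame-rate recalculation is simple: a subtitle tied to a given frame appears at frame / fps seconds, so moving from one frame rate to another scales every time code by src_fps / dst_fps. The Python sketch below shows only this scaling. The function name retime_ms is hypothetical, and it is an assumption that this matches what the “Synchronization” function does.

```python
def retime_ms(t_ms: int, src_fps: float, dst_fps: float) -> int:
    """Rescale a time code (in milliseconds) from src_fps to dst_fps."""
    # The same frame number is shown at frame/src_fps in the source and at
    # frame/dst_fps in the target, hence the src_fps/dst_fps factor.
    return round(t_ms * src_fps / dst_fps)


if __name__ == "__main__":
    # Example: subtitles timed for a 25 fps DVD, video running at 24 fps.
    print(retime_ms(24_000, 25, 24))
```

Both the start and the end time of every subtitle have to be scaled with the same factor, otherwise durations drift.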

In theory, after receiving SRT subtitles, they can be immediately encapsulated in a container. But since automatic recognition often makes mistakes, it is better to edit them in some editor. I personally liked the free Srt Corrector. But since, as always, there was not enough space allocated for the article, look for its detailed description in “Small Programs” in the same issue of UPgrade. U.P.

If you have a DVD with a foreign-language movie that contains subtitles (especially in the original language), and this version of the subtitles is not yet available on any subtitle site, there is a fairly quick way to extract these subtitles from the DVD and save them on your hard drive as 2 compact files: one with the idx extension (50-100 kilobytes), the other with the sub extension (from 1 to 20 megabytes). Together, these two files are subtitles in the VobSub graphic format. The free program that does this is called VSRip; it is about 400 kilobytes in size. The program works very quickly: it takes about 10 minutes to extract subtitles from one DVD (if all the DVD files are already on your hard drive, even less, about 5 minutes).

How do you use VSRip? Launch the program. Click the "Load IFO..." button. Select the DVD (or the folder on the hard drive where the DVD was copied), go to the "VIDEO_TS" folder, and select the file with the "IFO" extension that has numbers in its name. The beginning of this file's name must match the beginning of the names of the "VOB" files that contain the movie (usually these VOB files are the largest). Most often this IFO file is called "VTS_01_0.IFO" (see the table below).

File name       Size, bytes   Contents
VIDEO_TS.BUP         12 288   DVD menu
VIDEO_TS.IFO         12 288
VIDEO_TS.VOB     12 171 264
VTS_01_0.BUP         55 296   Chapter 01
VTS_01_0.IFO         55 296
VTS_01_0.VOB    226 873 344
VTS_01_1.VOB  1 073 739 776
VTS_01_2.VOB  1 073 739 776
VTS_01_3.VOB  1 073 739 776
VTS_01_4.VOB  1 073 739 776
VTS_01_5.VOB    495 568 896
VTS_02_0.BUP         24 576   Chapter 02
VTS_02_0.IFO         24 576
VTS_02_1.VOB  1 073 739 776
VTS_02_2.VOB    817 969 152
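The rule "pick the IFO that belongs to the largest VOB files" can be written down as a short Python sketch. It works on a plain name-to-size mapping such as the listing above. The function name main_title_ifo is invented, and treating the group with the largest total VOB size as the main movie is a heuristic, not a guarantee.

```python
import re
from collections import defaultdict
from typing import Dict


def main_title_ifo(sizes: Dict[str, int]) -> str:
    """Guess the IFO of the main movie from file names and sizes in bytes."""
    totals: Dict[str, int] = defaultdict(int)
    for name, size in sizes.items():
        # Count only VTS_nn_1.VOB ... VTS_nn_9.VOB; VTS_nn_0.VOB is skipped.
        match = re.fullmatch(r"VTS_(\d\d)_[1-9]\.VOB", name.upper())
        if match:
            totals[match.group(1)] += size
    if not totals:
        raise ValueError("no VTS_nn_m.VOB files in the listing")
    best = max(totals, key=lambda group: totals[group])
    return f"VTS_{best}_0.IFO"
```

With the sizes from the listing above, group 01 wins by a wide margin, so the function returns VTS_01_0.IFO.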

Click the "Save To..." button and choose where on your hard drive the 2 files (idx and sub) will be saved and what they will be called. Click the "Next" button at the bottom. The "Extraction settings" window appears. In the "Languages" section, select the languages you want to save (it is better to select all of them: who knows, maybe your subtitles will later be useful to someone who speaks one of those languages). Click the “Next” button and wait patiently while the program extracts the subtitles from the DVD; it reports completion with the message “Done!”

In some DVDs (especially TV series and multi-episode cartoons), in the "Extraction settings" menu you need to select on the left ("Program Chains" section) different parts of one chapter - PGC1, PGC2, etc. - otherwise it is not possible to extract all subtitles. In one "pass" the program extracts subtitles from only one part (for example, PGC1).

2. Extract subtitles embedded in the video stream (“closed captioning”, “closed captions”)

Although VSRip has an option to extract this type of subtitles (the "Extract closed caption" checkbox in the "Extraction settings" window), it extracts them with errors: it very often skips many lines, and it always spoils the encoding (letters with diacritics turn into question marks, and so on). Another program, CCExtractor, copes with this task perfectly. How do you work with it? Download, install and launch it. In a separate window, open Windows Explorer and find your DVD (or the folder on your hard drive where all the DVD files were copied). In Explorer, go to the VIDEO_TS directory and find the files with the “VOB” extension that take up the most space (VTS_01_1.VOB through VTS_01_5.VOB in the example table above). Drag them one at a time into the "input files" area of the CCExtractor window (the files "VTS_01_1.VOB", "VTS_01_2.VOB", "VTS_01_3.VOB", "VTS_01_4.VOB", "VTS_01_5.VOB" in turn). In the "Output file" line, enter the desired file name, for example "matrix.srt". Click the "Start" button at the bottom. The process has begun!

If you have extracted graphical subtitles from a DVD that no one else has, please share them!!!