Lesson 4: Advanced Transforms
Chapter 4B: Extract, Split, and Replace
10 Minutes

Chapter Goals

In this chapter, you will:
- Understand how to use the following transforms: Replace, Extract, Split

Chapter Instructions

YOUR GOAL: Practice using row data brushing to generate pattern suggestions for:
- Extracting data into new columns
- Replacing data
- Splitting data into new columns

1. Open the US Farmers Market dataset in the Transformer.

2. Replace text with an empty string.

Locate the Items column.

replace col: Items with: '' on: `$ ` global: true

Brush over the dollar sign ($) and the red dot in the Items column. The red dot indicates a space.

Notice that the first suggestion card shows the Replace transform. When you select certain values that appear to be junk data (for example, quotation marks, question marks, or dollar signs), Trifacta provides the Replace transform as the first suggestion. The Replace transform also occurs when you select any text in a column, but it will appear as a less likely option.

The Replace transform finds occurrences of text that match a defined pattern and replaces that text with a specified value. By default, Trifacta replaces the text with an empty string, which will delete the text from your data.

Click on the Replace suggestion card and examine the preview. How many columns does this
suggestion affect?

Click Modify to view the Transform Builder. Notice that you can apply the Replace transform to multiple columns at once by entering more columns in the boxes below the Column parameter. To apply this transform to all columns, we can use the wildcard or asterisk (*) character in the column parameter. This will apply the transform to all of the columns in the dataset.

Let's apply the transformation to only the Items column. So, our transformation step reads as: replace the pattern of $ with the new value of nothing (we leave the values box blank) on Column Items.

Click Add to Recipe.

3. Select a string to generate a pattern.

Brush over the following text in the Items column: wine_alcohol

Whenever you directly select data in the Transformer grid, Trifacta generates suggestions that apply to occurrences of text that match the same pattern as the selection.

Notice that the first suggestion card shows the Extract transform. This transform finds occurrences of text that match a defined pattern and places that text into one or more new columns.

4. Determine pattern-matching methods for Extract.

Look at the first suggestion card. This card uses your selection to define a pattern using Trifacta selection rules. Trifacta selection rules operate like Java-style regular expressions, but are designed to be easy to understand. This selection pattern matches exactly 12 alphanumeric characters and underscore characters.

Mouse over the dots at the bottom of the Extract transform suggestion card to see multiple versions of the suggestion. You can click on each dot to update the preview in the Transformer grid and display the result of that suggestion.

Based on the options displayed on the suggestion
card, how can you define patterns in the Extract transform?

   1. Trifacta selection patterns.
   2. Exact string matches.
   3. By identifying the values that come before and after the pattern that you want.

Click on the first dot shown on the Extract suggestion card and click Modify to see the transform in the Builder. If you click on the pattern, you will see the description of this Trifacta pattern. If you highlight the Trifacta pattern, you see a list of all possible pattern options with readable names and descriptions.

Note that the Extract transform does not modify the original column.

5. Select multiple strings to refine a pattern.

You can make multiple text selections in the same column to refine the patterns suggested by Trifacta. When you do this, you teach Trifacta to identify the correct pattern in the column. Note that to refine an existing suggestion, you need to select text that occurs in a different row from your first selection.

col: Items on: `{alphanum-underscore}+` limit: 3 before: `:`

In the first row, we've brushed over the text: wine_alcohol. In the second row, brush over the text: meat_eggs_seafood. You can expand the column by clicking on the arrow to the right. You can drag the column edge to expand it as well.

Notice that the pattern shown on the Extract suggestion card has changed. Instead of reading {alphanum-underscore}{12}, the card now reads {alphanum-underscore}+. Since you have selected multiple values in the Items column, Trifacta has generated a more general pattern. {alphanum-underscore}+ means that Trifacta will match one or more alphanumeric or underscore characters, while {alphanum-underscore}{12} will only match exactly 12 alphanumeric or underscore characters.
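The difference between the two quantifiers can be tried outside Trifacta. The sketch below is a rough Python analogue, not Trifacta's implementation: it assumes that \w (letters, digits, and underscore) is a fair stand-in for the alphanumeric-or-underscore pattern, and the helper name matches_whole is made up for this illustration.

```python
import re

# Assumed regex stand-ins for the two Trifacta patterns discussed above.
# \w matches letters, digits, and the underscore; re.ASCII keeps it to ASCII.
FIXED_12 = re.compile(r"\w{12}", re.ASCII)     # exactly 12 characters
ONE_OR_MORE = re.compile(r"\w+", re.ASCII)     # one or more characters


def matches_whole(pattern, text):
    """Return True when the pattern covers the entire text."""
    return pattern.fullmatch(text) is not None


# The two selections brushed in steps 3 and 5.
selections = ["wine_alcohol", "meat_eggs_seafood"]

for text in selections:
    print(
        text,
        "length:", len(text),
        "fixed-12:", matches_whole(FIXED_12, text),
        "one-or-more:", matches_whole(ONE_OR_MORE, text),
    )
```

The first selection is 12 characters long, so both patterns cover it. The second is longer, so only the one-or-more form covers it, which is why a second selection pushes the suggestion toward the more general pattern.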
If you look at the preview in the Transformer Grid, you should also notice that Trifacta will now generate three columns.

The Items column in this dataset is a list of item categories that the farmers markets could sell. If there is an N after the colon, this market does not sell these items. If there is a Y after the colon, this market does sell these items. For the purpose of this example, let's assume we only want to extract the item categories.

Click on the second dot on the Extract suggestion card. You'll notice that this suggestion adds to the pattern to extract alphanumeric or underscore characters before the colon (:). In the preview, you will see that Trifacta will now generate two columns.

Click Modify. The limit parameter determines the number of times Trifacta will match a pattern, and thus the number of columns that the Extract transform will generate. Change the number in the limit parameter from 2 to 3 and watch the preview update.

Click Add to Recipe.

6. Replace text with a string value.

In the Items column, brush over the text: prepared_food.

replace col: Items with: 'food' on: `prepared_food` global: true

You want to change this value to read food instead of prepared_food. Scroll through the suggestion cards until you find the Replace transform. Click Modify to view the Builder.

Remember that the Replace transform replaces text with an empty string by default. In order to replace the text with a string value, you need to edit the text of the transform. Enter 'food' into the New Value box. You can see that the preview will update to show that Trifacta will replace all occurrences of prepared_food with food.

Click Add to Recipe.
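The Extract and Replace steps above (steps 5 and 6) can be mimicked in plain Python to see what each parameter contributes. This is only an illustrative sketch: the sample row is invented, the function names extract_before_colon and replace_all are hypothetical, and the regular expression is an assumed equivalent of the Trifacta pattern rather than what Trifacta runs internally.

```python
import re

# Invented sample value for the Items column; real rows may differ.
row = "wine_alcohol:Y|meat_eggs_seafood:N|prepared_food:Y|crafts:N"


def extract_before_colon(text, limit):
    """Return up to `limit` runs of word characters that sit right before a colon.

    (?=:) is a lookahead: the colon must follow the match but is not
    included in it, which plays the role of the `before` parameter here.
    """
    found = re.findall(r"\w+(?=:)", text, flags=re.ASCII)
    return found[:limit]


def replace_all(text, old, new=""):
    """Replace every occurrence of `old`; the default new value is an empty string."""
    return text.replace(old, new)


print(extract_before_colon(row, 2))   # two extracted values
print(extract_before_colon(row, 3))   # raising the limit yields a third value
print(replace_all(row, "prepared_food", "food"))
```

Without the lookahead, the Y and N flags would also count as matches, so the same limit would return fewer category names. Leaving the third argument of replace_all at its default removes the text instead of substituting it, which corresponds to the empty-string replacement in step 2.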
7. Generate a Split transform.

In the Items column, brush over the pipe character (|).

When you select a delimiter (separator values like commas, pipes, or spaces), Trifacta will suggest a Split transform first.

Click on the Split suggestion card and examine the preview. Notice that the Split transform will use the selected delimiter pattern to divide the current column into multiple new columns. The delimiter itself does not appear in the new columns, and the original column is dropped from the dataset.

8. Determine pattern-matching methods for Split.

Mouse over the dots at the bottom of the Split transform suggestion card to see multiple versions of the suggestion. You can click on each dot to update the preview in the Transformer grid to display the result of that suggestion.

Click on the first dot shown on the Split suggestion card and click Modify.

9. Control the number of columns generated by the Split transform.

split col: Items on: '|' limit: 5

Examine the transform in the Builder. Notice that the limit parameter controls the number of columns that will be created. Change the limit parameter value from 4 to 3. How does this change the preview?

Add the transform to the recipe.

10. Undo the Split transform.

Click the undo button at the top of the screen to remove the Split transform from your recipe.
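The effect of a split limit (steps 7 to 9) can be approximated with Python's built-in str.split. This is a sketch under stated assumptions: the sample row is invented, split_columns is a hypothetical helper, and Python's maxsplit counts the number of splits performed, so it is not claimed to map one-to-one onto Trifacta's limit parameter.

```python
# Invented sample value for the Items column; real rows may differ.
row = "wine_alcohol:Y|meat_eggs_seafood:N|food:Y|crafts:N"


def split_columns(text, delimiter="|", max_splits=-1):
    """Split text on the delimiter; the delimiter is not kept in the pieces.

    max_splits=-1 means "no limit". With a limit, everything after the
    last allowed split stays together in the final piece.
    """
    return text.split(delimiter, max_splits)


print(split_columns(row))                # every category in its own piece
print(split_columns(row, max_splits=2))  # three pieces; the last keeps the remainder
```

As in the Split preview, the delimiter disappears from the results, and lowering the limit reduces the number of pieces while leaving the unsplit remainder in the last one.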