Chapter 2 Basic Operations
Lesson B String Operations
10 Minutes

Lab Goals
In this Lesson, you will:
- Learn how to use the following Transformations: Set, Replace, Extract, Countpattern, Split
- Learn how to apply certain Transformations to multiple columns

Lesson Instructions

1. Changing Case within a Column
Click the column header of the Spring_Day_A column. Select the suggestion to Set the column Spring_Day_A to Uppercase. Select Edit. In the Formula parameter, change upper to proper. Note that the $col indicator applies the proper function to all of the columns selected (in this case just the Spring_Day_A column). Select Add to add the step to your recipe. You should notice that Spring_Day_A's histogram went from 14 categories to 7, indicating there are no longer issues with case.

2. Multi-Column Set
Next, click Columns to enter the columns view. Shift + Click from Spring_Freq_A to Fall_Freq_B. Select Action → Format → To Propercase. You should notice this initiates a builder step with all of the selected columns appearing in the Columns parameter and proper($col) in the Formula parameter. Click Add.

set col: Spring_Day_A value: proper($col)
set col: Spring_Freq_A, Spring_Day_B, Spring_Freq_B, Summer_Day_A, Summer_Freq_A, Summer_Day_B, Summer_Freq_B, Fall_Day_A, Fall_Freq_A, Fall_Day_B, Fall_Freq_B value: proper($col)
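
If you want to reason about what the two Set steps above do outside the application, the following is a minimal Python sketch, not Trifacta's implementation. The helper names proper and set_columns and the sample values are hypothetical, and the sketch assumes that proper capitalizes the first letter of each word and lowercases the rest.

```python
def proper(value: str) -> str:
    """Capitalize the first letter of each space-separated word (assumed behavior)."""
    return " ".join(word.capitalize() for word in value.split(" "))


def set_columns(rows, columns, formula):
    """Apply formula to every named column of every row; rows are plain dicts."""
    return [
        {name: formula(val) if name in columns else val for name, val in row.items()}
        for row in rows
    ]


# Hypothetical sample rows with inconsistent casing.
rows = [
    {"Spring_Day_A": "SATURDAY", "Spring_Freq_A": "weekly"},
    {"Spring_Day_A": "saturday", "Spring_Freq_A": "WEEKLY"},
    {"Spring_Day_A": "Saturday", "Spring_Freq_A": "Weekly"},
]

# One call covers several columns, much like listing them in the Columns parameter.
cleaned = set_columns(rows, {"Spring_Day_A", "Spring_Freq_A"}, proper)
distinct_days = {row["Spring_Day_A"] for row in cleaned}
print(distinct_days)  # {'Saturday'}
```

The three spellings collapse into a single category, which mirrors the drop in histogram categories you see after adding the step.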
3. Cleaning Irregularities
replace col: Items with: '' on: `$ ` global: true
While still in the Columns View, click on the Items column, select Action, then Show in Grid. This brings you to the Items column in the Grid. Highlight the irregular text, and select the suggestion to Replace this value. Notice in the preview to the right that the resulting column has been cleaned up. Click Edit to view the transform Builder. Notice that you can apply the Replace transform to multiple columns at once by entering more columns in the boxes below the Column parameter. To apply this transform to all columns, you can use the wildcard or asterisk (*) character in the Column parameter. This will apply the transform to all of the columns in the dataset.
The Replace transform finds occurrences of text that match a defined pattern and replaces that text with a specified value. By default, Trifacta replaces the text with an empty string, which will delete the text from your data. Let's apply the transformation to only the Items column. So, our transformation step reads as: replace the pattern of $ with the new value of nothing (we leave the values box blank) on Column Items. Add to your recipe.

3. Select a string to generate a pattern.
Brush over the following text in the Items column: wine_alcohol
Whenever you directly select data in the transformer grid, Trifacta generates suggestions that apply to occurrences of text that match the same pattern as the selection. Notice that the first suggestion card shows the Extract transform. This transform finds occurrences of text that match a defined pattern and places that text into one or more new columns.
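
As a rough analogue of the Replace step from Cleaning Irregularities, the Python sketch below removes every occurrence of a pattern from one or more columns. It is an illustration under stated assumptions, not how Trifacta works internally: the function replace_pattern is hypothetical, the stray "~" character stands in for whatever irregular text appears in your data, and the sample row is made up.

```python
import re


def replace_pattern(rows, columns, pattern, replacement="", replace_all=True):
    """Replace regex matches in the given columns; columns="*" means every column."""
    count = 0 if replace_all else 1  # for re.sub, count=0 replaces all matches
    cleaned = []
    for row in rows:
        targets = list(row) if columns == "*" else columns
        new_row = dict(row)
        for name in targets:
            new_row[name] = re.sub(pattern, replacement, row[name], count=count)
        cleaned.append(new_row)
    return cleaned


# Hypothetical row: "~" plays the role of the unwanted text.
rows = [{"Items": "wine_alcohol:Y~|meat_eggs_seafood:N~", "Notes": "ok~"}]

only_items = replace_pattern(rows, ["Items"], re.escape("~"))
all_columns = replace_pattern(rows, "*", re.escape("~"))
print(only_items[0]["Items"])  # wine_alcohol:Y|meat_eggs_seafood:N
```

Leaving the replacement empty deletes the matched text, which corresponds to leaving the values box blank in the Builder.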
4. Determine pattern-matching methods for Extract.
Look at the first suggestion card. This card uses your selection to define a pattern using Trifacta selection rules. Trifacta selection rules operate like Java-style regular expressions, but are designed to be easy to understand. This selection pattern matches exactly 12 alphanumeric or underscore characters.
Mouse over the different Extract suggestion cards. You can click on each suggestion to update the preview in the transformer grid and display the result of that suggestion.
Based on the options displayed on the suggestion card, how can you define patterns in the Extract transform?
1. Trifacta selection patterns.
2. Exact string matches.
3. By identifying the values that come before and after the pattern that you want.
Click on the first Extract suggestion card and click Edit to see the transform in the Builder. If you click on the pattern, you will see the description of this Trifacta pattern. If you highlight the Trifacta pattern, you see a list of all possible pattern options with readable names and descriptions. Note that the Extract transform does not modify the original column.

5. Select multiple strings to refine a pattern.
extract col: Items on: `{alphanum-underscore}+` limit: 3 before: `:`
You can make multiple text selections in the same column to refine the patterns suggested by Trifacta. When you do this, you teach Trifacta to identify the correct pattern in the column. Note that to refine an existing suggestion, you need to select text that occurs in a different row from your first selection.
In the first row, we've brushed over the text: wine_alcohol. In the second row, brush over the text: meat_eggs_seafood. You can expand the column by clicking on the arrow to the right. You can drag the column edge to expand it as well.
Notice that the pattern shown on the Extract suggestion card has changed. Instead of reading {alphanum-underscore}{12}, the card now reads {alphanum-underscore}+. Since you have selected multiple values in the Items column, Trifacta has generated a more general pattern. {alphanum-underscore}+ means that Trifacta will match one or more alphanumeric or underscore characters, while {alphanum-underscore}{12} will only match exactly 12 alphanumeric or underscore characters. If you look at the preview in the Transformer Grid, you should also notice that Trifacta will now generate three columns.
The Items column in this dataset is a list of item categories that the farmers markets could sell. If there is an N after the colon, this market does not sell these items. If there is a Y after the colon, this market does sell these items. For the purpose of this example, let's assume we only want to extract the item categories.
Click on the second Extract suggestion card. You'll notice that this suggestion adds to the pattern to extract alphanumeric or underscore characters before the colon (:). In the preview, you will see that Trifacta will now generate two columns. Click Edit. The limit parameter determines the number of times Trifacta will match a pattern, and thus the number of columns that the Extract transform will generate. Change the number in the limit parameter from 2 to 3 and watch the preview update. Add to Recipe.
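
The effect of the limit and before parameters in step 5 can be approximated with ordinary regular expressions. The Python sketch below is only an analogy: it assumes {alphanum-underscore} corresponds to the character class [A-Za-z0-9_], treats before as a lookahead, and uses a hypothetical extract helper and a made-up Items value (only wine_alcohol and meat_eggs_seafood come from the lesson).

```python
import re

# Assumed regex equivalent of the {alphanum-underscore} selection pattern.
ALNUM_UNDERSCORE = r"[A-Za-z0-9_]"


def extract(value, pattern, limit=1, before=None):
    """Return up to limit matches of pattern, optionally only those followed by before."""
    if before is not None:
        pattern = f"(?:{pattern})(?={re.escape(before)})"
    matches = [m.group(0) for m in re.finditer(pattern, value)]
    return matches[:limit]


# Hypothetical cell from the Items column: category:Y or category:N, separated by pipes.
ITEM = "wine_alcohol:Y|meat_eggs_seafood:N|baked_goods:Y|honey:N"

# Without "before", the Y/N flags also match and take up a column.
print(extract(ITEM, ALNUM_UNDERSCORE + "+", limit=3))
# ['wine_alcohol', 'Y', 'meat_eggs_seafood']

# With before=":", only the item categories match.
print(extract(ITEM, ALNUM_UNDERSCORE + "+", limit=3, before=":"))
# ['wine_alcohol', 'meat_eggs_seafood', 'baked_goods']
```

Each element of the returned list corresponds to one generated column, so raising limit from 2 to 3 adds a third category.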
6. Use the Countpattern transform to count the number of times a delimiter appears.
countpattern col: Items on: `\|`
The Countpattern transform counts the number of times a specified pattern occurs in each record of a column. Select the pipe character '|' in the Items column. Scroll through the suggestion cards until you find the Count values matching suggestion. This uses the Countpattern transform. Select the Countpattern transform and examine the preview in the grid. You can see that the Countpattern transform creates a new column that contains the number of times the selected pattern appears in each record. Add the Countpattern transform to the recipe.

7. Generate a Split transform.
In the Items column, brush over the pipe character: |
When you select a delimiter (separator values like commas, pipes, or spaces), Trifacta will suggest a Split transform first. Click on the Split suggestion card and examine the preview. Notice that the Split transform will use the selected delimiter pattern to divide the current column into multiple new columns. The delimiter itself does not appear in the new columns, and the original column is dropped from the dataset.

8. Determine pattern-matching methods for Split.
Mouse over the different Split transform suggestions to see multiple versions of the suggestion. You can click on each suggestion to update the preview in the transformer grid to display the result of that suggestion. Click on the first Split suggestion and click Edit.

9. Control the number of columns generated by the Split transform.
split col: Items on: '|' limit: 5
Examine the transform in the Builder. Notice that the limit parameter controls the number of columns that will be created. Change the limit parameter value from 4 to 3. How does this change the preview? Add the transform to the recipe.

10. Undo the Split transform.
Click the undo button at the top of the screen to remove the Split transform from your recipe.
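
To check your intuition about steps 6 through 9 outside the application, the Python sketch below counts a delimiter and splits on it with a limit. It is an approximation, not Trifacta's implementation: the helpers count_pattern and split_with_limit are hypothetical, the Items value is made up, and the delimiter is treated as a literal string. Note that Python's maxsplit argument counts splits, so a limit of n yields at most n + 1 pieces; whether Trifacta's limit counts splits or resulting columns is not settled by this sketch.

```python
def count_pattern(value: str, delimiter: str) -> int:
    """Count non-overlapping occurrences of a literal delimiter in one cell."""
    return value.count(delimiter)


def split_with_limit(value: str, delimiter: str, limit: int) -> list:
    """Split on a literal delimiter at most limit times; the delimiter is dropped."""
    return value.split(delimiter, limit)


# Hypothetical cell from the Items column.
ITEM = "wine_alcohol:Y|meat_eggs_seafood:N|baked_goods:Y|honey:N"

print(count_pattern(ITEM, "|"))  # 3
print(split_with_limit(ITEM, "|", 2))
# ['wine_alcohol:Y', 'meat_eggs_seafood:N', 'baked_goods:Y|honey:N']
```

With a smaller limit, the remaining delimiters stay inside the last piece, which is the kind of change to look for in the preview when you lower the limit parameter.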